According to foreign media reports, TensorFlow, Google's open-source machine learning and artificial intelligence project, has dropped support for YAML because of a security flaw. In the latest version, Google removed YAML support to fix an untrusted-deserialization vulnerability that allowed code execution.
The vulnerability, tracked as CVE-2021-37678, was reported to Google by researcher Arjun Shibu. Its severity is high, with a CVSS score of 9.3.
YAML is a human-readable data serialization format. The researcher found that because TensorFlow's code calls yaml.unsafe_load() when deserializing a Keras model supplied in YAML format, an attacker can exploit this to execute arbitrary code in any application that deserializes such a model.
In general, deserialization vulnerabilities arise when an application reads malformed or malicious data from untrusted sources. Once deserialized, such data can crash TensorFlow and cause a denial of service (DoS). Worse, this vulnerability can even lead to arbitrary code execution.
This is why this vulnerability, rooted in the notorious "yaml.unsafe_load()" function, received a CVSS score as high as 9.3 out of 10.
The "unsafe_load" function deserializes YAML data very liberally: it resolves all tags, including those known to be unsafe on untrusted input.
Ideally, then, unsafe_load should only be called on input from a trusted source that is known to contain no malicious content. Otherwise, an attacker can abuse the deserialization mechanism by injecting a malicious payload into YAML data that has yet to be deserialized, and so execute code of their choice.
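To make the difference concrete, here is a toy, stdlib-only model of tag resolution. It is not PyYAML's implementation; the `construct` function, its tag table and the `unsafe` flag are hypothetical. It only mirrors the idea that an unrestricted loader turns a tag such as `!!python/name:os.system` into a live Python object, while a restrictive loader (the role PyYAML's `safe_load` plays in practice) rejects anything beyond plain data.

```python
import importlib
import os

# Hypothetical table of plain-data tags that are safe to construct.
PLAIN_DATA_TAGS = {"!!str": str, "!!int": int, "!!float": float}
NAME_TAG_PREFIX = "!!python/name:"


def construct(tag: str, value: str, unsafe: bool = False):
    """Toy constructor (not PyYAML): turn a (tag, scalar) pair into an object."""
    if tag in PLAIN_DATA_TAGS:
        # Plain data: a hostile document can only supply odd values.
        return PLAIN_DATA_TAGS[tag](value)
    if unsafe and tag.startswith(NAME_TAG_PREFIX):
        # Unrestricted mode: import any module and return any attribute,
        # e.g. os.system or the built-in exec, exactly as the document asks.
        dotted = tag[len(NAME_TAG_PREFIX):]
        module_name, _, attr = dotted.rpartition(".")
        module = importlib.import_module(module_name or "builtins")
        return getattr(module, attr)
    # Restrictive mode: unknown tags are rejected instead of resolved.
    raise ValueError(f"refusing to construct tag {tag!r}")


if __name__ == "__main__":
    print(construct("!!int", "42"))  # 42
    print(construct("!!python/name:os.system", "", unsafe=True) is os.system)  # True
    try:
        construct("!!python/name:os.system", "")
    except ValueError as err:
        print(err)
```

Naming a callable is not yet running it; the danger comes when a later construction step calls that object, as the PoC in the advisory does through `listitems`.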
The proof-of-concept (PoC) exploit included in the vulnerability advisory confirms this:
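```python
from tensorflow.keras import models

# Builds class "z" with extend=exec, then calls extend() on listitems
payload = '''
!!python/object/new:type
args: ['z', !!python/tuple [], {'extend': !!python/name:exec }]
listitems: "__import__('os').system('cat /etc/passwd')"
'''

# Vulnerable versions run the shell command; patched ones raise RuntimeError
models.model_from_yaml(payload)
```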
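The payload works in two steps, which can be re-enacted by hand with no YAML library or TensorFlow involved. As we understand PyYAML's handling of `!!python/object/new`, the loader first calls `type.__new__(type, 'z', (), {'extend': exec})`, creating a class whose `extend` attribute is the built-in `exec`, and then, because `listitems` is present, calls `z.extend(listitems)`. The sketch repeats those two calls with a harmless string in place of the shell command; the `executed` list exists only to make the effect visible.

```python
# Step 1: what "!!python/object/new:type" with those args produces.
# type.__new__(type, name, bases, namespace) creates a brand-new class.
Z = type.__new__(type, "z", (), {"extend": exec})

# Step 2: the loader calls instance.extend(listitems). Because "extend" is
# the built-in exec (not a bound method), the string runs as Python code.
executed = []
listitems = "executed.append('payload ran')"  # harmless stand-in payload
Z.extend(listitems)

print(Z.extend is exec)  # True
print(executed)          # ['payload ran']
```

Nothing in the payload is "code" in YAML's eyes; it is the loader's willingness to build arbitrary types and call their methods that turns data into execution.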
For this reason, after the researcher reported the vulnerability to Google, TensorFlow's maintainers decided to abandon YAML entirely and use JSON deserialization instead.
"Given that YAML format support requires a lot of work, we have now removed it," the project maintainers said in the same advisory. The release notes accompanying the fix further explain that `Model.to_yaml()` and `keras.models.model_from_yaml` "have been replaced to raise a `RuntimeError` as they can be abused to cause arbitrary code execution."
The maintainers also cited other YAML-related vulnerabilities and their fixes as examples, and recommend that developers use JSON serialization instead of YAML or, better still, serialize models to H5.
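Part of why JSON is the safer choice is that the standard `json` parser has no equivalent of YAML's Python tags: by default `json.loads` can only produce dicts, lists, strings, numbers, booleans and `None`. The sketch below uses a made-up, simplified model description (not Keras's actual schema) to check that property, and shows that a string shaped like the PoC command comes back as inert text. What Keras then does with the parsed data is outside the scope of this sketch.

```python
import json

PLAIN_TYPES = (dict, list, str, int, float, bool, type(None))


def is_plain_data(obj) -> bool:
    """Recursively check that obj contains only JSON-native Python types."""
    if isinstance(obj, dict):
        return all(isinstance(k, str) and is_plain_data(v) for k, v in obj.items())
    if isinstance(obj, list):
        return all(is_plain_data(item) for item in obj)
    return isinstance(obj, PLAIN_TYPES)


# Hypothetical, simplified model description; not Keras's real format.
document = """
{
  "class_name": "Sequential",
  "config": {"layers": [], "note": "__import__('os').system('cat /etc/passwd')"}
}
"""

parsed = json.loads(document)
print(is_plain_data(parsed))           # True
print(type(parsed["config"]["note"]))  # <class 'str'>: parsed, never executed
```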
It is worth noting that TensorFlow is not the only project that calls YAML's unsafe loading function. A search on GitHub turns up a large number of Python projects that use it.
Given the potential security risk, the researchers suggest that these projects fix the problem promptly, and that developers who depend on them stay alert to it.
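For projects that want to check their own code, a rough audit can be done with Python's standard `ast` module. The sketch below flags calls to `yaml.unsafe_load`/`yaml.unsafe_load_all`, plus calls to `yaml.load`/`yaml.load_all` whose `Loader` (keyword or second positional argument) is not `SafeLoader` or `CSafeLoader`. It is a heuristic under stated assumptions: the function name `find_risky_yaml_calls` is hypothetical, it only recognizes the `yaml.<func>(...)` call form (so `from yaml import unsafe_load` slips through), and a hit means "review this", not "proven exploitable".

```python
import ast

UNSAFE_FUNCS = {"unsafe_load", "unsafe_load_all"}
LOAD_FUNCS = {"load", "load_all"}
SAFE_LOADERS = {"SafeLoader", "CSafeLoader"}


def _loader_name(call: ast.Call):
    """Return the name of the Loader passed to yaml.load(), if recognizable."""
    node = call.args[1] if len(call.args) > 1 else None
    for kw in call.keywords:
        if kw.arg == "Loader":
            node = kw.value
    if isinstance(node, ast.Attribute):  # e.g. yaml.SafeLoader
        return node.attr
    if isinstance(node, ast.Name):       # e.g. SafeLoader
        return node.id
    return None


def find_risky_yaml_calls(source: str):
    """Yield (line, description) for yaml calls that may resolve unsafe tags."""
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        func = node.func
        if not (isinstance(func.value, ast.Name) and func.value.id == "yaml"):
            continue
        if func.attr in UNSAFE_FUNCS:
            yield node.lineno, f"yaml.{func.attr}()"
        elif func.attr in LOAD_FUNCS and _loader_name(node) not in SAFE_LOADERS:
            yield node.lineno, f"yaml.{func.attr}() without a safe Loader"


sample = """
import yaml
a = yaml.unsafe_load(text)
b = yaml.load(text, Loader=yaml.SafeLoader)
c = yaml.load(text, Loader=yaml.Loader)
d = yaml.safe_load(text)
"""
# [(3, 'yaml.unsafe_load()'), (5, 'yaml.load() without a safe Loader')]
print(sorted(find_risky_yaml_calls(sample)))
```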
The fix, which removes YAML support, is expected in TensorFlow 2.6.0 and will also be applied to the earlier versions 2.5.1, 2.4.3 and 2.3.4. Developers using the project should upgrade to a fixed version promptly.
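Based on the release lines named above, a quick way to check whether an installed version (for example the string in `tf.__version__`) is among the fixed ones is to compare version tuples. This is a simplified sketch under stated assumptions: the helper `has_yaml_fix` is hypothetical; it treats 2.6.0 and later, plus 2.5.1+, 2.4.3+ and 2.3.4+ within their own minor lines, as fixed; it treats older lines (2.2 and below) as unfixed because the release plan does not mention them; and it ignores pre-release suffixes such as `rc0`.

```python
import re

# Earliest fixed patch release per minor line, per the release plan above.
FIXED_IN = {(2, 3): 4, (2, 4): 3, (2, 5): 1}
FIRST_FULLY_FIXED = (2, 6, 0)


def parse_version(version: str) -> tuple:
    """Turn '2.5.1' (or '2.5.1rc0', suffix ignored) into (2, 5, 1)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def has_yaml_fix(version: str) -> bool:
    """Hypothetical helper: is this TensorFlow version in a fixed release line?"""
    major, minor, patch = parse_version(version)
    if (major, minor, patch) >= FIRST_FULLY_FIXED:
        return True
    fixed_patch = FIXED_IN.get((major, minor))
    return fixed_patch is not None and patch >= fixed_patch


for v in ["2.6.0", "2.5.0", "2.5.1", "2.4.3", "2.3.3", "2.2.0"]:
    print(v, has_yaml_fix(v))
```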