TensorFlow drops YAML support over a code execution flaw and recommends JSON instead

According to foreign media reports, TensorFlow, the open source machine learning and artificial intelligence project maintained by Google, has dropped support for YAML because of a security flaw. In the latest version, Google removed YAML support to fix a vulnerability in which untrusted data was deserialized and could lead to code execution.

The vulnerability, tracked as CVE-2021-37678, was reported to Google by researcher Arjun Shibu. Its severity is high, with a CVSS score of 9.3.

YAML is a human-readable data serialization format. The researcher found that TensorFlow's code called yaml.unsafe_load() to parse YAML, so an attacker could execute arbitrary code whenever an application deserialized a Keras model supplied in YAML format.

Deserialization vulnerabilities generally arise when an application reads malformed or malicious data from untrusted sources. In TensorFlow, such a flaw could crash the application and cause a denial of service (DoS). Worse, this particular vulnerability allows arbitrary code execution.

That is why the vulnerability in the use of yaml.unsafe_load() scores as high as 9.3 out of 10 on the CVSS scale, and why the function is "notorious".

The unsafe_load function deserializes YAML data very permissively: it resolves every tag, including tags that are known to be unsafe.

This means that unsafe_load should only ever be called on input from a trusted source that contains no malicious content. Otherwise, an attacker can abuse the deserialization mechanism by injecting a malicious payload into the YAML data before it is deserialized, and so run whatever code they want.
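The difference between the permissive and the restricted loader can be sketched as follows. This is a minimal illustration, assuming the third-party PyYAML package is installed; the helper name load_untrusted and the sample documents are hypothetical and are not taken from TensorFlow. The sample uses a harmless tag that merely names a Python function, which yaml.safe_load refuses to resolve.

```python
import yaml  # PyYAML, a third-party package (assumed installed)

# Harmless stand-in for a hostile document: the tag asks the loader to
# resolve a Python name instead of producing plain data.
SUSPICIOUS_DOC = "callback: !!python/name:os.system"


def load_untrusted(text):
    """Parse YAML with safe_load, which only builds plain data types."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as err:
        # safe_load has no constructor for python/* tags, so it raises
        # instead of resolving them.
        raise ValueError(f"rejected untrusted YAML: {err}") from err


# Plain data loads normally.
print(load_untrusted("layers: [dense, dropout]"))

# A document carrying a python/* tag is refused.
try:
    load_untrusted(SUSPICIOUS_DOC)
except ValueError:
    print("blocked a document that carries a python/* tag")
```

With yaml.unsafe_load, the same tag would be resolved to the real function object, which is the kind of behaviour an attacker builds on.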

The proof-of-concept (PoC) in the vulnerability advisory demonstrates this:

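    from tensorflow.keras import models

    # The YAML tags below make the unsafe loader build a new type and call exec
    payload = '''
    !!python/object/new:type
    args: ['z', !!python/tuple [], {'extend': !!python/name:exec }]
    listitems: "__import__('os').system('cat /etc/passwd')"
    '''

    # On vulnerable versions, loading the "model" runs the shell command above
    models.model_from_yaml(payload)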

For this reason, after the researcher notified Google of the vulnerability, the TensorFlow maintainers decided to abandon YAML entirely and use JSON deserialization instead.

"Given that YAML format support requires a significant amount of work, we have removed it for now," the maintainers said in the same advisory. The release notes for the fix explain that Model.to_yaml() and keras.models.model_from_yaml "have been replaced to raise a RuntimeError as they can be abused to cause arbitrary code execution."

The maintainers also pointed to other vulnerabilities caused by YAML and their fixes. They recommend that developers use JSON serialization instead of YAML or, as a better alternative, H5 serialization.
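A short sketch of why JSON is the safer choice: JSON has no tag mechanism, so parsing it can only produce dictionaries, lists, strings, numbers, booleans, and null. The example uses Python's standard json module with a hypothetical architecture dictionary whose field names are illustrative only. In Keras itself, the corresponding calls are Model.to_json() and keras.models.model_from_json().

```python
import json

# Hypothetical architecture description; the field names are illustrative
# and do not reproduce the real Keras config schema.
config = {"name": "demo_model", "layers": [{"type": "Dense", "units": 8}]}

# Round trip: serialize to text, then parse it back into plain data.
text = json.dumps(config)
restored = json.loads(text)
print(restored == config)

# Something that looks like a YAML tag is just an ordinary string in JSON;
# the parser never resolves it to a Python object.
hostile = json.loads('{"layers": "!!python/name:exec"}')
print(type(hostile["layers"]).__name__)
```

Whatever the file contains, the application receives inert data and decides for itself how to interpret it.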

It is worth noting that TensorFlow is not the only project that uses YAML's unsafe loading functions. A search on GitHub shows that a large number of Python projects call this unsafe function.
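Projects that want to check their own code for the same pattern could start from something like the sketch below. It is a rough illustration rather than a complete audit tool: the function name find_unsafe_yaml_calls is hypothetical, and it only detects calls written literally as yaml.unsafe_load(...) or yaml.unsafe_load_all(...), so aliased imports such as "from yaml import unsafe_load" would be missed.

```python
import ast

# PyYAML functions that resolve arbitrary tags.
RISKY_CALLS = {"unsafe_load", "unsafe_load_all"}


def find_unsafe_yaml_calls(source):
    """Return sorted (line, function) pairs for yaml.unsafe_load-style calls."""
    hits = []
    # ast.parse only parses the source; it never executes it.
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in RISKY_CALLS
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "yaml"
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


sample = (
    "import yaml\n"
    "cfg = yaml.unsafe_load(text)\n"
    "ok = yaml.safe_load(text)\n"
)
print(find_unsafe_yaml_calls(sample))  # [(2, 'unsafe_load')]
```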

Given the potential security issues, the researcher suggests that these projects fix the problem promptly, and developers who depend on them should also pay attention to security.

The fix, which removes YAML support, is expected to land in TensorFlow 2.6.0. It will also be backported to the earlier release lines as versions 2.5.1, 2.4.3, and 2.3.4, and developers who use the project should upgrade to a patched version promptly.
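A deployment script could check an installed version against these numbers roughly as follows. This is only a sketch based on the versions named in this article: the helper has_yaml_fix is hypothetical, it treats every release line from 2.6 onward as fixed, and it ignores pre-release suffixes such as "rc0".

```python
import re

# Minimum patch level per release line, taken from the versions named above.
BACKPORTS = {(2, 3): 4, (2, 4): 3, (2, 5): 1}
FIRST_FIXED_LINE = (2, 6)  # assumed: 2.6 and every later line carry the fix


def has_yaml_fix(version):
    """Return True if a 'major.minor.patch' version string is patched."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= FIRST_FIXED_LINE:
        return True
    minimum = BACKPORTS.get((major, minor))
    # Release lines without a backport (for example 2.2.x) stay unpatched.
    return minimum is not None and patch >= minimum


for candidate in ("2.3.3", "2.4.3", "2.5.0", "2.6.0"):
    print(candidate, has_yaml_fix(candidate))
```

With TensorFlow installed, the string to check would be tensorflow.__version__.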


MissD
