Content source: keynote speech "MindSpore Federated Learning Framework Solves the Data Silo Problem Under Privacy Compliance", HMS Core 6 AI Technology Forum, Huawei Developer Conference 2021.
Speaker: Huawei MindSpore federated learning engineer
As everyone knows, the development of artificial intelligence is inseparable from the support of large amounts of data; data is both the foundation and the key. In practice, however, data is often small in scale and fragmented, and large volumes of high-quality data are difficult to obtain, for reasons spanning engineering, regulation, and privacy compliance. This has created the data silo challenge in the AI industry: it is increasingly difficult for enterprises to obtain user data, for different departments within an enterprise to cooperate, for enterprises in the same industry to share data, and for cross-industry data to deliver value.
Federated learning: breaking data silos and building a new-generation technology ecosystem
Facing data silos, how should artificial intelligence develop? Federated learning is an effective solution that can ensure both data privacy compliance and model performance.
Federated learning was first proposed by Google in 2016. On the one hand, it is a machine learning framework that enables multiple organizations to use data and build machine learning models jointly while meeting the requirements of user privacy protection, data security, and government regulation. On the other hand, federated learning is also a business model, more like a "common prosperity" strategy, that can drive cross-domain enterprise-level data cooperation and give rise to new business formats and models based on joint modeling.
The industry generally divides federated learning into three types: horizontal federated learning, vertical federated learning, and federated transfer learning. Horizontal federated learning suits scenarios where participants have little user overlap but large feature overlap; for example, Google first applied it to joint modeling for smartphone input methods. Vertical federated learning suits scenarios with large user overlap but little feature overlap, such as industries with strongly vertical businesses. When both user overlap and feature overlap are small, federated transfer learning can be used for modeling.
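To make the partitioning patterns concrete, here is a minimal NumPy sketch (not from the talk) of how the same user-feature table would be split between two parties under horizontal versus vertical federated learning; the array sizes and party roles are purely illustrative.

```python
# Illustrative only: how one user-feature table is partitioned in
# horizontal vs. vertical federated learning.
import numpy as np

rng = np.random.default_rng(0)
# 6 users x 4 features: rows = users (samples), columns = features
data = rng.normal(size=(6, 4))

# Horizontal federated learning: parties share the same feature space
# but hold different users (split by rows).
party_a_rows, party_b_rows = data[:3, :], data[3:, :]

# Vertical federated learning: parties hold the same users
# but different features (split by columns).
party_a_cols, party_b_cols = data[:, :2], data[:, 2:]

print(party_a_rows.shape, party_b_rows.shape)   # (3, 4) (3, 4)
print(party_a_cols.shape, party_b_cols.shape)   # (6, 2) (6, 2)
```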
So, what challenges does federated learning typically encounter in enterprise-level applications?
The first is privacy and security. At present, federated learning still faces many security risks, such as poisoning attacks, adversarial attacks, and privacy leakage.
The second is model accuracy. Issues such as unbalanced samples and missing data labels in businesses such as security monitoring can make federated aggregation results unsatisfactory. In addition, applications in industries such as autonomous driving and healthcare place even higher requirements on model accuracy.
The third is communication efficiency. When deploying to tens of millions of large-scale heterogeneous devices, the framework must cope with complex conditions such as unstable networks and sudden load changes. Uploading large numbers of local model updates places a huge bandwidth burden on the communication network; compression algorithms can significantly reduce the amount of data transmitted, but aggressive compression can seriously hurt model accuracy. Balancing communication efficiency against model accuracy is therefore a major challenge.
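As one concrete illustration of this trade-off, the sketch below applies top-k sparsification to a local model update, so that only the largest-magnitude entries are transmitted. The technique and the 1% keep ratio are illustrative assumptions, not a description of what MindSpore ships.

```python
# Illustrative top-k sparsification of a local model update.
import numpy as np

def topk_sparsify(update: np.ndarray, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` of values; send indices + values."""
    flat = update.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the top-k entries
    return idx, flat[idx], update.shape            # what actually goes on the wire

def desparsify(idx, values, shape):
    """Server-side reconstruction of the sparse update (zeros elsewhere)."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

update = np.random.default_rng(1).normal(size=(256, 128))
idx, vals, shape = topk_sparsify(update, ratio=0.01)
print(f"dense floats: {update.size}, transmitted floats: {vals.size}")
```

A higher keep ratio preserves more of the update (better accuracy) at the cost of more bandwidth, which is exactly the balance described above.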
MindSpore federated learning framework: device-cloud collaboration and a unified architecture for all scenarios
In June 2021, the MindSpore federated learning framework was open-sourced. It focuses on horizontal federated learning, supports deployment to tens of millions of large-scale heterogeneous devices, and provides high-performance, highly available distributed federated aggregation computing. For privacy and security, local training completes without data ever leaving the device, and model parameters are protected with multi-party secure computation and encryption before upload. For federation efficiency, we provide both synchronous and asynchronous federation modes. In addition, the MindSpore federated learning framework is flexible and easy to use: a single line of code switches between standalone training and federated learning modes, as in the hedged sketch below. After that, I will introduce the core technologies of the MindSpore federated learning framework in detail from three dimensions.
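The sketch below shows what such a config-driven mode switch could look like in a training script. All names here (FLConfig, local_fit, federated_fit) are hypothetical placeholders, not the actual MindSpore API; consult the official MindSpore documentation for the real interface.

```python
# Hypothetical sketch of a one-line standalone/federated mode switch.
from dataclasses import dataclass

@dataclass
class FLConfig:
    enable_fl: bool = False          # False -> ordinary standalone training
    server_address: str = ""         # federated server endpoint when enabled
    sync_type: str = "synchronous"   # or "asynchronous"

def local_fit(model, dataset):
    return f"standalone training of {model} on {len(dataset)} samples"

def federated_fit(model, dataset, server, sync_type):
    return f"{sync_type} federated training of {model} via {server}"

def train(model, dataset, fl: FLConfig):
    # Same training entry point; the config decides which path runs.
    if not fl.enable_fl:
        return local_fit(model, dataset)
    return federated_fit(model, dataset, fl.server_address, fl.sync_type)

# Switching modes is a single change to the configuration line:
print(train("lenet", [0] * 100, FLConfig()))                                       # standalone
print(train("lenet", [0] * 100, FLConfig(True, "10.0.0.1:6666", "asynchronous")))  # federated
```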
1. Security algorithms enhance privacy protection. Although in the traditional federated learning framework the raw data itself is never exposed, sharing model parameters in plaintext still carries a risk of privacy leakage. The MindSpore federated learning framework supports efficient federated secure aggregation based on multi-party secure computation and on differential privacy, strengthening privacy protection. The two approaches each have their own advantages, and developers can choose according to the specific application scenario.
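As a concrete illustration of the differential-privacy option, the following sketch clips a local update and adds Gaussian noise before upload. The clipping norm and noise multiplier are assumed values for illustration, not MindSpore defaults.

```python
# Illustrative Gaussian-mechanism protection of a local update before upload.
import numpy as np

def dp_protect(update: np.ndarray, clip_norm: float = 1.0,
               noise_multiplier: float = 1.1,
               rng: np.random.Generator = np.random.default_rng(0)) -> np.ndarray:
    # 1) Clip the update so its L2 norm is bounded by clip_norm.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # 2) Add Gaussian noise scaled to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

local_update = np.random.default_rng(42).normal(size=(10,))
print(dp_protect(local_update))   # what the device would actually upload
```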
2. Hybrid federated training improves accuracy. In real application scenarios, user data on client devices often lacks labels, which affects the accuracy of the final trained model. For this we provide a hybrid federated training scheme with two approaches: horizontal semi-supervised learning and fine-grained parameter decomposition. The former combines unsupervised and supervised learning with horizontal federated learning to address the lack of labeled data on the device side while protecting user privacy; the latter decomposes parameters into different parts according to their function and scale within the model and optimizer, and then applies different transmission and training strategies to each part to reduce communication overhead, as sketched below.
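The sketch below illustrates the parameter-decomposition idea under simple assumptions: parameters are partitioned by name into a shared backbone that is uploaded each round, while personalized layers and optimizer state stay local. The layer names and the split rule are assumptions for illustration, not the framework's actual strategy.

```python
# Illustrative partition of parameters into "upload each round" vs. "keep local".
import numpy as np

params = {
    "backbone.conv1.weight": np.zeros((64, 3, 3, 3)),    # shared feature extractor
    "backbone.conv2.weight": np.zeros((128, 64, 3, 3)),
    "head.fc.weight":        np.zeros((10, 128)),        # personalized classifier head
    "optimizer.momentum":    np.zeros((128, 64, 3, 3)),  # optimizer state, stays local
}

def split_for_upload(params):
    shared = {k: v for k, v in params.items() if k.startswith("backbone.")}
    local_only = {k: v for k, v in params.items() if k not in shared}
    return shared, local_only

shared, local_only = split_for_upload(params)
upload_bytes = sum(v.nbytes for v in shared.values())
total_bytes = sum(v.nbytes for v in params.values())
print(f"uploading {upload_bytes} of {total_bytes} bytes each round")
```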
3. Time-limited communication eliminates the long-tail effect. In massively parallel scenarios, cross-device federated learning involves a huge number of highly unreliable clients. Each training iteration therefore suffers a long-tail effect caused by clients that respond late or drop out entirely, which degrades the overall training performance of federated learning. For this we provide a time-limited communication mechanism: a timer is added to each training iteration, and requests arriving within the time window are processed normally, which eliminates the long-tail effect, reduces waiting time, and improves training efficiency. The time window can also be adjusted dynamically according to actual conditions, as the sketch below shows.
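The following sketch simulates the time-limited aggregation idea with a thread-based toy setup: the server aggregates whatever client updates arrive within a fixed window and simply skips stragglers for that round. The window length, client count, and delays are illustrative.

```python
# Toy simulation of time-limited aggregation: stragglers miss the window.
import queue
import random
import threading
import time

updates = queue.Queue()

def client(cid, delay):
    time.sleep(delay)        # simulated local training + network delay
    updates.put((cid, 1.0))  # a dummy model update

def aggregate_with_time_window(window_s=0.5):
    deadline = time.monotonic() + window_s
    received = []
    while time.monotonic() < deadline:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            received.append(updates.get(timeout=remaining))
        except queue.Empty:
            break
    # Updates arriving after the deadline are skipped for this round.
    return received

threads = [threading.Thread(target=client, args=(i, random.uniform(0.1, 1.0)))
           for i in range(10)]
for t in threads:
    t.start()
print(f"aggregated {len(aggregate_with_time_window())} of 10 client updates")
for t in threads:
    t.join()
```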
Two application scenarios of the MindSpore federated learning framework
The first is personalized recommendation for on-device advertising, where the MindSpore federated learning framework is a good fit. Traditional advertising pipelines face many problems and challenges. In terms of user profiling, the cloud side cannot obtain the richer features available on the phone. In terms of privacy compliance, regulations such as the GDPR restrict how user data may be handled, so the data cannot be uploaded to a central server and the traditional pipeline cannot be completed. In terms of recommendation efficiency, many steps lie between the ad request and the final ad display, which calls for a strong engineering framework to improve the timeliness and stability of the service.
The cross-device federated learning framework in MindSpore's device-cloud collaboration solution breaks down the data barriers between users and the advertising platform, enabling joint modeling without data leaving the device. At the same time, we use few-shot learning algorithms to make full use of device-side user feature data and resources, optimizing the pCVR estimation model and raising the ad conversion rate. Under the premise of privacy compliance, we also support device-cloud joint modeling for user tag mining and, on top of ad targeting, perform secondary recommendation on the device side to further improve ad conversion.
The second is security monitoring, where companies collect and upload large amounts of image and video data. Suppose a company's urban utility tunnel project needs to deploy cameras on site for security monitoring. The traditional approach is to upload the video collected by the cameras to a sub-control center, which preprocesses the data and then transmits it to the main control center. Two problems can arise in this process: uploading large amounts of data incurs heavy bandwidth overhead and rising costs, and the data often contains sensitive information such as faces and vehicles, so there is a risk of data leakage.
How can these problems be solved? The cross-silo federated learning framework in MindSpore's device-cloud collaboration solution allows each site to train and run inference on its model locally, which both keeps user data secure and keeps bandwidth costs under control.
Finally, I hope developers will keep following the MindSpore federated learning framework and join us in building the federated learning technology ecosystem. Thank you!
Learn more >>
Visit Huawei Developer Alliance official website
Obtain development guide document
Huawei Mobile Services open source repository addresses: GitHub, Gitee
Follow us to get the latest HMS Core technical information as soon as it is released~