Machine learning (ML) and deep learning (DL) are becoming increasingly important in many real-world applications. These models are trained on known data and deployed in scenarios such as image classification and content recommendation to process new data. In general, more data yields a more capable ML/DL model, but hoarding and processing massive amounts of data also brings privacy, security, and regulatory risks.
Privacy-Preserving Machine Learning (PPML) helps mitigate these risks. It uses techniques such as cryptography, differential privacy, and hardware-based trusted execution to protect sensitive user data and trained models while performing machine learning tasks.
Based on Intel® Software Guard Extensions (Intel® SGX) and Occlum, Ant Group's memory-safe multi-process library operating system for Intel® SGX, Ant Group cooperated with Intel to build a PPML platform. In this blog, we introduce this solution running on Analytics Zoo, and show the performance advantage it gains when accelerated by Intel® Deep Learning Boost (Intel® DL Boost) on 3rd Gen Intel® Xeon® Scalable processors.
Intel® SGX is Intel's Trusted Execution Environment (TEE), which provides hardware-based memory encryption to isolate specific application code and data in memory. Intel® SGX allows user-level code to allocate protected regions of memory, known as "enclaves", which are protected even from software running at higher privilege levels (as shown in Figure 1).
Figure 1: Enhanced protection with Intel® SGX
Compared with homomorphic encryption and differential privacy, Intel® SGX can still help defend against software attacks even when the operating system, drivers, BIOS, virtual machine monitor, or system management mode is compromised. Intel® SGX therefore enhances the protection of private data and keys even when an attacker has full control of the platform. 3rd Gen Intel® Xeon® Scalable processors increase the CPU's trusted memory region to 512 GB, enabling Intel® SGX to lay a solid foundation for privacy-preserving machine learning solutions.
Ant Group, formally established in 2014, serves more than 1 billion users and is one of the world's leading financial technology companies. Ant Group has been actively exploring the field of privacy-preserving machine learning and launched the open source project Occlum. Occlum is a memory-safe multi-process library operating system (LibOS) for Intel® SGX. With Occlum, machine learning workloads can run on Intel® SGX with minimal (or even no) source code modifications, protecting the confidentiality and integrity of user data in a highly transparent manner. The Occlum architecture for Intel® SGX is shown in Figure 2.
Figure 2 Occlum architecture for Intel® SGX (Image source: Occlum · GitHub)
Analytics Zoo empowers end-to-end PPML solutions
Analytics Zoo is a unified big data analytics and AI platform for distributed TensorFlow, Keras and PyTorch, built on Apache Spark, Flink and Ray. With Analytics Zoo, the analytics framework, the ML/DL framework and the Python libraries can run as a whole in a protected manner on the Occlum LibOS. In addition, Analytics Zoo provides security features such as secure data access and secure gradient and parameter management, empowering privacy-preserving machine learning use cases such as federated learning. The end-to-end Analytics Zoo PPML solution is shown in Figure 3.
Figure 3 The end-to-end PPML solution provides secure distributed computing for financial services, healthcare, cloud services and other application fields
On the Analytics Zoo PPML platform, Ant Group and Intel jointly created a more secure, distributed, end-to-end inference service pipeline (as shown in Figure 4).
The pipeline is built with Analytics Zoo Cluster Serving, a lightweight distributed real-time serving solution that supports a variety of deep learning models, including TensorFlow, PyTorch, Caffe, BigDL and OpenVINO™.
Analytics Zoo Cluster Serving includes a web front end, the in-memory data store Redis, an inference engine (such as TensorFlow or the OpenVINO™ toolkit optimized for Intel® architecture), and a distributed stream processing framework (such as Apache Flink).
The inference engine and the stream processing framework run inside Occlum and Intel® SGX enclaves. Traffic between the web front end and Redis is encrypted with the Transport Layer Security (TLS) protocol, so the data in the inference pipeline (including user data and models) is better protected during storage, transmission, and use.
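To make the TLS protection above concrete, here is a minimal sketch of how a client-side TLS context could be configured before connecting to the front end or Redis. This is illustrative only, not Analytics Zoo code; the CA file path is a hypothetical placeholder.

```python
# Build a client TLS context that verifies the server certificate,
# as used for the encrypted links between front end, Redis, and clients.
import ssl

def make_tls_context(ca_file=None):
    """Return a TLS context with certificate verification enabled."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # require TLS 1.2 or newer
    if ca_file is not None:
        # In a real deployment, load the service's CA bundle here.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx

ctx = make_tls_context()
```

A context built this way refuses unencrypted or unverified connections, which is what keeps the queued user data and model traffic protected in transit.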
Figure 4 Inference service pipeline
Create a better future: Intel® DL Boost accelerates end-to-end PPML solutions
The solution implements the following end-to-end inference pipeline:

1. The RESTful HTTP API receives user input, and the Analytics Zoo pub/sub API converts it into an input queue managed by Redis. User data is protected by encryption.
2. Analytics Zoo fetches data from the input queue and runs the inference engine on a distributed stream processing framework (such as Apache Flink). Intel® SGX, via Occlum, protects the inference engine and the stream processing framework. The Intel® oneAPI Deep Neural Network Library (oneDNN) uses Intel® DL Boost with the int8 instruction set to improve the performance of the distributed inference pipeline.
3. Analytics Zoo collects the inference output from the distributed environment and sends it back to the output queue managed by Redis. The solution then returns the inference result to the user as a prediction through the RESTful HTTP API. The data in the output queue and the HTTP traffic are encrypted.
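The queue flow in these steps can be sketched in a few lines. The sketch below uses an in-memory deque to stand in for the Redis input and output queues; the names (`submit`, `run_inference`, `serve_one`) are illustrative placeholders, not Analytics Zoo or Cluster Serving APIs, and the model call is a stub for the engine running inside the enclave.

```python
# Toy version of the pub/sub pipeline: front end pushes to an input
# queue, the streaming job pops, infers, and pushes to an output queue.
from collections import deque

input_queue, output_queue = deque(), deque()

def submit(user_input):
    """Front end: push user input onto the input queue (like Redis LPUSH)."""
    input_queue.appendleft(user_input)

def run_inference(x):
    """Stub for the model served inside the SGX enclave."""
    return {"input": x, "prediction": f"label-for-{x}"}

def serve_one():
    """Streaming job: pop one item, run inference, push the result."""
    x = input_queue.pop()          # like Redis RPOP
    output_queue.appendleft(run_inference(x))

submit("image-001")
serve_one()
result = output_queue.pop()
```

In the real pipeline the two queues live in TLS-protected Redis, and `serve_one` is a Flink job running inside an Occlum enclave, so the data stays protected at every hop.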
Performance analysis
The performance of the Analytics Zoo PPML solution was verified.
Table 1 Test configuration
Figure 5 shows the test results. Compared with an inference pipeline not protected by Intel® SGX, the SGX-protected ResNet50 inference pipeline incurs only a small throughput loss. With Intel® DL Boost and the INT8 instruction set, the throughput of the SGX-protected inference pipeline doubled.
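The INT8 speedup comes from quantization: weights and activations are mapped to 8-bit integers, the matrix math runs on cheap int8 instructions, and results are rescaled afterwards. The pure-Python sketch below shows symmetric int8 quantization of a small weight vector; it illustrates the numeric idea only and is not oneDNN code.

```python
# Symmetric INT8 quantization: map floats into [-127, 127] with one
# shared scale, then dequantize to see how little precision is lost.
def quantize_int8(values):
    """Return int8 codes and the scale used to produce them."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]

weights = [0.50, -1.27, 0.03, 1.00]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The rounding error is bounded by half the scale, which is why int8 inference can double throughput with little accuracy loss on models such as ResNet50.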
Figure 5 Intel® SGX, Intel® DL Boost and the third-generation Intel® Xeon® Scalable processors provide high-performance security capabilities
The Analytics Zoo PPML solution based on Intel® SGX inherits the advantages of the Trusted Execution Environment (TEE). Compared with other data security approaches, its security and data utility stand out, and its performance is only slightly below that of plaintext. Intel® DL Boost and Intel® oneDNN further improve the performance of the Analytics Zoo PPML inference solution. Table 2 summarizes the advantages of this solution (TEE) over homomorphic encryption (HE), differential privacy (DP), secure multi-party computation (MPC) and plaintext.
Table 2 Comparison of Analytics Zoo PPML solution (TEE) with other solutions
Summary
In an increasingly complex legal and regulatory environment, protecting customer data privacy is more important than ever for enterprises and organizations. With privacy-preserving machine learning, they can continue to explore powerful AI technologies while reducing the security risks of processing large amounts of sensitive data.

The Analytics Zoo privacy-preserving machine learning solution builds on Occlum, Intel® SGX, Intel® DL Boost and Analytics Zoo to provide a platform that helps ensure both data security and the performance of big data AI workloads. Ant Group and Intel have jointly created and verified this PPML solution, and will continue to cooperate to explore best practices in AI and data security.
Test configuration
System configuration: 2 nodes, dual Intel® Xeon® Platinum 8369B processors, 32 cores per socket, Hyper-Threading enabled, Turbo enabled, 1024 GB total memory (16 slots/64 GB/3200 MHz), EPC 512 GB, SGX DCAP driver 1.36.2, microcode 0x8d05a260, Ubuntu 18.04.4 LTS, 4.15.0-112-generic kernel. Test by Intel as of March 20, 2021.

Software configuration: Occlum LibOS 0.19.1, Flink 1.10.1, Redis 6.0.9, OpenJDK 11.0.10, Python 3.6.9.

Workload configuration: model: ResNet50; deep learning framework: Analytics Zoo 0.9.0, OpenVINO™ 2020R2; dataset: ImageNet; BS=16/instance, 16 instances per two sockets; data type: FP32/INT8.

Results are obtained from tests in a laboratory environment.