Authors: Che Yang, Liu Zhan, Jing Qi
As IT infrastructure has evolved from physical machines to virtual machines, then to container environments represented by Kubernetes, and now toward serverless, today's computing and application forms are developing at a rapid pace. In particular, running data-intelligence applications such as AI and data analytics on serverless platforms has become a trend; Gartner predicted that by 2023, 70% of AI applications would be developed on containers and serverless technologies. This is a future many believe in, but the road to serverless applications is full of hardships: high migration thresholds for existing applications, vendor lock-in, insufficient infrastructure responsiveness, lack of observability, and cumbersome storage access.
These are users' hard requirements for serverless platform providers. The winner will not simply be whoever offers more elastic compute or a lower price, but whoever is more open and can better provide the infrastructure and ecosystem that serverless applications run on; this is really a contest of internal strength among the cloud computing giants.
This article focuses on the biggest challenge facing serverless applications such as AI and big data: the data-access latency and the remote-data bandwidth cost caused by the compute-storage separation architecture. In GPU deep-learning training in particular, repeatedly reading large amounts of training data from remote storage can seriously drag down GPU efficiency.
Serverless data access challenges
The core capability a serverless cloud platform offers users is extreme elasticity, but the elasticity in question is not merely resource elasticity; it is application or business elasticity from the user's perspective, that is, the end-to-end time from the moment the user decides to scale out until the application is actually ready to serve. Resource elasticity time is a critical part of this. Once computing resources can scale out in seconds or even milliseconds, the surrounding infrastructure comes under enormous pressure, and the most common pressure point is storage. If the IO throughput of the storage system cannot keep up with the rate of instance churn, for example, if a container instance scales out in 2 seconds but then waits tens of seconds or even minutes for data to download from storage, then extreme elasticity is out of the question.
It is fair to say that serverless containerization poses new challenges to traditional storage systems:
1. High-density access:
Compute instances have no local storage, so data sinks entirely to the storage system, sharply increasing concurrent access pressure. This both affects the stability of the storage system and saturates its service bandwidth.
2. Network latency:
The compute-storage separation architecture lengthens the storage access path, adding extra network latency to both data and metadata access.
3. Expensive IO throughput scaling:
The bandwidth and throughput of traditional distributed storage scale only with the volume of data stored, whereas a serverless platform creates large numbers of containers that access the storage system concurrently, quickly hitting its access limits. This creates a conflict between the extreme elasticity of compute and the limited bandwidth of the storage system.
In other words, serverless big data and AI scenarios need data offloading to shorten the data read path and to provide elastic IO throughput. Given the stability, cost, and generality constraints of existing cloud storage systems, however, this architectural evolution will take a long time.
Fluid: an abstraction of how applications use data
Fluid is an official sandbox open source project under the Cloud Native Computing Foundation (CNCF), aimed at accelerating and orchestrating data-intensive applications in cloud-native scenarios. Its main developers and maintainers come from Alibaba, Nanjing University, and many other well-known companies and universities. Fluid's idea is to layer the capabilities of the storage system, decomposing them into data storage and data access, and to move part of the data access capability up into the computing cluster. For big data and AI scenarios, it abstracts "the process by which computing tasks use data", proposes the concept of the elastic Dataset, and combines distributed data caching with cloud-native capabilities such as autoscaling, portability, and scheduling.
Fluid project address: https://github.com/fluid-cloudnative/fluid
Shifting data access capabilities upward has two advantages. First, the idle resources of the computing cluster can provide data offloading through distributed caching, reducing the pressure on central storage while exploiting data locality, VPC network performance, and data access reuse. Second, through the Dataset abstraction, the scope, characteristics, and elasticity of data access can be defined in a specific context according to the application's needs, and targeted performance optimizations can be made based on those characteristics.
All problems in computer science can be solved by another level of indirection
In fact, Fluid follows the old principle that every problem in computer science can be solved by adding another level of indirection. In Kubernetes, it provides the Dataset abstraction as the data access and orchestration layer for the computing cluster, implementing Dataset management (CRUD operations), data preheating, data backup/restore, permission control, and access acceleration. Fluid's architecture can be extended to a variety of distributed cache services through the CacheRuntime plugin mechanism; Alibaba Cloud EMR's JindoFS, open source Alluxio, JuiceFS, and other cache engines are already supported, and the choice of engine is transparent to users.
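To illustrate that pluggability, switching cache engines only means declaring a different runtime kind for the same Dataset; below is a minimal sketch using the open source AlluxioRuntime (the name demo and the sizes are illustrative, not from this article's walkthrough):

apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: demo              # must match the name of the Dataset it backs
spec:
  replicas: 2             # number of cache workers
  tieredstore:
    levels:
      - mediumtype: MEM   # cache hot data in memory
        path: /dev/shm
        quota: 2Gi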
There is no silver bullet. In essence, Fluid uses the idle resources of the computing cluster (CPU, memory, disk) plus scenario-specific assumptions to simplify the problem: data offloading reduces the pressure on central storage; tiered locality caching (Tiered Locality Cache) and cache-locality scheduling improve data access performance; and when compute instances access data concurrently, the cache cluster scales out automatically to provide elastic IO throughput.
On this basis, Fluid makes datasets elastic and operable, driven by the computing throughput required:
- Dataset scalability - Fluid supports elastic scaling of the Runtime, adjusting capacity by controlling the number of cache workers (see the sketch after this list).
- Dataset operability - Fluid supports multiple pre-warming modes through the DataLoad CRD, including pre-warming the data and metadata of specified file lists and folders.
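For example, the cache layer can be grown on demand by patching the runtime's replica count; a minimal sketch against a JindoRuntime named serverless-data, the same name used in the walkthrough later in this article (other runtime kinds work the same way):

# scale the cache cluster from 1 to 3 workers for higher IO throughput
$ kubectl patch jindoruntime serverless-data --type=merge -p '{"spec":{"replicas":3}}'

# watch the runtime status as the new workers come up
$ kubectl get jindoruntime serverless-data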
From bottom to top, Fluid embraces Alibaba Cloud ECI
The challenge, then, is how to bring the distributed cache engine to a serverless container platform, and the key question is how to run it there securely, openly, and flexibly. Combining the open source Fluid with a serverless platform to accelerate application data access is a first attempt in the industry. It requires both sides to meet each other halfway and jointly define the interfaces, rights, and responsibilities of the collaboration; otherwise the change will not happen. Fluid must respect the conceptual model and life cycle of the serverless platform (for example, one Pod occupying one virtual machine), and the serverless platform must provide minimal openness based on certain standards.
There are three main problems behind this:
- Unclear protocol and division of labor: A serverless platform is a black-box system whose runtime support is developed by the platform provider, and its support for the Kubernetes ecosystem is incomplete; for example, most serverless platforms do not support the CSI Plugin and Device Plugin mechanisms widely adopted in the community. The traditional approach is to bake the various storage clients into the virtual machine image that runs the serverless Pod, but this causes tight coupling, difficult upgrades, and bloated images, and it blurs the boundary of responsibility when troubleshooting software issues.
- Balancing security risk and openness: Serverless platforms are multi-tenant, so platform security is their lifeblood. Yet enabling more scenarios on serverless means supporting more open source software, which inevitably introduces a larger attack surface, so the two must be weighed carefully. This tests the core strength of the platform and also demands control over the software supply chain.
- Conflict between distributed systems and serverless: Serverless applications are assumed to be simple, independent tasks that do not communicate with each other and are completely stateless, each with its own life cycle management. Fluid, however, accelerates access through a distributed cache system; the network communication, state (cache), and life-cycle consistency this involves are at odds with those assumptions.
To support serverless scenarios, Alibaba Cloud's container service team, basic software and operating system team, elastic computing ECI team, and data lake storage team worked together to provide a complete bottom-to-top solution. The basic approach is to divide and conquer the different problems, following these principles:
- Embrace existing standards, using Kubernetes mechanisms such as Sidecar and Device Plugin as the open interfaces, to ensure a consistent user experience.
- Fine-grained Linux privilege control
- Separation of concerns allows different roles to focus on their own capabilities, and storage and computing can evolve in parallel.
The final solution transparently replaces the Persistent Volume Claim through the sidecar mechanism and hardens the security of the existing FUSE approach; at the same time, the distributed cache and its elastic capabilities run on server-full (ECS) nodes; in addition, special support is provided for task-style applications whose Pods include sidecars. The sketch below illustrates the replacement idea.
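A simplified before/after view of that replacement (illustrative only; the spec Fluid actually generates differs in detail, and the container and image names here are placeholders):

# Before injection: the application mounts the Fluid PVC directly
containers:
  - name: app
    volumeMounts:
      - name: data
        mountPath: /data
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: serverless-data

# After injection (simplified): an injected FUSE sidecar serves the same path
containers:
  - name: fluid-fuse                     # injected by the webhook; name illustrative
    image: example/jindofs-fuse          # hypothetical image reference
    securityContext:
      capabilities:
        add: ["SYS_ADMIN"]               # narrowed privilege instead of full privileged mode
    volumeMounts:
      - name: data
        mountPath: /data
        mountPropagation: Bidirectional  # publishes the FUSE mount to peer containers
  - name: app
    volumeMounts:
      - name: data
        mountPath: /data
        mountPropagation: HostToContainer  # sees the mount the sidecar created
volumes:
  - name: data
    emptyDir: {}                         # the PVC is transparently replaced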
Value
Open standards
For storage capability providers, storage components no longer need to integrate directly with the serverless platform; the open source Fluid plays the adapter role. Opening up through standards also reduces the complexity of maintaining storage clients on the computing platform. For users, running the storage client as a sidecar provides essentially the same experience on and off the cloud, with no worry about platform lock-in.
Separation of concerns
Separation of concerns is how Fluid resolves the previously blurred division of responsibilities between storage capability providers and the serverless platform team.
For the storage capability team, the storage client no longer has to be shipped into the virtual machine as an installation package, so the team can iterate on its own capabilities independently of the serverless platform team's virtual machine image release rhythm; likewise, troubleshooting no longer waits on the serverless platform team to collect logs, which improves the efficiency of problem diagnosis and repair.
For the serverless platform team, the storage component exists as a sidecar, which shrinks the software footprint inside the virtual machine image, avoids the stability risks introduced by frequent virtual machine image releases, and reduces the complexity of troubleshooting.
Elasticity
Combined with distributed storage, throughput capacity can be expanded on demand using the resources of the user's own computing cluster. This gives users the option to pay flexibly according to IO demand, and it also relieves cost pressure on the storage side.
Security
Through fine-grained privilege control, a degree of capability layering is achieved under serverless, providing security guarantees for the underlying platform while opening it up.
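To illustrate the kind of fine-grained control meant here (a generic example, not the exact policy used on the ECI platform; example/fuse-client is a placeholder image): a FUSE client container traditionally asks for full privileged mode, which can be narrowed to a single capability plus the FUSE device:

# coarse-grained: the container gets nearly all host privileges
$ docker run --privileged example/fuse-client

# fine-grained: only the mount capability and the /dev/fuse device are granted
$ docker run --cap-add SYS_ADMIN --device /dev/fuse example/fuse-client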
Observability
Traditionally the storage client runs as a process inside the serverless platform, so its health and resource consumption are a black box, and troubleshooting depends on the serverless platform's operations staff. In the new mode, the storage client runs as a container, providing full observability.
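In practice this means the client can be inspected with ordinary Kubernetes tooling; a sketch, assuming an application Pod with an injected FUSE sidecar (the sidecar's container name is generated by Fluid, so substitute the real one):

# list the containers in the Pod, including the injected FUSE sidecar
$ kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'

# stream the storage client's logs
$ kubectl logs <pod-name> -c <fuse-container-name>

# per-container CPU and memory usage (requires metrics-server)
$ kubectl top pod <pod-name> --containers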
Quick experience
This experiment simulates downloading and initializing the model when the model inference service starts.
Prerequisites
- An Alibaba Cloud Container Service ACK Pro cluster with Kubernetes version ≥ 1.18
- Virtual nodes in the cluster; if there are none, install ACK Virtual Node, see: https://help.aliyun.com/document_detail/118970.htm
- The cloud-native AI suite activated, with Fluid installed and upgraded to the latest version, see: https://help.aliyun.com/document_detail/201997.html
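These prerequisites can be sanity-checked up front; a quick sketch (the type=virtual-kubelet label is the convention ACK virtual nodes normally carry, so verify it against your cluster):

# the server version reported must be >= 1.18
$ kubectl version --short

# the cluster's virtual nodes should appear here
$ kubectl get nodes -l type=virtual-kubelet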
Steps
1. Check that Fluid is installed and the version is up to date:
$ helm list -n fluid-system
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
ack-fluid fluid-system 1 2022-08-13 16:05:47.540883158 +0800 CST deployed ack-fluid-0.8.1 0.8.0-50edf67
2. Enable injection for the namespace; Fluid relies on the webhook mechanism to inject the sidecar that serverless Pods depend on:
$ kubectl label namespace default fluid.io/enable-injection=true
3. Check that the namespace label has been applied:
$ kubectl get namespace default --show-labels
NAME STATUS AGE LABELS
default Active 5h28m fluid.io/enable-injection=true
4. Create the Dataset and cache runtime
This example uses JindoFS as the cache engine for the dataset and mounts an OSS bucket that stores the data the application needs to access. The bucket is named large-model-sh and is located in the Beijing region; in actual use, adjust these values to match the OSS bucket you actually use.
4.1 Create a Secret to hold the AccessKey (it can be obtained from the RAM access control console):
$ cat << EOF > secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: access-key
stringData:
  fs.oss.accessKeyId: <YOUR_ACCESS_KEY_ID>
  fs.oss.accessKeySecret: <YOUR_ACCESS_KEY_SECRET>
EOF
4.2 Create the Secret:
$ kubectl create -f secret.yaml
4.3 Create Dataset and JindoRuntime
$ cat << EOF > dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: serverless-data
spec:
  mounts:
    - mountPoint: oss://large-model-sh/
      name: demo
      path: "/"
      options:
        fs.oss.endpoint: oss-cn-beijing-internal.aliyuncs.com
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: access-key
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: access-key
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: serverless-data
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10Gi
        high: "0.95"
        low: "0.7"
EOF
Key parameters: mountPoint is the OSS path to mount; fs.oss.endpoint is the OSS endpoint (the internal endpoint when running in the same region); replicas is the number of cache workers; tieredstore describes the cache medium (MEM), its capacity quota (10Gi), and the high/low watermarks that control cache usage.
4.4 Create the resources:
$ kubectl create -f dataset.yaml
4.5 View Dataset status:
$ kubectl get dataset serverless-data
NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
serverless-data 1.16GiB 0.00B 10.00GiB 0.0% Bound 80s
4.6 A corresponding PVC is created automatically:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
serverless-data Bound default-serverless-data 100Gi ROX fluid 91s
5. Create an ECI-based Deployment that accesses the Dataset
$ cat << EOF > serving.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: eci
        alibabacloud.com/eci: "true"
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.s6-c1m2.xlarge
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data
EOF
- The label alibabacloud.com/eci: "true" indicates that the container starts as an ECI container.
- The label alibabacloud.com/fluid-sidecar-target: eci tells Fluid that the Pod starts in serverless mode; Fluid then uses the webhook to automatically inject the FUSE container into the application Pod, so that the Pod can access the dataset through a POSIX interface.
- The annotation k8s.aliyun.com/eci-use-specs specifies the ECI instance type to use.
6. Create the Deployment:
$ kubectl create -f serving.yaml
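Once the Pod is running, you can confirm that the webhook injected a FUSE container next to the serving container (the injected container's name is generated by Fluid and may vary):

# the Pod should list an extra FUSE container alongside "serving"
$ kubectl get pods -l app=model-serving -o jsonpath='{.items[0].spec.containers[*].name}'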
7. Check the startup log; loading the data takes about 65 seconds (real 1m4.999s):
$ kubectl logs model-serving-546578c447-5x6fm -c serving
Begin loading models at 16:35:38
real 1m4.999s
user 0m0.000s
sys 0m1.143s
Finish loading models at 16:36:43
2022-08-13 16:36:43 INFO Hello world sample started.
8. Delete the service:
$ kubectl delete -f serving.yaml
9. Warm up the cached data (this step is optional)
$ cat << EOF > dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: serverless-dataload
spec:
  dataset:
    name: serverless-data
    namespace: default
EOF
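A DataLoad can also warm only part of a dataset via its target field; a sketch (the /models path and the name are illustrative):

apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: partial-warmup          # illustrative name
spec:
  dataset:
    name: serverless-data
    namespace: default
  target:
    - path: /models             # warm only this folder
      replicas: 1               # number of cache replicas to load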
Create the DataLoad, then check the cache status; the 1.16 GiB dataset is now fully cached:
$ kubectl create -f dataload.yaml
dataload.data.fluid.io/serverless-dataload created
$ kubectl get dataload
NAME DATASET PHASE AGE DURATION
serverless-dataload serverless-data Complete 2m43s 34s
$ kubectl get dataset
NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
serverless-data 1.16GiB 1.16GiB 10.00GiB 100.0% Bound 19m
10. Create the service again:
$ kubectl create -f serving.yaml
This time the startup log shows that loading the data takes only 6.263 s, roughly one tenth of the time without pre-warming, a speedup of about 10x.
$ kubectl logs model-serving-546578c447-pkcpf -c serving
Begin loading models at 17:18:54
real 0m6.263s
user 0m0.000s
sys 0m0.998s
Finish loading models at 17:19:00
2022-08-13 17:19:04 INFO Hello world sample started.
For more usage, please refer to the documentation:
https://help.aliyun.com/document_detail/440049.html
Outlook
The combination of Fluid and serverless scenarios is only the beginning; we see three stages ahead: working alongside serverless (With the Serverless), running on serverless (On the Serverless), and serving serverless (For the Serverless). In the first stage, we have realized seamless serverless integration with the JindoFSX Runtime, and we will also support JuiceFS and Alluxio from the open source community in the future; at the same time, our changes at the kernel and container levels will be contributed back to the community to push the evolution of serverless platforms forward together.
Thanks
We are grateful to our friends from Alibaba Cloud, Nanjing University, Juicedata, and Bilibili for their joint efforts on Fluid's serverless support. This is the first step for serverless to support data scenarios; let's work together to make the cloud world a better place. If you are interested, you can join the Fluid open source community technical exchange group by searching for group number 32850151. If you find Fluid useful, welcome, and thank you, for giving the Fluid project a star.
Fluid's code repository address is:
https://github.com/fluid-cloudnative/fluid
About the Author:
Che Yang, Senior Technical Expert of Alibaba Cloud Container Service
Liu Zhen, Senior Technical Expert of Alibaba Cloud Operating System Department
Jing Qi, Alibaba Cloud Elastic Computing Technology Expert
How can AI and big data jobs be scheduled efficiently? How can the utilization and elasticity of resources such as GPUs be improved? Come and try the ACK cloud-native AI suite! Based on standard Kubernetes, it provides componentized, scalable, flexibly combinable and customizable cloud-native AI capabilities, optimizes AI performance, efficiency, and cost across the stack, and helps enterprise users quickly build their own AI platforms. Click here to claim your free trial seat for 2022!