Author: Dongden

Background

The importance of data to today's Internet business is self-evident: it permeates almost every corner of the world. But data alone is not enough. What really makes data valuable is the intensive analysis and computation performed on large volumes of data for various business scenarios, such as machine learning, big data analytics, and OLAP aggregation. In recent years, as data scale has grown, these resource-hungry data-intensive applications have naturally moved to the cloud, which is known for its elastic resources.

Serverless, however, does not seem to be an obvious beneficiary of this trend of data-intensive applications moving to the cloud. Although almost everyone praises this architecture for its virtually unlimited compute elasticity, flexible and agile delivery, and low operation and maintenance costs, it pushes the separation of compute and storage to a purer extreme, which makes data-intensive applications with strong data dependencies extremely difficult to run efficiently in a serverless environment.

For example, suppose we want to deploy an AI inference service under a serverless architecture. Before each service instance starts, the AI model stored in an external storage system must be pulled into local memory. Considering that large AI models have become mainstream in the industry in recent years, let us assume that the model is 30GB in size and is stored in the OSS object storage service. If 100 such AI inference service instances need to start at the same time, the total amount of data to be read is 3000GB. With OSS's default data access bandwidth limit of 10Gbps, the 100 instances need to wait 2400 seconds (3000GB × 8 / 10Gbps) before they can actually start serving requests. If we create an ecs.gn7i-c32g1.16xlarge instance for each service (about 0.008 yuan per second when the hourly price is converted), this means 1920 yuan (0.008 yuan/second × 2400 seconds × 100 instances) has been spent just waiting for data. In other words, a significant share of our spending goes to work that produces no value, which is clearly not what we want. (Actual prices are subject to those displayed on Alibaba Cloud's official website.)
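
A quick sanity check of the arithmetic above (the unit prices are the illustrative ones from this example):

 $ echo "3000 * 8 / 10" | bc           # 3000GB of reads over a 10Gbps limit -> seconds of waiting
2400
 $ echo "8 * 2400 * 100 / 1000" | bc   # 0.008 yuan/s x 2400s x 100 instances -> yuan spent waiting
1920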

So, is there any way to optimize the above process? This brings us to the protagonist of this article: Fluid. Fluid is a Kubernetes-native distributed dataset orchestration and acceleration engine. Fluid was originally created to provide cloud-native solutions to application data access latency problems. For the problems that plague serverless data-intensive applications described above, Fluid offers a new data access architecture for serverless environments that improves data access efficiency while keeping the user experience easy to use.


This article will walk through a running example of Fluid step by step, helping you understand how to use Fluid in the Alibaba Cloud ASK (Alibaba Serverless Kubernetes) environment, and showing how Fluid enables a "zero to zero" execution mode for large-scale data-intensive tasks (starting from zero resource usage and ending with all resources released), reducing cost while speeding things up.

Fluid on ASK running example

Fluid's data orchestration and acceleration capability for Serverless Kubernetes is still in public beta; you can click "Read the original text" to apply for a trial.

Fluid deployment

Before running the following example, you first need to create an ASK cluster and configure kubeconfig to connect to it. For the related steps, please refer to the document "How to Create an ASK Cluster" linked at the end of this article. Before using Fluid's features, you need to deploy Fluid's control plane components to the ASK cluster. This deployment can be completed easily through the Alibaba Cloud Container Service for Kubernetes console.

As shown below:


  1. Select the "Applications - Helm" sub-panel to the right of the ASK cluster panel
  2. Click the "Create" button
  3. Search for ack-fluid in the Chart marketplace to find the Helm Chart corresponding to Fluid, and fill in the "Application Name" (e.g. ack-fluid)
  4. After clicking "Next", keep the default fluid-system as the deployment namespace
  5. Finally, without modifying any Chart Values, click "OK" to deploy Fluid to the ASK cluster

After configuring the Kubeconfig information corresponding to the ASK cluster, enter the following command:

 $ kubectl get pod -n fluid-system

It can be observed that several components of Fluid have been running normally:

 NAME                                  READY   STATUS    RESTARTS   AGE
dataset-controller-d99998f79-dgkmh    1/1     Running   0          2m48s
fluid-webhook-55c6d9d497-dmrzb        1/1     Running   0          2m49s

Where:

  • Dataset Controller: responsible for maintaining the complete lifecycle of the Dataset CRs introduced by Fluid.
  • Fluid Webhook: responsible for automatically mutating the application Pods that need to access data, transparently enabling data access in serverless scenarios.

In addition to the two components described above, Fluid's control plane also includes controller components closely tied to the various cache systems (e.g. JindoFS, JuiceFS, Alluxio). These are not created in the initial deployment; the corresponding cache system controller Pod is scaled up on demand only when a user specifies that a particular cache system is needed, helping users save as much cost as possible in the pay-as-you-go ASK environment.
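
On a fresh install, you can confirm that only these two base deployments exist before any cache system is requested (the output below is roughly what a new deployment looks like):

 $ kubectl get deployments -n fluid-system
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
dataset-controller   1/1     1            1           2m48s
fluid-webhook        1/1     1            1           2m49s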

Data cache deployment

Everything in the Fluid world is centered on the Dataset custom resource: whether it is the abstracted management of existing data in external storage or data access from application Pods, users interact with the Dataset resource. Whenever a user creates a Dataset CR and specifies its cache system backend, Fluid automatically deploys the data cache into the Kubernetes cluster.

In the following walkthrough, we take Alibaba Cloud OSS object storage as the example external storage system and simulate a complete, standard data usage workflow of "cache deployment - data access - resource reclamation".

  • Data file preparation

First, prepare a file to be accessed. For example, here we use the dd command to quickly create a file of about 30GiB:

 $ cd $(mktemp -d)

$ dd if=/dev/zero of=./largefile-30G bs=10M count=3072

3072+0 records in
3072+0 records out
32212254720 bytes (32 GB) copied, 108.189 s, 298 MB/s

$ ls -lh ./largefile-30G 
-rw-r--r-- 1 root root 30G Sep  7 21:11 ./largefile-30G

Next, upload the file created above to an OSS Bucket. Here, an OSS Bucket named fluid-demo located in the Beijing region is used as an example.

 $ ossutil cp -i <access_key_id> -k <access_key_secret> -e oss-cn-beijing-internal.aliyuncs.com ./largefile-30G oss://fluid-demo/
  • Create Fluid Dataset and Runtime resources

After the data is prepared and uploaded, the above data to be accessed can be declared in Fluid. Specifically, we need to submit a Dataset CR and a Runtime CR. The Dataset CR describes the URL location of the data in the external storage system, and the Runtime CR describes the cache system and its specific configuration.

First, store the identity credential information required to access the OSS Bucket in the Secret:

 $ kubectl create secret generic oss-access-key \
  --from-literal=fs.oss.accessKeyId=<access_key_id> \
  --from-literal=fs.oss.accessKeySecret=<access_key_secret>

Next, define Dataset CR and Runtime CR. Here we choose JindoFS as the backend of the cache system, and the Fluid Runtime resource is JindoRuntime:

 apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-dataset
spec:
  mounts: 
    - mountPoint: oss://fluid-demo # OSS Bucket URL
      name: demo
      path: /
      options:
        fs.oss.endpoint: oss-cn-beijing-internal.aliyuncs.com # OSS Bucket internal (VPC) access endpoint
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: oss-access-key
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: oss-access-key
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: demo-dataset
spec:
  # Number of cache worker nodes
  replicas: 5
  podMetadata:
    annotations:
      # Instance type used by the JindoFS Pods
      k8s.aliyun.com/eci-use-specs: ecs.d1ne.6xlarge
      # Enable image caching to speed up Pod startup
      k8s.aliyun.com/eci-image-cache: "true"
  tieredstore:
    levels:
      # Use 40GiB of memory as the cache medium on each cache worker node
      - mediumtype: MEM
        volumeType: emptyDir
        path: /dev/shm
        quota: 40Gi
        high: "0.99"
        low: "0.99"

Create the above Dataset CR and JindoRuntime CR to the ASK cluster:

 $ kubectl create -f dataset.yaml
  • View Dataset Deployment Status

After the Dataset CR and JindoRuntime CR are created, the Dataset will be deployed within about 1 to 2 minutes, after which you can view information about the data in the cache system and the backend storage system.

 $ kubectl get dataset demo-dataset 
NAME           UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
demo-dataset   30.00GiB         0.00B    200.00GiB        0.0%                Bound   2m9s

The output above shows the following information about the Fluid Dataset:

  • Total dataset size in OSS (UFS TOTAL SIZE): 30.00GiB
  • Current cache amount (CACHED): 0.00B
  • Cache system capacity (CACHE CAPACITY): 200.00GiB
  • Dataset cache percentage (CACHED PERCENTAGE): 0.0%
  • Dataset status (PHASE): Bound, indicating that it has been deployed successfully.
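
Besides the Dataset, you can also query the JindoRuntime resource itself to check on the cache components (a hedged example; the exact columns printed depend on the Fluid version):

 $ kubectl get jindoruntime demo-dataset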

Data cache warm up

The data access acceleration that Fluid provides inside a Kubernetes cluster is no magic trick: in essence, it reduces the access pressure on the central storage through data offloading. Fluid caches the data that needs to be accessed in a distributed cache system closer to the application Pods (for example JindoFS, JuiceFS, or Alluxio), so that application Pods located in the same VPC network as the cache system can access data at VPC intranet bandwidth, which is far higher than the bandwidth of directly accessing the central storage. Furthermore, because the backend is a distributed cache system, when the bandwidth provided by a single cache worker node is insufficient, the cache system can be scaled out, achieving elastic scaling of data access bandwidth that matches the elasticity of compute resources in serverless scenarios.

Therefore, to achieve high-bandwidth data access through data offloading, warming up the data cache before the application Pods access the data is a necessary step.

  • Create DataLoad CR

To perform a data cache warm-up in Fluid, simply create a DataLoad CR as follows:

 apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: demo-dataset-warmup
spec:
  # The Dataset to warm up
  dataset:
    name: demo-dataset
    namespace: default
  loadMetadata: true
  target:
    - path: / # Data sub-path to warm up; "/" means warming up the entire dataset
      replicas: 5 # Number of replicas of the data in the cache system after warm-up
 $ kubectl create -f dataload.yaml
  • View the Dataset status after warming up

Check the DataLoad CR status until its PHASE changes to Complete:

 $ kubectl get dataload demo-dataset-warmup
NAME                  DATASET        PHASE      AGE     DURATION
demo-dataset-warmup   demo-dataset   Complete   2m38s   2m20s

The warm-up duration can be read from the DURATION column; in the example above, the warm-up took 2m20s.

After the data cache is warmed up, the relevant cache status on the Dataset is also updated:

 $ kubectl get dataset demo-dataset     
NAME           UFS TOTAL SIZE   CACHED      CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
demo-dataset   30.00GiB         150.00GiB   200.00GiB        100.0%              Bound   8m27s

As you can see, after the warm-up completes, the entire dataset has been cached in the distributed cache system, with a cached ratio of 100.0%. Since the number of cache replicas was set to 5 during warm-up, the total cache usage after warm-up is 5 times the dataset size.

Note: increasing the number of cache replicas helps reduce single-point access bottlenecks on distributed cache worker nodes during data access.

Data access

Next, let's create application Pods to access the data in OSS. We will launch 100 application Pods at once and have these Pods access the data file in OSS simultaneously. This kind of data read pattern is very common in scenarios such as elastic scale-out of AI inference services and autonomous driving simulation.

  • Create a data access application

For example, here is an Argo Workflow application:

 apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallelism-fluid-
spec:
  entrypoint: parallelism-fluid
  # Maximum parallelism of the Argo Workflow tasks, i.e. start 100 Pods at the same time
  parallelism: 100
  podSpecPatch: '{"terminationGracePeriodSeconds": 0}'
  podMetadata:
    labels:
      # Add the following label to enable Fluid's data access support for serverless scenarios
      alibabacloud.com/fluid-sidecar-target: eci
    annotations:
      # Enable image caching to speed up Pod startup
      k8s.aliyun.com/eci-image-cache: "true"
      # Instance type used by the Argo Workflow Pods
      k8s.aliyun.com/eci-use-specs: ecs.g6e.4xlarge
  templates:
  - name: parallelism-fluid
    steps:
    - - name: domd5sum
        template: md5sum
        withSequence:
          start: "1"
          end: "100"
  - name: md5sum
    container:
      imagePullPolicy: IfNotPresent
      image: alpine:latest
      # Each Pod computes the md5sum of the file to be read
      command: ["/bin/sh", "-c", "md5sum /data/largefile-30G"]
      volumeMounts:
      - name: data-vol
        mountPath: /data
    volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: demo-dataset # claimName must match the Fluid Dataset name
 $ argo submit workflow.yaml
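
Note that the claimName above refers to a PersistentVolumeClaim that Fluid creates automatically with the same name as the Dataset. A quick sanity check (assuming the Dataset is in the default namespace):

 $ kubectl get pvc demo-dataset
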
  • View data access application status

The Argo Workflow above launches 100 Pods at once for parallel data access. After all 100 Pods have completed, you can see the following results:

 $ argo list
NAME                      STATUS      AGE   DURATION   PRIORITY
parallelism-fluid-x677t   Succeeded   8m    5m         0

To view the specific information of the task:

 $ argo get parallelism-fluid-x677t                                                                                                                               
Name:                parallelism-fluid-x677t                                                                                                                      
Namespace:           default                                                                                                                                      
ServiceAccount:      unset (will run with the default ServiceAccount)                                                                                             
Status:              Succeeded                                                                                                                                    
Conditions:                                                                                                                                                       
 PodRunning          False
 Completed           True
Created:             Wed Sep 07 21:25:30 +0800 (7 minutes ago)
Started:             Wed Sep 07 21:25:30 +0800 (7 minutes ago)
Finished:            Wed Sep 07 21:31:28 +0800 (1 minute ago)
Duration:            5 minutes 58 seconds
Progress:            100/100
ResourcesDuration:   8h10m22s*(1 alibabacloud.com/vfuse),24h15m6s*(1 cpu),24h15m6s*(100Mi memory)

STEP                        TEMPLATE           PODNAME                             DURATION  MESSAGE
 ✔ parallelism-fluid-x677t  parallelism-fluid                                                   
 └─┬─✔ domd5sum(0:1)        md5sum             parallelism-fluid-x677t-2855796525  5m          
   ├─✔ domd5sum(1:2)        md5sum             parallelism-fluid-x677t-1226856655  5m          
   ├─✔ domd5sum(2:3)        md5sum             parallelism-fluid-x677t-2858910973  5m          
   ├─✔ domd5sum(3:4)        md5sum             parallelism-fluid-x677t-2609269875  4m          
   ├─✔ domd5sum(4:5)        md5sum             parallelism-fluid-x677t-616770109   5m          
   ├─✔ domd5sum(5:6)        md5sum             parallelism-fluid-x677t-3071900311  5m          
   ├─✔ domd5sum(6:7)        md5sum             parallelism-fluid-x677t-3841084237  5m          
   ├─✔ domd5sum(7:8)        md5sum             parallelism-fluid-x677t-120540963   5m          
   ├─✔ domd5sum(8:9)        md5sum             parallelism-fluid-x677t-1353329645  5m          
   ├─✔ domd5sum(9:10)       md5sum             parallelism-fluid-x677t-2391364586  5m          
   ├─✔ domd5sum(10:11)      md5sum             parallelism-fluid-x677t-4083824607  5m          
   ├─✔ domd5sum(11:12)      md5sum             parallelism-fluid-x677t-258640575   5m          
   ├─✔ domd5sum(12:13)      md5sum             parallelism-fluid-x677t-3913466863  5m          
   ├─✔ domd5sum(13:14)      md5sum             parallelism-fluid-x677t-1949266799  5m          
   ├─✔ domd5sum(14:15)      md5sum             parallelism-fluid-x677t-214569823   5m          
   ├─✔ domd5sum(15:16)      md5sum             parallelism-fluid-x677t-684353087   5m

It can be seen that the running time of the entire task is 5m58s.

Resource recycling

  • Data cache resource recycling

When the data cache is no longer needed, users can reclaim it from the ASK cluster to save cluster resources and reduce costs. To reclaim the cache system in Fluid, you only need to delete the associated Fluid Dataset, for example:

 $ kubectl delete dataset demo-dataset

After executing the delete command above, wait a short while (Fluid performs some cleanup work) and the cache system's Pods will be reclaimed.
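
To confirm that the cache Pods are gone, list the Pods associated with the Dataset; in this setup their names are prefixed with the Dataset name (exact naming may vary across Fluid versions), and the command should eventually return no results:

 $ kubectl get pod | grep demo-dataset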

  • Fluid control plane component recycling

After the cache system's Pods are reclaimed, users can also reclaim the resources occupied by the control plane components. Execute the following script to scale the control plane components down:

 $ kubectl get deployments.apps -n fluid-system | awk 'NR>1 {print $1}' | xargs kubectl scale deployments -n fluid-system --replicas=0

When you need to use Fluid again, execute the following scale-up commands to re-create the control plane component Pods:

 $ kubectl scale -n fluid-system deployment dataset-controller --replicas=1

$ kubectl scale -n fluid-system deployment fluid-webhook --replicas=1

Solution effect

We ran the above example multiple times, adjusting the number of cache system workers (5 or 10) or directly accessing OSS object storage instead, and obtained the following results:

Effect 1: Elastically scalable data access bandwidth


Figure 1 Comparison of effective data access bandwidth provided by cache/storage systems

Based on the overall time taken by the data access application, the size of the data files accessed, and the number of Pods accessing the data, we can calculate the "effective data access bandwidth*" shown in Figure 1. As Figure 1 shows, compared with the default bandwidth (10Gbps) provided by the OSS storage system, Fluid's data offloading mechanism provides serverless applications with much greater effective access bandwidth, and this bandwidth can be elastically increased by adding more cache worker nodes.

*Effective data access bandwidth = number of serverless data access pods x data volume accessed by each pod / overall data access application time
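
As a rough illustration (not the measured value in Figure 1), plugging the 5-worker run above (100 Pods, 30GB per Pod, about 358 seconds end to end) into this formula gives:

 $ echo "100 * 30 * 8 / 358" | bc   # 100 Pods x 30GB x 8 bits/byte / 358s -> roughly 67 Gbps effective bandwidth
67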

Effect 2: Cost reduction due to accelerated data access


Figure 2 Cost comparison of direct access to OSS vs. Fluid

In the above example, we used the following ECI instance types:

  • Argo Workflow task Pod: ecs.g6e.4xlarge (unit price 0.0012 yuan per second)
  • Cache system Pod: ecs.d1ne.6xlarge (unit price 0.0056 yuan per second)

From this, we can calculate the cost comparison shown in "Figure 2: Cost comparison of direct access to OSS vs. Fluid". Looking at Figure 2, it is not hard to see that accessing the data in OSS through Fluid reduces the cost to roughly one-sixth to one-eighth of the original. Moreover, when using Fluid, adding more cache worker nodes saves even more. The main reason is that Fluid provides greater data access bandwidth, which improves data access performance and shortens the time the application spends reading data (see Figure 3), so that the purchased serverless elastic compute power is genuinely put to good use.
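
A minimal sketch of the cost model behind this comparison, using only the unit prices listed above (the cache uptime term is an assumption of this sketch; actual billing follows ECI per-second metering):

 # total cost ≈ (task Pods  x task duration in seconds x 0.0012 yuan/s)
 #            + (cache Pods x cache uptime in seconds  x 0.0056 yuan/s)
 # Direct OSS access has no cache-Pod term, but its much longer read time inflates the first term.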


Figure 3 Time-consuming comparison of Argo Workflow tasks

Summary

This article has walked through a complete data access example of running Fluid in an ASK environment, hopefully helping you understand the Fluid user experience, the results it delivers, and the feasibility of combining serverless with data-intensive applications. Specifically, we saw that:

  • Using Fluid to access data is easy for users: they only need to change the PVC that originally pointed at OSS to the PVC corresponding to the Fluid Dataset.
  • Fluid provides elastically scalable data access bandwidth, helping large-scale data access applications improve data read efficiency.
  • Thanks to the improved data read efficiency, Fluid helps large-scale data access applications significantly reduce costs.

Reference link

[1] How to create an ASK cluster:

https://help.aliyun.com/document_detail/86377.html

[2] ACK Cloud Native AI Kit Details:

https://help.aliyun.com/document_detail/201994.html

[3] Fluid project github address:

https://github.com/fluid-cloudnative/fluid

Click here to apply for a free trial seat of the ACK Cloud Native AI Suite!

