Author: Chen
Preface
KubeDL is Alibaba's open-source, Kubernetes-based AI workload management framework; its name is short for "Kubernetes-Deep-Learning". The project aims to feed the experience of large-scale machine learning job scheduling and management accumulated in Alibaba's scenarios back to the community. KubeDL has now been accepted into the CNCF Sandbox for incubation. We will continue to explore best practices in cloud-native AI scenarios and help algorithm scientists realize their innovations simply and efficiently.
In the latest KubeDL release, version 0.4.0, we bring model version management (ModelVersion): AI scientists can track, tag, and store model versions as easily as they manage container images. More importantly, in the classic machine learning pipeline the "training" and "inference" stages are relatively independent, and from the algorithm scientist's perspective the "training -> model -> inference" pipeline lacks a connecting link. The "model", as the intermediate product of the two stages, is exactly the piece that can bridge them.
GitHub: https://github.com/kubedl-io/kubedl
Website: https://kubedl.io/model/intro/
The current state of model management
Model files are the product of distributed training; they are the essence of an algorithm distilled through full iteration and search, and in the industry they have become valuable digital assets. Different distributed frameworks typically output model files in different formats: a TensorFlow training job usually outputs CheckPoint (.ckpt), GraphDef (.pb), or SavedModel files, while PyTorch models commonly carry the .pth suffix. When loading a model, each framework parses the runtime data-flow graph, runtime parameters, and weights carried in the file. To the file system, however, they are all just files (or groups of files) in a particular format, much like JPEG and PNG image files.
Therefore, the typical management approach is to treat them as files and host them in a unified object store (such as Alibaba Cloud OSS or AWS S3): each tenant/team is assigned a directory, members store their model files in the corresponding subdirectories, and SRE takes charge of read/write permission control, roughly like this:
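A hypothetical layout (the bucket, team, and file names below are made up for illustration):

oss://ml-models-bucket/
  team-recsys/
    alice/
      ctr-model-20210801.ckpt
      ctr-model-20210815.ckpt      # later runs overwrite earlier files, history is lost
    bob/
      ranker-savedmodel/
  team-cv/
    carol/
      resnet50-best.pth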
The advantages and disadvantages of this management method are obvious:
- The advantage is that users' existing habits are preserved: specify your own directory as the output path in the training code, then mount the corresponding cloud-storage directory into the inference service container to load the model;
- The disadvantage is that it places higher demands on SRE: unreasonable read/write authorization or misoperation can leak file permissions or even cause large-scale accidental deletion. File-based management also makes model versioning awkward, since users typically have to encode version information in file names, or an upper-layer platform has to absorb the complexity of version management. In addition, model files cannot be mapped directly back to the algorithm code and training parameters that produced them, and the same file may even be overwritten by successive training runs, making history hard to trace;
Given this situation, KubeDL draws on the strengths of Docker image management and introduces a set of image-based model management APIs, which makes the integration of distributed training and inference services closer and more natural, and greatly simplifies the complexity of model management.
Starting from the image
The image is the soul of Docker and the core infrastructure of the container era. An image is itself a layered, immutable file system, so a model file can naturally serve as an independent image layer, and combining the two strikes further sparks:
- Users no longer have to manage model files directly; they work with the ModelVersion API provided by KubeDL, and training and inference services are bridged through ModelVersion;
- Like an image, a model can be tagged for version traceability and pushed to a unified image Registry for storage, with access controlled through the Registry's authentication; the Registry's storage backend can also be switched to the user's own OSS/S3, so users can migrate smoothly;
- Once a model image is built it becomes a read-only template that can no longer be overwritten or tampered with, realizing the "immutable infrastructure" philosophy;
- Image layers, with compression and hash-based deduplication, reduce the cost of storing model files and speed up their distribution;
On top of "model images", you can further combine open-source image management components to get the most out of images:
- In large-scale inference scale-out scenarios, Dragonfly can be used to accelerate image distribution, so that stateless inference service instances can be spun up quickly under bursts of traffic, while avoiding the throttling that can occur when many instances concurrently read from a mounted cloud storage volume;
- For day-to-day inference service deployment, ImagePullJob from OpenKruise can also be used to pre-warm the model image on nodes and improve the efficiency of scale-out and release, as shown in the sketch below.
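A minimal ImagePullJob sketch for pre-warming a model image; the image name modelhub/resnet:v0.1 is borrowed from the example later in this article, and the node label node-type=inference is an assumption for illustration:

apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: prewarm-resnet-model
spec:
  # The model image to pre-pull onto the selected nodes
  image: modelhub/resnet:v0.1
  # Pull on at most 10 nodes concurrently
  parallelism: 10
  # Only pre-warm nodes that run inference workloads (hypothetical label)
  selector:
    matchLabels:
      node-type: inference
  completionPolicy:
    type: Always
  pullPolicy:
    backoffLimit: 3
    timeoutSeconds: 300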
Model and ModelVersion
KubeDL's model management introduces two resource objects: Model and ModelVersion. A Model represents a specific model, a ModelVersion represents one specific version in that model's iteration, and a set of ModelVersions derives from the same Model. Here is an example:
apiVersion: model.kubedl.io/v1alpha1
kind: ModelVersion
metadata:
  name: my-mv
  namespace: default
spec:
  # The model name for the model version
  modelName: model1
  # The entity (user or training job) that creates the model
  createdBy: user1
  # The image repo to push the generated model
  imageRepo: modelhub/resnet
  imageTag: v0.1
  # The storage will be mounted at /kubedl-model inside the training container.
  # Therefore, the training code should export the model at /kubedl-model path.
  storage:
    # The local storage to store the model
    localStorage:
      # The local host path to export the model
      path: /foo
      # The node where the chief worker runs to export the model
      nodeName: kind-control-plane
    # The remote NAS to store the model
    nfs:
      # The NFS server address
      server: ***.cn-beijing.nas.aliyuncs.com
      # The path under which the model is stored
      path: /foo
      # The mounted path inside the container
      mountPath: /kubedl/models
---
apiVersion: model.kubedl.io/v1alpha1
kind: Model
metadata:
  name: model1
spec:
  description: "this is my model"
status:
  latestVersion:
    imageName: modelhub/resnet:v1c072
    modelVersion: mv-3
The Model resource itself only describes a certain kind of model and tracks the model's latest version and its image name for the user's reference. Users mainly configure the model through ModelVersion:
- modelName: points to the Model this version belongs to;
- createdBy: the entity that created this ModelVersion, used to trace the upstream producer, usually a distributed training job;
- imageRepo: the image Registry address; once the model image is built, it is pushed to this address;
- storage: the storage that holds the model files. We currently support three storage media: NAS, AWS EFS, and LocalStorage, and more mainstream storage options will be supported in the future. The example above shows two ways to output a model (a local storage volume and a NAS volume); in general only one storage medium should be specified.
When KubeDL observes that a ModelVersion has been created, it triggers the model-building workflow:
- Listen for ModelVersion creation events and start a model build;
- Create the corresponding PV and PVC according to the storage type and wait for the volumes to be ready;
- Create a Model Builder to build the image in user space. For the Model Builder we adopted kaniko, whose build process and image format are exactly the same as standard Docker, but everything happens in user space without depending on any Docker daemon on the host;
- The Builder copies the model file (a single file or a directory) from the specified path in the volume and builds it as an independent image layer into a complete model image;
- Push the built model image to the image Registry specified in the ModelVersion object;
- Finish the whole build process;
At this point, the model corresponding to that ModelVersion is solidified in the image repository and can be distributed to and consumed by subsequent inference services.
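A minimal walk-through of this workflow, assuming the ModelVersion manifest above is saved as my-mv.yaml (the builder pod name shown is illustrative):

% kubectl apply -f my-mv.yaml        # creating the ModelVersion triggers a build
% kubectl get modelversion my-mv     # IMAGE and FINISH-TIME fill in once the build completes
% kubectl get pods                   # a kaniko-based builder pod runs to completion, e.g. image-build-my-mv-xxxxx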
From training to model
Although a ModelVersion can also be created on its own to start a build, we would rather have the model build triggered automatically once a distributed training job completes successfully, so that the two stages connect naturally into a pipeline.
KubeDL supports this mode of submission. Take a TFJob as an example: when submitting the distributed training job, specify the output path of the model file and the repository to push the image to. When the job completes successfully, a ModelVersion object is created automatically, with createdBy pointing to the upstream job name; if the job fails or terminates early, no ModelVersion is created.
The following is an example of distributed mnist training that outputs the model file to the path /models/model-example-v1 on the local node and triggers the model build once the job finishes successfully:
apiVersion: "training.kubedl.io/v1alpha1"
kind: "TFJob"
metadata:
name: "tf-mnist-estimator"
spec:
cleanPodPolicy: None
# modelVersion defines the location where the model is stored.
modelVersion:
modelName: mnist-model-demo
# The dockerhub repo to push the generated image
imageRepo: simoncqk/models
storage:
localStorage:
path: /models/model-example-v1
mountPath: /kubedl-model
nodeName: kind-control-plane
tfReplicaSpecs:
Worker:
replicas: 3
restartPolicy: Never
template:
spec:
containers:
- name: tensorflow
image: kubedl/tf-mnist-estimator-api:v0.1
imagePullPolicy: Always
command:
- "python"
- "/keras_model_to_estimator.py"
- "/tmp/tfkeras_example/" # model checkpoint dir
- "/kubedl-model" # export dir for the saved_model format
% kubectl get tfjob
NAME                 STATE       AGE     MAX-LIFETIME   MODEL-VERSION
tf-mnist-estimator   Succeeded   10min                  mnist-model-demo-e7d65

% kubectl get modelversion
NAME                     MODEL                    IMAGE                    CREATED-BY           FINISH-TIME
mnist-model-demo-e7d65   tf-mnist-model-example   simoncqk/models:v19a00   tf-mnist-estimator   2021-09-19T15:20:42Z

% kubectl get po
NAME                                    READY   STATUS      RESTARTS   AGE
image-build-tf-mnist-estimator-v19a00   0/1     Completed   0          9min
Through this mechanism, other artifacts that are only produced when a job completes successfully can also be solidified into the image together with the model and used in subsequent stages.
From model to inference
With the groundwork above, an inference service can simply reference a built ModelVersion when it is deployed; the corresponding model is then loaded and served directly. At this point, the stages of a model's lifecycle (code -> training -> model -> online deployment) are connected through the model-related APIs.
When deploying an inference service through the Inference resource object provided by KubeDL, you only need to fill in the corresponding ModelVersion name in a predictor template. When the Inference Controller creates the predictor, it injects a Model Loader, which pulls the image carrying the model file to the local node and mounts the model file into the main container through a volume shared between containers, so the model can be loaded. As mentioned above, combined with OpenKruise's ImagePullJob, we can easily pre-warm model images to speed up model loading. For consistency from the user's point of view, the inference service's model mount path defaults to the same path as the training job's model output path.
apiVersion: serving.kubedl.io/v1alpha1
kind: Inference
metadata:
  name: hello-inference
spec:
  framework: TFServing
  predictors:
    - name: model-predictor
      # model built in previous stage.
      modelVersion: mnist-model-demo-abcde
      replicas: 3
      batching:
        batchSize: 32
      template:
        spec:
          containers:
            - name: tensorflow
              args:
                - --port=9000
                - --rest_api_port=8500
                - --model_name=mnist
                - --model_base_path=/kubedl-model/
              command:
                - /usr/bin/tensorflow_model_server
              image: tensorflow/serving:1.11.1
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 9000
                - containerPort: 8500
              resources:
                limits:
                  cpu: 2048m
                  memory: 2Gi
                requests:
                  cpu: 1024m
                  memory: 1Gi
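For intuition, the model-loading pattern described above can be pictured roughly as the following pod sketch. This is a conceptual illustration only, not the exact spec the Inference Controller generates; the loader container name and its copy command are assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: model-predictor-sketch
spec:
  volumes:
    - name: model-volume          # volume shared between the loader and the serving container
      emptyDir: {}
  initContainers:
    - name: model-loader          # hypothetical loader injected by the controller
      image: simoncqk/models:v19a00   # the model image built in the previous stage
      # copy step assumes the model image provides a shell; the real loader may work differently
      command: ["sh", "-c", "cp -r /kubedl-model/. /mnt/models/"]
      volumeMounts:
        - name: model-volume
          mountPath: /mnt/models
  containers:
    - name: tensorflow
      image: tensorflow/serving:1.11.1
      volumeMounts:
        - name: model-volume
          mountPath: /kubedl-model   # same path as the training output, as noted above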
A complete inference service may serve multiple predictors with different model versions at the same time. For example, in common search and recommendation scenarios, we often want to compare the effect of several model iterations through A/B testing, and Inference + ModelVersion makes this easy: by referencing a different model version in each predictor and assigning reasonable traffic weights, models of different versions can be served and compared under a single inference service at the same time:
apiVersion: serving.kubedl.io/v1alpha1
kind: Inference
metadata:
  name: hello-inference-multi-versions
spec:
  framework: TFServing
  predictors:
    - name: model-a-predictor-1
      modelVersion: model-a-version1
      replicas: 3
      trafficWeight: 30  # 30% traffic will be routed to this predictor.
      batching:
        batchSize: 32
      template:
        spec:
          containers:
            - name: tensorflow
              # ...
    - name: model-a-predictor-2
      modelVersion: model-version2
      replicas: 3
      trafficWeight: 50  # 50% traffic will be routed to this predictor.
      batching:
        batchSize: 32
      template:
        spec:
          containers:
            - name: tensorflow
              # ...
    - name: model-a-predictor-3
      modelVersion: model-version3
      replicas: 3
      trafficWeight: 20  # 20% traffic will be routed to this predictor.
      batching:
        batchSize: 32
      template:
        spec:
          containers:
            - name: tensorflow
              # ...
Summary
KubeDL introduces the Model and ModelVersion resource objects and combines them with standard container images to provide model building, tagging, version tracing, immutable storage, and distribution, freeing users from crude file-based model management. It can also be combined with other excellent open-source projects to enable image distribution acceleration, model image pre-warming, and other capabilities that improve model deployment efficiency. At the same time, the model management API connects the previously separate stages of distributed training and inference, significantly improving the automation of the machine learning pipeline as well as the experience and efficiency with which algorithm scientists bring models online and compare experiments. We welcome more users to try KubeDL and give us valuable feedback, and we look forward to more developers following and joining the construction of the KubeDL community!
KubeDL Github address:
https://github.com/kubedl-io/kubedl