
Amazon Karpenter is a node provisioning component for Kubernetes clusters that was officially released at the Amazon Cloud Technology 2021 re:Invent conference. It automatically creates new nodes and joins them to the cluster according to the needs of unschedulable Pods. Before Karpenter, the Cluster Autoscaler (CA) component was typically used to perform elastic scaling of nodes in a Kubernetes cluster; CA works by dynamically adjusting the size of node groups (implemented as EC2 Auto Scaling groups on Amazon Cloud Technology). Karpenter instead ditches node groups entirely and manages nodes directly through the EC2 Fleet API, so it can flexibly choose the right Amazon EC2 instance type, Availability Zone, and purchase option (such as On-Demand or Spot) for the cluster.

Amazon Karpenter is now ready for production, and some users have already begun to use it for GPU node management in Amazon Cloud Technology EKS clusters. In this blog, we will use a GPU inference scenario as an example to explain in detail the working principle, configuration process, and test results of Amazon Karpenter.

Architecture description

In this blog we will use EKS to build a Kubernetes cluster with the following architecture:

image.png

As shown above, in the EKS cluster we first create a managed node group to run the management components, such as CoreDNS, the Amazon Load Balancer Controller, and Amazon Karpenter itself. Generally speaking, the resources required by these management components are relatively fixed, so we can plan their resource needs in advance, take factors such as cross-Availability-Zone high availability into account, and then determine the instance type and size of this node group.

Amazon Karpenter then manages the nodes needed by the actual workloads. When unschedulable Pods appear, Karpenter evaluates their requirements (resource requests, affinity, tolerations, and so on), calculates the instance type and number of nodes that satisfy them, and launches the corresponding EC2 instances. In this example we therefore do not plan a GPU node group in advance or configure Cluster Autoscaler for it; Karpenter handles GPU node provisioning automatically.

We will deploy an inference service (a resnet server) consisting of multiple Pods, each of which can serve requests independently; the service is exposed outside the EKS cluster through a Service. Through a Python client we can submit a picture to the inference service and obtain the result. By configuring an HPA on the Deployment, the number of Pods scales automatically with load, which in turn drives elastic scaling of the nodes.
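As an illustration (not part of the deployment files in this blog, and assuming the metrics-server is installed and CPU requests are set on the Pods), an HPA for the inference Deployment could be created with a command like:

kubectl autoscale deployment resnet --cpu-percent=70 --min=1 --max=10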

Since we want to expose the service through a load balancer, we deploy the Amazon Load Balancer Controller in advance so that it can create a Network Load Balancer based on the Service.

Deploy and test

We will deploy according to the above architecture. The overall deployment process is as follows:

image.png

1. Create an EKS cluster for GPU inference

Use the eksctl tool to create the EKS cluster.

First, export the cluster name, region, account ID and other information as environment variables so that they can be referenced in the following commands:

 export CLUSTER_NAME="karpenter-cluster"
export AWS_REGION="us-west-2"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"


Prepare the cluster configuration file for the eksctl tool:

 cat << EOF > cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
 name: ${CLUSTER_NAME}
 region: ${AWS_REGION}
 version: "1.21"
 tags:
   karpenter.sh/discovery: ${CLUSTER_NAME}

iam:
 withOIDC: true

managedNodeGroups:
 - name: ng-1
   privateNetworking: true
   instanceType: m5.large
   desiredCapacity: 3
EOF


Create an EKS cluster with the above configuration file:

eksctl create cluster -f cluster.yaml

According to this configuration file, eksctl creates a managed node group ng-1 with three m5.large nodes, which will host the management components such as CoreDNS, the Karpenter controller, and the Amazon Load Balancer Controller; the GPU inference workloads will run on nodes provisioned by Karpenter.

Configure the cluster endpoint environment variable for later use by Amazon Karpenter:

 export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)"


Check whether the nodes are in Ready status:
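For example, you can list the nodes and their status with:

kubectl get nodes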

image.png

image.png

2. Install Amazon Karpenter

At the time of writing, the latest Karpenter version is v0.6.3, and the installation process for that version is summarized here. Readers can also follow the installation guide on the Amazon Karpenter official website as needed:

First, use the CloudFormation template provided by Karpenter to create the node role, instance profile, and controller policy, so that the instances launched by Amazon Karpenter have the permissions needed for actions such as network configuration and image pulling:

 TEMPOUT=$(mktemp)
curl -fsSL https://karpenter.sh/v0.6.3/getting-started/cloudformation.yaml > $TEMPOUT \
&& aws cloudformation deploy \
 --stack-name "Karpenter-${CLUSTER_NAME}" \
 --template-file "${TEMPOUT}" \
 --capabilities CAPABILITY_NAMED_IAM \
 --parameter-overrides "ClusterName=${CLUSTER_NAME}"


Next, configure the aws-auth mapping so that the instances created by Amazon Karpenter have permission to join the EKS cluster:

 eksctl create iamidentitymapping \
 --username system:node:{{EC2PrivateDNSName}} \
 --cluster "${CLUSTER_NAME}" \
 --arn "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
 --group system:bootstrappers \
 --group system:nodes


Create an IAM role and service account for the Karpenter controller:

 eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} --approve
eksctl create iamserviceaccount \
 --cluster "${CLUSTER_NAME}" --name karpenter --namespace karpenter \
 --role-name "${CLUSTER_NAME}-karpenter" \
 --attach-policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}" \
 --role-only \
 --approve


Configure the environment variable used when installing Amazon Karpenter:

 export KARPENTER_IAM_ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"


Finally, install Karpenter through Helm:

 helm repo add karpenter https://charts.karpenter.sh/
helm repo update
helm upgrade --install --namespace karpenter --create-namespace \
 karpenter karpenter/karpenter \
 --version v0.6.3 \
 --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
 --set clusterName=${CLUSTER_NAME} \
 --set clusterEndpoint=${CLUSTER_ENDPOINT} \
 --set aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
 --set logLevel=debug \
 --wait


Check whether the Karpenter controller is up and running:
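A quick way to check (a minimal sketch; the screenshot may show an equivalent command) is to list the Pods in the karpenter namespace:

kubectl get pods -n karpenter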

image.png

3. Configure Karpenter Provisioner

When there are unschedulable Pods in the Kubernetes cluster, Amazon Karpenter uses a Provisioner to determine the type and number of EC2 instances to create. When Amazon Karpenter is installed, it defines a custom resource named Provisioner. Because a single Karpenter Provisioner can handle Pods with many different attributes (resource requests, scheduling constraints, purchase options, and so on), Karpenter does not need one Provisioner per node group; you can also define multiple Provisioners to manage different kinds of nodes.

Create the Provisioner configuration file:

 cat << EOF > provisioner.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
 name: gpu
spec:
 requirements:
   - key: karpenter.sh/capacity-type
     operator: In
     values: ["on-demand"] 
   - key: node.kubernetes.io/instance-type
     operator: In
     values: ["g4dn.xlarge", "g4dn.2xlarge"]
 taints:
   - key: nvidia.com/gpu
     effect: "NoSchedule"
 limits:
   resources:
     gpu: 100
 provider:
   subnetSelector:
     karpenter.sh/discovery: karpenter-cluster  
   securityGroupSelector:
     kubernetes.io/cluster/karpenter-cluster: owned  
 ttlSecondsAfterEmpty: 30
EOF


In this configuration file we specify the purchase option and instance types in spec.requirements: karpenter.sh/capacity-type is set to on-demand. Since this is a machine learning inference application, we also restrict the instance types to the G4 family, choosing g4dn.xlarge and g4dn.2xlarge. In addition, we set the taint nvidia.com/gpu, so that ordinary Pods will not be scheduled onto these GPU instances; only Pods that tolerate this taint (such as our inference Pods) will cause this Provisioner to create GPU instances.

We also limit the total number of GPUs managed by this Provisioner to 100 through spec.limits.resources.gpu. The subnetSelector and securityGroupSelector use the tags created earlier to tell the Provisioner which VPC subnets and security groups new instances should be launched into.
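If you want to confirm that these tags are in place before applying the Provisioner, commands along the following lines can be used (a sketch assuming the default AWS CLI credentials and the tag values shown above):

aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
  --query "Subnets[].SubnetId"

aws ec2 describe-security-groups \
  --filters "Name=tag:kubernetes.io/cluster/${CLUSTER_NAME},Values=owned" \
  --query "SecurityGroups[].GroupId"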

Finally, ttlSecondsAfterEmpty tells the Provisioner how to clean up idle instances: when an EC2 instance created by this Provisioner has been running without any Pods for 30 seconds, the Provisioner reclaims it directly, that is, terminates the EC2 instance, so we do not pay for idle resources.

For detailed configuration of Provisioners, please refer to the documentation on the official website.

image.png

Apply this configuration file to create the Provisioner:

kubectl apply -f provisioner.yaml

4. Install the Amazon Load Balancer Controller

In this demonstration we will use the Amazon Load Balancer Controller to automatically create an NLB for the Service. You can refer to the Amazon Cloud Technology official website for detailed installation steps; here is a brief record of the installation process for version 2.4.0:

 curl -o iam_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.0/docs/install/iam_policy.json

 aws iam create-policy \
 --policy-name AWSLoadBalancerControllerIAMPolicy \
 --policy-document file://iam_policy.json

 eksctl create iamserviceaccount \
 --cluster=${CLUSTER_NAME} \
 --namespace=kube-system \
 --name=aws-load-balancer-controller \
 --attach-policy-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:policy/AWSLoadBalancerControllerIAMPolicy \
 --override-existing-serviceaccounts \
 --approve

 helm repo add eks https://aws.github.io/eks-charts
 helm repo update
 helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
 -n kube-system \
 --set clusterName=${CLUSTER_NAME} \
 --set serviceAccount.create=false \
 --set serviceAccount.name=aws-load-balancer-controller


5. Deploy the machine learning inference application

As an example, this blog deploys a machine learning inference application adapted from the blog on vGPU deployment on EKS. It is a ResNet-based image inference service exposed through a load balancer; a client written in Python uploads images to this inference service and obtains the inference results. Here we only adjusted the deployment configuration: instead of the vGPU sharing scheme discussed in that blog, the inference service directly uses the GPU of the G4 instance.

Below is the modified Deployment/Service manifest for the inference service:

 cat << EOF > resnet.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
 name: resnet
spec:
 replicas: 1
 selector:
   matchLabels:
     app: resnet-server
 template:
   metadata:
     labels:
       app: resnet-server
   spec:
     # hostIPC is required for MPS communication
     hostIPC: true
     containers:
     - name: resnet-container
       image: seedjeffwan/tensorflow-serving-gpu:resnet
       args:
       - --per_process_gpu_memory_fraction=0.2
       env:
       - name: MODEL_NAME
         value: resnet
       ports:
       - containerPort: 8501
       # Use gpu resource here
       resources:
         requests:
           nvidia.com/gpu: 1
         limits:
           nvidia.com/gpu: 1
       volumeMounts:
       - name: nvidia-mps
         mountPath: /tmp/nvidia-mps
     volumes:
     - name: nvidia-mps
       hostPath:
         path: /tmp/nvidia-mps
     tolerations:
     - key: nvidia.com/gpu
       effect: "NoSchedule"
---
apiVersion: v1
kind: Service
metadata:
 name: resnet-service
 annotations:
   service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
   service.beta.kubernetes.io/aws-load-balancer-type: external
   service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
spec:
 type: LoadBalancer
 selector:
   app: resnet-server
 ports:
 - port: 8501
   targetPort: 8501
EOF


As you can see, the Deployment requests GPU resources (nvidia.com/gpu: 1), which Karpenter will use to decide that G4 instances need to be created, and the Pod tolerates the nvidia.com/gpu taint so that it can be scheduled onto those instances. In addition, the corresponding annotations are defined in the Service so that the Amazon Load Balancer Controller automatically creates an internet-facing Network Load Balancer.

After the deployment is complete, you can check whether the corresponding resources are running normally:
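For example, the following standard kubectl commands can be used:

kubectl get deployment resnet
kubectl get pods -l app=resnet-server -o wide
kubectl get svc resnet-service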

image.png

Check the automatically created NLB; the client will access the service through this address:

image.png

View the GPU status inside the container through kubectl exec -it <podname> -- nvidia-smi:

image.png

Next, prepare the client. The following Python code is saved as resnet_client.py:

 from __future__ import print_function

import base64
import requests
import sys

assert (len(sys.argv) == 2), "Usage: resnet_client.py SERVER_URL"
# The server URL specifies the endpoint of your server running the ResNet
# model with the name "resnet" and using the predict interface.
SERVER_URL = f'http://{sys.argv[1]}:8501/v1/models/resnet:predict'
# The image URL is the location of the image we should send to the server
IMAGE_URL = 'https://tensorflow.org/images/blogs/serving/cat.jpg'

def main():
 # Download the image
 dl_request = requests.get(IMAGE_URL, stream=True)
 dl_request.raise_for_status()

 # Compose a JSON Predict request (send JPEG image in base64).
 jpeg_bytes = base64.b64encode(dl_request.content).decode('utf-8')
 predict_request = '{"instances" : [{"b64": "%s"}]}' % jpeg_bytes

 # Send few requests to warm-up the model.
 for _ in range(3):
   response = requests.post(SERVER_URL, data=predict_request)
   response.raise_for_status()

 # Send few actual requests and report average latency.
 total_time = 0
 num_requests = 10
 for _ in range(num_requests):
   response = requests.post(SERVER_URL, data=predict_request)
   response.raise_for_status()
   total_time += response.elapsed.total_seconds()
   prediction = response.json()['predictions'][0]

 print('Prediction class: {}, avg latency: {} ms'.format(
     prediction['classes'], (total_time*1000)/num_requests))

if __name__ == '__main__':
 main()


Run the Python client, passing the NLB address of the inference service as a parameter. The client downloads an example image, uploads it to the inference service, and finally prints the inference result:

python resnet_client.py $(kubectl get svc resnet-service -o=jsonpath='{.status.loadBalancer.ingress[0].hostname}')

You can see the inference result and the corresponding latency:

image.png

6. Karpenter node scale-out and scale-in test

The Karpenter controller continuously watches for unschedulable Pods in the background, aggregates their requirements (resource requests, node selectors, taints/tolerations, and other constraints), and works out suitable instance types and quantities. Users no longer need to plan the instance types of node groups in advance; Karpenter automatically determines the type and number of nodes according to the required resources, which is more flexible. It also removes the need to deploy the additional Cluster Autoscaler component for node scaling.

We scale up the number of Pods to trigger node scale-out:

kubectl scale deployment resnet --replicas 6

Through the logs of the Karpenter controller, we can see the specific logic during scale-out:

 kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller

image.png

Because the existing GPU node cannot satisfy the resource requests of all the Pods, 5 of them stay Pending and wait for GPU resources. Karpenter aggregates the resource requirements of these 5 Pods and judges which instance type (or types) are suitable. Combined with our previous Provisioner configuration, Amazon Karpenter automatically creates 5 additional g4dn.xlarge on-demand instances to run these 5 Pods. For on-demand instances, the Provisioner automatically selects the lowest-priced instance type that can meet the demand.
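To see the nodes launched by the Provisioner, you can filter on the provisioner-name label that Karpenter adds to the nodes it creates (a sketch; label name as used by Karpenter v0.6.x):

kubectl get nodes -l karpenter.sh/provisioner-name=gpu -o wide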

Unlike the Cluster Autoscaler approach, Amazon Karpenter binds the Pending Pods to the new node as soon as it creates it, so the kubelet can start pulling images and creating containers the moment the node is ready, without waiting for the node to become Ready and then for the Kubernetes scheduler to place the Pods; this makes Amazon Karpenter more efficient.

Delete all Pods and observe how Amazon Karpenter handles the node scale-in scenario:

kubectl scale deployment resnet --replicas 0

image.png

As can be seen from the logs, based on the ttlSecondsAfterEmpty setting in our Provisioner, Amazon Karpenter automatically terminates the GPU instances once they have been idle for 30 seconds.
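You can watch the empty GPU nodes disappear after the 30-second TTL, for example (using the same provisioner-name label as above):

kubectl get nodes -l karpenter.sh/provisioner-name=gpu -w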

Summary

In this blog, we used a machine learning inference application as an example to show how to use Amazon Karpenter in EKS to manage elastic scaling of GPU nodes. Amazon Karpenter is a completely open-source component. Currently it supports node scaling management for EKS and self-built Kubernetes clusters on Amazon Cloud Technology, and through its open cloud provider plug-in mechanism it can also support environments other than Amazon Cloud Technology.

We will continue to share more about node management with Amazon Karpenter in future blogs; stay tuned.

Authors of this article

image.png

Qiu Meng

Amazon Cloud Technology Solution Architect

Responsible for consulting and design of solution architectures based on Amazon Cloud Technology, and for promoting Amazon Cloud Technology platform technologies and solutions. Before joining Amazon Cloud Technology, he worked in the enterprise cloud, Internet entertainment, and media industries, and has a deep understanding of public cloud business and architecture.

image.png

Lin Jun

Amazon Cloud Technology Solution Architect

Mainly responsible for solution consulting and architecture design optimization for enterprise customers.

