
At the 2021 re:Invent conference, Amazon Cloud Technology announced that Karpenter, an open source auto scaling project built for Kubernetes, has reached version 0.5 and is now generally available (GA) for production use. Auto scaling on Kubernetes has always drawn attention, and Kubernetes today already provides the Cluster Autoscaler for making scaling decisions. So if the Kubernetes Cluster Autoscaler already exists, why reinvent the wheel and build a new auto scaling tool? This article introduces what Karpenter does and how it works, compares it with the familiar Cluster Autoscaler, and demonstrates basic Karpenter operations.

What is Karpenter?

Karpenter is an open source auto scaling project built for Kubernetes. It improves the availability of Kubernetes applications without requiring manual capacity management or over-provisioning of compute resources. Karpenter is designed to minimize scheduling latency: it observes the aggregated resource requests of unschedulable pods and makes decisions to launch and terminate nodes, providing suitable compute capacity to satisfy your application's needs in seconds rather than minutes.

Karpenter and Cluster Autoscaler

Tools that implement automatic cluster scaling on Kubernetes include Cluster Autoscaler, Escalator, and Cerebral. Unfortunately, some of these have been neglected or are no longer developed. Cluster Autoscaler is now the preferred cluster scaling component, and its commonly cited benefits include: it lets the cluster grow automatically as demand grows; it reduces infrastructure costs by consolidating workloads and terminating unnecessary nodes; its development is vendor-neutral and it supports all mainstream public cloud providers; it is widely used and battle-tested; and it supports clusters of roughly 1,000 nodes.

Let's take a look at how the Cluster Autoscaler works, as shown in the figure below. When a pod is pending due to insufficient resources, the Cluster Autoscaler's scale-out mechanism is triggered: it increases the desired capacity of the Auto Scaling group, which launches a new Amazon EC2 instance that joins the node group as a newly provisioned node, and the pending pod is then scheduled onto that new node.

[Figure: Cluster Autoscaler scale-out workflow]

There are some problems and caveats when using the Cluster Autoscaler. Its automatic scaling of node groups depends on launch templates and Auto Scaling groups, so the maximum and minimum size of an Auto Scaling group also caps the maximum and minimum number of nodes in the node group. Sometimes the Cluster Autoscaler needs a separate node group per instance type or per Availability Zone in order to perform certain scaling operations. A failure to create a node in a node group is not handled immediately, because the Cluster Autoscaler's error handling is based on timeouts. The official Cluster Autoscaler performance test used 1,000 nodes with 30 pods each; there is no official test data beyond that scale. Operating the Cluster Autoscaler is also fairly complicated, with 78 command line parameters, and users cannot plug in a custom scheduler. Because of these problems, Zalando modified the Cluster Autoscaler and maintains a fork on GitHub: they improved the node-handling code, added support for Auto Scaling groups with multiple instance types, and use more reliable backoff logic. But those improvements may not be enough. Can we have a scaling tool that is simpler, scales faster, and supports larger clusters? That is Karpenter.

Karpenter does away with the concept of node groups, which is a fundamental difference from the Cluster Autoscaler; node groups are usually one of the sources of inefficiency. Karpenter provisions compute capacity directly and dynamically computes what Amazon EC2 instance type and size the pending pods need as a node. Moving from the Cluster Autoscaler's static templates to Karpenter's dynamically generated ones means there is no need to create node groups to pin down instance attributes, which reduces configuration complexity. It also greatly reduces the API load on the Cloud Provider: with the Cluster Autoscaler, the Auto Scaling group constantly polls the Cloud Provider to confirm state, and once the cluster grows large it can hit API rate limits and bring the whole system to a halt. Karpenter only calls the API when creating and deleting capacity, a design that supports much higher API throughput. Without node groups, scheduling also becomes simpler, because the Cluster Autoscaler's node groups each have different attributes and it must consider which node group a pod should be scheduled into.


Now let's look at how Karpenter works, as shown in the following figure. When a pod is pending, if the cluster still has capacity, the kube-scheduler schedules it as usual. If there is no capacity and the pod cannot be scheduled, Karpenter bypasses the kube-scheduler and binds the pod directly to the newly provisioned node it creates.

[Figure: Karpenter provisioning workflow]

There are several ways to launch Amazon EC2 instances: the Cluster Autoscaler uses Auto Scaling groups, while Karpenter uses Amazon EC2 Fleet; both can launch Amazon EC2 instances. Amazon EC2 Fleet was chosen because it is more powerful and flexible. For example, when deciding to create nodes for a group of pods, it is not limited to a single Availability Zone or instance type. We can constrain Karpenter by specifying attributes such as instance types and Availability Zones in the Cloud Provider configuration, and Karpenter will find the most suitable instance type within those constraints to place the pods. Amazon EC2 Fleet also picks the cheapest instance type among the allowed instances. In addition, spinning up a node through an Auto Scaling group takes about 30 seconds, while Amazon EC2 Fleet needs much less than that.

Scheduling itself is also important, and Karpenter optimizes it as well. Once a scale-out decision is made, the request to create an instance is sent immediately, the instance ID comes back immediately, and Karpenter binds the pending pods to that node itself. In other words, the scheduling decision is enforced outside of the kube-scheduler. This has two benefits: first, it cuts roughly 5 seconds of latency from pod deployment; second, there is no need to keep the versions of Karpenter and the kube-scheduler in lockstep.

Through nodeSelector, we can use the Kubernetes Well-Known Labels (https://kubernetes.io/docs/reference/labels-annotations-taints/) to constrain how instances are launched; the attributes that can be specified include Availability Zone, instance type, capacity type, CPU architecture, operating system, and so on.
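For instance, a pod (or a Deployment's pod template) could pin the Availability Zone and CPU architecture through Well-Known Labels, and Karpenter would then only consider instance types that satisfy those constraints. The following is a minimal sketch, not part of the original guide; the pod name and the label values are placeholders:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-example    # hypothetical name, for illustration only
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2a   # Well-Known Label: Availability Zone
    kubernetes.io/arch: amd64                 # Well-Known Label: CPU architecture
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
      resources:
        requests:
          cpu: 1
EOF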

Choosing the right Amazon EC2 instance for a set of pods is a bin packing problem. Karpenter uses a First Fit Descending algorithm: pods are sorted from largest to smallest and fitted onto an instance starting with the largest pod; if a pod does not fit, progressively smaller pods are tried, so the remaining gaps shrink until even the smallest pod has found a suitable spot. The advantage is that large pods tend to leave gaps on an instance that smaller pods can later fill, making more efficient use of resources. Pods can be ordered by CPU, by memory, or by the Euclidean combination of both.

Karpenter is responsible not only for adding nodes but also for terminating them, and it has a controller dedicated to node termination. By default, if a node has had no pods for 5 minutes, Karpenter terminates it. In addition, when an Amazon EC2 instance becomes unhealthy for some reason, or a Spot instance is about to be reclaimed, an event is emitted, and Karpenter responds to these events by creating a new node and redeploying the pods onto it. Karpenter can also set a TTL on nodes, for example configuring a node lifetime of 90 days; this is very useful during upgrades to ensure nodes are replaced in a rolling manner.
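As a sketch of the node TTL idea (the field names below follow the v1alpha5 Provisioner API used later in this article; the availability of ttlSecondsUntilExpired should be checked against your Karpenter version, and the 90-day value is only an example), a Provisioner could declare both an empty-node timeout and a maximum node lifetime:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsAfterEmpty: 300        # terminate a node about 5 minutes after its last pod leaves
  ttlSecondsUntilExpired: 7776000  # 90 days: nodes older than this are drained and replaced, enabling rolling upgrades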

Karpenter also optimizes the node startup process. Previously, node startup involved many steps, and it generally took about 2 minutes for a node to start in the cloud, much of it caused by redundant configuration. Karpenter has reduced this delay to roughly 15 to 50 seconds.

Karpenter Getting Started Guide

Karpenter automatically provisions new nodes in response to unschedulable pods. It does this by watching events in the Kubernetes cluster and then sending commands to the underlying cloud provider. In this example, the cluster runs on Amazon Elastic Kubernetes Service (Amazon EKS). Karpenter is designed to be cloud provider agnostic, but it currently only supports the Amazon Cloud Technology cloud provider; contributions for other cloud providers are welcome at https://github.com/awslabs/karpenter. Completing this guide takes less than 1 hour and costs less than $0.25. Remember to clean up and delete the resources as described at the end of the article.

Getting Started Environment Preparation

We will use four tools in this experiment:

1. Amazon Cloud Technology CLI. Refer to the following link for installation: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html . After installation, configure your Access Key and Secret Key (AKSK) and set the region to us-west-2.

2. Kubectl. Refer to the following link to install:
https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/

3. Eksctl. Refer to the following link to install:
https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html

4. Helm. Refer to the following link to install:
https://helm.sh/docs/intro/install/
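
Optionally, a quick sanity check (assuming all four tools are installed and on your PATH) confirms each CLI responds:

aws --version
kubectl version --client
eksctl version
helm version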

After installing the necessary tools, run the following shell commands to set the environment variables:

export CLUSTER_NAME=$USER-karpenter-demo
export AWS_DEFAULT_REGION=us-west-2
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)


Use eksctl to create a cluster. This example configuration file specifies a basic cluster with one initial node and sets up an IAM OIDC provider for the cluster, which is used in the next steps to set up IAM Roles for Service Accounts (IRSA):

cat <<EOF > cluster.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "1.20"
managedNodeGroups:
  - instanceType: m5.large
    amiFamily: AmazonLinux2
    name: ${CLUSTER_NAME}-ng
    desiredCapacity: 1
    minSize: 1
    maxSize: 10
iam:
  withOIDC: true
EOF
eksctl create cluster -f cluster.yaml


In this experiment, we deploy Karpenter on the nodes of the Amazon EKS managed node group. Karpenter can also run on self-managed nodes or on Fargate.

Karpenter discovers subnets through the tag kubernetes.io/cluster/$CLUSTER_NAME. Add this tag to the subnets associated with your cluster.

SUBNET_IDS=$(aws cloudformation describe-stacks \
    --stack-name eksctl-${CLUSTER_NAME}-cluster \
    --query 'Stacks[].Outputs[?OutputKey==`SubnetsPrivate`].OutputValue' \
    --output text)
aws ec2 create-tags \
    --resources $(echo $SUBNET_IDS | tr ',' '\n') \
    --tags Key="kubernetes.io/cluster/${CLUSTER_NAME}",Value=


Amazon EC2 instances launched by Karpenter must be run with an InstanceProfile that grants the required permissions to run containers and configure networking. Karpenter discovers the InstanceProfile with the name KarpenterNodeRole-${ClusterName}.

First, create an IAM resource using Amazon CloudFormation.

TEMPOUT=$(mktemp)
curl -fsSL https://karpenter.sh/docs/getting-started/cloudformation.yaml > $TEMPOUT \
&& aws cloudformation deploy \
  --stack-name Karpenter-${CLUSTER_NAME} \
  --template-file ${TEMPOUT} \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides ClusterName=${CLUSTER_NAME}


Second, grant the Amazon EC2 instances access to connect to the cluster. This command adds the Karpenter node role to your aws-auth ConfigMap, allowing nodes with this role to join the cluster.

eksctl create iamidentitymapping \
  --username system:node:{{EC2PrivateDNSName}} \
  --cluster  ${CLUSTER_NAME} \
  --arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME} \
  --group system:bootstrappers \
  --group system:nodes


Karpenter itself also needs permission to launch Amazon EC2 instances. Like Cluster Autoscaler, we implement it through IAM Roles for Service Accounts (IRSA), and configure it with the following command:

eksctl create iamserviceaccount \
  --cluster $CLUSTER_NAME --name karpenter --namespace karpenter \
  --attach-policy-arn arn:aws:iam::$AWS_ACCOUNT_ID:policy/KarpenterControllerPolicy-$CLUSTER_NAME \
  --approve


If you have never run an Amazon EC2 Spot instance in this account before, run the following command. If you have run Spot instances before, the command will report an error that you can safely ignore.

aws iam create-service-linked-role --aws-service-name spot.amazonaws.com


Karpenter is packaged as a Helm chart; install it with Helm:

helm repo add karpenter https://charts.karpenter.sh
helm repo update
helm upgrade --install karpenter karpenter/karpenter --namespace karpenter \
  --create-namespace --set serviceAccount.create=false --version 0.4.1 \
  --wait # for the defaulting webhook to install before creating a Provisioner


Enable debug logging:

kubectl patch configmap config-logging -n karpenter --patch '{"data":{"loglevel.controller":"debug"}}'


Karpenter's Provisioner configuration

A Karpenter Provisioner is a Kubernetes custom resource (CustomResourceDefinition) that lets customers configure Karpenter's constraints in their clusters, such as instance types and Availability Zones. A Provisioner comes with a set of global defaults that are overridden when you define your own. A cluster can also contain multiple Provisioners. By default, pods use the rules defined by the Provisioner named default. If you create a second Provisioner, select it with the node selector karpenter.sh/provisioner-name: alternative-provisioner; the default Provisioner can likewise be selected explicitly by specifying karpenter.sh/provisioner-name through the node selector.
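
For example, a pod template would select a hypothetical second Provisioner (here called alternative-provisioner, purely illustrative) like this:

# fragment of a pod spec
spec:
  nodeSelector:
    karpenter.sh/provisioner-name: alternative-provisioner   # hypothetical second Provisioner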

Here is an example of a Provisioner:

cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: node.k8s.aws/capacity-type
      operator: In
      values: ["spot"]
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-${CLUSTER_NAME}
    cluster:
      name: ${CLUSTER_NAME}
      endpoint: $(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output json)
  ttlSecondsAfterEmpty: 30
EOF


As you can see, in this example the only requirement constrains the capacity type to spot, and ttlSecondsAfterEmpty is set to 30: this Provisioner will only create Spot instances and will terminate a node 30 seconds after it becomes empty. For more Provisioner configuration options, refer to: https://karpenter.sh/docs/provisioner-crd/
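
To confirm the Provisioner was created and to see the defaults Karpenter filled in, you can read the custom resource back (a quick check, not part of the original guide):

kubectl get provisioner default -o yaml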

Karpenter automatic scale-out and scale-in

We use the pause image to create a Deployment with replicas set to 0.

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1
EOF


Then set replicas to 5 and observe the karpenter log:

kubectl scale deployment inflate --replicas 5
kubectl logs -f -n karpenter $(kubectl get pods -n karpenter -l karpenter=controller -o name)


From the Karpenter log output below, we can see that at 16:13:32 Karpenter found 5 provisionable pods, and by 16:13:37 the node had been launched and the pods bound to it; the whole process took only about 5 seconds. The pods required 5 vCPUs in total, and Karpenter automatically chose a c5.2xlarge instance, which reflects Karpenter's design philosophy of reducing complexity and prioritizing speed by packing multiple pods onto a single node.

2021-10-31T16:13:30.536Z        INFO    controller.allocation.provisioner/default       Starting provisioning loop      {"commit": "c902206"}
2021-10-31T16:13:30.536Z        INFO    controller.allocation.provisioner/default       Waiting to batch additional pods        {"commit": "c902206"}
2021-10-31T16:13:32.050Z        INFO    controller.allocation.provisioner/default       Found 5 provisionable pods      {"commit": "c902206"}
2021-10-31T16:13:32.932Z        DEBUG   controller.allocation.provisioner/default       Discovered 318 EC2 instance types       {"commit": "c902206"}
2021-10-31T16:13:32.933Z        DEBUG   controller.allocation.provisioner/default       Excluding instance type t4g.nano because there are not enough resources for kubelet and system overhead     {"commit": "c902206"}
2021-10-31T16:13:32.935Z        DEBUG   controller.allocation.provisioner/default       Excluding instance type t3.nano because there are not enough resources for kubelet and system overhead      {"commit": "c902206"}
2021-10-31T16:13:32.939Z        DEBUG   controller.allocation.provisioner/default       Excluding instance type t3a.nano because there are not enough resources for kubelet and system overhead     {"commit": "c902206"}
2021-10-31T16:13:32.968Z        INFO    controller.allocation.provisioner/default       Computed packing for 5 pod(s) with instance type option(s) [c1.xlarge c3.2xlarge c4.2xlarge c6i.2xlarge c5a.2xlarge c5d.2xlarge c6g.2xlarge c5ad.2xlarge c6gd.2xlarge a1.2xlarge c6gn.2xlarge c5.2xlarge c5n.2xlarge m3.2xlarge m6g.2xlarge m4.2xlarge m5zn.2xlarge m5dn.2xlarge m5n.2xlarge m5d.2xlarge]       {"commit": "c902206"}
2021-10-31T16:13:33.146Z        DEBUG   controller.allocation.provisioner/default       Discovered subnets: [subnet-0a538ed8c05288206 subnet-07a9d3f4dbc92164c subnet-0b14f140baa9a38cb]    {"commit": "c902206"}
2021-10-31T16:13:33.262Z        DEBUG   controller.allocation.provisioner/default       Discovered security groups: [sg-0afb56113d9feb2e8]      {"commit": "c902206"}
2021-10-31T16:13:33.265Z        DEBUG   controller.allocation.provisioner/default       Discovered kubernetes version 1.20      {"commit": "c902206"}
2021-10-31T16:13:33.317Z        DEBUG   controller.allocation.provisioner/default       Discovered ami ami-0a69abe3cea2499b7 for query /aws/service/eks/optimized-ami/1.20/amazon-linux-2-arm64/recommended/image_id        {"commit": "c902206"}
2021-10-31T16:13:33.365Z        DEBUG   controller.allocation.provisioner/default       Discovered ami ami-088105bab8bfa2db6 for query /aws/service/eks/optimized-ami/1.20/amazon-linux-2/recommended/image_id      {"commit": "c902206"}
2021-10-31T16:13:33.365Z        DEBUG   controller.allocation.provisioner/default       Discovered caBundle, length 1066        {"commit": "c902206"}
2021-10-31T16:13:33.506Z        DEBUG   controller.allocation.provisioner/default       Created launch template, Karpenter-karpenter-demo-16982985708254790476      {"commit": "c902206"}
2021-10-31T16:13:33.507Z        DEBUG   controller.allocation.provisioner/default       Discovered caBundle, length 1066        {"commit": "c902206"}
2021-10-31T16:13:33.640Z        DEBUG   controller.allocation.provisioner/default       Created launch template, Karpenter-karpenter-demo-11290710479729449633      {"commit": "c902206"}
2021-10-31T16:13:36.898Z        INFO    controller.allocation.provisioner/default       Launched instance: i-0f38cb0ade09a537c, hostname: ip-192-168-132-54.us-west-2.compute.internal, type: c5.2xlarge, zone: us-west-2a, capacityType: spot      {"commit": "c902206"}
2021-10-31T16:13:37.050Z        INFO    controller.allocation.provisioner/default       Bound 5 pod(s) to node ip-192-168-132-54.us-west-2.compute.internal{"commit": "c902206"}
2021-10-31T16:13:37.050Z        INFO    controller.allocation.provisioner/default       Starting provisioning loop      {"commit": "c902206"}
2021-10-31T16:13:37.050Z        INFO    controller.allocation.provisioner/default       Waiting to batch additional pods        {"commit": "c902206"}
2021-10-31T16:13:38.050Z        INFO    controller.allocation.provisioner/default       Found 0 provisionable pods      {"commit": "c902206"}

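Once the new node has joined the cluster, you can also check from the node labels which instance type, zone, and Provisioner it came from (a quick check that is not part of the original walkthrough; karpenter.sh/provisioner-name is assumed to be the label Karpenter applies to the nodes it manages):

kubectl get nodes -L node.kubernetes.io/instance-type -L topology.kubernetes.io/zone -L karpenter.sh/provisioner-name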

Let's select a pod to see how long it took to create:

kubectl get pod <pod_name> -oyaml


As can be seen from the status below, the pod was scheduled at 16:13:36 and became Ready at 16:14:45, 1 minute and 9 seconds in total. Considering that there were 5 pods and that this window covers creating the EC2 launch template, launching the instance, joining it to the cluster as a node, and deploying the pods onto it, the whole process is still very fast. To speed it up further, you can consider over-provisioning with low-priority placeholder pods; a sketch follows the status output below.

……
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-10-31T16:14:17Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-10-31T16:14:45Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-10-31T16:14:45Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-10-31T16:13:36Z"
    status: "True"
    type: PodScheduled
……

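As mentioned above, one common over-provisioning pattern is to run low-priority placeholder pods that keep some capacity warm: real workloads preempt the placeholders instantly, and the evicted placeholders then trigger Karpenter to provision replacement capacity in the background. The following is a generic sketch of that pattern, not part of the original guide; the names, priority value, and sizes are illustrative:

cat <<EOF | kubectl apply -f -
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning            # hypothetical name
value: -10                          # below the default priority of 0, so these pods are preempted first
globalDefault: false
description: "Placeholder pods that reserve headroom for real workloads"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-placeholder
spec:
  replicas: 2                       # how much headroom to keep warm
  selector:
    matchLabels:
      app: overprovisioning-placeholder
  template:
    metadata:
      labels:
        app: overprovisioning-placeholder
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1                # each placeholder reserves 1 vCPU of headroom
EOF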

We open the Amazon EC2 console and check the Spot Requests page; we can see that Karpenter is using an EC2 Fleet of Spot instances. We are currently using Spot instances; if Karpenter launches On-Demand instances instead, we can check them with the AWS CLI command aws ec2 describe-fleets.
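
A minimal sketch of that check (the query fields assume the standard DescribeFleets response shape):

aws ec2 describe-fleets \
  --query 'Fleets[].{Id:FleetId,State:FleetState,Type:Type}' \
  --output table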

[Figure: Amazon EC2 console, Spot Requests page showing the fleet created by Karpenter]

Let's delete the Deployment we just created and observe the karpenter log:

kubectl delete deployment inflate
kubectl logs -f -n karpenter $(kubectl get pods -n karpenter -l karpenter=controller -o name)


From the log, we can see that the empty node was detected at 16:40:16; since ttlSecondsAfterEmpty was set to 30 in the Provisioner, the node was terminated 30 seconds later.

2021-10-31T16:13:39.549Z        INFO    controller.allocation.provisioner/default       Watching for pod events {"commit": "c902206"}
2021-10-31T16:40:16.040Z        INFO    controller.Node Added TTL to empty node ip-192-168-132-54.us-west-2.compute.internal    {"commit": "c902206"}
2021-10-31T16:40:46.059Z        INFO    controller.Node Triggering termination after 30s for empty node ip-192-168-132-54.us-west-2.compute.internal    {"commit": "c902206"}
2021-10-31T16:40:46.103Z        INFO    controller.Termination  Cordoned node ip-192-168-132-54.us-west-2.compute.internal      {"commit": "c902206"}
2021-10-31T16:40:46.290Z        INFO    controller.Termination  Deleted node ip-192-168-132-54.us-west-2.compute.internal       {"commit": "c902206"}


In addition to letting the Provisioner automatically choose the instance type when scaling out, we can also use a nodeSelector in the pod spec with Well-Known Labels to control which nodes are launched. The following is an example Deployment:

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1
      nodeSelector:
        node.kubernetes.io/instance-type: c5.xlarge
EOF


Looking at the Karpenter log, the launch template created during the first scale-out is reused, so it took only about 4 seconds from discovering the provisionable pods to creating instances and binding the pods. Unlike the first time, this Deployment's nodeSelector requires c5.xlarge instances, so Karpenter created two c5.xlarge instances to host the pods instead of a single c5.2xlarge.

kubectl logs -f -n karpenter $(kubectl get pods -n karpenter -l karpenter=controller -o name)
……
2021-10-31T17:13:28.459Z        INFO    controller.allocation.provisioner/default       Waiting to batch additional pods        {"commit": "c902206"}
2021-10-31T17:13:29.549Z        INFO    controller.allocation.provisioner/default       Found 5 provisionable pods      {"commit": "c902206"}
2021-10-31T17:13:30.648Z        DEBUG   controller.allocation.provisioner/default       Discovered 318 EC2 instance types       {"commit": "c902206"}
2021-10-31T17:13:30.661Z        INFO    controller.allocation.provisioner/default       Computed packing for 3 pod(s) with instance type option(s) [c5.xlarge]      {"commit": "c902206"}
2021-10-31T17:13:30.675Z        INFO    controller.allocation.provisioner/default       Incremented node count to 2 on packing for 2 pod(s) with instance type option(s) [c5.xlarge]        {"commit": "c902206"}
2021-10-31T17:13:30.860Z        DEBUG   controller.allocation.provisioner/default       Discovered subnets: [subnet-0a538ed8c05288206 subnet-07a9d3f4dbc92164c subnet-0b14f140baa9a38cb]    {"commit": "c902206"}
2021-10-31T17:13:30.951Z        DEBUG   controller.allocation.provisioner/default       Discovered security groups: [sg-0afb56113d9feb2e8]      {"commit": "c902206"}
2021-10-31T17:13:30.955Z        DEBUG   controller.allocation.provisioner/default       Discovered kubernetes version 1.20      {"commit": "c902206"}
2021-10-31T17:13:31.016Z        DEBUG   controller.allocation.provisioner/default       Discovered ami ami-088105bab8bfa2db6 for query /aws/service/eks/optimized-ami/1.20/amazon-linux-2/recommended/image_id      {"commit": "c902206"}
2021-10-31T17:13:31.016Z        DEBUG   controller.allocation.provisioner/default       Discovered caBundle, length 1066        {"commit": "c902206"}
2021-10-31T17:13:31.052Z        DEBUG   controller.allocation.provisioner/default       Discovered launch template Karpenter-karpenter-demo-11290710479729449633    {"commit": "c902206"}
2021-10-31T17:13:33.150Z        INFO    controller.allocation.provisioner/default       Launched instance: i-04604513375c3dc3a, hostname: ip-192-168-156-86.us-west-2.compute.internal, type: c5.xlarge, zone: us-west-2a, capacityType: spot       {"commit": "c902206"}
2021-10-31T17:13:33.150Z        INFO    controller.allocation.provisioner/default       Launched instance: i-0e058845370c428ec, hostname: ip-192-168-154-221.us-west-2.compute.internal, type: c5.xlarge, zone: us-west-2a, capacityType: spot      {"commit": "c902206"}
2021-10-31T17:13:33.207Z        INFO    controller.allocation.provisioner/default       Bound 3 pod(s) to node ip-192-168-156-86.us-west-2.compute.internal{"commit": "c902206"}
2021-10-31T17:13:33.233Z        INFO    controller.allocation.provisioner/default       Bound 2 pod(s) to node ip-192-168-154-221.us-west-2.compute.internal{"commit": "c902206"}
……


Open the Amazon EC2 console and check the Spot Requests page: there are two Spot fleets. Even when the instances are of the same type, Karpenter creates separate fleets so that it can schedule quickly.

[Figure: Amazon EC2 console, Spot Requests page showing two fleets created by Karpenter]

Delete the experimental environment

Execute the following command to delete the experimental environment to avoid extra charges.

helm uninstall karpenter --namespace karpenter
eksctl delete iamserviceaccount --cluster ${CLUSTER_NAME} --name karpenter --namespace karpenter
aws cloudformation delete-stack --stack-name Karpenter-${CLUSTER_NAME}
aws ec2 describe-launch-templates \
    | jq -r ".LaunchTemplates[].LaunchTemplateName" \
    | grep -i Karpenter-${CLUSTER_NAME} \
    | xargs -I{} aws ec2 delete-launch-template --launch-template-name {}
eksctl delete cluster --name ${CLUSTER_NAME}


Summary

As a new Kubernetes auto scaling tool, Karpenter is faster and more flexible, and it supports large-scale Kubernetes clusters better. At the same time, it greatly reduces operations workload, making auto scaling more hands-off.

Author of this article


Xia

Amazon Cloud Technology Solutions Architect

He currently focuses on containerization solutions. Before joining Amazon Cloud Technology, he worked at HP, IBM, and other technology companies on data center infrastructure, and has more than ten years of technical service experience.

