Hi everyone, this is Zhang Jintao.
A member of my group asked me the question in the picture above: how do you measure how long a rolling upgrade takes?
This can be abstracted into a general requirement that fits many scenarios:
- For example, as the administrator of a Kubernetes cluster, you want to measure how long each step of a process takes so that you can find points to optimize;
- For example, in CI/CD, you want to report how long the pipeline takes by measuring each step of the process.
The existing approach
Kubernetes already provides a convenient way to solve this, which is what I mentioned in my reply: you can measure it through events.
For example, let's create a Deployment in K8s and look at the events generated along the way:
➜ ~ kubectl create ns moelove
namespace/moelove created
➜ ~ kubectl -n moelove create deployment redis --image=ghcr.io/moelove/redis:alpine
deployment.apps/redis created
➜ ~ kubectl -n moelove get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
redis 1/1 1 1 16s
➜ ~ kubectl -n moelove get events
LAST SEEN TYPE REASON OBJECT MESSAGE
27s Normal Scheduled pod/redis-687967dbc5-gsz5n Successfully assigned moelove/redis-687967dbc5-gsz5n to kind-control-plane
27s Normal Pulled pod/redis-687967dbc5-gsz5n Container image "ghcr.io/moelove/redis:alpine" already present on machine
27s Normal Created pod/redis-687967dbc5-gsz5n Created container redis
27s Normal Started pod/redis-687967dbc5-gsz5n Started container redis
27s Normal SuccessfulCreate replicaset/redis-687967dbc5 Created pod: redis-687967dbc5-gsz5n
27s Normal ScalingReplicaSet deployment/redis Scaled up replica set redis-687967dbc5 to 1
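The event timestamps themselves are already enough to compute rough durations; for example, you can sort the events by time and read off the gaps between steps. A small sketch (the column selection is just my own choice, nothing specific to this article):
kubectl -n moelove get events \
  --sort-by=.lastTimestamp \
  -o custom-columns=TIME:.lastTimestamp,TYPE:.type,REASON:.reason,OBJECT:.involvedObject.name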
As you can see, the events we care about have all been recorded. But it would be a waste of time to check them with kubectl by hand every time.
One approach I used before was to write a program that continuously watches and collects events in the K8s cluster and writes them to a system I developed myself for storage and visualization. But that requires extra development work and is not very portable. Here I will introduce a better solution.
A more elegant solution
Each of these K8s events corresponds to one of our operations. For example, creating the Deployment above generated several events, including Scheduled, Pulled, Created, and so on. If we abstract this a little, isn't it quite similar to distributed tracing?
Here we will use Jaeger, a CNCF graduated project that I have introduced many times in previous issues of my K8S Ecological Weekly. Jaeger is an open source, end-to-end distributed tracing system. Introducing it is not the focus of this article, so just follow its documentation to quickly deploy a Jaeger instance. We will also use OpenTelemetry, a CNCF sandbox project that provides an observability framework for cloud native software, in combination with Jaeger. Again, since these two projects are not the focus of this article, I will not go into detail about them here.
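If you just want a throwaway setup to follow along with, one quick way is to run the Jaeger all-in-one image. A minimal sketch (the image tag and the port-forward are my own choices; the listing later in this article actually runs a separate otel-collector in front of Jaeger):
# Jaeger all-in-one: UI on 16686, gRPC collector on 14250
kubectl create deployment jaeger --image=jaegertracing/all-in-one:1.22
kubectl expose deployment jaeger --name=jaeger-collector --port=14250 --target-port=14250
# Open the Jaeger UI locally at http://localhost:16686
kubectl port-forward deploy/jaeger 16686:16686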
Next comes the main project used in this article: kspan, an open source project from Weaveworks. Its basic idea is to organize K8s events as spans in a tracing system.
Deploy kspan
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kspan
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: null
  name: kspan-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kspan
  namespace: default
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: kspan
  name: kspan
spec:
  containers:
  - image: docker.io/weaveworks/kspan:v0.0
    name: kspan
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  serviceAccountName: kspan
You can use the YAML above directly for a deployment test, but note that it should not be used in a production environment: the RBAC permissions need to be tightened.
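For example, instead of binding cluster-admin, you could create and bind a read-only role. A minimal sketch, assuming kspan only needs to watch events and the workload objects they point at (the exact resource list is an assumption; check the kspan documentation):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kspan-reader
rules:
# Read-only access to events and the workload objects they reference
- apiGroups: ["", "apps"]
  resources: ["events", "pods", "replicasets", "deployments", "daemonsets", "statefulsets"]
  verbs: ["get", "list", "watch"]
You would then point the ClusterRoleBinding above at kspan-reader instead of cluster-admin.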
By default, kspan sends spans to otlp-collector.default:55680, so you need to make sure this Service exists (a minimal sketch of such a collector setup follows the listing below). After everything above is deployed, it will look roughly like this:
➜ ~ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/jaeger-76c84457fb-89s5v 1/1 Running 0 64m
pod/kspan 1/1 Running 0 35m
pod/otel-agent-sqlk6 1/1 Running 0 59m
pod/otel-collector-69985cc444-bjb92 1/1 Running 0 56m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jaeger-collector ClusterIP 10.96.47.12 <none> 14250/TCP 60m
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 39h
service/otel-collector ClusterIP 10.96.231.43 <none> 4317/TCP,14250/TCP,14268/TCP,9411/TCP,8888/TCP 59m
service/otlp-collector ClusterIP 10.96.79.181 <none> 55680/TCP 52m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/otel-agent 1 1 1 1 1 <none> 59m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jaeger 1/1 1 1 73m
deployment.apps/otel-collector 1/1 1 1 59m
NAME DESIRED CURRENT READY AGE
replicaset.apps/jaeger-6f77c67c44 0 0 0 73m
replicaset.apps/jaeger-76c84457fb 1 1 1 64m
replicaset.apps/otel-collector-69985cc444 1 1 1 59m
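For reference, the otlp-collector Service only needs to route port 55680 to an OpenTelemetry Collector whose pipeline exports to Jaeger. A minimal sketch, assuming the collector pods carry an app: otel-collector label (the selector and the exact config field names are assumptions, and field names differ between collector versions):
apiVersion: v1
kind: Service
metadata:
  name: otlp-collector
  namespace: default
spec:
  selector:
    app: otel-collector   # assumed label on the collector pods
  ports:
  - name: otlp-grpc
    port: 55680
    targetPort: 55680
The collector pipeline itself would look roughly like this:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:55680
exporters:
  jaeger:
    endpoint: jaeger-collector:14250
    insecure: true   # plain gRPC inside the cluster; newer collector versions use a tls block instead
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger]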
Hands-on practice
Here we first create a namespace for testing:
➜ ~ kubectl create ns moelove
namespace/moelove created
Then create a Deployment:
➜ ~ kubectl -n moelove create deployment redis --image=ghcr.io/moelove/redis:alpine
deployment.apps/redis created
➜ ~ kubectl -n moelove get pods
NAME READY STATUS RESTARTS AGE
redis-687967dbc5-xj2zs 1/1 Running 0 10s
Check it out in Jaeger:
(screenshot: the trace for this Deployment, expanded to show its details, in the Jaeger UI)
As you can see, the events related to this Deployment are grouped together, and details such as how long each step took can be read directly off the timeline.
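Coming back to the original question: a rolling upgrade is measured the same way. Trigger one and the rollout's events will show up as spans on a new trace. A small sketch (the alpine3.13 tag is only a hypothetical new image):
# Trigger a rolling upgrade by changing the container image
kubectl -n moelove set image deployment/redis redis=ghcr.io/moelove/redis:alpine3.13
# Wait for the rollout to complete, then inspect the new trace in Jaeger
kubectl -n moelove rollout status deployment/redis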
To sum up
This article introduced how to combine tracing (with Jaeger) and K8s events so that you can better understand where time is spent across all the events in a K8s cluster, making it easier to find optimization points and to measure the results.
Welcome to follow my WeChat public account 【MoeLove】.