Background
As our business grows, more and more Kafka clusters are needed, which poses a real challenge for deployment and management. We wanted to take advantage of K8s's rapid deployment and elastic scaling capabilities to reduce the daily operational burden, so we investigated the feasibility of running Kafka on K8s.
A Kafka cluster involves many components, all of them stateful, so the industry-standard approach is a custom operator. Several related repositories exist on GitHub; weighing community activity and adoption, we chose Strimzi (https://github.com/strimzi/strimzi-kafka-operator).
(Figure: Kafka component interaction diagram)
Plan
- Deploy Strimzi on an Alibaba Cloud K8s cluster
- Since the Kafka used in our group is developed from the open-source version, we maintain a custom Strimzi Kafka image
- Strimzi manages the Kafka cluster, including Kafka, ZooKeeper, and kafka-exporter
- Use zoo-entrance to proxy the cluster's ZooKeeper (GitHub: https://github.com/scholzj/zoo-entrance)
- Deploy Prometheus and collect Kafka and ZooKeeper metrics
- Open Service ports to expose Kafka and ZooKeeper outside the K8s cluster
Hands-on walkthrough
Build a custom Kafka image
- Pull the latest strimzi-kafka-operator code from the company's Git (it differs only slightly from the open-source version; for experiments you can use the open-source version directly)
- The docker-images folder contains a Makefile. Running the docker_build target executes the build.sh script; this step pulls the Kafka installation package from the official website, and here we swap in our internal package instead.
- The built image is local, so we then push it to the company's internal Harbor server. A condensed sketch of the flow is below.
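A minimal sketch of the build-and-push flow (the registry, org, and tag values are illustrative; the docker_push target and DOCKER_* variables follow the conventions of Strimzi's build Makefiles):

export DOCKER_REGISTRY=repository.poizon.com DOCKER_ORG=kafka-operator DOCKER_TAG=2.8.4   # illustrative values
cd docker-images
make docker_build   # runs build.sh, which downloads the Kafka tarball; swap in the internal package here
make docker_push    # pushes the built images to the registry configured above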
Deploy the operator
Only one operator needs to be deployed per K8s cluster.
- Prerequisite: a healthy K8s cluster
- Create the namespace (skip this if it already exists; kafka is used by default): kubectl create namespace kafka
- Pull the latest code from the company's Git (the address is given above)
- By default the manifests watch the namespace named kafka. To change it, run sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml, replacing kafka with the target namespace
- Then apply all the files: kubectl apply -f install/cluster-operator/ -n kafka
- After a short while you can view the created custom resources and the operator: kubectl get pods -n kafka
- You can also check these resources and the running operator from the Alibaba Cloud K8s console; the whole installation is summarized in the sketch below
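Put together, the operator installation looks like this (assuming the default kafka namespace):

kubectl create namespace kafka
# point the RBAC manifests at the namespace the operator should watch
sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml
kubectl apply -f install/cluster-operator/ -n kafka
# watch until the strimzi-cluster-operator pod reaches Running
kubectl get pods -n kafka -w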
Deploy the Kafka cluster
Make sure the operator has been deployed successfully, and that the namespace Kafka is deployed into is watched by the operator above.
- Again from the latest code directory, the files required for this deployment are under the examples/kafka directory.
- Deploy Kafka and ZooKeeper
- Check kafka-persistent.yaml. This is the core file: it deploys Kafka, ZooKeeper, and kafka-exporter. Part of its content is shown below:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.8.1
    replicas: 3
    resources:
      requests:
        memory: 16Gi
        cpu: 4000m
      limits:
        memory: 16Gi
        cpu: 4000m
    image: repository.poizon.com/kafka-operator/poizon/kafka:2.8.4
    jvmOptions:
      -Xms: 3072m
      -Xmx: 3072m
    listeners:
      - name: external
        port: 9092
        type: nodeport
        tls: false
      - name: plain
        port: 9093
        type: internal
        tls: false
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 2
      transaction.state.log.min.isr: 1
      default.replication.factor: 2
      # ...
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/name
                      operator: In
                      values:
                        - my-cluster-kafka
                topologyKey: "kubernetes.io/hostname"
    storage:
      type: persistent-claim
      size: 100Gi
      class: rocketmq-storage
      deleteClaim: false
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
  zookeeper:
    replicas: 3
    resources:
      requests:
        memory: 3Gi
        cpu: 1000m
      limits:
        memory: 3Gi
        cpu: 1000m
    jvmOptions:
      -Xms: 2048m
      -Xmx: 2048m
    jmxOptions: {}
    template:
      pod:
        affinity:
          podAntiAffinity:
            # ...
    storage:
      type: persistent-claim
      size: 50Gi
      class: rocketmq-storage
      deleteClaim: false
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: zookeeper-metrics-config.yml
  # ...
# ...
- The name of the Kafka cluster can be changed via metadata.name, currently my-cluster by default
- The number of Kafka pods, i.e. the number of nodes, can be changed via replicas (default 3)
- Pod memory and CPU can be adjusted under resources
- The heap size the Kafka JVM starts with can be changed under jvmOptions
- Kafka configuration can be changed in the config section
- Disk type and size can be changed under storage: class selects the storage type; currently high-efficiency cloud disk, SSD, and ESSD are available
- ZooKeeper is modified much like Kafka, with similar knobs, in the same file
- At the bottom of the file are the metrics Kafka and ZooKeeper expose, which can be added, removed, or modified as needed
- After modifying the configuration, run kubectl apply -f kafka-persistent.yaml -n kafka to create the cluster; a quick verification sketch follows
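Once applied, you can block until the operator reports the cluster Ready (the cluster name matches the my-cluster example above):

# wait until the Kafka custom resource reaches the Ready condition
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
kubectl get pods -n kafka   # kafka, zookeeper, and entity-operator pods should be Running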
- Deploy the ZooKeeper proxy (zoo-entrance)
- Strimzi does not officially support external components accessing ZooKeeper directly, so access goes through a proxy
- For security reasons, this is intentional on the project's part: https://github.com/strimzi/strimzi-kafka-operator/issues/1337
- After deploying the zk proxy, we create a LoadBalancer Service on the K8s console to expose the proxy to applications outside the cluster. Concretely: K8s console --> Network --> Services --> Create (choose LoadBalancer, then pick the zoo-entrance application). An equivalent manifest sketch follows.
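The same Service can also be created from a manifest. A minimal sketch, assuming the zoo-entrance Deployment labels its pods app: zoo-entrance and listens on the standard client port 2181:

kubectl apply -n kafka -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: zoo-entrance-lb
spec:
  type: LoadBalancer
  ports:
    - port: 2181
      targetPort: 2181
      protocol: TCP
  selector:
    app: zoo-entrance   # assumed pod label, per the upstream example
EOF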
- Deploy zk-exporter
- There is no zk-exporter in the official operator; we use https://github.com/dabealu/zookeeper-exporter
- In the zk-exporter.yaml file in the folder, we only need to change the address of the monitored ZooKeeper (the container args in the pod spec)
- Run kubectl apply -f zk-exporter.yaml to deploy it; a quick check follows
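As a quick smoke test (a sketch: the Deployment name and the 9141 listen port are assumptions to be checked against the upstream zk-exporter.yaml):

kubectl -n kafka port-forward deploy/zk-exporter 9141:9141 &
sleep 2
curl -s localhost:9141/metrics | head   # ZooKeeper metrics should be listed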
- Deploy the Kafka JMX Service
- Ingress does not support TCP connections, and a LoadBalancer costs too much, so Kafka's JMX port is exposed to the outside with a NodePort
- You can create the corresponding NodePort Service on the Alibaba Cloud console, or create it from the kafka-jmx.yaml file:
apiVersion: v1
kind: Service
metadata:
  labels:
    strimzi.io/cluster: my-cluster
    strimzi.io/name: my-cluster-kafka-jmx
  name: my-cluster-kafka-jmx-0
spec:
  ports:
    - name: kafka-jmx-nodeport
      port: 9999
      protocol: TCP
      targetPort: 9999
  selector:
    statefulset.kubernetes.io/pod-name: my-cluster-kafka-0
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka
  type: NodePort
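Note that the selector above pins the Service to the single pod my-cluster-kafka-0, so each broker needs its own Service. A sketch that stamps one out per broker of the 3-replica cluster above:

for i in 0 1 2; do
kubectl apply -n kafka -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    strimzi.io/cluster: my-cluster
    strimzi.io/name: my-cluster-kafka-jmx
  name: my-cluster-kafka-jmx-$i
spec:
  ports:
    - name: kafka-jmx-nodeport
      port: 9999
      protocol: TCP
      targetPort: 9999
  selector:
    statefulset.kubernetes.io/pod-name: my-cluster-kafka-$i
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka
  type: NodePort
EOF
done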
- Deploy the kafka-exporter Service
- We enabled the exporter when deploying Kafka earlier, but enabling it does not automatically generate a corresponding Service. To make the Prometheus hookup more convenient, we deploy one ourselves
- It is defined in the kafka-exporter-service.yaml file in the folder:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafka-export-service
  name: my-cluster-kafka-exporter-service
spec:
  ports:
    - port: 9404
      protocol: TCP
      targetPort: 9404
  selector:
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka-exporter
  type: ClusterIP
- Run kubectl apply -f kafka-exporter-service.yaml to complete the deployment; a quick check follows
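A quick check that the exporter is reachable through the new Service:

kubectl -n kafka port-forward svc/my-cluster-kafka-exporter-service 9404:9404 &
sleep 2
curl -s localhost:9404/metrics | head   # kafka-exporter metrics should be listed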
- Deploy kafka-prometheus
- If Prometheus were deployed outside the K8s cluster, collecting the data would be more troublesome, so we deploy Prometheus directly into the cluster
- In the kafka-prometheus.yaml file in the folder, you can selectively adjust the Prometheus configuration, such as the memory and CPU sizes, the monitoring-data retention time, the size of the backing cloud disk, and the Kafka and ZooKeeper addresses to monitor:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka-prometheus
  labels:
    app: kafka-prometheus
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kafka-prometheus
  serviceName: kafka-prometheus
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: kafka-prometheus
    spec:
      containers:
        - args:
            - '--query.max-concurrency=800'
            - '--query.max-samples=800000000'
            # ...
          command:
            - /bin/prometheus
          image: 'repository.poizon.com/prometheus/prometheus:v2.28.1'
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 10
            httpGet:
              path: /status
              port: web
              scheme: HTTP
            initialDelaySeconds: 300
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 3
          name: kafka-prometheus
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 128Mi
          volumeMounts:
            - mountPath: /etc/localtime
              name: volume-localtime
            - mountPath: /data/prometheus/
              name: kafka-prometheus-config
            - mountPath: /data/database/prometheus
              name: kafka-prometheus-db
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      terminationGracePeriodSeconds: 30
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 0
      volumes:
        - hostPath:
            path: /etc/localtime
            type: ''
          name: volume-localtime
        - configMap:
            defaultMode: 420
            name: kafka-prometheus-config
          name: kafka-prometheus-config
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: kafka-prometheus-db
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        storageClassName: rocketmq-storage
        volumeMode: Filesystem
      status:
        phase: Pending
- Run kubectl apply -f kafka-prometheus.yaml to deploy; a sketch of the scrape configuration follows
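For reference, the Kafka and ZooKeeper addresses to monitor live in the kafka-prometheus-config ConfigMap that the StatefulSet mounts under /data/prometheus/. A minimal sketch of the scrape configuration (the prometheus.yml key name is an assumption, since the --config.file argument is elided above; the zookeeper-exporter target likewise assumes the Service name and port of the zk-exporter deployed earlier):

kubectl apply -n kafka -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
    scrape_configs:
      - job_name: kafka-exporter
        static_configs:
          - targets: ['my-cluster-kafka-exporter-service:9404']
      - job_name: zookeeper-exporter
        static_configs:
          - targets: ['zk-exporter:9141']
EOF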
- After the deployment completes, expose Prometheus to the monitoring team's Grafana. You can connect directly to the pod IP to verify it works, then create an Ingress in the K8s console (Network --> Routing --> Create), select the Prometheus Service just deployed, and ask operations to apply for a domain name. An equivalent manifest sketch follows.
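The console steps can also be expressed as a manifest. A sketch, assuming a Service named kafka-prometheus (the serviceName the StatefulSet references) fronts Prometheus on its default port 9090, with a placeholder host:

kubectl apply -n kafka -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kafka-prometheus
spec:
  rules:
    - host: kafka-prometheus.example.com   # replace with the domain applied for
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kafka-prometheus     # assumed Service fronting the StatefulSet
                port:
                  number: 9090
EOF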
Summary
- Advantages
- Rapid cluster deployment (minutes), rapid scaling (seconds), and rapid disaster recovery (seconds)
- Supports rolling updates, and backup and restore
- Disadvantages
- Introduces more components, which increases complexity
- Access from outside the K8s cluster is not very friendly
Text: ZUOQI