
Background

As our business grows, we need more and more Kafka clusters, which poses great challenges for deployment and management. We hoped to take advantage of the rapid deployment and elastic scaling capabilities of K8S to reduce our day-to-day operational burden, so we investigated the feasibility of running Kafka on K8S.

A Kafka cluster involves many components, all of them stateful, and the common industry solution is a custom operator. There are several related repositories on GitHub; weighing community activity against adoption, we chose Strimzi (GitHub address).

(Figure: Kafka component interaction diagram)

Plan

  1. Deploy Strimzi on an Alibaba Cloud K8S cluster
  2. Since the Kafka used in our group is developed from the open-source version, a custom Strimzi-Kafka image needs to be maintained
  3. Strimzi manages the Kafka cluster, which includes kafka, zk, and kafka-exporter
  4. Use zoo-entrance to proxy the zk in the cluster (GitHub address)
  5. Deploy Prometheus and collect the metrics of kafka and zk
  6. Open service ports to expose kafka and zk outside the K8S cluster

Hands-on process

Build a custom kafka image

  • Pull the latest strimzi-kafka-operator code from the company's Git (it differs only slightly from the open-source version; you can use the open-source version directly for experiments)
  • In the docker-images folder there is a Makefile; running its docker_build target executes the build.sh script. This step pulls the Kafka installation package from the official website, and it is here that we swap in our internal installation package.
  • After the build, the image only exists locally; we need to push it to the company's internal Harbor registry
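A minimal sketch of this flow, assuming the open-source repository layout; the local image tag is an assumption, and the Harbor path matches the image referenced later in kafka-persistent.yaml:

# clone the operator source (use the open-source repo for experiments)
git clone https://github.com/strimzi/strimzi-kafka-operator.git
cd strimzi-kafka-operator/docker-images

# build the images; build.sh downloads the Kafka tarball,
# which is the step where the internal package is substituted
make docker_build

# tag and push to the internal Harbor registry (tag and path are assumptions)
docker tag strimzi/kafka:latest repository.poizon.com/kafka-operator/poizon/kafka:2.8.4
docker push repository.poizon.com/kafka-operator/poizon/kafka:2.8.4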

Deploy the operator

Only one operator needs to be deployed per K8S cluster

  • The only prerequisite: a healthy K8S cluster
  • Create the namespace (skip if it already exists; kafka is used by default): kubectl create namespace kafka
  • Pull the latest code from the company's Git (the address is given above)
  • By default, the installation files watch the namespace named kafka. If you need to change it, run sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml (replacing kafka with your target namespace)
  • Then apply all the files: kubectl apply -f install/cluster-operator/ -n kafka
  • After a while, you can view the created custom resources and the operator: kubectl get pods -n kafka
  • You can also check these resources and the operator's status from Alibaba Cloud's K8S console
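Putting the steps together, the whole sequence at a glance (assuming the default kafka namespace):

# create the namespace and point the RoleBindings at it
kubectl create namespace kafka
sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml

# install the CRDs, RBAC and the cluster operator, then verify
kubectl apply -f install/cluster-operator/ -n kafka
kubectl get pods -n kafka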

Deploy the kafka cluster

Make sure that your operator has been deployed successfully, and that the namespace where kafka will be deployed is watched by the operator above.

  • Again in the latest code directory, the files required for this deployment are under the examples/kafka directory.
  • Deploy kafka and zk
    • Check kafka-persistent.yaml. This is the core file: it deploys kafka, zk and kafka-exporter. Part of its contents is shown below:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.8.1
    replicas: 3
    resources:
      requests:
        memory: 16Gi
        cpu: 4000m
      limits:
        memory: 16Gi
        cpu: 4000m
    image: repository.poizon.com/kafka-operator/poizon/kafka:2.8.4
    jvmOptions:
      -Xms: 3072m
      -Xmx: 3072m
    listeners:
      - name: external
        port: 9092
        type: nodeport
        tls: false
      - name: plain
        port: 9093
        type: internal
        tls: false
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 2
      transaction.state.log.min.isr: 1
      default.replication.factor: 2
      ***
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/name
                      operator: In
                      values:
                        - my-cluster-kafka
                topologyKey: "kubernetes.io/hostname"
    storage:
      type: persistent-claim
      size: 100Gi
      class: rocketmq-storage
      deleteClaim: false
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
  zookeeper:
    replicas: 3
    resources:
      requests:
        memory: 3Gi
        cpu: 1000m
      limits:
        memory: 3Gi
        cpu: 1000m
    jvmOptions:
      -Xms: 2048m
      -Xmx: 2048m
    jmxOptions: {}
    template:
      pod:
        affinity:
          podAntiAffinity:
          ***
    storage:
      type: persistent-claim
      size: 50Gi
      class: rocketmq-storage
      deleteClaim: false
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: zookeeper-metrics-config.yml
    ***
    ***
    • The name of the kafka cluster can be modified; the name attribute under metadata (line 4 of the file) defaults to my-cluster.
    • The number of kafka pods, i.e. the number of broker nodes, can be modified; the default is 3
    • The pod resources (memory and CPU) can be modified
    • The JVM heap size that kafka starts with can be modified
    • The kafka configuration can be modified, in the config section (line 36 of the file)
    • The disk type and size can be modified; the storage type (line 50 of the file) can be changed to another storage class. Currently efficient cloud disk, SSD, and ESSD are available.
    • The zk settings are similar to kafka's, with similar modifiable fields, in the same file
    • At the bottom of the file are the metrics that kafka and zk expose, which can be added, deleted, or modified as needed
    • After modifying the configuration, run kubectl apply -f kafka-persistent.yaml -n kafka to create the cluster; a quick verification sketch follows below
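Once applied, you can watch the cluster come up, assuming the default names (the Kafka custom resource reports a Ready condition in its status once all components are running):

# the Kafka custom resource managed by the operator
kubectl get kafka -n kafka
# the pods: my-cluster-kafka-0/1/2, my-cluster-zookeeper-0/1/2, exporter, etc.
kubectl get pods -n kafka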
  • Deploy the zk proxy (zoo-entrance)
    • After deploying the zk proxy, we need to create a LoadBalancer service on the K8S console to expose the proxy to applications outside the cluster. Specific operation: K8S console --> Network --> Services --> Create (choose LoadBalancer, then select the zoo-entrance application). A YAML equivalent is sketched below.
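For reference, a minimal LoadBalancer Service sketch doing the same thing; the app: zoo-entrance label is an assumption about how the zoo-entrance pods are labeled:

apiVersion: v1
kind: Service
metadata:
  name: zoo-entrance-lb
spec:
  type: LoadBalancer
  ports:
    - port: 2181        # standard zk client port
      protocol: TCP
      targetPort: 2181
  selector:
    app: zoo-entrance   # assumed label on the zoo-entrance pods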
  • Deploy zk-exporter
    • Execute kubectl apply -f zk-exporter.yaml to deploy
  • Deploy kafka-jmx
    • Since Ingress does not support TCP connections and a LoadBalancer is too expensive, kafka's JMX is exposed externally via NodePort
    • You can create the corresponding NodePort service on the Alibaba Cloud console, or create it with the kafka-jmx.yaml file (one Service per broker pod; the file below targets my-cluster-kafka-0)
apiVersion: v1
kind: Service
metadata:
  labels:
    strimzi.io/cluster: my-cluster
    strimzi.io/name: my-cluster-kafka-jmx
  name: my-cluster-kafka-jmx-0
spec:
  ports:
    - name: kafka-jmx-nodeport
      port: 9999
      protocol: TCP
      targetPort: 9999
  selector:
    statefulset.kubernetes.io/pod-name: my-cluster-kafka-0
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka
  type: NodePort
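To find the externally reachable port after applying the file (Kubernetes assigns NodePorts from the 30000-32767 range by default):

kubectl get svc my-cluster-kafka-jmx-0 -n kafka
# connect a JMX client such as jconsole to <any-node-ip>:<assigned-node-port>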
  • Deploy kafka-exporter-service
    • When we deployed kafka above, the exporter was enabled in the configuration. However, no corresponding service is generated automatically for the exporter, so to make the Prometheus connection more convenient we deploy one ourselves
    • It is in the kafka-exporter-service.yaml file in the folder
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafka-export-service
  name: my-cluster-kafka-exporter-service
spec:
  ports:
    - port: 9404
      protocol: TCP
      targetPort: 9404
  selector:
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka-exporter
  type: ClusterIP
    • Execute kubectl apply -f kafka-exporter-service.yaml to complete the deployment
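With this Service in place, an in-cluster Prometheus can scrape the exporter with a static config along these lines (the job name is illustrative; the target uses the in-cluster DNS name of the Service in the kafka namespace):

scrape_configs:
  - job_name: 'kafka-exporter'
    static_configs:
      - targets: ['my-cluster-kafka-exporter-service.kafka.svc:9404']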
  • Deploy kafka-prometheus
    • If Prometheus were deployed outside the K8S cluster, data collection would be more troublesome, so we deploy Prometheus directly into the cluster
    • In the kafka-prometheus.yaml file in the folder, you can selectively modify the Prometheus configuration, such as the required memory and CPU size, the retention time of monitoring data, the size of the attached cloud disk, and the kafka and zk addresses to be monitored.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka-prometheus
  labels:
    app: kafka-prometheus
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kafka-prometheus
  serviceName: kafka-prometheus
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: kafka-prometheus
    spec:
      containers:
        - args:
            - '--query.max-concurrency=800'
            - '--query.max-samples=800000000'
            ***
          command:
            - /bin/prometheus
          image: 'repository.poizon.com/prometheus/prometheus:v2.28.1'
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 10
            httpGet:
              path: /status
              port: web
              scheme: HTTP
            initialDelaySeconds: 300
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 3
          name: kafka-prometheus
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 128Mi
          volumeMounts:
            - mountPath: /etc/localtime
              name: volume-localtime
            - mountPath: /data/prometheus/
              name: kafka-prometheus-config
            - mountPath: /data/database/prometheus
              name: kafka-prometheus-db
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      terminationGracePeriodSeconds: 30
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 0
      volumes:
        - hostPath:
            path: /etc/localtime
            type: ''
          name: volume-localtime
        - configMap:
            defaultMode: 420
            name: kafka-prometheus-config
          name: kafka-prometheus-config
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: kafka-prometheus-db
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        storageClassName: rocketmq-storage
        volumeMode: Filesystem
    • Execute kubectl apply -f kafka-prometheus.yaml to deploy
    • After the deployment is complete, expose Prometheus to the monitoring team's Grafana. You can connect directly to the pod IP to verify it works, then create an Ingress in the K8S console (Network --> Routing --> Create), select the Prometheus service just deployed, and ask operations to apply for a domain name.
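A minimal Ingress sketch for that last step; the host is a hypothetical domain, and the backend assumes a Service named kafka-prometheus (matching the StatefulSet's serviceName) exposing Prometheus's default web port:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kafka-prometheus-ingress
spec:
  rules:
    - host: kafka-prometheus.example.com   # hypothetical domain from operations
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kafka-prometheus     # assumed Service fronting the StatefulSet
                port:
                  number: 9090             # Prometheus default web port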

Summary

  • Advantages
    • Rapid cluster deployment (minutes), rapid cluster scaling (seconds), and rapid disaster recovery (seconds)
    • Supports rolling updates, backup, and restore
  • Disadvantages
    • Introduces more components, which increases complexity
    • Access from outside the K8S cluster is not very friendly

Text/ZUOQI

Follow Dewu Tech and be the trendiest person in tech!

