As Kubernetes (K8s) has matured, more and more organizations have begun building their infrastructure layer on K8s at scale.

According to a Sysdig research report on container orchestration, K8s holds a market share of up to 75%. Among K8s deployment and operations tools, Operator and Helm are the two most mainstream. However, because Helm lacks life-cycle management, Operator has become the preferred choice for managing an application's whole life cycle.

A K8s Operator is an application-specific controller. It continuously watches change events on K8s resource objects, monitors and responds to them throughout the application's life cycle, and completes deployment and delivery with high reliability. Operator provides a framework: in essence, it captures operations and maintenance experience as code, making operations codified, automated, and intelligent.
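
The control-loop idea behind an Operator can be sketched in a few lines of shell. This only illustrates the reconcile pattern (compare desired state with observed state, then act); it is not how EMQX Operator is implemented, and the function name `reconcile_replicas` and the printed actions are hypothetical.

```shell
# Illustrative reconcile step: compare the desired replica count with the
# observed one and print the action a controller would take.
# Hypothetical helper, not part of EMQX Operator.
reconcile_replicas() {
  local desired=$1 actual=$2
  if [ "$actual" -lt "$desired" ]; then
    echo "scale-up $((desired - actual))"
  elif [ "$actual" -gt "$desired" ]; then
    echo "scale-down $((actual - desired))"
  else
    echo "in-sync"
  fi
}

# A real controller re-runs this step on every change event.
reconcile_replicas 5 3   # prints: scale-up 2
reconcile_replicas 5 5   # prints: in-sync
```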

To manage the whole life cycle of EMQX, the cloud-native distributed MQTT message server, EMQX Operator came into being. With EMQX Operator, you can build an MQTT cluster with millions of connections even in a K8s environment with complex networking and storage. This article uses EMQX Operator to build a million-connection MQTT service on K8s and verifies the result through testing.

What is EMQX Operator

EMQX is a cloud-native distributed MQTT message server built on the Erlang/OTP platform. As cloud-native concepts have taken hold and K8s and Operators have become widespread, we developed EMQX Operator (https://github.com/emqx/emqx-operator), which can quickly create and manage EMQX clusters in a Kubernetes environment, manage the EMQX life cycle, and greatly simplify the process of deploying and managing EMQX clusters. Its main advantages are:

  • Reduces the cost of deploying EMQX in a K8s environment
  • Provides basic capabilities for persistent data backup and recovery
  • Provides the ability to independently deploy, manage, and configure persistence for EMQX Plugins (coming soon)
  • Dynamically updates the License, SSL, etc.
  • Automates operation and maintenance (high availability, capacity expansion, exception handling)

Use EMQX Operator to build a million-level MQTT cluster based on K8s

Kernel parameter tuning

In order to maximize the performance of EMQX, we adjusted the kernel parameters on the worker nodes.

#!/bin/bash
# Run as root on every worker node.
echo "DefaultLimitNOFILE=100000000" >> /etc/systemd/system.conf
echo "session required pam_limits.so" >> /etc/pam.d/common-session
echo "*      soft    nofile      10000000"  >> /etc/security/limits.conf
echo "*      hard    nofile      100000000"  >> /etc/security/limits.conf

# lsmod |grep -q conntrack || modprobe ip_conntrack

cat >> /etc/sysctl.d/99-sysctl.conf <<EOF
net.ipv4.tcp_tw_reuse=1
fs.nr_open=1000000000
fs.file-max=1000000000
net.ipv4.ip_local_port_range=1025 65534
net.ipv4.udp_mem=74583000 499445000 749166000

net.core.somaxconn=32768
net.ipv4.tcp_max_syn_backlog=163840
net.core.netdev_max_backlog=163840

net.core.optmem_max=16777216
net.ipv4.tcp_rmem=1024 4096 16777216
net.ipv4.tcp_wmem=1024 4096 16777216
net.ipv4.tcp_max_tw_buckets=1048576
net.ipv4.tcp_fin_timeout=15
net.core.rmem_default=262144000
net.core.wmem_default=262144000
net.core.rmem_max=262144000
net.core.wmem_max=262144000
net.ipv4.tcp_mem=378150000  504200000  756300000

# net.netfilter.nf_conntrack_max=1000000
# net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
EOF

sysctl -p /etc/sysctl.d/99-sysctl.conf
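
A misspelled key in a sysctl file is easy to overlook. The following is a minimal sketch for checking, before applying a file, that every key in it exists under `/proc/sys`. The helper names `sysctl_key_to_path` and `check_sysctl_file` are hypothetical, and the sketch assumes plain `key=value` lines and keys whose components contain no dots of their own.

```shell
# Map a sysctl key such as net.core.somaxconn to its /proc/sys path.
sysctl_key_to_path() {
  echo "/proc/sys/$(echo "$1" | tr '.' '/')"
}

# Print every key in the given file that has no matching /proc/sys entry.
check_sysctl_file() {
  local line key
  while IFS= read -r line; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    # Keep the text before the first '=' and strip spaces and tabs.
    key=$(echo "${line%%=*}" | tr -d ' \t')
    [ -e "$(sysctl_key_to_path "$key")" ] || echo "unknown key: $key"
  done < "$1"
}
```

Running `check_sysctl_file /etc/sysctl.d/99-sysctl.conf` on a worker node prints nothing when all keys are recognized by the running kernel.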

Deployment installation

Deploy 5 EMQX Pods

  1. Install Cert Manager dependencies

    $ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml
  2. Install EMQX Operator

    $ helm repo add emqx https://repos.emqx.io/charts
    $ helm repo update
    $ helm install emqx-operator emqx/emqx-operator \
       --set installCRDs=true \
       --namespace emqx-operator-system \
       --create-namespace
  3. Check EMQX Operator Controller Status

    $ kubectl get pods -l "control-plane=controller-manager" -n emqx-operator-system
    NAME                                                READY   STATUS    RESTARTS   AGE
    emqx-operator-controller-manager-68b866c8bf-kd4g6   1/1     Running   0          15s
  4. Deploy EMQX

    cat << "EOF" | kubectl apply -f -
    apiVersion: apps.emqx.io/v1beta2
    kind: EmqxEnterprise
    metadata:
      name: emqx-ee
      labels:
        cluster: emqx
    spec:
      replicas: 5
      image: emqx/emqx-ee:4.4.1
      resources:
        requests:
          cpu: "4"
          memory: 4Gi
        limits:
          cpu: "20"
          memory: 20Gi
      env:
        - name: "EMQX_NODE__DIST_BUFFER_SIZE"
          value: "16MB"
        - name: "EMQX_NODE__PROCESS_LIMIT"
          value: "2097152"
        - name: "EMQX_NODE__MAX_PORTS"
          value: "1048576"
        - name: "EMQX_LISTENER__TCP__EXTERNAL__ACCEPTORS"
          value: "64"
        - name: "EMQX_LISTENER__TCP__EXTERNAL__BACKLOG"
          value: "1024000"
        - name: "EMQX_LISTENER__TCP__EXTERNAL__MAX_CONNECTIONS"
          value: "1024000"
        - name: "EMQX_LISTENER__TCP__EXTERNAL__MAX_CONN_RATE"
          value: "100000"
      emqxTemplate:
        license: "your license string"
        listener:
          type: LoadBalancer
          annotations:
            service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: "intranet"
            service.beta.kubernetes.io/alibaba-cloud-loadbalancer-spec: "slb.s3.large"
    EOF
  5. View EMQX Deployment Status

    $ kubectl get pods
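
Rather than polling `kubectl get pods` by hand, the readiness check from step 3 can be scripted with `kubectl wait`. This is a minimal sketch that reuses the label and namespace from step 3; the function name `wait_for_operator` and the default timeout of 120 seconds are assumptions.

```shell
# Block until the operator controller Pod reports Ready, or time out.
# Label and namespace come from step 3; the default timeout is arbitrary.
wait_for_operator() {
  local timeout=${1:-120s}
  kubectl wait pod \
    --for=condition=Ready \
    -l "control-plane=controller-manager" \
    -n emqx-operator-system \
    --timeout="$timeout"
}
```

Calling `wait_for_operator 300s` before applying the EmqxEnterprise resource in step 4 avoids submitting it while the controller is still starting.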

The EMQX cluster consists of 5 Pods, and the resource requests and limits of each Pod are as follows:

$ kubectl get emqx-ee emqx-ee -o json | jq ".spec.replicas"
5
$ kubectl get emqx-ee emqx-ee -o json | jq ".spec.resources"
{
  "limits": {
    "cpu": "20",
    "memory": "20Gi"
  },
  "requests": {
    "cpu": "4",
    "memory": "4Gi"
  }
}
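
As a quick sanity check on this sizing, each Pod's share of the target connections can be compared with the listener limit set in the manifest. This is a sketch that assumes connections are spread evenly across Pods, which a load balancer will only approximate; the variable names are illustrative.

```shell
target_connections=1000000      # connection target of this article
replicas=5                      # .spec.replicas shown above
max_conn_per_listener=1024000   # EMQX_LISTENER__TCP__EXTERNAL__MAX_CONNECTIONS

# Ceiling division: connections each Pod must hold under an even spread.
per_pod=$(( (target_connections + replicas - 1) / replicas ))
echo "connections per Pod: $per_pod"

if [ "$per_pod" -le "$max_conn_per_listener" ]; then
  echo "within listener limit"
else
  echo "exceeds listener limit"
fi
```

With these numbers each Pod holds 200000 connections, well below the per-listener limit, which leaves headroom if the spread is uneven or a Pod is lost.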

Build result verification

Test environment

This test uses Alibaba Cloud's ACK proprietary service with Flannel as the network plug-in. Three ecs.c7.2xlarge instances running CentOS 7.9 serve as master nodes, and five ecs.c7.16xlarge instances running CentOS 7.9 serve as worker nodes. The test tool is XMeter, and the load-generating machines and the ACK load balancer are in the same VPC network.

Test scenarios

  1. 1 million clients connect to EMQX using the MQTT 5.0 protocol
  2. 500k clients publish and 500k clients subscribe
  3. Each publishing client publishes one QoS 1 message with a 1 KB payload per second
  4. Each corresponding subscribing client consumes one message per second
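
The scenario above implies the following aggregate inbound load. This is a back-of-the-envelope sketch that assumes the payload is exactly 1024 bytes and counts payload only, ignoring MQTT headers and TCP/IP overhead; the variable names are illustrative.

```shell
publishers=500000        # publishing clients
msgs_per_pub_per_sec=1   # each publishes one message per second
payload_bytes=1024       # assumption: payload is exactly 1024 bytes

msgs_in_per_sec=$(( publishers * msgs_per_pub_per_sec ))
bytes_in_per_sec=$(( msgs_in_per_sec * payload_bytes ))
mib_in_per_sec=$(( bytes_in_per_sec / 1024 / 1024 ))   # integer division

echo "inbound messages/s: $msgs_in_per_sec"
echo "inbound payload:    ${mib_in_per_sec} MiB/s"
```

Because each subscribing client consumes one message per second, the outbound side carries the same message rate and payload volume.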

Test results

As the EMQX Enterprise Dashboard monitoring shows, the cluster under test has 5 worker nodes in total, the cluster actually reaches 1M connections and 500k subscriptions, and both message inflow (publishing) and message outflow (consumption) reach 500,000 messages per second:

The resource consumption of EMQX Enterprise Pods during message throughput is as follows:

The XMeter test tool results are detailed as follows:

Epilogue

The verification above shows that an EMQX cluster deployed on K8s with EMQX Operator can comfortably handle a million MQTT connections. As K8s grows in popularity, more users will choose Operators to deploy and operate cloud-native applications on K8s. EMQ will continue to optimize EMQX Operator to help users simplify the deployment and management of EMQX clusters, fully enjoy the convenience of the cloud in the cloud-native era, and explore the powerful capabilities that EMQX brings to real-time IoT data movement, processing, and integration.

Copyright statement: This article is an original work by EMQ. Please credit the source when reprinting.

Original link: https://www.emqx.com/zh/blog/building-a-million-connection-mqtt-service-on-k8s

