Author
Li Tengfei, container technology R&D engineer at Tencent, backend developer for Tencent Cloud TKE, and core developer of SuperEdge.
Background
In an edge cluster, the network between the edge and the cloud is one-way: the cloud cannot initiate connections to the edge. A common solution is for the edge to establish a long connection to the cloud (the tunnel server), over which the cloud forwards requests to the edge. When the cloud tunnel server is scaled out, the impact of the new instances on forwarding over the existing edge long connections has to be considered. In addition, for the sake of system stability, it must be possible to collect monitoring data from the edge through the cloud-edge tunnel.
Community ANP (apiserver-network-proxy)
Cloud server auto-scaling
ANP is mainly used to proxy and forward apiserver requests. Its architecture is shown in the following figure:
ANP's server supports only a single instance; running multiple instances causes problems. Using the multi-instance architecture diagram as a reference:
- ANP Agent needs to establish long connections with all ANP Server instances.
- Scaling out the ANP Server does not increase the number of ANP Agents that can be served, because every agent still holds a long connection to every server instance.
Node monitoring
The ANP project mainly targets the EgressSelector feature released in Kubernetes 1.16. With this feature, the apiserver first establishes a tunnel using the HTTP CONNECT method and then sends requests destined for the edge to the ANP Server through that tunnel; the ANP Server forwards them to the edge over its long connection with the ANP Agent. Prometheus, the monitoring collector commonly used in the industry, does not support the EgressSelector feature, so ANP cannot be used for node monitoring.
SuperEdge Cloud Edge Tunnel (tunnel) solution
The SuperEdge cloud-edge tunnel (tunnel) is designed to use DNS as the registry for edge nodes. The registry stores each tunnel-edge's ID together with the podIp of the tunnel-cloud pod that the tunnel-edge is connected to. When a component such as the apiserver forwards a request to the edge, it looks up the edge's ID in the registry and sends the request to the tunnel-cloud pod connected to that edge. The architecture is shown in the following diagram:
The apiserver in the figure above can be replaced by other cloud components, such as Prometheus. The following sections describe two tunnel usage scenarios in more detail: auto-scaling and node monitoring.
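The registry lookup described above can be illustrated with a minimal in-memory Go sketch. It is a hypothetical stand-in for tunnel-dns, not SuperEdge's implementation: the Registry type and its methods are invented for illustration, and the real tunnel-dns answers these lookups over DNS.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical in-memory stand-in for tunnel-dns: it maps a
// tunnel-edge ID (here, the edge node name) to the podIp of the tunnel-cloud
// pod that currently holds that edge's long connection.
type Registry struct {
	mu      sync.RWMutex
	records map[string]string
}

func NewRegistry() *Registry {
	return &Registry{records: make(map[string]string)}
}

// Register records which tunnel-cloud pod an edge node is connected to. If the
// edge reconnects to a different pod, the entry is simply overwritten.
func (r *Registry) Register(nodeName, podIP string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.records[nodeName] = podIP
}

// Resolve is the lookup a cloud component (apiserver, Prometheus) performs
// before sending a request to an edge node.
func (r *Registry) Resolve(nodeName string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ip, ok := r.records[nodeName]
	return ip, ok
}

func main() {
	reg := NewRegistry()
	reg.Register("edge-node-1", "172.31.0.10")
	reg.Register("edge-node-2", "172.31.0.11")
	// edge-node-1 reconnects to another tunnel-cloud pod, e.g. after a scale-out.
	reg.Register("edge-node-1", "172.31.0.12")

	ip, _ := reg.Resolve("edge-node-1")
	fmt.Println(ip) // 172.31.0.12
}
```

Because callers always resolve by edge ID at request time, a tunnel-edge moving to a new tunnel-cloud pod only requires updating one record.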
Auto-scaling of tunnel-cloud (HPA)
In a multi-instance scenario, tunnel has the following advantages over ANP:
- Each tunnel-edge only needs a long connection to a single tunnel-cloud instance. The apiserver determines which tunnel-cloud pod should receive a request from the mapping between tunnel-edge IDs and tunnel-cloud pods stored in tunnel-dns, and that pod forwards the request to the tunnel-edge.
- Scaling out tunnel-cloud increases the number of tunnel-edges it can support.
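The capacity difference between the two designs comes down to simple arithmetic, sketched below in Go. The edge count of 1000 and the even distribution of tunnel-edges across tunnel-cloud pods are assumptions made for illustration; the function names are hypothetical.

```go
package main

import "fmt"

// perServerConnsANP: in the multi-instance ANP layout, every agent keeps a long
// connection to every server, so each server holds one connection per agent
// no matter how many servers run (the servers argument has no effect).
func perServerConnsANP(agents, servers int) int {
	return agents
}

// perServerConnsTunnel: each tunnel-edge connects to exactly one tunnel-cloud
// pod. Assuming an even spread (an assumption of this sketch), the busiest pod
// holds ceil(agents / servers) connections. servers must be >= 1.
func perServerConnsTunnel(agents, servers int) int {
	return (agents + servers - 1) / servers
}

func main() {
	const edges = 1000 // hypothetical number of edge nodes
	for _, servers := range []int{1, 2, 4} {
		fmt.Printf("servers=%d ANP=%d tunnel=%d\n",
			servers, perServerConnsANP(edges, servers), perServerConnsTunnel(edges, servers))
	}
}
```

With these inputs, each ANP server holds 1000 connections at every scale, while the per-pod tunnel-cloud load drops to 500 and then 250 as pods are added.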
Custom auto-scaling strategy
Besides scaling on memory and CPU usage, tunnel-cloud can also scale automatically on the number of edge nodes that hold a long connection to it. The architecture diagram is as follows:
- Prometheus collects the tunnel_cloud_nodes metric from the tunnel-cloud pods
```json
{
  "__name__": "tunnel_cloud_nodes",
  "instance": "172.31.0.10:6000",
  "job": "tunnel-cloud-metrics",
  "kubernetes_namespace": "edge-system",
  "kubernetes_pod_name": "tunnel-cloud-64ff7d9c9d-4lljh"
}
```
- prometheus-adapter registers itself with the apiserver as the extension apiserver serving the Custom Metrics API
```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "namespaces/nodes_per_pod",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/nodes_per_pod",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
```
- prometheus-adapter converts the metric into a per-pod metric (nodes_per_pod)
```json
{
  "describedObject": {
    "kind": "Pod",
    "namespace": "edge-system",
    "name": "tunnel-cloud-64ff7d9c9d-vmkxh",
    "apiVersion": "/v1"
  },
  "metricName": "nodes_per_pod",
  "timestamp": "2021-07-14T10:19:37Z",
  "value": "1",
  "selector": null
}
```
- Configure an HPA that uses the custom metric
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: tunnel-cloud
  namespace: edge-system
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tunnel-cloud
  metrics:
  - type: Pods
    pods:
      metric:
        name: nodes_per_pod
      target:
        averageValue: 300 # average number of edge nodes connected per pod; exceeding it triggers a scale-out
        type: AverageValue
```
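To see how the averageValue: 300 target turns into a replica count, here is a Go sketch of the replica calculation that the Kubernetes HPA documentation describes for a Pods metric with an AverageValue target. It is a simplified restatement, not controller code; the function name and the sample per-pod values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance mirrors the HPA controller's default 10% tolerance
// (kube-controller-manager --horizontal-pod-autoscaler-tolerance=0.1).
const tolerance = 0.1

// desiredReplicas restates the documented HPA rule
//   desired = ceil(currentReplicas * currentAverage / targetAverage)
// clamped to [minReplicas, maxReplicas]. It ignores unready pods, missing
// metrics and the scale-down stabilization window applied by the real controller.
func desiredReplicas(nodesPerPod []float64, targetAvg float64, minReplicas, maxReplicas int) int {
	current := len(nodesPerPod)
	if current == 0 {
		return minReplicas
	}
	sum := 0.0
	for _, v := range nodesPerPod {
		sum += v
	}
	ratio := (sum / float64(current)) / targetAvg
	desired := current
	// Changes within the tolerance band are skipped to avoid flapping.
	if math.Abs(ratio-1.0) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// Two tunnel-cloud pods holding 420 and 380 edge connections with a target
	// of 300: the average is 400, the ratio is about 1.33, so 3 pods are wanted.
	fmt.Println(desiredReplicas([]float64{420, 380}, 300, 1, 10)) // 3
}
```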
Node monitoring solution
Node monitoring mainly collects the metrics exposed by the kubelet on edge nodes and the hardware and system metrics collected by node-exporter. When deploying Prometheus, configure the pod's DNS to point to tunnel-dns, and have Prometheus reach the kubelet and node-exporter on each edge node by node name. Tunnel-dns resolves the node name to the podIp of the tunnel-cloud pod that the node's tunnel-edge is connected to. Prometheus then sends the request to that podIp (port 10250 for kubelet metrics, port 9100 for node-exporter metrics); tunnel-cloud forwards it to the tunnel-edge over the long-connection tunnel, and the tunnel-edge issues the request to the kubelet or node-exporter. The block diagram of the entire process is as follows:
- Configure Prometheus's DNS to point to tunnel-dns
```yaml
dnsConfig:
  nameservers:
  - <ClusterIP of tunnel-dns>
  options:
  - name: ndots
    value: "5"
  searches:
  - edge-system.svc.cluster.local
  - svc.cluster.local
  - cluster.local
dnsPolicy: None
```
- Configure Prometheus to access the kubelet and node-exporter by node name
```yaml
- job_name: node-cadvisor
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10250
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /metrics/cadvisor
  - source_labels: [__address__]
    target_label: "unInstanceId"
    replacement: "none"
- job_name: node-exporter
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:9100
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /metrics
  - source_labels: [__address__]
    target_label: "unInstanceId"
    replacement: "none"
```
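The net effect of this relabeling can be summarized by a small Go sketch that builds the URL Prometheus scrapes for one node. The function name scrapeURLs and the node name edge-node-1 are hypothetical; the ports, paths and https scheme are taken from the configuration above.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// scrapeURLs reproduces, for a single node, the effect of the relabel_configs
// above: the node name becomes the host in __address__ (port 10250 for
// kubelet/cAdvisor, 9100 for node-exporter) and __metrics_path__ is set per
// job. Since Prometheus's resolver points at tunnel-dns, that host resolves to
// the tunnel-cloud pod holding the node's tunnel, not to the node itself.
func scrapeURLs(nodeName string) map[string]string {
	targets := map[string]struct{ port, path string }{
		"node-cadvisor": {"10250", "/metrics/cadvisor"},
		"node-exporter": {"9100", "/metrics"},
	}
	out := make(map[string]string, len(targets))
	for job, t := range targets {
		u := url.URL{Scheme: "https", Host: net.JoinHostPort(nodeName, t.port), Path: t.path}
		out[job] = u.String()
	}
	return out
}

func main() {
	urls := scrapeURLs("edge-node-1")
	fmt.Println(urls["node-cadvisor"]) // https://edge-node-1:10250/metrics/cadvisor
	fmt.Println(urls["node-exporter"]) // https://edge-node-1:9100/metrics
}
```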
Summary and outlook
Compared with the community ANP solution, SuperEdge's cloud-edge tunnel solution (tunnel) offers the following:
- Supports auto-scaling
- Supports Prometheus collection of node monitoring data
- Supports SSH into edge nodes
- Supports TCP forwarding
We will keep improving tunnel so that it meets the needs of more scenarios. Based on feedback from community partners, the tunnel components will support the following features in the future:
- Accessing edge services from the cloud, and cloud services from the edge
- The EgressSelector feature
Cooperation and open source
The new cloud-edge tunnel features, cloud server auto-scaling and node monitoring, have been open-sourced in SuperEdge release 0.5.0 (https://github.com/superedge/superedge/blob/main/CHANGELOG/CHANGELOG-0.5.md), and everyone is welcome to try them. We will continue to improve tunnel to handle more complex edge network scenarios. Companies, organizations and individuals interested in edge computing are welcome to join us in building the SuperEdge edge container project.
[Tencent Cloud Native] New products, new technology, new events and news from Tencent Cloud: follow the public account of the same name to get more in-depth technical content as soon as it is published.