With the rapid development of 5G, IoT, and other technologies, edge computing is increasingly used in industries and scenarios such as telecommunications, media, transportation, logistics, agriculture, and retail, and has become a key way to solve data transmission efficiency in these fields. As the form, scale, and complexity of edge computing grow, Kubernetes has quickly become a key element of edge computing, helping companies run containers at the edge, maximize resource utilization, and shorten development cycles. This article introduces in detail how Yurt-Tunnel, one of the core components of OpenYurt, extends the capabilities of the native Kubernetes system in edge scenarios.

Author|He Linbo (Xinsheng)

Background

With the rapid development of technologies such as 5G and IoT, edge computing is increasingly used in industries and scenarios such as telecommunications, media, transportation, logistics, agriculture, and retail, and has become a key way to improve data transmission efficiency in these fields. At the same time, the form, scale, and complexity of edge computing are growing day by day, while operation and maintenance methods and capabilities at the edge increasingly struggle to keep up with the pace of edge service innovation. As a result, Kubernetes has quickly become a key element of edge computing, helping companies run containers at the edge, maximize resource utilization, and shorten development cycles.

However, applying native Kubernetes directly to edge computing scenarios still leaves many problems to be solved. For example, the cloud and the edge usually sit on different network planes, and edge nodes are generally located behind a firewall. Adopting a cloud (center)-edge collaboration architecture therefore poses the following challenges to the operation, maintenance, and monitoring capabilities of the native Kubernetes system:

  • Native Kubernetes O&M capabilities are unavailable (e.g., kubectl logs/exec cannot be executed)
  • Mainstream community monitoring and O&M components cannot work (e.g., Prometheus/metrics-server)

To help enterprises solve the challenges native Kubernetes faces in edge scenarios, such as application lifecycle management, cloud-edge network connectivity, cloud-edge O&M collaboration, and heterogeneous resource support, OpenYurt, a Kubernetes-based cloud-native open source platform for edge computing, came into being. It is also an important part of the CNCF cloud-native edge computing landscape. This article introduces in detail how Yurt-Tunnel, one of the core components of OpenYurt, extends the capabilities of the native Kubernetes system in edge scenarios.

Yurt-Tunnel design ideas

Since the edge can access the cloud, we can build a cloud-edge tunnel that can be traversed in reverse, so that the cloud (center) can actively access the edge through it. We investigated a number of open source tunnel options; weighing capability and ecosystem compatibility, we finally chose to design and implement the overall Yurt-Tunnel solution on top of ANP (apiserver-network-proxy), which offers advantages such as security, non-invasiveness, extensibility, and high transmission efficiency.

[Figure 1]

Implementation

To build a secure, non-invasive, and scalable reverse-tunnel solution in the Kubernetes cloud-edge integrated architecture, the solution needs to include at least the following capabilities:

  • Cloud-edge tunnel construction
  • Self-management of certificates at both ends of the tunnel
  • Seamless diversion of cloud component requests into the tunnel

The architecture of Yurt-Tunnel is as follows:

[Figure 2: Yurt-Tunnel architecture]

3.1 Cloud-edge tunnel construction

  • When the yurt-tunnel-agent on the edge starts, it establishes a connection and registers with yurt-tunnel-server based on the configured access address, periodically checks the health of the connection, and re-establishes it when necessary.

# https://github.com/openyurtio/apiserver-network-proxy/blob/master/pkg/agent/client.go#L189
# Registration info of yurt-tunnel-agent:
"agentID": {nodeName}
"agentIdentifiers": "ipv4={nodeIP}&host={nodeName}"
  • When yurt-tunnel-server receives a request from a cloud component, it needs to forward the request to the corresponding yurt-tunnel-agent. Besides forwarding the initial request, the request session is followed by returned data or continuous data forwarding (e.g., kubectl exec), so data must be forwarded in both directions. At the same time, requests from cloud components must be forwarded concurrently, which means an independent identity must be established for each request lifecycle. Two options are generally available in the design.

Option 1: The initial cloud-edge connection only announces the forwarding request, and the tunnel-agent establishes a new connection with the cloud to process it. The new connection neatly solves both the problem of uniquely identifying each request and the problem of concurrency. However, establishing a connection for every request consumes a lot of resources.

Option 2: Only the initial cloud-edge connection is used to forward requests. To multiplex a large number of requests over the same connection, each request must be encapsulated with an independent identifier to meet the concurrent-forwarding requirement. Because a single connection is reused, connection management and request lifecycle management must be decoupled, i.e., the state transitions of request forwarding must be managed independently. This option involves packing and unpacking packets and a request-processing state machine, and is therefore more complicated.

  • The ANP component selected by OpenYurt adopts Option 2 above, which is also consistent with our original design intention.

# https://github.com/openyurtio/apiserver-network-proxy/blob/master/konnectivity-client/proto/client/client.pb.go#L98
# Data format and packet types for cloud-edge communication
type Packet struct {
  Type PacketType `protobuf:"varint,1,opt,name=type,proto3,enum=PacketType" json:"type,omitempty"`
  // Types that are valid to be assigned to Payload:
  //  *Packet_DialRequest
  //  *Packet_DialResponse
  //  *Packet_Data
  //  *Packet_CloseRequest
  //  *Packet_CloseResponse
  Payload              isPacket_Payload `protobuf_oneof:"payload"`
}

  • The construction of the request-forwarding link is encapsulated in Packet_DialRequest and Packet_DialResponse, where Packet_DialResponse.ConnectID identifies the request, much like a requestID inside the tunnel. The request and its associated data are encapsulated in Packet_Data. Packet_CloseRequest and Packet_CloseResponse are used to reclaim the resources of the forwarding link. For details, refer to the following sequence diagram:

[Figure 3: Request forwarding sequence diagram]

  • The role of the RequestInterceptor module

From the above analysis, before yurt-tunnel-server forwards a request, the requester needs to issue an HTTP CONNECT request to construct the forwarding link. However, adding such handling to open source components like Prometheus and metrics-server would be difficult. Therefore, a request-hijacking module, Interceptor, is added to yurt-tunnel-server to initiate the HTTP CONNECT request. The relevant code is as follows:

# https://github.com/openyurtio/openyurt/blob/master/pkg/yurttunnel/server/interceptor.go#L58-82
    proxyConn, err := net.Dial("unix", udsSockFile)
    if err != nil {
      return nil, fmt.Errorf("dialing proxy %q failed: %v", udsSockFile, err)
    }

    var connectHeaders string
    for _, h := range supportedHeaders {
      if v := header.Get(h); len(v) != 0 {
        connectHeaders = fmt.Sprintf("%s\r\n%s: %s", connectHeaders, h, v)
      }
    }

    fmt.Fprintf(proxyConn, "CONNECT %s HTTP/1.1\r\nHost: %s%s\r\n\r\n", addr, "127.0.0.1", connectHeaders)
    br := bufio.NewReader(proxyConn)
    res, err := http.ReadResponse(br, nil)
    if err != nil {
      proxyConn.Close()
      return nil, fmt.Errorf("reading HTTP response from CONNECT to %s via proxy %s failed: %v", addr, udsSockFile, err)
    }
    if res.StatusCode != 200 {
      proxyConn.Close()
      return nil, fmt.Errorf("proxy error from %s while dialing %s, code %d: %v", udsSockFile, addr, res.StatusCode, res.Status)
    }

3.2 Certificate Management

To ensure long-term, secure communication over the cloud-edge channel, and to support HTTPS request forwarding, yurt-tunnel needs to issue its own certificates and keep them automatically rotated. The specific implementation is as follows:


# 1. yurt-tunnel-server certificate:
# https://github.com/openyurtio/openyurt/blob/master/pkg/yurttunnel/pki/certmanager/certmanager.go#L45-90
- Certificate store path: /var/lib/yurt-tunnel-server/pki
- CommonName: "kube-apiserver-kubelet-client"  // for webhook validation by the kubelet server
- Organization: {"system:masters", "openyurt:yurttunnel"} // for webhook validation by the kubelet server and auto-approval of the yurt-tunnel-server certificate
- Subject Alternate Name values: {the ips and dns names of x-tunnel-server-svc and x-tunnel-server-internal-svc}
- KeyUsage: "any"

# 2. yurt-tunnel-agent certificate:
# https://github.com/openyurtio/openyurt/blob/master/pkg/yurttunnel/pki/certmanager/certmanager.go#L94-112
- Certificate store path: /var/lib/yurt-tunnel-agent/pki
- CommonName: "yurttunnel-agent"
- Organization: {"openyurt:yurttunnel"} // for auto-approval of the yurt-tunnel-agent certificate
- Subject Alternate Name values: {nodeName, nodeIP}
- KeyUsage: "any"

# 3. All yurt-tunnel certificate signing requests (CSRs) are approved by yurt-tunnel-server
# https://github.com/openyurtio/openyurt/blob/master/pkg/yurttunnel/pki/certmanager/csrapprover.go#L115
- Watch csr resources
- Filter out csr objects that do not belong to yurt-tunnel (Organization does not contain "openyurt:yurttunnel")
- Approve csr objects that have not yet been approved

# 4. Automatic certificate rotation
# https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/util/certificate/certificate_manager.go#L224

3.3 Seamlessly divert cloud component requests to the tunnel

Seamlessly forwarding cloud component requests to yurt-tunnel-server means no changes should be required in the cloud components themselves, so their requests must be analyzed first. At present, components issue two main types of O&M requests:

  • Type 1: requests that use the node IP as the access address, such as kubectl logs/exec
  • Type 2: requests that use the node name as the access address, such as the scrape requests of Prometheus and metrics-server

Different diversion solutions are needed for the different request types.

  • Solution 1: Use iptables dnat rules to ensure that Type 1 requests are seamlessly forwarded to yurt-tunnel-server
# Code that maintains the iptables rules: https://github.com/openyurtio/openyurt/blob/master/pkg/yurttunnel/iptables/iptables.go
# The iptables dnat rules maintained by yurt-tunnel-server are as follows:
[root@xxx /]# iptables -nv -t nat -L OUTPUT
TUNNEL-PORT  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* edge tunnel server port */

[root@xxx /]# iptables -nv -t nat -L TUNNEL-PORT
TUNNEL-PORT-10255  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:10255 /* jump to port 10255 */
TUNNEL-PORT-10250  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:10250 /* jump to port 10250 */

[root@xxx /]# iptables -nv -t nat -L TUNNEL-PORT-10255
RETURN     tcp  --  *      *       0.0.0.0/0            127.0.0.1            /* return request to access node directly */ tcp dpt:10255
RETURN     tcp  --  *      *       0.0.0.0/0            172.16.6.156         /* return request to access node directly */ tcp dpt:10255
DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* dnat to tunnel for access node */ tcp dpt:10255 to:172.16.6.156:10264

  • Solution 2: Use DNS to resolve the node name to the access address of yurt-tunnel-server, so that Type 2 requests are seamlessly forwarded to the tunnel

# Different purposes of x-tunnel-server-svc and x-tunnel-server-internal-svc:
 - x-tunnel-server-svc: mainly exposes ports 10262/10263 for accessing yurt-tunnel-server from the public network, e.g., by yurt-tunnel-agent
 - x-tunnel-server-internal-svc: mainly used by cloud components, such as prometheus and metrics-server, to access yurt-tunnel-server from the internal network

# How the DNS resolution works:
1. yurt-tunnel-server creates or updates the yurt-tunnel-nodes configmap in kube-apiserver. The tunnel-nodes field has the format {x-tunnel-server-internal-svc clusterIP} {nodeName}, recording the mapping between every nodeName and the yurt-tunnel-server service
2. The yurt-tunnel-nodes configmap is mounted into the coredns pod, and the hosts plugin serves the DNS records from the configmap
3. Port mappings are configured in x-tunnel-server-internal-svc: 10250 maps to 10263 and 10255 maps to 10264
4. With the above configuration, an http://{nodeName}:{port}/{path} request is seamlessly forwarded to yurt-tunnel-server
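To make steps 1 and 2 concrete, the DNS records and the CoreDNS hosts-plugin configuration might look roughly like this; the service clusterIP, node names, and mount path are illustrative assumptions:

```
# tunnel-nodes records from the yurt-tunnel-nodes configmap (illustrative values)
10.96.10.20  edge-node-1
10.96.10.20  edge-node-2

# Corefile fragment: serve the mounted records via the hosts plugin
hosts /etc/edge/tunnel-nodes {
    reload 300ms
    fallthrough
}
```

With these records, any lookup of a node name from a cloud component resolves to the clusterIP of x-tunnel-server-internal-svc.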

  • Extending cloud requests:

If users need to access other ports on the edge (besides 10250 and 10255), they need to add a corresponding dnat rule in iptables or a corresponding port mapping in x-tunnel-server-internal-svc, as shown below:


# For example, to access port 9051 on an edge node
# Added iptables dnat rules:
[root@xxx /]# iptables -nv -t nat -L TUNNEL-PORT
TUNNEL-PORT-9051  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:9051 /* jump to port 9051 */

[root@xxx /]# iptables -nv -t nat -L TUNNEL-PORT-9051
RETURN     tcp  --  *      *       0.0.0.0/0            127.0.0.1            /* return request to access node directly */ tcp dpt:9051
RETURN     tcp  --  *      *       0.0.0.0/0            172.16.6.156         /* return request to access node directly */ tcp dpt:9051
DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* dnat to tunnel for access node */ tcp dpt:9051 to:172.16.6.156:10264

# Added port mapping in x-tunnel-server-internal-svc
spec:
  ports:
  - name: https
    port: 10250
    protocol: TCP
    targetPort: 10263
  - name: http
    port: 10255
    protocol: TCP
    targetPort: 10264
  - name: dnat-9051 # newly added mapping
    port: 9051
    protocol: TCP
    targetPort: 10264

Of course, the above iptables dnat rules and service port mappings are maintained automatically by yurt-tunnel-server. Users only need to add the port configuration in the yurt-tunnel-server-cfg configmap, as follows:

# Note: due to uncontrollable certificate factors, newly added ports currently only support forwarding through port 10264 of yurt-tunnel-server
apiVersion: v1
data:
  dnat-ports-pair: 9051=10264 # new port=10264 (forwarding through ports other than 10264 is not supported)
kind: ConfigMap
metadata:
  name: yurt-tunnel-server-cfg
  namespace: kube-system

Near-term planning

  • Support EgressSelector function of kube-apiserver
  • Verify yurt-tunnel-server multi-instance deployment
  • Support yurt-tunnel-agent to configure multiple yurt-tunnel-server addresses
  • Support certificate storage directory customization
  • Support more fine-grained certificate Usage settings to keep the scope of certificate use controllable
  • Support automatic renewal of the yurt-tunnel-server certificate when the access address of yurt-tunnel-server changes
  • Support yurt-tunnel-agent automatically refreshing the access address of yurt-tunnel-server
  • Support forwarding requests that are not addressed by NodeIP/NodeName (e.g., accessing a non-host-network edge Pod from the cloud)
  • Support access to cloud Pod from edge Pod through Tunnel
  • Support independent deployment of yurt-tunnel (not bound to Kubernetes)
  • Support more protocol forwarding, such as gRPC, websocket, ssh, etc.

Welcome to join the OpenYurt community

As the core of ACK@Edge, Alibaba Cloud's edge container service, OpenYurt has been used commercially in dozens of industries, including CDN, audio and video live streaming, IoT, logistics, industrial brain, and city brain, serving millions of CPU cores. We are pleased to see that more and more developers, open source communities, enterprises, and institutions recognize OpenYurt's philosophy and are joining in to build OpenYurt together, such as VMware, Intel, Sangfor, China Merchants, Zhejiang University, the EdgeX Foundry community, and the eKuiper community. We welcome more friends to help build the OpenYurt community, prosper the cloud-native edge computing ecosystem, and let true cloud nativeness create value in more edge scenarios.

Welcome to join the OpenYurt community DingTalk group:

[Figure 4: DingTalk group QR code]

Click https://openyurt.io/en-us/ to go directly to the OpenYurt official website!

Copyright statement: The content of this article is contributed voluntarily by real-name registered users of Alibaba Cloud, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal liability. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it; once verified, the community will immediately delete the suspected infringing content.
