Author

Wei Houmin, backend development engineer at Tencent Cloud, follows open source communities such as containers, Kubernetes, and Cilium, and is responsible for Tencent Cloud TKE hybrid cloud container networking and related work.

Zhao Qiyuan, senior engineer at Tencent Cloud, is mainly responsible for the design and development of Tencent Cloud's container network.

Preface

Hybrid cloud is not a new concept, but with the development of container technology, the combination of hybrid cloud and containers is drawing more and more attention. Cloud native technologies such as containers can shield the differences in the underlying computing infrastructure of heterogeneous hybrid cloud clusters and unify multi-cloud, IDC, and even edge scenarios. Hybrid cloud is then no longer a simple combination of public and private clouds, but a distributed cloud with ubiquitous computing workloads, which can give full play to its advantages in scenarios such as resource expansion, multi-site disaster recovery, and multi-cluster hybrid deployment.

Tencent Cloud TKE has launched a service that lets public cloud clusters add third-party IDC computing nodes. It allows customers to reuse their IDC computing resources, avoiding the cost of building, operating, and maintaining Kubernetes clusters locally and maximizing the utilization of computing resources.

In the underlying implementation of this solution, connecting the IDC network and the public cloud network is a key step. A Kubernetes cluster may contain computing nodes in different network environments, such as nodes in an IDC network and nodes in a public cloud VPC network. To shield these differences at the bottom layer, TKE proposes a hybrid cloud container network solution: the container level sees a unified network plane, so a Pod does not need to be aware of whether it is running on an IDC computing node or a public cloud computing node.

The TKE hybrid cloud container network supports both an Overlay network based on VxLAN tunneling and an Underlay network based on direct routing. Customers who do not want to change their IDC network infrastructure can use the Overlay network; customers with high performance requirements for the hybrid cloud container network can use the Underlay network based on direct routing. This article details the challenges the container network faces in the hybrid cloud and the corresponding solutions, and introduces the implementation of the TKE hybrid cloud container Overlay network. A follow-up article will introduce the implementation of the TKE hybrid cloud container Underlay network, so stay tuned.

Challenges facing hybrid cloud container networks

In a hybrid cloud scenario, the various components of a Kubernetes cluster may be distributed on different network planes:

  • Master Network: the network plane running the ApiServer and other control plane components
  • VPC Network: the network plane containing the TKE public cloud computing nodes
  • IDC Network: the network plane containing the customer's IDC computing nodes

In such a complex hybrid cloud scenario, connecting these different network planes poses several challenges to container network design:

Interconnecting the VPC Network and the IDC Network

In a hybrid cloud scenario, a Kubernetes cluster may include both public cloud computing nodes in the VPC Network and computing nodes in the IDC Network. Connecting these node networks across different environments is the foundation for interconnecting the upper-layer container network.

The IDC Network actively accesses the Master Network

One of the most common scenarios in a Kubernetes environment is that the kubelet on a computing node connects to the ApiServer in the Master Network to watch cluster state and report node information. This requires the IDC Network to be able to actively access the Master Network.

The Master Network actively accesses the IDC Network

When debugging in a Kubernetes environment, we often use kubectl logs and kubectl exec to obtain application Pod logs and log directly into the Pod's running environment. Taking kubectl exec as an example, the following figure shows how this type of command works: when kubectl exec is executed, a request is first sent to the ApiServer, which forwards it to the kubelet on the node where the Pod is located, and the kubelet in turn forwards it to the exec interface of the container runtime.

For this mechanism to work, there must be a network path between the ApiServer in the Master Network and the kubelet on the computing node, so that the ApiServer can actively access the kubelet. Besides kubectl exec and kubectl logs, the kube-scheduler Extender mechanism and the ApiServer Admission Webhook mechanism also depend on connectivity between the Master Network and the computing node network.
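
To make this path concrete, here is a minimal Go sketch using client-go that issues the same "pods/exec" request that kubectl exec sends; the ApiServer receives it and proxies the stream to the kubelet on the Pod's node. The kubeconfig path, namespace, and Pod name are placeholders.

```go
package main

import (
	"context"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/remotecommand"
)

func main() {
	// Load cluster credentials; the path is a placeholder.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Build the same "pods/exec" subresource request that kubectl exec issues.
	// The ApiServer forwards the stream to the kubelet on the node where the
	// Pod runs, which in turn calls the runtime's exec interface.
	// Namespace and Pod name below are placeholders.
	req := clientset.CoreV1().RESTClient().Post().
		Resource("pods").
		Namespace("default").
		Name("example-pod").
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Command: []string{"sh", "-c", "hostname"},
			Stdout:  true,
			Stderr:  true,
		}, scheme.ParameterCodec)

	// Open the streaming connection through the ApiServer.
	exec, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		panic(err)
	}
	if err := exec.StreamWithContext(context.Background(), remotecommand.StreamOptions{
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	}); err != nil {
		panic(err)
	}
}
```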

How to shield the differences in the underlying network and unify the container network

In a hybrid cloud scenario, a Kubernetes cluster may include not only public cloud nodes in the VPC Network and IDC nodes in the IDC Network, but also public cloud nodes from other cloud vendors and even edge nodes in edge scenarios. Customers sometimes do not want to change the basic network configuration of their IDC environment, yet they still expect a unified container network.

TKE Hybrid Cloud Network Solution

To address the challenges the container network faces in the hybrid cloud scenario, the Tencent Cloud container team designed the TKE hybrid cloud container network solution with Cilium as the cluster network base. Cilium redesigns cloud-native networking on top of eBPF, bypassing iptables, and provides a complete solution for networking, observability, and security. Cilium supports both tunnel-based Overlay networks and direct-routing-based Underlay networks, and performs well when Services scale to large numbers. The Tencent Cloud container team had already begun optimizing the Kubernetes network with eBPF, and combining the hybrid cloud network with Cilium is a further exploration of eBPF technology.

The main features of TKE hybrid cloud container network are as follows:

  • Connects the container network end to end and shields the differences in the underlying network
  • Supports both an Overlay network based on VxLAN tunneling and an Underlay network based on direct routing
  • Implements Service and NetworkPolicy with eBPF to optimize Kubernetes network performance
  • Supports custom container IPAM, enabling multiple PodCIDR segments and on-demand dynamic allocation of PodCIDRs to nodes
  • Provides observability of network links

How to use TKE hybrid cloud container network

On the basic information page of the TKE cluster, enable "Support for importing third-party nodes" and select the hybrid cloud container network mode: choose Cilium VxLAN for the hybrid cloud Overlay container network, or Cilium BGP for the hybrid cloud Underlay container network.

TKE Hybrid Cloud Network Interworking Solution

Interconnecting the VPC Network and the IDC Network

To connect the VPC Network and the IDC Network, we recommend Tencent Cloud's Cloud Connect Network (CCN) service. CCN interconnects VPCs on the cloud as well as VPCs and IDC networks, providing multi-point interconnection across the whole network, route self-learning, optimal link selection, and fast fault convergence.

The IDC Network actively accesses the Master Network

To open up the link from the IDC Network to the Master Network, we use Tencent Cloud PrivateLink so that the kubelet in the IDC Network can actively access the ApiServer in the Master Network.

The Master Network actively accesses the IDC Network

To open up the link from the Master Network to the IDC Network, we chose the community apiserver-network-proxy project. The principle is as follows:

  • A Konnectivity Server runs in the Master Network as the proxy server
  • A Konnectivity Agent runs in the IDC Network and establishes a long-lived connection to the proxy server in the Master Network through PrivateLink
  • When the ApiServer in the Master Network actively accesses the kubelet in the IDC Network, it reuses the long-lived connection between the Agent and the Server (a minimal sketch of this reverse-tunnel idea follows the list)
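
As a rough illustration of this reverse-tunnel idea (not the actual apiserver-network-proxy implementation, which multiplexes many streams over gRPC), the Go sketches below show a proxy server that accepts one long-lived outbound connection from an agent and then relays a single front-end connection over it. All addresses and ports are placeholders.

```go
// proxy-server sketch, running in the Master Network.
package main

import (
	"io"
	"log"
	"net"
)

func main() {
	// The agent in the IDC Network dials out to this port (over PrivateLink
	// in the real solution) and keeps the connection open.
	agentLn, err := net.Listen("tcp", ":8132")
	if err != nil {
		log.Fatal(err)
	}
	agent, err := agentLn.Accept()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("agent connected from %s", agent.RemoteAddr())

	// The ApiServer's egress traffic is pointed at this port; whatever it
	// sends is relayed over the pre-established agent connection.
	frontLn, err := net.Listen("tcp", ":8131")
	if err != nil {
		log.Fatal(err)
	}
	front, err := frontLn.Accept()
	if err != nil {
		log.Fatal(err)
	}
	go io.Copy(agent, front) // ApiServer -> IDC
	io.Copy(front, agent)    // IDC -> ApiServer
}
```

```go
// agent sketch, running in the IDC Network.
package main

import (
	"io"
	"log"
	"net"
)

func main() {
	// Dial out to the proxy server in the Master Network and keep the
	// connection open; the server reuses it for ApiServer-initiated traffic.
	server, err := net.Dial("tcp", "proxy.master.example:8132") // placeholder
	if err != nil {
		log.Fatal(err)
	}
	// Dial the backend the ApiServer wants to reach, e.g. the local kubelet.
	backend, err := net.Dial("tcp", "127.0.0.1:10250") // placeholder
	if err != nil {
		log.Fatal(err)
	}
	go io.Copy(backend, server) // ApiServer -> kubelet
	io.Copy(server, backend)    // kubelet -> ApiServer
}
```

The essential point the sketch keeps is the direction reversal: the connection is initiated from the IDC side, yet traffic flows from the Master Network into the IDC Network.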

With this, the requirement for the Master Network to actively access the IDC Network is also met. Furthermore, based on the same solution, computing nodes in multi-cloud VPCs and edge nodes in edge scenarios can be managed by the same control plane, realizing a truly distributed cloud.

TKE Hybrid Cloud Overlay Container Network Solution

After the Master Network, VPC Network, and IDC Network are connected, we can build an Overlay network on top of them through tunneling. VxLAN is a tunnel encapsulation protocol widely used in data center networks: it encapsulates packets as MAC-in-UDP and decapsulates them at the remote end. Cilium's tunnel encapsulation supports VxLAN and Geneve, with VxLAN used by default. Thanks to the high scalability of VxLAN, as long as the node networks are connected, a unified container network can be built on top of them.
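
For reference, the encapsulation performed by cilium_vxlan wraps the original frame in UDP (destination port 4789 per RFC 7348) behind an 8-byte VxLAN header carrying a 24-bit VNI. The small Go sketch below only builds that header to show the layout; it is illustrative and not Cilium's datapath code.

```go
package main

import "fmt"

// vxlanHeader builds the 8-byte VxLAN header defined in RFC 7348:
// 8 bits of flags (the I bit marks a valid VNI), 24 reserved bits,
// a 24-bit VNI, and 8 more reserved bits.
func vxlanHeader(vni uint32) [8]byte {
	var h [8]byte
	h[0] = 0x08 // flags: I bit set, VNI field is valid
	h[4] = byte(vni >> 16)
	h[5] = byte(vni >> 8)
	h[6] = byte(vni)
	return h
}

func main() {
	// On the wire the encapsulated packet looks like:
	// outer IP (node IPs) / outer UDP (dst 4789) / VxLAN header / inner frame (Pod IPs).
	fmt.Printf("% x\n", vxlanHeader(2)) // VNI 2 -> 08 00 00 00 00 00 02 00
}
```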

Cross-node Pod access to each other

When a packet leaves the Pod's network interface, it passes through the veth pair to the lxc00aa interface on the node. The eBPF program attached to lxc00aa finds that the destination is a remote endpoint and forwards the packet to cilium_vxlan for encapsulation. After encapsulation, the outer addresses are the node IPs and the inner addresses are the Pod IPs, and the packet is sent to the peer node through the node's physical interface. On the peer node, the packet is decapsulated by cilium_vxlan and forwarded to the destination Pod.

Node access to remote Pod

When the local node accesses a Pod on a remote node, the packet is forwarded by the node route to the cilium_host interface, where the attached eBPF program forwards it to cilium_vxlan for tunnel encapsulation and then sends it to the peer. After encapsulation, the outer address is the node IP, the inner source IP is the CiliumHostIP, and the inner destination IP is the target Pod IP. The rest of the path is the same as above and is not repeated here.

Pod access to networks outside the ClusterCIDR

When a Pod on a computing node accesses an address outside the container ClusterCIDR, the packet travels from the Pod's interface to the lxc00aa interface, where the eBPF program finds that the destination is not within the container network's ClusterCIDR. It therefore does not perform Overlay encapsulation but hands the packet to the protocol stack to follow the node's routing. With Cilium's masquerade option enabled, when such a packet reaches the node's physical interface, masquerading replaces its source address with the node IP, so that the return traffic can reach the node and finally be returned to the Pod.
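
The forwarding branch described above can be summarized in a small sketch (plain Go standing in for the eBPF logic; the CIDR and addresses are placeholders):

```go
package main

import (
	"fmt"
	"net/netip"
)

// decide mirrors the branch described above: destinations inside the
// container ClusterCIDR are tunneled, everything else goes to the host
// stack and is SNATed to the node IP at the physical interface.
func decide(dst netip.Addr, clusterCIDR netip.Prefix) string {
	if clusterCIDR.Contains(dst) {
		return "encapsulate via cilium_vxlan and send to the peer node"
	}
	return "hand to host stack; masquerade source to node IP on egress"
}

func main() {
	clusterCIDR := netip.MustParsePrefix("172.16.0.0/16") // placeholder ClusterCIDR
	fmt.Println(decide(netip.MustParseAddr("172.16.3.7"), clusterCIDR))
	fmt.Println(decide(netip.MustParseAddr("10.0.0.8"), clusterCIDR))
}
```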

Summary and outlook

This article introduced the complex scenarios and performance challenges the hybrid cloud container network faces when a TKE public cloud cluster adds third-party IDC nodes, and proposed an Overlay container network solution based on Cilium. The solution suits not only adding IDC nodes, but also heterogeneous clusters (multi-cloud and edge scenarios) under hybrid cloud, and it solves the management and experience problems caused by running different network plugins across clusters. As a result, the combination of hybrid cloud and containers is no longer just a hybrid cloud: it unifies multi-cloud, IDC, and edge scenarios into a truly ubiquitous distributed cloud.

The Overlay hybrid cloud container network uses tunneling to shield the differences between underlying network environments, so that the container level sees a unified network plane. Other customers, however, have very high performance requirements for the hybrid cloud container network and do not want to introduce the overhead of Overlay encapsulation and decapsulation; they prefer to connect the container network directly through an Underlay network based on direct routing. A follow-up article will introduce the implementation of the TKE hybrid cloud container Underlay network based on BGP direct routing, so stay tuned.

Reference

  1. Mind the Gap: Here Comes Hybrid Cloud
  2. Kubernetes scheduler extender
  3. Kubernetes admission controllers
  4. CNI Benchmark: Understanding Cilium Network Performance
  5. Tencent Cloud bypasses conntrack and uses eBPF to enhance IPVS to optimize K8s network performance
  6. Tencent Cloud Cloud Networking Service CCN
  7. Kubernetes apiserver-network-proxy
  8. RFC 7348: Virtual eXtensible Local Area Network (VXLAN)
