Kubernetes Networking, from Shallow to Deep
These are my study notes from an in-depth Kubernetes column course.
One, single-host container networking
Terminology
- Network stack: the network stack includes the network interface (NIC), the loopback device, the routing table, and iptables rules. For a process, these elements together form the basic environment in which it initiates and responds to network requests.
- Bridge: a bridge is a virtual network device, so it has the characteristics of a network device and can be assigned IP and MAC addresses. A bridge is a virtual switch with functions similar to a physical switch; it works at the data link layer.
- Veth pair: a virtual network cable used to connect a container to the bridge. Veth devices are always created in pairs (veth peers), and a packet sent out of one of the two network cards automatically appears on the other, even when the two cards are in different Network Namespaces.
- ARP: a protocol for finding the layer-2 MAC address that corresponds to an IP address.
- CAM table: the table in which the virtual switch (here, the bridge) learns and maintains the mapping between its ports and MAC addresses.
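To make these terms concrete, here is a minimal sketch that builds a bridge and a veth pair by hand with iproute2 (the names br0, veth0, and veth1 are made up for illustration; Docker does the equivalent automatically):
# Create a bridge (a virtual layer-2 switch) and bring it up
$ ip link add br0 type bridge
$ ip link set br0 up
# Create a veth pair: a packet entering veth0 appears on veth1, and vice versa
$ ip link add veth0 type veth peer name veth1
# Plug one end into the bridge, as Docker does with a container's veth peer
$ ip link set veth0 master br0
$ ip link set veth0 up
$ ip link set veth1 up
# Inspect the bridge's forwarding (CAM) table of learned MAC-to-port entries
$ bridge fdb show br br0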
Host network
A container can be started with --net=host so that it directly shares the host's Network Namespace:
$ docker run -d --net=host --name nginx-1 nginx
The advantage of the host network is better performance, since the container uses the host's network stack directly. The disadvantage is that it introduces shared-network-resource problems such as port conflicts. Therefore, in most cases we want each container to use the network stack of its own Network Namespace, with its own IP and ports.
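A quick way to see the effect, assuming the nginx-1 container started above is running (the output lines are illustrative):
# nginx shares the host's network stack, so it binds the host's port 80
# directly; no -p port mapping is needed:
$ curl -s http://127.0.0.1:80 | head -n 4
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
# A second host-network nginx on the same host fails with a port conflict:
$ docker run -d --net=host --name nginx-2 nginx
$ docker logs nginx-2   # bind() to 0.0.0.0:80 failed (98: Address already in use)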
How containers communicate
The figure above shows the communication path of a single-node container network. The following walks through the interaction in detail, using the access path C1 -> C2 as an example.
# First create two containers to simulate the request: start two centos
# containers and install net-tools inside so that ifconfig is available.
# Create C1 and install net-tools
$ docker run -d -it --name c1 centos /bin/bash
$ docker exec -it c1 bash
[root@60671509044e /]# yum install -y net-tools
# Create C2 and install net-tools
$ docker run -d -it --name c2 centos /bin/bash
$ docker exec -it c2 bash
[root@94a6c877b01a /]# yum install -y net-tools
After containers C1 and C2 start, each container has a default routing rule, and all traffic within the container's network segment leaves through the eth0 device.
- C1
# Enter the c1 container and check its IP and routing table
$ docker exec -it c1 bash
# Check the IP
$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.7 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:ac:11:00:07 txqueuelen 0 (Ethernet)
RX packets 6698 bytes 9678058 (9.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3518 bytes 195061 (190.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# Check the routing table
$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default _gateway 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
- C2
# Enter the C2 container and check its IP and routing table
$ docker exec -it c2 bash
$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.8 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:ac:11:00:08 txqueuelen 0 (Ethernet)
RX packets 6771 bytes 9681937 (9.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3227 bytes 179347 (175.1 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# Check the routing table
$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default _gateway 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
Each of the containers above has its own IP and MAC address, and each has a default route (_gateway) that leaves through the eth0 device; the MAC address corresponding to _gateway is already present in the local ARP cache.
- Network communication between hosts needs MAC addresses, which is how hosts are identified at the data link layer. When C1 accesses C2, it first looks in its local ARP cache for the MAC address corresponding to C2's IP (172.17.0.8). If there is no entry, it sends an ARP request to resolve the MAC address.
# C1 -> C2: an ARP request is sent first to find the MAC address. You can
# check the MAC corresponding to an IP in the container's ARP cache.
$ docker exec -it c1 bash
# First check the local ARP cache
[root@60671509044e /]# arp
Address                  HWtype  HWaddress           Flags Mask            Iface
_gateway                 ether   02:42:2e:8d:21:d6   C                     eth0
# Running ping triggers an ARP lookup
[root@60671509044e /]# ping 172.17.0.8
# Query the local ARP cache again: the MAC address is now present
[root@60671509044e /]# arp
Address                  HWtype  HWaddress           Flags Mask            Iface
172.17.0.8               ether   02:42:ac:11:00:08   C                     eth0
_gateway                 ether   02:42:2e:8d:21:d6   C                     eth0
ARP resolution process: the C1 container sends an ARP request, which travels through the container's local routing and veth device to the bridge. The bridge (Bridge) acts as a virtual switch and floods the ARP request to every other device plugged into it; when C2 receives the ARP request, it replies with its own MAC address.
- Once C2's MAC address is known, C1 can initiate communication; the capture sketch below shows this exchange on the wire.
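If you want to watch the exchange happen, here is a sketch using tcpdump on the host (assuming the default docker0 bridge and the addresses from the example above):
# On the host, capture ARP traffic crossing the docker0 bridge
$ tcpdump -ni docker0 arp
# In another terminal, trigger the lookup from c1
$ docker exec c1 ping -c 1 172.17.0.8
# The capture should show a request/reply pair shaped like:
# ARP, Request who-has 172.17.0.8 tell 172.17.0.7, length 28
# ARP, Reply 172.17.0.8 is-at 02:42:ac:11:00:08, length 28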
Two, cross-host container communication
Cross-host container communication falls into two network architectures, Overlay and Underlay, depending on whether they rely on the underlying network environment. An Overlay network only requires that the hosts be reachable from one another; it does not require them to be in the same layer-2 domain. An Underlay network places requirements on the underlying infrastructure, and the exact requirements differ by implementation. For example, Flannel's host-gw backend requires the hosts to be in the same layer-2 domain, that is, connected to the same switch.
Terminology
- Overlay Network: a virtual network, built in software on top of the existing host network, that spans the hosts and connects all the containers.
- TUN device (tunnel device): in Linux, a TUN device is a virtual network device that works at layer 3 (the network layer). It passes IP packets between the operating system kernel and user-space applications.
- VXLAN: Virtual Extensible LAN, a network virtualization technology supported by the Linux kernel. VXLAN performs the encapsulation and decapsulation of network packets entirely in kernel space.
- VTEP: virtual tunnel endpoint device, which has both an IP and a MAC address.
- BGP: Border Gateway Protocol, a decentralized routing protocol natively supported on Linux, used in large-scale data centers to maintain routing information between different autonomous systems.
Cross-host communication
Cross-host container communication is commonly realized with an Overlay Network, and there are many ways to implement one.
Overlay mode
1. Layer-3 Flannel UDP
Flannel UDP mode is the simplest and easiest-to-understand cross-host container networking solution provided by Flannel. Although it has performance problems (discussed below), it is of great reference value for understanding Overlay networks. Let's describe the network access process with an example: there are two hosts and four containers, and Container-1 needs to request Container-4.
- The Container-1 container initiates a request to Container-4. Docker0 lives in the root Network Namespace; each container is attached through a veth pair, with one end inside the container's Network Namespace and the other end plugged into the Docker0 virtual network device in the root Network Namespace.
- The container 100.96.1.2 accesses 100.96.2.2. Since the destination address is not in the Docker0 bridge's network segment (so the target container cannot be found on this bridge via ARP), the packet follows the default routing rule inside Container-1, of the form default via 172.17.0.1 dev eth0, as demonstrated below on a local Docker host where the bridge gateway is 172.17.0.1. This corresponds to step 1 in the figure above.
# The default routing rules set inside the container:
[root@94a6c877b01a /]# ip route
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2
# The next hop is 172.17.0.1, leaving through the eth0 device. Inspecting
# Docker's networks shows that 172.17.0.1 is the gateway IP of the bridge device.
lengrongfu@MacintoshdeMacBook-Pro ~ % docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
e522990979b3   bridge    bridge    local
# Inspect the bridge network
lengrongfu@MacintoshdeMacBook-Pro ~ % docker network inspect e522990979b3
[
    {
        "Name": "bridge",
        "Id": "e522990979b365e9df4d967c3600483e598e530361deb28513b6e75b8b66bedf",
        "Created": "2021-04-12T12:11:57.321486866Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "94a6c877b01ac3a1638f1c5cde87e7c58be9ce0aafd4a78efcb96528ab00ed94": {
                "Name": "c2",
                "EndpointID": "a5c12fb3800991228f8dc3a2a8de1d6f4865439701a83558e4430c2aebf783a8",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]
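The same facts can be cross-checked from the host side (a sketch; the veth interface name is illustrative):
# The docker0 bridge itself holds the gateway address 172.17.0.1
$ ip addr show docker0
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
# Each container's veth peer is attached to the bridge as a port
$ bridge link | grep docker0
7: veth1a2b3c@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> ... master docker0 state forwarding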
- After the packet enters the Docker0 bridge, the host's routing table determines where it goes next. In Node1's routing table below, the destination IP 100.96.2.2 hits the second rule: packets for the 100.96.0.0/16 segment are sent to the flannel0 device, with source IP 100.96.1.0. This corresponds to step 2 in the figure above.
# Node1's routing table
$ ip route
1 default via 10.168.0.1 dev eth0
2 100.96.0.0/16 dev flannel0 proto kernel scope link src 100.96.1.0
3 100.96.1.0/24 dev docker0 proto kernel scope link src 100.96.1.1
4 10.168.0.0/24 dev eth0 proto kernel scope link src 10.168.0.2
Flannel0 device
As mentioned above, flannel0 is a TUN device, a virtual layer-3 network device whose main job is to pass IP packets between kernel space and user space. Continuing the analysis: once the packet reaches the flannel0 device, the kernel hands it from kernel space to the process that created the device, the flanneld process. flanneld sees that the destination address is 100.96.2.2, encapsulates the data in a UDP packet, and sends it to the UDP port that the flanneld process on Node2 is listening on. This corresponds to steps 3, 4, 5, and 6 in the figure above.
- How does flanneld know that the IP 100.96.2.2 lives on Node2? Through subnets: when each node starts, it is assigned a subnet segment, so any container IP can be mapped to the node it belongs to. The subnet assignments are stored in etcd, as sketched below.
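A sketch of how this mapping looks in etcd (assuming Flannel's default /coreos.com/network prefix and the etcd v2 API that classic Flannel uses; addresses reuse the example above):
# Each node's assigned subnet is registered under Flannel's etcd prefix
$ etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/100.96.1.0-24
/coreos.com/network/subnets/100.96.2.0-24
# The value of each key records which host owns the subnet
$ etcdctl get /coreos.com/network/subnets/100.96.2.0-24
{"PublicIP":"10.168.0.3"}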
- When the flanneld process on Node2 receives the packet, it writes it to the flannel0 device. This is a transition from user space back to kernel space, so the Linux kernel network stack takes over the IP packet and decides its next hop using the local routing table. This corresponds to steps 7 and 8 in the figure above.
# Node2's routing table
$ ip route
1 default via 10.168.0.1 dev eth0
2 100.96.0.0/16 dev flannel0 proto kernel scope link src 100.96.2.0
3 100.96.2.0/24 dev docker0 proto kernel scope link src 100.96.2.1
4 10.168.0.0/24 dev eth0 proto kernel scope link src 10.168.0.3
- The destination IP 100.96.2.2 most precisely matches the third routing rule, which sends packets destined for the 100.96.2.0/24 network to the docker0 device (whose address on that subnet, 100.96.2.1, appears as the route's source hint). This corresponds to step 9 in the figure above.
- After the packet enters the docker0 device, the docker0 bridge plays the role of a layer-2 switch and delivers the packet to the correct veth pair; through that device the packet enters Container-4's network protocol stack. This corresponds to step 10 in the figure above.
Flannel UDP mode thus provides a layer-3 Overlay network: it first encapsulates the outgoing IP packet in UDP at the sending end, decapsulates it at the receiving end to recover the original IP packet, and then forwards that IP packet to the destination container.
Flannel UDP mode has severe performance problems. The main issue is that, because a TUN device is used, delivering even a single IP packet requires three data copies between user space and kernel space.
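On a node running the UDP backend you can see both halves of this path (a sketch; port 8285 is flanneld's default UDP port, and the pid/fd values are illustrative):
# flanneld listens in user space on a UDP port (8285 by default)
$ ss -ulnp | grep flanneld
UNCONN 0 0 *:8285 *:* users:(("flanneld",pid=1234,fd=6))
# flannel0 is the TUN device that shuttles IP packets to and from flanneld
$ ip -d link show flannel0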
2. Layer-3 Calico ipip
3. Layer-2 + Layer-3 VXLAN
A VXLAN network overlays a virtual layer-2 network, maintained by the kernel's VXLAN module, on top of the existing layer-3 network, so that containers attached to this virtual layer-2 network can communicate as freely as if they were in the same LAN. To open tunnels across the layer-2 network, VXLAN sets up a special network device on each host to act as the two ends of the tunnel. This device is called a VTEP, short for VXLAN Tunnel End Point (virtual tunnel endpoint). The VTEP plays the same role as the flanneld process, encapsulating and decapsulating packets, except that it operates on layer-2 data frames and the whole workflow happens inside the kernel.
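To inspect a VTEP, a sketch (assuming Flannel's VXLAN backend, which names the device flannel.1; the MAC and IPs are illustrative):
# The VTEP created by Flannel's VXLAN backend, with its VNI and underlay device
$ ip -d link show flannel.1
# ... vxlan id 1 local 10.168.0.2 dev eth0 ...
# The forwarding database maps remote VTEP MAC addresses to remote host IPs
$ bridge fdb show dev flannel.1
# 5e:f8:4f:00:e3:37 dst 10.168.0.3 self permanent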
Underlay mode
1. Layer-3 BGP
The figure above is a typical BGP network topology. Route1 and Route2 serve as border routing gateways, and each writes the other LAN's routing information into its own routing table, so that the two LANs become fully reachable over a pure layer-3 network.
Calico BGP usage
Once you understand BGP, the Calico project is easy to understand: it treats every host node as a border router, so each node stores the routing information of all the other nodes. Let's analyze its implementation, which consists of three parts:
- Calico's CNI plugin, which is the integration point between Calico and Kubernetes.
- BIRD, a BGP client responsible for distributing routing information within the cluster.
- Felix, a DaemonSet responsible for inserting routing rules on the host (into the FIB, the forwarding information base of the Linux kernel) and maintaining the network devices Calico needs.
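A quick way to see the BGP sessions BIRD has established on a node, assuming calicoctl is installed (the table contents are illustrative):
# Show this node's BGP peerings maintained by BIRD
$ calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.0.3  | node-to-node mesh | up    | 09:38:14 | Established |
+--------------+-------------------+-------+----------+-------------+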
Unlike Flannel host-gw mode, Calico BGP mode does not create any virtual bridge device. The following diagram illustrates how Calico works.
The figure above shows the network interactions in Calico BGP mode. Taking container1 accessing Container3 as an example, let's analyze how the traffic gets there. Since there is no cni0 virtual bridge device, each veth device pair has one end in the container's Network Namespace and the other end directly in the host's network namespace.
- First, the Calico CNI plugin needs to set up a routing rule on each host for the veth pair device of every container on that host, so that incoming IP packets can be delivered to the right container, for example:
# Routes on node 192.168.0.2
$ ip route
10.20.0.2 dev cali1 scope link
10.20.0.3 dev cali2 scope link
# Routes on node 192.168.0.3
$ ip route
10.20.1.2 dev cali3 scope link
10.20.1.3 dev cali4 scope link
- Each node also holds routes to the other nodes, advertised via BGP, for example:
# 192.168.0.2 has a route pointing to 192.168.0.3
$ ip route
10.20.1.0/24 via 192.168.0.3 dev eth0
# 192.168.0.3 has a route pointing to 192.168.0.2
$ ip route
10.20.0.0/24 via 192.168.0.2 dev eth0
- By default, Calico BGP uses Node-to-Node Mesh mode, in which the number of BGP connections grows with the square of the node count (N^2). It is therefore generally recommended only for clusters of fewer than about 100 nodes. In large clusters, a Route Reflector is needed: all routes are reported to a few central nodes, and the other nodes synchronize routes from them.
Calico BGP mode, like Flannel host-gw mode, depends on the underlying network facilities: it requires the cluster hosts to be layer-2 reachable. If the hosts are in different LANs, Calico's ipip Overlay mode must be used instead.
2. Layer-2 VLAN
3. Flannel host-gw
One picture can explain the working principle of Flannel host-gw mode clearly. The CNI0 device is a layer-3 switch: it has the functions of a layer-2 switch plus an IP address of its own. Flannel starts a flanneld process on each node (as a DaemonSet), which maintains that node's local routing information.
For example, the route 192.168.1.0/24 via 10.20.0.3 dev eth0 defines that the next hop for traffic to 192.168.1.0/24 is 10.20.0.3, leaving through the eth0 device.
Then, when the IP packet is encapsulated into a frame and sent out, the next hop from the routing table is used to set the frame's destination MAC address; in this way the destination host is reached over the layer-2 network.
Because the next hop's MAC address is used as the frame's destination, the hosts must be layer-2 connected, so that ARP can resolve the next hop's IP into a MAC address; the sketch below shows the pieces involved.