Author | Zhao Mingshan (Li Heng)


Preface


OpenKruise is Alibaba Cloud's open source suite for automated management of cloud native applications, and it is currently a Sandbox project hosted by the Cloud Native Computing Foundation (CNCF). It draws on Alibaba's years of experience with containerization and cloud native technology: it is a set of standard Kubernetes extension components used at scale in Alibaba's internal production environment, and it embodies the best practices developed there.

OpenKruise released its latest version, v0.9.0, on May 20, 2021 (ChangeLog). The previous article introduced the new Pod restart and deletion protection features. Today we introduce another core feature: SidecarSet, which extends the previous version with support aimed specifically at Service Mesh scenarios.

Background: How to upgrade the Mesh container independently


SidecarSet is a workload provided by Kruise for managing Sidecar containers independently. With SidecarSet, users can conveniently have Sidecar containers injected automatically and upgraded independently. For details, please refer to the OpenKruise official website.

By default, an independent Sidecar upgrade first stops the old version of the container and then creates the new version. This approach suits Sidecar containers that do not affect the availability of the Pod's service, such as log collection agents. For many proxy or runtime Sidecar containers, such as Istio Envoy, it is problematic. Envoy acts as a proxy container in the Pod and proxies all traffic, so restarting it directly for an upgrade will inevitably affect the availability of the Pod's service. As a result, the Sidecar release has to take the application's release and capacity into account, and the Sidecar cannot be released completely independently of the application.
1--1.png

Tens of thousands of Pods in Alibaba Group communicate with each other through Service Mesh. Because upgrading a Mesh container makes the business Pod unavailable, Mesh container upgrades greatly hinder the iteration of Service Mesh. To address this scenario, we worked with the Service Mesh team within the group to implement a hot upgrade capability for Mesh containers. This article focuses on the important role SidecarSet plays in implementing that capability.

SidecarSet enables lossless hot upgrade of Mesh containers

Mesh containers cannot be upgraded in place directly like log collection containers. The reason is that a Mesh container must serve traffic without interruption, while an independent in-place upgrade makes the Mesh service unavailable for a period of time. Some well-known Mesh services in the community, such as Envoy and Mosn, provide smooth upgrade capabilities by default, but these upgrade methods do not integrate well with cloud native environments, and Kubernetes itself lacks an upgrade solution for this kind of Sidecar container.

OpenKruise SidecarSet provides a Sidecar hot upgrade mechanism for this type of Mesh container, which can help Mesh containers achieve a lossless hot upgrade through cloud-native methods.

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: hotupgrade-sidecarset
spec:
  selector:
    matchLabels:
      app: hotupgrade
  containers:
    - name: sidecar
      image: openkruise/hotupgrade-sample:sidecarv1
      imagePullPolicy: Always
      lifecycle:
        postStart:
          exec:
            command:
              - /bin/sh
              - /migrate.sh
      upgradeStrategy:
        upgradeType: HotUpgrade
        hotUpgradeEmptyImage: openkruise/hotupgrade-sample:empty
  • upgradeType: HotUpgrade indicates that this sidecar container uses the hot upgrade scheme.
  • hotUpgradeEmptyImage: When a Sidecar container is hot upgraded, the business needs to provide an Empty container that is used for container switching during the hot upgrade. The Empty container has the same configuration as the Sidecar container (except for the image address), such as command, lifecycle, and probe.

The SidecarSet hot upgrade mechanism consists of two main processes: injecting the hot upgrade Sidecar containers, and smoothly upgrading the Mesh container.

Injecting the hot upgrade Sidecar containers

For sidecar containers of the hot upgrade type, the SidecarSet Webhook will inject two containers when the Pod is created:

  • {Sidecar.name}-1: envoy-1 in the figure below. This container is the sidecar container that is actually working, for example: envoy:1.16.0
  • {Sidecar.name}-2: envoy-2 in the figure below. This container runs the hotUpgradeEmptyImage provided by the business, for example: empty:1.0

2-2.png

The Empty container mentioned above does not do any actual work while the Mesh container is running.
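
The naming rule above can be illustrated with a few lines of Go. This is only a sketch of the convention described in the text, not the SidecarSet Webhook's code; the `container` type and the `injectHotUpgrade` function are hypothetical.

```go
package main

import "fmt"

// container holds only the fields relevant to this illustration (hypothetical type).
type container struct {
	Name  string
	Image string
}

// injectHotUpgrade returns the two containers injected for one hot upgrade
// sidecar: "<name>-1" runs the real image, "<name>-2" runs the Empty image.
func injectHotUpgrade(name, image, emptyImage string) [2]container {
	return [2]container{
		{Name: name + "-1", Image: image},
		{Name: name + "-2", Image: emptyImage},
	}
}

func main() {
	for _, c := range injectHotUpgrade("envoy", "envoy:1.16.0", "empty:1.0") {
		fmt.Printf("%s -> %s\n", c.Name, c.Image)
	}
}
```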

Smooth upgrade of Mesh containers

The hot upgrade process is mainly divided into the following three steps:

  1. Upgrade: Replace the image of the Empty container with the latest version of the Sidecar image, for example: envoy-2.Image = envoy:1.17.0
  2. Migration: Execute the PostStartHook script of the Sidecar container to complete the smooth upgrade of the Mesh service
  3. Reset: After the Mesh service has been smoothly upgraded, replace the image of the old Sidecar container with the Empty image, for example: envoy-1.Image = empty:1.0

Only the above three steps are required to complete the entire process of the hot upgrade. If you perform multiple hot upgrades on the Pod, repeat the above three steps.

3-3.png
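
The three steps can be modelled as a small state transition on the pair of injected containers. The Go sketch below only simulates the image swapping described above and is not Kruise's controller code; the `pod` type, the `hotUpgrade` function, the `migrate` callback, and the `envoy:1.18.0` image are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// pod models only the images of the two injected sidecar slots
// (hypothetical type): images[0] is {name}-1, images[1] is {name}-2.
type pod struct {
	images [2]string
}

// hotUpgrade applies the three steps from the text: Upgrade, Migration, Reset.
// migrate stands in for the PostStartHook script of the sidecar.
func hotUpgrade(p *pod, newImage, emptyImage string, migrate func(from, to int)) {
	// Find which slot currently holds the Empty container.
	empty, working := 0, 1
	if p.images[1] == emptyImage {
		empty, working = 1, 0
	}
	// 1. Upgrade: the Empty slot receives the new sidecar image.
	p.images[empty] = newImage
	// 2. Migration: the new container takes over from the old one.
	migrate(working, empty)
	// 3. Reset: the old slot becomes the Empty container.
	p.images[working] = emptyImage
}

func main() {
	p := &pod{images: [2]string{"envoy:1.16.0", "empty:1.0"}}
	logMigrate := func(from, to int) {
		fmt.Printf("migrating from slot %d to slot %d\n", from+1, to+1)
	}
	hotUpgrade(p, "envoy:1.17.0", "empty:1.0", logMigrate)
	fmt.Println(p.images)
	// A second hot upgrade repeats the same steps with the roles swapped.
	hotUpgrade(p, "envoy:1.18.0", "empty:1.0", logMigrate)
	fmt.Println(p.images)
}
```

Note that the working container alternates between the two slots on each upgrade, which is why both slots must share the same configuration apart from the image.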

Migration core logic

The SidecarSet hot upgrade mechanism not only switches the Mesh containers, but also provides a coordination mechanism between the old and new versions (PostStartHook). This is only the first step, however. The Mesh container must also provide a PostStartHook script that completes the smooth upgrade of the Mesh service itself (the Migration step above), for example Envoy hot restart or Mosn lossless restart.

Mesh containers generally provide services externally by listening on fixed ports. The migration process of a Mesh container can be summarized as: passing the ListenFD through a Unix Domain Socket (UDS), stopping Accept, and starting to drain. Mesh containers that do not support hot restart can follow this process to add that support. The logic diagram is as follows:
4-4.png
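
For an HTTP service written in Go, the "stop Accept, then drain" part of this process can be approximated with the standard library's `http.Server.Shutdown`, which closes the server's listeners and then waits for in-flight requests to finish. The sketch below is an assumption about how a Go-based sidecar might drain, not a description of how Envoy or Mosn do it; `drainDemo` and the 100 ms handler delay are made up for illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

type result struct {
	body string
	err  error
}

// drainDemo starts a server, begins one slow request, then drains the server
// while that request is still in flight. It returns the response body.
func drainDemo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	started := make(chan struct{})
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started)                     // the request is now in flight
		time.Sleep(100 * time.Millisecond) // simulate slow work
		fmt.Fprint(w, "finished during drain")
	})}
	go srv.Serve(ln)

	done := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String())
		if err != nil {
			done <- result{"", err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		done <- result{string(b), err}
	}()

	<-started
	// Stop accepting new connections and wait for the in-flight request.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return "", err
	}
	r := <-done
	return r.body, r.err
}

func main() {
	fmt.Println(drainDemo())
}
```

In a hot upgrade, the old sidecar would call something like this only after the new sidecar holds its own copy of the listening socket, so that closing the old listener does not close the port.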

Hot upgrade Migration Demo

Different Mesh containers provide different external services and have different internal implementations, so the specific migration logic also differs. The logic above only summarizes the main points, and we hope it is useful to those who need it. We also provide a hot upgrade Migration Demo on GitHub for reference. Some of its key code is introduced below.

1. Negotiation mechanism

When a Mesh container starts, it first has to determine whether this is a first start or the smooth migration phase of a hot upgrade. To reduce the communication cost for the Mesh container, Kruise injects two environment variables, SIDECARSET_VERSION and SIDECARSET_VERSION_ALT, into both sidecar containers. By comparing the values of these two environment variables, the container determines whether a hot upgrade is in progress and whether the current sidecar container is the new version or the old version.

import (
    "os"
    "strconv"
)

// isHotUpgradeProcess returns two values:
// 1. (bool) whether a hot upgrade is in progress
// 2. (bool) when a hot upgrade is in progress, whether the current sidecar is the newer version
func isHotUpgradeProcess() (bool, bool) {
    // Version of the current sidecar container
    version := os.Getenv("SIDECARSET_VERSION")
    // Version of the peer sidecar container
    versionAlt := os.Getenv("SIDECARSET_VERSION_ALT")
    // If the peer sidecar container's version is "0", no hot upgrade is in progress
    if versionAlt == "0" {
        return false, false
    }
    // A hot upgrade is in progress
    versionInt, _ := strconv.Atoi(version)
    versionAltInt, _ := strconv.Atoi(versionAlt)
    // version is a monotonically increasing int, so the newer version has the larger value
    return true, versionInt > versionAltInt
}
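
The function above ignores conversion errors. As a hedged variant, the sketch below separates the decision from the environment lookup so that it can be tested, and it reports malformed values instead of silently treating them as 0. The `role` type and the `decideRole` function are hypothetical names; only the two environment variables and the meaning of "0" come from the text.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// role describes what the current sidecar container should do at startup.
type role string

const (
	roleFirstStart role = "first-start" // no hot upgrade in progress
	roleNew        role = "new"         // hot upgrade, this container is the newer version
	roleOld        role = "old"         // hot upgrade, this container is the older version
)

// decideRole applies the negotiation rule to the two version strings.
func decideRole(version, versionAlt string) (role, error) {
	// A peer version of "0" means no hot upgrade is in progress.
	if versionAlt == "0" {
		return roleFirstStart, nil
	}
	v, err := strconv.Atoi(version)
	if err != nil {
		return "", fmt.Errorf("invalid SIDECARSET_VERSION %q: %w", version, err)
	}
	alt, err := strconv.Atoi(versionAlt)
	if err != nil {
		return "", fmt.Errorf("invalid SIDECARSET_VERSION_ALT %q: %w", versionAlt, err)
	}
	// Versions increase monotonically, so the larger value is the newer one.
	if v > alt {
		return roleNew, nil
	}
	return roleOld, nil
}

func main() {
	r, err := decideRole(os.Getenv("SIDECARSET_VERSION"), os.Getenv("SIDECARSET_VERSION_ALT"))
	if err != nil {
		fmt.Println("cannot decide role:", err)
		return
	}
	fmt.Println("role:", r)
}
```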

2. ListenFD migration

The ListenFD is migrated between the different containers through a Unix Domain Socket. This is also a critical step in the hot upgrade. The code example is as follows:

// For brevity, errors are not handled in these fragments.
// Required imports: net, net/http, os, syscall

/* The old sidecar migrates the ListenFD to the new sidecar through a Unix Domain Socket */
// tcpLn *net.TCPListener
f, _ := tcpLn.File()
fdnum := f.Fd()
data := syscall.UnixRights(int(fdnum))
// Connect to the new sidecar container through the Unix Domain Socket
raddr, _ := net.ResolveUnixAddr("unix", "/dev/shm/migrate.sock")
uds, _ := net.DialUnix("unix", nil, raddr)
// Send the ListenFD to the new sidecar container over the UDS
uds.WriteMsgUnix(nil, data, nil)
// Stop accepting new requests and start draining, for example: http2 GOAWAY
tcpLn.Close()

/* The new sidecar receives the ListenFD and starts serving */
// Listen on the UDS
addr, _ := net.ResolveUnixAddr("unix", "/dev/shm/migrate.sock")
unixLn, _ := net.ListenUnix("unix", addr)
conn, _ := unixLn.AcceptUnix()
buf := make([]byte, 32)
oob := make([]byte, 32)
// Receive the ListenFD
_, oobn, _, _, _ := conn.ReadMsgUnix(buf, oob)
scms, _ := syscall.ParseSocketControlMessage(oob[:oobn])
if len(scms) > 0 {
    // Parse the FD and convert it to a *net.TCPListener
    fds, _ := syscall.ParseUnixRights(&(scms[0]))
    f := os.NewFile(uintptr(fds[0]), "")
    ln, _ := net.FileListener(f)
    tcpLn, _ := ln.(*net.TCPListener)
    // Start serving on the received listener, using an HTTP service as an example
    http.Serve(tcpLn, serveMux)
}
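
The two fragments above run in different containers. To try the technique on a single Unix machine, the following sketch runs both roles in one process, adds error handling, and uses a temporary socket path instead of /dev/shm/migrate.sock. The one-byte acknowledgement before the old listener is closed is an addition of this sketch and is not part of the demo described in the text; the function names `sendListener`, `receiveListener`, and `migrate` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"syscall"
)

// sendListener plays the old sidecar: it passes the listening FD over the
// UDS, waits for a one-byte acknowledgement, then stops accepting.
func sendListener(sockPath string, tcpLn *net.TCPListener) error {
	f, err := tcpLn.File() // a duplicate of the listening FD
	if err != nil {
		return err
	}
	defer f.Close()
	uds, err := net.DialUnix("unix", nil, &net.UnixAddr{Name: sockPath, Net: "unix"})
	if err != nil {
		return err
	}
	defer uds.Close()
	rights := syscall.UnixRights(int(f.Fd()))
	if _, _, err := uds.WriteMsgUnix(nil, rights, nil); err != nil {
		return err
	}
	if _, err := uds.Read(make([]byte, 1)); err != nil { // wait for the ack
		return err
	}
	return tcpLn.Close() // a real proxy would start draining here
}

// receiveListener plays the new sidecar: it accepts one UDS connection and
// rebuilds a net.Listener from the FD it receives.
func receiveListener(unixLn *net.UnixListener) (net.Listener, error) {
	conn, err := unixLn.AcceptUnix()
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 4-byte FD
	_, oobn, _, _, err := conn.ReadMsgUnix(make([]byte, 1), oob)
	if err != nil {
		return nil, err
	}
	scms, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil || len(scms) == 0 {
		return nil, errors.New("no control message received")
	}
	fds, err := syscall.ParseUnixRights(&scms[0])
	if err != nil || len(fds) == 0 {
		return nil, errors.New("no file descriptor received")
	}
	f := os.NewFile(uintptr(fds[0]), "migrated-listener")
	defer f.Close() // net.FileListener keeps its own duplicate
	ln, err := net.FileListener(f)
	if err != nil {
		return nil, err
	}
	_, err = conn.Write([]byte{1}) // acknowledge the handover
	return ln, err
}

// migrate runs both roles in one process and returns the listener address
// before and after the handover; the two should be identical.
func migrate() (string, string, error) {
	sockPath := filepath.Join(os.TempDir(), fmt.Sprintf("migrate-%d.sock", os.Getpid()))
	unixLn, err := net.ListenUnix("unix", &net.UnixAddr{Name: sockPath, Net: "unix"})
	if err != nil {
		return "", "", err
	}
	defer unixLn.Close() // also removes the socket file
	oldLn, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	before := oldLn.Addr().String()
	sendErr := make(chan error, 1)
	go func() { sendErr <- sendListener(sockPath, oldLn.(*net.TCPListener)) }()
	newLn, err := receiveListener(unixLn)
	if err == nil {
		err = <-sendErr
	}
	if err != nil {
		return "", "", err
	}
	defer newLn.Close()
	return before, newLn.Addr().String(), nil
}

func main() {
	before, after, err := migrate()
	fmt.Println(before == after, err)
}
```

Because the received descriptor refers to the same listening socket, the port never closes during the handover, and connections that arrive in between simply wait in the accept queue.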

Known Mesh container hot upgrade cases

Alibaba Cloud Service Mesh (ASM) provides a fully managed service mesh platform that is compatible with the community's open source Istio service mesh. Based on the hot upgrade capability of OpenKruise SidecarSet, ASM has implemented hot upgrade of the data plane Sidecar (Beta), so users can upgrade the data plane version of the service mesh without their applications being aware of it. The official version will also be launched soon. In addition to hot upgrade, ASM supports configuration diagnosis, operation audit, access logs, monitoring, service registry integration, and other capabilities that improve the service mesh experience across the board. You are welcome to try it out.

Summary

The hot upgrade of Mesh containers in cloud native environments has long been an urgent but thorny problem. The solution in this article is only one exploration of the problem by Alibaba Group. In sharing it with the community, we also hope it will prompt more thinking about this scenario. We welcome more people to join the OpenKruise community and jointly build richer and more complete Kubernetes application management and delivery extension capabilities for larger-scale, more complex, and more performance-critical scenarios.


