How does K8s provide more efficient and stable orchestration capabilities? Analysis of K8s Watch Implementation Mechanism

author

Wang Cheng, R&D engineer at Tencent Cloud and Kubernetes member, works on database product containerization and resource management and control, with a focus on Kubernetes, Go, and cloud native technologies.

content

1. Overview

2. Start with HTTP

2.1 Content-Length

2.2 Chunked Transfer Encoding

2.3 HTTP/2

3. APIServer start

4. ETCD resource encapsulation

5. Client Watch Implementation

6. Server Watch Implementation

7. Summary

Overview

Step into the world of K8s and you will find that almost every object is abstracted as a resource, including K8s core resources (Pod, Service, Namespace, etc.), CRDs, and resource types extended through APIService. Underneath, K8s uniformly abstracts these resources as RESTful storage (Storage) and exposes operations on them (get/post/put/patch/delete, etc.) in a RESTful style.

The K8s Watch API is a mechanism for continuously monitoring resource changes. When a resource changes, the change is delivered to the client in real time, in order, and reliably, so that users can flexibly act on the target resource.

How is the K8s Watch mechanism implemented? What technologies does the bottom layer depend on?

This article analyzes the implementation mechanism of K8s Watch from five angles: the HTTP protocol, APIServer startup, ETCD Watch encapsulation, the server-side Watch implementation, and the client-side Watch implementation.

[Figure: overview of the K8s Watch process]

This article and subsequent related articles are based on K8s v1.23.

Start with HTTP

Content-Length

As the example below shows, an HTTP request or server response carries a Content-Length header that states the total length of the body being transmitted. If Content-Length does not match the actual length, an error occurs: a value larger than the actual length leaves the receiver waiting until it times out, and a value smaller than the actual length truncates the body and may corrupt the parsing of subsequent data.

curl baidu.com -v

> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: baidu.com
> Accept: */*

< HTTP/1.1 200 OK
< Date: Thu, 17 Mar 2022 04:15:25 GMT
< Server: Apache
< Last-Modified: Tue, 12 Jan 2010 13:48:00 GMT
< ETag: "51-47cf7e6ee8400"
< Accept-Ranges: bytes
< Content-Length: 81
< Cache-Control: max-age=86400
< Expires: Fri, 18 Mar 2022 04:15:25 GMT
< Connection: Keep-Alive
< Content-Type: text/html

<html>
<meta http-equiv="refresh" content="0;url=http://www.baidu.com/">
</html>
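
The effect of a mismatched Content-Length can be reproduced without a network. The following is a minimal sketch, assuming an in-memory response parsed with Go's net/http; `readBody` is a hypothetical helper introduced only for this illustration. On a live connection the "too long" case would block until a timeout, while here the in-memory reader simply runs out of data and reports an error.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// readBody (hypothetical helper) parses a raw HTTP/1.1 response and reads
// its body the way a client does: bounded by the declared Content-Length.
func readBody(raw string) (string, error) {
	resp, err := http.ReadResponse(bufio.NewReader(strings.NewReader(raw)), nil)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// Declared length is shorter than the payload: the body is truncated.
	short := "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello world"
	body, err := readBody(short)
	fmt.Printf("body=%q err=%v\n", body, err)

	// Declared length is longer than the payload: the reader runs out of
	// data before the promised number of bytes has arrived.
	long := "HTTP/1.1 200 OK\r\nContent-Length: 50\r\n\r\nhello"
	body, err = readBody(long)
	fmt.Printf("body=%q err=%v\n", body, err)
}
```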

What if the server does not know the total length of the data to be transmitted in advance?

Chunked Transfer Encoding

HTTP/1.1 introduced Chunked Transfer Encoding, which splits the data into a series of chunks and sends them one after another, so the server can start sending without knowing the total size of the content in advance. Each chunk begins with its length in hexadecimal followed by \r\n, then the chunk data itself followed by \r\n. The terminating chunk is a chunk of length 0.

> GET /test HTTP/1.1
> Host: baidu.com
> Accept-Encoding: gzip

< HTTP/1.1 200 OK
< Server: Apache
< Date: Sun, 03 May 2015 17:25:23 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: keep-alive
< Content-Encoding: gzip

4\r\n        (bytes to send)
Wiki\r\n     (data)
6\r\n        (bytes to send)
pedia \r\n   (data)
E\r\n        (bytes to send)
in \r\n
\r\n
chunks.\r\n  (data)
0\r\n        (final byte - 0)
\r\n         (end message)
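
The framing above can be sketched in a few lines of Go. This is an illustration only: `encodeChunked` and `decodeChunked` are hypothetical helpers, the encoder is hand-written to mirror the wire format shown above, and decoding is delegated to `httputil.NewChunkedReader` from the standard library. Trailers are not handled.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http/httputil"
)

// encodeChunked frames each non-empty part as "<hex length>\r\n<data>\r\n"
// and terminates the stream with a zero-length chunk.
func encodeChunked(parts []string) []byte {
	var buf bytes.Buffer
	for _, p := range parts {
		fmt.Fprintf(&buf, "%x\r\n%s\r\n", len(p), p)
	}
	buf.WriteString("0\r\n\r\n")
	return buf.Bytes()
}

// decodeChunked reassembles the payload with the standard library decoder,
// which stops when it reads the zero-length chunk.
func decodeChunked(wire []byte) (string, error) {
	body, err := io.ReadAll(httputil.NewChunkedReader(bytes.NewReader(wire)))
	return string(body), err
}

func main() {
	wire := encodeChunked([]string{"Wiki", "pedia ", "in \r\n\r\nchunks."})
	fmt.Printf("wire: %q\n", wire)

	body, err := decodeChunked(wire)
	fmt.Printf("body: %q err=%v\n", body, err)
}
```

The third chunk is 14 bytes long (hexadecimal e) because the CRLF pairs inside it are counted as payload, which is why the length prefix rather than a delimiter marks chunk boundaries.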

To watch server-side resource changes as a stream, an HTTP/1.1 server sets Transfer-Encoding: chunked in the response header and then sends the data chunk by chunk until it sends a chunk of size 0.

HTTP/2

HTTP/2 does not use Chunked Transfer Encoding for streaming. Instead it introduces the frame (Frame) as the unit of transmission, which completely changes how data is encoded and decoded and makes the protocol resemble many RPC protocols. A frame is binary-encoded: bytes at fixed positions in the frame header give the length of the payload, and the receiver keeps reading frames until one has the END_STREAM flag set in Flags. This design lets the server send data on a stream without first telling the client to switch transfer modes.

+-----------------------------------------------+
|                 Body Length (24)              |  ---- Frame Header
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------------------+
|R|                 Stream Identifier (31)                      |
+=+=============================================================+
|                   Frame Payload (0...)                      ...  ---- Frame Data
+---------------------------------------------------------------+
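
The 9-byte header in the diagram can be decoded by hand. The sketch below is an illustration under stated assumptions: the `frameHeader` type and `parseFrameHeader` function are hypothetical names, not part of any K8s or Go HTTP/2 package, and only the header is parsed, not the payload.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// flagEndStream is the END_STREAM bit, defined for DATA and HEADERS frames.
const flagEndStream = 0x1

// frameHeader holds the fields of the fixed 9-byte HTTP/2 frame header.
type frameHeader struct {
	Length   uint32 // 24-bit payload length
	Type     uint8
	Flags    uint8
	StreamID uint32 // 31 bits; the reserved bit R is masked off
}

// parseFrameHeader decodes the first 9 bytes of b.
func parseFrameHeader(b []byte) (frameHeader, error) {
	if len(b) < 9 {
		return frameHeader{}, errors.New("frame header needs 9 bytes")
	}
	return frameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff,
	}, nil
}

// EndStream reports whether the sender has finished this stream.
func (h frameHeader) EndStream() bool { return h.Flags&flagEndStream != 0 }

func main() {
	// A DATA frame (type 0x0) with a 16-byte payload on stream 1,
	// with END_STREAM set.
	raw := []byte{0x00, 0x00, 0x10, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01}
	h, err := parseFrameHeader(raw)
	fmt.Printf("%+v endStream=%v err=%v\n", h, h.EndStream(), err)
}
```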

To take full advantage of HTTP/2 stream features such as Server Push and multiplexing, K8s negotiates between HTTP/1.1 and HTTP/2 through ALPN (Application-Layer Protocol Negotiation) when serving RESTful Watch requests. HTTP/2 is preferred. The negotiation looks like this:

curl https://{kube-apiserver}/api/v1/watch/namespaces/default/pods/mysql-0 -v

* ALPN, offering h2
* ALPN, offering http/1.1
* SSL verify...
* ALPN, server accepted to use h2
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7f2b921a6a90)
> GET /api/v1/watch/namespaces/default/pods/mysql-0 HTTP/2
> Host: 9.165.12.1
> user-agent: curl/7.79.1
> accept: */*
> authorization: Bearer xxx
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!

< HTTP/2 200 
< cache-control: no-cache, private
< content-type: application/json
< date: Thu, 17 Mar 2022 04:46:36 GMT

{"type":"ADDED","object":{"kind":"Pod","apiVersion":"v1","metadata":xxx}}

APIServer start

APIServer starts through a Cobra command. It parses the command-line flags, runs Complete (fill in defaults) -> Validate (check the options), and then starts the service through Run. The startup entry point is as follows:

// kubernetes/cmd/kube-apiserver/app/server.go
// NewAPIServerCommand creates a *cobra.Command object with default parameters
func NewAPIServerCommand() *cobra.Command {
   s := options.NewServerRunOptions()
   cmd := &cobra.Command{
      Use: "kube-apiserver",
      ...
      RunE: func(cmd *cobra.Command, args []string) error {
         ...
         // set default options
         completedOptions, err := Complete(s)
         if err != nil {
            return err
         }

         // validate options
         if errs := completedOptions.Validate(); len(errs) != 0 {
            return utilerrors.NewAggregate(errs)
         }

         return Run(completedOptions, genericapiserver.SetupSignalHandler())
      },
   }
   ...

   return cmd
}
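
The Complete -> Validate -> Run sequence is a general pattern that can be shown without Cobra. The sketch below is a simplified illustration, not the kube-apiserver code: the `options` fields, the default values, and all function names are assumptions chosen for the example.

```go
package main

import "fmt"

// options holds raw flag values (hypothetical fields for illustration).
type options struct {
	BindAddress string
	SecurePort  int
}

// completedOptions is a distinct type so that run can only be called with
// options that have already passed through complete.
type completedOptions struct{ options }

// complete fills in defaults for anything the user left unset.
func complete(o options) completedOptions {
	if o.BindAddress == "" {
		o.BindAddress = "0.0.0.0"
	}
	if o.SecurePort == 0 {
		o.SecurePort = 6443
	}
	return completedOptions{o}
}

// validate collects every problem instead of stopping at the first one.
func (c completedOptions) validate() []error {
	var errs []error
	if c.SecurePort < 1 || c.SecurePort > 65535 {
		errs = append(errs, fmt.Errorf("secure port %d is out of range", c.SecurePort))
	}
	return errs
}

// run stands in for starting the server.
func run(c completedOptions) error {
	fmt.Printf("serving on %s:%d\n", c.BindAddress, c.SecurePort)
	return nil
}

// start chains the three phases in the same order as the RunE function above.
func start(o options) error {
	c := complete(o)
	if errs := c.validate(); len(errs) != 0 {
		return fmt.Errorf("invalid options: %v", errs)
	}
	return run(c)
}

func main() {
	fmt.Println("err =", start(options{}))
	fmt.Println("err =", start(options{SecurePort: 70000}))
}
```

Wrapping the options in a separate completed type is one way to let the compiler enforce the ordering of the phases.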

In the Run function, the APIServer chain (APIExtensionsServer, KubeAPIServer, and AggregatorServer) is initialized in that order. The three servers handle requests for CRDs (user-defined resources), the K8s API (built-in resources), and API Service (aggregated API resources) respectively. The relevant code is as follows:

// kubernetes/cmd/kube-apiserver/app/server.go
// CreateServerChain creates the APIServer chain (APIExtensionsServer, KubeAPIServer, AggregatorServer), which serve CRDs, the K8s API, and API Service respectively
func CreateServerChain(completedOptions completedServerRunOptions, stopCh <-chan struct{}) (*aggregatorapiserver.APIAggregator, error) {
   // Create the generic APIServer configuration
   kubeAPIServerConfig, serviceResolver, pluginInitializer, err := CreateKubeAPIServerConfig(completedOptions)
   if err != nil {
      return nil, err
   }
   ...

   // First: create APIExtensionsServer
   apiExtensionsServer, err := createAPIExtensionsServer(apiExtensionsConfig, genericapiserver.NewEmptyDelegateWithCustomHandler(notFoundHandler))
   if err != nil {
      return nil, err
   }

   // Second: create KubeAPIServer
   kubeAPIServer, err := CreateKubeAPIServer(kubeAPIServerConfig, apiExtensionsServer.GenericAPIServer)
   if err != nil {
      return nil, err
   }
   ...

   // Third: create AggregatorServer
   aggregatorServer, err := createAggregatorServer(aggregatorConfig, kubeAPIServer.GenericAPIServer, apiExtensionsServer.Informers)
   if err != nil {
      // we don't need special handling for innerStopCh because the aggregator server doesn't create any go routines
      return nil, err
   }

   return aggregatorServer, nil
}
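
In the code above each server receives the previously created one as its delegate, so a request enters at AggregatorServer and falls through to KubeAPIServer, then APIExtensionsServer, then the not-found handler. The sketch below illustrates only this delegation idea with plain net/http handlers. The path prefixes and the `newServer` helper are assumptions for the example, and the real servers decide what to handle in a more involved way than a prefix match.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newServer returns a handler that answers requests under prefix itself
// and passes everything else on to its delegate.
func newServer(name, prefix string, delegate http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, prefix) {
			fmt.Fprint(w, name)
			return
		}
		delegate.ServeHTTP(w, r)
	})
}

// buildChain creates the servers in the same order as CreateServerChain:
// each new server wraps the one created before it.
func buildChain() http.Handler {
	notFound := http.NotFoundHandler()
	extensions := newServer("apiextensions", "/apis/example.com/", notFound)
	kube := newServer("kube-apiserver", "/api/", extensions)
	return newServer("aggregator", "/apis/metrics.example.io/", kube)
}

// serve sends one GET request through the chain and returns the result.
func serve(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	chain := buildChain()
	for _, p := range []string{"/api/v1/pods", "/apis/example.com/v1/foos", "/apis/metrics.example.io/v1", "/unknown"} {
		code, body := serve(chain, p)
		fmt.Printf("%-30s %d %q\n", p, code, body)
	}
}
```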

After that, the server starts SecureServingInfo.Serve in non-blocking mode (NonBlockingRun), configures the HTTP/2 transport options (HTTP/2 is enabled by default), and finally calls Serve to listen for client requests.

For security reasons, K8s APIServer only accepts HTTPS requests from clients, not plain HTTP.

ETCD resource encapsulation

The ETCD Watch mechanism changed in the move from ETCD2 to ETCD3. ETCD2 monitors resource change events through long polling; ETCD3 implements a Watch stream over HTTP/2-based gRPC, which improves performance considerably.

Polling: HTTP/1.x has no server push mechanism, so the simplest way to watch for changes on the server is for the client to pull: the client synchronizes with the server at regular intervals, whether or not the data on the server has changed. This inevitably leads to delayed notifications and a large number of wasted polls.
Long-Polling: An optimization of polling. When the client issues a long-polling request and the server has no relevant data, the server holds the request and returns only when it has data to send or the request times out.

When APIServerConfig was built in the previous step, the ETCD used for underlying storage was encapsulated as well. We take kubeAPIServerConfig as an example to show how the built-in K8s resources wrap the underlying ETCD storage.

First, buildGenericConfig instantiates a RESTOptionsGetter, which is used to encapsulate RESTStorage. Then InstallLegacyAPI -> NewLegacyRESTStorage instantiates the RESTStorage of the built-in K8s resources, including podStorage, nsStorage, pvStorage, serviceStorage, and so on. APIServer calls these as the back-end resource storage when it handles client resource requests.

The source code of InstallLegacyAPI is as follows:

// kubernetes/pkg/controlplane/instance.go
// InstallLegacyAPI registers the built-in K8s resources and wraps them in the corresponding RESTStorage (e.g. podStorage/pvStorage)
func (m *Instance) InstallLegacyAPI(c *completedConfig, restOptionsGetter generic.RESTOptionsGetter) error {
   ...
   legacyRESTStorage, apiGroupInfo, err := legacyRESTStorageProvider.NewLegacyRESTStorage(c.ExtraConfig.APIResourceConfigSource, restOptionsGetter)
   if err != nil {
      return fmt.Errorf("error building core storage: %v", err)
   }
   if len(apiGroupInfo.VersionedResourcesStorageMap) == 0 { // if all core storage is disabled, return.
      return nil
   }

   controllerName := "bootstrap-controller"
   coreClient := corev1client.NewForConfigOrDie(c.GenericConfig.LoopbackClientConfig)
   bootstrapController, err := c.NewBootstrapController(legacyRESTStorage, coreClient, coreClient, coreClient, coreClient.RESTClient())
   if err != nil {
      return fmt.Errorf("error creating bootstrap controller: %v", err)
   }
   m.GenericAPIServer.AddPostStartHookOrDie(controllerName, bootstrapController.PostStartHook)
   m.GenericAPIServer.AddPreShutdownHookOrDie(controllerName, bootstrapController.PreShutdownHook)

   ...
   return nil
}

When the underlying ETCD storage is instantiated, the EnableWatchCache switch controls whether the Watch cache is enabled. If it is enabled, requests go through the StorageWithCacher logic first, and UndecoratedStorage is what actually calls the underlying ETCD3 storage.

K8s currently supports only ETCD3; ETCD2 is no longer supported. K8s relies entirely on the ETCD3 Watch mechanism to keep resource state consistent with the underlying ETCD storage.

[Figure: the overall call flow of ETCD resource encapsulation]

All K8s resources (CRD/Core/Aggregator) expose RESTful HTTP interfaces and support multiple serialization formats, such as json/yaml/protobuf.

Client Watch Implementation

After the steps above, APIServer has prepared the RESTStorage of every K8s resource (wrapping ETCD3 underneath). Clients can now send resource requests to APIServer through the RESTful HTTP interface, including operations such as GET/POST/PATCH/WATCH/DELETE.

Client Watch includes:
(1). kubectl get xxx -w, which fetches a type of resource and keeps monitoring it for changes;
(2). Reflector in client-go, which uses ListAndWatch against the various APIServer resources.

We take kubectl get pod -w as an example to show how the client implements the watch operation on resources.

First, kubectl also parses its parameters (--watch or --watch-only) through a Cobra command. Run then calls the Watch interface in the cli-runtime package, which sends a Watch request to APIServer through RESTClient.Watch and gets back a streaming watch.Interface. kubectl keeps receiving watch.Event values from its ResultChan. Based on the codec type the client requested (json/yaml/protobuf), it reads the stream frame by frame (Frame), decodes (Decode) the data, and prints the output to the terminal.

The client initiates a Watch request through RESTClient, the code is as follows:

// kubernetes/staging/src/k8s.io/cli-runtime/pkg/resource/helper.go
func (m *Helper) Watch(namespace, apiVersion string, options *metav1.ListOptions) (watch.Interface, error) {
   options.Watch = true
   return m.RESTClient.Get().
      NamespaceIfScoped(namespace, m.NamespaceScoped).
      Resource(m.Resource).
      VersionedParams(options, metav1.ParameterCodec).
      Watch(context.TODO())
}
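
Once the caller holds a watch.Interface, consuming it is a loop over ResultChan. The sketch below shows that loop with local stand-in types so that it runs without the Kubernetes modules. The `watcher` interface copies the two method names of watch.Interface (Stop and ResultChan); the `event` struct, `fakeWatcher`, and `printEvents` are hypothetical and greatly simplified.

```go
package main

import "fmt"

// event is a simplified stand-in for watch.Event.
type event struct {
	Type   string
	Object string
}

// watcher mirrors the two methods of watch.Interface.
type watcher interface {
	Stop()
	ResultChan() <-chan event
}

// fakeWatcher replays a fixed list of events and then closes its channel,
// which is how the end of a watch appears to the consumer.
type fakeWatcher struct{ ch chan event }

func newFakeWatcher(events ...event) *fakeWatcher {
	ch := make(chan event, len(events))
	for _, e := range events {
		ch <- e
	}
	close(ch)
	return &fakeWatcher{ch: ch}
}

func (f *fakeWatcher) Stop()                    {}
func (f *fakeWatcher) ResultChan() <-chan event { return f.ch }

// printEvents drains the watcher until the result channel is closed and
// returns one formatted line per event.
func printEvents(w watcher) []string {
	defer w.Stop()
	var lines []string
	for ev := range w.ResultChan() {
		lines = append(lines, fmt.Sprintf("%s %s", ev.Type, ev.Object))
	}
	return lines
}

func main() {
	w := newFakeWatcher(
		event{"ADDED", "pod/mysql-0"},
		event{"MODIFIED", "pod/mysql-0"},
	)
	for _, line := range printEvents(w) {
		fmt.Println(line)
	}
}
```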

[Figure: summary of the client Watch implementation process]

Server Watch Implementation

Once APIServer is running, it continuously monitors change events for all kinds of resources. When it receives a Watch request for a type of resource, it calls the Watch interface of RESTStorage. The EnableWatchCache switch controls whether the Watch cache is used, and the underlying ETCD change events are ultimately obtained through the etcd3.Watch wrapper.

RESTStorage is the ETCD resource storage that was registered and encapsulated in advance when APIServer started.

etcd3.watcher converts low-level etcd events into watch.Event values through two channels (incomingEventChan and resultChan, each with a default capacity of 100). serveWatch then streams from the returned watch.Interface, continuously taking change events out of resultChan. Based on the codec type the client requested (json/yaml/protobuf), the server encodes (Encode) the data, assembles it frame by frame (Frame), and sends it to the client on the stream.
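
The two-channel design can be sketched as a small pipeline: one goroutine feeds raw events into a buffered channel, and a second goroutine transforms them and forwards the results. This is an illustration under stated assumptions, not the etcd3 watcher code. The `etcdEvent` and `watchEvent` types, the `transform` rules, and `startPipeline` are simplified stand-ins; only the channel names' roles and the buffer size of 100 come from the text above.

```go
package main

import "fmt"

// etcdEvent is a simplified stand-in for a low-level etcd event.
type etcdEvent struct {
	key       string
	isCreated bool
	isDeleted bool
}

// watchEvent is a simplified stand-in for watch.Event.
type watchEvent struct {
	Type string
	Key  string
}

// bufSize matches the default channel capacity mentioned in the text.
const bufSize = 100

// transform maps a low-level event to an API-level event type.
func transform(e etcdEvent) watchEvent {
	switch {
	case e.isDeleted:
		return watchEvent{"DELETED", e.key}
	case e.isCreated:
		return watchEvent{"ADDED", e.key}
	default:
		return watchEvent{"MODIFIED", e.key}
	}
}

// startPipeline wires the two channels together and returns the result side.
func startPipeline(src []etcdEvent) <-chan watchEvent {
	incoming := make(chan etcdEvent, bufSize)
	result := make(chan watchEvent, bufSize)

	// Stands in for the goroutine that receives events from etcd.
	go func() {
		defer close(incoming)
		for _, e := range src {
			incoming <- e
		}
	}()

	// Converts each incoming event and forwards it to the consumer.
	go func() {
		defer close(result)
		for e := range incoming {
			result <- transform(e)
		}
	}()
	return result
}

func main() {
	events := []etcdEvent{
		{key: "/pods/default/mysql-0", isCreated: true},
		{key: "/pods/default/mysql-0"},
		{key: "/pods/default/mysql-0", isDeleted: true},
	}
	for ev := range startPipeline(events) {
		fmt.Println(ev.Type, ev.Key)
	}
}
```

Buffering both channels lets the receiving side keep up with short bursts from the store even when a client is momentarily slow to read.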

The server streams from the returned watch.Interface through serveWatch. The code is as follows:

// kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/get.go
func ListResource(r rest.Lister, rw rest.Watcher, scope *RequestScope, forceWatch bool, minRequestTimeout time.Duration) http.HandlerFunc {
   return func(w http.ResponseWriter, req *http.Request) {
      ...

      if opts.Watch || forceWatch {
         ...
         watcher, err := rw.Watch(ctx, &opts)
         if err != nil {
            scope.err(err, w, req)
            return
         }
         requestInfo, _ := request.RequestInfoFrom(ctx)
         metrics.RecordLongRunning(req, requestInfo, metrics.APIServerComponent, func() {
            serveWatch(watcher, scope, outputMediaType, req, w, timeout)
         })
         return
      }
      ...
   }
}

K8s deprecated the WATCH/WATCHLIST action.Verb types in v1.11; these requests are now handled through LIST -> restfulListResource.

[Figure: summary of the server-side Watch implementation process]

In addition to HTTP/2, APIServer also supports WebSocket. When the client request contains Upgrade: websocket and Connection: Upgrade, the server exchanges data with the client over WebSocket.

It is worth noting that the underlying ETCD events are converted to watch.Event through the transform function. The watch package defines the following event types (Type): Added, Modified, Deleted, Bookmark, and Error.

Summary

This article analyzed the implementation mechanism of K8s Watch by walking through the core processes of APIServer startup, ETCD Watch encapsulation, the server-side Watch implementation, and the client-side Watch implementation. The relevant logic was explained with source code and diagrams to give a clearer picture of the implementation details of K8s Watch.

Underneath, K8s relies entirely on ETCD (ListAndWatch) and abstracts all kinds of resources as RESTful storage (Storage). It obtains change events for each resource through the Watch mechanism, distributes them through the Informer mechanism to the ResourceEventHandlers listening downstream, and finally the Controller carries out the business logic for the resource. As ETCD3 continues to be optimized and improved on top of HTTP/2, K8s will provide more efficient and stable orchestration capabilities.
