Pod In-Place Upgrade, Part 4: How to Tell When a Pod's In-Place Upgrade Has Completed

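In the previous part, we covered how the kubelet computes a hash of each container to decide whether that container needs to be upgraded.

This article explains how to determine that a Pod's in-place upgrade has completed.

Besides the Hash field, the kubelet's container status also records an ImageID field.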

// Status represents the status of a container.
type Status struct {
    // ID of the container.
    ID ContainerID
    // Name of the container.
    Name string
    // Status of the container.
    State State
    // Creation time of the container.
    CreatedAt time.Time
    // Start time of the container.
    StartedAt time.Time
    // Finish time of the container.
    FinishedAt time.Time
    // Exit code of the container.
    ExitCode int
    // Name of the image, this also includes the tag of the image,
    // the expected form is "NAME:TAG".
    Image string
    // ID of the image.
    ImageID string
    // Hash of the container, used for comparison.
    Hash uint64
    // Number of times that the container has been restarted.
    RestartCount int
    // A string explains why container is in such a status.
    Reason string
    // Message written by the container before exiting (stored in
    // TerminationMessagePath).
    Message string
}

For a more detailed discussion of ImageID, see the first article in this series.

Let's look at the full details of an arbitrary Pod:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-07-10T02:33:03Z"
  generateName: nginx-574b87c764-
  labels:
    app: nginx
    pod-template-hash: 574b87c764
  name: nginx-574b87c764-gf4tx
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: nginx-574b87c764
    uid: c0a0499b-808e-4aa9-a3c3-2ecdb823a19b
  resourceVersion: "2088142"
  selfLink: /api/v1/namespaces/default/pods/nginx-574b87c764-gf4tx
  uid: a5e4d703-3a01-4712-a64a-2814848d9dff
spec:
  containers:
  - image: nginx:1.14.2
    imagePullPolicy: IfNotPresent
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-lzpd4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-172-17-186-211.ap-south-1.compute.internal
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-lzpd4
    secret:
      defaultMode: 420
      secretName: default-token-lzpd4
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-10T02:33:03Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-07-10T02:33:06Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-07-10T02:33:06Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-07-10T02:33:03Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://918275efc39411f454b7dd8c7f1e3cbd0e6eba00370b0bcaaac0b1b9a5dd868d
    image: nginx:1.14.2
    imageID: docker-pullable://nginx@sha256:f7988fb6c02e0ce69257d9bd9cf37ae20a60f1df7563c3a2a6abe24160306b8d
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-07-10T02:33:05Z"
  hostIP: 172.17.186.211
  phase: Running
  podIP: 172.17.166.134
  podIPs:
  - ip: 172.17.166.134
  qosClass: BestEffort
  startTime: "2020-07-10T02:33:03Z"

As shown above, the container status reports imageID: docker-pullable://nginx@sha256:f7988fb6c02e0ce69257d9bd9cf37ae20a60f1df7563c3a2a6abe24160306b8d.
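To make the structure of that value concrete, here is a minimal sketch in Go that splits an imageID of the form shown above into its repository and digest parts. The function name ParseImageID is hypothetical, and the sketch assumes the `scheme://repo@digest` shape seen in this example; other container runtimes may report the value in a different shape, so treat the parsing as illustrative rather than as a stable contract.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseImageID splits an imageID such as
// "docker-pullable://nginx@sha256:..." into repository and digest.
// ParseImageID is a hypothetical helper, not part of any Kubernetes API.
// It returns ok == false when the value contains no "@digest" part.
func ParseImageID(imageID string) (repo, digest string, ok bool) {
	// Drop an optional scheme prefix such as "docker-pullable://".
	if i := strings.Index(imageID, "://"); i >= 0 {
		imageID = imageID[i+len("://"):]
	}
	// The digest follows the last "@".
	at := strings.LastIndex(imageID, "@")
	if at < 0 {
		return "", "", false
	}
	return imageID[:at], imageID[at+1:], true
}

func main() {
	id := "docker-pullable://nginx@sha256:f7988fb6c02e0ce69257d9bd9cf37ae20a60f1df7563c3a2a6abe24160306b8d"
	repo, digest, ok := ParseImageID(id)
	fmt.Println(repo, digest, ok)
}
```

The completion check described in this article only needs to compare the raw strings for inequality, so parsing is optional; it is mainly useful for logging or for comparing against an expected digest.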

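The detailed procedure is as follows:

1. Before updating the image of a container in the Pod, read that container's current ImageID.
2. Record the old ImageID on the Pod as an annotation.
3. Update the image in the Pod spec.
4. Periodically check whether the ImageID in the Pod's ContainerStatus for that container still equals the old ImageID. If it does, the update has not finished. If it differs, the in-place update has completed.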
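The procedure above can be sketched in Go as follows. This is a minimal illustration, not the implementation of any particular controller: the Pod and ContainerStatus types are trimmed-down stand-ins for the real API types, and the annotation key prefix and the function names RecordOldImageID and UpdateCompleted are assumptions made up for this example.

```go
package main

import "fmt"

// oldImageIDAnnotationPrefix is a hypothetical annotation key prefix;
// Kubernetes itself does not define it.
const oldImageIDAnnotationPrefix = "inplace-update.example.com/old-image-id-"

// ContainerStatus keeps only the fields this check needs.
type ContainerStatus struct {
	Name    string
	ImageID string
}

// Pod is a trimmed-down stand-in for the real Pod object.
type Pod struct {
	Annotations       map[string]string
	ContainerStatuses []ContainerStatus
}

// imageIDOf returns the ImageID currently reported for the named container.
func imageIDOf(pod *Pod, container string) (string, bool) {
	for _, cs := range pod.ContainerStatuses {
		if cs.Name == container {
			return cs.ImageID, true
		}
	}
	return "", false
}

// RecordOldImageID saves the current ImageID in an annotation.
// Call it before changing the image in the spec.
func RecordOldImageID(pod *Pod, container string) error {
	id, ok := imageIDOf(pod, container)
	if !ok {
		return fmt.Errorf("no status for container %q", container)
	}
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations[oldImageIDAnnotationPrefix+container] = id
	return nil
}

// UpdateCompleted reports whether the container now runs an image whose
// ImageID differs from the recorded one.
func UpdateCompleted(pod *Pod, container string) bool {
	old, recorded := pod.Annotations[oldImageIDAnnotationPrefix+container]
	if !recorded {
		return false
	}
	cur, ok := imageIDOf(pod, container)
	return ok && cur != "" && cur != old
}

func main() {
	pod := &Pod{ContainerStatuses: []ContainerStatus{{Name: "nginx", ImageID: "sha256:old"}}}
	if err := RecordOldImageID(pod, "nginx"); err != nil {
		panic(err)
	}
	fmt.Println(UpdateCompleted(pod, "nginx")) // status still shows the old image

	// Simulate the kubelet restarting the container with the new image.
	pod.ContainerStatuses[0].ImageID = "sha256:new"
	fmt.Println(UpdateCompleted(pod, "nginx"))
}
```

One limitation follows directly from the comparison: if the new image reference resolves to the same ImageID as the old one, the check never reports completion, so a real controller would likely need a timeout or an additional signal.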

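Conclusion

With that, we have covered the key pieces needed to implement in-place upgrades.

The overall in-place upgrade flow is roughly as follows: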

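1. Use a readiness gate and set its condition to false. The Pod is then no longer Ready, so its address is removed from the Endpoints and it stops receiving traffic.
2. Record the ImageID from the old container's ContainerStatus.
3. Change Spec.containers[i].image to the new image.
4. The kubelet computes the container's hash, finds that the desired container has changed, kills the old container, and starts a new one from the new image.
5. The controller periodically checks whether the ImageID in the new container's ContainerStatus equals the recorded value. If it differs, the container has been updated to the new image.
6. Set the readiness gate's condition back to true. If the container's readiness probe also passes, the Pod becomes Ready and starts serving traffic again.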
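The role of the readiness gate in this flow can be illustrated with a small Go sketch. It models the rule that a Pod is Ready only when its containers are ready and every condition listed in its readiness gates is "True"; a gate whose condition is missing counts as not satisfied. The types are simplified stand-ins for the real API objects, and the gate name `example.com/InPlaceUpdateReady` is a hypothetical condition type chosen for this example.

```go
package main

import "fmt"

const (
	containersReady = "ContainersReady"
	// inPlaceGate is a hypothetical readiness gate condition type.
	inPlaceGate = "example.com/InPlaceUpdateReady"
)

// Condition is a simplified Pod condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// Pod keeps only what the readiness evaluation needs.
type Pod struct {
	ReadinessGates []string
	Conditions     []Condition
}

// SetCondition updates the condition of the given type, adding it if absent.
func SetCondition(pod *Pod, condType, status string) {
	for i := range pod.Conditions {
		if pod.Conditions[i].Type == condType {
			pod.Conditions[i].Status = status
			return
		}
	}
	pod.Conditions = append(pod.Conditions, Condition{Type: condType, Status: status})
}

// isTrue reports whether the condition exists and has status "True".
func isTrue(pod *Pod, condType string) bool {
	for _, c := range pod.Conditions {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// PodReady mirrors the readiness rule: containers ready AND all gates true.
func PodReady(pod *Pod) bool {
	if !isTrue(pod, containersReady) {
		return false
	}
	for _, gate := range pod.ReadinessGates {
		if !isTrue(pod, gate) {
			return false
		}
	}
	return true
}

func main() {
	pod := &Pod{ReadinessGates: []string{inPlaceGate}}
	SetCondition(pod, containersReady, "True")

	// Step 1 of the flow: close the gate before touching the image.
	SetCondition(pod, inPlaceGate, "False")
	fmt.Println("during update:", PodReady(pod))

	// Last step: reopen the gate once the new ImageID is observed.
	SetCondition(pod, inPlaceGate, "True")
	fmt.Println("after update:", PodReady(pod))
}
```

Closing the gate first is what keeps traffic away from the Pod while its container is being replaced, even during the short window in which the old container still passes its readiness probe.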

iyacontrol
