Although Docker is already very powerful, there are still many inconveniences in real-world use, such as cluster management, resource scheduling, and file management. In this flourishing container era, many solutions have emerged, such as Mesos, Swarm, and Kubernetes, among which Kubernetes, open-sourced by Google, stands out as the clear leader.
Kubernetes has become the king in the field of container orchestration. It is a container-based cluster orchestration engine with multiple features such as cluster expansion, rolling upgrade and rollback, elastic scaling, automatic healing, and service discovery.
kubernetes introduction
The core problem solved by Kubernetes
Service discovery and load balancing
- Kubernetes can use the DNS name or its own IP address to expose the container. If the traffic to the container is large, Kubernetes can load balance and distribute network traffic, thus making the deployment stable.
Storage orchestration
- Kubernetes allows you to automatically mount the storage system of your choice, such as local storage, public cloud providers, etc.
Automatic deployment and rollback
- You can use Kubernetes to describe the desired state of your deployed containers, and it will change the actual state to the desired state at a controlled rate. For example, you can automate Kubernetes to create new containers for your deployment, remove existing containers, and adopt all of their resources into the new containers.
Automatic bin packing
- Kubernetes allows you to specify the CPU and memory (RAM) required for each container. When the container specifies a resource request, Kubernetes can make better decisions to manage the container's resources.
Self-healing
- Kubernetes restarts failed containers, replaces containers, kills containers that do not respond to user-defined health checks, and does not advertise them to clients until they are ready to serve.
Secret and configuration management
- Kubernetes allows you to store and manage sensitive information such as passwords, OAuth tokens, and SSH keys. You can deploy and update secrets and application configuration without rebuilding container images, and without exposing secrets in your stack configuration.
The emergence of Kubernetes has not only dominated the container orchestration market, but also changed past operations practices. It blurs the boundary between development and operations and makes the role of DevOps clearer. Every software engineer can use Kubernetes to define the topology between services, the number of online nodes, and the amount of resources used, and can quickly carry out horizontal scaling, blue-green deployment, and other operations that used to be complex.
Knowledge Graph
Mainly introduce what knowledge to learn
Software Architecture
Traditional client-server architecture
- Architecture description
Kubernetes follows a very traditional client/server architecture. The client talks to the Kubernetes cluster either through a RESTful interface or directly with kubectl; in fact there is not much difference between the two, since kubectl is just a wrapper around the RESTful API provided by Kubernetes. Each Kubernetes cluster consists of a set of Master nodes and a series of Worker nodes. The Master nodes are mainly responsible for storing the state of the cluster and for allocating and scheduling resources for Kubernetes objects.
- Master Node Service-Master Architecture
The Master node manages the state of the cluster. It is mainly responsible for receiving client requests, scheduling containers for execution, and running the control loops that drive the cluster state toward the desired state. The Master node is composed of the following components:
API Server: Responsible for processing requests from users. Its main role is to provide RESTful interfaces to the outside world, including read requests for viewing the cluster status and write requests for changing the cluster status. It is also the only component that communicates with the etcd cluster.
etcd: It is a key-value database with both consistency and high availability. It can be used as a backend database for storing all cluster data in Kubernetes.
Scheduler: the Master component that watches for newly created Pods that have no node assigned and selects a node for them to run on. Factors considered in scheduling decisions include the resource requirements of individual Pods and Pod sets, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, interference between workloads, and deadlines.
controller-manager: the Master component that runs the controllers. Logically, each controller is a separate process, but to reduce complexity they are all compiled into the same executable and run in a single process. These controllers include: the node controller (responsible for noticing and responding when nodes fail), the replication controller (responsible for maintaining the correct number of Pods for each replication-controller object in the system), the endpoints controller (populating Endpoints objects, i.e. joining Services and Pods), and the service account and token controllers (creating default accounts and API access tokens for new namespaces).
- Work Node-Node Architecture
The Worker nodes are comparatively simple; each is mainly composed of two components: the kubelet and kube-proxy.
kubelet: the agent through which the worker node performs operations. It is responsible for concrete container lifecycle management, manages containers based on the information obtained from the API server, and reports the running status of Pods.
kube-proxy: a simple network access proxy that also acts as a load balancer. It is responsible for routing requests addressed to a Service to the Pods with matching labels on the worker node. The essence of kube-proxy is to implement this Pod mapping by manipulating firewall rules (iptables or IPVS).
Container Runtime: The container runtime environment is the software responsible for running containers. Kubernetes supports multiple container runtime environments: Docker, containerd, cri-o, rktlet, and any implementation of Kubernetes CRI (container runtime interface).
Component description
Mainly introduce some basic concepts about K8s
It is mainly composed of the following core components:
apiserver
- The only entrance to all services, providing authentication, authorization, access control, API registration and discovery mechanisms
controller manager
- Responsible for maintaining the state of the cluster, such as the expected number of replicas, failure detection, automatic expansion, rolling updates, etc.
scheduler
- Responsible for resource scheduling, scheduling Pod to the corresponding machine according to the predetermined scheduling strategy
etcd
- Key-value pair database, which saves the state of the entire cluster
kubelet
- Responsible for maintaining the life cycle of the container, as well as for the management of Volume and network
kube-proxy
- Responsible for providing service discovery and load balancing within the cluster for Service
Container runtime
- Responsible for image management and real operation of Pod and container
In addition to the core components, there are some recommended plugins:
CoreDNS
- A DNS service for the cluster that creates domain-name-to-IP mappings for the SVCs in the cluster
Dashboard
- Provides a browser-based (B/S) web UI for accessing the K8s cluster
Ingress Controller
- The built-in Service only provides a Layer 4 network proxy, while Ingress enables Layer 7 proxying
Prometheus
- Provide K8s cluster with the ability to monitor resources
Federation
- Provides unified management of multiple K8s clusters across data centers, including clusters that span availability zones
The above content reference link: https://www.escapelife.site/posts/2c4214e7.html
installation
The installation of v1.16.0 succeeded; it is recorded here to save later readers from stepping on the same pitfalls.
In this article, the installation steps are as follows:
- Install docker-ce 18.09.9 (all machines)
- Set k8s environmental preconditions (all machines)
- Install k8s v1.16.0 master management node
- Install k8s v1.16.0 node worker node
- Install flannel (master)
For detailed installation steps, refer to: CentOS builds K8S, successful in one go.
For the cluster installation tutorial, please refer to: the most detailed V1.20-based, pitfall-free tutorial for deploying a minimal K8S cluster.
Pod implementation principle
Pod is the smallest and simplest Kubernetes object
Pod, Service, Volume, and Namespace are the four basic objects in a Kubernetes cluster. They can represent the applications, workloads, network and disk resources deployed in the system, and jointly define the state of the cluster. Many other resources in Kubernetes actually only combine these basic objects.
- Pod -> the basic unit in the cluster
- Service -> Solve the problem of how to access services in Pod
- Volume -> storage volume in the cluster
- Namespace -> Namespace provides virtual isolation for clusters
For details, please refer to: Kubernetes Pod Implementation Principle
Harbor warehouse
Harbor is an enterprise-grade private Docker registry tool for Kubernetes.
Every Harbor component is built as a Docker container and deployed with Docker Compose. The Docker Compose template used to deploy Harbor is located at /Deployer/docker-compose.yml. It consists of 5 containers, connected together via Docker links so that they can reach each other by container name. For end users, only the service port of the Proxy (i.e. Nginx) needs to be exposed.
Proxy
- Reverse proxy composed of Nginx server
Registry
- A container instance built from Docker's official open-source Registry image
UI
- The core service of the architecture; the code that makes up this container is the main body of the Harbor project
MySQL
- Database container built from the official MySQL image
Log
- A container running rsyslogd that collects the logs of the other containers via the log driver
For detailed steps and architecture, please refer to: Building Harbor in an enterprise environment.
YAML syntax
YAML is a very concise/powerful/specialized language for writing configuration files!
The full name of YAML is the recursive abbreviation of "YAML Ain't a Markup Language". Its design draws on languages such as JSON, XML, and SDL, and emphasizes being data-centric, concise, easy to read, and simple to write.
YAML syntax features
It should be very easy for people who have studied programming to understand
Grammatical characteristics
- Case Sensitive
- Show hierarchical relationship through indentation
- Tab indentation is not allowed; only spaces may be used
- The number of spaces used for indentation does not matter, as long as elements at the same level are left-aligned (see the example below)
- Use # to indicate comments
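A minimal example that illustrates these rules (the keys and values are made up purely for illustration):
# "#" starts a comment; indentation (spaces only) expresses hierarchy
server:
  host: 127.0.0.1       # a key/value pair inside the "server" mapping
  ports:                # a list: each item starts with "- "
    - 8080
    - 8443
  tls: true             # keys are case-sensitive: "tls" and "TLS" differ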
A recommended article: Kubernetes YAML syntax. It is very detailed and contains many examples.
Resource list
All content in K8S is abstracted as resources, and resources are called objects after they are instantiated.
In the Kubernetes system, Kubernetes objects are persistent entities that Kubernetes uses to represent the state of the entire cluster. Specifically, they describe the following information:
- Which containerized applications are running and on which Node
- Resources that can be used by the application
- Strategies for application runtime performance, such as restart strategies, upgrade strategies, and fault tolerance strategies
Kubernetes objects are "records of intent": once an object is created, the Kubernetes system will continuously work to ensure that it exists. By creating an object, you are essentially telling the Kubernetes system what you want the cluster's workload to look like; this is the desired state of the Kubernetes cluster (a minimal example follows).
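As a small illustration of "desired state", the following minimal Pod manifest tells the cluster to keep one nginx container running; the image tag is an assumption:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
  labels:
    app: nginx            # labels identify and group the object
spec:
  containers:
    - name: nginx
      image: nginx:1.21   # assumed image tag
      ports:
        - containerPort: 80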
For a detailed introduction to the Kubernetes resource list, see here.
Resource Controller
The preparation of Kubernetes resource controller configuration files is the most important part of learning K8S!
The resource quota controller ensures that the specified resource objects will never exceed the configured resources, can effectively reduce the probability of the entire system downtime, enhance the robustness of the system, and play a very important role in the stability of the entire cluster.
Kubernetes Resource Controller User Guide Manual
Service discovery
In Kubernetes, in order to achieve load balancing between service instances and service discovery between different services, a Service object is created, and an Ingress object is created for accessing the cluster from outside the cluster.
Ingress service
We all know that the traditional SVC only works at Layer 4 and below and can do nothing at Layer 7. For example, when we use a K8S cluster to provide HTTPS services externally, it is convenient to configure SSL on the external-facing Nginx and terminate the certificate there, forwarding plain HTTP to the back-end services. To handle this problem, K8S provides Ingress (introduced in K8S version 1.11). A hedged Ingress manifest sketch follows below.
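As a hedged sketch, an Ingress that terminates TLS and routes traffic to a backend Service might look as follows; the host, Secret, and Service names are assumptions, and older clusters use the extensions/v1beta1 or networking.k8s.io/v1beta1 API instead:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  tls:
    - hosts:
        - example.com
      secretName: example-tls     # assumed Secret holding the certificate
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-svc     # assumed ClusterIP Service behind the Ingress
                port:
                  number: 80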
For more details, please refer to: Kubernetes' Ingress Service , which introduces the installation method of the Ingress service, configures the HTTP proxy access of the Ingress service, introduces the BasicAuth authentication method of the Ingress service, and introduces the method of Ingress rule rewriting.
data storage
In the previous article, we already know a lot of components in K8S, including resource controllers. In the resource controller, we talked about the StatefulSet controller component, which is specially created for stateful services, and where should the corresponding storage be stored?
Introduce the common storage mechanisms in K8S that we can use: Kubernetes data storage
Cluster scheduling
In practice, the requirements of the various services in a cluster are not uniform, which leads to uneven resource allocation. For example, we may need some nodes to run compute-intensive services and other nodes to run services that need a lot of memory. Kubernetes also provides a component to deal with these problems: the Scheduler.
Scheduler is the scheduler of kubernetes. Its main task is to assign defined Pods to the nodes of the cluster. It sounds very simple, but there are many issues to consider:
Fairness
- Ensure that every node can be allocated resources
Efficient use of resources
- Make maximum use of the cluster's resources
Efficiency
- Scheduling performance should be good, completing scheduling for large numbers of Pods as quickly as possible
Flexibility
- Allow users to control the scheduling logic according to their own needs
The Scheduler runs as a separate program. After startup, it watches the API Server, picks up Pods whose PodSpec.NodeName is empty, and creates a binding for each of them indicating which node the Pod should be placed on (a hedged example follows below).
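As a small, hedged illustration of steering the scheduler, a Pod can be constrained to nodes carrying a particular label with nodeSelector; the disktype=ssd label is an assumption and must already exist on the target nodes:
apiVersion: v1
kind: Pod
metadata:
  name: compute-job
spec:
  nodeSelector:
    disktype: ssd          # only nodes labeled disktype=ssd are candidates
  containers:
    - name: job
      image: busybox
      command: ["sh", "-c", "echo crunching numbers; sleep 3600"]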
For detailed introduction, please refer to: Kubernetes cluster scheduling
kubectl user guide
kubectl is the Kubernetes client; you can use it to operate the Kubernetes cluster directly.
In daily use of Kubernetes, kubectl is probably the most frequently used tool, so if we are going to spend a lot of time researching and learning Kubernetes, it is well worth understanding how to use it efficiently.
From a user point of view, kubectl is the cockpit that controls Kubernetes, which allows you to perform all possible Kubernetes operations; from a technical point of view, kubectl is just a client of the Kubernetes API.
The Kubernetes API is an HTTP REST API, and this API is the real Kubernetes user interface through which Kubernetes is actually controlled. This means that every Kubernetes operation is exposed as an API endpoint and can be performed by making an HTTP request to that endpoint. Therefore, the main job of kubectl is to make HTTP requests to the Kubernetes API.
Tool usage parameters
get # display one or more resources
describe # show detailed information about a resource
create # create a resource from a file or stdin
update # update a resource from a file or stdin
delete # delete resources by file name, stdin, resource name, or label
log # print the logs of a container in a pod
rolling-update # perform a rolling update on the specified RC
exec # execute a command inside a container
port-forward # forward a local port to a Pod
proxy # run a proxy to the Kubernetes API server
run # start a container in the cluster using the specified image
expose # expose an SVC or pod as a new kubernetes Service
label # update the labels of a resource
config # modify the kubernetes configuration file (kubeconfig)
cluster-info # display cluster information
api-versions # print the API versions supported by the server in "group/version" format
version # print the client and server version information
help # show help for each command
ingress-nginx # plugin for managing the ingress service (official installation and usage)
Use related configuration
# kubectl auto-completion
$ source <(kubectl completion zsh)
$ source <(kubectl completion bash)
# show the merged kubeconfig configuration
$ kubectl config view
# get documentation for pods and svc
$ kubectl explain pods,svc
Create resource objects
Step-by-step creation
# yaml
kubectl create -f xxx-rc.yaml
kubectl create -f xxx-service.yaml
# json
kubectl create -f ./pod.json
cat pod.json | kubectl create -f -
# yaml2json
kubectl create -f docker-registry.yaml --edit -o json
One-time creation
kubectl create -f xxx-service.yaml -f xxx-rc.yaml
Create according to the definition of all the yaml files in the directory
kubectl create -f <directory>
Use url to create resources
kubectl create -f https://git.io/vPieo
View resource objects
View all Node or Namespace objects
kubectl get nodes
kubectl get namespace
View all Pod objects
# view help for the subcommand
kubectl get --help
# list all pods in the default namespace
kubectl get pods
# list all pods in a specific namespace
kubectl get pods --namespace=test
# list all pods in all namespaces
kubectl get pods --all-namespaces
# list all pods with detailed information
kubectl get pods -o wide
kubectl get replicationcontroller web
kubectl get -k dir/
kubectl get -f pod.yaml -o json
kubectl get rc/web service/frontend pods/web-pod-13je7
kubectl get pods/app-prod-78998bf7c6-ttp9g --namespace=test -o wide
kubectl get -o template pod/web-pod-13je7 --template={{.status.phase}}
# list all pods in the namespace, including uninitialized ones
kubectl get pods,rc,services --include-uninitialized
View all RC objects
kubectl get rc
View all Deployment objects
# view all deployments
kubectl get deployment
# list a specific deployment
kubectl get deployment my-app
View all Service objects
kubectl get svc
kubectl get service
View Pod objects under different Namespaces
kubectl get pods -n default
kubectl get pods --all-namespaces
View resource description
Show Pod details
kubectl describe pods/nginx
kubectl describe pods my-pod
kubectl describe -f pod.json
View Node details
kubectl describe nodes c1
View Pod information associated with RC
kubectl describe pods <rc-name>
Update patching resources
Rolling update
# rolling update of pod frontend-v1
kubectl rolling-update frontend-v1 -f frontend-v2.json
# rename the resource and update the image
kubectl rolling-update frontend-v1 frontend-v2 --image=image:v2
# update the image of the frontend pods
kubectl rolling-update frontend --image=image:v2
# abort an in-progress rolling update
kubectl rolling-update frontend-v1 frontend-v2 --rollback
# force replace: delete and then re-create the resource; the service will be interrupted
kubectl replace --force -f ./pod.json
# add a label
kubectl label pods my-pod new-label=awesome
# add an annotation
kubectl annotate pods my-pod icon-url=http://goo.gl/XXBTWq
Patch resources
# partially update a node
kubectl patch node k8s-node-1 -p '{"spec":{"unschedulable":true}}'
# update a container image; spec.containers[*].name is required because it is the merge key
kubectl patch pod valid-pod -p \
'{"spec":{"containers":[{"name":"kubernetes-serve-hostname","image":"new image"}]}}'
Scale resources
# Scale a replicaset named 'foo' to 3
kubectl scale --replicas=3 rs/foo
# Scale a resource specified in "foo.yaml" to 3
kubectl scale --replicas=3 -f foo.yaml
# If the deployment named mysql's current size is 2, scale mysql to 3
kubectl scale --current-replicas=2 --replicas=3 deployment/mysql
# Scale multiple replication controllers
kubectl scale --replicas=5 rc/foo rc/bar rc/baz
Delete resource object
Delete Pod object based on xxx.yaml file
# use the same yaml file name that was used when the resource was created
kubectl delete -f xxx.yaml
Delete the pod object that includes a label
kubectl delete pods -l name=<label-name>
Delete the service object that includes a certain label
kubectl delete services -l name=<label-name>
Delete the pod and service objects that include a certain label
kubectl delete pods,services -l name=<label-name>
Delete all pod/services objects
kubectl delete pods --all
kubectl delete service --all
kubectl delete deployment --all
Edit resource files
Edit any API resource in the editor
# edit the service named docker-registry
kubectl edit svc/docker-registry
Directly execute commands
On the host, execute the command directly without entering the container
Execute the date command of the pod, and execute it in the first container of the pod by default
kubectl exec mypod -- date
kubectl exec mypod --namespace=test -- date
Specify a container in the pod to execute the date command
kubectl exec mypod -c ruby-container -- date
Enter a container
kubectl exec mypod -c ruby-container -it -- bash
View container logs
View log directly
# without real-time refresh
kubectl logs mypod
kubectl logs mypod --namespace=test
View log refresh in real time
kubectl logs -f mypod -c ruby-container
Management tools
Kubernetes is continuously accelerating its application in cloud native environments, but how to manage Kubernetes clusters running anywhere in a unified and secure manner is facing challenges, and effective management tools can greatly reduce the difficulty of management.
K9s
K9s is a terminal-based resource dashboard with a command-line interface only. Whatever you can do in the Kubernetes dashboard web UI, you can do the same in the terminal with K9s. K9s continuously watches the Kubernetes cluster and provides commands for working with the resources defined on it.
Detailed introduction: Kubernetes cluster management tool K9S
Recommendation: 7 tools for easy management of Kubernetes clusters
Best Practices in Production Environment
Running Kubernetes in production is not easy, and there are several aspects to be aware of. The following Kubernetes production best practices cover security, monitoring, networking, governance, storage, container lifecycle management, and platform selection.
Are liveness probes and readiness probes used for health checks?
Managing large distributed systems can be complicated, especially when there is a problem, we cannot be notified in time. In order to ensure the normal operation of the application instance, it is important to set the Kubernetes health check.
By creating custom health checks, you can effectively avoid running zombie services in a distributed system; the checks can be adjusted to the environment and your needs.
The purpose of a readiness probe is to let Kubernetes know whether the application is ready to serve traffic. Kubernetes only starts routing service traffic to a Pod after its readiness probe passes.
Liveness probes
How do you know whether your application is alive or dead? Liveness probes let you find out: if your application dies, Kubernetes removes the old Pod and replaces it with a new one. A hedged example follows below.
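A minimal sketch of a Pod with both probes, assuming the application exposes /healthz and /ready endpoints on port 80 (the paths, port, and timings are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: nginx:1.21              # assumed image
      ports:
        - containerPort: 80
      livenessProbe:                 # restart the container when this fails
        httpGet:
          path: /healthz
          port: 80
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:                # route traffic only after this passes
        httpGet:
          path: /ready
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 5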
Resource management
It is a good practice to specify resource requests and limits for every container (a sketch follows below). Another good practice is to divide the Kubernetes environment into separate namespaces for different teams, departments, applications, and clients.
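A hedged sketch of per-container requests and limits; the namespace and the values are assumptions and should be tuned to the workload:
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: team-a                # assumed per-team namespace
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      resources:
        requests:                  # what the scheduler reserves for the container
          cpu: "100m"
          memory: "128Mi"
        limits:                    # hard ceiling enforced at runtime
          cpu: "500m"
          memory: "256Mi"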
Kubernetes resource usage
Kubernetes resource usage refers to the number of resources used by containers/pods in production.
Therefore, it is very important to pay close attention to the resource usage of pods. One obvious reason is cost: the higher the resource utilization, the less waste there is.
Resource utilization
The Ops team usually wants to optimize and maximize the percentage of resources consumed by pods. Resource usage is one of the indicators of the actual optimization degree of the Kubernetes environment.
You can think of resource utilization as the average CPU, memory, and other resource usage of the containers running in the Kubernetes cluster.
Enable RBAC
RBAC stands for role-based access control. It is a method of restricting the access and admission of users and applications on the system/network. RBAC was introduced in Kubernetes 1.8; authorization policies are created using the rbac.authorization.k8s.io API group.
In Kubernetes, RBAC is used for authorization. With RBAC you can grant permissions to users and accounts, add or remove permissions, set rules, and so on. It therefore adds an extra layer of security: Kubernetes RBAC restricts who can access your production environment and cluster. A hedged Role/RoleBinding sketch follows below.
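A minimal RBAC sketch: a Role that can only read pods in the default namespace, bound to an assumed user named jane:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
  - apiGroups: [""]                # "" is the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
  - kind: User
    name: jane                     # assumed user name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io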
Cluster provisioning and load balancing
Production-grade Kubernetes infrastructure usually needs to address certain key aspects, such as high availability, multiple hosts, and multi-node etcd clusters. The provisioning of such clusters usually involves tools such as Terraform or Ansible. Once the cluster is set up and pods are created to run applications, those pods are fronted by load balancers, which route traffic to the services. The open-source Kubernetes project does not ship a default load balancer, so it needs to be integrated with an NGINX Ingress controller, with tools such as HAProxy or ELB, or with any other tool that extends the Kubernetes Ingress plugin, to provide load-balancing capability.
Add tags to Kubernetes objects
Labels are key/value pairs attached to objects such as pods. They are used to identify attributes of objects that are important and meaningful to users.
When using Kubernetes in production, labels are an important issue that cannot be ignored; labels allow Kubernetes objects to be queried and manipulated in batches. What makes labels special is that they can also be used to identify Kubernetes objects and organize them into groups. One of the best use cases for this is grouping pods by the application they belong to, as shown in the sketch below. Teams can build and maintain any number of labeling conventions.
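A small sketch of grouping pods by the application they belong to; the label keys and values are only a sample convention:
apiVersion: v1
kind: Pod
metadata:
  name: payment-api-1
  labels:
    app: payment-api           # which application the pod belongs to
    tier: backend
    environment: production
The group can then be queried with a label selector, for example kubectl get pods -l app=payment-api,environment=production.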
Configure network policy
When using Kubernetes, setting up network policies is crucial. Network policy is just an object, which enables you to clearly declare and decide which traffic is allowed and which is not allowed. In this way, Kubernetes will be able to block all other unwanted and non-compliant traffic. Defining and restricting network traffic in our cluster is one of the basic and necessary security measures that are strongly recommended.
Each network policy in Kubernetes defines a list of authorized connections as described above. Whenever a network policy is created, all the pods it references become eligible to establish or accept the listed connections. Simply put, a network policy is basically a whitelist of authorized and allowed connections: a connection, whether to or from a pod, is only allowed if at least one network policy applied to that pod allows it. A hedged example follows below.
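A hedged whitelist-style example: only pods labeled role=frontend may reach pods labeled app=api on TCP port 8080; the labels, namespace, and port are assumptions:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: api                 # the policy applies to these pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend   # only these pods may connect
      ports:
        - protocol: TCP
          port: 8080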
Cluster monitoring and logging
When using Kubernetes, monitoring deployment is crucial. It is even more important to ensure that the configuration, performance, and traffic remain safe. Without logging and monitoring, it is impossible to diagnose the problem. To ensure compliance, monitoring and logging have become very important. When monitoring, it is necessary to set up logging functions on each layer of the architecture. The generated logs will help us enable security tools, audit functions, and analyze performance.
Start with a stateless application
Running stateless applications is much simpler than running stateful applications, but as Kubernetes operators continue to grow, this thinking is changing. For teams new to Kubernetes, it is recommended to use stateless applications first.
It is recommended to use a stateless backend, so that the development team can ensure that there are no long-running connections, which increases the difficulty of scaling. Using statelessness, developers can also deploy applications more efficiently and with zero downtime. It is generally believed that stateless applications can be easily migrated and expanded according to business needs.
Automatic expansion and shrinking while starting
Kubernetes has three auto-scaling functions for deployment: horizontal pod auto-scaling (HPA), vertical pod auto-scaling (VPA) and cluster auto-scaling.
The horizontal pod autoscaler automatically scales the number of pods in a deployment, replication controller, replica set, or stateful set based on observed CPU utilization.
Vertical pod autoscaling recommends appropriate values for CPU and memory requests and limits, and it can automatically update these values.
Cluster Autoscaler expands and reduces the size of the worker node pool. It adjusts the size of the Kubernetes cluster based on the current utilization.
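A hedged sketch of a horizontal pod autoscaler that keeps average CPU utilization around 70% for an assumed Deployment named web; older clusters use the autoscaling/v2beta2 API instead:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # assumed Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70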
Control the source of mirror pull
Control the mirror source of all containers running in the cluster. If you allow your Pod to pull images from public sources, you don't know what is actually running in it.
If you pull them from a trusted registry instead, you can apply policies on the registry so that only safe, certified images can be pulled.
Keep learning
Constantly evaluate the status and settings of the application to learn and improve. For example, reviewing the historical memory usage of containers can draw the conclusion that we can allocate less memory and save costs in the long run.
Protect important services
Using Pod priorities, you can rank how important different services are. For example, for better stability you may want to ensure that RabbitMQ pods are more important than your application pods, or that your ingress controller pods are more important than data-processing pods, so that the service stays available to users. A hedged PriorityClass sketch follows below.
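A hedged sketch of a PriorityClass and a pod that uses it; the class name, value, and image are assumptions:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-infra
value: 1000000                    # higher value = higher priority
globalDefault: false
description: "For infrastructure pods such as the message broker"
---
apiVersion: v1
kind: Pod
metadata:
  name: rabbitmq-0
spec:
  priorityClassName: critical-infra
  containers:
    - name: rabbitmq
      image: rabbitmq:3.9         # assumed image tag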
Zero downtime
By running all services in HA, it supports zero-downtime upgrades of clusters and services. This will also ensure higher availability for your customers.
Use pod anti-affinity to ensure that multiple copies of a pod are scheduled on different nodes, thereby ensuring service availability through planned and unplanned cluster node downtime.
Use PodDisruptionBudgets to guarantee a minimum number of Pod replicas at all times! A hedged anti-affinity/PDB sketch follows below.
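A hedged sketch combining pod anti-affinity (spread replicas across nodes) with a PodDisruptionBudget (keep at least two replicas during voluntary disruptions); the labels and counts are assumptions, and older clusters use policy/v1beta1 for the PDB:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname   # one replica per node
      containers:
        - name: web
          image: nginx:1.21      # assumed image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2                # never voluntarily evict below 2 replicas
  selector:
    matchLabels:
      app: web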
Plan for failure
Hardware eventually fails; software eventually works. --(Michael Hatton)
in conclusion
As we all know, Kubernetes has become the de facto orchestration platform standard in the DevOps field. Kubernetes withstands the storms of production from the perspectives of availability, scalability, security, flexibility, resource management, and monitoring. Since many companies run Kubernetes in production, the best practices mentioned above must be followed to scale applications smoothly and reliably.
Content source: https://my.oschina.net/u/1787735/blog/4870582
Also recommended: an introduction to 5 top Kubernetes log monitoring tools.
For a newly installed Kubernetes cluster, a problem that often occurs is that a Service does not work properly. If you have already run a Deployment and created a Service, but get no response when you try to access it, I hope this document (the most detailed troubleshooting walkthrough for a K8s Service that cannot be accessed) can help you find the problem.
Summary of Kubernetes frequently asked questions
How to delete rc, deployment, and service in an inconsistent state
In some cases, the kubectl process appears to hang, and a subsequent get shows that some of the resources have been deleted while the rest cannot be deleted.
[root@k8s-master ~]# kubectl get -f fluentd-elasticsearch/
NAME DESIRED CURRENT READY AGE
rc/elasticsearch-logging-v1 0 2 2 15h
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kibana-logging 0 1 1 1 15h
Error from server (NotFound): services "elasticsearch-logging" not found
Error from server (NotFound): daemonsets.extensions "fluentd-es-v1.22" not found
Error from server (NotFound): services "kibana-logging" not found
The commands to delete these deployments, services, or rcs are as follows:
kubectl delete deployment kibana-logging -n kube-system --cascade=false
kubectl delete deployment kibana-logging -n kube-system --ignore-not-found
kubectl delete rc elasticsearch-logging-v1 -n kube-system --force --grace-period=0
How to reset etcd after it cannot be deleted
rm -rf /var/lib/etcd/*
Reboot the master node after deletion.
After resetting etcd, the network configuration needs to be set up again:
etcdctl mk /atomic.io/network/config '{ "Network": "192.168.0.0/16" }'
Failed to start apiserver
The following problems are reported every time:
start request repeated too quickly for kube-apiserver.service
In fact it is not a startup-frequency problem; you need to check /var/log/messages. In my case, ca.crt and other files could not be found after enabling ServiceAccount, which caused the startup to fail.
May 21 07:56:41 k8s-master kube-apiserver: Flag --port has been deprecated, see --insecure-port instead.
May 21 07:56:41 k8s-master kube-apiserver: F0521 07:56:41.692480 4299 universal_validation.go:104] Validate server run options failed: unable to load client CA file: open /var/run/kubernetes/ca.crt: no such file or directory
May 21 07:56:41 k8s-master systemd: kube-apiserver.service: main process exited, code=exited, status=255/n/a
May 21 07:56:41 k8s-master systemd: Failed to start Kubernetes API Server.
May 21 07:56:41 k8s-master systemd: Unit kube-apiserver.service entered failed state.
May 21 07:56:41 k8s-master systemd: kube-apiserver.service failed.
May 21 07:56:41 k8s-master systemd: kube-apiserver.service holdoff time over, scheduling restart.
May 21 07:56:41 k8s-master systemd: start request repeated too quickly for kube-apiserver.service
May 21 07:56:41 k8s-master systemd: Failed to start Kubernetes API Server.
When deploying log components such as fluentd, many problems are caused by the need to turn on the ServiceAccount option and the need to configure security, so in the final analysis, it is necessary to configure the ServiceAccount.
Permission denied situation
When configuring fluentd, the error "cannot create /var/log/fluentd.log: Permission denied" appears, which is caused by SELinux not being disabled.
Change SELINUX=enforcing to SELINUX=disabled in /etc/selinux/config, and then reboot.
ServiceAccount-based configuration
First generate the required keys and certificates; replace k8s-master with the host name of your master.
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=k8s-master" -days 10000 -out ca.crt
openssl genrsa -out server.key 2048
echo subjectAltName=IP:10.254.0.1 > extfile.cnf
# the IP is determined by the following command
#kubectl get services --all-namespaces |grep 'default'|grep 'kubernetes'|grep '443'|awk '{print $3}'
openssl req -new -key server.key -subj "/CN=k8s-master" -out server.csr
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -extfile extfile.cnf -out server.crt -days 10000
If, after modifying the parameters in /etc/kubernetes/apiserver, systemctl start kube-apiserver fails with the error:
Validate server run options failed: unable to load client CA file: open /root/keys/ca.crt: permission denied
But you can start API Server through the command line
/usr/bin/kube-apiserver --logtostderr=true --v=0 --etcd-servers=http://k8s-master:2379 --address=0.0.0.0 --port=8080 --kubelet-port=10250 --allow-privileged=true --service-cluster-ip-range=10.254.0.0/16 --admission-control=ServiceAccount --insecure-bind-address=0.0.0.0 --client-ca-file=/root/keys/ca.crt --tls-cert-file=/root/keys/server.crt --tls-private-key-file=/root/keys/server.key --basic-auth-file=/root/keys/basic_auth.csv --secure-port=443 &>> /var/log/kubernetes/kube-apiserver.log &
Start Controller-manager from the command line
/usr/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://k8s-master:8080 --root-ca-file=/root/keys/ca.crt --service-account-private-key-file=/root/keys/server.key &>> /var/log/kubernetes/kube-controller-manage.log &
ETCD does not start up-problem <1>
etcd plays the ZooKeeper role for the Kubernetes cluster; almost every component depends on etcd being up, for example flanneld, apiserver, docker... The error log when starting etcd is as follows:
May 24 13:39:09 k8s-master systemd: Stopped Flanneld overlay address etcd agent.
May 24 13:39:28 k8s-master systemd: Starting Etcd Server...
May 24 13:39:28 k8s-master etcd: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://etcd:2379,http://etcd:4001
May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_NAME, but unused: shadowed by corresponding flag
May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag
May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag
May 24 13:39:28 k8s-master etcd: etcd Version: 3.1.3
May 24 13:39:28 k8s-master etcd: Git SHA: 21fdcc6
May 24 13:39:28 k8s-master etcd: Go Version: go1.7.4
May 24 13:39:28 k8s-master etcd: Go OS/Arch: linux/amd64
May 24 13:39:28 k8s-master etcd: setting maximum number of CPUs to 1, total number of available CPUs is 1
May 24 13:39:28 k8s-master etcd: the server is already initialized as member before, starting as etcd member...
May 24 13:39:28 k8s-master etcd: listening for peers on http://localhost:2380
May 24 13:39:28 k8s-master etcd: listening for client requests on 0.0.0.0:2379
May 24 13:39:28 k8s-master etcd: listening for client requests on 0.0.0.0:4001
May 24 13:39:28 k8s-master etcd: recovered store from snapshot at index 140014
May 24 13:39:28 k8s-master etcd: name = master
May 24 13:39:28 k8s-master etcd: data dir = /var/lib/etcd/default.etcd
May 24 13:39:28 k8s-master etcd: member dir = /var/lib/etcd/default.etcd/member
May 24 13:39:28 k8s-master etcd: heartbeat = 100ms
May 24 13:39:28 k8s-master etcd: election = 1000ms
May 24 13:39:28 k8s-master etcd: snapshot count = 10000
May 24 13:39:28 k8s-master etcd: advertise client URLs = http://etcd:2379,http://etcd:4001
May 24 13:39:28 k8s-master etcd: ignored file 0000000000000001-0000000000012700.wal.broken in wal
May 24 13:39:29 k8s-master etcd: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 148905
May 24 13:39:29 k8s-master etcd: 8e9e05c52164694d became follower at term 12
May 24 13:39:29 k8s-master etcd: newRaft 8e9e05c52164694d [peers: [8e9e05c52164694d], term: 12, commit: 148905, applied: 140014, lastindex: 148905, lastterm: 12]
May 24 13:39:29 k8s-master etcd: enabled capabilities for version 3.1
May 24 13:39:29 k8s-master etcd: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32 from store
May 24 13:39:29 k8s-master etcd: set the cluster version to 3.1 from store
May 24 13:39:29 k8s-master etcd: starting server... [version: 3.1.3, cluster version: 3.1]
May 24 13:39:29 k8s-master etcd: raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory
May 24 13:39:29 k8s-master systemd: etcd.service: main process exited, code=exited, status=1/FAILURE
May 24 13:39:29 k8s-master systemd: Failed to start Etcd Server.
May 24 13:39:29 k8s-master systemd: Unit etcd.service entered failed state.
May 24 13:39:29 k8s-master systemd: etcd.service failed.
May 24 13:39:29 k8s-master systemd: etcd.service holdoff time over, scheduling restart.
Core statement:
raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory
Enter the relevant directory, delete 0.tmp, and then you can start it!
ETCD won't start up-timeout issue <2>
Background: three etcd nodes are deployed, and one day all three suddenly went down. After restarting, the K8S cluster worked normally, but when checking the components I found that etcd on one node would not start.
Inspection showed that the system time was wrong. After correcting it with ntpdate ntp.aliyun.com and restarting etcd, it still failed to come up, with the following errors:
Mar 05 14:27:15 k8s-node2 etcd[3248]: etcd Version: 3.3.13
Mar 05 14:27:15 k8s-node2 etcd[3248]: Git SHA: 98d3084
Mar 05 14:27:15 k8s-node2 etcd[3248]: Go Version: go1.10.8
Mar 05 14:27:15 k8s-node2 etcd[3248]: Go OS/Arch: linux/amd64
Mar 05 14:27:15 k8s-node2 etcd[3248]: setting maximum number of CPUs to 4, total number of available CPUs is 4
Mar 05 14:27:15 k8s-node2 etcd[3248]: the server is already initialized as member before, starting as etcd member
...
Mar 05 14:27:15 k8s-node2 etcd[3248]: peerTLS: cert = /opt/etcd/ssl/server.pem, key = /opt/etcd/ssl/server-key.pe
m, ca = , trusted-ca = /opt/etcd/ssl/ca.pem, client-cert-auth = false, crl-file =
Mar 05 14:27:15 k8s-node2 etcd[3248]: listening for peers on https://192.168.25.226:2380
Mar 05 14:27:15 k8s-node2 etcd[3248]: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert
files are presented. Ignored key/cert files.
Mar 05 14:27:15 k8s-node2 etcd[3248]: listening for client requests on 127.0.0.1:2379
Mar 05 14:27:15 k8s-node2 etcd[3248]: listening for client requests on 192.168.25.226:2379
Mar 05 14:27:15 k8s-node2 etcd[3248]: member 9c166b8b7cb6ecb8 has already been bootstrapped
Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service: main process exited, code=exited, status=1/FAILURE
Mar 05 14:27:15 k8s-node2 systemd[1]: Failed to start Etcd Server.
Mar 05 14:27:15 k8s-node2 systemd[1]: Unit etcd.service entered failed state.
Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service failed.
Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service failed.
Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service holdoff time over, scheduling restart.
Mar 05 14:27:15 k8s-node2 systemd[1]: Starting Etcd Server...
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_NAME, but unused: shadowed by correspo
nding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corr
esponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_LISTEN_PEER_URLS, but unused: shadowed
by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadow
ed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS, but unuse
d: shadowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_ADVERTISE_CLIENT_URLS, but unused: sha
dowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_CLUSTER, but unused: shadowed
by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_CLUSTER_TOKEN, but unused: sha
dowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_CLUSTER_STATE, but unused: sha
dowed by corresponding flag
Solution:
Checking the log shows no particularly obvious error. From experience, one broken etcd node has no major impact on the cluster: the cluster can still be used normally, but the broken etcd node will not start. The solution is as follows:
Enter the etcd data storage directory to back up the original data:
cd /var/lib/etcd/default.etcd/member/
cp * /data/bak/
Delete all data files in this directory
rm -rf /var/lib/etcd/default.etcd/member/*
Stop the other two etcd nodes, because when this etcd node starts, all nodes need to be started together; they can be used after starting successfully.
# master node
systemctl stop etcd
systemctl restart etcd
# node1
systemctl stop etcd
systemctl restart etcd
# node2
systemctl stop etcd
systemctl restart etcd
Configure host mutual trust under CentOS
On each server, as the user that needs host-to-host trust, run the following command to generate the public/private key pair, accepting the defaults by pressing Enter:
ssh-keygen -t rsa
You will see the generated public key file.
Copy the public key to the other hosts; you will need to enter the password the first time, after which it is no longer required.
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.199.132 (-p 2222)
-p specifies the port; without -p the default port is used, and if you have changed the SSH port you must add -p. You will see that an authorized_keys file is generated under .ssh/, recording the public keys of the other servers allowed to log in to this server.
Test to see if you can log in:
ssh 192.168.199.132 (-p 2222)
Modification of CentOS hostname
hostnamectl set-hostname k8s-master1
Virtualbox implements CentOS copy and paste function
If a package is not installed or the command produces no output, change update to install and run it again.
yum install update
yum update kernel
yum update kernel-devel
yum install kernel-headers
yum install gcc
yum install gcc make
Then run:
sh VBoxLinuxAdditions.run
Delete Pod has been in Terminating state
You can force the deletion by the following command
kubectl delete pod NAME --grace-period=0 --force
Delete namespace has been in Terminating state
You can force the deletion through the following script
[root@k8s-master1 k8s]# cat delete-ns.sh
#!/bin/bash
set -e
usage(){
echo "usage:"
echo " delete-ns.sh NAMESPACE"
}
if [ $# -lt 1 ];then
usage
exit
fi
NAMESPACE=$1
JSONFILE=${NAMESPACE}.json
kubectl get ns "${NAMESPACE}" -o json > "${JSONFILE}"
vi "${JSONFILE}"
curl -k -H "Content-Type: application/json" -X PUT --data-binary @"${JSONFILE}" \
http://127.0.0.1:8001/api/v1/namespaces/"${NAMESPACE}"/finalize
The container contains valid CPU/memory requests and no limits are specified. What might happen?
Next, we create a corresponding container, which has only requests settings but no limits settings.
- name: busybox-cnt02
  image: busybox
  command: ["/bin/sh"]
  args: ["-c", "while true; do echo hello from cnt02; sleep 10;done"]
  resources:
    requests:
      memory: "100Mi"
      cpu: "100m"
What's the problem when this container is created?
In fact, in a normal environment there is no problem, but under resource pressure, if some containers do not set limits, their resources can be preempted by other pods, which may cause the container application to fail. You can use a LimitRange policy so that defaults are applied to pods automatically, provided that the LimitRange rules have been configured in advance; a hedged sketch follows below.
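A hedged LimitRange sketch that fills in default requests and limits for containers that omit them; the values are illustrative only:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: default
spec:
  limits:
    - type: Container
      default:                 # applied as "limits" when a container omits them
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:          # applied as "requests" when a container omits them
        cpu: "100m"
        memory: "128Mi"
Containers created in that namespace without explicit requests or limits then receive these defaults automatically.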
Source: https://www.cnblogs.com/passzhang
Also recommended: 6 tips for troubleshooting applications on Kubernetes, useful for daily troubleshooting, plus a copy of Alibaba Cloud's internal, comprehensive K8s practice manual, free to download.
Interview questions
One goal: container operations; two sites, three centers; four layers of service discovery; five kinds of resources shared by Pods; six commonly used CNI plugins; seven-layer load balancing.