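Preface
When building a k8s cluster, the Pods we deploy are normally placed on nodes by the cluster's automatic scheduling policy. By default the scheduler checks that a node has enough resources and tries to spread the load as evenly as possible. Sometimes, however, we need finer-grained control over Pod scheduling: for example, we may want internal-facing and external-facing workloads to run on different nodes, or two Pods that depend on each other to run on the same node. For these cases k8s provides affinity and anti-affinity, as well as taints and tolerations.
- Node affinity
- It comes in two forms, hard affinity and soft affinity
- Hard affinity means the condition must be satisfied
- Soft affinity means the condition is satisfied if possible
Node affinity:
- Hard affinity: the condition must be satisfied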
requiredDuringSchedulingIgnoredDuringExecution
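The Pod must be scheduled onto a node that satisfies the conditions; if no node satisfies them, the scheduler keeps retrying. IgnoredDuringExecution means that if a node's labels change while the Pod is running, so that the node no longer satisfies the Pod's conditions, the Pod keeps running.
- Soft affinity: the condition is satisfied if possible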
preferredDuringSchedulingIgnoredDuringExecution
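The Pod is preferentially scheduled onto a node that satisfies the conditions; if no node satisfies them, the conditions are ignored and the Pod is scheduled by the normal logic.
View the detailed description of nodeAffinity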
[root@k8s-master scheduler]# kubectl explain pods.spec.affinity.nodeAffinity
KIND: Pod
...
FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object> #soft affinity
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node matches
the corresponding matchExpressions; the node(s) with the highest sum are
the most preferred.
requiredDuringSchedulingIgnoredDuringExecution <Object> #hard affinity
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to an update), the system may or may not try
to eventually evict the pod from its node.
[root@k8s-master ~]# kubectl explain pods.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution #details of hard affinity
...
FIELDS:
nodeSelectorTerms <[]Object> -required- #node selection terms
Required. A list of node selector terms. The terms are ORed.
[root@k8s-master scheduler]# kubectl explain pods.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution #details of soft affinity
...
FIELDS:
preference <Object> -required- #the affinity preference, used together with weight
A node selector term, associated with the corresponding weight.
weight <integer> -required- #weight
Weight associated with matching the corresponding nodeSelectorTerm, in the
range 1-100.
[root@k8s-master scheduler]# kubectl explain pods.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND: Pod
VERSION: v1
RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <Object>
DESCRIPTION:
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to an update), the system may or may not try
to eventually evict the pod from its node.
A node selector represents the union of the results of one or more label
queries over a set of nodes; that is, it represents the OR of the selectors
represented by the node selector terms.
FIELDS:
nodeSelectorTerms <[]Object> -required-
Required. A list of node selector terms. The terms are ORed.
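The matching rule described in the output above (terms are ORed, while the expressions inside one term must all hold) can be sketched in a few lines of Python. This is only an illustrative model, not the scheduler's code: the function names, the node names and their labels are hypothetical, and the `Gt`/`Lt` operators and `matchFields` are left out.

```python
from typing import Dict, List

Labels = Dict[str, str]


def match_expression(labels: Labels, expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        # A missing key also counts as "not in" the listed values.
        return labels.get(key) not in values
    raise ValueError(f"operator not modelled in this sketch: {op}")


def match_term(labels: Labels, term: dict) -> bool:
    """All expressions in one term must hold (AND); an empty term matches nothing."""
    exprs = term.get("matchExpressions", [])
    return bool(exprs) and all(match_expression(labels, e) for e in exprs)


def match_required(labels: Labels, terms: List[dict]) -> bool:
    """A node qualifies if at least one term matches (OR)."""
    return any(match_term(labels, t) for t in terms)


# Two terms: (gpu exists AND not a master) OR (region is foo or bar).
terms = [
    {"matchExpressions": [
        {"key": "gpu", "operator": "Exists"},
        {"key": "node-role.kubernetes.io/master", "operator": "DoesNotExist"},
    ]},
    {"matchExpressions": [
        {"key": "region", "operator": "In", "values": ["foo", "bar"]},
    ]},
]

# Hypothetical nodes and labels, for illustration only.
nodes = {
    "node-a": {"gpu": "true"},
    "node-b": {"gpu": "true", "node-role.kubernetes.io/master": ""},
    "node-c": {"region": "foo"},
    "node-d": {},
}

eligible = [name for name, labels in nodes.items() if match_required(labels, terms)]
print(eligible)  # ['node-a', 'node-c']
```

node-b is rejected by the first term because it carries the master label, and it has no region label to satisfy the second term; node-d matches neither term.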
Example 1: hard node affinity
[root@k8s-master Scheduler]# cat pod-with-nodeselector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-nodeselector
spec:
  containers:
  - name: demoapp
    image: ikubernetes/demoapp:v1.0
  nodeSelector:   #hard constraint: the node must carry this label
    gpu: ''       #empty value
[root@k8s-master Scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
deployment-demo-5fddfb8ffc-lssq8 1/1 Running 0 23m
deployment-demo-5fddfb8ffc-r277n 1/1 Running 0 23m
deployment-demo-5fddfb8ffc-wrpjx 1/1 Running 0 23m
deployment-demo-5fddfb8ffc-zzwck 1/1 Running 0 23m
pod-with-nodeselector 0/1 Pending 0 3m8s #Pending
[root@k8s-master Scheduler]# kubectl describe pod pod-with-nodeselector
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 15s default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector. #no node has a matching label
Warning FailedScheduling 15s default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector.
[root@k8s-master Scheduler]# kubectl label node k8s-node3 gpu='' #label node3 with gpu set to an empty value
node/k8s-node3 labeled
[root@k8s-master Scheduler]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment-demo-5fddfb8ffc-lssq8 1/1 Running 0 24m 192.168.113.14 k8s-node1 <none> <none>
deployment-demo-5fddfb8ffc-r277n 1/1 Running 0 24m 192.168.12.12 k8s-node2 <none> <none>
deployment-demo-5fddfb8ffc-wrpjx 1/1 Running 0 24m 192.168.12.11 k8s-node2 <none> <none>
deployment-demo-5fddfb8ffc-zzwck 1/1 Running 0 24m 192.168.51.19 k8s-node3 <none> <none>
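pod-with-nodeselector 1/1 Running 0 3m59s 192.168.51.20 k8s-node3 <none> <none> #running on node3
[root@k8s-master Scheduler]# kubectl label node k8s-node3 gpu- #removing the label does not affect Pods that are already running, because scheduling only happens when a Pod is placed on a node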
node/k8s-node3 labeled
[root@k8s-master Scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
deployment-demo-5fddfb8ffc-lssq8 1/1 Running 0 28m
deployment-demo-5fddfb8ffc-r277n 1/1 Running 0 28m
deployment-demo-5fddfb8ffc-wrpjx 1/1 Running 0 28m
deployment-demo-5fddfb8ffc-zzwck 1/1 Running 0 28m
pod-with-nodeselector 1/1 Running 0 8m9s
[root@k8s-master Scheduler]# cat node-affinity-required-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-affinity-required
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demoapp
      ctlr: node-affinity-required
  template:
    metadata:
      labels:
        app: demoapp
        ctlr: node-affinity-required
    spec:
      containers:
      - name: demoapp
        image: ikubernetes/demoapp:v1.0
        livenessProbe:
          httpGet:
            path: '/livez'
            port: 80
          initialDelaySeconds: 5
        readinessProbe:
          httpGet:
            path: '/readyz'
            port: 80
          initialDelaySeconds: 15
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:   #match conditions
              - key: gpu          #the node must have this label
                operator: Exists
              - key: node-role.kubernetes.io/master   #the node must not be a master; both expressions must hold
                operator: DoesNotExist
[root@k8s-master Scheduler]# kubectl apply -f node-affinity-required-demo.yaml
[root@k8s-master Scheduler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
node-affinity-required-5cb67df4b-d5nk6 0/1 Pending 0 3m51s
node-affinity-required-5cb67df4b-m6zxf 0/1 Pending 0 3m52s
node-affinity-required-5cb67df4b-sq5k9 0/1 Pending 0 3m51s
node-affinity-required-5cb67df4b-tvpwf 0/1 Pending 0 3m51s
node-affinity-required-5cb67df4b-vkx7j 0/1 Pending 0 3m52s
pod-with-nodeselector 0/1 Pending 0 31m #the Pods are Pending
[root@k8s-master Scheduler]# kubectl label node k8s-node2 gpu='true'
node/k8s-node2 labeled #label the node so that it satisfies the conditions
[root@k8s-master Scheduler]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity-required-5cb67df4b-d5nk6 0/1 ContainerCreating 0 5m14s <none> k8s-node2 <none> <none>
node-affinity-required-5cb67df4b-m6zxf 0/1 ContainerCreating 0 5m15s <none> k8s-node2 <none> <none>
node-affinity-required-5cb67df4b-sq5k9 0/1 ContainerCreating 0 5m14s <none> k8s-node2 <none> <none>
node-affinity-required-5cb67df4b-tvpwf 0/1 ContainerCreating 0 5m14s <none> k8s-node2 <none> <none>
node-affinity-required-5cb67df4b-vkx7j 0/1 ContainerCreating 0 5m15s <none> k8s-node2 <none> <none>
Example 2: nodeAffinity hard affinity
- requiredDuringSchedulingIgnoredDuringExecution
[root@k8s-master Scheduler]# cat node-affinity-and-resourcefits.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-affinity-and-resourcefits
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demoapp
      ctlr: node-affinity-and-resourcefits
  template:
    metadata:
      labels:
        app: demoapp
        ctlr: node-affinity-and-resourcefits
    spec:
      containers:
      - name: demoapp
        image: ikubernetes/demoapp:v1.0
        resources:   #filtering (predicate) stage: only nodes that satisfy these resource requests go on to the scoring (priority) stage
          requests:
            cpu: 1000m
            memory: 200Mi
        livenessProbe:
          httpGet:
            path: '/livez'
            port: 80
          initialDelaySeconds: 5
        readinessProbe:
          httpGet:
            path: '/readyz'
            port: 80
          initialDelaySeconds: 15
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   #hard affinity
            nodeSelectorTerms:
            - matchExpressions:
              - key: gpu
                operator: Exists
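To see how the resource requests and the hard affinity in this manifest interact, the following Python sketch places five replicas that each request 1000m CPU and 200Mi memory, and only accepts nodes that carry a gpu label. The node capacities are made-up numbers, and "pick the node with the most free CPU" is only a stand-in for the real scoring stage, so treat this as a rough model of the filtering step rather than the scheduler's actual behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int      # free CPU in millicores (hypothetical value)
    free_mem_mib: int    # free memory in MiB (hypothetical value)
    labels: Dict[str, str] = field(default_factory=dict)


def feasible(node: Node, cpu_m: int, mem_mib: int, label_key: str) -> bool:
    """Filtering stage: the node must fit the requests AND carry the label."""
    fits = node.free_cpu_m >= cpu_m and node.free_mem_mib >= mem_mib
    return fits and label_key in node.labels


def place_replicas(nodes: List[Node], replicas: int, cpu_m: int,
                   mem_mib: int, label_key: str) -> List[Optional[str]]:
    """Place replicas one at a time; None means the replica stays Pending."""
    placements: List[Optional[str]] = []
    for _ in range(replicas):
        candidates = [n for n in nodes if feasible(n, cpu_m, mem_mib, label_key)]
        if not candidates:
            placements.append(None)
            continue
        # Stand-in for the scoring stage: prefer the node with the most free CPU.
        chosen = max(candidates, key=lambda n: n.free_cpu_m)
        chosen.free_cpu_m -= cpu_m
        chosen.free_mem_mib -= mem_mib
        placements.append(chosen.name)
    return placements


nodes = [
    Node("k8s-node1", free_cpu_m=8000, free_mem_mib=8192),                        # no gpu label
    Node("k8s-node2", free_cpu_m=3000, free_mem_mib=4096, labels={"gpu": "true"}),
]

placements = place_replicas(nodes, replicas=5, cpu_m=1000, mem_mib=200, label_key="gpu")
print(placements)  # ['k8s-node2', 'k8s-node2', 'k8s-node2', None, None]
```

With these assumed capacities, k8s-node1 has plenty of CPU but is excluded by the hard affinity, and k8s-node2 runs out of CPU after three replicas, so the remaining two would stay Pending.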
Example 3: nodeAffinity soft affinity
- preferredDuringSchedulingIgnoredDuringExecution
[root@k8s-master Scheduler]# cat node-affinity-preferred-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-affinity-preferred
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demoapp
      ctlr: node-affinity-preferred
  template:
    metadata:
      name: demoapp
      labels:
        app: demoapp
        ctlr: node-affinity-preferred
    spec:
      containers:
      - name: demoapp
        image: ikubernetes/demoapp:v1.0
        resources:   #filtering (predicate) stage: only nodes that satisfy these resource requests go on to the weighted scoring below
          requests:
            cpu: 100m
            memory: 100Mi
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:   #soft affinity
          - weight: 60
            preference:
              matchExpressions:   #nodes with the gpu label get 60 added to their score
              - key: gpu
                operator: Exists
          - weight: 30
            preference:
              matchExpressions:   #nodes whose region label is foo or bar get 30 added to their score
              - key: region
                operator: In
                values: ["foo","bar"]
#Add labels to the nodes
[root@k8s-master ~]# kubectl label node k8s-node1.org gpu=2
node/k8s-node1.org labeled
[root@k8s-master ~]# kubectl label node k8s-node3.org region=foo
node/k8s-node3.org labeled
[root@k8s-master ~]# kubectl label node k8s-node2.org region=bar
node/k8s-node2.org labeled
[root@k8s-master ~]# kubectl get node -l gpu #node1 has the gpu label
NAME STATUS ROLES AGE VERSION
k8s-node1.org Ready <none> 47d v1.22.2
[root@k8s-master ~]# kubectl get node -l region #node2 and node3 have the region label
NAME STATUS ROLES AGE VERSION
k8s-node2.org Ready <none> 47d v1.22.2
k8s-node3.org Ready <none> 47d v1.22.2
[root@k8s-master Scheduler]# kubectl apply -f node-affinity-preferred-demo.yaml
deployment.apps/node-affinity-preferred created
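- In theory node1 has the highest weight (60 from the gpu preference, against 30 from the region preference for node2 and node3), so the Pods should all run on node1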
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
details-v1-79f774bdb9-vjmll 2/2 Running 0 27d 10.244.42.13 k8s-node3.org <none> <none>
node-affinity-preferred-5579fd76bc-5hfd2 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
node-affinity-preferred-5579fd76bc-gzhd6 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
node-affinity-preferred-5579fd76bc-q8wrc 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
node-affinity-preferred-5579fd76bc-v42sn 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
node-affinity-preferred-5579fd76bc-vvc42 0/2 Init:0/1 0 15s <none> k8s-node1.org <none> <none>
productpage-v1-6b746f74dc-q564k 2/2 Running 0 27d 10.244.42.21 k8s-node3.org <none> <none>
ratings-v1-b6994bb9-vh57t 2/2 Running 0 27d 10.244.42.19 k8s-node3.org <none> <none>
reviews-v1-545db77b95-clh87 2/2 Running 0 27d 10.244.42.12 k8s-node3.org <none> <none>
reviews-v2-7bf8c9648f-hdbdl 2/2 Running 0 27d 10.244.42.9 k8s-node3.org <none> <none>
reviews-v3-84779c7bbc-4vzcz 2/2 Running 0 27d 10.244.42.17 k8s-node3.org <none> <none>
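The placement above agrees with the weight sums for this example, which can be reproduced with a small Python sketch. It uses the node labels set earlier and the two preferences from the manifest. The helper names are made up for illustration, and the real scheduler also normalizes this score and combines it with other scoring plugins, so the sketch covers only the node-affinity part.

```python
# Each preference is (weight, matcher), mirroring the two entries in the manifest.
preferences = [
    (60, lambda labels: "gpu" in labels),                         # key: gpu, operator: Exists
    (30, lambda labels: labels.get("region") in ("foo", "bar")),  # key: region, operator: In
]

# Labels applied with `kubectl label node ...` in this example.
node_labels = {
    "k8s-node1.org": {"gpu": "2"},
    "k8s-node2.org": {"region": "bar"},
    "k8s-node3.org": {"region": "foo"},
}


def affinity_score(labels):
    """Sum the weights of every preference the node's labels satisfy."""
    return sum(weight for weight, matches in preferences if matches(labels))


scores = {name: affinity_score(labels) for name, labels in node_labels.items()}
best = max(scores, key=scores.get)

print(scores)  # {'k8s-node1.org': 60, 'k8s-node2.org': 30, 'k8s-node3.org': 30}
print(best)    # k8s-node1.org
```

A node carrying both a gpu label and a matching region label would score 90, since the weights of all matching preferences are added together.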