1. 环境规划
- 部署集群时尽量使用 root 用户
- 使用 Keepalived + Nginx 作集群负载均衡及高可用。
- 单独部署 etcd 高可用集群。
- 使用 Cilium 作为网络插件,替代calico、kube-proxy和IPVS等,启用完整功能需要内核版本>=5.19。
- 使用 Containerd 作为容器运行时。
- 修改 Kubeadm 所创建证书的时限为10年。
- 需要从网上下载资源,如果服务器不能上网,需要提前下载对应资源并导入节点。
- 部署测试使用 2 master + 2 node。2 个 master 节点可以实现 API Server 的高可用,但不推荐用于生产环境;官方推荐使用单数个 master(及 etcd)节点,以便投票选举 Leader 时能够获得超过半数票数,法定人数计算见下一行说明。
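  (按 Raft 法定人数计算:3 节点的法定人数为 ⌊3/2⌋+1 = 2,可容忍 1 台故障;5 节点法定人数为 3,可容忍 2 台;4 节点法定人数同样为 3,容错能力与 3 节点相同,因此偶数节点并不会提升容错能力。)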
1.1 软件环境
软件 | 版本 |
---|---|
操作系统 | Anolis 8.9 |
容器运行时 | Containerd v1.7.15 |
Kubernetes | Kubernetes v1.30.0 |
网络组件 | Cilium v1.14.10 |
1.2 服务器整体规划
角色 | IP | 配置 | 组件 |
---|---|---|---|
k8s-master-1 | 10.7.0.21 | 8C16G | 所有k8s组件,containerd,etcd,nginx,keepalived |
k8s-master-2 | 10.7.0.22 | 8C16G | 所有k8s组件,containerd,etcd,nginx,keepalived |
k8s-master-3(预留) | | | 所有k8s组件,containerd,etcd,nginx,keepalived |
k8s-node-1 | 10.7.0.24 | 8C16G | kubelet,kube-proxy,containerd,etcd |
k8s-node-2 | 10.7.0.25 | 8C16G | kubelet,kube-proxy,containerd,coreDNS |
k8s-node-3(预留) | | | kubelet,kube-proxy,containerd,coreDNS |
负载均衡器VIP | 10.7.0.20 | | |
1.3 集群网段
配置 | 备注 |
---|---|
节点网络 | 10.7.0.21~2x |
Pod网络 | 10.218.0.0/16 |
Service网络 | 172.28.0.0/16 |
2. 系统设置(所有节点)
2.1 使用变量简化部署
# 安装的k8s版本
Kube_Version=1.30.0
# k8s集群APIServer地址(虚拟IP)和端口
Kube_APIServerIP="10.7.0.20"
Kube_APIServerPort="8443"
# k8s集群Master、Node节点IP地址
Kube_MasterAddr=(10.7.0.21 10.7.0.22)
Kube_NodeAddr=(10.7.0.24 10.7.0.25)
# etcd集群地址
ETCD_ClusterAddr=(10.7.0.21 10.7.0.22 10.7.0.24)
# etcd版本
ETCD_Version=v3.5.12
# 创建Etcd集群配置目录
ETCD_ConfigDir='/etc/etcd'
# Etcd数据目录
ETCD_DataDir='/var/lib/etcd'
# 创建Etcd证书目录
ETCD_SSLDir='/etc/etcd/ssl'
# 软件包存放目录
Soft_SrcDir='/app/src'
# kubeadm证书目录(后续为etcd证书创建软链接时使用)
Kube_CertDir='/etc/kubernetes/pki'
# 集群Pod和Service的子网网段
PodSubnet="10.218.0.0/16"
SvcSubnet="172.28.0.0/16"
# go版本
go_vers=1.22.0
# nerdctl、buildkit、containerd版本
nerd_vers='1.7.5'
buid_vers='0.13.1'
cont_vers='1.7.15'
# 获取本机IP地址
LocalIPAddr=$(ip addr | awk '/^[0-9]+: / {}; /inet.*global/ {print gensub(/(.*)\/(.*)/, "\\1", "g", $2)}'| sed -n '1p')
# 以第一台master作为keepalived的master,另外两台作为backup
Keepalived_State="BACKUP"
Keepalived_Priority=90
if [[ ${LocalIPAddr} == ${Kube_MasterAddr[0]} ]];then
Keepalived_State="MASTER"
Keepalived_Priority=100
fi
# 使用的主网卡名称(如有多块网卡,取第一块)
NetworkInterface=$(ip link show | awk -F': ' '/^[0-9]+: /{print $2}'| grep -E '^en|^eth' | head -n1)
# 设置etcd集群节点名称
for i in `seq ${#ETCD_ClusterAddr[@]}`;do
let j=i-1
[[ "$LocalIPAddr" == "${ETCD_ClusterAddr[$j]}" ]] && ETCD_NAME="etcd0${i}"
done
echo $ETCD_NAME
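以下是一个可选的检查片段,用于在继续部署前确认上述关键变量已正确赋值(变量名均来自本节定义):
# (可选)检查关键变量是否已正确赋值
echo "KubeVersion=${Kube_Version} APIServer=${Kube_APIServerIP}:${Kube_APIServerPort}"
echo "Masters=${Kube_MasterAddr[@]} Nodes=${Kube_NodeAddr[@]}"
echo "EtcdCluster=${ETCD_ClusterAddr[@]} EtcdName=${ETCD_NAME}"
echo "LocalIP=${LocalIPAddr} Interface=${NetworkInterface} KeepalivedState=${Keepalived_State}"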
2.2 配置主机名,修改/etc/hosts文件
# 配置主机名,将所有主机写入hosts
sed -i '3,$d' /etc/hosts
m=0;n=0
# master
for node in ${Kube_MasterAddr[@]};do
echo "$node k8s-master-$((++m))" >> /etc/hosts
[ "$node" == "$LocalIPAddr" ] && hostnamectl set-hostname k8s-master-$m
done
# node
for node in ${Kube_NodeAddr[@]};do
echo "$node k8s-node-$((++n))" >> /etc/hosts
[ "$node" == "$LocalIPAddr" ] && hostnamectl set-hostname k8s-node-$n
done
# 检查
cat /etc/hosts
hostname
2.3 关闭防火墙、selinux、swap
# 安装常用工具及依赖项
yum -y install vim-enhanced yum-utils bash-completion lrzsz iproute iproute-tc zip unzip rsync wget
# 关闭防火墙
systemctl disable --now firewalld
# 关闭selinux
sed -i '/^SELINUX=/cSELINUX=disabled' /etc/selinux/config
sed -i '/^SELINUX=/cSELINUX=disabled' /etc/sysconfig/selinux
setenforce 0
# 关闭swap
swapoff -a
sed -i.bak '/swap/s/^/#/' /etc/fstab
# 检查selinux是否为permissive
getenforce
# 检查swap是否为0
free -h
# 检查防火墙是否关闭
systemctl status firewalld
2.4 所有节点配置limit
ulimit -SHn 65536
cat <<EOF | sudo tee -a /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
EOF
# 查看
tail /etc/security/limits.conf
2.5 升级内核版本LTS-6.1
内核rpm包下载地址: https://elrepo.org/linux/kernel
http://193.49.22.109/elrepo/kernel
rhel7、8安装高版本LTS:https://dl.lamp.sh/kernel
# 内核版本
kv=6.1.87
# 下载rhel8的内核kernel包
wget https://dl.lamp.sh/kernel/el8/kernel-ml-modules-${kv}-1.el8.x86_64.rpm
wget https://dl.lamp.sh/kernel/el8/kernel-ml-devel-${kv}-1.el8.x86_64.rpm
wget https://dl.lamp.sh/kernel/el8/kernel-ml-core-${kv}-1.el8.x86_64.rpm
wget https://dl.lamp.sh/kernel/el8/kernel-ml-${kv}-1.el8.x86_64.rpm
# # 以下为rhel7版本kernel包
# wget https://dl.lamp.sh/kernel/el7/kernel-ml-${kv}-1.el7.x86_64.rpm
# wget https://dl.lamp.sh/kernel/el7/kernel-ml-devel-${kv}-1.el7.x86_64.rpm
# wget https://dl.lamp.sh/kernel/el7/kernel-ml-headers-${kv}-1.el7.x86_64.rpm
# 下载内核LTS-6.1,上传到服务器,安装内核
yum -y install kernel-ml-*
# 查看系统安装的所有内核
grubby --info=ALL
# 设置默认启动
grub2-set-default 0 && grub2-mkconfig -o /boot/grub2/grub.cfg && grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
# 检查默认启动是否为LTS-6.1
grubby --default-kernel
# 重启后检查
reboot
uname -sr
2.6 配置内核参数(参考)
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
# 决定系统中所允许的文件句柄最大数目,即可以打开的文件的数量。
fs.file-max = 2097152
# 单个进程可分配的最大文件数
fs.nr_open = 1048576
# 表示每一个real user ID可创建的inotify instances的数量上限,默认128。
fs.inotify.max_user_instances = 8192
# 默认值: 8192 表示同一用户同时可以添加的watch数目(watch一般是针对目录,决定了同时同一用户可以监控的目录数量)
fs.inotify.max_user_watches = 524288
# 表示调用inotify_init时分配给inotify instance中可排队的event的数目的最大值,超出这个值的事件被丢弃,但会触发IN_Q_OVERFLOW事件。
fs.inotify.max_queued_events = 16384
# 用于限制普通用户建立硬链接,0:不限制 1:限制,如果文件不属于用户,或者用户对此用户没有读写权限,则不能建立硬链接
fs.protected_hardlinks = 1
# 用于限制普通用户建立软链接 0:不限 1:限制,允许用户建立软连接的情况是软连接所在目录是全局可读写目录或者软连接的uid与跟从者的uid匹配,又或者目录所有者与软连接所有者匹配
fs.protected_symlinks = 1
# 可以同时拥有异步I/O请求的数目,Oracle推荐的值为1048576
fs.aio-max-nr = 1048576
# softlockup 指某个任务在内核态持续运行超过阈值(默认 20 秒),导致其它任务得不到调度的 BUG。
# 设置发生softlockup时是否触发panic。0:不触发panic 1:触发panic
kernel.softlockup_panic = 1
# 发生softlockup时打印所有CPU的调用栈,便于捕获更多调试信息
kernel.softlockup_all_cpu_backtrace = 1
# panic错误中自动重启,等待时间为10秒
kernel.panic = 10
# 在Oops发生时会进行panic()操作
kernel.panic_on_oops = 1
# 用于设置高精度定时器(hrtimer)、nmi事件、softlockup、hardlockup的阈值(以秒为单位)
kernel.watchdog_thresh = 30
# 最大进程数
kernel.pid_max = 655360
# 参数指定在一个消息队列中最大的字节数 默认:16384
kernel.msgmnb = 65536
# 指定了从一个进程发送到另一个进程的消息最大长度。进程间的消息传递是在内核的内存中进行的。不会交换到硬盘上。所以如果增加该值,则将增加操作系统所使用的内存数量
kernel.msgmax = 65536
# 整个系统共享内存段的最大数量。
kernel.shmmni = 4096
# 系统中进程数量(包括线程)的最大值
kernel.threads-max = 655360
# 用于限制访问性能计数器的权限 -1:不限制
kernel.perf_event_paranoid = -1
# 这用于控制进程的追踪,0-默认附加安全权限。1-受限附件。只有子进程加上普通权限。
kernel.yama.ptrace_scope = 0
# Core文件的文件名是否添加应用程序pid做为扩展,0:不添加 1:添加
kernel.core_uses_pid = 1
# 每个网络接口接收数据包的速率比内核处理这些包的速率快时,允许送到队列的数据包的最大数目。
net.core.netdev_max_backlog = 16384
# 接收套接字缓冲区大小的最大值(以字节为单位)。最大化 Socket Receive Buffer
net.core.rmem_max = 16777216
# 发送套接字缓冲区大小的最大值(以字节为单位)。最大化 Socket Send Buffer
net.core.wmem_max = 16777216
# 增加此选项允许内核根据需要分配更多内存,以便为每个连接的套接字(包括IPC套接字/管道)发送更多控制消息。
net.core.optmem_max = 20480
# 所有协议类型读写的缓存区缺省大小
net.core.rmem_default = 262144
net.core.wmem_default = 262144
# 第二个积压队列长度,表示socket监听(listen)的backlog上限。
# backlog是socket的监听队列,当一个请求(request)尚未被处理或建立时,他会进入backlog。
net.core.somaxconn = 32768
# 是否开启路由转发功能,0:禁止,1:打开。
net.ipv4.ip_forward = 1
# 在所有接口上开启IPv4转发(与上面的ip_forward配合);容器要与外网通信必须开启转发
net.ipv4.conf.all.forwarding = 1
# 定义网络连接可用作其源(本地)端口的最小和最大端口的限制,同时适用于TCP和UDP连接。
net.ipv4.ip_local_port_range = 1024 65535
# 主备IP地址切换控制机制,当主IP地址被移除时,0:删除所有次IP地址;1:将次IP地址提升为主IP地址
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
# 反向路径过滤(rp_filter),0: 禁用,1: 严格模式,2: 松散模式(源地址只要可经任一接口到达即接受)。
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
# arp_announce 对网络接口(网卡)上发出的ARP请求包中的源IP地址作出相应的限制
# 值为2时,表示始终使用与目的IP地址对应的最佳本地IP地址作为ARP请求的源IP地址
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
# 默认是否接收无源路由的数据包,主机设为0,路由器设为1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
# 存在于ARP高速缓存中的最少记录条数,如果少于这个数,垃圾收集器将不会运行。缺省值是128。
net.ipv4.neigh.default.gc_thresh1 = 4096
# 保存在 ARP 高速缓存中的最多的记录软限制。垃圾收集器在开始收集前,允许记录数超过这个数字 5 秒。缺省值是 512。
net.ipv4.neigh.default.gc_thresh2 = 8192
# 保存在 ARP 高速缓存中的最多记录的硬限制,一旦高速缓存中的数目高于此,垃圾收集器将马上运行。缺省值是1024。
net.ipv4.neigh.default.gc_thresh3 = 16384
# 第一个积压队列长度,对于那些尚未收到客户端确认的连接请求,需要保存在队列中的最大数目。
net.ipv4.tcp_max_syn_backlog = 8096
# 表示回应第二个握手包(SYN+ACK包)给客户端IP后,如果收不到第三次握手包(ACK包),进行重试的次数(默认为5)
net.ipv4.tcp_synack_retries = 2
# 发送缓存区大小,缓存应用程序的数据,有序列号被应答确认的数据会从发送缓冲区删除掉。
net.ipv4.tcp_wmem = 4096 87380 16777216
# 接收缓存区大小,缓存从对端接收的数据,后续会被应用程序读取
net.ipv4.tcp_rmem = 4096 87380 16777216
# 禁用TCP SSR(Slow-StartRestart,慢启动重启)对于那些会出现突发空闲的长周期TCP连接(比如HTTP的keep-alive连接)有很大的影响
net.ipv4.tcp_slow_start_after_idle = 0
# 表示允许重用TIME_WAIT状态的套接字用于新的TCP连接, 默认为0, 表示关闭。
# 允许在协议安全的情况下重用TIME_WAIT 套接字用于新的连接。
net.ipv4.tcp_tw_reuse = 1
# 关闭TCP时间戳功能,用于提供更好的安全性。
net.ipv4.tcp_timestamps = 0
# tcp检查间隔时间(keepalive探测包的发送间隔,默认75s)
net.ipv4.tcp_keepalive_intvl = 30
# tcp检查次数(如果对方不予应答,探测包的发送次数,默认值9)
net.ipv4.tcp_keepalive_probes = 5
# 此参数表示TCP发送keepalive探测消息的间隔时间(秒),默认2小时
net.ipv4.tcp_keepalive_time = 600
# 系统同时处理TIME_WAIT sockets数目。如果一旦TIME_WAIT tcp连接数超过了这个数目,系统会强制清除并且显示警告消息。设立该限制,主要是防止那些简单的DoS攻击,加大该值有可能消耗更多的内存资源。
net.ipv4.tcp_max_tw_buckets = 6000
# 此参数应该设置为1,防止SYN Flood攻击
net.ipv4.tcp_syncookies = 1
# 对于本端断开的 socket 连接,这个参数决定了它保持在 FIN-WAIT-2 状态的时间
net.ipv4.tcp_fin_timeout = 30
# 设置了默认队列规则为fq
net.core.default_qdisc = fq
# 开启bbr算法,bbr在有一定丢包率的网络链路上更加充分的利用带宽,降低延迟。
net.ipv4.tcp_congestion_control = bbr
# TCP失败重传次数,默认值15,意味着重传15次才彻底放弃.可减少到5,以尽早释放内核资源
net.ipv4.tcp_retries2 = 5
# 禁用IPv6,0为启用IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
# 限制一个进程可以拥有的VMA(虚拟内存区域)的数量,一个更大的值对于 elasticsearch、mongo 或其他 mmap 用户来说非常有用
vm.max_map_count = 262144
# 该设置允许原始的内存过量分配策略,当系统的内存已经被完全使用时,系统仍然会分配额外的内存。
vm.overcommit_memory = 1
# 当系统内存不足(OOM)时,禁用系统崩溃和重启。
vm.panic_on_oom = 0
# 禁止使用 swap 空间,只有当系统 OOM 时才允许使用它
vm.swappiness = 0
# 当系统脏页的比例或者所占内存数量超过阈值时,启动内核线程开始将脏页写入磁盘
vm.dirty_background_ratio = 5
# 当系统pagecache的脏页达到系统内存 dirty_ratio(百分数)阈值时,系统就会阻塞新的写请求,直到脏页被回写到磁盘
vm.dirty_ratio = 20
EOF
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
sysctl --system
# 检查
sysctl -a |grep -E 'track_max|tcp_mem|shmmax|shmall'
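(可选)还可以顺带确认 bbr 拥塞控制与 IPv4 转发是否已生效,以下为一个简单示例:
# 确认bbr与IPv4转发已生效
sysctl net.ipv4.tcp_congestion_control net.ipv4.ip_forward
# bbr 以模块方式加载时可通过 lsmod 看到 tcp_bbr;若编译进内核则可能无输出
lsmod | grep bbr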
2.7 启用Cgroup v2(可选)
注:使用 Cilium 作为网络组件时,需要启用 Cgroup v2 。
使用 Cgroup v2 的条件:
- kernel 版本 > 4.5
- containerd v1.4+
# 一定要关闭SELINUX
sed -ri 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0 && getenforce
# 启用cgroup v2
grubby --update-kernel=ALL --args=systemd.unified_cgroup_hierarchy=1
# 重新启动系统,然后重启后检查是否启用cgroupv2
reboot
# 查看cgroups版本
mount|grep cgroup
# 检查你的发行版使用的是哪个 cgroup 版本
# 对于 cgroup v2,输出为 cgroup2fs。
# 对于 cgroup v1,输出为 tmpfs。
stat -fc %T /sys/fs/cgroup/
2.8 进行时间同步
yum -y install chrony
# (可选)修改配置文件,使用阿里云NTP服务器
#sed -i.bak '/^server/s/^/#/' /etc/chrony.conf
#sed -i '/^# Please consider/apool ntp.aliyun.com iburst' /etc/chrony.conf
#启动服务
systemctl restart chronyd
systemctl enable --now chronyd
systemctl status chronyd
# 检查时间同步状态
chronyc sources
2.9 免密ssh到node节点
# 使用sshpass实现免密登录
yum -y install sshpass
# 如果没有私钥则生成
[[ ! -f ~/.ssh/id_rsa ]] && ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
passwd='Mid49343'
for ip in ${Kube_MasterAddr[@]} ${Kube_NodeAddr[@]};do
sshpass -p ${passwd} ssh-copy-id -o StrictHostKeyChecking=no root@$ip
done
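可以用下面的循环快速验证各节点免密登录是否配置成功(示例):
# 验证到各节点的免密登录
for ip in ${Kube_MasterAddr[@]} ${Kube_NodeAddr[@]};do
  ssh -o BatchMode=yes root@$ip hostname
done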
2.10 安装Containerd作为Runtime
# 下载containerd
mkdir -p ${Soft_SrcDir} ; cd ${Soft_SrcDir}
wget https://mirror.ghproxy.com/https://github.com/containerd/containerd/releases/download/v${cont_vers}/cri-containerd-cni-${cont_vers}-linux-amd64.tar.gz
# 安装containerd
tar -xvzf cri-containerd-cni-${cont_vers}-linux-amd64.tar.gz -C /
# 创建服务启动文件
cat <<EOF | sudo tee /usr/lib/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
TasksMax=infinity
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
EOF
# 创建Containerd的配置文件
mkdir -p /etc/containerd
/usr/local/bin/containerd config default | tee /etc/containerd/config.toml
# 修改Containerd的配置文件
sed -i '/SystemdCgroup =/s/false/true/;/sandbox_image =/s#".*"#"registry.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml
# 配置镜像加速
sed -i '/registry\.mirrors/a\ [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"]\
endpoint = ["https://m.daocloud.io/registry.k8s.io"]\
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]\
endpoint = ["https://docker.m.daocloud.io","https://docker.nju.edu.cn"]\
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."gcr.io"]\
endpoint = ["https://gcr.m.daocloud.io"]\
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"]\
endpoint = ["https://quay.m.daocloud.io"]\
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."ghcr.io"]\
endpoint = ["https://ghcr.m.daocloud.io"]\
' /etc/containerd/config.toml
# 查看是否成功
grep -E 'SystemdCgroup|sandbox_image|mirrors|endpoint' /etc/containerd/config.toml
# 设置 Containerd 开机自启
systemctl daemon-reload
systemctl enable --now containerd
# 检查服务
systemctl status containerd
2.11 安装containerd管理工具nerdctl
cd ${Soft_SrcDir}
# containerd 客户端工具 nerdctl (操作兼容docker)
wget https://mirror.ghproxy.com/https://github.com/containerd/nerdctl/releases/download/v${nerd_vers}/nerdctl-${nerd_vers}-linux-amd64.tar.gz
# 使用精简版 nerdctl 无法直接通过 containerd 构建镜像,需要与 buildkit 组合使用以实现镜像构建。
wget https://mirror.ghproxy.com/https://github.com/moby/buildkit/releases/download/v${buid_vers}/buildkit-v${buid_vers}.linux-amd64.tar.gz
# (可选)将tar包分发到所有节点,(视需求执行)
for NODE in ${Kube_MasterAddr[@]} ${Kube_NodeAddr[@]}; do
echo "Node: $NODE"
if [[ $NODE != $LocalIPAddr ]];then
scp -rpq ${Soft_SrcDir}/nerdctl-${nerd_vers}-linux-amd64.tar.gz $NODE:${Soft_SrcDir}
scp -rpq ${Soft_SrcDir}/buildkit-v${buid_vers}.linux-amd64.tar.gz $NODE:${Soft_SrcDir}
fi
done
# 部署nerdctl
tar xf nerdctl-${nerd_vers}-linux-amd64.tar.gz -C /usr/local/bin/
mkdir -p /etc/nerdctl/
cat <<EOF |sudo tee /etc/nerdctl/nerdctl.toml
namespace = "k8s.io" # 设置nerdctl工具默认namespace
insecure_registry = true # 跳过安全镜像仓库检测
EOF
# 安装 buildkit 支持构建镜像
tar xf buildkit-v${buid_vers}.linux-amd64.tar.gz -C /usr/local/
# 配置 buildkit 的启动文件
cat <<EOF | sudo tee /usr/lib/systemd/system/buildkit.socket
[Unit]
Description=BuildKit
Documentation=https://github.com/moby/buildkit
[Socket]
ListenStream=%t/buildkit/buildkitd.sock
SocketMode=0660
[Install]
WantedBy=sockets.target
EOF
cat <<EOF | sudo tee /usr/lib/systemd/system/buildkit.service
[Unit]
Description=BuildKit
Requires=buildkit.socket
After=buildkit.socket
Documentation=https://github.com/moby/buildkit
[Service]
Type=notify
ExecStart=/usr/local/bin/buildkitd --addr fd://
[Install]
WantedBy=multi-user.target
EOF
# 启动 buildkit
systemctl daemon-reload && systemctl enable --now buildkit
systemctl status buildkit
# 查看
nerdctl version
buildctl --version
nerdctl info
# nerdctl命令补全
echo "source <(nerdctl completion bash)" >> ~/.bashrc
# 简单测试
cat <<EOF | sudo tee Dockerfile
FROM nginx:alpine
RUN echo 12345 > /usr/share/nginx/html/index.html
EOF
# 构建镜像
nerdctl build -t nginx:mytest .
# 启动容器
nerdctl run -d -p 80:80 nginx:mytest
# 访问
sleep 2;curl localhost
3. k8s基本组件安装
3.1 安装Kubernetes组件(所有主机)
# 新增k8s的yum源
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
# 更新缓存
yum clean all
yum -y makecache
# 查看k8s版本
yum list kubeadm --showduplicates --disableexcludes=kubernetes| sort
# 安装Kubernetes组件
yum -y install kubelet-${Kube_Version} kubeadm-${Kube_Version} kubectl-${Kube_Version} --disableexcludes=kubernetes
# kubectl命令补全
kubectl completion bash >/etc/bash_completion.d/kubectl
kubeadm completion bash >/etc/bash_completion.d/kubeadm
# 查看所需要的镜像
kubeadm config images list --kubernetes-version=${Kube_Version}
#指定容器运行时为containerd
crictl config runtime-endpoint /run/containerd/containerd.sock
systemctl daemon-reload
systemctl enable --now kubelet
systemctl restart kubelet
systemctl status kubelet
3.2 高可用配置( keepalived + nginx )
# 采用 keepalived 和 nginx 实现高可用
# keepalived 采用编译安装,nginx 使用 yum 安装,只需在 master 主机上操作
# 编译安装keepalived
# 安装依赖
yum -y install gcc openssl-devel libnl3-devel make
# 下载
cd /app/src
wget https://keepalived.org/software/keepalived-2.2.8.tar.gz
# 解压
tar xvf keepalived-2.2.8.tar.gz -C /usr/local/src
cd /usr/local/src/keepalived-2.2.8/
# 编译、安装
./configure --with-init=systemd --with-systemdsystemunitdir=/usr/lib/systemd/system
make && make install
# 检查
keepalived -v
# 创建、修改keepalive配置文件
mkdir /etc/keepalived
cat <<EOF | sudo tee /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id `hostname`
script_user root
enable_script_security
}
vrrp_script check_nginx {
script "/etc/keepalived/check_nginx.sh" # 检测脚本路径
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state ${Keepalived_State}
interface ${NetworkInterface}
mcast_src_ip ${LocalIPAddr}
virtual_router_id 50
priority ${Keepalived_Priority}
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
${Kube_APIServerIP}
}
track_script {
check_nginx # 模块
}
}
EOF
# 查看配置文件内容
cat /etc/keepalived/keepalived.conf
# 1. 使用官方yum仓库安装nginx (stable版本)
cat<<EOF | sudo tee /etc/yum.repos.d/nginx.repo
[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/\$releasever/\$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
EOF
# 清除yum缓存,防止从默认的epel源安装Nginx
yum clean all
yum makecache
# 列出所有Nginx版本
yum list nginx --showduplicates
# 安装指定版本1.24,也可以直接yum install nginx -y(默认安装最新版本,这里为1.24.0版本)
yum -y install nginx-1.24.0
# 注意: 如未使用前文定义的变量,需手动修改下方 upstream 中 Master 的 IP 地址
# sed 修改配置文件,添加upstream项,nginx仅做代理转发流量
sed -i '13i \
stream { \
upstream apiserver { \
} \
\
server { \
proxy_pass apiserver; \
} \
} \
' /etc/nginx/nginx.conf
# 写入master地址
for node in ${Kube_MasterAddr[@]};do
sed -i "/upstream apiserver/a\ server ${node}:6443 max_fails=2 fail_timeout=10s weight=1;" /etc/nginx/nginx.conf
done
sed -i "/proxy_pass apiserver/i\ listen ${Kube_APIServerPort};" /etc/nginx/nginx.conf
sed -i '/worker_connections/s/1024/65536/' /etc/nginx/nginx.conf
# 不作为web服务器,仅使用upstream
sed -i '/conf.d/s/include/#include/' /etc/nginx/nginx.conf
# 检查
cat /etc/nginx/nginx.conf
# 启动Nginx并设置开机自启动
systemctl enable --now nginx
# 查看Nginx状态,显示Active: active (running)表示已经启动
systemctl status nginx
# 重新读取配置
nginx -s reload
# 检查端口
ss -nlupt |grep ${Kube_APIServerPort}
# 每台master节点编写健康监测脚本
cat <<EOF | sudo tee /etc/keepalived/check_nginx.sh
#!/bin/bash
err=0
for k in \$(seq 1 3);do
check_code=\$(pgrep nginx)
if [[ \${check_code} == "" ]];then
err=\$(expr \$err + 1)
sleep 1
continue
else
err=0
break
fi
done
if [[ \$err != "0" ]];then
echo "systemctl stop keepalived"
/usr/bin/systemctl stop keepalived
exit 1
else
exit 0
fi
EOF
chmod +x /etc/keepalived/check_nginx.sh
# 启动keepalived
systemctl enable --now keepalived
systemctl status keepalived
ip a s |grep ${Kube_APIServerIP}
# 测试VIP是否正常
ping -c 4 ${Kube_APIServerIP}
telnet ${Kube_APIServerIP} ${Kube_APIServerPort}
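部署完成后,可以按下面的思路做一次简单的 VIP 故障切换演练(示例,在当前持有 VIP 的 master 上执行):
# 在持有VIP的master上停止nginx,check_nginx.sh检测失败后会停止keepalived,VIP随之漂移
systemctl stop nginx
# 在另一台master上确认VIP已被接管
ip a s | grep ${Kube_APIServerIP}
# 演练结束后恢复nginx和keepalived
systemctl start nginx && systemctl restart keepalived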
3.3 部署etcd集群
etcd简介
etcd是CoreOS团队于2013年6月发起的开源项目,它的目标是构建一个高可用的分布式键值(key-value)数据库。etcd内部采用raft协议作为一致性算法,etcd基于Go语言实现。
etcd作为服务发现系统,有以下的特点:
- 简单:安装配置简单,而且提供了HTTP API进行交互,使用也很简单
- 安全:支持SSL证书验证
- 快速:根据官方提供的benchmark数据,单实例支持每秒2k+读操作
- 可靠:采用raft算法,实现分布式系统数据的可用性和一致性
etcd项目地址: https://github.com/coreos/etcd/
3.3.1 生成etcd证书(etcd01上执行即可)
# 下载安装包
cd /app/src
wget https://mirror.ghproxy.com/https://github.com/etcd-io/etcd/releases/download/${ETCD_Version}/etcd-${ETCD_Version}-linux-amd64.tar.gz
# 解压etcd安装文件
tar xzvf etcd-${ETCD_Version}-linux-amd64.tar.gz --strip-components=1 -C /usr/local/bin etcd-${ETCD_Version}-linux-amd64/etcd{,ctl}
# 查看/usr/local/bin下内容
ls /usr/local/bin/
# 查看版本
etcdctl version
# 下载管理工具cfssl,地址:https://pkg.cfssl.org
wget https://mirror.ghproxy.com/https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssl_1.6.5_linux_amd64 --no-check-certificate
wget https://mirror.ghproxy.com/https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssl-certinfo_1.6.5_linux_amd64 --no-check-certificate
wget https://mirror.ghproxy.com/https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssljson_1.6.5_linux_amd64 --no-check-certificate
# 将工具放在/usr/local/bin
mv cfssl_1.6.5_linux_amd64 /usr/local/bin/cfssl
mv cfssljson_1.6.5_linux_amd64 /usr/local/bin/cfssljson
mv cfssl-certinfo_1.6.5_linux_amd64 /usr/local/bin/cfssl-certinfo
chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson /usr/local/bin/cfssl-certinfo
# 创建证书文件
mkdir -p ~/k8s/pki/etcd /etc/etcd/ssl
cd ~/k8s/pki/etcd
# 定义ca证书,生成ca证书配置文件
cat <<EOF | sudo tee etcd-ca.json
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"etcd": {
"expiry": "87600h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
}
}
}
}
EOF
# 生成证书签名文件
cat <<EOF | sudo tee etcd-ca-csr.json
{
"CN": "etcd CA",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"L": "Shenzhen",
"ST": "Shenzhen",
"O": "etcd",
"OU": "Etcd Security"
}
],
"ca": {
"expiry": "87600h"
}
}
EOF
# 生成ca证书: etcd-ca.pem、etcd-ca-key.pem
cfssl gencert -initca etcd-ca-csr.json | cfssljson -bare ${ETCD_SSLDir}/etcd-ca
# 指定etcd三个节点之间的通信认证
# 文件hosts字段中IP为所有etcd节点的集群内部通信IP,一个都不能少!为了方便后期扩容可以多写几个预留的IP。
cat <<EOF | sudo tee etcd-csr.json
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"k8s-master-1",
"k8s-master-2",
"k8s-node-1",
"${ETCD_ClusterAddr[0]}",
"${ETCD_ClusterAddr[1]}",
"${ETCD_ClusterAddr[2]}",
"${Kube_APIServerIP}"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"L": "Shenzhen",
"ST": "Shenzhen",
"O": "etcd",
"OU": "Etcd Security"
}
]
}
EOF
# 查看文件
cat etcd-csr.json
# 生成证书
cfssl gencert \
-ca=${ETCD_SSLDir}/etcd-ca.pem \
-ca-key=${ETCD_SSLDir}/etcd-ca-key.pem \
-config=etcd-ca.json \
-profile=etcd \
etcd-csr.json | cfssljson -bare ${ETCD_SSLDir}/etcd
# 查看证书
ls ${ETCD_SSLDir}/*.pem
# 分发证书
for ip in ${ETCD_ClusterAddr[@]};do
if [[ $ip != $LocalIPAddr ]];then
ssh root@$ip "mkdir -p ${ETCD_SSLDir}"
scp -rpq ${ETCD_SSLDir}/*.pem root@$ip:${ETCD_SSLDir}/
fi
done
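证书分发完成后,可以确认证书 SAN 中包含了全部 etcd 节点地址(示例,两种方式任选其一):
# 使用openssl查看证书的SAN
openssl x509 -in ${ETCD_SSLDir}/etcd.pem -noout -text | grep -A1 'Subject Alternative Name'
# 或使用cfssl-certinfo查看
cfssl-certinfo -cert ${ETCD_SSLDir}/etcd.pem | grep -A10 '"sans"'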
3.3.2 配置etcd集群(所有etcd主机)
# 需要在所有etcd集群主机上运行
# Etcd集群配置 /etc/etcd/config.yml
cat <<EOF | sudo tee ${ETCD_ConfigDir}/config.yml
# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: ${ETCD_NAME}
# Path to the data directory.
data-dir: ${ETCD_DataDir}
# Path to the dedicated wal directory.
wal-dir: ${ETCD_DataDir}/wal
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 5000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000
# Raise alarms when backend size exceeds the given quota. 0 means use the default quota.
# 默认 ETCD 空间配额大小为 2G,超过 2G 将不再写入数据,调整为8G
quota-backend-bytes: 8589934592
# 添加 metrics 端口
listen-metrics-urls: "http://0.0.0.0:2381"
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: "https://${LocalIPAddr}:2380"
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: "https://${LocalIPAddr}:2379,http://127.0.0.1:2379"
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: "https://${LocalIPAddr}:2380"
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: "https://${LocalIPAddr}:2379"
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: "etcd01=https://${ETCD_ClusterAddr[0]}:2380,etcd02=https://${ETCD_ClusterAddr[1]}:2380,etcd03=https://${ETCD_ClusterAddr[2]}:2380"
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-k8s-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new'
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# Path to the client server TLS cert file.
cert-file: "${ETCD_SSLDir}/etcd.pem"
# Path to the client server TLS key file.
key-file: "${ETCD_SSLDir}/etcd-key.pem"
# Enable client cert authentication.
client-cert-auth: true
# Path to the client server TLS trusted CA cert file.
trusted-ca-file: "${ETCD_SSLDir}/etcd-ca.pem"
# Client TLS using generated certificates
auto-tls: false
peer-transport-security:
# Path to the peer server TLS cert file.
cert-file: "${ETCD_SSLDir}/etcd.pem"
# Path to the peer server TLS key file.
key-file: "${ETCD_SSLDir}/etcd-key.pem"
# Enable peer client cert authentication.
client-cert-auth: true
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file: "${ETCD_SSLDir}/etcd-ca.pem"
# Peer TLS using generated certificates.
auto-tls: false
# The validity period of the self-signed certificate, the unit is year.
# self-signed-cert-validity: 1
# Enable debug-level logging for etcd.
log-level: info
logger: zap
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-outputs: [default]
# Force to create a new one member cluster.
force-new-cluster: false
# etcd 每隔一个小时数据压缩一次
auto-compaction-mode: periodic
auto-compaction-retention: "1"
# 最大请求字节,默认值 1M,调整为10M
max-request-bytes: 10485760
EOF
# 查看配置文件
cat ${ETCD_ConfigDir}/config.yml
# 创建etcd.service并启动, 使用systemctl启动Etcd
cat <<EOF | sudo tee /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Service
Documentation=https://coreos.com/etcd/docs/latest/
After=network.target
[Service]
Type=notify
ExecStart=/usr/local/bin/etcd --config-file=${ETCD_ConfigDir}/config.yml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Alias=etcd3.service
EOF
# 创建etcd的目录、链接
sudo mkdir -p ${Kube_CertDir}/etcd /app/server/etcd
sudo ln -s ${ETCD_SSLDir}/* ${Kube_CertDir}/etcd/
# 启动服务
sudo systemctl daemon-reload
sudo systemctl enable --now etcd
sudo systemctl status etcd
# 验证集群
etcdctl --cacert=${ETCD_SSLDir}/etcd-ca.pem --cert=${ETCD_SSLDir}/etcd.pem --key=${ETCD_SSLDir}/etcd-key.pem --endpoints="https://${ETCD_ClusterAddr[0]}:2379,https://${ETCD_ClusterAddr[1]}:2379,https://${ETCD_ClusterAddr[2]}:2379" endpoint health
etcdctl --cacert=${ETCD_SSLDir}/etcd-ca.pem --cert=${ETCD_SSLDir}/etcd.pem --key=${ETCD_SSLDir}/etcd-key.pem --endpoints="https://${ETCD_ClusterAddr[0]}:2379,https://${ETCD_ClusterAddr[1]}:2379,https://${ETCD_ClusterAddr[2]}:2379" endpoint status --write-out=table
# write,read to etcd
etcdctl --endpoints=localhost:2379 put foo bar
etcdctl --endpoints=localhost:2379 get foo
# etcd 数据目录权限
sudo useradd etcd
sudo chmod 700 ${ETCD_DataDir}
sudo chown etcd:etcd ${ETCD_DataDir}
# 修改 CPU 优先级
sudo renice -n -20 -p $(pgrep etcd)
# 修改磁盘 IO 优先级
sudo ionice -c2 -n0 -p $(pgrep etcd)
# 修改网络优先级
sudo tc qdisc add dev ${NetworkInterface} root handle 1: prio bands 3
sudo tc filter add dev ${NetworkInterface} parent 1: protocol ip prio 1 u32 match ip sport 2380 0xffff flowid 1:1
sudo tc filter add dev ${NetworkInterface} parent 1: protocol ip prio 1 u32 match ip dport 2380 0xffff flowid 1:1
sudo tc filter add dev ${NetworkInterface} parent 1: protocol ip prio 2 u32 match ip sport 2379 0xffff flowid 1:1
sudo tc filter add dev ${NetworkInterface} parent 1: protocol ip prio 2 u32 match ip dport 2379 0xffff flowid 1:1
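注意 renice/ionice 只对当前 etcd 进程生效,etcd 重启后会失效;如需持久化,可以参考下面的 systemd drop-in 写法(示例):
# (可选)通过systemd drop-in持久化CPU/IO优先级
sudo mkdir -p /etc/systemd/system/etcd.service.d
cat <<EOF | sudo tee /etc/systemd/system/etcd.service.d/priority.conf
[Service]
# 提高CPU优先级,等同于 renice -n -20
Nice=-20
# 提高磁盘IO优先级,等同于 ionice -c2 -n0
IOSchedulingClass=best-effort
IOSchedulingPriority=0
EOF
sudo systemctl daemon-reload && sudo systemctl restart etcd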
3.4 修改kubeadm代码,修改证书时间为10年
# 1. Linux安装go环境
# 下载、解压包
wget https://studygolang.com/dl/golang/go${go_vers}.linux-amd64.tar.gz
tar -C /usr/local -xzf go${go_vers}.linux-amd64.tar.gz
# 配置go环境变量
cat <<EOF>> /etc/profile
export PATH=\$PATH:/usr/local/go/bin
EOF
source /etc/profile
# 查看版本
go version
# 2.修改代码
# 拉取对应的源码
mkdir -p ~/k8s;cd ~/k8s/
# git clone --branch v${Kube_Version} https://github.com/kubernetes/kubernetes.git
wget https://mirror.ghproxy.com/https://github.com/kubernetes/kubernetes/archive/refs/tags/v${Kube_Version}.zip
unzip v${Kube_Version}.zip
cd kubernetes-${Kube_Version}
# 修改代码
# 注: client-go 中 CA 证书有效期默认即为 10 年(duration365d * 10),无需修改;如需更长可改为 100 年:
# sed -i '/NotAfter:/s/duration365d \* 10/duration365d \* 100/' staging/src/k8s.io/client-go/util/cert/cert.go
# 将 kubeadm 签发的组件证书有效期由 1 年调整为 10 年
sed -i '/CertificateValidity/s/24 \* 365/24 \* 365 \* 10/' cmd/kubeadm/app/constants/constants.go
# 查看
grep -E 'NotAfter:.*now' staging/src/k8s.io/client-go/util/cert/cert.go
grep 'CertificateValidity =' cmd/kubeadm/app/constants/constants.go
# 3. 编译kubeadm
make all WHAT=cmd/kubeadm GOFLAGS=-v
# 编译kubelet
# make all WHAT=cmd/kubelet GOFLAGS=-v
# 编译kubectl
# make all WHAT=cmd/kubectl GOFLAGS=-v
# * 编译完的kubeadm在 _output/bin/kubeadm 目录下,其中bin是使用了软连接,真实路径是_output/local/bin/linux/amd64/kubeadm
mv /usr/bin/kubeadm /usr/bin/kubeadm_bak30
cp -av _output/local/bin/linux/amd64/kubeadm /usr/bin/kubeadm
chmod +x /usr/bin/kubeadm
# 分发到所有节点
for i in ${Kube_MasterAddr[@]} ${Kube_NodeAddr[@]};do
echo $i
scp _output/local/bin/linux/amd64/kubeadm root@$i:/usr/bin
done
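分发完成后,可以在各节点简单确认 kubeadm 已替换为重新编译的版本(示例):
# 确认kubeadm已替换为重新编译的版本
kubeadm version -o short
ls -l /usr/bin/kubeadm /usr/bin/kubeadm_bak30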
3.5 初始化集群
# 自动生成 kubeadm 初始化文件,开启审计日志
# 参考:https://kubernetes.io/zh-cn/docs/reference/config-api/kubeadm-config.v1beta3/
# kubeadm config print init-defaults > kubeadm-config.yaml
cd
# kubeadm 初始化配置文件
cat <<EOF | sudo tee kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: cl7nd8.ncpijgupi7tsm6ta
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: ${LocalIPAddr} # 本机IP
bindPort: 6443
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
name: `hostname` # 本主机名
taints:
- effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
clusterName: kubernetes
etcd:
# local:
# dataDir: /var/lib/etcd
external: # 使用外部etcd集群
endpoints: # etcd节点地址
- https://${ETCD_ClusterAddr[0]}:2379
- https://${ETCD_ClusterAddr[1]}:2379
- https://${ETCD_ClusterAddr[2]}:2379
caFile: ${ETCD_SSLDir}/etcd-ca.pem
certFile: ${ETCD_SSLDir}/etcd.pem
keyFile: ${ETCD_SSLDir}/etcd-key.pem
dns: {}
networking:
dnsDomain: cluster.local
serviceSubnet: "${SvcSubnet}"
podSubnet: "${PodSubnet}"
kubernetesVersion: ${Kube_Version} # k8s版本
controlPlaneEndpoint: "${Kube_APIServerIP}:${Kube_APIServerPort}" # 虚拟IP和nginx端口
certificatesDir: /etc/kubernetes/pki
imageRepository: registry.aliyuncs.com/google_containers
apiServer: # APIServer 初始化参数
extraArgs:
default-not-ready-toleration-seconds: "300"
default-unreachable-toleration-seconds: "300"
authorization-mode: Node,RBAC
bind-address: 0.0.0.0
service-node-port-range: 30000-32767
kubelet-preferred-address-types: "Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP"
enable-admission-plugins: NodeRestriction
profiling: "False"
request-timeout: 300s
enable-aggregator-routing: "true"
allow-privileged: "true"
audit-policy-file: /etc/kubernetes/audit-policy/apiserver-audit-policy.yaml
audit-log-path: "/var/log/audit/kube-apiserver-audit.log"
audit-log-maxage: "30"
audit-log-maxbackup: "10"
audit-log-maxsize: "100"
max-requests-inflight: "800"
max-mutating-requests-inflight: "400"
event-ttl: 1h0m0s
tls-cipher-suites: TLS_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384,TLS_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_GCM_SHA384
extraVolumes:
- name: audit-policy
hostPath: /etc/kubernetes/audit-policy
mountPath: /etc/kubernetes/audit-policy
- name: audit-logs
hostPath: /var/log/kubernetes/audit
mountPath: /var/log/audit
readOnly: false
timeoutForControlPlane: 5m0s
controllerManager: # ControllerManager 初始化参数
extraArgs:
node-monitor-grace-period: "40s"
node-monitor-period: "5s"
kube-api-qps: "50"
kube-api-burst: "100"
cluster-cidr: "${PodSubnet}"
service-cluster-ip-range: "${SvcSubnet}"
node-cidr-mask-size: "24"
profiling: "false"
terminated-pod-gc-threshold: "125"
bind-address: 0.0.0.0
configure-cloud-routes: "false"
scheduler: # Scheduler 初始化参数
extraArgs:
bind-address: 0.0.0.0
profiling: "False"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
failSwapOn: true
serializeImagePulls: false # 配置并行拉取镜像
maxPods: 50
protectKernelDefaults: true # kubelet 在发现内核参数与预期不符时出错退出。
kubeAPIQPS: 50 # 与apiserver会话时的 QPS
kubeAPIBurst: 100 # 与apiserver会话时的并发数
evictionHard: # 硬性驱逐阈值
memory.available: "100Mi"
nodefs.available: "10%"
nodefs.inodesFree: "5%"
imagefs.available: "15%"
evictionSoft: # 软驱逐阈值
memory.available: "500Mi"
nodefs.available: "20%"
nodefs.inodesFree: "10%"
imagefs.available: "20%"
evictionSoftGracePeriod: # 软性驱逐信号的宽限期限
memory.available: 1m30s
nodefs.available: 1m30s
imagefs.available: 1m30s
nodefs.inodesFree: 1m30s
evictionMaxPodGracePeriod: 60 # 达到软性逐出阈值可以赋予的宽限期限最大值(按秒计)
containerLogMaxFiles: 5 # 每个容器存在的日志数量
containerLogMaxSize: 100Mi # 每个日志文件大小
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
EOF
# 注意: 审计配置需在所有master节点创建
# 审计日志配置文件
sudo mkdir -p /etc/kubernetes/audit-policy/
cat <<EOF | sudo tee /etc/kubernetes/audit-policy/apiserver-audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# The following requests were manually identified as high-volume and low-risk,
# so drop them.
- level: None
users: ["system:kube-proxy"]
verbs: ["watch"]
resources:
- group: "" # core
resources: ["endpoints", "services", "services/status"]
- level: None
# Ingress controller reads `configmaps/ingress-uid` through the unsecured port.
# TODO(#46983): Change this to the ingress controller service account.
users: ["system:unsecured"]
namespaces: ["kube-system"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["configmaps"]
- level: None
users: ["kubelet"] # legacy kubelet identity
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
- level: None
userGroups: ["system:nodes"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
- level: None
users:
- system:kube-controller-manager
- system:kube-scheduler
- system:serviceaccount:kube-system:endpoint-controller
verbs: ["get", "update"]
namespaces: ["kube-system"]
resources:
- group: "" # core
resources: ["endpoints"]
- level: None
users: ["system:apiserver"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["namespaces", "namespaces/status", "namespaces/finalize"]
# Don't log HPA fetching metrics.
- level: None
users:
- system:kube-controller-manager
verbs: ["get", "list"]
resources:
- group: "metrics.k8s.io"
# Don't log these read-only URLs.
- level: None
nonResourceURLs:
- /healthz*
- /version
- /swagger*
# Don't log events requests.
- level: None
resources:
- group: "" # core
resources: ["events"]
# Secrets, ConfigMaps, TokenRequest and TokenReviews can contain sensitive & binary data,
# so only log at the Metadata level.
- level: Metadata
resources:
- group: "" # core
resources: ["secrets", "configmaps", "serviceaccounts/token"]
- group: authentication.k8s.io
resources: ["tokenreviews"]
omitStages:
- "RequestReceived"
# Get responses can be large; skip them.
- level: Request
verbs: ["get", "list", "watch"]
resources:
- group: "" # core
- group: "admissionregistration.k8s.io"
- group: "apiextensions.k8s.io"
- group: "apiregistration.k8s.io"
- group: "apps"
- group: "authentication.k8s.io"
- group: "authorization.k8s.io"
- group: "autoscaling"
- group: "batch"
- group: "certificates.k8s.io"
- group: "extensions"
- group: "metrics.k8s.io"
- group: "networking.k8s.io"
- group: "policy"
- group: "rbac.authorization.k8s.io"
- group: "settings.k8s.io"
- group: "storage.k8s.io"
omitStages:
- "RequestReceived"
# Default level for known APIs
- level: RequestResponse
resources:
- group: "" # core
- group: "admissionregistration.k8s.io"
- group: "apiextensions.k8s.io"
- group: "apiregistration.k8s.io"
- group: "apps"
- group: "authentication.k8s.io"
- group: "authorization.k8s.io"
- group: "autoscaling"
- group: "batch"
- group: "certificates.k8s.io"
- group: "extensions"
- group: "metrics.k8s.io"
- group: "networking.k8s.io"
- group: "policy"
- group: "rbac.authorization.k8s.io"
- group: "settings.k8s.io"
- group: "storage.k8s.io"
omitStages:
- "RequestReceived"
# Default level for all other requests.
- level: Metadata
omitStages:
- "RequestReceived"
EOF
# 下载镜像
kubeadm config images pull --config kubeadm-config.yaml
# 初始化集群(仅在第一个master执行),不部署kube-proxy
kubeadm init --config kubeadm-config.yaml --upload-certs --skip-phases=addon/kube-proxy | tee ~/k8s/master_init.log
# 查看集群证书期限
kubeadm certs check-expiration
# 加载环境变量
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" | sudo tee -a /etc/profile
source /etc/profile
sudo mkdir -p /root/.kube
sudo cp -i /etc/kubernetes/admin.conf /root/.kube/config
sudo chown $(id -u):$(id -g) /root/.kube/config
# 如果初始化失败,可执行kubeadm reset后重新初始化
rm -rf ~/.kube/config
rm -rf /etc/cni/net.d
rm -rf /etc/kubernetes/
rm -rf /var/lib/etcd
rm -rf /var/lib/kubelet
kubeadm reset -f
ipvsadm --clear
# 在已成功初始化的master上重新上传证书并生成certificate-key(供其它master加入集群使用)
kubeadm init phase upload-certs --upload-certs
# 重新生成token
kubeadm token create --ttl=0 --print-join-command | tee -a ~/k8s/master_init.log
kubeadm token list
# 获取token_hash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt |openssl rsa -pubin -outform der |openssl dgst -sha256 -hex
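其余 master 与 node 节点加入集群时,直接使用 master_init.log 中输出的 join 命令即可;其大致形式如下(示例,其中 <token>、<hash>、<certificate-key> 均以实际输出为准):
# 其余master节点加入(需携带 --control-plane 与 --certificate-key)
# kubeadm join ${Kube_APIServerIP}:${Kube_APIServerPort} --token <token> \
#   --discovery-token-ca-cert-hash sha256:<hash> \
#   --control-plane --certificate-key <certificate-key>
# node节点加入
# kubeadm join ${Kube_APIServerIP}:${Kube_APIServerPort} --token <token> \
#   --discovery-token-ca-cert-hash sha256:<hash>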
4. 常用组件安装( cilium + metrics )
4.1 包管理器:Helm
最新版本二进制文件: https://github.com/helm/helm/releases/latest
国内: https://repo.huaweicloud.com/helm/
# 下载文件
helmVersion='v3.14.1'
wget https://repo.huaweicloud.com/helm/${helmVersion}/helm-${helmVersion}-linux-amd64.tar.gz
tar xvf helm-${helmVersion}-linux-amd64.tar.gz
cp -av linux-amd64/helm /usr/local/bin/
# 查看
helm version
helm repo list
4.2 网络组件:Cilium
将 Kubernetes 的 CNI 从其他组件切换为 Cilium, 可以有效地提升网络的性能。
# 配置开机自动加载cilium所需相关模块
cat <<EOF | sudo tee /etc/modules-load.d/cilium-base-requirements.conf
cls_bpf
sch_ingress
EOF
systemctl restart systemd-modules-load.service
# 下载Cilium CLI工具
cd ${Soft_SrcDir}
curl -LO https://mirror.ghproxy.com/https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz
# 将可执行文件解压
tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
# 添加仓库
helm repo add cilium https://helm.cilium.io/
# 下载包
mkdir -p ~/k8s/helm; cd ~/k8s/helm
helm pull cilium/cilium --version 1.14.10 --untar
# 自定义部署参数
cat <<EOF | sudo tee cilium-custom-value.yaml
k8sServiceHost: ${Kube_APIServerIP}
k8sServicePort: ${Kube_APIServerPort}
k8sClientRateLimit:
# 限制客户端每秒持续请求数
qps: 15
# 限制客户端每秒突发请求数
burst: 30
# 每个节点都知道所有其他节点的所有pod IP,并在Linux内核路由表中插入路由。
autoDirectNodeRoutes: true
# 启用带宽管理器,会将 TCP 拥塞控制算法切换为 BBR。
# BBR 可为互联网流量提供更高的带宽和更低的延迟
bandwidthManager:
enabled: true
bbr: true
# IP 地址伪装 (Masquerading) 切换为基于 eBPF 的模式。
bpf:
# 启用eBPF映射值的预分配。这增加了内存使用量,但可以减少延迟。
preallocateMaps: true
# 在eBPF中启用本机IP伪装支持
masquerade: true
# 配置基于eBPF的TPROXY,以减少对iptables规则的依赖,从而实现第7层策略。
tproxy: true
# 启用BPF时钟源探测,以实现更高效的刻度检索。
bpfClockProbe: true
# 配置KVStore的使用,通过将其镜像到KVStore中来优化Kubernetes事件处理,以减少大型集群中的开销。
enableK8sEventHandover: true
# EndpointSlice 是Endpoint API 替代方案。它不跟踪 Service Pod IP 的单个 Endpoint 资源,而是将它们拆分为多个较小的 EndpointSlice。
# EndpointSlice 会跟踪 Service Pod 的 IP 地址、端口、readiness 和拓扑信息。
enableK8sEndpointSlice: true
enableCiliumEndpointSlice: true
# 是否启用gatewayAPI
gatewayAPI:
enabled: false
# 启用ExternalIP服务支持。
externalIPs:
enabled: true
# 启用hostPort服务支持。
hostPort:
enabled: true
# 启用基于套接字的负载均衡
socketLB:
enabled: true
# 启用 Hubble
hubble:
relay:
enabled: true
ui:
enabled: true
metrics:
enabled:
- dns:query;ignoreAAAA
- drop
- tcp
- flow
- icmp
- http
# IP地址管理(IPAM)负责Cilium管理的网络端点(容器和其他)使用的IP地址的分配和管理。
ipam:
# 默认使用cilium配置,也可以使用k8s集群已配置
mode: "cluster-pool"
operator:
# IP 地址池,默认为10.0.0.0/8
clusterPoolIPv4PodCIDRList: [${PodSubnet}]
# 使用kuernetes集群定义的默认配置
#ipam:
# mode: "kubernetes"
#k8s:
# requireIPv4PodCIDR: true
# 是否完全取代kube-proxy
kubeProxyReplacement: true
# 显式指定本机路由的IPv4 CIDR。
ipv4NativeRoutingCIDR: ${PodSubnet}
# Kubernetes NodePort 实现在 DSR(Direct Server Return) 模式下运行
loadBalancer:
mode: dsr
# 启用XDP加速
# acceleration: native
# 支持基于K8s拓扑感知提示的服务端点筛选
# serviceTopology: true
# 配置k8s nodePort 服务负载平衡,启用Cilium NodePort服务实现
nodePort:
enabled: true
# 在/metrics配置的端口上配置prometheus
prometheus:
enabled: true
# 启用本地路由模式,1.14使用
tunnel: disabled
# 启用本地路由模式,1.15使用
# routingMode: "native"
operator:
podDisruptionBudget:
enabled: true
prometheus:
enabled: true
# 启用节点初始化DaemonSet
nodeinit:
enabled: true
cgroup:
# 是否需要 Cilium 自动创建并挂载 Cgroup v2,
# 设置为 false 时使用已创建的 Cgroup v2
autoMount:
enabled: false
# 挂载路径,
hostRoot: /sys/fs/cgroup
EOF
# 将镜像下载到私有镜像仓库
img=`grep -E 'repository:|tag:' cilium/values.yaml |awk -F'"' '{print $2}' |sed -n 'N;s/\n/:/p'`
for i in ${img};do
nerdctl pull $i
nerdctl tag $i registry.midland.com.cn/helm/cilium-${i##*/}
nerdctl push registry.midland.com.cn/helm/cilium-${i##*/}
sleep 1
done
# 查看镜像
grep 'repository:' cilium/values.yaml
# 修改地址
sed -i '/repository:/s#quay.io/cilium/#registry.midland.com.cn/helm/cilium-#' cilium/values.yaml
sed -i '/repository:/s#quay.io/coreos/#registry.midland.com.cn/helm/cilium-#' cilium/values.yaml
grep 'repository:' cilium/values.yaml
# 部署cilium
helm upgrade --install cilium cilium --version 1.14.10 -f cilium-custom-value.yaml -n kube-system
# 强制重启所有Pod(非系统Pod),使用cilium
kubectl get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork --no-headers=true | grep '<none>' | awk '{print "-n "$1" "$2}' | xargs -L 1 -r kubectl delete pod --force --grace-period 0
# 检查 cilium pod 是否已正常启动
kubectl get pods -l k8s-app=cilium -n kube-system -o wide
# 查看集群状态
cilium status --wait
# 确认是否取代kube-proxy
kubectl exec -it -n kube-system ds/cilium -- cilium status | grep KubeProxyReplacement
# 查看端口转发
kubectl exec -it -n kube-system daemonset/cilium -- cilium service list
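(可选)还可以使用 cilium-cli 自带的连通性测试做一次端到端验证(默认会在 cilium-test 命名空间创建测试 Pod):
# 运行Cilium连通性测试
cilium connectivity test
# 测试完成后清理测试命名空间
kubectl delete ns cilium-test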
4.3 安装Metrics Server
官方地址:https://github.com/kubernetes-sigs/metrics-server/
# 下载最新资源文件
wget https://mirror.ghproxy.com/https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics.yaml
# 修改资源文件
sed -i '/image:/i\ - --kubelet-insecure-tls ' metrics.yaml
grep "image:" metrics.yaml
# 修改成阿里 registry.aliyuncs.com/google_containers/
sed -ri 's@(.*image:) .*metrics-server(/.*)@\1 registry.aliyuncs.com/google_containers\2@g' metrics.yaml
grep "image:" metrics.yaml
# 安装metrics server
kubectl apply -f metrics.yaml
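待 metrics-server Pod 就绪后,可按如下方式验证指标采集(示例):
# 检查metrics-server是否就绪
kubectl -n kube-system get pods -l k8s-app=metrics-server
# 验证指标采集
kubectl top nodes
kubectl top pods -A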
4.4 安装kuboard
官方安装文档:https://www.kuboard.cn/install/v3/install-static-pod.html
# 使用 static pod 的方式将 kuboard 安装在 k8s master 节点上
curl -fsSL https://addons.kuboard.cn/kuboard/kuboard-static-pod.sh -o kuboard.sh
# 修改镜像
sed -i '/image:/s#eipwork#swr.cn-east-2.myhuaweicloud.com/kuboard#' kuboard.sh
# 添加资源限制
sed -i '/volumeMounts:/i\ resources:\
limits:\
cpu: 80m\
memory: 500Mi\
requests:\
cpu: 80m\
memory: 500Mi\
' kuboard.sh
# 安装
bash kuboard.sh
# 浏览器访问 http://${masterIP}:80
# 用户名: admin
# 密 码: Kuboard123
# 创建secret
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
annotations:
kubernetes.io/service-account.name: kuboard-admin
name: kuboard-admin-token
namespace: kuboard
type: kubernetes.io/service-account-token
EOF
# 获取Token,在Kuboard登录界面添加集群,写入Token字段
kubectl -n kuboard get secret kuboard-admin-token -o go-template='{{.data.token}}' | base64 -d
4.5 安装Higress & Kruise Rollout
# 部署Higress & Kruise Rollout实现金丝雀发布
# k8s集群部署Higress
helm repo add higress https://higress.io/helm-charts
# 下载
helm pull higress/higress --untar
# 安装
helm install higress higress -n higress-system --create-namespace
# 部署Kruise Rollout
# Firstly add openkruise charts repository if you haven't do this.
helm repo add openkruise https://openkruise.github.io/charts/
# [Optional]
helm repo update
helm pull openkruise/kruise-rollout --untar
# 安装
helm install kruise-rollout kruise-rollout
# 创建测试用例
# workload-canary-demo.yaml
cat <<EOF | sudo tee workload-canary-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: echoserver
labels:
app: echoserver
spec:
replicas: 3
selector:
matchLabels:
app: echoserver
template:
metadata:
labels:
app: echoserver
spec:
containers:
- name: echoserver
# Mac M1 should choose an image that supports arm64, such as e2eteam/echoserver:2.2-linux-arm64
image: openkruise-registry.cn-shanghai.cr.aliyuncs.com/openkruise/demo:1.10.2
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
env:
- name: NODE_NAME
value: version1
- name: PORT
value: '8080'
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
---
apiVersion: v1
kind: Service
metadata:
name: echoserver
labels:
app: echoserver
spec:
ports:
- port: 80
targetPort: 8080
protocol: TCP
name: http
selector:
app: echoserver
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: echoserver
spec:
ingressClassName: higress
rules:
- host: vsynn.com
http:
paths:
- path: /apis/echo
pathType: Exact
backend:
service:
name: echoserver
port:
number: 80
EOF
# 创建服务
kubectl apply -f workload-canary-demo.yaml
# 可以使用 Rollout 来进行金丝雀发布了,定义一个如下所示的 Rollout 对象:
cat <<EOF| sudo tee rollout-canary-demo.yaml
# rollout-canary-demo.yaml
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
name: rollouts-canary-demo
namespace: default
annotations:
rollouts.kruise.io/rolling-style: partition
spec:
objectRef:
workloadRef:
apiVersion: apps/v1
kind: Deployment
name: echoserver
strategy:
canary: # 金丝雀发布策略
steps: # 步骤
- weight: 10 # 导入10%的流量
pause: { duration: 60 } # {}中没有内容则表示人工确认
replicas: 1 # 第一批发布的副本个数
- weight: 40 # 导入40%的流量,没有配置 replicas,也表示发布40%的副本数
pause: { duration: 120 } # 不需要人工确认,等待120s继续下一批
replicas: 40%
- weight: 70
pause: { duration: 60 }
replicas: 70%
- weight: 100
pause: { duration: 10 }
replicas: 100%
trafficRoutings: # 流量路由
- service: echoserver
ingress:
name: echoserver
EOF
# 直接部署
kubectl apply -f rollout-canary-demo.yaml
# 检查
kubectl get rollout rollouts-canary-demo
# 获取 higress 访问端口
kubectl -n higress-system get svc higress-gateway -ojsonpath='{.spec.ports[0].nodePort}'
# 访问测试,将Ingress中的域名写入/etc/hosts,解析的主机是任意的集群节点
echo "10.7.0.21 vsynn.com" |sudo tee -a /etc/hosts
curl -s http://vsynn.com:31245/apis/echo
# 修改服务A的Deployment中镜像为demo:1.10.3,观察相关资源变化。
# 为清晰展示灰度效果,将value改为version2。
sed -i 's/value: version1/value: version2/' workload-canary-demo.yaml
sed -i 's/demo:1.10.2/demo:1.10.3/' workload-canary-demo.yaml
# 更新
kubectl apply -f workload-canary-demo.yaml
# 检查状态
kubectl get rollout rollouts-canary-demo
# 访问测试,检查version1和version2的流量是否符合
while true; do curl -s http://vsynn.com:31245/apis/echo |grep node; sleep 0.5; done
# 如果有手动确认,需安装更新工具 kubectl-kruise
wget https://mirror.ghproxy.com/https://github.com/openkruise/kruise-tools/releases/download/v1.1.0/kubectl-kruise-linux-amd64.tar.gz
tar xvf kubectl-kruise-linux-amd64.tar.gz
mv linux-amd64/kubectl-kruise /usr/local/bin/
# 查看rollout资源状态,发现当前执行完第一批发布,并且处于暂停状态,需要人工确认才能继续下一批次发布。
# 如果在Rollout过程中,发现新版本服务异常,可以通过Deployment配置恢复到之前版本。
# 查看资源状态
kubectl get rollout
# 手动同意更新
kubectl-kruise rollout approve rollout/rollouts-canary-demo -n default
# 查看rollout资源状态,确认发布已继续推进到下一批次
kubectl get rollout
4.6 安装Prometheus
官方github地址:https://github.com/prometheus-operator/kube-prometheus
使用 Helm 部署 Prometheus + Grafana
# 添加helm仓库
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus-community
# 下载charts
helm pull prometheus-community/kube-prometheus-stack --untar
# 注: 部分镜像无法下载,使用镜像加速即可
# 查看正在使用的镜像
helm template prom kube-prometheus-stack/ |grep "image:"
# 添加m.daocloud.io镜像加速
for img in `grep -Er 'registry: quay.io|registry.k8s.io' kube-prometheus-stack | awk -F: '{print $1}'`;do
sed -i 's#registry: registry.k8s.io#registry: m.daocloud.io/registry.k8s.io#' $img
sed -i 's#registry: quay.io#registry: m.daocloud.io/quay.io#' $img
done
# 查看正在使用的镜像
helm template prom kube-prometheus-stack/ |grep "image:"
cat <<EOF |sudo tee prom-custom-values.yml
grafana:
defaultDashboardsTimezone: Asia/Shanghai
adminPassword: prom-operator
# kubeadm 的 ControllerManager 以静态 Pod(hostNetwork)运行,需要指定 master 节点 IP 进行抓取
kubeControllerManager:
  endpoints:
    - 10.7.0.21
    - 10.7.0.22
# 外部(非 Pod)部署的 etcd 需要指定节点 IP;metrics 使用 2381 端口(对应 listen-metrics-urls)
kubeEtcd:
  endpoints:
    - 10.7.0.21
    - 10.7.0.22
    - 10.7.0.24
  service:
    port: 2381
    targetPort: 2381
# kubeScheduler 同样需要指定 master 节点 IP
kubeScheduler:
  endpoints:
    - 10.7.0.21
    - 10.7.0.22
EOF
kubectl create ns monitoring
# 安装
helm install prom kube-prometheus-stack/ -f prom-custom-values.yml -n monitoring
# 检查
kubectl -n monitoring get svc
# 待Running,查看Service,将Prometheus和grafana修改成NodePort
kubectl -n monitoring patch svc prom-grafana --patch '{"spec":{"type":"NodePort"}}'
kubectl -n monitoring patch svc prom-kube-prometheus-stack-prometheus --patch '{"spec":{"type":"NodePort"}}'
# 查看Port
kubectl -n monitoring get svc
# 使用端口访问Prometheus和grafana
# grafana默认账号密码:admin prom-operator
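也可以用如下命令直接取出两个服务的 NodePort,便于拼接访问地址(示例):
# 获取grafana和prometheus的NodePort
kubectl -n monitoring get svc prom-grafana -ojsonpath='{.spec.ports[0].nodePort}';echo
kubectl -n monitoring get svc prom-kube-prometheus-stack-prometheus -ojsonpath='{.spec.ports[0].nodePort}';echo
# 浏览器访问 http://<任意节点IP>:<NodePort>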
4.7 安装Longhorn
# 这里给集群每个节点准备一块数据盘,挂载到指定的数据目录上作为longhorn的存储目录;
mkfs.xfs /dev/vdb
mkdir -p /data/longhorn
echo "/dev/vdb /data/longhorn xfs defaults 0 0" >> /etc/fstab
mount -a
# 在所有集群节点安装依赖
# rhel
yum install -y iscsi-initiator-utils nfs-utils && systemctl enable iscsid --now
# ubuntu
# sudo apt-get -y install open-iscsi && sudo systemctl enable iscsid --now
# helm安装Longhorn
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm pull longhorn/longhorn --untar
# 自定义Values配置文件:longhorn-values.yaml
cat <<EOF |sudo tee longhorn-values.yaml
service:
ui:
type: NodePort # 需要在集群外看到页面,调整webUI访问模式为nodeport,默认cluster
nodePort: 30012 # 设置访问端口
persistence:
defaultFsType: xfs # 默认ext4,这里改成和本地的磁盘格式一致即可
defaultClassReplicaCount: 2 # 因为master不参与调度,所以卷副本数改成2,和node节点数一样
reclaimPolicy: Retain
defaultSettings:
defaultDataPath: "/data/longhorn" # 设置默认数据目录,默认存放到/var/lib/longhorn
EOF
# 安装longhorn
kubectl create ns longhorn
helm -n longhorn install longhorn longhorn -f longhorn-values.yaml
# 确保所有pod均为running状态表示部署成功
kubectl -n longhorn get pod
# 待所有Pod Running后,创建测试pvc,检查是否正常提供服务
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-pvc
spec:
storageClassName: longhorn
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Mi
EOF
# 检查pv,pvc是否创建、绑定
kubectl get pv,pvc
## 访问集群任意节点的30012打开 Web UI 页面查看仪表盘
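如需进一步验证卷的读写,可以创建一个挂载该 PVC 的测试 Pod(示例):
# 创建一个使用my-pvc的测试Pod,验证卷可读写
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo longhorn-ok > /data/test && sleep 3600"]
    volumeMounts:
    - mountPath: /data
      name: vol
  volumes:
  - name: vol
    persistentVolumeClaim:
      claimName: my-pvc
EOF
# 待Pod Running后检查写入结果
kubectl exec pvc-test -- cat /data/test
# 清理测试Pod
kubectl delete pod pvc-test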
4.8 安装NPD(Node-Problem-Detector)
NPD(Node-Problem-Detector)
是一个守护程序,用于监视和报告节点的健康状况(包括内核死锁、OOM、系统线程数压力、系统文件描述符压力等指标)。NPD从各种守护进程收集节点问题,并以 NodeCondition 和 Event 的形式报告给 API Server。
GitHub地址: https://github.com/kubernetes/node-problem-detector
# 添加仓库
helm repo add deliveryhero https://charts.deliveryhero.io/
helm repo update
helm pull deliveryhero/node-problem-detector --untar
# 修改镜像仓库,使用daocloud加速
sed -i 's#registry.k8s.io#m.daocloud.io/registry.k8s.io#' node-problem-detector/values.yaml
# 检查镜像
grep -A5 image: node-problem-detector/values.yaml
# 安装
helm install node-problem-detector node-problem-detector -n kube-system
# 模拟测试,选择一台机器操作(这里选择k8s-node-1)
sudo sh -c "echo 'kernel: BUG: unable to handle kernel NULL pointer dereference at TESTING' >> /dev/kmsg"
sudo sh -c "echo 'kernel: INFO: task docker:20744 blocked for more than 120 seconds.' >> /dev/kmsg"
# master检查node节点状态,查看是否有对应的Events
kubectl describe no k8s-node-1
4.9 LoadBalancer:metallb
MetalLB挂载到Kubernetes集群,并提供网络负载平衡器实现。它允许在非云服务上运行的集群中创建LoadBalancer类型的Kubernetes服务,因为此时无法方便地连接到(云服务商的)付费产品来提供负载平衡器。
官网地址:https://metallb.universe.tf/installation/
# 添加仓库
helm repo add metallb https://metallb.github.io/metallb
# 下载
helm repo update
helm pull metallb/metallb --untar
# 安装
helm install metallb metallb -n kube-system
# 定义IP地址池
cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: cheap
namespace: kube-system
spec:
addresses:
- 10.7.0.66/26
# - 10.7.0.3-10.7.0.19
---
# 配置 Layer2 模式
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: cheap
namespace: kube-system
spec:
ipAddressPools:
- cheap
EOF
# (选做)MetalLB 可以使用注解选择特定的地址池和固定IP地址
cat <<EOF |sudo tee nginx-svc.yaml
apiVersion: v1
kind: Service
metadata:
name: nginx
annotations:
metallb.universe.tf/address-pool: cheap
metallb.universe.tf/loadBalancerIPs: 10.7.0.64
spec:
ports:
- port: 80
targetPort: 80
selector:
app: nginx
type: LoadBalancer
EOF
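上面的 Service 选择器为 app: nginx,如需完整验证,可以再创建一个匹配的 nginx Deployment 并应用上述 Service(示例):
# (示例)创建与上面Service匹配的nginx Deployment
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
EOF
kubectl apply -f nginx-svc.yaml
# 确认Service分配到了指定的LoadBalancer IP后访问验证
kubectl get svc nginx
curl -s 10.7.0.64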
# 测试,注:Higress 默认使用 LoadBalancer 方式
# 获取 Higress 使用的 LoadBalancer IP地址
kubectl -n higress-system get svc higress-gateway -ojsonpath='{.status.loadBalancer.ingress[0].ip}'
# 修改4.5节中 echoserver 的 Ingress 域名解析,先删除旧记录,再指向 LoadBalancer IP
sed -i '/vsynn.com/d' /etc/hosts
echo '10.7.0.64 vsynn.com' >> /etc/hosts
# 多次访问检测 LoadBalancer 是否正常
curl vsynn.com/apis/echo