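Linux System Operations: A Practical Handbook

Full-stack practice, from basic operations to high-availability architecture

I. System Initialization and Basic Configuration

1. System initialization script (based on CentOS 8)
# Basic security hardening
# Note: this sets SELinux to permissive (violations are logged, not blocked);
# the change to /etc/selinux/config takes effect at the next boot.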

sed -i 's/SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config  
systemctl stop firewalld && systemctl disable firewalld  
yum install -y iptables-services && systemctl enable iptables

# Basic environment configuration

echo "export HISTTIMEFORMAT=\"%F %T \"" >> /etc/profile  
echo "export TMOUT=600" >> /etc/profile  
source /etc/profile

# Kernel parameter tuning

cat << EOF >> /etc/sysctl.conf  
net.core.somaxconn = 65535  
vm.swappiness = 10  
fs.file-max = 6553500  
EOF  
sysctl -p

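Key functions:

SELinux policy adjustment
Timestamped shell command history and idle-session timeout
Kernel parameter tuning (listen backlog, swappiness, file-handle limit)

2. User and permission management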

Batch user creation script:

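# Example users.csv (one user per line, no header row):
#   username,uid,group
#   devops,2001,admin
#   webadmin,2002,www

while IFS=, read -r user uid group; do
    # Create the group only if it does not exist yet; the GID mirrors the UID
    getent group "$group" >/dev/null || groupadd -g "$uid" "$group"
    useradd -u "$uid" -g "$group" -s /bin/bash "$user"
    # Generate a random initial password and record it so it can be handed over
    # (initial_passwords.csv is an illustrative file name; protect or delete it afterwards)
    pass=$(openssl rand -base64 12)
    echo "$user:$pass" | chpasswd
    echo "$user,$pass" >> initial_passwords.csv
done < users.csv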
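
A malformed line in users.csv would make groupadd or useradd fail halfway through the batch. The following is a minimal sketch of a pre-flight check, assuming the three-field layout shown above; validate_users_csv is a hypothetical helper name, not part of any standard tool.

```shell
# Hypothetical helper: validate users.csv before any account is created.
# Prints one message per problem line; returns non-zero if any were found.
validate_users_csv() {
    line=0
    errors=0
    while IFS=, read -r user uid group || [ -n "$user" ]; do
        line=$((line + 1))
        if [ -z "$user" ] || [ -z "$uid" ] || [ -z "$group" ]; then
            # All three fields must be present
            echo "line $line: expected username,uid,group"
            errors=$((errors + 1))
        else
            case $uid in
                *[!0-9]*)
                    # The UID must be a plain decimal number
                    echo "line $line: uid '$uid' is not numeric"
                    errors=$((errors + 1))
                    ;;
            esac
        fi
    done < "$1"
    [ "$errors" -eq 0 ]
}
```

One way to use it is to run `validate_users_csv users.csv` first and start the creation loop only if it succeeds.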

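Security practices:

Initialize accounts with random passwords
Plan UIDs and GIDs under one consistent scheme
Isolate permissions by user group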

II. Daily Operations Guide

1. Essential log analysis commands

    Scenario                                     Command
    Watch Nginx 4xx/5xx responses in real time   tail -f /var/log/nginx/access.log | awk '$9>=400{print $1,$7,$9}'
    Find large files                             find / -type f -size +500M -exec ls -lh {} \; 2>/dev/null
    Count TCP connections by state               netstat -ant | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
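
The same awk counting pattern used for TCP states also works for a one-off summary of an access log. This is a minimal sketch, assuming nginx's default "combined" log format, in which the status code is field 9; status_summary is a hypothetical helper name.

```shell
# Hypothetical helper: count responses per HTTP status code.
# Assumes the "combined" log format, where field 9 is the status code.
# Reads the files given as arguments, or stdin when none are given.
status_summary() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$@" | sort -n
}
```

For example, `status_summary /var/log/nginx/access.log` prints one line per status code with its count, lowest code first.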

2. Emergency disk management

LVM expansion procedure:


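# Check free space in the volume group
vgs

# Initialize the new partition as a physical volume and add it to the volume group
pvcreate /dev/sdb1
vgextend centos /dev/sdb1

# Extend the logical volume, then grow the filesystem to match
lvextend -L +50G /dev/centos/root
xfs_growfs /  # XFS only; takes the mount point of the filesystem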

III. Security Hardening and Intrusion Detection

1. SSH hardening

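# Key settings in /etc/ssh/sshd_config (each comment on its own line, above the directive)
# Non-standard port
Port 57222
# Disable root login
PermitRootLogin no
# Maximum authentication attempts per connection
MaxAuthTries 3
# Probe idle clients every 300 seconds
ClientAliveInterval 300
# Allow only opsadmin, and only from 192.168.1.0/24
AllowUsers opsadmin@192.168.1.0/24

Validate with sshd -t, then apply: systemctl restart sshd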
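
Before restarting sshd, it can help to confirm that the file really contains the intended values. The sketch below is only a file-level check under simple assumptions (one directive per line, keyword followed by a single value); check_directive is a hypothetical helper name. It stops at the first match because sshd uses the first value it finds for a keyword.

```shell
# Hypothetical helper: report whether a config file sets a directive to the
# expected value. The keyword is matched case-insensitively, and only the
# first occurrence is considered. Commented-out lines never match.
check_directive() {
    file=$1 key=$2 want=$3
    got=$(awk -v k="$key" 'tolower($1) == tolower(k) { print $2; exit }' "$file")
    if [ "$got" = "$want" ]; then
        echo "OK   $key $got"
    else
        echo "FAIL $key (want '$want', got '${got:-unset}')"
        return 1
    fi
}
```

A typical call would be `check_directive /etc/ssh/sshd_config PermitRootLogin no`. To see the effective configuration rather than the file contents, `sshd -T` prints the settings sshd would actually use.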

2. Intrusion detection system (IDS) deployment

Fail2ban-based protection:


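# /etc/fail2ban/jail.d/nginx-cc.conf
# The filter named below refers to /etc/fail2ban/filter.d/nginx-cc.conf, which must be created separately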
[nginx-cc]  
enabled = true  
port = http,https  
filter = nginx-cc  
action = iptables-multiport[name=nginx-cc, port="http,https"]  
logpath = /var/log/nginx/access.log  
maxretry = 50  
findtime = 600  
bantime = 3600

Monitoring:

List currently banned IPs: fail2ban-client status nginx-cc
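
The jail above bans an address after maxretry matching requests within findtime seconds, so the threshold should sit above what legitimate clients generate. As a rough aid for choosing it, the sketch below lists the busiest client addresses in a log. It assumes the client IP is the first field and counts over the whole input rather than a 600-second window; top_clients is a hypothetical helper name.

```shell
# Hypothetical helper: list the N busiest client IPs in an access log.
# Assumes the client address is the first field of each line.
# Usage: top_clients N [file...]   (reads stdin when no file is given)
top_clients() {
    n=$1
    shift
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$@" |
        sort -k1,1nr -k2,2 | head -n "$n"
}
```

For example, `top_clients 10 /var/log/nginx/access.log` prints the ten addresses with the most requests, each preceded by its request count.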

IV. Performance Tuning and Troubleshooting

1. System performance analysis toolkit

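# CPU bottlenecks: per-core utilization, refreshed every second
mpstat -P ALL 1

# Memory and swap pressure: 10 samples at 1-second intervals
# (vmstat shows system-wide trends; it does not pinpoint a leaking process)
vmstat 1 10

# I/O load: extended per-device statistics in MB, with timestamps
iostat -xmt 1

# All-in-one interactive monitor
nmon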

2. Kernel-level network tuning

TCP stack tuning parameters:

# /etc/sysctl.conf  
net.ipv4.tcp_fin_timeout = 30  
net.ipv4.tcp_tw_reuse = 1  
net.ipv4.tcp_max_syn_backlog = 8192  
net.core.netdev_max_backlog = 50000
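
Before running sysctl -p, it is useful to see which values would actually change. The sketch below compares a "key = value" file against the live values under /proc/sys. Both function names are hypothetical, and the key-to-path mapping assumes keys whose components contain no dots of their own.

```shell
# Hypothetical helper: map a dotted sysctl key to its /proc/sys path.
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

# Hypothetical helper: print "key: current -> desired" for every
# "key = value" line in the given file, skipping comments and blank lines.
# Keys that do not exist on this kernel are shown as "n/a".
sysctl_diff() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | while IFS='=' read -r key value; do
        key=$(echo "$key" | tr -d '[:space:]')
        value=$(echo "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        current=$(cat "$(sysctl_path "$key")" 2>/dev/null || echo "n/a")
        echo "$key: $current -> $value"
    done
}
```

Running `sysctl_diff /etc/sysctl.conf` prints one line per parameter, which makes it easy to spot settings that are already in effect.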

V. Building an Automated Operations System

1. Core Ansible scenarios

Batch security patching:

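- name: Security Patch Update
  hosts: webservers
  become: yes
  tasks:
    - name: Update security packages
      yum:
        name: '*'
        state: latest      # required for an upgrade; the default only ensures presence
        security: yes
        update_cache: yes
      register: yum_result

    - name: Reboot if kernel updated
      reboot:
        msg: "Kernel updated, rebooting..."
        pre_reboot_delay: 30
      # Match against the whole registered result, since its structure varies by module version
      when: yum_result is changed and 'kernel' in (yum_result | string)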

2. Monitoring and alerting architecture

                   ┌────────────────────┐
                   │ Prometheus         │
                   │ (scrape + storage) │
                   └──────────┬─────────┘
                              │
               ┌──────────────┴──────────────┐
               ▼                             ▼
    ┌────────────────────┐        ┌────────────────────┐
    │ Alertmanager       │        │ Grafana            │
    │ (alert routing)    │        │ (dashboards)       │
    └──────────┬─────────┘        └────────────────────┘
               │
               ▼
    ┌────────────────────┐
    │ WeCom / DingTalk   │
    │ (notifications)    │
    └────────────────────┘

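VI. Disaster Recovery and High Availability

1. Database primary-replica replication

Key points for MySQL primary-replica configuration:

# Primary (master) configuration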
[mysqld]  
server-id=1  
log-bin=mysql-bin  
binlog_format=row  

# Replica (slave) configuration
[mysqld]  
server-id=2  
relay-log=mysql-relay-bin  
read_only=1  


Check replication status with: SHOW SLAVE STATUS\G (SHOW REPLICA STATUS\G on MySQL 8.0.22 and later)
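
In the status output, the two fields that matter most are Slave_IO_Running and Slave_SQL_Running; replication is healthy only when both read Yes. The following is a minimal sketch of a scripted check, assuming the vertical "Field: value" layout that \G produces and the pre-8.0.22 field names; replication_ok is a hypothetical helper name.

```shell
# Hypothetical helper: read `SHOW SLAVE STATUS\G` output on stdin and
# succeed only if both replication threads report "Yes".
replication_ok() {
    awk -F': *' '
        { sub(/^ +/, "", $1) }                  # drop the left padding of the field name
        $1 == "Slave_IO_Running"  { io  = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

It could be wired into a cron job or monitoring probe along the lines of `mysql -e 'SHOW SLAVE STATUS\G' | replication_ok || echo "replication broken"`.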

2. Keepalived high-availability setup

# keepalived.conf on the primary node (the backup node uses state BACKUP and a lower priority)
vrrp_instance VI_1 {  
    state MASTER  
    interface eth0  
    virtual_router_id 51  
    priority 100  
    advert_int 1  
    authentication {  
        auth_type PASS  
        auth_pass 1111  
    }  
    virtual_ipaddress {  
        192.168.1.100/24  
    }  
}
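
With this configuration, whichever node currently holds the MASTER role carries 192.168.1.100 on eth0. A quick way to see which node that is: check for the address in the output of `ip -o -4 addr show`. The sketch below assumes that one-line-per-address output format, where the fourth field is the address with its prefix length; has_address is a hypothetical helper name.

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and
# succeed if the given IPv4 address is configured on any interface.
has_address() {
    awk -v want="$1" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }   # strip the /prefix
        END { exit !found }
    '
}
```

On each node, `ip -o -4 addr show | has_address 192.168.1.100 && echo "VIP is here"` shows whether the virtual IP is currently held locally.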

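Roadmap for advancing operations skills:

Study the standard reference on Linux performance: "Systems Performance: Enterprise and the Cloud"
Use eBPF for deep kernel observation (the BCC tool collection)
Build a complete CI/CD pipeline (Jenkins + GitLab + Artifactory)
Learn the cloud-native operations stack (Kubernetes + Istio + Prometheus)

Working through this handbook helps operations engineers build a complete body of knowledge, from basic maintenance to architecture optimization. Automation and observability deserve particular attention, as they are core strengths in modern cloud-native operations.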
