Hi all, could you experts take a look: how do I block this kind of crawler?

Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.929 {89b6c95b89b115182c47ac1cc023299d} 404 "GET https www.xabaotu.com /search/ht6ebug6by.html HTTP/1.1" 140.884, 202.21.110.110, "https://www.xabaotu.com/search/d4061ed1e3.html", "Mozilla/5.0 (Linux; Android 10; M2004J19C) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Mobile Safari/537.36 OPR/54.2.2672.50007", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.945 {5eefaf5c89b115182d47ac1c210b0c16} 404 "GET https www.xabaotu.com /search/yt279722by.html HTTP/1.1" 133.088, 114.7.9.102, "https://www.xabaotu.com/search/cbd92dcb39.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.1.1140 Yowser/2.5 Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.954 {5ead465d89b115182e47ac1ccd09b98f} 404 "GET https www.xabaotu.com /search/b4eehmn080.html HTTP/1.1" 126.507, 196.202.217.10, "https://www.xabaotu.com/search/bdf5567a16.html", "Mozilla/5.0 (Linux; arm_64; Android 9; SM-G955F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.328 YaBrowser/24.1.0.328.00 SA/3 Mobile Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.966 {488cfa5d89b115182f47ac1cb8e93c91} 404 "GET https www.xabaotu.com /search/2q83kagz0a.html HTTP/1.1" 174.706, 177.220.237.178, "https://www.xabaotu.com/search/57c565a6cd.html", "Mozilla/5.0 (Linux; arm; Android 12; M2101K7BL) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.133 YaBrowser/23.11.5.133.00 SA/3 Mobile Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.973 {9546675e89b115183047ac1c0e6f64bc} 404 "GET https www.xabaotu.com /search/j9pqg57htz.html HTTP/1.1" 130.695, 182.23.41.226, "https://www.xabaotu.com/search/c04bb881c9.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.5.635 (beta) Yowser/2.5 Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.973 {7ab4675e89b115183147ac1c2ac397e1} 404 "GET https www.xabaotu.com /search/3vlruk127z.html HTTP/1.1" 155.420, 182.23.41.226, "https://www.xabaotu.com/search/c04bb881c9.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.5.635 (beta) Yowser/2.5 Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.974 {182e725e89b115183247ac1c3d2caa5b} 404 "GET https www.xabaotu.com /search/m3im5t2tgj.html HTTP/1.1" 121.907, 182.23.41.226, "https://www.xabaotu.com/search/c04bb881c9.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.5.635 (beta) Yowser/2.5 Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.975 {3d107d5e89b115183347ac1c54066c6a} 404 "GET https www.xabaotu.com /search/9wlh6k06zr.html HTTP/1.1" 95.408, 182.23.41.226, "https://www.xabaotu.com/search/c04bb881c9.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.5.635 (beta) Yowser/2.5 Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.975 {8ab2855e89b115183447ac1c8775250d} 404 "GET https www.xabaotu.com /search/qpc2monebf.html HTTP/1.1" 96.130, 182.23.41.226, "https://www.xabaotu.com/search/c04bb881c9.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.5.635 (beta) Yowser/2.5 Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.976 {4aac905e89b115183547ac1c063bbbe5} 404 "GET https www.xabaotu.com /search/buya6t1gop.html HTTP/1.1" 90.579, 182.23.41.226, "https://www.xabaotu.com/search/c04bb881c9.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.5.635 (beta) Yowser/2.5 Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.977 {e9ec9b5e89b115183647ac1cd4ba7757} 404 "GET https www.xabaotu.com /search/s3en98pbvj.html HTTP/1.1" 89.707, 182.23.41.226, "https://www.xabaotu.com/search/c04bb881c9.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.5.635 (beta) Yowser/2.5 Safari/537.36", 65, "Not Found", ""
Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.977 {e51a9c5e89b115183847ac1ce6f39668} 404 "GET https www.xabaotu.com /search/cil2i8cenk.html HTTP/1.1" 89.507, 202.57.210.65, "https://www.xabaotu.com/search/57c565a6cd.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; Chromium GOST) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36", 65, "Not Found", ""
1 Answer

I'd recommend combining behavior analysis with IP blocking.

1. Behavior analysis:

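Analyze the access logs to spot abnormal access patterns, for example a large number of 404 requests in a short time, or frequent requests to particular pages.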
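To catch "many 404s in a short time" rather than "many 404s overall", you can count each IP's 404s inside a sliding time window. A minimal sketch, assuming the log lines have already been parsed into (timestamp, IP, status) tuples; the 60-second window, the limit of 20, and the helper name `find_burst_ips` are made-up examples to tune for your own traffic.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Example thresholds (assumptions, tune for your traffic):
# flag an IP that causes more than MAX_404 404s within any WINDOW-long span
WINDOW = timedelta(seconds=60)
MAX_404 = 20


def find_burst_ips(events, window=WINDOW, max_404=MAX_404):
    """events: iterable of (timestamp, ip, status) in chronological order.
    Returns the set of IPs that exceeded max_404 404s inside the sliding window."""
    recent = defaultdict(deque)  # ip -> timestamps of its 404s still inside the window
    flagged = set()
    for ts, ip, status in events:
        if status != 404:
            continue
        q = recent[ip]
        q.append(ts)
        # Evict 404s older than the window, measured back from the current request
        while ts - q[0] > window:
            q.popleft()
        if len(q) > max_404:
            flagged.add(ip)
    return flagged


# 25 404s from one IP within 25 ms, plus a single 404 from another IP
start = datetime(2024, 12, 30, 0, 7, 14)
events = [(start + timedelta(milliseconds=i), "182.23.41.226", 404) for i in range(25)]
events.append((start, "202.21.110.110", 404))
events.sort()
print(find_burst_ips(events))  # {'182.23.41.226'}
```

A deque per IP keeps each check cheap: old timestamps are dropped from the left as new ones arrive on the right, so memory stays proportional to the window size.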

Step 1: Collect the access logs

  • Make sure your server logs every request, including the IP address, request path, timestamp, User-Agent, and so on.
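The log in the question is not in the usual nginx combined format, so here is a sketch of a parser for exactly the lines shown above. The regex and the field names (`reqid`, `cost`, etc.) are my own guesses from the sample lines, not a documented format; in particular I'm assuming the number before the client IP (140.884 and so on) is the request duration in milliseconds.

```python
import re
from datetime import datetime

# Field layout inferred from the sample lines in the question (an assumption, not a spec)
LINE_RE = re.compile(
    r'(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) '
    r'\{(?P<reqid>[0-9a-f]+)\} '
    r'(?P<status>\d{3}) '
    r'"(?P<method>[A-Z]+) (?P<scheme>\S+) (?P<host>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<cost>[\d.]+), (?P<ip>[0-9A-Fa-f:.]+), '
    r'"(?P<referer>[^"]*)", "(?P<ua>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match the format."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%Y-%m-%d %H:%M:%S.%f")
    fields["status"] = int(fields["status"])
    fields["cost"] = float(fields["cost"])
    return fields


sample = ('Dec 30 00:07:14 VM-0-3-ubuntu server[3689789]: 2024-12-30 00:07:14.929 '
          '{89b6c95b89b115182c47ac1cc023299d} 404 "GET https www.xabaotu.com '
          '/search/ht6ebug6by.html HTTP/1.1" 140.884, 202.21.110.110, '
          '"https://www.xabaotu.com/search/d4061ed1e3.html", "Mozilla/5.0 (Linux; Android 10; '
          'M2004J19C) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Mobile '
          'Safari/537.36 OPR/54.2.2672.50007", 65, "Not Found", ""')
rec = parse_line(sample)
print(rec["ip"], rec["status"], rec["path"])  # 202.21.110.110 404 /search/ht6ebug6by.html
```

Parsing into named fields once makes the later steps (time windows, per-IP counts, referer checks) independent of the exact log layout.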

Step 2: Analyze the logs

  • Use a script or a log analysis tool (such as AWStats or GoAccess) to analyze the access logs and identify abnormal behavior like the patterns described above.
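A quick per-IP summary already makes the pattern in the question's log visible: one IP requesting many different, non-existent /search/<random>.html pages within the same second. A sketch, assuming (ip, path, status) tuples as input; `summarize` is a hypothetical helper name, and the sample data is the seven requests from 182.23.41.226 plus one other request from the question's log.

```python
from collections import Counter, defaultdict


def summarize(records, top=5):
    """records: iterable of (ip, path, status).
    Returns up to `top` tuples (ip, 404_count, total_requests, distinct_paths),
    ordered by 404 count, highest first."""
    total = Counter()
    not_found = Counter()
    paths = defaultdict(set)
    for ip, path, status in records:
        total[ip] += 1
        paths[ip].add(path)
        if status == 404:
            not_found[ip] += 1
    return [(ip, n, total[ip], len(paths[ip])) for ip, n in not_found.most_common(top)]


records = [("182.23.41.226", f"/search/{name}.html", 404) for name in (
    "j9pqg57htz", "3vlruk127z", "m3im5t2tgj", "9wlh6k06zr",
    "qpc2monebf", "buya6t1gop", "s3en98pbvj")]
records.append(("202.57.210.65", "/search/cil2i8cenk.html", 404))
for row in summarize(records):
    print(row)
# ('182.23.41.226', 7, 7, 7)
# ('202.57.210.65', 1, 1, 1)
```

When the number of distinct paths is close to the number of requests and nearly all of them are 404s, the client is very likely guessing URLs rather than following real links.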

Example code (Python):

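Use a Python script to analyze the log file and find IP addresses that frequently hit 404 pages.
Replace logfile.log with your own log file; the output file suspicious_ips.txt can be named whatever you like.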

analyze_logs.py

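import re
from collections import defaultdict

# Path to the log file (change to your own)
log_file = 'path/to/your/logfile.log'

# Regular expressions
# Client IP: the first IPv4-looking token on the line. In the nginx combined
# format that is the client address; in the log from the question it is too.
ip_pattern = re.compile(r'(\d+\.\d+\.\d+\.\d+)')
# Status code: nginx writes it after the quoted request ("GET ..." 404 ...),
# the log in the question writes it after the request ID ({...} 404 "GET ...).
status_pattern = re.compile(r'(?:"|\}) (\d{3}) ')

# Per-IP totals: all requests and 404 responses
ip_requests = defaultdict(int)
ip_404s = defaultdict(int)

# Read and analyze the log file (undecodable bytes are replaced instead of crashing)
with open(log_file, 'r', encoding='utf-8', errors='replace') as file:
    for line in file:
        ip_match = ip_pattern.search(line)
        status_match = status_pattern.search(line)
        if ip_match and status_match:
            ip = ip_match.group(1)
            status = status_match.group(1)
            ip_requests[ip] += 1
            if status == '404':
                ip_404s[ip] += 1

# Thresholds (these count over the whole file, not per time window)
request_threshold = 100
error_threshold = 50

# Identify suspicious IPs
suspicious_ips = [ip for ip, count in ip_requests.items()
                  if count > request_threshold or ip_404s[ip] > error_threshold]

# Print the suspicious IPs
print("Suspicious IPs:", suspicious_ips)

# Write the suspicious IPs to a file, one per line
with open('suspicious_ips.txt', 'w') as f:
    for ip in suspicious_ips:
        f.write(ip + '\n')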
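Before banning anything it is worth filtering the list: the `\d+\.\d+\.\d+\.\d+` pattern also accepts strings that are not valid addresses, and you don't want to lock out your own networks or an upstream proxy. A sketch using the standard `ipaddress` module; the allowlist below (loopback and private ranges) is only an example, and `filter_ips` is a hypothetical helper you could apply to `suspicious_ips` before writing the file.

```python
import ipaddress

# Example allowlist (an assumption): loopback and private ranges.
# Add your own office, monitoring and upstream proxy addresses here.
ALLOWLIST = [ipaddress.ip_network(n) for n in ("127.0.0.0/8", "10.0.0.0/8",
                                               "172.16.0.0/12", "192.168.0.0/16")]


def filter_ips(candidates, allowlist=ALLOWLIST):
    """Drop invalid addresses and anything inside an allowlisted network.
    Keeps the first occurrence of each address, in input order."""
    kept = []
    for raw in candidates:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # e.g. "999.1.1.1" also matches the \d+\.\d+\.\d+\.\d+ pattern
        if any(addr in net for net in allowlist):
            continue
        kept.append(str(addr))
    return list(dict.fromkeys(kept))  # de-duplicate, preserving order


print(filter_ips(["182.23.41.226", "127.0.0.1", "999.1.1.1", "182.23.41.226", "192.168.1.5"]))
# ['182.23.41.226']
```

In analyze_logs.py this would be a single extra line, `suspicious_ips = filter_ips(suspicious_ips)`, placed before the file is written.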

2. IP blocking

Once abnormal behavior has been identified, add the suspicious IP addresses to the Nginx configuration to block them.

Updating the Nginx configuration

There are two ways to do this:

1. Manual update

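Add the following to your Nginx configuration, and create the suspicious_ips.conf file (choose any name and adjust the path to your own). Each line of that file is an address or CIDR network followed by a value, for example: 182.23.41.226 1;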

http {
    ...
    geo $block_ip {
        default 0;
        include /path/to/suspicious_ips.conf;
    }

    server {
        ...
        if ($block_ip) {
            return 403;
        }
    }
}

2. Automated update
Use a Python script to write the IP addresses from suspicious_ips.txt into suspicious_ips.conf:

update_nginx_conf.py

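# Convert suspicious_ips.txt into nginx geo entries of the form "IP 1;"
with open('suspicious_ips.txt', 'r') as f:
    # Skip blank lines and duplicates: a bare " 1;" line would make nginx -t fail
    ips = sorted({line.strip() for line in f if line.strip()})

with open('/path/to/suspicious_ips.conf', 'w') as f:
    for ip in ips:
        f.write(f"{ip} 1;\n")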
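update_nginx_conf.py rewrites suspicious_ips.conf from scratch on every run, so once an IP drops out of the analyzed log (for example after log rotation) its nginx ban disappears, while the iptables rule for it stays. If you'd rather keep bans, here is a sketch that merges new IPs with the entries already in the file and replaces the file atomically (temp file plus `os.replace`), so a reload never reads a half-written file. `update_geo_file` is a hypothetical name, and it assumes the include file contains only `address value;` lines.

```python
import os
import tempfile


def update_geo_file(new_ips, conf_path):
    """Merge new_ips into an nginx geo include file made of 'address value;' lines.
    Existing entries are kept; the file is replaced atomically. Returns the entry count."""
    entries = {}
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            for line in f:
                parts = line.strip().rstrip(";").split()
                if len(parts) == 2:  # skip blank or malformed lines
                    entries[parts[0]] = parts[1]
    for ip in new_ips:
        ip = ip.strip()
        if ip:
            entries[ip] = "1"
    # Write a temp file in the same directory, then rename it over the original
    directory = os.path.dirname(os.path.abspath(conf_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for ip, value in entries.items():
                f.write(f"{ip} {value};\n")
        os.replace(tmp_path, conf_path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray temp file behind
        raise
    return len(entries)


# Usage: feed it the lines of suspicious_ips.txt
# with open("suspicious_ips.txt") as f:
#     update_geo_file(f, "/path/to/suspicious_ips.conf")
```

`tempfile.mkstemp` creates the file readable only by its owner; that is fine when the script and the nginx master process both run as root, which is the usual setup.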

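3. Update the firewall rules

Use iptables to block the suspicious IP addresses. /etc/iptables/rules.v4 is the file the iptables-persistent package restores at boot on Debian/Ubuntu; you can save the rules under another name, but then you have to restore them yourself.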

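while IFS= read -r ip; do
    [ -z "$ip" ] && continue  # skip blank lines
    # -C checks whether the rule already exists, so repeated runs don't pile up duplicates;
    # -I inserts at the top of INPUT so the DROP is hit before any ACCEPT rule for 80/443
    sudo iptables -C INPUT -s "$ip" -j DROP 2>/dev/null || sudo iptables -I INPUT -s "$ip" -j DROP
done < /path/to/suspicious_ips.txt

# "sudo iptables-save > file" fails: the redirection runs as the current user, not root
sudo iptables-save | sudo tee /etc/iptables/rules.v4 > /dev/null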

4. Reload the Nginx configuration (checking the syntax first)

sudo nginx -t && sudo nginx -s reload

Extras

Automating the whole process

Combine the steps above into one script:

update_security.sh

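#!/bin/bash
# Run as root (e.g. from root's crontab): under cron, sudo cannot prompt for a password

# The Python scripts read and write suspicious_ips.txt in the working directory
cd /path/to/your || exit 1

python3 analyze_logs.py
python3 update_nginx_conf.py

while IFS= read -r ip; do
    [ -z "$ip" ] && continue
    iptables -C INPUT -s "$ip" -j DROP 2>/dev/null || iptables -I INPUT -s "$ip" -j DROP
done < suspicious_ips.txt

# Only reload if the new configuration passes the syntax check
nginx -t && nginx -s reload
iptables-save > /etc/iptables/rules.v4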

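Set up a scheduled task

sudo chmod +x /path/to/update_security.sh
sudo crontab -e

# Add the following line to run the script at the top of every hour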
0 * * * * /path/to/update_security.sh