When doing security testing, you often need to switch IPs to probe targets or bypass security protections. Some websites offer free or paid proxy IPs, but neither kind guarantees that the proxy servers are actually usable, and trying them one by one by hand is painful. With a script, we can automatically crawl proxy IPs from these sites, test their availability, and filter out a batch of working proxies.
The code is hosted on GitHub.
Introduction
Proxy Server Crawler is a tool for crawling public proxy servers from proxy websites. Whenever a proxy server (ip::port::type) is crawled, it automatically tests the server's functionality.
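The ip::port::type records mentioned above can be split into their fields with a small helper. This is a hypothetical sketch, not the project's actual code; the crawler's internal representation may differ:

```python
def parse_record(record):
    """Parse an 'ip::port::type' proxy record into (ip, port, type).

    Hypothetical helper for illustration only.
    """
    ip, port, proxy_type = record.split("::")
    return ip, int(port), proxy_type

print(parse_record("59.41.214.218::3128::http"))
# → ('59.41.214.218', 3128, 'http')
```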
Currently supported websites:
Currently supported tests (for HTTP proxies):
- SSL support
- POST support
- speed (tested against 10 frequently used sites)
- type (high/anonymous/transparent)
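The high/anonymous/transparent classification is commonly done by fetching a "judge" page through the proxy and inspecting which headers reach the target server. The sketch below illustrates that heuristic; the function name, inputs, and exact rules are assumptions, not the project's actual implementation:

```python
def classify_proxy(headers, real_ip):
    """Classify a proxy from the headers the judge server received.

    Hypothetical heuristic: a transparent proxy leaks the client's real IP,
    an anonymous proxy reveals itself (Via/X-Forwarded-For) but hides the IP,
    and a high-anonymity (elite) proxy leaves no proxy traces at all.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    if real_ip in forwarded:
        return "transparent"   # real client IP leaked to the target
    if "Via" in headers or forwarded:
        return "anonymous"     # proxy headers present, but IP hidden
    return "high"              # no sign a proxy was used

print(classify_proxy({"Via": "1.1 squid"}, "1.2.3.4"))  # → anonymous
```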
Requirements
- Python >= 2.7
- Scrapy 1.3.0 (not tested with lower versions)
- Node.js (for some sites, Node is needed to bypass JavaScript-based WAFs)
Usage
```shell
cd proxy_server_crawler
scrapy crawl chunzhen
```
Example log output:

```
[ result] ip: 59.41.214.218 , port: 3128 , type: http, proxy server not alive or healthy.
[ result] ip: 117.90.6.67 , port: 9000 , type: http, proxy server not alive or healthy.
[ result] ip: 117.175.183.10 , port: 8123 , speed: 984 , type: high
[ result] ip: 180.95.154.221 , port: 80 , type: http, proxy server not alive or healthy.
[ result] ip: 110.73.0.206 , port: 8123 , type: http, proxy server not alive or healthy.
[ proxy] ip: 124.88.67.54 , port: 80 , speed: 448 , type: high , post: True , ssl: False
[ result] ip: 117.90.2.149 , port: 9000 , type: http, proxy server not alive or healthy.
[ result] ip: 115.212.165.170, port: 9000 , type: http, proxy server not alive or healthy.
[ proxy] ip: 118.123.22.192 , port: 3128 , speed: 769 , type: high , post: True , ssl: False
[ proxy] ip: 117.175.183.10 , port: 8123 , speed: 908 , type: high , post: True , ssl: True
```
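The `[ proxy]` lines in the log carry the live servers, while `[ result]` lines report failures. A small hypothetical helper (not part of the project) can pull the usable proxies out of a saved log:

```python
import re

# Matches the "[ proxy] ip: ... , port: ... , speed: ..." log lines.
PROXY_RE = re.compile(
    r"\[\s*proxy\]\s*ip:\s*(\S+)\s*,\s*port:\s*(\d+)\s*,\s*speed:\s*(\d+)"
)

def extract_proxies(log_text):
    """Return (ip, port, speed) tuples for every live proxy in the log."""
    return [(ip, int(port), int(speed))
            for ip, port, speed in PROXY_RE.findall(log_text)]

log = (
    "[ result] ip: 59.41.214.218 , port: 3128 , type: http, "
    "proxy server not alive or healthy.\n"
    "[ proxy] ip: 124.88.67.54 , port: 80 , speed: 448 , "
    "type: high , post: True , ssl: False\n"
)
print(extract_proxies(log))  # → [('124.88.67.54', 80, 448)]
```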
License
The MIT License (MIT)