I am trying to scrape data through a proxy IP using the code below, but the request always fails and I cannot work out why.
The core code that applies the proxy is as follows:
curl_setopt_array($res , [
    // Tunnel through the HTTP proxy
    CURLOPT_HTTPPROXYTUNNEL => true ,
    // Proxy authentication
    CURLOPT_PROXYAUTH => CURLAUTH_BASIC ,
    // Proxy type
    CURLOPT_PROXYTYPE => CURLPROXY_HTTP ,
    // Proxy IP
    CURLOPT_PROXY => '106.75.164.15' ,
    // Proxy port
    CURLOPT_PROXYPORT => 3128
]);
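My proxies come from the HTTP proxy list on 西刺免费代理IP, and not a single one of them can fetch data successfully!
How should PHP's curl be used to fetch data through a proxy IP?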
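A useful first step with a failure like this is to ask curl why the transfer failed, instead of only checking whether curl_exec returned false. The sketch below is a hedged example, not a confirmed fix. `build_proxy_options` and `fetch_via_proxy` are hypothetical helper names, the timeout values are arbitrary, and leaving CURLOPT_HTTPPROXYTUNNEL off by default is only an assumption worth testing: with tunneling on, curl sends a CONNECT request to the proxy, and some HTTP proxies refuse CONNECT.

```php
<?php
// Hypothetical helper: builds the proxy-related curl options for one entry
// shaped like ['ip' => ..., 'port' => ..., 'type' => ...].
function build_proxy_options(array $proxy, bool $tunnel = false): array
{
    return [
        CURLOPT_PROXY           => $proxy['ip'],
        CURLOPT_PROXYPORT       => (int) $proxy['port'],
        CURLOPT_PROXYTYPE       => $proxy['type'],
        // Assumption: tunneling (CONNECT) stays off unless explicitly requested.
        CURLOPT_HTTPPROXYTUNNEL => $tunnel,
        // Arbitrary timeouts so that a dead proxy fails quickly.
        CURLOPT_CONNECTTIMEOUT  => 5,
        CURLOPT_TIMEOUT         => 15,
    ];
}

// Hypothetical helper: performs one GET through the proxy and returns the
// body together with curl's own error details.
function fetch_via_proxy(string $url, array $proxy, bool $tunnel = false): array
{
    $ch = curl_init();
    // The + operator keeps the integer CURLOPT_* keys intact
    // (array_merge would renumber them).
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
    ] + build_proxy_options($proxy, $tunnel));

    $body = curl_exec($ch);

    return [
        'body'   => $body === false ? null : $body,
        'errno'  => curl_errno($ch),
        'error'  => curl_error($ch),
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    ];
}
```

Printing the `errno`, `error` and `status` fields for each proxy should show whether the failure is a timeout, a refused connection, or a rejection by the proxy itself.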
The full code is as follows.
Proxy IP list
[
    [
        'ip' => '106.75.164.15' ,
        'port' => '3128' ,
        'type' => CURLPROXY_HTTP ,
    ] ,
    [
        'ip' => '221.237.51.128' ,
        'port' => '31447' ,
        'type' => CURLPROXY_HTTP
    ] ,
    [
        'ip' => '60.169.220.111' ,
        'port' => '808' ,
        'type' => CURLPROXY_HTTP
    ] ,
    [
        'ip' => '221.6.138.154' ,
        'port' => '30893' ,
        'type' => CURLPROXY_HTTP
    ] ,
    [
        'ip' => '223.84.179.198' ,
        'port' => '58202' ,
        'type' => CURLPROXY_HTTP
    ]
]
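Because the list is maintained by hand and the ports are stored as strings, a small sanity check can catch malformed entries before they reach curl. This is only a sketch: `is_valid_proxy_entry` is a hypothetical helper, and it checks syntax only, not whether the proxy is actually reachable.

```php
<?php
// Hypothetical helper: returns true when an entry has a syntactically valid
// IP address and a port in the range 1-65535. It does not test reachability.
function is_valid_proxy_entry(array $entry): bool
{
    if (!isset($entry['ip'], $entry['port'])) {
        return false;
    }
    $ip = filter_var($entry['ip'], FILTER_VALIDATE_IP);
    $port = filter_var($entry['port'], FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 65535],
    ]);

    return $ip !== false && $port !== false;
}
```

The list could then be filtered with `array_filter($list, 'is_valid_proxy_entry')` before any request is made.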
Scraping code
function ajax($url , $data = [] , $header = []){
    // Index of the proxy currently in use (kept between calls)
    static $loop_count = 0;
    // "ip:port" strings of proxies that have already failed
    static $disabled = [];
    if (empty($url)) {
        throw new Exception("Please provide the URL to scrape");
    }
    $ip_list = config('app.proxy');
    if ($loop_count == 0) {
        foreach ($ip_list as $v) {
            if (in_array($v['ip'] . ':' . $v['port'] , $disabled)) {
                $loop_count++;
                continue ;
            }
        }
    }
    if (!isset($ip_list[$loop_count])) {
        exit("Unfortunately, all proxy IPs have been banned!");
    }
    // Read the method before extracting the URL; $url itself stays untouched
    // so that a retry receives the original request description
    $method = is_string($url) ? 'get' : strtolower($url['method']);
    $target = is_string($url) ? $url : $url['url'];
    $res = curl_init();
    $cur = $ip_list[$loop_count];
    curl_setopt_array($res , [
        CURLOPT_RETURNTRANSFER => true ,
        CURLOPT_HEADER => false ,
        CURLOPT_URL => $target ,
        // Request headers to send
        CURLOPT_HTTPHEADER => $header ,
        CURLOPT_POST => $method == 'post' ,
        CURLOPT_POSTFIELDS => $data,
        // A user-agent must be sent!
        CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" ,
        // Cookie to send; not sure how long it will stay valid
        CURLOPT_COOKIE => "_lxsdk_cuid=166527efb46c8-09383c52b934ce-333b5602-1fa400-166527efb46c8; _lxsdk=166527efb46c8-09383c52b934ce-333b5602-1fa400-166527efb46c8; _hc.v=46801334-09d9-22c2-4af0-c28dc5058ce4.1538982346; aburl=1; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; s_ViewType=10; cy=261; cye=anshun; _lxsdk_s=166527efb48-047-aa7-f72%7C%7C574" ,
        CURLOPT_SSL_VERIFYPEER => false ,
        // Enable the HTTP proxy tunnel
        CURLOPT_HTTPPROXYTUNNEL => true ,
        CURLOPT_PROXYAUTH => CURLAUTH_BASIC ,
        CURLOPT_PROXYTYPE => $cur['type'] ,
        CURLOPT_PROXY => $cur['ip'] ,
        CURLOPT_PROXYPORT => $cur['port'] ,
    ]);
    $str = curl_exec($res);
    // Strict comparison: an empty response body is not a failed transfer
    if ($str === false) {
        // Mark this proxy as unusable and retry with the next one
        $disabled[] = $cur['ip'] . ':' . $cur['port'];
        $loop_count++;
        return ajax($url , $data , $header);
    }
    return $str;
}
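The rotation in ajax() relies on recursion and static variables, which makes it hard to test without real proxies. As an alternative sketch (not the code from this question), the same idea can be written as a loop that takes the request function as a parameter. `fetch_with_rotation` and the `$fetch` callable are hypothetical names; `$fetch` is assumed to take the URL and one proxy entry and to return the response body, or false on failure.

```php
<?php
// Hypothetical helper: tries each proxy in order and returns the first
// successful response body, or null when every proxy fails.
// $failed collects "ip:port" strings so later calls can skip dead proxies.
function fetch_with_rotation(string $url, array $proxies, callable $fetch, array &$failed = []): ?string
{
    foreach ($proxies as $proxy) {
        $key = $proxy['ip'] . ':' . $proxy['port'];
        if (in_array($key, $failed, true)) {
            continue; // already known to be unusable
        }
        $body = $fetch($url, $proxy);
        if ($body !== false) {
            return $body;
        }
        $failed[] = $key; // remember the failure for later calls
    }

    return null;
}
```

Passing the request function in as a callable means the rotation logic can be exercised with a stub, and the caller decides what to do when null comes back instead of the script exiting.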
Have the IPs been banned?