Lagou (拉勾网) anti-scraping rate limit

from ip_pool import get_ip
import requests
headers={"Cookie":'_ga=GA1.2.174518896.1523111183; user_trace_token=20180407222623-a5c90692-3a6f-11e8-b740-5254005c3644; LGUID=20180407222623-a5c90b3f-3a6f-11e8-b740-5254005c3644; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%22167a6ed15993d2-015970814fc80b-35667607-2073600-167a6ed159a938%22%2C%22%24device_id%22%3A%22167a6ed15993d2-015970814fc80b-35667607-2073600-167a6ed159a938%22%7D; index_location_city=%E5%8C%97%E4%BA%AC; JSESSIONID=ABAAABAAAGFABEF2514709505FB85F0FC824310BC7C43F2; _gid=GA1.2.1492847185.1548121054; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1546789367; TG-TRACK-CODE=index_search; SEARCH_ID=8cc1b952a94a496892284ac7a525daea; _gat=1; LGSID=20190123153732-bedacc90-1ee1-11e9-9486-525400f775ce; PRE_UTM=; PRE_HOST=; PRE_SITE=; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2Fgongsi%2F0-1-0-0; LG_LOGIN_USER_ID=d809bbbe54ac48bf0a9ce5888befc8dbdd72485efb1d041a; _putrc=528CDA7A1053B994; login=true; unick=%E5%B2%B3%E5%BA%B7; showExpriedIndex=1; showExpriedCompanyHome=1; showExpriedMyPublish=1; hasDeliver=138; gate_login_token=b729a3ea436639fccaac9cdae984ae92c4562ed3d14bb148; LGRID=20190123153826-ded128bf-1ee1-11e9-b748-5254005c3644; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1548229075',"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36","Referer": "https://www.lagou.com/gongsi/0-1-0-0","Origin": "https://www.lagou.com","Host": "www.lagou.com"}
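# Form fields requesting page 3 of the company listing
form_data = {'first': 'false', 'pn': '3', 'sortField': '0', 'havemark': '0'}
# get_ip() comes from the local ip_pool module and supplies the proxies mapping
res = requests.get('https://www.lagou.com/gongsi/0-1-0-0.json', headers=headers, data=form_data, proxies=get_ip())
print(res.text)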


➜ clothes git:(ss) ✗ python3 lagou.py
{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"114.238.130.249","state":2402}

➜ clothes git:(ss) ✗ python3 lagou.py
{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"125.109.192.20","state":2402}

➜ clothes git:(ss) ✗ python3 lagou.py
{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"180.126.205.239","state":2402}

➜ clothes git:(ss) ✗ python3 lagou.py
{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"121.239.239.172","state":2402}

➜ clothes git:(ss) ✗ python3 lagou.py
{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"117.62.104.126","state":2402}

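The IP changes on every run. As soon as I started getting blocked I switched IPs, and I replaced the cookie as well, but the site still returns "您操作太频繁,请稍后再访问" ("You are operating too frequently, please try again later"). Could someone explain why?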

5 answers

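Is your IP pool made up of transparent proxies, anonymous proxies, or high-anonymity (elite) proxies? The difference matters.

With a transparent proxy, the request forwarded to the server still records the originating IP, typically in a header such as X-Forwarded-For.

Also, if a proxy pool (especially a free one) has been used by many people, it very easily ends up on a blacklist. Once the server sees that your IP belongs to a blacklisted pool, it either refuses service outright or returns a vague message such as "operating too frequently". The server has no reason to tell you the real cause, so a generic brush-off is enough.

Spend a little money on a paid proxy service; the results are much better.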
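To make the transparent / anonymous / high-anonymity distinction concrete, here is a minimal sketch that classifies a proxy from the headers the target server receives. It assumes you can see those headers, for example by sending a request through the proxy to a header-echo endpoint you control. The function name `classify_proxy` and the header list are illustrative choices, not part of any library, and real proxies may behave differently.

```python
import re

# Headers that commonly reveal a proxy is in the path (illustrative list).
PROXY_HEADERS = ("via", "x-forwarded-for", "x-real-ip", "forwarded", "proxy-connection")


def classify_proxy(seen_headers, real_ip):
    """Classify a proxy from the headers the server saw.

    seen_headers: mapping of header name -> value as received by the server.
    real_ip: your own public IP address, without the proxy.
    """
    lowered = {name.lower(): str(value) for name, value in seen_headers.items()}

    # Transparent: the real client IP appears as a token in some header value.
    for value in lowered.values():
        if real_ip in re.split(r'[\s,;="]+', value):
            return "transparent"

    # Anonymous: the real IP is hidden, but proxy-revealing headers are present.
    if any(name in lowered for name in PROXY_HEADERS):
        return "anonymous"

    # Elite (high anonymity): nothing in the headers hints at a proxy.
    return "elite"
```

A header check like this says nothing about blacklisting: an elite proxy whose exit IP is already on a blocklist will still be rejected.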

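To scrape a page you need to imitate a normal browsing flow: first make a request that generates a valid cookie, and only then request the page you want to scrape. In your case the cookie has already expired.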
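The "generate a cookie first" flow can be sketched roughly as follows. This assumes that loading the HTML listing page is what makes the site issue fresh cookies, and that the JSON endpoint accepts the form as a POST; neither is confirmed in this thread. `fetch_company_page` is a hypothetical helper, and `session` is expected to behave like a `requests.Session`, which stores cookies from one response and sends them on the next request.

```python
LIST_URL = "https://www.lagou.com/gongsi/0-1-0-0"
JSON_URL = "https://www.lagou.com/gongsi/0-1-0-0.json"

# No hard-coded Cookie header: the session collects cookies itself.
BASE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36",
    "Referer": LIST_URL,
}


def fetch_company_page(session, page, proxies=None, timeout=10):
    """Warm up the session on the HTML page, then request the JSON data."""
    # Step 1: visit the normal page so the server sets fresh cookies,
    # which the session keeps for the following request.
    session.get(LIST_URL, headers=BASE_HEADERS, proxies=proxies, timeout=timeout)

    # Step 2: request the data with the cookies obtained in step 1.
    # Assumption: the endpoint expects the form fields in a POST body.
    form_data = {"first": "false", "pn": str(page), "sortField": "0", "havemark": "0"}
    return session.post(JSON_URL, headers=BASE_HEADERS, data=form_data,
                        proxies=proxies, timeout=timeout)
```

With the Requests library this would be called as `fetch_company_page(requests.Session(), 3, proxies=get_ip())`. Using a new session whenever the proxy changes keeps each cookie tied to a single IP.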


May I ask how you solved this in the end?

Use a proxy, and leave an interval between requests. Don't crawl too frequently, or the server will easily detect and ban you.
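Spacing out requests might look like the sketch below. It pauses for a random interval between pages and backs off when the response matches the block message shown in the question (`"status": false` with `"state": 2402`). The delay values, the retry count, and the `fetch` callback are assumptions for illustration; the thread does not say what limits the site actually enforces.

```python
import json
import random
import time


def is_rate_limited(body):
    """Return True if the body looks like the 'too frequent' JSON reply."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not JSON, so not the block message from the question
    return (isinstance(payload, dict)
            and payload.get("status") is False
            and payload.get("state") == 2402)


def crawl(fetch, pages, min_delay=2.0, max_delay=5.0, max_retries=3, sleep=time.sleep):
    """Fetch each page politely; fetch(page) must return the response text."""
    results = {}
    for page in pages:
        for attempt in range(max_retries + 1):
            body = fetch(page)
            if not is_rate_limited(body):
                results[page] = body
                break
            # Blocked: wait longer after each failed attempt before retrying.
            sleep(max_delay * 2 ** attempt)
        # Random pause between pages so requests are not evenly spaced.
        sleep(random.uniform(min_delay, max_delay))
    return results
```

Pages that are still blocked after `max_retries` retries are simply left out of the result, so the caller can see which ones to try again later.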