Request headers:
Host: pubs.rsc.org
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://www.baidu.com/s?wd=rsc&rsv_spt=1&issp=1&f=8&rsv_bp=0&rsv_idx=2&ie=utf-8&tn=monline_5_dg
Cookie: ShowEUCookieLawBanner=true; __utma=245083418.574632152.1421932930.1425088577.1425088577.1; __utmz=245083418.1425088577.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); OAX=cFSCsVTA+KgABIGI; ki_t=1421932573619%3B1425001916480%3B1425039683552%3B11%3B64; ki_r=; _ga=GA1.2.574632152.1421932930; __atuvc=2%7C4%2C1%7C5%2C8%7C6%2C0%7C7%2C10%7C8; X-Mapping-hhmaobcf=3D85FAB7A4929E1DC99D1C005DA6212A; ASP.NET_SessionId=e2rq4u5ezbhocigv02b1mpjn
Connection: keep-alive
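Which of these headers actually have to be sent?
Apart from these, is there anything else I should watch out for?
......................Addendum..........................
Goal: crawl the data with scrapy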
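If you don't want to work out all the anti-bot parameters yourself, just use phantomjs and simulate the browser actions faithfully. phantomjs is itself a headless browser.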
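If you would rather stay with plain requests, here is a minimal sketch of one way to narrow down which headers matter. It parses a captured header block into a dict and drops the ones the HTTP client usually manages on its own. The names `RAW_HEADERS`, `CLIENT_MANAGED`, `parse_headers` and `headers_to_send` are made up for illustration, and the choice of what to drop is an assumption, not something confirmed for pubs.rsc.org. You would still have to test against the site to see which of the remaining headers it really checks.

```python
# Captured headers, copied from the browser (Cookie and Referer left out for brevity).
RAW_HEADERS = """\
Host: pubs.rsc.org
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
"""

# Assumption: these are normally filled in by the HTTP client or its
# middleware (host, connection handling, compression, cookie jar), so
# copying them by hand is usually unnecessary.
CLIENT_MANAGED = {"host", "connection", "accept-encoding", "cookie"}


def parse_headers(raw):
    """Turn 'Name: value' lines into a dict, splitting on the first colon only."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


def headers_to_send(raw):
    """Keep only the headers that are not in CLIENT_MANAGED."""
    return {
        name: value
        for name, value in parse_headers(raw).items()
        if name.lower() not in CLIENT_MANAGED
    }


if __name__ == "__main__":
    for name, value in headers_to_send(RAW_HEADERS).items():
        print(f"{name}: {value}")
```

The resulting dict can be passed as the `headers` argument of `scrapy.Request`. From there, remove one header at a time and compare the responses to find out which ones the server insists on.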