Can we set a proxy for a spider that uses scrapy_splash?

When I implemented a spider with Scrapy, I wanted to rotate its proxy so that the server wouldn't block my requests for sending too many from a single IP. I also knew how to change the proxy with plain Scrapy, either through a middleware or by setting it directly in the request meta.
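For reference, the direct per-request approach I mean looks roughly like this (a minimal, self-contained sketch; the spider name, URL and proxy address are just placeholders):

import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'  # placeholder spider name

    def start_requests(self):
        # Scrapy's built-in HttpProxyMiddleware picks up the proxy from request.meta
        yield scrapy.Request(
            'https://example.com',  # placeholder URL
            callback=self.parse,
            meta={'proxy': 'http://1.2.3.4:8080'},  # placeholder proxy address
        )

    def parse(self, response):
        pass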

However, I use the scrapy_splash package to execute the JavaScript for my spider, and with it I found it hard to change the proxy, because as far as I understand, scrapy_splash sends the request through the Splash server, which renders the page's JavaScript for us.

In fact, the proxy works fine when I use plain Scrapy, but it has no effect once I use scrapy_splash.

So is there any way to set a proxy for a scrapy_splash request?

Any help would be appreciated, thank you.

Edit, 4 hours later:

I have added the related settings in settings.py and written the following in middlewares.py. As I mentioned, it works for plain Scrapy requests but not for scrapy_splash:

import json
import random


class RandomIpProxyMiddleware(object):
    def __init__(self, ip=''):
        self.ip = ip
        # ip_get() is a project helper that refreshes carhome\ip.json with fresh proxies
        ip_get()
        with open('carhome\\ip.json', 'r') as f:
            self.IPPool = json.loads(f.read())

    def process_request(self, request, spider):
        # Pick a random proxy from the pool for every outgoing request
        thisip = random.choice(self.IPPool)
        request.meta['proxy'] = "http://{}".format(thisip['ipaddr'])
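If the proxy actually has to be handed to Splash itself, my guess is that the middleware would need to put it into the Splash arguments rather than request.meta['proxy']. The following is only an untested sketch (reusing the json/random imports and the ip.json pool from above), based on my assumptions that a SplashRequest keeps its arguments under request.meta['splash']['args'] and that Splash 2.0+ accepts a proxy argument:

class RandomIpSplashProxyMiddleware(object):
    def __init__(self):
        with open('carhome\\ip.json', 'r') as f:
            self.IPPool = json.loads(f.read())

    def process_request(self, request, spider):
        thisip = random.choice(self.IPPool)
        proxy = "http://{}".format(thisip['ipaddr'])
        if 'splash' in request.meta:
            # scrapy_splash requests carry their Splash arguments here; Splash itself
            # then uses the 'proxy' argument when it fetches and renders the page
            request.meta['splash']['args']['proxy'] = proxy
        else:
            # Plain Scrapy requests are still handled by HttpProxyMiddleware
            request.meta['proxy'] = proxy

I assume this middleware would also need a priority lower than 725 in DOWNLOADER_MIDDLEWARES, so that it runs before scrapy_splash.SplashMiddleware rewrites the request for the Splash endpoint.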

And here is the code in the spider with scrapy_splash:

    yield scrapy_splash.SplashRequest(
            item, callback=self.parse, args={'wait': 0.5})

Here is the code in the spider without this plugin:

    yield scrapy.Request(item, callback=self.parse)
1 Answer

If you want to build your own private proxy pool, you can try this project, which can turn your Android device or your home PC into a proxy server: https://github.com/xapanyun/p...
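Once such a proxy is running, my understanding is that it can be passed to Splash per request through the proxy argument (supported since Splash 2.0), so the page is fetched through your proxy while rendering. A sketch against the request from the question; the host and port are placeholders for your own proxy server:

    yield scrapy_splash.SplashRequest(
        item,
        callback=self.parse,
        # Splash forwards the page fetch through this proxy while rendering
        args={'wait': 0.5, 'proxy': 'http://192.168.1.100:8080'},  # placeholder address
    )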
