Is the deny setting in a Scrapy CrawlSpider having no effect?

Rule(
    LinkExtractor(
        allow=rule.get("allow", None),
        restrict_xpaths=rule.get("restrict_xpaths", ""),
        deny=(r'guba', r'f10', r'data', r'fund.*?\.eastmoney\.com/\d+\.html',
              r'quote', r'.*so\.eastmoney.*', r'life', r'/gonggao/'),
    ),
    callback=rule.get("callback", ""),
    follow=rule.get('follow', True),
)

The Rule is configured as above, with deny meant to reject links containing guba, data, and so on. But during an actual run those links (URLs containing guba) are still crawled:

2019-06-27 10:33:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://guba.eastmoney.com/list,of166401.html> (referer: http://fund.eastmoney.com/LOF_jzzzl.html)
2019-06-27 10:33:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://guba.eastmoney.com/list,of164206.html> (referer: http://fund.eastmoney.com/LOF_jzzzl.html)
2019-06-27 10:33:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://guba.eastmoney.com/list,of161823.html> (referer: http://fund.eastmoney.com/LOF_jzzzl.html)

What is misconfigured here?
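For reference, one way to check whether the deny tuple itself filters the offending URL is to run the same LinkExtractor outside the spider against a small, made-up response. This is only a test sketch; the HTML snippet below is invented and is not part of the original spider:

from scrapy.http import HtmlResponse
from scrapy.linkextractors import LinkExtractor

# Invented HTML with one link that should be denied and one that should pass.
html = (b'<html><body>'
        b'<a href="http://guba.eastmoney.com/list,of166401.html">guba</a>'
        b'<a href="http://fund.eastmoney.com/LOF_jzzzl.html">fund</a>'
        b'</body></html>')
response = HtmlResponse(url="http://fund.eastmoney.com/LOF_jzzzl.html",
                        body=html, encoding="utf-8")

le = LinkExtractor(deny=(r'guba', r'f10', r'data', r'fund.*?\.eastmoney\.com/\d+\.html',
                         r'quote', r'.*so\.eastmoney.*', r'life', r'/gonggao/'))
for link in le.extract_links(response):
    print(link.url)  # if the deny patterns work, the guba URL is not printed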

1 answer

The deny patterns need to be regular expressions; otherwise the string has to match the URL exactly.
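Following that suggestion, here is a sketch of the same deny tuple with every entry written as an explicit raw-string regular expression. The concrete patterns are assumptions inferred from the URLs in the crawl log above, not a verified fix:

# Sketch only: deny entries spelled out as raw-string regexes.
# Patterns such as r'guba\.eastmoney\.com' are guesses based on the logged URLs.
deny_patterns = (
    r'guba\.eastmoney\.com',              # discussion-board pages
    r'/f10/',
    r'/data/',
    r'fund.*?\.eastmoney\.com/\d+\.html',
    r'quote',
    r'so\.eastmoney',
    r'life',
    r'/gonggao/',
)

# Plugged into the Rule from the question:
# Rule(LinkExtractor(allow=..., restrict_xpaths=..., deny=deny_patterns),
#      callback=..., follow=True)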
