I've recently been learning Scrapy and ran into this problem while using a pipeline for persistent storage: the fp I create in the pipeline is always None.
import scrapy
import sys
sys.path.append(r'D:\project_test\PyDemo\demo1\xunlian\mySpider\qiubai')
from ..items import QiubaiItem


class BiedouSpider(scrapy.Spider):
    name = "biedou"
    # allowed_domains = ["www.xxx.com"]
    start_urls = ["https://www.biedoul.com/wenzi/"]

    def parse(self, response):
        dl_list = response.xpath('/html/body/div[4]/div[1]/div[1]/dl')
        for dl in dl_list:
            title = dl.xpath('./span/dd/a/strong/text()')[0].extract()
            content = dl.xpath('./dd//text()').extract()
            content = ''.join(content)
            # parsing done, pack the data into an item
            item = QiubaiItem()
            item['title'] = title
            item['content'] = content
            yield item  # hand the item to the pipeline
            break
Next is items.py:
import scrapy


class QiubaiItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    content = scrapy.Field()
And this is pipelines.py:
class QiubaiPipeline(object):
    def __init__(self):
        self.fp = None

    def open_spdier(self, spider):  # meant to override the parent's file-opening hook
        print("开始爬虫")
        self.fp = open('./biedou.txt', 'w', encoding='utf-8')

    def close_spider(self, spider):
        print("结束爬虫")
        self.fp.close()

    def process_item(self, item, spider):
        title = str(item['title'])
        content = str(item['content'])
        self.fp.write(title + ':' + content + '\n')
        return item
Here is the error I get:
PS D:\project_test\PyDemo\demo1\xunlian\mySpider\qiubai> py -m scrapy crawl biedou
2024-04-11 10:36:12 [scrapy.core.scraper] ERROR: Error processing {'content': '笑点不同怎么做朋友。。', 'title': '笑点太低了吧'}
Traceback (most recent call last):
File "C:\Users\空条承太郎\AppData\Roaming\Python\Python312\site-packages\twisted\internet\defer.py", line 1078, in _runCallbacks
current.result = callback( # type: ignore[misc]
File "C:\Users\空条承太郎\AppData\Roaming\Python\Python312\site-packages\scrapy\utils\defer.py", line 340, in f
return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
TypeError: Object of type QiubaiItem is not JSON serializable
结束爬虫
2024-04-11 10:36:12 [scrapy.core.engine] ERROR: Scraper close failure
Traceback (most recent call last):
File "C:\Users\空条承太郎\AppData\Roaming\Python\Python312\site-packages\twisted\internet\defer.py", line 1078, in _runCallbacks
current.result = callback( # type: ignore[misc]
File "D:\project_test\PyDemo\demo1\xunlian\mySpider\qiubai\qiubai\pipelines.py", line 24, in close_spider
self.fp.close()
AttributeError: 'NoneType' object has no attribute 'close'
I've been hunting for this bug for a long time. I suspected I had overridden the parent-class method incorrectly, but after comparing, my override looks right to me; I also enabled the pipeline manually in the settings file. I still can't figure out why the fp I create is None: nothing gets written, and the txt file is never even created. I'd really appreciate it if someone could help me out!
The method name is misspelled:

open_spdier -> open_spider

Scrapy looks up the hook by its exact name, so a method called open_spdier is simply never invoked. That means self.fp is never assigned, process_item has nothing to write to (which is why the txt file is never created), and when the spider shuts down close_spider calls self.fp.close() on None, giving the AttributeError you see.
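A minimal corrected pipelines.py sketch, keeping the file path and print messages from the question; only the method name changes:

class QiubaiPipeline(object):
    def __init__(self):
        self.fp = None

    def open_spider(self, spider):  # called once by Scrapy when the spider starts
        print("开始爬虫")
        self.fp = open('./biedou.txt', 'w', encoding='utf-8')

    def close_spider(self, spider):  # called once when the spider closes
        print("结束爬虫")
        self.fp.close()

    def process_item(self, item, spider):
        title = str(item['title'])
        content = str(item['content'])
        self.fp.write(title + ':' + content + '\n')
        return item

Also double-check that the pipeline is enabled in settings.py (you said it already is), with something like ITEM_PIPELINES = {'qiubai.pipelines.QiubaiPipeline': 300} — the exact module path depends on your project layout.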