Scrapy crawler: with the same code, saving to CSV with -o works fine, but writing to MySQL at the same time produces many duplicated and incomplete rows

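I'm practicing with a Scrapy crawler on Dianping (大众点评). With the same code, saving to CSV with -o works fine, but when I also write to a MySQL database, many rows come out duplicated and incomplete.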


The CSV file is completely fine.


The data in the MySQL table is a mess.

It's the same code and I have no idea what's going on. Any help would be appreciated.

import MySQLdb
import MySQLdb.cursors
from twisted.enterprise import adbapi
from twisted.python import log


class MySQLStorePipeline(object):
    """Write scraped items to MySQL through Twisted's asynchronous connection pool."""

    def __init__(self):
        self.dbpool = adbapi.ConnectionPool(
            'MySQLdb',
            host='localhost',
            db='dianping',
            user='root',
            passwd='root',
            cursorclass=MySQLdb.cursors.DictCursor,
            charset='utf8',
            use_unicode=True,
        )

    def process_item(self, item, spider):
        # Run the DB query in the thread pool. runInteraction returns a Deferred
        # immediately, so the insert may run after this method has returned.
        query = self.dbpool.runInteraction(self._conditional_insert, item)
        query.addErrback(self.handle_error)
        return item

    def _conditional_insert(self, tx, item):
        # Only insert items that have a user_id.
        if item.get('user_id'):
            tx.execute(
                "insert into testtable_gz (city, store_name, store_id, book, group_buy, "
                "branch, average_spend, style, store_area, store_addr, store_url, "
                "comment_url, store_phone, user_id, star, taste, environment, service, "
                "comment, comment_date, user_url) "
                "values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, "
                "%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",
                (item['city'],
                 item['store_name'],
                 item['store_id'],
                 item['book'],
                 item['group_buy'],
                 item['branch'],
                 item['average_spend'],
                 item['style'],
                 item['store_area'],
                 item['store_addr'],
                 item['store_url'],
                 item['comment_url'],
                 item['store_phone'],
                 item['user_id'],
                 item['star'],
                 item['taste'],
                 item['environment'],
                 item['service'],
                 item['comment'],
                 item['comment_date'],
                 item['user_url']))

    def handle_error(self, failure):
        # Log failed inserts instead of dropping them silently.
        log.err(failure)

This is the pipelines.py code.
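One plausible explanation, offered as an assumption since the spider code is not shown: `runInteraction` hands the item object itself to a thread-pool thread and returns at once, so the actual insert reads the item later. If the spider reuses a single item object and keeps overwriting its fields before yielding each row, the delayed inserts can all see the last values written, which yields duplicated rows while other rows never appear. A CSV feed export, by contrast, likely serializes each item right away and would not show this. The sketch below reproduces the effect without Scrapy or Twisted: `ThreadPoolExecutor` stands in for adbapi's thread pool, and `crawl`, `run_pipeline` and the `u1`/`u2`/`u3` values are hypothetical. Passing a `copy.deepcopy` snapshot (or creating a new item for every row in the spider) avoids the shared state.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor


def crawl(shared_item):
    """Hypothetical spider callback that reuses ONE item object for every row."""
    for user_id in ('u1', 'u2', 'u3'):
        shared_item['user_id'] = user_id  # overwrite the same object each time
        yield shared_item


def run_pipeline(snapshot):
    """Simulate an async DB pipeline; if snapshot is True, deep-copy each item first."""
    db_rows = []
    lock = threading.Lock()
    crawl_done = threading.Event()

    def insert(item):
        # Simulate the thread-pool insert running only after the spider has moved on.
        crawl_done.wait()
        with lock:
            db_rows.append(item['user_id'])

    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = []
        for item in crawl({'store_name': 'demo'}):
            payload = copy.deepcopy(item) if snapshot else item
            futures.append(pool.submit(insert, payload))
        crawl_done.set()  # the spider has finished mutating the shared item
        for f in futures:
            f.result()
    return sorted(db_rows)


print(run_pipeline(snapshot=False))  # ['u3', 'u3', 'u3']
print(run_pipeline(snapshot=True))   # ['u1', 'u2', 'u3']
```

In the real pipeline the equivalent change would be to pass `copy.deepcopy(item)` to `runInteraction`, or to construct a fresh item object in the spider for each row it yields.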
