How to control the order of yields in Scrapy

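I want to scrape some data from a JSON endpoint, http://china.fathom.info/data/data.json. Take the code below: with the yield in parse, I naively assumed the for loop would finish yielding every Request and only then yield group_item. But that way nothing from parse_member ends up stored in group_item. If I move yield group_item into parse_member instead, the data becomes
{'A':1, 'members':[{'id':11}]}, {'A':1, 'members':[{'id':22}]}
rather than
{'A':1, 'members':[{'id':11}, {'id':22}]}

How can I solve this? Written sequentially with urllib2 it would be easy to follow, but what if it has to be Scrapy? My understanding of the order in which Scrapy issues requests is pretty muddled.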

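import json
from scrapy.http import Request

# The following are members of the spider class; GroupItem and person_url are defined elsewhere.
start_urls = [
    "http://china.fathom.info/data/data.json"
]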

def parse(self, response):
    groups = json.loads(response.body)['group_members']
    for i in groups:
        group_item = GroupItem()
        group_item['name'] = groups[i]['name']
        group_item['chinese'] = groups[i]['chinese']
        group_item['members'] = []

        members = groups[i]['members']
        for member in members:
            yield Request(self.person_url % member['id'], meta={'group_item': group_item, 'member': member},
                          callback=self.parse_member, priority=100)
        yield group_item

def parse_member(self, response):
    group_item = response.meta['group_item']
    member = response.meta['member']
    person = json.loads(response.body)
    ego = person['ego']
    group_item['members'].append({
        'id': ego['id'],
        'name': ego['name'],
        'chinese': ego['chinese'],
        'role': member['role']
    })
1 answer

Did the original poster ever solve this?

At first glance it doesn't look easy to solve.
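
For what it's worth, the behaviour in the question comes from Scrapy scheduling requests asynchronously: parse has already yielded group_item by the time any parse_member callback runs. One pattern that is often suggested is to chain the member requests, passing the item and the list of members still to fetch along in meta, and to yield the item only from the callback that handles the last member. Below is a minimal sketch of that idea, not a tested spider. To keep it runnable without Scrapy, Request, Response and crawl are small stand-ins for scrapy.Request, the response object and the engine; PERSON_URL and next_step are hypothetical names, and the item is a plain dict instead of GroupItem.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    """Stand-in for scrapy.Request: only url, callback and meta are modelled."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


@dataclass
class Response:
    """Stand-in for a Scrapy response: a body plus the meta of its request."""
    body: str
    meta: dict


# Hypothetical URL template; the question's person_url is not shown.
PERSON_URL = "http://example.invalid/person/%s.json"


def next_step(item, pending):
    """Request the next pending member, or emit the item once none are left."""
    if not pending:
        yield item
        return
    member = pending.pop(0)
    yield Request(PERSON_URL % member["id"], callback=parse_member,
                  meta={"item": item, "member": member, "pending": pending})


def parse(response):
    groups = json.loads(response.body)["group_members"]
    for group in groups.values():
        item = {"name": group["name"], "chinese": group["chinese"], "members": []}
        # Start one chain per group; the item is NOT yielded here.
        yield from next_step(item, list(group["members"]))


def parse_member(response):
    item = response.meta["item"]
    member = response.meta["member"]
    ego = json.loads(response.body)["ego"]
    item["members"].append({"id": ego["id"], "name": ego["name"],
                            "chinese": ego["chinese"], "role": member["role"]})
    # Continue the chain; the last link yields the completed item.
    yield from next_step(item, response.meta["pending"])


def crawl(start_body, fetch):
    """Tiny driver standing in for the Scrapy engine (assumption, for the demo)."""
    queue = list(parse(Response(start_body, {})))
    items = []
    while queue:
        out = queue.pop(0)
        if isinstance(out, Request):
            queue.extend(out.callback(Response(fetch(out.url), out.meta)))
        else:
            items.append(out)
    return items
```

In a real spider the same parse, next_step and parse_member would be methods using scrapy's own Request. Two caveats: the members of one group are fetched one after another rather than in parallel, and if any request in a chain fails or is dropped by the duplicate filter, that group's item is never emitted, so an errback (and possibly dont_filter=True) would probably be needed.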
