1

问题

你有一个字典或者实例的序列,然后你想根据某个特定的字段来分组迭代访问。

解决方案

使用itertools.groupby() 函数

假设有下列的字典列表:

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '2122 N CLARK', 'date': '07/03/2012'},
    {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
    {'address': '1060 W ADDISON', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]

现在按照date字段来分组访问,就可以使用itertools.groupby()

from itertools import groupby
from operator import itemgetter
rows.sort(key=itemgetter('date'))
for key, group in groupby(rows, key=itemgetter('date')):
    print(key)
    for item in group:
        print(4 * ' ', item)

输出

07/01/2012
     {'address': '5412 N CLARK', 'date': '07/01/2012'}
     {'address': '4801 N BROADWAY', 'date': '07/01/2012'}
07/02/2012
     {'address': '5800 E 58TH', 'date': '07/02/2012'}
     {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
     {'address': '1060 W ADDISON', 'date': '07/02/2012'}
07/03/2012
     {'address': '2122 N CLARK', 'date': '07/03/2012'}
07/04/2012
     {'address': '5148 N CLARK', 'date': '07/04/2012'}
     {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}

讨论

itertools.groupby迭代器和一个可选的key参数,按key来分组的,如果key是None的话,则按元素分组

itertools.groupby返回每个不同的key和一个迭代器对象,这个迭代器对象就是key对应的一组元素

并且itertools.groupby要求分组之前,迭代器的所有元素必须是有序的,原因跟itertools.groupby的实现有关

itertools.groupby()大致实现:

class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = object()
    def __iter__(self):
        return self
    def __next__(self):
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)    # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))
    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)

来源

Python Cookbook

关注

欢迎关注我的微信公众号:python每日一练


python每日一练
351 声望754 粉丝