Python 处理文件的性能优化

Question

Python 处理文件的性能优化

发布于
2014-08-03

现在有一个list包含有1500个topic，另外一个文件包含一亿个微博数据，现在我想统计，1500个topic中每个topic分别有多少条微博包含它们，我写的代码如下，但是运行起来需要非常久的时间，有什么办法可以优化吗？

    f = file("largefile")
    for line in f:
        try:
            tweet_time = line.split(',',3)[2].split()[0]  # 微博发布时间
            tweet = line.split(',',3)[-1]  # 微博内容
            for topic in topics:
                topic_items = topic.split()  # 每个topic可能有多个词组成
                isContain = True
                for item in topic_items:
                    if item not in tweet:
                        isContain = False
                        break
                if isContain:
                    pass   # 该微博包含该topic
        except:
            continue
    f.close()

编程

python

阅读 4.6k

1 个回答

得票最新

万能Lambda

95612

发布于
2014-08-03