How can I implement this Python algorithm as efficiently as possible?

I have a list1 with the following data:
["a","b","c","e"]

and a list2 with the following data:
["b","c","f"]

My goal is to find, as quickly as possible, the items that are in list2 but not in list1. Put simply, this is data filtering.

But what is the most efficient way to do it?
I used a for loop, which feels slow:

for i in list2:
    if i not in list1:
        print (i)

A list comprehension would also work, of course, but both feel very inefficient. Is there a faster way?

4 answers

If you just want the code to look simple, this is easy to write:

[k for k in list2 if k not in list1]

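If you want better performance, assume list1 is a very long list. Iterating over list2 is unavoidable, so the question becomes how to look up an element in list1 in the least time. I put together the following example: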

# -*- coding: utf-8 -*-
import time

# Use a real list: in Python 3 a bare range answers `in` for ints in O(1)
list1 = list(range(0, 10000))
list2 = [1, 20, 40, 60, -10, -20]
LIMIT = 10000  # number of repetitions used for timing

def method1():
    # Linear scan of list1 for every element of list2
    then = time.time()
    for i in range(0, LIMIT):
        [k for k in list2 if k not in list1]
    print('cost time:', time.time() - then)

def method2():
    # Build a dict once, then do hash lookups against it
    then = time.time()
    data = dict([(k, 1) for k in list1])
    for i in range(0, LIMIT):
        [k for k in list2 if k not in data]
    print('cost time:', time.time() - then)

method1()
method2()

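The timings from my run are below; unsurprisingly, the second method is much faster. It uses a dict, i.e. hash-based lookup: Python's dict is implemented as a hash table, so a membership test takes O(1) time on average, whereas searching list1 has to scan it element by element, which is O(n).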
cost time: 4.26041412354 # method1
cost time: 0.0187389850616 # method2
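
Since only membership matters here, a set expresses the same idea as the dict in method2 a bit more directly. Below is a minimal sketch, assuming Python 3; the helper names filter_with_list and filter_with_set are made up for illustration, and the timings depend on the machine, so none are shown.

```python
import timeit

list1 = list(range(10000))          # long list to search in
list2 = [1, 20, 40, 60, -10, -20]   # short list to filter

def filter_with_list(candidates, existing):
    # `in` on a list scans element by element: O(len(existing)) per lookup
    return [k for k in candidates if k not in existing]

def filter_with_set(candidates, existing_set):
    # `in` on a set is a hash lookup: O(1) on average
    return [k for k in candidates if k not in existing_set]

set1 = set(list1)  # built once, outside the timed code

if __name__ == "__main__":
    t_list = timeit.timeit(lambda: filter_with_list(list2, list1), number=200)
    t_set = timeit.timeit(lambda: filter_with_set(list2, set1), number=200)
    print("list lookup:", t_list)
    print("set lookup: ", t_set)
```

Building the set costs one pass over list1, so this only pays off when list1 is long or is searched many times.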

list(set(list2)-set(list1))

a = ["a","b","c","e"]
b = ["b","c","f"]
li = [item for item in b if item not in a]

Reply #1 (convert to set first): this does not preserve the original order of the data, and it also removes duplicates:

>>> l1 = ["a","b","c","e"]
>>> l2 = ["b","c","f","d","f"]
>>> list(set(l2) - set(l1))
['d', 'f']

Reply #2 (list comprehension): this preserves the order of the data and does not remove duplicates:

>>> l1 = ["a","b","c","e"]
>>> l2 = ["b","c","f","d","f"]
>>> result = [i for i in l2 if i not in l1]
>>> result
['f', 'd', 'f']
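
The two behaviours compared above can be combined: keep the list comprehension so order is preserved, but test membership against a set so each lookup is a hash lookup. This is a rough sketch, assuming Python 3.7+ and hashable items; the function names filter_keep_order and filter_unique_keep_order are hypothetical, not from any reply in this thread.

```python
def filter_keep_order(candidates, existing):
    """Items of `candidates` not in `existing`, keeping order and duplicates."""
    existing_set = set(existing)  # one-time O(len(existing)) build
    return [item for item in candidates if item not in existing_set]

def filter_unique_keep_order(candidates, existing):
    """Same filter, but duplicates are dropped while first-seen order is kept."""
    existing_set = set(existing)
    # dict.fromkeys keeps insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(
        item for item in candidates if item not in existing_set))

l1 = ["a", "b", "c", "e"]
l2 = ["b", "c", "f", "d", "f"]
print(filter_keep_order(l2, l1))         # ['f', 'd', 'f']
print(filter_unique_keep_order(l2, l1))  # ['f', 'd']
```

If the items are unhashable (for example, lists), a set cannot be built and the plain `not in l1` comprehension is the fallback.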