A list in Python is an ordered collection. It places no restrictions on the type of its elements, so almost anything can be stored in one. This article discusses four common list operations:
  1. How to remove elements from a list;
  2. Whether indexing and slicing a list produce a deep copy or a shallow copy;
  3. The intersection, union, difference, and symmetric difference of two lists;
  4. How to sort a list.

1. Delete an element in the list

To delete an element from a list, you can use del or remove. del deletes an element by index, and remove deletes an element by value. remove deletes only the first element that matches the given value.

list1 = ['a','b','hello','world','hello']
del list1[2] # delete an element by index
print(list1)
['a', 'b', 'world', 'hello']

list1 = ['a','b','hello','world','hello'] 
list1.remove('hello') # delete by value; only the first matching element is removed
print(list1)
['a', 'b', 'world', 'hello']

If you want to delete every "hello" in the list, you cannot use a for loop like the one below: when the list contains consecutive "hello" elements, some of them will not be deleted.

list1 = ['a','b','hello','world','c','hello','hello','d']
for i in list1:
    if i == 'hello':
        list1.remove(i)
print(list1)

['a', 'b', 'world', 'c', 'hello', 'd']

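As the result shows, when remove deletes a matched element inside the for loop, the remaining elements shift one position to the left while the loop still advances to the next position, so the element that immediately follows the deleted one is skipped (this is similar to erasing from a vector while iterating over it in C++). If there are two consecutive "hello" elements, the second one is therefore skipped. The trace below confirms this: after each remove, the element right after the deleted one ("world", then the second consecutive "hello") is never visited: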

list1 = ['a','b','hello','world','c','hello','hello','d']
for i in list1:
    print('current item : '+i)
    if i == 'hello':
        list1.remove(i)
        print('after remove : '+i)
print(list1)

current item : a
current item : b
current item : hello
after remove : hello
current item : c
current item : hello
after remove : hello
current item : d
['a', 'b', 'world', 'c', 'hello', 'd']
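
The skipping can be made explicit by writing the loop with a visible position counter. This is only a rough sketch of the behaviour described above, assuming the for loop reads the list position by position; the names items, visited and index are illustrative and not part of the original example.

```python
items = ['a', 'b', 'hello', 'world', 'c', 'hello', 'hello', 'd']
visited = []

index = 0                      # position the loop will read next
while index < len(items):      # the length is re-checked on every step
    current = items[index]
    visited.append(current)
    if current == 'hello':
        items.remove(current)  # later elements shift one slot to the left
    index += 1                 # the position advances anyway, so one element is skipped

print(visited)  # ['a', 'b', 'hello', 'c', 'hello', 'd']
print(items)    # ['a', 'b', 'world', 'c', 'hello', 'd']
```

The visited list never contains "world", and only two of the three "hello" elements are reached, which matches the trace above.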

If you need to delete all matching elements, you can iterate over a copy of the list while deleting from the original list. The example below uses deepcopy; because the loop only reads the copy, a shallow copy such as list1[:] would also work. An example is as follows:

from copy import deepcopy
list1 = ['a','b','hello','world','c','hello','hello','d']
list2 = deepcopy(list1)
for i in list2:
    print('current item : '+i)
    if i == 'hello':
        list1.remove(i)
        print('after remove : '+i)
print(list1)

current item : a
current item : b
current item : hello
after remove : hello
current item : world
current item : c
current item : hello
after remove : hello
current item : hello
after remove : hello
current item : d
['a', 'b', 'world', 'c', 'd']
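
When the trace output is not needed, the same result can be reached without removing items one at a time. The sketch below shows one common alternative, not the only option; the names words, alias and filtered are illustrative. It builds a filtered list with a list comprehension, and uses slice assignment when the existing list object has to be updated in place.

```python
words = ['a', 'b', 'hello', 'world', 'c', 'hello', 'hello', 'd']
alias = words  # a second name bound to the same list object

# Build a new list that keeps everything except 'hello'.
filtered = [w for w in words if w != 'hello']
print(filtered)        # ['a', 'b', 'world', 'c', 'd']

# Slice assignment replaces the contents of the existing list object,
# so every other reference to it (such as alias) sees the change too.
words[:] = [w for w in words if w != 'hello']
print(alias)           # ['a', 'b', 'world', 'c', 'd']
print(alias is words)  # True
```

The comprehension visits each element once, whereas every call to remove searches the list from the beginning.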

2. Is the slice and index of the list a deep copy or a shallow copy?

The example above used a deep copy of a list. So are slices and indexes of a list deep copies or shallow copies?
First, let's look at the basic usage of list indexing and slicing. A list index in Python can be negative: negative indexes count from the end of the list, starting at -1.

# Indexing and slicing
list1 = ['a','b','c','d']
print(list1[0])
print(list1[1:3])
print(list1[-4:-3])

a
['b', 'c']
['a']

To find out whether a slice is a deep copy or a shallow copy, we need the built-in id function, which returns the identity of an object (in CPython, its memory address).
Assigning a list to another name in Python only copies a reference to the list, so the original name and the new name point to the same memory address. Changing the value of an element through one name also changes what the other name sees. As shown below, list1 and list2 actually point to the same memory address, so once an element is changed through list2, list1 changes as well.

list1 = ['a','b','c']
list2 = list1
print(id(list1))
print(id(list2))
list2[1] = 'd'
print(list1)

140356459153200
140356459153200
['a', 'd', 'c']

If you want list1 and list2 to be independent of each other, you can copy the original object with a slice. The memory address of list2 is then indeed different: changing the value of an element in list2 does not change list1.

# Copy the list with a slice
list1 = ['a','b','c']
list2 = list1[:]
print(id(list1))
print(id(list2))
list2[1] = 'd'
print(list1)

140356987974432
140356459153040
['a', 'b', 'c']

But does this solve everything? If the elements of the list are themselves complex structures, such as lists, is a slice copy still safe?

list1 = [['a','b'],['c','d']]
list2 = list1[:]
print(id(list1))
print(id(list2))
list2[1][1] = 'x'
print(list1)

140356987975872
140356458496720
[['a', 'b'], ['c', 'x']]

With a nested list (a two-dimensional array), even if list2 is copied with a slice, modifying an inner element through list2 still changes list1. This is because each element of the outer list, such as list1[0], is itself a list object, and the outer list only stores a reference to it. A slice copies those references, not the inner lists. If you look at the memory addresses of list1[0] and list2[0], you will find that they are the same.

list1 = [['a','b'],['c','d']]
list2 = list1[:]
print(id(list1[0]))
print(id(list2[0]))

140356717561408
140356717561408

So when the elements of a list are themselves objects, modifying an element through a slice copy still changes the original list. The safe approach is to use a deep copy.

from copy import deepcopy
list1 = [['a','b'],['c','d']]
list2 = deepcopy(list1)
print('Memory address of list:')
print(id(list1))
print(id(list2))
print('Memory address of list[0]:')
print(id(list1[0]))
print(id(list2[0]))
list2[1][1] = 'x'
print(list1)

Memory address of list:
140356987985824
140356987984384
Memory address of list[0]:
140356459155120
140356451242944
[['a', 'b'], ['c', 'd']]
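
The same check can be written with the is operator, which compares object identity directly instead of printing addresses. The sketch below is only illustrative: it compares a slice with list.copy() and copy.copy() (both standard shallow-copy spellings that the article does not use) and with deepcopy. All variable names are hypothetical.

```python
import copy

nested = [['a', 'b'], ['c', 'd']]

by_slice = nested[:]             # shallow copy via slicing
by_method = nested.copy()        # shallow copy via list.copy()
by_module = copy.copy(nested)    # shallow copy via the copy module
by_deep = copy.deepcopy(nested)  # deep copy: the inner lists are copied as well

for name, candidate in [('slice', by_slice), ('list.copy', by_method),
                        ('copy.copy', by_module), ('deepcopy', by_deep)]:
    outer_is_new = candidate is not nested       # a new outer list was created
    inner_is_shared = candidate[0] is nested[0]  # inner list is the same object
    print(name, outer_is_new, inner_is_shared)

# slice True True
# list.copy True True
# copy.copy True True
# deepcopy True False
```

Only deepcopy produces inner lists that are independent of the original.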

3. List intersection, union, difference, symmetric difference set

This is also a fairly common problem: given two lists, find their intersection, union, difference, and symmetric difference. Several approaches are shown below and their performance is compared. The two lists are as follows:

list1 = ['hello','world','day','night','world']
list2 = ['day','hello','spring']

First is the intersection, that is, the elements that appear in both list1 and list2. Three ways to write it are shown below: the first two rely on set (recommended), and the last one traverses the lists. With the set approach, duplicate elements in the original lists are collapsed into one, and the original order of the elements is not preserved.

# Intersection
list3 = list(set(list1) & set(list2))
print(list3)

list4 = list(set(list1).intersection(set(list2)))
print(list4)

list5 = [x for x in list1 if x in list2]
print(list5)

['hello', 'day']
['hello', 'day']
['hello', 'day']

Find the union, that is, the elements that appear in list1 or list2.

# Union
list3 = list(set(list1) | set(list2))
print(list3)

list4 = list(set(list1).union(set(list2)))
print(list4)

list5 = list(set(list1 + list2))
print(list5)

['night', 'day', 'spring', 'hello', 'world']
['night', 'day', 'spring', 'hello', 'world']
['day', 'spring', 'night', 'hello', 'world']

Find the difference, that is, the elements that appear in list1 but not in list2.

# Difference
list3 = list(set(list1).difference(set(list2))) 
print(list3)

list4 = list(set(list1)-(set(list2))) 
print(list4)

# Keep duplicates and preserve order
list5 = [x for x in list1 if x not in list2]
print(list5)

['night', 'world']
['night', 'world']
['world', 'night', 'world']

Find the symmetric difference, that is, the elements that belong only to list1 or only to list2.

# Symmetric difference
list3 = list(set(list1).symmetric_difference(set(list2))) 
print(list3)

list4 = list(set(list1)^(set(list2))) 
print(list4)

# Keep duplicates and preserve order
list5 = [x for x in list1 if x not in list2] + [x for x in list2 if x not in list1]
print(list5)

['night', 'world', 'spring']
['night', 'world', 'spring']
['world', 'night', 'world', 'spring']

In terms of performance, because a set is backed by a hash table, it is much faster than processing with lists alone. The two set-based ways of writing the operation perform about the same. Here is a small experiment: list1 and list2 each hold 100,000 numbers, their intersection is computed with the different methods, and each run is timed with the %%time cell magic of IPython/Jupyter. At this scale the list-only method is far slower (minutes rather than milliseconds), so if the result does not need to keep duplicates or the original order, using set is the recommended approach.

import random

list1 = []
list2 = []
for i in range(100000):
    n = random.randint(0, 100000)
    list1.append(n)
    m = random.randint(5000, 105000)
    list2.append(m)

%%time
# Intersection 1
list3 = list(set(list1) & set(list2))
CPU times: user 26.4 ms, sys: 1.86 ms, total: 28.2 ms
Wall time: 27.6 ms

%%time
# Intersection 2
list4 = list(set(list1).intersection(set(list2)))
CPU times: user 33.5 ms, sys: 1.17 ms, total: 34.7 ms
Wall time: 34 ms

%%time
# Intersection 3
list5 = [x for x in list1 if x in list2]
CPU times: user 2min 20s, sys: 243 ms, total: 2min 20s
Wall time: 2min 20s
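
The list-only version is slow because x in list2 scans the whole list for every element of list1. If the order and duplicates of list1 have to be kept, one possible middle ground is to convert only list2 to a set and keep the comprehension. The sketch below reuses the small example lists from the start of this section; lookup, ordered and seen_once are illustrative names, and no timing is claimed for it here.

```python
list1 = ['hello', 'world', 'day', 'night', 'world']
list2 = ['day', 'hello', 'spring']

# Membership tests against a set are hash lookups instead of list scans.
lookup = set(list2)

# Intersection that keeps the order (and any duplicates) of list1.
ordered = [x for x in list1 if x in lookup]
print(ordered)    # ['hello', 'day']

# Difference without duplicates: dict.fromkeys drops repeats while keeping
# first-seen order (dicts preserve insertion order in Python 3.7+).
seen_once = list(dict.fromkeys(x for x in list1 if x not in lookup))
print(seen_once)  # ['world', 'night']
```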

4. List sorting operation

A list can be sorted with the built-in sorted function or with the list's own sort method. sorted returns a new sorted list and leaves the original unchanged; sort reorders the list itself and returns None.
sorted function

list1 = [5, 2, 3, 1, 4]
list2 = sorted(list1)
print(list2)

[1, 2, 3, 4, 5]

sort method

list1 = [5, 2, 3, 1, 4]
list1.sort()
print(list1)

[1, 2, 3, 4, 5]
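
The difference in return values is easy to overlook, so here is a small sketch that makes it visible. It also uses the reverse=True argument, which both sorted and sort accept but which the examples above do not show; the variable names are illustrative.

```python
numbers = [5, 2, 3, 1, 4]

result = sorted(numbers, reverse=True)  # new list, descending order
print(result)    # [5, 4, 3, 2, 1]
print(numbers)   # [5, 2, 3, 1, 4]  (the original list is unchanged)

returned = numbers.sort()  # sorts in place
print(returned)  # None
print(numbers)   # [1, 2, 3, 4, 5]
```

A frequent mistake is writing numbers = numbers.sort(), which rebinds the name to None.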

You can also pass a key parameter when sorting. The key function is called on each element, and the elements are compared by its return values, so a list of complex objects can be sorted by specifying a suitable key: for example, sorting by a certain component, or by the length of a certain component.

list1 = [[1,'c','hello'],[2,'a','morning'],[3,'a','cat']]
# Sort by one component of each element
list1.sort(key=lambda x:x[1])
print(list1)
[[2, 'a', 'morning'], [3, 'a', 'cat'], [1, 'c', 'hello']]

# Sort by the value of a function applied to one component
list1.sort(key=lambda x:len(x[2]))
print(list1)
[[3, 'a', 'cat'], [1, 'c', 'hello'], [2, 'a', 'morning']]

Note: The sort is stable. When two elements have the same key, the one that appears first in the original list also comes first in the result.
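
Stability makes it possible to sort in several passes, which is handy when the keys need different directions. The sketch below is one way to do this, using made-up data in the same shape as the lists above: it sorts by the secondary key first and by the primary key last.

```python
records = [[1, 'c', 'hello'], [2, 'a', 'morning'], [3, 'a', 'cat'], [4, 'c', 'dog']]

# Pass 1: secondary key, descending by the first component.
records.sort(key=lambda r: r[0], reverse=True)

# Pass 2: primary key, ascending by the second component.
# Stability keeps the order from pass 1 among rows with equal letters.
records.sort(key=lambda r: r[1])

print(records)
# [[3, 'a', 'cat'], [2, 'a', 'morning'], [4, 'c', 'dog'], [1, 'c', 'hello']]
```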

To get a component of an element or an attribute of an object, you can also use itemgetter and attrgetter from the operator module instead of writing a lambda. If the elements of the list can be indexed by a subscript (such as a nested list), you can use itemgetter to get the component. An example is as follows.

from operator import itemgetter
list1 = [[1,'c','hello'],[2,'a','morning'],[3,'b','cat']]
# For elements that support subscript indexing, e.g. sort by component 1
list1.sort(key=itemgetter(1))
print(list1)

[[2, 'a', 'morning'], [3, 'b', 'cat'], [1, 'c', 'hello']]

If the elements of the list are instances of a class, you can use attrgetter to fetch an attribute by name and sort by its value. For example, first create a list of Person objects:

class Person:
    def __init__(self, name, age, work):
        self.name = name
        self.age = age
        self.work = work
    def __repr__(self):
        return repr((self.name, self.age, self.work))
    
list1 = [Person('赵赵',45,'月亮中学'), Person('李李', 20, '宇宙电子厂'),Person('王王', 35, '宇宙电子厂')]    

Then sort by the age attribute of Person:

from operator import attrgetter
# Sort by an attribute of the objects
list2 = [Person('赵赵',45,'月亮中学'), Person('李李', 20, '宇宙电子厂'),Person('王王', 35, '宇宙电子厂')]   
list2.sort(key=attrgetter('age'))
print(list2)

[('李李', 20, '宇宙电子厂'), ('王王', 35, '宇宙电子厂'), ('赵赵', 45, '月亮中学')]

A further convenience of itemgetter and attrgetter is that they support multi-level sorting: you can pass in multiple keys, and the list is sorted by the first key, then by the second key where the first keys are equal.

# Sort by component 0 first, then by component 1
list1 = [[1,'c','hello'],[1, 'a','morning'],[3, 'b','cat']]
list1.sort(key=itemgetter(0,1))
print(list1)
[[1, 'a', 'morning'], [1, 'c', 'hello'], [3, 'b', 'cat']]

# Sort by work first, then by age
list2 = [Person('赵赵',45,'月亮中学'), Person('李李', 20, '宇宙电子厂'),Person('王王', 35, '宇宙电子厂')]   
list2.sort(key=attrgetter('work','age'))
print(list2)
[('李李', 20, '宇宙电子厂'), ('王王', 35, '宇宙电子厂'), ('赵赵', 45, '月亮中学')]

Summary

This article discussed four common list operations: 1. How to safely delete elements from a list; 2. When the elements of a list are complex objects, slices and indexes are not deep copies; 3. Using set to find the intersection, union, difference, and symmetric difference of two lists; 4. Several ways to sort a list.

My Python version

>>> import sys
>>> print(sys.version)
3.7.6 (default, Jan  8 2020, 13:42:34) 
[Clang 4.0.1 (tags/RELEASE_401/final)]

_流浪猫猫_

Personal subscription account: Python拾贝, updated from time to time