基本概念
优先队列(priority queue)是一种特殊的队列,取出元素的顺序是按照元素的优先权(关键字)大小,而不是进入队列的顺序,堆就是一种优先队列的实现。堆一般是由数组实现的,逻辑上堆可以被看做一个完全二叉树(除底层元素外是完全充满的,且底层元素是从左到右排列的)。
堆分为最大堆和最小堆,最大堆是指每个根结点的值大于左右孩子的节点值,最小堆则是根结点的值小于左右孩子的值。
实现
Python中堆的是以小根堆的方式实现,本文主要介绍Python源码中入堆和出堆方法的实现:
"""
0
1 2
3 4 5 6
7 8 9 10 11 12 13 14
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
"""
# push 将newitem放入底部,从底部开始把父结点往下替换
def heappush(heap, item):
"""Push item onto heap, maintaining the heap invariant."""
heap.append(item)
_siftdown(heap, 0, len(heap)-1)
def _siftdown(heap, startpos, pos):
newitem = heap[pos]
# Follow the path to the root, moving parents down until finding a place
# newitem fits.
while pos > startpos:
parentpos = (pos - 1) >> 1
parent = heap[parentpos]
if newitem < parent:
heap[pos] = parent
pos = parentpos
continue
break
heap[pos] = newitem
# pop 顶部元素被弹出,把最后一个元素放到顶部,在把子节点(小于自己的)往上替换
def heappop(heap):
"""Pop the smallest item off the heap, maintaining the heap invariant."""
lastelt = heap.pop() # pop tail in list
# raises appropriate IndexError if heap is empty
if heap:
returnitem = heap[0]
heap[0] = lastelt
_siftup(heap, 0)
return returnitem
return lastelt
def _siftup(heap, pos):
endpos = len(heap)
startpos = pos
newitem = heap[pos]
# Bubble up the smaller child until hitting a leaf.
childpos = 2*pos + 1 # leftmost child position
while childpos < endpos:
# Set childpos to index of smaller child.
rightpos = childpos + 1
if rightpos < endpos and not heap[childpos] < heap[rightpos]:
childpos = rightpos
# Move the smaller child up.
heap[pos] = heap[childpos]
pos = childpos
childpos = 2*pos + 1
# The leaf at pos is empty now. Put newitem there, and bubble it up
# to its final resting place (by sifting its parents down).
heap[pos] = newitem
_siftdown(heap, startpos, pos)
对于_siftup方法实现的解释
可以看到python中_siftup方法的实现是和教科书以及直觉不太吻合的:
- siftup只关心哪个子节点小,就把它往上移(并不关心子节点是否大于自己)
- 最后当节点到达叶子节点后在通过siftDown把大于自己的父结点往下移
如下是示意图,按理来说每次_siftup的时候直接和子节点比较似乎更高效。
"""
3
4 6
9 15 [8]
[8]
4 6
9 15
4
[8] 6
9 15
4
9 6
[8] 15
_siftdown
4
[8] 6
9 15
"""
接下来,先贴上源码中的解释
# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too. We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but turns out that's not a good idea, and despite that
# many books write the algorithm that way. During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early). See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom comparison methods, or in array elements
# storing (priority, record) tuples. Comparisons are thus potentially
# expensive.
从这里的描述,我们可以这么理解,由于位于 pos 的 ball 是从 end 处移动到此的,它的所有 child indices 都是 heap,通过 _siftup bubbling smaller child up till hitting a leaf, then use _siftdown to move the ball to place. 这个过程会发现和教科书中讲的不一样,一般情况下 _siftup 过程中需要比较当前 ball 与 both childs,这样可以尽快从 loop 中 break,但是事实上,ball 的值一般来说会很大(因为在heap中被排在的end),那么把它从 root 处开始和 child 开始比较通常没有意义,也就是由于可能会一直 compare 直到比较底部甚至可能到叶子处,这样来看通常并不会让我们 break loop early(作者提到在The Art of Computer Programming volume3 中有具体的阐述和比较)。此外,由于comparison methods实现方法的不同(比如在tuples,或者自定义类中),comparisons 有可能 more expensive,所以python在对heap的实现上是采用 siftup till leaf then siftdown to right place 的方法。
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。