The fastest sorting algorithm in the world - Timsort

Background

Timsort is a hybrid, stable sorting algorithm: in short, a combination of merge sort and binary insertion sort. It is often described as one of the fastest sorting algorithms in practice. Timsort has been Python's standard sorting algorithm since version 2.3. Java adopted it in Java SE 7, and as the Arrays.sort source below shows, it is now the default algorithm for sorting arrays of non-primitive types. Whether for advanced study or interview preparation, Timsort is worth understanding.

 // List sort()    
default void sort(Comparator<? super E> c) {
        Object[] a = this.toArray();
        // sort the array copy
        Arrays.sort(a, (Comparator) c);
              ...
    }

// Arrays.sort
    public static <T> void sort(T[] a, Comparator<? super T> c) {
        if (c == null) {
            sort(a);
        } else {
            // legacy merge sort (deprecated), used only when explicitly requested
            if (LegacyMergeSort.userRequested)
                legacyMergeSort(a, c);
            else
                TimSort.sort(a, 0, a.length, c, null, 0, 0);
        }
    }

    public static void sort(Object[] a) {
        if (LegacyMergeSort.userRequested)
            legacyMergeSort(a);
        else
            ComparableTimSort.sort(a, 0, a.length, null, 0, 0);
    }

Prerequisites

Understanding Timsort requires a few building blocks, which are reviewed below.

Exponential search

Exponential search, also known as doubling search, is an algorithm designed for finding elements in large sorted arrays. It is a two-step process. First, the algorithm finds a range (L, R) that must contain the target by doubling an index until it passes the target. Then it runs a binary search within that range to find the exact position. The time complexity is $O(\log i)$, where $i$ is the position of the target, so it is never worse than $O(\log n)$ and is especially fast when the target lies near the start of the search.
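The two steps can be sketched as follows. This is a minimal illustration on a sorted int[]; the class and method names are made up for this example, and the binary search step simply delegates to Arrays.binarySearch.

```java
import java.util.Arrays;

final class ExponentialSearchSketch {
    /** Returns an index of key in the sorted array a, or -1 if key is absent. */
    static int exponentialSearch(int[] a, int key) {
        if (a.length == 0) {
            return -1;
        }
        // Step 1: double the bound until a[bound] >= key or the array ends.
        // (Overflow of bound for arrays longer than 2^30 is ignored in this sketch.)
        int bound = 1;
        while (bound < a.length && a[bound] < key) {
            bound <<= 1;
        }
        // Step 2: the key, if present, lies in [bound / 2, min(bound, a.length - 1)].
        int from = bound >> 1;
        int to = Math.min(bound + 1, a.length); // exclusive upper end
        int idx = Arrays.binarySearch(a, from, to, key);
        return idx >= 0 ? idx : -1;
    }
}
```

For a target near the front of the array, the doubling loop stops after a few probes, so the binary search only has to look at a short range.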

Binary Insertion Sort

Insertion sort is simple: starting with the second element, each element is moved toward the front, shifting larger elements one position back, until it reaches its correct position in the sorted prefix.

Insertion sort takes $O(n)$ time in the best case (already sorted input). Binary search can be used to find each insertion point, which reduces the comparisons per insertion to $O(\log n)$ and the total number of comparisons to $O(n \log n)$. Note, however, that binary insertion sort still has to move the same number of elements, so the worst case remains $O(n^2)$; the gain is that shifting a block with an array copy is cheaper than swapping elements one by one.

Features: binary insertion sort is very efficient on small data sets.

 // Both methods assume: import java.util.Arrays;
 public static int[] sort(int[] arr) {
        // Walk through every element after the first
        for (int i = 1; i < arr.length; i++) {
            // Element to insert
            int tmp = arr[i];
            // Scan back from the end of the sorted prefix, shifting larger elements one slot right
            int j = i;
            while (j > 0 && tmp < arr[j - 1]) {
                arr[j] = arr[j - 1];
                j--;
            }
            // Insert the element
            if (j != i) {
                arr[j] = tmp;
            }
        }
        return arr;
    }

    public static int[] binarySort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            // Element to insert
            int tmp = arr[i];
            // Binary search for the insertion point in the sorted prefix arr[0, i).
            // A hit at index p gives p + 1; a miss gives the insertion point.
            int j = Math.abs(Arrays.binarySearch(arr, 0, i, tmp) + 1);
            // Shift arr[j, i) one slot right with a single array copy to make room
            System.arraycopy(arr, j, arr, j + 1, i - j);
            // Insert the element
            arr[j] = tmp;
        }
        return arr;
    }

Merge sort

Merge sort is a divide-and-conquer algorithm built on two operations: split and merge. The array is recursively divided in half until it can no longer be divided (that is, a piece is empty or has only one element), and the pieces are then merged back together. The merge operation repeatedly takes two smaller sorted arrays and combines them into one larger sorted array.

Features: merge sort is designed mainly for large data sets.

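 public static void mergeSortRecursive(int[] arr, int[] result, int start, int end) {
        // Base case: stop recursing
        if (start >= end) {
            return;
        }
        // Length of the range to split
        int len = end - start;
        int mid = (len >> 1) + start;
        int left = start; // start index of the left subarray
        int right = mid + 1; // start index of the right subarray
        // Recursively split the left subarray until one element remains
        mergeSortRecursive(arr, result, left, mid);
        // Recursively split the right subarray until one element remains
        mergeSortRecursive(arr, result, right, end);
        // Merge the two sorted halves into result; <= takes the left element on ties, which keeps the merge stable
        int k = start;
        while (left <= mid && right <= end) {
            result[k++] = arr[left] <= arr[right] ? arr[left++] : arr[right++];
        }
        // Copy whatever remains of the left half
        while (left <= mid) {
            result[k++] = arr[left++];
        }
        // Copy whatever remains of the right half
        while (right <= end) {
            result[k++] = arr[right++];
        }
        // Copy the merged range back into the original array
        for (k = start; k <= end; k++) {
            arr[k] = result[k];
        }
    }

    public static int[] merge_sort(int[] arr) {
        int len = arr.length;
        int[] result = new int[len];
        mergeSortRecursive(arr, result, 0, len - 1);
        return arr;
    }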

Timsort execution process

In outline: if the array is shorter than a specified threshold (MIN_MERGE), it is sorted directly with binary insertion sort. Otherwise the following steps are performed:

Starting from the left of the array, find the next run (a subsequence that is already ordered).
Push this run onto the run stack, where it waits to be merged.
Check the runs on the stack, and merge if the merge condition is met.
Go back to the first step to find the next run.
When the array is exhausted, merge all runs remaining on the stack.

Ascending runs

Finding a run means locating the longest consecutive ascending or strictly descending subsequence that starts at the current position. A descending run is reversed in place, so every run ends up in ascending order. Each such subsequence is called a run. For example, the array [2,3,6,4,9,30] contains two runs: [2,3,6] and [4,9,30].
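Run detection can be sketched as below. The method is modelled on the JDK's countRunAndMakeAscending, adapted to int[] for illustration; the enclosing class name is made up, and the caller is assumed to pass lo < hi.

```java
final class RunDetectionSketch {
    /**
     * Returns the length of the run that starts at lo within a[lo, hi).
     * A strictly descending run is reversed in place so the run is ascending on return.
     */
    static int countRunAndMakeAscending(int[] a, int lo, int hi) {
        int runHi = lo + 1;
        if (runHi == hi) {
            return 1; // a single element is a run of length 1
        }
        if (a[runHi++] < a[lo]) {
            // Strictly descending: equal elements end the run, which keeps the reversal stable
            while (runHi < hi && a[runHi] < a[runHi - 1]) {
                runHi++;
            }
            reverseRange(a, lo, runHi);
        } else {
            // Ascending (non-decreasing)
            while (runHi < hi && a[runHi] >= a[runHi - 1]) {
                runHi++;
            }
        }
        return runHi - lo;
    }

    /** Reverses a[lo, hi) in place. */
    static void reverseRange(int[] a, int lo, int hi) {
        hi--;
        while (lo < hi) {
            int t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }
}
```

Called on [2,3,6,4,9,30] at index 0 and then at index 3, it reports two runs of length 3 each.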

Several key thresholds

MIN_MERGE

This constant is the minimum array length for which merging is performed. If the whole array is shorter than MIN_MERGE, the full machinery is unnecessary and binary insertion sort alone is used. The value is 64 in Tim Peters's C implementation, but 32 was found empirically to work better in the Java implementation, so it is 32 in Java.

To preserve stability, only strictly descending runs are reversed, so equal elements are never moved past each other.

minrun

When merging runs, efficiency is highest if the number of runs is equal to or slightly less than a power of 2, and drops noticeably if it is slightly greater than a power of 2. To keep merges balanced, the length of each run is therefore controlled by a minimum run length, minrun. If a natural run is shorter than minrun, binary insertion sort is used to insert the elements that follow the run into it until it reaches minrun.
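Extending a short run could look like the sketch below: a binary insertion sort over a[lo, hi) that is told the prefix a[lo, start) is already sorted. The class name and the hand-written binary search are illustrative choices, not taken from the article.

```java
final class RunExtensionSketch {
    /**
     * Sorts a[lo, hi) with binary insertion sort, given that a[lo, start) is already sorted.
     * Equal elements are inserted after existing ones, so the sort is stable.
     */
    static void binarySort(int[] a, int lo, int hi, int start) {
        if (start == lo) {
            start++;
        }
        for (; start < hi; start++) {
            int pivot = a[start];
            // Binary search for the first index in [lo, start) whose element is greater than pivot
            int left = lo;
            int right = start;
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (pivot < a[mid]) {
                    right = mid;
                } else {
                    left = mid + 1;
                }
            }
            // Shift a[left, start) one slot right and drop the pivot into the gap
            System.arraycopy(a, left, a, left + 1, start - left);
            a[left] = pivot;
        }
    }
}
```

For example, with the array [2,3,6,4,9,1,8], a natural run of length 3 and an assumed minrun of 5, calling binarySort(a, 0, 5, 3) turns the first five elements into [2,3,4,6,9] and leaves the rest untouched.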

The minrun value is computed before sorting starts. It depends only on the length of the array and is chosen so that the array length divided by minrun is equal to or slightly less than a power of 2. In the C implementation, where the threshold is 64, minrun lies between 32 and 64: for example, a length of 65 gives a minrun of 33, and a length of 165 gives 42. In Java, where MIN_MERGE is 32, minrun lies between 16 and 32, and the same lengths give 17 and 21.

In the Java implementation, if the data length n is less than MIN_MERGE, n itself is returned. If n is an exact power of 2, MIN_MERGE/2 (that is, 16) is returned. Otherwise the result is a value k with MIN_MERGE/2 <= k <= MIN_MERGE such that n/k is close to, but strictly less than, a power of 2.

 private static int minRunLength(int n) {
        assert n >= 0;
        int r = 0;      // becomes 1 if any 1 bit is shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }
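To check the arithmetic by hand, the same loop can be run with the threshold passed in as a parameter. The minMerge parameter and the class name are assumptions made for this illustration only; the JDK method uses the MIN_MERGE constant.

```java
final class MinRunDemo {
    /** Same loop as minRunLength, with the threshold as a parameter (illustration only). */
    static int minRunLength(int n, int minMerge) {
        int r = 0; // becomes 1 if any 1 bit is shifted off
        while (n >= minMerge) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        for (int n : new int[] {20, 65, 165, 1024}) {
            System.out.println(n + ": threshold 32 -> " + minRunLength(n, 32)
                    + ", threshold 64 -> " + minRunLength(n, 64));
        }
    }
}
```

With a threshold of 32, lengths 65 and 165 give 17 and 21; with a threshold of 64 they give 33 and 42. A length of 1024, an exact power of 2, gives 16 with a threshold of 32.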

MIN_GALLOP

MIN_GALLOP is a threshold used to optimize merging: it controls when the merge switches into galloping mode, which is discussed later.

The following flowchart shows how Timsort executes:

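 graph TB
A[Start sorting]
    A --> B{Array length vs. MIN_MERGE}
    B -->|length >= MIN_MERGE| C[Merge-based sort]
    B -->|length < MIN_MERGE| D[Binary insertion sort]
    D --> E[Find the initial run]
    E --> F[Perform the sort]
    C --> G11[Compute minRun]
    G11 --> G1[Find the next run]
    G1 --> G2[Obtain a run]
    G1 -->|runLen < minRun| G22[Extend the run]
    G22 -.-> id1
    
    G22 --> G2
    G2 -->|push|Z

       Z --> |check runs|Z1{Merge condition}
       Z1 -->|met| Z2[Perform merge]
       Z2 -->|no longer met| G3
       Z1 -->|not met| G3
    G3[Prepare the next run]
    G3 -->|loop|G1
    
    id1[\Use binary insertion sort to insert the elements after the run into the run\]
    Z[(Run stack)]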

Merging runs

When the runs on the stack meet the merge condition, two adjacent runs on the stack are merged.

Merge conditions

To keep merges balanced (so that the runs being merged are as close in size as possible), Timsort defines a merge rule. Let X, Y and Z be the lengths of the three runs at the top of the stack, with X on top. The following two rules must hold at all times:

$$ Z > Y + X \\ Y > X $$

As soon as one of the conditions is violated, Y is merged with the smaller of X and Z to form a new run, and the top of the stack is checked again. Merging continues until the three runs at the top of the stack satisfy both conditions; if only two runs are left, only Y > X needs to hold.

An example is shown below.

When Z <= Y + X, X and Y are merged, leaving only two runs.
When Y <= X is then detected, another merge is performed. Only one run is left, and the merge check exits.

[Figure: representation of the run stack used for merging in Timsort]

Let's take a look at the merge implementation in Java

 private void mergeCollapse() {

        // Check for merges while at least two runs are on the stack
        while (stackSize > 1) {

            // Index of Y
            int n = stackSize - 2;

            // Z <= Y + X
            if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) {

                // If Z < X merge Z+Y, otherwise merge X+Y
                if (runLen[n - 1] < runLen[n + 1])
                    n--;

                // Merge two adjacent runs: runLen[n] and runLen[n+1]
                mergeAt(n);
            } else if (runLen[n] <= runLen[n + 1]) {

                // Y <= X: merge Y+X
                mergeAt(n);
            } else {

                // Both conditions hold: leave the loop
                break;
            }
        }
    }
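To watch the rule at work without sorting any data, the same checks can be applied to a list of run lengths, where merging two runs just adds their lengths. This is a sketch for illustration; the class and the pushAll helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RunStackDemo {
    /** Pushes each run length and restores the stack invariants after every push. */
    static List<Integer> pushAll(int... runLens) {
        List<Integer> stack = new ArrayList<>();
        for (int len : runLens) {
            stack.add(len);
            collapse(stack);
        }
        return stack;
    }

    /** Same decisions as mergeCollapse, applied to run lengths only. */
    static void collapse(List<Integer> stack) {
        while (stack.size() > 1) {
            int n = stack.size() - 2; // index of Y
            if (n > 0 && stack.get(n - 1) <= stack.get(n) + stack.get(n + 1)) {
                if (stack.get(n - 1) < stack.get(n + 1)) {
                    n--; // Z < X: merge Z+Y instead of Y+X
                }
                mergeAt(stack, n);
            } else if (stack.get(n) <= stack.get(n + 1)) {
                mergeAt(stack, n); // Y <= X: merge Y+X
            } else {
                break; // both invariants hold
            }
        }
    }

    /** Merges the runs at positions i and i + 1 by adding their lengths. */
    static void mergeAt(List<Integer> stack, int i) {
        stack.set(i, stack.get(i) + stack.get(i + 1));
        stack.remove(i + 1); // int argument, so this removes by index
    }
}
```

Pushing run lengths 128, 64, 32 leaves the stack unchanged because both rules hold. Pushing 100, 20, 15, 30 triggers one merge of Z+Y (20 and 15) and ends with [100, 35, 30].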

Memory overhead of merging

Classic merge sort needs $O(n)$ extra space, proportional to the data size. A fully in-place merge is possible but slow, so Timsort takes a middle ground: its merge uses temporary space that is usually smaller than $O(n)$, and at most half the array in the worst case.

The optimization minimizes data movement and temporary memory: first determine which elements actually need to move, then copy the shorter of the two remaining sequences into temporary memory, and merge back into the original array in final order.

For example, to merge X [1, 2, 3, 6, 10] and Y [4, 5, 7, 9, 12, 14, 17]: the largest element of X is 10, and a binary search shows that it belongs at the 5th position of Y; the smallest element of Y is 4, which belongs at the 4th position of X. So [1, 2, 3] and [12, 14, 17] are already in their final places and need not move. Only [6, 10] and [4, 5, 7, 9] have to be merged, which requires a temporary buffer of just size 2.
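The trimming step in this example can be sketched with two plain binary searches. The class and method names are made up for illustration, and both runs are assumed to be non-empty; the JDK uses its galloping helpers for the same purpose.

```java
final class MergeTrimSketch {
    /** Number of elements in sorted a that are <= key. */
    static int upperBound(int[] a, int key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Number of elements in sorted a that are < key. */
    static int lowerBound(int[] a, int key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /**
     * Returns {skipX, keepY}: how many leading elements of x are already in place,
     * and how many leading elements of y take part in the merge.
     */
    static int[] trim(int[] x, int[] y) {
        int skipX = upperBound(x, y[0]);
        if (skipX == x.length) {
            return new int[] {skipX, 0}; // x entirely precedes y: nothing to merge
        }
        int keepY = lowerBound(y, x[x.length - 1]);
        return new int[] {skipX, keepY};
    }

    /** Size of the temporary buffer: the shorter of the two trimmed runs. */
    static int tempSize(int[] x, int[] y) {
        int[] t = trim(x, y);
        return Math.min(x.length - t[0], t[1]);
    }
}
```

For the X and Y above, trim returns {3, 4}: three elements of X are skipped, four elements of Y remain, and the temporary buffer needs room for only 2 elements.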

Merge optimization

Merging two arrays in classic merge sort compares elements one pair at a time. To speed this up, Timsort sets a threshold, MIN_GALLOP. While merging runs A and B, if MIN_GALLOP consecutive elements are taken from the same run (for example, elements of A keep turning out smaller than the current element of B), the merge enters galloping mode.

Benchmarks showed that it pays to switch once 7 consecutive elements have come from the same run, so the initial value is 7.

In galloping mode, the search becomes an exponential search with two steps. For example, to find the position in B of an element x from A:

First find an index interval in B, $(2^{k} - 1, 2^{k+1} - 1)$, that contains x;
Then find the exact position with a binary search within the interval found in the first step.

Galloping is only beneficial if the initial element of one run is not one of the first seven elements of the other run. This is why the initial threshold is 7.
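The switch into galloping mode can be sketched as follows. This is a deliberately simplified merge into a fresh output array: the real implementation merges in place with a temporary buffer and also adjusts the threshold while it runs. The class and method names are made up for this example.

```java
final class GallopMergeSketch {
    static final int MIN_GALLOP = 7;

    /**
     * Counts the leading elements of src[from..] that are <= key (inclusive) or < key
     * (exclusive), probing offsets 0, 1, 3, 7, ... and then binary searching.
     */
    static int gallop(int[] src, int from, int key, boolean inclusive) {
        int n = src.length - from;
        int lastOfs = -1; // largest probed offset known to qualify
        int ofs = 0;
        while (ofs < n && qualifies(src[from + ofs], key, inclusive)) {
            lastOfs = ofs;
            ofs = (ofs << 1) + 1;
        }
        int lo = lastOfs + 1;
        int hi = Math.min(ofs, n);
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (qualifies(src[from + mid], key, inclusive)) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    static boolean qualifies(int value, int key, boolean inclusive) {
        return inclusive ? value <= key : value < key;
    }

    /** Stable merge of two sorted arrays that gallops after MIN_GALLOP wins in a row. */
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0, winsA = 0, winsB = 0;
        while (i < a.length && j < b.length) {
            if (a[i] <= b[j]) { out[k++] = a[i++]; winsA++; winsB = 0; }
            else { out[k++] = b[j++]; winsB++; winsA = 0; }
            if (i == a.length || j == b.length) break;
            if (winsA >= MIN_GALLOP) {
                // Bulk-copy every remaining element of a that still precedes b[j]
                int cnt = gallop(a, i, b[j], true);
                System.arraycopy(a, i, out, k, cnt);
                i += cnt; k += cnt; winsA = 0;
            } else if (winsB >= MIN_GALLOP) {
                // Bulk-copy every remaining element of b that strictly precedes a[i]
                int cnt = gallop(b, j, a[i], false);
                System.arraycopy(b, j, out, k, cnt);
                j += cnt; k += cnt; winsB = 0;
            }
        }
        // At most one of the two runs still has elements left
        System.arraycopy(a, i, out, k, a.length - i);
        k += a.length - i;
        System.arraycopy(b, j, out, k, b.length - j);
        return out;
    }
}
```

When one run wins many times in a row, the galloping branch replaces a long series of single comparisons with a handful of probes and one block copy.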


编程码农
