How to write high-performance code (1) Make good use of algorithms and data structures

For the same piece of logic, code written by different people can differ in performance by orders of magnitude. For the same piece of code, tweaking a few characters or reordering a single line can sometimes make it several times faster, and the same code can also show a several-fold performance difference when run on different processors. The tenfold programmer is not just a legend; there are probably some all around us. That tenfold difference shows up in every aspect of how a programmer works, and code performance is the most visible of them.

The "How to write high-performance code" series grew out of a talk I gave in my group. In this series I will try to help everyone write higher-performance code, drawing on my own experience. The original slides covered a lot of ground without much depth, so I have split them into five independent articles, each of which also extends and supplements the original slides. This article is the first one: make good use of algorithms and data structures.

Xunzi wrote in "An Exhortation to Learning": "The gentleman is not different from others by birth; he is simply good at making use of things." In other words, a gentleman's natural ability is no different from that of ordinary people; he is just good at using external tools. For programmers, data structures are probably the tools we use most often in daily coding. The standard libraries of modern languages already provide most data structures, so we rarely need to implement them ourselves. But using a data structure incorrectly can cause a significant drop in code performance.

Here I give three examples of performance loss in Java due to not considering the underlying implementation.

The code in the first example, which fills an ArrayList created without an initial capacity, has no functional problem. However, a Java ArrayList expands its backing array whenever its capacity runs out during add, and each expansion consumes extra CPU time. After I set the initial capacity of the ArrayList to 100, a JMH benchmark showed a 33% performance improvement.
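The original code listing is not reproduced here, so the following is only a minimal sketch of the two variants being compared, assuming the example simply appended elements in a loop. The class and method names are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; a sketch of the comparison, not the original benchmark code.
class ArrayListCapacitySketch {

    // Default constructor: the backing array has to grow several times
    // (allocate a larger array, copy the old contents) on the way to n elements.
    static List<Integer> fillDefault(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Pre-sized constructor: the backing array is allocated once with room
    // for n elements, so add() never needs to grow it.
    static List<Integer> fillPresized(int n) {
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        // Both lists hold the same elements; only the allocation pattern differs.
        System.out.println(fillDefault(100).equals(fillPresized(100)));
    }
}
```

The two methods return equal lists, so switching to the pre-sized constructor changes performance only, not behavior. How large the gain is depends on the element count and the JVM, so it is worth measuring in your own benchmark.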

In Java, many containers grow dynamically, and growing involves copying memory, which is expensive. So if the size of the data can be predicted, give the container an initial capacity when you create it. Many companies' coding guidelines now state this explicitly. The following figure is from the Alibaba Java Development Manual.
Figure: Alibaba Java Development Manual
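The same idea applies to hash-based containers. A HashMap resizes once its size exceeds capacity times load factor (0.75 by default), so passing the expected element count directly as the capacity is not quite enough. The sketch below uses a common rule of thumb, expected size divided by the load factor plus one; the class and method names are hypothetical, and the formula is an assumption of this sketch rather than something taken from the figure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; a sketch of pre-sizing a HashMap under the default load factor.
class HashMapCapacitySketch {

    // HashMap's documented default load factor.
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Rule-of-thumb capacity so that expectedSize entries fit without a resize.
    static int capacityFor(int expectedSize) {
        return (int) (expectedSize / DEFAULT_LOAD_FACTOR) + 1;
    }

    // Builds a map of expectedSize entries in a table that is sized up front.
    static Map<Integer, String> build(int expectedSize) {
        Map<Integer, String> map = new HashMap<>(capacityFor(expectedSize));
        for (int i = 0; i < expectedSize; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(100));
        System.out.println(build(100).size());
    }
}
```

HashMap rounds the requested capacity up to a power of two internally, so the value passed in is a lower bound rather than the exact table size.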

Let's look at a performance problem caused by the wrong use of LinkedList.

    // get(int index) in the JDK's LinkedList
    public E get(int index) {
        checkElementIndex(index);
        return node(index).item;
    }
    Node<E> node(int index) {
        // assert isElementIndex(index);

        if (index < (size >> 1)) {
            Node<E> x = first;
            // This loop walks the list from front to back
            for (int i = 0; i < index; i++)
                x = x.next;
            return x;
        } else {
            Node<E> x = last;
            for (int i = size - 1; i > index; i--)
                x = x.prev;
            return x;
        }
    }

LinkedList is not affected by dynamic expansion, but it is backed by a linked list, and the biggest weakness of a linked list is that it does not support random access. As the JDK source above shows, get(int index) in LinkedList has to walk the list node by node, so its time complexity is O(n), whereas ArrayList is backed by an array and its get is O(1). In my example, replacing LinkedList with ArrayList gave a performance improvement of more than ten times in the benchmark.
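As a minimal sketch of why this matters, assume the slow code read the list by index inside a loop (the original listing is not reproduced here, so this is an assumption). Each get(i) on a LinkedList walks the list, which makes the whole loop O(n^2), while an iterator-based loop stays O(n) for any List. The class and method names below are hypothetical.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical names; a sketch of index-based versus iterator-based traversal.
class LinkedListTraversalSketch {

    // Index-based loop: on a LinkedList every get(i) walks up to half the
    // list, so the loop as a whole is O(n^2). On an ArrayList it is O(n).
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Iterator-based loop: each step follows a single link (or array slot),
    // so the loop is O(n) for both LinkedList and ArrayList.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        // Same result either way; only the cost of getting there differs.
        System.out.println(sumByIndex(list));
        System.out.println(sumByIterator(list));
    }
}
```

If the code really needs access by index, switching to ArrayList as described above is the simpler fix; if it only needs to visit every element, the for-each form works well with either implementation.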

In Java, both Set and List provide a contains() method, which checks whether a given element exists in the collection, but the implementations are completely different. In HashSet, contains looks the element up directly in the hash table, so its time complexity is O(1) on average. In ArrayList and LinkedList, it has to traverse the elements one by one until it finds a match, so its time complexity is O(n). I will not reproduce the JDK code here; you can check it yourself.

In my tests, the performance gap between Set.contains and List.contains was very clear. I benchmarked with JMH and found that with 100 elements, HashSet.contains was 10 times as fast as ArrayList.contains and 20 times as fast as LinkedList.contains, and the gap widens as the amount of data grows.
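A minimal sketch of the pattern, assuming the typical case where many membership checks are made against the same collection: copy the elements into a HashSet once, then query the set. The class and method names are hypothetical, and this is not the benchmark code behind the numbers above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical names; a sketch of List.contains versus HashSet.contains.
class ContainsSketch {

    // Each contains() call scans the list linearly, so the total cost is
    // O(items.size() * queries.size()).
    static int countHitsWithList(List<Integer> items, List<Integer> queries) {
        int hits = 0;
        for (Integer query : queries) {
            if (items.contains(query)) {
                hits++;
            }
        }
        return hits;
    }

    // The items are copied into a HashSet once; each contains() call is then
    // a hash lookup, O(1) on average.
    static int countHitsWithSet(List<Integer> items, List<Integer> queries) {
        Set<Integer> lookup = new HashSet<>(items);
        int hits = 0;
        for (Integer query : queries) {
            if (lookup.contains(query)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        List<Integer> queries = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            items.add(i);        // 0..99
            queries.add(i + 50); // 50..149
        }
        System.out.println(countHitsWithList(items, queries));
        System.out.println(countHitsWithSet(items, queries));
    }
}
```

Building the set costs one pass over the data, so the conversion tends to pay off when the collection is queried repeatedly; for a single lookup, scanning the list directly is usually fine.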

These three mistakes come up often in everyday code. If you read through your project's code right now, you could probably find improper uses of List and Set without much effort. You may argue that these JDK APIs are already extremely fast, and that this code is only a very small part of the business logic, so misusing it may cost only one percent or even one thousandth of overall performance. But as the saying goes, without accumulating small steps you cannot travel a thousand miles, and without accumulating small streams you cannot form a river.

The following figure shows the time and space complexity of various operations of various common data structures for your reference:
Figure: time and space complexity of common data structure operations

Algorithms and data structures are a programmer's foundation. Although we rarely implement a specific algorithm or data structure ourselves, we use packaged ones all the time. We should be familiar with the common algorithms and data structures, including their time complexity, space complexity, and scope of application.


xindoo
