How to write high-performance code (2) Skillfully use data features

Introduction

Different people implementing the same logic can produce code whose performance differs by orders of magnitude. Within a single piece of code, changing a few characters or reordering one line can make it several times faster, and the same code can also show a several-fold performance difference on different processors. The 10x programmer is not just a legend; there are probably some all around us. That tenfold difference shows in every aspect of how a programmer works, and code performance is the most visible of them.

This article is the second in the "How to Write High-Performance Code" series. It shows how to take advantage of several characteristics of data to improve code performance.

Reusability

Most of the data we use in code can be reused, and reusable data should not be fetched or initialized repeatedly. For example:
[Figure: code screenshot in which getSomeThing() is called inside a for loop]
In the figure above, getSomeThing() is called inside the for loop even though it does not depend on the loop at all. It can be moved outside the loop and its result reused. Left inside the loop, it is called 99 times for nothing; if getSomeThing() is very time-consuming or CPU-intensive, that cost is multiplied nearly a hundredfold.
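Since the original screenshot is not available, here is a minimal sketch of the pattern described above. The body of getSomeThing(), the call counter, and the class name are assumptions made for illustration; only the function name comes from the article.

```java
import java.util.ArrayList;
import java.util.List;

class LoopHoistingExample {
    // Counts how often the expensive call runs (illustration only).
    static int calls = 0;

    // Hypothetical stand-in for the article's getSomeThing();
    // assumed to return the same result regardless of the loop index.
    static String getSomeThing() {
        calls++;
        return "config";
    }

    // Before: the loop-invariant call is repeated on every iteration.
    static List<String> insideLoop(int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String sth = getSomeThing();
            out.add(sth + "-" + i);
        }
        return out;
    }

    // After: the call is hoisted out of the loop and its result reused.
    static List<String> hoisted(int n) {
        List<String> out = new ArrayList<>();
        String sth = getSomeThing();
        for (int i = 0; i < n; i++) {
            out.add(sth + "-" + i);
        }
        return out;
    }

    public static void main(String[] args) {
        calls = 0;
        insideLoop(100);
        System.out.println("inside loop: " + calls + " calls");
        calls = 0;
        hoisted(100);
        System.out.println("hoisted: " + calls + " calls");
    }
}
```

Both methods return the same list; only the number of getSomeThing() calls differs. Hoisting is only safe when the call really is independent of the loop and has no side effects the loop relies on.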
[Figure: code screenshot of an enum class whose getList() creates a new List on every call]
In Java we often use enum classes, and many of them expose a method that returns all of the enum's values. Most people would write it like getList() above. The code works, but every call creates a new List, and at a high call frequency that has a noticeable impact on performance. A better approach is to statically initialize an immutable list once and then reuse it directly.
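The screenshot is not available, so the following is only a sketch of the two variants under assumed names: the enum OrderStatus, its constants, and getListEachTime() are hypothetical, and only getList() is named in the article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

enum OrderStatus {
    CREATED, PAID, SHIPPED;

    // Built once when the enum class is initialized, then shared by all callers.
    private static final List<OrderStatus> ALL =
            Collections.unmodifiableList(Arrays.asList(values()));

    // Before: allocates and fills a new list on every call.
    static List<OrderStatus> getListEachTime() {
        return new ArrayList<>(Arrays.asList(values()));
    }

    // After: returns the same immutable list every time.
    static List<OrderStatus> getList() {
        return ALL;
    }
}
```

Note that values() itself copies the internal array on each call, so caching the list also avoids that allocation. On Java 9 and later, List.of(values()) would be another way to build the immutable list.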

Reminder: I deliberately said immutable. When reusing an object, pay extra attention to whether any code modifies its contents. If a caller needs to change the object, it must not change the shared instance; it can make a deep copy and change the copy instead. And if the object is inherently meant to be modified by each caller, it is not a candidate for reuse.
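As a rough illustration of this reminder, the sketch below keeps a shared list read-only and copies it before making changes. The class name, the list contents, and both helper methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SharedListExample {
    // Shared, read-only list reused by every caller (hypothetical data).
    static final List<String> SHARED =
            Collections.unmodifiableList(Arrays.asList("a", "b", "c"));

    // A caller that needs a modified version copies first, leaving SHARED intact.
    static List<String> withExtra(String extra) {
        List<String> copy = new ArrayList<>(SHARED);
        copy.add(extra);
        return copy;
    }

    // Returns true if an in-place change to the shared list is rejected.
    static boolean sharedRejectsMutation() {
        try {
            SHARED.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

Because String is immutable, copying the list is enough here. If the elements were mutable objects, each element would also need to be copied, which is what the deep copy mentioned above refers to.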

Unnecessary

Unnecessary here means that some data may not need to be initialized at all. Take a simple example:
[Figure: code screenshot in which the sth object is obtained before the parameters are validated]
In the code above, the parameters are validated only after the sth object has been obtained. If the parameters are invalid, sth never needed to be initialized, so it is unnecessary here. Code like this is very common; I have come across it many times in our company's code base. The basic pattern is that some data is fetched first, then a filter or check causes the code to return early, and the fetched data turns out to be completely useless.
One way to deal with unnecessary data is lazy initialization, also called lazy loading. In the code above, simply moving getSomeThing() after the parameter validation avoids the problem. The Checkstyle plugin we use for Java provides a rule called VariableDeclarationUsageDistance, which requires that a variable's declaration and its first use not be separated by too many lines, and so helps avoid the cost of declaring something that is never used.
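A minimal sketch of that reordering might look like the following. The validation rule, the return values, and the call counter are assumptions for illustration; only getSomeThing() and sth are names from the article.

```java
class LazyAfterValidationExample {
    // Counts how often the expensive call runs (illustration only).
    static int calls = 0;

    // Hypothetical stand-in for the article's getSomeThing(); assumed expensive.
    static String getSomeThing() {
        calls++;
        return "sth";
    }

    // Before: sth is fetched even when the parameter is rejected.
    static String eager(String param) {
        String sth = getSomeThing();
        if (param == null || param.isEmpty()) {
            return null;
        }
        return sth + ":" + param;
    }

    // After: validation comes first, so invalid input never pays for getSomeThing().
    static String lazy(String param) {
        if (param == null || param.isEmpty()) {
            return null;
        }
        String sth = getSomeThing();
        return sth + ":" + param;
    }
}
```

For valid input both versions behave identically; the saving shows up only on the early-return path, which is exactly the case the paragraph describes.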
In fact, lazy initialization is a very common mechanism. The well-known copy-on-write technique, for example, is a form of lazy initialization. Many collections in the JDK are also lazily initialized. Take HashMap: new HashMap() only creates an empty shell object, and the internal table is not allocated until put() is called for the first time.

// new HashMap() only creates an empty-shell HashMap
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                       initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                       loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
    
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // The first put triggers the real internal initialization
    if ((tab = table) == null || (n = tab.length) == 0) 
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
       // other code omitted
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
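
The same idea can be applied to your own classes. The following is a simplified sketch, not JDK code: LazyIntList and all of its members are hypothetical, and it only imitates the way HashMap defers allocating its table until the first write.

```java
import java.util.Arrays;

class LazyIntList {
    private final int initialCapacity;
    private int[] data; // stays null until the first add()
    private int size;

    // The constructor only records the requested capacity; nothing is allocated yet.
    LazyIntList(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        }
        this.initialCapacity = initialCapacity;
    }

    void add(int value) {
        if (data == null) {
            // The first add triggers the real allocation.
            data = new int[initialCapacity];
        } else if (size == data.length) {
            // Grow by doubling once the backing array is full.
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", size: " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    // Exposed only so the lazy behavior can be observed in this example.
    boolean isAllocated() {
        return data != null;
    }
}
```

An instance that is created but never written to costs only a small object header and two fields. This sketch is not thread-safe, and neither is HashMap.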

Locality

Locality is a well-worn topic. There are many kinds: data locality, spatial locality, temporal locality... It is fair to say that locality is what lets the world run as efficiently as it does. For more on locality, see an article I wrote earlier: Principles of Locality - The cornerstone of various optimizations.
Let's start with data locality. In most cases only a small amount of data is accessed frequently; this is commonly called hot data. The simplest way to handle hot data is to cache it and shard it, with the specific solution depending on the specific problem. Here is an example that is very common at Internet companies. A lot of business data is stored in a database, but the database struggles under a large volume of requests. Because of locality, only a small part of the data is accessed frequently, so we can cache that part in Redis and take pressure off the database.
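To make the caching idea concrete without depending on a Redis client, here is a small in-process sketch. It is an assumption-laden stand-in: HotDataCache is a hypothetical class, an access-ordered LinkedHashMap plays the role of Redis, and the loader function plays the role of the database query.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class HotDataCache {
    // Stand-in for a database query (hypothetical).
    private final Function<Integer, String> loader;
    // Access-ordered map that evicts the least recently used entry when full.
    private final Map<Integer, String> cache;

    HotDataCache(final int capacity, Function<Integer, String> loader) {
        this.loader = loader;
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // Read from the cache first; go to the loader only on a miss.
    String get(int id) {
        String value = cache.get(id);
        if (value == null) {
            value = loader.apply(id);
            if (value != null) {
                cache.put(id, value);
            }
        }
        return value;
    }
}
```

With a loader that counts its invocations, a hundred reads of the same hot key reach the loader only once. A real deployment would also need expiry, invalidation when the underlying data changes, and thread safety, none of which this sketch attempts.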
There is also a point that is easy to overlook: code locality. Only a small part of a system's code is executed repeatedly, and when the system has performance problems they too are caused by a small amount of code. Find and optimize that code and system performance improves significantly. Performance analysis tools such as Arthas flame graphs make it easy to find (other tools will be introduced in the fifth article of this series).

xindoo