Introduction
For the same piece of logic, code written by different people can differ in performance by orders of magnitude. For the same piece of code, changing a few characters or reordering a single line can make it several times faster, and the same code can also show a severalfold performance difference when run on different processors. The 10x programmer is not just a legend and may well be all around us. That tenfold difference shows up in how a programmer works, and code performance is its most visible aspect.
This article is the second in the "How to Write High-Performance Code" series. It explains how to take advantage of several characteristics of data to improve code performance.
Reusability
Most of the data we use in code can be reused, and data that can be reused should not be fetched or initialized repeatedly.
For example, suppose getSomeThing() is called inside a for loop even though the function has nothing to do with the loop. The call can be moved outside the loop and its result reused. Left inside the loop, 99 of the calls are wasted, and if getSomeThing() is very time-consuming or CPU-intensive, performance ends up nearly a hundred times worse.
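A minimal sketch of this situation follows. The body of getSomeThing(), the call counter, and the method names insideLoop() and hoisted() are assumptions made for illustration; only the idea of moving the loop-independent call outside the loop comes from the text.

```java
import java.util.ArrayList;
import java.util.List;

class LoopHoistingDemo {
    // Counts how often the (hypothetical) expensive call runs.
    static int calls = 0;

    // Stand-in for an expensive function whose result does not depend on the loop.
    static String getSomeThing() {
        calls++;
        return "config";
    }

    // Before: the loop-independent call is repeated on every iteration.
    static List<String> insideLoop(int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String sth = getSomeThing();
            out.add(sth + "-" + i);
        }
        return out;
    }

    // After: the call is made once and its result is reused.
    static List<String> hoisted(int n) {
        List<String> out = new ArrayList<>();
        String sth = getSomeThing();
        for (int i = 0; i < n; i++) {
            out.add(sth + "-" + i);
        }
        return out;
    }

    public static void main(String[] args) {
        calls = 0;
        insideLoop(100);
        System.out.println("calls inside loop: " + calls);
        calls = 0;
        hoisted(100);
        System.out.println("calls when hoisted: " + calls);
    }
}
```

Both versions produce the same list; only the number of calls to getSomeThing() differs (100 versus 1 for a loop of 100 iterations).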
In Java code we often use enum classes, and many of them expose a method that returns all of the enum's values. Most people write such a method, for example getList(), so that it builds the list each time it is called. This is functionally correct, but a new List is created on every call, and if the method is called frequently the impact on performance is significant. The correct approach is to statically initialize an immutable list once and then reuse it directly.
Reminder: I deliberately said immutable. When reusing an object, pay extra attention to whether any code changes its contents. If a caller needs to modify the object, it must not modify the shared instance; it should make a deep copy and change the copy instead. Of course, if the object is inherently mutable, there is little point in reusing it.
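As a sketch of the enum case, assuming a hypothetical Color enum (the enum, its constants, and the method name getListEachTime() are illustrative and not from the original code), the shared list can be built once and wrapped with Collections.unmodifiableList so that callers cannot change it:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class EnumListDemo {
    enum Color {
        RED, GREEN, BLUE;

        // Built once during class initialization; the wrapper rejects modification.
        private static final List<Color> ALL =
                Collections.unmodifiableList(Arrays.asList(values()));

        // Allocates a new list on every call.
        static List<Color> getListEachTime() {
            return new ArrayList<>(Arrays.asList(values()));
        }

        // Returns the shared immutable list without allocating.
        static List<Color> getList() {
            return ALL;
        }
    }

    public static void main(String[] args) {
        // Same instance on every call.
        System.out.println(Color.getList() == Color.getList());
        // A different instance on every call.
        System.out.println(Color.getListEachTime() == Color.getListEachTime());
        try {
            Color.getList().add(Color.RED);
        } catch (UnsupportedOperationException e) {
            System.out.println("shared list is read-only");
        }
    }
}
```

A caller that really needs a modifiable list can copy the shared one, for example with new ArrayList<>(Color.getList()), and change the copy.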
Non-essential
Non-essential means that some data may not need to be initialized at all. Take a simple example:
Consider code that first obtains an sth object and only afterwards validates its parameters. If the parameters are invalid, sth never needed to be initialized, so sth is non-essential here. Code like this is very common, and I have run into it many times in our company's code base. The basic pattern is to fetch some data first, after which some filtering or checking logic makes the code exit early, leaving the data completely unused.
One way to deal with non-essential data is lazy initialization, which is also called lazy loading or deferred loading in some places. In the example above, you only need to move getSomeThing() after the parameter validation to avoid this performance problem. The Checkstyle plugin we use for Java provides a rule for this, VariableDeclarationUsageDistance. The rule requires that a variable's declaration and its first use not be separated by too many lines, which helps avoid the performance cost of values that are declared but never used.
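The fix described above can be sketched as follows. The parameter check, the return values, and the fetch counter are assumptions for illustration; the only point taken from the text is that getSomeThing() moves after the validation.

```java
class LazyFetchDemo {
    // Counts how often the (hypothetical) expensive lookup runs.
    static int fetches = 0;

    // Stand-in for an expensive lookup.
    static String getSomeThing() {
        fetches++;
        return "sth";
    }

    // Before: sth is fetched even when the parameter is rejected.
    static String eager(String param) {
        String sth = getSomeThing();
        if (param == null || param.isEmpty()) {
            return null;
        }
        return sth + ":" + param;
    }

    // After: validate first and fetch only when the value is actually needed.
    static String lazy(String param) {
        if (param == null || param.isEmpty()) {
            return null;
        }
        String sth = getSomeThing();
        return sth + ":" + param;
    }

    public static void main(String[] args) {
        fetches = 0;
        eager("");
        System.out.println("fetches with eager version: " + fetches);
        fetches = 0;
        lazy("");
        System.out.println("fetches with lazy version: " + fetches);
    }
}
```

For invalid input the eager version still pays for one fetch, while the lazy version pays for none; for valid input both return the same result.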
In fact, lazy initialization is a very common mechanism. The well-known copy-on-write technique, for example, is a form of lazy initialization. In addition, many collections in the JDK are lazily initialized. Take HashMap as an example: when you execute new HashMap(), you only create an empty shell object. The internal table is actually initialized only when put() is called for the first time.
// new HashMap() only sets up an empty shell; the internal table is not allocated yet
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // The first put triggers the real internal initialization
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // other code omitted
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
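The same pattern can be applied in your own classes. The sketch below uses a hypothetical EventLog class (not part of the JDK or of the original article) whose backing list is allocated only on the first add(), in the spirit of HashMap allocating its table on the first put(). It is deliberately simple and is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

class LazyInitDemo {
    // Hypothetical class: the backing list stays null until it is first needed.
    static class EventLog {
        private List<String> events;

        void add(String event) {
            // The first add triggers the real allocation.
            if (events == null) {
                events = new ArrayList<>();
            }
            events.add(event);
        }

        int size() {
            return events == null ? 0 : events.size();
        }

        boolean isAllocated() {
            return events != null;
        }
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        System.out.println("allocated after construction: " + log.isAllocated());
        log.add("started");
        System.out.println("allocated after first add: " + log.isAllocated());
    }
}
```

Objects that are created but never written to pay nothing for the list, which matters when many such objects exist and most of them stay empty.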
Locality
Locality is another well-worn topic. There are many kinds of locality: data locality, spatial locality, temporal locality, and so on. It is fair to say that the world runs more efficiently because locality exists. For more on locality, you can refer to an article I wrote earlier: Principles of Locality - The cornerstone of various optimizations.
Let's talk about data locality first. In most cases only a small portion of the data is accessed frequently; this is commonly known as hot data. The easiest way to deal with hot data is to add caching and sharding, and the specific solution depends on the specific problem. Here is an example that is very common at Internet companies. A lot of business data is stored in a database, but the database struggles under a large volume of requests. Because of locality, only a small portion of the data is accessed frequently, so we can cache that portion in Redis and reduce the pressure on the database.
In addition, let's talk about a point that is easy to overlook: code locality. Only a small portion of the code in a system is executed repeatedly, and when a system has performance problems, they are usually caused by a small portion of the code as well. As long as that code is found and optimized, system performance can improve significantly. It can be located easily with performance analysis tools such as the arthas flame graph (other tools will be introduced in the fifth article of this series).
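To illustrate the hot-data idea without depending on Redis, here is a minimal in-process sketch that puts a small least-recently-used cache in front of a slow lookup. The LruCache class, loadFromDatabase(), the capacity of 2, and the read counter are all assumptions for illustration; a real deployment would use Redis or another cache, as the text describes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HotDataCacheDemo {
    // Small LRU cache built on LinkedHashMap's access-order mode.
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = order entries by access, not insertion
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Evict the least recently used entry once the capacity is exceeded.
            return size() > capacity;
        }
    }

    // Counts reads that reach the (hypothetical) database.
    static int dbReads = 0;

    static final LruCache<Integer, String> CACHE = new LruCache<>(2);

    // Stand-in for a slow database query.
    static String loadFromDatabase(int id) {
        dbReads++;
        return "row-" + id;
    }

    // Serve from the cache when possible; fall back to the database on a miss.
    static String get(int id) {
        String value = CACHE.get(id);
        if (value == null) {
            value = loadFromDatabase(id);
            CACHE.put(id, value);
        }
        return value;
    }

    public static void main(String[] args) {
        int[] requests = {1, 1, 1, 2, 1, 3, 1};
        for (int id : requests) {
            get(id);
        }
        System.out.println("requests: " + requests.length + ", database reads: " + dbReads);
    }
}
```

Because key 1 is requested far more often than the others, it stays in the cache and most requests never reach the database. This sketch is not thread-safe and never invalidates entries, both of which a production cache would need to handle.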
More reads, fewer writes
Besides locality, data has another very significant characteristic: it is read far more often than it is written. This matches everyone's intuition and habits. Most people read articles rather than write them, and on a website you mostly view content and only occasionally change it. This rule holds almost universally. So what does it mean for the code we write? It means that your hot code paths are very likely to be in the code that reads data, so pay extra attention to that code.
Of course, this does not mean that writing data is unimportant. Read-heavy workloads have another characteristic: a write costs much more than a read, and a write also matters much more than a read. The importance is self-evident. If you go to the bank and cannot see your balance, that is tolerable, but if you cannot withdraw your money, that is definitely not acceptable. So why does writing data cost so much more than reading it? Put simply, thanks to data locality, many reads can be optimized by various means, while writes have far fewer such options. Writes may also have many side effects, so logic such as extensive validation has to be added to prevent unwanted ones.
That is all for this article. I hope you found it useful.