Hello, I am why.
No, that's not me. I'm younger and more handsome than him.
This is the protagonist of today's article.
His name is Brett Wooldridge. You probably don't know him.
But if I show you a screenshot of his GitHub, you'll definitely recognize the open source project he wrote:
See it?
He is the father of the famous HikariCP.
And the line on his GitHub profile is rather nicely written:
Father of an angel who fell to Earth and somehow into my life.
The picture should be his child, with him smiling beside her like a proud old dad.
The picture at the beginning of this article comes from this interview:
https://blog.jooq.org/2017/02/21/jooq-tuesdays-brett-wooldridge-shows-what-it-takes-to-write-the-fastest-java-connection-pool/
The first question asked was this:
You created one of the most popular connection pools in Java, HikariCP. So what makes your library so popular?
In the next section, I'll retell, in the first person, how he answered that question.
Why write HikariCP?
What? You ask why I wrote HikariCP?
Well, it's because I couldn't find an existing one that actually worked well.
A few years ago, I was writing some code that needed a database connection pool, so like most developers practicing "browser-oriented programming", I found an open source pool on the Internet and dropped it in.
And you know what, it seemed pretty good at first.
But when I ran performance tests on the project, I gradually discovered that this pool wasn't good enough: I kept hitting deadlocks and incorrect connection state.
I started to wonder: is this thing playing tricks on me?
But the pool I was using was open source, so in the spirit of open source, I pulled down the code to see if I could help fix it.
When I opened the code: good grief, it was huge, at least several thousand lines more than I expected.
Fine, lots of code is still readable if you grit your teeth.
What really got me was the code logic.
I went in to troubleshoot the deadlock, only to find locks layered upon locks everywhere.
Sometimes a lock was acquired in one method and I simply couldn't find where it was released.
I'd finally spot the release miles and miles away from the acquisition.
I was like this at the time:
Because I knew I had no way to find every deadlock lurking in that code.
And even if I fixed the problem at hand, given how the project was written, I'd run into other problems sooner or later.
So I made a decisive call and decided to...
Find another one online.
This time I'd learned my lesson: after finding a new connection pool, I read its code first.
Having been burned by deadlocks, I paid special attention to the locking code.
The new pool's lock semantics were indeed clearer, but it still had more than twice as much code as I expected.
On top of that, every connection pool I studied violated the JDBC contract in one way or another.
For example, one of the most common problems I found was this:
When a connection is finished with and returned to the pool, some pools don't reset its state, such as auto-commit or the transaction isolation level, so the next consumer who borrows it gets a "dirty" connection.
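To make the contract concrete, here's a minimal sketch of the kind of state reset a JDBC-compliant pool should perform when a connection comes back. This is my own illustration, not any particular pool's actual code:

```java
import java.sql.Connection;
import java.sql.SQLException;

class ConnectionStateResetter {

    // Restore a returned connection to the pool's defaults so the next
    // borrower never sees the previous user's "dirty" state.
    static void reset(Connection conn,
                      boolean defaultAutoCommit,
                      int defaultIsolation,
                      boolean defaultReadOnly) throws SQLException {
        if (conn.getAutoCommit() != defaultAutoCommit) {
            conn.setAutoCommit(defaultAutoCommit);
        }
        if (conn.getTransactionIsolation() != defaultIsolation) {
            conn.setTransactionIsolation(defaultIsolation);
        }
        if (conn.isReadOnly() != defaultReadOnly) {
            conn.setReadOnly(defaultReadOnly);
        }
        conn.clearWarnings(); // drop SQLWarnings left behind by the last user
    }
}
```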
I was thinking:
Really? Is this the state of connection pools in the Java ecosystem? Enough; I'd do it myself. And so, out of need and frustration, I created HikariCP.
Back to the first question.
As mentioned above, plenty of mature connection pools already existed before I wrote HikariCP. So how did HikariCP become popular?
In my view, correctness and reliability don't make a good selling point, because I consider them must-haves.
So I focused on promoting performance, across all my social media.
Sometime in 2015, the Wix engineering team wrote a blog about using HikariCP.
That wave sent me straight into orbit; HikariCP had truly entered everyone's field of vision.
Finally, I do hope that over time more users will pay equal attention to correctness and reliability, because without those properties, performance is meaningless.
For my part, I intend to write more about these aspects of HikariCP.
Why is the performance awesome?
As mentioned earlier, HikariCP's selling point is raw performance.
So why is its performance so awesome?
In fact, the answer is written on HikariCP's github homepage:
https://github.com/brettwooldridge/HikariCP
Before getting into the "how we do it" part, let's briefly talk about the project's name.
You can see one big character on the page: 光, which means "light".
Regarding the source of this name, it was actually mentioned in the aforementioned report:
Hikari translates to "light" in English, and in the context of HikariCP it's a pun: "light" here means both fast, as in light speed, and light-weight, as in a small amount of code.
Hikari is pronounced Hi-ka-lee.
Please remember this.
I remember an interview where a candidate mentioned this connection pool but didn't know how to pronounce it.
He said: it's that connection pool that starts with H and ends with CP; I forget how to say it.
But I got what he meant immediately.
I said: okay, I know which connection pool you're talking about; please go on.
Truth be told, I didn't know how to pronounce it either at the time, which was rather embarrassing.
Okay, next, let's take a look at why the performance is so awesome.
The author wrote the answer on GitHub:
https://github.com/brettwooldridge/HikariCP/wiki/Down-the-Rabbit-Hole
First of all, the title of this article is very interesting:
What does "Down the Rabbit Hole" mean?
Literally, "down into the rabbit hole".
I think it’s not that simple, so I checked it out:
Oh, it turns out "down the rabbit hole" is a metaphor for venturing into an unknown world, from the famous book "Alice in Wonderland".
Generally, we use "down the rabbit hole" to describe getting caught in an increasingly strange, confusing, or unexpected situation, where one thing leads to another and you sink deeper and deeper until there's no climbing out.
A bit of English idiom for you, free of charge.
Once you know what the title means and have read the author's article, look at this "rabbit hole" title again and you'll find it genuinely fitting.
After reading and digesting the full text, I found that the author gives four reasons for HikariCP's speed:
- Bytecode-level optimization: exploit JIT method inlining as much as possible
- Bytecode-level optimization: prefer instructions that the JVM can optimize more easily
- Code-level optimization: use a custom FastList instead of ArrayList
- Code-level optimization: use the lock-free ConcurrentBag
Let's go through them one by one.
Bytecode-level optimization
At the start of the article, the author essentially says: my optimizations go all the way down to the bytecode level; now that's hardcore.
Let me quickly translate the key points:
- To make HikariCP fast, I optimized it down at the bytecode level.
- I pulled out every trick I know to get the most out of JIT optimization.
- I studied the compiler's bytecode output, and even the JIT's assembly output, to keep critical routines below the JIT's inline threshold.
Here the author mentions the JIT's inlining optimization.
What is inlining?
Inlining is really just one action:
take a called method and copy its body into the call site.
For a simple example, suppose the code is like this:
```java
int result = add(a, b);

private int add(int x, int y) {
    return x + y;
}
```
Then after JIT's inline optimization, the code will become like this:
```java
int result = a + b;
```
In this way, the overhead of calling the add method is saved.
Inlining is known as the mother of optimizations, because it lays the groundwork for the others: beyond the simple example above, it opens the door to more advanced optimizations such as escape analysis, loop unrolling, and lock elision.
So what does a method call actually cost?
I'd say it boils down to these steps:
- First, you have to set up the parameters to pass for the call, right?
- With the parameters ready, you have to look up which method to actually invoke, right?
- Then, if the method has local variables or values to evaluate, you have to create a new stack frame and its runtime data structures, right?
- Finally, you may have to return a result to the caller, right?
At this point some of you will say: really? That overhead doesn't look like much, does it?
Right, on its own it isn't much. But multiply a tiny optimization by a huge number of calls, and the final result is very impressive.
I think everyone understands this truth.
The author also said in the article:
HikariCP contains many micro optimizations, which are almost impossible to measure individually, but combined can improve overall performance.
Even across millions of invocations, these optimizations add up to mere milliseconds.
Maybe that's just what a master looks like.
Pursuing performance to the absolute extreme; it doesn't get much better than this.
Next, let's talk about another bytecode-level optimization:
invokevirtual vs invokestatic
This wave of optimization is, as the meme goes, operating up in the stratosphere.
The author gave an example.
Previously, the proxy objects for Connection, Statement, and ResultSet were obtained through a singleton factory method.
It's similar to this:
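The original screenshot isn't reproduced here, but based on the author's "Down the Rabbit Hole" wiki, the code was shaped roughly like this (a sketch, not an exact copy):

```java
public final PreparedStatement prepareStatement(String sql, String[] columnNames) throws SQLException {
    return PROXY_FACTORY.getProxyPreparedStatement(this, delegate.prepareStatement(sql, columnNames));
}
```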
PROXY_FACTORY is a static field.
The bytecode of the above code is roughly like this:
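That screenshot is likewise missing, so here is the listing as reconstructed from the wiki (treat the exact constant-pool indexes as approximate):

```
flags: ACC_PRIVATE, ACC_FINAL
Code:
  stack=5, locals=3, args_size=3
     0: getstatic     #59  // Field PROXY_FACTORY:Lcom/zaxxer/hikari/proxy/ProxyFactory;
     3: aload_0
     4: aload_0
     5: getfield      #3   // Field delegate:Ljava/sql/Connection;
     8: aload_1
     9: aload_2
    10: invokeinterface #74, 3 // InterfaceMethod java/sql/Connection.prepareStatement:(Ljava/lang/String;[Ljava/lang/String;)Ljava/sql/PreparedStatement;
    15: invokevirtual #69  // Method com/zaxxer/hikari/proxy/ProxyFactory.getProxyPreparedStatement:(Lcom/zaxxer/hikari/proxy/ConnectionProxy;Ljava/sql/PreparedStatement;)Ljava/sql/PreparedStatement;
    18: areturn
```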
As you can see from the bytecode, there is first a getstatic call to fetch the value of the static field PROXY_FACTORY.
Then there is an invokevirtual instruction, corresponding to the getProxyPreparedStatement() call on the ProxyFactory instance:
```
15: invokevirtual #69 // Method com/zaxxer/hikari/proxy/ProxyFactory.getProxyPreparedStatement:(Lcom/zaxxer/hikari/proxy/ConnectionProxy;Ljava/sql/PreparedStatement;)Ljava/sql/PreparedStatement;
```
Is there any room for optimization in this place?
The author modified the code to look like this:
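Reconstructed from the wiki in the same spirit, the modified version routes the call through the ProxyFactory class itself rather than a stored instance:

```java
public final PreparedStatement prepareStatement(String sql, String[] columnNames) throws SQLException {
    return ProxyFactory.getProxyPreparedStatement(this, delegate.prepareStatement(sql, columnNames));
}
```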
The ProxyFactory is generated through Javassist.
So if you look at the source of ProxyFactory, the method bodies are all empty stubs.
The real implementation is generated at build time by the class below; I won't show it in detail here, but the curious can go take a look:
com.zaxxer.hikari.util.JavassistProxyFactory
Then the getProxyPreparedStatement() method was made static.
Then the bytecode becomes like this:
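Again reconstructed from the wiki (exact indexes approximate):

```
flags: ACC_PRIVATE, ACC_FINAL
Code:
  stack=4, locals=3, args_size=3
     0: aload_0
     1: aload_0
     2: getfield      #3   // Field delegate:Ljava/sql/Connection;
     5: aload_1
     6: aload_2
     7: invokeinterface #72, 3 // InterfaceMethod java/sql/Connection.prepareStatement:(Ljava/lang/String;[Ljava/lang/String;)Ljava/sql/PreparedStatement;
    12: invokestatic  #67  // Method com/zaxxer/hikari/proxy/ProxyFactory.getProxyPreparedStatement:(Lcom/zaxxer/hikari/proxy/ConnectionProxy;Ljava/sql/PreparedStatement;)Ljava/sql/PreparedStatement;
    15: areturn
```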
And something magical happened:
- The getstatic instruction disappeared.
- invokevirtual was replaced with an invokestatic call, which is easier for the JVM to optimize.
- Finally, something you might not notice at first glance: the stack size dropped from 5 to 4. That's because with invokevirtual, the ProxyFactory instance (the this object) is implicitly pushed onto the stack, and there's a corresponding extra pop when getProxyPreparedStatement() is invoked.
Points 1 and 3 shouldn't be a problem; everyone can see what's going on there.
But that second point: invokevirtual replaced with invokestatic, making it easier for the JVM to optimize.
Seriously, it was like this when I first saw it:
why?
I still remember what invokevirtual and invokestatic do.
But why would invokestatic perform better?
I took this question to "In-Depth Understanding of the Java Virtual Machine" and didn't find the answer spelled out.
There was an unexpected gain along the way, though; it produced this article: "Report! There is a bug in the book".
Why else do you think I suddenly flipped to that part of the book? It was fate.
Although the answer is not written directly in the book, there is a passage like this in the relevant part:
My understanding: with the invokevirtual instruction, the virtual method table must be consulted to determine the method's direct reference.
With invokestatic, the symbolic reference can be converted to a direct reference as early as class loading.
Seen that way, invokestatic really does have the edge over invokevirtual.
Then the next question arises.
What are the phases of class loading?
Loading, verification, preparation, resolution, and initialization.
In which phase does invokestatic get this done?
The resolution phase, my friends.
The resolution phase is when the JVM replaces symbolic references in the constant pool with direct references.
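By the way, if you want to see the two instructions side by side, compile a toy class like this one (my own made-up example) and inspect it with javap -c:

```java
public class Dispatch {
    static int addStatic(int x, int y) { return x + y; }

    int addVirtual(int x, int y) { return x + y; }

    int demo() {
        int a = addStatic(1, 2);   // javap -c shows: invokestatic  Dispatch.addStatic
        int b = addVirtual(1, 2);  // javap -c shows: invokevirtual Dispatch.addVirtual
        return a + b;
    }
}
```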
But I digress; let's come back.
The above is just my own guess. I'm surely not the only one with questions about invokevirtual vs invokestatic after reading the author's "rabbit hole" article.
So I went searching around.
And sure enough, I found something (forgive me, it genuinely took a long time to find).
This issue turned up, and its first half is exactly my question:
https://github.com/brettwooldridge/HikariCP/issues/464
The author’s reply is as follows:
The second paragraph is relatively easy to understand.
As mentioned earlier, a static call needs one less stack slot and one less push/pop at runtime, which squeezes out a bit more performance.
It's mainly the first paragraph that's a little hard to follow.
He says: in short, when the JVM inlines a virtual call, even a monomorphic one, it must install a trap in case another implementation appears and the call becomes polymorphic.
Setting and clearing this trap adds a little overhead to the call.
How about that? Head spinning yet?
My personal understanding is that he's talking about Java's dynamic dispatch and the JVM's CHA (Class Hierarchy Analysis) technique.
The answer is written on page 417 of "In-Depth Understanding of the Java Virtual Machine (Third Edition)"; flip it open:
If you ask me for evidence: the two descriptions echo each other, word for word. What a coincidence, right?
invokevirtual calls a virtual method, and per the book, the "trap" mentioned earlier is what the book calls the "escape route":
And the sentence "This trap setting and clearing adds slightly more overhead to the invocation" corresponds to this passage:
Now do you see why invokestatic is easier for the JVM to optimize than invokevirtual?
"Optimize" here refers to inlining.
invokestatic calls static methods. A non-virtual method can be inlined directly, and that inlining is 100% safe.
invokevirtual calls virtual methods. To inline a virtual method, the JVM has to lean on the CHA mechanism and set up that escape route.
Both get inlined, but the latter costs a little more.
Inlining is itself a performance optimization; making code easier to inline is an optimization on top of an optimization.
That's what I mean by operating up in the stratosphere.
Well, that covers the bytecode-level optimizations; next, let's look at the code level.
Optimization at the code level
The most famous code-level optimization is replacing ArrayList with FastList.
First, I checked the project's commit history. On January 15, 2014, the author made a commit:
The latter half of the commit message should look familiar; we just covered it.
The first half is the commit that replaced ArrayList with FastList.
Java's ArrayList performs a range check every time get(int index) is called. In HikariCP, the pool can guarantee the index is always within range, so the check is pointless, and it was removed:
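Stripped of everything else, the idea looks like this; a trimmed-down sketch in the spirit of com.zaxxer.hikari.util.FastList, not the full class (growth logic and the List interface are elided):

```java
final class FastListSketch<T> {
    private final T[] elementData;  // backing array
    private int size;               // number of used slots

    @SuppressWarnings("unchecked")
    FastListSketch(int capacity) {
        elementData = (T[]) new Object[capacity];
    }

    public boolean add(T element) {
        elementData[size++] = element;  // no modCount bookkeeping either
        return true;
    }

    // Unlike ArrayList.get(), there is no rangeCheck(index):
    // the pool guarantees the index is always valid.
    public T get(int index) {
        return elementData[index];
    }
}
```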
Another example: ArrayList's remove(Object o) scans from the head to the tail.
So if the element you want to remove happens to be the last one, you traverse the whole array to reach it.
And it just so happens that with HikariCP's Statements, by common coding habit, the last one opened is the first one closed (removed).
So FastList optimizes the remove(Object element) method to scan in reverse order:
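Continuing the sketch above, a reverse-scanning remove() looks roughly like this (the real FastList also compares by identity, since the pool tracks exact object instances):

```java
// Added to FastListSketch: scan from the tail, because the most recently
// opened Statement is usually the first one to be closed.
public boolean remove(Object element) {
    for (int index = size - 1; index >= 0; index--) {
        if (element == elementData[index]) {   // identity comparison is enough
            int numMoved = size - index - 1;
            if (numMoved > 0) {                // shift the tail left by one slot
                System.arraycopy(elementData, index + 1, elementData, index, numMoved);
            }
            elementData[--size] = null;        // clear the stale reference
            return true;
        }
    }
    return false;
}
```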
On the whole, the optimization points of FastList are the get and remove methods mentioned above.
Next, look at another code-level optimization:
The author listed a few points:
- A lock-free design
- Thread-local caching
- Queue stealing
- Direct handoff optimization
The author's own description is brief, but there's actually a lot going on inside.
One important trick: ConcurrentBag gives each thread its own cache of connections via ThreadLocal.
To a large extent, ThreadLocal avoids contention on the shared resources.
If you read the code yourself, focus on the add (a freed connection joins the bag), borrow (acquire a connection), and requite (return a connection) methods.
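To give you the flavor, here is a heavily simplified sketch of the borrow/requite idea. This is my own illustration, not HikariCP's actual ConcurrentBag; the Entry interface and state constants are stand-ins:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BagSketch<T extends BagSketch.Entry> {
    interface Entry {
        AtomicInteger state(); // 0 = NOT_IN_USE, 1 = IN_USE
    }

    private final CopyOnWriteArrayList<T> sharedList = new CopyOnWriteArrayList<>();
    private final SynchronousQueue<T> handoffQueue = new SynchronousQueue<>(true);
    private final ThreadLocal<List<T>> threadList = ThreadLocal.withInitial(ArrayList::new);

    void add(T entry) {
        sharedList.add(entry); // a new connection joins the shared list
    }

    T borrow(long timeout, TimeUnit unit) throws InterruptedException {
        // 1. Try this thread's own cache first: zero contention with other threads.
        List<T> mine = threadList.get();
        for (int i = mine.size() - 1; i >= 0; i--) {
            T entry = mine.remove(i);
            if (entry.state().compareAndSet(0, 1)) {
                return entry;
            }
        }
        // 2. Otherwise "steal" from the shared list with a lock-free CAS.
        for (T entry : sharedList) {
            if (entry.state().compareAndSet(0, 1)) {
                return entry;
            }
        }
        // 3. Nothing free: wait for a returning thread to hand one over directly.
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        T entry;
        while ((entry = handoffQueue.poll(deadline - System.nanoTime(), TimeUnit.NANOSECONDS)) != null) {
            if (entry.state().compareAndSet(0, 1)) {
                return entry;
            }
        }
        return null; // timed out
    }

    void requite(T entry) {
        entry.state().set(0);              // mark the connection free again
        if (!handoffQueue.offer(entry)) {  // direct handoff if someone is waiting
            threadList.get().add(entry);   // otherwise cache it for this thread
        }
    }
}
```

Reading the real thing afterwards, you should recognize a similar three-step structure in ConcurrentBag's borrow().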
There are plenty of articles online covering it; if you're interested, go have a look. I won't write it all out here.
Oh, you don't want to read other people's articles, you'd rather wait for mine?
Fine, I'll owe you that one; put it on the tab.
Let me be lazy here: if the article runs too long, nobody finishes it.
Fight
While writing this article, I also came across an issue that's interesting enough to share.
https://github.com/brettwooldridge/HikariCP/issues/232
Someone said:
Hi, I think your analysis of Java database pools is of great reference value. I happened to come across the Druid connection pool from Alibaba (billed as the fastest database pool in Java!). From my quick browse it seems to have some cool features. Any thoughts on it? Thanks.
The author of HikariCP quickly responded:
At least in his benchmarks, Druid was the slowest at acquiring and returning connections, and the third fastest at creating and closing statements. The benchmark page in their wiki doesn't show what configuration they ran with, but I suspect the borrow tests were disabled. While I won't say that's "cheating", it's not how a pool is used in production. And as far as I know, they haven't published their benchmark source code.
This is a bit interesting.
That "although I will not say that this is cheating" is the classic "there's something I'm not sure I should say..." move,
followed by saying it anyway.
Then another netizen said:
Druid's design focus is monitoring and data-access enhancement (such as automatic database sharding). It provides a SQL parser to analyze users' SQL queries and collects a lot of data for monitoring. So if you need a JDBC monitoring solution, you can give Druid a try.
The HikariCP author agreed there was nothing wrong with that, but stressed that HikariCP leaves hooks for monitoring too:
That's a valid point of view. I'd point out that HikariCP also provides monitoring data, but the metrics are "pool-level" metrics, not per-query execution times and the like.
The above conversations all took place in January 2015.
But a year and a half later, on July 26, 2016, the thread was revived by someone else:
And who showed up?
This person is one of Druid's fathers, known as Wen Shao.
Maybe you don't know Wen Shao, and maybe you don't know his Druid, but you surely know another masterpiece of his:
Sure, it has a few too many issues, but that doesn't stop him from being a great god.
Grab your tea and watch the show:
First, Wen Shao said: if you configure the maxWait attribute, Druid switches to a fair lock, which lowers performance.
As for why, it's designed that way because of problems encountered in production environments.
Then he went on to mention Taobao:
Click into the link, and the title looks like this:
Talking about Tmall Double 11 in 2015.
The title is translated as:
Alibaba Group sold $5 billion in the first 90 minutes of Singles’ Day sales.
In the link I even spotted Papa Ma, whom we hadn't seen in a while:
My read on why Wen Shao posted the link: Druid is used inside Alibaba, Tmall's Double Eleven is an extremely demanding scenario, and Druid has withstood that test.
The author of HikariCP did not reply to Wen Shao.
Until another melon-eating onlooker stirred the pot:
The HikariCP author figured: oh, so now we're comparing scale.
Well then, don't mind me:
HikariCP is one of the most widely used connection pools in the world, used by some of the largest companies, serving billions of users every day.
As for Druid, sorry to be blunt: it's rarely seen outside of China.
But someone soon challenged that answer:
"Some of the biggest companies are using it, serving billions of users every day"? For example?
You want data? Hold on tight:
- Wix.com hosts more than 109 million websites and handles more than 1 billion requests every day.
- Atlassian's products have millions of customers.
- HikariCP is the default connection pool of Spring Boot.
- HikariCP is downloaded from the central Maven repository more than 300,000 times a month.
These companies are all using:
After this answer, neither side spoke.
The battle between the two parties is over.
But people kept following up in the thread. This one, I'd say, is a clear-headed onlooker:
Another fellow's reply is also fun:
Stop arguing, stop arguing. I'm here to learn technology, not to debate the difference between "capitalist" tools and "communist" tools.
In my opinion, this battle is really meaningless.
In terms of technology selection, there is no best, only suitable.
Druid and HikariCP each have their own advantages.
One last word (seeking attention)
Okay, if you've read this far, please give me a follow. Keeping up a weekly schedule is exhausting, and I need a little positive feedback.
Thank you for reading. I insist on originality, and I warmly welcome and appreciate your attention.