Without 20 years of skill, it is impossible to write the "seemingly useless" code of Thread.sleep(0)!

Hello, I'm Wai Wai, who has been quarantined at home for seven days.

This article starts with a strange code comment, shown in the following picture:

We can ignore the specific code logic and just look at the for loop.

The loop has a dedicated variable j that counts the iterations.

On the first iteration and every 1000 iterations thereafter, execution enters an if block.

Above this if block there is a comment: prevent gc.

Prevent: if you don't know this word, memorize it, because it will be on the exam:

Taken literally, the comment means: prevent the GC thread from performing garbage collection.

The specific implementation logic is as follows:

The core logic is actually this line of code:

Thread.sleep(0);
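Put together, the loop described above looks roughly like the sketch below. This is my own minimal reconstruction of the pattern, not the RocketMQ source: the class name WarmupSketch, the PAGE_SIZE constant, and the warm method are hypothetical, and only the shape of the loop (the counter j, the check every 1000 iterations, and Thread.sleep(0)) follows the description.

```java
import java.nio.ByteBuffer;

public class WarmupSketch {
    // Hypothetical page size, chosen only for illustration.
    static final int PAGE_SIZE = 4096;

    /** Touches one byte per page and returns the number of pages touched. */
    static int warm(ByteBuffer buffer) {
        int touched = 0;
        // i walks the buffer page by page; j counts the loop iterations.
        for (int i = 0, j = 0; i < buffer.capacity(); i += PAGE_SIZE, j++) {
            buffer.put(i, (byte) 0);
            touched++;

            // prevent gc
            if (j % 1000 == 0) {
                try {
                    Thread.sleep(0);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(PAGE_SIZE * 5000);
        System.out.println("pages touched: " + warm(buffer));
    }
}
```

Note that j % 1000 == 0 is already true when j is 0, which is why the if block runs on the very first iteration as well.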

Can this really prevent GC?

Confused?

Being confused is the right reaction; it means this is worth digging into.

This code snippet is actually from the RocketMQ source code:

org.apache.rocketmq.store.logfile.DefaultMappedFile#warmMappedFile

I should say up front that I did not track down the person who wrote this code to ask what they intended, so I can only speculate based on my own understanding. If my speculation is wrong, please correct me.

Although this is the source code of RocketMQ, based on my understanding, this little trick has nothing to do with the RocketMQ framework, and can be completely separated from the framework.

The change I proposed is as follows:

Change the int to long, and then you can directly delete the if logic in the for loop.
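As a rough sketch of what that change means, assuming a simplified page-touching loop of my own (the class name WarmupLongIndexSketch, PAGE_SIZE, and warm are hypothetical and not the RocketMQ code): the index becomes a long, and both the counter j and the if block disappear. How HotSpot treats long-indexed loops depends on the JVM version, so take this as an illustration of the idea only.

```java
import java.nio.ByteBuffer;

public class WarmupLongIndexSketch {
    // Hypothetical page size, chosen only for illustration.
    static final int PAGE_SIZE = 4096;

    /** Touches one byte per page and returns the number of pages touched. */
    static int warm(ByteBuffer buffer) {
        int touched = 0;
        // long index instead of int; no counter j and no "prevent gc" branch.
        for (long i = 0; i < buffer.capacity(); i += PAGE_SIZE) {
            // ByteBuffer positions are ints, so the index is narrowed for the call.
            buffer.put((int) i, (byte) 0);
            touched++;
        }
        return touched;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(PAGE_SIZE * 5000);
        System.out.println("pages touched: " + warm(buffer));
    }
}
```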

Doesn't it make you more confused?

Don't panic. Next, I will unravel this for you thread by thread.

Before the unraveling, though, let me state the conclusions up front:

The theoretical basis for this change is Java's safepoint mechanism.
In the end, the maintainers did not adopt this change.
Whether the maintainers adopt it or not doesn't matter; what matters is that I unravel it for you.

Explore

When I knew this code snippet belonged to RocketMQ, the first thing that came to my mind was to look for the answer in the code commit log.

See if the committer stated his intent when submitting the code.

So I pulled the code down and saw that the commit record looked like this:

As soon as I saw it, I knew there would be no answer here.

This logic was already present in the very first commit of the class, and that commit contained a large amount of code without describing what each part was for.

The commit log yielded no useful information.

So I turned my attention to the GitHub issues and searched for the keyword prevent gc.

I didn't find much useful information other than the first link:

The issue behind the first link is this one:

https://github.com/apache/rocketmq/issues/4902

This issue is actually the one we opened ourselves while discussing this question; it contains the modification proposal shown earlier:

In other words, I wanted to find an authoritative answer in the source code or on GitHub, but could not.

So I turned to another amazing site, Stack Overflow, and found this 2018 question there:

https://stackoverflow.com/questions/53284031/why-thread-sleep0-can-prevent-gc-in-rocketmq

The question is exactly the same as ours, and here is the answer it received:

This answer is not great, because I think it misses the point, but that doesn't matter. I can use it as a starting point and build on it.

Look at the first sentence of this answer: It does not.

The question arises: what is "it", and what does it "not" do?

"It" refers to the code we presented earlier.

"Does not" means that it does not prevent the GC thread from collecting garbage.

This answer says: The purpose of calling Thread.sleep(0) is to give the GC thread a chance to be selected by the operating system for garbage cleaning. The side effect of this is that it may run the GC more frequently, after all you have a chance to run the GC every 1000 iterations, but the benefit is that it prevents long garbage collections.

In other words, this code wants to "trigger" a GC rather than "avoid" one, or more precisely, it wants to "avoid" a very long GC. From this point of view, the comment in the code is misleading, or at least incomplete.

It does not prevent GC. It applies the idea of "breaking the work up, shaving peaks and filling valleys" to GC, and thereby prevents long GC pauses.

But think about it, when we program ourselves, we never have the idea of "this place should trigger a GC" under normal circumstances, right?

Because we know that for Java programmers, the virtual machine has its own GC mechanism. We don't need to manage memory by ourselves like writing C or C++. We only need to focus on business code, and we don't pay special attention to the GC mechanism.

Then comes the most crucial question in this article: why should we pay special attention to GC in the code here, and want to try to "trigger" GC?

Let me give the answer first: the safepoint.

For a description of safepoints, we can look at section 3.4.2 of "In-Depth Understanding of the Java Virtual Machine (Third Edition)":

Note the description in the book:

With safepoints in place, the user program cannot be stopped at an arbitrary position in the instruction stream to start garbage collection; instead, execution must reach a safepoint before it can be paused.

In other words: until the threads reach a safepoint, STW cannot happen, and therefore GC cannot start.

If you believed that the GC thread can run at any time, you need to update that belief.

Next, let's turn our attention to section 5.2.8 of the book: long pauses caused by safepoints.

It contains this passage:

I have pulled out the underlined part separately; read it carefully:

To avoid the heavy burden of too many safepoints, the HotSpot virtual machine also applies an optimization to loops. It assumes that a loop with a small iteration count will not run for long, so loops that use int or a narrower data type as the index are not given safepoints by default. Such a loop is called a Counted Loop. Correspondingly, a loop that uses long or a wider data type as the index is called an Uncounted Loop, and safepoints are placed in it.

This means that for a Counted Loop, the HotSpot virtual machine has made an optimization: the thread will not enter a safepoint until the loop is over.

Conversely, as long as the loop has not ended, the thread will not enter a safepoint, and the GC thread has to wait for that thread to finish its loop and reach a safepoint before it can start working.
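To make the two terms concrete, here is a small sketch of the two loop shapes the book distinguishes. The class and method names are hypothetical. Both methods compute the same sum; the only difference is the type of the index, which, by the book's description, decides whether HotSpot keeps safepoint checks inside the compiled loop. That decision is made by the JIT and varies between JVM versions, so nothing in this code can observe it directly.

```java
public class LoopKinds {

    // int index with a fixed bound: the shape the book calls a Counted Loop.
    static long sumWithIntIndex(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // long index: the shape the book calls an Uncounted Loop.
    static long sumWithLongIndex(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithIntIndex(1000));
        System.out.println(sumWithLongIndex(1000L));
    }
}
```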

What is a Counted Loop?

The case in the book comes from this link:

https://juejin.cn/post/6844903878765314061
HBase in practice: tracking down a long STW pause caused by a safepoint

If you have time, I suggest you read the whole case. I will excerpt only the part that solves the problem:

The while(i < end) in the screenshot is a counted loop. Since the thread executing this loop can only enter a safepoint after the loop ends, the threads that reached the safepoint first have to wait for it, which delays the GC thread.

Therefore, the fix is to change the int to long.

The principle is to turn it into an Uncounted Loop, so that the thread can enter a safepoint during the loop without waiting for the loop to end.

Now let's bring our attention back here:

This loop is also a counted loop.

The Thread.sleep(0) call looks inexplicable, but I can make a bold guess: did the author deliberately place a safepoint here so that the GC thread would not have to wait a long time, which would lengthen the stop-the-world pause?

So next I only need to find evidence that sleep enters a safepoint to prove my conjecture.

Guess what?

I originally intended to read the source code, but after a quick look I found the answer right in the source comments:

https://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/tip/src/share/vm/runtime/safepoint.cpp

The comment says that when a safepoint is requested, a Java thread may be in one of five different states, and each state is handled differently.

Originally I wanted to translate them one by one, but there is too much information and it is hard for me to digest, so I will skip that.

Let's focus on the second one, which is relevant to this article: Running in native code.

When returning from the native code, a Java thread must check the safepoint _state to see if we must block.

The first sentence is the answer: after running a native method, a thread must perform a safepoint check when it returns to Java code.

Meanwhile, I came across this answer by R on Zhihu, which contains a sentence that confirms this point:

https://www.zhihu.com/question/29268019/answer/43762165

Then, it's time to witness the miracle:

According to R, the thread that is executing the native function is regarded as "has entered the safepoint", or this situation is called "in the safe-region".

The sleep method is a native method. Do you think that's a coincidence?

So here we can be sure: a thread that calls the sleep method will enter a safepoint.
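If you want to check the "native" part on your own JDK, a small reflection sketch like the one below can help. The class name SleepIsNative is hypothetical. One assumption to flag: on older JDKs such as 8, Thread.sleep(long) itself carries the native modifier, while on newer JDKs the public method may delegate to a private native helper, so the sketch looks at every method on Thread whose name starts with "sleep" instead of only one signature.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class SleepIsNative {

    /** Returns true if any method on Thread whose name starts with "sleep" is declared native. */
    static boolean hasNativeSleep() {
        for (Method m : Thread.class.getDeclaredMethods()) {
            if (m.getName().startsWith("sleep") && Modifier.isNative(m.getModifiers())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // List the sleep-family methods and whether each one is native on this JDK.
        for (Method m : Thread.class.getDeclaredMethods()) {
            if (m.getName().startsWith("sleep")) {
                System.out.println(m + " -> native=" + Modifier.isNative(m.getModifiers()));
            }
        }
        System.out.println("any native sleep method: " + hasNativeSleep());
    }
}
```

The exact list printed differs between JDK versions, which is why no expected output is shown here.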

Also, I found a 2013 thread in which R discusses a similar issue:

https://hllvm-group.iteye.com/group/topic/38232?page=2

Here Thread.sleep(0) is called out by name.

This reminds me of an interview question I was once asked: what is Thread.sleep(0) good for?

At the time I thought: what a silly question. Now I realize that I was the one who wasn't good enough, and that the clown was me all along.

It really is useful.

Practice

All of the above is actually theory.

In this part, let's put it into practice with code, using the case from an article I shared earlier, "Absolutely amazing! This code was manipulated by the JVM!".

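import java.util.concurrent.atomic.AtomicInteger;

public class MainTest {

    // Shared counter incremented by both worker threads.
    public static AtomicInteger num = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable runnable = () -> {
            // int-indexed loop with a fixed bound: the Counted Loop shape discussed earlier.
            for (int i = 0; i < 1000000000; i++) {
                num.getAndAdd(1);
            }
            System.out.println(Thread.currentThread().getName() + " finished!");
        };

        Thread t1 = new Thread(runnable);
        Thread t2 = new Thread(runnable);
        t1.start();
        t2.start();
        // The main thread is expected to wake up after about one second.
        Thread.sleep(1000);
        System.out.println("num = " + num);
    }
}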

You can paste this code directly into IDEA and run it.

According to the code, the main thread should print the result after sleeping for 1000 ms, but in practice the main thread waits until t1 and t2 have finished before it continues.

This loop is a Counted Loop, as described earlier.

What happened to this program?

1. Two long, uninterrupted loops are started (with no safepoint checks inside).
2. The main thread goes to sleep for 1 second.
3. After 1000 ms, the JVM tries to stop all threads at a safepoint so that it can do periodic cleanup, but it cannot do so until the counted loops are complete.
4. The Thread.sleep method of the main thread returns from native code, finds that a safepoint operation is in progress, and suspends itself until the operation ends.

So, when we change int to long, the program behaves normally:
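A minimal sketch of that change, assuming the same MainTest program with only the index type altered. The class name MainTestLong and the spin helper are my own additions for illustration. Whether main really prints after roughly one second depends on the HotSpot version and JIT settings, so treat it as an experiment to run, not a guarantee.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MainTestLong {

    public static AtomicInteger num = new AtomicInteger(0);

    // Hypothetical helper (not in the original listing) so the loop can be called on its own.
    static void spin(long iterations) {
        // long index instead of int; this is the only change to the loop itself.
        for (long i = 0; i < iterations; i++) {
            num.getAndAdd(1);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable runnable = () -> {
            spin(1000000000L);
            System.out.println(Thread.currentThread().getName() + " finished!");
        };

        Thread t1 = new Thread(runnable);
        Thread t2 = new Thread(runnable);
        t1.start();
        t2.start();
        Thread.sleep(1000);
        // With the long index, this line is expected to print while t1 and t2 are still running.
        System.out.println("num = " + num);
    }
}
```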

Inspired by the RocketMQ source code, we can also directly take its code:
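Here is a sketch of what borrowing that trick could look like in the MainTest program. The class name MainTestSleep and the spin helper are hypothetical; only the pattern of calling Thread.sleep(0) every 1000 iterations comes from the RocketMQ snippet discussed earlier. The loop index stays an int.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MainTestSleep {

    public static AtomicInteger num = new AtomicInteger(0);

    // Hypothetical helper (not in the original listing) so the loop can be called on its own.
    static void spin(int iterations) {
        for (int i = 0; i < iterations; i++) {
            num.getAndAdd(1);
            // prevent gc
            if (i % 1000 == 0) {
                try {
                    Thread.sleep(0);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop counting.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable runnable = () -> {
            spin(1000000000);
            System.out.println(Thread.currentThread().getName() + " finished!");
        };

        Thread t1 = new Thread(runnable);
        Thread t2 = new Thread(runnable);
        t1.start();
        t2.start();
        Thread.sleep(1000);
        System.out.println("num = " + num);
    }
}
```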

This way, even if the loop index is of type int, the program executes as expected, because we have effectively inserted a safepoint into the loop body.

In addition, I compared the running time of the two approaches in a non-rigorous way:

I ran it a few times on my machine, and the time difference was small.
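If you want to repeat that kind of rough comparison yourself, a sketch along the following lines would do. Everything here is my own assumption: the class name RoughTiming, the helper names, the single-threaded setup, and the iteration count are all hypothetical, and this is not the test I ran. It uses plain System.nanoTime with no warm-up, so the numbers are only a loose indication, not a benchmark.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RoughTiming {

    static final AtomicInteger num = new AtomicInteger(0);

    // Approach 1: long loop index, no sleep.
    static void longIndexLoop(long iterations) {
        for (long i = 0; i < iterations; i++) {
            num.getAndAdd(1);
        }
    }

    // Approach 2: int loop index plus Thread.sleep(0) every 1000 iterations.
    static void intIndexLoopWithSleep(int iterations) {
        for (int i = 0; i < iterations; i++) {
            num.getAndAdd(1);
            if (i % 1000 == 0) {
                try {
                    Thread.sleep(0);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int iterations = 100_000_000; // hypothetical size, small enough to finish quickly
        System.out.println("long index:           " + timeMillis(() -> longIndexLoop(iterations)) + " ms");
        System.out.println("int index + sleep(0): " + timeMillis(() -> intIndexLoopWithSleep(iterations)) + " ms");
    }
}
```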

But when it comes to style points, the prevent gc version on the right still wins. Without 20 years of skill, it is impossible to write this line of "seemingly useless" code!