
Hello, I'm Why (歪歪), a homebody who has been past the community gate exactly once in the last thirteen days.

This article closes a loose end. When I write, I do sometimes go back and tie up threads from earlier articles, but thanks to procrastination there's usually a gap of several months.

This time the loose end is fresh: after the previously published "Without 20 Years of Skill, I Couldn't Write This 'Seemingly Useless' Line of Code!", a great many readers asked the same question:

First of all, I'm very grateful to everyone who reads my articles, and especially to those who read with their own thinking and raise valuable questions along the way. That kind of feedback keeps me going.

I honestly hadn't considered this question when I wrote the piece, so when it was suddenly raised I had a rough guess at the reason, but since I hadn't verified it, I didn't dare answer off the cuff.

So I went digging for the answer. Let's start with the conclusion:

It comes down to the JIT compiler. Because the code in the loop body is identified as hot code, after JIT compilation the safepoint poll around the getAndAdd call is optimized away, so the thread can never reach a safepoint inside the loop body.

Yes, optimized away. Typing that word feels a little brutal.

Next, in this "sequel", I'll show you how I arrived at that conclusion. But to ease you into the story, let me first give a quick recap of "Part One".

Also, a heads-up: this is the kind of knowledge point you may never run into in an entire career. That's why I file it under my "utterly useless" series. Just sit back and enjoy.

Well, in the previous article, I gave a test case like this:

import java.util.concurrent.atomic.AtomicInteger;

public class MainTest {

    public static AtomicInteger num = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable runnable = () -> {
            for (int i = 0; i < 1000000000; i++) {
                num.getAndAdd(1);
            }
            System.out.println(Thread.currentThread().getName() + " finished!");
        };

        Thread t1 = new Thread(runnable);
        Thread t2 = new Thread(runnable);
        t1.start();
        t2.start();
        Thread.sleep(1000);
        System.out.println("num = " + num);
    }
}

Going by the code, the main thread should print the result after sleeping 1000ms, but in reality the main thread waits until t1 and t2 have finished before continuing.

The result of running is this:

In fact, I buried an "easter egg" here: you can paste this code and run it directly, but if your JDK version is above 10, the result will differ from what I described above.

Judging from the results, there are still many people who have discovered this "easter egg":

So when you read an article, if you get the chance to verify things yourself, you may be rewarded with unexpected discoveries.

For this mismatch between the program's behavior and expectations, the first fix is this:

Change int to long and you're done. As for why, it has been explained in the previous article, so I won't repeat it here.
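For reference, here is a runnable sketch of that fix: the previous test case with the loop counter widened to long. The class name and the daemon-thread tweak (so the demo JVM can exit promptly) are my additions; everything else follows the original:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MainTestLong {

    public static AtomicInteger num = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable runnable = () -> {
            // A long loop counter makes this an "uncounted" loop,
            // so C2 keeps a safepoint poll on the back edge.
            for (long i = 0; i < 1_000_000_000L; i++) {
                num.getAndAdd(1);
            }
            System.out.println(Thread.currentThread().getName() + " finished!");
        };

        Thread t1 = new Thread(runnable);
        Thread t2 = new Thread(runnable);
        // Daemon threads so this demo exits right after main returns
        // (the original article used non-daemon threads).
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        Thread.sleep(1000);
        // Prints after about 1 second, while t1/t2 are still looping.
        System.out.println("num = " + num);
    }
}
```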

The key is the second fix below, which is what all the controversy revolves around.

Inspired by the RocketMQ source code, I modified the code to look like this:
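The modified code was a screenshot that isn't reproduced here, but based on the RocketMQ pattern being referenced, and the Thread.sleep(0) line discussed later in this article, the change plausibly looked something like the sketch below. The `i % 1000` interval, the class name, and the daemon flags are my guesses, not the original code:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MainTestSleepZero {

    public static AtomicInteger num = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable runnable = () -> {
            for (int i = 0; i < 1_000_000_000; i++) {
                num.getAndAdd(1);
                // RocketMQ-style trick: a periodic Thread.sleep(0) gives the
                // thread a chance to reach a safepoint, even though the
                // surrounding int loop is a "counted loop".
                if (i % 1000 == 0) {
                    try {
                        Thread.sleep(0);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
            System.out.println(Thread.currentThread().getName() + " finished!");
        };

        Thread t1 = new Thread(runnable);
        Thread t2 = new Thread(runnable);
        t1.setDaemon(true); // daemon so this demo exits promptly after main
        t2.setDaemon(true);
        t1.start();
        t2.start();
        Thread.sleep(1000);
        System.out.println("num = " + num); // prints after about 1 second
    }
}
```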

The running results show that even though the for loop's counter is an int, the program now executes as expected.

Why?

Because in the last episode, on the subject of sleep, I drew the following two conclusions from the material I consulted:

  • 1. A thread executing a native function can be regarded as "already at a safepoint".
  • 2. Since the sleep method is native, a thread calling sleep enters a safepoint.

The argument was clear, the reasoning sound, the facts plain, and so the last episode ended there...

Until many readers asked this question:

But isn't the bottom layer of num.getAndAdd also a native method call?

Yes. Like sleep, it is ultimately a native method call, which fits the earlier conclusion perfectly. So why doesn't it enter a safepoint? Why is it treated differently?

A bold hypothesis

When I saw the question, my first reaction was to pin the blame on the JIT. After all, apart from the JIT, I genuinely had no other ideas.

Why did the JIT come to mind immediately?

Because this line of code inside the loop is textbook hot code:

num.getAndAdd(1);

According to the description in "Understanding the Java Virtual Machine", hot code falls into two main categories:

  • Methods that are invoked many times.
  • Loop bodies that are executed many times.

The former is easy to understand: the more often a method is called, the more its code runs, so it naturally becomes "hot code".

The latter covers the case where a method is invoked only once or a few times but contains a loop with a huge number of iterations, so the loop body's code also runs many times and should likewise count as "hot code". Obviously, that is exactly our example.

In our example the loop body triggers hot-code compilation, and although the loop body is only part of the method, the compiler still has to take the entire method as the compilation unit: the compilation target is the whole method body, not the loop body alone.

Since both kinds of hot code compile the whole method body, what's the difference?

The difference lies in the execution entry point (instead of starting from the method's first bytecode instruction), and the Bytecode Index (BCI) of that entry point is passed in at compile time.

Because this compilation happens while the method is executing, that is, while the method's stack frame is still on the stack and the method gets swapped out in place, the technique is vividly named On-Stack Replacement (OSR).
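As a small illustration of the second category of hot code, here is a sketch of a method whose loop would typically trigger an OSR compilation. The class and method names and the loop bound are mine, chosen for illustration; actual OSR thresholds depend on the VM version and flags:

```java
public class OsrDemo {

    static long hotLoop(int n) {
        long sum = 0;
        // With default HotSpot settings, enough back edges within a single
        // invocation trip the back-edge counter and trigger an OSR compilation
        // of hotLoop(), entered at the loop header's bytecode index (BCI)
        // rather than at the start of the method.
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Called once, but with many iterations: hot by the second criterion.
        System.out.println(hotLoop(100_000_000));
    }
}
```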

OSR sounds vaguely familiar, doesn't it? After all, it occasionally shows up in interviews as one of those (pretentiously) advanced questions.

That's really all there is to it.

Well, so much for the concept. If you want the full story, read the "Compilation Objects and Trigger Conditions" section of the book.

My main purpose here is to establish one point: the virtual machine performs optimizations on hot code.

With that groundwork laid, I could reasonably hypothesize two things:

  • 1. Since the bottom layer of num.getAndAdd is also a native method call, there should originally be a safepoint.
  • 2. Since the virtual machine identifies the num.getAndAdd loop as hot code, it applies a round of optimization, and the optimization eliminates the safepoint that should have been there.

Careful verification

Verification is actually simple. Didn't I just claim JIT optimization was the culprit? Then I only have to turn the JIT off and run again, and the conclusion reveals itself.

If, with the JIT off, the main thread continues after sleeping 1000ms, what does that prove?

It proves the loop body can now reach a safepoint, and the program behaves as expected.

So what is the result?

The JIT can be turned off with this parameter:

-Djava.compiler=NONE

Then run the program again:

As you can see, with the JIT off, the main thread no longer waits for the child threads to finish before printing num. The effect is equivalent to changing int to long as mentioned above, or to adding code like Thread.sleep(0).

So were my two earlier hypotheses confirmed?

OK, now here's the problem. "Bold hypothesis, careful verification" is the ideal, but all I did was flip one parameter to switch off the JIT. I saw the effect, yet it felt like something was missing in the middle.

What is missing?

What I had verified was the statement: after JIT optimization, the safepoint that should exist is eliminated.

But that sentence is far too general. What do things actually look like before and after JIT optimization? Can we see exactly where the safepoint disappears?

You can't just say it's gone; it has to be seen to be gone.

And, what a coincidence,

I happened to know how to look at this "before and after".

There's a tool called JITWatch that does just that.

https://github.com/AdoptOpenJDK/jitwatch

If you haven't used this tool before, look up a tutorial. It's not the focus of this article, so I won't walk through it; it's just a tool, nothing complicated.

I paste the code into JITWatch's sandbox:

Then click Run, and you end up with an interface like this.

The left is the Java source code, the middle is the Java bytecode, and the right is the assembly instruction after JIT:

The parts I've boxed are the different assembly outputs from the JIT's tiered compilation.

Among them, the C2 output is the fully optimized, high-performance code, and it differs from the C1-compiled assembly in many places.

If you've never touched this area before, it's fine not to understand it; after all, it won't be on the exam.

I show these screenshots simply so you know that I can now obtain the assembly instructions before and after optimization. There are many differences between them, so which differences should I pay attention to?

It's like being handed two texts and asked to spot the differences: easy enough. But among the many differences, which one do we actually care about?

This is the key question.

I didn't know either, but I found the article below, and it led me to the truth.

The key article

Well, enough beating around the bush. This article right here is the key:

http://psy-lob-saw.blogspot.com/2015/12/safepoints.html

Because it was in this article that I found which "difference" to look for after JIT optimization.

The article's title is "Safepoints: Meaning, Side Effects and Overheads":

The author is a heavyweight named nitsanw. Judging by his blog, he has deep expertise in the JVM and performance optimization, and the article above was published there.

Here is his github address:

https://github.com/nitsanw

His avatar is a yak, so I'll call him Brother Niu; after all, he really is that good.

Brother Niu also works at Azul, a colleague of R大 (RednaxelaFX):

His article covers safepoints very thoroughly, but there's a lot of it and I can't go through everything, so I'll only sketch the parts most relevant to this article. I really do strongly recommend reading the original. It is itself split into two parts; here is the address of part two:

http://psy-lob-saw.blogspot.com/2016/02/wait-for-it-counteduncounted-loops.html

Read it and you'll know what "thorough" really means:

Brother Niu's article is divided into the following subsections:

  • What's a Safepoint?
  • When is my thread at a safepoint?
  • Bringing a Java Thread to a Safepoint
  • All Together Now (a few worked examples)
  • Final Summary And Testament (summary and parting advice)

The section relevant to this article is "Bringing a Java Thread to a Safepoint".

Let me unpack it for you:

This passage says, in essence, that a Java thread polls a "safepoint flag" at intervals; if the flag says "please proceed to a safepoint", the thread puts itself into the safepoint state.

Polling, however, has a cost, so the JVM tries to keep safepoint polls to a minimum; the points at which a safepoint poll is emitted are therefore chosen very deliberately.

Since polling has come up, a word about the sleep time in our sample code:

Some readers shortened it, to 500ms or 700ms for example, and found that the program ended normally.

Why?

Because the polling interval is governed by the -XX:GuaranteedSafepointInterval option, which defaults to 1000ms:

So when your sleep is much shorter than 1000ms, your sleep is over before the safepoint poll has even been triggered, and of course you won't observe the main thread waiting.

Anyway, that was an aside. Back to Brother Niu's article: he says that, weighing the various factors, safepoint polls can be placed at the following locations:

First place:

Between any 2 bytecodes while running in the interpreter (effectively)

That is: when running in the interpreter, a safepoint poll can (effectively) occur between any two bytecodes.

To understand this sentence you need to know about interpreted mode; here's the diagram from earlier:

As the figure shows, the interpreter and the compilers complement each other.

You can also force the virtual machine to run in "interpreted mode" with the -Xint startup flag:

We can simply try this flag:

The program stops normally. Why?

As just quoted:

Safepoint polling is possible between any 2 bytecodes when running in interpreter mode.

Second place:

On 'non-counted' loop back edge in C1/C2 compiled code

That is: on the back edge of each iteration of a "non-counted" loop in C1/C2-compiled code.

I covered "counted loops" versus "uncounted loops" in the last episode and demonstrated the difference there, namely that changing int to long turns the "counted loop" into an "uncounted loop", so I won't repeat it.

In any case, we know this statement checks out.
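To recap the distinction in code, here is my own illustrative sketch; whether a given loop is actually compiled as a counted loop ultimately depends on the JIT and the JDK version:

```java
public class LoopShapes {

    // A typical "counted loop": int induction variable, constant stride,
    // compared with < against a loop-invariant bound. C2 may elide the
    // safepoint poll on its back edge.
    static long countedSum(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // An "uncounted loop": the induction variable is a long, so the
    // back-edge safepoint poll is kept.
    static long uncountedSum(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(countedSum(10));
        System.out.println(uncountedSum(10L));
    }
}
```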

Third place:

The first half of the sentence: "Method entry/exit (entry for Zing, exit for OpenJDK) in C1/C2 compiled code."

That is: at method entry or exit in C1/C2-compiled code (entry for Zing, exit for OpenJDK).

The first half is easy to understand: for the OpenJDK we normally use, even after JIT optimization a safepoint poll is still placed at the method's exit.

The real focus is the second half of the sentence:

Note that the compiler will remove these safepoint polls when methods are inlined.

That is: when methods are inlined, the compiler removes these safepoint polls.

Isn't that exactly the situation in our example code?

There was originally a safepoint, but it got optimized away. So this scenario really does exist.

Reading further, we arrive at the "difference" I had been looking for:

Brother Niu says that if you want to see the safepoint polls, add this startup parameter (note that on HotSpot it is a diagnostic option, so you also need -XX:+UnlockDiagnosticVMOptions plus the hsdis disassembler plugin):

-XX:+PrintAssembly

Then search the output for the following keywords:

  • On OpenJDK, look for {poll} or {poll_return}; these annotate the safepoint poll instructions.
  • On Zing, look for the tls.pls_self_suspend instruction.

This is how it works:

I did find those keywords, but the console spat out far too much assembly to analyze directly.

No matter. What matters is that I now had the key marker: {poll}

In other words: if the unoptimized assembly contains a {poll} instruction, but the fully optimized code after JIT, that is, the C2-stage assembly mentioned above, contains no {poll}, that proves the safepoint really has been wiped out.

So in JITWatch, when I view the C1-stage compilation of the for loop (the hot code), I can indeed see a {poll} instruction:

But when I select the C2-stage output, the {poll} instruction is nowhere to be found:

Next, I change the code to the version shown earlier that ends normally:

If it ends normally, the loop body can reach a safepoint, which means there should be a {poll} instruction.

So, looking at the C2 assembly in JITWatch again, there it is:

Why?

Judging from the final assembly, the presence of the Thread.sleep(0) line stops the JIT from applying its most aggressive optimizations.

So why does sleep stop the JIT from optimizing too aggressively?

OK, stop asking. Asking any further would be impolite.

Brother Niu's cases

Brother Niu's article presents the following five cases, each with corresponding code:

  • Example 0: Long TTSP Hangs Application
  • Example 1: More Running Threads -> Longer TTSP, Higher Pause Times
  • Example 2: Long TTSP has Unfair Impact
  • Example 3: Safepoint Operation Cost Scale
  • Example 4: Adding Safepoint Polls Stops Optimization

I'll mainly show you case 0 and case 4; they're very interesting.

Case 0

Its code looks like this:

public class WhenWillItExit {
  public static void main(String[] argc) throws InterruptedException {
    Thread t = new Thread(() -> {
      long l = 0;
      for (int i = 0; i < Integer.MAX_VALUE; i++) {
        for (int j = 0; j < Integer.MAX_VALUE; j++) {
          if ((j & 1) == 1)
            l++;
        }
      }
      System.out.println("How Odd:" + l);
    });
    t.setDaemon(true);
    t.start();
    Thread.sleep(5000);
  }
}

Brother Niu described the code like this:

He says the program should exit after 5 seconds, but in practice it keeps running and won't stop unless you force-kill it, for example with kill -9.

But when I pasted the code into IDEA and ran it, the program stopped after 5 seconds, which was slightly awkward.

I suggest you paste it in and run it too.

Why does my result differ from what Brother Niu described?

This question was also asked in the comments section:

So Brother Niu wrote the follow-up post, explaining in detail why:

http://psy-lob-saw.blogspot.co.za/2016/02/wait-for-it-counteduncounted-loops.html

Put simply: he ran it in Eclipse, and Eclipse compiles with its own compiler (ECJ) rather than javac.

Compiler differences lead to bytecode differences, which in turn lead to different runtime behavior:

Then, after walking through the analysis, Brother Niu gave this piece of code:

The only difference from the previous code is that, before the child thread calls the countOdds method, the main thread first calls it 100,000 times.

With that change, the code no longer stops after 5 seconds; it has to be force-killed.
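Brother Niu's exact follow-up code isn't reproduced here, but based on the description above, the transformation plausibly looks like the sketch below. The method name countOdds and the 100,000-call warm-up come from the text; the method body, bounds, and class name are my reconstruction, so check his post for the real code:

```java
public class WhenWillItExit2 {

    // The inner counting loop, extracted into a method.
    static long countOdds(int limit) {
        long l = 0;
        for (int j = 0; j < limit; j++) {
            if ((j & 1) == 1) {
                l++;
            }
        }
        return l;
    }

    public static void main(String[] args) throws InterruptedException {
        // Warm-up: ~100,000 calls on the main thread, so countOdds is fully
        // JIT-compiled before the child thread ever touches it.
        for (int i = 0; i < 100_000; i++) {
            countOdds(1_000);
        }
        Thread t = new Thread(() -> {
            long l = 0;
            for (int i = 0; i < Integer.MAX_VALUE; i++) {
                l += countOdds(Integer.MAX_VALUE);
            }
            System.out.println("How Odd:" + l);
        });
        t.setDaemon(true);
        t.start();
        Thread.sleep(5000);
        // If the compiled countOdds is inlined into the counted loop above,
        // its exit poll disappears too, and the JVM may be unable to reach a
        // safepoint to shut down: the process can keep running until killed.
    }
}
```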

Why?

Don't ask; the answer is in his follow-up post. Read it yourself, it's written in great detail.

In the follow-up, Brother Niu also thoughtfully lists the six kinds of loops he identified as "counted loops". I recommend studying them carefully:

Case 4

This case is a benchmark, and Brother Niu said it grew out of a Netty issue:

Why does Netty suddenly come up?

Brother Niu gave a hyperlink:

https://github.com/netty/netty/pull/3969#issuecomment-132559757

There's a lot of discussion in that PR, and one point of contention was whether the loop counter should be an int or a long.

One participant wrote a benchmark that appeared to show no difference between int and long:

Note: for the convenience of the screenshots, I trimmed out that participant's benchmark when capturing. If you want his benchmark code, you can find it through the link above.

Then this gentleman with the thick hair straight-up used a summoning jutsu on Brother Niu:

About a day later, Brother Niu posted a very, very detailed reply; again, I've only captured part of it:

He opened by saying that the earlier benchmark was somewhat flawed, which is why the two looked similar: look at the scores of the benchmark he wrote, and the gap is huge.

The benchmark Brother Niu mentions there is our case 4.

So you can study this case alongside Netty's very long PR and see what professionalism looks like.

Finally, once more: the two articles by Brother Niu mentioned above are well worth reading carefully.

Also, on the safepoint source code, I've shared an article before that makes a good companion read (it tastes better together): "The Little Things About Safepoints".

I've only pointed out the path; the rest of the road is yours to walk. It's dark out and the road is slippery, the lamps are dim. Watch your step, don't dig in too deep, and turn back in time. Amitabha!

Finally, thank you for reading my article. You're welcome to follow my public account [why技术], where my articles are published across the network.

This article is an entry in the SegmentFault (思否) technical essay contest; you're welcome to join too.
