
Hello, I am why.

I don't know if you still remember, but I once wrote an article titled "A technical problem that has troubled me for 122 days. I seem to know the answer."

In that article I gave this example:

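import java.util.concurrent.TimeUnit;

public class VolatileExample {

    // Deliberately not volatile: that is the whole point of the example.
    private static boolean flag = false;
    private static int i = 0;

    public static void main(String[] args) {
        new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(100);
                flag = true;
                System.out.println("flag has been set to true");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).start();
        // Spin until the main thread observes the write to flag.
        while (!flag) {
            i++;
        }
        System.out.println("Program finished, i=" + i);
    }
}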

The program above never terminates normally, because the flag variable is not declared volatile.

During the 100 ms that the child thread sleeps, the flag read by the while loop is always false. After a certain number of iterations, the JVM's just-in-time (JIT) compiler kicks in and hoists the loop condition out of the loop, which results in an endless loop.
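To make the hoisting concrete, here is a rough source-level sketch of the transformation. It is only an illustration: the JIT rewrites machine code, not Java source, and the names HoistingSketch, asWritten and afterHoisting are made up for this example.

```java
// Conceptual sketch only; the names here are hypothetical.
class HoistingSketch {

    static boolean flag = false; // not volatile, as in the article
    static int i = 0;

    // The loop as it is written in the source: flag is read on every iteration.
    static int asWritten() {
        while (!flag) {
            i++;
        }
        return i;
    }

    // Roughly what the loop behaves like after the read of flag has been
    // hoisted out of it: flag is checked once, then the loop never looks again.
    static int afterHoisting() {
        if (!flag) {
            while (true) {
                i++;
            }
        }
        return i;
    }
}
```

If flag is already true on entry, both versions return immediately. If it is false on entry, only the first version can ever notice a later write.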

If, however, flag is declared volatile, its visibility is guaranteed and the hoisting is not performed.

One way to verify this is to turn the JIT off with -Xint or -Djava.compiler=NONE.
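For comparison, here is a minimal sketch of the volatile variant. The class name VolatileFlagDemo and the helper spinUntilFlagSet are hypothetical; the only change that matters is the volatile modifier on flag.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; only the volatile modifier differs from the article's example.
class VolatileFlagDemo {

    private static volatile boolean flag = false;
    private static long i = 0;

    // Spins until another thread sets flag, then returns the iteration count.
    static long spinUntilFlagSet(long delayMillis) {
        flag = false;
        i = 0;
        new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            flag = true; // volatile write, visible to the spinning thread
        }).start();
        while (!flag) { // volatile read on every iteration, cannot be hoisted
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println("Program finished, i=" + spinUntilFlagSet(100));
    }
}
```

The counter is a long here only so that it cannot overflow if the delay is made longer.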

But that is not the point. The point is that I then made a few small changes, and each of them changed how the code behaved.

I said this in the last section of that article:

The question "about Integer" mentioned in the picture is the "metaphysics" from that article:

Yes, I am back to fill in that hole.

Explore again

The reason I picked this problem up again is that in April someone sent me a private message asking whether I had reached a conclusion about the metaphysical Integer problem.

I can only say:

However, I then remembered a comment on that article:

Back then the official account had no comment feature, so I used a third-party mini program and did not pay much attention to comment notifications.

I only saw this expert's comment a long time after it was posted. I replied:

Thank you for the analysis. When I have time, I will analyze it according to this idea.

But then I shelved it, because I felt that digging any deeper would not pay off much.

Unexpectedly, after all this time, a reader came asking again.

So over the May Day holiday, I modified the program as the comment suggested and did a round of search-engine-driven research.

Hey, guess what?

I really researched something interesting.

Let me start with the conclusion: the final keyword affects the results of the program.

In the above case, where is the final keyword?

When we change the int in the program to Integer, the i++ operation involves boxing and unboxing. The corresponding source code is here:

And the value field set by new Integer(i) here is final.
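As a small sketch of what this means: i++ on an Integer amounts to unboxing, adding one, and boxing again through Integer.valueOf. The class and method names below are hypothetical, and the reflective check relies on the field inside Integer being named value, which is an OpenJDK implementation detail rather than part of the public API.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper class for illustration.
class BoxedIncrementSketch {

    // What i++ on an Integer boils down to: unbox, add one, box again.
    static Integer incrementLikeJavac(Integer i) {
        return Integer.valueOf(i.intValue() + 1);
    }

    // Checks whether the backing field of Integer is declared final.
    // Assumes the field is called "value", as it is in OpenJDK.
    static boolean integerValueFieldIsFinal() {
        try {
            Field value = Integer.class.getDeclaredField("value");
            return Modifier.isFinal(value.getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("Integer has no field named value", e);
        }
    }

    public static void main(String[] args) {
        Integer i = 0;
        i++; // boxing and unboxing happen here
        System.out.println(i.equals(incrementLikeJavac(0)));
        System.out.println(integerValueFieldIsFinal());
    }
}
```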

With that change the program ends normally, so it really is the final keyword that affects the result.

So how does final affect it?

Digging into this, I found that what I saw deviated somewhat from what the comment said.

The comment says that, because of the StoreStore barrier and the happens-before relationship, flag gets flushed to main memory.

With the help of search engines, my conclusion is instead that the code with final and the code without final are compiled into two different sets of machine code, which is why they behave differently.

But I have to add a premise here: the processor is an x86 architecture.

The test case behind this conclusion is as follows, written along the lines suggested in the comment:

The class contains a final field that is assigned in the constructor, and the while loop keeps creating new objects of that class:

My operating environment is:

  • jdk1.8.0_271
  • win10
  • IntelliJ IDEA 2019.3.4

The result of the operation is:

  • If the age field is declared final, the program exits normally.
  • If final is removed from the age field, the program loops forever and never exits.
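The original listing was an image, so here is a rough reconstruction of the test case from its description. Why, age, flag and new Why(18) come from the article; the outer class name FinalFieldLoopDemo and the exact layout are my assumptions. Be careful when running it: in the variant without final, and on newer JDKs in either variant, it may spin forever and have to be killed by hand.

```java
import java.util.concurrent.TimeUnit;

// Reconstruction sketch; the outer class name is hypothetical.
class FinalFieldLoopDemo {

    static class Why {
        // Remove "final" here to try the other variant.
        final int age;

        Why(int age) {
            this.age = age;
        }
    }

    private static boolean flag = false; // deliberately not volatile

    public static void main(String[] args) {
        new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            flag = true;
            System.out.println("flag has been set to true");
        }).start();

        // Keep allocating objects with a field assigned in the constructor.
        while (!flag) {
            new Why(18);
        }
        System.out.println("Program finished");
    }
}
```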

The animation is as follows:

You can also paste the code I gave above, run it, and see if the results are consistent with what I said.

Talk about final

Once I had reshaped the program into the form above, the conclusion was already obvious: the final keyword affects how the program runs.

In fact, I was very excited when I came to this conclusion. A problem that had troubled me for more than a year was finally about to be solved by my own hands.

We already have the conclusion. How hard can it be to find the reasoning behind it?

And I know where to find the answer, and the answer is hidden in a book on my desk.

So I opened "The Art of Concurrent Programming in Java", which has a section devoted to the memory semantics of final fields:

I remember this section well, because the word "overflow" in section 3.6.5 should be "escape". Years ago I wrote an article about exactly that:

"Tell the truth, I found that there is a mistake in this book!"

So all I needed was to find evidence in this section for the comment's claim that "the StoreStore barrier plus the happens-before relationship causes flag to be flushed to main memory".

However, things were far from as simple as I thought: instead of evidence supporting the claim, I found evidence that overturns it.

I will not reproduce large chunks of the book. Just focus on subsection 3.6.6, on how final semantics are implemented in the processor:

Note the underlined sentence: on x86 processors, reads and writes of final fields do not insert any memory barrier.

Since there is no memory barrier at all, the StoreStore barrier is omitted too. So on x86, the flush of flag that the memory semantics of final fields supposedly cause does not exist.

So the previous argument is incorrect.

So where does the book's conclusion that "on x86 processors, reads and writes of final fields do not insert any memory barrier" come from?

As it happens, it was our old friend Doug Lea who told the author.

You see, JSR-133 is mentioned in section 3.6.7. On the subject of JSR-133, the old master wrote this article: "The JSR-133 Cookbook for Compiler Writers".

http://gee.cs.oswego.edu/dl/jmm/cookbook.html

In this cookbook, there is a table like this:

It shows that on x86 processors, LoadStore, LoadLoad, and StoreStore are all no-ops, that is, no instruction is emitted for them.

On x86, any lock-prefixed instruction can be used as a StoreLoad barrier. (The form used in linux kernels is the no-op lock; addl $0,0(%%esp).) Versions supporting the "SSE2" extensions (Pentium4 and later) support the mfence instruction which seems preferable unless a lock-prefixed instruction like CAS is needed anyway. The cpuid instruction also works but is slower.

When I found this, I was almost stunned. The train of thought I had finally sorted out was blocked again.

Let me walk you through it.

We can state very clearly that the barrier (StoreStore) introduced by final is a no-op on x86 processors and cannot have any effect on memory visibility.

So why does the program stop after adding final?

The program stops, which means the main thread must have observed the change to flag, right?

So why can't the program stop after removing final?

The program does not stop, which means the main thread must not have observed the change to flag, right?

In other words, whether it stops or not is directly tied to whether final is present.

But the barrier introduced by a final field is a no-op on x86 processors.

Is this really metaphysics?

After going around in a circle, why am I back where I started?

This round really irritated me. I spent so much time going around, only to end up back here?

Fine. Let's just get on with it.

Stack Overflow

After the previous analysis, the conclusion given in the comment could not be verified.

But I already knew very clearly that the final keyword must be responsible.

So I decided to look around Stack Overflow to see whether I might stumble on something.

Sure enough, heaven rewards the persistent. I read probably hundreds of posts, and just as I was about to give up, I found one that shook me to the core.

After that shock, I gasped: good grief, is this a JVM bug?!

Hold that thought for a moment. Let me first explain how I searched Stack Overflow.

First of all, at that point the only keywords I could be sure of were Java and final.

But searching with these two keywords returned far too many results. After looking through a few, I realized this was looking for a needle in a haystack.

So I changed my strategy. Stack Overflow's search supports tags:

If I were to tag this question myself, the tags would be nothing more than Java, JVM, JMM, and JIT.

So I searched under the java-memory-model tag, which is JMM:

It was this treasure of a question that moved the plot forward:

https://stackoverflow.com/questions/57427531/in-java-what-operations-are-involved-in-the-final-field-assignment-in-the-cons

I know that when you read this, your heart did not skip a beat. Hearing that I was shaken to the core, you may even want to laugh.

But when I saw this question, it is no exaggeration to say that my hands were shaking.

Because I knew that the metaphysical problem could be solved here.

The reason I gasped is that the sample code in this question is almost exactly the same as mine. The Simple class in his code corresponds to Why in mine, and the thing he wants to verify is the same too.

The description in the question says this:

Actually, I know the storing "final" field would not emit any assembly instructions on x86 platform. But why this situation came out? Are there some particular operations I don't know ?

The truth

Below is an answer to the Stack Overflow question mentioned above, which is the science behind the metaphysics:

Let me translate it for you:

Brother, I looked at the screenshots in your question, and you are going about the investigation the wrong way.

What are the screenshots?

These are the two screenshots the asker attached to the question:

The final case looks like this:

And the screenshot of the non-final case:

As an aside, the screenshots come from JITWatch, a very powerful tool.

From your screenshots, although runMethod has been compiled, it is never actually executed. What you need to look at is the % mark in the assembly output, which denotes OSR (on-stack replacement).

If you don't know what OSR is, don't worry, we will get to it later.

The assembly generated with and without final is different. After compiling, I kept only the relevant parts:

The screenshot shows that without final, the assembly is literally an infinite loop. With final, the flag field is loaded on every iteration.

But you see, in both cases there is no allocation of a Simple instance, and no field assignment either.

So this is not about how the compiler handles final field assignment. It is about a compiler optimization.

In the whole process there is no Simple class at all, and no final field at all. Yet adding final does affect the result of the program.

This has been fixed in newer JVM versions (which implies it was a bug?).

So if you run the same code on JDK 11, the program never exits, whether or not you add final.

Okay, after all that, the reason is already very clear.

The root cause is that, in my example environment, the code with final and the code without final produce two different sets of machine code.

The deeper reason lies in the OSR mechanism.
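For readers who have not met OSR before, here is a minimal sketch of the kind of code that typically triggers it: a method that is called only once but contains a long-running loop, so HotSpot can only switch to compiled code in the middle of the loop. The class name OsrLoopSketch is hypothetical. Running it with the HotSpot option -XX:+PrintCompilation should show OSR compilations marked with %, though the exact output depends on the JVM version and settings.

```java
// Hypothetical demo class for observing on-stack replacement.
class OsrLoopSketch {

    // Sums the integers 0 .. n-1 with a plain counted loop.
    static long sumBelow(int n) {
        long sum = 0;
        for (int k = 0; k < n; k++) {
            sum += k;
        }
        return sum;
    }

    public static void main(String[] args) {
        // sumBelow is invoked exactly once, so the method itself never becomes
        // "hot" by invocation count. Only the loop's back edge does, which is
        // the situation on-stack replacement exists for.
        System.out.println(sumBelow(1_000_000_000));
    }
}
```

The spinning while loop inside main in the article's example is in the same situation: main is entered once and never returns while the loop runs.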

Verification

After the previous analysis, a new direction for the investigation has emerged.

I now have to verify whether the person who answered the question is talking nonsense.

So I first went to verify this sentence of his:

If you run the same example on JDK 11, there will be an infinite loop in both cases, regardless of the final modifier.

So I ran both variants, with and without final, on a newer JDK.

The program is indeed stuck in an endless loop.

The animation is as follows, you can see that my JDK version is 15.0.1:

The first check is done. The same code gives different results on JDK 8 and JDK 15 (in fact, the result on JDK 9 already differs from JDK 8).

I have reason to believe that this may be a JVM... well, not a bug, let's call it a defect. (Wait... isn't a defect a bug?)

The second point of verification is his sentence:

Instead, execution jumps from the interpreter to the OSR stub.

The result on JDK 8 differs because on-stack replacement takes place, so I can turn it off with the following option:

-XX:-UseOnStackReplacement

With final removed and this option set, I ran the program again, and it stopped.

The second check is done.

The third thing to verify is this part of his answer:

I also dumped the assembly for my own program to see whether it looks the same.

How do you get the assembly output?

Use the following options:

-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+LogCompilation -XX:LogFile=jit.log

You also need an hsdis DLL file. There are plenty of copies online and a search will turn one up. If you want to verify this yourself, I trust that finding the file will not be a problem.

Without final on the field, the assembly looks like this:

What does the jmp instruction do?

Jump unconditionally.

So, here is an endless loop.

With final on the field, the assembly looks like this:

The first jump is je instead of jmp.

The jump of je is conditional, which means "jump if equal".

Before the je instruction there is also a movzbl instruction, which reads the value of the flag variable.

So once final is added, the value of flag is read on every iteration, and the main thread sees the change to flag in time.

I also took a look in JITWatch. For the new Why(18) statement in the loop, the compiler worked out that the statement has no effect, so it was optimized away:

That is why the assembly contains no instructions that allocate a Why object, which confirms his statement:

You see, in both cases there is no Simple instance allocation at all, and no field assignment either.

With that, the metaphysical problem has a scientific explanation.

If you have read this far, congratulations: you have learned yet another piece of useless knowledge.

If you want to learn something useful that is related to this article, I suggest taking a look at these:

  • Section 3.6 of "The Art of Concurrent Programming in Java": the memory semantics of final fields.
  • Part 4 of "In-depth understanding of the Java virtual machine": program compilation and code optimization.
  • "In-depth analysis of Java virtual machine HotSpot": Chapter 7, Compilation Overview; Chapter 8, C1 Compiler; Chapter 9, C2 Compiler.
  • Chapter 10 of "Java Performance Optimization Practice": Understanding Just-in-Time Compilation.

After reading these, you will at least understand the two stages a Java program goes through: compilation from source code to bytecode, and then from bytecode to native machine code.

You will understand the JVM's hot-code detection, HotSpot's just-in-time compilation, what triggers compilation, and how to observe and analyze JIT compilation data and results from outside the JVM.

You will also learn about some compiler optimization techniques, such as method inlining, tiered compilation, on-stack replacement, branch prediction, escape analysis, lock elimination, lock inflation, and so on. This is knowledge you will basically never use, but knowing it makes you look sophisticated.

I also recommend this column by R大:

https://www.zhihu.com/column/hllvm

This article in the column is a treasure:

https://zhuanlan.zhihu.com/p/25042028

For example, R大 has answered a question about the on-stack replacement (OSR) that this article involves:

To put it bluntly, OSR is very useful for benchmarks, but not so much for normal programs:

It includes this passage:

The JIT optimizes the code very aggressively.

Coming back to our article: whether or not the final keyword is present appears to produce two different sets of machine code, but in essence the final keyword prevents the JIT from applying an aggressive optimization.


why技术