
Hello, I'm why.

I was hammering away at my keyboard that day:

Suddenly a message popped up on WeChat, sent to me by a reader.

I clicked on it:

Ah, that familiar flavor. One glance and I knew: HashMap, the starting point of every stock interview-question ("八股文") journey.

But the question he asked didn't seem to be one of the standard HashMap interview questions:

Why is the member variable table assigned to the local variable tab here?

As we all know, table is a member variable of HashMap, and the data you put into the map is stored in this table:
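In JDK 8, that field is declared like this (a single line in the HashMap source):

    transient Node<K,V>[] table;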

In the putVal method, table is first assigned to the local variable tab, and the rest of the method operates on that local variable.

In fact, it's not just putVal. In the HashMap source there are as many as 14 occurrences of writes like tab = table, for example the same usage in getNode:
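For reference, the opening of getNode in JDK 8 looks roughly like this (abridged; the bucket-walking body is omitted):

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // table is copied into the local variable tab; everything after this
    // works against tab instead of re-reading the field
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // ... check 'first', then walk the rest of the bucket ...
    }
    return null;
}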

Let's think about it for a second: if we skip the local variable tab and operate on table directly, would anything go wrong?

In terms of logic and functionality, not at all.

If anyone else had written it this way, I would have put it down to personal coding habit.

But this was written by Doug Lea, and I had a vague feeling there must be a deeper reason behind it.

So why write it this way?

Coincidentally, I had a pretty good idea what the answer was.

Because I had seen this trick of assigning a member variable to a local variable elsewhere, and in that place a comment explains exactly why it is written that way.

That other place is Java's String class:

For example, the trim method of String assigns the value field (the char[] that backs the string) to a local variable val inside the method.
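In JDK 8 (where String is still backed by a char[]), trim looks roughly like this:

public String trim() {
    int len = value.length;
    int st = 0;
    char[] val = value;    /* avoid getfield opcode */

    while ((st < len) && (val[st] <= ' ')) {
        st++;
    }
    while ((st < len) && (val[len - 1] <= ' ')) {
        len--;
    }
    return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}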

Then there is a very brief comment right next to that line:

avoid getfield opcode

The story of this article starts from that one-line comment and traces all the way back to 2010, where I finally found the answer to the question.

A one-line comment, just to avoid the getfield opcode.

I didn't know what that meant yet, but at least I had a few keywords, a thread to pull on. The rest was simple: just follow the thread.

And my intuition told me this was an extreme bytecode-level optimization, some kind of show-off trick.

So let me give you the conclusion first: this code was indeed written by Doug Lea, and it was indeed an optimization at the time. But times have changed, and today it no longer buys you anything.

The answer is hidden in the bytecode

Since we're talking about bytecode, the obvious next step is to compare the bytecode generated by the two styles and see the difference for ourselves.

So first, a test program like this:

 public class MainTest {

    private final char[] CHARS = new char[5];

    public void test() {
        System.out.println(CHARS[0]);
        System.out.println(CHARS[1]);
        System.out.println(CHARS[2]);
    }

    public static void main(String[] args) {
        MainTest mainTest = new MainTest();
        mainTest.test();
    }
}

The test method in the above code, after compiling into bytecode, looks like this:
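Running javap -c on MainTest produces something like the following for test() (the constant-pool indices are illustrative and may differ on your machine):

public void test();
  Code:
     0: getstatic     #2   // Field java/lang/System.out:Ljava/io/PrintStream;
     3: aload_0
     4: getfield      #3   // Field CHARS:[C
     7: iconst_0
     8: caload
     9: invokevirtual #4   // Method java/io/PrintStream.println:(C)V
    12: getstatic     #2
    15: aload_0
    16: getfield      #3   // second read of the CHARS field
    19: iconst_1
    20: caload
    21: invokevirtual #4
    24: getstatic     #2
    27: aload_0
    28: getfield      #3   // third read of the CHARS field
    31: iconst_2
    32: caload
    33: invokevirtual #4
    36: return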

You can see that the three println calls correspond to three repetitions of a bytecode sequence like this:

Just look up a JVM bytecode instruction table online and you can tell what each of these instructions does:

  • getstatic: fetch a static field of the specified class and push it onto the top of the operand stack
  • aload_0: push local variable slot 0 (a reference, which is this in an instance method) onto the top of the stack
  • getfield: fetch an instance field of the specified object and push its value onto the top of the stack
  • iconst_0: push the int constant 0 onto the top of the stack
  • caload: load the char at the specified index of a char array and push it onto the top of the stack
  • invokevirtual: invoke an instance method

If I rewrite the test program in the local-variable style described above and regenerate the bytecode, it looks like this:
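A sketch of what the modified test method could look like:

public void test() {
    char[] chars = CHARS;   // read the field once into a local variable
    System.out.println(chars[0]);
    System.out.println(chars[1]);
    System.out.println(chars[2]);
}

And the bytecode javap produces for it (constant-pool indices again illustrative):

public void test();
  Code:
     0: aload_0
     1: getfield      #2   // Field CHARS:[C, the only getfield in the method
     4: astore_1            // store it into local variable slot 1
     5: getstatic     #3   // Field java/lang/System.out:Ljava/io/PrintStream;
     8: aload_1
     9: iconst_0
    10: caload
    11: invokevirtual #4   // Method java/io/PrintStream.println:(C)V
    14: getstatic     #3
    17: aload_1             // aload_1 replaces aload_0 + getfield
    18: iconst_1
    19: caload
    20: invokevirtual #4
    23: getstatic     #3
    26: aload_1
    27: iconst_2
    28: caload
    29: invokevirtual #4
    32: return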

As you can see, the getfield bytecode appears only once.

From three times to once, this is the specific meaning of "avoid getfield opcode" written in the comments.

It really does reduce the generated bytecode, so in theory it is an extreme bytecode-level optimization.

As for the getfield instruction itself: it fetches a field of the specified object and pushes the field's value (or reference) onto the top of the operand stack.

More concretely, in our example getfield is what reads the CHARS field of MainTest.

The underlying point is that without a local variable taking over, every getfield has to go back to the object on the heap.

With a local variable taking over, you fetch the field once, "cache" it in the local variable table of the current stack frame, and from then on each access is just an aload_<n> that loads the local variable onto the operand stack.

And aload_<n> is a lighter-weight operation than getfield.

This can also be seen from the length of the description of these two instructions in the JVM documentation:

https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.getfield

I won't go deeper; the point to take away is this: assigning a member variable to a local variable before operating on it is indeed an optimization technique, and it does achieve the stated goal of "avoid getfield opcode".

Reading this, you might be getting a little itchy: this looks clever, can I pull the same move?

Hold on. There's more to the story; I'm not done yet.

stackoverflow

You can actually see this pattern in many places in Java: the HashMap and String we just mentioned, and if you read the source of the java.util.concurrent package carefully, a lot of it is written this way.

But there is also plenty of code that is not written this way.

For example, there is such a question on stackoverflow:

The asker wanted to know: why doesn't BigInteger use the "avoid getfield opcode" trick that String's trim method uses?

The answer below says this:

In the JVM, String is a very important class, and this tiny optimization might improve startup speed a little. BigInteger, on the other hand, doesn't matter for JVM startup.

So if you read this article and feel like pulling the same trick in your own code, think twice.

Wake up: with the little traffic your service gets, is it really worth optimizing at this level?

Besides, earlier I showed the bytecode-level difference, and seeing is believing.

But this answerer reminded me of something:

He brings up the JIT: these tiny optimizations are usually unnecessary. They only shrink the method's bytecode a little, and once the code gets hot enough to be JIT-compiled, they make no real difference to the final generated machine code.

So I rummaged around stackoverflow and finally found the most valuable thread among thousands of clues.

This question is exactly the same as what the readers at the beginning of the article asked me:

https://stackoverflow.com/questions/28975415/why-jdk-code-style-uses-a-variable-assignment-and-read-on-the-same-line-eg-i

The asker said: in the JDK source code, more specifically in the collections framework, there is a small coding habit of assigning a variable to a local variable before reading it in an expression. Is this just a harmless little quirk, or is there something more important hidden in it that I'm missing?

Then, someone else added a few words for him:

This code was written by Doug Lea, who often produces unexpected code and optimizations. He is famous for these "inexplicable" snippets; you get used to it.

Then the answer to this question goes like this:

Doug Lea, one of the main authors of the collections framework and the concurrency packages, tends to bake optimizations into his code. These optimizations can be counterintuitive and confusing to ordinary programmers.

After all, he operates on a whole different level.

Then he gave a piece of code containing three methods, to show the different bytecode produced by the different styles:

The three methods are as follows:
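The exact code from that answer isn't reproduced here; the sketch below (with hypothetical class and field names) just illustrates the three styles being compared:

public class ThreeStyles {

    private int[] data = new int[16];

    // testSeparate: copy the field into a local on its own line, then use the local
    int testSeparate() {
        int[] d = data;
        return d[0] + d[1] + d[2];
    }

    // testInlined: fold the assignment into the expression, the HashMap style
    int testInlined() {
        int[] d;
        return (d = data)[0] + d[1] + d[2];
    }

    // testRepeated: read the field directly every single time
    int testRepeated() {
        return data[0] + data[1] + data[2];
    }
}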

I won't post the corresponding bytecode, just say the conclusion:

  • The testSeparate method uses 41 instructions
  • The testInlined method is indeed a tad smaller, with 39 instructions
  • The testRepeated method uses a whopping 63 instructions

Same functionality, but the last style, reading the member variable directly every time, generates the most bytecode.

And he reached the same conclusion I did:

This way of writing does save a few bytes of bytecode, which is probably why it's used.

but...

The key part is the "but" that comes next:

However, whichever way you write it, once the method is JIT-compiled, the resulting machine code bears little relation to the original bytecode.

One thing is for sure: all three versions of the code will eventually compile to the same machine code (assembly).

So his advice: don't use this style. Just write "dumb" code that is easy to read and maintain. When the day comes that you genuinely need such "optimizations", you'll know.

Notice that he attached a hyperlink to "write dumb code", which I highly recommend reading:

https://www.oracle.com/technical-resources/articles/javase/devinsight-1.html

In it, you can see Brian Goetz, author of "Java Concurrency in Practice":

His interpretation of the "dumb code" thing:

He said: Generally, the way to write fast code in a Java application is to write "dumb code" -- code that is simple, clean, and follows the most obvious object-oriented principles.

Obviously, tab = table is not "dumb code".

OK, back to the question. The answerer went on to run further tests, with these results:

He compared the JIT-compiled assembly of the testSeparate and testInlined methods, and the two are identical.

Note, though, that what he is comparing here are testSeparate and testInlined, both of which use a local variable:

It's just that testSeparate is much more readable than testInlined.

testInlined is written in the HashMap style.

So his point is: we programmers should focus on writing readable code rather than pulling these flashy tricks; the JIT will do this work for us.

From the name testInlined you can also guess what it is: the style where the assignment is folded inline into the expression.

He also noted that this style provides a (very limited, but sometimes convenient) form of "thread safety": it ensures that the length of an array (like the tab array in HashMap's getNode method) does not change while the method is executing.

Why didn't he mention the testRepeated method, which we care more about?

He also mentioned this in his answer:

He made "a minor correction/clarification" of a previous statement.

What does that mean? Literally, a small correction or clarification. In my words: I overstated things earlier and got called out, so here comes my walk-back.

What did he say earlier?

He had said: no need to even look, the final assembly generated by these three methods is bound to be exactly the same.

But now what he says is:

it can not result in the same machine code
That is, it cannot be guaranteed to produce the same assembly.

Finally, he also pointed out another benefit of this style, beyond the bytecode-level saving:

Once n is assigned here, it will not change for the rest of getNode. If the array length were read directly from the field and some other thread were modifying the HashMap at the same time, getNode could observe that change partway through the method.
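A hypothetical illustration (not JDK code) of that "snapshot" effect:

class Holder {
    volatile int[] data = new int[8];

    int sumSnapshot() {
        int[] d = data;           // capture the array reference once
        int n = d.length;         // n is now fixed for the rest of the method,
        int sum = 0;              // even if another thread replaces 'data'
        for (int i = 0; i < n; i++) {
            sum += d[i];
        }
        return sum;
    }

    int sumDirect() {
        int sum = 0;
        // 'data' is re-read from the field on every access; if another thread
        // swaps in a shorter array mid-loop, the length check and the element
        // read may see different arrays
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }
}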

I'm sure everyone knows this little point already; it's quite intuitive, so I won't dwell on it.

But even so, we still have not quite nailed down the answer to the original question.

So let's keep digging.

keep digging

The clue for digging further actually appeared earlier:

Following that link brings us here:

https://stackoverflow.com/questions/2785964/in-arrayblockingqueue-why-copy-final-member-field-into-local-final-variable

Look at the code I've boxed and you'll see that the question raised here is the same as before.

Why bring it up again?

Because it is just a stepping stone; what I really want is the answer below it:

There are two things in this answer that caught my attention.

The first is the answer itself. It says: this is an extreme optimization that Doug Lea, the author of this class, likes to use. Here is a hyperlink; go take a look, it will answer your question well.

The hyperlink mentioned here has a story behind it:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2010-May/004165.html

But before I get to that story, I want to talk about the comment below this answer, the part I boxed.

This comment makes a pointed remark: "extreme" needs to be emphasized! This is not some universally good style that everyone should imitate.

Having hung around stackoverflow for years, I know the place is full of hidden masters. As a rule, people who speak with that much confidence turn out to be heavyweights.

So I looked up the commenter's profile, and sure enough, a heavyweight:

He works at Google and has contributed to many projects, including Guava, which we all know, and not as an ordinary developer but as a lead developer. He has also contributed to Google's Java style guide.

So his words carry real weight and are worth listening to.

Then, we go to the hyperlink with the story.

The thread starts with a question from someone named Ulf Zibis:

Ulf asks: in the String class I often see member variables copied into local variables. Why do this kind of caching? Is it just distrust of the JVM? Can anyone enlighten me?

Ulf's question is the same one this article is about, and he asked it in 2010, which is the earliest mention of the issue I could find.

So keep in mind: the email exchange below took place 12 years ago.

In the dialogue, there is a more official answer to this question:

The person who answered him is Martin Buchholz, also a JDK developer and a colleague of Doug Lea. He, too, shows up in the book "Java Concurrency in Practice":


A JDK concurrency guru from Sun. Intimidating, right?

He said: this is a coding style originated by Doug Lea. It is an extreme optimization and probably unnecessary; you can expect the JIT to perform the same optimization. But for very low-level code like this, it is also rather nice to write code that sits closer to the machine.

They went back and forth on this for several rounds:

At the bottom of each email there is a link like this that you can click to follow the rest of the discussion:

The one worth reading is this email in which someone named Osvaldo pushes back at Martin:

https://mail.openjdk.java.net/pipermail/core-libs-dev/2010-May/004168.html

Osvaldo wrote quite a lot, mainly to push back on Martin's claim that "this is an extreme optimization and may not be necessary; you can expect the same optimization from the JIT".

He said he ran experiments and found that the optimization makes no difference for HotSpot running in Server mode, but matters a lot for HotSpot running in Client mode: in his test case, this way of writing gave a 6% performance improvement.

Then he said that the code he writes now, and will write for the next few years, will mostly run on HotSpot in Client mode, so please don't touch the optimized code that Doug wrote on purpose. Thank you very much.

He also mentioned JavaME and JavaFX Mobile & TV, which is why I have to remind you again: this conversation happened 12 years ago, and those technologies are already ancient history in my eyes. Pass.

Well, I can't say I never saw them. After all, I played games written in JavaME back in junior high school.

Despite Osvaldo's rather heated tone, Martin responded graciously:

Martin said: thank you for the tests. I have adopted this coding style in my own code, but what I keep struggling with is whether to push other people to do the same, because I think this is something we can optimize at the JIT level.

The last email came from David Holmes.

Coincidentally, his name can also be found in the book "Java Concurrency in Practice".

He is one of its authors, and my point in introducing him is that his words carry a lot of weight:

His email is effectively the final word on this question.

Here is my own loose translation of the whole thing. He said:

I've referred this issue to hotspot-compiler-dev for them to follow up.

I know that the reason Doug wrote it this way back then is that the compilers of the day did not perform the corresponding optimization, so he wrote it like this to help the compiler along. But I believe this has long been handled, at least by C2. If C1 does not handle it, I think it should.

Finally, my suggestion about this style of writing: at the Java level, you should not code this way.

There should not be a need to code this way at the Java-level.

At this point the question has been sorted out completely.

The first conclusion: this way of writing is not recommended.

Second, what Doug wrote back then really was an optimization, but as compilers evolved, this optimization sank down into the compiler, which now does it for us.

Finally, if the C1 and C2 mentioned above mean nothing to you, let me put it another way.

C1 is the Client Compiler: short compile times, but less heavily optimized output.

C2 is the Server Compiler: longer compile times, but more highly optimized output.

Osvaldo, from earlier, mainly ran on the client compiler, that is, C1. That is why David Holmes kept saying: C2 already optimizes this; if C1 does not, it can be followed up. And so on.

A quick word on C2. Remember it if you can, and don't worry if you can't; this stuff rarely comes up in interviews anyway.

Many of the "aggressive" optimizations the JVM performs for us to improve performance, such as method inlining, fast/slow path analysis, and peephole optimization, are done by C2.

In addition, JDK 10 introduced the Graal compiler, which was designed to replace C2.

As for why C2 needs replacing, well, you can read about one of the reasons at this link...

http://icyfenix.cn/tricks/2020/graalvm/graal-compiler.html

C2 has a very long history, dating back to Cliff Click's Ph.D. work. Although this compiler, written in C++, is still effective, it has become so complicated that even Cliff Click himself is unwilling to keep maintaining it.

Look back at the characteristics of C1 and C2 I described above: they complement each other nicely.

So, to strike a balance between startup and response time on one hand and peak performance on the other, from JDK 6 onward the JVM supports a mode called tiered compilation.

This is also the fundamental reason, and the theoretical backing, behind the sayings that "Java code runs faster the longer it runs" and "Java code needs to warm up".

Here I quote from section 7.2.1, "Tiered Compilation", of the book "In-depth Understanding of Java Virtual Machine: HotSpot", so you can get a rough idea of what it is.

First, tiered compilation can be enabled with -XX:+TieredCompilation. It introduces four compiled tiers on top of interpreted execution:

  • Level 0: interpreted execution.
  • Level 1: C1 compilation with all optimizations enabled, no profiling. (Profiling means collecting runtime information about the program while it executes.)
  • Level 2: C1 compilation with invocation-count and back-edge-count profiling only (limited profiling).
  • Level 3: C1 compilation with full profiling information.
  • Level 4: C2 compilation.

The common transitions between these tiers are as follows:

  • 0→3→4: the common path. Compile fully with C1 (with full profiling), then move up to level 4 if the method keeps executing frequently enough.
  • 0→2→3→4: C2 is busy. Compile quickly at level 2 first, switch to level 3 once enough profiling information has been collected, and finally move to level 4 when C2 is no longer busy.
  • 0→3→1 / 0→2→1: after compiling at level 2/3, the method turns out not to matter much, so it is dropped to level 1; a method also goes to level 1 if C2 fails to compile it.
  • 0→(3→2)→4: C1 is busy at level 3, so the compilation task can either wait for C1 or quickly compile at level 2, and then move from level 2 to level 4.
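If you want to watch tiered compilation happen, here is a minimal sketch (my own addition, not from the quoted book). Run a hot loop with the standard HotSpot flag -XX:+PrintCompilation and look for the tier number (1 to 4) next to each compiled method in the log; the class name and iteration count below are arbitrary:

// Run with:  java -XX:+TieredCompilation -XX:+PrintCompilation WarmupDemo
public class WarmupDemo {

    static int mix(int x) {
        return (x * 31) ^ (x >>> 7);
    }

    public static void main(String[] args) {
        int acc = 0;
        for (int i = 0; i < 1_000_000; i++) {   // hot enough to climb the tiers
            acc = mix(acc + i);
        }
        System.out.println(acc);
    }
}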

If you had never heard of tiered compilation before, that's fine; now you have the concept.

Again, this won't come up in interviews, so don't worry.

Well, congratulations on making it this far. Looking back, what did you learn?

Yep, nothing, except one useless piece of trivia.

This article was first published on the WeChat public account why技术. Please credit the source and include a link when reprinting.

