Interviewer: Today I'd like to talk with you about the Java memory model. Do you understand it?
Candidate: Sure, let me briefly share my understanding. I'll start with why the Java memory model exists in the first place.
Interviewer: Go ahead, the floor is yours.
Candidate: Then let me begin with some background.
Candidate: 1. Modern computers are usually multi-core, and each core has its own cache. Caches were born from the speed gap between the CPU and main memory; the L1 and L2 caches are generally private to each core.
Candidate: 2. To improve efficiency, the processor may execute the incoming instructions out of order, which is the so-called "instruction reordering".
Candidate: 3. Modifying a value is often not atomic (for example, i++ is actually split into multiple instructions when the machine executes it).
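Candidate: To make point 3 concrete, here is a minimal sketch (class name and loop bounds are my own, purely illustrative) of how the non-atomic i++ loses updates under two threads:

```java
public class NonAtomicIncrement {
    private static int counter = 0; // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // really a read, an add, and a write: three steps
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The two threads' read-modify-write steps interleave,
        // so this usually prints less than the expected 200000.
        System.out.println(counter);
    }
}
```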
Candidate: In a purely single-threaded world, none of the above causes any problem, because a single thread means no concurrency. Moreover, within a single thread the compiler, runtime, and processor must all obey as-if-serial semantics, and obeying as-if-serial means they will never reorder operations that have data dependencies between them.
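Candidate: A tiny fragment (variable names are mine) to illustrate the rule:

```java
class AsIfSerial {
    static int demo() {
        int a = 1;      // (1)
        int b = 2;      // (2) no data dependence on (1): may be swapped with it
        int c = a + b;  // (3) depends on (1) and (2): never moved ahead of them
        return c;       // always 3 within this thread, whatever the reordering
    }
}
```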
Candidate: So, for efficiency, the CPU gained caches, instruction reordering, and so on, and the whole architecture became complicated. The programs we write naturally want to make "full" use of those CPU resources, and so we turned to multithreading.
Candidate: Multithreading means concurrency, and concurrency means we have to consider thread safety.
Candidate: 1. Inconsistent cached data: multiple threads modify a "shared variable" at the same time, yet each core's cache is "not shared". How do we synchronize data between the caches and main memory?
Candidate: 2. Under multithreading, CPU instruction reordering can make the code execute in an unexpected order and ultimately produce wrong results. The sketch below shows problem 1 in action.
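Candidate: A hedged demonstration (class and field names are mine): on many JVMs and platforms this program never terminates, because the worker keeps reading a stale copy of the flag:

```java
public class VisibilityDemo {
    private static boolean stop = false; // deliberately NOT volatile

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait; the JIT/CPU may cache or hoist the read of stop,
                // so the write below may never be observed by this thread
            }
            System.out.println("worker stopped");
        });
        worker.start();
        Thread.sleep(100);
        stop = true; // this write may never become visible to the worker
    }
}
```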
Candidate: For the "cache inconsistency" problem, the CPU has its own solutions. Two are commonly known:
Candidate: 1. The "bus lock": while one core is modifying data, no other core can modify that data in memory. (Think of it as exclusive access to memory: as long as one CPU is modifying, the other CPUs have to wait for it to release the lock.)
Candidate: 2. A cache coherence protocol, most famously MESI (there are actually many such protocols; MESI is simply the one you have probably seen). MESI stands for Modified, Exclusive, Shared, and Invalid.
Candidate: I think of a cache coherence protocol as a "cache lock": it locks at the granularity of a "cache line", and a cache line is simply the smallest unit of cache storage.
Interviewer: Well...
Candidate: Roughly, the MESI protocol works like this: before a CPU reads a shared variable, it first checks the state of that data: is it Modified, Shared, Exclusive, or Invalid?
Candidate: If it is Exclusive, the data this CPU is about to read is the latest, and no other CPU is reading it at the same time.
Candidate: If it is Shared, the data is still the latest; other CPUs are reading it at the same time, but nobody has modified it.
Candidate: If it is Modified, the current CPU is in the middle of modifying the value. It sends the other CPUs a notification to mark their copies Invalid; once they respond (changing their copies from Shared to Invalid), the current CPU writes the cached data back to main memory and changes its own state from Modified to Exclusive.
Candidate: If it is Invalid, the data has been changed elsewhere, and the latest value must be re-read from main memory.
Candidate: In essence, the MESI protocol inspects the "state" of a cache line and applies a different strategy for each state. The key point is that when one CPU modifies data, it must "synchronously" notify the other CPUs: "I have modified this data; your copies can no longer be used."
Candidate: Compared with the "bus lock", the MESI protocol locks at a much finer granularity, so its performance is certainly higher.
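Candidate: As a toy model only (real coherence protocols live in hardware, not in Java code), the four states can be written down like this:

```java
/** Toy enumeration of the MESI cache-line states; purely illustrative. */
enum CacheLineState {
    MODIFIED,  // this core changed the line; main memory is stale
    EXCLUSIVE, // only this core holds the line, and it matches main memory
    SHARED,    // several cores hold identical, unmodified copies
    INVALID    // this copy is stale; re-read from main memory before use
}
```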
Interviewer: But as far as I know, the CPU went on to optimize even further. Do you know about that?
Candidate: Yes, I know a little about it.
Candidate: As described above, when a CPU modifies data it has to "synchronously" tell the other CPUs and wait for them to acknowledge the invalidation before it can write the cached data to main memory.
Candidate: Synchronous means waiting, and waiting means doing nothing. The CPU is certainly not happy with that, so it got optimized again.
Candidate: The optimization idea is to go from "synchronous" to "asynchronous".
Candidate: When modifying, the CPU still tells the other CPUs, but now it writes the newly modified value into a "store buffer", notifies the other CPUs to remember to change their state, and then returns straight away to do other work. Only when the responses from the other CPUs come back does it flush the data into the cache.
Candidate: When the other CPUs receive an invalidation notice, they likewise just put the message into an "invalidate queue". As soon as it is written to the queue, they immediately reply to the modifying CPU that the state has been set to Invalid.
Candidate: But asynchrony brings new problems. Suppose the CPU has modified the value of A, written it to the "store buffer", and gone off to do other things. If the CPU then receives an instruction that needs to read or modify A again, the last modified value is still sitting in the "store buffer" and has not yet reached the cache.
Candidate: So when the CPU reads, it first checks whether the "store buffer" holds the value; if it does, the value is taken from there directly, and only if it does not is the cache/main memory read. This is called "store forwarding".
Candidate: With that, the first problem caused by asynchrony is solved: when the same core writes and then reads, the asynchronous path could otherwise return a stale value, so the "store buffer" is consulted first.
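Candidate: A toy sketch of store forwarding (entirely my own model; real store buffers are hardware structures invisible to Java):

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one core's read path with a store buffer; illustrative only. */
class ToyCore {
    private final Map<String, Integer> storeBuffer = new HashMap<>(); // pending writes
    private final Map<String, Integer> cache = new HashMap<>();       // committed data

    void write(String address, int value) {
        // Buffer the write and return immediately, instead of waiting
        // for the other cores to acknowledge the invalidation.
        storeBuffer.put(address, value);
    }

    int read(String address) {
        // Store forwarding: check our own pending writes first,
        // and only then fall back to the cache / main memory.
        Integer pending = storeBuffer.get(address);
        return (pending != null) ? pending : cache.getOrDefault(address, 0);
    }
}
```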
Interviewer: Anything else?
Candidate: Of course. If "asynchrony" causes problems when the same core reads and writes a shared variable, it will certainly also cause problems when "different" cores read and write one.
Candidate: Say CPU1 has modified the value of A, written the new value to its "store buffer", and notified CPU2 to invalidate its copy of the value. CPU2 may not have processed the invalidation notice yet and simply carries on with other work, so CPU2 reads the old value.
Candidate: Even once CPU2 has processed the invalidation notice, CPU1's value may still not have been written back to main memory, so when CPU2 re-reads from main memory it still gets the old value...
Candidate: And variables often have dependencies between them (a = 1; b = 0; b = a), which the CPU cannot perceive...
Candidate: In general, because of the CPU's asynchronous optimization of the "cache coherence protocol" through the "store buffer" and the "invalidate queue", a later instruction may well fail to see the result of an earlier one (the order in which instructions take effect is not the order of the code). This phenomenon is commonly called "CPU out-of-order execution".
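Candidate: A classic hedged illustration of that disorder (class and field names are mine):

```java
class ReorderingDemo {
    static int a = 0;
    static boolean flag = false;

    static void writer() { // imagine this runs on core 1
        a = 1;             // may linger in core 1's store buffer...
        flag = true;       // ...while this write becomes visible first
    }

    static void reader() { // imagine this runs on core 2
        if (flag) {
            // Without barriers this can print 0: core 2 observes flag == true
            // before it observes the earlier write to a.
            System.out.println(a);
        }
    }
}
```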
Candidate: To solve this disorder (which can also be understood as a visibility problem: modifications are not synchronized to the other CPUs in time), the concept of the "memory barrier" was introduced.
Interviewer: Well...
candidate : The "memory barrier" is actually to solve the problem of "asynchronous optimization" leading to "CPU out-of-order execution"/"cache not visible in time", how can it be solved? Well, it is to "disable" "asynchronous optimization" off (:
candidate : Memory barriers can be divided into three types: write barriers, read barriers, and universal barriers (including read and write barriers). The barrier can be simply understood as: when operating data, insert a "special instruction" into the data ". As long as this instruction is encountered, the previous operation must be "completed".
candidate : The write barrier can be understood like this: When the CPU finds a write barrier instruction, it will flush all write instructions that existed in the "store Buffer" "before" the instruction into the cache.
candidate : In this way, the data modified by the CPU can be immediately exposed to other CPUs, achieving the effect of "write operation" visibility.
candidate : The read barrier is also similar: when the CPU finds a read barrier instruction, it will process all the instructions that existed in the "invalid queue" "before" the instruction
candidate : In this way, you can ensure that the current CPU cache state is accurate, and the "read operation" must be the latest effect.
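Candidate: Incidentally, since JDK 9 Java exposes barrier-like fences directly through java.lang.invoke.VarHandle. A minimal sketch (field names are mine; illustrative, not a recommended pattern):

```java
import java.lang.invoke.VarHandle;

class FenceSketch {
    static int data = 0;
    static boolean ready = false;

    static void writer() {
        data = 42;
        VarHandle.releaseFence(); // "write barrier": earlier writes cannot move below it
        ready = true;
    }

    static void reader() {
        if (ready) {
            VarHandle.acquireFence(); // "read barrier": later reads cannot move above it
            System.out.println(data); // with the fences in place, expected to print 42
        }
    }
}
```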
Candidate: Different CPU architectures have different cache hierarchies, different cache coherence protocols, different reordering strategies, and different memory-barrier instructions. To simplify the work of Java developers, Java encapsulates a set of specifications on top of all this, and that set of specifications is the "Java Memory Model".
Candidate: In more detail, the "Java Memory Model" aims to shield the memory-access differences of the various kinds of hardware and operating systems, so that a Java program has consistent memory-access behavior on every platform. Its purpose is to solve the atomicity, visibility (cache coherence), and ordering problems of multithreading.
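Candidate: The place where Java developers usually meet all of this is the volatile keyword, which the JMM defines in terms of exactly these kinds of barriers. A minimal sketch (class and field names are mine):

```java
class VolatilePublish {
    private int payload;                    // plain field
    private volatile boolean ready = false; // volatile: the JMM inserts the barriers

    void publish() {
        payload = 42; // ordinary write...
        ready = true; // ...published by the volatile write (release semantics)
    }

    void consume() {
        if (ready) {                     // volatile read (acquire semantics)
            System.out.println(payload); // happens-before guarantees it sees 42
        }
    }
}
```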
Interviewer: What about the specification and contents of the Java memory model?
Candidate: Not today, I'm afraid that chat would last a whole afternoon. Next time?
Summary of this article:
- The three main causes of concurrency problems are visibility, ordering, and atomicity.
- Visibility: the CPU architecture has caches, and each core's L1/L2 cache is private (not visible to the other cores).
- Ordering: there are three main sources of reordering:
  - Compiler optimizations (the compiler may adjust and reorder code statements as long as the semantics of the single-threaded program are unchanged).
  - Instruction-level parallelism (the CPU itself may reorder instructions natively).
  - Memory-system reordering (the store buffer / invalidate queue buffers that likely exist in the CPU architecture; this kind of "asynchrony" can effectively reorder instructions).
- Atomicity: a single Java statement often requires multiple CPU instructions to complete (i++). Because of operating-system thread switching, the i++ may be interrupted partway through while another thread operates on the shared variable i "in the middle", so the final result is not what we expected. (A standard fix is sketched below.)
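A hedged example of the usual fix for the i++ problem, using java.util.concurrent.atomic (class name is mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

class SafeCounter {
    private final AtomicInteger count = new AtomicInteger(0);

    void increment() {
        count.incrementAndGet(); // one atomic read-modify-write (CAS-based)
    }

    int get() {
        return count.get();
    }
}
```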
At the CPU level, to solve the "cache consistency" problem there are corresponding "locks", namely the "bus lock" and the "cache lock".
- The bus lock locks the bus itself, so that only one CPU at a time can modify a shared variable.
- The cache lock locks a cache line; the MESI protocol is the well-known example. It marks the state of each cache line and achieves visibility and ordering of cache-line data through "synchronous notification".
- But "synchronous notification" hurts performance, so memory buffers (the store buffer / invalidate queue) were introduced to make it "asynchronous" and improve CPU efficiency.
- Once those buffers are introduced, "visibility" and "ordering" problems appear. Most of the time we can simply enjoy the benefits of "asynchrony", but in the few cases that demand strong "visibility" and "ordering", the only option is to "disable" the cache optimization.
- The CPU-level mechanism for "disabling" the optimization is the "memory barrier": read barriers, write barriers, and full barriers. Essentially, a barrier instruction forces all buffered operations (store buffer / invalidate queue) issued before it to be processed, so that reads and writes become visible and ordered at the CPU level.
- Different CPUs have different architectures and optimizations. To shield the many differences in how hardware and operating systems access memory, Java proposed the "Java Memory Model" specification, which guarantees that Java programs accessing memory on any platform get consistent results.
Welcome to follow my WeChat official account [Java3y] to chat about Java interviews. The Online Interviewer series is being continuously updated!
[Online Interviewer - Mobile] series: updated twice a week!
[Online Interviewer - Computer] series: updated twice a week!
Original content is not easy!! Likes, bookmarks, and shares are much appreciated!!