thread safety issues
Thread unsafety means that a class produces incorrect or unpredictable results when it is used by multiple threads concurrently.
The three main thread safety concerns are: atomicity, visibility, and ordering.
atomicity
An operation on shared variables is atomic if, to every thread other than the one performing it, the operation appears indivisible; we then call it an atomic operation and say the operation has atomicity.
Two prerequisites for an atomicity problem (shared variables + multiple threads)
- Atomicity concerns operations on shared variables. It is irrelevant for local variables, because a local variable lives in a thread's own stack frame and can never be accessed by another thread.
- Atomicity only matters in a multi-threaded environment; a single-threaded program has no thread safety problems.
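The local-variable point can be illustrated with a minimal Java sketch (class and method names are illustrative, not from the original):

```java
public class LocalVsShared {
    // A local variable lives in each thread's own stack frame, so two
    // threads running this method can never interfere with each other.
    static int localOnly() {
        int local = 0;
        for (int i = 0; i < 1000; i++) {
            local++;              // read-modify-write, but thread-confined
        }
        return local;             // always 1000, regardless of other threads
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> System.out.println(localOnly()));
        Thread b = new Thread(() -> System.out.println(localOnly()));
        a.start(); b.start();
        a.join();  b.join();      // both threads print 1000 every run
    }
}
```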
What "indivisible" means for an atomic operation
- For an operation on shared variables, every other thread sees the operation either as not yet started or as already finished; no thread can observe an intermediate result.
- Accesses to the same set of shared variables cannot interleave.
- Two ways to achieve atomic operations: 1. locks 2. the processor's CAS (compare-and-swap) instructions
Locks are generally implemented in software, while CAS is implemented in hardware.
In the Java language, writes to the two primitive types long/double are not guaranteed to be atomic, while reads and writes of the other six primitive types are. Declaring a long/double variable with the volatile keyword makes its reads and writes atomic.
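Both approaches, plus the volatile long note, can be sketched as follows (names are illustrative; AtomicLong delegates to the hardware's atomic CAS-style instructions):

```java
import java.util.concurrent.atomic.AtomicLong;

public class AtomicityDemo {
    // 1. Lock: synchronized makes the read-modify-write of count++ indivisible.
    static int lockedCount = 0;
    static synchronized void lockedIncrement() { lockedCount++; }

    // 2. CAS: AtomicLong uses the processor's atomic instructions.
    static final AtomicLong casCount = new AtomicLong();

    // volatile makes the 64-bit write itself atomic (no torn writes),
    // though `total++` as a whole is still a non-atomic read-modify-write.
    static volatile long total = 0L;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                lockedIncrement();
                casCount.incrementAndGet();
            }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        a.join();  b.join();
        System.out.println(lockedCount);    // 20000 every run
        System.out.println(casCount.get()); // 20000 every run
    }
}
```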
visibility
In a multi-threaded environment, when one thread updates a shared variable, threads that subsequently read that variable may not see the updated value immediately, or may never see it. This phenomenon is called the visibility problem.
The processor does not read and write main memory directly; reads and writes pass through components such as registers, the write buffer, caches, and the invalidation queue.
CPU ==> write buffer ==> cache ==> invalidation queue
(the caches of different processors are linked by a cache coherence protocol)
- Cache synchronization: although one processor cannot read another processor's cache directly, it can obtain the contents of other processors' caches through the cache coherence protocol (e.g. MESI) and update its own cache with what it reads. This process is called cache synchronization.
Causes of the visibility problem
- Shared variables in a program may be allocated to processor registers. Each processor has its own registers, and their contents cannot be accessed by other processors. If two threads run on different processors and each keeps the shared variable in its own registers, one thread will never see the other's updates to the variable, which causes a visibility problem.
- Even if the shared variable is stored in main memory, the processor reads it through its cache. When processor A updates the shared variable, the result must pass through its write buffer before reaching the cache. If processor B accesses the variable while the result is still only in A's write buffer, there is also a visibility problem, because a write buffer cannot be read by other processors.
- Even after the update has propagated from one cache toward another processor's cache, the receiving processor may place the corresponding message in its invalidation queue; until that queue is processed, the processor still reads a stale value of the shared variable, which again causes a visibility problem.
How visibility is guaranteed
- Flushing the processor cache: when a processor updates a shared variable, its update must eventually be written to the cache or to main memory.
- Refreshing the processor cache: when a processor is about to read a shared variable that other processors may have updated, it must refresh its view, reading the latest value from another processor's cache or from main memory into its own cache.
The role of volatile
- It tells the JIT compiler that the volatile-modified variable may be shared by multiple threads, so the JIT avoids optimizations on it that could make the program misbehave.
- When a volatile-modified variable is read, the processor cache is refreshed first; when a volatile-modified variable is written, the processor cache is flushed afterwards.
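A minimal sketch of this volatile read/write pairing, using a hypothetical stop flag (names are illustrative): without volatile on running, the reader could keep using a stale cached value and spin forever; with it, the write is flushed and each read is refreshed.

```java
public class VisibleFlag {
    // volatile: the write to `running` is flushed so other threads see it,
    // and every read of `running` refreshes the reader's view.
    static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) { /* spin until the update becomes visible */ }
            System.out.println("stopped");
        });
        reader.start();
        Thread.sleep(100);
        running = false;   // volatile write: guaranteed to become visible
        reader.join();     // with volatile, this join always completes
    }
}
```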
Does a single processor have visibility problems?
Yes, it can. On a single processor, multi-threading is implemented through context switches. When a switch occurs, the register contents of the outgoing thread are saved and cannot be accessed by the thread that runs next, so when a shared variable is held in a register a visibility problem can still arise.
ordering
Concept: the order in which the processor performs memory operations differs from the order specified by the object code.
Reordering occurs in the following situations:
- The bytecode sequence produced by the compiler differs from the order in the source code.
- The execution order of the machine instructions differs from the bytecode/object-code order.
- The object code is executed in order, but other processors perceive its execution order incorrectly.
For example, processor A performs operation a and then operation b, but from processor B's point of view processor A performed b first; this is a perception error.
Reordering generally falls into two categories: instruction reordering and storage subsystem reordering (memory reordering).
Reordering is an optimization of memory access operations. It does not affect the correctness of a single-threaded program, but it can affect the correctness of a multi-threaded program.
instruction reordering
For the sake of performance, the compiler adjusts the order in which instructions execute, without affecting the (single-threaded) correctness of the program, so the execution order may differ from the source code order.
The Java platform has two compilers:
- The static compiler (javac) translates Java source code into bytecode (.class files); it performs essentially no instruction reordering.
- The dynamic compiler (JIT) compiles Java bytecode into machine code at run time; instruction reordering happens frequently at this stage.
For the sake of execution efficiency, modern processors often do not execute instructions in program order but dynamically adjust the order, so that whichever instruction is ready first executes first; this is called out-of-order execution. Before being written to registers or main memory, the results of these instructions are held in a reorder buffer, which then commits them to registers or main memory in program order. Out-of-order execution therefore does not affect the correctness of a single thread's result, but it can produce unexpected results in a multi-threaded environment.
Storage Subsystem Reordering (Memory Reordering)
    Processor-0                Processor-1
    data = 1;      // S1
    ready = true;  // S2       while (!ready) { }          // L3
                               System.out.println(data);   // L4
Assume no instruction reordering occurs on either processor. Processor-0 executes the program in the order S1, S2, but Processor-1 may perceive S2 as having executed first. Processor-1 can then execute L3 and L4 before it perceives S1, and the program prints 0 for data, which is a thread safety problem.
The situation above is memory reordering between S1 and S2.
as-if-serial semantics
Reordering is not an arbitrary reshuffling of instructions and memory operations by the compiler and processor; it follows certain rules.
Compilers and processors that follow these rules give a single-threaded program the illusion of executing in sequence, which is called as-if-serial semantics.
To preserve as-if-serial semantics, statements that have data dependencies between them are never reordered, while statements without data dependencies may be reordered.
In the following example, statement ③ depends on statements ① and ②, so it cannot be reordered with them, but ① and ② have no data dependency on each other and may be reordered:
    float price = 59.0f;               // statement ①
    short quantity = 5;                // statement ②
    float subTotal = price * quantity; // statement ③
Statements with only a control dependency between them are allowed to be reordered, as below: count++ has a control dependency on flag, and in pursuit of efficiency the processor may speculatively execute count++ before the value of flag is known.
    if (flag) {
        count++;
    }
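Returning to the Processor-0/Processor-1 example, a common fix (a sketch, not from the original text) is to declare ready volatile, which forbids reordering S1 with S2 and guarantees that the write to data is visible by the time ready reads true:

```java
public class SafePublication {
    static int data = 0;
    // volatile forbids reordering the write to `data` (S1) past the
    // write to `ready` (S2), and makes S1 visible once S2 is observed.
    static volatile boolean ready = false;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 1;                      // S1
            ready = true;                  // S2 (volatile write)
        });
        Thread reader = new Thread(() -> {
            while (!ready) { }             // L3 (volatile read)
            System.out.println(data);      // L4: prints 1, never 0
        });
        reader.start();
        writer.start();
        writer.join();
        reader.join();
    }
}
```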
Are uniprocessor systems affected by reordering?
1. Reordering during static compilation can affect the results of a uniprocessor system.
    Processor-0                Processor-1
    data = 1;      // S1
    ready = true;  // S2       while (!ready) { }          // L3
                               System.out.println(data);   // L4
If S1 and S2 are reordered at compile time, the program becomes:
    Processor-0                Processor-1
    ready = true;  // S2
    data = 1;      // S1       while (!ready) { }          // L3
                               System.out.println(data);   // L4
If a context switch from the first thread to the second occurs right after S2 executes, the program prints 0. This reordering has clearly caused an unexpected result and a thread safety problem.
2. Run-time reordering (JIT dynamic compilation, memory reordering) does not affect the results of a uniprocessor system. When these reorderings occur, the relevant instructions have not yet fully executed and committed, and the system does not perform a context switch until the reordered instructions have executed and their results have been committed. Reordering within one thread therefore has no effect on the thread that runs after the switch.
context switch
The overhead required for context switching
Direct overhead includes:
- The overhead of the operating system saving and restoring the context, which is mostly processor time.
- The overhead of the thread scheduler scheduling threads (for example, deciding by certain rules which thread gets the processor next).
Indirect overhead includes:
- The overhead of reloading the processor cache. A thread that was switched out may later be switched in on a different processor. Since that processor may never have run the thread before, the variables the thread accesses as it continues running must be reloaded into the cache from main memory, or from other processors' caches via the cache coherence protocol. This takes time.
- A context switch may also cause the entire contents of the L1 cache to be flushed, that is, written out to the next-level cache (such as the L2 cache) or to main memory (RAM).