
1 Source

  • Source: "Detailed Multithreading and Architecture Design of Java High Concurrency Programming", written by Wang Wenjun
  • Chapters: Chapters Twelve and Thirteen

This article is a compilation of notes from two chapters.

2 CPU cache

2.1 Cache Model

All computation in a computer is performed by the CPU, and executing CPU instructions requires reading and writing data. However, the CPU can only access data that is in memory, and the speed of memory lags far behind that of the CPU. Hence the cache model: a cache layer is added between the CPU and main memory. The cache in a modern CPU is generally divided into three levels, called the L1 cache, L2 cache, and L3 cache. The schematic diagram is as follows:

(figure: the three-level cache hierarchy between the CPU and main memory)

  • L1 cache: the fastest to access, but the smallest in capacity. The L1 cache is further split into a data cache (L1d, d for data) and an instruction cache (L1i, i for instruction)
  • L2 cache: slower than L1 but larger; in modern multi-core CPUs, the L2 is generally private to a single core
  • L3 cache: the slowest of the three levels but the largest; in modern CPUs the L3 is often shared by multiple cores, as in the Zen 3 architecture

(figure: an L3 cache shared across cores, as in Zen 3)

The cache exists to remedy the inefficiency of the CPU accessing main memory directly. When the CPU performs an operation, it copies the required data from main memory into the cache (whose access speed is much higher than memory's), reads and updates the value in the cache, and flushes the result back to main memory once the operation completes, which greatly improves computational efficiency. The overall interaction is sketched as follows:

(figure: interaction between the CPU, the cache, and main memory)

2.2 Cache coherence problem

Although the cache greatly improves throughput, it also introduces a new problem: cache inconsistency. Consider the simplest i++ operation. The data must be copied from memory into the cache; the CPU reads the value from the cache, updates it, writes the cache first, and flushes the new value to memory after the operation. The specific steps are:

  • Read i from main memory into the cache
  • The CPU reads the value of i from the cache
  • Add 1 to i
  • Write the result back to the cache
  • Flush the data back to main memory

A single-threaded i++ causes no problems, but under multithreading it does, because each thread has its own working memory (also called local memory; in effect the thread's own cache), so a copy of the variable i exists in the local memory of each thread. If two threads perform the i++ operation:

  • Assume the two threads are A and B, and that the initial value of i is 0
  • Thread A reads the value of i from memory into its cache; at this point i is 0. Thread B does the same, and the value in its cache is also 0
  • The two threads perform the increment at the same time; now the value of i in both A's and B's caches is 1
  • Both threads write i back to main memory, which amounts to i being assigned 1 twice
  • The final result is that i has the value 1, instead of the expected 2

This is a typical cache inconsistency problem. The mainstream solutions are:

  • bus lock
  • Cache Coherence Protocol

2.2.1 Bus lock

This is a pessimistic implementation. Specifically, the processor issues a lock instruction to lock the bus; once the bus receives it, it blocks requests from other processors until the processor holding the lock finishes its operation. The effect is that only the processor that grabbed the bus lock makes progress, which is inefficient: while one processor holds the lock, all others can only block and wait, hurting multi-core performance.

2.2.2 Cache Coherence Protocol

The diagram is as follows:

(figure: cache coherence between cores under the MESI protocol)

The most famous cache coherence protocol is the MESI protocol. MESI ensures that the copies of a shared variable held in the various caches stay consistent. The general idea is that when the CPU operates on data in the cache and finds that the variable is shared, it proceeds as follows:

  • Read: no special handling; the data is simply read from the cache into a register
  • Write: signal the other CPUs to set the variable's cache line to the invalid state (Invalid); when another CPU next reads the variable, it must fetch it from main memory again

Specifically, MESI marks each cache line with one of four status flags:

  • M: Modified
  • E: Exclusive
  • S: Shared
  • I: Invalid

The detailed implementation of MESI is beyond the scope of this article.
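
To make the four states a little more tangible, here is a deliberately simplified sketch in Java. Real MESI lives in hardware and involves bus snooping; this only names the states and shows two sample transitions, so treat it as an illustration, not an implementation:

// Highly simplified sketch of MESI cache-line states (illustrative only).
enum CacheLineState {
    MODIFIED,   // M: this core changed the line; main memory is stale
    EXCLUSIVE,  // E: only this core holds the line, and it matches memory
    SHARED,     // S: several cores hold identical copies of the line
    INVALID;    // I: this copy is stale and must be re-read from memory

    // A write by this core makes its own copy MODIFIED
    // (the protocol would signal other cores to go INVALID).
    CacheLineState onLocalWrite()  { return MODIFIED; }

    // A write by another core invalidates our copy.
    CacheLineState onRemoteWrite() { return INVALID; }
}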

3 JMM

Having looked at the CPU cache, let's turn to the JMM, the Java Memory Model. It specifies how the JVM works with the computer's main memory, and it determines when one thread's write to a shared variable becomes visible to other threads. The JMM defines the abstract relationship between threads and main memory as follows:

  • Shared variables are stored in main memory and can be accessed by every thread
  • Each thread has private working memory or local memory
  • Working memory only stores the thread's copy of the shared variable
  • A thread cannot operate on main memory directly; it must first operate on its working memory and only then write back to main memory
  • Like the JMM itself, working memory is an abstract concept rather than a physical thing; it covers caches, registers, compile-time optimizations, and the underlying hardware

The sketch is as follows:

(figure: threads with private working memory above shared main memory)

Similar to MESI, if one thread modifies a shared variable and flushes it to main memory, other threads discover that the copy in their working memory is invalid and read the variable from main memory into working memory again.

The following figure shows the relationship between the JVM memory model and computer hardware:

(figure: the JVM memory model mapped onto hardware memory)

4 Three characteristics of concurrent programming

Most of the article read and still no volatile? Don't worry, don't worry: first let's look at three important characteristics of concurrent programming, which are a great help in correctly understanding volatile.

4.1 Atomicity

Atomicity means that, for one or more operations:

  • Either all operations are executed without interruption by any factor
  • or none of the operations are performed

A typical example is a transfer between two people. For example, A transfers 1,000 yuan to B. This includes two basic operations:

  • A's account deducted 1000 yuan
  • B's account is increased by 1000 yuan

The two operations must either both succeed or both fail: it cannot happen that A's account is debited 1,000 while B's balance is unchanged, nor that A's balance is unchanged while B's account grows by 1,000.
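
A minimal sketch of how such a transfer is typically made atomic in Java; the Account class and its names are illustrative, not from the book:

// Illustrative only: both balance updates execute as one indivisible
// unit because they are guarded by the same lock.
public class Account {
    private long balance;

    public Account(long balance) {
        this.balance = balance;
    }

    // Holding one shared monitor makes "deduct + add" atomic with
    // respect to every other transfer that uses the same monitor.
    public static void transfer(Account from, Account to, long amount) {
        synchronized (Account.class) {
            from.balance -= amount; // A's account is debited
            to.balance   += amount; // B's account is credited
        }
    }
}

Locking one global monitor is coarse-grained but sidesteps deadlocks caused by inconsistent lock ordering; a finer-grained design would lock both accounts in a fixed order.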

Note that a combination of atomic operations is not necessarily atomic itself; i++ is the classic example. Essentially, i++ involves three operations:

  • get i
  • i+1
  • set i

Each of the three operations is atomic, but their combination (i++) is not.
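
To make the decomposition concrete, this is conceptually what i++ expands to for some int i; a thread can be preempted between any two of these steps:

// i++ is read-modify-write: three separate steps, not one.
int tmp = i;    // 1. get i   (read the current value)
tmp = tmp + 1;  // 2. i + 1   (compute the new value)
i = tmp;        // 3. set i   (write it back)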

4.2 Visibility

Another important feature is visibility. Visibility means that if a thread modifies a shared variable, another thread can immediately see the latest value after modification.

A simple example is as follows:

import java.util.concurrent.TimeUnit;

public class Main {
    private int x = 0;
    private static final int MAX = 100000;
    public static void main(String[] args) throws InterruptedException {
        Main m = new Main();
        Thread thread0 = new Thread(()->{
            while(m.x < MAX) {
                ++m.x;
            }
        });

        Thread thread1 = new Thread(()->{
            while(m.x < MAX){
            }
            System.out.println("finish");
        });

        thread1.start();
        TimeUnit.MILLISECONDS.sleep(1);
        thread0.start();
    }
}

Thread thread1 will run forever: thread1 reads x into its working memory and keeps testing that copy, and the changes thread0 makes in its own working memory are not visible to thread1, so thread1 never prints finish. This can also be confirmed with jstack:

(figure: jstack output for the spinning thread)

4.3 Orderliness

Orderliness concerns the order in which code executes. Because of JVM optimizations, the order in which code is written is not necessarily the order in which it runs. Take the following four statements:

int x = 10;
int y = 0;
x++;
y = 20;

It is possible for y = 20 to execute before x++; this is instruction reordering. Generally, to improve performance, the processor may optimize the incoming instruction stream to some extent and not execute the code strictly in written order, while still guaranteeing that the final result is the one the code was written to produce. Reordering also follows rules: data dependencies between instructions must be strictly respected, so instructions cannot be reordered arbitrarily. For example:

int x = 10;
int y = 0;
x++;
y = x+1;

Here y = x + 1 cannot be reordered to execute before x++, because y depends on the updated value of x.

Under a single thread, reordering never changes the expected result, but under multiple threads, if ordering is not guaranteed there can be serious problems:

private boolean initialized = false;
private Context context;

// loadContext() is assumed to be a slow method that builds the Context
public Context load() {
    if (!initialized) {
        context = loadContext();
        initialized = true;
    }
    return context;
}

If reordering occurs and initialized = true is moved before context = loadContext(), then, assuming two threads A and B call load() at the same time and loadContext() takes some time:

  • Thread A passes the check, first sets the boolean to true, and then starts executing loadContext()
  • Thread B sees the boolean already true and directly returns a context that has not yet been loaded
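
Declaring the flag volatile forbids exactly this reordering. A sketch of the corrected version, under the same assumed loadContext(), looks like this:

// volatile forbids hoisting initialized = true above
// context = loadContext(), so another thread can no longer observe
// initialized == true while context is still unset.
private volatile boolean initialized = false;
private Context context;

public Context load() {
    if (!initialized) {
        context = loadContext();
        initialized = true;
    }
    return context;
}

Note that volatile only cures the reordering hazard described here; two threads can still both pass the if check and call loadContext() twice. The classic double-checked locking pattern therefore combines volatile with a synchronized block.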

5 volatile

Well, it's finally time for volatile. Everything said so far was groundwork for fully understanding it. This part is divided into five subsections:

  • Semantics of volatile
  • How to ensure order and visibility
  • Implementation principle
  • Usage scenarios
  • Difference from synchronized

Let's first introduce the semantics of volatile .

5.1 Semantics

Instance variables or class variables modified by volatile have two levels of semantics:

  • Ensures the visibility of shared variable operations between different threads
  • Disable reordering of instructions

5.2 How to ensure visibility and orderliness

Conclusion first:

  • volatile guarantees visibility
  • volatile guarantees ordering
  • volatile does not guarantee atomicity

They are introduced separately below.

5.2.1 Visibility

Java ensures visibility in the following ways:

  • volatile: when a variable is declared volatile, reads of the shared variable effectively go to main memory (more precisely, the working memory is still read, but if another thread has modified the variable, the copy is invalidated and must be re-fetched from main memory), and writes modify the working memory first but are flushed to main memory immediately afterwards
  • synchronized: synchronized also ensures visibility; it guarantees that only one thread at a time acquires the lock and executes the synchronized code, and that modifications to variables are flushed to main memory before the lock is released
  • Explicit locks (Lock): the lock() method likewise guarantees that only one thread at a time holds the lock and executes the guarded code, and that changes to variables are flushed to main memory before the lock is released
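
For the explicit-lock case, a minimal sketch might look like this (the LockedCounter class is illustrative, not from the book):

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative: unlock() flushes the writes made in the critical
// section to main memory, and a later lock() by another thread is
// guaranteed to see them.
public class LockedCounter {
    private final Lock lock = new ReentrantLock();
    private int count = 0;

    public void inc() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // the write becomes visible to the next locker
        }
    }

    public int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }
}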

For volatile specifically, look again at the earlier example:

import java.util.concurrent.TimeUnit;

public class Main {
    private int x = 0;
    private static final int MAX = 100000;
    public static void main(String[] args) throws InterruptedException {
        Main m = new Main();
        Thread thread0 = new Thread(()->{
            while(m.x < MAX) {
                ++m.x;
            }
        });

        Thread thread1 = new Thread(()->{
            while(m.x < MAX){
            }
            System.out.println("finish");
        });

        thread1.start();
        TimeUnit.MILLISECONDS.sleep(1);
        thread0.start();
    }
}

As mentioned above, this code runs forever and prints nothing, because the modified x is invisible to thread thread1. If x is declared with volatile, however, finish is printed, because modifications to x are now visible to thread1.
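
The only change required is the field declaration:

private volatile int x = 0;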

5.2.2 Orderliness

The JMM allows the compiler and the processor to reorder instructions, which may cause problems under multithreading. For this reason, Java provides three mechanisms to ensure ordering:

  • volatile
  • synchronized
  • Explicit lock Lock

In addition, the happens-before principle deserves mention with respect to ordering. It says that if the execution order of two operations cannot be deduced from these rules, ordering is not guaranteed, and the JVM or the processor may reorder them arbitrarily, the goal being to maximize parallelism. The specific rules are as follows (the thread start rule is illustrated with a short sketch after the list):

  • Program order rule: within a single thread, operations that appear earlier in program order happen before operations written after them
  • Monitor lock rule: an unlock of a lock happens before every subsequent lock of that same lock
  • volatile variable rule: a write to a volatile variable happens before every subsequent read of that variable
  • Transitivity rule: if operation A happens before operation B, and B happens before C, then A happens before C
  • Thread start rule: a call to a Thread object's start() method happens before any action in the started thread
  • Thread interruption rule: a call to interrupt() on a thread happens before the interrupted thread detects the interrupt; in other words, if an interrupt is detected, interrupt() must have been called before that
  • Thread termination rule: every operation in a thread happens before any other thread detects that the thread has terminated
  • Object finalization rule: the completion of an object's initialization happens before the start of its finalize() method
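
A small sketch of the thread start rule; the values are illustrative:

// Thread start rule: writes made before t.start() are visible to the
// started thread without any extra synchronization.
int[] data = new int[1];
data[0] = 42;  // happens-before t.start()
Thread t = new Thread(() -> System.out.println(data[0])); // prints 42
t.start();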

For a volatile variable, reordering across the volatile access is prohibited outright, but instructions on either side of it that have no dependency on it can still be reordered among themselves, for example:

int x = 0;
int y = 1;
z = 20;  // z is assumed to be declared elsewhere as: private volatile int z;
x++;
y--;

Before z = 20, there is no requirement on whether x or y is defined first; it is only guaranteed that x = 0 and y = 1 have both executed by the time z = 20 executes. Likewise, x++ and y-- may run in either order, as long as both run after z = 20.

5.2.3 Atomicity

In Java, reads and assignments of variables of primitive types are atomic, as are reads and assignments of reference-type variables, but:

  • Assigning one variable to another is not atomic, because it involves reading one variable and writing another, and a combination of two atomic operations is not atomic
  • More generally, several atomic operations together do not form an atomic operation, i++ being the example above
  • The JMM only guarantees atomicity for basic reads and assignments; nothing more. If atomicity is needed, use synchronized or Lock, or the atomic classes under the java.util.concurrent package

That is to say, volatile does not guarantee atomicity. Examples are as follows:

import java.util.concurrent.CountDownLatch;
import java.util.stream.IntStream;

public class Main {
    private volatile int x = 0;
    private static final CountDownLatch latch = new CountDownLatch(10);

    public void inc() {
        ++x;
    }

    public static void main(String[] args) throws InterruptedException {
        Main m = new Main();
        IntStream.range(0, 10).forEach(i -> {
            new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    m.inc();
                }
                latch.countDown();
            }).start();
        });
        latch.await();
        System.out.println(m.x);
    }
}

The final printed value of x will be less than 10000, and it differs from run to run. To see why, analyze two threads A and B, as shown below:

(figure: interleaving of threads A and B over time t1–t5)

  • 0–t1: thread A reads x into its working memory; at this point x = 0
  • t1–t2: thread A's time slice ends and the CPU schedules thread B, which reads x into its working memory; x is still 0
  • t2–t3: thread B increments x in its working memory and updates the copy there
  • t3–t4: thread B's time slice ends and the CPU schedules thread A, which likewise increments x in its own working memory
  • t4–t5: thread A writes the value from its working memory back to main memory; main memory now holds x = 1
  • After t5: thread A's time slice ends, the CPU schedules thread B, and thread B also writes its working-memory value back, setting x in main memory to 1 again

In other words, under multithreading two increment operations were performed but only one effective modification reached main memory. Making x reach 10000 is simple; just add synchronized:

new Thread(() -> {
    synchronized (m) {
        for (int j = 0; j < 1000; j++) {
            m.inc();
        }
    }
    latch.countDown();
}).start();
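
Alternatively (this is not the book's example), the same result can be achieved without a lock by using the atomic classes, replacing the field and inc() in the Main class above (the final print becomes m.x.get()):

import java.util.concurrent.atomic.AtomicInteger;

// AtomicInteger performs the read-modify-write as one atomic hardware
// operation (compare-and-swap), so no lock is needed.
private final AtomicInteger x = new AtomicInteger(0);

public void inc() {
    x.incrementAndGet(); // atomic ++x
}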

5.3 Implementation principle

As we already know, volatile can guarantee order and visibility, so how does it work?

The answer is the lock; instruction prefix, which in effect acts as a memory barrier and provides the following guarantees for instruction execution:

  • Instruction reordering cannot move code that follows the barrier to before it
  • Instruction reordering cannot move code that precedes the barrier to after it
  • By the time the barrier-protected instruction executes, all of the preceding code has executed
  • Modifications made in the thread's working memory are forced to be flushed to main memory
  • If the operation is a write, the corresponding cached data in other threads' working memory is invalidated

5.4 Usage scenarios

A typical usage scenario is using a flag as a switch to stop a thread. An example follows:

public class ThreadTest extends Thread{
    private volatile boolean started = true;

    @Override
    public void run() {
        while (started){
            // do work here; the loop exits once started becomes false
        }
    }

    public void shutdown(){
        this.started = false;
    }
}

If the boolean flag were not declared volatile, the new value might never become visible to the running thread (its working-memory copy would not be refreshed), and the thread would never stop.
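
A usage sketch of the switch; the timing is arbitrary:

public static void main(String[] args) throws InterruptedException {
    ThreadTest worker = new ThreadTest();
    worker.start();       // the worker spins while started is true
    Thread.sleep(1000);   // let it run for a while
    worker.shutdown();    // the volatile write is seen by the worker
    worker.join();        // the worker leaves its loop and terminates
}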

5.5 Difference from synchronized

  • Usage: volatile can only modify instance variables or class variables; it cannot modify methods, method parameters, or local variables, and the variable it modifies may be null. synchronized cannot modify variables, only methods or statement blocks, and its monitor object must not be null
  • Atomicity: volatile does not guarantee atomicity, while synchronized does
  • Visibility: both guarantee visibility, but synchronized does so through the JVM's monitorenter/monitorexit instructions, which ensure that all shared variables are flushed to main memory on monitor exit, whereas volatile is implemented with the lock; machine-instruction prefix, which forces the copies in other threads' working memory to be invalidated so the variable must be reloaded from main memory
  • Ordering: volatile forbids the JVM and the processor from reordering around it, while the ordering provided by synchronized is bought with serialized execution; instructions inside a synchronized block can still be reordered
  • Blocking: volatile never blocks a thread, while synchronized can
