Preface

Before discussing the volatile keyword, let's first look at the multi-core CPU cache architecture, then at the JMM (the Java Memory Model), and finally at the volatile keyword itself.

JMM (Java Memory Model)

The introduction of multi-core concurrent caching architecture

To bridge the speed gap between the CPU and main memory, computer designs place several levels of cache between the two (the caches usually sit inside the CPU; the figure draws them in the middle only for clarity). Reading from cache is very fast, so the CPU works against the cache rather than against main memory directly, and modified data in the cache is synchronized back to main memory afterwards.

The memory model of Java threads is modeled on the CPU cache model and closely resembles it. The difference is that the Java model is standardized, which shields programs from the differences between underlying machines. As shown in the figure below:

Just as a CPU core copies data into its cache, thread A copies a shared variable into its own working memory to avoid the speed mismatch with main memory. When the thread reads or writes the shared variable, it operates on the copy in its working memory. Working memory here plays the role of the cache.

JMM atomic data operations

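The JMM defines 8 atomic operations. Listed in the order they are typically used, they are:

1) read: reads a variable's value from main memory
2) load: puts the value obtained by read into the copy in working memory
3) use: passes the value in working memory to the execution engine
4) assign: assigns a value from the execution engine to the variable in working memory
5) store: transfers the value in working memory toward main memory
6) write: puts the value delivered by store into the variable in main memory
7) lock: marks a variable in main memory as exclusively owned by one thread
8) unlock: releases a locked variable so that other threads can lock it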

These atomic operations are introduced here because the discussion below builds on them.
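To make the flow of read, load, use, assign, store, and write more concrete, here is a toy, single-threaded model of one `num = num + 1`. It is only an illustration under stated assumptions: `ToyJmm`, `WorkingMemory`, and the method names are hypothetical, and a real JVM does not expose these steps as an API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the JMM data operations; every name here is hypothetical.
final class ToyJmm {
    // Stands in for the main memory shared by all threads.
    static final Map<String, Integer> mainMemory = new HashMap<>();

    // Stands in for one thread's private working memory.
    static final class WorkingMemory {
        private final Map<String, Integer> copies = new HashMap<>();

        // read + load: fetch the value from main memory, then keep a private copy.
        void readAndLoad(String name) {
            int transferred = mainMemory.get(name); // read
            copies.put(name, transferred);          // load
        }

        // use: hand the private copy to the execution engine.
        int use(String name) {
            return copies.get(name);
        }

        // assign: put the engine's result back into the private copy.
        void assign(String name, int value) {
            copies.put(name, value);
        }

        // store + write: transfer the copy, then put it into main memory.
        void storeAndWrite(String name) {
            int transferred = copies.get(name);     // store
            mainMemory.put(name, transferred);      // write
        }
    }

    // Runs num = num + 1 through the model and returns the value in main memory.
    static int incrementOnce() {
        mainMemory.put("num", 0);
        WorkingMemory wm = new WorkingMemory();
        wm.readAndLoad("num");
        int result = wm.use("num") + 1;
        wm.assign("num", result);
        wm.storeAndWrite("num");
        return mainMemory.get("num");
    }

    public static void main(String[] args) {
        System.out.println(incrementOnce()); // prints 1
    }
}
```

The point of the model is that a value lives in two places between assign and write, which is where the visibility problems discussed below come from.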

CPU cache inconsistency problem

On a multi-core CPU, a shared variable can be loaded into the caches of several cores and operated on by all of them at the same time. When core A modifies variable a, core B does not know that a has changed and keeps working with its stale copy. The program then misbehaves. This is the cache inconsistency problem.

To solve the CPU cache inconsistency problem, engineers have used two main approaches. Early designs relied mainly on bus locking.

Bus lock: when a CPU core reads data from main memory into its cache, it locks that data on the bus, so that no other core can read or write the data until the first core has finished with it and released the lock. Only then can the other cores read the data. This approach can be expressed in terms of the Java memory model and its atomic data operations, as shown in the following figure:

When a thread reads a variable from main memory, it locks the variable. Other CPUs (threads) that want to read this variable from main memory cannot do so until the lock is released; only then can they get the variable's value and compute with it. The lock operation is performed before read and marks the variable as exclusive to one thread, and the unlock operation is performed once write has put the value back into main memory. After the unlock, other threads can lock the variable. In other words, to solve the visibility and consistency problem, early CPUs turned a parallel program into a serial one.

Obviously this approach performs too poorly to be practical. Engineers later adopted the MESI cache coherency protocol to solve the problem.

MESI cache coherency protocol: multiple CPUs read the same data from main memory into their respective caches. When one CPU modifies the data in its cache, the change is synchronized back to main memory immediately, and the other CPUs, through the bus sniffing mechanism, detect the change and invalidate the copy in their own caches.
As shown below:

The CPU and memory are connected by a bus, and every thread can read data from main memory, so reads proceed in parallel. When thread 2 modifies the initFlag variable and executes the store operation, the modified value initFlag=true in its working memory is sent back toward main memory, and the write operation finally replaces the value held there. Once the atomic store operation runs, the data travels to main memory over the bus. The MESI cache coherency protocol relies on a CPU bus sniffing mechanism (implemented in hardware): every CPU monitors the bus and watches the traffic for variables it holds a copy of. When the value modified by one thread (here, thread 2) crosses the bus on its way from working memory to main memory, the other CPUs (here, thread 1) notice it, and the protocol invalidates their copies of that variable. When thread 1 next executes its loop, it finds that its copy of initFlag is invalid and re-reads initFlag from main memory. By then initFlag in main memory has been updated to true, so thread 1 gets the latest value. In this way, the MESI cache coherency protocol and the bus sniffing mechanism give the program cache coherency.
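The invalidation idea can be sketched as a toy state machine for a single shared variable and two cores. This is a simplified model under stated assumptions, not how hardware is implemented: `ToyMesi`, `Core`, and `demo` are hypothetical names, and in this toy the modified value is written back when another core misses on it, whereas the description above has it written back right away. Either way, the reading core ends up with the new value.

```java
// Toy write-invalidate model in the spirit of MESI; every name is hypothetical.
final class ToyMesi {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    static int mainMemory = 0;               // the single shared variable
    static final Core[] cores = new Core[2];

    static final class Core {
        State state = State.INVALID;
        int cached;

        // Read through the cache; a miss reloads the value from main memory.
        int read() {
            if (state == State.INVALID) {
                boolean heldElsewhere = false;
                for (Core other : cores) {
                    if (other == this || other.state == State.INVALID) {
                        continue;
                    }
                    if (other.state == State.MODIFIED) {
                        mainMemory = other.cached; // dirty copy is written back first
                    }
                    other.state = State.SHARED;
                    heldElsewhere = true;
                }
                cached = mainMemory;
                state = heldElsewhere ? State.SHARED : State.EXCLUSIVE;
            }
            return cached;
        }

        // Write to the cache; every other core's copy is invalidated.
        void write(int value) {
            for (Core other : cores) {
                if (other != this) {
                    other.state = State.INVALID;   // effect of the snooped invalidation
                }
            }
            cached = value;
            state = State.MODIFIED;
        }
    }

    // Returns what core 0 reads before and after core 1 writes the variable.
    static int[] demo() {
        mainMemory = 0;
        cores[0] = new Core();
        cores[1] = new Core();
        int before = cores[0].read(); // core 0 caches 0
        cores[1].read();              // core 1 caches 0 as well, both copies SHARED
        cores[1].write(1);            // core 0's copy becomes INVALID
        int after = cores[0].read();  // miss: value is written back and reloaded
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " -> " + r[1]); // prints 0 -> 1
    }
}
```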

Java code demonstrating invisibility

Having covered the solutions to CPU cache inconsistency, let's use Java code to demonstrate the cache inconsistency problem under multi-threading, which is also called invisibility.

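```java
import java.util.concurrent.TimeUnit;

public class JMM {
    // Not volatile: the spinning thread may never observe the update.
    private static boolean initFlag = false;

    public static void main(String[] args) throws InterruptedException {

        // Thread 1: spins until it sees initFlag == true.
        new Thread(() -> {
            while (!initFlag) {
            }
            System.out.println("hello...");

        }).start();

        TimeUnit.SECONDS.sleep(2);
        // Thread 2: flips the flag while thread 1 is already spinning.
        new Thread(() -> {
            System.out.println("......");
            initFlag = true;
            System.out.println("modified successfully....");
        }).start();
    }
}
```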

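Check the output: after the code runs, only thread 2's messages are printed. The main reason is that the change is not visible across the two CPU cores.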
![](https://upload-images.jianshu.io/upload_images/23140115-ad3571b4abd1a88d.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
![](https://upload-images.jianshu.io/upload_images/23140115-288a10443772abc4.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)



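As this shows, under multi-threading the shared variable initFlag in Java code also suffers from invisibility. So how does Java solve the cache inconsistency problem? It introduces the volatile keyword.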

# What volatile does
Let's mark the variable initFlag as volatile and see how the code behaves.

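```java
import java.util.concurrent.TimeUnit;

public class JMM {

    // volatile: a write by one thread becomes visible to the other threads.
    private static volatile boolean initFlag = false;

    public static void main(String[] args) throws InterruptedException {

        // Thread 1: spins until it sees initFlag == true.
        new Thread(() -> {
            while (!initFlag) {
            }
            System.out.println("hello...");

        }).start();

        TimeUnit.SECONDS.sleep(2);
        // Thread 2: flips the flag while thread 1 is already spinning.
        new Thread(() -> {
            System.out.println("......");
            initFlag = true;
            System.out.println("modified successfully....");
        }).start();
    }

}
```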

The change thread 2 makes to initFlag is now seen by thread 1; that is, Java's volatile keyword solves the cache coherence problem.
![](https://upload-images.jianshu.io/upload_images/23140115-1382dbc7f33eea02.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
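The same behavior can be checked programmatically rather than by reading console output. The following is a minimal sketch, assuming a 100 ms head start for the spinning thread and a caller-chosen timeout; `VolatileFlagCheck` and `readerSeesUpdate` are hypothetical names introduced only for this illustration.

```java
// Hypothetical check that a spinning reader observes a volatile write.
final class VolatileFlagCheck {
    private static volatile boolean initFlag = false;

    // Returns true if the spinning thread noticed the update within the timeout.
    static boolean readerSeesUpdate(long timeoutMillis) {
        initFlag = false;
        Thread reader = new Thread(() -> {
            while (!initFlag) {
                // spin until the volatile write becomes visible
            }
        });
        reader.setDaemon(true); // a stuck reader must not keep the JVM alive
        reader.start();
        try {
            Thread.sleep(100);          // assumed head start so the reader is spinning
            initFlag = true;            // volatile write
            reader.join(timeoutMillis); // wait a bounded time for the reader to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(readerSeesUpdate(2000));
    }
}
```

Removing `volatile` from the field turns this into the earlier invisibility demo: the reader may then keep spinning, in which case the method returns false after the timeout.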


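# So how does volatile solve the cache coherence problem?
How volatile implements cache visibility:
At the lowest level it relies mainly on the assembly lock prefix instruction, which locks the cache for that region of memory (cache line locking) and writes the data back to main memory.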

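The IA-32 Architectures Software Developer's Manual explains the lock instruction as follows:
1) It immediately writes the data in the current processor's cache line back to system memory.
2) This write-back invalidates the data cached for that memory address in other CPUs (MESI).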

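# Can't follow the above? No problem, just remember these three points:
1) The data in the current processor's cache line is written back to system memory immediately.
2) This write-back invalidates the data cached for that memory address in other CPUs (MESI).
3) A lock is taken before store, and the unlock happens after write.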

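The Java program above can be converted to assembly code to check this (an instructor on Bilibili did the conversion; I did not do it myself because it is quite tedious, so his screenshots are kept here).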
![](https://upload-images.jianshu.io/upload_images/23140115-d1ff66f90607a056.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)



![](https://upload-images.jianshu.io/upload_images/23140115-16020cb12ac36d0a.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


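# How Java achieves cache coherence
The Java memory model follows the CPU cache model, so the cache coherence problem also exists with multiple cores and multiple threads. As shown earlier, Java deals with it through the volatile keyword. So how does the implementation of volatile solve cache inconsistency? Java borrowed the CPU's approach and combined bus locking with the MESI cache coherency protocol: volatile = locking combined with the MESI cache coherency protocol, which solves the cache coherence problem.
The details are as follows: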

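Thread 1 and thread 2 can both read the shared variable initFlag from main memory into their own working memory at the same time and then pass it to the execution engine. Once the shared variable is declared volatile, thread 2's initFlag = true is executed with a lock-prefixed assembly instruction, which makes the CPU write the modified working-memory copy back to system memory immediately. As the data crosses the bus, the MESI cache coherency protocol together with the CPU bus sniffing mechanism invalidates the copies of the same variable in other CPUs' caches. The instruction also locks the cache for that region of memory (that is, it takes a lock just before the store to memory): a lock operation is performed before the value is stored back to main memory, and an unlock operation is performed after the write-back completes (after write). This solves the cache coherence problem.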

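# How it differs from bus locking
The underlying implementation of volatile is locking combined with the MESI cache coherency protocol. How does this differ from bus locking?

volatile makes the lock far more fine-grained, so performance is very high: at first every CPU can read the variable, and other CPUs are blocked only while the value is being written back to main memory. What if volatile skipped the lock and unlock operations and used only the cache coherency protocol and the bus sniffing mechanism? Would there be a problem?

Without the lock, the data has only just started toward the bus (the write-back to main memory has just begun) and has not yet been written into the variable in main memory (initFlag has not yet become true) when the other CPUs, through the bus sniffing mechanism of the MESI cache coherency protocol, detect the change to initFlag and immediately invalidate the value in their threads' working memory. The other CPU (thread 1) is still executing its while loop; it finds initFlag invalid and immediately re-reads it from main memory. Thread 2 has not yet written the modified initFlag to main memory, so the other CPU (thread 1) still reads the old value. That is why the lock prefix instruction must take a lock before store and release it only after the value has actually been written to main memory: it prevents stale reads caused by this timing gap. The lock is extremely fine-grained. It covers only the assignment of one value in main memory, and memory operations are very fast: memory-level concurrency reaches at least hundreds of thousands to millions of operations per second. Taking a lock for the short time needed to assign to one variable's address is very cheap.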

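# volatile does not guarantee atomicity
By now the three key properties of concurrent programming should be clear: visibility, atomicity, and ordering.
volatile guarantees visibility and ordering but not atomicity. Atomicity can be achieved with the synchronized locking mechanism or with the atomic classes in the concurrency package (java.util.concurrent.atomic); the next blog post will summarize atomicity.
The following code demonstrates that volatile does not guarantee atomicity.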

```java
public class VolatileAtomicTest {

    public static volatile int num = 0;

    public static void increase() {
        num++; // num = num + 1
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int j = 0; j < 100000; j++) {
                        increase();
                    }
                }
            });
            threads[i].start();
        }
        // Wait for all ten threads to finish before reading the total.
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(num); // the result is less than or equal to 1000000
    }

}
```

![](https://upload-images.jianshu.io/upload_images/23140115-21020e4f45d5bb7d.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
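As noted above, atomicity can be restored with synchronized or with the atomic classes. Here is a minimal sketch of the second option, assuming the same 10 threads and 100000 increments per thread as the demo; `AtomicCounterSketch` and `countWithAtomic` are hypothetical names, while `AtomicInteger.incrementAndGet` is the standard library call.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant of the counter demo that uses an atomic class.
final class AtomicCounterSketch {

    // Starts threadCount threads that each increment the counter perThread times.
    static int countWithAtomic(int threadCount, int perThread) {
        AtomicInteger num = new AtomicInteger(0);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    num.incrementAndGet(); // atomic read-modify-write
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return num.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic(10, 100000)); // prints 1000000
    }
}
```

Wrapping the body of `increase()` in a synchronized block would give the same total; the atomic class simply avoids holding a lock around such a small operation.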

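Why atomicity is not guaranteed

Thread 1 finishes the ++ and, as soon as the assign operation is done, will write to main memory. But suppose that just after thread 1 has done ++ and assigned the value, before its write-back happens, thread 2 also does num++ and finishes its own assign, so both threads write back to main memory at the same time. Whichever thread writes back first (whose data reaches the bus first) gets the lock on the data through the volatile keyword; the write-back that arrives later sees that the data is locked and cannot take the lock. Once thread 1 holds the lock and executes store, the data crosses the bus, and the MESI cache coherency protocol together with the CPU bus sniffing mechanism invalidates the value in thread 2's working memory. Thread 2's num++ is now meaningless (lost): the next time thread 2 does num++, it re-reads from main memory the value thread 1 wrote back. Thread 2's previous num++ is gone, so one increment has been lost.
Some blogs online say the cause is that i++ is not an atomic operation, but I think this account better explains the root cause of why atomicity is not guaranteed.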
![](https://upload-images.jianshu.io/upload_images/23140115-c761a366cc990d6c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
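The lost increment can be reproduced deterministically by writing out one unlucky interleaving by hand, with no real threads involved. This is only a sketch of the schedule described above: `LostUpdateSketch`, `copyA`, and `copyB` are hypothetical names standing for the two threads' working-memory copies. Seen this way, both explanations describe the same hazard, because the read, the addition, and the write-back are separate steps.

```java
// Hypothetical single-threaded replay of one interleaving of two num++ operations.
final class LostUpdateSketch {
    static int sharedNum = 0; // stands in for the variable in main memory

    // Replays "both threads read, both add, both write back" step by step.
    static int interleavedIncrements() {
        sharedNum = 0;
        int copyA = sharedNum; // thread A reads 0 into its working memory
        int copyB = sharedNum; // thread B reads 0 before A has written back
        copyA = copyA + 1;     // thread A computes 1
        copyB = copyB + 1;     // thread B also computes 1
        sharedNum = copyA;     // thread A writes back 1
        sharedNum = copyB;     // thread B writes back 1, so A's increment is lost
        return sharedNum;
    }

    public static void main(String[] args) {
        System.out.println(interleavedIncrements()); // prints 1, not 2
    }
}
```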



# volatile guarantees ordering
volatile solves the ordering problem mainly by using memory barriers to prevent instruction reordering.
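A common use of this ordering guarantee is publishing ordinary data through a volatile flag. The sketch below assumes a plain `data` field and a volatile `ready` flag; `OrderingSketch` and `publishAndRead` are hypothetical names. Because the write to `data` comes before the volatile write to `ready` and may not be reordered after it, a reader that sees `ready == true` also sees `data == 42`.

```java
// Hypothetical sketch of safe publication through a volatile flag.
final class OrderingSketch {
    private static int data = 0;                   // plain, non-volatile field
    private static volatile boolean ready = false; // volatile flag guarding data

    // Publishes data from the calling thread and returns what the reader saw.
    static int publishAndRead() {
        data = 0;
        ready = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin on the volatile flag
            }
            seen[0] = data; // ordered after the volatile read of ready
        });
        reader.start();

        data = 42;    // ordinary write
        ready = true; // volatile write: the write to data cannot move after it

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints 42
    }
}
```

If `ready` were not volatile, the reader might spin forever, or might see the flag set while still reading the old value of `data`.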

# Finally

前程有光