Hello everyone, I am Glacier~~
In real-world work, there is a very common concurrency scenario: read-mostly workloads, where data is read far more often than it is written. To optimize performance in this scenario, we often introduce a cache to speed up access, because caches are a natural fit for read-mostly data. For concurrent access, the Java SDK provides ReadWriteLock for exactly this read-mostly case. In this article, we will talk about how to use ReadWriteLock to implement a general-purpose cache.
The article has been included in:
https://github.com/sunshinelyz/technology-binghe
https://gitee.com/binghe001/technology-binghe
Read-write lock
Read-write locks should be familiar to most of you. In general, a read-write lock follows these principles:
- A shared variable can be read by multiple reader threads at the same time.
- A shared variable can be written by only one writer thread at a time.
- While a shared variable is being written by a writer thread, it cannot be read by any reader thread.
Note an important difference between a read-write lock and a mutex: a read-write lock allows multiple threads to read the shared variable at the same time, while a mutex does not. Therefore, in high-concurrency, read-mostly scenarios, a read-write lock outperforms a mutex. Write operations, however, remain mutually exclusive: while a writer thread holds the write lock, no reader thread can read the shared variable.
A read-write lock supports both fair and non-fair modes. Specifically, for ReentrantReadWriteLock you choose the mode by passing a boolean to the constructor.
public ReentrantReadWriteLock(boolean fair) {
    sync = fair ? new FairSync() : new NonfairSync();
    readerLock = new ReadLock(this);
    writerLock = new WriteLock(this);
}
In addition, note that in a read-write lock, calling newCondition() on the read lock throws an UnsupportedOperationException: the read lock does not support condition variables.
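To see both points in one place, here is a minimal sketch (the class name ReadWriteLockModes is invented for illustration; the fair/non-fair flag and the newCondition() behavior are straight from the JDK):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockModes {
    public static void main(String[] args) {
        // true selects fair mode; the no-arg constructor defaults to non-fair
        ReentrantReadWriteLock fairLock = new ReentrantReadWriteLock(true);

        // The write lock supports condition variables
        fairLock.writeLock().newCondition();

        // The read lock does not: newCondition() throws UnsupportedOperationException
        try {
            fairLock.readLock().newCondition();
        } catch (UnsupportedOperationException e) {
            System.out.println("Read lock does not support conditions");
        }
    }
}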
Cache implementation
Here, we use ReadWriteLock to quickly implement a general-purpose cache tool. The overall code is shown below.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockCache<K, V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    // Read lock
    private final Lock r = rwl.readLock();
    // Write lock
    private final Lock w = rwl.writeLock();

    // Read from the cache
    public V get(K key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }

    // Write to the cache
    public V put(K key, V value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }
}
As you can see, ReadWriteLockCache declares two type parameters: K is the cache key and V is the cached value. Internally, the class uses a Map to store the cached data. As everyone knows, HashMap is not a thread-safe class, so a read-write lock is used here to guarantee thread safety. For example, the get() method takes the read lock, so get() can be executed by multiple threads at the same time, while the put() method takes the write lock internally, meaning only one thread at a time can write to the cache.
It should be noted here that whether it is the read lock or the write lock, the unlock operation must be placed in a finally{} block.
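For example, here is a quick hypothetical usage sketch (the demo class and the key/value strings are invented for illustration):

public class ReadWriteLockCacheDemo {
    public static void main(String[] args) {
        ReadWriteLockCache<String, String> cache = new ReadWriteLockCache<>();
        cache.put("user:1", "Glacier");

        // Many threads can call get() concurrently; put() is exclusive
        Runnable reader = () -> System.out.println(cache.get("user:1"));
        new Thread(reader).start();
        new Thread(reader).start();
    }
}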
In my experience, there are two ways to load data into a cache: one is to load the data into the cache when the project starts, and the other is to load the required data on demand while the project is running.
Next, let's take a look at full loading and on-demand loading respectively.
Full load cache
Full loading is relatively simple: when the project starts, all the data is loaded into the cache in one go. This approach is suitable when the amount of cached data is small and the data changes infrequently, for example, a system's data dictionary and similar reference information. The general flow of full loading: at startup, query all the records from the database and write them into the cache.
Once the data has been fully loaded, subsequent reads can be served directly from the cache.
The code for a fully loaded cache is fairly simple; the following code demonstrates it directly.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockCache<K, V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    // Read lock
    private final Lock r = rwl.readLock();
    // Write lock
    private final Lock w = rwl.writeLock();

    public ReadWriteLockCache() {
        // Query the database (elided)
        List<Field<K, V>> list = .....;
        if (list != null && !list.isEmpty()) {
            // Note: a plain forEach is used here; writing to a HashMap from
            // parallelStream().forEach() would not be thread-safe
            list.forEach(f -> m.put(f.getK(), f.getV()));
        }
    }

    // Read from the cache
    public V get(K key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }

    // Write to the cache
    public V put(K key, V value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }
}
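Note that the Field class above is not part of the JDK; it is just a placeholder for a key/value row loaded from the database. A minimal hypothetical version might look like this:

// Hypothetical holder for one key/value pair loaded from the database
public class Field<K, V> {
    private final K k;
    private final V v;

    public Field(K k, V v) {
        this.k = k;
        this.v = v;
    }

    public K getK() { return k; }
    public V getV() { return v; }
}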
Load cache on demand
Loading the cache on demand is also called lazy loading: data is loaded into the cache only when it is actually needed. Specifically, when the program starts, no data is loaded into the cache. At runtime, when some data needs to be queried, we first check whether it exists in the cache. If it does, we read it from the cache directly. If it does not, we query the database, write the result into the cache, and return it. Subsequent reads of the same data hit the cache and return the cached value directly.
This query cache method is suitable for most scenarios where data is cached.
We can use the following code to implement the on-demand cache-query logic.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadWriteLockCache<K, V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();

    V get(K key) {
        V v = null;
        // Read from the cache
        r.lock();
        try {
            v = m.get(key);
        } finally {
            r.unlock();
        }
        // Cache hit: return the value
        if (v != null) {
            return v;
        }
        // Cache miss: query the database
        w.lock();
        try {
            // Check again: another thread may have filled the cache meanwhile
            v = m.get(key);
            if (v == null) {
                // Query the database
                v = loadFromDatabase(key);
                m.put(key, v);
            }
        } finally {
            w.unlock();
        }
        return v;
    }

    // Hypothetical database lookup; a real implementation would query the DB
    private V loadFromDatabase(K key) {
        return null; // placeholder
    }
}
Here, in the get() method, we first try to read the data from the cache under the read lock, unlocking once the lookup returns. If the cached value is not null, it is returned directly. If it is null, we acquire the write lock and read the cache again. If the cache still has no data, we query the database, write the result into the cache, release the write lock, and finally return the result.
At this point, some friends may ask: since the write lock has already been acquired, why do we still need to query the cache inside the write lock?
This is because in high-concurrency scenarios, multiple threads may compete for the write lock at the same time. For example, suppose the cache is empty the first time get() is called, and three threads call get() simultaneously and all reach w.lock() together. Because the write lock is exclusive, only one thread acquires it; the other two block at w.lock(). The thread holding the write lock queries the database, writes the data into the cache, and releases the lock.
The remaining two threads then compete for the write lock in turn. Each one, after acquiring the lock, re-reads the cache with v = m.get(key). Without this second check, each of them would query the database again and rewrite the cache before releasing the write lock.
In fact, the first thread has already queried the database and populated the cache, so the other two threads do not need to query the database again; they can simply read the value from the cache. This is why re-checking the cache with v = m.get(key) right after w.lock() effectively avoids redundant database queries under high concurrency and improves system performance.
Upgrading and downgrading of read-write locks
Regarding upgrading and downgrading, friends need to pay attention: ReadWriteLock does not support lock upgrading. If a thread tries to acquire the write lock while still holding the read lock, the write lock can never be granted, and the thread blocks forever without ever being woken up.
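Here is a minimal sketch of the upgrade deadlock described above (the thread blocks forever at writeLock().lock(), so never do this in real code):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDeadlock {
    public static void main(String[] args) {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

        rwl.readLock().lock();
        System.out.println("Read lock acquired, now trying to upgrade...");

        // Deadlock: the write lock waits for all read locks to be released,
        // including the one this very thread is still holding
        rwl.writeLock().lock();
        System.out.println("Never reached");
    }
}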
Although lock upgrading is not supported, ReadWriteLock does support lock downgrading. For example, let's look at the official example from the ReentrantReadWriteLock Javadoc, shown below.
class CachedData {
    Object data;
    volatile boolean cacheValid;
    final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    void processCachedData() {
        rwl.readLock().lock();
        if (!cacheValid) {
            // Must release read lock before acquiring write lock
            rwl.readLock().unlock();
            rwl.writeLock().lock();
            try {
                // Recheck state because another thread might have
                // acquired write lock and changed state before we did.
                if (!cacheValid) {
                    data = ...
                    cacheValid = true;
                }
                // Downgrade by acquiring read lock before releasing write lock
                rwl.readLock().lock();
            } finally {
                rwl.writeLock().unlock(); // Unlock write, still hold read
            }
        }
        try {
            use(data);
        } finally {
            rwl.readLock().unlock();
        }
    }
}
Data synchronization problem
First of all, the data synchronization mentioned here refers to synchronization between the data source and the cache, or to put it more directly, between the database and the cache.
Here, we can take three approaches to data synchronization: a timeout mechanism, periodic cache refresh, and real-time cache updates.
Timeout mechanism
This one is easy to understand. When writing data to the cache, we attach an expiration time. Once an entry expires, it is automatically removed from the cache. The next time the program accesses that data, the cache has no entry, so it queries the database and writes the fresh result back into the cache.
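Here is a minimal sketch of the idea, assuming we wrap each value together with its expiration deadline; the ExpiringCache class, the ExpiringValue wrapper, and the ttlMillis parameter are all invented for illustration:

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: values expire ttlMillis after being written
public class ExpiringCache<K, V> {
    private static class ExpiringValue<T> {
        final T value;
        final long deadline; // absolute expiry time in millis
        ExpiringValue(T value, long deadline) {
            this.value = value;
            this.deadline = deadline;
        }
    }

    private final Map<K, ExpiringValue<V>> m = new HashMap<>();

    public synchronized void put(K key, V value, long ttlMillis) {
        m.put(key, new ExpiringValue<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public synchronized V get(K key) {
        ExpiringValue<V> e = m.get(key);
        if (e == null) {
            return null;
        }
        if (System.currentTimeMillis() >= e.deadline) {
            m.remove(key); // expired: remove the entry and report a miss
            return null;
        }
        return e.value;
    }
}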
Update the cache regularly
This solution is an enhanced version of the timeout mechanism. When writing data to the cache, an expiration time is still attached, but in addition a dedicated background thread periodically queries the database and refreshes the cache, which to some extent avoids the problem of cache penetration.
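Here is a minimal sketch using the JDK's ScheduledExecutorService, assuming a hypothetical loadAllFromDatabase() helper that returns the latest key/value pairs:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledRefreshCache<K, V> {
    private final Map<K, V> m = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public ScheduledRefreshCache(long periodSeconds) {
        // Refresh the whole cache from the database at a fixed interval
        scheduler.scheduleAtFixedRate(
                () -> m.putAll(loadAllFromDatabase()),
                0, periodSeconds, TimeUnit.SECONDS);
    }

    public V get(K key) {
        return m.get(key);
    }

    // Hypothetical database query, elided here
    private Map<K, V> loadAllFromDatabase() {
        return Map.of(); // placeholder: return the latest data in real code
    }
}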
Update the cache in real time
This solution synchronizes the database and the cache in real time. Alibaba's open-source Canal framework can be used to achieve real-time synchronization between a MySQL database and the cache. You can also use my personal open-source mykit-data framework (recommended)~~
mykit-data open source address:
That's all for today. I'm Glacier, see you in the next issue~~