
1. What is a distributed lock

1.1 Introduction to Distributed Locks

A distributed lock is a lock implementation that controls access to shared resources across different systems. When a resource is shared between different systems, or between different hosts of the same system, mutual exclusion is often required to prevent interference and keep the data consistent.

1.2 Why do we need distributed locks?

In a single-machine deployment, thread locks such as synchronized and ReentrantLock are used to handle concurrency: they serialize multi-threaded access to shared variables and thereby keep the data consistent. In a clustered back-end deployment, however, the program runs in different JVM processes, and synchronized or ReentrantLock can only guarantee mutual exclusion within a single JVM, so a distributed lock is needed. How synchronized works internally will not be repeated here.

1.3 Requirements for distributed locks

A distributed lock needs to be mutually exclusive, deadlock-free, and fault-tolerant. Mutual exclusion means that at any moment, only one thread may hold the lock. Deadlock-free means that the lock can still be released even if the client holding it crashes unexpectedly or its process is killed, so a dead holder can never deadlock the whole service. Fault tolerance means that clients should still be able to acquire and release locks as long as the majority of lock-service nodes are functioning properly.
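To make these requirements concrete, here is a minimal sketch of the contract a distributed lock is expected to honor (the interface and method names are hypothetical, for illustration only):

// A hypothetical contract capturing the three requirements above.
public interface DistributedLock {

    // Mutual exclusion: returns true for at most one caller at a time.
    // The lease (leaseMillis) makes it deadlock-free: the lock auto-expires
    // even if the holder crashes before calling unlock.
    boolean tryLock(String lockKey, String ownerId, long leaseMillis);

    // Releases the lock only if ownerId still matches the holder,
    // so a slow thread cannot release someone else's lock.
    void unlock(String lockKey, String ownerId);
}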

2. Implementation of distributed locks

The mainstream ways to implement distributed locks are: database-based, Redis-based, and ZooKeeper-based. This article focuses on distributed locks implemented with Redis.

2.1 Evolution of locking from single-machine to cluster deployment

To start, preset a key in Redis with a default value: ticket = 20, and build a Spring Boot service to simulate ticket sales from multiple windows. The configuration classes are omitted here.

2.1.1 Solving concurrency in single-machine mode

With ticket = 20 preset in Redis, each incoming request checks whether the remaining tickets are greater than 0. If so, it decrements the count by one and writes it back to Redis; otherwise it logs an error saying there are not enough tickets left.

@RestController
@Slf4j
public class RedisLockController {

    @Resource
    private Redisson redisson;

    @Resource
    private StringRedisTemplate stringRedisTemplate;

    @RequestMapping("/lock")
    public String deductTicket() throws InterruptedException {
        String ticketKey = "ticket";
        int ticketCount = Integer.parseInt(stringRedisTemplate.opsForValue().get(ticketKey));
        if (ticketCount > 0) {
            int realTicketCount = ticketCount - 1;
            log.info("Deduction succeeded, remaining tickets: " + realTicketCount);
            stringRedisTemplate.opsForValue().set(ticketKey, realTicketCount + "");
        } else {
            log.error("Deduction failed, not enough tickets left");
        }
        return "end";
    }

}

Code analysis: there is an obvious problem here. If two threads hit this code at the same time, both may read ticket = 20 from Redis; after each finishes, both write back ticket = 19. One deduction is lost, and the data is inconsistent.

This kind of problem is easy to solve. It is caused by the fact that reading the count from Redis and decrementing it are not one atomic operation, so wrapping the two operations in a synchronized block, as in the code below, fixes it.

@RestController
@Slf4j
public class RedisLockController {

    @Resource
    private Redisson redisson;

    @Resource
    private StringRedisTemplate stringRedisTemplate;

    @RequestMapping("/lock")
    public String deductTicket() throws InterruptedException {
        String ticketKey = "ticket";
        synchronized (this) {
            int ticketCount = Integer.parseInt(stringRedisTemplate.opsForValue().get(ticketKey));
            if (ticketCount > 0) {
                int realTicketCount = ticketCount - 1;
                log.info("Deduction succeeded, remaining tickets: " + realTicketCount);
                stringRedisTemplate.opsForValue().set(ticketKey, realTicketCount + "");
            } else {
                log.error("Deduction failed, not enough tickets left");
            }
        }
        return "end";
    }

}

Code analysis: now when multiple threads reach the synchronized block, only one can acquire the lock and enter; when it finishes, the lock is released and the next waiting thread enters the block in turn. In short, the threads queue up to execute the code in the block, which guarantees thread safety.

If the back-end service runs on a single machine, the approach above is certainly fine. But Internet companies, and software companies in general, do not run their back end on one machine; a back-end cluster has at least two servers, and then synchronized obviously no longer works.

As shown in the figure below, if the back end is a cluster of two service instances, nginx load-balances requests across them. Since a synchronized block only takes effect within one JVM process, two requests can enter the two instances at the same time, and the synchronized in the code above has no effect at all.

A quick test with JMeter easily exposes the bug in the code above. In fact, synchronized and the locks in the java.util.concurrent package are JVM-process-level locks; they cannot be used in a clustered or distributed deployment.

2.1.2 Solving concurrency in cluster mode

The experiment above makes it clear that JVM-process-level locks such as synchronized cannot solve concurrency in a distributed setting; distributed locks exist precisely for this scenario.

This article uses a distributed lock implemented with Redis. Redis's SETNX command (set the key to a value only if the key does not exist; if the key already exists, SETNX does nothing) can be used to make the lock unique across the cluster.

@RestController
@Slf4j
public class RedisLockController {

    @Resource
    private Redisson redisson;

    @Resource
    private StringRedisTemplate stringRedisTemplate;

    @RequestMapping("/lock")
    public String deductTicket() throws InterruptedException {

        String ticketKey = "ticket";
        // use a dedicated lock key, separate from the ticket-count key
        String lockKey = "lock_ticket";
        // redis setnx operation
        Boolean result = stringRedisTemplate.opsForValue().setIfAbsent(lockKey, "dewu");
        if (Boolean.FALSE.equals(result)) {
            return "error";
        }

        int ticketCount = Integer.parseInt(stringRedisTemplate.opsForValue().get(ticketKey));
        if (ticketCount > 0) {
            int realTicketCount = ticketCount - 1;
            log.info("Deduction succeeded, remaining tickets: " + realTicketCount);
            stringRedisTemplate.opsForValue().set(ticketKey, realTicketCount + "");
        } else {
            log.error("Deduction failed, not enough tickets left");
        }

        stringRedisTemplate.delete(lockKey);
        return "end";
    }

}

Code analysis: there is still a problem. If the business code throws an exception while deducting tickets, the code that deletes the Redis lock key is never reached. The key then stays in Redis forever, every later thread fails to acquire the lock, and the service is effectively deadlocked. That is a very serious problem.

The fix is simple: wrap the code in try...finally, so the lock is released even when the business code throws. SETNX + try...finally solves it, as in the code below:

@RestController
@Slf4j
public class RedisLockController {

    @Resource
    private Redisson redisson;

    @Resource
    private StringRedisTemplate stringRedisTemplate;

    @RequestMapping("/lock")
    public String deductTicket() throws InterruptedException {

        String ticketKey = "ticket";
        String lockKey = "lock_ticket";
        // redis setnx operation
        Boolean result = stringRedisTemplate.opsForValue().setIfAbsent(lockKey, "dewu");
        if (Boolean.FALSE.equals(result)) {
            return "error";
        }
        try {
            int ticketCount = Integer.parseInt(stringRedisTemplate.opsForValue().get(ticketKey));
            if (ticketCount > 0) {
                int realTicketCount = ticketCount - 1;
                log.info("Deduction succeeded, remaining tickets: " + realTicketCount);
                stringRedisTemplate.opsForValue().set(ticketKey, realTicketCount + "");
            } else {
                log.error("Deduction failed, not enough tickets left");
            }
        } finally {
            // release the lock even if the business code throws
            stringRedisTemplate.delete(lockKey);
        }
        return "end";
    }

}

Code analysis: the exception case is now handled, but a new problem appears. If the service goes down or is redeployed while the program is inside the try block, the finally never runs, the Redis key is again never deleted, and the deadlock comes back. For this, give the key an expiry time: SETNX + expiry, as in the code below:

@RestController
@Slf4j
public class RedisLockController {

    @Resource
    private Redisson redisson;

    @Resource
    private StringRedisTemplate stringRedisTemplate;

    @RequestMapping("/lock")
    public String deductTicket() throws InterruptedException {

        String ticketKey = "ticket";
        String lockKey = "lock_ticket";
        // redis setnx operation
        Boolean result = stringRedisTemplate.opsForValue().setIfAbsent(lockKey, "dewu");
        // if the process dies right here, the expiry below is never set
        stringRedisTemplate.expire(lockKey, 10, TimeUnit.SECONDS);
        if (Boolean.FALSE.equals(result)) {
            return "error";
        }
        try {
            int ticketCount = Integer.parseInt(stringRedisTemplate.opsForValue().get(ticketKey));
            if (ticketCount > 0) {
                int realTicketCount = ticketCount - 1;
                log.info("Deduction succeeded, remaining tickets: " + realTicketCount);
                stringRedisTemplate.opsForValue().set(ticketKey, realTicketCount + "");
            } else {
                log.error("Deduction failed, not enough tickets left");
            }
        } finally {
            stringRedisTemplate.delete(lockKey);
        }
        return "end";
    }

}

Code analysis: the code above solves the deadlock caused by a crash during execution, but it is still wrong. If the process dies right after setIfAbsent succeeds but before expire runs (the commented position in the code), no expiry is ever set and the deadlock reappears. The fix is Redis's atomic set-with-expiry command: an atomic SETNX with an expiry time, as in the code below:

@RestController
@Slf4j
public class RedisLockController {

    @Resource
    private Redisson redisson;

    @Resource
    private StringRedisTemplate stringRedisTemplate;

    @RequestMapping("/lock")
    public String deductTicket() throws InterruptedException {

        String ticketKey = "ticket";
        String lockKey = "lock_ticket";
        // atomic setnx with expiry: the key and its 10s expiry are set in one command
        Boolean result = stringRedisTemplate.opsForValue().setIfAbsent(lockKey, "dewu", 10, TimeUnit.SECONDS);
        if (Boolean.FALSE.equals(result)) {
            return "error";
        }
        try {
            int ticketCount = Integer.parseInt(stringRedisTemplate.opsForValue().get(ticketKey));
            if (ticketCount > 0) {
                int realTicketCount = ticketCount - 1;
                log.info("Deduction succeeded, remaining tickets: " + realTicketCount);
                stringRedisTemplate.opsForValue().set(ticketKey, realTicketCount + "");
            } else {
                log.error("Deduction failed, not enough tickets left");
            }
        } finally {
            stringRedisTemplate.delete(lockKey);
        }
        return "end";
    }

}

Code analysis: with the atomic set-with-expiry command, a crash at any point during execution is handled. This Redis distributed lock looks fine, and when concurrency is not very high it is sufficient; even an occasional small inconsistency is acceptable. Under high concurrency, however, the implementation above still has a big problem.

In the code above the lock expires after 10s. Suppose thread 1 needs 15s: after 10s its Redis key expires and the lock releases itself, so thread 2 can acquire the lock and start executing. Suppose thread 2 needs 8s: when thread 2 is 5s in, thread 1 reaches its release code, stringRedisTemplate.delete(lockKey), and deletes a lock that is no longer its own but thread 2's. Thread 3 can then come in; if thread 3 needs 5s, then 3s into thread 3's execution, thread 2 reaches its own release code and deletes thread 3's lock.

Under high concurrency this is a catastrophic bug: as long as the load persists, the distributed lock will randomly succeed and randomly fail.

The fix is actually simple: have each thread store a unique id as the lock's value in Redis, and before releasing, check that the value in Redis still equals the thread's own id; only then delete the key. An atomic SETNX-with-expiry storing the thread's unique id, as in the code below. (Strictly speaking, the get-compare-delete in the finally block is itself not atomic; making it atomic requires a Lua script, which is exactly what mature frameworks do, as shown in section 2.2.3.)

@RestController
@Slf4j
public class RedisLockController {

    @Resource
    private Redisson redisson;

    @Resource
    private StringRedisTemplate stringRedisTemplate;

    @RequestMapping("/lock")
    public String deductTicket() throws InterruptedException {

        String ticketKey = "ticket";
        String lockKey = "lock_ticket";
        String threadUniqueKey = UUID.randomUUID().toString();
        // atomic setnx with expiry, storing this thread's unique id as the lock value
        Boolean result = stringRedisTemplate.opsForValue().setIfAbsent(lockKey, threadUniqueKey, 10, TimeUnit.SECONDS);
        if (Boolean.FALSE.equals(result)) {
            return "error";
        }
        try {
            int ticketCount = Integer.parseInt(stringRedisTemplate.opsForValue().get(ticketKey));
            if (ticketCount > 0) {
                int realTicketCount = ticketCount - 1;
                log.info("Deduction succeeded, remaining tickets: " + realTicketCount);
                stringRedisTemplate.opsForValue().set(ticketKey, realTicketCount + "");
            } else {
                log.error("Deduction failed, not enough tickets left");
            }
        } finally {
            // delete the lock only if it still holds our own id
            if (Objects.equals(stringRedisTemplate.opsForValue().get(lockKey), threadUniqueKey)) {
                stringRedisTemplate.delete(lockKey);
            }
        }
        return "end";
    }

}

Code analysis: the Redis distributed lock above covers most application scenarios, but it is still slightly lacking. If a thread needs more time than the key's expiry, the lock is released while the business code is still running, another thread immediately acquires it, and the same bug appears again.

No matter what you set the lock key's expiry to, it is wrong for someone. Set it to 30s, and a slow SQL statement or a large query in the business code can still outlive it. So is there a better solution? Yes: lock renewal (extending the lease while the holder is still working).

The classic renewal scheme: when a thread acquires the lock, start a background thread that runs every 1/3 of the lock's expiry time and checks whether the lock (the Redis key) still exists. If it does, reset its expiry; if it no longer exists, end the background thread.
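As a rough illustration only, here is a minimal renewal sketch, assuming the stringRedisTemplate from the examples above and hypothetical helper names (this is not Redisson's implementation, which is analyzed in 2.2.3):

// Minimal watchdog sketch (assumed names, not Redisson's code).
// Re-runs every 1/3 of the lease while the caller still owns the key,
// and lets the task die once the lock is gone.
private final ScheduledExecutorService watchdog =
        Executors.newSingleThreadScheduledExecutor();

private void scheduleRenewal(String lockKey, String ownerId, long leaseMillis) {
    watchdog.schedule(() -> {
        // note: this get-then-expire pair is not atomic; Redisson does it in one Lua script
        String current = stringRedisTemplate.opsForValue().get(lockKey);
        if (Objects.equals(current, ownerId)) {
            // still the owner: reset the lease and schedule the next check
            stringRedisTemplate.expire(lockKey, leaseMillis, TimeUnit.MILLISECONDS);
            scheduleRenewal(lockKey, ownerId, leaseMillis);
        }
        // otherwise the lock was released or taken over: stop renewing
    }, leaseMillis / 3, TimeUnit.MILLISECONDS);
}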

2.2 Implementing a Redis distributed lock with the Redisson framework

The above "lock life extension" solution is simple to say, but it is quite complicated to implement, so there are many open source frameworks on the market that have already been implemented for us, so there is no need to repeat the wheel and write a distributed lock. So this time, I will take the Redison framework as an example, mainly to learn the idea of designing distributed locks.

2.2.1 Using Redisson's distributed lock

The distributed lock implemented by Redisson is very simple to use, as follows:

@RestController
@Slf4j
public class RedisLockController {

    @Resource
    private Redisson redisson;

    @Resource
    private StringRedisTemplate stringRedisTemplate;

    @RequestMapping("/lock")
    public String deductTicket() throws InterruptedException {

        String ticketKey = "ticket";
        // the Redis key of the lock
        String lockKey = "lock_ticket";
        RLock lock = redisson.getLock(lockKey);
        try {
            // lock, with automatic lock renewal
            lock.lock();
            int ticketCount = Integer.parseInt(stringRedisTemplate.opsForValue().get(ticketKey));
            if (ticketCount > 0) {
                int realTicketCount = ticketCount - 1;
                log.info("Deduction succeeded, remaining tickets: " + realTicketCount);
                stringRedisTemplate.opsForValue().set(ticketKey, realTicketCount + "");
            } else {
                log.error("Deduction failed, not enough tickets left");
            }

        } finally {
            // release the lock
            lock.unlock();
        }
        return "end";
    }

}

2.2.2 The principle of Redisson's distributed lock

The process is shown in the figure below. When thread 1 locks successfully and starts executing business code, the Redisson framework starts a background thread that checks, every 1/3 of the lock's expiry time, whether the lock is still held (whether the key still exists in Redis). If not, it ends the background thread; if the lock is still held, it resets the lock's expiry. While thread 1 holds the lock, thread 2 fails to lock and spins in a CAS-like loop, waiting for thread 1 to release before it can lock successfully.

2.2.3 Source code analysis of Redisson's distributed lock

Under the hood, Redisson uses Lua scripts extensively to make its locking operations atomic. The key benefit of Lua here is atomicity: Redis executes the whole script as a single unit, and no other command can be interleaved in the middle.
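The same trick is available from application code. Here is a minimal sketch using Spring Data Redis (assuming the stringRedisTemplate from the earlier examples) that makes the non-atomic "check owner, then delete" unlock from section 2.1.2 atomic:

// Atomic compare-and-delete unlock via a Lua script, sketched with Spring Data Redis.
// KEYS[1] is the lock key, ARGV[1] the owner id; the key is deleted only if we own it.
private static final String UNLOCK_LUA =
        "if redis.call('get', KEYS[1]) == ARGV[1] then " +
        "    return redis.call('del', KEYS[1]) " +
        "else " +
        "    return 0 " +
        "end";

public boolean safeUnlock(String lockKey, String ownerId) {
    DefaultRedisScript<Long> script = new DefaultRedisScript<>(UNLOCK_LUA, Long.class);
    Long deleted = stringRedisTemplate.execute(script, Collections.singletonList(lockKey), ownerId);
    return Long.valueOf(1L).equals(deleted);
}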

Analysis of the core Lua locking script in Redisson's source:

The method name is tryLockInnerAsync(long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command):

// locking method built on a Lua script
<T> RFuture<T> tryLockInnerAsync(long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
     internalLockLeaseTime = unit.toMillis(leaseTime);

     return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, command,
           // the first thread to arrive executes this branch directly
           // check whether the given Redis lock key exists
           "if (redis.call('exists', KEYS[1]) == 0) then " +    
           // if not, create it as a hash whose field is ARGV[2] (getLockName(threadId), the thread id) with value 1
             "redis.call('hset', KEYS[1], ARGV[2], 1); " +    
           // then set the key's expiry to ARGV[1] (the value of internalLockLeaseTime)
             "redis.call('pexpire', KEYS[1], ARGV[1]); " +        
             "return nil; " +
             "end; " +
           // when a second thread arrives and the key (the lock) already exists, it goes into this branch
           // check whether the key exists and was set by the current thread
           "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
           // if so, this is the reentrant path: hincrby adds 1 to the thread id's value (set to 1 on first entry)
           // each unlock later subtracts 1, and when the value reaches 0 the lock is released,
           // much like a reentrant lock in java.util.concurrent; hincrby is the addition command for Redis hashes
             "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
             "redis.call('pexpire', KEYS[1], ARGV[1]); " +
             "return nil; " +
             "end; " +
           // if the lock was taken by another thread, both branches above are skipped (each checks the thread id)
           // and the remaining expiry of the key is returned instead
             "return redis.call('pttl', KEYS[1]);",
             Collections.<Object>singletonList(getName()), internalLockLeaseTime, getLockName(threadId));   // KEYS[1]  ARGV[1]  ARGV[2]
}
// getName() is passed as KEYS[1]: the name of the key being locked, i.e. the lock key
// internalLockLeaseTime is passed as ARGV[1]: the lock's expiry time, 30 seconds by default
// getLockName(threadId) is passed as ARGV[2]: the lock's unique thread-id marker

Setting up the listener: method tryAcquireOnceAsync(long leaseTime, TimeUnit unit, final long threadId):

// method that sets up the listener:
    private RFuture<Boolean> tryAcquireOnceAsync(long leaseTime, TimeUnit unit, final long threadId) {
        if (leaseTime != -1) {
            return tryLockInnerAsync(leaseTime, unit, threadId, RedisCommands.EVAL_NULL_BOOLEAN);
        }
   // on a successful lock this resolves to null, i.e. ttlRemainingFuture yields null
   // if the lock was not acquired, it resolves to the remaining expiry of the lock held by another thread
        RFuture<Boolean> ttlRemainingFuture = tryLockInnerAsync(commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout(), TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_NULL_BOOLEAN);
        // if the lock is still held, a scheduled task keeps refreshing its expiry
        // a listener is attached to the current operation here
        ttlRemainingFuture.addListener(new FutureListener<Boolean>() {
            @Override
            public void operationComplete(Future<Boolean> future) throws Exception {
                if (!future.isSuccess()) {
                    return;
                }

                Boolean ttlRemaining = future.getNow();
                // lock acquired
                if (ttlRemaining) {
                    // run the scheduled renewal
                    scheduleExpirationRenewal(threadId);
                }
            }
        });
        return ttlRemainingFuture;
    }

The scheduled renewal: method scheduleExpirationRenewal(final long threadId):

// scheduled renewal method
    private void scheduleExpirationRenewal(final long threadId) {
        if (expirationRenewalMap.containsKey(getEntryName())) {
            return;
        }

        // a TimerTask scheduler is created here
        // the task is delayed by 1/3 of the configured lock expiry time;
        // the default expiry is 30s, see private long lockWatchdogTimeout = 30 * 1000;
        // (the timeout value is in milliseconds)
        Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
            @Override
            public void run(Timeout timeout) throws Exception {
                
                RFuture<Boolean> future = commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
             // another Lua script
             // it first checks that the Redis key exists and was set by the thread id in ARGV[2]
                        "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " + 
             // if the lock created by this thread still exists, reset its expiry to internalLockLeaseTime, i.e. the initial 30s
                            "redis.call('pexpire', KEYS[1], ARGV[1]); " +
             // after the expiry has been reset, the script returns 1
                            "return 1; " +
                        "end; " +
             // if the owning thread has already released the lock, the script returns 0 and the "watchdog" ends
                        "return 0;",
                          Collections.<Object>singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
                
                future.addListener(new FutureListener<Boolean>() {
                    @Override
                    public void operationComplete(Future<Boolean> future) throws Exception {
                        expirationRenewalMap.remove(getEntryName());
                        if (!future.isSuccess()) {
                            log.error("Can't update lock " + getName() + " expiration", future.cause());
                            return;
                        }
                        
                        if (future.getNow()) {
                            // reschedule itself
                            scheduleExpirationRenewal(threadId);
                        }
                    }
                });
            }
        }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);    
        

        if (expirationRenewalMap.putIfAbsent(getEntryName(), task) != null) {
            task.cancel();
        }
    }
 // as analyzed above, tryAcquireAsync() returns null when the lock is acquired, so this method also returns null
private Long tryAcquire(long leaseTime, TimeUnit unit, long threadId) {
   return get(tryAcquireAsync(leaseTime, unit, threadId));
}
 public void lockInterruptibly(long leaseTime, TimeUnit unit) throws InterruptedException {
        // get the current thread id
         long threadId = Thread.currentThread().getId();
         // per the analysis above, ttl is null when the lock was acquired;
         // otherwise it is the remaining expiry of the lock held by another thread
        Long ttl = tryAcquire(leaseTime, unit, threadId);
        // lock acquired
         // if the lock was acquired, ttl is null and nothing more needs to be done;
         // otherwise ttl holds the other lock's remaining expiry and execution continues below
        if (ttl == null) {
            return;
        }

        RFuture<RedissonLockEntry> future = subscribe(threadId);
        commandExecutor.syncSubscription(future);

        try {
        // a thread that failed to acquire the lock loops here, i.e. spins
            while (true) {
                // keep retrying the same locking logic as above
                ttl = tryAcquire(leaseTime, unit, threadId);
                // lock acquired
                if (ttl == null) {
                    break;
                }
        // it does not busy-spin: it waits until the lock could have expired before retrying, saving some CPU
                // waiting for message
                if (ttl >= 0) {
                    getEntry(threadId).getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                } else {
                    getEntry(threadId).getLatch().acquire();
                }
            }
        } finally {
            unsubscribe(future, threadId);
        }
    //        get(lockAsync(leaseTime, unit));
    }

Analysis of Redisson's underlying unlock source code:

@Override
    public void unlock() {
        // call the async unlock method
        Boolean opStatus = get(unlockInnerAsync(Thread.currentThread().getId()));
        // null is returned when the releasing thread is not the thread that holds the lock
        if (opStatus == null) {
            throw new IllegalMonitorStateException("attempt to unlock lock, not locked by current thread by node id: "
                    + id + " thread-id: " + Thread.currentThread().getId());
        }
        // based on the Lua script's return value, decide whether to cancel the renewal subscription
        if (opStatus) {
            // cancel the renewal subscription
            cancelExpirationRenewal();
        }
    }
 protected RFuture<Boolean> unlockInnerAsync(long threadId) {
        return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
                // if the lock no longer exists, publish the unlock message and return 1
                "if (redis.call('exists', KEYS[1]) == 0) then " +
                    "redis.call('publish', KEYS[2], ARGV[1]); " +
                    "return 1; " +
                "end;" +
                // if the releasing thread is not the thread that holds the lock, return nil
                "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +
                    "return nil;" +
                "end; " +
                // the current thread holds the lock: hincrby decrements the reentrancy count (the thread id's value) by 1
                "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +
                // if the count is still above 0, refresh the expiry and return 0
                "if (counter > 0) then " +
                    "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                    "return 0; " +
                // otherwise the lock is fully released: delete the key, publish the unlock message, return 1
                "else " +
                    "redis.call('del', KEYS[1]); " +
                    "redis.call('publish', KEYS[2], ARGV[1]); " +
                    "return 1; "+
                "end; " +
                "return nil;",
                Arrays.<Object>asList(getName(), getChannelName()), LockPubSub.unlockMessage, internalLockLeaseTime, getLockName(threadId));

    }
    // getName() is passed as KEYS[1]: the name of the key being unlocked
    // getChannelName() is passed as KEYS[2]: Redis's internal pub/sub channel for this lock
    // LockPubSub.unlockMessage is passed as ARGV[1]: the unlock message sent to other Redis client threads
    // internalLockLeaseTime is passed as ARGV[2]: the lock's expiry time, 30 seconds by default
    // getLockName(threadId) is passed as ARGV[3]: the lock's unique thread-id marker
 void cancelExpirationRenewal() {
        // remove this lock's renewal task from the schedule
        Timeout task = expirationRenewalMap.remove(getEntryName());
        if (task != null) {
            task.cancel();
        }
    }

With a single Redis instance, using the Redisson framework for Redis distributed locks works perfectly. In production, however, Redis generally runs as a sentinel-managed master-slave architecture, which differs from ZooKeeper's: clients can only talk to the Master node, and the Slave nodes only keep data backups. Because Redis replication is asynchronous, the Master does not wait for data to be synchronized to the Slaves before accepting new requests, and that creates a master-slave synchronization problem.

Suppose a lock has been set successfully and the business code is running, and the Master goes down before the freshly set lock key has been replicated to the Slave. The sentinel then elects a new Master, which is really the original Slave, and later requests go to it. Since the new Master never received the lock, another client can lock again, and the lock fails.

A distributed lock built on the Redis master-slave sentinel architecture cannot avoid this extreme case, but in practice the probability of such a failure in production is extremely low, and the occasional problem is usually acceptable.

If you want a distributed lock that is 100% reliable, choose ZooKeeper, which solves this problem perfectly. ZooKeeper's replication differs from Redis's: it is strongly consistent. When a client asks the leader to lock, the leader returns success only after the lock has been synchronized to a majority of the cluster's nodes. If the leader then goes down, ZooKeeper elects a new leader internally, and clients reaching the new leader still find the lock in place, so a ZooKeeper cluster avoids the Redis-cluster problem above.

Redis and ZooKeeper simply have different design goals. Any distributed architecture is bound by the CAP theorem, and "you can't have your cake and eat it too": you choose either AP or CP. Redis is clearly an AP system and ZooKeeper a CP system, which is the essential difference in how the two replicate data.

There is also the RedLock idea for designing Redis distributed locks, which borrows from ZooKeeper's majority approach. The Redisson framework provides an API for it and it is simple to use, so the details are skipped here. Its main idea, illustrated in the figure below, is that the client locks several independent Redis nodes at once and treats the lock as acquired only after a majority succeed.
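A minimal usage sketch of Redisson's RedLock API, assuming redisson1/redisson2/redisson3 are RedissonClient instances configured against three independent Redis nodes:

// RedLock sketch: one RLock per independent Redis instance.
RLock lock1 = redisson1.getLock("lock_ticket");
RLock lock2 = redisson2.getLock("lock_ticket");
RLock lock3 = redisson3.getLock("lock_ticket");

// the combined lock counts as acquired only when the majority
// of the underlying locks have been acquired
RedissonRedLock redLock = new RedissonRedLock(lock1, lock2, lock3);
try {
    redLock.lock();
    // ... business code ...
} finally {
    redLock.unlock();
}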

Personally, I do not recommend this implementation for production, for a simple reason: where locking originally took one operation against one Redis, it now has to lock several, silently adding several network I/Os. And if the first Redis locks successfully but the later ones hit network errors while locking, the first one's data may need to be rolled back. To solve a problem with a very low probability of occurring, it introduces several potential new ones, which is not worth it, and there are messier corner cases besides. So I strongly advise against using this style of Redis distributed lock in production.

If you really need a strongly consistent distributed lock, just use the one implemented with ZooKeeper; its performance will certainly beat RedLock's.

3. Distributed lock usage scenarios

Here we focus on the following usage scenarios for distributed locks:

3.1 Hotspot cache key reconstruction optimization

Internet companies normally use a "cache plus expiry" strategy: it speeds up reads and writes and guarantees the data is refreshed regularly. This covers most needs, but there is one special failure mode: an originally unpopular key suddenly becomes hot, for example because of breaking news, and its request volume explodes. If the cache happens to expire at that moment and cannot be rebuilt quickly enough, the flood of threads falls through to the back end, the database load spikes, and the application may even crash.

For example: "Air Force one" was originally an unpopular key that existed in the cache. Suddenly a star wearing "Air Force one" on Weibo made a hot search, so many fans of the star would come to the app to buy "Air Force one" "One", the "Air Force one" at this time has directly become a hot key, then if the cache of the "Air Force one" key is invalid at this time, there will be a large number of requests to access the db at the same time, which will be given to the later It will cause a lot of pressure on the terminal, and even cause the system to go down.

A simple distributed lock solves this: only one thread is allowed to rebuild the cache, and the other threads wait for it to finish, then re-read from the cache. See the example pseudocode below:

public String getCache(String key) {
        // read from the cache
        String value = stringRedisTemplate.opsForValue().get(key);
        try {
            if (Objects.isNull(value)) {
                // only one thread is allowed in to rebuild the cache
                String mutexKey = "lock_key:" + key;

                // lock acquired
                if (Boolean.TRUE.equals(stringRedisTemplate.opsForValue().setIfAbsent(mutexKey, "poizon", 30, TimeUnit.SECONDS))) {
                    try {
                        // load from the db (pseudocode)
                        value = mysql.getDataFromMySQL(key);
                        // write the loaded value back to the cache
                        stringRedisTemplate.opsForValue().set(key, value, 60, TimeUnit.SECONDS);
                    } finally {
                        // release the lock
                        stringRedisTemplate.delete(mutexKey);
                    }
                } else {
                    // another thread is rebuilding: back off briefly, then retry
                    Thread.sleep(100);
                    value = getCache(key);
                }
            }

        } catch (InterruptedException e) {
            log.error("getCache is error", e);
        }
        return value;
    }

3.2 Solve the problem of inconsistency between cache and database data

If a business's cached data must be strongly consistent with the database and the concurrency is not very high, a distributed read-write lock solves the problem directly: it queues concurrent read-write and write-write operations in order, while read-read access costs about as much as having no lock at all.

When concurrency is modest and the business demands cache-database consistency, this is the easiest approach to implement, and it works immediately. In this scenario, listening to the binlog and doing a delayed double-delete via messages would also keep the data consistent, but introducing new middleware adds system complexity and is not worth it.
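A minimal sketch with Redisson's distributed read-write lock, assuming the redisson client and stringRedisTemplate from the earlier examples; reads share the lock, while a write that updates the database and the cache excludes everything else:

// Distributed read-write lock sketch with Redisson's RReadWriteLock.
RReadWriteLock rwLock = redisson.getReadWriteLock("rw_lock_product");

public String readData(String cacheKey) {
    RLock readLock = rwLock.readLock();
    readLock.lock();               // read-read does not block
    try {
        return stringRedisTemplate.opsForValue().get(cacheKey);
    } finally {
        readLock.unlock();
    }
}

public void writeData(String cacheKey, String value) {
    RLock writeLock = rwLock.writeLock();
    writeLock.lock();              // write-write and read-write are queued in order
    try {
        // update the database first (pseudocode), then the cache
        // mysql.updateDataToMySQL(cacheKey, value);
        stringRedisTemplate.opsForValue().set(cacheKey, value);
    } finally {
        writeLock.unlock();
    }
}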

3.3 Design Theory of Distributed Locks in Ultra-High Concurrency Scenarios

The design idea here is somewhat similar to that of ConcurrentHashMap, which is built on segmented locking. This is an idea I have seen discussed online and have not used in practice myself.

Suppose product A has 2,000 units in stock. Borrowing ConcurrentHashMap's principle, a modulo or hash algorithm can spread that stock across different nodes, horizontally splitting the 2,000 units over multiple Redis nodes. Then, when requests come in to deduct stock, instead of all of them fetching from one Redis, they can fetch from, say, five, which greatly improves concurrency.
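A rough sketch of the idea (illustrative only, not production code), assuming the 2,000 units have been pre-split into 5 sub-keys ticket_0 through ticket_4, ideally living on different Redis nodes:

// Segmented-stock sketch: each request is routed to one segment by hashing,
// so contention is spread across several keys/nodes instead of one.
private static final int SEGMENTS = 5;

public boolean deductOne(String userId) {
    // pick a segment, much like ConcurrentHashMap picks a bucket
    int segment = Math.floorMod(userId.hashCode(), SEGMENTS);
    String segmentKey = "ticket_" + segment;
    // Redis INCRBY/DECRBY is atomic, so there is no read-modify-write race per segment
    Long remaining = stringRedisTemplate.opsForValue().increment(segmentKey, -1L);
    if (remaining != null && remaining >= 0) {
        return true;                                              // deducted from this segment
    }
    stringRedisTemplate.opsForValue().increment(segmentKey, 1L);  // roll back the underflow
    return false; // a fuller version would try other segments before giving up
}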

4. Summary and Reflections

To sum up, Redis distributed locks are not absolutely safe. They cannot avoid certain extreme cases, but in practice the probability of such failures in production is extremely low, and occasional problems are generally acceptable.

The CAP theorem says that a distributed system can satisfy at most two of Consistency, Availability, and Partition tolerance at once; you cannot have all three, so you choose either AP or CP. Redis is chosen as the component for distributed locks because its single-threaded, in-memory operations are extremely efficient and it keeps performing well under high concurrency.

If a distributed lock must be 100% reliable, choose ZooKeeper or MySQL; they solve lock safety perfectly, but choosing consistency costs availability, so distributed locks built on ZooKeeper or MySQL perform far worse than those built on Redis.

Thank you for reading this article. If anything here falls short, please point it out; we can discuss, learn, and improve together.

Text/harmony

Follow Dewu Technology and be the most fashionable technical person!

