
What is a distributed lock

A distributed lock is a lock implementation that coordinates access to shared resources by different processes in a distributed system. When a critical resource is shared between different systems, or between different hosts of the same system, mutual exclusion is usually required to prevent interference and to keep the data consistent.

Points to note about distributed locks

1) Mutual exclusion

Only one client can hold the lock at any time.

2) Anti-deadlock

If a client crashes while holding the lock and never releases it, no other client can obtain the lock, which causes a deadlock. The lock must therefore always be released eventually. In Redis, we can set an expiration time on the lock to ensure that no deadlock occurs.

3) Unlock by the lock holder

Whoever takes the lock must be the one to release it. Locking and unlocking must be done by the same client: a lock added by a thread of client A must be released by that thread of client A, and a client cannot release locks held by other clients.

4) Reentrant

After a client acquires the lock on an object, the same client can acquire the lock on that object again.

Redis distributed lock

Implementation

Redis locks mainly rely on the Redis SETNX command.

  • Locking command: SETNX key value. When the key does not exist, the command sets the key and returns success; otherwise it returns failure. The key is the unique identifier of the lock and is generally named after the business resource.
  • Unlock command: DEL key. Deleting the key-value pair releases the lock so that other threads can acquire it through SETNX.
  • Lock timeout: EXPIRE key timeout. Setting a timeout on the key ensures that even if the lock is not explicitly released, it is released automatically after a certain period, so the resource is never locked forever.

The pseudo code for locking and unlocking is as follows:

if (jedis.setnx(key, "1") == 1) {
    jedis.expire(key, 30);
    try {
        // TODO: business logic
    } finally {
        jedis.del(key);
    }
}

Caveats

SETNX and EXPIRE are not atomic

If SETNX succeeds but the client crashes, restarts, or hits a network problem before the lock timeout is set, the EXPIRE command is never executed. The lock then has no timeout and, if it is never released, becomes a deadlock.
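
The failure window can be reproduced with a small in-memory stand-in for Redis. This is only a sketch, not Jedis code: `TinyStore`, its `setnx`/`expire`/`ttl` methods, and the `crashBeforeExpire` flag are hypothetical names introduced for illustration, and the model tracks only whether a key has a timeout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for Redis; models only SETNX, EXPIRE and TTL.
class TinyStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> ttlSeconds = new HashMap<>();

    // Returns 1 if the key was absent and is now set, 0 otherwise (like SETNX).
    long setnx(String key, String value) {
        return values.putIfAbsent(key, value) == null ? 1 : 0;
    }

    // Attaches a timeout to an existing key (like EXPIRE).
    long expire(String key, long seconds) {
        if (!values.containsKey(key)) return 0;
        ttlSeconds.put(key, seconds);
        return 1;
    }

    // Returns the timeout, -1 if the key has none, -2 if the key is missing (like TTL).
    long ttl(String key) {
        if (!values.containsKey(key)) return -2;
        return ttlSeconds.getOrDefault(key, -1L);
    }
}

class NonAtomicLockDemo {
    // Takes the lock in two steps; crashBeforeExpire simulates the client
    // dying between SETNX and EXPIRE.
    static boolean lock(TinyStore store, String key, boolean crashBeforeExpire) {
        if (store.setnx(key, "1") != 1) return false;
        if (crashBeforeExpire) return true; // EXPIRE is never sent
        store.expire(key, 30);
        return true;
    }

    public static void main(String[] args) {
        TinyStore store = new TinyStore();
        lock(store, "order:42", true);
        System.out.println("ttl = " + store.ttl("order:42"));                    // ttl = -1
        System.out.println("second client: " + lock(store, "order:42", false)); // second client: false
    }
}
```

The key is left with no timeout, so in this model every later `lock` call fails until someone deletes the key by hand.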

Solution
1. SETNX with value = system time + expiration time

You can put the expiration time in the value set by SETNX. If acquiring the lock fails, read the value back and check whether the existing lock has expired. The locking code is as follows:

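// tryLock is a hypothetical method name; the original snippet omits the signature.
public boolean tryLock(Jedis jedis, String key_resource_id, long expireTime) {
    long expires = System.currentTimeMillis() + expireTime; // system time + configured expiration time
    String expiresStr = String.valueOf(expires);

    // If the lock does not exist yet, locking succeeds
    if (jedis.setnx(key_resource_id, expiresStr) == 1) {
        return true;
    }
    // The lock already exists: read its expiration time
    String currentValueStr = jedis.get(key_resource_id);

    // If the stored expiration time is earlier than the current system time, the lock has expired
    if (currentValueStr != null && Long.parseLong(currentValueStr) < System.currentTimeMillis()) {

        // The lock has expired: fetch the previous expiration time and set the new one
        // (see the Redis documentation for the GETSET command)
        String oldValueStr = jedis.getSet(key_resource_id, expiresStr);

        if (oldValueStr != null && oldValueStr.equals(currentValueStr)) {
            // Under concurrency, only the thread whose GETSET returns the value it read earlier gets the lock
            return true;
        }
    }

    // In all other cases, locking fails
    return false;
}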

The advantage of this solution is that it removes the separate EXPIRE call and puts the expiration time in the value of SETNX. It solves the problem of the original approach, where the lock could never be released if a failure occurred before EXPIRE ran. But this scheme has other shortcomings:

  • The expiration time is generated by the client itself (System.currentTimeMillis() is the local system time), so in a distributed environment the clocks of all clients must be synchronized.
  • If multiple clients request the lock at the moment it expires and all execute jedis.getSet(), only one client ends up acquiring the lock, but the expiration time it wrote may be overwritten by the other clients.
  • The lock does not store a unique identifier of its holder, so it may be released by other clients.
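
The second shortcoming can be traced step by step with an in-memory map standing in for Redis. This is an illustrative sketch under stated assumptions: `GetSetRaceDemo`, the `store` map, and the timestamps 100, 200, 1200 and 1300 are all made up for the example, and the interleaving of clients A and B is written out sequentially.

```java
import java.util.HashMap;
import java.util.Map;

class GetSetRaceDemo {
    // Hypothetical stand-in for the Redis keyspace.
    static final Map<String, String> store = new HashMap<>();

    // Same contract as GETSET: write the new value, return the previous one.
    static String getSet(String key, String value) {
        return store.put(key, value);
    }

    static String run() {
        store.clear();
        String key = "lock";
        store.put(key, "100"); // stale lock that expired at t=100

        // At t=200 both clients read the stale value and decide to take over.
        String seenByA = store.get(key);
        String seenByB = store.get(key);

        // A goes first: GETSET returns the stale value, so A wins the lock.
        boolean aWins = seenByA.equals(getSet(key, "1200"));
        // B goes second: GETSET returns A's value, so B loses,
        // but B's write has already replaced A's expiration time.
        boolean bWins = seenByB.equals(getSet(key, "1300"));

        return aWins + " " + bWins + " " + store.get(key);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true false 1300"
    }
}
```

Client A holds the lock, yet the stored expiration time is the one written by client B.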
2. Extended SET command (SET EX PX NX), Redis 2.6.12 or later

SET key value [EX seconds] [PX milliseconds] [NX|XX]

  • NX: The SET succeeds only when the key does not exist. This guarantees that only the first client request obtains the lock; other clients must wait until the lock is released before they can obtain it.
  • EX seconds: Sets the expiration time of the key, in seconds.
  • PX milliseconds: Sets the expiration time of the key, in milliseconds.
  • XX: Sets the value only when the key already exists.
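// Jedis 2.x overload; it returns "OK" on success and null otherwise
if ("OK".equals(jedis.set(key_resource_id, lock_value, "NX", "EX", 100))) { // acquire the lock, expiring after 100 seconds
    try {
        // do something: business logic
    } finally {
        jedis.del(key_resource_id); // release the lock
    }
}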
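
To make the effect of combining NX and EX in one command concrete, here is a small in-memory model. It is a sketch, not the Redis implementation: `ExpiringStore`, `setNxEx`, and the explicit `nowSeconds` parameter are hypothetical, chosen so that expiry can be followed without waiting on a real clock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of "SET key value NX EX seconds".
// The clock is passed in explicitly so the behaviour is easy to follow.
class ExpiringStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> deadlines = new HashMap<>();

    // Writes the value and its deadline in one step, but only if the key
    // is absent or its previous deadline has already passed.
    synchronized boolean setNxEx(String key, String value, long ttlSeconds, long nowSeconds) {
        Long deadline = deadlines.get(key);
        if (deadline != null && deadline > nowSeconds) {
            return false; // key still exists: NX refuses the write
        }
        values.put(key, value);
        deadlines.put(key, nowSeconds + ttlSeconds);
        return true;
    }

    // Returns the current holder, or null once the key has expired.
    synchronized String get(String key, long nowSeconds) {
        Long deadline = deadlines.get(key);
        return (deadline != null && deadline > nowSeconds) ? values.get(key) : null;
    }

    public static void main(String[] args) {
        ExpiringStore store = new ExpiringStore();
        System.out.println(store.setNxEx("lock", "A", 100, 0));   // true
        System.out.println(store.setNxEx("lock", "B", 100, 50));  // false
        System.out.println(store.setNxEx("lock", "B", 100, 100)); // true
    }
}
```

Because the value and the deadline are written together, there is no point at which the key exists without a timeout.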
3. Lua script, for example:
if (redis.call('setnx', KEYS[1], ARGV[1]) < 1)
then return 0;
end;
redis.call('expire', KEYS[1], tonumber(ARGV[2]));
return 1;

Usage example (run from redis-cli):
EVAL "if (redis.call('setnx',KEYS[1],ARGV[1]) < 1) then return 0; end; redis.call('expire',KEYS[1],tonumber(ARGV[2])); return 1;" 1 key value 100

Lock released by mistake

Suppose thread A acquires the lock with an expiration time of 30 seconds, but its work takes longer than 30 seconds. The lock expires and is released automatically, and thread B then acquires it. When thread A later finishes, it releases the lock with the DEL command. At that point thread B has not finished its work yet, so the lock that thread A deletes is actually the one held by thread B.

Solution
1. SET with NX and EX/PX, then verify a unique random value before deleting

Since the lock may be deleted by another thread by mistake, set the value to a random identifier that marks the current thread, and check it before deleting. The pseudo code is as follows:

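// Jedis 2.x overload; it returns "OK" on success and null otherwise
if ("OK".equals(jedis.set(key_resource_id, uni_request_id, "NX", "EX", 100))) { // acquire the lock
    try {
        // do something: business logic
    } finally {
        // Release the lock only if it was added by the current thread
        if (uni_request_id.equals(jedis.get(key_resource_id))) {
            jedis.del(key_resource_id); // release the lock
        }
    }
}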

Here, checking whether the lock was added by the current thread and releasing the lock are two separate operations, not one atomic operation. By the time jedis.del() runs, the lock may have expired and been acquired by another client, so the call can release a lock that belongs to someone else. (Under low concurrency the probability of this is extremely small.)

To be more rigorous, a Lua script is generally used instead. The Lua script is as follows:

if redis.call('get',KEYS[1]) == ARGV[1] then 
   return redis.call('del',KEYS[1]) 
else
   return 0
end;
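
The difference between a plain DEL and the compare-and-delete performed by the script can be modelled in memory. In this sketch `OwnerCheckedRelease` and its methods are hypothetical, and `ConcurrentHashMap.remove(key, value)`, which removes the entry only if it currently maps to the given value, stands in for the atomic Lua script.

```java
import java.util.concurrent.ConcurrentHashMap;

class OwnerCheckedRelease {
    // Hypothetical stand-in for the Redis keyspace.
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Like SET key token NX: succeeds only if nobody holds the lock.
    boolean tryLock(String key, String token) {
        return store.putIfAbsent(key, token) == null;
    }

    // Simulates the key timing out on the server.
    void expire(String key) {
        store.remove(key);
    }

    // Plain DEL: removes the lock no matter who holds it.
    boolean unsafeRelease(String key) {
        return store.remove(key) != null;
    }

    // Compare-and-delete in one atomic step, like the Lua script.
    boolean release(String key, String token) {
        return store.remove(key, token);
    }

    String holder(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        OwnerCheckedRelease redis = new OwnerCheckedRelease();
        redis.tryLock("lock", "token-A");
        redis.expire("lock");             // A's lock times out
        redis.tryLock("lock", "token-B"); // B acquires the lock
        System.out.println(redis.release("lock", "token-A")); // false
        System.out.println(redis.holder("lock"));             // token-B
    }
}
```

With `unsafeRelease` in place of `release`, the last step would delete B's lock instead of leaving it intact.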

Lock timeout leads to concurrent execution

If thread A successfully acquires the lock and sets an expiration time of 30 seconds, but the execution time of thread A exceeds 30 seconds, the lock will be automatically released after expiration. At this time, thread B acquires the lock, and thread A and thread B execute concurrently.

Letting threads A and B run concurrently is obviously not allowed. Generally, there are two ways to solve this problem:

  1. Set the expiration time long enough to ensure that the code logic finishes before the lock is released.
  2. Add a daemon thread for the thread that acquires the lock, which extends the lifetime of a lock that is about to expire but has not been released.

The open source framework Redisson solves this problem. Once a thread acquires the lock, Redisson starts a watchdog, a background thread that checks every 10 seconds. If the thread still holds the lock, the watchdog keeps extending the lifetime of the lock key. In this way Redisson uses the watchdog to solve the problem of the lock expiring and being released before the business logic has finished.
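
The renewal idea can be sketched without Redis or Redisson. The code below is not Redisson's implementation: `Lease`, `LeaseWatchdog`, and all parameter names are hypothetical, the lease lives in memory, and the daemon thread simply calls `renew` at a fixed period until renewal fails.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory lease: one owner and a deadline in milliseconds.
class Lease {
    private final String owner;
    private long deadlineMillis;
    private boolean released;

    Lease(String owner, long nowMillis, long ttlMillis) {
        this.owner = owner;
        this.deadlineMillis = nowMillis + ttlMillis;
    }

    // Extends the deadline only for the owner, and only while the lease
    // is neither released nor already expired.
    synchronized boolean renew(String caller, long nowMillis, long ttlMillis) {
        if (released || !owner.equals(caller) || nowMillis >= deadlineMillis) {
            return false;
        }
        deadlineMillis = nowMillis + ttlMillis;
        return true;
    }

    synchronized void release() {
        released = true;
    }

    synchronized boolean isHeld(long nowMillis) {
        return !released && nowMillis < deadlineMillis;
    }
}

class LeaseWatchdog {
    // Starts a daemon thread that renews the lease every periodMillis
    // and stops itself as soon as a renewal is refused.
    static ScheduledExecutorService start(Lease lease, String owner, long ttlMillis, long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lease-watchdog");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(() -> {
            if (!lease.renew(owner, System.currentTimeMillis(), ttlMillis)) {
                timer.shutdown(); // lock released or lost: stop renewing
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

A caller would create the lease with `System.currentTimeMillis()`, start the watchdog with, for example, a 30-second TTL and a 10-second period, and call `release()` in a `finally` block; the next renewal attempt then fails and the watchdog shuts down.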

Schematic diagram of the underlying mechanism:

Non-reentrancy

A lock is reentrant if a thread that already holds it can acquire it again. With a non-reentrant lock, a second attempt by the holder fails because the lock is already held. A reentrant lock can be built on Redis by counting: add 1 on each lock, subtract 1 on each unlock, and release the lock when the count returns to 0.
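
The counting scheme can be sketched in memory as follows. `ReentrantLockModel` and its methods are hypothetical names; in Redis the owner and the counter would have to be stored under the lock key and updated atomically, which this sketch replaces with `synchronized`.

```java
// Hypothetical in-memory model of a reentrant lock that counts acquisitions.
class ReentrantLockModel {
    private String owner; // null while the lock is free
    private int count;    // number of times the owner has locked

    // The first acquisition sets the owner; repeated acquisitions by the
    // same owner only increase the counter.
    synchronized boolean tryLock(String caller) {
        if (owner == null) {
            owner = caller;
            count = 1;
            return true;
        }
        if (owner.equals(caller)) {
            count++;
            return true;
        }
        return false; // held by someone else
    }

    // Decrements the counter; the lock is released when it reaches 0.
    synchronized boolean unlock(String caller) {
        if (owner == null || !owner.equals(caller)) {
            return false;
        }
        if (--count == 0) {
            owner = null;
        }
        return true;
    }

    synchronized int holdCount() {
        return count;
    }

    public static void main(String[] args) {
        ReentrantLockModel lock = new ReentrantLockModel();
        System.out.println(lock.tryLock("A")); // true
        System.out.println(lock.tryLock("A")); // true, count is now 2
        System.out.println(lock.tryLock("B")); // false
        lock.unlock("A");
        lock.unlock("A");                      // count back to 0
        System.out.println(lock.tryLock("B")); // true
    }
}
```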

Master-slave failover

To ensure Redis availability, a master-slave deployment is generally adopted. Master-slave data synchronization can be asynchronous or synchronous. Redis records write commands in a local memory buffer and then asynchronously sends the commands in the buffer to the slave node. The slave node executes the replicated command stream to reach a state consistent with the master node, while reporting its synchronization status back to the master node.

In a cluster deployment that includes master-slave replication, when the master node goes down a slave node takes its place, and the client does not clearly notice the switch. Suppose client A acquires the lock and the master node goes down before the command has been replicated. The slave node is promoted to master, but the new master has no lock data, so when client B tries to lock, it succeeds.
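
The sequence can be written out with two maps standing in for the master and the slave. Everything here is a hypothetical illustration: `FailoverDemo`, `setNx`, and the client names are invented, and "replication" is modelled by simply never copying the write to the second map.

```java
import java.util.HashMap;
import java.util.Map;

class FailoverDemo {
    // Like SETNX against one node: succeeds only if the key is absent there.
    static boolean setNx(Map<String, String> node, String key, String value) {
        return node.putIfAbsent(key, value) == null;
    }

    static String run() {
        Map<String, String> master = new HashMap<>();
        Map<String, String> slave = new HashMap<>();

        // Client A locks on the master.
        boolean aLocked = setNx(master, "lock", "client-A");

        // The master crashes before the write is replicated,
        // so the slave is promoted without the lock key.
        Map<String, String> newMaster = slave;

        // Client B locks on the new master and also succeeds.
        boolean bLocked = setNx(newMaster, "lock", "client-B");

        return aLocked + " " + bLocked;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true true"
    }
}
```

Both clients believe they hold the same lock, which is exactly the violation of mutual exclusion described above.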

To solve this problem, the Redis author antirez proposed a more advanced distributed lock algorithm: Redlock. The core idea of Redlock is this:

Deploy multiple Redis masters so that they will not all go down at the same time. These master nodes are completely independent of each other, with no data synchronization between them. On each of these master instances, locks are acquired and released with the same method used for a single Redis instance.

I won't go into details here. Redisson implements a Redlock version of the lock; if you are interested, you can check it out.
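
As a rough illustration of locking across independent masters, the sketch below acquires the lock on every reachable node and keeps it only if a majority agreed, which is the quorum rule of the Redlock proposal. It is heavily simplified and is not Redisson's code: `QuorumLockSketch`, `Node`, and the `up` flag are hypothetical, and the expiration times, elapsed-time check, and retries of the real algorithm are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

class QuorumLockSketch {
    // Hypothetical independent master; up = false simulates an unreachable node.
    static class Node {
        final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();
        boolean up = true;

        boolean setNx(String key, String token) {
            return up && data.putIfAbsent(key, token) == null;
        }

        // Deletes the key only if this token owns it.
        void releaseIfOwner(String key, String token) {
            if (up) data.remove(key, token);
        }
    }

    // Tries every node; the lock counts as held only with a majority.
    static boolean tryLock(List<Node> nodes, String key, String token) {
        int acquired = 0;
        for (Node node : nodes) {
            if (node.setNx(key, token)) acquired++;
        }
        if (acquired >= nodes.size() / 2 + 1) {
            return true;
        }
        // No majority: undo the partial acquisition on every node.
        for (Node node : nodes) {
            node.releaseIfOwner(key, token);
        }
        return false;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 5; i++) nodes.add(new Node());
        nodes.get(0).up = false;
        nodes.get(1).up = false;
        System.out.println(tryLock(nodes, "lock", "token-A")); // true: 3 of 5 nodes
        System.out.println(tryLock(nodes, "lock", "token-B")); // false: 0 of 5 nodes
    }
}
```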

Cluster split brain

Cluster split brain means that, because of network problems, the Redis master node ends up in a different network partition from the slave nodes and the sentinel cluster. Because the sentinels cannot see the master, they promote a slave node to master, and there are now two different master nodes. The same can happen in a Redis Cluster deployment.

When different clients connect to different master nodes, two clients can hold the same lock at the same time.

Conclusion

Redis is known for its high performance, but using it to implement distributed locks still has the difficulties described above. Redis distributed locks can only mitigate concurrency problems. To solve them completely, you still need concurrency control at the database level.

skyarthur