The interviewer asked me how to implement distributed locks. Fortunately, I had summed up a full set of eight-part essays

A middle-aged man with a beer belly, a plaid shirt, and a badly receding hairline is walking towards you, a thermos cup in his hand and a MacBook under his arm. He looks like an architect.


The interview begins, straight to the point.

Interviewer: Have you ever taken part in designing a seckill (flash-sale) system?

Me: No. I usually build back-office management systems, OA office systems, and internal management systems, but I have never built a seckill system.

Interviewer: Well..., you are very honest. That's all for today; we will contact you if there is any news.

Will there really be any news later? When have you ever taken the initiative to contact me?
The one who tells the truth gets rejected, while the one who talks big gets the offer.
Fine, wait until I have read how Yideng sums up the eight-part essay on seckill systems.

Me: Yes, I have. I was independently responsible for the architecture design of a seckill system ([Dog Head] Yes, I designed all of it).

Interviewer: Good, that lets me ask further questions. When you design a seckill system, how do you prevent products from being oversold? For example, if there is only one iPhone in the event and 100 units end up being sold, that is clearly unacceptable, and the platform will lose money.

Me: You have to use a lock. Because a seckill system receives a huge number of requests, it is usually deployed as a distributed cluster. Java's built-in synchronized and ReentrantLock only work within a single JVM, so at this point a distributed lock is needed.

Interviewer: You mentioned distributed locks. What are distributed locks used for?

This is where the eight-part essay (the memorized stock answer) begins.

Me: I think distributed locks have two main functions:

Ensuring data correctness:
For example, preventing overselling during a seckill, preventing duplicate form submissions, and keeping interfaces idempotent.

Avoiding duplicate processing:
For example, preventing a scheduled task from running on several machines at once, and preventing every request from hitting the database at the same moment when a cache entry expires.

When it comes to summing up eight-part essays, it has to be Yideng.

Interviewer: You summed that up quite well. Do you know what characteristics a distributed lock should have?

Me: I think distributed locks should have the following characteristics:

Mutual exclusion: Only one thread can hold the lock at a time.
Reentrant: A thread that already holds the lock can acquire it again without deadlocking itself.
High availability: When a small number of nodes go down, the lock service can still serve requests.
High performance: It must support high concurrency with low latency.
Support for blocking and non-blocking acquisition: synchronized is blocking, while ReentrantLock.tryLock() is non-blocking.
Support for fair and unfair locks: synchronized is an unfair lock, while new ReentrantLock(boolean fair) can create a fair lock.
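
To make these traits concrete, here is a minimal single-JVM sketch using ReentrantLock. It only illustrates what the terms mean locally; it is not a distributed lock, and the class name LocalLockTraitsSketch is made up for this example.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Single-JVM illustration of the traits listed above (not a distributed lock). */
class LocalLockTraitsSketch {
    static String demo() {
        ReentrantLock lock = new ReentrantLock(true); // true = fair lock
        lock.lock();                                  // blocking acquire
        lock.lock();                                  // reentrant: the same thread acquires again
        int holds = lock.getHoldCount();

        // Mutual exclusion and non-blocking acquire: tryLock() in another thread
        // returns immediately with false while this thread holds the lock.
        boolean[] otherAcquired = new boolean[1];
        Thread other = new Thread(() -> otherAcquired[0] = lock.tryLock());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.unlock(); // one unlock per successful lock
            lock.unlock();
        }
        return "fair=" + lock.isFair() + " holds=" + holds
                + " otherAcquired=" + otherAcquired[0];
    }
}
```

Calling LocalLockTraitsSketch.demo() returns "fair=true holds=2 otherAcquired=false".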

Interviewer: Young man, you know your stuff. How would you design a distributed lock?

Me: Several common tools can be used to implement distributed locks.
For example: a relational database (such as MySQL), a distributed cache (such as Redis), or a distributed coordination service (such as ZooKeeper).

It is relatively simple to use MySQL to implement distributed locks. Create a table:

 CREATE TABLE `distributed_lock` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
  `resource_name` varchar(200) NOT NULL DEFAULT '' COMMENT 'Resource name (unique index)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_resource_name` (`resource_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Distributed lock';

To acquire the lock, insert a record. If the insert succeeds, the lock has been acquired; if it fails with a duplicate-key error on the unique index, someone else holds the lock and the acquisition has failed.

 INSERT INTO distributed_lock (`resource_name`) VALUES ('resource1');

To release the lock, delete the record.

 DELETE FROM distributed_lock WHERE resource_name = 'resource1';

The implementation is relatively simple, but it cannot be used in production as it stands. Several problems remain unsolved:

This lock does not support blocking. When the insert fails, it returns immediately. Of course, you can retry in a while loop until the insert succeeds, but spinning also consumes CPU.
This lock is not reentrant. A thread that already holds the lock will fail when it inserts again. We can add two columns, one to record the node and thread that acquired the lock, and the other to record the lock count. Acquiring the lock increases the count by one, releasing it decreases the count by one, and the record is deleted when the count reaches zero.
This lock has no expiration time. If business processing fails or the machine goes down before the lock is released, the lock stays forever and no other thread can acquire it. We can add a column for the lock expiration time, and then start an asynchronous task that scans for locks whose expiration time is earlier than the current time and deletes them.

It's so troublesome, let's take a look at what the optimized lock looks like:

 CREATE TABLE `distributed_lock` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
  `resource_name` varchar(200) NOT NULL DEFAULT '' COMMENT 'Resource name (unique index)',
  `owner` varchar(200) NOT NULL DEFAULT '' COMMENT 'Lock holder (machine code + thread name)',
  `lock_count` int NOT NULL DEFAULT '0' COMMENT 'Number of times locked',
  `expire_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Lock expiration time',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_resource_name` (`resource_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Distributed lock';
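
The acquire and release rules for such a table can be sketched in Java as follows. This is only an in-memory model under stated assumptions: a HashMap entry stands in for one table row, the synchronized methods stand in for the atomicity the database would provide, and the class and method names (TableLockSketch, tryLock, unlock) are hypothetical. Expired rows are cleared on access here instead of by a separate scanning task.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the distributed_lock table; each map entry models one row. */
class TableLockSketch {
    /** Mirrors the owner, lock_count and expire_time columns. */
    static final class Row {
        final String owner;
        int lockCount = 1;
        Instant expireTime;

        Row(String owner, Instant expireTime) {
            this.owner = owner;
            this.expireTime = expireTime;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();

    /** Returns true if owner now holds the lock on resource. */
    synchronized boolean tryLock(String resource, String owner, long ttlSeconds, Instant now) {
        Row row = rows.get(resource);
        if (row != null && row.expireTime.isBefore(now)) {
            rows.remove(resource); // expired: same effect as the cleanup task
            row = null;
        }
        if (row == null) {
            rows.put(resource, new Row(owner, now.plusSeconds(ttlSeconds))); // the INSERT succeeded
            return true;
        }
        if (!row.owner.equals(owner)) {
            return false; // held by someone else: the INSERT would hit the unique index
        }
        row.lockCount++;  // reentrant acquire by the current holder
        row.expireTime = now.plusSeconds(ttlSeconds);
        return true;
    }

    /** Decrements the count; removes the row when the count reaches zero. */
    synchronized boolean unlock(String resource, String owner) {
        Row row = rows.get(resource);
        if (row == null || !row.owner.equals(owner)) {
            return false; // never release a lock held by someone else
        }
        if (--row.lockCount == 0) {
            rows.remove(resource);
        }
        return true;
    }
}
```

The current time is passed in as a parameter so that the expiry behaviour can be exercised without waiting.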

This should be perfect, right? No, there is another problem:

What if the lock expires before the business logic has finished?

Suppose we set the lock expiration time to 6 seconds. Under normal circumstances the business logic finishes within 6 seconds, but if the JVM runs a Full GC, or a call to a third-party service hits network latency, the business logic may still be running when the lock expires and is deleted. Won't that cause problems?

This introduces another knowledge point "lock renewal":

When the lock is acquired, start an asynchronous task. Each time one third of the expiration time has elapsed, that is, every 2 seconds for a 6-second lock, the task resets the lock's expiration time to 6 seconds from now, which ensures that the lock does not expire until the business logic has finished.
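
A renewal task of this kind might look roughly like the sketch below, built on ScheduledExecutorService. The class name LockRenewalSketch and the extendExpiry callback are assumptions made for illustration; in a real system the callback would update the lock's expiration time in the lock store, and it should only do so while the caller still owns the lock.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Periodically renews a lock until close() is called (hypothetical helper). */
class LockRenewalSketch implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "lock-renewal");
                thread.setDaemon(true); // never keep the JVM alive just for renewals
                return thread;
            });
    private final ScheduledFuture<?> task;

    /**
     * @param ttlMillis    lock expiration time, for example 6000; must be at least 3
     * @param extendExpiry callback that pushes the expiration time out by ttlMillis again
     */
    LockRenewalSketch(long ttlMillis, Runnable extendExpiry) {
        long period = ttlMillis / 3; // renew after every third of the TTL
        this.task = scheduler.scheduleAtFixedRate(
                extendExpiry, period, period, TimeUnit.MILLISECONDS);
    }

    /** Stops renewing; call this when the business logic finishes, before releasing the lock. */
    @Override
    public void close() {
        task.cancel(false);
        scheduler.shutdownNow();
    }
}
```

Wrapping the business logic in a try-with-resources block around this helper stops the renewals even when the logic throws, so a failed request does not keep the lock alive indefinitely.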

Interviewer: Young man, you clearly know distributed locks inside out. Let me go on. MySQL is rarely used for distributed locks in production, because its concurrency performance cannot keep up. You just mentioned that Redis can also implement distributed locks. Do you know how?

Of course I know. A full set of eight-part essays has to be memorized.

Me: Using Redis to implement distributed locks is similar to using MySQL. It also needs to solve various problems encountered in the implementation process, but the solutions are slightly different.

The easiest way to acquire a lock:

 // 1. Acquire the lock
redis.setnx('resource_name1', 'owner1')
// 2. Release the lock
redis.del('resource_name1')

When "resource_name1" does not exist, the set is successful, that is, the lock is acquired successfully.

However, an expiration time needs to be added to prevent the lock from not being released.

 // 1. Acquire the lock
redis.setnx('resource_name1', 'owner1')
// 2. Add an expiration time to the lock
redis.expire('resource_name1', 6, TimeUnit.SECONDS)

This introduces a new problem. The two commands are not atomic: after acquiring the lock, the client may crash before it sets the expiration time. What should we do?

That is easy to handle. Since Redis 2.6.12, the SET command accepts the NX and EX options, so both steps can be done in a single command:

 redis.set('resource_name1', 'owner1', "NX", "EX", 6)

There is still a problem. Releasing the lock does not check who holds it, so a thread could release a lock held by another thread. That is not acceptable. You can do this:

 // Release the lock
if ('owner1'.equals(redis.get('resource_name1'))){
    redis.del('resource_name1')
}

Does this work? Not yet. Because get and del are two separate commands, the check and the delete are not atomic. You need a Lua script that packages the two commands into one and is sent to Redis for execution:

 String script = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end";
redis.eval(script, Collections.singletonList("resource_name1"), Collections.singletonList("owner1"));
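
Why the check and the delete must be a single step can be shown with a deterministic replay of the bad interleaving. The sketch below is an in-memory simulation, not Redis client code: a ConcurrentHashMap stands in for the Redis keyspace, putIfAbsent stands in for SET NX, removing the key by hand stands in for expiry, and remove(key, value) plays the role of the Lua compare-and-delete. All names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Replays the release race with a map standing in for the Redis keyspace. */
class ReleaseRaceSketch {
    private static final String KEY = "resource_name1";

    /** Separate GET and DEL: returns whether client B's lock survives. */
    static boolean lockSurvivesCheckThenDelete() {
        ConcurrentMap<String, String> redis = new ConcurrentHashMap<>();
        redis.put(KEY, "owner1");                               // client A holds the lock
        boolean aThinksItOwns = "owner1".equals(redis.get(KEY)); // A's GET passes the check
        redis.remove(KEY);                                       // the key expires right after
        redis.putIfAbsent(KEY, "owner2");                        // client B acquires the lock
        if (aThinksItOwns) {
            redis.remove(KEY);                                   // A's DEL removes B's lock
        }
        return "owner2".equals(redis.get(KEY));
    }

    /** Compare-and-delete in one step: returns whether client B's lock survives. */
    static boolean lockSurvivesAtomicRelease() {
        ConcurrentMap<String, String> redis = new ConcurrentHashMap<>();
        redis.put(KEY, "owner1");         // client A holds the lock
        redis.remove(KEY);                // the key expires
        redis.putIfAbsent(KEY, "owner2"); // client B acquires the lock
        redis.remove(KEY, "owner1");      // A's release deletes only if the value is still owner1
        return "owner2".equals(redis.get(KEY));
    }
}
```

With separate commands, B's lock is deleted (the first method returns false); with the single-step release it is left alone (the second method returns true).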

Does this work? Not yet. The "lock renewal" problem is still unsolved.

This one is easier. The Redis client Redisson already implements renewal for us. The feature is called the "WatchDog", and it starts automatically when we call lock() without specifying a lease time.

Interviewer: Young man, you are really good. Can you talk about how to implement distributed locks with ZooKeeper?

Me: ZooKeeper organizes its data as a tree of nodes (znodes), similar to a Linux directory structure, and node names within the same directory must be unique.

There are four types of nodes:

Persistent node: Once created, it is stored on the server permanently unless it is deleted manually.
Ephemeral (temporary) node: Its life cycle is bound to the client session. When the client disconnects, the node is deleted automatically.
Persistent sequential node: The same as a persistent node, except that an auto-incrementing sequence number is appended to the node name.
Ephemeral (temporary) sequential node: The same as an ephemeral node, except that an auto-incrementing sequence number is appended to the node name.

ZooKeeper also has a watch (listen and notify) mechanism: clients can register watches on nodes. When a node changes, ZooKeeper notifies the client, and the client can then run the corresponding business logic.

We can use temporary (ephemeral) sequential nodes to build a distributed lock in the following three steps:

Create a temporary sequential node under the resource directory /resource1
Get all nodes in the /resource1 directory; if the current node has the smallest sequence number, the lock has been acquired
If not, set a watch on the node whose sequence number immediately precedes the current node's, and check again when that node is deleted (watching only the predecessor avoids waking every waiting client at once)
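
The decision made in steps two and three can be sketched as a pure function over the list of child node names. This is not ZooKeeper client code: the "lock-" name prefix and the class SequentialNodeLockSketch are assumptions for illustration, and the only ZooKeeper behaviour relied on is that sequential nodes end in a zero-padded counter. The sketch has each waiter watch its immediate predecessor, the variant used by the ZooKeeper lock recipe, which avoids waking every waiter when the lock is released.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Decides whether a sequential node holds the lock, or which node it should watch. */
class SequentialNodeLockSketch {
    /** Extracts the counter from a name such as "lock-0000000003". */
    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(nodeName.lastIndexOf('-') + 1));
    }

    /**
     * @param children names of all nodes under the lock directory, in any order
     * @param myNode   the node this client created
     * @return empty if myNode has the smallest sequence number (lock acquired),
     *         otherwise the node immediately before it, which should be watched
     */
    static Optional<String> nodeToWatch(List<String> children, String myNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(SequentialNodeLockSketch::sequenceOf));
        int index = sorted.indexOf(myNode);
        if (index < 0) {
            throw new IllegalArgumentException("node not found among children: " + myNode);
        }
        return index == 0 ? Optional.empty() : Optional.of(sorted.get(index - 1));
    }
}
```

For example, with children lock-0000000003, lock-0000000001 and lock-0000000002, the client that owns lock-0000000001 holds the lock, and the client that owns lock-0000000003 watches lock-0000000002.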

The implementation logic is very simple. Let's look at the advantages of using ZooKeeper to implement distributed locks:

Because a temporary node is deleted automatically when its client disconnects, there is no need to set a lock timeout, and there is no need to worry about unreleased locks or lock renewal.
Because the creator's information can be stored on the node, the lock can also support reentrancy.
Because nodes can be watched, a waiting client can block until it is notified.

Interviewer: Young man, the opportunity for promotion and salary increase is reserved for people like you. Double salary, come to work tomorrow.

Summary:

Distributed locks involve many knowledge points, but they are all summarized in this picture. You are welcome to like, save, share, and comment.

[Figure: 分布式锁.png (summary of distributed locks)]


Yideng Architecture (一灯架构)
