Abstract: In a single-process system, when multiple threads can modify a variable at the same time, access to that variable or code block must be synchronized so that modifications are executed serially and concurrent modification is eliminated. Synchronization is essentially achieved through locks.

This article is shared from the HUAWEI Cloud community post "Don't know how to use distributed locks? Implement one from scratch based on etcd", original author: aoho.

Why do we need distributed locks?

In a single-process system, when multiple threads can modify a variable at the same time, access to that variable or code block must be synchronized so that modifications are executed serially and concurrent modification is eliminated. Synchronization is essentially achieved through locks. To ensure that only one thread at a time executes a given code block, a mark must be kept somewhere that every thread can see. A thread sets the mark when it does not exist and enters the code block. Any thread that arrives later finds the mark already set and waits until the holder leaves the synchronized block and clears the mark before trying to set it itself.

In a distributed environment, data consistency has always been a difficult point, and the situation is more complicated than in a single process. The biggest difference between a distributed and a stand-alone environment is that the concurrency is between processes rather than threads. Threads in one process share heap memory, so they can simply keep the mark in memory. Processes, however, may not even be on the same physical machine, so the mark has to be stored somewhere all of them can see.

The most common example is the flash sale scenario, where the order service is deployed as multiple instances. Suppose there are 4 flash-sale items in stock; the first user wants to buy 3 and the second wants to buy 2. Ideally, whichever order is processed first succeeds and the other user is told that the purchase failed. What can actually happen is that both requests read an inventory of 4. The first user buys 3, but before that inventory update is written, the second user's order for 2 also passes the check and updates the inventory to 2. Both orders succeed, 5 items are sold out of a stock of 4, and the business logic is broken.
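The interleaving described above can be reproduced inside a single process. The following is only an illustrative sketch: the names `staleBuy` and `Inventory` are made up for this example, and a `sync.Mutex` stands in for the lock that, in the distributed case, would have to live outside the process.

```go
package main

import (
	"fmt"
	"sync"
)

// staleBuy models an instance that decides using a stock value it read
// earlier, without holding any lock. It returns the stock it would write back.
func staleBuy(stockSeen, quantity int) (newStock int, ok bool) {
	if stockSeen < quantity {
		return stockSeen, false
	}
	return stockSeen - quantity, true
}

// Inventory guards the check and the update with one mutex, so every buyer
// decides based on the current stock.
type Inventory struct {
	mu    sync.Mutex
	stock int
}

// Buy atomically checks the stock and deducts quantity if enough is left.
func (inv *Inventory) Buy(quantity int) bool {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if inv.stock < quantity {
		return false
	}
	inv.stock -= quantity
	return true
}

func main() {
	// Both instances read stock = 4 before either one writes.
	_, ok1 := staleBuy(4, 3)
	_, ok2 := staleBuy(4, 2)
	fmt.Println("without lock:", ok1, ok2) // both orders succeed: 5 sold out of 4

	inv := &Inventory{stock: 4}
	first := inv.Buy(3)
	second := inv.Buy(2)
	fmt.Println("with lock:", first, second, "left:", inv.stock)
}
```

With the lock, the second order is rejected and one item remains. The mutex only helps because both buyers run in the same process, which is exactly what stops being true once the order service runs as several instances.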

In this scenario the product inventory is a shared variable, and under high concurrency access to it must be mutually exclusive. In a stand-alone environment a language such as Java offers many concurrency APIs, but they are powerless in a distributed scenario: the system is both multi-threaded and multi-process, and the processes run on different machines, so the synchronized keyword and Lock implementations only protect threads within a single JVM. The language's own APIs cannot provide a distributed lock, so we need another way to implement one.

Common lock schemes are as follows:

  • Distributed locks based on a database
  • Distributed locks based on ZooKeeper
  • Distributed locks based on a cache, such as Redis, etcd, etc.

Below we briefly introduce these kinds of locks, and then focus on how to implement locks with etcd.

Database-based lock

There are two ways to implement database-based locks: one relies on inserting and deleting rows in a table, and the other relies on the database's exclusive locks.

Inserting and deleting rows in a database table

Inserting and deleting rows in a table is the simplest way. First, create a lock table that mainly contains the following fields: method name, timestamp, and so on.

The usage is as follows: when a method needs to be locked, insert a record for it into the table. The method name column has a unique constraint, so if multiple requests reach the database at the same time, the database guarantees that only one insert succeeds. The thread whose insert succeeded is considered to hold the lock for that method and may execute the method body. After execution, it must delete the record to release the lock.

This scheme can be improved. For availability, use a primary and a standby database with two-way data synchronization, and switch the application to the standby as soon as the primary goes down. For reentrancy, also record the host and thread information of the current lock holder. On the next acquisition, query the table first; if the row's host and thread information match the current caller, grant the lock to that thread directly.
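The insert-and-delete scheme, including the reentrant variant, can be modeled in a few lines. This is an in-memory sketch rather than real database code: a map keyed by method name stands in for the table and its unique constraint, a mutex stands in for the database's atomic insert, and the names `LockTable`, `owner`, and `count` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// row is one record of the hypothetical lock table.
type row struct {
	owner string // host and thread information of the current holder
	count int    // how many times the holder has entered the lock
}

// LockTable models a table whose method-name column has a unique constraint.
type LockTable struct {
	mu   sync.Mutex
	rows map[string]*row
}

func NewLockTable() *LockTable {
	return &LockTable{rows: make(map[string]*row)}
}

// TryLock "inserts" a row for method. It fails when another owner already
// has a row for that method, which models a unique-constraint violation.
func (t *LockTable) TryLock(method, owner string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	r, exists := t.rows[method]
	if !exists {
		t.rows[method] = &row{owner: owner, count: 1}
		return true
	}
	if r.owner == owner {
		r.count++ // same host and thread: reentrant acquisition
		return true
	}
	return false
}

// Unlock undoes one acquisition and "deletes" the row once the count is zero.
func (t *LockTable) Unlock(method, owner string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	r, exists := t.rows[method]
	if !exists || r.owner != owner {
		return false
	}
	r.count--
	if r.count == 0 {
		delete(t.rows, method)
	}
	return true
}

func main() {
	table := NewLockTable()
	fmt.Println(table.TryLock("createOrder", "host-a/thread-1")) // true
	fmt.Println(table.TryLock("createOrder", "host-b/thread-7")) // false
	fmt.Println(table.TryLock("createOrder", "host-a/thread-1")) // true (reentrant)
	table.Unlock("createOrder", "host-a/thread-1")
	table.Unlock("createOrder", "host-a/thread-1")
	fmt.Println(table.TryLock("createOrder", "host-b/thread-7")) // true
}
```

Note that nothing in this model removes a row whose owner has crashed. With a real table, the timestamp column mentioned above is the natural place to detect such stale locks.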

Exclusive lock based on database

We can also implement distributed locks through the database's exclusive locks. With MySQL's InnoDB engine, the locking operation can be implemented as follows:

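// Uses java.sql.Connection, PreparedStatement, ResultSet and SQLException.
// Assumes `connection` is an open Connection, that the `lock` table already
// contains a row for lockName, and that LockException is an unchecked
// exception defined by the application.
public void lock(String lockName) throws SQLException, InterruptedException {
    // Row locks are held until the transaction ends, so disable auto-commit.
    connection.setAutoCommit(false);
    // LOCK is a reserved word in MySQL, hence the backticks.
    String sql = "SELECT * FROM `lock` WHERE lock_name = ? FOR UPDATE";
    int count = 0;
    while (count < 4) {
        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
            stmt.setString(1, lockName);
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    // A row was returned: the lock has been acquired.
                    return;
                }
            }
        } catch (SQLException e) {
            // For example a lock wait timeout: treated as "not acquired".
        }
        // An empty result or an exception both mean the lock was not acquired.
        Thread.sleep(1000);
        count++;
    }
    throw new LockException();
}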

Appending FOR UPDATE to the query makes the database take an exclusive lock on the rows it reads. InnoDB locks only the matching rows when the lookup uses an index; without a usable index it locks every row it scans, so lock_name should be indexed. Once a record is exclusively locked, other threads cannot lock it, and clients that have not acquired the lock block on the SELECT statement above. There are two possible outcomes: the lock is acquired before the lock wait timeout, or the wait times out and the statement fails.

The thread that obtains the exclusive lock holds the distributed lock. It can then execute the business logic, and it releases the lock when the work is finished by committing the transaction.

Summary of database-based locks

Both methods rely on a table in the database. One decides whether the lock is held by checking whether a record exists in the table; the other uses the database's exclusive locks. The advantage is simplicity: they rely directly on an existing relational database and are easy to understand. The disadvantage is that every lock operation carries database overhead, and performance problems and SQL timeout exceptions have to be taken into account.

Based on ZooKeeper

Distributed locks can be implemented with ZooKeeper's ephemeral nodes and sequential nodes.

To lock a method, a client creates a unique ephemeral sequential node under the directory node that corresponds to that method. To decide whether it holds the lock, the client only needs to check whether its node has the smallest sequence number among the nodes in that directory. To release the lock after the business logic has run, it simply deletes its ephemeral node. Because ZooKeeper removes ephemeral nodes when the owning client's session ends, this approach also avoids deadlocks caused by locks that are never released after a service goes down.

Curator is a ZooKeeper client framework open-sourced by Netflix (now Apache Curator); you can look into its usage yourself. The InterProcessMutex class provided by Curator is a distributed lock implementation: the acquire method acquires the lock and the release method releases it. It also handles problems such as lock release, blocking acquisition, and reentrancy.

To implement a blocking lock, the client creates a sequential node in ZooKeeper and sets a watch. When the watched node changes, ZooKeeper notifies the client, which then checks whether its own node now has the smallest sequence number among all current nodes. If so, it has acquired the lock and can execute the business logic.
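The check each client performs after listing the lock directory can be sketched as a pure function. This is a sketch under assumptions: the node names of the form `lock-0000000001` and the function names `seq` and `decide` are hypothetical, and no ZooKeeper client is involved. The function also returns the node with the next-lower sequence number, since watching only that predecessor is a common refinement that avoids waking every waiter on each release.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// seq extracts the numeric sequence suffix after the last '-' in a node name.
func seq(name string) (int, error) {
	i := strings.LastIndex(name, "-")
	if i < 0 {
		return 0, fmt.Errorf("no sequence suffix in %q", name)
	}
	return strconv.Atoi(name[i+1:])
}

// decide reports whether mine has the smallest sequence number among
// children. If it does not, watch is the node immediately before mine.
func decide(children []string, mine string) (held bool, watch string, err error) {
	mySeq, err := seq(mine)
	if err != nil {
		return false, "", err
	}
	found := false
	predSeq := -1
	for _, c := range children {
		s, err := seq(c)
		if err != nil {
			return false, "", err
		}
		if c == mine {
			found = true
			continue
		}
		// Keep the largest sequence number that is still below ours.
		if s < mySeq && s > predSeq {
			watch, predSeq = c, s
		}
	}
	if !found {
		return false, "", fmt.Errorf("node %q is not among the children", mine)
	}
	return watch == "", watch, nil
}

func main() {
	children := []string{"lock-0000000003", "lock-0000000001", "lock-0000000002"}
	for _, mine := range children {
		held, watch, err := decide(children, mine)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: held=%v watch=%q\n", mine, held, watch)
	}
}
```

Only `lock-0000000001` holds the lock here; the other two clients would each wait for the node just ahead of them to disappear and then run the same check again.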

The ZooKeeper-based distributed lock also has shortcomings. Its performance may not match that of cache-based locks, because every acquire and release dynamically creates and destroys an ephemeral node.

In addition, nodes in ZooKeeper can only be created and deleted through the leader, which then synchronizes the change to the other nodes in the cluster. Network jitter is unavoidable in a distributed environment and can interrupt the session between a client and the ZooKeeper cluster. When the session expires, the ZooKeeper server considers the client dead and deletes its ephemeral node, so another client can acquire the lock while the first still believes it holds it. Two clients then hold the lock at the same time.

Distributed locks based on cache

Compared with database-based solutions, cache-based implementations perform better because access is much faster. Many caches can also be deployed as clusters, which avoids a single point of failure. There are several cache-based options, such as memcached and Redis. The rest of this article explains how to implement distributed locks based on etcd.

Implement distributed locks through etcd txn

A distributed lock built on etcd must also satisfy the requirements of consistency, mutual exclusion, and reliability. Three etcd features make this possible: transactions (txn), leases, and watches.

Design analysis

etcd's transactions help us achieve consistency and mutual exclusion. A transaction is an If-Then-Else statement. The If clause checks whether the specified key exists on the etcd server by testing whether the key's creation revision, create_revision, is 0: a key that already exists has a non-zero create_revision. If the condition holds, the Then clause executes a put to create the key; otherwise the Else clause returns the result that the attempt to grab the lock failed. Besides using successful key creation as the criterion, you can also have each client create its own key under a common prefix and compare the revisions of those keys to determine which request owns the lock.
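To make the If-Then-Else semantics concrete, here is a small in-memory model. It is not the etcd client API: `Store`, `TryLock`, and `Unlock` are hypothetical names, and a mutex stands in for the atomicity that etcd provides for a transaction. The only rule taken from the text is that a key that does not exist has a create_revision of 0.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a stored value together with the revision at which it was created.
type entry struct {
	value          string
	createRevision int64
}

// Store is a tiny in-memory stand-in for etcd's key space.
type Store struct {
	mu       sync.Mutex
	revision int64
	data     map[string]entry
}

func NewStore() *Store {
	return &Store{data: make(map[string]entry)}
}

// TryLock mimics the transaction
//
//	IF   create_revision(key) == 0
//	THEN put(key, owner)
//	ELSE get(key)
//
// and returns whether the IF branch was taken plus the current holder.
func (s *Store) TryLock(key, owner string) (succeeded bool, holder string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e := s.data[key] // a missing key yields the zero entry: createRevision 0
	if e.createRevision == 0 {
		s.revision++
		s.data[key] = entry{value: owner, createRevision: s.revision}
		return true, owner
	}
	return false, e.value
}

// Unlock deletes the key, so its create_revision reads as 0 again.
func (s *Store) Unlock(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.data[key]; ok {
		s.revision++
		delete(s.data, key)
	}
}

func main() {
	s := NewStore()
	fmt.Println(s.TryLock("lock", "client-a")) // true client-a
	fmt.Println(s.TryLock("lock", "client-b")) // false client-a
	s.Unlock("lock")
	fmt.Println(s.TryLock("lock", "client-b")) // true client-b
}
```

The point of the model is that the check and the put happen as one atomic step, so two clients can never both see a create_revision of 0 and both write the key.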

If a client fails after acquiring the lock, the lock must still be released in time, so a lease is required. When requesting the lock we specify a lease duration, and the lock is released automatically when the lease expires, which keeps the business available. Is this enough? If the client starts a time-consuming operation and the lease expires before it finishes, another request can obtain the lock and cause inconsistency. To prevent this, the client must keep renewing (refreshing) the lease for as long as it holds the lock, in effect maintaining a heartbeat with the etcd server.
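The effect of renewal can be illustrated with a minimal lease model. This is a sketch, not etcd's lease implementation: the `Lease` type and its methods are hypothetical, and time is passed in explicitly as whole seconds so that the behavior is deterministic.

```go
package main

import "fmt"

// Lease models a lease with a time-to-live; all times are in seconds.
type Lease struct {
	ttl      int64
	deadline int64
}

// Grant creates a lease that expires ttl seconds after now.
func Grant(now, ttl int64) *Lease {
	return &Lease{ttl: ttl, deadline: now + ttl}
}

// Expired reports whether the lease has run out at time now.
func (l *Lease) Expired(now int64) bool {
	return now >= l.deadline
}

// KeepAlive pushes the deadline to now+ttl. It fails if the lease has
// already expired, because an expired lease cannot be revived.
func (l *Lease) KeepAlive(now int64) bool {
	if l.Expired(now) {
		return false
	}
	l.deadline = now + l.ttl
	return true
}

func main() {
	const ttl, taskDuration = 5, 12

	// Without renewal, a 12-second task outlives a 5-second lease.
	idle := Grant(0, ttl)
	fmt.Println("expired without renewal:", idle.Expired(taskDuration)) // true

	// With a heartbeat every 2 seconds, the lease is still alive at the end.
	renewed := Grant(0, ttl)
	for now := int64(2); now <= taskDuration; now += 2 {
		renewed.KeepAlive(now)
	}
	fmt.Println("expired with renewal:", renewed.Expired(taskDuration)) // false
}
```

The heartbeat interval has to be comfortably shorter than the TTL; in this model a client that misses renewals for a full TTL loses the lease, and with it the lock.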

Implementation

Based on the above analysis, we draw a flow chart for implementing etcd distributed locks:
[Figure: flow chart of the etcd distributed lock implementation]

The test code for an etcd distributed lock implemented in Go is as follows:

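package etcdlock

import (
	"context"
	"fmt"
	"testing"
	"time"

	// Assumed import path (etcd v3.5 and later); older releases use
	// "go.etcd.io/etcd/clientv3".
	clientv3 "go.etcd.io/etcd/client/v3"
)

func TestLock(t *testing.T) {
	// Client configuration
	config := clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	}
	// Establish the connection
	client, err := clientv3.New(config)
	if err != nil {
		t.Fatal(err)
	}
	defer client.Close()

	// 1. Create a lease with a 5-second TTL; the lock key is attached to it
	lease := clientv3.NewLease(client)
	leaseGrantResp, err := lease.Grant(context.TODO(), 5)
	if err != nil {
		t.Fatal(err)
	}
	leaseID := leaseGrantResp.ID

	// 2. Renew automatically. The context is cancelable so that renewal
	//    can be stopped when the test exits.
	ctx, cancelFunc := context.WithCancel(context.TODO())

	// 3. On exit, revoke the lease (this deletes the lock key) and stop renewing
	defer cancelFunc()
	defer lease.Revoke(context.TODO(), leaseID)

	keepRespChan, err := lease.KeepAlive(ctx, leaseID)
	if err != nil {
		t.Fatal(err)
	}
	// Consume the renewal responses
	go func() {
		// The channel is closed when the lease expires or ctx is canceled,
		// which ends the loop.
		for keepResp := range keepRespChan {
			// The client renews periodically (about every TTL/3 seconds),
			// and each renewal produces one response.
			fmt.Println("received lease renewal response:", keepResp.ID)
		}
	}()

	// 4. Try to grab the lock while the lease is alive
	//    (a lock in etcd is just a key)
	kv := clientv3.NewKV(client)

	// Create the transaction
	txn := kv.Txn(context.TODO())

	// If the key does not exist, then put it; else grabbing the lock failed
	txn.If(clientv3.Compare(clientv3.CreateRevision("lock"), "=", 0)).
		Then(clientv3.OpPut("lock", "g", clientv3.WithLease(leaseID))).
		Else(clientv3.OpGet("lock"))

	// Commit the transaction
	txnResp, err := txn.Commit()
	if err != nil {
		t.Fatal(err)
	}

	if !txnResp.Succeeded {
		fmt.Println("lock is held by:", string(txnResp.Responses[0].GetResponseRange().Kvs[0].Value))
		return
	}

	// Lock acquired: run the business logic
	fmt.Println("processing task")
	time.Sleep(5 * time.Second)
}

The expected execution results are as follows:

=== RUN   TestLock
processing task
received lease renewal response: 7587848943239472601
received lease renewal response: 7587848943239472601
received lease renewal response: 7587848943239472601
--- PASS: TestLock (5.10s)
PASS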

In general, the etcd distributed lock implementation above is divided into four steps:

  • Client initialization and connection establishment;
  • Create a lease and automatically renew it;
  • Create a transaction and acquire a lock;
  • Execute business logic, and finally release the lock.

The lease is renewed with a cancelable context, mainly so that renewal can be stopped on exit. Releasing the lock is handled by the defer statements above: when the deferred Revoke call runs, the lease is revoked and the key that represents the distributed lock is deleted.

Summary

This article introduced how to implement distributed locks based on etcd. It began with the background and the need for distributed locks. A distributed architecture differs from a monolithic one: calls span multiple instances of multiple services, and a programming language's concurrency primitives cannot be used across processes. Distributed locks exist to provide mutually exclusive access to resources in a distributed environment and so keep data consistent. The article then introduced two database-based implementations, inserting and deleting rows in a table and database exclusive locks, as well as an implementation based on ZooKeeper's ephemeral and sequential nodes. Both approaches have some performance and stability shortcomings.

The article then focused on implementing distributed locks with etcd, whose transactions (txn), leases, and watches can be combined to build a distributed lock.

In the example above, the client returns immediately if it fails to grab the lock. So when the lock is released, or the client holding it fails and exits, how can other clients acquire the lock quickly? The code above can be improved with etcd's watch feature to handle this; you can try it yourself.
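As a starting point for that exercise, the waiting behavior can be modeled in memory. This is a sketch under assumptions, not etcd code: `LockKey` and its methods are hypothetical names, and a channel that is closed on release plays the role of a watch event for the key's deletion.

```go
package main

import (
	"fmt"
	"sync"
)

// LockKey models a single lock key plus the clients watching it.
type LockKey struct {
	mu       sync.Mutex
	holder   string          // "" means the key does not exist
	watchers []chan struct{} // closed when the key is deleted
}

// tryLock takes the lock if it is free. Otherwise it registers a watcher in
// the same critical section, so a delete cannot slip in unnoticed between
// the failed attempt and the start of the watch.
func (k *LockKey) tryLock(owner string) (bool, <-chan struct{}) {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.holder == "" {
		k.holder = owner
		return true, nil
	}
	ch := make(chan struct{})
	k.watchers = append(k.watchers, ch)
	return false, ch
}

// Lock blocks until owner holds the lock.
func (k *LockKey) Lock(owner string) {
	for {
		ok, deleted := k.tryLock(owner)
		if ok {
			return
		}
		<-deleted // wait for the delete event, then compete again
	}
}

// Unlock deletes the key and notifies every watcher.
func (k *LockKey) Unlock() {
	k.mu.Lock()
	defer k.mu.Unlock()
	k.holder = ""
	for _, ch := range k.watchers {
		close(ch)
	}
	k.watchers = nil
}

func main() {
	key := &LockKey{}
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			key.Lock(fmt.Sprintf("client-%d", id))
			counter++ // only touched while holding the lock
			key.Unlock()
		}(i)
	}
	wg.Wait()
	fmt.Println("tasks completed:", counter)
}
```

A real client has the same obligation as `tryLock` here: it must not miss a delete that happens between the failed transaction and the start of the watch, for example by watching from the revision the transaction returned. etcd's Go client also ships a `concurrency` package whose `Mutex` type provides a waiting lock, which may be worth reading before writing your own.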
