In his talk on go-zero's distributed caching system, Kevin focused on the principles of consistent hashing and its practice in distributed caching. This article discusses the principle of consistent hashing and its implementation in go-zero in detail.
Take storage as an example. In a microservice system, our storage cannot be just a single node:

- First, to improve stability: when a single node goes down, the entire storage becomes unavailable;
- Second, for data fault tolerance: when a single node's data is physically damaged, the data is lost, whereas with multiple nodes the nodes back each other up, unless the nodes backing each other up are damaged at the same time.

So the question is: with multiple nodes, which node should the data be written to?
## hash
So in essence, we need an algorithm that converts an input into a smaller value. This value is usually unique and extremely compact in format, such as a `uint64`, and it must satisfy:

- Idempotence: hashing the same input must always produce the same value
This is exactly what a `hash` algorithm provides. However, when a common `hash` algorithm is used for routing, such as `key % N`, a problem arises: if a node exits the cluster due to an exception or a failed heartbeat, the rehash causes a large amount of data to be redistributed to different nodes. When a node receives a request for a key it did not own before, it has to reprocess the logic of obtaining the data; in a caching scenario, this easily causes a cache avalanche.
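To make the problem concrete, here is a small standalone demo (not go-zero code; `fnv` is just a convenient stdlib hash) that routes 10,000 keys with `key % N` and counts how many keys change node when the cluster shrinks from 4 nodes to 3. With modulo routing, roughly three quarters of the keys move in this case:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// route picks a node index for a key using plain modulo hashing.
func route(key string, n int) int {
	h := fnv.New64a()
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(n))
}

func main() {
	const total = 10000
	moved := 0
	for i := 0; i < total; i++ {
		key := fmt.Sprintf("key%d", i)
		// compare routing with 4 nodes vs. 3 nodes (one node removed)
		if route(key, 4) != route(key, 3) {
			moved++
		}
	}
	fmt.Printf("%d of %d keys moved (~%.0f%%)\n",
		moved, total, float64(moved)/float64(total)*100)
}
```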
This is when the `consistent hash` algorithm needs to be introduced.
## consistent hash
Let's take a look at how `consistent hash` solves these problems.
### rehash
First, let's solve the massive `rehash` problem:

As shown in the figure above, when a new node is added, the only affected key is `key31`. After a node is added (or removed), only the data near that node is affected; the data of other nodes is untouched, which solves the problem of node changes.
This is exactly monotonicity, a property that ordinary `hash` algorithms cannot satisfy in distributed scenarios.
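A standalone sketch of a bare hash ring (again illustrative code, not go-zero's) makes this measurable: place three nodes on a ring, route 100,000 keys, then add a fourth node and count how many keys change owner. Every moved key lands on the new node; no key is shuffled between the old nodes:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// buildRing places `replicas` points per real node on the ring.
func buildRing(nodes []string, replicas int) ([]uint64, map[uint64]string) {
	owner := make(map[uint64]string)
	var keys []uint64
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			h := hash64(fmt.Sprintf("%s#%d", n, i))
			keys = append(keys, h)
			owner[h] = n
		}
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	return keys, owner
}

// lookup walks clockwise to the first point at or after the key's hash.
func lookup(keys []uint64, owner map[uint64]string, key string) string {
	h := hash64(key)
	i := sort.Search(len(keys), func(i int) bool { return keys[i] >= h }) % len(keys)
	return owner[keys[i]]
}

func main() {
	// one point per real node for now; virtual nodes come in the next section
	keysB, ownerB := buildRing([]string{"node1", "node2", "node3"}, 1)
	keysA, ownerA := buildRing([]string{"node1", "node2", "node3", "node4"}, 1)

	moved := 0
	for i := 0; i < 100000; i++ {
		k := fmt.Sprintf("key%d", i)
		if before, after := lookup(keysB, ownerB, k), lookup(keysA, ownerA, k); before != after {
			moved++
			// every moved key now belongs to the new node
			if after != "node4" {
				panic("a key moved between old nodes, which should not happen")
			}
		}
	}
	fmt.Printf("%d of 100000 keys moved, all onto node4\n", moved)
}
```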
### Data skew
In fact, it can also be seen from the figure above that most of the keys are currently concentrated on `node 1`. When the number of nodes is small, most keys may concentrate on one node, which shows up in monitoring as uneven load between nodes.
To solve this problem, `consistent hash` introduces the concept of the `virtual node`. Since the load is uneven, we artificially construct a balanced scene; but there are only so many actual nodes, so we use `virtual node` to divide the ring into regions, while the actual serving nodes remain the original ones.
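Reusing `hash64`, `buildRing`, and `lookup` from the sketch above (this `main` replaces the previous one), we can see how the number of virtual nodes evens out the load; the exact counts depend on the hash function, but the trend is clear:

```go
func main() {
	nodes := []string{"node1", "node2", "node3"}
	for _, replicas := range []int{1, 100} {
		keys, owner := buildRing(nodes, replicas)
		counts := map[string]int{}
		for i := 0; i < 100000; i++ {
			counts[lookup(keys, owner, fmt.Sprintf("key%d", i))]++
		}
		// with replicas=1 the split is typically badly skewed;
		// with replicas=100 each node gets roughly a third of the keys
		fmt.Println("replicas =", replicas, counts)
	}
}
```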
## Implementation
Let's take a look at `Get()` first.
### Get
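Below is a simplified sketch of `Get()`, modeled on the go-zero source linked at the end of this article (the key serialization and the collision salt are simplified here, and `fmt` and `sort` are assumed to be imported; the field names match the struct shown in the Add Node section below):

```go
// Get returns the node responsible for key v, and whether one was found.
func (h *ConsistentHash) Get(v interface{}) (interface{}, bool) {
	h.lock.RLock()
	defer h.lock.RUnlock()

	if len(h.ring) == 0 {
		return nil, false
	}

	// 1. calculate the hash of the key
	hash := h.hashFunc([]byte(fmt.Sprintf("%v", v)))

	// 2. binary-search h.keys (the sorted virtual node hashes) for the
	//    first virtual node >= the key's hash, wrapping around the ring
	index := sort.Search(len(h.keys), func(i int) bool {
		return h.keys[i] >= hash
	}) % len(h.keys)

	// 3. map the virtual node hash back to the actual node(s) via the ring;
	//    several actual nodes may share one hash due to collisions
	nodes := h.ring[h.keys[index]]
	switch len(nodes) {
	case 0:
		return nil, false
	case 1:
		return nodes[0], true
	default:
		// pick among colliding nodes deterministically with a salted hash
		inner := h.hashFunc([]byte("salt" + fmt.Sprintf("%v", v)))
		return nodes[int(inner%uint64(len(nodes)))], true
	}
}
```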
Let me talk about the implementation principle:
- Calculate the hash of the `key`
- Find the index of the first matching `virtual node` and fetch the corresponding `h.keys[index]`: the virtual node's hash value
- Use this hash to look up the `ring` and find the matching `actual node`
In fact, we can see that what we get from the `ring` is a `[]node`. This is because hash conflicts may occur when computing the `virtual node hash`: different `virtual node` may end up with the same hash, so one hash can correspond to more than one actual node.
This also shows that `node` and `virtual node` have a one-to-many relationship, and the `ring` stores that mapping internally. This design embodies the allocation strategy of consistent hash:
- `virtual node` serve as boundaries that divide the ring into value ranges; a `key` obtains its `node` according to which range it falls into
- Hashing the `virtual node` ensures that the keys allocated to different nodes are roughly uniform, that is, it breaks the direct binding between keys and physical nodes
- When a new node is added, multiple `virtual node` are allocated to it accordingly, so the new node can take over load from multiple original nodes; from a global perspective, this makes it easier to achieve load balancing during expansion
### Add Node
After reading `Get()`, you can roughly see the design of the entire consistent hash:
```go
type ConsistentHash struct {
	hashFunc Func                            // hash function
	replicas int                             // virtual node amplification factor
	keys     []uint64                        // stores the hashes of the virtual nodes
	ring     map[uint64][]interface{}        // mapping from virtual nodes to actual nodes
	nodes    map[string]lang.PlaceholderType // actual node storage (a map, for fast lookup)
	lock     sync.RWMutex
}
```
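Adding a node is then mostly bookkeeping on these fields. Here is a minimal sketch of the add path, modeled on the linked go-zero source (the real code also handles weighted nodes and removes a previously added node first; `fmt`, `sort`, `strconv`, and go-zero's `lang` package are assumed to be imported):

```go
// AddWithReplicas inserts a node together with `replicas` virtual nodes.
func (h *ConsistentHash) AddWithReplicas(node interface{}, replicas int) {
	if replicas > h.replicas {
		replicas = h.replicas
	}

	nodeRepr := fmt.Sprintf("%v", node)
	h.lock.Lock()
	defer h.lock.Unlock()
	h.nodes[nodeRepr] = lang.Placeholder

	for i := 0; i < replicas; i++ {
		// each virtual node gets its own position on the ring
		hash := h.hashFunc([]byte(nodeRepr + strconv.Itoa(i)))
		h.keys = append(h.keys, hash)
		// append, since different virtual nodes may collide on one hash
		h.ring[hash] = append(h.ring[hash], node)
	}

	// keep the virtual node hashes sorted so Get() can binary-search them
	sort.Slice(h.keys, func(i, j int) bool {
		return h.keys[i] < h.keys[j]
	})
}
```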
And with that, a basic consistent hash is complete.
Specific code: https://github.com/tal-tech/go-zero/blob/master/core/hash/consistenthash.go
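For completeness, a minimal usage sketch against that package (the API names are taken from the linked file; double-check the repo for the current signatures, as the project keeps evolving):

```go
package main

import (
	"fmt"

	"github.com/tal-tech/go-zero/core/hash"
)

func main() {
	ch := hash.NewConsistentHash()
	ch.Add("10.0.0.1:6379")
	ch.Add("10.0.0.2:6379")
	ch.Add("10.0.0.3:6379")

	// the same key always routes to the same node
	if node, ok := ch.Get("user:42"); ok {
		fmt.Println("user:42 ->", node)
	}
}
```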
## Usage scenarios
As I said at the beginning, consistent hash is widely used in distributed systems:

- Distributed caching: you can build a `cache proxy` on a storage system such as a `redis cluster` and control the routing freely, with the routing rule implemented by the consistent hash algorithm
- Service discovery
- Distributed task scheduling
All of the above distributed systems can use consistent hash in their load balancing modules.
## Project address
https://github.com/tal-tech/go-zero
Welcome to use go-zero and star to support us!
## WeChat Exchange Group
Follow the " practice " public exchange group get the community group QR code.