Interviewer : Let's talk about Redis sharded clusters. How about we start with Redis Cluster?
Interviewer : Redis Cluster is the official clustering solution, available since Redis 3.x. How much do you know about it?
Candidate : Sure, how about starting from the basics?
Candidate : Everything we said about Redis earlier assumed a "single instance" storing all the data
Candidate : 1. A master-slave architecture with read-write separation lets multiple slave servers carry the "read traffic", but the "write traffic" always lands on the single master
Candidate : 2. "Vertical scaling" upgrades the hardware of the Redis server, but beyond a certain point it stops being cost-effective
Candidate : Vertical scaling also means "huge memory", which drives up the cost of Redis persistence (RDB persistence is a full snapshot; forking the child process over a large memory set can block the main thread for too long)
Candidate : So a "single instance" has a bottleneck
Candidate : If "vertical scaling" doesn't work, we go for "horizontal scaling"
Candidate : We use multiple Redis instances to form a cluster and "distribute" the data across the different instances according to certain rules; only when you add up the data from every Redis instance in the cluster is the data set complete
Candidate : That is essentially the idea of "distributed" storage (: although in the Redis world it seems more people call it a "sharded cluster"
Candidate : From the above we know that if you want "distributed storage", you have to "distribute" the data (which also means routing it)
Candidate : Let's start with Redis Cluster: its "routing" is done on the client side (the SDK integrates the routing and forwarding logic)
Candidate : Redis Cluster's data distribution is built around the concept of the "Hash Slot"
Candidate : A Redis Cluster has 16384 hash slots, and these hash slots are allocated to the different Redis instances
Candidate : As for how to divide them up, you can let them be spread evenly, or "manually" assign the hash slots of each Redis instance; it's entirely up to us
Candidate : The important thing is that all 16384 slots must be handed out, with none left over!
Candidate : When the client has data to write, it first computes a 16-bit value for the key with the CRC16 algorithm (think of it as a hash), and then takes that value modulo 16384
Candidate : The result of the modulo is one of the hash slots, so the data is written to the Redis instance that owns that slot
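Candidate : A minimal sketch of that client-side calculation, assuming a hypothetical local slotToNode cache (the CRC16 shown is the XMODEM variant that Redis Cluster uses for keys):

```java
import java.nio.charset.StandardCharsets;

public class SlotRouter {
    private static final int SLOT_COUNT = 16384;

    // Hypothetical local cache: slot index -> "host:port" of the instance that owns it.
    private final String[] slotToNode = new String[SLOT_COUNT];

    // CRC16 (XMODEM variant, polynomial 0x1021) -- the checksum Redis Cluster applies to keys.
    static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
            }
            crc &= 0xFFFF;
        }
        return crc;
    }

    // slot = CRC16(key) mod 16384; since 16384 is a power of two, "& 16383" is equivalent.
    static int slotFor(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) & (SLOT_COUNT - 1);
    }

    // The instance the client should send this key's command to, per its cached mapping.
    String nodeFor(String key) {
        return slotToNode[slotFor(key)];
    }
}
```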
Interviewer : Here's the catch. The client has computed the hash slot with the hash algorithm, but how does it know which Redis instance that hash slot lives on?
Candidate : Right. Each Redis instance in the cluster "propagates" which hash slots it is responsible for to the other instances, so every Redis instance ends up recording the full mapping of "all hash slots to instances" (:
Candidate : The client also "caches" a copy of this mapping locally, so it naturally knows which Redis instance to operate on
Interviewer : Then I have another question. Redis instances can also be added to or removed from the cluster. What happens then?
Candidate : When a Redis instance is added to or removed from the cluster, the hash slot assignment of some instance inevitably changes
Candidate : The change is broadcast to the whole cluster as a message, so every Redis instance learns about it and updates its own copy of the mapping
Candidate : At this point the client is actually unaware of any of this (:
Candidate : So when the client requests a key, it still goes to the "original" Redis instance. That instance returns a "MOVED" response, telling the client it should go to the new Redis instance instead
Candidate : After receiving "MOVED", the client knows to request the new Redis instance, and it updates its cached "hash slot to instance" mapping
Candidate : To sum up: once the data migration is complete, the client receives "MOVED" and updates its local cache
Interviewer : And what if the data hasn't been completely migrated yet?
Candidate : If the data hasn't been fully migrated, the client gets an "ASK" response instead. It also sends the client to the new Redis instance, but in this case the client does not update its local cache
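Candidate : A rough sketch of how a cluster-aware client might handle the two redirects, continuing the SlotRouter sketch above (sendTo and the string parsing are illustrative assumptions, not any particular SDK's API; the reply forms are "MOVED <slot> <host:port>" and "ASK <slot> <host:port>"):

```java
// Illustrative redirect handling; sendTo() stands in for a real SDK's connection pool,
// and slotToNode / nodeFor come from the SlotRouter sketch above.
String execute(String key, String command) {
    String reply = sendTo(nodeFor(key), command);   // route via the local slot cache first

    if (reply.startsWith("MOVED")) {
        // Migration finished: update the local cache, then retry on the new owner.
        String[] parts = reply.split(" ");
        slotToNode[Integer.parseInt(parts[1])] = parts[2];
        return sendTo(parts[2], command);
    }

    if (reply.startsWith("ASK")) {
        // Migration still in progress: retry once on the target node (preceded by
        // the ASKING command), but do NOT update the local cache.
        String target = reply.split(" ")[2];
        sendTo(target, "ASKING");
        return sendTo(target, command);
    }

    return reply;
}
```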
Interviewer : I got it
Interviewer : So to put it plainly: whenever the Redis instances in the cluster change, the instances "communicate" the change among themselves
Interviewer : That way, when the client sends a request, the Redis instances always know which instance actually holds the data the client wants
Interviewer : If the migration is complete, the instance returns "MOVED" to tell the client which Redis instance to go to, and the client updates its own cache (the mapping)
Interviewer : If the migration is still in progress, the instance returns "ASK" to tell the client which Redis instance to go to, without the client updating its cache
Candidate : As expected of you...
Interviewer : Do you know why there are 16384 hash slots?
Candidate : Well, it's like this: the "communication" between Redis instances exchanges "slot information". If there were too many slots, the network packets would get bigger, and bigger packets would mean "eating up" too much network bandwidth
Candidate : The other part is that the Redis author assumed a cluster would normally not exceed 1000 instances
Candidate : So 16384 was chosen: it spreads data reasonably evenly across the instances in the Redis cluster, without making the exchanged slot information so large that it wastes bandwidth
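Candidate : A rough back-of-the-envelope, assuming (as the commonly cited explanation goes) that each heartbeat message carries a node's slot ownership as a bitmap with one bit per slot:

```java
public class SlotBitmapSize {
    public static void main(String[] args) {
        // One bit per slot in every heartbeat message.
        System.out.println(16384 / 8);   // 2048 bytes -> about 2 KB per heartbeat
        System.out.println(65536 / 8);   // 8192 bytes -> about 8 KB if there were 65536 slots
    }
}
```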
Interviewer : I got it
Interviewer : Then do you know why Redis partitions data with "hash slots" instead of a consistent hashing algorithm?
Candidate : As I understand it, consistent hashing is a "hash ring": when the client makes a request, it hashes the key, locates its position on the ring, and then walks clockwise until it finds the first real node
Candidate : The advantage of consistent hashing over "plain fixed modulo" is that adding or removing an instance in the cluster only affects a small portion of the data
Candidate : But with consistent hashing, when you add or remove an instance you still have to work out which part of the data is affected and migrate that data yourself
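Candidate : A toy hash-ring sketch, assuming String.hashCode() as the hash and no virtual nodes (real implementations add virtual nodes per instance for balance):

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy consistent-hash ring: nodes sit at positions on a ring of hash values,
// and a key belongs to the first node found "clockwise" from the key's position.
public class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node)    { ring.put(node.hashCode(), node); }
    void removeNode(String node) { ring.remove(node.hashCode()); }

    String nodeFor(String key) {
        if (ring.isEmpty()) return null;
        // All nodes clockwise from the key; wrap around to the first node if none remain.
        SortedMap<Integer, String> clockwise = ring.tailMap(key.hashCode());
        return clockwise.isEmpty() ? ring.firstEntry().getValue()
                                   : clockwise.get(clockwise.firstKey());
    }
}
```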
Interviewer : Well...
Candidate : With the hash slot approach, as we saw above, every instance in the cluster can hold the slot-related information
Candidate : After the client hashes the key, if the instance it asks doesn't hold the relevant data, that instance returns a "redirect" response telling the client where to send the request
Candidate : Scaling the cluster out or in always uses the "hash slot" as the basic unit, so overall the "implementation" is simpler (concise, efficient, and flexible): roughly, you reassign some of the slots and then migrate the data in those slots, without touching all the data on any single instance in the cluster
Interviewer : Do you know the general principle of "server-side routing"?
Candidate : Well, server-side routing generally means there is a dedicated proxy layer sitting in front of the Redis instances that receives client requests and forwards them
Candidate : I mentioned it in the last interview; the popular option now is Codis
Candidate : The biggest difference from Redis Cluster is that with Redis Cluster the client connects directly to the Redis instances, whereas with Codis the client connects to a Proxy, and the Proxy dispatches the requests to the different Redis instances for processing
Candidate : Codis's key routing scheme is very similar to Redis Cluster's: Codis initializes 1024 hash slots and distributes them across the different Redis servers
Candidate : The mapping between hash slots and Redis instances is stored and managed by Zookeeper; the Proxy obtains the latest mapping through the Codis Dashboard and caches it locally
Interviewer : What does the process look like if I want to scale out the Redis instances under Codis?
Candidate : Simply put: add a new Redis instance to the cluster, then migrate part of the data over to the new instance
Candidate : The general process is: 1. The "source instance" sends part of the data in a given slot to the "target instance". 2. After the "target instance" receives the data, it returns an ack to the "source instance". 3. After the "source instance" receives the ack, it deletes locally the data it just handed to the "target instance". 4. Steps 1-3 repeat until the whole slot has been migrated
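Candidate : A minimal sketch of that synchronous loop (the SlotStore interface and its method names are hypothetical illustrations, not Codis's actual API):

```java
// Hypothetical interface standing in for a Redis instance's per-slot storage.
interface SlotStore {
    boolean hasData(int slot);
    byte[] nextChunk(int slot);            // next piece of data still left in this slot
    void receive(int slot, byte[] chunk);  // returning normally plays the role of the "ack"
    void deleteLocal(int slot, byte[] chunk);
}

class SyncSlotMigration {
    static void migrate(int slot, SlotStore source, SlotStore target) {
        while (source.hasData(slot)) {          // 4. loop until the whole slot is migrated
            byte[] chunk = source.nextChunk(slot);
            target.receive(slot, chunk);        // 1. send part of the slot, 2. wait for the ack
            source.deleteLocal(slot, chunk);    // 3. only then delete it from the source
        }
    }
}
```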
Candidate : Codis also supports "asynchronous migration": in step 2 above, once the "source instance" has sent the data, it keeps serving client requests instead of blocking while it waits for the "target instance" to return the ack
Candidate : Data that has not yet finished migrating is marked "read-only", which keeps it consistent; if a "write" hits data that is in the middle of being migrated, the client is forced to "retry" and the write eventually lands on the "target instance"
Candidate : Also, for a bigkey, asynchronous migration uses a "split into commands" approach: for example, a set with 10,000 elements might be sent from the "source instance" to the "target instance" as 10,000 individual commands, rather than migrating the whole bigkey in one go (large objects easily cause blocking)
Interviewer : I understand.
To summarize this article :
- Why clusters exist : write performance hits a bottleneck under high concurrency && vertical scaling can't go on forever (not cost-effective)
- What a cluster must solve : the problems of "data routing" and "data migration"
Redis Cluster data routing :
- A Redis Cluster has 16384 hash slots, and these hash slots are allocated to the instances in the Redis cluster
- The instances in the Redis cluster "communicate" with each other, exchanging the hash slot information they are responsible for (so in the end every instance holds the complete mapping)
- When the client makes a request, it computes the CRC16 hash of the key and takes it modulo 16384, which naturally yields the hash slot and thus the location of the corresponding Redis instance
- Why 16384 hash slots : 16384 slots spread the data across the Redis instances fairly evenly, without the slot information exchanged between instances becoming so large that it causes serious network overhead
- Why does Redis Cluster use hash slots instead of consistent hashing? : The hash slot implementation is relatively simple and efficient; every scale-out or scale-in only moves the data of the affected slots, and generally does not move all the data of an entire Redis instance
- Codis data routing : 1024 hash slots are allocated by default, the mapping is stored in a Zookeeper cluster, and the Proxy caches a local copy; when the Redis instances change, the Dashboard updates the mapping in Zookeeper and on the Proxy
Redis Cluster and Codis data migration : Redis Cluster supports synchronous migration; Codis supports synchronous && asynchronous migration
- Add the new Redis instance to the cluster, then migrate part of the data over to the new instance (done online)
Welcome to follow my WeChat public account [Java3y] to chat about Java interviews. The online interviewer series is being updated continuously!
[Online Interviewer - Mobile] The series is updated twice a week!
[Online Interviewer - Desktop] The series is updated twice a week!
Creating original content isn't easy!! Please like, bookmark, and share!!