Author: Yue Mingqiang

A member of the DBA team at Aikesheng's Beijing branch, responsible for operating and maintaining the database management platform and for handling MySQL issues. Skilled at MySQL fault diagnosis.

Source of this article: original submission

* Produced by the Aikesheng open source community. Original content may not be used without authorization; for reprints, please contact the editor and credit the source.


What is sentinel

Redis is one of the most popular NoSQL databases, known for high performance, high availability, high scalability, and rich data structures. Sentinel is the high-availability architecture of Redis: when the main library fails, it automatically switches to a slave library. So how does it work? Let's find out.

Mutual discovery between sentinels

When we configure a sentinel cluster, each sentinel is configured only with the connection to the main library; the sentinels are not configured with each other's addresses. So how do they find one another?

Every 2 seconds, each sentinel publishes a message containing its own information to the __sentinel__:hello Pub/Sub channel on the master and slave libraries it monitors, and it subscribes to the same channel. When a sentinel sees a message from a sentinel it does not yet know, it immediately adds the newcomer to its view of the cluster.
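The discovery step can be sketched in a few lines of Python. This is only an illustration, not Sentinel's actual code: the class and function names are made up for this example, and the eight comma-separated fields are an assumption about the hello payload layout (sentinel ip, port, runid, current epoch, then master name, ip, port, config epoch).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hello:
    """One hello message; the field layout is an assumption for this sketch."""
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split a comma-separated hello payload into typed fields."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    ip, port, runid, epoch, m_name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), runid, int(epoch),
                 m_name, m_ip, int(m_port), int(m_epoch))


class SentinelTable:
    """Hypothetical table of the peers one sentinel has discovered."""

    def __init__(self, my_runid: str) -> None:
        self.my_runid = my_runid
        self.known: dict[str, tuple[str, int]] = {}  # runid -> (ip, port)

    def on_hello(self, payload: str) -> bool:
        """Record the sender; return True only for a newly discovered peer."""
        hello = parse_hello(payload)
        if hello.sentinel_runid == self.my_runid:
            return False  # our own message echoed back by the channel
        is_new = hello.sentinel_runid not in self.known
        self.known[hello.sentinel_runid] = (hello.sentinel_ip, hello.sentinel_port)
        return is_new
```

Because every sentinel both publishes and subscribes on the same channel, each one can build this table without being told about the others in advance.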

Determine if the master node is offline

Under normal circumstances, at least 3 sentinel nodes are needed to form a cluster. Each sentinel sends a PING command to the main library every SENTINEL_PING_PERIOD (1000ms).

If no valid reply arrives within down-after-milliseconds, the sentinel marks the main library as S_DOWN (subjectively offline). Can the switch happen at this point?

The answer is no. This is only a single sentinel's view of Redis; it has to be combined with the detection results of the other sentinel nodes before deciding whether to switch.

After marking the main library subjectively offline, the sentinel sends SENTINEL is-master-down-by-addr to the other sentinel nodes. When at least quorum sentinels (counting itself) agree that the main library is down, the status is set to O_DOWN (objectively offline) and the node is judged to be down.
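The two-stage judgment can be sketched as follows. This is a minimal model under stated assumptions: MasterView and both function names are hypothetical, timestamps are plain millisecond integers, and the peers' replies to SENTINEL is-master-down-by-addr are reduced to a list of booleans.

```python
from dataclasses import dataclass


@dataclass
class MasterView:
    """One sentinel's local view of the main library (hypothetical type)."""
    down_after_ms: int        # the down-after-milliseconds setting
    quorum: int               # agreeing sentinels needed for O_DOWN
    last_valid_reply_ms: int  # time of the last valid PING reply


def is_subjectively_down(view: MasterView, now_ms: int) -> bool:
    """S_DOWN: this sentinel alone has heard nothing for too long."""
    return now_ms - view.last_valid_reply_ms > view.down_after_ms


def is_objectively_down(view: MasterView, now_ms: int,
                        peer_votes: list[bool]) -> bool:
    """O_DOWN: this sentinel plus agreeing peers reach the quorum."""
    if not is_subjectively_down(view, now_ms):
        return False  # peers are only consulted after a local S_DOWN
    agreeing = 1 + sum(1 for vote in peer_votes if vote)  # 1 = ourselves
    return agreeing >= view.quorum
```

With a quorum of 2 and three sentinels, one peer agreeing is enough: the local S_DOWN plus one vote reaches the threshold.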

Select new node

Once the master node is judged to be offline, the sentinel checks whether each remaining node is eligible to be elected master, according to the following conditions:

1. The slave library's status is normal. Slaves in the S_DOWN, O_DOWN, or DISCONNECTED state will not be selected

2. The slave library responds normally. A slave that has not replied to PING for more than 5s, or whose last INFO reply (info_refresh) is older than 3 times the INFO interval, will not be selected

3. Replication on the slave library is reasonably healthy. The master-slave connection must not have been interrupted for too long

4. The priority cannot be 0

Then sort the eligible slave libraries by the following criteria, and select the best instance to promote to the master library

Slaves with a lower priority value > Slaves with a larger replication offset > Slaves with a smaller runid > Slaves that have processed more commands
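The filter-then-sort procedure above can be sketched like this. The Replica type and its fields are hypothetical, the single healthy flag stands in for filter conditions 1 to 3, and only the first three ordering keys (priority, replication offset, runid) are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """Hypothetical snapshot of one slave library as seen by a sentinel."""
    name: str
    priority: int         # 0 means "never promote"
    repl_offset: int      # how much of the replication stream was applied
    runid: str
    healthy: bool = True  # stands in for filter conditions 1 to 3


def pick_new_master(replicas: list[Replica]) -> Optional[Replica]:
    """Drop ineligible slaves, then return the best remaining one."""
    candidates = [r for r in replicas if r.healthy and r.priority != 0]
    if not candidates:
        return None
    # Lower priority value first, then larger offset, then smaller runid.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.runid))
```

Expressing the ordering as one tuple key keeps the tie-breaking explicit: a later key only matters when all earlier keys are equal.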

The magical Gossip protocol

The transition of a Redis node from S_DOWN to O_DOWN is settled through negotiation between sentinels. So do they simply broadcast the information to every other node?

There is a saying that rumors are the fastest-spreading virus in the world. So how does a rumor go viral? Suppose a male celebrity is caught in a scandal: the news may take only a few days to travel from a very small circle to the general public. Of course, the few people in the know do not announce to everyone, "I have some dirt." Instead, they first tell the people around them, who add some guesses of their own and pass it on to more people, until the dirt on that celebrity is everywhere.

Sentinel communicates in a similar way, using what is called the Gossip protocol. The difference from the example above is that all the information passed along is true. Each node randomly sends its own node information to some other nodes. In the second wave, those nodes randomly send the information they now hold to other nodes. Wave after wave, the information spreads until all nodes hold the same data. Another use of the Gossip protocol in Redis is the internal communication of Redis Cluster. If Redis Cluster has a large number of shards, passing information directly between every pair of instances becomes very laborious. The Gossip protocol keeps the communication pressure between nodes low no matter how many nodes are added.
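A toy simulation makes the wave-by-wave spreading concrete. It is a sketch of push-style gossip in general, not the Sentinel or Redis Cluster implementation: the function name, the fanout of 2, and the synchronous round model are all assumptions made for illustration.

```python
import random


def gossip_rounds(n_nodes: int, fanout: int = 2, seed: int = 0) -> int:
    """Count the rounds until every node holds every node's information."""
    rng = random.Random(seed)
    # knows[i] is the set of node ids whose information node i holds.
    knows = [{i} for i in range(n_nodes)]
    everything = set(range(n_nodes))
    rounds = 0
    while any(k != everything for k in knows):
        # Freeze the state so all sends in a round use start-of-round data.
        snapshot = [set(k) for k in knows]
        for sender in range(n_nodes):
            peers = [p for p in range(n_nodes) if p != sender]
            for target in rng.sample(peers, min(fanout, len(peers))):
                knows[target] |= snapshot[sender]  # push everything we hold
        rounds += 1
    return rounds
```

Each node sends only fanout messages per round regardless of cluster size, which is why the per-node communication load stays small as nodes are added; the trade-off is that the information needs several rounds to reach everyone.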

Concluding remarks

This article covers the core concepts of Redis sentinel: how a failure is judged, how a new master is elected, how the switch proceeds, and how sentinels communicate. If you want to know more, please follow the upcoming Redis articles on our public account.

