
Redis master-slave replication problem

Redis master-slave replication synchronizes the master node's data to the slave nodes. A slave node then serves two purposes:

  • If the master node goes down, the slave node, as a backup of the master, can take over at any time.
  • It extends the read capacity of the master node and shares its read load.


Master-slave replication also has the following problems:

  • When the master node goes down, a slave node has to be promoted to master, the master address used by the application has to be changed, and all remaining slave nodes have to be told to replicate the new master. The entire process requires manual intervention.
  • The writing ability of the master node is limited by a single machine.
  • The storage capacity of the master node is limited by a single machine.
  • The drawbacks of native replication are more pronounced in earlier versions. For example, when replication is interrupted, the slave node issues psync; if partial resynchronization fails, a full synchronization is performed, and while the master is producing the full snapshot it may stall for milliseconds or even seconds.

An in-depth look at Redis Sentinel

Redis Sentinel architecture


Redis Sentinel addresses the shortcomings of master-slave replication (the election problem), keeps Redis highly available, and provides automatic failure detection and failover.

The Sentinel system performs the following three tasks:

  • Monitoring: Sentinel constantly checks whether your master and slave servers are operating normally.
  • Notification: When one of the monitored Redis servers has a problem, Sentinel can notify the administrator or other applications through an API.
  • Automatic failover: When the master server goes down, Sentinel starts an automatic failover, promotes one slave server to master, and has the other slave servers replicate the new master.

Configure Sentinel

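The Redis source distribution contains a file named sentinel.conf, which is an example Sentinel configuration file with detailed comments.

The directives commonly found in a Sentinel configuration file are listed below. Only the sentinel monitor line is strictly required to run a Sentinel; the others have defaults or are optional.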

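1) sentinel monitor mymaster 192.168.10.202 6379 2

The master that Sentinel monitors. The first argument is a name given to the master, the second is the master's IP address, the third is its port, and the fourth is the quorum: the number of Sentinels that must agree the master is down before it is judged to have failed. Here at least 2 Sentinels in the Sentinel cluster must agree; as long as that number is not reached, no failover takes place.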

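2) sentinel down-after-milliseconds mymaster 30000

The time, in milliseconds, after which the current Sentinel instance considers the master to be down. If the master returns no valid reply to this Sentinel for that long, the Sentinel marks the master as subjectively offline (SDOWN).

A server is marked as objectively offline (ODOWN) only after a sufficient number of Sentinels have marked it as subjectively offline. The number of Sentinels required is determined by the quorum configured for the master.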

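3) sentinel parallel-syncs mymaster 2

The number of slaves that may be switched to the new master at the same time during a failover. The larger the value, the more slaves may be unavailable while they switch masters. Setting it to 1 switches the slaves one at a time, so that while one slave is synchronizing with the new master the others keep working, and at most one slave at a time is unable to serve command requests.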

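4) sentinel can-failover mymaster yes

Whether this Sentinel starts the failover mechanism for the Redis master after it detects O_DOWN. This directive belongs to the early Sentinel shipped with Redis 2.6; it was dropped when Sentinel was rewritten for Redis 2.8, so current versions do not accept it.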

5) sentinel auth-pass mymaster 20180408

The password Sentinel uses to connect to the master and its slaves. It must match the password set in redis.conf.

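6) sentinel failover-timeout mymaster 180000

The failover timeout, in milliseconds. If a failover has started but no failover operation has been triggered within this time, the current Sentinel considers the failover to have failed.

It is also the time limit for carrying out the failover as a whole: if a majority of Sentinels has not reported the master as offline within this time, the failover plan is abandoned.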

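7) sentinel config-epoch mymaster 0

The configuration epoch of the master. It acts as a version number for the master's configuration and increases when a failover takes place. Sentinel writes and updates this line in its configuration file itself; it is not meant to be set by hand.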

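8) sentinel notification-script mymaster /var/redis/notify.sh

A "notification" script that is run when a failover occurs, to report the current state of the cluster.

The script may run for at most 60 seconds; if it exceeds that limit, it is terminated (KILL).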


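9) sentinel leader-epoch mymaster 0

The epoch associated with the leader election for this master's failover. Sentinel writes this line to its configuration file itself; it is not meant to be set by hand.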
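As a rough sketch, the hand-written directives above can be assembled into a configuration file like this. The helper name `render_sentinel_conf` and its default values are illustrative assumptions, not part of Redis; the values simply mirror the examples in this section.

```python
def render_sentinel_conf(name, host, port, quorum,
                         down_after_ms=30000, parallel_syncs=1,
                         failover_timeout_ms=180000, auth_pass=None):
    """Return the text of a small sentinel.conf for one monitored master.

    Hypothetical helper: it only formats directives described above.
    """
    lines = [
        # The monitor line comes first because the other lines refer to its name.
        f"sentinel monitor {name} {host} {port} {quorum}",
        f"sentinel down-after-milliseconds {name} {down_after_ms}",
        f"sentinel parallel-syncs {name} {parallel_syncs}",
        f"sentinel failover-timeout {name} {failover_timeout_ms}",
    ]
    if auth_pass is not None:
        # Only needed when the master and slaves require a password.
        lines.append(f"sentinel auth-pass {name} {auth_pass}")
    return "\n".join(lines) + "\n"


conf = render_sentinel_conf("mymaster", "192.168.10.202", 6379, 2)
print(conf, end="")
```

The epoch lines (config-epoch, leader-epoch) are left out on purpose, since Sentinel maintains them on its own.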

Subjective offline and objective offline

  • Subjective offline (SDOWN): the offline judgment that a single Sentinel instance makes about a server.
  • Objective offline (ODOWN): the state reached when multiple Sentinel instances, at least as many as the configured quorum, have each made the SDOWN judgment about the same master server.
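
The relationship between the two states can be sketched in a few lines. This is a simplified model, not Sentinel's actual code; the function name and the vote dictionary are assumptions made for illustration.

```python
def is_objectively_down(sdown_votes, quorum):
    """Return True when at least `quorum` Sentinels report the master as SDOWN.

    sdown_votes maps a Sentinel identifier to that Sentinel's own
    subjective judgment (True means "I see the master as down").
    """
    return sum(1 for down in sdown_votes.values() if down) >= quorum


votes = {"sentinel-a": True, "sentinel-b": True, "sentinel-c": False}
print(is_objectively_down(votes, quorum=2))  # True: two Sentinels agree
print(is_objectively_down(votes, quorum=3))  # False: only two of three agree
```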

How Redis Sentinel works

1. Each Sentinel sends a PING command once per second to the master servers, slave servers, and other Sentinel instances it knows about.


2. If the time since an instance's last valid reply to the PING command exceeds the value specified by down-after-milliseconds, the Sentinel marks that instance as subjectively offline.


3. Once a master server has been marked as subjectively offline, all Sentinels monitoring that master confirm, once per second, whether it has indeed entered the subjectively offline state.


4. If a sufficient number of Sentinels agree with this judgment within the specified time frame, the master server is marked as objectively offline.


5. Each Sentinel sends an INFO command to all known master and slave servers once every 10 seconds. When a master server is marked as objectively offline, the Sentinel sends INFO to all slave servers of that master once every second instead of once every 10 seconds.


6. The Sentinels negotiate the status of the master node among themselves. If the master node is in the ODOWN state, a new master node is selected automatically by voting, and the remaining slave nodes are pointed at the new master node for data replication.


7. If not enough Sentinels agree that the master server is offline, its objectively offline status is removed. If the master server again returns a valid reply to a Sentinel's PING command, its subjectively offline status is removed.
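
The timing rules in steps 1, 2, and 5 can be written down as a small sketch. The constant and function names are assumptions chosen for illustration, and the model ignores everything except the intervals described above.

```python
PING_INTERVAL_S = 1        # step 1: PING every second
INFO_INTERVAL_S = 10       # step 5: INFO every 10 seconds in the normal case
INFO_INTERVAL_ODOWN_S = 1  # step 5: INFO every second once the master is ODOWN


def is_subjectively_down(last_valid_reply_ms, now_ms, down_after_ms):
    """Step 2: SDOWN once the silence exceeds down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


def info_interval_s(master_is_odown):
    """Step 5: how often INFO is sent to the slaves of a given master."""
    return INFO_INTERVAL_ODOWN_S if master_is_odown else INFO_INTERVAL_S


# With down-after-milliseconds set to 30000, as in the configuration example:
print(is_subjectively_down(0, 30000, 30000))  # False: exactly at the limit
print(is_subjectively_down(0, 30001, 30000))  # True: limit exceeded
print(info_interval_s(master_is_odown=True))  # 1
```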


Automatic discovery of Sentinels and slave servers

A Sentinel can be connected to multiple other Sentinels, and the Sentinels can check one another's availability and exchange information.

You do not need to configure the addresses of the other Sentinels on each Sentinel you run, because a Sentinel automatically discovers other Sentinels that are monitoring the same master server through the publish/subscribe feature.

  • Once every two seconds, each Sentinel uses publish/subscribe to send a message to the __sentinel__:hello channel of every master and slave server it monitors. The message contains the Sentinel's IP address, port number, and run ID (runid).
  • Each Sentinel subscribes to that channel on all the master and slave servers it monitors, looking for Sentinels it has not seen before. When a Sentinel finds a new Sentinel, it adds the new Sentinel to a list.
  • The message sent by a Sentinel also includes the complete current configuration of the master server. If a Sentinel holds a master configuration that is older than the one sent by another Sentinel, it immediately updates to the newer configuration.
  • Before adding a new Sentinel to the list of Sentinels monitoring a master server, a Sentinel first checks whether the list already contains a Sentinel with the same run ID or the same address (IP address and port number) as the one to be added. If it does, the Sentinel first removes the entries with the same run ID or the same address, and then adds the new Sentinel.
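
A minimal sketch of how a Sentinel might process one such hello message is shown below. The data layout (a dictionary keyed by run ID, and a configuration dictionary with an `epoch` field) and the function name are assumptions for illustration; the real message format is not reproduced here.

```python
def handle_hello(known, local_config, hello):
    """Apply one hello message and return the configuration to keep.

    known:        dict mapping run ID -> (ip, port) of known Sentinels
    local_config: dict with at least an "epoch" key
    hello:        dict with "ip", "port", "run_id" and "config" keys
    """
    addr = (hello["ip"], hello["port"])
    # Remove entries that share the run ID or the address before adding.
    stale = [rid for rid, a in known.items()
             if rid == hello["run_id"] or a == addr]
    for rid in stale:
        del known[rid]
    known[hello["run_id"]] = addr
    # Adopt the sender's master configuration only if it is newer.
    if hello["config"]["epoch"] > local_config["epoch"]:
        return dict(hello["config"])
    return local_config


known = {"aaa": ("10.0.0.1", 26379), "ccc": ("10.0.0.2", 26379)}
local = {"epoch": 1, "master": ("192.168.10.202", 6379)}
hello = {"ip": "10.0.0.1", "port": 26379, "run_id": "bbb",
         "config": {"epoch": 2, "master": ("192.168.10.203", 6379)}}
local = handle_hello(known, local, hello)
print(sorted(known))   # ['bbb', 'ccc']: "aaa" shared an address and was replaced
print(local["epoch"])  # 2
```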

Failover

A failover operation consists of the following steps:

  • Detect that the master server has entered the objectively offline state.
  • Increment the current epoch and try to be elected leader for this epoch.
  • If the election fails, try to be elected again after twice the configured failover timeout. If it succeeds, carry out the following steps.
  • Select a slave server to promote to master.
  • Send the SLAVEOF NO ONE command to the selected slave server to turn it into the master server.
  • Propagate the updated configuration to all other Sentinels through publish/subscribe, so that they update their own configurations.
  • Send the SLAVEOF command to the slave servers of the offline master so that they replicate the new master server.
  • When all the slave servers have started replicating the new master server, the leader Sentinel ends the failover operation.
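
The command sequence in the steps above can be sketched as follows. How the slave to promote is chosen is not covered here, so the caller passes it in; the function names are hypothetical and the code only builds the list of commands rather than sending them.

```python
def plan_failover(slaves, promoted):
    """Return ordered (target, command) pairs for a failover.

    slaves:   list of (host, port) tuples replicating the failed master
    promoted: the (host, port) tuple chosen to become the new master
    """
    if promoted not in slaves:
        raise ValueError("promoted server must be one of the slaves")
    host, port = promoted
    # First turn the chosen slave into a master.
    steps = [(promoted, "SLAVEOF NO ONE")]
    # Then point every other slave at the new master.
    steps += [(s, f"SLAVEOF {host} {port}") for s in slaves if s != promoted]
    return steps


def election_retry_delay_ms(failover_timeout_ms):
    """A failed election is retried after twice the failover timeout."""
    return 2 * failover_timeout_ms


slaves = [("192.168.10.203", 6379), ("192.168.10.204", 6379)]
for target, command in plan_failover(slaves, slaves[0]):
    print(target, command)
print(election_retry_delay_ms(180000))  # 360000
```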

Author: After a while
https://my.oschina.net/u/3995125/blog/3133204

