Probing in Redis replication

During replication, Redis relies on several probing mechanisms so that each end of a master-slave link can tell whether the other end is still alive.

Replication rebuild phase on the slave

When the master-slave link breaks because of a network fault or some other reason, the slave tries to rebuild replication. During this process, the slave's replication state machine variable, repl_state, goes through a series of transitions and finally reaches the REPL_STATE_CONNECTED state.

Many repl_state states limit how long the slave may stay in them, so that resources are released as soon as possible after an error. The server.repl_transfer_lastio variable acts as the timer: it records the last time the slave had I/O activity (a read or write event) with the master.
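The "dwell-time timer" idea can be modeled in a few lines. The sketch below is illustrative only: the enum is a simplified, hypothetical subset of the real repl_state values, and repl_link, link_touch and link_timed_out are made-up names standing in for the fields of the server struct and the checks in replicationCron.

```c
#include <stdbool.h>
#include <time.h>

/* Simplified, hypothetical subset of the slave-side replication states. */
typedef enum {
    STATE_NONE,        /* no replication configured */
    STATE_CONNECT,     /* must (re)connect to the master */
    STATE_CONNECTING,  /* non-blocking connect in progress */
    STATE_HANDSHAKE,   /* PING / AUTH / REPLCONF / PSYNC exchange */
    STATE_TRANSFER,    /* receiving the RDB payload */
    STATE_CONNECTED    /* streaming commands from the master */
} repl_state_t;

/* Hypothetical stand-in for the relevant fields of the server struct. */
typedef struct {
    repl_state_t state;
    time_t lastio;        /* like server.repl_transfer_lastio */
    time_t repl_timeout;  /* like server.repl_timeout, in seconds */
} repl_link;

/* Record any read/write activity with the master. */
void link_touch(repl_link *l, time_t now) { l->lastio = now; }

/* True if the link has stayed in a timed state without I/O for too long. */
bool link_timed_out(const repl_link *l, time_t now) {
    bool timed_state = l->state == STATE_CONNECTING ||
                       l->state == STATE_HANDSHAKE ||
                       l->state == STATE_TRANSFER;
    return timed_state && (now - l->lastio) > l->repl_timeout;
}
```

Note the strict `>`: with a 60-second timeout, a link whose last I/O was at t=100 is still considered alive at t=160 and times out from t=161 on, which matches the comparisons in the Redis snippets.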

REPL_STATE_CONNECTING timeout

In the REPL_STATE_CONNECTING stage, the slave connects to the master. The connection uses non-blocking IO, so the replicationCron function periodically checks whether the connect (or the handshake that follows it) has timed out.

/* Non blocking connection timeout? */
if (server.masterhost &&
    (server.repl_state == REPL_STATE_CONNECTING ||
     slaveIsInHandshakeState()) &&
     (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
{
    serverLog(LL_WARNING,"Timeout connecting to the MASTER...");
    cancelReplicationHandshake();
}
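A timeout here does not end replication for good: cancelReplicationHandshake drops the half-open attempt and puts repl_state back to REPL_STATE_CONNECT, and replicationCron starts a new connection whenever it finds that state. The sketch below models that retry loop under simplifying assumptions; slave_t, cron_tick, cancel_handshake and the connect_attempts counter are hypothetical names, and no real socket is opened.

```c
#include <time.h>

/* Simplified, hypothetical subset of the real repl_state values. */
typedef enum { ST_CONNECT, ST_CONNECTING, ST_HANDSHAKE,
               ST_TRANSFER, ST_CONNECTED } st_t;

typedef struct {
    st_t state;
    time_t lastio;          /* like server.repl_transfer_lastio */
    time_t timeout;         /* like server.repl_timeout */
    int connect_attempts;   /* hypothetical counter, for illustration only */
} slave_t;

/* Rough analogue of cancelReplicationHandshake(): abandon the current
 * attempt and fall back to CONNECT. The real function also closes the
 * socket (or aborts an in-progress transfer). */
static void cancel_handshake(slave_t *s) {
    s->state = ST_CONNECT;
}

/* One run of a simplified replicationCron(), called once per second. */
void cron_tick(slave_t *s, time_t now) {
    if ((s->state == ST_CONNECTING || s->state == ST_HANDSHAKE) &&
        now - s->lastio > s->timeout)
        cancel_handshake(s);            /* "Timeout connecting to the MASTER..." */

    if (s->state == ST_CONNECT) {       /* (re)start a non-blocking connect */
        s->state = ST_CONNECTING;
        s->lastio = now;                /* the timer restarts with the attempt */
        s->connect_attempts++;
    }
}
```

Because the timer is reset when a new attempt starts, each attempt gets a full repl_timeout window of its own.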

REPL_STATE_TRANSFER timeout

In the REPL_STATE_TRANSFER stage, the slave receives the RDB file from the master. The file usually cannot be read in a single read, so the replicationCron function periodically checks whether the transfer has timed out.

/* Bulk transfer I/O timeout? */
if (server.masterhost && 
    server.repl_state == REPL_STATE_TRANSFER &&
    (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
{
    cancelReplicationHandshake();
}
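What keeps this check from firing during a long but healthy transfer is that every chunk read refreshes the timer. Here is a minimal sketch of an event-loop read handler under that assumption; bulk_transfer and on_bulk_readable are hypothetical names loosely modeled on readSyncBulkPayload, and the step that writes each chunk to a temporary RDB file is omitted.

```c
#include <unistd.h>
#include <time.h>

typedef struct {
    long long expected;    /* payload size announced by "$<len>" */
    long long received;
    time_t lastio;         /* like server.repl_transfer_lastio */
} bulk_transfer;

/* Called each time the fd is readable: read one chunk, refresh the I/O
 * timer, and report progress. Returns 1 when the payload is complete,
 * 0 if more data is expected, -1 on EOF or read error. */
int on_bulk_readable(bulk_transfer *t, int fd, time_t now) {
    char buf[4096];
    long long left = t->expected - t->received;
    if (left <= 0) return 1;
    size_t want = left < (long long)sizeof buf ? (size_t)left : sizeof buf;
    ssize_t n = read(fd, buf, want);
    if (n <= 0) return -1;          /* connection lost or read error */
    t->lastio = now;                /* any progress keeps the link alive */
    t->received += n;
    /* The real code would write buf to a temporary RDB file here. */
    return t->received == t->expected;
}
```

As long as some bytes arrive at least once every repl_timeout seconds, the transfer is never cancelled, however large the RDB is.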

When the RDB is large, two additional mechanisms are needed.

1) If the master takes a long time to dump the RDB, the master client on the slave side receives no data in the meantime. The read callback readSyncBulkPayload is therefore never triggered, repl_transfer_lastio is never refreshed, and the timeout check calls cancelReplicationHandshake, so this sync attempt fails. To prevent this, the master periodically sends \n to probe the slaves that are waiting to receive the RDB.

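// master-side logic
listIter li;
listNode *ln;

listRewind(server.slaves,&li);
while((ln = listNext(&li))) {
    client *slave = ln->value;

    /* Waiting for the RDB: BGSAVE not started yet, or a disk-based
     * BGSAVE still running (no payload bytes flowing yet). */
    int is_presync =
        (slave->replstate == SLAVE_STATE_WAIT_BGSAVE_START ||
        (slave->replstate == SLAVE_STATE_WAIT_BGSAVE_END &&
         server.rdb_child_type != RDB_CHILD_TYPE_SOCKET));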

    if (is_presync) {
        if (write(slave->fd, "\n", 1) == -1) {
            /* Don't worry about socket errors, it's just a ping. */
        }
    }
}

2) After the slave has received the RDB file, loading a large RDB can block it for a long time. During that time the master's repl_ack_time for this slave is not refreshed, so the master may conclude that the slave has died. Therefore, while loading in rdbLoadRio, the slave periodically sends \n to the master.

// slave-side logic
if (server.masterhost && server.repl_state == REPL_STATE_TRANSFER)
    replicationSendNewlineToMaster();
...

void replicationSendNewlineToMaster(void) {
    static time_t newline_sent;
    if (time(NULL) != newline_sent) {
        newline_sent = time(NULL);
        if (write(server.repl_transfer_s,"\n",1) == -1) {
            /* Pinging back in this stage is best-effort. */
        }
    }
}

When the master receives the \n, processInlineBuffer refreshes client->repl_ack_time, so the master does not time out the slave.

// master-side logic
if (querylen == 0 && getClientType(c) == CLIENT_TYPE_SLAVE)
    c->repl_ack_time = server.unixtime;
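Both newline keepalives described above can be put side by side in one small model. This is a sketch under stated assumptions, not Redis code: slave_read_bulk_header, slave_send_newline, master_process_inline and fake_client are hypothetical names. The header parser ignores the "$EOF:<mark>" form used by diskless sync, and the rate limiter takes the "last sent" second as a parameter instead of the static variable used by replicationSendNewlineToMaster.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Direction 1: master -> slave while the RDB is being produced. */
typedef enum { HDR_KEEPALIVE, HDR_BULK_LEN, HDR_ERROR } hdr_kind;

/* Classify one line (trailing newline already stripped) read by the slave
 * while it waits for the "$<len>" header of the RDB payload. */
hdr_kind slave_read_bulk_header(const char *line, time_t now,
                                time_t *lastio, long long *len) {
    if (line[0] == '\0') {          /* bare "\n": the master's keepalive */
        *lastio = now;              /* refresh repl_transfer_lastio */
        return HDR_KEEPALIVE;
    }
    if (line[0] == '$') {           /* "$<len>": payload follows */
        char *end;
        long long n = strtoll(line + 1, &end, 10);
        if (end == line + 1 || *end != '\0' || n < 0) return HDR_ERROR;
        *len = n;
        *lastio = now;
        return HDR_BULK_LEN;
    }
    return HDR_ERROR;               /* "-ERR ..." or unexpected data */
}

/* Direction 2: slave -> master while the slave loads the RDB.
 * Returns 1 if a newline was written. */
int slave_send_newline(int fd, time_t now, time_t *last_sent) {
    if (now == *last_sent) return 0;    /* at most one newline per second */
    *last_sent = now;
    return write(fd, "\n", 1) == 1;     /* best effort, errors ignored */
}

typedef struct {
    bool is_slave;        /* stands in for getClientType(c) == CLIENT_TYPE_SLAVE */
    time_t repl_ack_time;
} fake_client;

/* Master side: consume one inline line from buf. Returns bytes consumed,
 * or 0 if no complete line is buffered yet. */
size_t master_process_inline(fake_client *c, const char *buf, size_t buflen,
                             time_t now) {
    const char *nl = memchr(buf, '\n', buflen);
    if (!nl) return 0;
    size_t querylen = (size_t)(nl - buf);
    if (querylen > 0 && buf[querylen - 1] == '\r') querylen--;
    if (querylen == 0 && c->is_slave)
        c->repl_ack_time = now;         /* empty line = "still alive" */
    /* A non-empty line would be split into argv and executed here. */
    return (size_t)(nl - buf) + 1;
}
```

The once-per-second limit matters because the loading callback can run very often; without it, the slave could flood the master with newlines.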

Normal master-slave replication phase

When master-slave replication is running normally, the master continuously streams commands to the slave to keep the two datasets consistent.

During this phase, the master and slave also exchange probe messages to monitor the health of the replication link.

Slave-side probing

On the slave side, detection relies on client.lastinteraction, which records the last time the instance read data from the client's fd; it is updated in the read event callback readQueryFromClient.

Recall that master and slave each see the other as a client carrying a special flag (CLIENT_MASTER on the slave, CLIENT_SLAVE on the master).

In replicationCron, the slave checks whether the master is still alive; if it is not, the slave frees the master client.

// slave-side logic
if (server.masterhost && server.repl_state == REPL_STATE_CONNECTED &&
    (time(NULL)-server.master->lastinteraction) > server.repl_timeout)
{
    serverLog(LL_WARNING,"MASTER timeout: no data nor PING received...");
    freeClient(server.master);
}
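The slave-side check boils down to "refresh a timestamp on every read, compare it in the cron". The minimal model below assumes a master that sends PING every 10 seconds and a 60-second repl_timeout (the Redis defaults for repl-ping-replica-period and repl-timeout); master_client, on_master_read and master_timed_out are hypothetical names.

```c
#include <stdbool.h>
#include <time.h>

typedef struct {
    time_t lastinteraction;  /* refreshed on every successful read */
} master_client;

/* Hypothetical hook: call whenever bytes arrive from the master fd,
 * as readQueryFromClient() does after a successful read. */
void on_master_read(master_client *m, long nread, time_t now) {
    if (nread > 0) m->lastinteraction = now;
}

/* The slave-side check that the cron runs once per second. */
bool master_timed_out(const master_client *m, time_t now, time_t repl_timeout) {
    return (now - m->lastinteraction) > repl_timeout;
}
```

With regular PINGs the gap never exceeds the ping period; once the master goes silent, the slave gives up about repl_timeout seconds after the last byte it received.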

Keeping client.lastinteraction fresh depends on the master: it periodically sends a PING command to all of its slaves, with the period set by the repl-ping-replica-period configuration parameter, in seconds.

// master-side logic
if ((replication_cron_loops % server.repl_ping_slave_period) == 0 &&
    listLength(server.slaves))
{
    robj *ping_argv[1];
    int manual_failover_in_progress =
        server.cluster_enabled &&
        server.cluster->mf_end &&
        clientsArePaused();

    // Skip the PING while a manual failover (mf) is in progress: it would
    // change the replication offset that the failover is comparing.
    if (!manual_failover_in_progress) {
        ping_argv[0] = createStringObject("PING",4);
        replicationFeedSlaves(server.slaves, server.slaveseldb,
            ping_argv, 1);
        decrRefCount(ping_argv[0]);
    }
}
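Since replicationCron runs once per second, the `replication_cron_loops % server.repl_ping_slave_period` test turns a loop counter into a period in seconds. The sketch below counts how many PINGs such a schedule produces; count_pings and the per-loop mf_paused array are hypothetical, and a skipped PING is simply dropped rather than deferred, as in the snippet above.

```c
#include <stdbool.h>
#include <stddef.h>

/* How many PINGs the master broadcasts during `loops` runs of a
 * once-per-second cron. mf_paused (may be NULL) marks the loops during
 * which a manual failover is in progress. */
int count_pings(int loops, int ping_period, int nslaves, const bool *mf_paused) {
    int sent = 0;
    for (int i = 0; i < loops; i++) {
        if (i % ping_period != 0 || nslaves == 0) continue;
        if (mf_paused && mf_paused[i]) continue;  /* skipped, not deferred */
        sent++;
    }
    return sent;
}
```

For example, with the default period of 10 and 30 cron runs, PINGs go out at loops 0, 10 and 20.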

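Whenever the slave reads this PING (or any other data) from the master, it refreshes lastinteraction. The slave does not reply to commands arriving on the replication stream, so no PONG is sent back. The timeout check runs every second.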

Master-side probing

On the master side, detection relies on client.repl_ack_time, which records the last time a REPLCONF ACK was received from the slave.

In replicationCron, the master checks whether each slave is still alive and frees the client of any slave that has timed out.

// master-side logic
/* Disconnect timedout slaves. */
if (listLength(server.slaves)) {
    listIter li;
    listNode *ln;

    listRewind(server.slaves,&li);
    while((ln = listNext(&li))) {
        client *slave = ln->value;
        
        // Note: only slaves in SLAVE_STATE_ONLINE are checked here;
        // such a slave necessarily has a write-event callback on its fd.
        if (slave->replstate != SLAVE_STATE_ONLINE) continue;
        if (slave->flags & CLIENT_PRE_PSYNC) continue;
        if ((server.unixtime - slave->repl_ack_time) > server.repl_timeout)
        {
            serverLog(LL_WARNING, "Disconnecting timedout replica: %s",
                replicationGetSlaveName(slave));
            freeClient(slave);
        }
    }
}
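The filtering in this loop (only online slaves, never those still flagged CLIENT_PRE_PSYNC) can be checked with a small array-based model. The rstate enum, the replica struct and disconnect_timedout are hypothetical simplifications: a `freed` flag replaces freeClient, and an array replaces the server.slaves list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Simplified, hypothetical subset of the SLAVE_STATE_* values. */
typedef enum { R_WAIT_BGSAVE_START, R_WAIT_BGSAVE_END,
               R_SEND_BULK, R_ONLINE } rstate;

typedef struct {
    const char *name;
    rstate state;
    bool pre_psync;        /* stands in for CLIENT_PRE_PSYNC */
    time_t repl_ack_time;  /* last REPLCONF ACK (or newline) seen */
    bool freed;            /* set instead of calling freeClient() */
} replica;

/* Mark timed-out online replicas as freed; returns how many were dropped. */
int disconnect_timedout(replica *r, size_t n, time_t now, time_t repl_timeout) {
    int dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (r[i].freed || r[i].state != R_ONLINE) continue;
        if (r[i].pre_psync) continue;
        if (now - r[i].repl_ack_time > repl_timeout) {
            r[i].freed = true;
            dropped++;
        }
    }
    return dropped;
}
```

Slaves that are still waiting for or receiving the RDB are not judged by repl_ack_time at all; during that period the transfer-phase mechanisms described earlier apply instead.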

Why doesn't master-side detection use PING, as slave-side detection does?
Because the master also needs to know the replication offset of each of its slaves. The slave therefore sends REPLCONF ACK <reploff> every second (the offset is taken from the client->reploff field, which the slave updates after processCommand), and this message doubles as a PING.

// slave-side logic
if (server.masterhost && server.master &&
    !(server.master->flags & CLIENT_PRE_PSYNC))
    replicationSendAck();
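On the wire, the ACK is an ordinary command: an array of three RESP bulk strings. The sketch below encodes it by hand to show the exact bytes. encode_replconf_ack is a hypothetical helper; Redis builds the same reply with its own reply functions rather than snprintf.

```c
#include <stdio.h>
#include <string.h>

/* Encode "REPLCONF ACK <offset>" as a RESP array of three bulk strings.
 * Returns the encoded length, or -1 if buf is too small. */
int encode_replconf_ack(char *buf, size_t cap, long long offset) {
    char off[32];
    snprintf(off, sizeof off, "%lld", offset);   /* decimal offset */
    size_t offlen = strlen(off);
    int n = snprintf(buf, cap,
                     "*3\r\n"                    /* array of 3 elements    */
                     "$8\r\nREPLCONF\r\n"        /* bulk string "REPLCONF" */
                     "$3\r\nACK\r\n"             /* bulk string "ACK"      */
                     "$%zu\r\n%s\r\n",           /* bulk string offset     */
                     offlen, off);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

For offset 12345 this produces the 38-byte message `*3\r\n$8\r\nREPLCONF\r\n$3\r\nACK\r\n$5\r\n12345\r\n`.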

When the master receives REPLCONF ACK, it refreshes repl_ack_time, and replicationCron checks that value for timeout every second.
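On the master, one ACK updates two things: the acknowledged offset, which only moves forward, and the liveness timestamp. A minimal handler sketch, assuming the command has already been split into argv; replica_info and handle_replconf_ack are hypothetical names, and the exact-case strcmp is a simplification (Redis matches option names case-insensitively).

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef struct {
    long long repl_ack_off;   /* highest offset the replica has confirmed */
    time_t repl_ack_time;     /* when the last ACK arrived */
} replica_info;

/* Handle an already-parsed "REPLCONF ACK <offset>" from a replica.
 * Returns 0 on success, -1 on a malformed command. */
int handle_replconf_ack(replica_info *r, int argc, const char **argv,
                        time_t now) {
    if (argc != 3 || strcmp(argv[0], "REPLCONF") != 0 ||
        strcmp(argv[1], "ACK") != 0) return -1;
    char *end;
    long long off = strtoll(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0') return -1;
    if (off > r->repl_ack_off) r->repl_ack_off = off;  /* never move backwards */
    r->repl_ack_time = now;  /* the ACK doubles as a liveness signal */
    return 0;
}
```

An ACK carrying an older offset still counts as proof of life; it just does not lower the recorded offset.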

Probing in cluster mode

In Redis Cluster mode, nodes probe one another as peers through cluster gossip messages; this was described in detail in the previous article.
