Author: Jia Shiwen Zhan Enqiang
RedisSyncer is a Redis synchronization middleware. It poses as a slave and uses the Redis replication protocol to pull data from a source Redis node and write it to a target Redis, thereby synchronizing the two. The project consists of the following sub-projects:
Redis synchronization service engine: redissyncer-server
RedisSyncer client: redissyncer-cli
Redis data verification tool: redissyncer-compare
Integrated deployment solution based on docker-compose: redissyncer
This article mainly introduces the design and implementation of the RedisSyncer engine (redissyncer-server) and the mechanisms it relies on at runtime.
Synchronization process
Native Redis master-slave replication has two stages. The first stage transfers an RDB snapshot, which is the full synchronization part. Once full synchronization is complete, replication enters command propagation mode, in which every successful data change is forwarded to the slave node. RedisSyncer simulates this mechanism and decouples the two stages, so a task can run full plus incremental synchronization, full synchronization alone, or incremental synchronization alone.
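Create the socket
Send AUTH <user> <password> (the user argument is new in Redis 6.0)
OK: success
Anything else: error
Send PING
Returns:
ERR invalid password: the password is wrong
NOAUTH Authentication required.: no password was sent
operation not permitted: the operation is not permitted
PONG: authentication succeeded
Purpose:
Check whether the network between the master and slave nodes is available.
Check whether the master and slave nodes can currently accept and process commands.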
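Send slave port information
REPLCONF listening-port <port>
--> OK: success
--> Anything else: failure
Send slave node IP
REPLCONF ip-address <IP>
--> OK: success
--> Anything else: failure
Send the EOF capability
REPLCONF capa eof
--> OK: success
--> Failure
Purpose:
Declares whether the slave supports EOF-style RDB transfer, which is used for diskless replication; that is, the slave can parse the EOF stream format of the RDB payload.
Redis 4.0 supports two capabilities: EOF and PSYNC2
Versions before Redis 4.0 support only the EOF capability
Send the PSYNC2 capability
REPLCONF capa PSYNC2
--> OK: success
--> Failure
Purpose:
Tells the master that the slave supports PSYNC2. The master ignores capabilities it does not support. PSYNC2 indicates support for the newer PSYNC replication introduced in Redis 4.0.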
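Send PSYNC
PSYNC {replid} {offset}
--> FULLRESYNC {replid} {offset}: full synchronization
--> CONTINUE: partial synchronization
--> -ERR: the master is older than 2.8 and does not support PSYNC; the slave needs to send SYNC
--> NOMASTERLINK: retry
--> LOADING: retry
--> Retry threshold exceeded: the task is terminated
Read the PSYNC reply status to determine whether this is a partial or a full synchronization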
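The reply handling listed above can be sketched as a small classifier. This is an illustrative sketch, not RedisSyncer's actual code: the function name `classify_psync_reply` and the returned labels are hypothetical, and it only looks at the first line of the master's reply.

```python
def classify_psync_reply(line: str):
    """Classify the first reply line of PSYNC.

    Returns a tuple (action, replid, offset). The action labels are
    illustrative names chosen for this sketch.
    """
    # Drop the trailing CRLF and the RESP type byte ('+' or '-').
    parts = line.strip().lstrip("+-").split()
    if not parts:
        return ("error", None, None)
    keyword = parts[0].upper()
    if keyword == "FULLRESYNC" and len(parts) >= 3:
        # Full synchronization: the master announces its replid and offset.
        return ("full", parts[1], int(parts[2]))
    if keyword == "CONTINUE":
        # Partial synchronization; a PSYNC2 master may append a new replid.
        return ("partial", parts[1] if len(parts) > 1 else None, None)
    if keyword in ("NOMASTERLINK", "LOADING"):
        # Transient states: retry until the retry threshold is exceeded.
        return ("retry", None, None)
    if keyword == "ERR":
        # Master older than 2.8: PSYNC is unsupported, fall back to SYNC.
        return ("unsupported", None, None)
    return ("error", None, None)
```

A caller would count consecutive "retry" results and terminate the task once the configured threshold is exceeded.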
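PSYNC --> start heartbeat
REPLCONF ACK <replication_offset>
Heartbeat detection
During the command propagation phase, the slave by default sends REPLCONF ACK to the master once per second.
Sending REPLCONF ACK serves three purposes for the master and slave:
Purpose:
Detect the state of the network connection between master and slave;
Help implement the min-slaves option;
Detect lost commands.
REPLCONF GETACK
-> REPLCONF ACK <replication_offset>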
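The handshake order described above can be summarized as a list of commands. This is a minimal sketch under stated assumptions: `handshake_commands` and `ack_command` are hypothetical helper names, and sending `PSYNC ? -1` when no replid and offset are recorded is standard Redis replica behaviour for requesting a full resynchronization, not something taken from RedisSyncer's source.

```python
from typing import List, Optional


def handshake_commands(listening_port: int, ip_address: str,
                       password: Optional[str] = None,
                       user: Optional[str] = None,
                       replid: Optional[str] = None,
                       offset: Optional[int] = None) -> List[List[str]]:
    """Return the replication handshake as an ordered list of commands."""
    commands: List[List[str]] = []
    if password is not None:
        # AUTH <password>, or AUTH <user> <password> on Redis 6.0 and later.
        commands.append(["AUTH"] + ([user] if user else []) + [password])
    commands.append(["PING"])
    commands.append(["REPLCONF", "listening-port", str(listening_port)])
    commands.append(["REPLCONF", "ip-address", ip_address])
    commands.append(["REPLCONF", "capa", "eof"])
    commands.append(["REPLCONF", "capa", "psync2"])
    if replid is None or offset is None:
        # No recorded position: ask the master for a full resynchronization.
        commands.append(["PSYNC", "?", "-1"])
    else:
        commands.append(["PSYNC", replid, str(offset)])
    return commands


def ack_command(replication_offset: int) -> List[str]:
    """Heartbeat sent during command propagation (once per second by default)."""
    return ["REPLCONF", "ACK", str(replication_offset)]
```

Each command would be encoded as a RESP array and its reply checked before the next one is sent.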
After the RDB snapshot has been synchronized, command propagation begins and the master continuously pushes data changes to the slave.
To keep data synchronization stable and available, RedisSyncer provides mechanisms for breakpoint resume, data compensation, and reconnection after disconnection. They are described below.
Breakpoint Resume Mechanism
The resume mechanism of RedisSyncer is based on the replid and offset of Redis. RedisSyncer has two versions of the resume mechanism, v1 and v2.
v1 version:
In v1, after data is written to the destination Redis, the offset is persisted locally so that the next restart resumes pulling from the last offset. However, writing to the destination and persisting the offset are not a single atomic operation, so an interruption between the two causes inconsistency. For example, if the data is written to the destination successfully and a crash or restart occurs before the offset is persisted, the resumed task pulls from the stale offset and replays data that has already been written, so the source and destination may end up inconsistent.
v2 version:
In v2, RedisSyncer wraps the commands of each pipeline batch that are not already part of a transaction in MULTI and EXEC, and appends an offset checkpoint at the end of the transaction. To resume from a breakpoint, it searches the logical databases of the target Redis for checkpoints, finds the maximum offset recorded for the corresponding source node, and resumes from that offset. At present, v2 only supports a standalone Redis as the target. The v2 structures are:
v2 command transaction encapsulation structure
v2 checkpoint structure:
HASH hset redis-syncer-checkpoint {value}
{value}:
* {ip}:{port}-runid {replid}
* {ip}:{port}-offset {offset}
* pointcheckVersion {version}
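As a rough illustration of the v2 idea, the sketch below wraps one pipeline batch in MULTI/EXEC and appends the checkpoint fields listed above. The function name `wrap_batch_with_checkpoint` and its parameters are hypothetical, the batch is assumed to contain no MULTI or EXEC of its own, and writing the checkpoint as three separate HSET commands is a choice made for this sketch rather than RedisSyncer's confirmed wire format.

```python
from typing import List

CHECKPOINT_KEY = "redis-syncer-checkpoint"


def wrap_batch_with_checkpoint(batch: List[List[str]], source_ip: str,
                               source_port: int, replid: str, offset: int,
                               version: str) -> List[List[str]]:
    """Wrap a batch of commands in a transaction that ends with a checkpoint."""
    prefix = f"{source_ip}:{source_port}"
    wrapped: List[List[str]] = [["MULTI"]]
    wrapped.extend(batch)
    # The checkpoint is queued in the same transaction as the data, so the
    # data and the offset are applied together when EXEC runs.
    wrapped.append(["HSET", CHECKPOINT_KEY, f"{prefix}-runid", replid])
    wrapped.append(["HSET", CHECKPOINT_KEY, f"{prefix}-offset", str(offset)])
    wrapped.append(["HSET", CHECKPOINT_KEY, "pointcheckVersion", version])
    wrapped.append(["EXEC"])
    return wrapped
```

Because the offset travels inside the same transaction as the data, the gap between "data written" and "offset persisted" that affects v1 is avoided.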
The Redis transaction mechanism does not support rollback: if a command fails in the middle of a transaction, the remaining commands are still executed. Even so, consistency can be guaranteed except in special cases. In v2, to prevent 'write amplification', a checkpoint is written in each logical database of the target Redis. When resuming from a breakpoint, the synchronization tool therefore first scans the checkpoints in every logical database of the target and selects the one with the largest offset as the resume parameter.
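Selecting the resume point from the per-database checkpoints could look like the following sketch. It assumes the checkpoint hashes have already been read from each logical database into a dictionary keyed by database index; the function name `pick_resume_point` and that input shape are hypothetical.

```python
from typing import Dict, Optional, Tuple


def pick_resume_point(checkpoints_by_db: Dict[int, Dict[str, str]],
                      source_ip: str,
                      source_port: int) -> Optional[Tuple[str, int]]:
    """Return (replid, offset) of the checkpoint with the largest offset."""
    prefix = f"{source_ip}:{source_port}"
    best: Optional[Tuple[str, int]] = None
    for fields in checkpoints_by_db.values():
        replid = fields.get(f"{prefix}-runid")
        raw_offset = fields.get(f"{prefix}-offset")
        if replid is None or raw_offset is None:
            continue  # this database holds no checkpoint for the source node
        offset = int(raw_offset)
        if best is None or offset > best[1]:
            best = (replid, offset)
    return best
```

A result of `None` means no checkpoint exists for the source node, in which case a task cannot resume and would have to start with a full synchronization.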
Data compensation mechanism
During data synchronization, writing a key may fail because of network instability or other factors. RedisSyncer therefore implements a compensation mechanism to keep the source and destination data consistent. Data compensation presupposes that command writes are idempotent, so RedisSyncer converts non-idempotent commands such as INCR, INCRBY, INCRBYFLOAT, APPEND, DECR, and DECRBY into idempotent commands before writing them to the target Redis. When the target is a standalone Redis or a proxy, RedisSyncer writes data through the pipeline mechanism. Each submitted pipeline batch returns a list of results, which the synchronization tool checks for correctness. If some commands failed, the tool retries the commands of that batch related to the affected key. If the retries exceed the specified threshold, the task is aborted. For non-idempotent structures such as lists held in large keys, no data compensation is performed; the task is forced to end and left for manual handling.
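The verify-and-retry loop can be sketched as follows. This is a simplified illustration, not RedisSyncer's implementation: `send_pipeline` is a hypothetical callable that submits a batch and returns one result per command, with an `Exception` instance marking a failed command, and only the failed commands are retried. Retrying is safe here only because the commands are assumed to be idempotent.

```python
from typing import Callable, List, Sequence


class TaskAborted(Exception):
    """Raised when the retry threshold is exceeded."""


def write_batch_with_retry(batch: Sequence[List[str]],
                           send_pipeline: Callable[[List[List[str]]], list],
                           max_retries: int = 3) -> int:
    """Write a batch, retrying failed commands; return the retries used."""
    pending = list(batch)
    for attempt in range(max_retries + 1):
        results = send_pipeline(pending)
        # Keep only the commands whose result signals a failure.
        pending = [cmd for cmd, res in zip(pending, results)
                   if isinstance(res, Exception)]
        if not pending:
            return attempt
    raise TaskAborted(f"{len(pending)} command(s) still failing "
                      f"after {max_retries} retries")
```

When `TaskAborted` is raised, the task would be stopped and left for manual intervention, matching the behaviour described above.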
Disconnection and reconnection mechanism
Network jitter and other problems may break the connection between the synchronization tool and the source or the target during synchronization, so a reconnection mechanism is required to handle abnormal disconnections while a task is running. Reconnection applies to two links: between the source Redis node and RedisSyncer, and between RedisSyncer and the target Redis node. Each link has its own handling.
Source-end reconnection mechanism
Reconnection between the source Redis and RedisSyncer is based on the recorded offset. When the connection drops because of a network exception or a similar cause, RedisSyncer retries the connection to the source Redis node and uses the runid, offset, and other information recorded by the current task to pull the incremental data from the point of disconnection. Once the connection is re-established, the synchronization task continues transparently. If reconnection exceeds the specified retry threshold, or the data can no longer be resumed because the recorded offset has been flushed out on the source, RedisSyncer stops the current synchronization task and waits for manual intervention.
Target reconnection mechanism
Reconnection between RedisSyncer and the target Redis works by caching the previous batch of pipeline commands. When the connection drops abnormally, RedisSyncer reconnects and replays the previous batch of write commands that failed. If the replay fails or the number of consecutive retries is exceeded, RedisSyncer shuts down the current synchronization task and waits for manual intervention.
Command strategy chain
RedisSyncer processes synchronization data through a chain of strategies, as shown in the figure.
Every key passes through the strategy chain, and as soon as one strategy fails, the key is not synchronized to the target Redis. For example, if the expiration-time strategy determines during the full synchronization stage that a key has already expired, the key is discarded automatically.
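A strategy chain of this kind can be sketched as a list of predicates evaluated in order. The event layout (a dictionary with `key` and `expire_at_ms`) and the names `expiry_strategy` and `run_chain` are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, Iterable, Optional

KeyEvent = Dict[str, object]
Strategy = Callable[[KeyEvent], bool]


def expiry_strategy(event: KeyEvent, now_ms: Optional[float] = None) -> bool:
    """Pass keys without an expiry or whose expiry lies in the future."""
    expire_at = event.get("expire_at_ms")
    if expire_at is None:
        return True
    now = time.time() * 1000 if now_ms is None else now_ms
    return expire_at > now


def run_chain(event: KeyEvent, strategies: Iterable[Strategy]) -> bool:
    """Return True only if every strategy accepts the key.

    all() stops at the first strategy that returns False, so the
    strategies after a failing one are not evaluated.
    """
    return all(strategy(event) for strategy in strategies)
```

A key is forwarded to the target only when `run_chain` returns `True`; otherwise it is dropped.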
The policies in the policy chain include:
Task management
Task startup process
Task stop and cleanup process
When a task is actively stopped, RedisSyncer first stops receiving data from the source Redis and then enters a data-protection state, so that the small amount of data that may still be inside RedisSyncer and has not yet been written to the target is written completely. It then records and persists the offset of the last piece of data written, so that RedisSyncer can supply the correct offset when resuming from a breakpoint.
task status
Task exception handling principle
If a RedisSyncer task encounters an error that may cause data inconsistency, RedisSyncer shuts down the task and waits for manual intervention.
RDB cross-version synchronization implementation
RDB files have a compatibility problem across versions: an RDB file produced by a higher Redis version cannot be imported into a Redis that only supports a lower RDB version.
Problems existing between Redis versions
Redis is only backward compatible (a lower version cannot load the RDB of a higher version), so the RDB file protocol carries a version number. When Redis imports an RDB file or runs rdbLoad during full synchronization, it first checks whether the RDB version is one it can handle; if not, it reports a "Can't handle RDB format version" error.
Cross-version migration implementation mechanism
For the full synchronization of RDB data, RedisSyncer divides the data into two categories.
For structures that may hold large keys, such as SET, ZSET, LIST, and HASH:
For data whose members have no ordering requirement, such as SET, ZSET, and HASH, the command parser turns the key into one or more SADD, ZADD, HMSET, or similar commands.
For data whose members have an ordering requirement, such as LIST, if the command parser determines that the key is a big key and splits it into multiple subcommands, these must be sent to the target Redis node in order.
For other structures such as String: to keep the commands idempotent, the command parser serializes the value (implementing DUMP) according to the RDB version of the target Redis node, and the transmission module deserializes it on the target node using RESTORE with the REPLACE option. (RESTORE does not support the REPLACE option in versions below Redis 3.0.)
About the RDB VERSION section in the RDB file protocol
Example of the beginning of the REDIS RDB file structure
----------------------------# RDB is a binary format. There are no new lines or spaces in the file.
52 45 44 49 53 # Magic String "REDIS"
30 30 30 37 # 4-digit ASCII RDB version number. In this case, version = "0007" = 7 (the RDB VERSION field)
FE 00 # FE = code that indicates database selector. db number = 00
Pseudocode for the RDB VERSION check
def rdbLoad(filename):
    rio = rioInitWithFile(filename)
    # Set the loading markers:
    # a. server state: rdb_loading = 1
    # b. load start time: loading_start_time = now_time
    # c. load size: loading_total_bytes = filename.size
    startLoading(rio)
    # 1. Check that the file is an RDB file (its first 5 characters are "REDIS")
    if not checkRDBHeader(rio):
        redislog("error, Wrong signature trying to load DB from file")
        return
    # 2. Check that the RDB file version is compatible (backward compatible)
    if not checkRDBVersion(rio):
        redislog("error, Can't handle RDB format version")
        return
    .........
// RDB_VERSION check in the Redis source code
rdbver = atoi(buf+5);
if (rdbver < 1 || rdbver > RDB_VERSION) {
    rdbCheckError("Can't handle RDB format version %d",rdbver);
    goto err;
}
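The same header check can be written as a small runnable function. This is a sketch that mirrors the logic shown above: `check_rdb_header` is a hypothetical name, and `max_version` stands in for the RDB_VERSION constant of whichever Redis build is the target.

```python
def check_rdb_header(header: bytes, max_version: int) -> int:
    """Validate the magic string and version; return the RDB version."""
    # The first 5 bytes must be the magic string "REDIS".
    if len(header) < 9 or header[:5] != b"REDIS":
        raise ValueError("Wrong signature trying to load DB from file")
    # The next 4 bytes are ASCII digits, for example b"0007" for version 7.
    try:
        version = int(header[5:9].decode("ascii"))
    except ValueError:
        # Covers both non-ASCII bytes and non-numeric text.
        raise ValueError("Can't handle RDB format version (not a number)")
    if version < 1 or version > max_version:
        raise ValueError(f"Can't handle RDB format version {version}")
    return version
```

For example, the header bytes `52 45 44 49 53 30 30 30 37` are accepted when `max_version` is at least 7 and rejected with the version error when it is lower.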
Large Key Split During RDB Synchronization
During the full synchronization phase, when RedisSyncer encounters a LIST, SET, ZSET, HASH, or similar structure whose size exceeds the threshold, it splits the key into multiple subcommands using an iterator and writes them to the target. This prevents a large key from being read into memory all at once, which could cause an out-of-memory (OOM) error, and it improves synchronization speed. Keys that are not large are written to the target by serialization and deserialization.
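The iterator-based split can be illustrated with a generator that emits one subcommand per chunk of members. This is a sketch of the general technique, not RedisSyncer's parser: `split_big_key` and `batch_size` are hypothetical names, and members are assumed to arrive as a plain iterable.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_big_key(command: str, key: str, members: Iterable[str],
                  batch_size: int) -> Iterator[List[str]]:
    """Yield subcommands that each carry at most batch_size members."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(members)
    while True:
        # Only one chunk is materialized at a time, so the whole key
        # never has to be held in memory.
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield [command, key, *chunk]
```

The generator preserves member order, which matters for LIST: the resulting RPUSH subcommands have to be sent to the target in the order they are produced.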
Appendix I Redis RDB Protocol
redis RDB Dump file format
----------------------------# RDB is a binary format. There are no new lines or spaces in the file.
52 45 44 49 53 # Magic String "REDIS"
30 30 30 37 # 4-digit ASCII RDB version number. In this case, version = "0007" = 7
----------------------------
FE 00 # FE = code that indicates database selector. db number = 00
----------------------------# Key-Value pair starts
FD $unsigned int # FD indicates "expiry time in seconds". After that, expiry time is read as a 4 byte unsigned int
$value-type # 1 byte flag indicating the type of value - set, map, sorted set etc.
$string-encoded-key # The key, encoded as a redis string
$encoded-value # The value. Encoding depends on $value-type
----------------------------
FC $unsigned long # FC indicates "expiry time in ms". After that, expiry time is read as a 8 byte unsigned long
$value-type # 1 byte flag indicating the type of value - set, map, sorted set etc.
$string-encoded-key # The key, encoded as a redis string
$encoded-value # The value. Encoding depends on $value-type
----------------------------
$value-type # This key value pair doesn't have an expiry. $value_type guaranteed != to FD, FC, FE and FF
$string-encoded-key
$encoded-value
----------------------------
FE $length-encoding # Previous db ends, next db starts. Database number read using length encoding.
----------------------------
... # Key value pairs for this database, additional databases
FF ## End of RDB file indicator
8 byte checksum ## CRC 64 checksum of the entire file.
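An RDB file starts with the magic string "REDIS".
52 45 44 49 53 # "REDIS"
RDB version number
The next 4 bytes store the version number of the RDB format. These 4 bytes are interpreted as ASCII characters and then converted to an integer using string-to-integer conversion.
30 30 30 33 # "0003" => Version = 3
Database Selector
A Redis instance can have multiple databases.
A single byte 0xFE marks the beginning of the database selector. After this byte, a variable-length field indicates the database number. See the "Length Encoding" section to understand how to read this database number.
Key-value pairs
After the database selector, the file contains a sequence of key-value pairs.
Each key-value pair has 4 parts:
1. Key expiry timestamp.
2. A one-byte flag indicating the value type.
3. The key, encoded as a Redis string. See "Redis String Encoding".
4. The value, encoded according to the value type. See "Redis Value Encoding".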
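Reading the optional expiry part of a key-value pair might look like the sketch below. It follows the FD (seconds, 4 bytes) and FC (milliseconds, 8 bytes) markers from the format description above; the function name `read_expiry` is hypothetical, and little-endian byte order is an assumption that matches RDB files written on common little-endian hosts.

```python
import struct
from typing import Optional, Tuple


def read_expiry(buf: bytes, pos: int) -> Tuple[Optional[int], int]:
    """Return (expiry in milliseconds or None, position of the value type)."""
    opcode = buf[pos]
    if opcode == 0xFD:
        # Expiry in seconds, stored as a 4-byte unsigned int.
        (seconds,) = struct.unpack_from("<I", buf, pos + 1)
        return seconds * 1000, pos + 5
    if opcode == 0xFC:
        # Expiry in milliseconds, stored as an 8-byte unsigned long.
        (millis,) = struct.unpack_from("<Q", buf, pos + 1)
        return millis, pos + 9
    # No expiry: the byte at pos is already the value-type flag.
    return None, pos
```

After this call, the byte at the returned position is the one-byte value-type flag, followed by the key and the encoded value.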
Appendix II Redis RESP Protocol
Redis RESP protocol
RESP was introduced in Redis 1.2 and became the standard way to communicate with Redis servers in Redis 2.0. It is the protocol that Redis clients implement. RESP is a serialization protocol that supports the following data types: simple strings, errors, integers, bulk strings, and arrays.
The way RESP is used as a request-response protocol in Redis is as follows:
The client sends commands to the Redis server as a RESP array of bulk strings.
The server replies with one of the RESP types according to the command implementation.
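Encoding a command as a RESP array of bulk strings takes only a few lines. The helper name `encode_command` is hypothetical; the wire format it produces is the one RESP defines for arrays and bulk strings.

```python
def encode_command(*args) -> bytes:
    """Encode a command and its arguments as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        # Bulk strings are binary safe, so bytes are passed through as-is
        # and everything else is converted to its UTF-8 text form.
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)
```

For instance, `encode_command("PING")` produces `b"*1\r\n$4\r\nPING\r\n"`.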
In RESP, the data type is determined by the first byte:
For simple strings, the first byte of the reply is "+"
For errors, the first byte of the reply is "-"
For integers, the first byte of the reply is ":"
For bulk strings, the first byte of the reply is "$"
For arrays, the first byte of the reply is "*"
RESP can represent Null values using special variants of bulk strings or arrays specified later. In RESP, different parts of the protocol are always terminated with "\r\n" (CRLF).
RESP Simple Strings
A simple string starts with a '+' character, followed by a string that cannot contain CR or LF characters (newlines are not allowed), and ends with CRLF (i.e. "\r\n"). For example:
"+OK\r\n"
RESP Errors
"-Error message\r\n"
For example:
-ERR unknown command 'foobar'
-WRONGTYPE Operation against a key holding the wrong kind of value
RESP Integers
An integer is a CRLF-terminated string representing an integer, prefixed with a ":" byte. For example:
":0\r\n"
":1000\r\n"
Bulk Strings
Bulk strings represent a single binary-safe string of up to 512 MB in length. A bulk string is encoded as follows:
A "$" byte followed by the number of bytes that make up the string (the prefixed length), terminated by CRLF.
The actual string data.
A final CRLF.
"foobar" is encoded as follows:
"$6\r\nfoobar\r\n"
An empty string is encoded as:
"$0\r\n\r\n"
Bulk strings can also represent a Null value, using a special format that indicates the value does not exist. In this format the length is -1 and there is no data, so Null is represented as:
"$-1\r\n"
RESP Arrays
Format:
A '*' character as the first byte, followed by the number of elements in the array as a decimal number, then CRLF.
An additional RESP type for each element of the array.
An empty array is represented as:
"*0\r\n"
An array of "foo" and "bar" is represented as:
"*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n"
Arrays can contain Null elements. ["foo",nil,"bar"] is represented as:
"*3\r\n$3\r\nfoo\r\n$-1\r\n$3\r\nbar\r\n"
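The five RESP types described in this appendix can be decoded with a short recursive parser. This is a minimal sketch that assumes the complete reply is already in memory; `parse_resp` and `RespError` are hypothetical names, and a real client would also need to handle partial reads from the socket.

```python
class RespError(Exception):
    """Represents a RESP error reply such as -ERR or -WRONGTYPE."""


def parse_resp(data: bytes, pos: int = 0):
    """Parse one RESP value starting at pos; return (value, next position)."""
    end = data.index(b"\r\n", pos)
    prefix, body = data[pos:pos + 1], data[pos + 1:end]
    pos = end + 2
    if prefix == b"+":  # simple string
        return body.decode("utf-8"), pos
    if prefix == b"-":  # error, returned as an object rather than raised
        return RespError(body.decode("utf-8")), pos
    if prefix == b":":  # integer
        return int(body), pos
    if prefix == b"$":  # bulk string, where length -1 means Null
        length = int(body)
        if length == -1:
            return None, pos
        return data[pos:pos + length], pos + length + 2
    if prefix == b"*":  # array, where count -1 means Null
        count = int(body)
        if count == -1:
            return None, pos
        items = []
        for _ in range(count):
            item, pos = parse_resp(data, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown RESP type byte: {prefix!r}")
```

Bulk strings are returned as `bytes` because they are binary safe, while simple strings and error messages are decoded to text.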