
As a data intelligence company, Getui not only holds massive relational data but has also accumulated rich key-value and other non-relational data resources. Getui uses Codis to store large-scale key-value data. As the company's kv data kept growing, the cost of building clusters with native Codis climbed higher and higher. For scenarios that do not require high-performance responses, Getui decided to adopt a new storage and management solution to better balance cost and performance. After evaluation, Getui introduced Pika, the storage system open-sourced by 360, as the underlying storage of Codis, replacing the more expensive codis-server to manage distributed kv data clusters. Connecting Pika to Codis was not smooth sailing, and a series of design and adaptation work was needed to meet the requirements of our business scenarios.

This article is the fourth part of the "Big Data Cost Reduction and Efficiency Improvement" series. I will share our hands-on experience of combining Pika with Codis and ultimately saving 90% of big data storage costs.

Four components of Codis

Before diving into the migration practice, you need a basic understanding of the Codis architecture. Codis is a distributed Redis solution consisting of four components: codis-fe, codis-dashboard, codis-proxy, and codis-server.

  • codis-server is the most fundamental and core component of Codis. It extends Redis 3 with additional functionality while still relying on high-performance Redis to serve requests. codis-server adds slot-based key storage (implementing slots costs extra memory beyond what the data itself requires) and can hot-migrate slot data between groups within a Codis cluster.
  • codis-fe provides an operations-friendly management interface, making it easy to manage multiple codis-dashboard instances in one place.
  • codis-dashboard is responsible for keeping the metadata of components such as slots, codis-proxy, and ZooKeeper (or etcd) consistent, for the operational status of the whole cluster, for scaling data in and out, and for the high availability of components; its role is similar to the api-server in Kubernetes.
  • codis-proxy is the access proxy used by the business layer. It parses each request, resolves the key's routing information, and routes the request to the corresponding back-end group. codis-proxy also has another very important function: when the cluster is scaled out or in via codis-fe, codis-proxy triggers the key migration flow according to the migration status of the slot in the corresponding group, achieving live data migration without interrupting business services and thereby ensuring business availability (a minimal sketch of this slot-based routing follows the list).
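As a minimal sketch of the slot routing mentioned above, the Go snippet below shows how a key maps to a slot, assuming Codis's default layout of 1024 slots; the group numbers in the toy routing table are made up for illustration and do not come from a real cluster.

package main

import (
	"fmt"
	"hash/crc32"
)

const numSlots = 1024 // Codis's default slot count

// slotOfKey maps a key to a slot id the way codis-proxy does: crc32(key) % 1024.
func slotOfKey(key string) int {
	return int(crc32.ChecksumIEEE([]byte(key)) % numSlots)
}

func main() {
	// Toy routing table: slots 0-511 belong to group 1, slots 512-1023 to group 2.
	slotToGroup := make(map[int]int, numSlots)
	for s := 0; s < numSlots; s++ {
		if s < 512 {
			slotToGroup[s] = 1
		} else {
			slotToGroup[s] = 2
		}
	}

	key := "user:10086"
	slot := slotOfKey(key)
	fmt.Printf("key %q -> slot %d -> group %d\n", key, slot, slotToGroup[slot])
}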

Challenges of Pika accessing Codis

We introduced Pika mainly to replace codis-server. Pika, a Redis-like storage system open-sourced by 360, uses RocksDB as its underlying engine, is fully compatible with the Redis protocol, and its mainstream versions provide Codis access support. However, while introducing Pika and migrating data into Codis, we found that the combination of Pika and Codis is not perfect.

Problem 1: Inconsistent command syntax

Before connecting the two, we carefully read and compared the source code of Pika and Codis, and found that Pika implements relatively few of the slot commands. Whether some functions would still work properly after connecting Pika to Codis remained to be verified.

The relevant definitions in the Pika (version 3.4.0) source code, from the pika_command.h header file:

//Codis Slots
const std::string kCmdNameSlotsInfo = "slotsinfo";
const std::string kCmdNameSlotsHashKey = "slotshashkey";
const std::string kCmdNameSlotsMgrtTagSlotAsync = "slotsmgrttagslot-async";
const std::string kCmdNameSlotsMgrtSlotAsync = "slotsmgrtslot-async";
const std::string kCmdNameSlotsDel = "slotsdel";
const std::string kCmdNameSlotsScan = "slotsscan";
const std::string kCmdNameSlotsMgrtExecWrapper = "slotsmgrt-exec-wrapper";
const std::string kCmdNameSlotsMgrtAsyncStatus = "slotsmgrt-async-status";
const std::string kCmdNameSlotsMgrtAsyncCancel = "slotsmgrt-async-cancel";
const std::string kCmdNameSlotsMgrtSlot = "slotsmgrtslot";
const std::string kCmdNameSlotsMgrtTagSlot = "slotsmgrttagslot";
const std::string kCmdNameSlotsMgrtOne = "slotsmgrtone";
const std::string kCmdNameSlotsMgrtTagOne = "slotsmgrttagone";

The commands supported by codis-server are as follows:

 {"slotsinfo",slotsinfoCommand,-1,"rF",0,NULL,0,0,0,0,0},
    {"slotsscan",slotsscanCommand,-3,"rR",0,NULL,0,0,0,0,0},
    {"slotsdel",slotsdelCommand,-2,"w",0,NULL,1,-1,1,0,0},
    {"slotsmgrtslot",slotsmgrtslotCommand,5,"w",0,NULL,0,0,0,0,0},
    {"slotsmgrttagslot",slotsmgrttagslotCommand,5,"w",0,NULL,0,0,0,0,0},
    {"slotsmgrtone",slotsmgrtoneCommand,5,"w",0,NULL,0,0,0,0,0},
    {"slotsmgrttagone",slotsmgrttagoneCommand,5,"w",0,NULL,0,0,0,0,0},
    {"slotshashkey",slotshashkeyCommand,-1,"rF",0,NULL,0,0,0,0,0},
    {"slotscheck",slotscheckCommand,0,"r",0,NULL,0,0,0,0,0},
    {"slotsrestore",slotsrestoreCommand,-4,"wm",0,NULL,0,0,0,0,0},
    {"slotsmgrtslot-async",slotsmgrtSlotAsyncCommand,8,"ws",0,NULL,0,0,0,0,0},
    {"slotsmgrttagslot-async",slotsmgrtTagSlotAsyncCommand,8,"ws",0,NULL,0,0,0,0,0},
    {"slotsmgrtone-async",slotsmgrtOneAsyncCommand,-7,"ws",0,NULL,0,0,0,0,0},
    {"slotsmgrttagone-async",slotsmgrtTagOneAsyncCommand,-7,"ws",0,NULL,0,0,0,0,0},
    {"slotsmgrtone-async-dump",slotsmgrtOneAsyncDumpCommand,-4,"rm",0,NULL,0,0,0,0,0},
    {"slotsmgrttagone-async-dump",slotsmgrtTagOneAsyncDumpCommand,-4,"rm",0,NULL,0,0,0,0,0},
    {"slotsmgrt-async-fence",slotsmgrtAsyncFenceCommand,0,"rs",0,NULL,0,0,0,0,0},
    {"slotsmgrt-async-cancel",slotsmgrtAsyncCancelCommand,0,"F",0,NULL,0,0,0,0,0},
    {"slotsmgrt-async-status",slotsmgrtAsyncStatusCommand,0,"F",0,NULL,0,0,0,0,0},
    {"slotsmgrt-exec-wrapper",slotsmgrtExecWrapperCommand,-3,"wm",0,NULL,0,0,0,0,0},
    {"slotsrestore-async",slotsrestoreAsyncCommand,-2,"wm",0,NULL,0,0,0,0,0},
    {"slotsrestore-async-auth",slotsrestoreAsyncAuthCommand,2,"sltF",0,NULL,0,0,0,0,0},
    {"slotsrestore-async-select",slotsrestoreAsyncSelectCommand,2,"lF",0,NULL,0,0,0,0,0},
    {"slotsrestore-async-ack",slotsrestoreAsyncAckCommand,3,"w",0,NULL,0,0,0,0,0},

In addition, codis-server and Pika differ in command syntax. For example, viewing the details of slot 1 on a node requires different commands on codis-server and on Pika. This means we had to add support for Pika's syntax to the command dispatch and management functions at the codis-fe layer.

To address this problem, we modified part of the source code logic at the codis-dashboard layer to support Pika's commands for master-slave synchronization, master-slave promotion, and related operations, so that these actions can be performed from codis-fe.

Problem 2: Data migration did not actually complete

After completing the work above, we started migrating kv data to Pika. Then the problem appeared: although the codis-fe interface showed that the data had been migrated, the data to be migrated had not actually reached the target cluster, and codis-fe displayed no obvious error message.

Why does this problem occur?

We dug further into Pika's slot-related source code:

// Pika 3.4.0: the async slot-migration command is effectively a stub.
// It does not move any data; it only replies with the two integers
// (moved = 0, remained = 0) that the caller expects.
void SlotsMgrtSlotAsyncCmd::Do(std::shared_ptr<Partition> partition) {
  int64_t moved = 0;
  int64_t remained = 0;
  res_.AppendArrayLen(2);
  res_.AppendInteger(moved);
  res_.AppendInteger(remained);
}

We found that in normal operation, the migration command that codis-dashboard sends to Pika always returns success. codis-dashboard therefore receives a success signal immediately and marks the migration as complete, while in reality no data has been migrated at all.
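The simplified Go sketch below (not the actual codis-dashboard code; the command name, timeout, and addresses are placeholders) illustrates why such a stub ends the migration loop immediately: the driver keeps issuing a slotsmgrt*-style command, reads back the pair (moved, remained), and stops as soon as the remaining count is zero, which the stub reports on the very first call.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/redis/go-redis/v9"
)

// migrateSlot drives one slot migration, assuming the reply is an array of
// two integers: [moved, remained].
func migrateSlot(ctx context.Context, src *redis.Client, dstHost, dstPort string, slot int) error {
	for {
		raw, err := src.Do(ctx, "slotsmgrttagslot", dstHost, dstPort, 1000, slot).Result()
		if err != nil {
			return err
		}
		reply, ok := raw.([]interface{})
		if !ok || len(reply) != 2 {
			return fmt.Errorf("unexpected reply: %v", raw)
		}
		moved, remained := reply[0].(int64), reply[1].(int64)
		log.Printf("slot %d: moved=%d remained=%d", slot, moved, remained)
		if remained == 0 {
			return nil // a (0, 0) stub reply ends the loop on the first iteration
		}
	}
}

func main() {
	ctx := context.Background()
	src := redis.NewClient(&redis.Options{Addr: "127.0.0.1:9221"})
	defer src.Close()
	if err := migrateSlot(ctx, src, "127.0.0.1", "9222", 1); err != nil {
		log.Fatal(err)
	}
}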

In response, we consulted the official Pika documentation on scaling Pika clusters together with Codis.

According to the official documentation, this migration scheme is lossy and may lose data, so we had to redesign and adjust the migration scheme for our own situation.

1. Designing and developing the Pika migration tool

First, following the principles of Codis data scaling and referring to the architecture of codis-proxy, we designed and developed a Pika data migration tool in Go to meet the following functional requirements (an illustrative sketch of the routing state the tool keeps follows this list):

  • Disguise the Pika migration tool as a Pika instance so that it can be connected to Codis and serve requests.
  • Use the Pika migration tool as a traffic-forwarding layer, similar to codis-proxy, that forwards requests for a given slot to the designated Pika instance, guaranteeing business availability during the migration.
  • Make the Pika migration tool aware of master-slave synchronization during the migration; once synchronization is complete it can automatically disconnect the slave node and write new data into the new cluster, ensuring data consistency while traffic is being split.
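Below is an illustrative Go sketch of the per-slot routing state described above; it is not the real tool, and the group addresses are placeholders. The read/write switching that uses the syncDone flag is sketched later, under Step 3.

package main

import "fmt"

// slotRoute records, for one slot, where traffic currently lives (oldGroup)
// and where it is being migrated to (newGroup). syncDone flips to true once
// master-slave replication has caught up and the slave in the new group has
// been detached.
type slotRoute struct {
	oldGroup string
	newGroup string // empty if the slot is not being migrated
	syncDone bool
}

func main() {
	routes := make(map[int]slotRoute)
	for slot := 801; slot <= 1023; slot++ {
		if slot <= 900 {
			// Slots 801-900 stay on Group3.
			routes[slot] = slotRoute{oldGroup: "group3-pika:9221"}
		} else {
			// Slots 901-1023 are being migrated from Group3 to Group4.
			routes[slot] = slotRoute{oldGroup: "group3-pika:9221", newGroup: "group4-pika:9221"}
		}
	}
	fmt.Printf("slot 850 -> %+v\nslot 950 -> %+v\n", routes[850], routes[950])
}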

2. Using the Pika migration tool for live data migration

After completing the design and development of the Pika migration tool according to the above requirements, we can use this tool to perform live migration of data.

The migration process is as follows:

Step 1: The original state of the cluster

As the figure below shows, slots 901-1023, part of the 801-1023 range currently served by Group3, need to be migrated to a new group, Group4, which will serve them as a new instance.

Step 2: Connect the Pika migration tool to Codis to provide services

Before connecting the Pika migration tool to Codis, we make the instance holding slots 901-1023 in Group3 the master of the corresponding instance in Group4 and start master-slave data synchronization: at this point, slots 901-1023 in Group3 are the master and slots 901-1023 in Group4 are the slave. Once this is done, the Pika migration tool can be connected to Codis. We first migrate ownership of slots 801-1023 to the Pika migration tool. The tool then routes reads and writes for slots 801-900 to Group3, while reads and writes for slots 901-1023 are directed to both Group4 and Group3. Then we move on to the next step.
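The following Go sketch illustrates the master-slave setup described above. It uses the go-redis client purely for illustration, the addresses are placeholders, and in our environment these operations were driven through the patched codis-dashboard rather than a standalone program.

package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// The Pika instance in Group4 that will hold slots 901-1023.
	group4 := redis.NewClient(&redis.Options{Addr: "group4-pika:9221"})
	defer group4.Close()

	// Make it replicate from the Group3 instance that currently owns those slots
	// (Pika accepts the same SLAVEOF syntax as Redis; 9221 is Pika's default port).
	if err := group4.SlaveOf(ctx, "group3-pika", "9221").Err(); err != nil {
		log.Fatalf("slaveof failed: %v", err)
	}

	// In Step 3, once replication has caught up, the link is broken so that new
	// writes can go directly to Group4.
	if err := group4.SlaveOf(ctx, "NO", "ONE").Err(); err != nil {
		log.Fatalf("slaveof no one failed: %v", err)
	}
}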

Step 3: Synchronize data via master-slave replication and dynamically switch master and slave

At this point, the Pika migration tool has been fully connected and forwards requests for slots 801-1023 to the backend. Note that when handling write traffic, the tool checks whether master-slave synchronization has completed. If it has, the tool directly disconnects the slave role of the Pika instance in Group4 and writes new data to Group4; otherwise it keeps routing writes to Group3. For read traffic, the tool first tries to fetch the data from Group4 and returns it if found; otherwise it falls back to Group3. If a slot in the 901-1023 range receives no write traffic, the tool cannot determine on its own whether that slot's master-slave synchronization has completed and whether to break the replication link; in that case we can send a command for that slot to the Pika migration tool to perform the switch explicitly. Once master-slave synchronization has completed and the replication links have been broken for all slots in Group4, we move on to the next step.
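A simplified Go sketch of how the tool might handle a request for a migrating slot (901-1023) during this step; it is not the real implementation, and the addresses and choice of client library are illustrative only.

package main

import (
	"context"
	"errors"

	"github.com/redis/go-redis/v9"
)

type migratingSlot struct {
	group3   *redis.Client // old home of the slot (current master)
	group4   *redis.Client // new home of the slot (slave until syncDone)
	syncDone bool          // true once replication caught up and the Group4 slave was detached
}

// write sends new data to Group4 only after its copy is known to be complete;
// before that, writes still land on Group3 and reach Group4 via replication.
func (s *migratingSlot) write(ctx context.Context, key, val string) error {
	if s.syncDone {
		return s.group4.Set(ctx, key, val, 0).Err()
	}
	return s.group3.Set(ctx, key, val, 0).Err()
}

// read tries the new group first and falls back to the old group on a miss.
func (s *migratingSlot) read(ctx context.Context, key string) (string, error) {
	v, err := s.group4.Get(ctx, key).Result()
	if err == nil {
		return v, nil
	}
	if !errors.Is(err, redis.Nil) {
		return "", err
	}
	return s.group3.Get(ctx, key).Result()
}

func main() {
	ctx := context.Background()
	s := &migratingSlot{
		group3: redis.NewClient(&redis.Options{Addr: "group3-pika:9221"}),
		group4: redis.NewClient(&redis.Options{Addr: "group4-pika:9221"}),
	}
	_ = s.write(ctx, "user:1001", "v1")
	_, _ = s.read(ctx, "user:1001")
}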

The following figure shows the working logic of the Pika migration tool:

Step 4: Move the migrated slots into their new groups

After Step 3 is complete, the slot ownership held by the Pika migration tool is handed back: slots 801-900 are migrated back to Group3, and slots 901-1023 are migrated to Group4. Once slots 901-1023 are fully owned by Group4, the redundant old data left in Group3 can be deleted.

So far, we have completed the expansion of the kv cluster through the Pika migration tool.

Note that most of the Pika migration tool's functionality is similar to codis-proxy; it only needs to convert the corresponding routing rules and add support for Pika's syntax. This design works because every operation generated during a codis-proxy migration is an atomic command, so the tool can intercept data destined for the target end at its own layer and dynamically write it into the correct cluster.

Measuring the effect of the solution

After this series of operations, we successfully replaced the original codis-server with Pika. So did we achieve the goal we set out with, balancing cost and performance?

First, in terms of performance: according to feedback from the online business side, the overall p99 latency of the business service is currently 250 milliseconds (covering multiple operations on Codis and Pika), which meets the performance requirements of the production environment.

As for cost: since the data structures of the stored keys are similar, the physical space they occupy is basically the same. Converting the data held in Pika into the equivalent codis-server storage would require roughly 480G of memory. Based on our operations experience, storing 480G of data at about 10G per node (with a 15G per-node ceiling) requires 48 nodes, which means 6 machines with 256G of memory each (3 masters and 3 slaves) would be needed to provide the service.
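As a back-of-the-envelope check of that node count, using the figures above purely as assumptions (about 480G of data, roughly 10G stored per node with a 15G per-node ceiling, spread over 6 machines of 256G each):

package main

import "fmt"

func main() {
	const (
		totalDataGB   = 480.0 // estimated memory footprint if stored in codis-server
		dataPerNodeGB = 10.0  // typical data held per codis-server node
		maxPerNodeGB  = 15.0  // per-node ceiling from operations experience
		machines      = 6     // 3 master machines + 3 slave machines
	)
	nodes := int(totalDataGB / dataPerNodeGB) // 48 nodes
	worstCasePerMachine := float64(nodes) / machines * maxPerNodeGB
	fmt.Printf("codis-server nodes needed: %d\n", nodes)
	fmt.Printf("worst-case memory per machine: %.0fG (vs 256G available)\n", worstCasePerMachine)
}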

In this way, we can conclude that the cost of using Pika to store the same capacity of data is only 5-10% of that of Codis!

Some candid selection advice

We also compared the performance of a single Pika instance with that of a single Redis instance.

With the benchmark command redis-benchmark -r 1000000000 -n 1000 -c 50, the performance is as follows:

With the benchmark command redis-benchmark -r 1000000000 -n 1000 -c 100, the performance is as follows:

From the benchmark results in the test environment, Redis performs better in single-instance benchmarks. For scenarios that use Pika, kv workloads perform best, and among the five data structures the String type is recommended.

Based on the benchmark data and the situation in production, we summarize the advantages and disadvantages of the two technology stacks, Codis + codis-server and Codis + Pika:

Given the comparison above, our selection suggestions are as follows:

Summary

The above is our practice of using Pika to replace codis-server and store, read, and write massive kv data at low cost. The "Big Data Cost Reduction and Efficiency Improvement" column will continue to focus on the balance between performance and cost. We hope our hands-on experience helps big data practitioners find the optimal path to reducing big data costs and improving efficiency more quickly.

