Author: Bao Li, Senior Engineer at Aurora
Abstract
The Aurora push backend tag/alias system stores more than 10 billion records, with peak QPS exceeding 500,000. As the business grows, both storage volume and traffic keep increasing, and bottlenecks in the old system have gradually surfaced. Over the past one to two years we have therefore done a great deal of continuous optimization work, with very good results. This article summarizes that work.
Background
The old system mainly stores the mapping between tags/aliases and registration IDs in Key-Value form. Since a registration ID may carry multiple tags and a tag maps to multiple registration IDs, that data is stored as Redis Sets; a registration ID has exactly one alias while an alias maps to multiple registration IDs, so that data uses the String/Set data structures. Because the data volume is large, and with storage cost in mind, Redis runs in single-Master mode, with the data finally landing in Pika (a Redis-like file-based store). The business side keeps the data in Pika and Redis consistent: when Redis is available, reads go to Redis; when Redis is unavailable, reads go to Pika; after Redis recovers, data is restored from Pika to Redis and reads return to Redis. The storage architecture of the old system is shown below.
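As a minimal sketch of this data model and read path, using the Python `redis` client; the key names, hosts, and Pika port are illustrative assumptions, not the production schema:

```python
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

# A tag maps to many registration IDs, so a Set is used.
r.sadd("tag:news", "regid_001", "regid_002")
# A registration ID may carry several tags, also a Set.
r.sadd("tags:regid_001", "news", "sports")

# An alias maps to many registration IDs (Set) ...
r.sadd("alias:alice", "regid_001")
# ... but a registration ID has exactly one alias (String).
r.set("alias_of:regid_001", "alice")

# Fetch all registration IDs under a tag; fall back to Pika (a Redis-like
# store speaking the same protocol) when Redis is unavailable.
try:
    members = r.smembers("tag:news")
except redis.exceptions.ConnectionError:
    pika = redis.Redis(host="127.0.0.1", port=9221)  # Pika's default port
    members = pika.smembers("tag:news")
```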
As the architecture diagram shows, Redis/Pika runs in master-slave mode, except that Redis has only a Master. The configuration management module maintains the master-slave relationships of the Redis/Pika cluster and writes the configuration to ZooKeeper; business modules only read the configuration from ZooKeeper and never modify it. All configuration changes go through the configuration management module: migration, expansion, and shrinking must all be performed there. The advantages and disadvantages of the old system are as follows:
Advantages:
Centralized configuration management; business modules need no separate configuration
Data is read from Redis, ensuring efficient highly concurrent queries
Pika master-slave mode ensures that data is not lost
The configuration management module maintains the mapping between shard slots and instances and routes each Key to the instance owning its slot value (a routing sketch follows this list)
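A minimal sketch of such slot routing; the slot count, hash function, and instance table are illustrative assumptions rather than the production configuration:

```python
import zlib

# Hypothetical slot table as the configuration management module might
# publish it to ZooKeeper: slot ranges mapped to instance addresses.
NUM_SLOTS = 1024
SLOT_TO_INSTANCE = {
    range(0, 512): ("10.0.0.1", 6379),
    range(512, 1024): ("10.0.0.2", 6379),
}

def route(key: str) -> tuple:
    """Compute the slot for a key and return the owning instance."""
    slot = zlib.crc32(key.encode()) % NUM_SLOTS
    for slot_range, instance in SLOT_TO_INSTANCE.items():
        if slot in slot_range:
            return instance
    raise KeyError(f"no instance configured for slot {slot}")

print(route("tag:news"))  # -> ("10.0.0.1", 6379) under this example mapping
```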
Disadvantages:
The data structures stored in Redis and Pika are inconsistent, which increases code complexity
With Redis in single-master mode, when a Redis node is unavailable read requests penetrate to Pika, whose query efficiency cannot be guaranteed, so read latency grows and requests may even time out
After a Redis failure is repaired, data must be resynchronized from Pika, which lengthens the window of system unavailability, and data consistency requires complex reconciliation to guarantee
Migration/expansion/shrinking requires manually operating the configuration management module, which is cumbersome and error-prone
Redis stores the same full data set as Pika, occupying a large amount of memory and driving up resource costs
The availability of the system still has room for improvement; failure recovery time should be shortened as much as possible
The configuration management module is a single point with no high availability; when it is down, the whole cluster loses high availability and the heartbeat status of Redis/Pika cannot be observed
Splitting a large key must be triggered manually; some tags carry so many registration IDs on a single instance that the instance's storage limit is easily exceeded, and a single instance caps the read performance of that key
Analysis of the shortcomings of the old system
Given the shortcomings above, we addressed them along the following lines:
The data structures stored in Redis and Pika are inconsistent, which increases code complexity
Analysis: The inconsistency between the Redis and Pika data structures in the old system stems from early versions of Pika, whose Set operations were inefficient while String operations were efficient. But with Strings, fetching all registration IDs under a tag/alias means traversing every Pika instance, which is very time-consuming. Since the latest version of Pika has optimized its Set data structure and improved Set read/write performance, Redis and Pika should now use the same data structures.
With Redis in single-master mode, when a Redis node is unavailable read requests penetrate to Pika, whose query efficiency cannot be guaranteed, so read latency grows and requests may even time out
Analysis: Single-master Redis is extremely risky. It should be changed to master-slave mode so that a failed master triggers a master-slave switch instead of a restore from Pika, shortening failure recovery time and reducing the chance of data inconsistency.
After a Redis failure is repaired, data must be resynchronized from Pika, which lengthens the window of system unavailability, and data consistency requires complex reconciliation to guarantee
Analysis: Recovery takes so long because Redis runs in single-master mode without persistence. Redis should be switched to master-slave mode with persistence enabled; data then almost never needs to be restored from Pika, except when the master and slave fail at the same time. Once restores from Pika are unnecessary, Redis data stays consistent with Pika data even when a Redis master-slave pair fails.
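A minimal sketch of this change with the Python `redis` client; the host names are placeholders and the exact rollout steps are an assumption:

```python
import redis

# Attach a replica to the master and enable AOF persistence on both nodes,
# so a failed master can be replaced by its replica instead of a slow
# restore from Pika.
master = redis.Redis(host="redis-master", port=6379)
replica = redis.Redis(host="redis-replica", port=6379)

replica.slaveof("redis-master", 6379)     # start replicating from the master

for node in (master, replica):
    node.config_set("appendonly", "yes")  # turn on AOF persistence
    node.config_rewrite()                 # write the setting back to redis.conf
```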
Migration/expansion/shrinking requires manually operating the configuration management module, which is cumbersome and error-prone
Analysis: The configuration management module involves far too much manual intervention, which makes mistakes easy. Manual steps should be minimized; introducing Redis sentinel can replace most of them.
Redis stores the same full data set as Pika, occupying a large amount of memory and driving up resource costs
Analysis: We analyzed the Redis data's volume, access volume, and access source along several dimensions (see the figure below). The "external request volume (estimated)" column reflects the per-unit-time access volume of each kind of Key.
Redis stores two kinds of data: tag/alias → registration ID mappings and registration ID → tag/alias mappings. The analysis shows that tag/alias → registration ID data takes about 1/3 of the storage space but receives roughly 80% of the accesses, while registration ID → tag/alias data takes about 2/3 of the storage but only about 20% of the accesses. In the figure, the red numbers are requests served by Pika and the black ones by Redis; only 3.7% of the data could usefully be moved to Redis, so the red-numbered data is essentially never accessed in Redis. We can therefore delete the registration ID → tag/alias data from Redis and send those requests to Pika, saving about 2/3 of the memory while preserving the read performance of the whole system.
The availability of the system still has room for improvement; failure recovery time should be shortened as much as possible
Analysis: This is mainly because one of the services is not highly available and the overall architecture is quite complex, so data consistency is hard to guarantee and failure recovery takes a long time. All services should be made highly available, and the overall architecture should be simplified at the same time.
The configuration management module is a single point with no high availability; when it is down, the whole cluster loses high availability and the heartbeat status of Redis/Pika cannot be observed
Analysis: Manual configuration management is also very risky: Pika master-slave relationships are specified by hand in a configuration file, and a mistake scrambles the topology and produces dirty data. Redis sentinel mode should be adopted so that sentinels manage Redis/Pika; the configuration management module then no longer manages every Redis/Pika node directly but works through the sentinels. When a master-slave switch or node failure occurs, the configuration management module is notified and automatically updates the configuration in ZooKeeper, so migration/expansion/shrinking needs essentially no manual intervention.
Splitting a large key must be triggered manually; some tags carry so many registration IDs on a single instance that the instance's storage limit is easily exceeded, and a single instance caps the read performance of that key
Analysis: This manual operation should be made automatic, triggering and completing the migration on its own to reduce manual intervention and save labor costs (a splitting sketch follows below).
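One way such automatic splitting could work; the threshold, shard count, and sub-key naming are assumptions for illustration, not the production scheme:

```python
import zlib
import redis

SPLIT_THRESHOLD = 1_000_000  # assumed per-key member limit
SHARDS = 16                  # assumed number of sub-keys after a split

def maybe_split_large_set(r: redis.Redis, key: str) -> None:
    """If a Set grows past the threshold, scatter its members into sub-keys
    that can then be placed on different instances."""
    if r.scard(key) < SPLIT_THRESHOLD:
        return
    cursor = 0
    while True:
        cursor, members = r.sscan(key, cursor, count=1000)
        for member in members:
            shard = zlib.crc32(member) % SHARDS
            r.sadd(f"{key}:{shard}", member)
        if cursor == 0:
            break
    r.delete(key)  # drop the original key once every member is copied
```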
Redis Sentinel Mode
Redis sentinel provides high availability for Redis/Pika: it can withstand certain kinds of failure without human intervention, and it supports monitoring, notification, automatic failover, configuration management, and other functions:
Monitoring: the sentinels constantly check whether the master and slave instances are working as expected
Notification: the sentinels can notify applications about problem instances via Redis Pub/Sub
Automatic failover: if the master instance fails, the sentinels initiate a failover, promote one of the slaves to master, reconfigure the remaining slaves to replicate from the new master, and notify applications to use the new master
Configuration management: applications are notified when a new slave instance is created or the master instance becomes unavailable
The sentinels are also distributed by design: they are meant to run as multiple cooperating processes. A failure is acted on only when multiple sentinels agree that a given master is no longer available, which reduces the chance of false positives; and the sentinels keep working even if some sentinel processes fail, so the monitoring itself tolerates failures.
Redis sentinel + master-slave mode promptly reports instance health when Redis/Pika fails and performs an automatic master-slave switch when necessary, notifying subscribed applications through Redis's pub/sub mechanism. The application thus perceives the switch and can move its connections to a healthy instance within a short time, keeping the service running.
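A minimal sketch of how an application might discover the master through the sentinels and listen for failover events, using the Python `redis` client; the master name "mymaster" and the host/port values are placeholders:

```python
import redis
from redis.sentinel import Sentinel

# Discover the current master through the sentinels instead of a fixed address.
sentinel = Sentinel([("sentinel-1", 26379), ("sentinel-2", 26379)],
                    socket_timeout=0.5)
master = sentinel.master_for("mymaster", socket_timeout=0.5)
master.sadd("tag:news", "regid_001")

# Subscribe to failover events so the application hears about a master-slave
# switch as soon as the sentinels perform it.
sub = redis.Redis(host="sentinel-1", port=26379).pubsub()
sub.subscribe("+switch-master")
for msg in sub.listen():
    if msg["type"] == "message":
        print("master switched:", msg["data"])  # "mymaster <old ip port> <new ip port>"
        break
```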
The main reasons for not adopting Redis Cluster mode are as follows:
The current storage scale is large, and when cluster mode fails, recovery may take a long time
Master-slave replication in cluster mode is asynchronous, so data consistency is not guaranteed during failure recovery
In cluster mode, slave instances cannot serve external queries; they are only backups of the master
Keys cannot be fuzzy-matched globally, i.e., it is impossible to traverse all instances to query keys matching a pattern
Final solution
In summary, to keep the entire storage cluster highly available, shorten failure recovery time, and even achieve zero business impact for part of the traffic when a failure occurs, we adopted the Redis sentinel + Redis/Pika master-slave model. The sentinels guarantee the high availability of the whole storage cluster; Redis master-slave pairs serve tag/alias → registration ID queries; Pika master-slave pairs store the full data and serve some of the registration ID → tag/alias queries. The business side must keep all Redis and Pika data fully synchronized. The architecture of the new solution is as follows:
As the architecture diagram shows, Redis/Pika now runs as multiple master-slave pairs, monitored simultaneously by several different sentinel services; as long as any one master-slave pair is available, the whole cluster is available. The business side computes a slot value from the Key, and each slot is mapped to an instance according to the slot assignment maintained by the configuration management module. If the Redis master-slave pair for a slot is unavailable, the corresponding Pika is queried instead (a read-path sketch follows the lists below), guaranteeing high availability of reads for the whole system. The advantages and disadvantages of the new solution are as follows:
Advantages:
All services and storage components in the system are highly available, so overall availability is very high
Failure recovery is fast. In theory a Redis/Pika master failure affects only write requests, and the failure window is just the sentinels' detection interval; a slave failure affects neither reads nor writes, since reads switch automatically to the master (failure time is effectively zero) and switch back once the slave recovers
Tag and alias storage is isolated, reads and writes are isolated, and different businesses are isolated, so they do not interfere with one another and can be scaled independently per scenario
Redis memory usage drops by about 2/3, to roughly 1/3 of the original storage space, cutting resource costs
The configuration management module is highly available: one of its instances going down does not affect the cluster, and the system stays available as long as one instance of the service is up
Migration/expansion/shrinking is smooth: migration completes automatically with no action from the business side and no manual intervention, and expansion/shrinking only requires synchronizing the data, updating the configuration management module's configuration, and restarting the service
Large key splitting is triggered automatically and is completely transparent to the business side, reducing operation and maintenance costs
Disadvantages:
When both the master and slave of a Redis pair are unavailable, writes to the slots on that pair fail for the whole system. Given how difficult it is to synchronize Redis from Pika in real time, and how unlikely it is for both instances to be down at once, we chose to tolerate this case
Sentinel management adds complexity to the system, and handling the notification sent to business modules during a Redis/Pika master-slave switch is error-prone; this logic has gone through rigorous testing and long-term verification online
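A minimal sketch of the new read path described above, assuming sentinel-managed connections per slot; the service names, master names, and fallback condition are illustrative assumptions:

```python
import redis
from redis.sentinel import Sentinel

sentinels = Sentinel([("sentinel-1", 26379), ("sentinel-2", 26379)])

# One sentinel-managed Redis pair and one Pika pair per slot (names assumed);
# in practice the slot -> pair mapping comes from the configuration module.
redis_conn = sentinels.slave_for("redis-slot-0")   # reads prefer the slave
pika_conn = sentinels.master_for("pika-slot-0")

def get_regids(tag_key: str):
    """Serve reads from Redis; fall back to Pika when the Redis pair is down."""
    try:
        return redis_conn.smembers(tag_key)
    except redis.exceptions.ConnectionError:
        return pika_conn.smembers(tag_key)
```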
Other optimizations
In addition to the architecture changes above, this round of optimization also covers the following:
Through IO multiplexing, each thread now manages multiple instance connections at the same time instead of one connection per thread, raising the service's QPS and lowering peak-time resource usage (load, etc.)
The old system's modules called each other in tangled ways; the coupling between modules was reduced, lowering deployment and operations costs
Modules in the old system communicated through MQ, which was inefficient; this was changed to RPC calls, improving call efficiency and call success rate, reducing data inconsistency, and making problems easier to locate
The customized fork of Redis/Pika is no longer used, so we can upgrade to the latest stable Redis/Pika versions as needed
The query module adds a cache for large-key queries: results are cached, and Redis/Pika is not queried again until the cache expires, improving query efficiency (a minimal cache sketch follows this list)
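A minimal sketch of such result caching; the TTL and in-process dict are assumptions for illustration:

```python
import time

CACHE_TTL = 30   # seconds; assumed expiry for cached large-key results
_cache = {}      # key -> (expires_at, result)

def cached_smembers(conn, key: str):
    """Return a cached result for a hot large key until it expires, so
    repeated queries do not hit Redis/Pika every time."""
    now = time.time()
    hit = _cache.get(key)
    if hit and hit[0] > now:
        return hit[1]
    result = conn.smembers(key)
    _cache[key] = (now + CACHE_TTL, result)
    return result
```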
Outlook
Going forward, the system can continue to be improved along the following lines:
Manage large-key storage more intelligently, automatically migrating large keys as they are written so that storage stays balanced
Design a reasonable expiration mechanism for Redis data to shrink Redis storage and lower storage costs
Add a set operation service that performs intersection/union and similar operations across Redis/Pika instances, plus a caching mechanism, to improve upstream access efficiency and push message delivery efficiency
Summary
This optimization kept the original storage components and, based on the characteristics of the services and the data, reorganized how services call each other and how data is laid out. Redis serves as a cache that stores only the heavily accessed data, reducing resource usage, while Pika persists the full data set and carries the lightly accessed requests, allocating traffic according to the strengths and weaknesses of each storage component. At the same time, every service was designed to be highly available, improving system availability and stability. Finally, the added caching raised query QPS during peak periods and guaranteed the system's peak-time response speed without expansion.