
Background

The content "like" feature is a very common business scenario in the Dewu community. The function itself is not complicated, but it is involved in many business scenarios, carries high QPS, and, given the size of the community's user base, the overall volume of like data is very large. The two core scenarios with the highest response-performance requirements are "whether a user has liked a piece of content" and "the number of likes on a piece of content".

In the Dewu community, every content-consumption scene goes through these two like scenarios, so the overall QPS of the like business is very high. When users scroll through the various feed streams, every downward swipe requires judging whether the logged-in user has liked dozens of pieces of content. As a foundational business, the high-performance response of the content like business has a great impact on the consumption experience of the upstream content scenarios.

This article describes how the like business of the Dewu community achieves high-performance response, and the historical process of optimization and exploration toward high performance, stability, and low cost in its use of caching, in the hope of bringing some benefit to readers.

Evolutionary exploration

v1.0 version

Functional Requirements

Various feed streams and content detail pages in the community need to display, for each piece of content, "whether the logged-in user has liked the content", "the total number of likes on the content", and "the list of users who most recently liked the content".

Implementation plan

The high performance of the like business is built on a Redis + MySQL architecture: MySQL provides data storage and query support, while Redis provides the high-performance business response. In version 1.0 the service was still a monolithic PHP service. In this technical solution, all the users who liked a post were queried, placed in a PHP array, serialized into a JSON string, and stored in Redis as a Key/Value pair. When a user browsed the content, the cached data was fetched, the JSON was deserialized back into a PHP array, and the in_array and count functions were used to determine whether the user had liked the content and how many likes it had. For cache maintenance, the Redis key was simply deleted every time a user added or cancelled a like.

The cache structure is as follows:
cId => '[uid1,uid2,uid3...]'
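
For illustration, the read path looks roughly like the following minimal sketch (the original service was PHP, using json_decode, in_array, and count; this sketch uses Go with the go-redis client, and the key name like:<cid> is hypothetical):

package like

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// LikedAndCount fetches the whole serialized user list for one piece of
// content, deserializes it, then scans it to answer "has uid liked it"
// and "how many likes in total".
func LikedAndCount(ctx context.Context, rdb *redis.Client, cid, uid int64) (bool, int, error) {
	raw, err := rdb.Get(ctx, fmt.Sprintf("like:%d", cid)).Result()
	if err != nil {
		// redis.Nil means a cache miss: the caller rebuilds the full
		// list from MySQL and writes it back, as described above.
		return false, 0, err
	}
	var uids []int64
	if err := json.Unmarshal([]byte(raw), &uids); err != nil {
		return false, 0, err
	}
	liked := false
	for _, u := range uids { // the in_array equivalent: O(n) per check
		if u == uid {
			liked = true
			break
		}
	}
	return liked, len(uids), nil
}

Note how every query transfers and deserializes the entire user list, and the membership check is O(n); these are exactly the costs called out below.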

The flow chart is as follows:

[image: flow chart]

Main problems

There are many points to be optimized in this version of the scheme.

First, when the cache is built, all the like records under a post have to be queried. The data volume is large and easily produces slow SQL, which can put a lot of pressure on the DB and on network bandwidth.

Second, the cache uses a Key/Value structure. Every use requires fetching the whole value from Redis and deserializing it into a PHP array, and the in_array() and count() calls carry a relatively large overhead, especially on hot posts. This wastes server CPU and memory to a certain extent and also generates considerable network bandwidth overhead on Redis.

Third, for cache maintenance, the cache is deleted outright on every new like. A burst of likes on a hot post can therefore cause cache breakdown, resulting in a large number of queries falling through to the DB.

v2.0 version

We all know that hot events ferment quickly in communities, and the Dewu community is no exception. During one hot event, a batch of hot content appeared in the Dewu community almost instantly, and a large number of users poured in to browse the related news and like it. Every new like cleared the cache! With huge numbers of users browsing the hot news while a stream of new likes kept clearing the cache, the risk of cache breakdown was very high and would send a flood of query requests straight to the DB layer. After evaluating the risk, the R&D team carried out a cache transformation overnight.

Functional Requirements

1. Solve the risk of cache breakdown of hot content.
2. Optimize the server resource consumption caused by serialization and deserialization of cached data at the code level.

Implementation plan

In this transformation, besides addressing the risk of cache breakdown, we also adopted more efficient implementations for some shortcomings of the previous cache itself. For the cache data structure, the previous Key/Value structure was abandoned in favor of a Redis set. The set's properties guarantee that user IDs in it are never duplicated, so the total number of likes on a post can be maintained accurately, and whether a user has liked the content can be judged efficiently by checking membership in the set. This removes the need to fetch all the data from Redis and parse JSON in code on every query: Redis sets answer "has this user liked it" and "how many likes" directly through the SISMEMBER and SCARD commands, improving both the response speed of the module and the service load. For cache maintenance, every new like proactively adds the user ID to the set and refreshes the cache expiration time. Every query also checks the remaining TTL of the cache; if less than one-third remains, the expiration time is extended again, which avoids cache breakdown when a hot post receives a burst of new likes.

The cache structure is as follows:
cid => [uid1,uid2,uid3...]
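
A minimal sketch of this read path under stated assumptions (Go with the go-redis client for illustration, a hypothetical key scheme like:set:<cid>, and a 24-hour base TTL):

package like

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

const cacheTTL = 24 * time.Hour // illustrative base expiration

// IsLiked answers via SISMEMBER, then extends the cache expiration when
// less than one-third of the TTL remains, so hot sets stay alive under load.
func IsLiked(ctx context.Context, rdb *redis.Client, cid, uid int64) (bool, error) {
	key := fmt.Sprintf("like:set:%d", cid)
	liked, err := rdb.SIsMember(ctx, key, uid).Result()
	if err != nil {
		return false, err
	}
	if ttl, err := rdb.TTL(ctx, key).Result(); err == nil && ttl > 0 && ttl < cacheTTL/3 {
		rdb.Expire(ctx, key, cacheTTL)
	}
	return liked, nil
}

// LikeCount answers the total number of likes via SCARD.
func LikeCount(ctx context.Context, rdb *redis.Client, cid int64) (int64, error) {
	return rdb.SCard(ctx, fmt.Sprintf("like:set:%d", cid)).Result()
}

Extending the TTL on read keeps hot sets alive, at the price of an extra TTL command per query, a cost that v4.0 later removes.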

The flow chart is as follows:

[image: flow chart]

Main problems

In this technical solution, all the like records under a post are queried and put into a single set. For a hot post the number of liking users can be very large, and the set becomes a big key. Cleaning up big keys has a considerable impact on the stability of Redis: cache expiration can cause Redis to jitter at any moment, which in turn makes the service jitter. Moreover, querying all liking users on every rebuild easily produces slow SQL and puts extra pressure on network bandwidth.

v3.0 version

Functional Requirements

1. Eliminate the big-key risk of the v2.0 cache.
2. Optimize the slow SQL produced when rebuilding the cache by querying all users who liked a piece of content.

Implementation plan

In version 3.0 the big key is broken up: the liking users under the same post are split into shards, which are then maintained in the cache. On every cache operation the shard value is first computed from the user ID, so each shard stays small and is faster to maintain and respond. As for the total number of likes, the community service had by then been migrated to a Go service architecture, and we built a separate counting service to maintain the total like count of each piece of content independently, saving the cost of SCARD calls.

The cache structure is as follows:

cid_slice1 => [uid1,uid11,uid111...]
cid_slice2 => [uid2,uid22,uid222...]
cid_slice3 => [uid3,uid33,uid333...]
...
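
A minimal sketch of the sharded lookup, assuming a fixed shard count and the key scheme like:<cid>:slice<n> (both illustrative):

package like

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

const shardCount = 16 // illustrative; chosen so every set stays small

// shardKey routes a user to one shard of the content's like set.
func shardKey(cid, uid int64) string {
	return fmt.Sprintf("like:%d:slice%d", cid, uid%shardCount)
}

// IsLiked touches only the one shard the user ID maps to, so each set
// stays small and cheap to maintain and expire. The total like count is
// no longer derived from SCARD over all shards; the separate counting
// service maintains it.
func IsLiked(ctx context.Context, rdb *redis.Client, cid, uid int64) (bool, error) {
	return rdb.SIsMember(ctx, shardKey(cid, uid), uid).Result()
}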

The flow chart is as follows:

[image: flow chart]

Main problems

Looking at v3.0 purely from the angle of technical implementation, it seems to have reached a workable level for the time being and could support the high-performance response of the community like business for a certain period. Considered from the business perspective and as a whole, however, this design still has many optimization points. For example:

The cache shards still maintain all the liking-user data of every post that is browsed, which consumes a lot of Redis resources and makes the cache harder to maintain. The effective utilization of the cached data is very low: in the recommendation flow scenario, a post the user has browsed is hardly ever browsed by that user again.

The technical solution is designed around a single piece of content. In the various feed stream scenarios, each upstream query actually fans out roughly tenfold inside the like service. This amplification causes a certain amount of resource consumption and waste on the server, Redis, and the DB.

For some old posts with many likes, the cache is rebuilt whenever someone accesses them; the rebuild cost is high but the cache is rarely used afterwards.
The sharded-set design maintains a lot of useless data and also generates a large number of keys, and the keys themselves occupy memory in Redis.

 …

To sum up: higher server load, higher Redis and DB request volumes, and very large Redis resource usage (tens of GB).

So we needed a better solution to address the following:

1. The server load, Redis requests, and DB requests amplified by batch content queries in the feed stream scenario.
2. Storing and using the cache more efficiently, reducing overall cache usage.
3. A higher cache hit rate.
4. Distinguishing hot data from cold data.

The implementation logic in the actual feed scenario: batch-querying the like data of the posts in the feed.

[image: batch query flow]

v4.0 version

Functional Requirements

Combined with the actual business scenarios, most upstream callers judge "liked or not" in batches, and community posts themselves have a certain freshness (hot versus cold). The requirements for the new cache are:

1. Solve the traffic amplification of batch queries in the feed stream scenario.
2. Distinguish hot data from cold data in the cache and reduce invalid storage (hot and cold can be distinguished from the perspective of both the content and the liking user).
3. Keep the cache structure simple and easy to maintain, with a clear business implementation.

Implementation plan

Design ideas:

1. The batch query task fans out because the old cache was designed with content as its dimension; the new solution should be designed with the user as its dimension.

2. In the old scheme, accessing a post's like data rebuilt the cache. Rebuilding the cache for old content is poor value: the users who liked it are not necessarily still active, nor likely to revisit it. The new scheme must distinguish hot data from cold data; cold data is served directly from the DB, with no cache rebuilding or update maintenance.

3. In the old scheme's design for maintaining and extending the cache expiration time, every cache operation also issued a TTL command, directly doubling the QPS. The new scheme should avoid TTL calls while still being able to maintain the cache expiration time.

4. Cache operation and maintenance should be simple, ideally a single Redis command per operation.

Therefore, the Redis data structure chosen for the new scheme had to support judging whether content is liked, whether data is hot or cold, and whether the expiration time needs extending. A plain set cannot satisfy all of this, so we chose the Hash structure: the user ID is the key and the content ID is the field. Since community content IDs increase monotonically, the content ID itself can represent hot versus cold to a certain extent; the cache only maintains a certain number of content IDs over a certain period, and a minCid field is added to mark the hot/cold boundary. To reduce calls to the TTL command, a ttl field is also added, from which the validity of the cache can be judged and its expiration extended. Three birds with one stone!

The cache structure is as follows:

{
  "userId": {
    "ttl": 1653532653,   // timestamp when the cache was created or last refreshed
    "cid1": 1,           // id of a post the user liked recently
    "cid2": 1,           // id of a post the user liked recently
    "cidn": 1,           // id of a post the user liked recently
    "minCid": 3540575    // smallest post id in the cache, marking the hot/cold boundary
  }
}

In the actual business scenario, the process is as follows:

[image: flow chart]

From the flow chart we can see clearly that a batch query request from the upstream feed stream involves no loop logic; in the optimal case there is only one Redis operation, and the business implementation is simple and clear.
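
A minimal sketch of that batch read path, assuming the field names shown in the cache structure above; the key scheme like:user:<uid>, the refresh threshold, and the cache lifetime are illustrative:

package like

import (
	"context"
	"fmt"
	"strconv"
	"time"

	"github.com/redis/go-redis/v9"
)

const refreshAfter = int64(6 * 60 * 60) // illustrative: refresh the ttl field after 6 hours

// BatchIsLiked answers "has uid liked each cid" with a single HMGET.
// Content IDs below minCid are cold data and must be answered by the DB.
func BatchIsLiked(ctx context.Context, rdb *redis.Client, uid int64, cids []int64) (liked map[int64]bool, coldCids []int64, err error) {
	key := fmt.Sprintf("like:user:%d", uid)

	fields := make([]string, 0, len(cids)+2)
	fields = append(fields, "ttl", "minCid")
	for _, cid := range cids {
		fields = append(fields, strconv.FormatInt(cid, 10))
	}

	vals, err := rdb.HMGet(ctx, key, fields...).Result()
	if err != nil {
		return nil, nil, err
	}
	if vals[0] == nil {
		// Cache miss: the caller rebuilds the hash from the DB for this user.
		return nil, cids, nil
	}

	// Lazily extend the hash's expiration using the stored ttl field,
	// instead of issuing a TTL command on every request.
	ts, _ := vals[0].(string)
	if createdAt, _ := strconv.ParseInt(ts, 10, 64); time.Now().Unix()-createdAt > refreshAfter {
		rdb.HSet(ctx, key, "ttl", time.Now().Unix())
		rdb.Expire(ctx, key, 24*time.Hour) // illustrative cache lifetime
	}

	var minCid int64
	if s, ok := vals[1].(string); ok {
		minCid, _ = strconv.ParseInt(s, 10, 64)
	}

	liked = make(map[int64]bool, len(cids))
	for i, cid := range cids {
		if cid < minCid {
			coldCids = append(coldCids, cid) // cold: answered directly by the DB
			continue
		}
		liked[cid] = vals[i+2] != nil // hot: field presence means liked
	}
	return liked, coldCids, nil
}

One HMGET answers the whole batch; the stored ttl field replaces a per-request TTL command, and minCid routes cold content to the DB without any cache rebuilding.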

Optimization Results

Comparing before and after the optimization, the daily peak QPS of Redis queries dropped roughly 20-fold.

[chart: Redis query QPS]

The average interface RT dropped roughly 10-fold after the optimization.

[chart: interface RT]

The daily peak QPS of DB queries dropped roughly 6-fold after the optimization.

[charts: DB query QPS]

The optimization freed about 16 GB of cache storage.

[chart: cache memory usage]

Follow-up

Optimization never ends and technology never stands still; technical solutions will keep evolving as the business evolves.

Summary

This article has analyzed the historical background and technical solutions behind the exploration and evolution of the cache optimization of the Dewu community's like business, along with many of the details. Every round of optimization was driven by the risk points and needs of actual business scenarios. No version's solution is perfect, and v4.0 is not the final answer either; developers still need to keep thinking and exploring better technical solutions.

And as the business continues to develop and iterate, more scenarios and difficulties will emerge; we have always been on the road of optimization and exploration.

Text / Shenzhi
@得物技术 (Dewu Tech) public account

