
This article draws on the article "How does WeChat with 1.28 billion monthly active users prevent crashes?" and the paper "Overload Control for Scaling WeChat Microservices", with numerous changes, optimizations, and revisions.

1. Introduction

WeChat is a national-scale instant messaging (IM) application. Its monthly active users have long exceeded 1 billion, and chat volume spikes around Chinese New Year and other festivals. Such a service should be easy to overload, yet the WeChat backend has remained remarkably stable. How did they do it?

Based on the paper "Overload Control for Scaling WeChat Microservices" published by the WeChat team (see the attachment at the end of the article for the PDF), this article shares WeChat's backend overload control and protection strategy for its large-scale microservice architecture, along with some unique architectural design practices shaped by the characteristics of IM services. Much of it is instructive and well worth reading.

(This article has been published simultaneously at: http://www.52im.net/thread-3930-1-1.html )

2. Concurrency pressure faced by WeChat

Before the paper "Overload Control for Scaling WeChat Microservices" was published, the WeChat backend already comprised more than 3,000 services (covering instant chat, social relationships, mobile payment, third-party authorization, and so on) running on more than 20,000 machines (and with WeChat's continued popularity, these numbers keep growing).

The entry services that receive front-end requests need to handle 1 billion to 10 billion requests per day, and each such request fans out to many more internal services. Overall, the WeChat backend needs to process hundreds of millions of requests per second.

As WeChat keeps evolving, these service subsystems are updated and iterated rapidly. Taking March to May 2018 as an example, in just those two months WeChat's service subsystems underwent an average of nearly 1,000 changes per day; the operational pressure is easy to imagine.

In addition, WeChat's daily request volume is distributed very unevenly: peak request volume reaches about three times the daily average. On special days (such as Chinese New Year), peak traffic can soar to 10 times the usual level, and a viral event in Moments can also cause a sudden traffic surge. Clearly, the concurrency pressure on the WeChat backend is enormous.

Moreover, the environment in which these backend services run is constantly changing: hardware failures, code bugs, and system changes all cause the capacity a service can sustain to vary dynamically.

3. WeChat's back-end service architecture

The WeChat backend also adopts a microservice architecture. As I understand it, a microservice is an independent service built on a unified RPC framework, and services call one another to implement the various features. This is the basic architecture of modern services; after all, nobody wants a crash in Moments to also make chat unusable, and this isolation is a typical benefit WeChat gains.

The microservice architecture of WeChat backend is generally divided into three layers:

As shown in the figure above, the three layers of services are:

1) "Entry Springboard" service (front-end service that receives external requests);
2) "shared springboard" service (middle layer coordination service);
3) "Basic service" (a service that no longer makes requests to other services, ie acts as a receiver of requests).

Most services in the WeChat backend are "shared springboard" services. "Entry springboard" services include login, sending chat messages, payment, and so on. "Basic services" are the easiest to understand day to day: data-interface services such as account data, personal information, and friend/contact information.

Given the request volume at the entry layer (between one billion and ten billion per day), and since each entry request triggers further requests to "shared springboard" and "basic" services, it is evident that core services must process hundreds of millions of requests per second.

4. What is overload protection

1) What is service overload?

Service overload means that the volume of requests to a service exceeds the maximum it can bear, driving server load too high and increasing response latency.

On the user side this shows up as pages failing to load or loading slowly, which prompts users to retry; the service then keeps burning capacity on requests that have already timed out and become invalid, so the effective throughput of valid requests drops to zero, and in the worst case the whole system avalanches.

2) Why does service overload occur?

Internet traffic is inherently bursty: spikes, snap purchases, major emergencies, festivals, and even malicious attacks can all subject services to several times their usual pressure; when the load exceeds what the service can bear, that is service overload.

3) The benefits of overload protection

Overload protection mainly serves to improve user experience and guarantee service quality: when burst traffic arrives, the system can still deliver part of its service capability instead of going down entirely.

A paralyzed system means lost users, damaged reputation, quarreling couples, and can even endanger lives (imagine Tencent Docs crashing while a shared document is being used to coordinate disaster relief).

Facing high-concurrency challenges of this magnitude, the WeChat team's approach is refined service overload control. Let's keep learning.

5. Overload control technical challenges faced by WeChat

Overload control is critical for large-scale online applications that require 24×7 service availability under unpredictable load surges.

Traditional overload control mechanisms are designed for systems with a small number of service components, relatively narrow "front doors", and common dependencies.

A modern, always-online IM application such as WeChat has grown far more complex in architecture and dependencies, well beyond the design goals of traditional overload control.

These technical pain points include:

1) Since service requests sent to the WeChat backend have no single entry point, the traditional approach of centralized load monitoring at a global entry point (gateway) does not apply;
2) The service call graph of a particular request may depend on request-specific data and service parameters, even for requests of the same type (so when a particular service becomes overloaded, it is difficult to determine which request types should be throttled to mitigate the situation);
3) Excessive request aborts waste computing resources and hurt user experience through high latency;
4) Since service call chains are extremely complex and constantly evolving, the maintenance cost and system overhead of effective cross-service coordination are prohibitive.

Since a service may make multiple requests to a service it depends on, and may also call several different backend services, overload control deserves special attention. The paper coins the term "subsequent overload" for situations in which multiple overloaded services are invoked, or a single overloaded service is invoked multiple times.

"Subsequent overload" presents challenges for effective overload control. Randomly performing load shedding when the service is overloaded can maintain the system at saturated throughput, but subsequent overloads may greatly reduce system throughput beyond expectations...

That is, overload becomes more complicated in a large-scale microservice scenario. A monolithic service needs only one request per user action, but with microservices one action may fan out to many services, and if any one of them fails due to overload, all the other requests for that action are wasted. As shown below.

For example, a transfer requires querying the card numbers of both parties. If querying A succeeds but querying B fails, the card-number lookup as a whole fails. If each query succeeds only 50% of the time, the success rate of querying both card numbers is only 50% × 50% = 25%. The more services a single action has to call, the lower its success rate.
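To make the arithmetic concrete, here is a tiny illustrative calculation (not from the paper; `action_success_rate` is just a throwaway name): with independent per-call success probability p and n fanned-out calls, the action succeeds with probability p^n.

```python
# Compound success rate when one user action fans out to n sub-requests,
# each succeeding independently with probability p (illustrative only).
def action_success_rate(p: float, n: int) -> float:
    return p ** n

# With a 50% per-query success rate, querying both card numbers
# succeeds only 0.5 * 0.5 = 25% of the time.
print(action_success_rate(0.50, 2))   # 0.25
# The deeper/wider the call chain, the worse it gets:
print(action_success_rate(0.95, 10))  # ~0.60 even at 95% per call
```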

6. WeChat overload control mechanism

WeChat's microservice overload control mechanism is called "DAGOR" (because WeChat calls its service relationship model "directed acyclic graph", or DAG for short).

Obviously, such an underlying mechanism must be independent of specific business logic. DAGOR must also be decentralized: otherwise, under traffic as large and unevenly distributed as WeChat's, it could hardly achieve real-time and accurate overload control, nor could it keep pace with the rapid iteration of the microservices themselves (nearly 1,000 changes go live on average every day).

In addition, DAGOR needs to solve another problem: service call chains are very long, and if a low-level service drops a request for overload protection, all the resources already consumed by the upstream services are wasted and user experience suffers badly (imagine a progress bar reaching 99% and then reporting failure). The overload control mechanism therefore needs cooperation between services, and sometimes the entire call chain must be considered.

First let's see how to detect service overload.

7. How does WeChat judge overload

In general, overload can be judged from throughput, latency, CPU usage, packet loss rate, number of pending requests, request processing time, and so on.

WeChat uses the average queuing time of requests as the criterion: the time from when a request arrives until it starts being processed.

Why not use response time? Because response time depends on the service and its downstream calls: with chained microservice calls, response time is uncontrollable and cannot be standardized, making it hard to use as a uniform criterion.

Then why not use CPU load as the criterion? Because high CPU load does not necessarily mean overload: a service that is processing requests promptly while keeping the CPU busy is actually performing well. In practice, high CPU load triggers a monitoring alert, but it does not by itself enter the overload-handling flow.

The default timeout of Tencent's microservice framework is 500ms. Overload is judged by checking whether the average queuing time exceeds 20ms, evaluated once per second or once every 2,000 requests. The 20ms threshold is an empirical value from five years of operating the WeChat backend.

Another advantage of average queuing time is that it is service-independent: it applies to any scenario, has no coupling to business logic, and can be implemented directly in the framework.

When the average queuing time exceeds 20ms, some requests are filtered out using a slowdown factor; when it falls back below 20ms, the pass rate is raised at a certain (slower) rate. The general strategy is "drop fast, rise slow", which prevents large service fluctuations; the whole loop is effectively a negative-feedback circuit.
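A minimal sketch of this negative-feedback loop, assuming a multiplicative drop and an additive rise (the article fixes only the 20ms threshold, the 1s/2,000-request window, and the "drop fast, rise slow" shape; real DAGOR sheds load by priority, as described in section 8, rather than by a random pass rate):

```python
import random

AVG_QUEUE_TIME_THRESHOLD_MS = 20.0  # empirical threshold from the article
DROP_FACTOR = 0.95   # assumption: multiplicative fast drop of the pass rate
RISE_STEP   = 0.01   # assumption: additive slow rise of the pass rate

class OverloadDetector:
    """Tracks average queuing time per window (2,000 requests here; a
    1-second timer would also call end_window) and adjusts a pass rate:
    drop fast when overloaded, rise slowly when healthy."""
    def __init__(self):
        self.pass_rate = 1.0
        self.queue_times_ms = []

    def observe(self, queue_time_ms: float) -> None:
        self.queue_times_ms.append(queue_time_ms)
        if len(self.queue_times_ms) >= 2000:
            self.end_window()

    def end_window(self) -> None:
        if not self.queue_times_ms:
            return
        avg = sum(self.queue_times_ms) / len(self.queue_times_ms)
        self.queue_times_ms.clear()
        if avg > AVG_QUEUE_TIME_THRESHOLD_MS:
            self.pass_rate = max(0.0, self.pass_rate * DROP_FACTOR)  # fast drop
        else:
            self.pass_rate = min(1.0, self.pass_rate + RISE_STEP)    # slow rise

    def admit(self) -> bool:
        return random.random() < self.pass_rate
```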

8. WeChat overload control strategy

Once the WeChat backend detects that the service is overloaded, it needs to filter and control the requests according to a certain overload policy to determine which requests can be processed by the overloaded service and which ones need to be discarded.

As analyzed earlier, randomly discarding requests in a chained-call microservice scenario drives the overall success rate very low. Requests are therefore controlled by priority, and low-priority requests are discarded first.

So along which dimensions should priority be graded?

8.1 Business-based priority control

For WeChat, different business scenarios have different priorities, such as:

1) The login scenario is the most important business (if you can't log in, nothing else works);
2) Payment messages have higher priority than ordinary IM chat messages (users are more sensitive to money);
3) Ordinary chat messages have higher priority than Moments messages (after all, the essence of WeChat is still IM chat).

Therefore, there is a natural business priority in WeChat.

WeChat's approach is to pre-define all business priorities and save them in a Hash Table:

For services that are not defined, the default is the lowest priority.

A request's business priority is determined from its meta information in the entry service of each business. Since the success of a request depends on all the subsequent requests its downstream services make, all those subsequent requests carry the same business priority. When a service is overloaded, it processes higher-priority requests and discards lower-priority ones.
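A minimal sketch of such a priority table and its propagation (names like `BUSINESS_PRIORITY` and `Request` are my own; the article only says priorities are predefined in a hash table, stamped at the entry service, and inherited downstream):

```python
# Hypothetical predefined business-priority table. The "smaller value =
# higher priority" convention is an assumption; the article only says
# the table is predefined and undefined services default to lowest.
BUSINESS_PRIORITY = {
    "login": 1,
    "payment": 2,
    "im_message": 3,
    "moments": 4,
}
LOWEST_PRIORITY = max(BUSINESS_PRIORITY.values()) + 1

def business_priority(action: str) -> int:
    # Services not in the table default to the lowest priority.
    return BUSINESS_PRIORITY.get(action, LOWEST_PRIORITY)

class Request:
    """Entry services stamp the priority once; every downstream
    sub-request inherits it unchanged."""
    def __init__(self, action: str):
        self.action = action
        self.b_priority = business_priority(action)

    def sub_request(self, action: str) -> "Request":
        child = Request(action)
        child.b_priority = self.b_priority  # inherit, don't recompute
        return child
```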

However, only using business priorities to decide whether to discard requests can easily cause system turbulence, such as:

1) A sudden surge of payment requests causes overload, so message requests are discarded;
2) After message requests are discarded, system load drops, so message requests are admitted again;
3) But admitting message requests overloads the service once more, so they are discarded again in the next window.

Request control thus oscillates back and forth and the overall experience is terrible, so WeChat needs finer-grained request control.

PS: WeChat once tried exposing APIs that let service owners modify business priorities themselves. In practice this proved extremely hard to manage across different teams and prone to overload-control mistakes, so it was eventually abandoned.

8.2 User-based priority control

As discussed in the previous section, control based on business priority alone is clearly not sufficient:

1) First, it is not feasible to discard or admit an entire business's requests wholesale based on load, because each business's request volume is huge and doing so would cause large load fluctuations;
2) Moreover, if requests are discarded randomly within a business, the overall success rate under overload becomes very low.

To solve this problem, WeChat introduces user priority.

Within each business priority, WeChat defines 128 user priorities computed from the user ID:

First, user priorities should not all be the same. An ordinary user's priority is computed by hashing the user's unique ID (the hash function changes every hour, so that over a longer period every user gets the chance to enjoy high priority, guaranteeing "fairness"). Like business priority, the user priority stays the same along the entire call chain of a single user's request.
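A minimal sketch of such an hourly-rotating hash (the salting scheme and the `hashlib` choice are my own assumptions; the article only specifies 128 levels per business priority and a hash function that changes every hour):

```python
import hashlib
import time

def user_priority(user_id: str, now: float | None = None) -> int:
    """Map a user ID to one of 128 user-priority levels.

    Salting the hash with the current hour rotates the mapping hourly,
    so no user is stuck at low priority for long (the salting scheme is
    an assumption; the article only says the hash changes every hour).
    """
    hour = int((now if now is not None else time.time()) // 3600)
    digest = hashlib.sha256(f"{hour}:{user_id}".encode()).digest()
    return digest[0] % 128  # 0..127

# Stable within an hour, so the whole call chain sees the same value:
assert user_priority("wxid_demo", now=0) == user_priority("wxid_demo", now=1800)
```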

Here is a question: why not use the session ID to calculate the priority?

In theory, session ID and user ID would work equally well, but a session ID is refreshed whenever the user logs in again, so the user's priority could change at that moment. Under overload, a user could then restore service simply by re-logging in to roll a higher priority.

Users would thus develop the bad habit of logging in again whenever the service misbehaves, which would only aggravate the overload.

With user priority introduced, it forms a two-dimensional control plane together with business priority. Based on its load, each server determines its admission priority (B, U): an incoming request passes if its business priority is higher than B, or if its business priority equals B and its user priority is higher than U; otherwise it is rejected.
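Expressed in code, the (B, U) admission check is just a lexicographic comparison (a minimal sketch; I assume the numeric convention that a smaller value means higher priority, mirroring the table sketch above):

```python
# Admission check against the server's current admission level (B, U).
# Convention (assumed): smaller number = higher priority, so a request
# passes if it is strictly "more important" than the cutoff, i.e.
# b < B, or b == B with u < U.
def admit(b: int, u: int, B: int, U: int) -> bool:
    return (b, u) < (B, U)

# Examples, with the admission level at B=3, U=64:
assert admit(2, 100, 3, 64)      # higher business priority: pass
assert admit(3, 10, 3, 64)       # same business, higher user priority: pass
assert not admit(3, 64, 3, 64)   # at the cutoff: rejected
assert not admit(4, 0, 3, 64)    # lower business priority: rejected
```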

The following figure is this "priority (B, U)" control logic (we will discuss it in detail later):

8.3 Adaptive priority adjustment

In large-scale microservice scenarios, server load changes very frequently, so a server's admission priority must change dynamically. WeChat has dozens of business priorities, each with 128 user priorities, for several thousand priority levels in total.

How to adjust the priority according to the load situation?

The simplest approach is to sweep the levels one by one, checking the load after each adjustment: O(n) time, or O(log n) with binary search. With thousands of levels, it may still take dozens of adjustments to find a suitable admission priority, and since each adjustment needs a fresh window of traffic to evaluate, converging could take dozens of seconds. This is clearly far too inefficient.

WeChat instead proposes a histogram-based method for fast adjustment of the admission priority: under its current admission level, each server maintains the number of requests of each priority seen in the past window (1s or 2,000 requests). When overload is detected, load is shed in the next window: if the sum of admitted requests across all priorities in the previous window is N, the next window should admit about N × a fewer requests. To do so, the admission cutoff is tightened level by level, each step subtracting that level's request count, until the total reduction exceeds the target. Recovery walks in the opposite direction with a smaller coefficient b, again dropping fast and rising slowly. Empirically, a is 5% and b is 1%.
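A sketch of this adjustment loop, under my own assumptions about how the levels are flattened and stored (the article specifies only the histogram idea, the per-window budget N × a, and a = 5%, b = 1%):

```python
# Histogram-based admission adjustment (my own reconstruction; the
# flattened level numbering and data layout are assumptions). Levels
# are flattened (business, user) pairs with smaller index = higher
# priority; hist[i] counts requests of level i seen in the last window
# (1s or 2,000 requests).
A_DROP = 0.05  # a: shed 5% of last window's admitted volume when overloaded
B_RISE = 0.01  # b: restore 1% per window when load is back to normal

def adjust_admission_level(hist: list[int], cutoff: int,
                           overloaded: bool) -> int:
    """Return the new cutoff: requests with level < cutoff are admitted."""
    admitted = sum(hist[:cutoff])  # N: volume admitted in the last window
    if overloaded:
        shed, target = 0, admitted * A_DROP
        # Lower the cutoff, dropping the lowest-priority admitted levels,
        # until the expected shed volume reaches N * a.
        while cutoff > 0 and shed < target:
            cutoff -= 1
            shed += hist[cutoff]
    else:
        target = admitted * B_RISE
        if target == 0:                        # nothing admitted last window:
            return min(cutoff + 1, len(hist))  # creep up one level
        restored = 0
        # Raise the cutoff gently, re-admitting rejected levels until the
        # expected extra volume reaches N * b ("drop fast, rise slow").
        while cutoff < len(hist) and restored < target:
            restored += hist[cutoff]
            cutoff += 1
    return cutoff
```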

To further relieve overloaded machines: when the downstream is overloaded, can the upstream avoid sending the request at all? Otherwise the downstream must still receive the request, unpack it, and then discard it, which wastes bandwidth and adds to the downstream load.

To achieve this, every time a downstream service is called, it returns its current admission priority to the upstream along with the response, and the upstream maintains a local record of each downstream's admission priority. If an outgoing request's priority does not meet the recorded admission threshold of the downstream service, it is discarded directly without being sent, further reducing the downstream's pressure.
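A minimal sketch of this piggyback scheme (the RPC plumbing, `send_rpc`, and the response shape are hypothetical; only the idea of echoing the admission level and rejecting locally comes from the article):

```python
class DownstreamStub:
    """Client-side stub that remembers the downstream's last advertised
    admission level (B, U) and rejects hopeless requests locally."""
    def __init__(self, send_rpc):
        self.send_rpc = send_rpc   # hypothetical transport callable
        self.adm_level = None      # unknown until the first response

    def call(self, payload, b: int, u: int):
        # Local early rejection: don't even send a request that the
        # downstream would discard anyway (saves bandwidth + unpacking).
        if self.adm_level is not None and not (b, u) < self.adm_level:
            raise RuntimeError("rejected locally: downstream overloaded")
        resp = self.send_rpc(payload, b, u)
        # Every response piggybacks the downstream's current admission
        # level; refresh our local record from it.
        self.adm_level = resp["admission_level"]
        return resp["data"]
```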

9. Experimental data

WeChat's service overload control strategy (i.e., DAGOR) has been running in WeChat's production environment for many years, which is the best proof of the design's feasibility.

Production operation alone, however, does not produce the charts an academic paper needs, so the WeChat team also ran a set of simulation experiments.

The chart below highlights the benefits of basing overload control on queuing time rather than response time; the benefits are most pronounced in the case of subsequent overload (right panel).

10. Summary

The entire overload control logic flow of WeChat is shown in the following figure:

Let's interpret the figure above:

1) When a user initiates a request from WeChat, it is routed to the access-layer service, where a unified business and user priority is assigned; all sub-requests sent downstream inherit the same priority;
2) Each service, according to its business logic, calls one or more downstream services; upon receiving a request, a service first decides to accept or discard it according to its own admission priority (which it periodically adjusts based on load);
3) When a service in turn needs to send a request downstream, it checks the locally recorded admission priority of the target service (if the request's priority is below the record, it is discarded locally; if there is no record, or the priority meets the record, the request is sent downstream);
4) The downstream service returns the information the upstream needs, carrying its own current admission priority in the response;
5) On receiving the response, the upstream parses it and updates its locally recorded admission priority for that downstream service.

WeChat's entire overload control strategy has the following three characteristics:

1) Business-independent: it uses request queuing time instead of response time, and the user and business priorities it defines have nothing to do with business logic itself;
2) Efficient and fair: priorities stay consistent along the request chain, and the hash function is changed periodically to re-shuffle user priorities, so overload never keeps penalizing the same fixed set of users;
3) Independent control combined with joint control: admission priority is decided by each service independently, yet services cooperate with their downstreams to optimize overall performance under overload.

11. Final thoughts

The WeChat team's sharing only covers overload control, but I believe service callers must also have other mechanisms to handle request timeouts caused not by downstream overload but by network jitter.

This microservice overload control mechanism of WeChat (i.e., DAGOR), being service-independent, decentralized, efficient, and fair, has been running well in the WeChat backend for many years.

Finally, the WeChat team also shared their valuable experience in designing and operating DAGOR:

1) Overload control in a large-scale microservice architecture must be decentralized and autonomous in each service;
2) Overload control should take into account various feedback mechanisms (such as DAGOR's cooperative admission control) rather than relying solely on open-loop heuristics;
3) The overload control design should be understood by analyzing the actual workload.

12. References

[1] Overload Control for Scaling WeChat Microservices
[2] Luo Shen's interpretation of "Overload Control for Scaling WeChat Microservices"
[3] With 20,000 servers and hundreds of millions of requests per second, how does WeChat stay in control?
[4] DAGOR: WeChat's microservice overload control system
[5] How does WeChat with 1.28 billion monthly active users prevent crashes?
[6] Technical challenges and practice behind 100-billion-level visits to WeChat Moments
[7] QQ at 18: decrypting the backend service interface isolation technology of QQ with 800 million monthly actives
[8] Design practice of time-series-based hot/cold data tiering in the WeChat backend
[9] The way of architecture: 3 programmers support an average of 1 billion Moments posts published per day [with video]
[10] Rapid fission: witnessing the evolution of WeChat's powerful backend architecture from 0 to 1 (Part 1)
[11] A summary note on the technical architecture of the WeChat backend

13. The original text of the paper

Please download the paper PDF from the attachment:
(Since the attachment cannot be uploaded here, please download it from the "References" attachment at the end of the original article: http://www.52im.net/thread-3930-1-1.html)

An overview of the full content of the paper PDF:


