
This article was shared by Wang Hao, a development engineer on the ByteDance technical team, and has been substantially revised.

1. Introduction

In the mobile Internet era, short video apps are no longer just for watching videos. Leading apps such as Douyin in particular have become a new type of social product alongside traditional IM software.

For the Chinese New Year, the most important festival of the year, red envelopes are an essential festive social element, and Douyin naturally did not miss out. During the 2022 Spring Festival event, Douyin combined videos with Spring Festival red envelopes: users could send blessings to fans and friends by shooting videos and sending red envelopes.

This article shares the technical challenges that the massive Spring Festival red envelope social campaign posed for Douyin, and how the Douyin technical team solved them one by one in practice.


Study and exchange:

  • Introductory article on mobile IM development: "One entry is enough for beginners: developing mobile IM from scratch"
  • Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK

(This article has been published simultaneously at: http://www.52im.net/thread-3945-1-1.html )

2. Series of articles

"Social Software Red Envelope Technology Decrypted (1): Comprehensive Decryption of the QQ Red Envelope Technical Solution (Architecture, Technical Implementation, etc.)"
"Social Software Red Envelope Technology Decrypted (2): The Technical Evolution of WeChat Shake Red Envelopes from 0 to 1"
"Social Software Red Envelope Technology Decrypted (3): The Technical Details Behind the WeChat Shake Red Envelope Rain"
"Social Software Red Envelope Technology Decrypted (4): How the WeChat Red Envelope System Copes with High Concurrency"
"Social Software Red Envelope Technology Decrypted (5): How the WeChat Red Envelope System Achieves High Availability"
"Social Software Red Envelope Technology Decrypted (6): The Evolution of the Storage Layer Architecture of the WeChat Red Envelope System"
"Social Software Red Envelope Technology Decrypted (7): Technical Practice of Alipay Red Envelopes under Massive, Highly Concurrent Load"
"Social Software Red Envelope Technology Decrypted (8): Comprehensive Decryption of the Weibo Red Envelope Technical Solution"
"Social Software Red Envelope Technology Decrypted (9): On the Design, Disaster Recovery, Operations, and Architecture of Mobile QQ Spring Festival Red Envelopes"
"Social Software Red Envelope Technology Decrypted (10): Technical Practice of the Mobile QQ Client for the 2020 Spring Festival Red Envelopes"
"Social Software Red Envelope Technology Decrypted (11): The Most Complete Decryption of the WeChat Red Envelope Random Algorithm (with Demo Code)"
"Social Software Red Envelope Technology Decrypted (12): The Technical Design and Practice Behind Douyin's Spring Festival Red Envelopes" (* this article)

3. A first look at the red envelope gameplay

Douyin's red envelope campaign has two modes of play: B2C and C2C. The following briefly introduces the flow of each, which will make the later technical problems and solutions easier to follow.

3.1 B2C red envelopes

In the B2C mode, users first take part in the Spring Festival red envelope rain in Douyin or Douyin Lite, where they have a chance of winning a red envelope subsidy.

After receiving the subsidy, users can jump straight to the camera page, or go there after shooting a video. On the camera page, after shooting, the user sees a red envelope pendant, which shows the subsidy that has been issued.

After the user selects the subsidy, taps Next, and submits, the video red envelope is sent.

As shown in the picture above, from left to right: "Picture 1" is the Spring Festival red envelope rain, "Picture 2" the red envelope subsidy, "Picture 3" the red envelope pendant, and "Picture 4" the B2C red envelope sending tab.

3.2 C2C red envelopes

In the C2C mode, the user shoots a video, taps the pendant, enters the amount and number of red envelopes, chooses who may claim them, and taps send to bring up the cashier. After paying, the user taps Next to publish the video, which completes sending the C2C red envelope.

As shown in the figure above, from left to right: "Picture 1" is the C2C red envelope sending tab, "Picture 2" the payment screen, and "Picture 3" the pendant as displayed after the red envelope has been paid.

3.3 Receiving red envelopes
The receiving flow is the same for B2C and C2C.

When a user comes across a video carrying a red envelope while swiping through Douyin, a red envelope button appears below the video. Tapping it pops up the red envelope cover.

After the user taps to open the red envelope, it is claimed. Once the claim succeeds, a result pop-up shows the amount received, and the user can jump to the receiving details page, where they can also see how lucky other users were.

As shown in the picture above, from left to right: "Picture 1" is a video with a red envelope, "Picture 2" the red envelope cover, "Picture 3" the receiving result, and "Picture 4" the red envelope details.

Finally, there is a promotional video for C2C video red envelopes that shows the overall flow.

4. Technical challenges

4.1 Designing a general red envelope system

As mentioned in the previous section, this Spring Festival event (the 2022 Spring Festival) needed to support both B2C and C2C red envelopes.

These two types of red envelopes share some business logic but also differ in many ways.

What they share: both include the two operations of sending and receiving. Where they differ: for example, sending a B2C red envelope consumes a subsidy, while sending a C2C red envelope requires the user to pay; after receiving a B2C red envelope, users must withdraw the cash, while C2C red envelope money goes directly into the receiver's change balance.

Therefore, it is necessary to design a general red envelope system to support multiple red envelope types.

In addition, beyond sending and receiving, the red envelope system also has to handle various information queries and advance a number of state machines. How to divide these functions into modules is another point to consider.

4.2 Issuing and processing high-traffic subsidies

As mentioned earlier, the B2C gameplay first issues subsidies. During the Spring Festival event, a huge number of users join each red envelope rain. If all of this traffic went straight to the database, it would consume a large amount of database resources, which are very scarce during the Spring Festival. Reducing this consumption was another problem to consider.

4.3 Choosing the red envelope receiving scheme

In the red envelope business, receiving is a high-frequency operation.

When designing how red envelopes are received, we need to consider whether one red envelope will be claimed by multiple users at the same time. Concurrent claims on the same red envelope can create a hot account problem and become a system performance bottleneck.

There are several solutions to the hot account problem; we needed to choose one that fits the characteristics of video red envelopes.

4.4 Stability and disaster recovery

This Spring Festival event has two business flows, B2C and C2C, and each traffic link depends on many downstream services and basic services.

In an event of this scale, if a black swan event occurs, how to stop losses quickly and limit the overall impact on the system is a problem that must be considered.

4.5 Fund security

During the Spring Festival, B2C issues a large volume of red envelope subsidies. If subsidies are over-issued, or write-off goes wrong so that one subsidy is written off multiple times, the financial losses could be large.

In addition, C2C involves users' money flowing in and out. If a user finds that their balance has decreased after receiving a red envelope, it could also lead to many complaints and financial losses.

Therefore, fund security requires thorough preparation.

4.6 Stress testing the red envelope system

In traditional stress testing, we usually stress a high-traffic interface to find the system's bottleneck.

In the red envelope system, however, users send, receive, and query at the same time, and these interfaces depend on one another. For example, a red envelope must be sent before it can be received, and only after multiple people have received it is there anything in the receiving details to view.

With the traditional single-interface approach, preparing mock data up front would be very difficult: stress-test data for payment has to be generated specially because payment involves real-name verification, and stressing interfaces one at a time rarely reveals the system's real bottleneck.

Therefore, how to stress test the whole link to find the system's real bottleneck was another problem we needed to solve.

Having identified these technical challenges, we now share how we solved them in practice.

5. Designing the general red envelope system

At its core, the red envelope system supports three operations:

1) Sending red envelopes;
2) Receiving red envelopes;
3) Refunding unclaimed red envelopes.

In addition, we need to query red envelope information and receiving records. For the three core operations of sending, receiving, and refunding, we need to maintain their states. Our business scenario also includes B2C-specific subsidy issuance, whose state must be maintained as well.

From this overview, the red envelope system breaks down into several functional modules:

1) Sending;
2) Receiving;
3) Refunding;
4) Subsidy issuance;
5) Various information queries;
6) State machine maintenance, etc.
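To make "state machine maintenance" concrete, here is a minimal Python sketch of a send-order state machine. The state names and transitions are hypothetical (the article does not list the real ones); the idea shown is that every update is checked against a table of allowed transitions, so late or duplicated notifications cannot move an order backwards.

```python
from enum import Enum


class SendState(Enum):
    # Hypothetical states for a red envelope send order.
    CREATED = "created"      # order created, waiting for payment or subsidy write-off
    SENT = "sent"            # funded; the red envelope can be claimed
    FINISHED = "finished"    # fully claimed (final)
    REFUNDING = "refunding"  # expired with a balance left; refund in progress
    REFUNDED = "refunded"    # remaining balance returned (final)


# Allowed transitions; anything not listed is rejected.
TRANSITIONS = {
    SendState.CREATED: {SendState.SENT},
    SendState.SENT: {SendState.FINISHED, SendState.REFUNDING},
    SendState.REFUNDING: {SendState.REFUNDED},
    SendState.FINISHED: set(),
    SendState.REFUNDED: set(),
}


def advance(current: SendState, target: SendState) -> SendState:
    """Return the new state, or raise if the transition is not allowed."""
    if target == current:
        return current  # duplicate notification: idempotent no-op
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def is_final(state: SendState) -> bool:
    """Final states have no outgoing transitions."""
    return not TRANSITIONS[state]
```

In a database-backed system the same check is usually expressed as a conditional update, such as `UPDATE ... SET state = 'sent' WHERE id = ? AND state = 'created'`, so that two concurrent updaters cannot both succeed.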

After sorting out these functions, we divided the system into modules.

Module division principles:

1) Functional cohesion: each system handles only one task (which simplifies later development, iteration, and troubleshooting);
2) The API gateway layer performs only simple proxying;
3) Asynchronous tasks are split out;
4) Reads and writes are separated: core red envelope operations and red envelope queries live in two different services.

Module division result:

1) Red envelope gateway service: the HTTP API gateway, serving the client and H5 externally and wrapping the RPC interfaces of the internal systems, with rate limiting, permission control, downgrade, and other functions;
2) Red envelope core service: carries the core red envelope functions, including sending, receiving, refunding, subsidy issuance, red envelope state machine maintenance, and status advancement;
3) Red envelope query service: carries the query functions, including red envelope details, sending status, receiving status, receiving details, and subsidy information;
4) Red envelope asynchronous service: carries the asynchronous tasks that keep the state machines moving, including transfers, refunds, and subsidy status advancement;
5) Red envelope basic service: carries the code shared by all red envelope systems, such as DB, Redis, and tcc operations, common constants, and utilities, effectively the red envelope system's basic toolkit;
6) Red envelope reconciliation service: carries the reconciliation logic between red envelopes and finance, reconciling with finance daily.

Finally, the overall system architecture of video red envelopes is shown in the figure:

6. Issuing and processing high-traffic subsidies in practice

6.1 Synchronous reward distribution

To cope with the heavy Spring Festival traffic, the red envelope subsidy distribution flow went through several iterations.

Our initial design issued subsidies synchronously: the upstream link calls the red envelope system's interface to issue a coupon, and once issuance succeeds, the user sees the coupon and can use it to send a red envelope.

The overall process of the initial plan is as follows:

The problem with this solution is that during the Spring Festival event the whole link must handle the total event traffic, all of which eventually hits the database, and database resources are also scarce during the Spring Festival.

6.2 Asynchronous reward distribution

To solve the problems of synchronous distribution, we changed the overall flow to shave peaks through MQ, reducing the traffic pressure on downstream services.

In effect, distribution changed from synchronous to asynchronous. After a user participates in the activity, an encrypted Token is issued to the client, which uses it for display and for interacting with the server.
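A minimal Python sketch of this peak-shaving idea follows. It assumes an in-process `queue.Queue` standing in for the MQ and uses an HMAC signature in place of the article's token encryption (the real token format is not described); `issue_subsidy`, `verify_token`, and `consume` are hypothetical names. The hot path only signs a token and enqueues a message, and the consumer credits the database in bounded batches.

```python
import hashlib
import hmac
import json
import queue

SECRET = b"demo-secret"  # hypothetical key; a real system would use a managed secret
mq: "queue.Queue[dict]" = queue.Queue()  # stand-in for the message queue


def issue_subsidy(uid: int, subsidy_id: str, amount_cents: int) -> str:
    """Hot path: sign a token for the client and enqueue crediting; no DB write here."""
    payload = json.dumps({"uid": uid, "sid": subsidy_id, "amt": amount_cents}, sort_keys=True)
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    mq.put({"uid": uid, "sid": subsidy_id, "amt": amount_cents})
    return payload + "." + sig


def verify_token(token: str) -> dict:
    """Server-side check that a token presented by the client was issued by us."""
    payload, _, sig = token.rpartition(".")  # the hex signature never contains "."
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid token")
    return json.loads(payload)


def consume(db: dict, max_batch: int) -> int:
    """Consumer: credit at most max_batch messages per call, smoothing DB load."""
    n = 0
    while n < max_batch and not mq.empty():
        msg = mq.get()
        db.setdefault(msg["sid"], msg)  # keyed by subsidy ID, so redelivery is harmless
        n += 1
    return n
```

A signed token is tamper-evident but readable by the client; encrypting it, as the article describes, would also hide its contents.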

The asynchronous coupon issuance scheme of the event is as follows:

This solves the high-traffic problem, but introduces other problems in turn.

In the original plan, a user's red envelope subsidies were stored in the red envelope system first, so we could find the corresponding records in the red envelope database for later queries and write-offs.

With the asynchronous approach, however, we estimated that crediting all subsidies would take 10 minutes. A user may start sending a video red envelope with the subsidy as soon as the app shows the coupon, or open the red envelope pendant to check the subsidies they have received, while the subsidy has not yet been recorded in the red envelope system.

6.3 Final solution

To solve this, we changed the logic of both video red envelope sending and subsidy queries. When a user sends a video red envelope with a subsidy, we first credit (store) that subsidy, and only after the write succeeds can the subsidy be used to send the red envelope.

For the query interface, we cannot tell whether all subsidies have been credited, so every query fetches the full Token list from the reward issuing side, also queries the user's subsidies in the database, and merges the two to obtain the complete subsidy list.
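A minimal sketch of that merge, assuming each Token and each DB row carries a subsidy ID (the `Subsidy` type and its field names are hypothetical). On conflict the DB row wins because it carries the write-off state; Tokens fill in subsidies whose MQ message has not been consumed yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subsidy:
    subsidy_id: str
    amount_cents: int
    used: bool = False


def merge_subsidies(tokens: list[Subsidy], db_rows: list[Subsidy]) -> list[Subsidy]:
    """Union of the reward side's Token list and the red envelope DB, keyed by subsidy ID."""
    merged = {s.subsidy_id: s for s in tokens}
    merged.update({s.subsidy_id: s for s in db_rows})  # DB rows override Tokens
    return sorted(merged.values(), key=lambda s: s.subsidy_id)
```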

In this process, to work around the MQ delay, we credit subsidies proactively when the user makes a request; the user's active operations are sending red envelopes with subsidies and querying subsidies.

Why, then, do we credit a subsidy when it is used to send a red envelope, but not when subsidies are queried?

Because querying is a high-frequency, batch operation. Before touching the DB we cannot tell which subsidies have already been credited, so crediting on query would mean batch DB processing, repeated every time the user queries, which would waste a great deal of DB resources.

Sending with a subsidy, by contrast, is a low-frequency operation on a single subsidy: we only need to credit that one subsidy when the user writes it off. This greatly reduces database pressure and saves database resources.
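The credit-on-write-off path can be sketched as follows, assuming a dict keyed by subsidy ID stands in for a table with a unique key on that ID, and that the token payload has already been verified. Whichever of the MQ consumer or the send path arrives first creates the row, and the write-off is a conditional state change, so a subsidy can be redeemed only once. `SubsidyStore` and `send_with_subsidy` are hypothetical names.

```python
class SubsidyStore:
    """In-memory stand-in for the subsidy table (unique key: subsidy_id)."""

    def __init__(self) -> None:
        self.rows = {}  # subsidy_id -> {"uid", "amount", "used"}

    def insert_if_absent(self, subsidy_id: str, uid: int, amount_cents: int) -> None:
        # Like an INSERT that ignores a duplicate-key conflict: the first writer wins.
        self.rows.setdefault(subsidy_id, {"uid": uid, "amount": amount_cents, "used": False})

    def redeem(self, subsidy_id: str, uid: int) -> int:
        # Like UPDATE ... SET used = 1 WHERE id = ? AND uid = ? AND used = 0.
        row = self.rows.get(subsidy_id)
        if row is None or row["uid"] != uid or row["used"]:
            raise ValueError("subsidy not usable")
        row["used"] = True
        return row["amount"]


def send_with_subsidy(store: SubsidyStore, token: dict) -> int:
    """Send path: credit this one subsidy first, then write it off."""
    store.insert_if_absent(token["sid"], token["uid"], token["amt"])
    return store.redeem(token["sid"], token["uid"])
```

A late MQ message for the same subsidy then becomes a no-op, because the row already exists.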

7. Choosing the red envelope receiving scheme in practice

We weighed several technical options for receiving video red envelopes, which we share here.

7.1 Pessimistic locking scheme

As shown in the figure above, the most common approach (we call it Scheme 1) is to lock the red envelope in the database when a user receives it, deduct the amount, and then release the lock to complete the claim.

This scheme is simple and clear, but when multiple users claim the same red envelope at the same time, their requests hit database row lock conflicts and must queue.

When too many requests are queued, database connections are wasted and overall system performance suffers. Meanwhile, upstream callers get no response for a long time and time out, and users may keep retrying, which can exhaust the database connections and crash the system.
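A sketch of Scheme 1 follows, using SQLite as a stand-in for the production database (the article does not name one). `BEGIN IMMEDIATE` takes the write lock before the balance is read; on MySQL/InnoDB the equivalent would typically be `SELECT ... FOR UPDATE` on the envelope row. Amounts are in cents and split equally only to keep the example deterministic (the real amount algorithm is not described), and the table and function names are hypothetical.

```python
import sqlite3


def connect() -> sqlite3.Connection:
    """In-memory stand-in database; isolation_level=None so we control BEGIN/COMMIT."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE envelope (id TEXT PRIMARY KEY, amount INTEGER, cnt INTEGER)")
    conn.execute(
        "CREATE TABLE receipt (envelope_id TEXT, uid INTEGER, amount INTEGER,"
        " PRIMARY KEY (envelope_id, uid))"  # one claim per user per envelope
    )
    return conn


def receive(conn: sqlite3.Connection, envelope_id: str, uid: int) -> int:
    """Scheme 1: lock, deduct, record the receipt, then release by committing."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the balance
    try:
        row = conn.execute(
            "SELECT amount, cnt FROM envelope WHERE id = ?", (envelope_id,)
        ).fetchone()
        if row is None or row[1] == 0:
            raise ValueError("red envelope is empty")
        amount, cnt = row
        # Equal split keeps the sketch deterministic; the last claimant takes the rest.
        share = amount if cnt == 1 else amount // cnt
        conn.execute(
            "UPDATE envelope SET amount = amount - ?, cnt = cnt - 1 WHERE id = ?",
            (share, envelope_id),
        )
        conn.execute("INSERT INTO receipt VALUES (?, ?, ?)", (envelope_id, uid, share))
        conn.execute("COMMIT")
        return share
    except Exception:
        conn.execute("ROLLBACK")  # undo the deduction on any failure, e.g. a double claim
        raise
```

Every concurrent claimant of the same envelope serializes on this lock, which is exactly the queuing described above.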

7.2 Red envelope pre-splitting scheme

The problem with Scheme 1 is lock contention among users claiming at the same time. If the lock is made finer-grained, a single red envelope can sustain more concurrent claims.

The specific plan is as follows (we call it Scheme 2):

Scheme 2 changes the sending flow: when a red envelope is sent, it is pre-split into multiple sub-envelopes, which refines the lock granularity. Receivers no longer compete for the lock of a single red envelope; instead, they are spread across the locks of multiple sub-envelopes.

So at receiving time, the problem becomes how to allocate sub-envelopes to users.

A common approach is to generate a sequence number with a Redis increment when a user requests to receive, with the sequence number identifying the sub-envelope that user should get. However, this depends heavily on Redis: when the Redis network jitters or the Redis service fails, the system must fall back to querying the DB for unclaimed sub-envelopes to obtain the sequence number, which makes the overall implementation considerably more complex.
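A minimal sketch of Scheme 2, assuming a lock-guarded `itertools.count` stands in for Redis `INCR` and the amount is split as evenly as possible for simplicity. `PreSplitEnvelope` is a hypothetical name; each entry of `parts` would be its own DB row with its own lock.

```python
import itertools
import threading


class PreSplitEnvelope:
    """Scheme 2 sketch: split at send time, allocate by sequence number at receive time."""

    def __init__(self, total_cents: int, n: int) -> None:
        # Near-equal split for simplicity; each part would be a separate DB row.
        base, extra = divmod(total_cents, n)
        self.parts = [base + (1 if i < extra else 0) for i in range(n)]
        self.claimed_by = [None] * n          # uid that took each sub-envelope
        self._seq = itertools.count()         # stand-in for Redis INCR
        self._seq_lock = threading.Lock()

    def receive(self, uid: int) -> int:
        with self._seq_lock:
            idx = next(self._seq)             # sequence number -> sub-envelope index
        if idx >= len(self.parts):
            raise ValueError("all sub-envelopes claimed")
        # Only sub-envelope idx would be locked in the DB, so claimants no longer
        # contend on one row.
        self.claimed_by[idx] = uid
        return self.parts[idx]
```

The sketch leaves out the fallback path the text mentions: if the counter store is unavailable, the next sequence number has to be recovered from the DB, which is where most of the extra complexity lies.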

7.3 Final solution

In the video red envelope scenario, the business flow is that a user shoots a video and attaches a red envelope, and other users trigger the claim when the video comes up as they swipe through the recommendation feed.

Compared with typical IM group chat scenarios such as WeChat and Feishu, the number of concurrent claims on the same video red envelope is not very high, because swiping through videos and the feed itself already scatter the traffic.

From a business perspective, the requirements call for returning the number of unclaimed red envelopes to the user after each claim, for display. Scheme 1 makes this remaining inventory easy to obtain, while Scheme 2 makes it more troublesome.

In addition, in terms of development complexity and disaster tolerance, Scheme 1 is the more suitable choice. But we had to deal with its risks, using other means to protect DB resources and minimize lock conflicts.

The specific plans are as follows:

1) Per-red-envelope rate limiting in Redis:

To minimize DB lock conflicts, requests are rate-limited by red envelope ID: at any one time, only (number of remaining red envelopes × 1.5) requests are let through.

Rate-limited requests receive a special error code, and the front end polls up to 10 times. When there are too many requests, they are thus processed gradually.
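Both sides of this limiter can be sketched as below. The article keeps the counter in Redis keyed by red envelope ID; here a dict stands in for it, and the 1-second fixed window, the error code value, and the polling interval are assumptions. Only the `remaining × 1.5` threshold and the 10-poll cap come from the text.

```python
import math
import time

RATE_LIMITED = -1        # hypothetical special error code
MAX_CLIENT_POLLS = 10    # from the text: the front end polls at most 10 times


class EnvelopeLimiter:
    """Per-red-envelope limiter: at most ceil(remaining * 1.5) requests per window."""

    def __init__(self, window_s: float = 1.0, now=time.monotonic) -> None:
        self.window_s = window_s  # assumed window length
        self.now = now
        self.counters = {}        # (envelope_id, window) -> count; Redis INCR + EXPIRE in production

    def allow(self, envelope_id: str, remaining: int) -> bool:
        key = (envelope_id, int(self.now() // self.window_s))
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= math.ceil(remaining * 1.5)


def client_receive(call, sleep=time.sleep):
    """Client side: poll while the server answers RATE_LIMITED, up to MAX_CLIENT_POLLS calls."""
    result = RATE_LIMITED
    for attempt in range(MAX_CLIENT_POLLS):
        result = call()
        if result != RATE_LIMITED:
            break
        if attempt + 1 < MAX_CLIENT_POLLS:
            sleep(0.2)  # hypothetical polling interval
    return result
```

Because the threshold tracks the remaining count, an almost-empty envelope admits almost no traffic to the DB, and an empty one admits none.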

2) Memory queue:

In addition to Redis rate limiting, we added an in-memory per-red-envelope lock to the receiving flow to further reduce DB lock contention.

For a given red envelope, only requests that obtain the memory lock can go on to the DB. This moves lock contention out of the DB and resolves it earlier, in memory, which is far cheaper than DB resources; when the request volume grows too large, we can scale horizontally.

Implementing the memory lock required several changes.

First, all requests for the same red envelope must reach the same tce instance. We adjusted the gateway layer's routing: when the gateway calls the downstream service, it routes by red envelope ID, so that requests with the same ID hit the same instance.

Second, we implemented a channel-based memory lock in the red envelope core service; the lock for a red envelope is released once the claim completes.

Finally, to keep lock memory from growing too large and to handle locks that are not released in time, a timed task cleans them up periodically.
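The two halves of this design, sticky routing at the gateway and a per-envelope lock on the instance, can be sketched as below. The article's lock is channel-based (in Go); this Python sketch uses a non-blocking `threading.Lock` per red envelope ID as an analogue, a simple CRC32-modulo rule stands in for the gateway's routing policy, and all names are hypothetical.

```python
import threading
import time
import zlib


def route(envelope_id: str, instances: list[str]) -> str:
    """Gateway side (hypothetical rule): the same red envelope ID always maps to the same instance."""
    return instances[zlib.crc32(envelope_id.encode()) % len(instances)]


class EnvelopeLocks:
    """Instance side: one non-blocking lock per red envelope ID, swept by a timed task."""

    def __init__(self, now=time.monotonic) -> None:
        self._guard = threading.Lock()  # protects the map itself
        self._locks = {}                # envelope_id -> (lock, last_used)
        self._now = now

    def try_acquire(self, envelope_id: str) -> bool:
        with self._guard:
            lock, _ = self._locks.get(envelope_id, (None, 0.0))
            if lock is None:
                lock = threading.Lock()
            self._locks[envelope_id] = (lock, self._now())
            # Non-blocking: losers fail fast instead of queuing on the DB row lock.
            return lock.acquire(blocking=False)

    def release(self, envelope_id: str) -> None:
        with self._guard:
            self._locks[envelope_id][0].release()

    def sweep(self, idle_s: float) -> int:
        """Timed task: drop idle, unheld entries so the map does not grow without bound."""
        removed = 0
        with self._guard:
            for key, (lock, last_used) in list(self._locks.items()):
                if self._now() - last_used > idle_s and not lock.locked():
                    del self._locks[key]
                    removed += 1
        return removed
```

A caller would wrap the DB claim in `try`/`finally` so the memory lock is always released. This sweep only reclaims idle, unheld entries; reclaiming a lock that a crashed request never released would also need a maximum hold time, which the sketch omits. Modulo routing remaps most IDs when the instance count changes; consistent hashing is a common alternative if that matters.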

3) Asynchronous transfer:

In terms of interface latency, the transfer is a slow operation: it involves third-party payment institutions and cross-datacenter requests, so its response latency is high. Making the transfer asynchronous reduces the latency of the receive interface and improves service performance and user experience.

From the user's perspective, what matters after tapping is whether the red envelope was claimed successfully; users are far less sensitive to whether the balance has reached their account at that moment.

In addition, a transfer already takes some time to go from initiated to successful, so making it asynchronous has essentially no effect on what users perceive.
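A minimal sketch of the split between the synchronous claim and the asynchronous transfer, assuming an in-process queue and worker thread stand in for the red envelope asynchronous service, and `pay` is a hypothetical payment client. The receive call returns as soon as the claim is recorded; failed transfers are left in a non-final state for the state machine safety nets to pick up.

```python
import queue
import threading

transfer_queue: "queue.Queue[tuple[str, int, int]]" = queue.Queue()


def receive_and_enqueue(receipts: dict, envelope_id: str, uid: int, amount: int) -> str:
    """Synchronous part: record the claim and return without waiting for payment."""
    receipts[(envelope_id, uid)] = {"amount": amount, "state": "RECEIVED"}
    transfer_queue.put((envelope_id, uid, amount))
    return "RECEIVED"


def transfer_worker(receipts: dict, pay, stop: threading.Event) -> None:
    """Asynchronous part: call the slow payment service and record the outcome."""
    while not stop.is_set() or not transfer_queue.empty():
        try:
            envelope_id, uid, amount = transfer_queue.get(timeout=0.05)
        except queue.Empty:
            continue
        try:
            ok = pay(uid, amount)  # hypothetical payment client; may cross datacenters
        except Exception:
            ok = False
        # A failed transfer stays non-final so compensation jobs can retry it.
        receipts[(envelope_id, uid)]["state"] = "TRANSFERRED" if ok else "TRANSFER_FAILED"
        transfer_queue.task_done()
```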

8. Stability and disaster recovery in practice

8.1 Overview

Disaster recovery for the red envelope system relies mainly on three measures:

1) Interface rate limiting;
2) Business downgrade;
3) Multiple mechanisms that ensure state machines advance.

As shown below:

These methods are described below.

8.2 Interface rate limiting

Interface rate limiting is a common disaster recovery measure: it keeps the system processing only the requests it can tolerate, so that excessive external traffic cannot bring it down.

Before setting rate limits, we first worked with upstream and downstream teams and with product to obtain estimated red envelope sending and receiving volumes, and from these mapped the traffic of the whole link, module by module.

Below is the B2C full-link request volume we compiled at the time:

With each module's request volume in hand, aggregating them gives the traffic of each interface and service of the red envelope system and of each downstream dependency, after which setting the rate limits is straightforward.
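Once each interface has an estimated peak, a per-interface limiter can be configured from it. The article does not say which algorithm was used; the sketch below assumes a token bucket, a common choice, where `rate` would be set from the interface's estimated QPS and `capacity` bounds bursts. `TokenBucket` is a hypothetical name.

```python
import time


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic) -> None:
        self.rate = rate            # e.g. sized from the interface's estimated QPS
        self.capacity = capacity    # burst allowance
        self.tokens = capacity
        self.now = now
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # reject: the caller returns a rate-limit response
```

One bucket per interface, and per downstream dependency, would be created with `rate` taken from the aggregated traffic estimates.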

8.3 Business downgrade
8.3.1) Core dependency downgrade:

During the Spring Festival event, the red envelope system's link depends on many services. These downstream dependencies can be divided into core and non-core dependencies.

When a downstream core service fails, a link may become unavailable. In that case we can downgrade directly at the API layer, return a friendly message, and reopen the link once the downstream service recovers.

For example, in the C2C sending flow, the user must pay before the red envelope is sent. If Caijing's payment flow fails, or the payment success status does not arrive for a long time, the red envelope fails to send after the user has paid, and the front end keeps polling the red envelope status. The resulting surge in requests puts pressure on the service and may even affect B2C red envelope sending and queries.

In that case, C2C red envelope sending can be downgraded at the interface level, reducing service pressure and the impact on other business logic.
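A downgrade of this kind is often just a switch checked at the API layer before any downstream call. The sketch assumes a hypothetical `DowngradeSwitches` object (in production it would likely be backed by a dynamic configuration service) and a hypothetical `create_order` callback for the normal path.

```python
from dataclasses import dataclass


@dataclass
class DowngradeSwitches:
    """Hypothetical runtime switches, refreshed from a dynamic config service in production."""
    c2c_send_disabled: bool = False


FRIENDLY_MSG = "Too many people are sending red envelopes right now. Please try again later."


def c2c_send_api(switches: DowngradeSwitches, create_order) -> dict:
    """API-layer handler: short-circuit before any payment call when downgraded."""
    if switches.c2c_send_disabled:
        # No cashier and no order, so the client has nothing to poll.
        return {"code": 1, "message": FRIENDLY_MSG}
    return {"code": 0, "data": create_order()}
```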

8.3.2) Non-core dependency downgrade:

Besides core dependencies, the red envelope system also has some non-core downstream dependencies. If one of these fails, we can sacrifice part of the user experience to keep the service available.

For example, as mentioned earlier, before sending a B2C red envelope the user needs the list of all available subsidies. We query the full Token list from the reward issuing side, query our own DB, and then merge and return the results.

If the interface that returns the Token list fails, we can downgrade and return only the subsidy data in our own DB. Users can still send red envelopes; only the display of some subsidies is affected, not the whole sending link.
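A sketch of this degraded query path, assuming subsidies are dicts with an `"id"` field and that `fetch_tokens` and `fetch_db` are hypothetical callables for the reward side and our own DB. A failure of the non-core Token call is logged and the DB-only list is returned with a `degraded` flag; a failure of our own DB still propagates.

```python
import logging


def list_subsidies(uid: int, fetch_tokens, fetch_db) -> tuple[list[dict], bool]:
    """Return (subsidies, degraded); a failing Token-list call degrades to DB-only data."""
    db_rows = fetch_db(uid)  # our own DB: errors here still propagate
    try:
        tokens = fetch_tokens(uid)  # non-core dependency: the reward side's Token list
    except Exception:
        logging.warning("Token list unavailable for uid=%s; returning DB subsidies only", uid)
        return db_rows, True
    merged = {s["id"]: s for s in tokens}
    merged.update({s["id"]: s for s in db_rows})  # DB rows carry write-off state
    return list(merged.values()), False
```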

8.4 Multiple mechanisms to ensure state machine advancement

In the red envelope system, an order that stays out of its final state for a long time, for example a claimed red envelope whose money has not arrived long after the claim, or an unclaimed C2C red envelope that has not been refunded to the sender for a long time, may lead to user complaints.

Therefore, we must ensure that every order in the system is advanced to its final state promptly and accurately.

We use several mechanisms for this.

The first is the callback: once a dependency finishes processing an order, it notifies the red envelope system promptly. This is the most timely mechanism.

Relying on callbacks alone is not enough, however, because a dependency failure or network jitter can lose a callback. So at each stage of a red envelope we also send an MQ message to the red envelope system, consume it after a delay, and actively query the dependency for the order status to update it.

Finally, each state machine has a timed task as a safety net. If the task has run many times and an order has still not reached its final state, an alert is sent through Lark so that someone can step in promptly.
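The timed safety-net task can be sketched as below. Order states, the attempt threshold, and the `query_downstream` and `alert` callbacks are hypothetical; `alert` stands in for the Lark notification, and the callback and delayed-MQ paths are assumed to run elsewhere.

```python
from dataclasses import dataclass

FINAL_STATES = {"FINISHED", "REFUNDED"}  # hypothetical final states


@dataclass
class Order:
    order_id: str
    state: str
    attempts: int = 0


def compensate(orders: list, query_downstream, alert, max_attempts: int = 5) -> None:
    """One run of the timed safety-net task over non-final orders."""
    for order in orders:
        if order.state in FINAL_STATES:
            continue
        new_state = query_downstream(order.order_id)  # ask the dependency for the truth
        if new_state is not None:
            order.state = new_state
        order.attempts += 1
        # Escalate once, when the threshold is first reached.
        if order.state not in FINAL_STATES and order.attempts == max_attempts:
            alert(f"order {order.order_id} stuck in {order.state} after {order.attempts} runs")
```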

9. Fund security in practice

9.1 Transaction idempotency

In programming, idempotency means that executing a request any number of times has the same effect as executing it once. For fund security, idempotent processing keyed by order number prevents financial loss.

Specifically, the red envelope system uses the order number as a unique key to make the sending, receiving, and refund interfaces idempotent.

In addition, the subsidy issuance interface of the red envelope system is idempotent: if the same external order number requests a subsidy multiple times, we must ensure that only one coupon is issued.

Idempotency can be implemented in many ways, including through the database or Redis. The most reliable is a database unique key conflict, but this approach introduces extra problems when the database is split across sharded instances.

Take subsidy issuance as an example. Our business tables are sharded by uid, so the shard key of the subsidy table is uid, even though we also set the red envelope subsidy number as a unique key.

This carries a risk: if the upstream system changes the uid between calls for the same external order number, the two requests may land on different database instances, the unique index no longer catches the duplicate, and money is lost.

To eliminate this risk, we additionally introduced a database that uses the external order number of subsidy issuance as its shard key.
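The sketch below contrasts the two sharding choices, assuming in-memory shards where a unique key is enforced only within one shard, and CRC32-modulo sharding; all names are hypothetical. With the unique key in a uid-sharded table, a retry carrying a different uid can land on another shard and issue a second coupon; an extra table sharded by the external order number catches it.

```python
import zlib


class Shard:
    """One database instance; a unique key is enforced only within its own shard."""

    def __init__(self) -> None:
        self.keys = set()

    def insert_unique(self, key: str) -> bool:
        if key in self.keys:
            return False  # duplicate-key error -> treat as an idempotent hit
        self.keys.add(key)
        return True


def shard_of(shard_key: str, shards: list) -> Shard:
    return shards[zlib.crc32(shard_key.encode()) % len(shards)]


def issue_uid_sharded(shards: list, out_order_no: str, uid: int) -> bool:
    """Risky: the unique key lives in a uid-sharded table."""
    return shard_of(str(uid), shards).insert_unique(out_order_no)


def issue_with_order_index(idem_shards: list, coupon_shards: list, out_order_no: str, uid: int) -> bool:
    """Fix: check an extra table sharded by the external order number first."""
    if not shard_of(out_order_no, idem_shards).insert_unique(out_order_no):
        return False  # already issued, whatever uid the retry carries
    shard_of(str(uid), coupon_shards).insert_unique(out_order_no)  # coupon row stays uid-sharded
    return True
```

In a real system the idempotency row and the coupon row live in different databases, so the two writes need their own consistency handling; the sketch ignores that.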

9.2 B2C red envelope reconciliation

Besides building fund security into the system design during development, we also use reconciliation to check whether the system has fund security problems.

In the B2C link, the flow runs mainly from subsidy issuance to red envelope receipt. We run hourly Hive reconciliation on the upstream and downstream data of each of these steps.

9.3 C2C red envelope reconciliation

In the C2C link, the flow runs from the user initiating payment, to receivers getting their transfers, and finally to refunds of expired red envelopes.

Payment, transfer, and refund each need their own reconciliation.

We also verify that each user's red envelope sending amount is greater than or equal to the transfer amount plus the refund amount. We use "greater than or equal to" because the cycle from a red envelope being sent successfully to its refund succeeding can exceed 24 hours.

In addition, transfers still in transit may result in multiple refund orders; if strict equality were required, the timing of reconciliation could not be controlled.
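The C2C invariant can be checked per red envelope as below. This is a sketch with hypothetical field names, not the team's Hive job; it flags only envelopes where transfers plus refunds exceed the amount sent, since a surplus on the sent side is expected while money is still in flight.

```python
from dataclasses import dataclass


@dataclass
class EnvelopeLedger:
    envelope_id: str
    sent_cents: int         # amount the sender paid
    transferred_cents: int  # successful transfers to receivers so far
    refunded_cents: int     # successful refunds to the sender so far


def find_overpaid(ledgers: list) -> list:
    """Return IDs violating sent >= transferred + refunded.

    A surplus on the sent side is tolerated because transfers and refunds may
    still be in flight (the cycle can exceed 24 hours).
    """
    return [x.envelope_id for x in ledgers
            if x.transferred_cents + x.refunded_cents > x.sent_cents]
```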

10. Stress testing the red envelope system in practice

10.1 Overview

As mentioned earlier, the red envelope link spans multiple interfaces, such as sending, receiving, and querying, so the stress test must simulate real user behavior to obtain the system's real performance. We used the script-based stress testing mode of our stress testing platform.

First, the whole stress test link must be adapted, and we confirmed with upstream and downstream teams whether their services could accept stress-test traffic. Where they could not, the calls had to be mocked.

In addition, for storage services such as databases, Redis, and MQ, stress-test traffic must be delivered to the correct stress-test targets; otherwise it may affect production.

After adapting the link, we wrote the stress test scripts, one for B2C and one for C2C.

10.2 B2C red envelope link stress test

The above shows the whole B2C stress test link: first a subsidy is issued, then subsidies are queried, then a red envelope is sent with the subsidy. To simulate multiple people claiming a red envelope, we start multiple goroutines that receive it concurrently.
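A toy version of the concurrent-receive step, assuming threads stand in for the article's goroutines and a hypothetical in-memory `FakeEnvelope` stands in for the real receive API. The useful assertion is that however many workers race, the number of claims and the total amount paid out never exceed what was sent.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


class FakeEnvelope:
    """Hypothetical in-memory stand-in for the receive API under test."""

    def __init__(self, total_cents: int, count: int) -> None:
        self.remaining, self.count = total_cents, count
        self.claims = {}              # uid -> amount
        self._lock = threading.Lock()

    def receive(self, uid: int) -> Optional[int]:
        with self._lock:
            if self.count == 0 or uid in self.claims:
                return None           # empty, or this user already claimed
            share = self.remaining if self.count == 1 else self.remaining // self.count
            self.remaining -= share
            self.count -= 1
            self.claims[uid] = share
            return share


def run_scenario(total_cents: int, count: int, receivers: int) -> FakeEnvelope:
    """Send one envelope, then let `receivers` users claim it concurrently."""
    env = FakeEnvelope(total_cents, count)
    with ThreadPoolExecutor(max_workers=32) as pool:
        list(pool.map(env.receive, range(receivers)))
    return env
```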

10.3 C2C red envelope link stress test

Because C2C red envelopes involve payment, their link is a different flow and needs its own script.

Because the stress test depends on external systems, waiting until the full link is ready and then testing everything together may surface unknown problems.

So we first stress test our own part, and once that passes, run the full-link stress test together with the other teams. In the payment-related modules (shown in blue), we added mock switches to control stress test results.

When the mock switch is on, the module constructs and returns a result directly; when it is off, it calls Finance normally to obtain the result.
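A mock switch like this can be a flag on the client wrapper for the payment dependency. The names are hypothetical, and restricting the mock to stress-test-tagged requests is an added safety assumption, not something the article states.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PaymentClient:
    """Wrapper around the payment dependency with a stress-test mock switch (hypothetical)."""
    real_call: Callable[[str], str]
    mock_enabled: bool = False

    def query_pay_status(self, order_id: str, stress_test: bool) -> str:
        # Assumption: the mock only ever applies to requests tagged as stress-test traffic.
        if self.mock_enabled and stress_test:
            return "SUCCESS"  # constructed result; Finance is not called
        return self.real_call(order_id)
```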

11. Follow-up plans (set-based architecture)

With the disaster recovery measures above, if a change to the red envelope core service goes wrong, or the datacenter hosting the database goes down, all users are affected. At that point we can only downgrade; the system cannot be quickly switched over and recovered.

In the future, we will consider moving the service to a set-based architecture.

That is, service servers and their corresponding storage are divided into separate sets. Each set handles only the traffic of its own unit, traffic is split and faults are isolated between units, and data is backed up between sets.

That way, when a unit fails, its traffic can be switched to the backup unit promptly.
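Set-based routing with failover can be sketched as below. Splitting users by `uid % number_of_units` and the paired backup mapping are assumptions for illustration; the article does not specify how units are divided, and the cross-set data backup it mentions is outside the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    healthy: bool = True


@dataclass
class SetRouter:
    """Route each user to a unit (set); fail over to the paired backup unit."""
    units: list[Unit]
    backup_of: dict[str, str] = field(default_factory=dict)

    def route(self, uid: int) -> Unit:
        primary = self.units[uid % len(self.units)]  # hypothetical: split by uid
        if primary.healthy:
            return primary
        backup_name = self.backup_of.get(primary.name)
        for unit in self.units:
            if unit.name == backup_name and unit.healthy:
                return unit
        raise RuntimeError(f"unit {primary.name} is down and has no healthy backup")
```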
