Inside the IM architecture of Enterprise WeChat: message model, 10,000-member groups, read receipts, message withdrawal, and more

The author of this article, Pan Tanglei, is a development engineer at Tencent WXG (WeChat Business Group) and a graduate of Sun Yat-sen University. The content has been revised.

1. Content overview

This article summarizes the architecture of Enterprise WeChat's IM message system. It describes the technical difficulties and challenges that enterprise business scenarios bring to IM architecture design, and compares and analyzes the candidate technical solutions. It also summarizes some common techniques of IM backend development that apply to IM messaging systems in general.

(This article was published synchronously at: http://www.52im.net/thread-3631-1-1.html)

2. Glossary

The following abbreviations and technical terms are used in this article:

1) seq: an auto-incrementing sequence number, one per message (see: "WeChat Massive IM Chat Message Serial Number Generation Practice");
2) ImUnion: the message intercommunication system that connects Enterprise WeChat messages with WeChat messages;
3) Control message: a control command that is invisible to the user; it reuses the message channel as a reliable notification mechanism;
4) Application message: a message issued by a system application;
5) api message: a message issued by a third-party application;
6) appinfo: the unique string id (strid) of each message. It is globally unique, and the same message carries the same appinfo for every recipient.

3. Technical background

Enterprise WeChat is an office collaboration product, and chat messaging is its most basic function. The stability, reliability, and security of the message system are therefore especially important.

Building a message system involves many difficulties, and a messaging system for to B scenarios has to support more complex business requirements.

The requirements specific to to B scenarios are:

1) Message authentication: the relationship types include group relationship, colleague relationship within a company, friend relationship, group enterprise relationship, and circle enterprise relationship. The sender and receiver must share at least one relationship before a message can be sent;
2) Receipt messages: each message must record the lists of members who have and have not read it, which involves frequent status reads and writes;
3) Message withdrawal: a message can be withdrawn within 24 hours of being sent;
4) Message storage: cloud storage covers a long time span, up to 180 days of messages. That is hundreds of terabytes of user messages, so storage must be optimized to reduce machine costs;
5) 10,000-member group chat: a group can hold up to 10,000 members, so a single group message is like a small DDoS attack;
6) WeChat intercommunication: two heterogeneous IM systems are connected directly, so reliability and consistency are especially important.

4. Overall architecture design 1: architecture layering

As shown above, the overall architecture is layered as follows.

1) Access layer: the unified entry point. It receives client requests and forwards them to the corresponding CGI layer according to their type. A client can connect to wwproxy through a long or a short connection. An active client prefers the long connection when initiating a request; if the long connection fails, it retries over a short connection.

2) CGI layer: an http service that receives data packets from wwproxy, verifies the user's session status, and unpacks the request with the secret key distributed by the backend. If decryption fails, the request is rejected. If decryption succeeds, the plaintext packet body is forwarded to the corresponding svr in the back-end logic layer.

3) Logic layer: a large number of microservices and asynchronous processing services that perform data integration and logic processing. They are built on the self-developed hikit rpc framework and communicate with each other over tcp short connections. Communication with external systems, such as WeChat intercommunication and the push platforms of mobile phone manufacturers, goes over the http protocol.

4) Storage layer: message storage is msgkv, developed on the levelDB model. SeqSvr is the sequence number generator; it guarantees that the seq values it issues increase monotonically and never roll back, and these values are used by the message sending and receiving protocols.

5. Overall architecture design 2: Message sending and receiving model

The Enterprise WeChat messaging model combines push and pull, which is highly reliable and simple in design.

The following is a sequence diagram of message push and pull:

PS: As shown in the figure above, the sender asks the backend to write the message to the receiver's storage and then push a notification to the receiver. On receiving the push, the recipient actively pulls the message from the backend.

No duplicates, no loss, and timely delivery: these three are the core metrics of a messaging system:

1) Real-time delivery: the client keeps a long connection to the backend so that message pushes arrive in real time;
2) Timely notification: if the client's long connection is gone because the process was killed, the push platform of the mobile phone manufacturer is used to push a notification or to wake the process directly so that it can receive messages;
3) Guaranteed reachability: if a flood of messages makes backend pushes lag, the client falls back on a polling mechanism so that messages still arrive;
4) No message loss: once the back-end logic layer has received a request, it guarantees that the message is written to the receiver's storage, retrying on failure. If the request fails at the CGI layer, the failure is returned to the client, which shows a red dot on the message;
5) Message deduplication: on a weak network, the backend may have written the request to storage successfully while the response packet times out. The client then retries the same message, which would store it twice. To avoid this, each message is given a unique appinfo, and the backend builds an index on it for deduplication. A repeated message is returned directly, so only one copy is ever stored.
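
The deduplication in item 5 can be sketched as an idempotent write keyed by appinfo. This is only an illustration: the `MessageStore` class, its method, and the field names are hypothetical, and an in-memory dict stands in for the backend index.

```python
class MessageStore:
    """Toy receiver-side store that deduplicates writes by appinfo."""

    def __init__(self):
        self.messages = []      # stored messages, in arrival order
        self.by_appinfo = {}    # dedup index: appinfo -> stored message

    def write(self, appinfo, body):
        # A retried request carries the same appinfo, so return the stored copy.
        existing = self.by_appinfo.get(appinfo)
        if existing is not None:
            return existing
        msg = {"seq": len(self.messages) + 1, "appinfo": appinfo, "body": body}
        self.messages.append(msg)
        self.by_appinfo[appinfo] = msg
        return msg


store = MessageStore()
first = store.write("s-001", "hello")
retry = store.write("s-001", "hello")   # client retry after a response timeout
```

Because the retry returns the already stored message, the client gets a normal success response and the receiver's storage still holds a single copy.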

6. Overall architecture design 3: Message diffusion write

There are two typical ways to distribute messages in IM:

1) Diffusion read;
2) Diffusion write.

6.1 Diffusion read
That is: only one copy of each message is stored, and all group chat members read the same data.

Advantages: saves storage capacity.

Disadvantages:

① Each user needs to store a conversation list and pull each conversation's messages by conversation id;
② The message receiving protocol is complicated: every conversation has to sync messages incrementally, so a sequence number must be maintained per conversation.

6.2 Diffusion write
That is: each message is stored as multiple copies, one in each group chat member's own storage.

Advantages:

① A single sequence number is enough to sync all messages incrementally, so the message receiving protocol is simple;
② Reads are fast and the front-end experience is good;
③ It supports more to B business scenarios, such as receipt messages and cloud deletion.

The same message can look different from each person's point of view. For example, with a receipt message the sender sees the read and unread lists, while a receiver sees only whether they themselves have read it. When you delete a group message from the cloud, it disappears from your own message list but remains visible to everyone else.

Disadvantages: increased storage capacity.

Enterprise WeChat adopts diffusion write, which keeps message sending and receiving simple and stable. The increase in storage capacity is handled by separating hot and cold data, with cold data stored on cheap SATA disks. Diffusion read, by comparison, gives a slightly worse experience and requires a relatively complicated protocol design.

The following figure shows the protocol design for diffusion write:

As shown in the figure:

1) Each user has exactly one independent message stream. The same message exists as multiple copies, one in each recipient's message stream;
2) Every message has a seq. Within one user's message stream, seq increases monotonically;
3) The client stores the largest seq in its message list, which means it already holds every message up to that seq. When a push tells the client that a new message has arrived, it uses this seq to request incremental data, and the backend returns the messages whose seq is larger.
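
A minimal sketch of this seq-based incremental sync, assuming an in-memory list per user. `UserStream`, `Client`, and their methods are hypothetical names for illustration, not the actual protocol interface.

```python
class UserStream:
    """One user's message stream on the backend; seq increases monotonically."""

    def __init__(self):
        self.next_seq = 1
        self.messages = []   # list of (seq, body)

    def append(self, body):
        seq = self.next_seq
        self.next_seq += 1
        self.messages.append((seq, body))
        return seq

    def sync(self, client_max_seq):
        # Return only the messages the client does not have yet.
        return [(s, b) for s, b in self.messages if s > client_max_seq]


class Client:
    def __init__(self):
        self.max_seq = 0     # largest seq held locally
        self.local = []

    def on_push(self, stream):
        new = stream.sync(self.max_seq)
        self.local.extend(new)
        if new:
            self.max_seq = new[-1][0]
        return new


stream = UserStream()
stream.append("a")
stream.append("b")
client = Client()
first = client.on_push(stream)    # pulls seq 1 and 2
stream.append("c")
second = client.on_push(stream)   # pulls only seq 3
```

The client tracks one number for its whole stream, which is why the receiving protocol stays simple under diffusion write.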

7. System stability design 1: Flexible strategy

7.1 Background
Enterprise WeChat is a chat IM tool for to B scenarios. Because it is used for communication at work, its traffic has a fairly obvious peak effect (as shown in the figure below).

As shown in the figure above: working hours, 9:00~12:00 in the morning and 14:00~18:00 in the afternoon, are the chat peaks, when message volume rises sharply. Workdays and holidays also contrast clearly.

System pressure is high during peak periods, and occasional network fluctuations or machine overloads can cause large numbers of failures. An IM system has high requirements for timeliness, so peak shaving is not an option. Flexible strategies are therefore needed to keep the system stable and available.

The specific method is an overload protection strategy. When a svr has reached its maximum processing capacity it is overloaded, and its service capacity drops sharply as load increases further. An overloaded svr therefore rejects some normal requests so that the machine is not overwhelmed and can keep serving. Whether a svr is overloaded is determined from statistics such as its latency as a callee and its worker usage. During request peaks, overload protection protects the system and prevents an avalanche effect.
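
As a rough illustration of such a check, the sketch below rejects requests once worker usage or average latency crosses a threshold. The function names and both threshold values are assumptions made for this example; the article does not state the real criteria.

```python
def is_overloaded(busy_workers, total_workers, avg_latency_ms,
                  max_usage=0.9, max_latency_ms=500):
    """Overload check from two statistics; both thresholds are hypothetical."""
    usage = busy_workers / total_workers
    return usage >= max_usage or avg_latency_ms >= max_latency_ms


def admit(busy_workers, total_workers, avg_latency_ms):
    # Reject early instead of queueing work the svr cannot finish.
    if is_overloaded(busy_workers, total_workers, avg_latency_ms):
        return "rejected"
    return "accepted"
```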

The following picture shows requests rejected due to overload:

7.2 Problem
The problem with the overload protection strategy in the previous subsection is that the system returns a failure when it is overloaded, so the front end shows the message as failed with a red dot, which seriously hurts the product experience.

Sending messages is the most basic function of an IM system and its availability requirement is close to 100%, so this strategy must be optimized.

7.3 Solution
The idea is to return success to the front end even when the request has failed, and let the back end guarantee eventual success.

Many optimizations have been made to keep the message system available and to avoid front-end red dots caused by overload failures during peak periods.

The specific strategies are as follows:

1) The logic layer holds the failed request and returns success to the front end, so no red dot is shown; the back end retries asynchronously until it succeeds;
2) To prevent retry requests from filling the queue when large parts of the system fail, only half an hour's worth of failed requests are held; new requests that arrive after that half hour are returned to the front end as failures directly;
3) To keep retries from aggravating the overload, retry delays grow exponentially;
4) Complex message authentication (friend, enterprise, group, and circle relationships) is very time-consuming, and background fluctuations easily make it fail. Unless authentication explicitly fails, the request is retried idempotently;
5) To guard against malicious requests, the number of concurrent requests from a single user or a single enterprise is limited. For example, if a single user occupies more than 20% of the workers, that user's requests are discarded directly without retry.
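
Strategies 2 and 3 can be sketched as follows. The half-hour window comes from the text; the base delay, the cap, and the function names are assumptions chosen only to make the example concrete.

```python
HOLD_WINDOW_S = 30 * 60   # from the text: hold only half an hour of failed requests


def backoff_delay(attempt, base_s=1.0, cap_s=300.0):
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped."""
    return min(base_s * (2 ** attempt), cap_s)


def should_hold(failure_started_at, now, window_s=HOLD_WINDOW_S):
    """Hold a failed request (and report success) only inside the window."""
    return now - failure_started_at <= window_s


delays = [backoff_delay(n) for n in range(10)]
# 1, 2, 4, 8, ... seconds, then capped at 300
```

Doubling the delay spreads retries out as an outage continues, so the retry traffic itself does not keep the system overloaded.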

After optimization, background fluctuations are basically imperceptible to the front end.

The following is a comparison of the process before and after optimization:

8. System stability design 2: System decoupling

Because of the product form, the Enterprise WeChat messaging system depends on many external modules and even external systems.

For example: for intercommunication with WeChat messages, permission to send a message must be determined by ImUnion. ImUnion is an external system, and calls to it take a long time.

Another example: for the message audit function of the financial edition, messages must be synced to the audit module, which adds an rpc call.

Another example: single chat and group chat messages in customer service must be synced to the crm module, which adds another rpc call.

System decoupling is required so that failures in external systems or modules do not drag down the message system or increase its latency.

Our solution: all interactions with external systems are designed to be asynchronous.

Point to consider: how can a request that must return its result synchronously be made asynchronous?

For example, a group chat intercommunication message must pass ImUnion authentication before the result is returned, and the front end uses that result to show whether the message was sent successfully. The approach is to report success to the client first; if the asynchronous step fails, the backend calls back to the client to show a red dot.

For flows that are not on the main path, such as message audit sync, asynchronous retries guarantee success without affecting the main path. Then, as long as the internal systems are stable, the main path of sending messages is unaffected.

Decoupling effect chart:

9. System stability design 3: Business isolation

There are many types of Enterprise WeChat messages:

1) Single chat and group chat: basic chat, high priority;
2) api messages: messages sent by enterprises through the api interface; rate-limited, medium priority;
3) Application messages: messages issued by system applications, such as announcements; rate-limited, medium priority;
4) Control messages: invisible messages. For example, when group information changes, a control message is issued to notify group members; low priority.

Group chats fall into 3 categories by member count:

1) Ordinary group: fewer than 100 members, high priority;
2) Large group: fewer than 2000 members, medium priority;
3) 10,000-member group: low priority.

With so many businesses, if they are not isolated, fluctuations in any one of them can paralyze the entire messaging system.

Most important is the stability of the core link, that is, single chat within an enterprise and group chat with fewer than 100 members. This business is the most basic and the most sensitive: even a slight problem generates a huge volume of complaints.

The remaining businesses are isolated from one another to limit knock-on effects. Isolation follows priority and importance, and the concurrency assigned to each business is adjusted accordingly to protect the core link as much as possible.

Effect picture of decoupling and isolation:

10. Design and optimization of to B business functions 1: 10,000-member groups

10.1 Technical background
An Enterprise WeChat group can hold up to 10,000 members. If every member sends just one message, the diffusion amounts to 10,000 * 10,000 = 100 million calls, which is huge.

Delivering one message to all 10,000 members takes a long time, which hurts message timeliness.

10.2 Problem analysis
Because diffusion write for a super-large group is voluminous and slow, it is natural to ask whether super-large groups could be split out and handled with diffusion read.

Let's analyze the difficulties of storing a super-large group's messages as a single copy:

① Each large group has one message stream, and all group members sync messages from that stream;
② A user who belongs to several large groups must sync several streams, and the client must maintain a seq for each stream;
③ After the client is uninstalled and reinstalled, it no longer knows which message streams it has, so the backend must store this and tell the client;
④ When a new message arrives in a large group, all group members must be notified. If the push does not arrive, the client cannot tell that there is a new message, and polling every message stream is not feasible.

To sum up: the single-copy solution costs too much.

The following sections describe the optimizations we made to diffusion write for 10,000-member group chat.

10.3 Optimization 1: Concurrency limit
A 10,000-member group has a large diffusion volume. To deliver messages as quickly as possible, multiple coroutines distribute them concurrently, but concurrency cannot be raised without limit.

To keep one 10,000-member group that sends messages at high frequency from putting pressure on the whole message system, distribution is keyed by group id and the distribution concurrency of a single group is limited. Distributing a message to one member takes 8ms, so distributing to 10,000 members takes 80s in total; with a concurrency limit of 5, distribution completes in 16s. From the product point of view 16s is acceptable, because large groups are not sensitive to timeliness, and the concurrency stays within a reasonable range.
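
The arithmetic above, together with a per-group concurrency cap, can be sketched like this. The 8ms, 10,000, and 5 figures come from the text; the `GroupLimiter` helper is hypothetical, and the timing formula assumes the work divides evenly across the concurrent workers.

```python
from collections import defaultdict


def distribution_seconds(members, per_member_ms, concurrency):
    """Wall-clock time to fan one message out to every member."""
    return members * per_member_ms / 1000 / concurrency


class GroupLimiter:
    """Caps in-flight distribution tasks per group id (hypothetical helper)."""

    def __init__(self, per_group_limit=5):
        self.per_group_limit = per_group_limit
        self.in_flight = defaultdict(int)

    def try_acquire(self, group_id):
        if self.in_flight[group_id] >= self.per_group_limit:
            return False            # this group is already at its limit
        self.in_flight[group_id] += 1
        return True

    def release(self, group_id):
        self.in_flight[group_id] -= 1


serial = distribution_seconds(10_000, 8, 1)     # 80.0 seconds
limited = distribution_seconds(10_000, 8, 5)    # 16.0 seconds
limiter = GroupLimiter()
grants = [limiter.try_acquire("big-group") for _ in range(6)]
```

The sixth request for the same group is refused until a task is released, while other groups keep their own independent budget.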

Besides the per-group-id limit, the overall concurrency of all 10,000-member groups is also limited. On a single machine, small groups have 250 workers and 10,000-member groups have 30.

When 10,000-member groups send messages frequently, the workers are fully occupied and the queue backs up:

Because concurrency is limited, the call volume flattens out instead of growing without bound, and the system stays stable:

10.4 Optimization 2: Merge Insert
Chat at work mostly happens in small groups. Large groups are mainly used by administrators to post notices or by bosses to send red envelopes.

Messages in large groups follow a common pattern: traffic is usually light and then suddenly becomes active. For example, when the boss sends a big red envelope and group members rush to respond, a burst of messages is generated.

When message volume rises while concurrency is limited, tasks cannot be processed in time and the queue naturally backs up. The backlog may contain several messages that must be distributed to members of the same group.

At this point these messages can be merged into one request and written to message storage together, which raises the throughput of the message system several-fold.

Daily monitoring captures this scenario: at peak, up to 20 messages are inserted together, which is very friendly to the whole system.
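
A minimal sketch of merge insert, assuming the backlog is a list of `(group_id, message)` tasks. The function name and the task shape are hypothetical.

```python
def merge_by_group(tasks):
    """Merge queued (group_id, message) tasks into one batch per group."""
    batches = {}
    for group_id, message in tasks:
        batches.setdefault(group_id, []).append(message)
    return list(batches.items())


backlog = [("g1", "m1"), ("g2", "m2"), ("g1", "m3"), ("g1", "m4")]
batches = merge_by_group(backlog)
# g1's three messages become a single write request
```

Four queued tasks turn into two storage writes, and the order of messages within each group is preserved.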

10.5 Optimization 3: Business degradation
For example: changes to group membership, the group name, or group settings all diffuse an invisible control message within the group. When group members receive this control message, they ask the backend to sync the new data.

For example: in a 10,000-member group where messages are so frequent that they disturb members, some members leave the group to stop receiving them. Suppose 1,000 members leave. The control messages diffused then number about 1,000 * 10,000 = 10 million, and because each user requests data from the backend on receiving a control message, this brings about 10 million additional data requests and puts huge pressure on the system.

Control messages are necessary in small groups, where they let members see changes to group information in real time.

In large groups, however, changes to group information do not need to be that timely, and users do not notice the difference. So, based on the business scenario, the service is degraded: in large groups control messages are simply discarded rather than distributed, which reduces calls to the system.

11. Design and optimization of to B business function 2: Receipt message

11.1 Technical background
Receipt messages are a feature often used in office scenarios; the sender can see whether each recipient has read the message.

The reading status of a receipt message is modified frequently, and the number of modifications to a group message is proportional to the number of group members. With hundreds of millions of messages every day, frequent reads and writes, and a huge request volume, keeping the status of every message consistent between sender and receiver is the difficult part.

11.2 Implementation Scheme
There are two schemes for storing the reading status of a message.

Option One:

Idea: use message storage. Insert a new message that points to the old message and carries the latest reading status. When the client receives the new message, it replaces the displayed content of the old message with the content of the new one, which displays the reading status.

Advantages: it reuses the message channel. Receipt status arrives through incremental message sync, the notification mechanism and the sending and receiving protocol are reused, and changes to the front and back ends are small.

Disadvantages:

① Storage is redundant: if the status changes many times, multiple messages must be inserted;
② Both sender and receiver must modify the reading status (the receiver must mark the message as read), so data consistency between sender and receiver becomes a problem.

Option Two:

Idea: store the reading status of each message independently; the message sender pulls it by message id.

Advantages: the status is consistent.

Disadvantages:

① A reliable notification mechanism must be built to tell the client that an attribute of some message has changed;
② The sync protocol is complicated, because the client must know exactly which message's status has changed;
③ When a message expires and is deleted, its reading status data must also expire and be deleted automatically.

Enterprise WeChat adopts Option One, which is simple and reliable and needs only minor changes. The storage redundancy is resolved by LevelDB's merge when data is written to disk, which keeps only the message in its final state; the solution to the consistency problem is explained below.

The figure above shows the protocol flow (referid: the msgid of the message being pointed to; senderid: the msgid of the message in the sender's stream):

1) Each message has a msgid that is unique within a single user's stream and is generated automatically by the kv storage;
2) Receiver b has read the message, so the client sends a request carrying msgid=b1 to the backend;
3) A new message is added to receiver b's stream with msgid=b2 and referid=b1, pointing to the message with msgid=b1, and its content marks the message as read. The message body of msgid=b1 contains the msgid on the sender's side, that is, senderid=a1;
4) On sender a's side, the backend reads the message body of msgid=a1, adds b to the read list, saves the new read list in the message body, generates a new message with msgid=a2 and referid=a1, and writes it to a's message stream;
5) Receiver c reads the same message, and the same logic runs in c's message stream;
6) On sender a's side, the backend reads the message body of msgid=a1, adds c to the read list, saves the new read list in the message body, generates a new message with msgid=a3 and referid=a1, and writes it to a's message stream. Because a3>a2, the message with the greater msgid, a3, is taken as the final state.
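
The flow above can be simulated with a small in-memory model. Everything here is a sketch under assumptions: the `Stream` class, the field names, and the choice to read the newest state of a1 before appending are illustrative, and updates are applied one at a time rather than concurrently.

```python
class Stream:
    """One user's message stream; msgid is unique and increasing per user."""

    def __init__(self):
        self.msgs = {}      # msgid -> message dict
        self.next_id = 1

    def append(self, **fields):
        msgid = self.next_id
        self.next_id += 1
        self.msgs[msgid] = dict(fields, msgid=msgid)
        return msgid

    def latest(self, msgid):
        """Final state: the newest message pointing at msgid, else the original."""
        refs = [m for m in self.msgs.values() if m.get("referid") == msgid]
        return max(refs, key=lambda m: m["msgid"]) if refs else self.msgs[msgid]


def send(streams, sender, receivers, text):
    a1 = streams[sender].append(text=text, read_by=[])
    for r in receivers:
        streams[r].append(text=text, sender=sender, senderid=a1, read=False)
    return a1


def mark_read(streams, reader, msgid):
    original = streams[reader].msgs[msgid]
    # Receiver side: a new message points at the old one and marks it read.
    streams[reader].append(referid=msgid, read=True)
    # Sender side: extend the latest read list and write it as a new message.
    sender, senderid = original["sender"], original["senderid"]
    state = streams[sender].latest(senderid)
    streams[sender].append(referid=senderid, read_by=state["read_by"] + [reader])


streams = {u: Stream() for u in "abc"}
a1 = send(streams, "a", ["b", "c"], "hello")
mark_read(streams, "b", 1)   # the copy has msgid 1 in b's own stream
mark_read(streams, "c", 1)
final = streams["a"].latest(a1)
```

Here `final` is the third message in a's stream, matching step 6: the message with the greatest msgid that refers to a1 carries the complete read list.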

11.3 Optimization 1: Asynchronization
When the recipient reads a message, the client must see the sync succeed, but the sender's status does not need to be modified synchronously, because the receiver is not aware of modifications to the sender's status. An asynchronous strategy can therefore reduce the time spent in synchronous calls.

The specific approach is:

1) The recipient's data is written synchronously, so that the client immediately perceives that the message has been successfully read;
2) The sender's data is written asynchronously, reducing synchronization requests;
3) Asynchronous writes ensure success through retries and achieve the goal of eventual consistency in the state.

11.4 Optimization 2: Consolidation processing
When the client receives a large number of messages, the user does not confirm them as read one by one; several messages are marked as read together. To process receipt messages more efficiently, multiple messages can be merged and processed together.

As shown in the figure:

1) X>>A means that X sent a message to A;
2) A confirms 3 messages in one merged request and B confirms 3 messages in one merged request, so only 2 operations are needed to mark 6 messages as read;
3) After mq distribution, updates for the same sender can also be merged. On the sender side, X, Y, and Z each process 2 messages in one merged operation, so 3 merged operations mark all 6 messages.
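
The counts in this example can be reproduced with a simple grouping step. The ack records and the helper name are hypothetical; the sketch only shows how 6 read confirmations collapse into 2 receiver-side and 3 sender-side operations.

```python
from collections import defaultdict


def group_acks(acks, key):
    """Group read confirmations by the given field ('receiver' or 'sender')."""
    grouped = defaultdict(list)
    for ack in acks:
        grouped[ack[key]].append(ack)
    return dict(grouped)


# X, Y, and Z each sent one message to A and one to B: 6 messages in total.
acks = [{"sender": s, "receiver": r} for r in "AB" for s in "XYZ"]
by_receiver = group_acks(acks, "receiver")   # 2 merged receiver-side writes
by_sender = group_acks(acks, "sender")       # 3 merged sender-side writes
```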

Merged processing greatly improves efficiency. The figure below shows call data collected during the online peak period: the optimization saves 44% of the total write volume.

11.5 Solving read-modify-write overwrites
On the sender side, a message is processed by reading the data first, modifying it, and then writing it back over the stored copy. With multiple receivers, the sender's data is written concurrently, and overwrites are unavoidable.

The process is as follows:

1) The read status of one of the sender's messages is X;
2) Receiver a confirms the read, and the read status is changed to X+a;
3) Receiver b confirms the read, and the read status is changed to X+b;
4) Receiver a's status is written first and receiver b's afterward, so the final state is X+b;
5) The correct state is actually X+a+b.

There are only a few ways to deal with this kind of problem.

Scheme 1: because the concurrent operations are distributed, a distributed lock can guarantee consistency: acquire the lock before operating on storage. This scheme is too heavy for a high-frequency, multi-account scenario like this one.

Scheme 2: read and write with a version number. The message stream of an account has only one version lock, so under high-frequency writes, version conflicts are likely and write efficiency is low.

Scheme 3: serialize processing through mq. This avoids the overwrite problem and, crucially, works well with merging: requests for the same account are serialized, and even when the queue backs up, the merge strategy improves processing efficiency.

Enterprise WeChat adopts Scheme 3, serializing requests that carry the same account id. It is simple to implement and requires few logic changes.
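
The overwrite and its serialized fix can be illustrated as below. This is a sketch, not the production mq: `apply_concurrently` imitates two writers that both start from the same stale snapshot, and the hypothetical `SerialQueues` class keeps one FIFO queue per sender account and merges whatever is queued into a single write.

```python
from collections import defaultdict, deque


def apply_concurrently(state, readers):
    """Lost update: every writer starts from the same stale snapshot."""
    snapshot = set(state)
    result = set(state)
    for reader in readers:
        result = snapshot | {reader}   # a later write overwrites the earlier one
    return result


class SerialQueues:
    """One FIFO queue per sender account; updates are applied one at a time."""

    def __init__(self):
        self.queues = defaultdict(deque)
        self.state = defaultdict(set)

    def enqueue(self, account, reader):
        self.queues[account].append(reader)

    def drain(self, account):
        queue = self.queues[account]
        merged = set()
        while queue:                   # merge the backlog into one update
            merged.add(queue.popleft())
        self.state[account] |= merged  # single read-modify-write
        return self.state[account]


lost = apply_concurrently({"x"}, ["a", "b"])   # ends as X+b, a is lost

q = SerialQueues()
q.state["sender"] = {"x"}
q.enqueue("sender", "a")
q.enqueue("sender", "b")
correct = q.drain("sender")                    # ends as X+a+b
```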

12. Design and optimization of to B business function 3: Withdraw the message

12.1 Technical difficulties
"Withdrawing a message" amounts to updating the status of the original message. Can it also be implemented by pointing at the original message with a referid?

As analyzed for receipt messages, pointing with a referid requires knowing the msgid of the original message.

Unlike a receipt message, withdrawing a message must modify the message status for all recipients, not just for the sender and a single recipient. With diffusion write, the message is written to every receiver's message stream, and its msgid is different in each stream. Using referid would therefore require recording the msgid in every receiver's stream.

12.2 Solution
Analysis: withdrawing a message is simpler than a receipt message. It only needs to update the status of the message and does not need the content of the original message. The copies held by all recipients share the same appinfo, so the message can be pointed to by appinfo.

Protocol flow:

1) Users a, b, and c all hold the same message, with appinfo=s and sendtime=t;
2) When a withdraws the message, a withdrawal control message whose body contains {appinfo=s, sendtime=t} is inserted into a's message stream;
3) The client syncs the withdrawal control message, reads appinfo and sendtime from its body, displays the local message with appinfo=s and sendtime=t as withdrawn, and deletes the original message data. The sendtime field is introduced as a second check in case of appinfo collisions;
4) The process for each receiver is the same as for the sender: a withdrawal control message is inserted into the receiver's message stream.
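
On the client side, step 3 can be sketched as a lookup by appinfo with sendtime as the second check. The function name and the message fields other than appinfo and sendtime are hypothetical.

```python
def apply_withdraw(local_messages, control):
    """Mark the matching local message as withdrawn and drop its content."""
    for msg in local_messages:
        if (msg["appinfo"] == control["appinfo"]
                and msg["sendtime"] == control["sendtime"]):
            msg["withdrawn"] = True
            msg["body"] = None      # the original content is deleted locally
            return True
    return False                    # no local message matches both fields


local = [
    {"appinfo": "s", "sendtime": 100, "body": "oops", "withdrawn": False},
    {"appinfo": "s2", "sendtime": 101, "body": "keep", "withdrawn": False},
]
matched = apply_withdraw(local, {"appinfo": "s", "sendtime": 100})
mismatch = apply_withdraw(local, {"appinfo": "s2", "sendtime": 999})
```

The second call leaves the message untouched because sendtime differs, which is the double verification described in step 3.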

The advantages of this scheme are clear: high reliability and a simple protocol.

Logic diagram of message withdrawal:

13. Reflections and summary

The IM messaging architecture of Enterprise WeChat is similar to that of WeChat, but it faces new challenges in to B business scenarios. We analyzed strategies and optimized solutions in light of the product form to ensure the reliability, stability, and security of the message system.

Enterprise WeChat's to B business is complex and has many customized requirements. The message system design must consider generality and extensibility in order to support them. For example, the message withdrawal scheme can be applied to updating any attribute of a message, so it covers more scenarios.

Appendix: More selected articles

[1] Articles about IM architecture design:

"On the architecture design of IM system"
"A brief description of the pits of mobile IM development: architecture design, communication protocol and client"
"A set of mobile IM architecture design practice sharing for massive online users (including detailed graphics and text)"
"An Original Distributed Instant Messaging (IM) System Theoretical Architecture Plan"
"From Zero to Excellence: The Evolution of the Technical Architecture of JD's Customer Service Instant Messaging System"
"Mushroom Street Instant Messaging/IM Server Development Architecture Selection"
"Tencent QQ's 140 million online users' technical challenges and architecture evolution PPT"
"WeChat background based on the time series of massive data cold and hot hierarchical architecture design practice"
"WeChat Technical Director Talks about Architecture: The Way of WeChat-Dao Zhi Jian (Full Speech)"
"How to Interpret 'WeChat Technical Director Talking about Architecture: The Way of WeChat-The Road to the Simple'"
"Rapid Fission: Witness the evolution of WeChat's powerful back-end architecture from 0 to 1 (1)"
"17 Years of Practice: Technical Methodology of Tencent's Massive Products"
"How to ensure the efficiency and real-time performance of large-scale group message push in mobile IM?"
"Discussion on the Synchronization and Storage Scheme of Chat Messages in Modern IM System"
"Technical Challenges and Practice Summary Behind the 100 Billion Visits of WeChat Moments"
"Take Weibo application scenarios as an example to summarize the architectural design steps of massive social systems"
"Behind the glamorous bullet message: the chief architect of Netease Yunxin shares the technical practice of the billion-level IM platform"
"IM development basic knowledge supplementary lesson (5): easy to understand, correctly understand and make good use of MQ message queue"
"WeChat Technology Sharing: Practice of Generating Massive IM Chat Message Sequence Numbers in WeChat (Principles of Algorithms)"
"WeChat Technology Sharing: Practice of Generating Massive IM Chat Message Serial Numbers in WeChat (Disaster Recovery Plan)"
"Beginner's Introduction: A zero-based understanding of the evolution history, technical principles, and best practices of large-scale distributed architectures"
"A set of high-availability, easy-scalable, and high-concurrency IM group chat and single chat architecture design practices"
"Social Software Red Envelope Technology Decryption (1): Comprehensive Decryption of QQ Red Envelope Technology Scheme-Architecture, Technical Implementation, etc."
"Social Software Red Envelope Technology Decryption (2): Decrypt WeChat Red Envelope Technology Evolution from 0 to 1"
"Social Software Red Envelope Technology Decryption (3): The technical details behind the WeChat Shake Red Envelope Rain"
"Social software red envelope technology decryption (4): How does the WeChat red envelope system deal with high concurrency"
"Social software red envelope technology decryption (5): How does the WeChat red envelope system achieve high availability"
"Social software red envelope technology decryption (6): The storage layer architecture evolution practice of WeChat red envelope system"
"Social software red envelope technology decryption (7): Alipay red envelope massive high-concurrency technical practice"
"Social Software Red Envelope Technology Decryption (8): Comprehensive Decryption of Weibo Red Envelope Technology Plan"
"Social software red envelope technology decryption (9): talk about the functional logic, disaster tolerance, operation and maintenance, architecture, etc. of the mobile Q red envelope"
"Social software red envelope technology decryption (10): Mobile QQ client's technical practice for the 2020 Spring Festival red envelope"
"Social Software Red Envelope Technology Decryption (11): Random Algorithm for Decrypting WeChat Red Envelope (including code implementation)"
"Introduction to Instant Messaging: What is Nginx?" Can it achieve IM load balancing? 》
"From guerrilla to regular army (1): the evolution of the IM system architecture of Mafengwo Travel Network"
"From guerrilla to regular army (2): Mafengwo Travel Network's IM Client Architecture Evolution and Practice Summary"
"From Guerillas to Regular Army (3): Technical Practice of Distributed IM System of Mafengwo Travel Network Based on Go"
"IM Development Fundamentals Supplementary Lesson (6): Does the database use NoSQL or SQL? Enough to read this! 》
"The data architecture design of Guazi IM intelligent customer service system (organized from the on-site speech, with supporting PPT)"
"Ali DingTalk Technology Sharing: Enterprise-level IM King-DingTalk's outstanding features in the back-end architecture"
"Design Practice of a New Generation of Mass Data Storage Architecture Based on Time Sequence in WeChat Backend"
"IM Development Basic Knowledge Supplementary Lesson (9): Want to develop an IM cluster? First understand what RPC is! 》
"Ali Technology Sharing: E-commerce IM messaging platform, technical practice in group chat and live broadcast scenarios"
"A set of IM architecture technical dry goods for hundreds of millions of users (Part 1): overall architecture, service split, etc."
"A set of IM architecture technical dry goods for hundreds of millions of users (Part 2): reliability, orderliness, weak network optimization, etc."
"From novice to expert: How to design a distributed IM system with billions of messages"

[2] Original technical articles from QQ and WeChat teams:
"Technical Challenges and Practice Summary Behind the 100 Billion Visits of WeChat Moments"
"Tencent Technology Sharing: How Tencent significantly reduces bandwidth and network traffic (Picture Compression)"
"Tencent Technology Sharing: How Tencent significantly reduces bandwidth and network traffic (Audio and Video Technology)"
"Shared by WeChat Team: Solution to the Multi-phonetic Word Problem in Full-Text Search on WeChat Mobile"
"Tencent Technology Sharing: Cache Monitoring and Optimization Practice of Android Mobile QQ"
"WeChat Team Sharing: Technical Practice of High-Performance Universal Key-value Component of WeChat for iOS"
"WeChat team sharing: How does the iOS version of WeChat prevent group explosions and APP crashes caused by special characters? 》
"Tencent Technology Sharing: Technical Practice of Thread Deadlock Monitoring System for Android Mobile Q"
"Original Sharing by the WeChat Team: Technical Practice of the Memory Monitoring System of WeChat on iOS"
"Make the Internet Faster: A New Generation of QUIC Protocol's Technical Practice Sharing in Tencent"
"IOS background wake-up combat: Summary of WeChat voice reminder technology for receipt of funds"
"Tencent Technology Sharing: The Evolution of Bandwidth Compression Technology for Social Network Pictures"
"WeChat team sharing: super-resolution technology principles and application scenarios of video images"
"WeChat Team Sharing: The Technology Decryption Behind WeChat Real-time Audio and Video Chats 100 Million Times a Day"
"QQ Music Team Sharing: Detailed Explanation of Image Compression Technology in Android (Part 1)"
"QQ Music Team Sharing: Detailed Explanation of Image Compression Technology in Android (Part 2)"
"Shared by the Tencent Team: Detailed Explanation of the Cool Animation Effect of Face Recognition in Mobile QQ"
"Tencent Team Sharing: One-time sharing of tracking process of bugs displayed in pictures in mobile QQ chat interface"
"WeChat team sharing: the pits filled in by the WeChat Android version of the small video encoding"
"The road to optimize the full-text search of local data on WeChat mobile phone"
"Optimization of the synchronization update plan of organizational structure data in the enterprise WeChat client"
"WeChat Team Disclosure: The ins and outs of the super bug "15..."
"QQ 18 years: Decrypting 800 million monthly active QQ background service interface isolation technology"
"How the 889 million monthly active WeChat Super IM WeChat is tested for compatibility on the Android side"
"Take mobile QQ as an example to discuss "light applications" in mobile IM"
"An article get everything about WeChat open source mobile terminal database component WCDB! 》
"Technical Interview with WeChat Client Team Leader: How to Start Client Performance Monitoring and Optimization"
"WeChat background based on the time series of massive data cold and hot hierarchical architecture design practice"
"WeChat team original sharing: the bloatedness of WeChat Android version and the road to modular practice"
"WeChat background team: optimization and upgrade practice sharing of WeChat background asynchronous message queue"
"Original Sharing by the WeChat Team: Practice of Repairing SQLite Database Damage on the WeChat Client"
"Tencent Original Sharing (1): How to greatly increase the speed and success rate of mobile phone QQ picture transmission under mobile networks"
"Tencent Original Sharing (2): How to significantly reduce the data consumption of APP under the mobile network (Part 2)"
"Tencent Original Sharing (3): How to significantly reduce the data consumption of APP under the mobile network (Part 1)"
"WeChat Mars: The network layer packaging library being used inside WeChat, will be open source soon"
"As promised: WeChat's own mobile IM network layer cross-platform component library Mars has been officially open sourced"
"Open source libco library: the cornerstone of the backend framework that supports tens of millions of connections on a single machine and supports 800 million WeChat users [Source code download]"
"WeChat New Generation Communication Security Solution: Detailed Explanation of MMTLS Based on TLS1.3"
"WeChat team original sharing: Android version of WeChat background keep-alive actual sharing (process keep-alive)"
"WeChat team original sharing: Android version of WeChat background keep-alive actual sharing (network keep-alive)"
"The technological evolution of WeChat for Android from 300KB to 30MB (PPT) [Attachment download]"
"WeChat team original sharing: the technological evolution of WeChat for Android from 300KB to 30MB"
"WeChat Technical Director Talks about Architecture: The Way of WeChat-Dao Zhi Jian (Full Speech)"
"WeChat Technical Director Talks about Architecture: The Way of WeChat-Dao Zhi Jian (PPT) [Attachment Download]"
"How to Interpret "WeChat Technical Director Talking about Architecture: The Way of WeChat-The Road to the Simple""
"Background System Storage Architecture Behind Massive WeChat Users (Video + PPT) [Attachment Download]"
"The Practice of Asynchronous Transformation of WeChat: 800 Million Monthly Lives and Tens of Millions of Connections in the Back Office"
"WeChat Moments Massive Technology PPT [Attachment Download]"
"Technical Test and Analysis of WeChat's Influence on the Network (Full Paper)"
"A Concluding Note of WeChat Back-end Technical Architecture"
"The Way of Architecture: 3 Programmers Achieve WeChat Moments with an average daily publishing volume of 1 billion [with video]"
"Rapid Fission: Witness the evolution of WeChat's powerful back-end architecture from 0 to 1 (1)"
"Rapid Fission: Witness the evolution of WeChat's powerful back-end architecture from 0 to 1 (2)"
"WeChat team original sharing: Android memory leak monitoring and optimization skills summary"
"Comprehensive summary of the various "pits" encountered in the iOS version of WeChat upgrading iOS9"
"WeChat team original resource obfuscation tool: Let your APK decrease by 1M"
"WeChat team original Android resource obfuscation tool: AndResGuard [source code]"
"Android version of the WeChat installation package "weight loss" actual combat record"
"The actual combat record of the iOS version of the WeChat installation package "weight loss""
"Mobile terminal IM practice: iOS version of WeChat interface freeze monitoring program"
"Technical Difficulties Behind WeChat "Red Envelope Photos""
"Mobile IM Practice: Technical Solution Record of WeChat Small Video Function of iOS Version"
"Mobile IM Practice: How to Significantly Improve Interactive Performance of WeChat for Android (1)"
"Mobile IM Practice: How to Significantly Improve Interactive Performance on WeChat for Android (2)"
"Mobile IM Practice: Realizing the Smart Heartbeat Mechanism of Android WeChat"
"Mobile IM Practice: Analysis of the Heartbeat Strategy of WhatsApp, Line and WeChat"
"Mobile IM Practice: Google Message Push Service (GCM) Research (from WeChat)"
"Mobile IM Practice: Discussion on the Multi-device Font Adaptation Scheme of WeChat for iOS"
"Carrier Pigeon Team Original: Let's Walk Through the Pit of APNS on iOS10"
"Tencent pigeon technology sharing: practical experience of tens of billions of real-time message push"
"IPv6 Technology Detailed Explanation: Basic Concepts, Application Status, Technical Practice (Part 1)"
"IPv6 Technology Detailed Explanation: Basic Concepts, Application Status, Technical Practice (Part 2)"
"Tencent TEG Team Original: Ten Years of Forging Experience Sharing of MySQL-based Distributed Database TDSQL"
"Interview with WeChat Multimedia Team: Learning from Audio and Video Development, WeChat Audio and Video Technology and Challenges, etc."
"Understanding iOS Push is enough: The most comprehensive iOS Push technology in history"
"Tencent Technology Sharing: The Story Behind WeChat Mini Program Audio and Video Technology"
"Summary of Tencent's Senior Architect Dry Goods: An article to understand all aspects of large-scale distributed system design"
"Interview with Liang Junbin from the WeChat Multimedia Team: Talk about the audio and video technologies I know"
"Tencent Audio and Video Lab: Using AI Black Technology to Achieve Ultra-low Bit Rate HD Real-time Video Chat"
"Tencent Technology Sharing: Technical Ideas and Practice of Intercommunication Between WeChat Mini Program Audio and Video and WebRTC"
"Teach you to read the chat records of Android version of WeChat and Mobile QQ (for technical research and study only)"
"WeChat Technology Sharing: Practice of Generating Massive IM Chat Message Sequence Numbers in WeChat (Principles of Algorithms)"
"WeChat Technology Sharing: Practice of Generating Massive IM Chat Message Serial Numbers in WeChat (Disaster Recovery Plan)"
"Tencent Technology Sharing: Detailed Explanation of GIF Motion Picture Technology and Practice of Mobile QQ Dynamic Expression Compression Technology"
"WeChat team sharing: Kotlin is gradually recognized, a technical early adopter tour of WeChat for Android"
"Shared by the QQ Design Team: The Functional Design Ideas Behind the New Version of QQ 8.0 Voice Messages"
"WeChat team sharing: extreme optimization, practice summary of 3 times faster compilation speed of iOS version of WeChat"
"IM "Scan" function is easy to do? Take a look at the complete technical realization of WeChat "Scan for Knowledge""
"WeChat team sharing: Thoughts on the mobile terminal software architecture brought about by the reconstruction of WeChat payment code"
"IM Development Collection: The most complete in history, a summary of various function parameters and logic rules of WeChat"
"WeChat Team Sharing: The Evolution of 15 Million Online Message Architecture in a Single Room of WeChat Live Chat Room"

This article has been simultaneously published on the official account of "Instant Messaging Technology Circle".

▲ The synchronous publishing link of this article is: http://www.52im.net/thread-3631-1-1.html
