Rongyun Technology Sharing: A Comprehensive Look at the Reliable Delivery Mechanism for Billion-Level IM Messages
This article was originally shared by the Rongyun technical team under the title "Comprehensive Analysis of IM Message Synchronization Mechanism". To make it easier to follow, the content has been reorganized and revised in detail.
1. Content overview
The most basic and most important qualities of an instant messaging (IM) system are the timeliness and accuracy of its messages. Timeliness shows up as latency; accuracy means that messages are not lost, not duplicated, and not delivered out of order.
Taking business scenarios, system complexity, network traffic, and terminal energy consumption into account, we carefully designed the send and receive mechanism of our billion-level distributed IM messaging system and have kept refining it into the reliable message delivery mechanism described here.
The overall idea is:
1) The client and server cooperate and complement each other;
2) Use multiple mechanisms that provide guarantees at different levels;
3) Split the upstream (uplink) and downstream (downlink) paths and handle them separately.
Based on the technical practice of Rongyun's billion-level IM messaging system, this article summarizes the reliable delivery mechanism for distributed IM messages, in the hope that it offers some inspiration for your own IM development and learning.
(This article has been simultaneously published at: http://www.52im.net/thread-3638-1-1.html)
2. Recommended reading
The following related articles are worth reading if you are interested:
"Introduction to zero-based IM development (2): What is the real-time nature of the IM system?"
"Introduction to zero-based IM development (3): What is the reliability of the IM system?"
"Introduction to zero-based IM development (4): What is the message timing consistency of the IM system?"
"Implementation of IM Message Delivery Guarantee Mechanism (1): Guarantee the reliable delivery of online real-time messages"
"Implementation of IM Message Delivery Guarantee Mechanism (2): Guaranteeing the Reliable Delivery of Offline Messages"
"IM development dry goods sharing: how to elegantly realize the reliable delivery of a large number of offline messages"
"Understanding the "Reliability" and "Consistency" Issues of IM Messages and Discussion of Solutions"
"How to ensure the "sequence" and "consistency" of IM real-time messages?"
"IM group chat is so complicated, how to ensure that messages are neither lost nor duplicated?"
"From the perspective of the client to talk about the message reliability and delivery mechanism of the mobile terminal IM"
"A set of IM architecture technical dry goods for hundreds of millions of users (Part 2): Reliability, orderliness, weak network optimization, etc."
"From novice to expert: How to design a distributed IM system with billions of messages"
"On the principle of multi-sign-in and message roaming on mobile IM"
"How to design a "failure retry" mechanism for a completely self-developed IM?"
"IM development dry goods sharing: how do I solve a large number of offline messages causing the client to freeze"
The following are other articles shared by the Rongyun technical team:
"IM Message ID Technology Topic (3): Decrypting the Chat Message ID Generation Strategy of Rongyun IM Products"
"Rongyun Technology Sharing: Real-time Audio and Video First Frame Display Time Optimization Practice Based on WebRTC"
"Rongyun Technology Sharing: Practice of Network Link Keep Alive Technology of Rongyun Android IM Products"
"Instant Messaging Cloud Rongyun CTO's Entrepreneurship Experience Sharing: Technology Entrepreneurship, are you really ready?"
3. The overall principle of message interaction between client and server
A complete IM message interaction usually consists of two segments:
- 1) Upstream segment: the sender delivers the message to the server over the IM real-time channel;
- 2) Downstream segment: the server delivers the message to the final recipient according to a delivery strategy.
3.2 Upstream segment of the message
The upstream segment relies mainly on the IM real-time channel to deliver the message to the server.
Reliable delivery at this stage must be guaranteed at the protocol layer, which needs to provide reliable, ordered, two-way byte stream transmission. We implement this with our self-developed communication protocol, RMTP (RongCloud Message Transfer Protocol).
The client and the server keep a long-lived connection and exchange data over RMTP.
Schematic diagram of RMTP protocol interaction:
As shown in the figure above, the protocol layer uses QoS, ACK and other mechanisms to ensure the reliability of data transmission in the upstream segment of IM messages.
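The article does not publish RMTP's wire format, so the following is only a minimal sketch of the general ACK-and-retry idea on the client side. The class names, the sequence numbers, the 5-second timeout, and the retry limit are all assumptions made for illustration and are not part of RMTP.

```python
from dataclasses import dataclass, field


@dataclass
class PendingSend:
    payload: bytes
    sent_at: float
    attempts: int = 1


@dataclass
class UpstreamSender:
    """Hypothetical client-side bookkeeping for acknowledged upstream sends."""
    ack_timeout: float = 5.0   # assumed value, in seconds
    max_attempts: int = 3      # assumed value
    next_seq: int = 1
    pending: dict = field(default_factory=dict)

    def send(self, payload: bytes, now: float) -> int:
        """Register a message as in flight and return its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = PendingSend(payload, now)
        return seq

    def on_ack(self, seq: int) -> None:
        """The server acknowledged `seq`, so stop tracking it."""
        self.pending.pop(seq, None)

    def due_for_retry(self, now: float) -> list:
        """Return sequence numbers to resend; drop those out of attempts."""
        retry = []
        for seq, item in list(self.pending.items()):
            if now - item.sent_at < self.ack_timeout:
                continue
            if item.attempts >= self.max_attempts:
                del self.pending[seq]  # would be reported to the app as a failed send
                continue
            item.attempts += 1
            item.sent_at = now
            retry.append(seq)
        return retry
```

Because a retried message may reach the server twice, a scheme like this only avoids duplicates if the server also deduplicates, for example by message ID.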
3.3 Downstream segment of the message
In summary, the downstream segment involves three main behaviors.
1) The client actively pulls messages. An active pull is triggered in two ways:
① Pull offline messages: triggered when the client successfully establishes a new connection to the IM service, to fetch the messages it missed while offline;
② Pull messages periodically: a timer starts when the client last received a message and fires, for example, every 3-5 minutes. It serves two purposes: it guards against notifications lost to uncertain factors such as the network or intermediate devices, which would leave the server and client states inconsistent, and the request also keeps the state machine alive for the business layer.
2) The server actively sends messages (direct send):
This is one of the two online delivery mechanisms: the server sends the message content directly to the client. It suits low message frequency with continuous interaction, such as an ordinary conversation between two people or in a group.
3) The server actively sends notifications (notification pull):
This is the other online delivery mechanism: the server sends the client a notification that carries a timestamp and other content usable as a sort index. On receiving it, the client compares the notification timestamp with its own local data and, if it is behind, starts pulling messages.
This mechanism suits high message volume, for example a user who belongs to many large groups whose members are all actively chatting. Notification pull effectively reduces the number of network round trips between client and server and lets several messages be packed together, increasing the effective payload per exchange while preserving both timeliness and performance.
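The periodic pull in item ② can be pictured as a countdown that restarts whenever a message arrives. The sketch below is illustrative only: `PullTimer` and its methods are hypothetical names, and the 180-second default simply takes the low end of the 3-5 minute range mentioned above.

```python
class PullTimer:
    """Hypothetical helper that decides when the client should pull periodically."""

    def __init__(self, interval: float = 180.0):
        self.interval = interval      # seconds between periodic pulls (assumed)
        self.last_activity = 0.0      # time of the last message or pull

    def on_message(self, now: float) -> None:
        """Any received message restarts the countdown."""
        self.last_activity = now

    def should_pull(self, now: float) -> bool:
        """True once `interval` seconds pass without a message; restarts the countdown."""
        if now - self.last_activity >= self.interval:
            self.last_activity = now
            return True
        return False
```

A client event loop would call `should_pull` on each tick and issue a pull request whenever it returns `True`.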
Schematic diagram of the message interaction between the client and the server in the downlink segment:
4. The specific realization of the message interaction between the client and the server
As mentioned in the previous section, we split the message interaction into an upstream process and a downstream process.
The upstream process guarantees the order in which messages are sent. The best way to do this is to partition messages by userId and then sort them by timestamp. In a distributed deployment, users are therefore assigned to a fixed service server (that is, all terminals of the same account connect to the same server), which makes upstream ordering easier. Because all of a user's terminals sit on the same server, multi-terminal state is also easier to maintain.
Client connection process:
- 1) The client obtains the token used for connecting from the APP server;
- 2) The client uses the token to ask the navigation service which IM access server (CMP) to connect to. The navigation service computes the access server from the userId and returns it, so that a given user's clients always connect to the same access server (CMP).
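The article does not say how the navigation service maps a userId to an access server. One simple possibility, shown here purely as an assumption, is a stable hash of the userId taken modulo the number of servers. The function name and server names are hypothetical.

```python
import hashlib


def pick_access_server(user_id: str, servers: list) -> str:
    """Map a userId to one access server deterministically (hash-mod sketch)."""
    if not servers:
        raise ValueError("no access servers available")
    # A cryptographic digest is used only because it is stable across processes,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

With plain hash-mod, changing the server list remaps most users. A production system would more likely use consistent hashing or an explicit routing table, but the article gives no detail on this point.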
The schematic diagram is as follows:
In summary: after the client sends a message, the access service routes it by userId to the designated message server, which generates the message ID and sets the message's timestamp with reference to the time of the previous message (if the same timestamp already exists, the new one is moved later). The timestamp and message ID are then returned to the client in the Ack. The upstream message is cached and persisted under userId + timestamp, and all subsequent business operations use this timestamp.
The above business process is called the upstream process, and the message stored in the upstream process is the outbox message.
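The timestamp rule in the upstream process (move the timestamp later if it would repeat the previous one) can be sketched as a per-user monotonic clock. `UpstreamClock` is a hypothetical name, and millisecond integer timestamps are an assumption.

```python
class UpstreamClock:
    """Hypothetical per-sender clock that never repeats or goes backwards."""

    def __init__(self):
        self.last_ts = {}  # user_id -> last assigned timestamp (ms)

    def assign(self, user_id: str, now_ms: int) -> int:
        """Return a timestamp strictly greater than the user's previous one."""
        last = self.last_ts.get(user_id, -1)
        ts = now_ms if now_ms > last else last + 1
        self.last_ts[user_id] = ts
        return ts
```

For example, two messages from the same user that arrive within the same millisecond receive consecutive timestamps, so sorting by userId + timestamp reproduces the send order.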
PS: The message ID deserves a brief explanation:
We use a globally unique message ID generation strategy, so that every message can be identified and deduplicated by its ID. The structure of the message ID is shown in the figure below.
For how to generate unique IDs in a distributed setting, see "IM Message ID Technology Topic (3): Decrypting the Chat Message ID Generation Strategy of Rongyun IM Products".
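Given a globally unique ID, deduplication on the receiving side reduces to remembering which IDs have already been seen. The sketch below is an illustration under assumptions: the class name, the string IDs, and the bounded capacity are hypothetical and do not reflect the actual ID structure or client SDK.

```python
from collections import OrderedDict


class MessageDeduplicator:
    """Hypothetical bounded record of recently seen message IDs."""

    def __init__(self, capacity: int = 10000):
        self.capacity = capacity
        self.seen = OrderedDict()

    def accept(self, message_id: str) -> bool:
        """Return True the first time an ID is seen, False for a repeat."""
        if message_id in self.seen:
            return False
        self.seen[message_id] = None
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # evict the oldest remembered ID
        return True
```

The capacity bound keeps memory use fixed, at the cost that a very old duplicate could be accepted again after its ID has been evicted.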
After the message node finishes the upstream process, it delivers the message to the message node that hosts the target user, where the message enters the downstream process.
In the downstream process, the target userId and the timestamp generated for this message during the upstream process are used to work out whether the timestamp needs to be updated (moved forward).
If it does, the timestamp is incremented until it no longer repeats any timestamp already recorded for that user.
After this processing, the target user's stored messages and the deduplication performed by the client on receipt stay consistent, and timestamps within the same session are ordered. This ensures that messages for the same receiving user never appear out of order.
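The downstream timestamp adjustment can be sketched as a per-recipient inbox keyed by timestamp, where a colliding timestamp is incremented until it is free. `Inbox` and its methods are hypothetical names, and integer timestamps are an assumption.

```python
class Inbox:
    """Hypothetical per-recipient store keyed by a unique timestamp."""

    def __init__(self):
        self.by_ts = {}  # timestamp -> message_id

    def store(self, upstream_ts: int, message_id: str) -> int:
        """Store a message, bumping the timestamp until it is unused."""
        ts = upstream_ts
        while ts in self.by_ts:
            ts += 1
        self.by_ts[ts] = message_id
        return ts

    def since(self, ts: int) -> list:
        """Return (timestamp, message_id) pairs newer than `ts`, in order."""
        return [(t, self.by_ts[t]) for t in sorted(self.by_ts) if t > ts]
```

Because every stored timestamp is unique for the recipient, a client that pulls "everything after my latest local timestamp" neither skips nor repeats a message.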
So far we have outlined the downstream interaction process. Its concrete implementation is not that simple, so it is described in detail below.
1) Direct send:
That is, the server actively sends the message (to the target client):
- 1) The client SDK uses the timestamp of the latest locally stored message for sorting and related logic;
- 2) Only one message at a time is sent directly to a given user; the rest are turned into notifications. On a notification-triggered pull, the client uses the timestamp of its latest local message as the pull start time;
- 3) If the previous direct send has not finished, the next message is not sent directly (s_msg); a notification (s_ntf) is sent instead.
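The choice between `s_msg` and `s_ntf` can be sketched as a per-user "direct send in flight" flag. This is a simplification under assumptions: the class name is hypothetical, and treating the client's ack as the moment the previous send "finishes" is a guess, since the article's state machine diagram is not reproduced here.

```python
class DirectOrNotify:
    """Hypothetical per-user tracker for choosing s_msg or s_ntf."""

    def __init__(self):
        self.in_flight = set()  # users with an unacknowledged direct message

    def deliver(self, user_id: str) -> str:
        """Return the signaling type to use for the next message to `user_id`."""
        if user_id not in self.in_flight:
            self.in_flight.add(user_id)
            return "s_msg"   # send the message content directly
        return "s_ntf"       # a direct send is pending, so only notify

    def on_ack(self, user_id: str) -> None:
        """Assumed: the client's ack ends the in-flight direct send."""
        self.in_flight.discard(user_id)
```

Under this rule at most one direct message per user is outstanding at any time, and anything that arrives in the meantime is fetched later by a notification-triggered pull.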
Direct-send logic diagram:
2) Notification pull:
That is, the server actively sends a notification (to the target client):
- 1) The server puts the current message timestamp in the notification body and delivers the notification to the client;
- 2) On receiving the notification, the client compares it with its local message timestamp and decides whether to send a pull request;
- 3) On receiving the pull request, the server queries the message list starting from the timestamp carried in the request (200 messages or 5M per response) and replies to the client;
- 4) After receiving the response, the client sends an ack to the server, and the server updates its state;
- 5) The timestamp the client uses when pulling is that of its latest local message.
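Steps 2) and 5) on the client side amount to a timestamp comparison. A minimal sketch follows, in which the function name and the dictionary shape of the pull request are hypothetical.

```python
from typing import Optional


def on_notification(local_latest_ts: int, notified_ts: int) -> Optional[dict]:
    """Return a pull request, or None if the client is already up to date."""
    if notified_ts <= local_latest_ts:
        return None  # nothing newer on the server than what is stored locally
    # The pull starts from the client's own latest timestamp, not the notified one,
    # so any messages between the two are included.
    return {"type": "pull", "since": local_latest_ts}
```

Starting from the local timestamp rather than the notified one is what lets a single pull recover several messages, including any whose notifications were lost.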
In the above figure, steps 3-7 may need to loop several times, for two reasons:
- a. If the client receives too many messages at once, the response becomes large and the transfer demands better network quality, so messages are returned in batches limited by count and size;
- b. If too many messages are pulled at once, processing them consumes a lot of client resources and may cause freezes and a poor experience.
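The batched pull loop can be sketched as follows. The function names and the in-memory message list are hypothetical, and reading "5M" as 5 MiB is an assumption, since the article does not state the unit.

```python
MAX_COUNT = 200
MAX_BYTES = 5 * 1024 * 1024  # assumes "5M" means 5 MiB


def next_batch(messages, since_ts, max_count=MAX_COUNT, max_bytes=MAX_BYTES):
    """Server side: return (batch, has_more) for messages newer than since_ts.

    `messages` is a list of (timestamp, payload_bytes) sorted by timestamp.
    """
    remaining = [m for m in messages if m[0] > since_ts]
    batch, size = [], 0
    for ts, payload in remaining:
        # Always include at least one message so an oversized one cannot stall the loop.
        if batch and (len(batch) >= max_count or size + len(payload) > max_bytes):
            break
        batch.append((ts, payload))
        size += len(payload)
    return batch, len(batch) < len(remaining)


def pull_all(messages, local_latest_ts, **limits):
    """Client side: keep pulling from the latest local timestamp until caught up."""
    received = []
    has_more = True
    while has_more:
        batch, has_more = next_batch(messages, local_latest_ts, **limits)
        if not batch:
            break
        received.extend(batch)
        local_latest_ts = batch[-1][0]  # the next pull starts after the newest message
    return received
```

Each round advances the client's latest timestamp, so the loop ends once the server reports nothing newer.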
3) Switching between direct send and notification pull on the server side:
This mainly involves updating the state machine.
The following schematic diagram shows how the state machine is updated across the direct send and notification pull processes:
At this point the core send and receive process has been covered. The rest of the article covers multi-terminal online message synchronization.
5. Multi-terminal online message synchronization
Following the upstream and downstream phases of a message, multi-terminal synchronization is likewise split into sender-side and receiver-side synchronization.
5.1 Multi-terminal synchronization of the sender
In the client connection process described earlier (see section 4 of this article), all clients of the same user are gathered on the same service, so maintaining multiple terminals for one userId becomes very simple.
The specific logic is:
- 1) After several of the user's terminals have connected, one of them sends a message. When the message arrives at the CMP (IM access service), the CMP performs basic checks and then looks up the connections of the user's other terminals;
- 2) The service wraps the client's upstream message as a server downstream message and delivers it directly to the user's other clients. This completes the sender-side multi-terminal copy (CC); the message is then handed to the IM service and enters the normal delivery process.
As point 2) shows, sender-side multi-terminal synchronization does not pass through the IM Server. The advantages are:
- 1) It is faster;
- 2) The fewer service nodes a message passes through, the lower the probability of failure.
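The sender-side copy at the access layer can be sketched as follows. `AccessServer`, its in-memory outboxes, and the `forwarded` list standing in for the hand-off to the IM service are all hypothetical stand-ins for real connections and RPC calls.

```python
class AccessServer:
    """Hypothetical CMP that copies a sender's message to their other terminals."""

    def __init__(self):
        self.connections = {}  # user_id -> {device_id: list of delivered messages}
        self.forwarded = []    # messages handed on to the IM service

    def connect(self, user_id: str, device_id: str) -> None:
        self.connections.setdefault(user_id, {})[device_id] = []

    def on_upstream(self, user_id: str, device_id: str, message: str) -> None:
        """Copy to the sender's other terminals, then continue normal delivery."""
        for other, outbox in self.connections.get(user_id, {}).items():
            if other != device_id:
                outbox.append(message)
        self.forwarded.append((user_id, message))
```

The copy works only because all terminals of one user are connected to the same access server, which is exactly what the userId-based routing in section 4 guarantees.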
5.2 Multi-terminal synchronization of the receiver
The specific logic is:
1) After the IM service receives the message, it first determines the receiver's delivery scope, that is, which of the receiving user's terminals should get the message;
2) The IM service sends the scope and the message to the CMP, which matches the receiver's terminals against the scope and delivers the message.
In most cases the receiver-side synchronization scope is all terminals.
Some special services differ, however. For example, when client A controls the state of another terminal, it may need to send command messages, and the scope is then used to deliver them only to the targeted terminal.
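Scope matching at the CMP can be pictured as a filter over the receiver's connected terminals. The article does not describe how a scope is represented, so the set of platform names used here, and the function name, are assumptions.

```python
def match_terminals(connected: dict, scope=None) -> list:
    """Return the device IDs that should receive a message.

    `connected` maps device_id -> platform name.
    `scope` is a set of platform names, or None to mean all terminals.
    """
    if scope is None:
        return sorted(connected)
    return sorted(d for d, platform in connected.items() if platform in scope)
```

For an ordinary chat message the scope would be left as `None`, while a command message aimed at one kind of terminal would pass a narrower scope such as `{"pc"}`.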
This concludes our walkthrough of the core IM message processing flow, which breaks the logic down layer by layer to provide a reliable message delivery mechanism.
This article has been simultaneously published on the official account of "Instant Messaging Technology Circle".