Chat technology in live broadcast system (6): technology practice of real-time chat message distribution in live broadcast room with millions of people online

This article was originally shared by the Rongyun technical team. The original title "Message Discarding Strategy for Mass Message Distribution in Chat Rooms" has been revised.

1. Introduction

With the popularization of live broadcast applications, and especially of live-stream selling (promoting goods during a live broadcast), live broadcast rooms with very large audiences have become the norm.

Real-time interaction in such rooms is very frequent. Technically, it takes the form of real-time messages: user chat, bullet comments (barrage), gifts, likes, mutes, system notifications, and so on.

Distributing this volume of real-time messages without crashing the server, and delivering them to the client without making the app scroll frantically and freeze (which would hurt the user experience), clearly requires dedicated techniques and implementation strategies.

Technically, real-time message distribution in a live broadcast room follows the same model as the traditional online chat room. Chat rooms in the traditional Internet era simply never had this many simultaneous users. The scale is different, but the technical model applies in full.

Drawing on our live broadcast practice, this article summarizes our technical experience with real-time message distribution for a single live broadcast room with millions of users online. We hope it gives you some inspiration.

Learning and exchange:

Introductory article on mobile IM development: "One entry is enough for beginners: developing mobile IM from scratch"
Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK
(This article has been simultaneously published on: http://www.52im.net/thread-3799-1-1.html)

2. Series of articles

This article is the sixth in a series of articles:

"Chat Technology of Live Broadcasting System (1): The Road to Real-time Push Technology Practice of the Million Online Meipai Live Broadcasting Barrage System"
"Chat technology of live broadcast system (2): Alibaba e-commerce IM messaging platform, technical practice in the scene of group chat and live broadcast"
"Chat technology of live broadcast system (3): The evolution of the message structure of WeChat live chat room with 15 million online messages in a single room"
"Chat Technology of Live Broadcasting System (4): Real-time Messaging System Architecture Evolution Practice for Massive Users of Baidu Live Broadcasting"
"Chat technology of live broadcast system (5): Cross-process rendering and streaming practice of WeChat mini-game live broadcast on Android side"
"Live System Chat Technology (6): Real-time Chat Message Distribution Technology Practice in Live Room with Millions of People Online" (* this article)

3. Technical challenges

Let's take a live broadcast room watched by a million people as an example and look at the technical challenges involved.

1) Message peaks come in waves during a live broadcast, for example "screen-flooding" messages: a massive number of real-time messages sent by many users within the same short period. The content of such messages is usually more or less identical. If every message were displayed, the client would likely lag and show messages late, seriously hurting the user experience.

2) With massive message volumes, storing every message on the server for a long time would cause service cache usage to surge, turning memory and storage into performance bottlenecks.

3) Other messages, such as the notification sent after a room administrator performs an operation, or system notifications, are generally more important. The challenge is how to give priority to their delivery rate.

To meet these challenges, our services have to be optimized for the specific business scenarios.

4. Architecture model

Our architecture model diagram is as follows:

The main services shown in the figure above are briefly described below.

1) Live room service:

Its main function is to cache the basic information of the live room, including the user list, mute/ban relationships, whitelisted users, and so on.

2) Message service:

Its main function is to cache the user relationship information and message queue information that the node needs to process.

Specifically, it handles the following two things.

User relationship synchronization in the live broadcast room:

a) When a member actively joins or leaves: the live room service synchronizes the change to ==> the message service;
b) When a user is found to be offline after a message is distributed: the message service synchronizes the change to ==> the live room service.

Sending a message:

a) After the necessary validation passes, the live room service broadcasts the message to the message service;
b) The live room service does not cache the message content.

3) Zk (i.e. ZooKeeper):

Every service instance registers itself with Zk. The registered data is used to compute the placement point (the target instance) when requests flow between services.

Specifically:

a) Live room service: placement by live room ID;
b) Message service: placement by user ID.
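
The article does not say how the placement point is computed. As a minimal sketch, assuming a consistent-hash ring built from the instance list registered in Zk (the class `HashRing`, the instance names, and the key formats below are all hypothetical), placement by live room ID and by user ID could look like this:

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Toy consistent-hash ring. In the article's setup the node list
    would come from the instances registered in Zk (assumption)."""

    def __init__(self, nodes, replicas=100):
        # Each node gets several virtual points to spread keys more evenly.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # Stable across processes, unlike the built-in hash() for strings.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def locate(self, key):
        # First virtual point clockwise from the key's hash, wrapping around.
        idx = bisect_right(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical instance names; one ring per service type.
room_ring = HashRing(["room-svc-1", "room-svc-2"])
msg_ring = HashRing(["msg-svc-1", "msg-svc-2", "msg-svc-3"])

room_node = room_ring.locate("room:42")   # live room service: by live room ID
user_node = msg_ring.locate("user:1001")  # message service: by user ID
```

With this kind of scheme, every caller that sees the same instance list computes the same placement point for a given ID, so no central router is needed. A plain `hash(id) % node_count` would also work but would remap most keys whenever an instance joins or leaves.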

4) Redis:

It is mainly used as a second-level cache and as a backup of in-memory data when a service is updated (restarted).

5. Overall scheme of message distribution

The complete message distribution logic of the live room service consists of two parts: the message distribution process and the message pull process.

5.1 Message Distribution Process

As shown in the figure above, our message distribution process mainly consists of the following steps:

1) User A sends a message in the live room, which is first processed by the live room service;
2) The live room service synchronizes messages to each message service node;
3) The message service sends a pull notification to all members cached on this node;
4) For example, "message service-1" in the figure above sends a notification to user B.

In addition, because the message volume is so large, distribution uses a notification merging mechanism, which applies mainly to step 3 above.

The notification merging mechanism in step 3 works as follows:

a) Add all members to the to-be-notified queue (if a member is already in it, update the notification message time);
b) The dispatch thread polls the to-be-notified queue;
c) Pull notifications are sent to the users in the queue.

With notification merging, the dispatch thread sends at most one pull notification to the same user per round, that is, multiple messages are merged into a single pull notification. This effectively improves server performance and reduces network consumption between the client and the server.
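
The merging steps above can be sketched in Python as follows. This is only an illustration under assumptions: the class `NotificationMerger`, the `send_notification` callback, and the use of a plain dictionary as the to-be-notified queue are hypothetical and not taken from the article's implementation.

```python
import threading
import time


class NotificationMerger:
    """Collapses many new-message events into at most one pull
    notification per user per dispatch round."""

    def __init__(self, send_notification):
        self._send = send_notification  # hypothetical callback: f(user_id)
        self._pending = {}              # user_id -> time of the latest message
        self._lock = threading.Lock()

    def on_new_message(self, member_ids, now=None):
        """Step a): enqueue every member, or refresh the time if present."""
        now = time.time() if now is None else now
        with self._lock:
            for user_id in member_ids:
                self._pending[user_id] = now

    def dispatch_round(self):
        """Steps b) and c): take the current queue and notify each user once."""
        with self._lock:
            batch, self._pending = self._pending, {}
        for user_id in batch:
            self._send(user_id)
        return len(batch)
```

In this sketch a dispatch thread would call `dispatch_round()` in a loop. However many messages arrive between two rounds, each user appears in the dictionary only once, so each user receives a single pull notification for that round.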

PS: The notification merging mechanism above lends itself well to an Actor-model implementation when the message volume is high. Interested readers can further study "The Actor Model is So Excellent under Distributed High Concurrency" and "Distributed Computing Technology: The Actor Computation Mode".

5.2 Message Pull Process

As shown in the figure above, our message pulling process mainly consists of the following steps:

1) After receiving the notification, user B sends a pull request to the server;
2) The request is handled by the "message service-1" node;
3) "Message service-1" returns a list of messages from the message queue based on the timestamp of the last message supplied by the client (see the following figure for the principle);
4) User B receives the new messages.

The detailed logic of pulling messages in step 3 is shown in the following figure:
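
In code terms, the timestamp-based pull might look like the minimal Python sketch below. The class `MessageRingQueue`, the `limit` parameter, and the millisecond timestamps are assumptions made for illustration. The article only states that messages are returned from a ring queue according to the timestamp of the client's last message.

```python
from collections import deque


class MessageRingQueue:
    """Bounded per-room message queue. When it is full, appending a new
    message silently evicts the oldest one."""

    def __init__(self, max_len):
        self._buf = deque(maxlen=max_len)

    def append(self, timestamp_ms, payload):
        self._buf.append((timestamp_ms, payload))

    def pull_since(self, last_ts_ms, limit=50):
        """Return up to `limit` messages strictly newer than last_ts_ms."""
        newer = [m for m in self._buf if m[0] > last_ts_ms]
        return newer[:limit]


queue = MessageRingQueue(max_len=3)
for ts in (1, 2, 3, 4):
    queue.append(ts, f"msg-{ts}")

first_pull = queue.pull_since(0)   # message 1 was already evicted
second_pull = queue.pull_since(3)  # only messages after timestamp 3
```

The client keeps the timestamp of the last message it received and sends it with the next pull. A real implementation would also need a tie-breaker (for example a sequence number) for messages that share a timestamp, which this sketch ignores.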

6. Discard strategy for message distribution

For users in a live broadcast room, many messages carry little practical meaning, for example large numbers of repeated screen-flooding messages and dynamic notifications. To improve the user experience, such messages can be discarded strategically. (This is the biggest difference from real-time chat messages in IM, where losing messages is not allowed.)

PS: The discard strategy for message distribution in the live broadcast room, together with the notification merging mechanism from the previous section, is what makes it possible to distribute massive message volumes smoothly.

Our drop strategy mainly consists of the following 3 parts:

1) Uplink rate limit control (drop) strategy;
2) Downlink rate limit control (drop) strategy;
3) Important message anti-drop strategy.

As shown below:

Let's explain them one by one.

1) Uplink rate limit control (drop) strategy:

The uplink rate limit defaults to 200 messages per second and can be adjusted according to business needs. Messages sent after the limit is reached are discarded in the live room service and are not synchronized to the message service nodes.
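
A rough sketch of such an uplink limit, assuming a fixed one-second window kept per live room (the article does not say which rate-limiting algorithm is used, and `UplinkRateLimiter` is a hypothetical name):

```python
import time


class UplinkRateLimiter:
    """Fixed-window limiter: at most `limit_per_second` messages are
    accepted in each one-second window; the rest are dropped."""

    def __init__(self, limit_per_second=200, clock=time.monotonic):
        self._limit = limit_per_second
        self._clock = clock   # injectable so the limiter can be tested
        self._window = None   # index of the current one-second window
        self._count = 0       # messages accepted in the current window

    def allow(self):
        window = int(self._clock())
        if window != self._window:
            # A new second has started: reset the counter.
            self._window, self._count = window, 0
        if self._count >= self._limit:
            return False      # over the limit: the caller discards the message
        self._count += 1
        return True
```

The live room service would call `allow()` for each incoming message and forward it to the message service nodes only when the result is `True`. A token bucket or sliding window would smooth bursts at window boundaries, at the cost of slightly more state.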

2) Downlink rate limit control (drop) strategy:

The downlink rate limit is in effect a limit on the length of the message ring queue (see the detailed pull logic diagram in "5.2 Message Pull Process"): once the queue reaches its maximum length, the oldest messages are evicted and discarded.

Each time the server sends a pull notification, it marks the user as "pulling". The mark is removed once the user actually pulls the messages.

The role of the pull mark: when a new message is generated and the user already has a pull mark, no notification is sent if the mark was set within the last 2 seconds (this reduces pressure on the client; the notification is dropped, not the message). If the mark is older than 2 seconds, a notification is sent again. (If several consecutive notifications go unanswered, a policy that kicks the user out is triggered, which is not covered here.)

Whether messages are discarded therefore depends on how fast the client pulls (which is affected by client performance and the network). A client that pulls in time loses no messages.
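
The pull-mark logic can be sketched as below. The 2-second threshold comes from the text. The class `PullMarks`, the string return values, and the `max_unanswered` threshold of 3 are hypothetical, since the article does not give the number of unanswered notifications that triggers the kick-out policy.

```python
class PullMarks:
    """Tracks which users have an outstanding pull notification."""

    def __init__(self, resend_after=2.0, max_unanswered=3):
        self._resend_after = resend_after      # seconds, from the article
        self._max_unanswered = max_unanswered  # assumed value
        self._marks = {}  # user_id -> (time mark was set, unanswered count)

    def on_new_message(self, user_id, now):
        """Decide what to do for this user: 'notify', 'skip' or 'kick'."""
        mark = self._marks.get(user_id)
        if mark is None:
            self._marks[user_id] = (now, 1)
            return "notify"
        set_at, unanswered = mark
        if now - set_at <= self._resend_after:
            return "skip"   # drop the notification, not the message
        if unanswered >= self._max_unanswered:
            del self._marks[user_id]
            return "kick"   # too many notifications went unanswered
        self._marks[user_id] = (now, unanswered + 1)
        return "notify"

    def on_pull(self, user_id):
        """The client actually pulled: clear the mark."""
        self._marks.pop(user_id, None)
```

Note that a skipped notification never loses a message by itself. The message stays in the ring queue and is returned by the next pull, unless the queue overflows before the client gets to it.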

3) Important message anti-drop strategy:

As mentioned above: in the live room scenario, some messages should have higher priority and should not be discarded.

For example: a notification message or a system notification after the room administrator of the live broadcast room performs an operation.

For this scenario, we introduced the concepts of a message whitelist and message priority to ensure that such messages are not discarded. As shown in the figure at the beginning of this section, there can be multiple message ring queues, which keep important messages separate from ordinary live room messages so that they are not discarded.
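
As an illustration of separate ring queues, the sketch below routes whitelisted message types into their own queue so that a flood of ordinary chat cannot evict them. The type names in `IMPORTANT_TYPES`, the class `RoomQueues`, and the queue sizes are hypothetical.

```python
from collections import deque

# Hypothetical whitelist of message types that must not be dropped.
IMPORTANT_TYPES = {"admin_notice", "system_notice"}


class RoomQueues:
    """Two bounded queues per room: ordinary chat can only evict
    ordinary chat, never whitelisted messages."""

    def __init__(self, normal_len, important_len):
        self.normal = deque(maxlen=normal_len)
        self.important = deque(maxlen=important_len)

    def append(self, ts, msg_type, payload):
        target = self.important if msg_type in IMPORTANT_TYPES else self.normal
        target.append((ts, msg_type, payload))

    def pull_since(self, last_ts):
        """Merge both queues and return messages newer than last_ts in time order."""
        merged = [m for q in (self.important, self.normal)
                  for m in q if m[0] > last_ts]
        return sorted(merged, key=lambda m: m[0])


queues = RoomQueues(normal_len=2, important_len=10)
queues.append(1, "system_notice", "user X was muted")
for ts in range(2, 7):
    queues.append(ts, "chat", "666")   # flood of ordinary messages

pulled = queues.pull_since(0)
```

Here the ordinary queue keeps only the two newest chat messages, while the system notification from timestamp 1 is still delivered. The important queue is bounded as well in this sketch, so the guarantee holds only while important messages stay rare relative to its length.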

Through the "1) Uplink rate limit control (drop) strategy" and "2) Downlink rate limit control (drop) strategy" above, the following is guaranteed:

1) The client does not lag or fall behind because of massive message volumes;
2) Messages do not scroll past too fast to be read with the naked eye;
3) Storage pressure on the server is reduced, so the service is not affected by memory bottlenecks caused by massive message volumes.

7. Closing remarks

With the development of the mobile Internet, the business model of real-time messaging in live broadcast rooms, and the load it brings, keep expanding and changing, and more challenges may lie ahead. Our services will keep pace with the times and respond with better solutions and strategies.


