Guide:
NetEase Yunxin's new top-tier IM product, "Circle Group", has attracted a great deal of attention since its debut. Many Yunxin customers integrating it are also keen to understand the technical details and principles behind it. For this reason, we are launching a series of technical articles on "Circle Group" to share some of NetEase Yunxin's thinking on its technical design.
Text|Cao Jiajun, Senior Server Development Engineer, NetEase Yunxin
1. Technical selection
Before getting into the technical details of "Circle Group", let's analyze its technical characteristics. (For background on "Circle Group", see these two articles: "Industry Insights丨Behind Discord's rush and the long term of NetEase Yunxin 'Circle Group' Doctrine" and "officially out of 'Circle'丨NetEase Yunxin Circle Group's short-term planning and foresight".) What are the defining features of a "Circle Group" product? The first is the secondary structure of server/channel. The second is the large-scale community (hundreds of thousands or even millions of members in a single server) built on that structure, together with a complex identity group system for managing the organization and members of such a community.
So for such a novel IM system, how should it be implemented technically?
A simple idea is to adapt the existing IM system. For a Discord-like community such as "Circle Group", the first thought is to extend our group function, since at first glance the two are quite similar in many respects. Here is a simple comparison:
As the table above shows, the two biggest differences between "Circle Group" and a group are capacity and the secondary structure. Other features, such as identity groups and personalized push strategies, look as if they would only need adapting. So should we simply find a way to increase group capacity and wrap the secondary structure at the business layer? The answer is clearly no, or at least extending groups is not a good idea.
The first is the secondary structure. In a Discord-like secondary structure, members are managed at the server level, channel members are inherited from the server, and channels carry many visibility settings on top (Yunxin "Circle Group" provides a blacklist/whitelist mechanism, and Discord provides a view-channel permission). Under this mechanism, any member change at the server level may affect the member list of some or all channels. Faced with this structure, groups could implement it in two ways: N groups that logically belong to the same server, or one group mapped to one server. Either way, even setting aside message delivery, the coupling and tangled complexity of the member management logic alone is enough to dissuade anyone.
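To make the inheritance concrete, here is a minimal sketch of channel membership derived from server membership. The `Server` and `Channel` classes, and the rule that a public channel applies a blacklist while a private channel applies a whitelist, are assumptions made for illustration; they are not Yunxin's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical channel model (assumption, not the real schema)."""
    name: str
    public: bool = True
    blacklist: set[str] = field(default_factory=set)  # hidden from these members when public
    whitelist: set[str] = field(default_factory=set)  # visible only to these members when private


@dataclass
class Server:
    """Hypothetical server model: members are managed only at this level."""
    members: set[str] = field(default_factory=set)
    channels: dict[str, Channel] = field(default_factory=dict)

    def channel_members(self, channel_name: str) -> set[str]:
        """Channel membership is computed from server membership, never stored."""
        channel = self.channels[channel_name]
        if channel.public:
            return self.members - channel.blacklist
        return self.members & channel.whitelist

    def remove_member(self, account: str) -> None:
        """One server-level change implicitly changes every channel's member list."""
        self.members.discard(account)
```

Because channel membership is computed rather than stored, a single server-level change never needs to be copied into each channel, which is exactly the coupling that a group-per-channel design would have to maintain by hand.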
The second is capacity. A conventional group generally holds only a few hundred members and can be expanded to a few thousand at most. To manage group members we generally combine full and incremental synchronization, so that the client and the server hold the same mirror of the group (group information, group members, and so on). Many operations, such as displaying and searching group members or @ mentions in messages, can then be performed purely on the client. "Circle Group", however, requires a capacity of hundreds of thousands or even millions. The client obviously cannot fetch all members at once, and the numbers grow further if a user joins several servers. In designing a large-scale community such as "Circle Group", much of this logic therefore has to move to the cloud, and the original design logic of both the SDK and the server has to change.
In addition, large-scale communities bring a message explosion. In the original group design, if a user joins 1,000 groups, all messages in those 1,000 groups are delivered to the client, but in typical business scenarios not all groups are active at the same time. If those 1,000 groups become 1,000 servers/channels, which are community organizations, the chance that many are active simultaneously rises sharply. Moreover, each server/channel has far more members than an ordinary group. The resulting message explosion would put great pressure on the original group system in several ways: first, the storage pressure of massive message volumes; second, the bandwidth and server pressure of online broadcast and offline push for massive message volumes; and third, the question of how the client can receive and sensibly display such a flood of messages.
Beyond capacity and the secondary structure, there are identity groups, member management, personalized push strategies, and so on. Is it really appropriate to add all of this complex logic to groups? If we force them together, might we fail to build a usable Discord-like platform and, at the same time, complicate the original group functions and reduce their ease of use?
From the analysis above, we can conclude that extending the existing group to build a community with Discord-like functions is not a good idea. Are there other "shortcuts"? The chat room is another potential option. A major feature of the chat room is its support for very large numbers of simultaneous online users (reference article: NetEase Practice | Tens of millions of online live broadcast barrage solutions), so capacity does not seem to be a problem. But adding features that involve strong social relationships (such as members and identity groups) is awkward: the chat room is an open space that users enter and leave freely, which conflicts with the product positioning of "Circle Group". The option of extending the chat room is therefore essentially ruled out as well.
2. Technical difficulties
Based on the considerations above, NetEase Yunxin ultimately chose to break away from the existing IM system and build a new community solution, "Circle Group", from scratch. "Circle Group" is not simply an IM feature but an independently running IM system. After the discussion above, the technical characteristics and difficulties of "Circle Group" should be clear. They can be summarized as follows:
- Design of a social relationship system with unlimited members under the secondary structure.
- The design of the message system under the super community.
- Complex and efficient identity group system design.
3. Technical Analysis of "Circle Group" Message System
This article will focus on the second of the above three points: the design of the message system under the super community, and share some of our technical design principles and experience.
(1) The overall structure of the “Circle Group”
The above shows the overall architecture of the "Circle Group" service, which is layered. First is the access layer, comprising the LBS service, the long-link servers, and the API gateway, which face the client SDK and the user's server. Behind it is the network layer, comprising the large-scale network WE-CAN and the protocol routing service. Next is the service layer, divided into several service modules, each consisting of multiple microservices. Last is the infrastructure layer.
(2) Messaging system architecture
The modules associated with the message system include the access layer, the network layer, and the back-end login/subscription/message/retrieval modules. The basic architecture is as follows:
Below we will introduce each module in the message system separately.
(3) Technical details of the message system
The first point to be discussed in the message system is the storage and distribution of messages, including online broadcasting, offline push, and historical messages.
Online broadcast
For an ordinary group, online broadcast works roughly as follows: query the online status of each group member in turn and, for those online, send the message to the corresponding long-link server. This mechanism clearly cannot be copied to "Circle Group", because a single server may have more than 1 million members. The chat room broadcast model cannot be reused directly either: in the chat room architecture each long-lived connection is mapped to one chat room, so after logging in to a chat room you only receive messages from that room, whereas a "Circle Group" user joins multiple servers/channels at once and receives messages from all of them at the same time.
To address these characteristics, Yunxin designed a message subscription model: after logging in, a user subscribes to the servers/channels they are interested in, and the server records the subscription. When a new message arrives, the server uses the subscription relationship (instead of online status) to look up the list of recipients to broadcast to, so it no longer needs to traverse every user in the server/channel.
Even so, when many users are online in a server/channel, the subscription relationship is still huge. For this reason, Yunxin designed a two-tier subscription model. All detailed subscription relationships are stored on the long-link servers (QChatLink/QChatWebLink), and each long-link server periodically sends a heartbeat to the back-end subscription service. The heartbeat is greatly condensed compared with the original subscription data. For example, a long-link server keeps one record for each account that subscribes to channel A, so 10,000 subscribed accounts mean 10,000 records, but the heartbeat only reports that 10,000 users on this long-link server subscribe to channel A and omits the account list. When a message needs to be broadcast, the message service queries the subscription service for the list of long-link servers subscribed to that server/channel and sends a notification to each of them in turn. On receiving the notification, each long-link server broadcasts the message to its clients according to its detailed subscription records.
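The two tiers can be sketched roughly as follows. This is an in-memory illustration only: the class names `LinkServer` and `SubscriptionService`, the `broadcast` helper, and the heartbeat format are hypothetical, and real long-link servers would write to sockets rather than append to a list.

```python
from collections import defaultdict


class LinkServer:
    """Long-link server: owns the detailed, account-level subscriptions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscribers: dict[str, set[str]] = defaultdict(set)  # channel -> accounts
        self.delivered: list[tuple[str, str]] = []  # stands in for socket writes

    def subscribe(self, account: str, channel: str) -> None:
        self.subscribers[channel].add(account)

    def heartbeat(self) -> dict[str, int]:
        """Report only per-channel counts; the account list stays local."""
        return {ch: len(accts) for ch, accts in self.subscribers.items() if accts}

    def notify(self, channel: str, message: str) -> None:
        """Second tier: fan out to local clients from the subscription details."""
        for account in sorted(self.subscribers.get(channel, ())):
            self.delivered.append((account, message))


class SubscriptionService:
    """Back-end service: only knows which link servers care about a channel."""

    def __init__(self) -> None:
        # channel -> {link server name: subscriber count}
        self.channel_links: dict[str, dict[str, int]] = defaultdict(dict)

    def on_heartbeat(self, link: LinkServer) -> None:
        summary = link.heartbeat()
        for channel, links in self.channel_links.items():
            if channel not in summary:
                links.pop(link.name, None)  # this link has no subscribers left
        for channel, count in summary.items():
            self.channel_links[channel][link.name] = count

    def links_for(self, channel: str) -> list[str]:
        return sorted(self.channel_links.get(channel, {}))


def broadcast(service: SubscriptionService, links: dict[str, LinkServer],
              channel: str, message: str) -> list[str]:
    """First tier: one notification per subscribed link server, not per user."""
    targets = service.links_for(channel)
    for name in targets:
        links[name].notify(channel, message)
    return targets
```

In this sketch the message service's work grows with the number of long-link servers subscribed to a channel, not with the number of users, which is the point of condensing the heartbeat.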
We also provide several subscription types. When you care about a channel's messages (for example, the page is currently showing that channel), you can subscribe to the channel's messages. For other channels, if you only need to know how many unread messages a channel has (or whether it has any), you can subscribe to the channel's unread count (or unread status). In that case the server delivers only a condensed message body, which is enough for the client to maintain its unread count. Once the unread count reaches a certain threshold (such as 99+), the server can stop sending notifications altogether without affecting the user experience.
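A minimal sketch of how the subscription type could decide what gets delivered is shown below. The `SubType` enum, the payload fields, and the `UNREAD_CAP` value are illustrative assumptions, not the SDK's actual types or wire format.

```python
from enum import Enum
from typing import Optional


class SubType(Enum):
    """Hypothetical subscription types mirroring the three cases in the text."""
    MESSAGE = "message"
    UNREAD_COUNT = "unread_count"
    UNREAD_STATUS = "unread_status"


UNREAD_CAP = 99  # assumed display cap: the client shows "99+" beyond this


def build_notification(sub_type: SubType, message: dict,
                       unread_before: int) -> Optional[dict]:
    """Return the payload to deliver for one new message, or None to send nothing."""
    if sub_type is SubType.MESSAGE:
        return {"type": "message", **message}  # full message body
    if sub_type is SubType.UNREAD_COUNT:
        if unread_before > UNREAD_CAP:
            return None  # the client already displays "99+"
        return {"type": "unread_count", "channel": message["channel"],
                "unread": unread_before + 1}
    # UNREAD_STATUS: only the first unread message changes what the client shows
    if unread_before > 0:
        return None
    return {"type": "unread_status", "channel": message["channel"], "has_unread": True}
```

The condensed payloads carry no message text, so a user who has joined many busy channels receives only small counters for the channels that are not on screen.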
The message subscription model described above greatly improves the efficiency of online broadcast for very large servers/channels and reduces the load on the server. We have also designed a special strategy for small channels: even without a subscription, the server notifies everyone in the channel of new messages, which reduces the cost of maintaining the subscription model on the client side. For the subscription mechanism itself, we will later provide more one-stop strategies for different business scenarios, to help reduce integration costs and improve overall ease of use.
Offline push
In a strongly social scenario, offline push plays a major role in maintaining user stickiness and improving the product experience. Technically, it has two main problems to solve. The first is the efficiency of pushing messages for very large servers/channels. The second is providing a push strategy rich enough to keep end users from being disturbed by excessive push messages.
For the first problem, Yunxin "Circle Group" applies different strategies to servers/channels of different sizes. Small channels use a push model similar to that of groups. For large channels, each message that needs to be pushed is split into shards by target user ID, and multiple nodes process the shards in parallel to improve push efficiency. Sharding also follows a consistency strategy so that a given user is always handled by the same nodes, which improves the cache hit rate.
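One common way to keep a user pinned to the same node is consistent hashing. The sketch below uses it as a stand-in; the article does not say which consistency strategy Yunxin actually uses, and the `HashRing` class, the node names, and the virtual-node count are assumptions.

```python
import bisect
import hashlib
from collections import defaultdict


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative, not Yunxin's code)."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, user_id: str) -> str:
        """Pick the first virtual node clockwise from the user's hash."""
        idx = bisect.bisect(self._keys, _hash(user_id)) % len(self._ring)
        return self._ring[idx][1]


def shard_push_task(ring: HashRing, user_ids: list[str]) -> dict[str, list[str]]:
    """Split one push task into per-node shards that can run in parallel."""
    shards: dict[str, list[str]] = defaultdict(list)
    for uid in user_ids:
        shards[ring.node_for(uid)].append(uid)
    return dict(shards)
```

Because the same user always lands on the same node, per-user data such as push tokens or do-not-disturb settings can stay warm in that node's cache, and adding a node only moves the users that now belong to the new node.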
For the second problem, the push strategy of Yunxin "Circle Group" can be summed up as follows:
- Push should drive user activity while still guaranteeing do-not-disturb.
- A large server is a playground, so only important messages relevant to the user (such as @ messages) are pushed.
- A small server is a small world for spending time with friends, so all messages can be pushed.
In the future, users will also be able to customize message priority and match it with different push configurations (such as different DND settings).
Historical messages
Storing historical messages in the "Circle Group" scenario also requires special design. Taking groups as an example, there are generally two ways to store messages: write diffusion and read diffusion. In small groups or multi-person conversations, write diffusion simplifies the design, but once the group grows to a certain size (such as 10,000 people), read diffusion becomes the choice. For a "Circle Group", where a single server may have millions of members, we use regular read diffusion and additionally designed a multi-level cache structure to cope with massive read requests. The basic storage architecture is roughly as follows:
Message storage mainly consists of two parts: the message itself and the unread count.
First, writes. For both kinds of data, we use a centralized cache server to store recent data and persist it to the database asynchronously through MQ, using asynchronous, batched, and aggregated writes. This balances write efficiency (writing records one at a time performs poorly) against the delay between write and read (asynchronous writes introduce latency). We also choose different storage engines for different data types (a distributed time-series database for historical messages and a distributed KV database for unread counts) to maximize the performance and efficiency of message storage and queries.
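The write path can be sketched as below, with plain Python containers standing in for the centralized cache, the MQ, and the two databases. The `WriteBuffer` class and its fields are hypothetical, and tracking unread counts per (channel, account) is a simplification chosen only to show aggregation.

```python
from collections import defaultdict, deque


class WriteBuffer:
    """Recent data is readable from the cache at once; durable writes are batched."""

    def __init__(self, batch_size: int = 3) -> None:
        self.batch_size = batch_size
        self.recent = defaultdict(list)       # stand-in for the centralized cache
        self.unread_cache = defaultdict(int)  # (channel, account) -> unread count
        self.queue = deque()                  # stand-in for the MQ
        self.message_db = []                  # stand-in for the time-series database
        self.unread_db = {}                   # stand-in for the KV database

    def write_message(self, channel, msg_id, text, recipients):
        """Synchronous part: update the cache, then enqueue for persistence."""
        self.recent[channel].append((msg_id, text))
        for account in recipients:
            self.unread_cache[(channel, account)] += 1
        self.queue.append((channel, msg_id, text, tuple(recipients)))

    def flush(self) -> int:
        """Consumer side: persist one batch and return how many messages it held."""
        size = min(self.batch_size, len(self.queue))
        batch = [self.queue.popleft() for _ in range(size)]
        unread_delta = defaultdict(int)
        for channel, msg_id, text, recipients in batch:
            self.message_db.append((channel, msg_id, text))  # one batched insert
            for account in recipients:
                unread_delta[(channel, account)] += 1  # aggregate increments per key
        for key, delta in unread_delta.items():
            self.unread_db[key] = self.unread_db.get(key, 0) + delta  # one write per key
        return len(batch)
```

Readers see new data in the cache immediately, while the database receives fewer and larger writes, which is the trade-off between write efficiency and write-to-read delay described above.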
For reads, all recent messages and unread counts are kept in the centralized cache, and strategies such as first-in-first-out eviction and cache expiration ensure that the cache always holds the latest and hottest data. For message IDs and message content, the centralized cache uses different data structures and expiration policies to balance cache hit rate against cache capacity. When a cache entry has expired and a related read or write request arrives, the cache is rebuilt so that the hit rate stays high. Finally, high-frequency read requests trigger hotspot detection, and part of the read traffic is pushed down into the memory of each computing node to absorb bursts of traffic.
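The read path can be illustrated with a small three-level lookup: node-local memory, then the centralized cache, then the database. The `MessageReader` class, the counter-based hotspot rule, and the thresholds are assumptions for illustration; expiration and invalidation on new writes are omitted for brevity.

```python
from collections import Counter, OrderedDict


class MessageReader:
    """Local memory -> centralized cache -> database, with simple hotspot detection."""

    def __init__(self, db: dict, cache_capacity: int = 1000, hot_threshold: int = 3) -> None:
        self.db = db                   # stand-in for the database: channel -> messages
        self.central = OrderedDict()   # stand-in for the centralized cache (FIFO)
        self.cache_capacity = cache_capacity
        self.local = {}                # memory of this computing node
        self.hits = Counter()          # per-channel read counter for hotspot detection
        self.hot_threshold = hot_threshold
        self.db_reads = 0
        self.central_reads = 0

    def read(self, channel: str) -> list:
        if channel in self.local:      # hot channel: served from node memory
            return self.local[channel]
        self.hits[channel] += 1
        if channel in self.central:
            self.central_reads += 1
            messages = self.central[channel]
        else:
            self.db_reads += 1         # cache miss: rebuild the entry from the database
            messages = self.db.get(channel, [])
            self.central[channel] = messages
            if len(self.central) > self.cache_capacity:
                self.central.popitem(last=False)  # first-in-first-out eviction
        if self.hits[channel] >= self.hot_threshold:
            self.local[channel] = messages        # sink the hotspot into local memory
        return messages
```

Once a channel is detected as hot, further reads on that node no longer reach the centralized cache, so a burst of traffic on one very large channel is absorbed by the computing nodes themselves.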
With the special design described above, the message storage system can serve small "Circle Group" channels of tens or hundreds of people and can just as comfortably handle very large channels with millions of members.
Special features
Beyond the core storage and distribution model, the "Circle Group" message system also provides many additional features:
- Message update: a message can be modified after it is sent. Message recall and deletion are treated as special states of a message update. This plays an important role in managing messages in a large community.
- Message interaction (coming soon): "Circle Group" provides a thread chat function that is more powerful than Discord's message reply. Unlike a sub-area, which is a separate space, thread chat lets you automatically pick out all messages associated with a topic from a complex message flow. "Circle Group" also provides quick comments, which let you add various custom emoji to a message. This is almost a necessity in large channels, because it spares you from message explosions. Yunxin has also specially optimized quick comments for large channels: when everyone likes a message, everyone in the channel can give it a 😊 or 👍, with no upper limit on the count.
- Message retrieval (coming soon): the retrieval system is another major feature of "Circle Group", and message retrieval is naturally an important part of it. It helps you find the messages you want in a complex message flow.
- Third-party callback and CC: this reflects a major characteristic of "Circle Group" as a product of the Yunxin PaaS platform. Through third-party callbacks and CC, you can inject any logic you want, such as bots or content auditing, before and after various operations (such as sending messages, inviting or kicking members, and modifying information). (Yunxin also offers the Safetong one-stop content security solution, which can be selected on demand.)
4. Summary
In short, as a newly designed product, Yunxin "Circle Group" carries no historical baggage (while fully absorbing the strengths of earlier systems). You can use it to build a Discord-like product or any social, entertainment, or gaming product you like. You are welcome to try it.
About the author
Cao Jiajun, senior server development engineer of NetEase Yunxin, graduated from the Chinese Academy of Sciences. After graduating with a master's degree, he joined NetEase and is responsible for the server development of Yunxin IM/RTC signaling and other services. Focusing on technologies such as instant messaging, RTC signaling and related middleware, he is the author of Camellia, an open source project of Yunxin.