
1. Background

The customer service one-stop workbench comprises four functional modules: online chat, telephone, work orders, and tools. Many shared modules, such as work-order details and order details, are embedded as iframes, which slows down system loading. In addition, the online messaging module depends heavily on the third-party tinode SDK: many methods call the tinode API directly and inherit several of tinode's questionable patterns. Since tinode was adopted, iteration resources have never been invested in optimizing its source code. After the message-delivery mode was switched to broadcast, a session-freezing problem surfaced. Reading the message-link module of the tinode source revealed substantial room for optimization; this article describes the concrete optimizations made to the message link.

2. Finding the Problem

1. Defects in the message data processing flow

After reading the source of the third-party tinode SDK, we found considerable room for optimization in the customer service "receive" and "send" message links. In the original logic, sending a message first renders it to the page quickly; once tinode responds with a result, the page is refreshed and re-rendered. When the customer service receives a message, the entire message list is refreshed: deserialization, sorting, deduplication, status processing, and so on all require multiple loops over the data. With the communication mode changed to broadcast, these large looping tasks over big data sets pose a serious performance challenge.

Overview of the "receive" and "send" message links of customer service (1)

Overview of the "receive" and "send" message links of customer service (2)

The red areas in the figure contain many for loops, which are the most time-consuming paths. Fetching the conversation records between a user and the customer service (the original tinode method topic.message() executes n times), deserialization, session-state processing, sorting, and deduplication all traverse every chat message, with deserialization being the most expensive step. Moreover, JavaScript is single-threaded, so too many traversals block the main thread. As a result, when the agent switches sessions quickly, the loops have not finished and the page has not rendered, producing a visible freeze.

3. Optimization ideas

Each time a user is routed from the client to the agent workbench, a session id (sessionId) is generated, and each manual message under that session id carries a message id (msgid).

A conversation between an agent and a user involves many rounds of messages. To eliminate the performance-draining multi-pass loops of the old code, the core task is to avoid traversing the chat message data wherever possible (because there are simply too many messages). Following the principle of "never traverse unless unavoidable", the "deduplication" and "sorting" logic of the original implementation was rewritten; here the session id and message id mentioned above play a crucial role.

1. Deduplication

In this optimization, a Map data structure named msgidCacheMaps is maintained globally. It has two dimensions, sessionId and msgid, and stores the msgid of every message in the current session (sessionId). In a conversation, a message sent by a human agent goes through two stages, from virtual message to real message. A virtual message works as follows: after the agent sends a message to the gateway, in order to display it in the chat area immediately, a virtual sequence number is generated as the previous message's seq + 0.002 (call it virtualSeq); once the gateway returns the real seq, the virtualSeq is replaced with it. In the virtual stage, the msgid is saved into the Map. Messages pushed by the system carry no msgid, so they skip this process and go straight into the session pool. In the real stage (tinode returns a seq), msgidCacheMaps is queried by msgid: if the msgid already exists there, the message is a duplicate, and its virtualSeq can simply be replaced with the real seq.
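The two-stage deduplication above can be sketched roughly as follows. This is a minimal illustration, not the workbench's actual code: the message shape and function names are assumptions, while msgidCacheMaps and the virtualSeq convention follow the text.

```typescript
type Msg = { msgid?: string; seq: number };

// sessionId -> (msgid -> seq currently shown for that msgid)
const msgidCacheMaps = new Map<string, Map<string, number>>();

// Stage 1: virtual message — record the msgid with its virtualSeq.
function onVirtualMessage(sessionId: string, msg: Msg): void {
  if (msg.msgid === undefined) return; // system push: no msgid, skip the cache
  let session = msgidCacheMaps.get(sessionId);
  if (!session) {
    session = new Map<string, number>();
    msgidCacheMaps.set(sessionId, session);
  }
  session.set(msg.msgid, msg.seq);
}

// Stage 2: real message — if the msgid is already cached, this is the
// same message coming back with its real seq; replace instead of append.
function onRealMessage(sessionId: string, msg: Msg): boolean {
  const session = msgidCacheMaps.get(sessionId);
  if (msg.msgid !== undefined && session?.has(msg.msgid)) {
    session.set(msg.msgid, msg.seq); // replace virtualSeq with the real seq
    return true; // duplicate: update the existing bubble in place
  }
  return false; // first sighting: insert normally
}
```

A Map lookup by msgid is O(1), so the old full-history deduplication loop disappears entirely.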

2. Sorting

This optimization uses binary-search insertion. A Map data structure named seqCacheMaps is maintained globally; like the deduplication structure above, it has two dimensions, sessionId and seq. Binary-search insertion uses seq (the real seq) and virtualSeq (the virtual seq) as the search keys: each time a message arrives, binary search quickly finds the position where its seq can be inserted. In the virtual stage the message is inserted directly; in the real stage (msgidCacheMaps already contains the msgid) it is replaced in place. One problem remains: every message an agent sends during a manual session is checked for sensitive words at the gateway. If no sensitive word is triggered, the message is delivered and displayed to the user; if one is triggered, the gateway intercepts the message, it never reaches the user, and the gateway returns no seq. What to do when no seq comes back? In tinode's return stage, the earlier virtualSeq is replaced with the previous message's seq + 0.002, so the message keeps an ordered position and is still displayed in the chat area.
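The binary-search insertion and the virtual-to-real seq replacement can be sketched as below. This is an illustrative sketch: the ChatMsg shape and helper names are assumptions, and only the ordering mechanics from the text are shown.

```typescript
type ChatMsg = { seq: number; text: string };

// Find the index at which `seq` should be inserted to keep `list` sorted.
function findInsertIndex(list: ChatMsg[], seq: number): number {
  let lo = 0;
  let hi = list.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (list[mid].seq < seq) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Virtual stage: insert directly at the binary-search position.
function insertMessage(list: ChatMsg[], msg: ChatMsg): void {
  list.splice(findInsertIndex(list, msg.seq), 0, msg);
}

// Real stage: locate the virtual message by its virtualSeq and replace
// that seq with the real one, re-inserting to keep the list ordered.
function replaceSeq(list: ChatMsg[], virtualSeq: number, realSeq: number): void {
  const i = findInsertIndex(list, virtualSeq);
  if (i < list.length && list[i].seq === virtualSeq) {
    const [msg] = list.splice(i, 1);
    msg.seq = realSeq;
    insertMessage(list, msg);
  }
}
```

Each arrival now costs O(log n) to locate instead of a full re-sort over every chat message.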

"Deduplication" and "Sort" overview

3. Cache recycling (destroyed at session end)

As described under deduplication and sorting above, two global data stores (the msgidCacheMaps Map and the seqCacheMaps Map) are maintained to reduce traversal. However, each agent handles 100+ sessions per day, each session averages 40+ back-and-forth messages, and if the agent pages through history there are another 20 messages per page. Storing without ever deleting accumulates a fairly large amount of data and can easily lead to memory overflow. So when is the right time to delete? Based on the business flow, the final choice was to destroy the globally mounted hash maps and release the memory when a session ends, when a session is transferred, and when an offline push arrives.
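The recycling rule above amounts to one cleanup routine shared by the three triggers. A minimal sketch, with all names assumed for illustration:

```typescript
// Global per-session caches, as described in the deduplication and
// sorting sections (shapes simplified for illustration).
const msgidCacheMaps = new Map<string, Map<string, number>>();
const seqCacheMaps = new Map<string, number[]>();

function releaseSession(sessionId: string): void {
  // Dropping the per-session entries lets the GC reclaim their memory.
  msgidCacheMaps.delete(sessionId);
  seqCacheMaps.delete(sessionId);
}

// The same cleanup runs on each of the three business triggers.
function onSessionEnd(sessionId: string): void { releaseSession(sessionId); }
function onSessionTransfer(sessionId: string): void { releaseSession(sessionId); }
function onOfflinePush(sessionId: string): void { releaseSession(sessionId); }
```

Tying deletion to these business events keeps the caches bounded by the number of live sessions rather than the day's total traffic.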

Overview of Data "Storage" and "Deletion"

4. Message status

The message status here refers to: read, unread, received, sending, sending failed, etc.

While the agent and the user are talking, the message status shown on both sides updates in real time. When the agent sends a message and the user reads it, an info protocol message (a push notification) is returned to tell the h5 side that the message has been read, and the h5 side then updates that message's status.

  • Original approach: after the agent sends a message, all of the user's historical messages in the current session are traversed and fully reset. When the conversation contains many messages, the traversal count is large, causing serious performance drain.
  • Optimization: first filter out historical messages and messages not sent by the agent, locate the target message by binary search, and change its status directly. When a message arrives from the user, update the statuses of the agent's messages in messagePools (all conversations of the current user) to read in reverse order: since the user has replied, the agent's earlier messages must have been read, so there is no need to walk every message and reset its state as the old logic did. Except for messages still sending or failed to send, everything is rendered as read.
  • Concrete implementation: the client pushes a long-connection note event; the h5 side records the seq of the message that was read and updates every agent-sent message whose seq is less than or equal to that seq from recv (received) to read.
  • Sending a message: sending now renders at most twice. The first pass displays the message on the page immediately, then the message is sent over wss; when the ack arrives, the message located via msgid is updated in place. The full traversal via tinode's topic.message method is no longer needed.
  • Receiving a message: the agent triggers a status update only when a user message arrives; there is no need to traverse the user's full data to update state, and an ack is returned as well.
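The note-event handling described above can be sketched as follows: walk the (seq-sorted) pool in reverse, promote agent messages at or below the reported seq from recv to read, and stop at the first already-read message since everything earlier must be read too. The message shape and status names are illustrative assumptions.

```typescript
type Status = "sending" | "failed" | "recv" | "read";
type PoolMsg = { seq: number; from: "agent" | "user"; status: Status };

function onNoteRead(messagePool: PoolMsg[], readSeq: number): void {
  // The pool is kept sorted by seq; walk backwards so we can stop early.
  for (let i = messagePool.length - 1; i >= 0; i--) {
    const msg = messagePool[i];
    if (msg.from !== "agent" || msg.seq > readSeq) continue;
    if (msg.status === "read") break;               // earlier ones are already read
    if (msg.status === "recv") msg.status = "read"; // leave sending/failed untouched
  }
}
```

The reverse walk with early exit touches only the newly read tail of the conversation instead of every historical message.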

5. Sensitive word interception processing

After the user enters the IM chat page, messages exchanged between the user and the agent are screened online for sensitive words (detection and send-blocking only).

  • Original solution: after composing a message, the agent clicks send and the backend sensitive-word interface is called; the message is sent only after the check passes. With network jitter or a slow interface response, sending feels stuck to the agent.
  • Optimized solution: interception moves to the gateway. When the agent sends a message, it is rendered to the chat area immediately while the gateway checks whether it triggers a sensitive word. If one is triggered, the gateway returns a status to h5, and h5 updates the message's state accordingly to alert the agent.
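The optimized flow can be sketched as a render-first, reconcile-later state machine. This is a hypothetical illustration: the state names, outbox store, and `intercepted` flag are assumptions, not the gateway's actual protocol.

```typescript
type SendState = "sending" | "sent" | "blocked";
type Outgoing = { msgid: string; state: SendState };

const outbox = new Map<string, Outgoing>();

// Step 1: render to the chat area right away, before any check returns,
// so the agent never waits on the sensitive-word interface.
function renderImmediately(msgid: string): Outgoing {
  const msg: Outgoing = { msgid, state: "sending" };
  outbox.set(msgid, msg);
  return msg;
}

// Step 2: the gateway's async result reports whether the message was
// intercepted; h5 flips the bubble's state and prompts the agent.
function onGatewayResult(msgid: string, intercepted: boolean): void {
  const msg = outbox.get(msgid);
  if (!msg) return;
  msg.state = intercepted ? "blocked" : "sent";
}
```

The slow check thus moves off the send path: the agent's perceived latency is just the local render, and only blocked messages need follow-up.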

Sensitive word logic overview

4. Data comparison before and after optimization

The optimized message-link solution was released and launched on February 28, so February 28 is used as the cut-off point for comparing the data before and after optimization, as shown below.

1. Before optimization

As shown in the figure above, two metrics were computed over all incoming sessions from February 1, 2022 to February 28, 2022:

  • Average first response time: 8.40 seconds
  • Average response time: 19.9 seconds

2. After optimization

As shown in the figure above, two metrics were computed over all incoming sessions from March 1, 2022 to March 9, 2022:

  • Average first response time: 6.82 seconds, 1.58 seconds less than before optimization
  • Average response time: 18.22 seconds, 1.68 seconds less than before optimization

5. Summary

Generally speaking, IM products have very large user bases and high activity, and traffic easily spikes at particular moments, so the system must technically withstand sudden jumps in order of magnitude. IM is usually characterized by four properties: real-time delivery, reliability, consistency, and security. There is still a long way to go in optimizing our IM. While keeping the business stable, we will keep polishing these four properties, steadily improving our own IM SDK into a benchmark for industry message communication.

Text/YU BO
