
This article was shared by Yi Ang of the Alibaba Xianyu technical team. The original title was "Optimization of Weak Perception Links in Message Link Optimization"; the text has been revised. Thanks to the author for sharing.

1. Introduction

Xianyu's IM messaging system is the communication tool between buyers and sellers. It helps them understand each other and builds trust, which makes it highly valuable to Xianyu's commodity transactions and a critical part of the user experience.

However, with the rapid growth of business volume, the current messaging system faces several problems that urgently need solving.

The most typical ones are:

1) The experience of online messaging needs to improve;
2) The arrival rate of offline push needs to improve;
3) Message features are too tightly coupled with the underlying message system.

After evaluation, we believe that the arrival rate of offline push is the most critical issue at this stage, because it has the greater impact on user experience.

This article shares Xianyu's technical practice in improving the arrival rate of offline push for IM messages, covering problem analysis and optimization ideas. I hope it is useful to you.

Learning and exchange:

  • Instant messaging/push technology development exchange group 5: 215477170 [recommended]
  • Introduction to Mobile IM Development: "One entry is enough for novices: Develop mobile IM from scratch"
  • Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK

(This article has been simultaneously published at: http://www.52im.net/thread-3748-1-1.html )

2. Series of articles

This article is the sixth in a series. The full list is as follows:

"Alibaba IM Technology Sharing (1): DingTalk, the King of Enterprise-level IM, and the Strengths of Its Back-end Architecture"
"Alibaba IM Technology Sharing (2): Xianyu IM's Flutter-based mobile terminal cross-terminal transformation practice"
"Alibaba IM Technology Sharing (3): The Road to the Evolution of the Architecture of Xianyu's Billion-level IM Message System"
"Alibaba IM Technology Sharing (4): Reliable Delivery Optimization Practice of Xianyu's 100-million-level IM Message System"
"Alibaba IM Technology Sharing (5): Timeliness Optimization Practice of Xianyu's Billion-level IM Message System"
"Alibaba IM Technology Sharing (6): Optimization of the offline push arrival rate of Xianyu billion-level IM messaging system" (* This article)

3. The division of communication link types

From the perspective of the data communication link, we roughly divide the overall message link into a strong-perception link and a weak-perception link, according to whether the receiving Xianyu client is online.

The strong-perception link consists of the following subsystems or modules:

1) The sender client;
2) idleapi-message (Xianyu's message gateway);
3) heracles (Xianyu's underlying message service);
4) accs (a long connection channel developed by Alibaba);
5) The receiver client.

The core indicators of the entire link are end-to-end delay and message arrival rate.

In the strong-perception link both parties are online, and a message that reaches the client is reliably noticed by the receiver. Its main pain point is the end-to-end delay of messages.

The main difference in the weak-perception link is that the receiver is offline, so messages must be delivered by other means such as offline push.

As a result, users perceive messages on the weak-perception link less strongly, and its core indicator is the arrival rate of messages rather than the delay.

Therefore, at the current stage, optimizing the weak-perception link means increasing the arrival rate of offline messages.

4. Overview of the message system architecture

The following figure shows the architecture of the entire IM messaging system and gives a feel for the overall link:

As shown in the figure above, the division of labor between the main components and subsystems is as follows:

1) HSF is a remote service framework, the internal counterpart of Dubbo;
2) Tair is a distributed caching framework developed by Alibaba, which supports different storage engines such as memcached, Redis, and LevelDB;
3) agoo is Alibaba's offline push platform, responsible for integrating the offline push channels of different vendors and providing a unified offline push service to users within the group;
4) accs is a long connection channel developed by Alibaba, which enables real-time two-way interaction between the client and the server;
5) Lindorm is a NoSQL product developed by Alibaba, similar to HBase;
6) The domain ring is the core structure behind Xianyu's message performance optimization and stores each user's latest messages.

The strong-perception and weak-perception links differ in channel selection:

1) The strong-perception link uses the online channel, accs;
2) The weak-perception link uses the offline channel, agoo.
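
The channel choice described above can be sketched in a few lines. This is only an illustration, assuming the receiver's online status is already known; `Channel` and `select_channel` are hypothetical names and not part of Xianyu's actual code.

```python
from enum import Enum


class Channel(Enum):
    ACCS = "accs"  # online long-connection channel (receiver is online)
    AGOO = "agoo"  # offline push channel (receiver is offline)


def select_channel(receiver_online: bool) -> Channel:
    """Pick the delivery channel from the receiver's online status."""
    return Channel.ACCS if receiver_online else Channel.AGOO


if __name__ == "__main__":
    print(select_channel(True).value)
    print(select_channel(False).value)
```
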

5. How to define a weak-perception link?

In plain terms, the weak-perception link is the offline message push system.

Compared with online, end-to-end message delivery (the strong-perception link described above), offline push cannot easily guarantee that the user notices the message.

Typical situations include:

1) Not delivered to the user's device: the push never reaches the device. This case can be analyzed from the channel's return value;
2) Delivered to the user's device but not displayed in the system notification bar: Xianyu has seen cases where the channel returned success but the user never saw the push;
3) Displayed in the notification bar but folded by the system: Android manufacturers apply different folding strategies to push notifications. Once folded, a notification is visible only if the user actively expands it, so its reach is clearly worse;
4) Displayed in the notification bar but ignored by the user: the click-through rate of offline push is lower than that of online push.

For case 1) (the push does not reach the user's device), the reasons include:

1) The token of the offline channel is invalid;
2) Parameter error;
3) The user has turned off notifications for the application;
4) The user has uninstalled the application, etc.
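
Since this case can be analyzed from the channel's return value, one simple approach is to tally failed pushes by reason. The sketch below assumes each channel response is a dict with `success` and `code` fields; those field names and the reason codes are invented for illustration and do not reflect the real agoo or vendor response format.

```python
from collections import Counter

# Hypothetical reason codes; real channel responses will use different names.
REASONS = {
    "INVALID_TOKEN": "offline channel token is invalid",
    "BAD_PARAM": "parameter error",
    "NOTIFICATION_OFF": "user turned off app notifications",
    "UNINSTALLED": "user uninstalled the app",
}


def tally_failures(responses):
    """Count failed pushes by reason; unknown codes are grouped as 'other'."""
    counts = Counter()
    for resp in responses:
        if resp.get("success"):
            continue  # delivered pushes are not failures
        counts[REASONS.get(resp.get("code"), "other")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"success": True},
        {"success": False, "code": "INVALID_TOKEN"},
        {"success": False, "code": "NOTIFICATION_OFF"},
        {"success": False, "code": "NOTIFICATION_OFF"},
    ]
    for reason, count in tally_failures(sample).most_common():
        print(reason, count)
```

Sorting the tally by count shows which failure reason deserves attention first.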

For case 3) (displayed in the notification bar but folded by the system), the influencing factors include:

1) The click-through rate of the notifications;
2) The application's weight with the manufacturer;
3) The number of pushes, etc.

For case 4) (displayed in the notification bar but ignored by the user), the reasons include:

1) The user is unwilling to view pushes;
2) The user saw the push but was not interested in the content;
3) The user is busy with other things and has no time to deal with it.

In short, users do not strongly perceive messages in the offline push scenarios above, which is why we call this the weak-perception link.

6. The logical composition of the weak-perception link

Our weak-perception link is divided into 3 parts, namely:

1) System;
2) Channel;
3) Users.

It includes Hermes, agoo, manufacturers, devices, users, and landing pages. The details are shown in the figure below.

From the generation of a push to the user finally entering the app, the process has the following steps:

Step 1: Hermes is Xianyu's user reach system, responsible for audience management, content management, and timing control. It is the starting point of the entire weak-perception link;
Step 2: agoo is Alibaba's internal offline push platform and the foundation of Xianyu's offline push capability;
Step 3: agoo's offline push relies on the manufacturers' push channels, such as Apple's APNs channel, Google's FCM channel, and the self-built channels of various domestic manufacturers;
Step 4: Through the manufacturer's channel, the push finally appears on the user's device, which is a prerequisite for the user to perceive it;
Step 5: If the user sees the push and finds its content interesting, the user taps it, which launches the app, opens the landing page, and displays personalized products to the user.

After these 5 steps, the weak-perception link has completed its mission.

7. Specific problems faced by the weak-perception link

The core problems of the weak-perception link are:

1) Whether the pushed message is delivered to the user;
2) Whether the user is aware of the delivered message.

This corresponds to the two stages of push:

1) Whether the push message has reached the device;
2) Whether the user views the push and clicks.

Of these, reaching the device is the most basic stage and the core of this optimization.

We can lay out the message volume processed at each step in order and expand it into a funnel chart to see the bottleneck of the link at a glance.

The steepest parts of the funnel are the focus of optimization, while steps with little drop-off need no optimization:

Analysis of the funnel chart shows that optimization of the weak-perception link should focus on three aspects:

1) agoo acceptance rate: the drop between the number of push requests we send and the number that agoo (Alibaba's middle platform for offline push) can forward to the vendor channels;
2) Vendor acceptance rate: the drop between the number accepted by agoo and the number the vendors return as successful;
3) Push click-through rate: whether a message finally delivered to the user's device through the channels above is converted into an active click by the user.
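
The funnel analysis above amounts to computing the conversion rate between each pair of adjacent stages and picking the lowest one. The following is a minimal sketch; the stage names follow the text, but the counts are made-up numbers for illustration and are not Xianyu's real data.

```python
def funnel_rates(stages):
    """stages: ordered list of (name, count). Returns (from, to, rate) tuples."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


def bottleneck(stages):
    """Return the step with the lowest conversion rate (the steepest drop)."""
    return min(funnel_rates(stages), key=lambda step: step[2])


if __name__ == "__main__":
    # Illustrative counts only.
    stages = [
        ("push requests sent", 1000),
        ("accepted by agoo", 450),
        ("accepted by vendor", 360),
        ("clicked by user", 180),
    ]
    for src, dst, rate in funnel_rates(stages):
        print(f"{src} -> {dst}: {rate:.0%}")
    print("bottleneck:", bottleneck(stages)[1])
```
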

With the optimization directions clear, let's look at the optimization methods.

8. Our technical optimization methods

Let's take the perspective of a push and follow it along the link to see how we optimized each step.

8.1 agoo acceptance rate optimization
A user's push boards the "shuttle" at the Hermes station and heads for the next stop: agoo.

This is the first stop on the push's journey. On arrival we were dumbfounded: fewer than half of the pushes made it to the station and got off. What happened?

Let's talk about agoo first. There are two ways to call agoo:

1) Specify the device and client: agoo delivers the push directly to that device;
2) Specify the user and client: agoo looks up the user's device in an internal conversion table and then delivers.

Our system did not store users' device information, so we called agoo by user.

Moreover, without device information we could not know whether a user was on the iOS client or the Android client, so we had to send each push to both iOS and Android. This guaranteed arrival, but half of the calls were invalid.

To solve this problem, we used agoo's device information. We moved the user-to-device conversion ahead of the agoo call: first determine the user's device, then call agoo with that device specified, which avoids invalid calls.

After this change to the calling method, invalid calls were eliminated and agoo's acceptance rate improved significantly.
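
The difference between the two calling strategies can be sketched as follows. This is an assumption-laden illustration: `Device`, `push_blind`, `push_resolved`, the in-memory `device_table`, and the `send` callback are all hypothetical stand-ins, since the real agoo interface is internal to Alibaba and is not described in this article.

```python
from typing import Callable, Dict, List, NamedTuple


class Device(NamedTuple):
    device_id: str
    platform: str  # "ios" or "android"


def push_blind(user_id: str, send: Callable[[str, str], None]) -> int:
    """Earlier approach: platform unknown, so one call per platform."""
    calls = 0
    for platform in ("ios", "android"):
        send(user_id, platform)
        calls += 1
    return calls


def push_resolved(
    user_id: str,
    device_table: Dict[str, List[Device]],
    send: Callable[[str, str], None],
) -> int:
    """Optimized approach: resolve the user's devices first, then call per device."""
    calls = 0
    for device in device_table.get(user_id, []):
        send(device.device_id, device.platform)
        calls += 1
    return calls


if __name__ == "__main__":
    table = {"user-1": [Device("device-a", "android")]}
    log = []
    record = lambda target, platform: log.append((target, platform))
    print("blind calls:", push_blind("user-1", record))
    print("resolved calls:", push_resolved("user-1", table, record))
```

A user with no entry in the table produces zero calls in the resolved approach, which makes device conversion failures visible as a separate problem instead of hiding them behind rejected requests.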

Only then could we properly analyze the real reasons for agoo's acceptance failures.

According to our statistics, the main reason pushes are rejected by agoo is that the user has turned off the notification permission. Further analysis of the agoo call data also showed that for some users no corresponding device could be found. In other words, this round of optimization uncovered two more problems.

So the optimization continues:

1) Improve the notification experience and guide users to turn on the notification permission;
2) Build a device library together with agoo to solve the device conversion failures.

Each of these two directions is a topic in its own right, and we will discuss them another time.

8.2 Optimization of acceptance rate of vendor push channels
After arriving at agoo, a push boards the manufacturer's "special train" for its device model and heads for the next stop: the user's device.

This is the second stop on the push's journey. At the ticket check on the way out of the station, the train turned out to be overcrowded.

As a result, a large number of our pushes were blocked every day because they exceeded the limits set by the manufacturers.

Why is this so?

In fact, to protect the user experience, the vendors that provide push channels (each phone manufacturer runs its own channel, and their quality varies) limit the total number of messages each application can push.

Manufacturers set this limit according to the type of push and the application's user scale. Pushes are mainly divided into product pushes and marketing pushes.

The vendor push channels treat the different types as follows:

1) For product pushes, the manufacturer guarantees arrival;
2) For marketing pushes, the manufacturer limits the quota;
3) Unmarked pushes are treated as marketing pushes by default.

We had not marked our pushes at all, which triggered the manufacturers' push limits.

This causes trouble for our users. Transactions on Xianyu rely heavily on messages exchanged between buyers and sellers, so these messages must be guaranteed to arrive.

Likewise, order messages and content the user follows must also be pushed to users.

According to the interface protocols of the mainstream manufacturers, we divide pushed messages into the following categories and mark them accordingly:

1) Instant messaging;
2) Order status changes;
3) Content the user follows;
4) Marketing messages.

At the same time, on the business side we introduced push management: pushes that users pay little attention to are cancelled to avoid disturbing them.

After these optimizations, pushes were no longer blocked for exceeding the manufacturers' quotas.
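
The marking step described in this section can be sketched as attaching an explicit category to every push before it is handed to the channel. The four categories come from the text; the `biz_type` field, the mapping table, and the `category` field name are assumptions made for illustration, because each manufacturer defines its own classification parameters.

```python
from enum import Enum


class PushCategory(Enum):
    INSTANT_MESSAGE = "instant_message"
    ORDER_STATUS = "order_status"
    FOLLOWED_CONTENT = "followed_content"
    MARKETING = "marketing"


# Hypothetical mapping from internal business types to push categories.
BUSINESS_TO_CATEGORY = {
    "chat": PushCategory.INSTANT_MESSAGE,
    "order_update": PushCategory.ORDER_STATUS,
    "followed_item": PushCategory.FOLLOWED_CONTENT,
    "promotion": PushCategory.MARKETING,
}


def mark_push(payload: dict) -> dict:
    """Return a copy of the payload with an explicit category attached.

    Unknown business types fall back to marketing, which mirrors how
    vendors treat unmarked pushes by default.
    """
    category = BUSINESS_TO_CATEGORY.get(
        payload.get("biz_type"), PushCategory.MARKETING
    )
    marked = dict(payload)
    marked["category"] = category.value
    return marked


if __name__ == "__main__":
    print(mark_push({"biz_type": "chat", "title": "New message"}))
    print(mark_push({"title": "Untyped push"}))
```
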

8.3 Push click rate optimization
By optimizing the acceptance rates of agoo and the manufacturers, we removed the bottleneck in push arrival. But even when a message is delivered, does the user click it? That is the fundamental purpose of message push.

During daily development and testing, we found two experience problems with push:

1) When the user taps a push, a splash-screen advertisement is shown;
2) Marketing pushes also go through permission verification, so they cannot be opened after a different user logs in.

For the splash-screen advertisement, we added the ability to skip the ad when the app is opened from a push.

For the permission verification of pushes, Xianyu now distinguishes by scenario:

1) Pushes related to personal privacy keep permission verification unchanged;
2) Marketing pushes skip permission verification.
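
The two click-experience rules above can be expressed as a small decision function. This is a sketch under assumptions: `ClickDecision`, `handle_push_click`, and the `privacy_sensitive` flag are hypothetical names, and the real client logic is not shown in this article.

```python
from dataclasses import dataclass


@dataclass
class ClickDecision:
    skip_splash_ad: bool      # opening from a push always skips the splash ad
    open_landing_page: bool   # whether the landing page may be shown


def handle_push_click(
    privacy_sensitive: bool, logged_in_user: str, target_user: str
) -> ClickDecision:
    """Decide how the app opens when the user taps a push."""
    if privacy_sensitive:
        # Privacy-related pushes keep permission verification.
        allowed = logged_in_user == target_user
    else:
        # Marketing pushes skip permission verification.
        allowed = True
    return ClickDecision(skip_splash_ad=True, open_landing_page=allowed)


if __name__ == "__main__":
    print(handle_push_click(True, "user-b", "user-a"))
    print(handle_push_click(False, "user-b", "user-a"))
```
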

The above optimizes the click experience. We also need to consider the user's willingness to click.

The number of clicks depends on how much exposure the pushes get and how interesting the pushed material is. Push exposure in turn depends on how many pushes arrive and when they arrive.

The specific optimization methods are:

1) On push content: what we need to optimize is the timing of the push and the corresponding material;
2) On push timing: the algorithm calculates a personalized push time for each user from the user's preferences and behavior data, and pushes during the user's free time. This avoids disturbing the user at an inappropriate time and also raises the chance that the user sees the push;
3) On push material: the algorithm runs a real-time race among materials based on their real-time click feedback, and only materials that users are interested in are pushed, to increase users' willingness to click.
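
The article does not say which algorithm runs the material race. One common way to implement this kind of click-feedback selection is an epsilon-greedy bandit, sketched below as a stand-in technique and not as Xianyu's actual method; `MaterialRace` and its parameters are hypothetical.

```python
import random


class MaterialRace:
    """Epsilon-greedy selection among push materials using click feedback."""

    def __init__(self, materials, epsilon=0.1, seed=None):
        self.stats = {m: {"shown": 0, "clicked": 0} for m in materials}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def ctr(self, material):
        """Observed click-through rate; 0.0 before any exposure."""
        s = self.stats[material]
        return s["clicked"] / s["shown"] if s["shown"] else 0.0

    def choose(self):
        """Mostly pick the best material so far, sometimes explore another."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.stats))
        return max(self.stats, key=self.ctr)

    def record(self, material, clicked):
        """Feed back one exposure and whether it was clicked."""
        self.stats[material]["shown"] += 1
        if clicked:
            self.stats[material]["clicked"] += 1


if __name__ == "__main__":
    race = MaterialRace(["material-a", "material-b"], epsilon=0.1, seed=42)
    for i in range(10):
        race.record("material-a", clicked=(i < 1))
        race.record("material-b", clicked=(i < 5))
    print(race.ctr("material-a"), race.ctr("material-b"))
    print(race.choose())
```

The exploration rate `epsilon` keeps giving weaker or newer materials a small share of exposure, so the ranking can change as click feedback arrives.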

9. Actual optimization effect

Through the analysis and technical optimizations above, the overall weak-perception link has been improved, and the arrival rate of offline messages has increased by double digits.

10. Closing remarks

This article covers only one part of the IM messaging system: the optimization of the weak-perception link, which in business terms is the arrival rate of offline messages.

The IM messaging system as a whole is still a relatively complex area.

In developing the message system, we face the following problems:

1) How to trace a message along the link;
2) How to ensure the rapid arrival of IM messages (see "Timeliness Optimization Practice of Xianyu's Billion-level IM Message System");
3) How to separate message features from the underlying capabilities;
4) How to find the corresponding device for a user in offline push.

We have shared these issues in previous articles and will continue to share more in the future, so stay tuned.


This article has been simultaneously published on the official account of "Instant Messaging Technology Circle".
The synchronous publishing link is: http://www.52im.net/thread-3748-1-1.html
