As an effective tool for user activation, message push offers clear advantages in cost and efficiency, and has become one of the most important ways to reach users in app operations. To help developers strategically improve push effectiveness and raise message arrival and click-through rates, our message push SDK launched the after-effect analysis feature at the beginning of this year, aiming to give developers and operators effective support for scientifically adjusting their push strategies.
Since the after-effect analysis feature went live, we have iterated on it several times based on product goals and user feedback. This article shares the experience we accumulated during development and iteration; interested developers are also welcome to reach out to us via WeCom (enterprise WeChat).
Development Background of the After-Effect Analysis Feature
During message push, a message may be lost at any point along the chain: the server pushing the message, the message arriving at the client, the user clicking the push, and the application being opened.
In the past, the message system estimated message loss simply by comparing figures across four dimensions: delivered, arrived, displayed, and clicked. This calculation was not accurate enough, it was difficult for operators to understand why messages were lost, and it was therefore impossible to make scientific, effective improvements to push parameters and settings. In addition, when customers ran into push problems, they would contact technical support directly, which carried high communication and time costs.
Therefore, we needed to design an automated way to help developers clearly understand the after-effect data of push messages and independently, efficiently identify the causes of message loss.
Design Approach of the After-Effect Analysis Feature
Our solution is:
1. Launch the after-effect data report. By sorting and counting message loss across the server-side delivery process together with client-side receipt data, we produce a clear report of after-effect data for each link, helping developers and operators quickly locate the causes of message loss from the data and find the key links for improving push effectiveness.
2. Automatically collect the logs of each push module and assemble them into an after-effect analysis report. We obtain push logs from the different modules much as a manual log query would, store the logs that carry reason identifiers, and sort them in a unified manner, so that all the anomalies and loss causes that occur when a given task is delivered can be accounted for. These include reasons such as "the target is on the blacklist" and "the request is restricted by frequency control (or flow control)". Compared with manual technical support, this not only makes after-effect analysis more efficient but also automatically surfaces problems from losses that might previously have been overlooked, helping users self-check and avoid improper usage.
Difficulties in Developing the After-Effect Analysis Feature
In developing the after-effect analysis feature, we encountered the following technical difficulties:
Difficulty 1: Aggregating and classifying logs and extracting after-effect reasons
When investigating the causes of message loss through logs, we first need to filter the effective portion out of the massive log data and aggregate it. According to our preset log analysis strategy, the full-link log data must be tagged with corresponding marks to help us analyze why messages were lost at each stage.
To this end, we thoroughly reviewed the entire message push link. Starting from the push stage, we divided the push pipeline into four layers: the entry layer, the processing layer, the delivery layer, and the client, and refined the causes of message loss at each layer:
✦ At the entry layer, we mainly check whether the request content received by the server passes format verification and whether the various target parameters are set correctly, for example "Is the CID valid?" and "Is the authentication information abnormal?".
✦ At the processing layer, we check whether the target client meets the delivery conditions. For example, due to push policy restrictions, the server may be unable to continue pushing to some clients.
✦ At the delivery layer, we check whether the network connection between the client and the server is normal, for example whether the online channel is valid.
✦ Finally, when the client receives the push and the user clicks the message, the client reports a receipt back to the server module. Losses at this stage may be caused by reasons such as "the notification switch is not turned on".
Based on this layered view of the business, we can see the overall push process more clearly. We distilled the potential anomalies at each stage so that the corresponding log modules could be sorted out. In the end, we grouped the causes of after-effect anomalies into 12 categories, corresponding to the losses that may occur at each stage of message push.
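The layered classification above can be sketched as a simple lookup from a reason identifier in a module log to the push layer where the loss occurred. This is a minimal illustration: the reason codes and the `classify` helper below are hypothetical stand-ins, not the real 12-category taxonomy.

```python
# Hypothetical mapping from a reason code found in module logs to the
# push layer where the loss occurred. Real reason codes and the full
# 12-category taxonomy are internal; these names are illustrative.
REASON_TO_LAYER = {
    "INVALID_CID": "entry",          # target parameter failed validation
    "AUTH_FAILED": "entry",          # authentication information abnormal
    "BLACKLISTED": "processing",     # target is on the blacklist
    "FLOW_CONTROL": "processing",    # request restricted by frequency control
    "CHANNEL_OFFLINE": "delivery",   # online channel invalid
    "NOTIFY_SWITCH_OFF": "client",   # notification switch not turned on
}

def classify(log_record):
    """Return (layer, reason) for a reason-tagged log record."""
    reason = log_record.get("reason", "UNKNOWN")
    return REASON_TO_LAYER.get(reason, "unknown"), reason

sample = {"task_id": "t1", "cid": "c42", "reason": "FLOW_CONTROL"}
print(classify(sample))  # ('processing', 'FLOW_CONTROL')
```

With a table like this, every reason-tagged log line can be bucketed into a layer during aggregation, which is what lets a single report show at which stage each loss occurred.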
Difficulty 2: Processing and accurate calculation of TB-level log data
Based on the scenarios above, we screened out the relevant logs and used the early-stage markings to refine the causes of message loss.
While delivering messages, the server generates a huge volume of logs; a single function node can produce logs at the TB level per day. How to filter and compute over billions of log entries became the first problem in analyzing after-effect data.
We transfer logs via Flume and write the log data to HDFS. Spark serves as the computation engine, HDFS stores the raw log data and dimension data, and the final report data is stored in Elasticsearch, HBase, and MySQL.
Cleaning and calculation of massive log data
Based on our understanding of the characteristics of the push business, we concluded that the push log data has the following potential problems:
✦ Duplicate logs. For example, users repeatedly log in to and out of the service, producing a large number of duplicate logs.
✦ Duplicate requests. For example, a user initiates the same request multiple times, sending the same message to a given client. The client will only receive and display the message once, but the server generates multiple duplicate client/message association logs.
✦ Duplicate receipts. In downstream receipts, the client's complex network environment sometimes causes repeated receipts, which makes the server print receipt logs repeatedly.
✦ Insufficient logs. For example, client information such as the notification switch being off or device activation status is generally not generated during the push process, so related data must be brought in as a supplement to comprehensively evaluate the client's status.
To address these problems, our solution is "crowd marking". We divide the log data according to the push process and classify the population affected by a push task into three groups: arrived, sent, and requested. We mark each client based on the association between the message and the client. For example, when the "online delivery module" log is collected, if it contains association information between a message and a client, we attach a successful-delivery mark to that client under the push task. Each mark takes only the value 0 or 1, and duplicate logs cannot set the same mark more than once, which avoids double-counting logs.
Combining the crowd-marking logic above, we aggregate the mark data across the four dimensions to obtain the raw data for a single push task. In this data a client carries multiple marks; we only need to sort these marks according to the filtering logic and fold them into a final state, and we can then tell at which stage the message's delivery to that client ended, or at which stage, and for what reason, it was lost.
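The marking and fold-to-final-state logic can be illustrated with a small sketch. This is a toy model of the idea, not the production schema: the mark names and the priority order are illustrative assumptions. The key properties it shows are that re-marking from duplicate logs or duplicate receipts is idempotent (a set-based 0/1 mark), and that a priority order collapses a client's marks into one final state.

```python
# Toy illustration of "crowd marking": each client under a push task
# carries 0/1 stage marks, and a priority order folds them into a
# final state. Mark names and priorities are illustrative assumptions.
from collections import defaultdict

# marks[(task_id, cid)] -> set of stage marks; a set makes re-marking
# from duplicate logs or duplicate receipts naturally idempotent.
marks = defaultdict(set)

def mark(task_id, cid, stage):
    marks[(task_id, cid)].add(stage)

# The furthest stage a message reached wins
# (requested < sent < arrived < clicked).
PRIORITY = ["clicked", "arrived", "sent", "requested"]

def final_state(task_id, cid):
    stages = marks[(task_id, cid)]
    for stage in PRIORITY:
        if stage in stages:
            return stage
    return "unknown"

mark("task1", "cid42", "requested")
mark("task1", "cid42", "sent")
mark("task1", "cid42", "arrived")
mark("task1", "cid42", "arrived")   # repeated receipt: no double counting
print(final_state("task1", "cid42"))  # arrived
```

In the real pipeline this folding runs in Spark over the aggregated mark data of a whole task, but the per-client logic is the same: the marks determine the last stage the message reached.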
Solve the problem of data skew
During the calculation of log data, we also encountered the problem of data skew.
We split the overall log computation into four tasks according to the message delivery stages. Following the push funnel, these four tasks have an upstream/downstream relationship. When aggregating indicator dimensions, the sizes of the dimensional aggregates can differ enormously, leading to data skew; the long computation time of individual tasks can even slow down overall progress.
In order to solve this problem, we need to process Spark tasks before and during calculation to reduce data skew.
Our approaches are: 1. split large files into smaller ones, or merge small files into larger ones, so that each task processes a roughly even amount of log data; 2. optimize the partitioning strategy to prevent all the data under one large push request from piling up on the same node, balancing the computation pressure across nodes; 3. optimize each task's computation path to ensure data processing completes along the optimal path.
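A common way to implement point 2 is key salting: append a small random suffix to hot keys so that records for one very large push task spread across multiple partitions instead of landing on a single node. This is a minimal sketch under stated assumptions; the partition count, salt factor, and hashing scheme are illustrative, not our production configuration.

```python
# Sketch of key salting to mitigate partition skew. A huge push task
# whose records all share one key would otherwise hash to a single
# partition; salting splits it into SALT_FACTOR sub-keys.
# NUM_PARTITIONS and SALT_FACTOR are illustrative values.
import hashlib
import random

NUM_PARTITIONS = 8
SALT_FACTOR = 4  # split each hot task key into 4 sub-keys

def partition_for(key):
    """Deterministically map a key to a partition via a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def salted_key(task_id):
    """Attach a random salt so one hot task spreads over sub-keys."""
    return f"{task_id}#{random.randrange(SALT_FACTOR)}"

# Without salting, every record of "big_task" lands on one partition;
# with salting, the records spread over up to SALT_FACTOR partitions.
records = [salted_key("big_task") for _ in range(1000)]
partitions = {partition_for(k) for k in records}
print(sorted(partitions))
```

Partial aggregates computed per salted sub-key are then merged in a second pass on the original key, so the final counts are unchanged while the per-node load is balanced.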
With the log data processing and computation logic described above in place, we can store the after-effect data of each task in HBase for the business layer to query and call.
Summary
Recently, we also introduced the Flink streaming computation engine to improve the real-time performance of after-effect data computation; we combined more detailed message logs to further enrich the data, upgraded the after-effect data report, and launched the message link query feature. These help developers better understand how push messages are delivered and quickly raise the overall message arrival rate based on the corresponding suggestions.
Register and log in to the Push Developer Center (https://dev.getui.com/) to experience the after-effect analysis feature and the latest message link query feature!