01 Automated Crash Processing Is the Trend

A program crash occurs when a serious error prevents a program from continuing to execute, causing it to exit abnormally. Crashes are among the most frequent problems encountered in quality assurance and usually need to be taken seriously: when users run into frequent sudden exits or crashes while using your app, many of them will leave.

A program can crash for many reasons; the most common are:

1. Program logic errors, such as out-of-bounds array access, stack overflow, or null pointer exceptions;

2. Device compatibility issues. Devices and operating systems are highly diverse, especially on Android, where there may be thousands of device models, so complete compatibility is difficult to achieve;

3. Memory management errors. A memory leak lets objects that can never be released accumulate over a long-running session, leading to memory exhaustion and eventually a crash; alternatively, the memory the program needs at runtime may exceed the device's limit.

In actual production, iterating across multiple versions can leave us facing a large volume of crash data. Manually verifying each record, filtering out duplicates, keeping the valid data, and filing bugs for tracking is not only time-consuming, labor-intensive, and inefficient, but also prone to omissions that can let serious problems slip through. It is therefore necessary to build a convenient, efficient, automated closed-loop process.

02 Shortcomings of Common Solutions

In the past, the common solution was to integrate Tencent's Bugly SDK to capture crash information in Android or iOS apps. Bugly provides a complete crash-monitoring solution: once developers integrate their mobile app with Bugly, its crash-monitoring backend conveniently displays crashes, ANRs, and other problems that occur while users are using the app, and the reported crash information helps developers quickly locate and fix them. However, this solution does not support desktop applications (Windows, Mac, etc.), and Bugly does not currently offer a third-party API for retrieving lists of crash data for automated analysis and processing. Crash information therefore has to be filtered and processed manually, and the results are unsatisfactory.

03 A Cross-Platform, Closed-Loop Automated Crash Processing Solution

To solve these problems, Agora developed a cross-platform (Android/iOS/Mac/Windows) solution for collecting and processing crash information. When a crash occurs, Agora's SDK submits the relevant information (version number, platform, build number, crash offset address, symbol table address, DMP file link, etc.) to our backend system. The backend symbolizes the reported stack information using the associated symbol table, extracting the correspondence between addresses and symbols, and then restores a crash stack that developers can understand.
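At its core, symbolization is an address-to-symbol lookup: for each crash offset, find the function whose start offset is the largest one not exceeding it. The sketch below assumes the symbol table has already been parsed into (start offset, function name) pairs; the `Symbol`/`SymbolTable` names and this format are hypothetical illustrations, not Agora's actual implementation, and a real table (for example one that records function sizes) would also reject offsets past a function's end.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    start: int  # offset of the function's first instruction within the module
    name: str   # demangled function name


class SymbolTable:
    """Hypothetical address-to-symbol table built from (start, name) pairs."""

    def __init__(self, symbols):
        self._symbols = sorted(symbols, key=lambda s: s.start)
        self._starts = [s.start for s in self._symbols]

    def lookup(self, offset):
        # Last symbol whose start offset is <= the crash offset.
        # Simplification: each function is assumed to extend to the next one.
        i = bisect.bisect_right(self._starts, offset) - 1
        if i < 0:
            return None
        sym = self._symbols[i]
        return f"{sym.name}+0x{offset - sym.start:x}"


def symbolize(offsets, table):
    # Frames that cannot be resolved are kept as raw hex offsets.
    return [table.lookup(off) or f"0x{off:x}" for off in offsets]
```

For example, with functions starting at `0x1000`, `0x1400`, and `0x2000`, the offsets `[0x1420, 0x2010, 0x500]` resolve to `AudioEngine::mix+0x20`, `VideoEncoder::encode+0x10`, and the unresolved `0x500`.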

After symbolization, the system checks whether a JIRA issue has already been filed for the same problem in the current SDK version; if not, a new JIRA issue is created. When creating the issue, the system can also assign an owner based on the module that crashed: a crash ultimately located in the audio module is assigned to the audio module's development lead, a video module crash to the video module's owner, and a network module crash to the network module's owner. This replaces manual assignment and greatly improves the efficiency of problem handling. If the analysis shows that a JIRA issue has already been filed, the crash is automatically linked to that issue and the issue's crash count is updated. The entire processing flow is shown in the following figure:

[Figure: automated crash-processing flow]
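The create-or-update step can be sketched with an in-memory stand-in for JIRA, keyed by SDK version and crash signature. This is a minimal illustration under assumptions: `MODULE_OWNERS`, the owner names, the `CRASH-n` issue keys, and `IssueTracker` are all hypothetical, and a real system would query and update JIRA rather than a dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical module-to-owner routing; real owners would come from configuration.
MODULE_OWNERS = {"audio": "audio-lead", "video": "video-lead", "network": "network-lead"}
DEFAULT_OWNER = "triage"


@dataclass
class Issue:
    key: str
    version: str
    signature: str
    assignee: str
    crash_count: int = 0


@dataclass
class IssueTracker:
    """In-memory stand-in for JIRA, keyed by (SDK version, crash signature)."""

    issues: dict = field(default_factory=dict)
    next_id: int = 1

    def report(self, version, signature, module):
        key = (version, signature)
        issue = self.issues.get(key)
        if issue is None:
            # Nothing filed yet for this version and signature:
            # create an issue and route it to the crashing module's owner.
            owner = MODULE_OWNERS.get(module, DEFAULT_OWNER)
            issue = Issue(f"CRASH-{self.next_id}", version, signature, owner)
            self.issues[key] = issue
            self.next_id += 1
        # Whether new or existing, count this crash against the issue.
        issue.crash_count += 1
        return issue
```

Reporting the same signature twice for version `3.6.0` returns the same issue with a count of 2, while a new signature from the network module creates a second issue assigned to `network-lead`.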

How do we tell whether an issue has already been filed for the same version? We currently use two dimensions. The first is the build number plus the crash offset address: if multiple crash records share the same build number and crash offset address, we classify them as the same problem and aggregate their occurrence count when filing the JIRA issue. After a period of practice, however, we found that crash records from the same version often have different build numbers and crash offset addresses even though they stem from the same problem. So we introduced a second dimension: analyzing the stack details. We concatenate the information from all stack lines that can be resolved, compute a hash of the concatenated string, and decide whether the same problem has already been filed based on whether the hash matches. Filtering along these two dimensions effectively eliminates duplicate JIRA submissions and produces more accurate crash counts.

We can also define different hash-generation schemes to filter duplicate crashes at different levels of strictness. The most relaxed scheme generates the hash from only the class name and method name of the final crashing frame; a strict scheme uses that frame's file name, module name, class name, method name, and parameter names. The following figure shows a JIRA issue filed through automated analysis of crash data in our practice:

[Figure: a JIRA issue filed automatically from crash data]
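The two-dimensional check and the relaxed/strict hash schemes can be sketched as follows. This is an illustration under assumptions, not Agora's code: the `Frame` fields, the `"stack"`/`"relaxed"`/`"strict"` scheme names, the choice of SHA-256, and the `Deduper` grouping logic are hypothetical. A crash joins an existing group if either its build number plus offset or its stack hash has been seen before.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    file: str
    module: str
    cls: str
    method: str  # empty string if the frame could not be symbolized
    params: str


def address_key(build, offset):
    # Dimension 1: build number plus crash offset address.
    return f"{build}:{offset:#x}"


def signature(frames, scheme="stack"):
    # Dimension 2: hash of resolved stack information; None if nothing resolved.
    resolved = [f for f in frames if f.method]
    if not resolved:
        return None
    top = resolved[0]  # the frame where the crash finally happened
    if scheme == "relaxed":
        text = f"{top.cls}::{top.method}"
    elif scheme == "strict":
        text = f"{top.file}|{top.module}|{top.cls}::{top.method}({top.params})"
    else:  # "stack": concatenate every resolved line
        text = "\n".join(f"{f.module}|{f.cls}::{f.method}" for f in resolved)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Deduper:
    """Groups crashes that match on either dimension; counts crashes per group."""

    def __init__(self, scheme="stack"):
        self.scheme = scheme
        self.by_address = {}  # address key -> group id
        self.by_hash = {}     # stack hash -> group id
        self.counts = []      # crash count per group id

    def add(self, build, offset, frames):
        akey = address_key(build, offset)
        h = signature(frames, self.scheme)
        group = self.by_address.get(akey)
        if group is None and h is not None:
            group = self.by_hash.get(h)
        if group is None:
            group = len(self.counts)
            self.counts.append(0)
        self.by_address.setdefault(akey, group)
        if h is not None:
            self.by_hash.setdefault(h, group)
        self.counts[group] += 1
        return group
```

Two crashes with different build numbers and offsets but the same resolved stack land in one group, which is exactly the case the first dimension alone misses. Switching `scheme` trades recall for precision: `"relaxed"` merges crashes in the same method regardless of parameters, while `"strict"` keeps overloads apart.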

Each JIRA issue contains the current version number, build number, crash offset address, accumulated crash count, system information, and so on. During analysis, the key information from the crash stack is also extracted into the JIRA description, making it easy for developers to locate the problem. With this screening and analysis, we can aggregate tens of thousands of crash records from a single version into a few dozen JIRA issues, greatly improving the efficiency of crash handling.
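As a sketch of how those fields might be packed into an issue, the function below builds a request body in the shape accepted by Jira's REST create-issue endpoint (`POST /rest/api/2/issue`), where the v2 `description` is a plain wiki-markup string. It only builds the payload and does not send it. The project key, summary format, labels, and the `crash` dictionary's keys are hypothetical; the crash count is placed in the description rather than a custom field, since custom field IDs vary per Jira instance.

```python
import json


def build_issue_payload(project_key, crash, assignee, key_frames):
    """Build a body for Jira's create-issue endpoint (POST /rest/api/2/issue)."""
    summary = f"[{crash['platform']}] {crash['version']} crash at {crash['offset']}"
    description = "\n".join([
        f"Version: {crash['version']}",
        f"Build: {crash['build']}",
        f"Crash offset: {crash['offset']}",
        f"Crash count: {crash['count']}",
        f"System: {crash['os']}",
        "Key stack frames:",
        *[f"  {line}" for line in key_frames],
    ])
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Bug"},
            "summary": summary,
            "description": description,
            # Jira Server/Data Center identifies users by name;
            # Jira Cloud expects {"accountId": ...} instead.
            "assignee": {"name": assignee},
            "labels": ["auto-crash", crash["platform"].lower()],
        }
    }


if __name__ == "__main__":
    crash = {"platform": "Android", "version": "3.6.0", "build": "1002",
             "offset": "0x1a2b", "count": 42, "os": "Android 12"}
    payload = build_issue_payload("SDK", crash, "audio-lead", ["AudioMixer::mix+0x20"])
    print(json.dumps(payload, indent=2))
```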

04 Crash Statistics

Managing crash issues on a single platform lets developers, testers, and project managers easily find statistics and related issues by platform, version number, JIRA status, and other criteria:

[Figure: crash issue statistics filtered by platform, version number, and JIRA status]

From the Hash summary, we can also quickly see which versions share the same problem and how often it occurs, which helps us avoid it in future development:

[Figure: Hash summary showing affected versions and occurrence counts]
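A Hash summary like this amounts to grouping crash records by stack hash and counting occurrences per version. The minimal sketch below assumes each crash has already been reduced to a `(stack_hash, version)` pair; the function names are hypothetical. `recurring` picks out the hashes seen in more than one version, which are the regressions worth watching.

```python
from collections import defaultdict


def summarize_by_hash(records):
    """records: iterable of (stack_hash, version) pairs, one per crash."""
    summary = defaultdict(lambda: defaultdict(int))
    for stack_hash, version in records:
        summary[stack_hash][version] += 1
    # Convert to plain dicts: {stack_hash: {version: count}}
    return {h: dict(per_version) for h, per_version in summary.items()}


def recurring(summary):
    # Hashes that show up in more than one version.
    return sorted(h for h, per_version in summary.items() if len(per_version) > 1)
```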

The system also records crash data every day, identifies the JIRA issues with the largest crash increase, and sends a daily Top 10 alert for unresolved crashes, reminding the relevant developers to follow up on the highest-priority issues:

[Figure: daily Top 10 alert for unresolved crash issues]
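The daily Top 10 can be sketched as ranking unresolved issues by how much their crash count grew since the previous day's snapshot. This assumes cumulative counts are recorded once per day; `IssueStats` and `daily_top_n` are hypothetical names, and `heapq.nlargest` is used so the full list never needs sorting.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueStats:
    key: str
    resolved: bool
    count_yesterday: int  # cumulative crash count at the previous daily snapshot
    count_today: int      # cumulative crash count at today's snapshot

    @property
    def increment(self):
        return self.count_today - self.count_yesterday


def daily_top_n(stats, n=10):
    # Only unresolved issues qualify for the alert; rank by daily increase.
    unresolved = [s for s in stats if not s.resolved]
    return heapq.nlargest(n, unresolved, key=lambda s: s.increment)
```

A resolved issue never appears in the alert even if its count jumps, so the list stays focused on work that still needs an owner.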

Through this automation practice, we have improved problem-solving efficiency, reduced task backlogs and R&D costs, and accelerated version iteration while continuously improving code quality and the user experience, contributing to the growth of the company's business.

About the Dev for Dev Column

Dev for Dev (Developer for Developer) is an interactive, innovative developer practice initiative jointly launched by Agora and the RTC developer community. Through technology sharing, open exchange of ideas, and co-built projects from an engineer's perspective, it brings developers together, discovers and delivers the most valuable technical content and projects, and fully unleashes the creativity of technology.


RTE Developer Community

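The RTE Developer Community is a neutral developer community focused on real-time interaction. Beyond purely technical exchange, we believe developers have much richer individual value. From industry development and change to developer career growth and resources for technology entrepreneurship and innovation, we will run alongside developers, sharing, building, and growing together.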