
Introduction
Name: Xu Xinyu
Scenario: a mobile app with face recognition running on industrial Android tablets in the IoT industry
(Note: the company's project information has been anonymized and the original screenshots replaced with images found online; no copyright infringement is intended, and sources are credited.)

1. Project background

The company is an IoT company that decided to develop its own face-recognition conference attendance and sign-in panel device. To improve product competitiveness, it focused on cost-effectiveness plus differentiated customization (cheap, easy-to-use hardware; a polished, eye-catching UI). The software R&D department was asked to build the matching Android tablet application for face-based meeting attendance. The main business functions: face recognition (face collection, comparison and recognition, face database management), a conference module, attendance check-in, and customized interactive modules.

Schematic diagram of face recognition interaction (image source: Google Images)
(The original page is composed of a recognition bounding box + skeleton outline + information card + animations.)

After discussions with the hardware product manager, a set of prototypes and a product list were provided to support software development and testing.
The main board is an RK3288 (quad-core, 1.8 GHz) with 2 GB of RAM and 8 GB of storage, running Android 5.1. The screen is a 15.6-inch, 1920*1080, 10-point capacitive touch screen.

Schematic diagram of RK3288 motherboard (picture source Google Images)


RK3288 motherboard core parameters

For the framework, we chose React Native + tracking.js. For cost control, no commercial Android face recognition SDK was integrated. Based on prior experience, tracking.js was used instead of OpenCV to do on-device face detection and capture, while the server performs face comparison (this decision foreshadows the problems below). After a stretch of overtime, the face meeting attendance system reached V0.1-Alpha.
Procurement was slow; when the development board finally arrived, we started a round of testing on the real device. After that round, the team reported they were stuck: the app ran fine on other machines, but not on this one, so a hardware problem was suspected.
After ruling out hardware causes, I took the lead in optimizing the performance of the system as a whole.

2. Problems and challenges encountered

1. Problem

1. Face recognition is not smooth: drawing is out of sync and visibly delayed, and frequent entry/exit of people from the camera frame is accompanied by stutter.
2. With PoE power, the device gets noticeably hot after the software runs for a long time.
3. Occasional, probabilistic crashes, with no useful exception logs captured.

2. Challenge

1. No pure native Android development (the team has essentially no native Android skills) and no commercial face recognition Android SDK (cost control), which caps the performance ceiling.
2. Low hardware performance. The RK3288, with its Mali-T764 GPU, was regarded as a god-tier chip in 2014 and was called the strongest domestic ARM processor, but that was 6 or 7 years ago, and we are using the base variant. The device also needs several other pieces of software built in, so the requirements for performance and stability are still fairly high.
3. Probabilistic ANRs and crashes, with vague error reports that make the problems hard to locate.
4. The team members are all front-end developers, with limited experience in app optimization and debugging.

3. Steps to solve the problem

1. Review design

My advice: don't stare at the problem itself. For performance issues especially, this is a trap that many developers fall into, sometimes even using clever tricks to paper over flaws in the system design (product design, architecture, prototypes, interaction, UI, and so on).
If a problem far exceeds normal thresholds during operation or testing, the first step must be to review the system's design. (It really must be.)
Inexperienced programmers plunge headlong into the bug; experienced programmers use a structured approach: understand the problem, locate it, analyze it, solve it, verify the fix. And a qualified architect, or the leader of a technical team, must learn to step back and question the premises.
Many problems that call for optimization are not purely technical. The root cause may be unreasonable product design, a redundant architecture, user-hostile interaction, or an over-deep UI hierarchy. Because of system complexity, team communication costs, later requirement changes, and scenario refinements, such problems are often hard to expose at the start of a project. So the first step of performance optimization is to review whether the earlier design decisions were reasonable.
In fact, after combing through and streamlining the so-called differentiated design and removing its unreasonable elements, the animation and interaction were still too complicated for an industrial tablet app.

2. Data analysis

Acceptance criteria for face detection in this project:
Package size: ~100 MB
Minimum detectable face size: 50px * 50px
Recognizable face angles: yaw ≤ ±30°, pitch ≤ ±30°
Detection speed: 100 ms at 720p
Tracking speed: 50 ms at 720p
Face detection time: < 200 ms
Face database retrieval: < 100 ms
End-to-end detection + recognition: < 500 ms (the app's other performance indicators are omitted here)
An element of engineering is to measure discrete data against agreed standards. If there is no quantifiable evaluation standard for rendering performance, then whether an optimization "worked" is decided by the leader's gut. So testers are not the only ones who need to understand these indicators: developers must also learn to use test tools to locate problems and verify the data.
OK, time to act: connect to the Android board over adb (network connection) and install the APK.
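To make the pass/fail judgment mechanical rather than gut-driven, the timing thresholds above can be encoded directly. A minimal sketch in plain JavaScript — the helper name and metric keys are my own labels, not from the project:

```javascript
// Acceptance thresholds in ms, taken from the criteria above; the metric
// names (detection, retrieval, endToEnd) are illustrative labels.
const THRESHOLDS = {
  detection: 200, // face detection must finish in under 200 ms
  retrieval: 100, // face-database lookup in under 100 ms
  endToEnd: 500,  // whole detection + recognition pipeline in under 500 ms
};

// Return the metrics that violate their threshold (empty array = pass).
function failedCriteria(measurements) {
  return Object.keys(THRESHOLDS).filter((k) => measurements[k] >= THRESHOLDS[k]);
}

console.log(failedCriteria({ detection: 150, retrieval: 80, endToEnd: 430 })); // []
console.log(failedCriteria({ detection: 250, retrieval: 80, endToEnd: 430 })); // ['detection']
```

A check like this can run in CI or in a test harness, so regressions show up as data, not as someone's impression.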

3. Rendering mode analysis

Turn on Android developer mode, check GPU rendering speed ("Profile GPU rendering") and overdraw, and filter out the pages with excessive rendering pressure.

Schematic diagram of GPU rendering mode analysis (picture source Baidu)


Rendering color description

Overdraw: optimization here should weigh the input-output ratio, as overly fine-grained overdraw optimization has a low overall payoff. In this project, only the red areas (overdrawn 4x or more) were optimized.

4. Analyze power consumption

Since the device runs hot after long runs, power consumption needs to be analyzed.
Power-consumption statistics are collected by a system component, which means it has been accumulating data the whole time the system runs. So you need to reset the statistics before collecting a report.
1. First stop the adb service, then restart it

Kill the adb service: run adb kill-server to prevent conflicts and stale data.
Restart adb: run adb devices or adb start-server

2. Reset battery data collection
adb shell dumpsys batterystats --enable full-wake-history
adb shell dumpsys batterystats --reset

Under normal circumstances you should disconnect the charger and the USB connection (a connected device is charging, which heavily skews the statistics). But since our device is PoE-powered, we analyze the specifics as they are and use the data to help find anomalies. Because the device runs Android 5.1, the adb commands above apply.

Since the txt report is genuinely large (over 10 MB), reading it with the naked eye is unrealistic; it is generally used together with the Battery Historian tool. (Note: Battery Historian works on Android 5.0 (API 21) and above. If you are "lucky" enough to still be using an Android 4.4 industrial panel, you can skip this part.)


Battery Historian example diagram (picture source Baidu)

5. Thread activity and CPU analysis

There are many tools for thread activity and CPU analysis, but isn't Android Studio's profiler the most popular choice? (RN Android packaging still goes through Android Studio; packaging from VS Code has too many pitfalls.)
Then analyze the abnormal points it reveals.

Example image of Android Studio CPU analyzer (picture source https://developer.android.com/)

6. Data summary

The data shows that the CPU is overburdened, and that tracking blocks the process.
In fact, everyone had assumed the page freezes came from rendering pressure (rendering being the bottleneck of the whole RN framework). The report data shows the opposite: during face recognition the GPU is not saturated, and only part of the UI rendering is performed by the GPU. When tracking blocks, execution stalls briefly; then the canvas keypoint rendering and positioning complete one by one, the interface is called, the returned data arrives, and the information card renders and animates, producing a second, smaller stall (RN rendering jank). Performance then fluctuates like a sine wave, with the two kinds of stalls appearing and disappearing together.
The root cause of the initial mis-location of the problem is that front-end colleagues are generally weak in the use of logs and data-analysis tools.

7. Positioning problem

There are many ways to locate a problem: the commonly used binary search approach (bisecting by commenting code out, or bisecting by rolling back commits), breakpoint debugging, and log analysis can all help us locate a problem quickly.
Then, through the data and the key classes flagged by the tools, we pinpointed the problem fairly clearly: the information card animation + canvas special effects + the face recognition related functions.

8. Analyze the problem

The original implementation: import all the relevant JS files, create multiple tracking.ObjectTracker instances to detect the face, eye, and mouth regions, and render the facial keypoints via canvas.


Tracking.js file directory diagram

It also collects the face images. tracking.js does its computation on the CPU, which is slower than the GPU for image matrix operations.
At this point, with the data to back it up, we decided to tentatively replace the face recognition layer under RN, using face-api.js.

face-api.js

Built on the TensorFlow.js core, it implements three convolutional neural network architectures to perform face detection, recognition, and landmark detection; internally it ships a very lightweight, fast, and accurate 68-point facial landmark detector. It supports multiple TF models (the tiny model is only about 80 KB) and GPU acceleration, with the relevant operations able to run on WebGL.
Its core face detector implements an SSD (Single Shot Multibox Detector) algorithm: essentially a convolutional neural network (CNN) based on MobileNetV1, with face bounding-box prediction layers added on top of the network.
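To make the comparison step concrete: face-api.js represents each face as a 128-dimensional descriptor, and two faces are compared by the Euclidean distance between their descriptors, with roughly 0.6 as a commonly used match threshold. A minimal sketch in plain JavaScript — the short vectors below are made-up illustrations, not real descriptors:

```javascript
// Euclidean distance between two descriptor vectors of equal length.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

// Two descriptors are treated as the same person when their distance is
// below the threshold (~0.6 is a common default).
function isSamePerson(descriptorA, descriptorB, threshold = 0.6) {
  return euclideanDistance(descriptorA, descriptorB) < threshold;
}

console.log(isSamePerson([0.10, 0.20, 0.30], [0.12, 0.19, 0.31])); // true  (distance ≈ 0.024)
console.log(isSamePerson([0.10, 0.20, 0.30], [0.90, 0.10, 0.50])); // false (distance ≈ 0.83)
```

The same distance check works on the server side for face comparison against the face database.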


face-api facial landmark detector (picture source official document)

After confirming the framework replacement, I did some tuning of React Native thread scheduling. To make this easier to follow, here is a simple schematic of the process:
• JS Thread: JavaScript code, such as React, executes on this thread.
• Bridge: the connecting bridge, characterized by asynchrony, serialization, and batching.
• Shadow Thread: the thread that performs layout calculation and constructs the UI.
• Native Modules: provide native capabilities (such as photo album, Bluetooth).
• UI Thread: the main thread of an Android/iOS (or other platform) application.

ReactNative thread diagram

For example, when we draw a UI, the JS thread first serializes the operation into a UIManager.createView message and sends it to the Shadow Thread through the Bridge. The Shadow Thread deserializes the message, builds the shadow tree, then converts it into raw layout information and sends that to the UI thread.

After the UI thread gets the message, it too deserializes first, and then draws according to the given layout information.
This whole chain depends heavily on the bridge: height calculations, UI updates, every operation crosses it. When there are many tasks, a task queue builds up, asynchronous operations are batched, and some front-end updates struggle to reach the UI in time. This is especially true for high-frequency updates like animations: with many tasks in flight, it is hard to guarantee that every frame renders on time.
So the directions for optimization are:
1. Reduce asynchronous communication between the JS Thread and the UI Thread, and/or reduce the size of the serialized JSON payloads.
2. Minimize computation on the JS Thread side.
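Direction 1 can be illustrated with a toy measurement (plain Node, not RN's real bridge code): since every message crosses the bridge as serialized JSON, sending only the changed fields shrinks the payload per update.

```javascript
// A full style object versus a diff containing only the changed property.
const fullStyle = { width: 300, height: 200, opacity: 1, backgroundColor: '#fff', borderRadius: 8 };
const diff = { opacity: 0.5 }; // only the property that actually changed

// Cost proxy: bytes of serialized JSON crossing the bridge per update.
const fullBytes = JSON.stringify(fullStyle).length;
const diffBytes = JSON.stringify(diff).length;

console.log(diffBytes < fullBytes); // true — the diff payload is much smaller
```

The fewer and smaller the messages per frame, the less chance the bridge queue backs up during animation.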

9. Solve the problem

The overall solution: replace tracking.js with face-api.js, and tune React Native. The React Native tuning breaks down into the three steps below.
1. Turn on the native animation driver
useNativeDriver: true
Messages between the JS Thread and the UI Thread are passed as JSON strings. useNativeDriver can only be used on non-layout animated properties and direct events, such as transform and opacity; layout-related properties, such as height and position, will throw an error if it is enabled. For example, when face recognition succeeds and the personnel information card animates in, we can pass useNativeDriver: true to enable the native animation driver.
  Animated.timing(this.state.animatedValue, {
    toValue: 1,
    duration: 500,
    useNativeDriver: true, // <-- Add this
  }).start();
With the native driver enabled, all of the animation's configuration is sent to the native side before it starts, and native code executes the animation on the UI thread instead of communicating across the bridge on every frame. The animation is thus fully decoupled from the JS thread from the start: even if the JS thread stalls, the animation is unaffected.
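The non-layout-only constraint can be captured in a tiny guard. This is a hedged sketch: the whitelist below is illustrative only, not RN's actual internal property list.

```javascript
// Illustrative whitelist of animated style props the native driver accepts.
const NATIVE_DRIVER_SAFE = new Set(['opacity', 'transform']);

// Returns whether a style prop may be animated with useNativeDriver: true.
function canUseNativeDriver(styleProp) {
  return NATIVE_DRIVER_SAFE.has(styleProp);
}

console.log(canUseNativeDriver('opacity')); // true  — non-layout prop, safe
console.log(canUseNativeDriver('height'));  // false — layout prop, would throw in RN
```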

2. Use InteractionManager
Use InteractionManager to defer tasks until interactive operations and animations have completed, such as the venue-distribution jump animation. The goal is to balance the scheduling of heavy tasks against interactive animations.
  const handle = InteractionManager.createInteractionHandle();
  // Run the animation... (runAfterInteractions callbacks queue up while the handle is open)
  // After the animation completes, clear the handle:
  InteractionManager.clearInteractionHandle(handle);
  // Once all handles are cleared, the queued tasks execute in order

According to the official docs: runAfterInteractions accepts a callback function, or a PromiseTask object that returns a Promise. If a PromiseTask is provided, it blocks the task queue even though it is asynchronous, and the next task does not run until it finishes. In this way, animation smoothness can be optimized on demand.
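The queueing semantics described above can be modeled in a few lines of plain JavaScript (a simplified, synchronous model for illustration — not the real InteractionManager implementation):

```javascript
// Tasks registered via runAfterInteractions wait while any handle is open,
// and flush in order once every handle has been cleared.
class SimpleInteractionManager {
  constructor() {
    this.handles = new Set();
    this.queue = [];
    this.nextHandle = 1;
  }
  createInteractionHandle() {
    const handle = this.nextHandle++;
    this.handles.add(handle);
    return handle;
  }
  clearInteractionHandle(handle) {
    this.handles.delete(handle);
    // Once every handle is cleared, queued tasks run in order.
    if (this.handles.size === 0) {
      while (this.queue.length) this.queue.shift()();
    }
  }
  runAfterInteractions(task) {
    if (this.handles.size === 0) task();
    else this.queue.push(task); // an interaction is in progress: defer
  }
}

const im = new SimpleInteractionManager();
const log = [];
const handle = im.createInteractionHandle();           // animation starts
im.runAfterInteractions(() => log.push('heavy task')); // deferred, not run yet
log.push('animating');
im.clearInteractionHandle(handle);                     // animation done -> flush
console.log(log); // ['animating', 'heavy task']
```

The real API behaves asynchronously and integrates with the Animated module, but the ordering guarantee is the same: deferred work never competes with a running animation.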

3. Re-render
First, in RN as in React, when setState is triggered in a parent component, all child components re-render even if no state value actually changed; likewise, when the props passed from parent to child change, the child re-renders regardless of whether it actually uses those props.
So, for re-rendering problems: use PureComponent and shouldComponentUpdate to optimize class components, and memo to optimize function (hook) components.
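What PureComponent and React.memo actually do is a shallow comparison of props. A simplified model in plain JavaScript (own enumerable keys only; React's real check handles more edge cases):

```javascript
// Shallow equality: same key set, and each value identical by Object.is.
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every((k) => Object.is(a[k], b[k]));
}

// A pure component re-renders only when the shallow comparison fails:
function shouldUpdate(prevProps, nextProps) {
  return !shallowEqual(prevProps, nextProps);
}

console.log(shouldUpdate({ name: 'A', count: 1 }, { name: 'A', count: 1 })); // false -> skip
console.log(shouldUpdate({ name: 'A', count: 1 }, { name: 'B', count: 1 })); // true  -> re-render
// Caveat: a freshly created object/array prop fails the check even if deeply equal:
console.log(shouldUpdate({ style: { flex: 1 } }, { style: { flex: 1 } }));   // true
```

The caveat is why inline object and array literals in props defeat these optimizations.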
After verification, overall performance improved, interaction was reasonably smooth, and the basic performance indicators were met. What remained were the probabilistic issues, for which we sought help from our testing colleagues to reproduce.

10. Verification problem (application of performance monitoring platform)

First, why use a performance monitoring platform: 1. to de-duplicate information, avoiding handling the same problem repeatedly across multiple apps, or repeatedly within one app; 2. to continuously capture important suspicious signals, improving efficiency and reducing labor costs.
Second, when and in what scenarios to use one: besides the monitoring platforms that testing and operations require, developers must also learn to use a performance monitoring platform to help locate and solve problems. Here are two recommended options:

1. Google Android Vitals + Firebase
Android vitals is a program launched by Google to improve the stability and performance of Android devices. The Android vitals console in Google Play highlights metrics such as crash rate, ANR rate, excessive wakeups, and stuck partial wake locks. It covers the functions developers commonly need, and crucially it requires no code changes, making it very convenient to adopt.
In addition, Firebase can capture detailed custom crash report data to understand crashes in the application. The tool groups crashes by similar stack traces and triages them by the severity of their impact on users. Besides the automatically generated reports, you can record custom events to learn which actions led to a crash.


Vitals + Firebase function comparison chart (picture source official website)

So in most cases Android vitals can handle the simple problems, combined with Firebase for flexible custom-event handling.
The inconvenient part is Google's restrictions in mainland China, which require companies to apply for dedicated cross-border network lines; and when the network fluctuates, identity verification is frequently demanded (which is annoying).
Cost: Android vitals is free, but registering a developer account costs $25; Firebase has free and paid tiers. This option suits foreign companies, multinationals, or companies with the relevant qualifications.

2. U-Meng + U-APM
2.1 Product Overview :
Due to Google's restrictions in mainland China, many companies cannot report online without external network access, so Umeng+'s U-APM can also fully meet the needs above. For my project, I chose to integrate the Umeng+ SDK to assist in problem detection.
Umeng's push and statistics products are well regarded in the industry, and those familiar with Umeng will know the stability features of U-App. U-APM is Umeng+'s upgraded stability data product, built on the U-App stability features, for developers to monitor their applications.


U-APM core technology and advantages (picture source Umeng official website)

Why choose Umeng + U-APM application performance monitoring platform:
The product builds a systematic online quality-monitoring loop: discover online problems, locate them quickly, resolve them efficiently. It supports real-time monitoring of online app crash trends, 7*24 monitoring with alerting and fix verification, reproduction of users' crash scenarios, focused monitoring of key flows, and fix testing.

It is also backed by Alibaba's technology, which can provide long-term stable product iteration, project services, and expert consulting. What enterprise engineering needs is exactly this long-term stability; with a small vendor's product, you may find nobody to turn to when something breaks.

Function comparison between U-APM and competing products (picture source from Youmeng official website)

2.2 Development preparation
If you have used U-App before, you can check the upgrade instructions on the official website and click to try U-APM; if you have never used Umeng products, register on the Umeng+ official website and create a new application to obtain an AppKey.
Note: be sure to read the U-APM compliance guide carefully to meet the relevant MIIT compliance requirements, and avoid app-store removal due to privacy-policy risks.

2.3 Integrated SDK
Maven automatic integration:
Maven automatic integration is relatively simple and fast.
First, add the Umeng+ SDK maven repository address in the buildscript and allprojects sections of the project-level build.gradle script, as shown below.

Then add the SDK dependencies in the dependencies section of the app module's build.gradle script. Simple, isn't it?

  dependencies {
      implementation fileTree(include: ['*.jar'], dir: 'libs')
      // The following SDKs are introduced on demand, based on which services the host app uses.
      implementation 'com.umeng.umsdk:common:9.4.4' // required
      implementation 'com.umeng.umsdk:asms:1.4.1'   // required
      implementation 'com.umeng.umsdk:apm:1.4.2'    // required
  }

Manual Android Studio integration (the approach I used):
(1) Select and download the U-APM SDK component, then unzip the .zip file to get the corresponding component packages.

You get the following files:

umeng-common-9.4.4.jar            // statistics SDK, required
umeng-asms-armeabi-v1.4.1.aar     // required
And in the apm directory:
umeng-apm-armeabi-v1.4.2.aar      // U-APM SDK, required
If you need UTDID, integrate from thirdparties:
utdid4all-1.5.2.1-proguard.jar    // UTDID service supplement package
If you need the ABTest module, it can be integrated from common:
umeng-abtest-v1.0.0.aar           // ABTest module

(2) Copy the above jar/aar packages into the libs directory of the Android Studio project.
Right-click the project —> select Open Module Settings —> in the Project Structure dialog —> select the Dependencies tab —> click the "+" at the bottom left —> choose the component package type —> import the corresponding package.

(3) Reference the corresponding component packages in the app's build.gradle file. Example:

  repositories {
      flatDir {
          dirs 'libs'
      }
  }
  dependencies {
      implementation fileTree(include: ['*.jar'], dir: 'libs')
      implementation(name: 'umeng-asms-armeabi-v1.4.1', ext: 'aar')
      implementation(name: 'umeng-apm-armeabi-v1.4.2', ext: 'aar')
  }
    Note: if you need to support an ABI other than armeabi, or you hit the multi-ABI .so loading failure [SA10070], then in addition to importing the corresponding package you must separately download and add the corresponding .so files.

2.4 Granting permissions
Following the official tutorial, grant the following permissions:

  <manifest ……>
      <uses-sdk android:minSdkVersion="8"></uses-sdk>
      <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>
      <uses-permission android:name="android.permission.ACCESS_WIFI_STATE"/>
      <uses-permission android:name="android.permission.READ_PHONE_STATE"/>
      <uses-permission android:name="android.permission.INTERNET"/>
      <application ……>

2.5 Obfuscation settings
If code obfuscation (ProGuard) is used in the app, the following configuration needs to be added:

  -keep class com.umeng.** { *; }
  -keep class com.uc.** { *; }
  -keep class com.efs.** { *; }
  -keepclassmembers class * {
      public <init>(org.json.JSONObject);
  }
  -keepclassmembers enum * {
      public static **[] values();
      public static ** valueOf(java.lang.String);
  }

2.6 Initialize the SDK
In the RN Android native Application.onCreate, call the initialization function provided by the common component package:

  /**
   * Note: even if you have configured the appkey and channel values in AndroidManifest.xml,
   * you still need to call the initialization interface in app code (if you want to use the
   * appkey and channel configured in AndroidManifest.xml, pass null for the appkey and
   * channel parameters in the UMConfigure.init call).
   */
  UMConfigure.init(Context context, String appkey, String channel, int deviceType, String pushSecret);

Or call this pre-initialization function

  public static void preInit(Context context, String appkey, String channel)

Then turn on the log switch:

  /**
   * Set the component Log switch.
   * Parameter: boolean, false by default; set to true if you need to view logs.
   */
  UMConfigure.setLogEnabled(true);

At this point you can use the basic functions: jank (freeze) analysis, Java and native crash analysis, ANR analysis, and so on. The principle is based on the main thread's response time: device information and jank logs are reported. Once devices start reporting, the web console shows the uploaded Error (SDK integration or runtime errors), Warn (SDK warnings), Info (SDK prompts), and Debug (SDK debugging) messages, along with the reports.

    U-APM crash information log example diagram

However, reading the raw error stack straight from the messages is painful. U-APM uses an aggregation algorithm to power its jank-module feature: it screens the 200 stacks affecting the most users, walks each from top of stack to bottom, and displays the top 10 modules by frequency, with subtree depth supported up to 50 levels, to help dig out the details of the offending module.
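The aggregation idea can be sketched in a few lines (a toy model under my own assumptions, not U-APM's actual algorithm): group reported stacks by their top frame and rank modules by frequency, turning raw stacks into a "top N" list.

```javascript
// Group stacks by top frame and return the n most frequent modules.
function topModules(stacks, n = 10) {
  const counts = new Map();
  for (const stack of stacks) {
    const topFrame = stack[0]; // top-of-stack frame stands in for the module here
    counts.set(topFrame, (counts.get(topFrame) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

const reportedStacks = [
  ['canvas.draw', 'render', 'loop'],
  ['canvas.draw', 'render', 'loop'],
  ['tracker.track', 'loop'],
];
console.log(topModules(reportedStacks, 2)); // [['canvas.draw', 2], ['tracker.track', 1]]
```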


U-APM jank module example diagram

In addition, U-APM provides advanced functions such as startup analysis, memory analysis, network analysis, and detailed per-user inspection. Apart from memory analysis, these need to be configured before use; do go and try them.
In the end, verifying and resolving the problem through U-APM also went smoothly, completing the whole R&D loop. If you are interested, you can try U-APM for free.

4. Project summary

1. Don't stare at the problem. Whether for app performance or system optimization, the visible symptom may just be a side effect of something essential. In this project, the local symptom was jank and lack of smoothness; staring only at the symptom, we could easily have fallen into an optimization dead end — optimizing rendering, reducing canvas drawing, even trimming business features. The real breakthrough of our performance bottleneck came from changing the implementation approach to one better suited to the business scenario, letting the machine's capabilities come through. All of this requires data to support it.

2. Let the data speak. Do not rely on feelings to detect performance problems or evaluate optimization results. There must be quantifiable rendering-performance criteria, along with quantifiable, visual tooling. Experience-based guessing cannot be accumulated by a team, while data and tools can be passed on. For example: without performance standards and without data to reflect the results, the whole effort is meaningless, and success or failure is decided by the leader's gut.

3. Use low-end devices: the same program shows the same problems far more clearly on low-end hardware. For example: early in development there was no jank on our test phones; it only surfaced on the industrial device. Delivering a good experience on both high-end and low-end devices is always an important concern.

4. Weigh the pros and cons: optimize only on the premise that the product stays stable and requirements ship on time. When the cost outweighs the benefit, adopt another solution rather than over-optimizing. Never forget that the purpose of performance optimization is to improve the user experience, not to show off skills.

5. Abandon sunk costs: costs already paid in R&D that cannot be recovered should not influence future decisions. For example, for the face recognition module already built on tracking.js, the data proved that the framework choice hurt performance; with the replacement cost within an acceptable range, the earlier the switch, the higher the expected return.

