
Author: Li Lei (Qiannuo)

APM provides frame-rate data, namely FPS (Frames Per Second). FPS reflects page smoothness to some extent, but the FPS that APM originally provided was not very accurate. With the launch of a performance-optimization project for low-end phones, we urgently needed indicators to measure improvements in the scrolling experience, so we began exploring frame-rate data in earnest.

During this exploration, we ran into several problems:

  • High-refresh-rate phones make up a large share of devices, which skews the overall FPS data
  • FPS mixes in rendering not driven by user scrolling, so it cannot directly reflect the user's interaction experience
  • When averaging, jank data is drowned out by the mass of normal data — does one freeze affect just a single FPS value, or the whole user experience?

After a period of exploration, we settled on several indicators: sliding frame rate, frozen-frame ratio, scrollHitchRate, and stuck frame rate. Beyond the frame-rate indicators themselves, to better guide performance optimization APM also provides main-factor analysis of the frame rate, and captures the stutter stack to help locate jank problems.

The following is a detailed introduction to these APM platform features and to our exploration of frame-rate metrics. We hope it brings you some help.

System rendering mechanism

Before introducing how the indicators are implemented, we first need to understand how the system renders. Only by knowing the rendering mechanism can we calculate and process frame-rate data correctly.

Rendering is a large and important part of Android: the measure/layout/draw flow we often talk about, jank, overdraw, and more are all related to it. Here we only need an overall picture of the rendering process, so that later we know which parts to calculate ourselves and which parts to obtain through system APIs in order to derive the target data.

Rendering process

We all know that when rendering is triggered, execution reaches ViewRootImpl#scheduleTraversals. This method mainly registers a callback for the next VSync with Choreographer. When that VSync arrives, Choreographer first switches to the main thread (the native VSync callback does not run on the main thread). Of course, it does not simply sendMessage to the Looper; it calls msg.setAsynchronous(true), which improves UI responsiveness.

After switching to the main thread, Choreographer executes all the callbacks registered for this VSync. They fall into four types:

  1. CALLBACK_INPUT — input event handling
  2. CALLBACK_ANIMATION — animation processing
  3. CALLBACK_TRAVERSAL — UI traversal (measure/layout/draw)
  4. CALLBACK_COMMIT — commit callbacks, run after the traversal completes

Choreographer groups callbacks by type into linked lists, whose heads are stored in a fixed-size array (only these four types are supported). When the VSync message runs on the main thread, the lists are fetched in order, executed, and cleared.

The CALLBACK_TRAVERSAL callback is registered in scheduleTraversals. It executes the familiar ViewRootImpl#doTraversal() method, which calls performTraversals; the most important work in performTraversals is calling the well-known performMeasure, performLayout, and performDraw methods.

Detailed code can be viewed: android.view.Choreographer and android.view.ViewRootImpl

From here we can see that displaying one frame includes at least: the time to switch from VSync to the main thread, the time to process input events, the time to process animations, and the time for the UI traversal (measure, layout, draw).

However, when the draw pass ends, only the CPU part of the work is done; the data is then handed over to the RenderThread, which completes the GPU part.

Screen refresh

Android 4.1 introduced VSync and triple buffering. VSync signals when CPU computation can start and when the GPU and display can swap buffers, which helps make full use of the available time and reduce jank.

In the figure above, A, B, and C are the three buffers. The CPU, GPU, and display can each obtain a buffer as soon as possible, reducing unnecessary waiting. Even if the display and the GPU are each holding a buffer, the next frame can still start rendering immediately, because a third buffer remains available for the CPU to write into — see the first VSync in the figure.

Does triple buffering introduce any problem? Looking at the figure carefully, buffer A is ready when the third VSync arrives and could be displayed at any time, yet it is only shown at the fourth VSync. So while triple buffering makes good use of the time spent waiting for VSync and reduces jank, it adds a frame of display latency.

This is just a brief review of the topic; it is worth reading up on the history of this mechanism to understand why it was introduced.

Mining frame data

Once we understand the system's rendering process, the question becomes: what should we monitor, and how?

Industry Solutions

APM original scheme:

After receiving a Touch event, the original APM collected the number of draws within 1s on the page. The advantage of this scheme is its low overhead, but it has fatal flaws. First, if the page finishes rendering and stops refreshing in less than 1s, the value comes out artificially low. Second, touching the screen does not necessarily cause a refresh, and a refresh does not necessarily come from a touch event. In all these cases, the computed data is dirty.

Incidentally, Android implements a debug FPS facility in ViewRootImpl whose principle is similar to the scheme above: it accumulates time across draw calls up to 1s. So if you want a low-cost, performance-lossless FPS for offline testing, this is an option.

If you are interested, you can see the trackFPS method of ViewRootImpl.

Matrix:

For the frame-rate part, Matrix innovatively hooks Choreographer's CallbackQueue, adding a custom FrameCallback to the head of each callback queue by calling addCallbackLocked via reflection. When that callback fires, rendering of the frame is starting, and the message currently executing in the Looper is the rendering message. Besides the frame rate, this also lets Matrix measure the time spent in each stage of the current frame.

In addition, combining the frame callback with the Looper's Printer makes it possible to dump main-thread information when a stuck frame occurs, which helps the business side fix the jank — but the frequent string concatenation has a performance cost (the println call concatenates strings).

Regular:

Use the doFrame(frameTimeNanos: Long) method of Choreographer.FrameCallback to calculate the difference between the two frames in each callback, and you can get the FPS by calculation.

Sliding frame rate

FPS is a simple and common industry indicator. It is the abbreviation of Frames Per Second — the number of frames, or pictures, rendered per second.

Calculating FPS in general is not our goal; we want the sliding frame rate. We care most about the frame rate during user interaction, because monitoring that frame rate better reflects the user experience.

First, none of the earlier acquisition schemes can produce an FPS that matches this definition, so the original scheme had to be discarded and redesigned. Matrix's solution is a great idea, but it is too hacky; we preferred public system APIs, for lower maintenance cost and higher stability.

Therefore, we decided to implement it with the most common approach, Choreographer.FrameCallback. It is not perfect, but its flaws can largely be designed around.

So how do we calculate an FPS value?

When Choreographer.FrameCallback fires, doFrame receives a timestamp; the difference from the previous callback can be taken as the duration of one frame. Once the accumulated time exceeds 1s, one FPS value can be computed.
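As a sketch of this accumulation (class and method names here are illustrative, not APM's actual code), the doFrame deltas can be folded into a one-second window like this:

```java
// Sketch: accumulate the deltas between successive
// Choreographer.FrameCallback#doFrame timestamps; once the window
// exceeds 1s, emit one FPS value.
public class FpsCounter {
    private static final long WINDOW_NS = 1_000_000_000L; // 1 second
    private long lastFrameTimeNanos = -1;
    private long elapsedNanos = 0;
    private int frameCount = 0;

    /** Feed each doFrame(frameTimeNanos) here; returns an FPS value about once per second, or -1. */
    public int doFrame(long frameTimeNanos) {
        if (lastFrameTimeNanos < 0) {          // first callback: no delta yet
            lastFrameTimeNanos = frameTimeNanos;
            return -1;
        }
        elapsedNanos += frameTimeNanos - lastFrameTimeNanos;
        lastFrameTimeNanos = frameTimeNanos;
        frameCount++;
        if (elapsedNanos >= WINDOW_NS) {
            int fps = (int) Math.round(frameCount * 1e9 / elapsedNanos);
            elapsedNanos = 0;                  // start the next window
            frameCount = 0;
            return fps;
        }
        return -1;
    }
}
```

In a real app the `doFrame` above would be driven by `Choreographer.getInstance().postFrameCallback(...)`, re-posted after every callback.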

In this process, one point is worth knowing: when is doFrame called back?

After each callback we must call postFrameCallback on Choreographer again, and postFrameCallback adds a node to the CALLBACK_ANIMATION list of the next frame. So doFrame fires neither at the very start of the frame's computation nor when the frame reaches the screen, but when the CPU processes animations.

Once an FPS value is computed, the following states are attached to it:

View sliding frame rate

In the initial implementation, monitoring starts as soon as a View scrolls, and frame-rate values are emitted until scrolling stops. With this requirement, our frame-rate acquisition becomes:

So how do we detect whether a View is scrolling? This is where ViewTreeObserver.OnScrollChangedListener comes in — after all, only by understanding how it is implemented can we decide whether it is usable.

// ViewRootImpl#draw
private void draw(boolean fullRedrawNeeded) {
    // ...
    if (mAttachInfo.mViewScrollChanged) {
        mAttachInfo.mViewScrollChanged = false;
        mAttachInfo.mTreeObserver.dispatchOnScrollChanged();
    }
    // ...
    mAttachInfo.mTreeObserver.dispatchOnDraw();
    // ...
}

We can see that ViewRootImpl#draw checks whether any View recorded in mAttachInfo has scrolled, and dispatches the event if so. So when is mViewScrollChanged set? When a View's onScrollChanged is called:

    // View#onScrollChanged
    protected void onScrollChanged(int l, int t, int oldl, int oldt) {
        // ...
        final AttachInfo ai = mAttachInfo;
        if (ai != null) {
            ai.mViewScrollChanged = true;
        }
    // ...
    }

onScrollChanged is wired directly to View#scrollTo and View#scrollBy, which is general enough for most scenarios.

From the rendering process explained earlier: the ViewTreeObserver.OnScrollChangedListener callback happens in ViewRootImpl#draw, so the Choreographer.FrameCallback fires before ViewTreeObserver.OnScrollChangedListener.

For a single frame, it can be expressed as follows:

This way, every frame carries a flag for whether it is a sliding frame. When a frame is a sliding frame it starts counting, and once the accumulated time reaches 1s, one sliding-frame-rate value is computed.
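A minimal sketch of how the two callbacks might be combined (names are hypothetical; in the real pipeline onScrollChanged would come from ViewTreeObserver.OnScrollChangedListener and doFrame from Choreographer.FrameCallback):

```java
// Sketch: since frame N's FrameCallback fires before frame N's draw pass
// dispatches OnScrollChangedListener, a scroll event seen since the previous
// doFrame marks the just-elapsed frame interval as a sliding frame.
public class SlidingFpsCounter {
    private boolean scrolledSinceLastFrame = false;
    private long lastFrameTimeNanos = -1;
    private long slidingElapsedNanos = 0;
    private int slidingFrames = 0;

    /** Hooked up to ViewTreeObserver.OnScrollChangedListener. */
    public void onScrollChanged() {
        scrolledSinceLastFrame = true;
    }

    /** Returns a sliding FPS once 1s of sliding frames has accumulated, or -1. */
    public int doFrame(long frameTimeNanos) {
        int result = -1;
        if (lastFrameTimeNanos >= 0 && scrolledSinceLastFrame) {
            slidingElapsedNanos += frameTimeNanos - lastFrameTimeNanos;
            slidingFrames++;
            if (slidingElapsedNanos >= 1_000_000_000L) {
                result = (int) Math.round(slidingFrames * 1e9 / slidingElapsedNanos);
                slidingElapsedNanos = 0;
                slidingFrames = 0;
            }
        }
        lastFrameTimeNanos = frameTimeNanos;
        scrolledSinceLastFrame = false;        // consume the flag each frame
        return result;
    }
}
```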

Finger-swipe frame rate

The View sliding frame rate, when verified offline, matched the data from the test platform and met the basic requirements, so it passed acceptance. After going live it began running and was able to take over frame-rate-related work.

However, View scrolling is not necessarily caused by user operations, so the data does not always reflect what the user experiences. So we moved on to implementing the finger-swipe frame rate.

For the finger-swipe frame rate, we first need to receive the finger's touch events. Since APM already hooks Window.Callback#dispatchTouchEvent, we decided to reuse that hook to recognize finger swipes.

At this time, we need to know a few timing issues:

  • A dispatchTouchEvent does not immediately produce a doFrame
  • Even when the movement time/distance computed from dispatchTouchEvent exceeds TapTimeout/ScaledTouchSlop, a doFrame is not necessarily produced immediately

Therefore, when the movement time/distance computed from dispatchTouchEvent exceeds TapTimeout/ScaledTouchSlop, we only set a flag, which tells the subsequent ViewTreeObserver.OnScrollChangedListener and doFrame that the finger-swipe frame rate can start being calculated.
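The touch-side gating can be sketched as follows. All names are illustrative; the thresholds are hypothetical constants (in a real app they would come from ViewConfiguration.get(context).getScaledTouchSlop() and ViewConfiguration.getTapTimeout()), and treating the two conditions as either/or is an assumption of this sketch:

```java
// Sketch: raise a swipe flag once the move distance exceeds the touch slop
// or the press lasts longer than the tap timeout.
public class SwipeDetector {
    private final float touchSlopPx;
    private final long tapTimeoutMs;
    private float downX, downY;
    private long downTimeMs;
    private boolean swiping = false;

    public SwipeDetector(float touchSlopPx, long tapTimeoutMs) {
        this.touchSlopPx = touchSlopPx;
        this.tapTimeoutMs = tapTimeoutMs;
    }

    /** Called from the hooked dispatchTouchEvent on ACTION_DOWN. */
    public void onDown(float x, float y, long timeMs) {
        downX = x; downY = y; downTimeMs = timeMs; swiping = false;
    }

    /** Called on each ACTION_MOVE; the returned flag feeds the frame-rate pipeline. */
    public boolean onMove(float x, float y, long timeMs) {
        if (!swiping) {
            float dx = x - downX, dy = y - downY;
            boolean movedPastSlop = dx * dx + dy * dy > touchSlopPx * touchSlopPx;
            boolean pastTapTimeout = timeMs - downTimeMs > tapTimeoutMs;
            swiping = movedPastSlop || pastTapTimeout;
        }
        return swiping;
    }
}
```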

Performance optimization / swipe-count identification

After receiving each frame's doFrame callback, we need to post the FrameCallback again. Every postFrameCallback registers for VSync (if not already registered), and when VSync arrives a message is thrown onto the main thread, which inevitably puts some pressure on it.

As we all know, the system does not render when the page is static, and no VSync is registered. So when nothing is rendering, do we still need to post? No — it is meaningless and can be filtered out. Based on this idea, we optimized the sliding-frame-rate calculation.

To reduce unnecessary frame callbacks and registrations, several issues need to be clarified:

  1. Start point (when to start postFrameCallback): when the first scroll event is received (onScrollChanged)
  2. End point (when to stop postFrameCallback): after a finger-swipe FPS is computed, if the next frame no longer scrolls, stop registering the callback for the next frame.

If you look carefully, the start point here can be regarded as the start of the rendering driven by the finger swipe, and the end point as the end of that rendering (including the fling). This data is valuable: with it we can recognize a single finger swipe and report data such as how long each swipe took.

Is this optimization flawless? Not quite. Looking closely at the calculation start point in the figure above, the first frame of the swipe is lost: since we compute the difference between two doFrame callbacks, even though we know the current frame should be counted, without the previous frame's timestamp we cannot compute the real duration of the frame that starts the swipe.

Freeze frame ratio

Freeze frame is a kind of frame officially defined by Google:

Frozen frames are UI frames that take longer than 700ms to render.

As a special kind of frame, the frozen frame is one that is strongly recommended never to appear, and it is also called out in documents from Huawei and others. Once such a frame occurs, the page visibly freezes. Therefore APM also includes this type of frame in its monitoring and computes the frozen-frame ratio:

Freeze frame ratio = number of freeze frames during sliding / number of frames generated by sliding

scrollHitchRate

The concept of scrollHitchRate comes from iOS and describes the proportion of hitch time within a scroll. What is a hitch? Simply put, the portion of a single frame's duration that exceeds the standard rendering time is the hitch.

The calculation formula is shown in the figure:

The numerator is the accumulated hitch time over the whole scroll, and the denominator is the total scroll duration (including the fling).
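Assuming a 60Hz display (a 16.6ms standard frame — a simplification; real devices report their own refresh rate), the formula can be sketched as:

```java
// Sketch of the scrollHitchRate computation: for each frame of one swipe
// (including the fling), the time beyond the standard frame duration counts
// as hitch; the rate is accumulated hitch time over total swipe time.
public class HitchRateCalculator {
    private static final double STANDARD_FRAME_MS = 16.6; // assumes 60Hz

    /** frameDurationsMs: duration of every frame of one swipe. Returns a rate in [0, 1]. */
    public static double scrollHitchRate(double[] frameDurationsMs) {
        double hitchMs = 0, totalMs = 0;
        for (double d : frameDurationsMs) {
            totalMs += d;
            if (d > STANDARD_FRAME_MS) hitchMs += d - STANDARD_FRAME_MS;
        }
        return totalMs == 0 ? 0 : hitchMs / totalMs;
    }
}
```

Because the rate is normalized by total scroll time rather than a target FPS, it compares fairly across 60Hz and 120Hz devices.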

You may ask: why not just use FPS? Can't FPS detect scroll jank — why introduce a hitch rate?

Because FPS does not fit all situations. For example, when an animation pauses, FPS cannot reflect its smoothness; and not every application aims for 60 or 120 fps — some games deliberately run at 30 fps. With the hitch rate, the goal is always to drive it to 0.

Was scrollHitchRate introduced purely to fix the data skew from high-refresh-rate phones? No. When we collect a scrollHitchRate sample, it also implicitly carries the number of swipes. For example, in the shopping scenario, a colleague on the home page asked: does the jank get worse the further down the page the user swipes? With this data collected, that question can be answered.

Main factor analysis of frame rate

Whether it is the sliding frame rate or frozen frames, these lean toward monitoring. If you want to analyze, from the data, the main reason the current frame rate is low, they give you nowhere to start.

The rendering-process section earlier described the stages a frame goes through. If every stage could be monitored, then when a frame goes wrong we could tell which stage holds the main problem — but without invading system code the way Matrix does. Based on this idea, we found a system API that meets the need: Window.OnFrameMetricsAvailableListener. Google Firebase also uses this API for frame monitoring, so future compatibility issues are unlikely.

For FrameMetrics, see the documentation: https://developer.android.com/reference/android/view/FrameMetrics

The asynchronously delivered FrameMetrics data tells us the duration of each stage of each frame, which fits our monitoring needs well. But two issues deserve attention:

  • The FrameMetrics API requires API level 24 (Android 7.0); checking Mobile Taobao's user distribution showed this coverage meets basic needs;
  • If a frame's data is not consumed in time it may be dropped, but the listener interface reports how many frames were discarded.

Let's take a closer look at which rendering stages are defined in the FrameMetrics data:

Excerpted from Android API 26. Besides the fields mentioned above, there are several useful timestamp fields with which you can explore some novel uses of the data.

Notice that it maps exactly onto the rendering process. After reading the relevant source, registering a listener does not cost much: the timestamps recorded in FrameMetrics are collected whether or not a listener is registered, so listening adds no extra overhead.

First, we define a frame-duration threshold beyond which a frame needs analysis. We then define: when a single stage of such a frame takes more than half of the threshold, that stage is recorded as the main cause; otherwise no main cause exists.

This way, for a given Activity, we can analyze whether a stuck main thread causes the low frame rate, or a layout problem makes layout & measure slow, or draw has a problem — and during performance optimization go straight to the main cause.
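The main-cause rule above can be sketched as follows. The stage names and the threshold value are illustrative; in a real implementation the stage durations would come from FrameMetrics.getMetric(...) for the corresponding duration constants:

```java
import java.util.Map;

// Sketch: a frame whose total exceeds the jank threshold is attributed to the
// first stage that alone takes more than half of the threshold; otherwise no
// main cause is recorded.
public class MainCauseAnalyzer {
    /** Returns the main-cause stage name, or null when none qualifies. */
    public static String mainCause(Map<String, Double> stageDurationsMs, double thresholdMs) {
        double total = 0;
        for (double d : stageDurationsMs.values()) total += d;
        if (total <= thresholdMs) return null;               // frame not janky at all
        for (Map.Entry<String, Double> e : stageDurationsMs.entrySet()) {
            if (e.getValue() > thresholdMs / 2) return e.getKey();
        }
        return null;                                          // janky, but no single main cause
    }
}
```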

Stuck frame rate

First, let's review how the human eye perceives jank. In principle, a higher frame rate produces smoother, more realistic animation; to appear smooth and coherent, the frame rate cannot fall below 8 FPS, and the more frames per second, the smoother the animation. The human eye retains an image for roughly 1/24 of a second, which is why film runs at 24 FPS. As for games, however high the frame rate — 60 or 120 — the average person cannot distinguish more than about 30 frames. Film gets away with only 24 frames because each interval is a uniform 1/24 second. By contrast, even if our interface or a game refreshes 30 — or 60 — times per second, if those frames are not evenly distributed within the second, a single frame delayed beyond 1/24 second will still feel like an obvious stutter, even when the other 59 are perfectly smooth.

This is why our interface usually scrolls smoothly yet occasionally still feels janky. By the 1/24-second rule, the frame budget is 41.6ms: anything longer is perceived as a freeze. By 1/30, the budget is 33.3ms: a frame delayed beyond 33.3ms is easily noticed. To reflect these stutters, we need to record something when such frames occur. But recording only the individual frames above 33.3ms has two problems: on the one hand the time dimension is lost, making the severity of the jank hard to measure (after all, jank can keep appearing over a stretch of time); on the other hand, because of multiple buffering, a frame over budget may not actually be dropped, so flagging just that one frame may be inaccurate.

Based on these considerations, we use a notion of instantaneous FPS to measure jank. Instantaneous FPS is computed over certain time intervals within a scroll. For example, if the user scrolls for 500ms, several instantaneous FPS values may be produced along the way. How is it calculated?

  1. During scrolling, obtain each frame's duration;
  2. Carve out jank intervals of roughly 100ms (99.6ms, i.e. 6 frames at 16.6ms);
  3. A frame longer than 33.3ms starts an interval;
  4. The interval ends once the accumulated frame time from the start reaches 99.6ms and the next frame takes less than 17ms (or the last frame is reached); otherwise keep extending it;
  5. The frame rate over this interval is the stuck frame rate we are after.
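The five steps above can be sketched as follows (thresholds taken from the text; the below-50 filter and the truncating division are assumptions matched to the worked example, not stated rules):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the instantaneous-FPS rule: a frame over 33.3ms opens an interval,
// which is extended until it covers at least 99.6ms and the next frame is
// smooth again (< 17ms), or the frames run out.
public class StuckFrameAnalyzer {
    public static List<Integer> instantFps(double[] frameMs) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < frameMs.length) {
            if (frameMs[i] <= 33.3) { i++; continue; }   // step 3: a stutter frame starts an interval
            double windowMs = 0;
            int frames = 0;
            while (i < frameMs.length) {
                windowMs += frameMs[i];
                frames++;
                i++;
                // step 4: stop once >= 99.6ms is covered and the next frame is < 17ms,
                // or there are no frames left
                if (windowMs >= 99.6 && (i >= frameMs.length || frameMs[i] < 17)) break;
            }
            int fps = (int) (frames * 1000.0 / windowMs); // step 5: FPS of the interval
            if (fps < 50) out.add(fps);                   // record only janky intervals
        }
        return out;
    }
}
```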

In the figure, 3 frames clearly exceed the budget. With the earlier statistics, the total frame time is 1535ms over 83 frames, giving this interface an FPS of 54. The FPS looks fairly high — no jank at all: the few expensive frames at the start are averaged out by the normal frames that follow. The old statistics simply cannot surface these jank problems.

With the new method, the first instantaneous-FPS interval starts at the 7th frame. Counting at least 99.6ms from this frame: 69+16+15 reaches 100ms over 3 frames, so the FPS is 30; since that is below 50, the interval is recorded, with a maximum frame time of 69ms.

The second interval starts at frame 17: 5 frames, 114ms, FPS 43, maximum frame time 61ms.

The third starts at frame 26: 98+10=108ms, but the next frame takes 19ms, which is more than 16.6ms, so it is still added. 3 frames, 127ms, FPS 23, maximum frame time 98ms.

By these statistics there were 3 jank intervals, with FPS values of 30, 43, and 23, and a maximum frame time of 98ms.

Stutter stack

Using the main thread's Looper Printer to dump the stuck stack costs performance because of heavy string concatenation. Android 10 added an Observer to Looper that enables performance-lossless callbacks, but it is a @hide API and cannot be used. The remaining option would be to keep posting messages to the main thread — but periodically throwing messages at the main thread itself puts pressure on it.

Is there a better way? Yes: Choreographer#postFrameCallback already posts the main-thread message for us. If the gap between two callbacks exceeds a threshold, we can consider it a stutter — and a stutter recognized this way is, moreover, one that happened during scrolling.

Knowing what a stutter is, when do we dump? We use a watchdog to dump the stuck stack: a sub-thread schedules a delayed dump task when a frame starts; if the frame exceeds the threshold the dump executes, and if the frame completes within the allotted time the dump is cancelled. After collecting the stacks we cluster them, to better identify the main contradiction and handle alarms.
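The watchdog idea can be sketched with a plain ScheduledExecutorService (all names are illustrative, not APM's real classes; a production version would hook frameStarted/frameFinished into the doFrame cycle):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: when a frame begins, schedule a stack dump on a watchdog thread;
// if the frame finishes within the threshold, cancel the dump before it fires.
public class StutterWatchdog {
    private final ScheduledExecutorService watchdog =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "apm-watchdog");
                t.setDaemon(true);
                return t;
            });
    private final long thresholdMs;
    private final Thread target;              // normally the main thread
    private volatile ScheduledFuture<?> pending;

    public StutterWatchdog(long thresholdMs, Thread target) {
        this.thresholdMs = thresholdMs;
        this.target = target;
    }

    /** Call at the start of a frame (e.g. from doFrame). */
    public void frameStarted() {
        pending = watchdog.schedule(
                () -> dump(target.getStackTrace()),   // frame overran: grab the stack
                thresholdMs, TimeUnit.MILLISECONDS);
    }

    /** Call when the frame completes in time: cancel the pending dump. */
    public void frameFinished() {
        ScheduledFuture<?> p = pending;
        if (p != null) p.cancel(false);
    }

    /** Report the stack for clustering; here we just print it. */
    protected void dump(StackTraceElement[] stack) {
        for (StackTraceElement e : stack) System.out.println("    at " + e);
    }
}
```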

Exploring the use of frame data

AB with APM

The sections above explain how we compute an indicator and troubleshoot problems, but a broad market indicator must, of course, also be used to measure optimization results. How? The best means is A/B testing. The APM indicator data is connected to the A/B testing platform, and performance data is reported alongside each experiment.

The A/B platforms here include the Yixiu platform and the Magic Rabbit 2 platform. On Yixiu, indicators are connected as custom metrics; frame rate is just one of them, alongside data such as startup and page metrics.

Yixiu is Alibaba Group's one-stop A/B experimentation platform, providing businesses with a visual operation interface, scientific data analysis, automated experiment reports, and other one-stop experiment workflows; it validates the best solution through user behavior to drive business growth.

When optimizing page performance, we can directly compare the relevant indicators between the baseline bucket and the optimization bucket, showing the page-performance improvement directly and clearly.


Write at the end

For Mobile Taobao, frame-rate and jank monitoring are only a small part of performance monitoring, and polishing every detail matters. Besides its use with the A/B platform, the data has been connected with full-link troubleshooting data, public-opinion data, and release-time performance, and is surfaced through backend clustering, alarms, automated email reports, and the dedicated data platform. Our attitude toward data should be not merely to have it, but to make it comprehensive and powerful.

Through rounds of technical iteration, the high-availability capabilities of Mobile Taobao have been continuously improved and rebuilt. We hope that, going forward, the client's high-availability data can better assist every aspect of R&D, prevent the user experience from degrading, and help improve it continuously.
