
Author: Chen

This article comes from the "2021 Umeng+ Mobile Application Performance Challenge" and describes how the author used the Umeng+ U-APM tool to optimize performance.

Background

As a real-time VR game app, we need to monitor the phone's orientation in real time through the gravity sensor and render the VR image for the corresponding position. Because different Android devices use chipsets and GPUs of different architectures, game performance varies between them. For example, the game may render at 60 fps on a Galaxy S20+ but perform very differently on a HUAWEI P50 Pro. Newer phones are well configured, but the game still has to account for the underlying hardware it runs on.

• If players experience frame drops or slow loading times, they quickly lose interest in the game.
• If the game drains the battery or overheats the device, we also lose players during long sessions.
• If unnecessary game assets are pre-rendered in advance, the game's startup time grows sharply and players lose patience.
• If the target frame rate is more than the phone can sustain, the phone's self-protection mechanism can crash the game at runtime, resulting in a very poor experience.

Based on this, we need to optimize the code to adapt to the frame-rate capabilities of the different phones on the market.

Challenges encountered

First, we used Streamline to capture a profile of the game running on an Android device. While running the test scenario, we visualized the CPU and GPU performance-counter activity to understand the device's CPU and GPU workload precisely and locate the main cause of the frame-rate drop.

The frame rate analysis chart below shows how the application runs over time.

In the figure below, we can see the correlation between execution-engine cycles and the FPS drop: the GPU is clearly busy with arithmetic operations, and the shaders may be too complex.

To test the frame rate on different devices, we used Umeng+ U-APM to check for freezes across models. The freezes occur while rendering in the onSurfaceCreated function, which confirms the analysis above: the GPU stalls during arithmetic operations:

Because different devices have different performance expectations, each device needs its own performance budget. For example, if the peak frequency of the device's GPU is known and a target frame rate is given, the absolute limit of the GPU cost per frame can be calculated:

$$ \text{GPU cost per frame} = \frac{\text{GPU peak frequency}}{\text{target frame rate}} $$
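As a quick sanity check of this budget formula, here is a minimal Java sketch; the 600 MHz peak frequency is an illustrative assumption, not a measured value:

```java
// Sketch: deriving a per-frame GPU cycle budget from the peak frequency and
// the target frame rate. The device numbers here are illustrative only.
public class FrameBudget {
    public static long gpuCyclesPerFrame(long gpuPeakHz, int targetFps) {
        return gpuPeakHz / targetFps;
    }

    public static void main(String[] args) {
        long peakHz = 600_000_000L; // hypothetical 600 MHz GPU peak
        int targetFps = 60;
        // 600 MHz / 60 fps = 10,000,000 GPU cycles available per frame
        System.out.println(gpuCyclesPerFrame(peakHz, targetFps));
    }
}
```

Any frame whose GPU work exceeds this cycle count cannot hit the target frame rate, no matter how well the CPU side is optimized.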

There are constraints in how work is scheduled from the CPU to the GPU, and these scheduling limits can keep us from reaching the target frame rate. In addition, the workload crossing the CPU-GPU interface is serialized, while the rendering process itself runs asynchronously: the CPU puts new rendering work into a queue, which the GPU processes later.
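The queuing behavior described above can be sketched as follows; RenderQueue and its string commands are illustrative stand-ins, not the actual driver interface:

```java
import java.util.ArrayDeque;

// Sketch: the CPU enqueues rendering commands; the GPU (simulated by tick())
// drains them later. While a command referencing a resource is still queued,
// that resource must not be modified.
public class RenderQueue {
    private final ArrayDeque<String> pending = new ArrayDeque<>();

    // CPU side: submit work and return immediately.
    public void submit(String drawCall) {
        pending.add(drawCall);
    }

    // GPU side: process the oldest queued command, or null if idle.
    public String tick() {
        return pending.poll();
    }
}
```

The gap between submit() and tick() is exactly the window during which a resource stays referenced by the command stream.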

Data resource problem

The CPU drives the rendering process and supplies fresh data every frame, such as transforms and light positions. GPU processing, however, is asynchronous: data resources are referenced by queued commands and remain live in the command stream for some time. OpenGL ES rendering must reflect the state of each resource at the moment of the draw call, so a resource cannot be modified until the GPU workload referencing it has completed.

Debugging process

We tried editing and optimizing the code that touches the referenced resources, but modifying a resource while it is still referenced triggers the creation of a new copy of it. This achieves our goal to some extent, but it generates a lot of CPU overhead.

So we used Streamline to identify the sources of high CPU load. In the view we could see extremely high occupancy time in the libGLES_Mali.so path functions inside the graphics driver.

Since we want to adapt to different frame rates on different phones, we needed to know whether libGLES_Mali.so shows this very high occupancy across device models, so we used Umeng+ U-APM to measure the function's occupancy ratio for users on different models.

U-APM's custom-exception tests showed that the following models suffer from high libGLES_Mali.so occupancy. We therefore had to solve the fluency problem at the level of the underlying hardware, and because more than one model was affected, we started from the memory level, considering how to use fewer memory buffers and release memory promptly.

Solution and optimization

Based on the previous analysis, we first tried to optimize the buffers.

Single buffer solution

• Use glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT, then rotate through sub-regions of a single buffer. This avoids the need for multiple buffers, but the approach still has problems: we have to manage the dependencies between sub-regions ourselves, and that code adds extra workload.
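A minimal sketch of the sub-region bookkeeping this approach requires; the class name and sizes are hypothetical, and the offset it computes would be passed to glMapBufferRange when mapping the region for the current frame:

```java
// Sketch: rotate writes through fixed-size sub-regions of one large buffer so
// each frame writes to a region the GPU should no longer be reading.
// Region size and count are illustrative.
public class SubRegionRing {
    private final int regionSize;   // bytes per sub-region
    private final int regionCount;  // sub-regions in the buffer
    private long frame = 0;

    public SubRegionRing(int regionSize, int regionCount) {
        this.regionSize = regionSize;
        this.regionCount = regionCount;
    }

    // Byte offset of the sub-region the current frame may write to.
    public int nextOffset() {
        int offset = (int) (frame % regionCount) * regionSize;
        frame++;
        return offset;
    }
}
```

The hard part, as noted above, is proving the GPU has really finished with a region before its offset comes around again; that dependency tracking is the extra workload.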

Multi-buffer solution

• We created multiple buffers in the system and used them in round-robin fashion. By calculating a suitable number of buffers, the code can reuse them in subsequent frames. Because we use a large number of circular buffers, extensive logging and database writes became necessary, but several factors made this perform poorly:

  1. It generates additional memory usage and GC pressure.
  2. Android writes log messages to the system log buffer rather than to a file, which takes additional time.
  3. A single call would cost very little, but because of the circular buffers, many calls are made.
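The round-robin reuse itself is simple to sketch; BufferRing is a hypothetical helper, the ids stand in for names returned by glGenBuffers, and a ring of three buffers is an assumed, common choice for covering frames in flight:

```java
// Sketch: a ring of buffer handles reused across frames. The ring must be at
// least as large as the maximum number of frames the GPU can have in flight,
// so a buffer is only reused after the frame that last used it has completed.
public class BufferRing {
    private final int[] bufferIds;
    private int next = 0;

    public BufferRing(int[] bufferIds) {
        this.bufferIds = bufferIds;
    }

    // Returns the buffer to write during the current frame.
    public int acquire() {
        int id = bufferIds[next];
        next = (next + 1) % bufferIds.length;
        return id;
    }
}
```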

We enabled the memory-allocation tracking function in the C#-based Mono profiler to locate the problem:

$ adb shell setprop debug.mono.profile log:calls,alloc

The output shows how long the method takes on each call:

Method call summary
Total(ms) Self(ms)      Calls Method name
     782        5        100 MyApp.MainActivity:Log (string,object[])
     775        3        100 Android.Util.Log:Debug (string,string,object[])
     634       10        100 Android.Util.Log:Debug (string,string)

Our log records alone consumed a lot of time here; the next step is either to make each call cheaper or to find an entirely new approach.
log:alloc also lets us see memory allocations, and it shows that the log calls directly cause a large number of unreasonable allocations:

Allocation summary
     Bytes      Count  Average Type name
     41784        839       49 System.String
      4280        144       29 System.Object[]
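One mitigation for this kind of overhead, sketched here under our own assumptions rather than taken from the profiler output, is to guard the varargs log call so the Object[] and formatted String are never allocated on the hot path when logging is disabled:

```java
// Sketch: an unguarded varargs log call allocates an Object[] and formats a
// String on every invocation. Guarding the call site with a compile-time flag
// skips both allocations entirely. DEBUG and log() are illustrative names.
public class GuardedLog {
    static final boolean DEBUG = false; // flip to true in debug builds
    static int formatCalls = 0;         // counts how often formatting is paid for

    static void log(String fmt, Object... args) {
        formatCalls++;
        System.out.println(String.format(fmt, args));
    }

    static void hotPath() {
        for (int i = 0; i < 100; i++) {
            if (DEBUG) {        // guard: the varargs array is never built
                log("frame %d", i);
            }
        }
    }
}
```

With DEBUG final and false, the compiler can drop the guarded branch entirely, so release builds pay nothing for these call sites.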

Hardware acceleration

Finally, we introduced hardware acceleration, which brings a new drawing model for rendering the application on screen. It introduces the DisplayList structure, which records a view's drawing commands to speed up rendering.

At the same time, a view can be rendered into an off-screen buffer and modified freely without worrying about it being referenced elsewhere. This capability is aimed mainly at animation, which makes it well suited to our frame-rate problem: complex views can be animated much faster.

Without a layer, changing an animated property invalidates the animated view. For a complex view, this invalidation propagates to all of its subviews, which in turn redraw themselves.

With a hardware-backed view layer, the GPU creates a texture for the view, so we can animate complex views on screen and keep the animation smooth.

Code example:

// Using the Object animator
view.setLayerType(View.LAYER_TYPE_HARDWARE, null);
ObjectAnimator objectAnimator = ObjectAnimator.ofFloat(view, View.TRANSLATION_X, 20f);
objectAnimator.addListener(new AnimatorListenerAdapter() {
    @Override
    public void onAnimationEnd(Animator animation) {
        view.setLayerType(View.LAYER_TYPE_NONE, null);
    }
});
objectAnimator.start();

// Using the Property animator
view.animate().translationX(20f).withLayer().start();

In addition, there are several points that still need to be noted when using the hardware layer:

(1) Clean up after use:

The hardware layer takes up space on the GPU. In the ObjectAnimator code above, the listener removes the layer at the end of the animation. In the property-animator example, the withLayer() method creates the layer at the start of the animation and removes it automatically at the end.

(2) Hardware layer updates can be visualized:

In the developer options, you can enable "Show hardware layer updates".
If you change a view after applying a hardware layer, the layer is invalidated and the view is re-rendered into the off-screen buffer.

Hardware acceleration optimization

But this brings a problem: the hardware layer is also applied in interactions that do not need it, such as scrolling. When the ViewPager is scrolled sideways, its pages are highlighted in green for the entire scrolling phase.

So while scrolling the ViewPager, I ran TraceView from DDMS, sorted the method calls by name, searched for "android/view/View.setLayerType", and traced its callers:

 ViewPager#enableLayers():

private void enableLayers(boolean enable) {
    final int childCount = getChildCount();
    for (int i = 0; i < childCount; i++) {
        final int layerType = enable ?
                ViewCompat.LAYER_TYPE_HARDWARE : ViewCompat.LAYER_TYPE_NONE;
        ViewCompat.setLayerType(getChildAt(i), layerType, null);
    }
}

This method enables or disables the hardware layer for ViewPager's children. It is called from ViewPager#setScrollState():

private void setScrollState(int newState) {
    if (mScrollState == newState) {
        return;
    }

    mScrollState = newState;
    if (mPageTransformer != null) {
        enableLayers(newState != SCROLL_STATE_IDLE);
    }
    if (mOnPageChangeListener != null) {
        mOnPageChangeListener.onPageScrollStateChanged(newState);
    }
}

As the code shows, the hardware layer is disabled when the scroll state is IDLE and enabled when it is DRAGGING or SETTLING. A PageTransformer is designed to "apply a custom transformation to the page views using animation properties" (source).

For our needs, the hardware layer should be enabled only while the animation is rendering, so I wanted to override these ViewPager methods; but since they are private, we cannot modify them.

So I took another route: in ViewPager#setScrollState(), after enableLayers() is called, OnPageChangeListener#onPageScrollStateChanged() is called as well. So I set up a listener that resets the layer type of all of ViewPager's children to NONE whenever the ViewPager's scroll state is anything other than IDLE:

@Override
public void onPageScrollStateChanged(int scrollState) {
    // A small hack to remove the hardware layer that the ViewPager adds to each page while scrolling.
    if (scrollState != ViewPager.SCROLL_STATE_IDLE) {
        final int childCount = <your_viewpager>.getChildCount();
        for (int i = 0; i < childCount; i++)
            <your_viewpager>.getChildAt(i).setLayerType(View.LAYER_TYPE_NONE, null);
    }
}

In this way, after ViewPager#setScrollState() applies a hardware layer to the pages, I reset them to NONE, which disables the hardware layer; the resulting frame-rate difference shows up mainly on the Nexus.


