
Author: Xie Wei (Wei Sheng)

Different business backgrounds lead to different technical demands, and an "especially smooth user experience" is Taote's constant pursuit. This article introduces the optimizations the author has made to Flutter fluency since joining Taote. None of them involves modifying the Engine or building lofty "wheels" from scratch. They only require going carefully and deeply into the business and staying oriented to how the app actually feels in use. That alone can bring a significant improvement in user experience, and it is worth every Flutter developer's effort to apply these practices to every pixel of the product.

Background

Taote has three distinct characteristics:

  1. Business characteristics: Taote has the most complex e-commerce flow in the industry
  2. User characteristics: Taote has a large number of middle-aged and elderly users, many users run older mobile OS versions, and many users are on mid-range to low-end devices
  3. Technical characteristics: Taote uses Flutter cross-platform rendering technology on a large scale

In summary:

The most complex business flow + the lowest-performance user devices + the newest cross-platform technology ==> one core issue: page fluency is severely challenged

| Flutter core page | Frame rate during 20 s of fast scrolling (FPS) | Stutter rate (per second) |
| --- | --- | --- |
| Live Tab | 27 | 7.04% |
| Mine | 41.3666 | 7.63% |
| Details | 26.7 | 15.58% |

Note: The data was measured on a vivo Y67 running Taote 3.32.999.10 (103).

Target

Fluency is a key part of the user experience. Nobody wants using a phone to feel like watching a choppy movie or flipping through PPT slides, especially now that high-refresh-rate screens (90/120 Hz) have sharpened users' perception of fluency. But fluency is also strongly tied to product complexity, so it is a trade-off between richness and simplicity. The first-phase goal of Taote's fluency optimization is:

Flutter core pages reach high fluency (average frame rate: 45 FPS on low-end devices, 50 FPS on mid-range devices, 50 FPS on high-end devices)

The state after the first phase of optimization

| Page | Average frame rate (FPS) | Stutter rate | Improvement |
| --- | --- | --- | --- |
| 1. Live Tab (recommendation and category columns) | 46.0 | 0.35% | Frame rate up by 19 frames, stutter rate down by 6.7 percentage points |
| 2. My page | 46.0 | 0% | Frame rate up by 4.6 frames, stutter rate down by 7.6 percentage points |
| 3. Details | 45.0 | 2% | Frame rate up by 18.3 frames, stutter rate down by 13.58 percentage points |

The old version (3.32) is shown on the left of the video and the new version (3.37) on the right. Because the uiautomator tool triggers an accessibility issue, this version comparison was tested manually.

See the video: Taote Flutter

Beyond the clear improvement in the data, the difference can also be felt in use. The old version stutters visibly during fast scrolling and the picture jumps abruptly. The new version basically eliminates visible stutter, and the picture stays continuous and stable.

Problem

Back to the technology itself: why does Flutter stutter and drop frames? Generally speaking, there are two causes:

  1. The UI thread is slow --> rendering instructions are produced slowly
  2. The GPU thread is slow --> slow rasterization, slow layer compositing, slow presentation of pixels to the screen

Solving these two problems is therefore our focus. Knowing that a problem exists, we need tools to measure it systematically and systematic theory to guide practice. The following two sections cover both, interleaved with how the related strategies were applied in Taote, because theory is easier to understand when combined with practice.

How to solve

Solution-Case

Lower the trigger node of setState

Everyone knows Flutter's refresh mechanism. The higher in the Widget tree setState is triggered, the more Elements are marked dirty ("the bigger the dirty tree"). The lower and more local the Widget that triggers the state update, "the smaller the dirty tree". Once Elements are marked dirty, Element.rebuild is triggered and the component tree is traversed. For the principle, see the following figure, "Flutter page refresh mechanism source code analysis":

For "Element.updateChild source code analysis", see the second optimization below.

Take the video preview in Taote's live Tab as a practical example. Initially, the playback index was passed to child components through state: whenever it changed, a top-level setState updated the index and refreshed the entire page. But the only parts that actually need updating are the VideoWidget that must pause and the VideoWidget that must start playing. We changed this to a listener mechanism: every VideoWidget on the page registers a listener, the top level uses EventBus to distribute the playback index to all of them, and each VideoWidget checks the index and changes only its own state.

Another example is the details page. It uses a "borrow the image from the previous page" feature, and the borrowed image is hidden once scrolling is detected. The setState call, however, was placed in the top-level Widget of the details page, causing a global refresh. The listening and refresh logic can instead be pushed down into the "borrowed image component" to shrink the "dirty tree".
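The listener approach can be sketched roughly as follows. This is a minimal illustration, not Taote's code: a ValueNotifier stands in for the EventBus, and the names playingIndex and VideoCell are hypothetical.

```dart
import 'package:flutter/material.dart';

/// Hypothetical stand-in for the EventBus: the index of the playing video.
final ValueNotifier<int> playingIndex = ValueNotifier<int>(-1);

/// Hypothetical feed cell that rebuilds only when it starts or stops playing.
class VideoCell extends StatefulWidget {
  const VideoCell({super.key, required this.index});

  final int index;

  @override
  State<VideoCell> createState() => _VideoCellState();
}

class _VideoCellState extends State<VideoCell> {
  bool _playing = false;

  @override
  void initState() {
    super.initState();
    _playing = playingIndex.value == widget.index;
    playingIndex.addListener(_onIndexChanged);
  }

  void _onIndexChanged() {
    final shouldPlay = playingIndex.value == widget.index;
    // Only the cell whose state actually flips calls setState, so the
    // dirty subtree is one cell instead of the whole page.
    if (shouldPlay != _playing) {
      setState(() => _playing = shouldPlay);
    }
  }

  @override
  void dispose() {
    playingIndex.removeListener(_onIndexChanged);
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    return Text(_playing ? 'playing ${widget.index}' : 'paused ${widget.index}');
  }
}
```

Setting playingIndex.value = 3 from the page then dirties at most two cells: the one that was playing and cell 3.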

Cache unchanged Widget

Caching unchanged Widgets has two major benefits. 1. A cached Widget does not need to be created repeatedly. Although Flutter officially regards Widgets as very lightweight objects, in real business code it is still common for build to take too long. 2. Returning the same Widget reference makes Flutter stop traversing that subtree, because Flutter concludes that the subtree has not changed and needs no update. For the principle, see the figure "Element.updateChild source code analysis".

Take actual Taote pages as the application scenario. Some components of the details page use DXWidget. In theory, once such a component's content is created, it does not change for the lifetime of the current page. In this case the unchanged Widget can be cached, which avoids repeated DX dynamic rendering and stops subtree traversal.

The item component of the feed has a complex layout and is expensive to create. In theory its content does not change once created, but an item may be deleted. In that case, use an ObjectKey to uniquely identify the component so that state is not matched to the wrong item.
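As a rough sketch of both points (the class and function names are hypothetical, and a plain Text stands in for an expensive DX component), a State can keep the built Widget in a field and return the same instance on every build, and a feed item can be keyed by its data object:

```dart
import 'package:flutter/material.dart';

/// Hypothetical page whose header is expensive to build and never changes.
class DetailPage extends StatefulWidget {
  const DetailPage({super.key});

  @override
  State<DetailPage> createState() => _DetailPageState();
}

class _DetailPageState extends State<DetailPage> {
  // Built once and reused. build() returns the identical instance, so
  // Element.updateChild finds the widget unchanged and skips this subtree.
  late final Widget _header = _buildExpensiveHeader();
  int _taps = 0;

  Widget _buildExpensiveHeader() {
    return const Padding(
      padding: EdgeInsets.all(16),
      child: Text('Static product header'),
    );
  }

  @override
  Widget build(BuildContext context) {
    return Column(
      children: [
        _header, // same instance on every rebuild
        Text('Taps: $_taps'),
        TextButton(
          onPressed: () => setState(() => _taps++),
          child: const Text('Tap'),
        ),
      ],
    );
  }
}

/// Keys a cached feed item by its data object, so that deleting a
/// neighbour does not attach this item's State to a different item.
Widget keyedFeedItem(Object data, Widget cachedItem) {
  return KeyedSubtree(key: ObjectKey(data), child: cachedItem);
}
```

Caching only pays off when the Widget really is immutable for the page's lifetime; anything that depends on changing state still has to be rebuilt.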

Reduce unnecessary build(setState)

The live Tab uses a tracking-exposure component. Inspection with DevTools showed that it recreated the itemWidget in every scroll progress callback. This did not cause any business error, but in theory the itemWidget only needs to be created once. Investigation showed that a builder function had mistakenly been passed to the component instead of the itemWidget instance itself.

The details page has very complex logic. Its AppBar computes its transparency in real time from the scroll distance, which results in high-frequency setState calls. In fact, the state should be refreshed only when the transparency actually differs from its previous value, and for performance the transparency should only take a few discrete values.
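One possible way to express "only a few values, and refresh only on change" is sketched below. The function and class names, the 200-pixel fade distance, and the five steps are all illustrative assumptions rather than Taote's actual values.

```dart
/// Maps a scroll offset to one of a few discrete opacity steps.
/// `fadeDistance` and `steps` are illustrative defaults.
double quantizedOpacity(
  double offset, {
  double fadeDistance = 200,
  int steps = 5,
}) {
  final t = (offset / fadeDistance).clamp(0.0, 1.0);
  return (t * steps).round() / steps;
}

/// Remembers the last applied opacity so redundant updates can be skipped.
class OpacityTracker {
  double _current = 0;
  int updates = 0;

  double get current => _current;

  /// Returns true only when the quantized opacity changed, which is the
  /// only time the AppBar would need a setState.
  bool onScroll(double offset) {
    final next = quantizedOpacity(offset);
    if (next == _current) return false;
    _current = next;
    updates++;
    return true;
  }
}
```

In a scroll listener, setState would be called only when onScroll returns true. Scrolling pixel by pixel through the whole fade distance then produces a handful of rebuilds instead of one per scroll callback.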

Separation of changeable and unchanging layers

In daily development, most elements on a page stay unchanged while a few change in real time, such as GIFs and animations. This is where RepaintBoundary is needed. Compositing an independent layer has its own cost, however, so this has to be measured and balanced. Take Taote as an example.

The GIF in the live feed animates continuously at high frequency, which causes everything in the same layer of the page to be repainted. Here you can wrap the changing GIF component in a RepaintBoundary so that it gets its own layer, which is then composited with the other layers when presented on screen.

Similarly, the flash-sale countdown is a common e-commerce scenario, and that component is also a good fit for RepaintBoundary.
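A minimal sketch of the wrapping, assuming a hypothetical LiveCard with a static title and an animating badge (a GIF or a countdown):

```dart
import 'package:flutter/material.dart';

/// Hypothetical feed card: static text plus a constantly animating badge.
class LiveCard extends StatelessWidget {
  const LiveCard({
    super.key,
    required this.title,
    required this.animatedBadge,
  });

  final String title;
  final Widget animatedBadge; // e.g. a GIF or a countdown

  @override
  Widget build(BuildContext context) {
    return Row(
      children: [
        // The badge repaints on every animation frame. The boundary gives
        // it its own layer so the title is not repainted along with it.
        RepaintBoundary(child: animatedBadge),
        Expanded(child: Text(title)),
      ],
    );
  }
}
```

Whether the extra layer is worth it should be checked with the repaint rainbow and the frame timings, since each boundary adds compositing work.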

Avoid triggering GC frequently

Thanks to AliFlutter, we can trigger Dart GC actively, but GC has a cost, especially at high frequency. Because of memory pressure on iOS, GC was triggered on ScrollEndNotification, that is, when list scrolling stops. ScrollEndNotification fires once after every down->up touch sequence, so if the user touches the screen many times, GC is triggered very frequently. In measurements this cost about 4 frames on the Y67. We therefore added GC when the page becomes invisible and turned off scroll-end GC on low-end Android devices such as the Y67 to improve scrolling performance.

Move large JSON parsing off the UI thread

A Flutter isolate is single-threaded by default, and all UI operations run on the UI thread. To benefit from multi-threaded concurrency, you need to spawn an isolate or use compute. Note that await and scheduleTask only defer when a task is called; the task still occupies the "UI thread". So when parsing large JSON or making many channel calls, watch how much UI-thread time is consumed. In Taote, JSON parsing is moved to compute on low-end devices so that it does not block the UI thread.
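A minimal sketch of moving the parse into compute is shown below. The function names and the 50 KB cut-off are assumptions for illustration; the right threshold depends on the device and should be measured.

```dart
import 'dart:convert';

import 'package:flutter/foundation.dart';

/// Illustrative cut-off: below this size, starting an isolate is assumed
/// to cost more than parsing on the UI thread.
const int kIsolateThresholdChars = 50 * 1024;

// Must be a top-level (or static) function so it can run in another isolate.
Map<String, dynamic> _decode(String body) {
  return jsonDecode(body) as Map<String, dynamic>;
}

/// Parses small payloads inline and large ones in a background isolate.
Future<Map<String, dynamic>> parseJson(String body) async {
  if (body.length < kIsolateThresholdChars) {
    return _decode(body);
  }
  return compute(_decode, body);
}
```

The result of compute is copied back to the calling isolate, so for very large results that transfer is also worth watching in the CPU Profiler.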

Minimize or downgrade the use of components such as Clip and Opacity

In Flutter, Clip widgets are used for clipping to rectangles, rounded rectangles, and circles. Once a clip is applied, all subsequent drawing commands are affected by it. Some ClipRRects can be replaced by ShapeDecoration, and Opacity can be replaced by AnimatedOpacity. Clipping of images can be implemented with a custom Transform in the image library.
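The two replacements can be sketched as follows. RoundedCard and FadeIn are hypothetical names, and the decoration approach only works when the child does not itself paint into the corners.

```dart
import 'package:flutter/material.dart';

/// Rounded card drawn by its decoration instead of by clipping its child.
class RoundedCard extends StatelessWidget {
  const RoundedCard({super.key, required this.child});

  final Widget child;

  @override
  Widget build(BuildContext context) {
    return DecoratedBox(
      decoration: const ShapeDecoration(
        color: Colors.white,
        shape: RoundedRectangleBorder(
          borderRadius: BorderRadius.all(Radius.circular(8)),
        ),
      ),
      // Padding keeps the child away from the rounded corners.
      child: Padding(padding: const EdgeInsets.all(8), child: child),
    );
  }
}

/// Fades its child in or out with AnimatedOpacity rather than a bare Opacity.
class FadeIn extends StatelessWidget {
  const FadeIn({super.key, required this.visible, required this.child});

  final bool visible;
  final Widget child;

  @override
  Widget build(BuildContext context) {
    return AnimatedOpacity(
      opacity: visible ? 1.0 : 0.0,
      duration: const Duration(milliseconds: 200),
      child: child,
    );
  }
}
```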

Reduce the CustomScrollView pre-render area to a reasonable value

By default, CustomScrollView renders not only the content on screen but also the components within an extra 250 logical pixels above and below it. Take a two-column waterfall feed: the screen can show 4 components, but another 4 off-screen components are also kept in the rendered state. On setState (for example when more items are loaded), 8 components are rebuilt although the user only sees 4, so only 4 really need rendering. Scrolling up and down also triggers creation and destruction of off-screen Widgets, which causes scroll stutter. High-performance phones can afford pre-rendering; on low-end devices Taote reduces this distance to 0 or a small value.
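The pre-render distance is the cacheExtent parameter of the scroll view. A minimal sketch, where isLowEndDevice is a hypothetical flag supplied by the app's own device grading:

```dart
import 'package:flutter/material.dart';

/// Builds a feed whose pre-render area depends on the device grade.
Widget buildFeed({
  required bool isLowEndDevice,
  required List<String> titles,
}) {
  return CustomScrollView(
    // The default is 250 logical pixels; shrink it on low-end devices.
    cacheExtent: isLowEndDevice ? 0 : 250,
    slivers: [
      SliverList(
        delegate: SliverChildBuilderDelegate(
          (context, index) => ListTile(title: Text(titles[index])),
          childCount: titles.length,
        ),
      ),
    ],
  );
}
```

A value of 0 means items are built only as they enter the viewport, which trades a little scroll-in latency for less off-screen work.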

Batch high-frequency tracking channel calls

Reporting a tracking point when a component is exposed is common, but during fast scrolling a burst of 10+ items and 20+ channel calls takes a noticeable share of both the Flutter UI thread and the native UI thread. For some scenarios Taote batches and schedules these uploads by maintaining a tracking queue. By default the queue is flushed every 3 s or every 50 entries, and also when the page becomes invisible, so that 20+ channel calls are merged into one. The business can also force a flush at an appropriate time, and on the native side the tracking work is moved to a background thread.
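Such a queue might look roughly like this. TrackingQueue is a hypothetical name, and the send callback stands in for the single platform-channel call; the defaults mirror the 3 s / 50 entries mentioned above.

```dart
import 'dart:async';

/// Buffers tracking events and hands them to `send` in one batch.
/// `send` is a hypothetical stand-in for the single channel call.
class TrackingQueue {
  TrackingQueue(
    this.send, {
    this.maxSize = 50,
    this.interval = const Duration(seconds: 3),
  });

  final void Function(List<Map<String, Object?>> batch) send;
  final int maxSize;
  final Duration interval;

  final List<Map<String, Object?>> _pending = [];
  Timer? _timer;

  void add(Map<String, Object?> event) {
    _pending.add(event);
    if (_pending.length >= maxSize) {
      flush(); // size limit reached
    } else {
      _timer ??= Timer(interval, flush); // otherwise flush after the interval
    }
  }

  /// Can also be called by the page, for example when it becomes invisible.
  void flush() {
    _timer?.cancel();
    _timer = null;
    if (_pending.isEmpty) return;
    final batch = List<Map<String, Object?>>.of(_pending);
    _pending.clear();
    send(batch);
  }
}
```

The timer starts with the first pending event rather than running continuously, so an idle page schedules no work at all.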

Other effective optimization measures

Some visual effects and some business workload can be moderately downgraded on low-end devices. For example, Taote lengthened the feed video preview delay from 500 ms to 1.5 s, reduced the feed preloading threshold distance from 2000+ to 500, and downgraded rounded image corners to square corners. The core idea of such downgrades is to first make sure the lowest-end users get smooth operation, and only then polish the details as icing on the cake.

When accessibility is enabled, Flutter has performance problems in fast-scrolling scenes. If it is determined that the business does not need accessibility, or that the user turned it on by accident, you can add an ExcludeSemantics Widget to block it.

DevTools also revealed that the frame-rate detection of the high_available plugin had performance problems in older versions. You can upgrade the plugin or disable the detection on low-end devices.
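A minimal sketch of the wrapping, with a hypothetical helper name and a needsAccessibility flag that the business would decide:

```dart
import 'package:flutter/widgets.dart';

/// Wraps a fast-scrolling feed so that it contributes nothing to the
/// semantics tree unless the business has decided it needs accessibility.
Widget wrapFeedSemantics(Widget feed, {required bool needsAccessibility}) {
  return ExcludeSemantics(
    excluding: !needsAccessibility,
    child: feed,
  );
}
```

Excluding semantics hides the subtree from screen readers, so this is only appropriate where that loss has been accepted deliberately.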

Solution-Summary of optimization cases

Setting the details aside, the ten optimization practices above can be roughly grouped into the following categories, so that the general principles can be drawn from practice.

How to improve UI thread performance:

  • How to improve build performance

    • Lower the starting point of traversal by lowering the node that triggers setState
    • Stop traversal: if the content has not changed, return the same component instance and Flutter stops traversing that subtree (SlideTransition)
    • Reduce unnecessary build(setState)
  • How to improve layout performance

    • Layout has rarely been a source of problems so far
  • How to improve paint performance

    • Use RepaintBoundary to separate changing and unchanging layers, such as GIFs and animations, but remember that compositing multiple layers also has a cost
  • Other

    • Move time-consuming work such as large JSON parsing off the UI thread with compute
    • Reduce unnecessary channel calls, or batch and merge them
    • Reduce animations
    • Reduce logging in Release builds
    • Raise the priority of the UI thread on Android/iOS
    • Use list components that support partial builds
    • Use a smaller cacheExtent to reduce the rendering range

How to improve GPU thread performance:

  1. Be cautious with saveLayer
  2. Minimize ClipPath. Once it is called, all subsequent drawing instructions must be intersected with the Path. (The same applies to ClipRect, ClipRRect, etc.)
  3. Reduce frosted-glass BackdropFilter and boxShadow shadows
  4. Reduce the use of Opacity, and use AnimatedOpacity when it is necessary

Solution-Measurement tools

As the saying goes, a craftsman who wants to do good work must first sharpen his tools. The tools fall into the following two groups.

  1. Fluency detection: several schemes need no code changes. You can read SurfaceFlinger data through adb, compare screen captures based on VirtualDisplay, or use the official DevTools. Among third-party tools, PerfDog is relatively mature
  2. Stutter troubleshooting: DevTools, the official development tool, is very practical

    1. Performance shows per-frame CPU time (build, layout, paint), GPU time, and Widget build counts
    2. CPU Profiler shows how much time each method takes
    3. Flutter Inspector reveals unreasonable layouts
    4. Memory monitors Dart memory usage

DevTools

Flutter has three build modes. Debug and Release are familiar to everyone: Debug's biggest feature is debugging with hot reload, and Release has the highest performance. Profile mode sits in between and is dedicated to performance analysis. Its output runs in AOT mode with performance very close to Release, while retaining a rich set of performance analysis facilities.

How to run Flutter in profile mode?

For a hybrid project, taking Android as an example, just add profile { initWith debug } to app/build.gradle. If some application resources are split by debug/profile, copy a set for profile as well. A more hacky but thorough approach is to modify the buildModeFor method in $flutterRoot/packages/flutter_tools/gradle/flutter.gradle so that it returns the desired Profile/Release mode by default.

How to open DevTools in Profile mode?

We recommend using the IDE's flutter attach, or running flutter pub global run devtools from the command line and filling in the observatory address. You can then start using DevTools.

Flutter Performance&Inspector

Taking Android Studio as an example, two tool windows, Flutter Performance and Flutter Inspector, appear on the right. The Performance window looks like this:

The Overlay effect is shown in the figure below. There are two rows of bar charts: the upper one is GPU frame time and the lower one is CPU frame time, showing the latest 300 frames in real time. When the current frame takes more than 16 ms, the green scan line turns red. This chart is often used to spot "momentary stutter points" during dynamic interaction.

The Inspector here is relatively simple: you can view the Widget tree and the actual render tree, including basic layout information. The Inspector in DevTools contains more detailed information.

DevTools&Flutter Inspector

DevTools&Performance

Performance is the core tool for performance optimization. It lets you analyze the cause of most UI-thread and GPU-thread stutters. For ease of analysis, this figure was captured in Debug mode; real performance analysis should be done in Profile mode.

As shown in Figure 1 above, the Build phase is clearly too long and stays that way for dozens of frames, so there must be a serious problem in the build logic. In theory a Widget, once created, does not need to be rebuilt if its state does not change. As the earlier Taote case showed, the actual mistake here was that Widgets were created repeatedly in the scroll progress callback. Build should really run only twice, as part of the waterfall layout logic.

Paint details can be enabled in debug mode with debugProfilePaintsEnabled=true. When changing and unchanging elements are mixed in the same layer, the whole layer may be repainted over and over. If an element's content has not changed, paint should not spend time redrawing it. With the Repaint Rainbow switch mentioned earlier, or debugRepaintRainbowEnabled=true, you can observe repaints in real time, as shown in the figure below.

Each layer gets a border with its own color, and only repainted layers change color. If a layer that should not need repainting changes color, check whether that is expected.

Excessive GPU time is usually caused by heavy use of expensive components such as Clip, Opacity, and shadows. If they turn out to be too costly, optimize or downgrade them as described in the solutions above. For more on GPU optimization, see liyuqian's talk on the high-performance graphics engine.

The CPU Profile at the bottom of Figure 1 shows the CPU time of the selected frame. The Bottom Up view makes it easy to find the most time-consuming methods.

DevTools&CPU Profiler

Next to Performance is the CPU Profiler, which records CPU time over a period of time. Whether a cost is abnormal or normal is generally judged from the method names combined with experience. By searching for the method names visitChildren --> getScrollRenderObject, we found that the high-availability frame-rate monitoring had a performance problem.

DevTools also has Memory, Debugger, Network, Logging, and other modules. They were not used much in this round of fluency optimization. If we gain more experience with them later, we will share it.

DebugFlags&Build

The figure above is a table of common debug flags for the build phase. The debugPrintRebuildDirtyWidgets switch prints to the console which tree is currently being rebuilt. debugProfileBuildsEnabled does the same as Track Widget Builds in Performance and records the details of the build function. The first 3 flags are used in debug mode, and the last one can be used in profile mode.

DebugFlags&Paint

The figure above is a table of common debug flags for the paint phase. debugDumpLayerTree() prints the layer tree. debugPaintLayerBordersEnabled draws a border around each layer. debugRepaintRainbowEnabled does the same as Repaint Rainbow in the Inspector: the border color changes when a layer is repainted. debugProfilePaintsEnabled, mentioned earlier, helps analyze the details of the paint function.
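For reference, the flags named in the two DebugFlags sections are plain top-level variables and can be switched on before runApp. The sketch below is one way to group them; the function name enableFluencyDiagnostics is hypothetical, and wrapping the visual flags in assert is a choice made here so that they only take effect in debug builds.

```dart
import 'package:flutter/rendering.dart';
import 'package:flutter/widgets.dart';

/// Turns on the build and paint diagnostics discussed above.
/// Call it before runApp().
void enableFluencyDiagnostics() {
  // Timeline details for build and paint, visible in DevTools Performance.
  debugProfileBuildsEnabled = true;
  debugProfilePaintsEnabled = true;

  // Console and on-screen aids; the assert body runs only in debug mode.
  assert(() {
    debugPrintRebuildDirtyWidgets = true;
    debugRepaintRainbowEnabled = true;
    debugPaintLayerBordersEnabled = true;
    return true;
  }());
}
```

These flags add overhead of their own, so frame timings taken while they are enabled should not be compared with normal Profile-mode numbers.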

Outlook

The above is the first phase of Taote's Flutter fluency optimization, and also the phase where the improvement is most noticeable to users. There is still a large gap to the ultimate user-experience goal, however. Colleagues across the group have provided many practices to learn from, for example UC Hummer's Engine fluency optimization, Xianyu's PowerScrollView list component with partial refresh and reuse, high-precision multi-dimensional stutter detection online and offline, and solutions for preventing fluency regressions. Taote also keeps learning, growing, and pushing the limits. In the second phase, in pursuit of the best possible experience, Taote will combine the Hummer engine with deep optimization of a high-performance image library and a high-performance streaming container, and will build a comprehensive offline and online data monitoring system, to give users "an especially smooth Taote App".
