
Author: Liu Taiju (Nuo Liang)

The "Dutter series" describes the technical practice behind DingTalk's Flutter-based four-platform application framework (codename Dutter), along with the pitfalls we hit while building it. The first part covered four-platform solution design and technical practice; this article is the second part. Thank you for reading.

This article covers several FlutterEngine-level bugs that we ran into and fixed on desktop during the gray release (staged rollout) of DingTalk's Flutter features. Specifically:

  • Mac side:
  1. Memory leak after FlutterEngine exits;
  2. Deadlock during the FlutterEngine shutdown phase;
  3. Crash during OpenGL teardown on older macOS versions;
  • Windows side:
  1. "Crash + afterimage" in the rendering module on Windows 7 devices;
  2. Wild pointer crash during FlutterPlugin registration;
  3. White screen after the Flutter window's visibility changes.

Let's go through them one by one.

FlutterEngine issues on the Mac side

1.1 Memory leak problem after FlutterEngine exits

Problem background

On the Mac side, after a FlutterViewController is destroyed, the memory it allocated is not actually released, so it leaks. This has been discussed in Flutter's issue tracker, but the root cause was never clearly identified. DingTalk hit the same problem during the gray release of its Flutter features on Mac; left unresolved, it would directly threaten the viability of shipping Dutter on Mac.

Root-cause analysis

In a nutshell:

The leak comes from an inappropriate use of a weak property in the Mac implementation of FlutterEngine. FlutterViewController strongly holds FlutterEngine, while FlutterEngine holds a weak property pointing back to FlutterViewController. FlutterViewController tries to release FlutterEngine during its dealloc, but by then the weak property inside FlutterEngine already reads as nil, so the release path never runs and the engine leaks.

Below is a brief walk-through of the relevant implementation.

Because FlutterEngine has to reconcile Objective-C (OC) and C++ object lifecycle management, the ownership relationships among its internal objects are somewhat unusual, as shown in the following figure:

  • FlutterViewController, the main class exposed to callers, creates and holds FlutterEngine and FlutterView;
  • FlutterEngine retains itself during initialization and releases itself during shutdown;
  • FlutterEngine creates and holds FlutterRenderer, which in turn strongly holds FlutterView;
  • As a result, FlutterEngine indirectly holds a strong reference to FlutterView;
  • FlutterEngine holds a weak pointer back to FlutterViewController.
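The self-retain in the second bullet is what turns a skipped shutdown into a leak: once FlutterViewController is gone, the only strong reference left to the engine is the engine's own. Below is a rough C++ analogy that uses `std::shared_ptr` in place of Objective-C retain counts; `Engine`, `Launch`, and `EngineLeaked` are hypothetical names, not FlutterEngine's API.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C++ analogy of the Mac FlutterEngine lifecycle (not Flutter's API):
// the engine keeps a strong reference to itself while running and drops it only
// when it is shut down.
class Engine : public std::enable_shared_from_this<Engine> {
 public:
  void Launch() { self_ = shared_from_this(); }  // "holds itself" after init
  void ShutDownEngine() { self_.reset(); }       // releases the self-reference

 private:
  std::shared_ptr<Engine> self_;
};

// Hypothetical helper: creates and launches an engine, drops the owner's strong
// reference (as FlutterViewController's dealloc would), and reports whether the
// engine outlived its owner.
bool EngineLeaked(bool call_shutdown) {
  std::weak_ptr<Engine> observer;
  {
    auto engine = std::make_shared<Engine>();
    engine->Launch();
    observer = engine;
    if (call_shutdown) engine->ShutDownEngine();
  }  // the owner's strong reference is released here

  bool leaked = false;
  if (auto survivor = observer.lock()) {
    leaked = true;               // only the engine's self-reference kept it alive
    survivor->ShutDownEngine();  // clean up so the sketch itself does not leak
  }
  return leaked;
}
```

`EngineLeaked(false)` returns true: without an explicit shutdown, the object survives its last external owner.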

Normally, when FlutterViewController exits, it triggers FlutterEngine shutdown by calling FlutterEngine's setViewController with nil. The line numbers below refer to the reference implementation of setViewController.

In other words, the dealloc of FlutterViewController should normally lead to line 369 being executed, which releases FlutterEngine's resources. In practice this does not happen. When execution reaches line 359, the check `if (_viewController != controller)` evaluates to false. `controller` is the argument passed in from outside and is nil here; `_viewController`, being a weak property, has also become nil once FlutterViewController entered dealloc. As a result, the shutDownEngine method we expect to run is never called.
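The failing guard can be reproduced with a small C++ analogy in which `std::weak_ptr` stands in for the Objective-C weak property: inside the owner's destructor, a `weak_ptr` to it has already expired, much as a weak property reads nil inside dealloc. All type and member names are hypothetical simplifications of the Objective-C code described above.

```cpp
#include <cassert>
#include <memory>

struct ViewController;

// Hypothetical analogy of the Mac ownership graph: the controller strongly owns
// the engine; the engine points back through a weak reference.
struct Engine {
  std::weak_ptr<ViewController> view_controller;  // plays the weak property
  bool shutdown_called = false;

  void SetViewController(const std::shared_ptr<ViewController>& controller);
  void ShutDownEngine() { shutdown_called = true; }
};

struct ViewController {
  std::shared_ptr<Engine> engine = std::make_shared<Engine>();

  ~ViewController() {
    // By now the weak reference has expired (like a weak property reading nil
    // inside dealloc), so the guard below sees nullptr on both sides.
    engine->SetViewController(nullptr);
  }
};

// Mirrors the guard described above: act only if the controller changed, and
// shut the engine down when the controller is cleared.
void Engine::SetViewController(const std::shared_ptr<ViewController>& controller) {
  if (view_controller.lock() != controller) {
    view_controller = controller;
    if (!controller) ShutDownEngine();
  }
}
```

In this model, calling `engine->ShutDownEngine()` explicitly from `~ViewController()` bypasses the guard, which corresponds to the fix described in the Solution that follows.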

Solution

Once the cause is known, the fix is straightforward: manually trigger FlutterEngine's shutDownEngine method when FlutterViewController is deallocated. This can be done either with an Objective-C runtime hook in the upper layer or by modifying and recompiling FlutterEngine directly.

However, this change must be made carefully so that it fully reproduces FlutterEngine's own shutdown sequence; otherwise it can lead to the second problem we hit: a deadlock.

1.2 Deadlock problem in FlutterEngine shutdown phase

Problem background

DingTalk's first fix for the "FlutterEngine leak" above was a simple one: in FlutterViewController's dealloc method, manually call FlutterEngine's shutDownEngine method to release the related resources.

With this approach, memory does drop after FlutterViewController exits, but during the gray release we found that the whole page occasionally froze. By analyzing the affected code path and combining it with brute-force stress testing, we reproduced the problem in a debug environment and confirmed that the UI thread and the Raster thread were deadlocked. The thread states after the deadlock are roughly as follows.

UI thread state:

Raster thread:

Root-cause analysis

In a nutshell:

DingTalk called FlutterEngine's shutDownEngine method incorrectly. Before shutDownEngine, FlutterView's shutdown method must be called to stop rendering; only after rendering has stopped cleanly can FlutterEngine's resources be released. Otherwise, the deadlock above can occur.

Because the problem stems from DingTalk's own incorrect call sequence, we will not analyze the underlying mechanism in depth; interested readers can follow the clues above.

Solution

Complete the FlutterEngine release sequence in the upper layer: call FlutterView's shutdown to stop the Raster thread before calling FlutterEngine's shutDownEngine.
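The required ordering can be sketched in C++ with a background thread standing in for the Raster thread. This models only the order of the two calls, not the deadlock itself; `RasterLoop`, `EngineResources`, and `TearDown` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the rendering work FlutterView drives on the Raster thread.
class RasterLoop {
 public:
  ~RasterLoop() { Shutdown(); }

  void Start() {
    running_ = true;
    worker_ = std::thread([this] {
      while (running_) {
        // ... produce a frame; real frames may need to sync with the UI thread ...
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
    });
  }

  // Analogue of FlutterView shutdown: stop producing frames and wait for the
  // thread to exit, so no in-flight frame is left waiting on the UI thread.
  void Shutdown() {
    running_ = false;
    if (worker_.joinable()) worker_.join();
  }

  bool IsRunning() const { return worker_.joinable(); }

 private:
  std::atomic<bool> running_{false};
  std::thread worker_;
};

// Hypothetical stand-in for FlutterEngine's resource release; it records whether
// rendering was still active at the moment resources were released.
struct EngineResources {
  bool released = false;
  bool raster_running_at_release = false;

  void ShutDownEngine(const RasterLoop& raster) {
    raster_running_at_release = raster.IsRunning();
    released = true;
  }
};

// Order from the Solution above: stop rendering first, then release the engine.
void TearDown(RasterLoop& raster, EngineResources& engine) {
  raster.Shutdown();              // 1. FlutterView shutdown
  engine.ShutDownEngine(raster);  // 2. FlutterEngine shutDownEngine
}
```

Swapping the two calls in `TearDown` would release engine resources while frames are still being produced, which is the situation that allowed the deadlock.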

1.3 Crash during OpenGL teardown on older macOS versions

Problem background

This problem follows on from the previous two. After fixing problems 1 and 2, and modeling our code on FlutterEngine's own shutdown sequence, DingTalk performed three steps after FlutterViewController was destroyed:

  1. Set the FlutterView bound in the FlutterRenderer to nil;
  2. Call the FlutterView shutdown method;
  3. Call the FlutterEngine shutDownEngine method.

Testing showed that these steps essentially eliminated the memory leak and the deadlock. However, during the internal gray release, crashes appeared on older versions of macOS, with a stack roughly as follows:

Root-cause analysis

In a nutshell:

Like problem 2, this one was introduced by DingTalk's own leak handling. It results from a combination of two factors. On the one hand, resetting the FlutterView bound to FlutterOpenGLRenderer releases the OpenGL objects created in the embedder layer prematurely; on the other hand, the OpenGL implementation on older macOS versions does not guard critical steps of the teardown path, so an exception follows.

Below is a brief analysis of the relevant code, to help others avoid similar problems.

1. In FlutterEngine's setViewController method, during the release process, FlutterOpenGLRenderer's setFlutterView method is called with nil:

2. When FlutterOpenGLRenderer's setFlutterView method receives nil, it releases the NSOpenGLContext object it maintains internally:

3. FlutterEngine's underlying implementation performs a flush when the GrDirectContext object is destroyed. If the OpenGL objects have already been released at that point, older versions of macOS (10.11, 10.12) crash:

Solution

Since the problematic step is triggered by DingTalk's own upper-layer code, the fix is relatively simple: we removed the FlutterView nulling step on all Mac devices that use OpenGL rendering (macOS versions before 10.14). That is, in the final release stage of FlutterViewController, only the following two actions are performed:

  1. Call the FlutterView shutdown method;
  2. Call the FlutterEngine shutDownEngine method.

FlutterEngine issues on the Windows side

2.1 "Crash + afterimage" in the rendering module on Windows 7 devices

Problem background

The background here is somewhat complicated; strictly speaking, this issue splits into two sub-problems.

The first is a crash in d3d11 on some Windows 7 devices (both x86 and x64), with a stack roughly as follows:

Pinning down the exact cause was taking a long time, and the Flutter team has said that its coverage of Windows 7 devices is incomplete ("reference"). We therefore decided to customize FlutterEngine slightly and force software rendering ("soft solution mode") for Flutter pages on older devices such as Windows 7.
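The text does not say exactly how "old devices" were detected. One plausible policy, sketched in C++ as an assumption, is to force software rendering on anything older than Windows 8 (NT version 6.2; Windows 7 reports 6.1). `ShouldForceSoftwareRendering` is a hypothetical helper, and how the customized engine is then switched into software mode is not shown.

```cpp
#include <cassert>

// Hypothetical policy helper (not part of Flutter): force software rendering on
// Windows versions older than Windows 8. Windows 7 reports NT version 6.1 and
// Windows 8 reports 6.2. A real Windows build could use IsWindows8OrGreater()
// from <VersionHelpers.h>; the version is passed in here to keep the sketch portable.
bool ShouldForceSoftwareRendering(unsigned major, unsigned minor) {
  return major < 6 || (major == 6 && minor < 2);
}
```

For example, `ShouldForceSoftwareRendering(6, 1)` (Windows 7) is true, while Windows 8 (6.2) and Windows 10 (10.0) keep GPU rendering.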

We expected this to sidestep the problem, but unfortunately it exposed another FlutterEngine bug: when pages are rendered in software mode, closing the FlutterViewController has a certain probability of leaving an afterimage (a residual image of the window) on the Windows desktop.

Root-cause analysis

In a nutshell:

During FlutterEngine's internal shutdown, FlutterWindowsEngine's pointer to the FlutterWindowsView object is not cleared in time, which leaves a wild (dangling) pointer in multi-threaded scenarios. Through that pointer, the raster thread keeps delivering frames to FlutterWindowsView after the view has been destroyed, which then causes the exception.

To speed up diagnosis, we added auxiliary logging at key points, which quickly surfaced the suspicious behavior:

The figure above shows the logs from those key points after the problem occurred. They reveal two key facts:

  1. OnBitmapSurfaceUpdated is a member function of FlutterWindowsView, yet the destructor of FlutterWindowsView had already run when the last two OnBitmapSurfaceUpdated log lines were printed (a wild pointer);
  2. In the last OnBitmapSurfaceUpdated call, the window handle used for rendering is nullptr, i.e. the window bound to FlutterWindowsView for rendering has already been released.

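Because the window handle used for that final render is nullptr, the afterimage appears. (A likely mechanism: on Win32, GetDC(NULL) returns a device context for the entire screen, so a frame drawn through a null window handle lands directly on the desktop.)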

Note: when a C++ member function is called through a wild pointer, no memory access violation (crash) occurs as long as the function does not actually touch the memory of the `this` object. This is still undefined behavior; it simply happens not to crash in practice.

Solution

Modify FlutterEngine internally so that, in SoftwareRenderer mode, destroying FlutterWindowsView also clears FlutterWindowsEngine's pointer to it. (We have not applied this change in GPU mode yet, because it produces abnormal output there.)

This guarantees that tasks on the raster thread no longer call back into the rendering interface after FlutterWindowsView has been destroyed.
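The fix can be modeled in C++ as follows. This is a simplified sketch, not the actual FlutterWindowsEngine/FlutterWindowsView code: the class names, `SetView`, and `PresentSoftwareFrame` are hypothetical, and a mutex is assumed as the synchronization between the view's destructor and the raster thread.

```cpp
#include <cassert>
#include <mutex>

class WindowsView;

// Hypothetical model of the fix (not the real FlutterWindowsEngine): the engine's
// pointer to its view is guarded by a mutex and cleared by the view's destructor.
class WindowsEngine {
 public:
  void SetView(WindowsView* view) {
    std::lock_guard<std::mutex> lock(mutex_);
    view_ = view;
  }

  // Called from the raster thread when a software-rendered frame is ready.
  // Returns false when there is no live view to draw into.
  bool PresentSoftwareFrame();

 private:
  std::mutex mutex_;
  WindowsView* view_ = nullptr;
};

// Hypothetical stand-in for FlutterWindowsView.
class WindowsView {
 public:
  explicit WindowsView(WindowsEngine* engine) : engine_(engine) {
    engine_->SetView(this);
  }
  // The fix: unbind from the engine before the view is torn down.
  ~WindowsView() { engine_->SetView(nullptr); }

  void OnBitmapSurfaceUpdated() { ++frames_presented; }

  int frames_presented = 0;

 private:
  WindowsEngine* engine_;
};

bool WindowsEngine::PresentSoftwareFrame() {
  std::lock_guard<std::mutex> lock(mutex_);
  if (view_ == nullptr) return false;  // view already destroyed: drop the frame
  // Safe while the lock is held: ~WindowsView() blocks in SetView() until we return.
  view_->OnBitmapSurfaceUpdated();
  return true;
}
```

Without the `SetView(nullptr)` in the destructor, `PresentSoftwareFrame` would call `OnBitmapSurfaceUpdated` through a dangling pointer, which is what the logs above showed.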

2.2 Wild pointer crash during FlutterPlugin registration

Problem background

During the first and second gray-release stages of the Flutter version of DingTalk's "+ Panel" feature on Windows, there were many crashes, pushing the client's overall crash rate as high as x%.

After some basic analysis, the symbolicated crash stack is roughly as follows:

The stack reveals two important facts:

  1. The crash occurs during FlutterEngine initialization, specifically while plugins are being registered;
  2. The crash is caused by a wild pointer.

Root-cause analysis

In a nutshell:

Flutter's wrapper-layer code for Windows includes PluginRegistrarManager, an object designed as a singleton that serves FlutterPlugin registration. It uses a map to track the relationship between each FlutterEngine pointer and its Registrar, so that the Registrar's lifetime matches the FlutterEngine's. However, the wrapper-layer code is compiled into each plugin.dll at build time, so every plugin.dll carries its own copy of PluginRegistrarManager and the singleton mechanism breaks down. As a result, the bindings in PluginRegistrarManager are not all cleared when a FlutterEngine is destroyed; the stale entries keep invalid pointer addresses, and accessing them later crashes.

The analysis is summarized below. With brute-force stress testing, we were able to reproduce the problem:

The figure above confirms that the crash is caused by a wild pointer to the FlutterEngine object. Tracing where the Engine pointer comes from during plugin registration led us to the flutter::PluginRegistrarManager::GetInstance()->GetRegistrar() method:

Looking into PluginRegistrarManager's implementation, GetRegistrar uses a map with emplace to record the mapping between the FlutterEngine address and its Registrar:

Internally, it also registers a callback with the underlying engine object via FlutterDesktopPluginRegistrarSetDestructionHandler; the callback is invoked when the FlutterEngine is destroyed and removes the binding:

This is where things go wrong. PluginRegistrarManager is not a true singleton, yet FlutterEngine can keep only one valid OnRegistrarDestroyed callback. So when FlutterEngine is destroyed, the FlutterEngine addresses saved in the other PluginRegistrarManager copies are never cleared, and using them again causes the crash.
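The failure mode can be reproduced in a single C++ file by creating two manager instances to stand in for the copies compiled into two plugin DLLs. The sketch simplifies heavily: the engine keeps one `std::function` callback rather than the C callback registered through FlutterDesktopPluginRegistrarSetDestructionHandler, and `Engine`, `RegistrarManager`, and the string-valued registrar are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical engine that keeps only ONE destruction callback: setting it a
// second time silently replaces the first.
struct Engine {
  std::function<void(Engine*)> on_destroyed;
  ~Engine() {
    if (on_destroyed) on_destroyed(this);
  }
};

// Hypothetical stand-in for the wrapper-layer PluginRegistrarManager. Each
// plugin DLL that compiles the wrapper sources gets its own copy, modelled
// here by creating several instances.
struct RegistrarManager {
  std::map<Engine*, std::string> registrars;  // engine address -> registrar

  const std::string& GetRegistrar(Engine* engine) {
    auto result = registrars.emplace(engine, "registrar");
    if (result.second) {
      // Ask the engine to unbind us when it dies; only the last copy to do
      // this will actually be called back.
      engine->on_destroyed = [this](Engine* dying) { registrars.erase(dying); };
    }
    return result.first->second;
  }
};
```

After the engine is destroyed, the first manager still holds an entry keyed by the dead engine's address. If a new engine is later allocated at the same address, `emplace` finds the stale entry and hands back a registrar bound to the old engine, which matches the wild-pointer crash during plugin registration.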

Solution

Modify PluginRegistrarManager in the FlutterEngine wrapper layer to fix how the "singleton" is implemented: lifecycle management of the singleton moves down to the bottom layer, and the wrapper layer only provides the related services.
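The idea of moving the singleton down a layer can be illustrated with a sketch in which the registry lives in the engine layer (compiled once per process) and the wrapper-layer manager holds no state. This is an assumption about the general shape of such a fix, not the actual patch; `engine_layer`, `Registry`, and `RegistrarManager` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical "bottom layer" (think: the engine DLL). Because it is compiled
// once, the registry below really is a single instance for the whole process.
namespace engine_layer {

struct Engine;

std::map<const Engine*, std::string>& Registry() {
  static std::map<const Engine*, std::string> registry;
  return registry;
}

struct Engine {
  // Lifetime is managed where the engine lives: the entry goes away with it.
  ~Engine() { Registry().erase(this); }
};

}  // namespace engine_layer

// Hypothetical wrapper-layer manager: stateless, it only forwards to the bottom
// layer, so duplicate copies in each plugin DLL are harmless.
struct RegistrarManager {
  const std::string& GetRegistrar(const engine_layer::Engine* engine) {
    return engine_layer::Registry().emplace(engine, "registrar").first->second;
  }
};
```

However many plugin DLLs embed `RegistrarManager`, they all share the one registry, and the engine's destructor clears its entry exactly once.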

For details, please refer to:

2.3 White screen after Flutter Window visibility change

Problem background

On Windows, if you take the window hosting a Flutter page and:

  • first hide it with ShowWindow(flutter_wnd, SW_HIDE);
  • then show it again with ShowWindow(flutter_wnd, SW_SHOWNORMAL),

the Flutter page's content is not displayed and the canvas stays blank. If a refresh is then triggered, for example via setState or by dragging the window, the content renders normally again.

Root-cause analysis

This one is fairly clear-cut: it is a bug in Flutter's Windows implementation. After a window's visibility changes, the engine should flush again to draw the latest view into that window, but this step is not currently implemented, which produces the white screen.

Solution

We have reported this upstream as a Flutter issue. For now, DingTalk works around it with upper-layer compensation: after the native window's visibility changes, we manually notify the Flutter side to refresh the currently visible page, which triggers a redraw.
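A host-side version of this workaround might look like the following C++ sketch. How DingTalk actually detects the visibility change and notifies Dart is not described; the sketch assumes the host observes hidden-to-shown transitions (on Win32, for example, through the WM_SHOWWINDOW message) and forwards a redraw request, for instance over a platform channel. `VisibilityRefresher` and its callback are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical host-side helper: tracks the native window's visibility and,
// when it goes from hidden to shown, asks the Flutter side to redraw the
// currently visible page. The callback stands in for whatever channel is used.
class VisibilityRefresher {
 public:
  explicit VisibilityRefresher(std::function<void()> request_redraw)
      : request_redraw_(std::move(request_redraw)) {}

  void OnVisibilityChanged(bool visible) {
    if (visible && !visible_) request_redraw_();  // hidden -> shown: force a new frame
    visible_ = visible;
  }

 private:
  std::function<void()> request_redraw_;
  bool visible_ = true;  // assume the window starts out visible
};
```

Only the hidden-to-shown transition triggers a request, so repeated "shown" notifications do not cause redundant redraws.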

Summary

These are the main desktop issues we dealt with while shipping Flutter in DingTalk. In our experience, although Flutter 2.10 officially released Windows support, from a stability standpoint Flutter clearly performs better on Mac than on Windows. For teams that want to try Flutter on a single desktop platform, we recommend starting with Mac, which has the edge over Windows in both barrier to entry and stability.

Follow [Alibaba Mobile Technology] for Alibaba's cutting-edge mobile insights and practices!


Alibaba Terminal Technology (阿里巴巴终端技术)

The official account of Alibaba Mobile & Terminal Technology.