Flutter's next-generation graphics renderer, Impeller

In its 2022 roadmap, Flutter stated that it needed to rethink how it uses shaders and planned to rewrite its graphics rendering backend. That new backend, Impeller, has recently begun to take shape. This article introduces the problem Impeller is meant to solve, along with its goals, architecture, and rendering details.

Background

Flutter has fixed many jank problems over the past year or so, but the jank caused by shader compilation has not been completely eliminated. First, what is shader compilation jank? Flutter uses Skia as its underlying 2D graphics library, and Skia defines its own shading language, SkSL (Skia Shading Language), which is a variant of GLSL. During Flutter's rasterization phase, the first time a shader is needed, Skia generates SkSL from the drawing commands and device parameters, converts the SkSL to the shading language of the active backend (GLSL, GLSL ES, or Metal SL), and compiles it on the device into a shader program. Compiling shaders can take hundreds of milliseconds, which costs tens of frames. To diagnose shader compilation jank, look for GrGLProgramBuilder::finalize calls in the trace.
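To make the cause concrete, the following is a minimal C++ sketch of compile-on-first-use. It is not Skia's API: ShaderKey, LazyShaderCache, FrameIntervalsSpent, and the compile cost passed in are hypothetical names and numbers chosen for illustration. It only shows why the first frame that needs a new shader pays the whole compilation cost.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical key: the same draw command can need a different shader on a
// different device, so both are part of the key.
using ShaderKey = std::pair<std::string, std::string>;  // {draw command, device}

class LazyShaderCache {
 public:
  // Returns the milliseconds of compilation charged to the current frame:
  // the full (assumed) compile cost on first use, zero once cached.
  int UseShader(const ShaderKey& key, int assumed_compile_ms) {
    if (programs_.count(key) > 0) {
      return 0;  // Cache hit: nothing to compile.
    }
    programs_[key] = next_program_id_++;  // Cache miss: "compile" and store.
    return assumed_compile_ms;
  }

 private:
  std::map<ShaderKey, std::uint32_t> programs_;
  std::uint32_t next_program_id_ = 1;
};

// Number of whole frame intervals that a frame lasting `frame_ms` occupies
// at a refresh rate of `refresh_hz`.
int FrameIntervalsSpent(int frame_ms, int refresh_hz) {
  return frame_ms * refresh_hz / 1000;
}
```

Under these assumptions, a 300 ms compile at 60 Hz gives FrameIntervalsSpent(300, 60) == 18, which is the "tens of frames" scale described above. A second use of the same key costs nothing, which is why the jank is only visible the first time an animation runs.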

To address this, Flutter 1.20 introduced an SkSL warm-up mechanism for the GL backend. The SkSL shaders an application uses are captured offline and saved as a JSON file, the file is packaged into the application, and the SkSL shaders are precompiled when the user first opens the application, which reduces shader compilation jank. Flutter 2.5 then added support for precompiling Metal shaders on iOS.
Comparing the Flutter Gallery app before and after warm-up, performance improves significantly: from ~90 ms to ~40 ms on a Moto G4 and from ~300 ms to ~80 ms on an iPhone 4s.
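The warm-up idea can be sketched in a few lines of C++. This is an illustration under stated assumptions, not Flutter's implementation: WarmedShaderCache, its methods, and the fixed per-shader compile cost are all hypothetical. The sketch shows that captured shaders are paid for at startup, while any shader that was not captured still compiles on first use.

```cpp
#include <set>
#include <string>
#include <vector>

class WarmedShaderCache {
 public:
  // `assumed_compile_ms` is a made-up, uniform cost per shader.
  explicit WarmedShaderCache(int assumed_compile_ms)
      : compile_ms_(assumed_compile_ms) {}

  // Startup step: compile every captured shader before the first frame.
  // Returns the total time this adds to application startup.
  int WarmUp(const std::vector<std::string>& captured_shaders) {
    int startup_ms = 0;
    for (const std::string& shader : captured_shaders) {
      if (compiled_.insert(shader).second) {
        startup_ms += compile_ms_;
      }
    }
    return startup_ms;
  }

  // Per-frame step: returns the compile time charged to the current frame.
  int Draw(const std::string& shader) {
    return compiled_.insert(shader).second ? compile_ms_ : 0;
  }

 private:
  int compile_ms_;
  std::set<std::string> compiled_;
};
```

With a 100 ms assumed cost and two captured shaders, WarmUp adds 200 ms to startup, drawing a captured shader then costs 0 ms, and drawing a shader that was never captured still costs 100 ms on its first frame.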
Since Flutter introduced SkSL shader warm-up, the community has repeatedly raised the following questions:
Q1. Why not precompile all shaders used?
For best performance, the Skia GPU backend generates shaders dynamically at runtime based on parameters such as the drawing commands and the device model. The combinations of these parameters produce far too many shaders to precompile and build into the application.
Q2. Are the SkSL shaders captured on different devices the same?
In theory, nothing guarantees that SkSL shaders captured on one device will also work on other devices. In practice, (limited) testing has shown that captured SkSL works well across devices, even when SkSL captured on iOS is applied to an Android device, or SkSL captured on an emulator is applied to a real device.

Q3. Why not create a super shader and compile it only once?
Such a shader would be very large and would essentially reimplement the Skia GPU backend. Larger shaders take longer to compile and would therefore introduce more jank.
SkSL shader warm-up also has shortcomings and limitations of its own:

Larger app package size
Longer application startup time, because the SkSL shaders are precompiled at launch
Unfriendly development experience
The portability of captured SkSL shaders is not guaranteed and is unpredictable

The following timeline enumerates Flutter's efforts and progress in solving the jank problem:

Despite repeated attempts, the Flutter team has not been able to completely solve shader compilation jank. The 2022 roadmap therefore explicitly proposes reconsidering how shaders are used and rewriting the graphics rendering backend. The plan is to migrate Flutter on iOS to the new architecture in 2022 and then, based on that experience, port the solution to other platforms. The new rendering backend, Impeller, has recently made its first appearance, so let's look at what makes it distinctive.
Impeller Architecture
Impeller is a renderer tailored for Flutter. It is currently at an early prototype stage: only the Metal backend is implemented, supporting iOS and macOS. In engineering terms, it depends on Flutter's fml and display list libraries and implements the display list dispatcher interface, so it can readily replace Skia. Impeller is used by the Flutter flow subsystem, hence the name.
Impeller's core goals:
Predictable performance: All shaders are compiled offline at build time, and pipeline state objects are built ahead of time from those shaders.
Instrumentable: All graphics resources (textures, buffers, pipeline state objects, etc.) are tagged and labeled. Animations can be captured and persisted to disk without affecting rendering performance.
Portable: Not tied to a specific rendering API; shaders are written once and converted when needed.
Use modern graphics APIs: Makes heavy use of (but does not depend on) features of modern graphics APIs such as Metal and Vulkan.
Efficient use of concurrency: Single-frame workloads can be distributed across multiple threads.
Impeller software architecture

Impeller can be roughly divided into the modules Compiler, Renderer, Entity, and Aiks, plus the base libraries Geometry and Base.
Compiler: A host-side tool comprising the shader Compiler and the Reflector. The Compiler compiles GLSL 4.60 shader source code offline into shaders for a specific backend (e.g. MSL). The Reflector generates C++ shader bindings offline from the shaders so that pipeline state objects (PSOs) can be built quickly at runtime.
Renderer: Creates buffers, generates pipeline state objects from shader bindings, sets up RenderPasses, manages uniform buffers, performs tessellation, executes rendering tasks, and so on.
Entity: Used to build 2D renderers; contains shaders, shader bindings, and pipeline state objects.
Aiks: Wraps Entity to provide a Skia-like API. It exists temporarily to make it easy to connect to Flutter flow.
Impeller offline shader compilation
The Impeller compiler module is the key to solving shader compilation jank. When the engine is built, the compiler sources are first compiled into a host tool binary, impellerc. The first shader compilation stage then uses impellerc to compile all shader sources (vertex shaders and fragment shaders) in the //impeller/entity/shaders/ directory into the intermediate representation SPIR-V. The second shader compilation stage converts the SPIR-V into the high-level shading language of a specific backend (such as Metal SL) and then compiles that backend-specific shader source (the Metal shaders) into a shader library (using Metal Binary Archives on iOS). In parallel, the impellerc reflector processes the SPIR-V to generate C++ shader bindings, which are used to create pipeline state objects (PSOs) quickly at runtime. The header files generated for the shader bindings contain structs (with proper padding and alignment) that allow uniform data and vertex data to be assigned to shaders directly, without having to deal with bindings and vertex descriptors. Finally, the shader library and the binding sources are compiled into the Flutter engine.

In this way, all shaders are compiled offline into a shader library and nothing needs to be compiled at runtime, which improves first-frame rendering performance and eliminates the jank caused by shader compilation.
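The practical consequence of this design is that the runtime only ever looks shaders up. The C++ sketch below illustrates that split under stated assumptions: ShaderLibraryBuilder, ShaderLibrary, CompiledFunction, and Stage are hypothetical stand-ins, not Impeller classes, and the "compiled function" is reduced to its name and stage.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Stage { kVertex, kFragment };

struct CompiledFunction {
  std::string entrypoint;
  Stage stage;
};

// Build-time side (hypothetical stand-in for the offline toolchain):
// every shader source becomes a compiled function.
class ShaderLibraryBuilder {
 public:
  void Compile(const std::string& entrypoint, Stage stage) {
    functions_.push_back(CompiledFunction{entrypoint, stage});
  }
  const std::vector<CompiledFunction>& functions() const { return functions_; }

 private:
  std::vector<CompiledFunction> functions_;
};

// Runtime side: lookup only. There is deliberately no Compile() method, so
// a missing shader has to be fixed offline rather than compiled mid-frame.
class ShaderLibrary {
 public:
  explicit ShaderLibrary(const std::vector<CompiledFunction>& functions) {
    for (const CompiledFunction& fn : functions) {
      by_key_.emplace(std::make_pair(fn.entrypoint, fn.stage), fn);
    }
  }

  // Returns the precompiled function, or nullptr if it was never built.
  const CompiledFunction* Find(const std::string& entrypoint,
                               Stage stage) const {
    auto it = by_key_.find(std::make_pair(entrypoint, stage));
    return it == by_key_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::pair<std::string, Stage>, CompiledFunction> by_key_;
};
```

For example, after the builder has compiled an entry point such as solid_fill_vertex_main for Stage::kVertex, Find returns it immediately at runtime, while a name that was never built returns nullptr instead of triggering a compile.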
Shader Bindings
Shaders in Impeller are written once in GLSL 4.60 syntax and translated into backend-specific shaders and bindings at build time. For example, the solid_fill.vert vertex shader is compiled offline into the files solid_fill.vert.metal, solid_fill.vert.h, and solid_fill.vert.mm.
solid_fill.vert:

uniform FrameInfo {
    mat4 mvp;
    vec4 color;
} frame_info;

in vec2 vertices;

out vec4 color;

void main() {
    gl_Position = frame_info.mvp * vec4(vertices, 0.0, 1.0);
    color = frame_info.color;
}

solid_fill.vert.metal:

using namespace metal;
struct FrameInfo
{
    float4x4 mvp;
    float4 color;
};

struct solid_fill_vertex_main_out
{
    float4 color [[user(locn0)]];
    float4 gl_Position [[position]];
};

struct solid_fill_vertex_main_in
{
    float2 vertices [[attribute(0)]];
};

vertex solid_fill_vertex_main_out solid_fill_vertex_main(
    solid_fill_vertex_main_in in [[stage_in]],
    constant FrameInfo& frame_info [[buffer(0)]])
{
    solid_fill_vertex_main_out out = {};
    out.gl_Position = frame_info.mvp * float4(in.vertices, 0.0, 1.0);
    out.color = frame_info.color;
    return out;
}

solid_fill.vert.h:

struct SolidFillVertexShader {
  // ===========================================================================
  // Stage Info ================================================================
  // ===========================================================================
  static constexpr std::string_view kLabel = "SolidFill";
  static constexpr std::string_view kEntrypointName = "solid_fill_vertex_main";
  static constexpr ShaderStage kShaderStage = ShaderStage::kVertex;
  // ===========================================================================
  // Struct Definitions ========================================================
  // ===========================================================================

  struct PerVertexData {
    Point vertices; // (offset 0, size 8)
  }; // struct PerVertexData (size 8)

  struct FrameInfo {
    Matrix mvp; // (offset 0, size 64)
    Vector4 color; // (offset 64, size 16)
    Padding<48> _PADDING_; // (offset 80, size 48)
  }; // struct FrameInfo (size 128)

  // ===========================================================================
  // Stage Uniform & Storage Buffers ===========================================
  // ===========================================================================

  static constexpr auto kResourceFrameInfo = ShaderUniformSlot<FrameInfo> { // FrameInfo
    "FrameInfo",     // name
    0u, // binding
  };

  // ===========================================================================
  // Stage Inputs ==============================================================
  // ===========================================================================

  static constexpr auto kInputVertices = ShaderStageIOSlot { // vertices
    "vertices",             // name
    0u,          // attribute location
    0u,    // attribute set
    0u,           // attribute binding
    ShaderType::kFloat,     // type
    32u,    // bit width of type
    2u,     // vec size
    1u       // number of columns
  };

  static constexpr std::array<const ShaderStageIOSlot*, 1> kAllShaderStageInputs = {
    &kInputVertices, // vertices
  };

  // ===========================================================================
  // Stage Outputs =============================================================
  // ===========================================================================
  static constexpr auto kOutputColor = ShaderStageIOSlot { // color
    "color",             // name
    0u,          // attribute location
    0u,    // attribute set
    0u,           // attribute binding
    ShaderType::kFloat,     // type
    32u,    // bit width of type
    4u,     // vec size
    1u       // number of columns
  };
  static constexpr std::array<const ShaderStageIOSlot*, 1> kAllShaderStageOutputs = {
    &kOutputColor, // color
  };

  // ===========================================================================
  // Resource Binding Utilities ================================================
  // ===========================================================================

  /// Bind uniform buffer for resource named FrameInfo.
  static bool BindFrameInfo(Command& command, BufferView view) {
    return command.BindResource(ShaderStage::kVertex, kResourceFrameInfo, std::move(view));
  }


};  // struct SolidFillVertexShader

The solid_fill.vert.mm file only handles the padding and alignment of the corresponding structs and contains no other logic.
solid_fill.frag goes through the same process, producing the files solid_fill.frag.metal, solid_fill.frag.h, and solid_fill.frag.mm.

Shader binding files contain all of a shader's descriptive information, such as entry points, input/output structs, and the corresponding buffer slots. At runtime, pipeline state objects can be generated quickly from the shader bindings. In addition, because the input/output structs in the bindings are padded and aligned, vertex and uniform data can be memory-mapped directly.
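The value of the padding can be checked with a small C++ sketch. The FrameInfo layout below copies the offsets and sizes from the generated header shown earlier; Matrix, Vector4, and Padding are plain stand-ins assumed to be packed floats and bytes, the padding member is renamed to padding, and AppendUniform is a hypothetical helper. Because the layout is fixed at compile time, the struct can be copied byte for byte into a staging buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-ins for Impeller's types (assumed layouts: 16 and 4 packed floats).
struct Matrix { float m[16]; };
struct Vector4 { float x, y, z, w; };

template <std::size_t N>
struct Padding { std::uint8_t bytes[N]; };

struct FrameInfo {
  Matrix mvp;           // (offset 0, size 64)
  Vector4 color;        // (offset 64, size 16)
  Padding<48> padding;  // (offset 80, size 48)
};                      // struct FrameInfo (size 128)

// The compiler verifies the layout the generated comments promise.
static_assert(offsetof(FrameInfo, mvp) == 0, "mvp offset");
static_assert(offsetof(FrameInfo, color) == 64, "color offset");
static_assert(offsetof(FrameInfo, padding) == 80, "padding offset");
static_assert(sizeof(FrameInfo) == 128, "FrameInfo size");

// Hypothetical helper: appends the raw bytes of `info` to a staging buffer
// and returns the byte offset at which they were written.
std::size_t AppendUniform(std::vector<std::uint8_t>& staging,
                          const FrameInfo& info) {
  const std::size_t offset = staging.size();
  staging.resize(offset + sizeof(FrameInfo));
  std::memcpy(staging.data() + offset, &info, sizeof(FrameInfo));
  return offset;
}
```

Two consecutive calls return offsets 0 and 128, so each uniform block starts on a 128-byte boundary without any per-field marshalling.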

Impeller rendering process

Impeller connects to the Flutter flow subsystem through three classes, IOSContextMetalImpeller, IOSSurfaceMetalImpeller, and GPUSurfaceMetalImpeller, which inherit from IOSContext, IOSSurface, and flow Surface respectively. In the rasterization stage, the layer tree is composited through a DisplayListCanvasRecorder (which inherits from SkNoDrawCanvas and implements all SkCanvas functions). The recorder converts the drawing commands in every layer into DLOps one by one and stores them in a DisplayList. A DLOp holds all the data for one drawing operation; common examples are SetAntiAliasOp, SetColorOp, and DrawRectOp. There are 73 kinds of Ops in total.
The following is the DrawRectOp struct, which corresponds to drawRect:

struct DrawRectOp final : DLOp {
    static const auto kType = DisplayListOpType::kDrawRect;

    explicit DrawRectOp(SkRect rect) : rect(rect) {}

    const SkRect rect;

    void dispatch(Dispatcher& dispatcher) const {
        dispatcher.drawRect(rect);
    }
};
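
The record-then-dispatch pattern behind DLOp can be sketched as a self-contained C++ example. This is a simplified illustration, not the engine's storage format: the Op record, the two-op DisplayList, and LoggingDispatcher are hypothetical, and only setColor and drawRect are modeled.

```cpp
#include <string>
#include <vector>

struct Rect { float left, top, right, bottom; };

// Receiver interface: whoever replays the list implements these calls.
class Dispatcher {
 public:
  virtual ~Dispatcher() = default;
  virtual void setColor(unsigned argb) = 0;
  virtual void drawRect(const Rect& rect) = 0;
};

enum class OpType { kSetColor, kDrawRect };

// Simplified op record: a type tag plus the payload for that type.
struct Op {
  OpType type;
  unsigned color;
  Rect rect;
};

class DisplayList {
 public:
  void PushSetColor(unsigned argb) {
    ops_.push_back(Op{OpType::kSetColor, argb, Rect{0.0f, 0.0f, 0.0f, 0.0f}});
  }
  void PushDrawRect(const Rect& rect) {
    ops_.push_back(Op{OpType::kDrawRect, 0u, rect});
  }

  // Replays the recorded ops, in recording order, onto any Dispatcher.
  void Dispatch(Dispatcher& dispatcher) const {
    for (const Op& op : ops_) {
      switch (op.type) {
        case OpType::kSetColor:
          dispatcher.setColor(op.color);
          break;
        case OpType::kDrawRect:
          dispatcher.drawRect(op.rect);
          break;
      }
    }
  }

 private:
  std::vector<Op> ops_;
};

// Example receiver that only logs the calls it gets. A rendering backend
// would build its own structures in these callbacks instead.
class LoggingDispatcher : public Dispatcher {
 public:
  void setColor(unsigned) override { log += "setColor;"; }
  void drawRect(const Rect&) override { log += "drawRect;"; }
  std::string log;
};
```

Recording a setColor followed by a drawRect and dispatching to a LoggingDispatcher leaves "setColor;drawRect;" in its log. Because the list is independent of the receiver, the same recording can be replayed onto a different backend by swapping the Dispatcher.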

Next comes Impeller's own rendering process. A DisplayListDispatcher executes all the Ops in the DisplayList: each Op's dispatch() function calls the corresponding DisplayListDispatcher function, which converts the drawing information into an EntityPass structure. A saveLayer operation creates a child EntityPass, so the passes form an EntityPass tree. At the same time, groups of related Ops are converted into Entities and stored in the EntityPass. Each Entity has one Contents, which represents a kind of drawing operation (such as drawRect or clipPath); there are 11 kinds of Contents (see the Impeller class diagram in the appendix). In short, the DisplayList records fine-grained Op information in a flat structure with no hierarchy. Converting it to EntityPasses groups the Ops and produces a hierarchical EntityPass tree based on the saveLayer operations, which is more convenient for the rendering that follows.
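The flat-to-tree conversion can be illustrated with a short C++ sketch. It is a simplified model under stated assumptions: Entity, EntityPass, and PassBuilder here are hypothetical reductions of the real classes, Contents is reduced to a string, and the relative draw order between entities and child passes is ignored.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Entity {
  std::string contents;  // Stand-in for the Contents of one draw operation.
};

struct EntityPass {
  EntityPass* parent = nullptr;
  std::vector<Entity> entities;
  std::vector<std::unique_ptr<EntityPass>> subpasses;
};

// Receives a flat stream of calls and builds the pass tree from it.
class PassBuilder {
 public:
  PassBuilder() : root_(new EntityPass()), current_(root_.get()) {}

  // saveLayer: open a child pass and make it the current target.
  void SaveLayer() {
    std::unique_ptr<EntityPass> child(new EntityPass());
    child->parent = current_;
    EntityPass* raw = child.get();
    current_->subpasses.push_back(std::move(child));
    current_ = raw;
  }

  // restore: return to the parent pass (no-op at the root).
  void Restore() {
    if (current_->parent != nullptr) {
      current_ = current_->parent;
    }
  }

  // Any draw call becomes an Entity in the current pass.
  void Draw(const std::string& contents) {
    current_->entities.push_back(Entity{contents});
  }

  const EntityPass& root() const { return *root_; }

 private:
  std::unique_ptr<EntityPass> root_;
  EntityPass* current_;
};

// Counts the entities in a pass and all of its descendants.
std::size_t CountEntities(const EntityPass& pass) {
  std::size_t count = pass.entities.size();
  for (const std::unique_ptr<EntityPass>& sub : pass.subpasses) {
    count += CountEntities(*sub);
  }
  return count;
}
```

The call sequence Draw, SaveLayer, Draw, Draw, Restore, Draw yields a root pass with two entities and one child pass holding the two entities drawn inside the layer.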
Then a RenderPass traverses the tree starting from the root EntityPass and converts each Entity into a Command. This involves generating the GPU pipeline from the shader bindings, converting polygons into vertex data, setting the color or texture data for the fragment shader, and then writing the vertex data and the color or texture data into GPU buffers that are set on the GPU pipeline. Once all EntityPasses have been traversed, all Commands are stored in the RenderPass.
Then the command encoding phase begins. An MTLRenderCommandEncoder is created from the MTLCommandBuffer, all Commands are traversed, and the PipelineState, vertex buffer, and fragment buffer of each Command are set on the MTLRenderCommandEncoder. Finally, encoding ends and the command buffer is committed.
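The encoding loop can be sketched in C++ without touching Metal. Everything here is a hypothetical stand-in: FakeEncoder plays the role of a platform encoder such as MTLRenderCommandEncoder by recording the calls it receives as text, and Command, BufferView, and RenderPass are reduced to the fields needed for the illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A slice of a larger buffer.
struct BufferView {
  std::size_t offset;
  std::size_t length;
};

// One draw: which pipeline to use and where its data lives.
struct Command {
  std::string pipeline;
  BufferView vertex_buffer;
  BufferView fragment_buffer;
};

class RenderPass {
 public:
  void AddCommand(const Command& command) { commands_.push_back(command); }
  const std::vector<Command>& commands() const { return commands_; }

 private:
  std::vector<Command> commands_;
};

// Hypothetical encoder that logs each call instead of talking to a GPU.
class FakeEncoder {
 public:
  void SetPipeline(const std::string& pipeline) {
    calls.push_back("pipeline:" + pipeline);
  }
  void SetVertexBuffer(const BufferView& view) {
    calls.push_back("vertex@" + std::to_string(view.offset) + "+" +
                    std::to_string(view.length));
  }
  void SetFragmentBuffer(const BufferView& view) {
    calls.push_back("fragment@" + std::to_string(view.offset) + "+" +
                    std::to_string(view.length));
  }
  void Draw() { calls.push_back("draw"); }
  void EndEncoding() { calls.push_back("end"); }

  std::vector<std::string> calls;
};

// Walks every Command, sets its state on the encoder, and issues the draw.
void EncodeCommands(const RenderPass& pass, FakeEncoder& encoder) {
  for (const Command& command : pass.commands()) {
    encoder.SetPipeline(command.pipeline);
    encoder.SetVertexBuffer(command.vertex_buffer);
    encoder.SetFragmentBuffer(command.fragment_buffer);
    encoder.Draw();
  }
  encoder.EndEncoding();
}
```

Each Command contributes four encoder calls, followed by a single EndEncoding at the end of the pass. Committing the command buffer would follow in a real backend and is left out of the sketch.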
The following is the structure diagram of Entity Passes:

The Canvas#saveLayer() operation creates a child EntityPass for off-screen rendering. Common operations that require off-screen rendering include alpha blending, gradients, Gaussian blur, and expensive clips.
An EntityPass contains a series of Entities; each Entity is a drawing operation corresponding to Canvas#drawXXX().
Each Entity has one Contents, which represents a drawing type; there are 11 Contents in total.
Each Contents generates a corresponding Command when rendering, containing vertex data, fragment shader data, and GPU rendering pipeline information.
Vertex data is central to GPU drawing: it has to be generated from the shape being drawn and then placed in a vertex buffer object (VBO) that is associated with the rendering pipeline. The following shows how vertices are processed in Impeller:

Taking a Rect as an example: in the EntityPass generation stage, the Rect is converted into a Path. In the Command creation stage, the Tessellator generates vertex data from the Path and stores it in a HostBuffer in main memory; the offset and length are saved as a BufferView, which is attached to the vertex or fragment shader of the PSO. In the Encode Commands stage, the entire HostBuffer is uploaded to a GPU buffer, and the vertex/fragment buffer, offset, and length for the draw are set on the corresponding GPU pipeline.
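The vertex path for a Rect can be sketched in C++ as follows. This is a simplified illustration under stated assumptions, not Impeller's code: TessellateRect hard-codes two triangles instead of tessellating a general Path, and HostBuffer, BufferView, and EmplaceVertices are hypothetical reductions that ignore alignment requirements.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Point { float x, y; };
struct Rect { float left, top, right, bottom; };

// Offset and length of one allocation inside the host buffer.
struct BufferView {
  std::size_t offset;
  std::size_t length;
};

// Splits a rectangle into two triangles (six vertices, same winding).
std::vector<Point> TessellateRect(const Rect& r) {
  return {
      {r.left, r.top},  {r.right, r.top},    {r.left, r.bottom},
      {r.right, r.top}, {r.right, r.bottom}, {r.left, r.bottom},
  };
}

// CPU-side staging buffer: every draw appends its data here, and the whole
// buffer would be uploaded to the GPU once, at encode time.
class HostBuffer {
 public:
  // Appends `length` raw bytes and returns where they were placed.
  BufferView Emplace(const void* data, std::size_t length) {
    const std::size_t offset = bytes_.size();
    bytes_.resize(offset + length);
    if (length > 0) {
      std::memcpy(bytes_.data() + offset, data, length);
    }
    return BufferView{offset, length};
  }

  std::size_t size() const { return bytes_.size(); }

 private:
  std::vector<std::uint8_t> bytes_;
};

// Stores vertex data in the host buffer and returns its view.
BufferView EmplaceVertices(HostBuffer& buffer,
                           const std::vector<Point>& vertices) {
  return buffer.Emplace(vertices.data(), vertices.size() * sizeof(Point));
}
```

With 8-byte points, each rectangle occupies 48 bytes, so two rectangles emplaced in sequence get the views {0, 48} and {48, 48} inside a single 96-byte host buffer. Only these offsets and lengths need to be handed to the pipeline at encode time.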
Appendix: Impeller Class Diagram
Summary
This article has introduced the problem Impeller is trying to solve, along with its goals, architecture, and rendering details. The current status of the project is as follows:
Impeller compiles shaders offline into a shader library, which effectively improves first-frame performance and avoids the jank caused by shader compilation.
Currently only the Metal backend is implemented, supporting iOS and macOS.
It supports 73 Ops and 11 Contents.
The code base is 18774 lines, and it still relies on some Skia data structures, such as SkNoDrawCanvas, SkPaint, SkRect, and SkPicture.
The project is at an early prototype stage, and some features are not yet supported, such as stroke, color filter, image filter, path effect, mask filter, and gradient, as well as drawArc, drawPoints, drawImage, and drawShadow. Progress and plans are documented in issue #95434.
The overall workload is large, roughly equivalent to rewriting the Skia GPU backend. It shows how determined Flutter is to rewrite its graphics rendering backend in order to solve the jank problem and improve rendering performance. We look forward to Impeller taking Flutter's rendering performance to the next level.

大淘宝技术
