Build a canvas typesetting engine

Image credit: https://unsplash.com
The author of this article: Feiyang

Background

Online example
Demo

As a front-end developer, especially one working on consumer-facing (to-C) products such as WeChat mini programs, you have probably had to build features like sharing an event image or a poster image.

Share image

Solutions to this requirement generally fall into the following categories:

Rely on the server: for example, write a Node service that uses puppeteer to open a pre-built web page and take a screenshot.
Draw directly with the CanvasRenderingContext2D API, or with a drawing helper such as react-canvas.
Use a front-end page screenshot library such as html2canvas or dom2image: write the page structure in HTML, then call the library's API to take a screenshot when needed.

Analysis of the solutions:

Server-side screenshots consume server resources, and the screenshot service in particular uses a lot of CPU and bandwidth. In high-concurrency scenarios or with large images, the experience is relatively poor and the wait is long. The advantage is very high fidelity: because the headless browser version on the server is fixed, what you see is what you get, and there is no extra learning cost for developers. If the business is small and traffic is low, this is the most reliable option.
Drawing directly on canvas is hard-core, time-consuming, and labor-intensive. It takes a lot of code to calculate layout positions, text wrapping, and so on, and when the UI is adjusted later you have to hunt through a sea of code for the part to change. The advantage is that the details are fully controllable and, in theory, anything can be implemented, as long as you have enough hair to spare.
Front-end screenshot libraries are probably the most widely used option on the web; html2canvas had reached 25k stars at the time of writing. Put simply, html2canvas traverses the DOM, reads each node's properties, and renders them to canvas, so it depends on the host environment and can hit compatibility problems in older browsers. Problems found during development are manageable, since we front-end developers can do anything (doge) and can work around them with hacks. But consumer-facing products run on a wide range of devices, so compatibility problems on users' devices after release are hard to avoid, and they are hard to detect unless users report them. Mini programs in China have a very large user base, and this approach cannot be used in mini programs at all. So this option looks painless but carries compatibility risks.

In my various jobs over the past few years, I have almost always run into the share-image requirement. It does not come up often, but in my memory it has never gone smoothly. I have tried all of the solutions above, and each had problems of one kind or another.

The idea:

In a requirements review, I learned that upcoming iterations would include a unified UI adjustment affecting several share-image features. The business at the time spanned mini programs and H5. After the meeting I opened the code and found a mountain of share-image code, interspersed with all kinds of compatibility glue code. All of that just to generate a small card layout, which would take about 100 lines as an HTML layout. I started thinking about how to refactor it.

Since there was still plenty of development time, I wondered whether there was a more convenient, reliable, and general solution. I had always been interested in this area, so, in a learning spirit, I decided to write a library myself. After some consideration I chose the implementation approach of react-canvas, but react-canvas depends on React. To stay general, the engine developed here does not depend on any specific web framework or on the DOM API; it generates layout and rendering from a CSS-like style sheet and supports more advanced features such as interaction.

After sorting out the required features, what emerged was a simple canvas typesetting engine.

What is a typesetting engine

Typesetting engine

A layout engine, also known as a browser engine, rendering engine, or template engine, is a software component responsible for fetching markup content (such as HTML, XML, and image files), organizing information (such as CSS and XSL), and outputting the typeset content to a monitor or printer. All web browsers, e-mail clients, e-readers, and other applications that need to display content according to presentational markup require a typesetting engine.

The passage above is excerpted from Wikipedia's description of browser typesetting engines. These concepts should be familiar to front-end developers; common typesetting engines include WebKit and Gecko.

Design

Goals

This project has the following goals:

Support "document flow layout". This is the core requirement: developers do not need to specify element positions, and width and height are computed automatically.
Turn imperative calls into declarative ones: instead of calling cumbersome drawing APIs, you only write a template to generate the image.
Be cross-platform, which here mainly means running on the web and in various mini programs without depending on a specific framework.
Support interaction: events can be attached and the UI can be modified.

In short, the goal is to be able to write "web pages" in canvas.

API design

My original idea was to use something like Vue template syntax to describe structure and style, but that would add a compilation cost and is too far removed from the core functionality I wanted to build first. After weighing the options, I settled on an API similar to React's createElement combined with JavaScript style objects, and prioritized the core features.

Note that the goal is not to implement browser standards in canvas, but to provide a document flow layout solution whose API stays as close to CSS as possible.

The target API looks like this:

// Create a layer
const layer = lib.createLayer(options);

// Create the node tree
// c(tag, options, children)
const node = lib.createElement((c) => {
  return c(
    "view", // tag name
    {
      styles: {
        backgroundColor: "#000",
        fontSize: 14,
        padding: [10, 20],
      }, // styles
      attrs: {}, // attributes, e.g. src
      on: {
        click(e) {
          console.log(e.target);
        },
      }, // events, e.g. click, load
    },
    [c("text", {}, "Hello World")] // children
  );
});

// Mount the node
node.mount(layer);

As shown above, the core of the API is the three parameters used to create a node:

tagName: the node name. Basic elements such as view, image, text, and scroll-view are supported. Custom tags are also supported: register a new component through the global component API, which makes the engine easy to extend.

function button(c, text) {
  return c(
    "view",
    {
      styles: {
        // ...
      },
    },
    text
  );
}

// Register a custom tag
lib.component("button", (opt, children, c) => button(c, children));

// Usage
const node = lib.createElement((c) => {
  return c("view", {}, [c("button", {}, "This is a global component")]);
});

options: the tag's parameters. It supports styles, attrs, and on, which hold the styles, attributes, and events respectively.
children: the child nodes, which can also be a text string.

After running the code above, we expect the text to be rendered on the canvas and the corresponding event handler to run when it is clicked.

`Process Architecture`

The first render of the framework follows the process below, which the rest of this article explains in order:

The key details in the flowchart are described below. Some of the algorithms and data structures in the code deserve attention.

`Module details`

`preprocessing`

After receiving the view model (the model the developer writes through the createElement API), the framework first preprocesses it. This step filters and normalizes user input: the model only expresses what the developer wants and cannot be used directly. Preprocessing covers:

Node preprocessing
- Shorthand strings are supported, so this step converts a string into a Text object.
- Sibling and parent nodes are visited frequently later on, so this step stores references to them on the current node and records the node's position in its parent container. This is similar to the Fiber structure in React; it is used heavily in later calculations and lays the foundation for interruptible rendering.
Style preprocessing
- Some styles support shorthand forms that must be expanded into target values. For example, padding: [10, 20] is expanded by the preprocessor into the four values paddingLeft, paddingRight, paddingTop, and paddingBottom.
- Set node defaults; for example, a view node's display defaults to block.
- Handle inherited values; for example, fontSize is inherited from the parent by default.
Invalid value handling: if the user supplies a value that does not meet expectations, a warning is raised in this step.
Initialize event mounting, resource requests, and so on.
Make other preparations for later calculation and rendering (described below).

initStyles() {
    this._extendStyles()

    this._completeStyles()

    this._initRenderStyles()
}
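To make the style preprocessing more concrete, here is a minimal sketch of shorthand expansion, defaults, and inheritance. The names resolveStyles, expandSides, INHERITED, and DEFAULTS are hypothetical and not the engine's actual API, and the sketch assumes that padding arrays follow CSS order (top, right, bottom, left).

```javascript
// Hypothetical helpers for illustration; not the engine's real API.
const INHERITED = ["fontSize", "color", "textAlign"]; // assumed inherited properties
const DEFAULTS = { view: { display: "block" } };

// Expand [all], [vertical, horizontal] or [top, right, bottom, left]
// into four explicit sides (CSS ordering is assumed here).
function expandSides(name, value) {
  const list = Array.isArray(value) ? value : [value];
  const [top, right = top, bottom = top, left = right] = list;
  return {
    [`${name}Top`]: top,
    [`${name}Right`]: right,
    [`${name}Bottom`]: bottom,
    [`${name}Left`]: left,
  };
}

function resolveStyles(tag, styles = {}, parentStyles = {}) {
  // 1. Start from the defaults for this tag.
  const resolved = { ...(DEFAULTS[tag] || {}) };
  // 2. Copy inherited properties from the parent.
  for (const key of INHERITED) {
    if (parentStyles[key] !== undefined) resolved[key] = parentStyles[key];
  }
  // 3. Apply the node's own styles, expanding shorthands.
  for (const [key, value] of Object.entries(styles)) {
    if (key === "padding" || key === "margin") {
      Object.assign(resolved, expandSides(key, value));
    } else {
      resolved[key] = value;
    }
  }
  return resolved;
}

const parent = resolveStyles("view", { fontSize: 14, padding: [10, 20] });
const child = resolveStyles("text", {}, parent);
console.log(parent.paddingLeft, child.fontSize);
```

Because the resolved styles of the parent are passed down, this kind of pass naturally runs from parent to child.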

`layout handling`

After preprocessing, we have a node tree with complete styles, and the next step is to calculate the layout. Layout calculation is split into size and position. Why calculate size first? Because an element's position depends on the sizes and positions of the elements before it: for content such as text and images, the preceding sizes must be known before they can be used as a reference. So this step first calculates the size of every node in place, and only then calculates the position of every node.

The whole process is animated as follows.

`Calculate size`

More precisely, this step calculates the box model. Everyone should be familiar with the box model; it is almost a required topic in basic interviews.

Image credit: https://mdn.mozillademos.org/files/16558/box-model.png

In CSS, the box-sizing property switches between box models. This engine does not support changing it for now; the default is border-box.

For a node, the size falls into one of several cases:

It references the parent node, such as width:50%.
It has a specific value, such as width:100px.
It references child nodes, such as width:fit-content; nodes such as image and text are also sized by their content.

With these cases sorted out, the traversal can begin. A tree can be traversed in more than one way.

_Breadth-first traversal_:

_depth-first traversal_:
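As a small illustration of the two traversal orders, the sketch below walks the same tree both ways. The node shape ({ id, children }) is an assumption made for the example only.

```javascript
// A tiny tree used only for illustration.
const tree = {
  id: 1,
  children: [
    { id: 2, children: [{ id: 3, children: [] }, { id: 4, children: [] }] },
    { id: 5, children: [] },
  ],
};

// Breadth-first: visit level by level using a queue.
function breadthFirst(root) {
  const order = [];
  const queue = [root];
  while (queue.length) {
    const node = queue.shift();
    order.push(node.id);
    queue.push(...node.children);
  }
  return order;
}

// Depth-first (pre-order): visit a node, then each subtree in turn.
function depthFirst(root, order = []) {
  order.push(root.id);
  root.children.forEach((child) => depthFirst(child, order));
  return order;
}

console.log(breadthFirst(tree)); // [1, 2, 5, 3, 4]
console.log(depthFirst(tree)); // [1, 2, 3, 4, 5]
```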

Consider the three size cases above one by one:

Case 1 references the parent node, so the traversal must go from parent to child.
Case 2 has no traversal order requirement.
In case 3, the parent node must wait for all child nodes to finish before it can be calculated, so it needs a breadth-first traversal that goes from child to parent.

This raises a problem: cases 1 and 3 require conflicting traversal orders. But recall that preprocessing already traverses from parent to child, so the size calculation for cases 1 and 2 can be done in advance during preprocessing. By the time this step runs, only case 3, sizing from child nodes, remains.
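The two-pass idea can be sketched roughly as follows. This is a simplified illustration, not the engine's code: the functions resolveTopDown and resolveBottomUp are hypothetical, only block flow is modeled (heights add up, width is the widest child), and the bottom-up pass uses a post-order recursion, which is just one way to guarantee that children are sized before their parent.

```javascript
// Pass 1 (parent to child): resolve fixed and percentage sizes.
function resolveTopDown(node, parent) {
  node.layout = { width: null, height: null }; // null means "still unknown"
  for (const dim of ["width", "height"]) {
    const value = node.styles[dim];
    if (typeof value === "number") {
      node.layout[dim] = value;
    } else if (
      typeof value === "string" &&
      value.endsWith("%") &&
      parent &&
      parent.layout[dim] !== null
    ) {
      node.layout[dim] = (parent.layout[dim] * parseFloat(value)) / 100;
    }
  }
  node.children.forEach((child) => resolveTopDown(child, node));
}

// Pass 2 (child to parent): size the remaining dimensions from content.
function resolveBottomUp(node) {
  node.children.forEach(resolveBottomUp); // children first
  if (node.layout.width === null) {
    node.layout.width = Math.max(0, ...node.children.map((c) => c.layout.width));
  }
  if (node.layout.height === null) {
    node.layout.height = node.children.reduce((sum, c) => sum + c.layout.height, 0);
  }
}

const root = {
  styles: { width: 200 }, // height is left automatic
  children: [
    { styles: { width: "50%", height: 30 }, children: [] },
    { styles: { height: 40 }, children: [] },
  ],
};
resolveTopDown(root, null);
resolveBottomUp(root);
console.log(root.layout); // { width: 200, height: 70 }
```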

class Element extends TreeNode {
  // ...

  // Size calculation for a parent node
  _initWidthHeight() {
    const { width, height, display } = this.styles;
    if (isAuto(width) || isAuto(height)) {
      // Measuring requires traversing the children, so check first
      this.layout = this._measureLayout();
    }

    if (this._InFlexBox()) {
      this.line.refreshWidthHeight(this);
    } else if (display === STYLES.DISPLAY.INLINE_BLOCK) {
      // For inline-block, only the height is calculated here
      this._bindLine();
    }
  }

  // Measure the node's own size from its children
  _measureLayout() {
    let width = 0; // the originally specified width must be taken into account
    let height = 0;
    this._getChildrenInFlow().forEach((child) => {
      // calc width and height
    });

    return { width, height };
  }

  // ...
}

The code traverses the direct child nodes in the document flow and accumulates width and height. Handling several nodes on one line, as with inline-block and flex, is more troublesome, so a Line object is added to help. A Line instance manages the objects in the current row: child nodes are bound to a Line instance until it reaches its limit and cannot accept more. When the parent node calculates its size and encounters a Line, it reads the size directly from the Line instance the child belongs to.

Nodes with their own content, such as Text and Image, inherit from Element and override the _measureLayout method. Text calculates its width and height after line wrapping, and Image calculates its scaled size.

class Text extends Element {
  // Compute the size after line wrapping, based on the font size and other settings
  _measureLayout() {
    this._calcLine();
    return this._layout;
  }
}
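What a line-wrapping measurement might look like is sketched below. It is an assumption-laden illustration rather than the engine's _calcLine: wrapText is a hypothetical function, it breaks per character (which suits CJK text but not English word wrapping), and the measure callback stands in for something like ctx.measureText(text).width on a real canvas.

```javascript
// measure(text) returns the rendered width of a string.
function wrapText(text, maxWidth, lineHeight, measure) {
  const lines = [];
  let current = "";
  for (const ch of text) {
    // Start a new line when adding this character would overflow.
    if (current && measure(current + ch) > maxWidth) {
      lines.push(current);
      current = ch;
    } else {
      current += ch;
    }
  }
  if (current) lines.push(current);
  const width = Math.max(0, ...lines.map(measure));
  return { lines, width, height: lines.length * lineHeight };
}

// Hypothetical metric for the example: every character is 10px wide.
const fixedWidth = (s) => s.length * 10;
const result = wrapText("Hello World", 50, 20, fixedWidth);
console.log(result.lines); // ["Hello", " Worl", "d"]
console.log(result.width, result.height); // 50 60
```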

`Calculate the location`

Once sizes are known, positions can be calculated. The traversal here is breadth-first from parent to child: for any element, once the positions of its parent and its previous sibling are known, its own position can be determined.

This step only has to derive a node's position from its parent and its previous node. A node that is not in the document flow is positioned relative to its nearest reference node.

Nodes bound to a Line instance are more complicated: their position is calculated inside the Line instance. The calculation is similar, but alignment and automatic line wrapping have to be handled separately.

// Only the core logic is kept
_initPosition() {
  // Initialize the ctx position
  if (!this._isInFlow()) {
    // Handle nodes outside the document flow
  } else if (this._isFlex() || this._isInlineBlock()) {
    this.line.refreshElementPosition(this);
  } else {
    this.x = this._getContainerLayout().contentX;
    this.y = this._getPreLayout().y + this._getPreLayout().height;
  }
}

class Line {
  // Compute horizontal alignment
  refreshXAlign() {
    if (!this.end.parent) return;
    // Start from the full remaining space (right alignment)
    let offsetX = this.outerWidth - this.width;
    if (this.parent.renderStyles.textAlign === "center") {
      offsetX = offsetX / 2;
    } else if (this.parent.renderStyles.textAlign === "left") {
      offsetX = 0;
    }
    this.offsetX = offsetX;
  }
}
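For plain block flow, the position pass can be sketched in a few lines. This is an illustrative simplification with hypothetical names (layoutPositions, alignOffset): it assumes sizes were already computed, ignores Line handling and out-of-flow nodes, and uses recursion instead of a breadth-first loop, which still visits every parent before its children.

```javascript
// Assumes each node already has a height from the sizing pass.
function layoutPositions(node, x = 0, y = 0) {
  node.x = x;
  node.y = y;
  const { paddingLeft = 0, paddingTop = 0 } = node.styles || {};
  let cursorY = y + paddingTop; // top of the next child in the flow
  for (const child of node.children || []) {
    layoutPositions(child, x + paddingLeft, cursorY);
    cursorY += child.height; // the next sibling starts below this one
  }
}

// Horizontal offset of a row inside its container, mirroring refreshXAlign.
function alignOffset(outerWidth, contentWidth, textAlign) {
  const free = outerWidth - contentWidth;
  if (textAlign === "center") return free / 2;
  if (textAlign === "left") return 0;
  return free; // right alignment
}

const page = {
  height: 75,
  styles: { paddingLeft: 10, paddingTop: 5 },
  children: [{ height: 30 }, { height: 40, children: [{ height: 40 }] }],
};
layoutPositions(page);
console.log(page.children[1].x, page.children[1].y); // 10 35
console.log(alignOffset(100, 60, "center")); // 20
```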

After this step, the layout processor's work is done, and the framework passes the nodes to the renderer.

`Renderer`

Drawing a single node takes the following steps:

Draw the shadow. The shadow lies outside the box, so it must be drawn before clipping.
Draw the clip path and border.
Draw the background.
Draw child nodes and content, such as Text and Image.

Rendering a single node is fairly conventional. The renderer's basic job is to draw shapes, text, and images from its input, so we only need to implement those APIs and then feed each node's styles through them in order. That raises the question of order again: in what order should nodes be rendered? The answer is depth-first traversal.

In canvas's default compositing mode, when content is drawn at the same position, later drawing covers earlier drawing, so a node rendered later effectively has a larger z-index. (Because of the complexity, browser-style compositing layers are not implemented, and setting z-index manually is not supported for now.)

We also need to consider how to achieve the effect of overflow:hidden, for example with rounded corners, where content outside the box must be clipped. Clipping only the parent node is not enough: in the browser, the parent's clipping also applies to its child nodes.

A complete clipping process call in canvas looks like this.

// save ctx status
ctx.save();

// do clip
ctx.clip();

// do something like paint...

// restore ctx status
ctx.restore();

Note that the state of CanvasRenderingContext2D is stored in a stack. If save is called several times, each call to restore returns to the most recently saved state.

In other words, only content drawn between clip and restore is clipped. For the parent's clipping to apply to its children, we cannot call restore right after rendering a node; we have to wait until all of its descendants have been rendered.

The figure below illustrates this.

In the figure, the numbers are the rendering order:

Draw node 1. It has child nodes, so restore cannot be called yet.
Draw node 2, which also has child nodes. Draw node 3, which has no child nodes, so call restore.
Draw node 4, which has no child nodes, so call restore. At this point every node inside node 2 has been drawn, so call restore again to return to node 1's drawing context.
Draw node 5, which has no child nodes, so call restore. Now everything inside node 1 has been drawn, so call restore once more.

Because the Fiber-like structure built during preprocessing records each node's position in its parent, after rendering each node we only need to work out how many times restore has to be called.
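The save/clip/restore ordering for the five-node example can be reproduced with a small sketch. It uses a recording stand-in for the canvas context (createRecordingContext is hypothetical) so that it runs without a browser, and it uses recursion rather than counting restore calls per node; the resulting call sequence is the same as the steps listed above.

```javascript
// A stand-in for CanvasRenderingContext2D that only records calls.
function createRecordingContext() {
  const calls = [];
  let depth = 0;
  return {
    calls,
    save() { depth++; calls.push("save"); },
    restore() { depth--; calls.push("restore"); },
    clip() { calls.push("clip"); },
    get depth() { return depth; },
  };
}

function renderNode(ctx, node) {
  ctx.save();
  ctx.clip(); // a real renderer would first build the path of the node's box
  ctx.calls.push(`paint ${node.id}`);
  node.children.forEach((child) => renderNode(ctx, child));
  ctx.restore(); // only after all descendants have been painted
}

const scene = {
  id: 1,
  children: [
    { id: 2, children: [{ id: 3, children: [] }, { id: 4, children: [] }] },
    { id: 5, children: [] },
  ],
};
const ctx = createRecordingContext();
renderNode(ctx, scene);
console.log(ctx.calls.join(" "));
```

In the recorded sequence, "paint 4" is followed by two restore calls (for node 4 and then node 2), and "paint 5" is followed by two as well (for node 5 and then node 1).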

At this point, after a long round of debugging and refactoring, the input nodes render correctly, and what remains is adding support for more CSS properties. But with the nodes rendered, I kept feeling there was something more I could do.

Right! The model of every shape is retained, so can these models be modified and interacted with? As a first small goal, let's implement an event system.

`event handler`

Shapes drawn on a canvas cannot respond to events the way DOM elements do, so the engine has to proxy the DOM events, work out where on the canvas the event occurred, and dispatch it to the corresponding canvas node.

With a conventional event bus design, we would store each event type's listeners in a List and, when an event fires, traverse it to check whether the point falls inside each node's area. That approach does not work well, and the reason is performance.

In the browser, event dispatch has a capture phase and a bubbling phase: capture runs from top to bottom through the node hierarchy, and after the deepest node is reached, bubbling runs in the reverse order. A List cannot express this well, and traversing it has high time complexity, which users experience as laggy interaction.

After some brainstorming, I realized that events can also be stored in a tree: the nodes that have event listeners are extracted to form a new tree, which we can call an "event tree", instead of being stored on the original node tree.

As shown in the figure, when click listeners are attached to nodes 1, 2, and 3, a separate callback tree is built in the event handler. When an event fires, only this tree is traversed, and it can be pruned: if a parent node is not hit, the child elements under it do not need to be checked, which improves performance.
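One possible way to derive such an event tree from the node tree is sketched below. The functions buildEventTree and hitPath, the rect field, and the node shape are all assumptions made for this example; the engine's own addCallback logic is not shown in this article.

```javascript
// Build a tree containing only the nodes that listen for `type`.
// Each event node hangs under its nearest ancestor that also listens.
function buildEventTree(node, type, parentEventNode = { element: null, children: [] }) {
  let current = parentEventNode;
  if (node.on && node.on[type]) {
    current = { element: node, children: [] };
    parentEventNode.children.push(current);
  }
  (node.children || []).forEach((child) => buildEventTree(child, type, current));
  return parentEventNode;
}

// Rectangle hit test on a hypothetical `rect` field.
const contains = (el, x, y) =>
  x >= el.rect.x && x <= el.rect.x + el.rect.w &&
  y >= el.rect.y && y <= el.rect.y + el.rect.h;

// Walk down the event tree; subtrees whose parent is not hit are skipped.
function hitPath(eventNode, x, y) {
  for (const child of eventNode.children) {
    if (contains(child.element, x, y)) {
      return [child.element.id, ...hitPath(child, x, y)];
    }
  }
  return [];
}

const nodeTree = {
  id: 1, rect: { x: 0, y: 0, w: 100, h: 100 }, on: { click() {} },
  children: [
    { id: "wrapper", rect: { x: 0, y: 0, w: 100, h: 50 }, children: [
      { id: 2, rect: { x: 0, y: 0, w: 50, h: 50 }, on: { click() {} }, children: [] },
    ] },
    { id: 3, rect: { x: 0, y: 50, w: 100, h: 50 }, on: { click() {} }, children: [] },
  ],
};
const eventRoot = buildEventTree(nodeTree, "click");
console.log(hitPath(eventRoot, 10, 10)); // [1, 2]
console.log(hitPath(eventRoot, 10, 80)); // [1, 3]
```

The "wrapper" node has no listener, so it does not appear in the event tree and is never hit-tested.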

Another key point is deciding whether the event point is inside an element. There are many mature algorithms for this, such as the ray casting method:

Time complexity: O(n). Applies to: arbitrary polygons.

Algorithm idea: take the tested point Q as the endpoint, cast a ray in any direction (usually horizontally to the right), and count the number of intersections between the ray and the polygon. If the count is odd, Q is inside the polygon; if it is even, Q is outside.
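For reference, a common formulation of the ray casting test looks like the sketch below. The function name isPointInPolygon and the [x, y] array representation are choices made for this example; points lying exactly on an edge are not treated specially.

```javascript
// Cast a horizontal ray to the right from the point and toggle `inside`
// each time the ray crosses an edge of the polygon.
function isPointInPolygon(point, polygon) {
  const [px, py] = point;
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    // The edge straddles the ray's y, and the crossing lies to the right of the point.
    const crosses =
      (yi > py) !== (yj > py) &&
      px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

const square = [[0, 0], [10, 0], [10, 10], [0, 10]];
const lShape = [[0, 0], [10, 0], [10, 4], [4, 4], [4, 10], [0, 10]];
console.log(isPointInPolygon([5, 5], square)); // true
console.log(isPointInPolygon([15, 5], square)); // false
console.log(isPointInPolygon([7, 7], lShape)); // false (inside the notch)
```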

In our case, however, everything is a rectangle apart from rounded corners, and rounded corners are more troublesome to handle, so the first version tests against rectangles and leaves rounded corners as a later optimization.

Following this idea, we can implement a simple event handler.

class EventManager {
  // ...

  // Add an event listener
  addEventListener(type, callback, element, isCapture) {
    // ...
    // Build the callback tree
    this.addCallback(callback, element, tree, list, isCapture);
  }

  // Dispatch an event
  _emit(e) {
    const tree = this[`${e.type}Tree`];
    if (!tree) return;

    /**
     * Traverse the tree and check whether each node's callback should run.
     * If a parent was not hit, its children need not be checked; move on to the next sibling.
     * Run capture callbacks and add the on callbacks to a stack.
     */
    const callbackList = [];
    let curArr = tree._getChildren();
    while (curArr.length) {
      walkArray(curArr, (node, callBreak, isEnd) => {
        if (
          node.element.isVisible() &&
          this.isPointInElement(e.relativeX, e.relativeY, node.element)
        ) {
          node.runCapture(e);
          callbackList.unshift(node);
          // Later siblings do not need to be checked
          callBreak();
          curArr = node._getChildren();
        } else if (isEnd) {
          // Reached the last node without a hit; stop
          curArr = [];
        }
      });
    }

    /**
     * Run the on callbacks, from child to parent
     */
    for (let i = 0; i < callbackList.length; i++) {
      if (!e.currentTarget) e.currentTarget = callbackList[i].element;
      callbackList[i].runCallback(e);
      // Handle stopping propagation
      if (e.cancelBubble) break;
    }
  }

  // ...
}

With the event handler in place, a scroll-view can be implemented. Internally it uses two views: the outer one has a fixed width and height, the inner one can stretch, and the outer one registers events through the event handler to control the transform applied during rendering. Note that once the transform is applied, child elements are no longer at their original positions, so events attached to child elements would be offset. To fix this, scroll-view registers a capture listener internally: when an event passes into scroll-view, it adjusts the event's relative position to correct the offset.

class ScrollView extends View {
  // ...

  constructor(options, children) {
    // ...
    // Create another view internally: this one has auto height, the outer one has fixed width and height
    this._scrollView = new View(options, [this]);
    // ...
  }

  // Register events on itself
  addEventListener() {
    // Register capture listeners that adjust the event's relative position
    this.eventManager.EVENTS.forEach((eventName) => {
      this.eventManager.addEventListener(
        eventName,
        (e) => {
          if (direction.match("y")) {
            e.relativeY -= this.currentScrollY;
          }
          if (direction.match("x")) {
            e.relativeX -= this.currentScrollX;
          }
        },
        this._scrollView,
        true
      );
    });

    // Handle scrolling
    this.eventManager.addEventListener("mousewheel", (e) => {
      // do scroll...
    });

    // ...
  }
}

`Reflow and repaint`

Besides generating static layouts, the framework also has a reflow and repaint process, triggered when a node's properties are modified. APIs such as setStyle and appendChild are provided to modify style or structure, and the framework decides from the property whether a reflow is needed. For example, modifying width triggers a reflow followed by a repaint, while modifying backgroundColor triggers only a repaint.
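The decision between reflow and repaint could be sketched as follows. UpdateScheduler and the LAYOUT_PROPS set are hypothetical; which properties actually trigger a reflow in the engine is not listed in this article, so the set below is only an assumption.

```javascript
// Assumed set of properties that affect layout.
const LAYOUT_PROPS = new Set(["width", "height", "padding", "margin", "fontSize", "display"]);

class UpdateScheduler {
  constructor() {
    this.needsReflow = false;
    this.needsRepaint = false;
    this.log = []; // records what each flush did, for illustration
  }

  setStyle(node, name, value) {
    if (node.styles[name] === value) return; // nothing changed
    node.styles[name] = value;
    if (LAYOUT_PROPS.has(name)) this.needsReflow = true;
    this.needsRepaint = true; // any visual change needs a repaint
  }

  flush() {
    if (this.needsReflow) this.log.push("reflow"); // recompute size and position
    if (this.needsRepaint) this.log.push("repaint"); // redraw the canvas
    this.needsReflow = false;
    this.needsRepaint = false;
  }
}

const scheduler = new UpdateScheduler();
const box = { styles: { width: 100, backgroundColor: "#000" } };
scheduler.setStyle(box, "backgroundColor", "#fff");
scheduler.flush(); // repaint only
scheduler.setStyle(box, "width", 200);
scheduler.flush(); // reflow, then repaint
console.log(scheduler.log); // ["repaint", "reflow", "repaint"]
```

Batching changes behind a flush step, as done here, means several setStyle calls in a row cost at most one reflow and one repaint.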

`compatibility`

Although the framework itself does not depend on the DOM and draws directly with CanvasRenderingContext2D, some scenarios still need compatibility handling. Here are a few examples.

The image-drawing API on the WeChat mini program platform differs from the standard, so the image component checks the platform and, on WeChat, calls the WeChat-specific API to load the image.
Font weight set on the WeChat mini program platform does not take effect on real iOS devices. After detecting the platform, the framework draws the text twice, with the second pass offset from the first, to produce a bold effect.
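The double-draw workaround for bold text can be sketched like this. fillTextBold is a hypothetical helper and the 0.5 pixel offset is an assumed value; the example uses a mock object in place of the canvas context so it can run anywhere, relying only on the standard fillText(text, x, y) call.

```javascript
// Draw the text twice with a small horizontal offset to fake a bold weight.
function fillTextBold(ctx, text, x, y, offset = 0.5) {
  ctx.fillText(text, x, y);
  ctx.fillText(text, x + offset, y); // second pass thickens the strokes
}

// Mock context that records calls instead of drawing.
const calls = [];
const mockCtx = { fillText: (...args) => calls.push(args) };
fillTextBold(mockCtx, "Hello", 10, 20);
console.log(calls); // [["Hello", 10, 20], ["Hello", 10.5, 20]]
```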

`custom rendering`

Although the framework already supports the layout needs of most scenarios, business requirements are complex and changeable, so it also offers custom drawing: the framework only performs layout and hands the drawing over to the developer, which provides more flexibility.

engine.createElement((c) => {
  return c("view", {
    render(ctx, canvas, target) {
      // ctx and the layout information are available here for drawing custom content
    },
  });
});

`used in web frameworks`

Although the API itself is fairly simple, it still requires some repetitive code, which becomes hard to read when the structure is complex.

In a modern web framework, you can use the corresponding framework binding, such as the Vue version, which converts Vue nodes into API calls internally and is easier to read. Note, however, that this internal conversion costs some performance compared with using the API directly, and the difference becomes more noticeable as the structure grows more complex.

<i-canvas :width="300" :height="600">
  <i-scroll-view :styles="{height:600}">
    <i-view>
      <i-image
        :src="imageSrc"
        :styles="styles.image"
        mode="aspectFill"
      ></i-image>
      <i-view :styles="styles.title">
        <i-text>Hello World</i-text>
      </i-view>
    </i-view>
  </i-scroll-view>
</i-canvas>

`debugging`

Because the business scenarios are relatively simple, the debugging tools the framework currently provides are basic. Setting the debug parameter enables layout debugging, and the framework draws the layout of every node. To inspect the layout of a single node, attach an event to it and print the node to the console. More complete visual debugging tools will be provided once the core features are improved.

`Results`

In my own experience, building a typical page is about as efficient as writing HTML. To show the results, I wrote a simple demo page for a component library.

source code
Component Library Demo

`performance`

After several refactorings the framework performs well. The results are as follows.

Optimizations already made:

Traversal algorithm optimization
Data structure optimization
scroll-view redraw optimization
- scroll-view redraws only the elements within its rendering range
- Elements outside the scroll-view's visible range are not rendered
Image instance caching: even with HTTP caching, the same image would otherwise produce multiple instances, so instances are cached internally.
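Image instance caching can be as simple as a map keyed by source URL. The sketch below is an assumption about how such a cache might look, not the engine's implementation: ImageCache is a hypothetical class, and the loader is injected so the example runs without a browser or mini program runtime.

```javascript
// Cache one loading promise per src so that repeated requests for the
// same image share a single instance, even while it is still loading.
class ImageCache {
  constructor(load) {
    this.load = load; // (src) => Promise of an image-like object
    this.cache = new Map();
  }

  get(src) {
    if (!this.cache.has(src)) {
      this.cache.set(src, this.load(src));
    }
    return this.cache.get(src);
  }
}

// Fake loader for the example; a real one would create an Image
// (or call the platform's image API) and resolve when it has loaded.
let loadCount = 0;
const imageCache = new ImageCache((src) => {
  loadCount++;
  return Promise.resolve({ src });
});

const first = imageCache.get("poster.png");
const second = imageCache.get("poster.png");
console.log(first === second, loadCount); // true 1
```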

To be optimized:

Interruptible rendering: since a Fiber-like structure is already in place, this feature will be relatively easy to add later.
The preprocessor needs to be strengthened to tolerate more of the styles and structures users may input, improving robustness.

`Summary`

I started out wanting a simple image rendering feature and ended up implementing a simple canvas typesetting engine. The feature set is limited and there are still many details and bugs to fix, but it already has basic layout and interaction capabilities. I fell into many pits and did many refactorings along the way, and I could not help marveling at the power of the browser's typesetting engine. I also came to appreciate the charm of algorithms and data structures: good design is the cornerstone of high performance and maintainability. And I had a lot of fun.

Once this approach is more mature, I think it leaves a lot of room for imagination. Beyond simple image generation, it could be used for list layouts in H5 games, table rendering for massive data sets, and similar scenarios. I also have an idea for later: the community already has many good rendering libraries, so I would like to split layout, line-wrapping calculation, image scaling, and similar functions into a separate utility library and integrate other libraries for rendering.

My ability to express myself is limited, and many details may still be unclear. Comments and discussion are welcome.

`thanks for reading`

This article is published by the NetEase Cloud Music Big Front-end Team. Unauthorized reprinting in any form is prohibited. We recruit front-end, iOS, and Android engineers all year round. If you are ready for a new job and happen to like Cloud Music, join us at grp.music-fe(at)corp.netease.com!