Front-end technology sharing: review of page performance optimization problems

Project Background

In the code_pc project, the front end uses rrweb to record the teacher's teaching content so that students can play it back later. To keep recorded files small, the current strategy is to record one full snapshot first and then record incremental snapshots. During recording, rrweb watches DOM changes through MutationObserver and pushes the resulting events one by one into an array.

For persistent storage, the recorded data is compressed and serialized into a JSON file. The teacher puts the JSON file into the courseware package and uploads it to the educational administration system as a zip archive. On playback, the front end first downloads the archive, extracts it with JSZip to get the JSON file, deserializes and unpacks it to recover the original recording data, and then passes that data to rrwebPlayer for playback.

Problem Found

During development, test recordings were short, so the recording files were small (a few hundred KB) and playback was fairly smooth. Once the project entered the testing stage and we simulated recording a long class, the recording files grew to 10-20 MB. QA colleagues reported that the student playback page froze noticeably when opened: the freeze lasted more than 20s, and during that time the page did not respond to any interaction.

Page performance is a major factor in user experience, and a freeze this long is obviously unacceptable to users.

Troubleshooting

After discussing it within the team, we identified two likely causes of the freeze: decompressing the zip archive on the front end, and loading the recording file for playback. Colleagues suspected that the zip decompression was the main culprit and asked me to try moving it into a worker thread. But is front-end zip decompression really what freezes the page?

3.1 Fixing the time cost of Vue recursively making complex objects reactive

When a page freezes, the first suspect is a blocked main thread, so we need to find where the long tasks occur.

A long task is a task that takes more than 50ms to execute. In Chrome, page rendering and V8's JavaScript execution share the same main thread, so if a script runs for too long, rendering is blocked and the page freezes.
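As a rough illustration of the 50ms threshold, the sketch below times a synchronous function and flags it when it qualifies as a long task. The names timeTask, observeLongTasks, busyWait and LONG_TASK_THRESHOLD_MS are hypothetical helpers introduced here, not part of the project. The observer part assumes a browser that implements the Long Tasks API and simply does nothing elsewhere.

```javascript
const LONG_TASK_THRESHOLD_MS = 50;

// Hypothetical helper: time a synchronous function and flag it if it exceeds the threshold.
function timeTask(label, fn) {
  const start = performance.now();
  fn();
  const duration = performance.now() - start;
  return { label, duration, isLongTask: duration > LONG_TASK_THRESHOLD_MS };
}

// Where the Long Tasks API is available, long tasks can also be observed passively.
// Returns null when the 'longtask' entry type is not supported.
function observeLongTasks(onLongTask) {
  const supported =
    typeof PerformanceObserver !== 'undefined' &&
    (PerformanceObserver.supportedEntryTypes || []).includes('longtask');
  if (!supported) return null;
  const observer = new PerformanceObserver((list) => {
    list.getEntries().forEach((entry) => onLongTask(entry.name, entry.duration));
  });
  observer.observe({ entryTypes: ['longtask'] });
  return observer;
}

// Demo: block the thread for about 60ms to simulate a long task.
const busyWait = (ms) => {
  const end = performance.now() + ms;
  while (performance.now() < end) {}
};
const report = timeTask('blocking work', () => busyWait(60));
console.log(report.label, report.isLongTask ? 'is a long task' : 'is fine');
```

In a real page, wrapping a suspect call such as replayRRweb in a helper like timeTask gives a quick number before opening the Performance panel.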

To analyze where JS execution time goes, use the Performance panel in DevTools and read the call stack and execution time from the flame graph. The width of each block represents execution time, and the height of the stacked blocks represents the depth of the call stack.

In this way, let's look at the results of the analysis:
replayRRweb is clearly a long task: it takes nearly 18s and seriously blocks the main thread.

The time spent in replayRRweb comes from two internal calls: the light green part on the left and the dark green part on the right. Let's look at the call stack to see where the cost is greater:
Readers familiar with the Vue source code will recognize that the expensive methods above are all Vue's internal recursive reactivity methods (the right-hand side shows that they come from vue.runtime.esm.js).

Why do these methods occupy the main thread for so long? One well-known Vue performance tip is: do not put complex objects into data. Otherwise Vue deeply traverses the object's properties and adds getters and setters to each of them (even if the data is never needed for rendering), which causes performance problems.

Does the business code have this problem? We found a very suspicious piece of code:

export default {
  data() {
    return {
      rrWebplayer: null
    }
  },
  mounted() {
    bus.$on("setRrwebEvents", (eventPromise) => {
      eventPromise.then((res) => {
        this.replayRRweb(JSON.parse(res));
      })
    })
  },
  methods: {
    replayRRweb(eventsRes) {
      this.rrWebplayer = new rrwebPlayer({
        target: document.getElementById('replayer'),
        props: {
          events: eventsRes,
          unpackFn: unpack,
          // ...
        }
      })
    }
  }
}

The code above creates an rrwebPlayer instance and assigns it to the reactive property rrWebplayer. The instance is created with the eventsRes array, which is very large and contains tens of thousands of entries.

If Vue recursively makes rrWebplayer reactive, it is bound to be very time-consuming. We therefore need to make rrWebplayer non-reactive, so that Vue does not traverse it. There are three ways to do this:

First, do not pre-define the property in the data option; instead, assign this.rrWebplayer dynamically after the component instance is created (no dependency collection in advance, no recursive reactivity).

Second, pre-define the property in the data option, but pass the object through Object.freeze before assigning it (Vue skips reactive processing for frozen objects).

Third, define the data outside the component instance as a module-private variable (watch out for memory leaks with this approach: Vue does not destroy this state when the component is unmounted).
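To see why the Object.freeze option helps, the sketch below uses a simplified stand-in for Vue 2's deep observation. observeDeep and makeEvents are hypothetical names written for illustration, not Vue's source code. The sketch assumes, as Vue 2 does, that non-extensible objects are skipped, and it counts how many properties receive a getter and setter.

```javascript
// Simplified model of deep observation; NOT Vue's actual implementation.
function observeDeep(value, stats) {
  if (value === null || typeof value !== 'object') return;
  // Assumption mirrored from Vue 2: frozen (non-extensible) objects are not observed.
  if (!Object.isExtensible(value)) return;
  if (Array.isArray(value)) {
    value.forEach((item) => observeDeep(item, stats));
    return;
  }
  for (const key of Object.keys(value)) {
    let inner = value[key];
    observeDeep(inner, stats);
    Object.defineProperty(value, key, {
      enumerable: true,
      configurable: true,
      get() { return inner; },
      set(next) { inner = next; },
    });
    stats.converted++;
  }
}

// Fake recording data: each event has 3 own properties plus 1 nested property.
const makeEvents = (n) =>
  Array.from({ length: n }, (_, i) => ({ type: 3, timestamp: i, data: { id: i } }));

const plainStats = { converted: 0 };
observeDeep({ player: { events: makeEvents(1000) } }, plainStats);

const frozenStats = { converted: 0 };
observeDeep({ player: Object.freeze({ events: makeEvents(1000) }) }, frozenStats);

console.log(plainStats.converted, frozenStats.converted);
```

Object.freeze is shallow, but that is enough here: the traversal stops at the frozen root and never reaches the nested events.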

Here we use the third method and change rrWebplayer to non-reactive data:

let rrWebplayer = null;

export default {
  //...
  methods: {
    replayRRweb(eventsRes) {
      rrWebplayer = new rrwebPlayer({
        target: document.getElementById('replayer'),
        props: {
          events: eventsRes,
          unpackFn: unpack,
          // ...
        }
      })
    }
  }
}

After reloading the page, it still freezes, but the freeze has shortened significantly, to under 5 seconds. In the flame graph, the recursive reactivity calls have disappeared from under the replayRRweb call stack.

3.2 Use Time Slicing to Solve the Time-consuming Problem of Playback File Loading

For users, however, this is still unacceptable, so let's keep looking for where the time goes.

The problem is still inside the replayRRweb function, and the expensive step turns out to be unpack, which decompresses the recorded data.

So how do we deal with the cost of unpacking?

Since rrweb playback needs to operate on the DOM, it must run on the main thread; a worker thread cannot be used because it has no access to the DOM API. For long tasks on the main thread, the natural approach is time slicing: split the long task into small tasks, schedule them through the event loop, and run them when the main thread is idle and the current frame has time to spare; otherwise, let the browser render the next frame. With the plan settled, the remaining questions are which API to use and how to split the task.

Some readers may ask: why not run unpack in a worker thread, decompress the data there, and hand it back to the main thread for loading and playback, so that nothing blocks? On reflection, if unpack runs in a worker, the main thread still has to wait for decompression to finish before playback can start, which is not essentially different from unpacking on the main thread. Worker threads only offer a performance advantage when there are several tasks to run in parallel.

When it comes to time slicing, many readers will think of the requestIdleCallback API. requestIdleCallback runs tasks in the idle time left over after the browser renders a frame, so it does not block page rendering or UI events. It was designed for the case where a task occupies the main thread for a long time and higher-priority work (such as animations or event handlers) cannot respond in time, causing dropped frames. requestIdleCallback is therefore intended for tasks that are neither important nor urgent.

requestIdleCallback does not run at the end of every frame; it runs only when the rendering work finishes within the 16.6ms frame and there is still time left. Because the next frame has to be rendered once the callback returns, each requestIdleCallback tick should not run for more than 30ms. If control is not returned to the browser for a long time, rendering of the next frame is delayed, the page stutters, and events are handled late.

requestIdleCallback parameters:

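// Accepts a callback task
type RequestIdleCallback = (cb: (deadline: Deadline) => void, options?: Options) => number
// Argument passed to the callback
type Deadline = {
  timeRemaining: () => number // Time still available, i.e. what is left of the current frame.
  didTimeout: boolean // Whether the callback is running because its timeout expired.
}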

We can write a simple demo with requestIdleCallback:

// 10,000 tasks, written with an ES2021 numeric separator
const unit = 10_000;
// Work done by a single task
const onOneUnit = () => {
    for (let i = 0; i <= 500_000; i++) {}
}
// Time reserved for each task: 1ms
const FREE_TIME = 1;
// Index of the next task to run
let _u = 0;

function cb(deadline) {
    // While tasks remain and the frame still has more than 1ms of idle time
    while (_u < unit && deadline.timeRemaining() > FREE_TIME) {
        onOneUnit();
        _u++;
    }
    // All tasks are done
    if (_u >= unit) return;
    // Tasks remain: wait for the next idle period
    window.requestIdleCallback(cb)
}

window.requestIdleCallback(cb)

requestIdleCallback looks perfect, but can it be used directly in a real business scenario? The answer is no. According to the MDN documentation, requestIdleCallback is still an experimental API, and browser support is limited.

caniuse leads to a similar conclusion: no version of IE supports it, and Safari does not enable it by default.

There is another problem: requestIdleCallback fires at an unstable rate that depends on many factors. In our tests it reached only about 20 FPS, whereas rendering a frame normally takes 16.67ms (60 FPS).

To solve these problems, the React Fiber architecture implements its own requestIdleCallback-like mechanism:

Use requestAnimationFrame to get the start time of a frame, and from it compute the deadline of the current frame;
Use performance.now() for microsecond-level high-precision timestamps when computing the time remaining in the current frame;
Use MessageChannel to schedule tasks as zero-delay macro tasks, because setTimeout() has a minimum delay, usually 4ms.

According to the above ideas, we can simply implement a requestIdleCallback as follows:

// Deadline of the current frame
let deadlineTime;
// Pending callback task
let callback;
// Schedule work as macro tasks
const channel = new MessageChannel();
const port1 = channel.port1;
const port2 = channel.port2;
// Receive and run the macro task
port2.onmessage = () => {
  // Time left in the current frame
  const timeRemaining = () => deadlineTime - performance.now();
  const _timeRemain = timeRemaining();
  if (!callback) return;
  if (_timeRemain > 0) {
    // The frame still has idle time: run the callback.
    // This simple version has no timeout option, so didTimeout is always false.
    callback({ timeRemaining, didTimeout: false });
  } else {
    // No idle time left in this frame: retry in the next frame instead of dropping the task
    window.requestIdleCallback(callback);
  }
};
window.requestIdleCallback = function (cb) {
  requestAnimationFrame((rafTime) => {
    // Deadline = frame start time + one frame (16.667ms)
    deadlineTime = rafTime + 16.667;
    // Save the task
    callback = cb;
    // Post a macro task
    port1.postMessage(null);
  });
};

In the project, we also needed an API fallback and the ability to cancel tasks (the code above is deliberately simple: it can only add a task, not cancel one), so in the end we used the official React source code for the implementation.
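To make the cancellation requirement concrete, here is a minimal sketch of a cancelable scheduler. It is not React's scheduler; createScheduler and scheduleMacroTask are hypothetical names, and the default macro task uses setTimeout only so that the sketch runs anywhere. The work function returns true while more work remains.

```javascript
// Hypothetical factory. `scheduleMacroTask` is injectable so the behavior can be tested.
function createScheduler(scheduleMacroTask = (fn) => setTimeout(fn, 0)) {
  // The task currently allowed to run, or null when idle or cancelled.
  let currentTask = null;

  function flush(task) {
    // A cancelled or superseded task must not run again.
    if (task !== currentTask) return;
    const hasMoreWork = task.work();
    // The task may have been cancelled from inside work().
    if (task !== currentTask) return;
    if (hasMoreWork) {
      scheduleMacroTask(() => flush(task));
    } else {
      currentTask = null;
      if (task.onDone) task.onDone();
    }
  }

  return {
    requestHostCallback(work, onDone) {
      const task = { work, onDone };
      currentTask = task;
      scheduleMacroTask(() => flush(task));
    },
    cancelHostCallback() {
      currentTask = null;
    },
  };
}

// Demo with a manual queue standing in for the event loop.
const queue = [];
const scheduler = createScheduler((fn) => queue.push(fn));
const runAll = () => { while (queue.length) queue.shift()(); };

let slices = 0;
let finished = false;
scheduler.requestHostCallback(() => ++slices < 3, () => { finished = true; });
runAll();

let cancelledRuns = 0;
scheduler.requestHostCallback(() => { cancelledRuns++; return true; });
queue.shift()();                 // run exactly one slice
scheduler.cancelHostCallback();  // then cancel the rest
runAll();
```

Comparing task identity rather than keeping a boolean flag means that a newly requested task automatically supersedes an older one that is still queued.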

With the API question settled, what remains is how to split the task.

According to the rrweb documentation, the rrwebPlayer instance provides an addEvent method for adding playback data dynamically, which is intended for scenarios such as live streaming. Following this idea, we can split the recording data into segments and call addEvent repeatedly to add them.

import { requestHostCallback, cancelHostCallback } from "@/utils/SchedulerHostConfig";

// Module-private, non-reactive player instance (see 3.1)
let rrWebplayer = null;

export default {
  // ...
  methods: {
    replayRRweb(eventsRes = []) {
      // Slice size
      const PACKAGE_SIZE = 100;
      // Total number of recorded events
      const LEN = eventsRes.length;
      // Number of slices
      const SLICE_NUM = Math.ceil(LEN / PACKAGE_SIZE);
      rrWebplayer = new rrwebPlayer({
        target: document.getElementById("replayer"),
        props: {
          // Preload the first slice
          events: eventsRes.slice(0, PACKAGE_SIZE),
          unpackFn: unpack,
        },
      });
      // Cancel any previously scheduled task first
      cancelHostCallback();
      const cb = () => {
        // Index of the next slice (slice 0 was preloaded)
        let _u = 1;
        return () => {
          // Work done in one run.
          // A plain for loop is used because forEach cannot start from the middle of an array.
          for (let j = _u * PACKAGE_SIZE; j < (_u + 1) * PACKAGE_SIZE; j++) {
            if (j >= LEN) break;
            rrWebplayer.addEvent(eventsRes[j]);
          }
          _u++;
          // Return true while slices remain
          return _u < SLICE_NUM;
        };
      };
      requestHostCallback(cb(), () => {
        // Called when loading has finished
      });
    },
  },
};

Note that the load-complete callback (the second argument of requestHostCallback) is not provided by the React source code; I added it by modifying the source myself.

With this in place, we reloaded the student playback page, and the lag was almost undetectable. We then loaded a large 20 MB file and looked at the flame graph: the file-loading work has been split into small tasks, each taking about 10-20ms, and the main thread is no longer obviously blocked.

The page still stuttered a little after this optimization, because with a slice size of 100 each task is still heavy: we observed only a dozen or so FPS. We reduced the slice size to 10, after which loading was clearly smooth and the frame rate generally stayed above 50 FPS, at the cost of a slightly longer total loading time. Time slicing keeps the page from freezing, but loading a recording still takes several seconds on average, and around ten seconds for some large files. We therefore show a loading indicator while this work runs, so that users do not start playback before the recording has finished loading.
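The slice size above had to be tuned by hand (100, then 10). One possible alternative, shown only as a sketch and not what the project does, is to bound each slice by elapsed time instead of by item count. createTimeBudgetedTask, budgetMs and now are hypothetical names; the returned function follows the same contract as the callback above and returns true while work remains.

```javascript
// Hypothetical helper: process items until a per-slice time budget is used up.
function createTimeBudgetedTask(items, handleItem, options = {}) {
  const { budgetMs = 5, now = () => performance.now() } = options;
  // Index of the next item to process, kept across slices.
  let index = 0;
  return function runSlice() {
    const sliceStart = now();
    while (index < items.length && now() - sliceStart < budgetMs) {
      handleItem(items[index]);
      index++;
    }
    // True while items remain, so the scheduler knows to call again.
    return index < items.length;
  };
}

// Demo with a fake clock that advances 1ms on every reading, so the run is deterministic.
let fakeTime = 0;
const handled = [];
const runSlice = createTimeBudgetedTask(
  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
  (item) => handled.push(item),
  { budgetMs: 5, now: () => fakeTime++ }
);
let sliceCount = 1;
while (runSlice()) sliceCount++;
console.log(sliceCount, handled.length);
```

In the component, a function like this could be passed as the first argument of requestHostCallback, with handleItem calling rrWebplayer.addEvent. The budget value would still need measuring on real recordings.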

Some readers may ask: if there is a loading indicator anyway, why bother with time slicing? Without time slicing, the script occupies the main thread continuously and blocks UI rendering, so the loading animation would never be displayed. Only by yielding the main thread through time slicing can higher-priority work (such as UI rendering and page interaction events) run, which gives the loading animation a chance to appear.

Advanced Optimization

Time slicing is not without drawbacks: as mentioned above, the total loading time becomes slightly longer. Fortunately, 10-20 MB recording files only appeared in test scenarios. The files teachers actually record are below 10 MB, and in our tests they load in about 2s, so students do not have to wait long.

What if recording files become very large later on? Earlier we decided not to run unpack in a worker thread, because the main thread would have to wait for the worker to finish, which is no different from running it on the main thread. Inspired by time slicing, however, we could also split the unpack work into slices, start multiple worker threads based on navigator.hardwareConcurrency (one thread per logical CPU core), and run unpack in parallel. Making use of multi-core CPUs in this way should noticeably speed up the loading of recording files.
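The partitioning step of that idea can be sketched as follows. This is an untested direction, not something the project has implemented. splitIntoChunks, unpackInParallel, defaultWorkerCount and unpackChunk are hypothetical names; in a real setup unpackChunk would post a chunk to a Web Worker and resolve with the unpacked events, and the fallback of 4 threads is an assumption.

```javascript
// Split items into at most `chunkCount` contiguous chunks of roughly equal size.
function splitIntoChunks(items, chunkCount) {
  const count = Math.max(1, Math.min(chunkCount, items.length || 1));
  const size = Math.ceil(items.length / count);
  const chunks = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// One chunk per logical core where the browser reports it; 4 is an assumed fallback.
function defaultWorkerCount() {
  return typeof navigator !== 'undefined' && navigator.hardwareConcurrency
    ? navigator.hardwareConcurrency
    : 4;
}

// `unpackChunk` is a hypothetical async function supplied by the caller.
// Promise.all keeps the chunk order, so the events come back in recording order.
async function unpackInParallel(packedEvents, unpackChunk, workerCount = defaultWorkerCount()) {
  const chunks = splitIntoChunks(packedEvents, workerCount);
  const results = await Promise.all(chunks.map((chunk) => unpackChunk(chunk)));
  return results.flat();
}

// Demo of the partitioning only.
const demoChunks = splitIntoChunks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4);
console.log(demoChunks.map((chunk) => chunk.length));
```

Keeping chunks contiguous matters here because playback depends on event order; an interleaved split would require re-sorting by timestamp afterwards.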

Summary

In this article, we used the flame graph in the Performance panel to analyze the call stack and execution time, and identified two causes of the freeze: Vue recursively making a complex object reactive, and the loading of the recording file.

For the cost of Vue recursively making complex objects reactive, the solution is to turn the object into non-reactive data. For the cost of loading the recording file, the solution is time slicing.

Because of requestIdleCallback's compatibility problems and unstable trigger frequency, this article looked at how requestIdleCallback-style scheduling can be implemented with reference to the React 17 source code, and we ultimately used the React source code to implement time slicing. In our tests, the page froze for about 20s before optimization; after optimization the freeze is no longer noticeable and the frame rate exceeds 50 FPS. The trade-off is that loading the recording file takes slightly longer. The next optimization direction is to slice the unpack process and run it in parallel on multiple threads to make full use of multi-core CPUs.

References
· vue-9-perf-secrets
· React Fiber hard? Six questions to help you understand
· requestIdleCallback - MDN
· requestIdleCallback - caniuse
· implements React requestIdleCallback scheduling capability