To quote the slogan of the group's monitoring team: those who care about business stability will not have bad luck.
Background
It is hard to say when the front-end white screen became such a common topic. "White screen" has even become synonymous with front-end bugs: _Hello, your page is white._ The "white" symptom is also very noticeable to users, much like the "blue screen" shown when Windows crashes.
The two are very similar, which probably explains how the term "white screen" came about. A symptom this visible is bound to affect users badly, so detecting it as early as possible and eliminating its impact quickly becomes very important.
Why monitor the white screen separately
The goal is not only the white screen. A white screen is just a symptom; what we really need is fine-grained exception monitoring. Every company has its own exception monitoring system, and the group is no exception; its system is mature. However, a general-purpose scheme always has shortcomings. If every exception triggers the same alarm, it is impossible to tell how severe each one is and respond accordingly. Customized, fine-grained exception monitoring built on top of the general monitoring system is therefore very necessary. That is why this article discusses the white screen scenario: I draw the boundary of the scenario around the "white screen" symptom.
Survey of approaches
There are roughly two possible causes of a white screen:
- Errors during JS execution
- Resource errors
The two point in different directions. Resource errors affect a wider area and have to be handled case by case, so they are outside the scope of the approaches below. With that in mind, I looked at some practices found online, added some research of my own, and summarized the following approaches:
1. onerror + DOM detection
The principle is very simple. With today's mainstream SPA frameworks, the DOM is generally mounted under a root node (such as <div id="root"></div>). When a white screen occurs, the usual symptom is that all DOM under the root node has been unmounted. This approach listens for the global onerror event and, when an exception occurs, checks whether any DOM is mounted under the root node. If not, the screen is considered blank.
I think this is a simple, brute-force, and effective approach. But it has a shortcoming: it rests on the premise that **white screen === DOM under the root node is unmounted**, which does not always hold, for example with some micro-frontend frameworks. It also naturally conflicts with the approach I finally chose, as I will explain later.
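A minimal sketch of this approach might look like the following. The names isRootEmpty and installWhiteScreenCheck, the '#root' selector, and the shape of the report object are my own assumptions for illustration. I use addEventListener('error') as the equivalent of assigning window.onerror, and the window object is passed in so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: true when the root container has no rendered child elements.
function isRootEmpty(doc, rootSelector) {
  const root = doc.querySelector(rootSelector);
  // A missing root or a root without child elements is treated as blank.
  return !root || root.childElementCount === 0;
}

// Hypothetical installer: after every uncaught error, check whether the root is empty.
function installWhiteScreenCheck(win, options) {
  const {
    rootSelector = '#root',
    report,
    // Defer the check so the framework can finish unmounting first.
    schedule = (fn) => setTimeout(fn, 0),
  } = options;
  const onError = (event) => {
    schedule(() => {
      if (isRootEmpty(win.document, rootSelector)) {
        report({ type: 'white-screen', message: event.message });
      }
    });
  };
  win.addEventListener('error', onError);
  // Return a function that removes the listener again.
  return () => win.removeEventListener('error', onError);
}
```

In a browser this would be wired up as installWhiteScreenCheck(window, { report: console.warn }). Note that it inherits the premise discussed above: an empty root is assumed to mean a white screen.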
2. MutationObserver API
If you are not familiar with it, you can read the documentation.
In essence, it observes DOM changes and tells you whether each change added or removed nodes. I considered several ways of using it:
- Use it together with onerror, similar to the first approach. I rejected this quickly: although it shows the trend of DOM changes well, it cannot be tied to a specific error. Both are event listeners, and there is no necessary connection between the two.
- Use it alone to determine whether a large amount of DOM has been unmounted. Disadvantages: a white screen does not necessarily mean that DOM was unmounted, since the page may never have rendered at all, and a large amount of DOM can also be unmounted under normal circumstances. This does not work at all.
- Use only its timing as the trigger for DOM detection. This has the same disadvantages as approach 1, and I think it is worse than approach 1 because it cannot be tied to a specific error, which means the problem cannot be located. When I discussed this with teammates, they suggested another direction: tracking user behavior data to locate problems. I think that is also an option.
At first I thought this was the final answer, but after a long internal struggle I rejected it. It does, however, offer a better choice of monitoring timing.
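To make the "added or removed" information concrete, here is a rough sketch. summarizeMutations is a hypothetical helper of mine; it only relies on the addedNodes and removedNodes fields of each mutation record. The browser wiring at the bottom is guarded so the snippet also loads where MutationObserver is not available.

```javascript
// Hypothetical helper: sum added and removed nodes over a batch of MutationRecord-like objects.
function summarizeMutations(records) {
  let added = 0;
  let removed = 0;
  for (const record of records) {
    added += record.addedNodes.length;
    removed += record.removedNodes.length;
  }
  return { added, removed, net: added - removed };
}

// Browser-only wiring: log whenever a batch of changes shrinks the DOM.
if (typeof MutationObserver !== 'undefined' && typeof document !== 'undefined') {
  const observer = new MutationObserver((records) => {
    const { net } = summarizeMutations(records);
    if (net < 0) {
      console.log('DOM shrank by', -net, 'nodes');
    }
  });
  observer.observe(document.documentElement, { childList: true, subtree: true });
}
```

As discussed above, this only shows the direction of the change. Nothing in a mutation record links the change to the error that caused it.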
3. Ele.me Emonitor white screen monitoring solution
Ele.me's white screen monitoring solution records the change in HTML length between the moment a page is opened and 4 seconds later, and uploads the data to Ele.me's self-developed time series database. If a page is stable, the distribution of its length changes should follow a "power-law" curve, and the p10 and p20 data lines (the values ranked at the first 10% and 20% of the samples) should be stable, fluctuating within a certain range. If the page becomes abnormal, the curve will drop to the bottom.
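I do not know how Emonitor computes its data lines, so the following is only my own sketch of the idea under stated assumptions: htmlLengthDelta and percentile are hypothetical helpers, and I assume a simple nearest-rank percentile over the collected length changes.

```javascript
// Hypothetical: length change between two HTML snapshots taken a few seconds apart.
function htmlLengthDelta(htmlAtOpen, htmlAfterDelay) {
  return htmlAfterDelay.length - htmlAtOpen.length;
}

// Nearest-rank percentile (p between 0 and 100) over a list of numbers.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Summarize one time window of samples into the two lines mentioned above.
function summarizeWindow(deltas) {
  return { p10: percentile(deltas, 10), p20: percentile(deltas, 20) };
}
```

On a healthy page most samples grow by a similar amount, so both values stay in a band. If many page views stop rendering, a large share of the deltas is close to zero and the low percentiles collapse with them.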
Others
Other solutions are variations on the same theme. After a round of research, I found that they all come down to two points.
Monitoring timing: from my survey, there are three common choices:
- onerror
- MutationObserver API
- Polling
DOM detection: there are many options here. Besides those mentioned above, you can also use:
- elementsFromPoint API sampling
- Image recognition
- Various recognition algorithms based on DOM data
- ...
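As an illustration of the elementsFromPoint sampling idea in the list above, the sketch below probes points along the two center lines of the viewport and counts how many of them hit nothing but a wrapper element. blankRatio, the number of samples, and the wrapper selectors are assumptions of mine; only document.elementsFromPoint and Element.matches are real DOM APIs.

```javascript
// Hypothetical sampler: fraction of probe points whose topmost element is only a wrapper.
function blankRatio(doc, width, height, options = {}) {
  const { samples = 9, wrappers = ['html', 'body', '#root'] } = options;
  let blank = 0;
  let total = 0;
  for (let i = 1; i <= samples; i++) {
    // One point on the horizontal center line and one on the vertical center line.
    const points = [
      [(width * i) / (samples + 1), height / 2],
      [width / 2, (height * i) / (samples + 1)],
    ];
    for (const [x, y] of points) {
      // elementsFromPoint returns the elements under the point, topmost first.
      const top = doc.elementsFromPoint(x, y)[0];
      total += 1;
      if (!top || wrappers.some((selector) => top.matches(selector))) {
        blank += 1;
      }
    }
  }
  return blank / total;
}
```

In a browser it could be called as blankRatio(document, window.innerWidth, window.innerHeight), treating a ratio close to 1 as a probable white screen. Like the other detection methods, it observes the symptom rather than the cause.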
Changing direction
After several attempts, I found almost nothing I was satisfied with. The main reason was accuracy: none of these approaches can guarantee that what they detect is really a white screen, and theoretical derivation alone is not convincing. They all have one thing in common: what they monitor is the "white screen" symptom. Deriving the cause from the symptom can work, but it is not accurate enough. What I really want to monitor is the cause of the white screen.
So, back to the beginning: what is a white screen, and what causes it? Is the browser unable to render because of an error? No. The white screen that is so common with SPA frameworks is actually caused by the framework itself. When an error occurs, the framework does not know how to render, so it simply renders nothing. Since most of our team uses the React stack, let's look at a passage from the official React documentation: "As of React 16, errors that were not caught by any error boundary will result in unmounting of the whole React component tree."
React's view is that leaving a corrupted UI in place is worse than removing it completely. Whether or not that view is right, at least we now know the cause of the white screen: the rendering process threw an exception, and we did not catch and handle it.
The same holds for other mainstream frameworks: we delegate DOM operations to the framework, so the way rendering exceptions are handled differs from framework to framework. This is probably why white screen monitoring is hard to unify and turn into a product. The general direction, however, is the same.
So I think a white screen can be defined as: a rendering failure caused by an exception.
The white screen monitoring approach then becomes: monitor rendering exceptions. For React, the answer is Error Boundaries.
Error Boundaries
We can call it the error boundary. What is an error boundary? It is really a pair of lifecycle methods that catch errors thrown while the children of the current component render, and that can return a fallback UI to render instead:
class ErrorBoundary extends React.Component {
  constructor(props) {
    super(props);
    this.state = { hasError: false };
  }
  static getDerivedStateFromError(error) {
    // Update state so that the next render shows the fallback UI
    return { hasError: true };
  }
  componentDidCatch(error, errorInfo) {
    // We can report the error log to the server here
    logErrorToMyService(error, errorInfo);
  }
  render() {
    if (this.state.hasError) {
      // We can render any custom fallback UI
      return <h1>Something went wrong.</h1>;
    }
    return this.props.children;
  }
}
A responsible developer will not leave errors unhandled. An error boundary can wrap any part of the tree and provide a fallback UI, so as long as the developer is "responsible", the page will never go completely white. This is what I meant earlier when I said that approach 1 naturally conflicts with this one, and it is also why the other approaches are unreliable.
So we report the exception at this point. An exception reported here is certain to lead to the white screen as we defined it, and this derivation is 100% correct.
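What gets reported is up to each team. As a rough sketch, the payload built inside componentDidCatch could look like this. buildWhiteScreenReport and its field names are hypothetical; the only React-specific fact it relies on is that errorInfo carries a componentStack string.

```javascript
// Hypothetical payload builder, meant to be called from componentDidCatch(error, errorInfo).
function buildWhiteScreenReport(error, errorInfo, pageUrl) {
  return {
    type: 'white-screen',
    // Fall back to a string form when the thrown value is not an Error object.
    message: error && error.message ? error.message : String(error),
    stack: error && error.stack ? error.stack : null,
    // The component stack of the failing subtree, as provided by React.
    componentStack: errorInfo && errorInfo.componentStack ? errorInfo.componentStack : null,
    url: pageUrl,
    timestamp: Date.now(),
  };
}
```

The resulting object can then be handed to whatever transport the monitoring system uses, for example the logErrorToMyService placeholder in the component above.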
Saying "100%" may sound irresponsible, so let's look at why I say this derivation is 100% accurate:
React rendering process
Let's briefly review what React does to get from code to a rendered page.
I roughly divide it into several stages: render => task scheduling => work loop => commit => display
Let's use a simple example to walk through the whole process (task scheduling is outside the scope of this discussion, so it is not shown):
const App = ({ children }) => (
  <>
    <p>hello</p>
    { children }
  </>
);
const Child = () => <p>I'm child</p>
const a = ReactDOM.render(
  <App><Child/></App>,
  document.getElementById('root')
);
Preparation
First of all, the browser does not understand JSX syntax, so we compile it with babel and get roughly the following code:
var App = function App(_ref2) {
  var children = _ref2.children;
  // The <>...</> fragment compiles to React.Fragment
  return React.createElement(React.Fragment, null, React.createElement("p", null, "hello"), children);
};
var Child = function Child() {
  return React.createElement("p", null, "I'm child");
};
ReactDOM.render(React.createElement(App, null, React.createElement(Child, null)), document.getElementById('root'));
The babel plugin converts all JSX into createElement calls. Executing one returns a description object, a ReactElement, like this:
{
  $$typeof: Symbol(react.element),
  key: null,
  props: {}, // the second argument of createElement; note that children also live here, and children are themselves a ReactElement or an array
  type: 'h1' // the first argument of createElement: either a native tag string or a component object (Function, Class...)
}
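To see how such a description object comes about, here is a heavily simplified stand-in for createElement. It is my own sketch, not React's implementation: it skips validation, default props and key normalization, and assumes the Symbol.for('react.element') marker shown in the object above.

```javascript
// Simplified stand-in for React.createElement, for illustration only.
const REACT_ELEMENT_TYPE = Symbol.for('react.element');

function createElement(type, config, ...children) {
  // key and ref are pulled out of the config; everything else becomes props.
  const { key = null, ref = null, ...props } = config || {};
  if (children.length === 1) {
    props.children = children[0]; // a single child is stored as is
  } else if (children.length > 1) {
    props.children = children; // several children are stored as an array
  }
  return { $$typeof: REACT_ELEMENT_TYPE, type, key, ref, props };
}

// The compiled example, using the stand-in instead of React.createElement.
const Child = () => createElement('p', null, "I'm child");
const App = ({ children }) => [createElement('p', null, 'hello'), children];
const element = createElement(App, null, createElement(Child, null));
```

Here element.type is the App function and element.props.children is the description object for Child, which matches the nesting of <App><Child /></App>.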
Every node, including native ones such as <a></a> and <p></p>, gets a FiberNode, whose structure looks like this:
FiberNode = {
  elementType: null, // the first argument passed to createElement
  key: null,
  type: HostRoot, // node type (root node, function component, class component, etc.)
  return: null, // parent FiberNode
  child: null, // first child FiberNode
  sibling: null, // next sibling FiberNode
  flags: null, // status flags
}
You can think of it as a virtual DOM node with a lot of scheduling information attached. At the start we create a FiberNodeRoot for the root node. If there is one and only one ReactDOM.render call, then it is the only root, and there is one and only one FiberNode tree.
I only keep the fields that matter for the rendering process. There are many other fields used for scheduling and bookkeeping that I will not list here; if you are interested, you can look them up yourself.
Render
Now we start rendering the page. In our example, that means executing ReactDOM.render. There is a global workInProgress object that marks the FiberNode currently being processed.
1. First, we initialize a FiberNodeRoot for the root node, with the structure shown above, and set workInProgress = FiberNodeRoot.
2. Next, we evaluate the first argument of ReactDOM.render and get a ReactElement:
ReactElement = {
  $$typeof: Symbol(react.element),
  key: null,
  props: {
    children: {
      $$typeof: Symbol(react.element),
      key: null,
      props: {},
      ref: null,
      type: ƒ Child(),
    }
  },
  ref: null,
  type: ƒ App()
}
This structure describes <App><Child /></App>.
3. We generate a FiberNode for the ReactElement, point its return at the parent FiberNode (initially our root), and set workInProgress = FiberNode:
{
  elementType: ƒ App(), // the type is the App function
  key: null,
  type: FunctionComponent, // function component
  return: FiberNodeRoot, // our root node
  child: null,
  sibling: null,
  flags: null
}
4. As long as workInProgress exists, we have to process the FiberNode it points to. There are many node types, each handled differently, but the overall flow is the same. Taking the current function component as an example, we directly call App(props). There are two cases:
   - The component returns a single node, that is, one ReactElement object. Repeat steps 3-4, point the child of the current node at the child node (CurrentFiberNode.child = ChildFiberNode), and point the return of the child node at the current node (ChildFiberNode.return = CurrentFiberNode).
   - The component returns multiple nodes (an array or a Fragment), so we get an array of ChildFiberNode. We loop over it and run steps 3-4 for each node. The child of the current node points to the first child node (CurrentFiberNode.child = ChildFiberNodeList[0]), the sibling of each child node points to the next child node, if any (ChildFiberNode[i].sibling = ChildFiberNode[i + 1]), and the return of each child node points to the current node (ChildFiberNode[i].return = CurrentFiberNode).
5. If no exception occurs, each node is marked as pending placement: FiberNode.flags = Placement
6. Repeat these steps until every node has been processed and workInProgress is empty.
In the end we get a FiberNode tree roughly like this:
FiberNodeRoot = {
  elementType: null,
  type: HostRoot,
  return: null,
  child: FiberNode<App>,
  sibling: null,
  flags: Placement, // pending placement
}
FiberNode<App> {
  elementType: ƒ App(),
  type: FunctionComponent,
  return: FiberNodeRoot,
  child: FiberNode<p>,
  sibling: null,
  flags: Placement // pending placement
}
FiberNode<p> {
  elementType: 'p',
  type: HostComponent,
  return: FiberNode<App>,
  sibling: FiberNode<Child>,
  child: null,
  flags: Placement // pending placement
}
FiberNode<Child> {
  elementType: ƒ Child(),
  type: FunctionComponent,
  return: FiberNode<App>,
  sibling: null,
  child: null, // the <p> rendered by Child is omitted here for brevity
  flags: Placement // pending placement
}
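The linking rules from steps 3-4 can be sketched on plain objects. This only illustrates the child / sibling / return pointers and is not React's reconciler: buildFiberTree, createFiber and childElementsOf are names I made up, the work is done recursively instead of through workInProgress, text children are ignored, and the <p> rendered by Child gets its own node here.

```javascript
// Simplified fiber construction following the steps above; not React's real algorithm.
function createFiber(element, returnFiber) {
  return { elementType: element.type, return: returnFiber, child: null, sibling: null, flags: 'Placement' };
}

function childElementsOf(element) {
  // Function components are executed; host nodes expose their children through props.
  const output = typeof element.type === 'function'
    ? element.type(element.props)
    : element.props.children;
  // Normalize to a flat array and keep only element objects (strings are skipped).
  return [].concat(output ?? []).flat(Infinity).filter((child) => child && typeof child === 'object');
}

function buildFiberTree(element, returnFiber = null) {
  const fiber = createFiber(element, returnFiber);
  let previous = null;
  for (const childElement of childElementsOf(element)) {
    const childFiber = buildFiberTree(childElement, fiber);
    if (previous === null) {
      fiber.child = childFiber; // the first child hangs off `child`
    } else {
      previous.sibling = childFiber; // later children are chained through `sibling`
    }
    previous = childFiber;
  }
  return fiber;
}

// Tiny element factory so the example runs without React.
const el = (type, props = {}, ...children) => ({ type, props: { ...props, children } });
const Child = () => el('p', {}, "I'm child");
const App = ({ children }) => [el('p', {}, 'hello'), children];
const appFiber = buildFiberTree(el(App, {}, el(Child)));
```

appFiber.child is the node for <p>hello</p>, its sibling is the node for Child, and both point back to appFiber through return.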
Commit phase
Put simply, the commit phase takes this tree, traverses it depth-first in child => sibling order, places the DOM nodes, and calls the lifecycle methods.
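The child => sibling order can be shown with a small iterative walk over any tree that uses the child, sibling and return pointers. walkFibers is a hypothetical helper; the real commit phase does far more than visit nodes, so treat this purely as a picture of the traversal order.

```javascript
// Depth-first walk: visit a node, then its child, then its siblings.
function walkFibers(root, visit) {
  let node = root;
  while (node !== null) {
    visit(node);
    if (node.child !== null) {
      node = node.child;
      continue;
    }
    // No child: climb until a sibling is available or we are back at the root.
    while (node !== root && node.sibling === null) {
      node = node.return;
    }
    if (node === root) return;
    node = node.sibling;
  }
}

// Hand-built tree mirroring the example: App -> p, Child.
const app = { name: 'App', return: null, child: null, sibling: null };
const p = { name: 'p', return: app, child: null, sibling: null };
const child = { name: 'Child', return: app, child: null, sibling: null };
app.child = p;
p.sibling = child;

const order = [];
walkFibers(app, (fiber) => order.push(fiber.name));
```

The walk needs no recursion and no stack, because the return pointer already records the way back up.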
That, in brief, is the whole normal rendering process. Next, let's look at exception handling.
Error boundary process
We have just seen the normal process. Now let's introduce an error and catch it:
const App = ({ children }) => (
  <>
    <p>hello</p>
    { children }
  </>
);
const Child = () => <p>I'm child {a.a}</p>
const a = ReactDOM.render(
  <App>
    <ErrorBoundary><Child/></ErrorBoundary>
  </App>,
  document.getElementById('root')
);
The function body that performs step 4 is wrapped in try...catch. If an exception is caught, it goes through the exception flow:
do {
  try {
    workLoopSync(); // step 4 above
    break;
  } catch (thrownValue) {
    handleError(root, thrownValue);
  }
} while (true);
While performing step 4, we call the Child function. Because it contains the invalid expression {a.a}, an exception is thrown and we enter the handleError flow. At this point the node being processed is FiberNode<Child>. Let's take a look at handleError:
function handleError(root, thrownValue): void {
  let erroredWork = workInProgress; // the FiberNode being processed, i.e. the node that threw
  throwException(
    root, // our root FiberNode
    erroredWork.return, // the parent node
    erroredWork,
    thrownValue, // the thrown value
  );
  completeUnitOfWork(erroredWork);
}
function throwException(
  root: FiberRoot,
  returnFiber: Fiber,
  sourceFiber: Fiber,
  value: mixed,
) {
  // The source fiber did not complete.
  sourceFiber.flags |= Incomplete;
  let workInProgress = returnFiber;
  do {
    switch (workInProgress.tag) {
      case HostRoot: {
        workInProgress.flags |= ShouldCapture;
        return;
      }
      case ClassComponent:
        // Capture and retry
        const ctor = workInProgress.type;
        const instance = workInProgress.stateNode;
        if (
          (workInProgress.flags & DidCapture) === NoFlags &&
          (typeof ctor.getDerivedStateFromError === 'function' ||
            (instance !== null &&
              typeof instance.componentDidCatch === 'function' &&
              !isAlreadyFailedLegacyErrorBoundary(instance)))
        ) {
          workInProgress.flags |= ShouldCapture;
          return;
        }
        break;
      default:
        break;
    }
    workInProgress = workInProgress.return;
  } while (workInProgress !== null);
}
The code is long, so only part of it is excerpted here. Look at the throwException method first. It does two core things:
- Mark the node that failed as incomplete: FiberNode.flags |= Incomplete
- Starting from the parent node, bubble upward looking for a node that is able to handle exceptions (a ClassComponent) and actually does handle them (it declares getDerivedStateFromError or componentDidCatch). If one is found, mark it as to be captured with workInProgress.flags |= ShouldCapture. If none is found, the root node is marked instead.
The completeUnitOfWork method works similarly. It bubbles up from the parent node looking for ShouldCapture. If it finds such a node, it marks it as captured (DidCapture). If it does not, it marks every node as Incomplete all the way up to the root node. Finally, it points workInProgress at the node that captured the error.
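The bubbling in throwException can be reproduced on plain objects. The sketch below keeps only the search for the nearest boundary: markNearestBoundary is a hypothetical name, the flag and tag values are made up for the example, and the DidCapture and legacy boundary checks from the excerpt are left out. It uses the tag field in the same way the excerpt does.

```javascript
// Made-up flag and tag values; React's internal constants differ.
const Incomplete = 1;
const ShouldCapture = 2;
const HostRoot = 'HostRoot';
const ClassComponent = 'ClassComponent';

// Mark the failing fiber, then walk up until something can capture the error.
function markNearestBoundary(sourceFiber) {
  sourceFiber.flags |= Incomplete;
  let fiber = sourceFiber.return;
  while (fiber !== null) {
    const isBoundary =
      fiber.tag === ClassComponent &&
      (typeof fiber.type.getDerivedStateFromError === 'function' ||
        (fiber.stateNode !== null && typeof fiber.stateNode.componentDidCatch === 'function'));
    // The root always captures when no boundary is found below it.
    if (fiber.tag === HostRoot || isBoundary) {
      fiber.flags |= ShouldCapture;
      return fiber;
    }
    fiber = fiber.return;
  }
  return null;
}
```

With an error boundary between the failing node and the root, the boundary is returned and the root is left untouched. Without one, the root itself is marked, which is the case that ends in a white screen.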
After that, the process restarts from the capturing node (or from the root node if nothing captured the error). Because of its state, React renders only its fallback UI. If it has sibling nodes, they continue through the normal process described above. Let's look at the FiberNode tree we finally get for the example above:
FiberNodeRoot = {
  elementType: null,
  type: HostRoot,
  return: null,
  child: FiberNode<App>,
  sibling: null,
  flags: Placement, // pending placement
}
FiberNode<App> {
  elementType: ƒ App(),
  type: FunctionComponent,
  return: FiberNodeRoot,
  child: FiberNode<p>,
  sibling: null,
  flags: Placement // pending placement
}
FiberNode<p> {
  elementType: 'p',
  type: HostComponent,
  return: FiberNode<App>,
  sibling: FiberNode<ErrorBoundary>,
  child: null,
  flags: Placement // pending placement
}
FiberNode<ErrorBoundary> {
  elementType: ƒ ErrorBoundary(),
  type: ClassComponent,
  return: FiberNode<App>,
  child: FiberNode<h1>, // the fallback UI
  flags: DidCapture // captured
}
FiberNode<h1> {
  elementType: 'h1',
  type: HostComponent,
  return: FiberNode<ErrorBoundary>,
  child: null,
  flags: Placement // pending placement
}
If no error boundary is configured, nothing is left under the root node, and naturally no content can be rendered.
OK, I believe the error boundary flow should be clear by now, and you should understand why I said earlier that monitoring through ErrorBoundary is 100% correct. Of course, this 100% means that an exception caught by ErrorBoundary will basically always cause a white screen; it does not mean that it catches every exception that causes a white screen. It does not catch errors in the following scenarios:
- Event handlers
- Asynchronous code
- SSR
- Errors thrown in the error boundary itself
React SSR is designed around streaming: while the server sends the elements it has already processed, it is still generating the rest of the HTML. This means a parent component cannot catch an error from a child component and hide it. In this case, it seems the only option is to wrap every render function in try...catch. Of course, we can use babel or TypeScript to automate this for us, and the end result is similar to ErrorBoundary.
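Wrapping a render function in try...catch can be expressed as a small higher-order function. withRenderGuard, fallback and report are hypothetical names; a babel or TypeScript transform would apply something like this to every render function automatically, which I do not show here.

```javascript
// Hypothetical wrapper: run a render function inside try...catch and fall back on failure.
function withRenderGuard(render, options = {}) {
  const { fallback = null, report = () => {} } = options;
  return function guardedRender(...args) {
    try {
      return render.apply(this, args);
    } catch (error) {
      // Report first, then return the degraded output instead of throwing.
      report(error);
      return typeof fallback === 'function' ? fallback(error) : fallback;
    }
  };
}
```

The wrapped function behaves like the original when nothing throws, and otherwise reports the error and returns the fallback, which is roughly what an error boundary provides on the client.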
Events and asynchronous code happen to work out. Although ErrorBoundary cannot catch exceptions thrown there, those exceptions do not cause a white screen by themselves (if one sets a bad state that indirectly leads to a white screen, the resulting render error is caught after all). This falls outside the responsibility of white screen monitoring and needs other fine-grained monitoring to handle it.
Summary
Finally, to summarize the conclusions of this article:
My definition of a white screen: a rendering failure caused by an exception.
The corresponding solution: resource monitoring + rendering process monitoring.
With today's SPA frameworks, white screen monitoring needs to be tailored to the scenario. Taking React as an example, white screen information can be obtained by monitoring exceptions in the rendering process, which also encourages developers to pay more attention to exception handling. Other frameworks have corresponding ways to deal with this.
Of course, this approach has weaknesses too. Because it is derived from the cause, it cannot cover every white screen scenario. For example, I still need resource monitoring to handle white screens caused by resource errors. No solution is perfect; I am only offering one idea here, and everyone is welcome to discuss it.
Author: ES2049 / Takeshi Kaneshiro
The article can be reprinted at will, but please keep this link to the original text.
You are very welcome to join ES2049 Studio if you are passionate. Please send your resume to caijun.hcj@alibaba-inc.com .