A brief illustration: from entering a URL to the page appearing, what does the browser do?

Many front-end developers have probably wondered about this question: what happens between entering a URL and the page finishing loading?

This question covers a very wide range of topics, and each of them runs deep. How does input travel from the touch screen or keyboard to the CPU? How does it get from the CPU to the operating system kernel? How does it pass from the operating system GUI to the browser? How does the browser send data to the network card? How does the data travel from the local network card to the server? What does the server do once it receives the data? What does the browser do after the server returns data? How does the browser display the page? And so on. Each step involves a large and deep body of knowledge, and it is hard to connect them all.

But for front-end developers, the browser is one of our main tools, and how it displays a page is what we care about most. This article therefore gives a brief walk-through of that process, covering its basic stages.

As the figure above shows, although JavaScript is a single-threaded language, the browser itself is multi-process.

This architecture did not appear overnight; it evolved gradually from the single-process structure of early browsers. The processes of a modern browser are divided by function into the browser process, renderer process, network process, GPU process, cache process, plug-in process, and so on. To better understand how a browser renders a page, we take Chrome, the most widely used browser, as an example and briefly describe what each process does:

Browser process: Responsible for the browser UI, user interaction, and management of child processes, among other things.
Renderer process: Responsible for turning HTML, CSS, and JS into a web page that users can interact with. Rendering engines such as WebKit and Blink, and the V8 JS engine, run in this process.
GPU process: The GPU process was originally introduced to implement 3D CSS effects. Later, both web pages and Chrome's own UI came to be drawn with the GPU, which made the GPU a general requirement, so a dedicated GPU process was added.
Network process: Responsible for loading the network resources of the page.
Plug-in process: Responsible for running plug-ins. Because plug-ins may crash, they are isolated in their own process. Note that plug-ins are not the browser extensions we usually install; plug-ins and extensions are different things.
Cache process: Responsible for caching page resources and cleaning up the cache.

What we need to focus on this time is the renderer process.

Back to the question. When we type into the browser's address bar, the UI thread of the browser process captures the input. If the input is a URL, the UI thread starts a network thread to build the request (we ignore caching for now; caching is another story). The network thread asks DNS to resolve the domain name and then connects to the server to fetch the data. If the input is a keyword, the browser searches for it with the default search engine. Once the data has arrived and passed the security checks, the network thread notifies the UI thread that the data is ready. The UI thread then creates a renderer process to render the page and passes the data to it over an IPC pipe.
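The first decision described above, URL or keyword, can be sketched as follows. This is only an illustration: the function name resolveAddressBarInput, the two regular expressions, and the placeholder search endpoint are assumptions made for this example and are not how Chrome actually classifies input.

```typescript
// Hypothetical sketch: decide whether address-bar input is a URL or a search.
// The heuristic and the search endpoint below are illustrative assumptions.
type Navigation =
  | { kind: "url"; target: string }
  | { kind: "search"; target: string };

const SEARCH_ENDPOINT = "https://search.example/?q="; // placeholder search engine

function resolveAddressBarInput(input: string): Navigation {
  const text = input.trim();
  // Input that already carries a scheme, such as https://, is used as-is.
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(text)) {
    return { kind: "url", target: text };
  }
  // No whitespace and at least one dot: treat it as a host name.
  if (/^[^\s]+\.[^\s]+$/.test(text)) {
    return { kind: "url", target: "https://" + text };
  }
  // Everything else becomes a keyword search with the default engine.
  return { kind: "search", target: SEARCH_ENDPOINT + encodeURIComponent(text) };
}
```

With this heuristic, example.com/docs is treated as a URL, while a phrase containing spaces is sent to the search engine. A bare host name without a dot, such as localhost, would be misclassified as a search, which is one reason a real browser needs more elaborate rules.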

At this point our protagonist, the renderer process, takes the stage!

Parse HTML

The renderer process receives HTML, which must be parsed into a DOM data structure. The rendering engine cannot work with the raw HTML byte stream directly, so the stream has to be transformed into an internal structure it understands. That structure is the DOM, which provides a structured representation of the HTML document. In the rendering engine, the DOM serves three purposes:

From the page's perspective: the DOM is the basic data structure from which the page is generated.
From the JS perspective: the DOM provides an interface for scripts. Through this interface, JS can access the DOM, which lets developers change the structure, style, and content of the document.
From a security perspective: the DOM is the internal data structure that HTML is parsed into. It connects the web page with JS and filters out some unsafe content.

Inside the renderer process, the HTML parser turns HTML into the DOM structure. Note that the HTML parser does not wait for the whole HTML document to load before it starts; it parses as much HTML as has been loaded so far.

So how is the HTML byte stream converted into the DOM?

The process is similar to how V8 parses JS. It starts with lexical analysis: a tokenizer converts the byte stream into tokens, which include tag tokens and text tokens. The HTML parser maintains a token stack. Tokens are pushed onto and popped off the stack in order, each token is turned into a DOM node, and the node is added to the DOM tree.
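The tokenizer and the stack can be illustrated with a minimal sketch. It assumes a tiny subset of HTML (tags without attributes, no comments, no self-closing or implied tags), and the names HtmlToken, DomNode, tokenize, and buildDom are made up for this example; a real HTML parser follows the far more detailed state machines of the HTML specification.

```typescript
// Minimal sketch of "tokenize, then build the tree with a stack".
type HtmlToken =
  | { type: "startTag"; tag: string }
  | { type: "endTag"; tag: string }
  | { type: "text"; value: string };

interface DomNode {
  tag: string; // "#document", "#text", or an element name
  text?: string;
  children: DomNode[];
}

// Lexical analysis: split the input into tag tokens and text tokens.
function tokenize(html: string): HtmlToken[] {
  const tokens: HtmlToken[] = [];
  const pattern = /<\/([a-z0-9]+)>|<([a-z0-9]+)>|([^<]+)/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const endTag = match[1];
    const startTag = match[2];
    const text = match[3];
    if (endTag !== undefined) {
      tokens.push({ type: "endTag", tag: endTag.toLowerCase() });
    } else if (startTag !== undefined) {
      tokens.push({ type: "startTag", tag: startTag.toLowerCase() });
    } else if (text !== undefined && text.trim() !== "") {
      tokens.push({ type: "text", value: text.trim() });
    }
  }
  return tokens;
}

// Tree construction: the top of the stack is the currently open element.
function buildDom(tokens: HtmlToken[]): DomNode {
  const documentNode: DomNode = { tag: "#document", children: [] };
  const stack: DomNode[] = [documentNode];
  for (const token of tokens) {
    const current = stack[stack.length - 1];
    if (current === undefined) break;
    if (token.type === "startTag") {
      const element: DomNode = { tag: token.tag, children: [] };
      current.children.push(element); // attach to the open element
      stack.push(element);            // the new element is now open
    } else if (token.type === "text") {
      current.children.push({ tag: "#text", text: token.value, children: [] });
    } else if (stack.length > 1) {
      stack.pop();                    // an end tag closes the open element
    }
  }
  return documentNode;
}
```

Calling buildDom(tokenize("<div><p>Hello</p></div>")) yields a document node whose only child is a div containing a p with one text node. Because the builder consumes tokens one at a time, it could just as well be fed incrementally while the HTML is still loading.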

As mentioned earlier, generating the DOM can filter out some unsafe content. This is mainly done by XSSAuditor, a security check module in the rendering engine. It monitors security at the lexical level: after the tokenizer produces tokens, XSSAuditor checks whether they reference external scripts, whether they comply with the CSP rules, whether they involve cross-site requests, and so on. If some content does not comply, XSSAuditor blocks the script or the download task.

During construction of the DOM tree, the document object is created first. The DOM tree rooted at document is then modified continuously as new elements are added to it.

Parse CSS

We have parsed the HTML into a DOM tree, but a DOM tree alone does not tell us what the page looks like, because pages almost always define styles. The main thread therefore also parses the CSS in the page to determine the computed style of each DOM node.

There are three main sources of CSS styles:

External CSS files referenced by link
CSS inside <style> tags
CSS embedded in an element's style attribute

Likewise, the browser cannot work directly with CSS in plain-text form. When the rendering engine receives CSS text, the CSS parser parses and converts it, in a process partly similar to HTML parsing. The result is a structure the browser understands, styleSheets. This structure can be queried and modified, and it provides the basis for later style operations.

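Next, the property values in styleSheets are standardized. When we write styles we often use values such as font-size: 1em, color: blue, or font-weight: bold. These need to be converted into standardized computed values that the rendering engine can work with directly, for example font-size: 16px (with a 16px parent font size), color: rgb(0, 0, 255), and font-weight: 700.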
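As a rough illustration of this standardization step, the sketch below converts a few authored values into absolute ones. The lookup tables cover only a handful of keywords, and standardizeValue and its parentFontSizePx parameter are hypothetical names chosen for this example, not part of any browser API.

```typescript
// Illustrative subsets only; real engines know every CSS keyword and unit.
const NAMED_COLORS: Record<string, string> = {
  blue: "rgb(0, 0, 255)",
  red: "rgb(255, 0, 0)",
  black: "rgb(0, 0, 0)",
};
const FONT_WEIGHTS: Record<string, string> = { normal: "400", bold: "700" };

// Safe table lookup that ignores properties inherited from Object.prototype.
function lookup(table: Record<string, string>, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Convert one authored value into a standardized value.
// parentFontSizePx is needed because em units are relative to the parent.
function standardizeValue(property: string, value: string, parentFontSizePx: number): string {
  const v = value.trim().toLowerCase();
  if (property === "color") {
    return lookup(NAMED_COLORS, v) ?? v;
  }
  if (property === "font-weight") {
    return lookup(FONT_WEIGHTS, v) ?? v;
  }
  if (property === "font-size") {
    const em = /^([0-9.]+)em$/.exec(v);
    if (em !== null && em[1] !== undefined) {
      return parseFloat(em[1]) * parentFontSizePx + "px";
    }
  }
  return v; // values that are already absolute pass through unchanged
}
```

For example, standardizeValue("font-size", "1em", 16) gives "16px", and standardizeValue("font-weight", "bold", 16) gives "700".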

Finally, the inheritance and cascade rules of CSS are applied, and the computed style of each DOM node is saved in a ComputedStyle structure.
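The two rules can be sketched as follows. This is a simplified model under stated assumptions: the cascade considers only a numeric specificity and source order (it ignores origins and !important), only three properties are treated as inherited, and the names Declaration, StyledNode, cascade, and computeStyles are invented for the example.

```typescript
interface Declaration {
  property: string;
  value: string;
  specificity: number; // simplified: a single number instead of (a, b, c)
  order: number;       // position in the source; later wins on a tie
}

interface StyledNode {
  tag: string;
  declared: Record<string, string>; // winning declarations for this node
  children: StyledNode[];
  computed?: Record<string, string>;
}

// Cascade: for each property, keep the declaration that wins.
function cascade(declarations: Declaration[]): Record<string, string> {
  const winners: Record<string, Declaration> = {};
  for (const decl of declarations) {
    const current = winners[decl.property];
    const wins =
      current === undefined ||
      decl.specificity > current.specificity ||
      (decl.specificity === current.specificity && decl.order > current.order);
    if (wins) winners[decl.property] = decl;
  }
  const result: Record<string, string> = {};
  for (const property of Object.keys(winners)) {
    const winner = winners[property];
    if (winner !== undefined) result[property] = winner.value;
  }
  return result;
}

const INHERITED = ["color", "font-size", "font-weight"]; // illustrative subset

// Inheritance: start from the parent's inherited values, then apply the
// node's own declarations, and recurse into the children.
function computeStyles(node: StyledNode, parentComputed: Record<string, string>): void {
  const computed: Record<string, string> = {};
  for (const property of INHERITED) {
    const inherited = parentComputed[property];
    if (inherited !== undefined) computed[property] = inherited;
  }
  for (const property of Object.keys(node.declared)) {
    const value = node.declared[property];
    if (value !== undefined) computed[property] = value;
  }
  node.computed = computed;
  for (const child of node.children) computeStyles(child, computed);
}
```

In this model a p inside body picks up body's font-size through inheritance, while a non-inherited property such as margin stays on body only.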

Render Tree VS Layout Tree

So far, we have completed the first two steps on the main thread of the renderer process. We have the nodes and we know their styles. Can we start rendering?

No, the progress bar tells us that things are far from simple.

But before moving on to the next step, we need to clarify some concepts. You have probably heard of the Layout Tree, but what is the Render Tree? Is it the same thing as the Layout Tree?

The Layout Tree is not the same as the Render Tree.

As this developer document explains [https://developers.google.com/web/fundamentals/performance/critical-rendering-path/render-tree-construction?hl=zh-cn], the Render Tree is the product of combining the DOM and the CSSOM. That is, the main thread parses the CSS and attaches the computed styles to the DOM nodes, which yields the render tree.

The main thread parses CSS and determines the computed style for each DOM node. This is information about what kind of style is applied to each element based on CSS selectors.

———《Inside look at modern web browser(part 3)》

As shown in the figure, at this point we only know which nodes are visible and what their styles are; we still do not know their exact positions and sizes. In other words, the nodes still need to be laid out.

The main thread traverses the render tree starting from its root node and, following certain rules, produces a box model for each node. This captures the exact position and size of each element in the viewport, and all relative measurements are converted to absolute pixels on the screen. Once the engine knows which nodes are visible, together with their computed styles and geometry, it can convert each node of the render tree into actual pixels on the screen. This step is called painting or rasterization.

In other words, the Layout Tree is the result of running layout on the Render Tree: it is the Render Tree with the geometry of each node added.
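To make "render tree plus geometry" concrete, here is a deliberately tiny layout sketch. It assumes every node is a block box that takes the full available width and that boxes simply stack vertically; RenderNode, LayoutBox, and layout are names invented for this example, and real layout (inline content, floats, flexbox, and so on) is far more involved.

```typescript
interface RenderNode {
  tag: string;
  height?: number; // explicit height in px; otherwise the sum of the children
  children: RenderNode[];
}

interface LayoutBox {
  tag: string;
  x: number;
  y: number;
  width: number;
  height: number;
  children: LayoutBox[];
}

// Walk the render tree from the root and give every node a position and size.
function layout(node: RenderNode, x: number, y: number, width: number): LayoutBox {
  const children: LayoutBox[] = [];
  let cursorY = y; // where the next child box starts
  for (const child of node.children) {
    const box = layout(child, x, cursorY, width);
    children.push(box);
    cursorY += box.height; // block boxes stack vertically
  }
  const height = node.height !== undefined ? node.height : cursorY - y;
  return { tag: node.tag, x, y, width, height, children };
}
```

Laying out a body that contains a 50px header followed by two paragraphs of 20px and 30px in an 800px wide viewport places the second paragraph at y = 70 and gives body a height of 100.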

The main thread going over DOM tree with computed styles and producing layout tree ———《Inside look at modern web browser(part 3)》

Layer tree

Great, one more step is done. Now that we have the nodes along with their exact positions and styles, can we render?

Sorry, it still doesn't work.

First we need to understand a concept: rasterization (also called rastering). Simply put, rasterization converts this node information into pixels on the screen.

So what does rasterization have to do with rendering? The browser uses this technique to draw elements on the screen.

Chrome used to rasterize only the elements in the visible area. As the user scrolled, it kept adjusting the rasterized area, rasterizing more content to fill in the missing parts. The problem is that the page feels janky when the user scrolls quickly.

Chrome's current approach to rasterization is compositing: the page is split into layers, the layers are rasterized separately, and they are then composited into a page in a separate compositor thread. This way, when the page is scrolled, the raw material (the already rasterized layers) is available, and the layers in the viewport only need to be composited into a new frame.

So what does this have to do with Layer Tree?

As mentioned above, Chrome now composites multiple layers into a single frame. The job of the Layer Tree is to determine those layers.

To find out which layer each element belongs to, the main thread traverses the layout tree and creates a layer tree (this step is called 'Update Layer Tree' in Chrome DevTools).

The rendering engine does not create a layer for each node. If a node does not have a layer, it belongs to the layer of the parent node. To create a new layer, the node needs to meet certain conditions.

Elements with stacking context properties are promoted to a new layer

A page is a two-dimensional plane, but stacking contexts give HTML elements a third dimension. Elements are arranged along the Z axis, perpendicular to that plane, according to the priority of their properties.

Explicitly positioned elements, elements with opacity set, elements with CSS filters, and so on all create stacking contexts. See MDN [https://developer.mozilla.org/zh-CN/docs/Web/CSS/CSS_Positioning/Understanding_z_index/The_stacking_context].

Areas that need to be clipped also get their own layer

Suppose we limit a div to 100 × 100 and put a long piece of text in it: the area the text needs will certainly exceed 100 × 100. Clipping then occurs, and the rendering engine cuts out part of the text to display inside the div. When this kind of clipping happens, the rendering engine creates a separate layer for the text. If there is a scroll bar, it is promoted to a separate layer as well. An element is promoted to its own layer as soon as either of the two conditions above is met.
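The layering rules above can be sketched as a traversal of the layout tree. The stacking-context test below is a simplified subset of the conditions listed on MDN, and LayoutNode, CompositedLayer, needsOwnLayer, and buildLayerTree are hypothetical names; Chrome's real promotion rules are more detailed.

```typescript
interface LayoutNode {
  id: string;
  position?: "static" | "relative" | "absolute" | "fixed";
  zIndex?: number;         // undefined stands for z-index: auto
  opacity?: number;
  filter?: string;
  clipsOverflow?: boolean; // content overflows the box and is clipped
  children: LayoutNode[];
}

interface CompositedLayer {
  owner: string;     // id of the node that created the layer
  members: string[]; // ids of the nodes drawn into this layer
  children: CompositedLayer[];
}

// Condition 1 (simplified): the node creates a stacking context.
function createsStackingContext(node: LayoutNode): boolean {
  const positioned = node.position !== undefined && node.position !== "static";
  if (node.position === "fixed") return true;
  if (positioned && node.zIndex !== undefined) return true;
  if (node.opacity !== undefined && node.opacity < 1) return true;
  if (node.filter !== undefined && node.filter !== "none") return true;
  return false;
}

// Condition 2: the node's content has to be clipped.
function needsOwnLayer(node: LayoutNode): boolean {
  return createsStackingContext(node) || node.clipsOverflow === true;
}

// A node without its own layer joins the layer of its parent.
function assignLayers(node: LayoutNode, current: CompositedLayer): void {
  let target = current;
  if (needsOwnLayer(node)) {
    target = { owner: node.id, members: [], children: [] };
    current.children.push(target);
  }
  target.members.push(node.id);
  for (const child of node.children) assignLayers(child, target);
}

function buildLayerTree(root: LayoutNode): CompositedLayer {
  const rootLayer: CompositedLayer = { owner: root.id, members: [root.id], children: [] };
  for (const child of root.children) assignLayers(child, rootLayer);
  return rootLayer;
}
```

In this sketch a fixed-position dialog and everything inside it end up in one new layer, while an ordinary header stays in the root layer together with the root element.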

Paint

After all the steps above, we finally reach the paint step.

Paint is actually a large process. It includes generating Paint Records, dividing layers into tiles in the compositor thread, rasterizing the tiles in raster threads (using the GPU to generate bitmaps), and submitting compositor frames.

Layering tells us how certain special elements are stacked relative to each other. What we still do not know is the stacking order of elements within the same layer, that is, which one covers which. Based on the Layer Tree, the main thread creates Paint Records for each layer and decides what is painted first and what is painted later. Whatever is painted later covers what was painted earlier, and this determines the stacking order of elements within a layer. The paint records can be thought of as something like a singly linked list: traversing the list gives the paint order.
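A minimal sketch of paint records for one layer is shown below. It assumes a simple rule, backgrounds before text and parents before children, and uses plain strings as records; PaintItem and recordPaint are names made up for this illustration.

```typescript
interface PaintItem {
  id: string;
  background?: string;
  text?: string;
  children: PaintItem[];
}

// Produce an ordered list of paint steps for one layer.
// Steps later in the list are painted on top of earlier ones.
function recordPaint(node: PaintItem, records: string[] = []): string[] {
  if (node.background !== undefined) {
    records.push("rect:" + node.id + ":" + node.background);
  }
  if (node.text !== undefined) {
    records.push("text:" + node.id + ":" + node.text);
  }
  for (const child of node.children) {
    recordPaint(child, records); // children paint over their parent
  }
  return records;
}
```

Replaying the returned array from first to last reproduces the paint order, so a child's rectangle always ends up on top of its parent's background.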

When reading the documentation, we find that different sources disagree on whether the Layer Tree is generated first or the Paint Records come first.

My understanding is that layering comes first and Paint Records are then created for each layer. If the engine traversed the whole Layout Tree to produce paint records and only then split them into layers, there would be a lot of extra work, such as picking individual paint steps out of the records and binding them to layers. The Performance panel in Chrome DevTools also shows that the layer tree is updated first and painting starts afterwards.

With the layers and paint records in hand, the main thread commits this information to the compositor thread for compositing. A layer can be very large, far exceeding the viewport, and there is no need to draw such a large layer all at once. The compositor thread therefore divides each layer into tiles. The tile size is usually 256 × 256 or 512 × 512. The tile information is then passed to the raster thread pool.

The raster thread pool is made up of raster threads. These threads run raster tasks that generate bitmaps from tiles, giving priority to tiles near the viewport. Rasterization is usually accelerated by the GPU, which is why it is also called fast rasterization or GPU rasterization.
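Tiling and prioritization can be sketched as follows. The 256 default tile size comes from the text above; TileRect, splitIntoTiles, distanceToViewport, and prioritizeTiles are hypothetical names, and the sketch only considers vertical scrolling.

```typescript
interface TileRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Cut a layer into tiles; tiles on the right and bottom edges may be smaller.
function splitIntoTiles(layerWidth: number, layerHeight: number, tileSize: number = 256): TileRect[] {
  const tiles: TileRect[] = [];
  for (let y = 0; y < layerHeight; y += tileSize) {
    for (let x = 0; x < layerWidth; x += tileSize) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, layerWidth - x),
        height: Math.min(tileSize, layerHeight - y),
      });
    }
  }
  return tiles;
}

// Vertical distance between a tile and the viewport; 0 if they overlap or touch.
function distanceToViewport(tile: TileRect, viewportTop: number, viewportHeight: number): number {
  const viewportBottom = viewportTop + viewportHeight;
  const tileBottom = tile.y + tile.height;
  if (tileBottom <= viewportTop) return viewportTop - tileBottom;
  if (tile.y >= viewportBottom) return tile.y - viewportBottom;
  return 0;
}

// Order raster work so that tiles closest to the viewport come first.
function prioritizeTiles(tiles: TileRect[], viewportTop: number, viewportHeight: number): TileRect[] {
  return tiles.slice().sort(
    (a, b) =>
      distanceToViewport(a, viewportTop, viewportHeight) -
      distanceToViewport(b, viewportTop, viewportHeight)
  );
}
```

A 512 × 1024 layer is cut into eight 256 × 256 tiles. With the viewport scrolled to the lower part of the layer, the tiles that overlap the viewport are rasterized first and the top row of tiles comes last.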

Once all tiles are rasterized, the compositor thread collects tile information called Draw Quads. Draw Quads record where each tile is located in memory and where on the page it should be drawn.

Now everything is ready. The compositor thread assembles the Draw Quads into a Compositor Frame and sends it to the browser process over IPC. The browser process then sends the compositor frame to the GPU.

The GPU renders it, and the page appears!

That's it! But this is not the end; we still have to consider reflow and repaint.

Reflow and repaint from the perspective of threads

As front-end developers, we often hear that reflow is more expensive than repaint. How can we understand this from the perspective of threads?

Reflow (rearrangement)

If you change an element's geometric properties, such as width or height, through JS or CSS, the browser triggers a new layout. The layout tree is regenerated and every subsequent stage is run again, so the cost is relatively high.

Repaint (redraw)

If you only change an element's background color, the layout tree and layer tree do not need to change; only painting and the subsequent stages are rerun. Because the layout and layering stages are skipped, the cost is lower and efficiency is higher.

Direct compositing

If you change a property that requires neither layout nor paint, the rendering engine skips both stages and performs only the subsequent compositing. We call this process compositing.

JS execution, reflow, and repaint all run on the main thread, so heavy computation can make the page freeze. Besides the main thread there are also the compositor thread and the raster threads. If a change can be composited directly, without involving the main thread, the page stays smoother.

The CSS3 transform property is one such property. Animations built on it avoid reflow and repaint and are composited directly off the main thread. Because they neither occupy the main thread nor go through the layout and paint stages, they are the most efficient.
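The three cases can be summarized as a lookup from the changed property to the pipeline stages that have to run again. The property lists below are short, illustrative subsets chosen for this sketch (opacity is included as another property that is commonly handled by compositing alone), and stagesFor is a hypothetical helper, not a browser API.

```typescript
type PipelineStage = "layout" | "layer" | "paint" | "composite";

// Illustrative subsets; real engines decide this per property and per element.
const PAINT_ONLY = ["color", "background-color", "visibility", "box-shadow"];
const COMPOSITE_ONLY = ["transform", "opacity"];

// Which stages must rerun after the given CSS property changes?
function stagesFor(property: string): PipelineStage[] {
  if (COMPOSITE_ONLY.indexOf(property) !== -1) {
    return ["composite"]; // direct compositing, off the main thread
  }
  if (PAINT_ONLY.indexOf(property) !== -1) {
    return ["paint", "composite"]; // repaint: layout and layering are skipped
  }
  // Geometry changes such as width or height, and anything unknown,
  // conservatively rerun the whole pipeline (reflow).
  return ["layout", "layer", "paint", "composite"];
}
```

The fewer stages a change triggers, the cheaper it is, which is why animating transform is preferred over animating width or left.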

Besides the transform property, you can also use the requestAnimationFrame method. The callback passed to requestAnimationFrame is called before the next repaint, which keeps animation updates in step with the browser's frames. See this document [https://zhuanlan.zhihu.com/p/64917985].
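A small sketch of an animation loop built on this idea follows. To keep the example runnable outside a browser, the frame scheduler is passed in as a parameter; in a page you would pass a wrapper around requestAnimationFrame. The helper animateTranslateX and its parameters are assumptions made for this illustration.

```typescript
// Same shape as the browser's requestAnimationFrame: it takes a callback
// that will receive a timestamp in milliseconds.
type FrameScheduler = (callback: (timestamp: number) => void) => unknown;

// Move something distancePx to the right over durationMs, one step per frame.
function animateTranslateX(
  schedule: FrameScheduler,
  apply: (transform: string) => void,
  distancePx: number,
  durationMs: number
): void {
  let start: number | null = null;
  const step = (timestamp: number): void => {
    if (start === null) start = timestamp; // the first frame defines time zero
    const progress = Math.min((timestamp - start) / durationMs, 1);
    apply("translateX(" + distancePx * progress + "px)");
    if (progress < 1) schedule(step); // ask for the next frame until done
  };
  schedule(step);
}

// In a browser (assuming an element named box exists):
// animateTranslateX(
//   (cb) => requestAnimationFrame(cb),
//   (t) => { box.style.transform = t; },
//   200,
//   1000
// );
```

Each step only writes a transform value, so the per-frame work on the main thread stays small and the change itself can be handled by compositing.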

Extended reading

Evolution of the browser

Judging from current developments, Chrome's overall architecture is moving toward the "service-oriented architecture" used by modern operating systems, with the goals of simplicity, stability, speed, and security.

Existing modules will be refactored into independent services. For example, the UI, database, file, device, and network modules will become basic services, similar to those of an underlying operating system, and run in separate processes. The services communicate and are accessed through well-defined interfaces and IPC, making the system more cohesive, loosely coupled, and easier to maintain and extend.

Chrome also provides a flexible architecture: on powerful hardware, the basic services run as separate processes, while on resource-constrained devices (as shown in the figure below) many services are merged into one process to save memory. See the Google developer documentation [https://developers.google.com/web/updates/2018/09/inside-browser-part1#at_the_core_of_the_computer_are_the_cpu_and_gpu].