After several tough rounds, the Alibaba interviewer spent 30 minutes asking me about "from URL input to rendering"...

When the interviewer asks this question, most candidates are secretly delighted: I memorized this stock answer long ago.

But wait a minute, can you answer the following questions:

Why does the browser parse the URL? What character encoding is used for URL parameters? And what is the difference between encodeURI and encodeURIComponent?
What are the disk cache and memory cache in the browser cache?
What is the difference between prefetch and preload?
What is the difference between async and defer for JS scripts?
Why does the TCP handshake take three steps, and why does closing the connection take four?
Do you really understand the HTTPS handshake?

The same question can be used to interview a P5 or a P7; only the expected depth differs. So I went through the whole process again. This article is long, so you may want to bookmark it first.

Overview

Before getting into the topic, let's briefly go over the browser's architecture as background. The browser is a multi-process application. "From URL input to rendering" mainly involves the browser process, the network process, and the rendering process:

The browser process is responsible for handling and responding to user interactions, such as clicking and scrolling;
The network process is responsible for processing data requests and providing download functions;
The rendering process is responsible for processing the acquired HTML, CSS, and JS into visible and interactive pages;

The whole process of "from URL input to page rendering" can be divided into two parts: network request and browser rendering, which are handled by the network process and the rendering process respectively.

network request

The network request part does the following:

URL parsing
Check resource cache
DNS resolution
establish a TCP connection
TLS negotiation key
Send request & receive response
close the TCP connection

Each step is covered in turn below.

URL parsing

The browser first determines whether the input is a URL or a search keyword.

If it is a URL, an incomplete URL is completed into a full one. A full URL has the form protocol + host + port + path [+ parameters] [+ anchor]. For example, if we enter www.baidu.com in the address bar, the browser completes it to https://www.baidu.com/ , which uses the default HTTPS port 443.
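As a rough sketch of that completion step, the standard URL API can be used to see how a partial entry becomes a full URL. The helper name completeUrl and the rule "prepend https:// when no scheme is present" are assumptions for illustration, not how any particular browser implements its address bar.

```javascript
// Hypothetical helper: complete a partial address-bar entry into a full URL.
function completeUrl(input) {
  const text = input.trim();
  // Assumption: anything without an explicit "scheme://" prefix gets https://.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(text);
  return new URL(hasScheme ? text : "https://" + text);
}

const url = completeUrl("www.baidu.com");
console.log(url.href);     // https://www.baidu.com/
console.log(url.protocol); // https:
console.log(url.host);     // www.baidu.com
console.log(url.port);     // empty string, because 443 is the default port for https
console.log(url.pathname); // /
```

Note that the URL object reports an empty port when the default port of the protocol is in use.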

If it is a search keyword, it is placed into the parameter part of the default search engine's URL. This requires escaping any unsafe characters in the input (safe characters are digits, English letters, and a few symbols). URL parameters cannot contain Chinese characters or certain special characters such as = ? & . Otherwise, when I search for 1+1=2 without escaping, the URL would be /search?q=1+1=2&source=chrome , where the = in the keyword is ambiguous with the = delimiter of the URL itself.

The encoding used to escape unsafe characters in URLs is called percent-encoding, because it uses a percent sign followed by two hexadecimal digits. These hexadecimal digits come from the UTF-8 encoding, which turns each of these Chinese characters into 3 bytes. For example, if I enter "中文" in the Google address bar, the URL becomes /search?q=%E4%B8%AD%E6%96%87 : two characters, 6 bytes in total.

encodeURI and encodeURIComponent, which we often use when writing code, do exactly this. Their rules are basically the same, except for the characters that have structural meaning in a URL, such as = ? & ; / . encodeURI leaves these untouched, while encodeURIComponent escapes them all. That is because encodeURI is meant for a whole URL, whereas encodeURIComponent is meant for a single parameter, so it has to be stricter.
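A small sketch makes the difference concrete. It uses only built-in functions; the manual byte-by-byte encoding is there purely to show where the %XX values come from, and the variable names are illustrative.

```javascript
const keyword = "1+1=2";

// encodeURI leaves URL delimiters such as + and = alone.
console.log(encodeURI(keyword));          // 1+1=2
// encodeURIComponent escapes them, so the value is safe inside a parameter.
console.log(encodeURIComponent(keyword)); // 1%2B1%3D2

// Non-ASCII text: take the UTF-8 bytes and write each byte as %XX.
const bytes = Array.from(new TextEncoder().encode("中文"));
const manual = bytes
  .map((b) => "%" + b.toString(16).toUpperCase().padStart(2, "0"))
  .join("");
console.log(bytes.length);                // 6
console.log(manual);                      // %E4%B8%AD%E6%96%87
console.log(encodeURIComponent("中文"));  // %E4%B8%AD%E6%96%87

// Building the search path from the example above, this time unambiguous.
const searchPath = "/search?q=" + encodeURIComponent(keyword) + "&source=chrome";
console.log(searchPath);                  // /search?q=1%2B1%3D2&source=chrome
```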

`Check cache`

The cache check has to happen before the actual request is sent; otherwise caching would serve no purpose. If a matching cached resource is found, its validity period is checked:

Cached resources still within their validity period are used directly; this is called strong caching. In the Chrome Network panel, such requests show status 200 and the Size column reads memory cache or disk cache. memory cache means the resource was read from memory, and disk cache means it was read from disk. Reading from memory is much faster than reading from disk, but whether a resource gets to live in memory depends on the current state of the system. Generally, refreshing the page hits the memory cache, while closing and reopening the page hits the disk cache.
If the validity period has passed, the browser sends a request carrying the cached resource's identifier to ask the server whether the copy can still be used (negotiated caching). If the server says the local copy is still good, it returns 304 with no body; if the resource has changed, it returns 200 with the new resource, and the new resource and its identifier are cached locally for next time.
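The two branches can be sketched as a pair of small functions. This is a simplified model, not the browser's real cache: the entry shape { body, etag, storedAt, maxAgeSeconds } and the function names are assumptions, while ETag and If-None-Match are the standard HTTP headers used for this kind of revalidation.

```javascript
// Browser side: decide what to do with a (hypothetical) cache entry.
function checkCache(entry, nowMs) {
  if (!entry) return { action: "fetch" };
  const ageSeconds = (nowMs - entry.storedAt) / 1000;
  // Still fresh: strong cache, no request is sent at all.
  if (ageSeconds < entry.maxAgeSeconds) return { action: "use-cache" };
  // Expired: ask the server, sending the resource identifier along.
  return { action: "revalidate", headers: { "If-None-Match": entry.etag } };
}

// Server side: answer a revalidation request for the current resource.
function revalidate(requestHeaders, current) {
  if (requestHeaders["If-None-Match"] === current.etag) {
    return { status: 304, headers: { ETag: current.etag }, body: null };
  }
  return { status: 200, headers: { ETag: current.etag }, body: current.body };
}

const entry = { body: "v1", etag: '"abc"', storedAt: 0, maxAgeSeconds: 60 };
console.log(checkCache(entry, 30000).action); // use-cache
const stale = checkCache(entry, 90000);
console.log(stale.action);                    // revalidate
console.log(revalidate(stale.headers, { etag: '"abc"', body: "v1" }).status); // 304
console.log(revalidate(stale.headers, { etag: '"def"', body: "v2" }).status); // 200
```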

`DNS resolution`

If no usable local cache is found, a network request has to be made. The first step is DNS resolution.

The lookup checks the following in order and stops at the first hit:

The browser's DNS cache;
The operating system's DNS cache;
The router's DNS cache;
The DNS server of the service provider;
The DNS hierarchy, starting from the 13 root name server addresses worldwide;
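The lookup order can be modeled as a chain of caches. In this sketch every layer is just an in-memory Map standing in for a real cache or server, and the hostname and IP address are placeholder values; it only illustrates "check the nearest layer first, stop at the first hit".

```javascript
// Each layer is a hypothetical stand-in: { name, records: Map<hostname, ip> }.
function resolve(hostname, layers) {
  for (let i = 0; i < layers.length; i++) {
    const ip = layers[i].records.get(hostname);
    if (ip !== undefined) {
      // Fill the nearer layers that missed, so the next lookup stops earlier.
      for (let j = 0; j < i; j++) layers[j].records.set(hostname, ip);
      return { ip, answeredBy: layers[i].name };
    }
  }
  return { ip: null, answeredBy: null };
}

const layers = [
  { name: "browser cache", records: new Map() },
  { name: "OS cache", records: new Map() },
  { name: "router cache", records: new Map() },
  { name: "ISP DNS server", records: new Map([["www.example.com", "203.0.113.10"]]) },
];

const first = resolve("www.example.com", layers);
const second = resolve("www.example.com", layers);
console.log(first.answeredBy);  // ISP DNS server
console.log(second.answeredBy); // browser cache
```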

To save time, you can do DNS pre-resolution in the HTML header:

<link rel="dns-prefetch" href="http://www.baidu.com" />

To keep responses fast, DNS resolution normally uses UDP, falling back to TCP only when a response is too large for a single UDP message.

`establish a TCP connection`

The requests we send are based on TCP, so a connection has to be established first. Connection-oriented communication is like a phone call, where both parties must be on the line; connectionless communication is like sending a text message.

Confirming that the receiver is online is done through TCP's three-way handshake:

The client sends a connection request (SYN);
The server replies with a connection acknowledgment (SYN + ACK), and at this point the server allocates resources for the TCP connection;
The client acknowledges the server's acknowledgment (ACK), and at this point the client allocates resources for the TCP connection;
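The three steps can be traced with a toy model. It tracks only flags, sequence numbers, and the usual TCP state names; the initial sequence numbers 100 and 300 are arbitrary, and nothing is actually sent over a network.

```javascript
// Simplified model of the three-way handshake (no real sockets involved).
function threeWayHandshake(clientIsn, serverIsn) {
  const client = { state: "CLOSED" };
  const server = { state: "LISTEN" };
  const log = [];

  // 1. Client -> server: SYN with the client's initial sequence number.
  const syn = { flags: ["SYN"], seq: clientIsn };
  client.state = "SYN_SENT";
  log.push(syn);

  // 2. Server -> client: SYN + ACK, acknowledging the client's SYN.
  const synAck = { flags: ["SYN", "ACK"], seq: serverIsn, ack: syn.seq + 1 };
  server.state = "SYN_RCVD";
  log.push(synAck);

  // 3. Client -> server: ACK, acknowledging the server's SYN.
  const ack = { flags: ["ACK"], seq: syn.seq + 1, ack: synAck.seq + 1 };
  client.state = "ESTABLISHED";
  log.push(ack);

  // The server is established only once the final ACK matches its own SYN.
  if (ack.ack === serverIsn + 1) server.state = "ESTABLISHED";
  return { client, server, log };
}

const handshake = threeWayHandshake(100, 300);
console.log(handshake.log.map((segment) => segment.flags.join("+"))); // [ 'SYN', 'SYN+ACK', 'ACK' ]
console.log(handshake.client.state, handshake.server.state);          // ESTABLISHED ESTABLISHED
```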

`Why does it take three handshakes to complete the connection establishment?`

One way to see it is to imagine the connection being established with only two handshakes. For a normal request, everything still looks fine.

But suppose a stale connection request now reaches the server, for example one that was delayed in the network and has long since timed out on the client. The server's resources are wasted: the client has no intention of sending data, yet the server has already allocated memory and other resources and keeps waiting.

So the three-way handshake confirms that the client is still alive and prevents the server from wasting resources on a stale, timed-out request.

`Negotiate encryption key - TLS handshake`

To keep communication secure we use HTTPS, which is HTTP running over TLS (the S stands for Secure). TLS combines asymmetric and symmetric encryption.

Symmetric encryption means both sides hold the same secret key and both know how to encrypt and decrypt with it. It is fast, but the problem is how to get the key to both parties: all data travels over the network, so if the key itself is sent over the network it can be intercepted, and the encryption becomes pointless.

In asymmetric encryption, each party has a public key and a private key. Anyone may know the public key, but the private key is known only to its owner. Data encrypted with the public key can only be decrypted with the matching private key. This solves the key-distribution problem of symmetric encryption, but it is much slower.

We use asymmetric encryption to negotiate a symmetric key known only to the sender and receiver. The process is as follows:

The client sends a random value together with the protocol versions and cipher suites it supports;
The server receives the client's random value, generates a random value of its own, and sends it back along with its digital certificate and the protocol and cipher suite it has chosen from the client's list;
The client receives the server's certificate and verifies it. If verification passes, the client generates another random value, encrypts it with the public key from the server's certificate, and sends it to the server;
The server decrypts the encrypted random value with its private key to obtain the third random value. Both ends now hold the same three random values and can derive the key from them using the agreed algorithm. All subsequent communication is encrypted and decrypted with this symmetric key;

As the steps show, the two ends use asymmetric encryption during the TLS handshake, but because asymmetric encryption costs far more than symmetric encryption, they switch to symmetric encryption for the actual data transfer.
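The idea of "three random values in, one shared key out" can be tried with Node's built-in crypto module. This is a heavily simplified sketch, not real TLS: there is no certificate verification, a plain SHA-256 hash stands in for the real key derivation function, and AES-256-GCM is simply one possible symmetric cipher chosen for the example.

```javascript
const crypto = require("crypto");

// Server key pair; in real TLS the public key arrives inside the certificate.
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", { modulusLength: 2048 });

// Steps 1 and 2: both sides exchange random values in the clear.
const clientRandom = crypto.randomBytes(32);
const serverRandom = crypto.randomBytes(32);

// Step 3: the client encrypts a third random value with the server's public key.
const thirdRandom = crypto.randomBytes(48);
const encryptedThird = crypto.publicEncrypt(publicKey, thirdRandom);

// Step 4: only the holder of the private key can recover it.
const decryptedThird = crypto.privateDecrypt(privateKey, encryptedThird);

// Assumption: SHA-256 over the three values replaces the real TLS key derivation.
function deriveKey(a, b, c) {
  return crypto.createHash("sha256").update(Buffer.concat([a, b, c])).digest();
}
const clientKey = deriveKey(clientRandom, serverRandom, thirdRandom);
const serverKey = deriveKey(clientRandom, serverRandom, decryptedThird);

// From here on, traffic is protected with fast symmetric encryption.
const iv = crypto.randomBytes(12);
const cipher = crypto.createCipheriv("aes-256-gcm", clientKey, iv);
const ciphertext = Buffer.concat([cipher.update("GET / HTTP/1.1", "utf8"), cipher.final()]);
const authTag = cipher.getAuthTag();

const decipher = crypto.createDecipheriv("aes-256-gcm", serverKey, iv);
decipher.setAuthTag(authTag);
const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");

console.log(clientKey.equals(serverKey)); // true
console.log(plaintext);                   // GET / HTTP/1.1
```

Only the small third random value goes through the slow asymmetric step; everything after that uses the derived symmetric key.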

`Send request & receive response`

The default port for HTTP is 80 and the default port for HTTPS is 443.

A request consists of a request line + request headers + request body:

POST /hello HTTP/1.1
User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi
Content-Type: application/x-www-form-urlencoded
Content-Length: 13

name=niannian

A response consists of a status line + response headers + response body:

HTTP/1.1 200 OK
Content-Type: application/json
Server: apache

{"password": "123"}
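To make the three-part layout tangible, here is a minimal parser sketch for the textual HTTP/1.1 format. It assumes a well-formed message with CRLF line endings and ignores details such as repeated headers or chunked bodies; the function name parseHttpMessage is made up for this example.

```javascript
// Split a raw HTTP/1.1 message into start line, headers, and body.
function parseHttpMessage(raw) {
  // A blank line (CRLF CRLF) separates the head from the body.
  const separator = raw.indexOf("\r\n\r\n");
  const head = separator === -1 ? raw : raw.slice(0, separator);
  const body = separator === -1 ? "" : raw.slice(separator + 4);
  const [startLine, ...headerLines] = head.split("\r\n");
  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    // Header names are case-insensitive, so store them in lower case.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { startLine, headers, body };
}

const request = parseHttpMessage(
  "POST /hello HTTP/1.1\r\n" +
  "Host: www.example.com\r\n" +
  "Accept-Language: en, mi\r\n" +
  "\r\n" +
  "name=niannian"
);
const [method, path, version] = request.startLine.split(" ");
console.log(method, path, version); // POST /hello HTTP/1.1
console.log(request.headers.host);  // www.example.com
console.log(request.body);          // name=niannian

const response = parseHttpMessage(
  "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n" + '{"password":"123"}'
);
console.log(response.startLine);                 // HTTP/1.1 200 OK
console.log(JSON.parse(response.body).password); // 123
```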

`close the TCP connection`

When the data transfer is complete, the TCP connection should be closed. Either the client or the server can be the one to initiate the close. Taking the client as the initiator, the whole process takes four waves:

The client requests to release the connection (FIN), which only means the client will send no more data;
The server acknowledges the release (ACK), but it may still have data to process and send;
The server requests to release the connection (FIN) once it has no more data to send;
The client acknowledges the release (ACK);

`Why do you have to wave four times?`

TCP transmits data in both directions, and each direction needs its own close request and acknowledgment. Because the server may still be sending data after the second wave, its acknowledgment (the second wave) cannot be merged with its own close request (the third wave).
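A toy trace shows where the server's leftover data fits between the second and third wave. As with any sketch of this kind, only state names and a message log are modeled; the pending "last chunk" is an invented placeholder.

```javascript
// Simplified model of an active close started by the client.
function fourWayWave() {
  const client = { state: "ESTABLISHED" };
  const server = { state: "ESTABLISHED", pendingData: ["last chunk"] };
  const log = [];

  // 1. Client -> server: FIN. The client stops sending but can still receive.
  client.state = "FIN_WAIT_1";
  log.push("client FIN");

  // 2. Server -> client: ACK. The server may still have data to send.
  server.state = "CLOSE_WAIT";
  client.state = "FIN_WAIT_2";
  log.push("server ACK");
  while (server.pendingData.length > 0) {
    log.push("server DATA " + server.pendingData.shift());
  }

  // 3. Server -> client: FIN, once nothing is left to send.
  server.state = "LAST_ACK";
  log.push("server FIN");

  // 4. Client -> server: ACK. The client then waits in TIME_WAIT before closing.
  client.state = "TIME_WAIT";
  server.state = "CLOSED";
  log.push("client ACK");
  return { client, server, log };
}

const closing = fourWayWave();
console.log(closing.log.join(" | "));
// client FIN | server ACK | server DATA last chunk | server FIN | client ACK
```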

`Why does the active party wait for 2MSL`

After sending the fourth segment (the final acknowledgment), the client waits for 2MSL before closing the connection. MSL (maximum segment lifetime) is the longest time a packet can survive in the network. The purpose of the wait is to make sure the server has received this acknowledgment.

Imagine what happens if the server never receives the fourth-wave message. The acknowledgment can spend at most 1 MSL in the network; if it has not arrived by then, it is lost, and the server, having timed out waiting, resends its third-wave packet (the FIN) to the client. That packet in turn takes at most 1 MSL to reach the client.

One trip each way adds up to 2MSL, so the client waits that long after sending the fourth-wave packet. If no retransmitted FIN arrives within that time, the client assumes the fourth wave was received and officially closes the connection.

`browser rendering`

That covers the network request part. Now that the browser has the data, the rest is up to the rendering process. Browser rendering mainly consists of the following tasks:

Build the DOM tree;
style calculation;
layout positioning;
layering;
layer drawing;
display;

`Build the DOM tree`

The browser cannot work with the raw HTML text directly, so it first converts the tags in the HTML into a tree structure that JS can operate on: the DOM tree.

You can print document in the console to see the parsed DOM tree.

`style calculation`

The browser cannot use raw CSS text directly either, so it first parses the CSS into style sheets. Styles from all three sources are parsed:

External CSS files referenced via link
<style> styles inside tags
CSS inline with the element's style attribute

Print document.styleSheets in the console to see the parsed style sheets.

Using these style sheets, we can compute the style of each node in the DOM tree. It is called computation because the final style has to be worked out: besides its own rules, each element also inherits inheritable properties from its parent.

<style>
    span {
        color: red
    }
    div {
        font-size: 30px
    }
</style>
<div>
    <span>年年</span>
</div>

In the example above, the span takes not only the color set on span but also inherits the font-size set on div.

Once the nodes in the DOM tree have styles attached, the result is called the render tree.
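The inheritance step for this div/span example can be sketched in a few lines. It is a deliberately tiny model: rules are matched by tag name only, and only color and font-size are treated as inherited properties (both really are inherited in CSS); the object shapes are made up for the example.

```javascript
// Assumption: only these two properties are modeled; both are inherited in CSS.
const INHERITED = ["color", "font-size"];
// The two rules from the style block above, keyed by tag name.
const rules = { span: { color: "red" }, div: { "font-size": "30px" } };

function computeStyle(node, parentStyle = {}) {
  const style = {};
  // Start from the inherited values of the parent.
  for (const prop of INHERITED) {
    if (prop in parentStyle) style[prop] = parentStyle[prop];
  }
  // Rules that target the node itself override inherited values.
  Object.assign(style, rules[node.tag] || {});
  node.computed = style;
  for (const child of node.children) computeStyle(child, style);
  return node;
}

const tree = { tag: "div", children: [{ tag: "span", children: [] }] };
computeStyle(tree);
const spanStyle = tree.children[0].computed;
console.log(spanStyle.color);        // red
console.log(spanStyle["font-size"]); // 30px
```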

`Why put CSS at the head and js at the end of the body`

While HTML is being parsed, the resources that need to be loaded behave as follows:

CSS resources are downloaded asynchronously; neither downloading nor parsing them blocks construction of the DOM tree: <link href='./style.css' rel='stylesheet'/>
JS resources are downloaded synchronously; both downloading and executing them block construction of the DOM tree: <script src='./index.js'></script>

Because of this, it is usually recommended to put CSS style sheets in the head and JS files at the end of the body, so that rendering can start as early as possible.

`Does CSS block HTML parsing`

As mentioned above, page rendering is the job of the rendering process, which is further divided into the GUI rendering thread and the JS thread.

Parsing HTML into a DOM tree, parsing CSS into style sheets, and then generating the layout tree and layer tree are all done by the GUI rendering thread. This thread can parse HTML and CSS at the same time without conflict, which is another reason to include CSS in the head.

But while the JS thread is running, the GUI rendering thread cannot parse HTML, because JS can manipulate the DOM and running both at once could cause conflicts. And since JS may also read or modify styles, JS execution cannot overlap with CSS parsing either: the browser waits for CSS parsing to finish, then executes the JS, and only then resumes parsing HTML.

Seen this way, CSS can indirectly block HTML parsing.

`What is a preload scanner`

The external resources mentioned above, whether synchronously loaded JS or asynchronously loaded CSS and images, would only start downloading once the HTML parser reaches the corresponding tag, which is not very efficient. In fact, since 2008 browsers have gradually implemented preload scanners: once the HTML document arrives, they scan ahead through it and start downloading CSS, JS, images, and web fonts early.

`What is the difference between async and defer when js script is introduced`

The preload scanner solves the problem of synchronous JS downloads blocking HTML parsing, but JS execution still blocks HTML parsing. That is what the async and defer attributes are for.

Without defer or async, the browser downloads and executes the script immediately, blocking HTML parsing while it does so;
async: the script is downloaded in parallel with parsing and executed as soon as the download finishes;
defer: the script is downloaded in parallel with parsing and executed only after HTML parsing is complete;

When multiple scripts are loaded, async scripts execute in no guaranteed order (whichever finishes downloading first runs first), while defer scripts execute in document order.
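The ordering rule can be checked with a small timing model. It is only a sketch under simplifying assumptions: each script has a fixed download time, script execution itself takes no time, and HTML parsing finishes at a fixed moment. The script names and the numbers are invented.

```javascript
// Hypothetical model: scripts is a list of { name, mode, downloadMs } in document order.
function executionOrder(scripts, parseDoneMs) {
  // async: runs as soon as its own download finishes.
  const asyncRuns = scripts
    .filter((s) => s.mode === "async")
    .map((s) => ({ name: s.name, at: s.downloadMs }));

  // defer: waits for parsing to finish and for every earlier deferred script.
  let cursor = parseDoneMs;
  const deferRuns = scripts
    .filter((s) => s.mode === "defer")
    .map((s) => {
      cursor = Math.max(cursor, s.downloadMs);
      return { name: s.name, at: cursor };
    });

  return [...asyncRuns, ...deferRuns].sort((a, b) => a.at - b.at).map((run) => run.name);
}

const order = executionOrder(
  [
    { name: "a.js", mode: "async", downloadMs: 300 },
    { name: "b.js", mode: "async", downloadMs: 100 },
    { name: "c.js", mode: "defer", downloadMs: 50 },
    { name: "d.js", mode: "defer", downloadMs: 250 },
  ],
  200 // HTML parsing finishes at 200 ms
);
console.log(order.join(", ")); // b.js, c.js, d.js, a.js
```

Here b.js runs before a.js even though it appears later in the document, while c.js and d.js keep their document order.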

`What is the difference between preload and prefetch`

As mentioned earlier, the preload scanner can load resources the page needs ahead of time, but it only works for external links written in specific ways, and it gives us no way to raise the priority of resources we consider important. That is why preload and prefetch exist.

preload: load resources for the current page with high priority;
prefetch: load resources needed by later pages with low priority, and only when the browser is idle;

Both preload and prefetch only download the resource; they never execute it. If the server marks the resource as cacheable via Cache-Control, it goes into the disk cache; otherwise it is only kept in memory.

They are used as follows:

<head>
    <!-- load the files -->
    <link rel="preload" href="main.js" as="script">
    <link rel="prefetch" href="news.js" as="script">
</head>

<body>
    <h1>hello world!</h1>
    <!-- execute the file -->
    <script src="main.js" defer></script>
</body>

To make sure resources are preloaded correctly, keep the following in mind:

A preloaded resource should be used promptly on the current page. If no script tag is added to execute the preloaded resource, the console shows a warning that the preloaded resource is not referenced on the current page;
prefetch is meant for resources that will be used in the future, so when the user navigates from page A to page B, an in-flight preload is interrupted, but a prefetch is not;
When using preload, set the as attribute to declare the resource type, which the browser uses to assign a priority. as="style" gets the highest priority, as="script" gets low or medium priority, and other possible values include font/image/audio/video;
When preloading a font, add the crossorigin attribute even if the font is not cross-origin; otherwise it will be loaded twice:
```
<link rel="preload" href="font.woff" as="font" crossorigin>
```

Besides HTML tags, both kinds of preloading can also be set up through JS:

var res = document.createElement("link"); 
res.rel = "preload"; 
res.as = "style"; 
res.href = "css/mystyles.css"; 
document.head.appendChild(res);

and the HTTP response headers:

Link: </uploads/images/pic.png>; rel=prefetch

`layout positioning`

The sections above covered HTML and CSS loading and parsing in detail. The nodes in our render tree now have styles, but we still do not know where to draw them. So another tree, the layout tree, is needed to determine the geometric position of each element.

The layout tree only includes the visible elements of the render tree, which means the head tag and elements with display:none are left out.
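The filtering rule can be written as a short recursive function. The node shape { tag, style, children } is an assumption for this sketch, and real layout trees also store geometry, which is omitted here.

```javascript
// Keep only nodes that take part in layout: drop head and display:none subtrees.
function buildLayoutTree(node) {
  const hidden = node.style !== undefined && node.style.display === "none";
  if (node.tag === "head" || hidden) return null;
  const children = (node.children || [])
    .map(buildLayoutTree)
    .filter((child) => child !== null);
  return { tag: node.tag, children };
}

const renderTree = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title" }] },
    {
      tag: "body",
      children: [
        { tag: "div", style: { display: "none" }, children: [{ tag: "span" }] },
        { tag: "p", style: { color: "red" } },
      ],
    },
  ],
};

const layoutTree = buildLayoutTree(renderTree);
console.log(JSON.stringify(layoutTree));
// {"tag":"html","children":[{"tag":"body","children":[{"tag":"p","children":[]}]}]}
```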

`Layer layering`

We now have a layout tree, but we still cannot start drawing. Before that, the page has to be split into layers, producing a corresponding layer tree. A browser page is actually made up of many layers, which are stacked together to compose the final page.

This is because pages contain many complex effects, such as 3D transforms, page scrolling, or z-axis ordering with z-index, and layers make these effects easier to implement.

Not every node in the layout tree gets its own layer. A node without its own layer belongs to the layer of its parent node.

Usually an element that meets either of the following two conditions is promoted to its own layer.

1. Elements with stacking-context properties are promoted to a separate layer: elements with an explicit position, elements with opacity, elements using a CSS filter, and so on all have stacking-context properties.

2. Places where content needs to be clipped (overflow) also get their own layer.

In Chrome DevTools, More options > More tools > Layers shows how the page is layered.
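The two rules can be turned into a rough predicate. This follows the rules exactly as stated in this article and nothing more; real engines use more detailed criteria, and the style object shape here is an assumption.

```javascript
// Simplified check based on the two rules above; not an engine's actual logic.
function needsOwnLayer(style) {
  // Rule 1: properties that give the element stacking-context behavior.
  const positioned = style.position !== undefined && style.position !== "static";
  const translucent = style.opacity !== undefined && Number(style.opacity) < 1;
  const filtered = style.filter !== undefined && style.filter !== "none";
  // Rule 2: content that has to be clipped.
  const clips = style.overflow !== undefined && style.overflow !== "visible";
  return positioned || translucent || filtered || clips;
}

console.log(needsOwnLayer({ position: "absolute" })); // true
console.log(needsOwnLayer({ opacity: "0.5" }));       // true
console.log(needsOwnLayer({ overflow: "hidden" }));   // true
console.log(needsOwnLayer({ color: "red" }));         // false
```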

`layer drawing`

Once the layer tree is built, it is finally time to draw each layer. First, each layer is broken down into individual drawing instructions, which are arranged into a draw list. In the Layers panel of DevTools mentioned above, click the profiler in the details view to see the draw list.

At this point the main thread of the rendering process (the GUI rendering thread) has finished all its tasks and hands over to the compositing thread in the rendering process.

The compositing thread then splits the layers into tiles and converts the tiles into bitmaps (rasterization).

With that, the rendering process has done its work. The generated bitmaps are handed to the browser process and finally displayed on screen.
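Tiling itself is simple arithmetic, as the following sketch shows. The 256-pixel tile size and the 1024 x 600 surface are assumptions picked for the example; actual tile sizes are chosen by the browser.

```javascript
// Split a surface of width x height pixels into tiles of at most tileSize x tileSize.
function splitIntoTiles(width, height, tileSize = 256) {
  const tiles = [];
  for (let y = 0; y < height; y += tileSize) {
    for (let x = 0; x < width; x += tileSize) {
      tiles.push({
        x,
        y,
        // Tiles on the right and bottom edges may be smaller than tileSize.
        width: Math.min(tileSize, width - x),
        height: Math.min(tileSize, height - y),
      });
    }
  }
  return tiles;
}

const tiles = splitIntoTiles(1024, 600);
console.log(tiles.length);            // 12 (4 columns x 3 rows)
console.log(tiles[tiles.length - 1]); // the last tile is only 88 pixels tall
```

Working in tiles lets the tiles that are actually visible be rasterized first.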

`Performance optimization, what else can be done`

This article does not focus on performance optimization, but only supplements some common methods under this proposition.

`Pre-parse, pre-render`

Besides the preload and prefetch mentioned above, you can also use DNS Prefetch, Preconnect, and Prerender:

DNS Prefetch: DNS pre-resolution;

 <link rel="dns-prefetch" href="//fonts.googleapis.com">

preconnect: set up the connection before an HTTP request is actually sent to the server, including DNS resolution, the TCP handshake, and TLS negotiation;
```
<link href="https://cdn.domain.com" rel="preconnect" crossorigin>
```
Prerender: fetch all the resources of the next page and render the entire page when the browser is idle;
```
<link rel="prerender" href="https://www.keycdn.com">
```
Reduce reflows and repaints

Reflow means the browser has to redo style calculation, layout, layering, and drawing; it is also called relayout;

Actions that trigger reflow:

Add or remove visible DOM elements
The position of the element changes
The size of the element changes
Browser window size changes

A repaint only redraws pixels; it is triggered when an element's style changes without affecting layout.

Reflow = style calculation + layout + layering + drawing; repaint = drawing. Reflow therefore has the greater impact on performance.

So reflows and repaints should be avoided where possible. For example, use GPU-accelerated properties for style changes: changes to transform, opacity, and filter are not handled on the main thread, so they trigger neither a repaint nor a reflow.
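The cost difference can be summarized as a lookup from "which property changed" to "which stages run again". The stage lists follow this article's own formulas, and the property tables are a small hand-picked sample rather than an exhaustive or authoritative list.

```javascript
// Assumption: small illustrative property tables, not a complete classification.
const LAYOUT_PROPS = new Set(["width", "height", "top", "left", "margin", "padding", "display"]);
const GPU_PROPS = new Set(["transform", "opacity", "filter"]);

function stagesRerun(prop) {
  // Geometry changes: reflow = style calculation + layout + layering + drawing.
  if (LAYOUT_PROPS.has(prop)) return ["style", "layout", "layer", "paint"];
  // Handled off the main thread: no reflow and no repaint.
  if (GPU_PROPS.has(prop)) return [];
  // Everything else in this model (e.g. color): repaint = drawing only.
  return ["paint"];
}

console.log(stagesRerun("width"));     // [ 'style', 'layout', 'layer', 'paint' ]
console.log(stagesRerun("color"));     // [ 'paint' ]
console.log(stagesRerun("transform")); // []
```

In practice this is why moving an element with transform is usually preferred over changing top or left.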

`Epilogue`

Having gone through the whole "URL input to rendering" process, we can go back to the trickier questions from the beginning and find the answers in the text without much trouble:

After parsing the input, the browser completes it into a full URL. Parameters are percent-encoded from their UTF-8 bytes, which is what the encodeURI and encodeURIComponent functions we use in development do: encodeURI encodes a complete URL, while encodeURIComponent encodes a single URL parameter and is therefore stricter;
The browser's disk cache and memory cache are read from disk and from memory respectively. Usually, refreshing the page reads from memory, while closing the tab and reopening it reads from disk;
prefetch loads resources needed by later pages with low priority during idle time, while preload loads resources needed by the current page ahead of time with high priority;
A script with async is downloaded asynchronously and executed as soon as the download finishes; a script with defer is also downloaded asynchronously but executed only after HTML parsing is complete;
The TCP handshake takes three steps to confirm the client is alive and avoid wasting server resources. Closing takes four waves because TCP is full-duplex, and each direction needs its own close request and acknowledgment;
The HTTPS handshake negotiates a symmetric key. The two parties exchange three random numbers in total and use them to compute a key known only to the two of them. The actual communication is encrypted with this key;

If this article is helpful to you, please give me a like ~ this is very important to me