
Foreword

As a front-end developer, the thing I deal with most in my daily work (or one of them) is the browser, at once familiar and unfamiliar. Familiar, because every day I write business code on top of what the browser exposes at the application level. Unfamiliar, because like many people I have never been that clear about how it works internally: how does JavaScript actually run? How does a beautifully styled page get rendered onto the screen? On the open Internet, how does it keep our personal information secure? With all of these questions in mind, I started listening to Li Bing's course "Browser Working Principle and Practice", and I have to say his material is clear and easy to understand.

Browser working principle and practice

Chrome architecture: why are there 4 processes when only 1 page is open

The difference between threads and processes: multiple threads can process tasks in parallel, but a thread cannot exist on its own; threads are started and managed by a process. A process is a running instance of a program.

The relationship between threads and processes: 1. An error in any thread of a process can crash the entire process. 2. Threads share the data of their process. 3. When a process is closed, the operating system reclaims the memory it occupied. 4. The contents of different processes are isolated from each other.

Single-process browser: 1. Unstable: a crash in a plugin or in the rendering thread brings down the entire browser. 2. Not smooth: a script (e.g. an infinite loop) or a plugin can freeze the whole browser. 3. Not safe: plugins and scripts can access arbitrary operating system resources.

Multi-process browser: 1. Solves instability: processes are isolated from each other, so when a page or plugin crashes, only that page or plugin is affected and other pages keep running. 2. Solves unresponsiveness: a script only blocks the rendering process of its own page and does not affect other pages. 3. Solves insecurity: the multi-process architecture can use a sandbox. The sandbox can be thought of as a lock the operating system puts on the process: the sandboxed program can run, but it cannot write any data to the hard disk or read any data from sensitive locations.

Multi-process architecture : divided into browser process, rendering process, GPU process, network process, and plug-in process.

Disadvantages: 1. Higher resource usage. 2. A more complex system architecture.

Service-oriented architecture: refactor the original modules into independent services. Each service runs in its own process, services are accessed only through well-defined interfaces, and they communicate via IPC. This makes the system more cohesive, loosely coupled, and easier to maintain and extend.

TCP protocol: how to ensure that the page file can be fully delivered to the browser

  • The IP header is the information at the beginning of the IP data packet, including IP version, source IP address, destination IP address, time-to-live and other information;
  • In addition to the destination port, the UDP header also includes information such as the source port number;
  • IP is responsible for delivering the data packet to the destination host;
  • UDP is responsible for delivering data packets to specific applications;
  • For erroneous data packets, UDP does not provide a retransmission mechanism; it simply discards the current packet. It cannot guarantee data reliability, but its transmission speed is very fast;
  • In addition to the destination and source port numbers, the TCP header also provides a sequence number for ordering, which ensures that the data arrives complete. A TCP connection goes through three phases: connection establishment, data transmission and disconnection;
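
As a purely conceptual sketch of this layering (the field names are simplified ones made up for illustration, not real packet parsing), data handed to UDP is wrapped with port numbers, and the UDP packet is in turn wrapped with an IP header carrying the addresses:

function udpWrap(payload, srcPort, dstPort) {
  // UDP header: the port numbers identify which application the data belongs to
  return { srcPort, dstPort, payload }
}
function ipWrap(udpPacket, srcIP, dstIP) {
  // IP header: version, time-to-live and the addresses that identify the hosts
  return { version: 4, ttl: 64, srcIP, dstIP, payload: udpPacket }
}
const packet = ipWrap(udpWrap('page data', 53000, 8080), '192.168.0.2', '93.184.216.34')
console.log(packet) // IP delivers the packet to the host, UDP delivers it to the application on port 8080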

HTTP request flow: why many sites open much faster the second time

  • An HTTP request in the browser goes through the following eight stages from initiation to completion: constructing the request, looking up the cache, preparing the IP and port, waiting in the TCP queue, establishing the TCP connection, sending the HTTP request, the server processing the request, the server returning the response, and disconnecting;
  • Build request. The browser constructs the request line, and after the construction is completed, it is ready to initiate a network request;
  • Look up the cache. Before actually initiating the network request, the browser checks whether there is a copy of the requested resource in the cache; if there is, it intercepts the request and returns the cached copy, otherwise it proceeds to the network request;
  • Prepare the IP address and port. The HTTP network request needs to establish a TCP connection with the server, and the establishment of the TCP connection needs to prepare the IP address and port number. The browser needs to request the DNS to return the IP corresponding to the domain name, and the domain name resolution result will be cached for the next query.
  • Wait in the TCP queue. Chrome allows at most 6 TCP connections to the same domain at the same time; additional requests wait in a queue;
  • Establish a TCP connection. TCP establishes the connection with a "three-way handshake", transmits the data, and disconnects with a "four-way wave";
  • Send an HTTP request. After the TCP connection is established, the browser can perform HTTP data transmission with the server. First, it will send the request line to the server, and then send some other information in the form of request headers. If it is a POST request, it will also send the request body;
  • The server processes the request. The server first returns the response line, then sends the response headers and response body to the browser. Usually the TCP connection is closed once the server has returned the data; if the request or response headers contain Connection: keep-alive, the TCP connection stays open;

Navigation flow: what happens from entering the URL to displaying the page

  • User enters URL and hits enter
  • The browser process examines the URL, assembles the protocol, and forms the full URL
  • The browser process sends the URL request to the network process through process communication (IPC)
  • After the network process receives the URL request, it checks whether the local cache has cached the requested resource, and if so, returns the resource to the browser process
  • If not, the network process initiates an http request (network request) to the web server. The request process is as follows:

    • Perform DNS resolution to obtain server IP address, port
    • Establish tcp connection with IP address and server
    • Build request headers
    • send request headers
    • After the server responds, the network process receives the response header and response information, and parses the response content
  • Network process parsing response flow:

    • Check the status code. If it is 301/302, a redirect is needed: the new address is read from the Location header automatically and the flow repeats from step 4; if it is 200, the request continues to be processed
    • 200 response processing: check the response type Content-Type. If it is a byte stream type, the request is handed to the download manager, the navigation ends, and no rendering follows. If it is html, the browser process is notified to prepare a rendering process for rendering
  • Prepare the rendering process

    • The browser process checks whether the current URL belongs to the same site (same root domain name) as the page already opened in a rendering process. If it does, that process is reused; if not, a new rendering process is started.
  • Transfer data, update status

    • Once the rendering process is ready, the browser process sends a "commit document" message to the rendering process; on receiving it, the rendering process and the network process establish a "pipeline" for transferring the document data
    • After the rendering process receives the data, it sends a "confirm commit" message back to the browser process
    • After the browser process receives the confirmation, it updates the browser UI state: the security indicator, the address bar URL, the forward/backward history state, and the web page itself

Rendering Process (Part 1): How HTML, CSS, and JavaScript become pages

  • Browsers cannot directly understand HTML data and need to convert it into a DOM tree structure;
  • After the DOM tree is generated, according to the CSS style sheet, the styles of all nodes in the DOM tree are calculated;
  • Create a layout tree: traverse all visible nodes in the DOM tree and add them to the layout tree, ignoring invisible nodes such as everything under the head tag and elements with display: none;

Rendering Process (Part 2): How HTML, CSS, and JavaScript become pages

  • Layering: Elements with stacking context attributes (such as positioning attribute elements, transparent attribute elements, CSS filter attribute elements) are promoted to a separate layer, and the places that need to be clipped (such as scroll bars) are also created as layers;
  • Layer drawing: After completing the construction of the layer tree, the rendering engine will draw each layer of the layer tree, split a layer into small drawing instructions, and then form a drawing list with the instructions in sequence;
  • In some cases, the layer is very large, and it is too expensive to draw all the layer contents at one time, and the composition thread will divide the layer into tiles (256x256 or 512x512);
  • The compositing thread submits the tiles to the raster thread for rasterization, converting the tiles to bitmaps. The rasterization process will use GPU acceleration, and the generated bitmap is stored in GPU memory;
  • Once all tiles are rasterized, the compositing thread generates a draw-tile command (DrawQuad) and submits it to the browser process. The viz component receives the command, draws the page content into memory, and finally displays it on the screen;
  • Reflow: modifying an element's geometric properties through JavaScript or CSS triggers a relayout and re-runs the subsequent sub-stages; Repaint: changes that do not affect layout skip it and go directly to the paint and later sub-stages; Compositing: changes that skip both the layout and paint stages are handled on the compositor thread rather than the main thread;
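
A rough sketch of which kind of change tends to trigger which path (assuming an element with id box exists on the page; exact behavior can vary by browser):

const box = document.getElementById('box')
box.style.width = '300px'                     // geometric change: triggers reflow (layout + all later sub-stages)
box.style.backgroundColor = 'red'             // color change: triggers repaint only, layout is skipped
box.style.transform = 'translate(100px, 0)'   // transform: can be handled by the compositor thread alone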

Variable hoisting: is JavaScript code executed sequentially?

  • JavaScript code needs to be compiled before execution. During the compilation phase, variables and functions will be stored in the variable environment, and the default value of variables will be set to undefined;
  • During the code execution phase, the JavaScript engine looks for custom variables and functions from the variable environment;
  • If two functions with the same name are declared, the variable environment keeps only the last one: the definition that appears later overrides the one defined earlier;
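
For example, var declarations and function declarations are hoisted during compilation, and the later function definition wins:

console.log(num)   // undefined: the declaration of num is hoisted, the assignment is not
showName()         // "second": the later definition of showName overrides the earlier one
var num = 1
function showName() { console.log('first') }
function showName() { console.log('second') }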

The call stack: why JavaScript code overflows the stack

  • Every time a function is called, the JavaScript engine creates an execution context and pushes it onto the call stack, and then the JavaScript engine starts executing the function code.
  • If a function A calls another function B, the JavaScript engine creates an execution context for function B and pushes the execution context of function B onto the top of the stack.
  • After the current function is executed, the JavaScript engine will pop the function's execution context off the stack.
  • When the allocated call stack space is full, a "stack overflow" problem occurs.
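
A minimal way to see this: a recursive function with no termination condition keeps pushing execution contexts until the call stack is exhausted:

function runForever() {
  return runForever()   // every call pushes another execution context onto the call stack
}
// runForever()          // RangeError: Maximum call stack size exceeded (stack overflow)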

Block Scope: The var pitfall and why let and const were introduced

  • Variables declared with let and const do not get var-style hoisting: after compilation, the JavaScript engine stores them in the lexical environment instead of the variable environment.
  • Block-level scope works by storing let and const variables in a separate area of the lexical environment while the code executes. The lexical environment maintains a small stack structure: the variables of a block scope are pushed onto the top of the stack when execution enters the block and popped off when the block finishes executing.
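
For example:

function foo() {
  var a = 1
  if (true) {
    var a = 2        // var has no block scope: this overwrites the outer a
    let b = 3        // b lives only in this block's area of the lexical environment
    console.log(b)   // 3
  }
  console.log(a)     // 2
  // console.log(b)  // ReferenceError: b is popped off the lexical environment when the block ends
}
foo()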

Scope chain and closures: when the same variable name appears in the code, how does the JavaScript engine choose?

  • When a variable is used, the JavaScript engine first looks for it in the current execution context; if it is not found there, it continues searching in the execution context pointed to by outer (the reference from an execution context to its outer execution context);
  • In the JavaScript execution process, the scope chain is determined by the lexical scope, and the lexical scope is determined by the position of the function declaration in the code;
  • According to the rules of lexical scoping, an inner function can always access the variables declared in its outer function. When an inner function is returned from a call to the outer function, then even after the outer function has finished executing, the outer-function variables the inner function references are kept in memory; this collection of variables is called a closure;
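
A classic closure:

function outer() {
  let count = 0                // referenced by the inner function, so it is kept alive in a closure
  return function inner() {
    count += 1                 // inner can still read and update outer's variable
    return count
  }
}
const counter = outer()        // outer has finished executing...
console.log(counter())         // 1  ...but count lives on in the closure
console.log(counter())         // 2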

this: this from the perspective of JavaScript execution context

When executing new CreateObj, the JavaScript engine does four things:

  • First, create an empty object tempObj;
  • Then call CreateObj.call with tempObj as the argument, so that when the execution context of CreateObj is created, its this points to the tempObj object;
  • Then execute the CreateObj function. At this time, the this in the execution context of the CreateObj function points to the tempObj object;
  • Finally, the tempObj object is returned.
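
A rough hand-written equivalent of these four steps (a sketch only; myNew is a made-up helper, and Object.create is used here to also link the prototype, which the real new operator does as well):

function CreateObj(name) {
  this.name = name
}
function myNew(Ctor, ...args) {
  const tempObj = Object.create(Ctor.prototype)      // 1. create an (empty) object
  const result = Ctor.call(tempObj, ...args)         // 2 & 3. call the constructor with this = tempObj
  return result instanceof Object ? result : tempObj // 4. return tempObj (unless the constructor returned an object)
}
console.log(myNew(CreateObj, 'browser').name)        // "browser"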

How this is determined in different call situations:

  • When a function is called as a method of an object, this inside the function is that object;
  • When the function is called normally, in strict mode, this value is undefined, in non-strict mode this points to the global object window;
  • This in a nested function will not inherit the this value of the outer function;
  • Arrow functions do not have their own execution context, this is the this of the outer function.
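
The rules above, shown in code:

const obj = {
  name: 'obj',
  show() { console.log(this === obj) },              // called as obj's method: this is obj
  nested() {
    function inner() { console.log(this === obj) }   // a nested plain function does NOT inherit this
    const arrow = () => console.log(this === obj)    // an arrow function uses the this of nested()
    inner()   // false (this is window in non-strict mode, undefined in strict mode)
    arrow()   // true
  }
}
obj.show()    // true
obj.nested()
const fn = obj.show
fn()          // false: a normal call, so this is window (non-strict) or undefined (strict)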

Stack space and heap space: how data is stored

Dynamic languages: Languages that require checking data types when using them.
Weakly typed languages: Languages that support implicit conversions.

There are 8 data types in JavaScript, they can be divided into two categories - primitive types and reference types.
Primitive type data is stored on the stack, and reference type data is stored on the heap. Heap data is associated with variables through references.

Understanding closures from a memory perspective: when the JavaScript engine lexically scans an inner function and finds that it references variables of the outer function, it creates a "closure" object in the heap space to hold those variables.
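
Primitive vs. reference assignment in practice:

let a = 'hello'              // primitive value: stored on the stack
let b = a                    // the value itself is copied
b = 'world'
console.log(a)               // "hello" — a is unaffected

let o1 = { text: 'hello' }   // the object lives in the heap; o1 holds a reference on the stack
let o2 = o1                  // only the reference is copied
o2.text = 'world'
console.log(o1.text)         // "world" — both references point to the same heap object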

Garbage Collection: How Garbage Data Is Automatically Reclaimed

  • Data recovery on the stack: the record pointer ESP moves through the call stack; when it moves past an execution context, that context is destroyed;
  • Heap data recovery: V8 engine adopts mark-sweep algorithm;
  • V8 divides the heap into two areas, the new generation and the old generation, handled by the minor garbage collector and the major garbage collector respectively;
  • The minor garbage collector is responsible for garbage collection in the new generation; small objects are allocated in this area (typically 1–8 MB in size) for processing;
  • The new generation uses the Scavenge algorithm: its space is split into two halves, one free and one for storing objects. Live objects in the object area are marked, then copied into the free area and laid out compactly so there is no memory fragmentation. When the copy is complete, the object area is cleared and the roles of the two halves are swapped;
  • Objects that survive two garbage collections in the young generation area are promoted to the old generation area;
  • The major garbage collector is responsible for garbage collection in the old generation, which holds large objects and long-lived objects;
  • The old generation area adopts the mark-sweep algorithm to collect garbage: starting from the root element, recursively, the reachable element is the active element, otherwise it is garbage data;
  • The mark-sweep algorithm leaves behind a large number of discontinuous memory fragments, so the mark-compact algorithm is also used: it moves all surviving objects to one end and then clears the memory beyond the boundary;
  • To reduce the pauses caused by old-generation garbage collection, V8 splits the marking process into many small steps (incremental marking), allowing garbage collection and JavaScript execution to alternate.

Compilers and interpreters: how V8 executes a piece of JavaScript code

  • Computer languages can be divided into two types: compiled languages and interpreted languages. Compiled languages are compiled by a compiler into machine-readable binary files, such as C/C++ and Go. Interpreted languages are dynamically interpreted and executed by an interpreter while the program runs, such as Python and JavaScript.
  • The compilation process of compiled language: the compiler first performs lexical analysis and syntax analysis on the code, generates an abstract syntax tree (AST), then optimizes the code, and finally generates machine code that the processor can understand;
  • Interpretation process of an interpreted language: the interpreter performs lexical analysis and syntax analysis on the code and produces an abstract syntax tree (AST); it then generates bytecode from the AST and finally executes the program according to the bytecode;
  • AST generation: the first stage is tokenization (lexical analysis), which breaks the source code into tokens, the smallest syntactically indivisible units. The second stage is parsing (syntax analysis), which converts the tokens from the previous step into an AST according to the grammar rules; this stage also checks for syntax errors;
  • Why bytecode exists: converting the AST directly into machine code executes very fast but consumes a lot of memory, so the AST is first converted into bytecode to solve the memory problem;
  • The interpreter Ignition interprets and executes the bytecode, and at the same time collects code execution information; when it finds that some part of the code is hot code (a HotSpot), the compiler converts that hot bytecode into machine code and saves it for the next use;
  • This approach, where bytecode is combined with an interpreter and a compiler, is called just-in-time compilation (JIT).

Message queues and event loops: how pages come to life

  • Each rendering process has a main thread, which handles the DOM, calculates styles, handles layout, JavaScript tasks, and various input events;
  • The main thread maintains a message queue: new tasks (for example from the IO thread) are added to the tail of the queue, and the main thread loops, reading tasks from the head of the queue and executing them;
  • To handle high-priority work: the tasks in the message queue are called macrotasks, and each macrotask contains a microtask queue. During the execution of a macrotask, if the DOM changes, the change is added to the microtask queue;
  • To keep a single task from running too long: JavaScript avoids this through callback functions, deferring work to later tasks.
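
A toy sketch of the message-loop idea (purely illustrative; the real loop lives in the rendering process's native code, not in JavaScript):

const messageQueue = []
function postTask(task) { messageQueue.push(task) }   // other threads (IO, timers, ...) enqueue at the tail
function mainLoop() {
  while (messageQueue.length > 0) {
    const task = messageQueue.shift()                 // the main thread takes the task at the head
    task()                                            // and executes it
  }
}
postTask(() => console.log('parse HTML'))
postTask(() => console.log('handle user input'))
mainLoop()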

WebAPI: how setTimeout is implemented

  • When JavaScript calls setTimeout to register a callback, the rendering process creates a callback task and puts the timer task into a delayed execution queue;
  • When the timer task expires, it will be taken out from the delay queue and executed;
  • If the currently running task takes too long, it delays the execution of timer tasks that have already expired;
  • If setTimeout calls are nested more than 5 levels deep, the function is judged to be blocking and the system clamps the minimum interval to 4 milliseconds;
  • For inactive pages, the minimum setTimeout interval is 1000 milliseconds, to reduce resource consumption;
  • The maximum delay is about 24.8 days, because the delay value is stored as a 32-bit number;
  • This in the callback function set by setTimeout points to the global window.
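
Two of these behaviors are easy to observe in a browser console:

// this in a setTimeout callback (a normal function) points to the global window object
setTimeout(function () {
  console.log(this === window)   // true
}, 0)

// nested setTimeout calls deeper than 5 levels are clamped to a minimum of about 4 ms
let last = performance.now()
let depth = 0
function tick() {
  const now = performance.now()
  console.log(`depth ${depth}: ${(now - last).toFixed(2)} ms since the previous call`)
  last = now
  if (depth++ < 10) setTimeout(tick, 0)
}
setTimeout(tick, 0)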

WebAPI: how is XMLHttpRequest implemented?

  • XMLHttpRequest onreadystatechange processing flow: uninitialized -> OPENED -> HEADERS_RECEIVED -> LOADING -> DONE;
  • The rendering process will send the request to the network process, and then the network process is responsible for the resource download. After the network process receives the data, it uses IPC to notify the rendering process;
  • After the rendering process receives the message, it will encapsulate the xhr callback function into a task and add it to the message queue. When the main thread loop system executes the task, it will call the callback function according to the relevant status.
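
Typical XMLHttpRequest usage; the URL here is only a placeholder:

const xhr = new XMLHttpRequest()
xhr.onreadystatechange = function () {
  // readyState: 0 UNSENT -> 1 OPENED -> 2 HEADERS_RECEIVED -> 3 LOADING -> 4 DONE
  if (xhr.readyState === XMLHttpRequest.DONE && xhr.status === 200) {
    console.log(xhr.responseText)   // this callback runs as a task on the main thread's message queue
  }
}
xhr.open('GET', '/api/data')        // placeholder URL
xhr.send()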

Macrotasks and microtasks: not all tasks are treated equally

  • The tasks in the message queue are macro tasks. The rendering process will maintain multiple message queues, such as delayed execution queues and ordinary message queues. The main thread uses a for loop to continuously take tasks from these task queues and execute them;
  • A microtask is a function that needs to be executed asynchronously. The execution timing is after the execution of the main function and before the end of the current macro task;
  • When V8 executes a javascript script, it will create a global execution context and a microtask queue;
  • Micro-tasks generated during the execution of micro-tasks will not be postponed to the next macro-task, but will continue to be executed in the current macro-task;
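
The ordering can be verified with a small test:

console.log('script start')                      // part of the current macrotask
setTimeout(() => console.log('setTimeout'), 0)   // a new macrotask: runs after the current one finishes
Promise.resolve()
  .then(() => console.log('microtask 1'))        // microtasks run before the current macrotask ends
  .then(() => console.log('microtask 2'))        // a microtask queued by a microtask still runs now
console.log('script end')
// Output: script start, script end, microtask 1, microtask 2, setTimeout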

Say goodbye to callbacks with Promises

  • Promises solve the callback-hell problem by eliminating nested calls and repeated handling;
  • A simplified mock of Promise:
function Bromise(executor) {
  var _onResolve = null
  // then only registers the callback; it is invoked later by resolve
  this.then = function (onResolve) {
    _onResolve = onResolve
  }
  // resolve defers the callback with setTimeout so that then() has a chance
  // to register _onResolve before the callback actually runs
  function resolve(value) {
    setTimeout(() => {
      _onResolve(value)
    }, 0)
  }
  executor(resolve, null)
}
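
Usage of this mock: because resolve defers its callback with setTimeout, then() gets to register _onResolve before the value is delivered:

const demo = new Bromise((resolve, reject) => {
  resolve('done')                          // deferred, so the then() below registers first
})
demo.then(value => console.log(value))     // "done"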

async/await: write asynchronous code in a synchronous style

  • A generator function is a function marked with an asterisk (*), and its execution can be paused and resumed;
  • Code inside a generator function runs until the yield keyword is encountered; the JavaScript engine then returns the value after the keyword to the caller and pauses the function's execution;
  • The external code can call the generator's next method to resume the function's execution;
  • A coroutine is a more lightweight existence than a thread. A coroutine can be regarded as a task running on a thread. A thread can have multiple coroutines, but only one coroutine can be executed at the same time. If the A coroutine starts B coroutine, A is B's parent coroutine;
  • Coroutines are not managed by the operating system kernel but are controlled entirely by the program itself, which improves performance;
  • await xxx will create a Promise object and submit the xxx task to the microtask queue;
  • The current coroutine is paused, control of the main thread is handed back to the parent coroutine, and the Promise object is returned to the parent coroutine, which continues executing;
  • Before the parent coroutine finishes, the microtask queue is checked; the pending resolve(xxx) in the microtask queue is executed, triggering the then callback;
  • After the callback is triggered, control of the main thread is handed back to the child coroutine, which continues executing the statements after await and returns control to the parent coroutine when it is done.
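
A small generator example showing how yield pauses and next resumes execution, which is the machinery async/await is built on:

function* genDemo() {
  console.log('start')
  const reply = yield 1              // pause here and hand 1 to the caller
  console.log('resumed with', reply)
  yield 2
  console.log('end')
}
const gen = genDemo()                // creates the coroutine; nothing runs yet
console.log(gen.next())              // logs "start", then { value: 1, done: false }
console.log(gen.next('hello'))       // logs "resumed with hello", then { value: 2, done: false }
console.log(gen.next())              // logs "end", then { value: undefined, done: true }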

Page performance analysis: using Chrome to do web performance analysis

  • Chrome Developer Tools (DevTools for short) is a set of web page authoring and debugging tools embedded in the Google Chrome browser. It contains 10 function panels: Elements, Console, Sources, Network, Performance, Memory, Application, Security, Audits and Layers.

DOM tree: How JavaScript affects DOM tree construction

  • The HTML parser (HTMLParser) is responsible for converting the HTML byte stream into a DOM structure;
  • The HTML parser does not wait for the entire document to be loaded before parsing, but parses as much data as the network process loads;
  • Converting the byte stream to a DOM has three stages: 1. Convert the byte stream into Tokens; 2. Maintain a Token stack: a StartTag Token is pushed onto the stack and the matching EndTag Token pops it off; 3. Create a DOM node for each Token;
  • Both JavaScript files and CSS stylesheet files block DOM parsing;

Rendering pipeline: How does CSS affect the white screen time on first load?

  • If the CSS file has not finished downloading when DOM construction completes, the rendering pipeline sits idle, because the next step, building the layout tree, requires both the CSSOM and the DOM; so it has to wait for the CSS to be downloaded and parsed into the CSSOM;
  • CSSOM has two functions: providing JavaScript with the ability to manipulate style sheets, and providing basic style information for the composition of the layout tree;
  • Before executing a JavaScript script, if the page contains references to external CSS files or inline CSS in style tags, the rendering engine first needs to convert that CSS into the CSSOM, because JavaScript can modify the CSSOM, so the CSSOM must exist before JavaScript runs. In other words, CSS can also block DOM generation in some cases.

Layering and Composition: Why CSS animations are more efficient than JavaScript

  • A typical display has a fixed refresh rate of 60 Hz, i.e. it updates 60 images per second, and the images come from the graphics card's front buffer;
  • The graphics card's job is to compose a new image, save it into the back buffer, and then swap the back buffer with the front buffer. If the graphics card's update frequency is out of step with the display's refresh rate, visible stuttering occurs;
  • Each image produced by the rendering pipeline is called a frame, and there are three ways to generate a frame: reflow (relayout), repaint, and compositing;
  • Reflow recalculates the layout tree from the CSSOM and DOM; repaint has no relayout phase;
  • After the layout tree is generated, the rendering engine converts it into a layer tree according to the characteristics of the layout tree, and generates a draw list for each layer;
  • The raster thread generates a bitmap for each layer according to the instructions in its draw list; the compositing thread then combines these bitmaps into one image and sends it to the back buffer;
  • The compositing thread will divide each layer into fixed-sized tiles, and the tiles close to the viewport will be drawn first;

Page Performance: How to Systemically Optimize Your Pages

  • Loading stage: reduce the number of key resources, reduce the size of key resources, and reduce the number of RTTs of key resources;
  • Interaction phase: reduce the execution time of JavaScript scripts; avoid forced synchronous layout (triggered when layout information is read right after the DOM is modified); avoid layout thrashing (forcing layout repeatedly in a loop); use CSS compositor-driven animations where possible (mark them with will-change); avoid frequent garbage collection (see the sketch after this list);
  • CSS effects such as transforms, gradients and animations that are triggered by CSS and executed on the compositing thread are called compositing, and they do not trigger reflow or repaint;
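
A sketch of the forced-synchronous-layout / layout-thrashing point above (it assumes the page has elements matching .item):

const items = document.querySelectorAll('.item')

// Bad: reading offsetHeight right after a style write forces a synchronous layout on every iteration
items.forEach(item => {
  item.style.height = '20px'
  console.log(item.offsetHeight)     // forced synchronous layout, repeated in a loop = layout thrashing
})

// Better: batch the reads first, then the writes, so layout is recalculated only once
const heights = Array.from(items, item => item.offsetHeight)
items.forEach((item, i) => { item.style.height = heights[i] / 2 + 'px' })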

Virtual DOM: How virtual DOM differs from real DOM

  • When there is data update, React will generate a new virtual DOM, and then compare the new virtual DOM with the previous virtual DOM. This process finds the changed nodes, and then applies the changed nodes to the DOM;
  • In the beginning, the process of comparing two DOMs was performed in a recursive function whose core algorithm was reconciliation. Usually, this comparison process is executed quickly, but when the virtual DOM is complex, the execution of the comparison function may take a long time on the main thread, which will cause other tasks to wait and cause the page to freeze. The React team rewrote the reconciliation algorithm, called Fiber reconciler, and the old algorithm was called Stack reconciler;

PWA: what problems does it solve for web applications

  • PWA (Progressive Web App): a progressive enhancement plan that lets an ordinary site evolve into a web application, reducing the cost of transforming the site and adopting new technologies gradually instead of all at once;
  • PWA introduces the Service Worker to address offline caching and message push, and introduces manifest.json to give the app a first-level entry point (e.g. on the home screen);
  • Once the Service Worker is installed, when the WebApp requests a resource the request first goes through the Service Worker, which decides whether to return a resource it has cached or to fetch it from the network again; all control is handed to the Service Worker;
  • In the current Chrome architecture, Service Worker runs in the browser process. Because the browser process has the longest life cycle, it can provide services to all pages during the browser life cycle;
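
A minimal sketch of the Service Worker idea: the page registers a worker script, and the worker's fetch handler decides between the cache and the network (the file name sw.js is a placeholder):

// In the page: register the Service Worker (sw.js is a placeholder path)
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js')
}

// In sw.js: intercept requests, prefer the Service Worker cache, fall back to the network
self.addEventListener('fetch', event => {
  event.respondWith(
    caches.match(event.request).then(cached => cached || fetch(event.request))
  )
})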

WebComponent: Build web applications like building blocks

  • The global nature of CSS hinders componentization, and the DOM is also an obstacle, because there is only one DOM in the page and it can be read and modified directly from anywhere;
  • WebComponent provides the ability to encapsulate a local view, letting DOM, CSSOM and JavaScript run in a local environment;
  • With the template element you create a template, get the template's content, create a shadow DOM, and append the template content to the shadow DOM;
  • Shadow DOM can isolate global CSS and DOM, but JavaScript will not be isolated;
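
A minimal custom element combining template and shadow DOM (the element name my-card and template id card-tpl are made-up examples):

// Assumes the page contains: <template id="card-tpl"><style>p { color: red }</style><p>hello</p></template>
class MyCard extends HTMLElement {
  constructor() {
    super()
    const tpl = document.getElementById('card-tpl')       // find the template's content
    const shadow = this.attachShadow({ mode: 'open' })     // create a shadow DOM
    shadow.appendChild(tpl.content.cloneNode(true))        // add the template content to the shadow DOM
    // the <style> above only affects this shadow tree; global CSS does not leak in
  }
}
customElements.define('my-card', MyCard)                   // then usable in HTML as <my-card></my-card>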

HTTP1: HTTP1 performance optimization

  • HTTP/0.9 is based on the TCP protocol, establishes a connection through three-way handshake, and sends a GET request line (without request header and request body). After the server receives the request, it reads the corresponding HTML file, the data is returned in ASCII character stream, and the connection is disconnected after the transmission is completed;
  • HTTP/1.0 adds request headers and response headers for negotiation: when initiating a request, the request headers tell the server what file type, compression format, language and encoding the client expects in return; it also introduces status codes, a cache mechanism, and so on;
  • HTTP/1.1 improves persistent connections, removing much of the overhead of repeatedly establishing TCP connections, transmitting data and disconnecting; multiple HTTP requests can be transmitted over one TCP connection. Currently browsers allow up to 6 persistent TCP connections to be established for one domain at the same time;
  • HTTP/1.1 introduces chunked transfer to support dynamically generated content: the server splits the data into chunks of arbitrary size, each chunk is sent prefixed with its length, and a zero-length chunk marks the end of the data. Without chunked transfer, the complete data size has to be set in the response header up front, e.g. via Content-Length.

HTTP2: How to increase network speed

  • The main problems of HTTP/1.1: TCP slow start; multiple TCP connections opened at the same time compete for the fixed bandwidth; and head-of-line blocking;
  • HTTP/2 uses only one long-lived TCP connection per domain and eliminates head-of-line blocking at the HTTP level;
  • Implementation of multiplexing: HTTP/2 adds a binary framing layer, which splits request and response data into binary frames tagged with a request ID. After the server or browser receives the frames, frames with the same ID are reassembled into a complete message;
  • Set request priority: The request priority can be set when sending a request, and the server can process it first;
  • Server push: requesting an HTML page, the server can know which JavaScript and CSS files are referenced, and send them to the browser together;
  • Header compression: compress request headers and response headers;

HTTP3: shed the burden of TCP and TLS and build an efficient network

  • Although HTTP/2 solves head-of-line blocking at the application level, like HTTP/1.1 it is still based on the TCP protocol, and TCP was originally designed around a single connection;
  • TCP can be regarded as a virtual pipe between computers: data sent from one end to the other is split into packets that must arrive in order, so if a packet is lost during transmission due to network failure or other reasons, the entire connection stalls until that packet is retransmitted;
  • Since the TCP protocol has become rigid and deploying a brand-new transport protocol is impractical, HTTP/3 takes a compromise approach: it builds multiplexing, reliable transmission and other features on top of the existing UDP protocol; this is the QUIC protocol;
  • QUIC implements TCP-like flow control and reliable transmission, integrates the TLS encryption function, and implements multiplexing;

Same Origin Policy: Why XMLHttpRequest Can't Cross-Origin Requests

  • URLs with the same protocol, domain name and port number are of the same origin;
  • The same-origin policy isolates DOM, page data, and network communications from different origins;
  • A page can reference third-party resources, but this exposes it to problems such as XSS, so the Content Security Policy (CSP) was introduced to restrict it;
  • By default, XMLHttpRequest and Fetch cannot request resources across sites, and cross-domain resource sharing (CORS) is introduced for cross-domain access control;
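
For example, a cross-origin fetch (the URL is a placeholder) only succeeds if the server opts in through CORS response headers:

// The browser attaches an Origin header; unless the response carries a matching
// Access-Control-Allow-Origin header, the browser blocks the result from the page.
fetch('https://api.example.com/data')    // placeholder cross-origin URL
  .then(res => res.json())
  .then(data => console.log(data))
  .catch(err => console.log('blocked by CORS or a network error:', err))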

Cross-site scripting (XSS): why cookies have the HttpOnly attribute

  • XSS cross-site scripting, inject malicious code into HTML files to attack users;
  • XSS attacks mainly include stored XSS attacks, reflected XSS attacks and DOM XSS attacks;
  • Preventing XSS attacks: filter or escape script content on the server, use a CSP policy, and set HttpOnly on cookies;

CSRF attack: don't click on unfamiliar links

  • CSRF cross-site request forgery, using the user's login status to attack through third-party sites;
  • Avoiding CSRF attacks: use SameSite cookies (three modes: Strict, Lax, None) so the browser refuses to attach key cookies to requests initiated by third-party sites; verify the request's source site via the Referer and Origin request headers; use a CSRF Token;
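
A sketch of setting such a cookie from a Node.js server (the cookie name and value are placeholders):

const http = require('http')

http.createServer((req, res) => {
  // HttpOnly: the cookie cannot be read from document.cookie (limits XSS cookie theft)
  // SameSite=Strict: the cookie is not sent on requests initiated by third-party sites (limits CSRF)
  res.setHeader('Set-Cookie', 'sid=placeholder-session-id; HttpOnly; Secure; SameSite=Strict')
  res.end('ok')
}).listen(3000)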

Sandbox: the wall between the page and the system

  • The browser is divided into two core parts: the browser kernel and the rendering kernel. The browser kernel is composed of the network process, the browser main process and the GPU process, while the rendering kernel is the rendering process;
  • The security sandbox in the browser uses security technology provided by the operating system to prevent the rendering process from accessing or modifying operating system data while it runs. When the rendering process needs to access system resources, the access has to go through the browser kernel, which then forwards the result back to the rendering process via IPC;
  • Site isolation (Site Isolation) puts the interrelated pages in the same site (including the address of the same root domain name and the same protocol) into the same rendering process for execution;
  • By implementing site isolation, the malicious iframe can be isolated inside the malicious process, so that it cannot continue to access the content of other iframe processes, so it cannot attack other sites;

HTTPS: Make data transmission more secure

  • Insert a security layer between TCP and HTTP, and all data passing through the security layer will be encrypted or decrypted;
  • Symmetric encryption: the browser sends a list of cipher suites and a random number client-random; the server picks a cipher suite from the list, generates its own random number service-random, and returns both to the browser. Now the browser and the server share the same client-random and service-random, and each mixes the two in the same way to produce the key master secret, after which both sides can encrypt the data they transmit;
  • Disadvantage of symmetric encryption: client-random and service-random travel in plaintext, so a hacker can obtain the negotiated cipher suite and both random numbers, generate the same key, and decrypt the data;
  • Asymmetric encryption: the browser sends a list of cipher suites to the server; the server picks a cipher suite and returns it together with its public key; the browser encrypts data with the public key and the server decrypts it with its private key;
  • Disadvantages of asymmetric encryption: encryption is inefficient, and it cannot secure the data the server sends to the browser, because a hacker can also obtain the public key;
  • Symmetric encryption combined with asymmetric encryption: the browser sends the symmetric cipher suite list, the asymmetric cipher suite list and the random number client-random to the server; the server generates service-random, selects the cipher suites and returns them together with its public key; the browser computes a pre-master from client-random and service-random, encrypts the pre-master with the public key and sends it to the server; the server decrypts the pre-master with its private key; both sides then combine client-random, service-random and the pre-master to generate a symmetric key, which is used to encrypt the transmitted data;
  • The introduction of digital certificates is to prove "I am who I am" and prevent DNS hijacking and forging servers;
  • The role of the certificate: one is to prove the identity of the server to the browser, and the other is to contain the server's public key;
  • Digital signature process: the CA applies a hash function to the certificate's plaintext information to obtain a digest, then encrypts the digest with its private key; the encrypted result is the digital signature;
  • Verifying a digital signature: read the certificate's plaintext information and compute digest A with the same hash function, then decrypt the signature with the CA's public key to obtain B; compare A and B, and if they match, the certificate is valid;
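
The digest-sign-verify idea can be sketched with Node's crypto module (a toy illustration that generates its own "CA" key pair; a real certificate chain involves much more):

const crypto = require('crypto')

// A toy "CA" key pair
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 })

const certInfo = 'example.com + server public key + validity period'   // stand-in for the certificate plaintext

// CA side: hash the plaintext and encrypt the digest with the private key -> digital signature
const signature = crypto.sign('sha256', Buffer.from(certInfo), privateKey)

// Browser side: recompute the digest and check it against the signature using the CA's public key
const valid = crypto.verify('sha256', Buffer.from(certInfo), publicKey, signature)
console.log(valid)   // true -> the certificate information has not been tampered with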

Original text

Finally, some generous folks online have compiled the original text of each chapter; it clearly took a lot of effort. It is also collected into a git repo here, and interested readers can check it out for themselves. End~

Browsers from a macro perspective

JavaScript execution mechanism in the browser

How V8 Works

The page looping system in the browser

page in browser

web in browser


wuwhs

Code for work, write for progress!