
As a front-end programmer, the first thing you do at work every day is turn on your computer and, almost by reflex, open the Chrome browser, whether to slack off for a while or to get straight to work. From then on, the browser window keeps you company for the rest of the day, usually until seven or eight in the evening, or nine or ten on a late night. It is a loyal partner that stays with you through all your work, so ask yourself: have you ever seriously tried to understand how it works? Have you ever stepped into its inner world?

If you have ever been curious, then read on. This installment is "Going into the Heart of Chrome and Understanding How the V8 Engine Works".

What is V8

Before digging deep into something, we first need to know what it is.

V8 is Google's open-source, high-performance JavaScript and WebAssembly engine, written in C++. It is used in Chrome, Node.js, and other applications. It implements ECMAScript and WebAssembly, and runs on Windows 7 and above, macOS 10.12+, and Linux systems that use x64, IA-32, ARM, or MIPS processors. V8 can run standalone, or it can be embedded in any C++ application.

Origin of V8

Next, let's look at how it was born and why it has this name.

V8 was originally developed by Lars Bak's team and named after the V8 car engine (an eight-cylinder V-type engine), signaling that it would be a very high-performance JavaScript engine. It was released as open source on September 2, 2008, together with Chrome.

Why do we need V8

The JavaScript code we write ultimately has to be executed by the machine, but the machine cannot directly understand a high-level language. A series of processing steps is needed to convert the high-level language into instructions the machine can recognize, that is, binary machine code, which is then handed to the machine for execution. This intermediate conversion is exactly the job of V8.

Next, let's take a closer look.

V8 composition

First, let's look at the internal composition of V8. V8 has many modules, among which the 4 most important are:

  • Parser: the parser, responsible for parsing the source code into an AST
  • Ignition: the interpreter, responsible for converting the AST into bytecode and executing it, while marking hot code
  • TurboFan: the compiler, responsible for compiling hot code into machine code and executing it
  • Orinoco: the garbage collector, responsible for reclaiming memory space

V8 workflow

The flow chart below shows how several important modules in V8 work together. We will analyze them one by one.

[Figure: V8 workflow]

Parser

The Parser is responsible for converting the source code into an abstract syntax tree (AST). There are two important stages in the conversion process: lexical analysis and syntax analysis.

Lexical analysis

Also called tokenization, this is the process of converting code in the form of a string into a sequence of tokens. Here a token is a string that forms the smallest unit of the source code, similar to a word in English. Lexical analysis can therefore be understood as the process of combining letters into words. It does not care about the relationship between words. For example, brackets are marked as tokens during lexical analysis, but whether they are matched is not checked.

Tokens in JavaScript mainly include the following:

  • Keywords: var, let, const, etc.
  • Identifier: a run of consecutive characters not enclosed in quotation marks, such as a variable name. Reserved words such as if and else and literals such as true and false have the same shape, but tokenizers such as esprima classify them separately
  • Operators: +, -, *, /, etc.
  • Numbers: hexadecimal, decimal, octal, scientific notation, etc.
  • String: a string literal, for example the value assigned to a variable
  • Whitespace: consecutive spaces, line breaks, indentation, etc.
  • Comment: a line comment or block comment, a minimal syntactic unit that cannot be split
  • Punctuation: braces, parentheses, semicolons, colons, etc.

The following are the tokens that esprima generates from lexical analysis of const a = 'hello world':

[
    {
        "type": "Keyword",
        "value": "const"
    },
    {
        "type": "Identifier",
        "value": "a"
    },
    {
        "type": "Punctuator",
        "value": "="
    },
    {
        "type": "String",
        "value": "'hello world'"
    }
]
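
To make the idea concrete, here is a minimal tokenizer sketch. It is a hand-written illustration, not how esprima or V8's scanner is actually implemented. The names tokenize, RULES and KEYWORDS, as well as the reduced sets of keywords, number formats and punctuators, are assumptions made for this example.

```javascript
// Minimal tokenizer sketch (hypothetical; not esprima's or V8's scanner).
const KEYWORDS = new Set(["var", "let", "const", "if", "else", "function", "return"]);

// Rules are tried in order at the current position. The sticky flag (y)
// makes each regex match only at lastIndex, never further ahead.
const RULES = [
  { type: "Whitespace", re: /\s+/y },
  { type: "Comment", re: /\/\/[^\n]*|\/\*[\s\S]*?\*\//y },
  { type: "Numeric", re: /0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/y },
  { type: "String", re: /'(?:[^'\\\n]|\\.)*'|"(?:[^"\\\n]|\\.)*"/y },
  { type: "Identifier", re: /[A-Za-z_$][A-Za-z0-9_$]*/y },
  { type: "Punctuator", re: /===|!==|==|!=|<=|>=|\+=|-=|[{}()\[\];:,.=+\-*\/<>!]/y },
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let matched = false;
    for (const { type, re } of RULES) {
      re.lastIndex = pos;
      const m = re.exec(source);
      if (m === null) continue;
      pos = re.lastIndex;
      matched = true;
      // Whitespace and comments are recognized but not emitted.
      if (type === "Whitespace" || type === "Comment") break;
      const value = m[0];
      // A word that is in the keyword table becomes a Keyword token.
      const finalType = type === "Identifier" && KEYWORDS.has(value) ? "Keyword" : type;
      tokens.push({ type: finalType, value });
      break;
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character '${source[pos]}' at index ${pos}`);
    }
  }
  return tokens;
}

console.log(tokenize("const a = 'hello world'"));
```

For this input the sketch yields the same four tokens as the esprima output above. Note that tokenize("((") also succeeds and returns two Punctuator tokens, because bracket matching is not the tokenizer's job.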

Syntax analysis

Syntax analysis is the process of converting the tokens produced by lexical analysis into an AST according to a given formal grammar. In other words, it is the process of combining words into sentences. The grammar is verified during the conversion, and a syntax error is thrown if the code violates it.

For the const a = 'hello world' example above, the AST generated by syntax analysis is as follows:

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}
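
As a rough illustration of this step, the sketch below turns the token list from the previous section into the AST shown above. It is a deliberately tiny recursive-descent parser that only understands declarations of the form var/let/const name = 'string'. The function names parseProgram, expect and parseVariableDeclaration are made up for this example and do not come from esprima or V8.

```javascript
// Hypothetical recursive-descent sketch: supports only
// `var|let|const <identifier> = <string literal>` declarations.
function parseProgram(tokens) {
  let pos = 0;

  // Consume the next token if it has the expected type (and value),
  // otherwise report a syntax error.
  function expect(type, value) {
    const tok = tokens[pos];
    const ok = tok !== undefined && tok.type === type &&
      (value === undefined || tok.value === value);
    if (!ok) {
      const wanted = value === undefined ? type : `'${value}'`;
      const found = tok === undefined ? "end of input" : `'${tok.value}'`;
      throw new SyntaxError(`Expected ${wanted} but found ${found}`);
    }
    pos += 1;
    return tok;
  }

  function parseVariableDeclaration() {
    const kind = expect("Keyword").value;
    if (kind !== "var" && kind !== "let" && kind !== "const") {
      throw new SyntaxError(`Unsupported keyword '${kind}'`);
    }
    const id = expect("Identifier");
    expect("Punctuator", "=");
    const init = expect("String");
    return {
      type: "VariableDeclaration",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: id.value },
          // Strip the surrounding quotes; escape sequences are not decoded here.
          init: { type: "Literal", value: init.value.slice(1, -1), raw: init.value },
        },
      ],
      kind,
    };
  }

  const body = [];
  while (pos < tokens.length) {
    body.push(parseVariableDeclaration());
  }
  return { type: "Program", body, sourceType: "script" };
}

const tokens = [
  { type: "Keyword", value: "const" },
  { type: "Identifier", value: "a" },
  { type: "Punctuator", value: "=" },
  { type: "String", value: "'hello world'" },
];
console.log(JSON.stringify(parseProgram(tokens), null, 2));
```

If a token is missing, for example the "=", expect throws a SyntaxError, which mirrors the grammar verification described above.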

The AST generated by the Parser is then handed to the Ignition interpreter for processing.

Ignition interpreter

The Ignition interpreter is responsible for converting the AST into bytecode and executing it. Bytecode is an intermediate representation that sits between the AST and machine code. It is not tied to any particular type of machine code, and it has to be converted into machine code by the interpreter before it can be executed.

At this point you probably have a question. Since bytecode also has to be converted into machine code to run, why not convert the AST into machine code in the first place and run that? Machine code would certainly run faster, so why add an intermediate step?

In fact, versions of V8 before 5.9 had no bytecode. They compiled JS code directly into machine code and kept that machine code in memory, which took up a lot of memory. Early phones had little memory, so this excessive usage greatly reduced their performance. Compiling directly into machine code also led to long compilation times and slow startup. Furthermore, converting JS code directly into machine code requires writing different code for different CPU architectures, which makes the implementation very complex.

Version 5.9 introduced bytecode, which addresses the problems above: large memory usage, long startup time, and high code complexity.

Next, let's look at how Ignition converts the AST into bytecode.

The following figure is the workflow of Ignition. The AST first passes through the bytecode generator, and the final bytecode is produced only after a series of optimizations.

[Figure: Ignition workflow]

The optimizations include:

  • Register Optimizer: mainly avoids unnecessary loads and stores of registers
  • Peephole Optimizer: finds reusable parts of the bytecode and merges them
  • Dead-code Elimination: deletes useless code and reduces the size of the bytecode
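
As an example of the last item, here is a minimal sketch of dead-code elimination on a toy bytecode. The instruction format (objects with op, arg and target fields, where target is an instruction index) is an assumption made for this illustration. Only the opcode names are loosely borrowed from Ignition, and this is not how V8 implements the pass.

```javascript
// Toy dead-code elimination: drop instructions that can never be reached.
// Assumption: jump targets are valid instruction indices.
function eliminateDeadCode(code) {
  // 1. Mark every instruction reachable from the entry point (index 0).
  const reachable = new Array(code.length).fill(false);
  const worklist = [0];
  while (worklist.length > 0) {
    const i = worklist.pop();
    if (i >= code.length || reachable[i]) continue;
    reachable[i] = true;
    const { op, target } = code[i];
    if (op === "Return") continue; // nothing runs after a return
    if (op === "Jump") {
      worklist.push(target); // unconditional: only the target can follow
      continue;
    }
    if (op === "JumpIfFalse") worklist.push(target);
    worklist.push(i + 1); // fall through to the next instruction
  }

  // 2. Compute the new index of every instruction that is kept.
  const newIndex = [];
  let next = 0;
  for (let i = 0; i < code.length; i++) {
    newIndex[i] = reachable[i] ? next++ : -1;
  }

  // 3. Keep reachable instructions and remap their jump targets.
  return code
    .filter((_, i) => reachable[i])
    .map((ins) => ("target" in ins ? { ...ins, target: newIndex[ins.target] } : ins));
}

const code = [
  { op: "LdaSmi", arg: 1 },  // 0
  { op: "Jump", target: 3 }, // 1
  { op: "LdaSmi", arg: 2 },  // 2: skipped by the jump, never executed
  { op: "Return" },          // 3
  { op: "LdaSmi", arg: 3 },  // 4: after the return, never executed
];
console.log(eliminateDeadCode(code));
```

In this example, instructions 2 and 4 are removed and the jump is retargeted from index 3 to index 2, so the bytecode shrinks from five instructions to three.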

After the code is converted into bytecode, it can be executed by the interpreter. Ignition will monitor the execution of the code and record the execution information during the execution, such as the number of executions of the function, the parameters passed each time the function is executed, etc.

When the same code is executed multiple times, it will be marked as hot code. The hot code will be handed over to the TurboFan compiler for processing.
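
The sketch below shows the idea with a toy interpreter: it executes a small accumulator-style bytecode and, while doing so, records how often a function runs and which argument types it sees. Everything here is an assumption made for illustration, including the opcode set, the HOT_THRESHOLD value and the shape of the function record. Real V8 uses its own bytecodes and its own heuristics for deciding what is hot.

```javascript
// Toy interpreter with profiling (hypothetical; not Ignition's implementation).
const HOT_THRESHOLD = 3; // made-up value for this sketch

function makeFunction(name, bytecode) {
  return { name, bytecode, callCount: 0, seenArgTypes: new Set(), hot: false };
}

function interpret(fn, args) {
  // Profiling: count executions and remember the argument types of each call.
  fn.callCount += 1;
  fn.seenArgTypes.add(args.map((a) => typeof a).join(","));
  if (fn.callCount >= HOT_THRESHOLD) fn.hot = true;

  // Execution: straight-line bytecode with registers and an accumulator.
  const registers = [];
  let acc;
  for (const [op, operand] of fn.bytecode) {
    switch (op) {
      case "LdaArg": acc = args[operand]; break;          // argument -> accumulator
      case "LdaSmi": acc = operand; break;                // small integer -> accumulator
      case "Star": registers[operand] = acc; break;       // accumulator -> register
      case "Add": acc = registers[operand] + acc; break;  // register + accumulator
      case "Return": return acc;
      default: throw new Error(`Unknown bytecode ${op}`);
    }
  }
  return undefined;
}

// Bytecode for a function equivalent to (a, b) => a + b.
const add = makeFunction("add", [
  ["LdaArg", 0],
  ["Star", 0],
  ["LdaArg", 1],
  ["Add", 0],
  ["Return"],
]);

for (let i = 0; i < 3; i++) interpret(add, [i, 1]);
console.log(add.hot, [...add.seenArgTypes]);
```

After the third call the function record is flagged as hot, and the recorded type feedback says it has only ever been called with two numbers. That feedback is the kind of information an optimizing compiler can build on.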

TurboFan compiler

After TurboFan receives the bytecode that Ignition has marked as hot, it first optimizes it, and then compiles the optimized bytecode into more efficient machine code and stores it. The next time the same code is executed, the corresponding machine code runs directly, which greatly improves execution efficiency.

When a piece of code is no longer hot, TurboFan performs de-optimization: it discards the optimized machine code, falls back to the bytecode, and returns execution of the code to Ignition.

Now let's take a look at the specific implementation process.

Take sum += arr[i] as an example. Since JS is a dynamically typed language, sum and arr[i] may have different types on each execution. Whenever this code runs, Ignition checks the data types of sum and arr[i]. When it finds that the same code has been executed many times, it marks it as hot code and hands it to TurboFan.

From TurboFan's point of view, checking the data types of sum and arr[i] on every execution is a waste of time. So it infers the types of sum and arr[i] from the previous executions and compiles the code into machine code based on that assumption. The next time the code runs, the type-checking step is skipped.

But if the type of arr[i] changes during later execution, the previously generated machine code no longer fits. TurboFan discards that machine code and hands execution back to Ignition, which completes the de-optimization process.
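
The following sketch imitates this optimize/de-optimize cycle in plain JavaScript. It is only a model: the names makeAdaptiveAdd, DEOPT and HOT_THRESHOLD are hypothetical, the threshold is arbitrary, and the "optimized" version is just a closure with a cheap type guard standing in for specialized machine code. Real TurboFan output and its bailout mechanism are far more involved.

```javascript
// Model of speculative optimization with a type guard (hypothetical sketch).
const HOT_THRESHOLD = 3;        // made-up value
const DEOPT = Symbol("deopt");  // signals that the speculation failed

function makeAdaptiveAdd() {
  const state = { calls: 0, seenTypes: new Set(), optimized: null, deopts: 0 };

  // Generic path (the "interpreter"): works for any types, records feedback.
  function generic(sum, x) {
    state.calls += 1;
    state.seenTypes.add(`${typeof sum},${typeof x}`);
    return sum + x;
  }

  // Specialized path (the "compiled code"): valid only for number + number.
  function compileForNumbers() {
    return (sum, x) => {
      if (typeof sum !== "number" || typeof x !== "number") return DEOPT;
      return sum + x;
    };
  }

  function add(sum, x) {
    if (state.optimized !== null) {
      const result = state.optimized(sum, x);
      if (result !== DEOPT) return result;
      // Guard failed: throw away the specialized version and fall back.
      state.optimized = null;
      state.deopts += 1;
    }
    const result = generic(sum, x);
    const onlyNumbers = state.seenTypes.size === 1 && state.seenTypes.has("number,number");
    if (state.calls >= HOT_THRESHOLD && onlyNumbers) {
      state.optimized = compileForNumbers();
    }
    return result;
  }

  return { add, state };
}

const { add, state } = makeAdaptiveAdd();
let sum = 0;
for (const x of [1, 2, 3, 4]) sum = add(sum, x); // numbers only: gets specialized
sum = add(sum, "5");                             // a string appears: de-optimize
console.log(sum, state.deopts, state.optimized);
```

The first three calls go through the generic path, the fourth uses the specialized one, and the string argument trips the guard so execution falls back to the generic path. Because mixed types have now been recorded, this sketch never specializes again. To observe the real behavior, Node.js accepts V8 flags such as --trace-opt and --trace-deopt, although what they print depends on the V8 version.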

[Figure: hot code]

[Figure: before optimization]

[Figure: optimized]

Summary

Now let's summarize the execution process of V8:

  1. The source code passes through the Parser, which performs lexical analysis and syntax analysis to produce an AST
  2. The Ignition interpreter generates bytecode from the AST and executes it
  3. During execution, if hot code is found, it is handed over to the TurboFan compiler, which generates machine code and executes it
  4. If the hot code no longer meets the requirements, de-optimization is performed

This technology of combining bytecode with interpreter and compiler is what we usually call just-in-time compilation ( JIT ).

This article does not cover Orinoco, V8's garbage collector. It deserves a detailed introduction in a separate article. See you in the next issue.

阳呀呀