In-depth interpreter implementation of WebAssembly

Image source: https://unsplash.com/photos/N8yoH-dj4k8
Author of this article: Wu Liuyi

Wasm interpreter project address:

Background

At the end of last year, the author decided to dig into WebAssembly (hereafter Wasm, for brevity). While reading "Principles and Core Technologies of WebAssembly" (a book that explains in detail how Wasm interpreters and virtual machines work and how they can be implemented), the idea of writing a Wasm interpreter was born, and this project was the result. Let's get straight to the point and see how to implement a Wasm interpreter.

Wasm background knowledge

Before explaining how the interpreter is implemented, let's first cover some Wasm background.

What is Wasm

Wasm is a low-level, assembly-like language that runs on the Web platform at close to native speed. Languages such as C, C++, and Rust can use Wasm as a compilation target, so existing code can be ported to the Web platform and reused.

The official Wasm website defines it as follows: WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, and can be deployed on the Web for client and server applications.

In other words, Wasm is defined as a virtual instruction set architecture (V-ISA). This is explained further in the execution stage section below.

Here are some of Wasm's characteristics:

Low level: the code must be as close as possible to machine language, so that a runtime can easily perform AOT/JIT compilation and run Wasm programs at close to native speed;
Used as target code: it is generated by the compilers of other high-level languages;
Safe and controllable: the code cannot perform arbitrary operations the way real assembly language can;
Platform-independent: the code must not be platform-dependent machine code, so that it can run across platforms; this is why virtual machine/bytecode technology is used.

Tip: For more details about Wasm, please refer to the author's translated article "The Future of WebAssembly in the Post-MVP Era: A Cartoon Skill Tree (Translation)"

What can Wasm do

Wasm is already used in the browser for image processing, audio and video processing, games, IDEs, visualization, scientific computing, and more, and outside the browser in serverless, blockchain, IoT, and other fields. To learn more about Wasm applications, see the author's other GitHub repository:

https://github.com/mcuking/Awesome-WebAssembly-Applications

Wasm specification

Wasm currently has 4 specifications:

Core Specification: defines the semantics of Wasm modules independently of any specific embedding (i.e. platform-independent).
JavaScript API: defines the JavaScript classes and objects used to access Wasm from within JavaScript.
Web API: defines JavaScript API extensions that are specifically available in web browsers.
WASI API: defines a modular system interface for running Wasm outside the Web, providing capabilities such as file and network access.

The Wasm interpreter introduced in this article mainly runs outside the browser, so the JavaScript API and Web API specifications can be ignored.

In addition, the current version does not involve WASI (support is planned), so only the core specification matters here.

Wasm module

A Wasm module mainly takes the following 4 forms:

Binary format: Wasm's main encoding format, stored in files with the .wasm suffix.
Text format: mainly intended to help developers understand Wasm modules or write small test programs; stored in files with the .wat suffix, and comparable to an assembly language program.
Memory format: the representation of a module once it is loaded into memory. It depends on the specific Wasm virtual machine implementation; different implementations use different in-memory representations.
Module instance: if the memory format is thought of as a class in an object-oriented language, then a module instance corresponds to an object.

The following figure shows a factorial function written in C, together with the corresponding Wasm text format and binary format.

The memory format depends on the specific Wasm interpreter implementation. For example, the memory format of this project is roughly as follows (explained in detail later, in the execution stage section):

The relationships between the formats are as follows:

The binary format is mainly generated by compilers of high-level programming languages, and can also be generated by compiling the text format.
The text format can be written directly by a developer or produced by decompiling the binary format.
A Wasm interpreter usually decodes the binary module into its internal form, that is, the memory format (such as C/C++ structures), and then performs subsequent processing on it.

Finally, I recommend a site called WebAssembly Code Explorer, which shows the correspondence between the Wasm binary format and text format more intuitively.

https://wasdk.github.io/wasmcodeexplorer/

Interpreter implementation principle

With the introduction above, you should now have a general understanding of Wasm. Next, we analyze the execution flow of a Wasm binary file and discuss how the interpreter can be implemented.

A Wasm binary file is executed in three stages: decoding, verification, and execution.

Decoding stage: decodes the binary format into the memory format.
Verification stage: statically analyzes the module to ensure that its structure meets the specification and that the function bytecode does nothing illegal (for example, calling a non-existent function).
Execution stage: further divided into two steps, instantiation and function calls.

Tip: The interpreter implemented in this project has no separate verification stage. Instead, the individual checks are spread across the decoding and execution stages. For example, the decoding stage checks for illegal segment IDs, and the execution stage checks whether the types and number of a function's parameters and return values match its signature.
In addition, instantiation is completed during the decoding stage, so the execution stage only needs to handle function calls.
Instantiation mainly consists of allocating space for the memory segment, table segment, and so on, recording the entry addresses of all functions (both custom and imported), and then recording all of the module's information in a single unified data structure, module.

Next, we go through the implementation details of the decoding stage and the execution stage.

Decoding stage

Wasm binary file structure

Like other binary formats (such as Java class files), the Wasm binary format starts with a magic number and a version number, followed by the main content of the module, which is divided into segments according to purpose. A total of 12 segments are defined, each with an ID (from 0 to 11). Except for the custom segment, every segment may appear at most once, and segments must appear in ascending order of ID. The 12 segments, with IDs from 0 to 11, are:

Custom segment, type segment, import segment, function segment, table segment, memory segment, global segment, export segment, start segment, element segment, code segment, data segment

Tip: The ordering of the segments is deliberate. Its main purpose is to enable streaming compilation, that is, compiling the Wasm module to machine code while it is still being downloaded. For details, see the article "Making WebAssembly even faster: Firefox's new streaming and tiering compiler".
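The header and ordering rules described above can be sketched in C as follows. This is only a minimal illustration, not the project's code: the function names check_header and check_section_order are made up for this example, the magic number bytes ("\0asm") and version 1 come from the Wasm binary format, and the 0 to 11 ID range follows the segment list given in this article.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if `bytes` starts with the Wasm magic number "\0asm" followed by
   version 1 (stored as a little-endian 32-bit integer), otherwise 0. */
int check_header(const uint8_t *bytes, size_t len) {
    static const uint8_t magic[4]   = {0x00, 0x61, 0x73, 0x6D};
    static const uint8_t version[4] = {0x01, 0x00, 0x00, 0x00};
    if (len < 8) return 0;
    return memcmp(bytes, magic, 4) == 0 && memcmp(bytes + 4, version, 4) == 0;
}

/* Returns 1 if a sequence of segment IDs obeys the ordering rule:
   non-custom IDs (1..11) appear at most once and in ascending order,
   while custom segments (ID 0) may appear anywhere. */
int check_section_order(const uint8_t *ids, size_t count) {
    uint8_t last = 0;
    for (size_t i = 0; i < count; i++) {
        if (ids[i] == 0) continue;                    /* custom segment */
        if (ids[i] > 11 || ids[i] <= last) return 0;  /* illegal, duplicate, or out of order */
        last = ids[i];
    }
    return 1;
}
```

A real decoder would run checks like these while walking the file, reading each segment's ID and byte length before decoding its contents.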

Each segment describes one part of the Wasm module, and all the segments together describe the module completely:

Memory segment and data segment: the memory segment stores the program's dynamic runtime data. The data segment stores static data used to initialize the memory. Memory can be imported from the external host, and memory objects can also be exported to the external host environment.
Table segment and element segment: the table segment stores object references. Currently the objects can only be functions, so the table segment is what makes function pointers possible. The element segment stores the data used to initialize the table segment. Table objects can be imported from the external host and can also be exported to the external host environment.
Start segment: stores the index of the start function, that is, a function that runs automatically when the module is loaded. The start function has two main uses: 1. initializing the module after loading; 2. turning the module into an executable program.
Global segment: stores information about global variables (value type, mutability, initialization expression, and so on).
Function segment, code segment, and type segment: these three segments all store data that describes functions. Specifically:
Type segment: stores all the function signatures in the module (a function signature records the types and number of a function's parameters and return values). Note that if several functions share the same signature, only one copy is stored.
Function segment: stores the signature index of each function. Note that this is the index of the function's signature, not the index of the function.
Code segment: stores the local variable information and bytecode of each function, that is, the local variables and code of the function body.
Import segment and export segment: the export segment stores information about exported items (member name, type, index in the corresponding segment, and so on). The import segment stores information about imported items (member name, type, the module it is imported from, and so on). There are 4 kinds of import/export items: functions, tables, memories, and global variables.
Custom segment: mainly used to store debugging symbols and other information that is not needed at run time.
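To make the relationship between the type segment and the function segment more concrete, here is a small C sketch. The struct and function names (FuncType, ModuleSketch, signature_of) are hypothetical and are not the project's actual module structure. For simplicity it also assumes the module has no imported functions, so function index i corresponds directly to entry i of the function segment.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the type segment: a function signature (simplified to counts). */
typedef struct {
    uint32_t param_count;
    uint32_t result_count;
} FuncType;

/* Hypothetical, heavily reduced view of a decoded module. */
typedef struct {
    const FuncType *types;          /* type segment: deduplicated signatures */
    size_t          type_count;
    const uint32_t *func_type_idx;  /* function segment: function index -> signature index */
    size_t          func_count;
} ModuleSketch;

/* Returns the signature of function `func_idx`, or NULL if an index is out of range. */
const FuncType *signature_of(const ModuleSketch *m, uint32_t func_idx) {
    if (func_idx >= m->func_count) return NULL;
    uint32_t type_idx = m->func_type_idx[func_idx];  /* a signature index, not a function index */
    if (type_idx >= m->type_count) return NULL;
    return &m->types[type_idx];
}
```

Two functions with the same signature simply store the same signature index, which is why the type segment needs only one copy of each signature.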

Tip: Of the segments above, the table segment is probably the hardest to understand, so here is a dedicated explanation.
In Wasm's design, memory is completely separated from the code and the stack that drive execution. This is entirely different from conventional architectures, where the code segment, data segment, heap, and stack all live in one uniformly addressed memory space. Function addresses are invisible to a Wasm program, so a program cannot pass, modify, or call functions as ordinary variables.
The table is the key to this mechanism. A table stores object references. Currently the objects can only be functions, which means a table only stores function index values. A Wasm program can only use an index into the table to find the corresponding function index and then call that function, and the runtime stack is not stored in the memory object. This completely rules out the possibility of Wasm code escaping its bounds during execution; the worst that can happen is that the memory object ends up full of wrong data.
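The table mechanism can be sketched roughly as follows in C. This is an illustration under simplifying assumptions, not the project's implementation: the names Table, HostFn, TABLE_UNINIT, and call_indirect are invented for the example, every function takes and returns a single 32-bit integer, and the signature check that a real indirect call also performs is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_UNINIT UINT32_MAX   /* hypothetical marker for an uninitialized table slot */

typedef int32_t (*HostFn)(int32_t);  /* stand-in for "a function of the module" */

typedef struct {
    const uint32_t *entries;         /* table: slot -> function index */
    size_t          size;
    const HostFn   *functions;       /* function list, indexed by function index */
    size_t          function_count;
} Table;

int32_t twice(int32_t x)  { return x * 2; }
int32_t negate(int32_t x) { return -x; }

/* Calls the function referenced by table slot `slot`.
   The program only ever supplies a slot number, never an address; any
   out-of-range or uninitialized slot sets *trapped to 1 and returns 0. */
int32_t call_indirect(const Table *t, uint32_t slot, int32_t arg, int *trapped) {
    *trapped = 1;
    if (slot >= t->size) return 0;
    uint32_t func_idx = t->entries[slot];
    if (func_idx == TABLE_UNINIT || func_idx >= t->function_count) return 0;
    *trapped = 0;
    return t->functions[func_idx](arg);
}
```

Because every lookup is bounds-checked and goes through indices, a wrong slot number can at worst trap; it cannot redirect execution to an arbitrary address.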

Once we know the purpose of each segment and its specific encoding format (the detailed encoding formats are described in the comments of the load_module function in module.c), we can decode a Wasm binary file, that is, "translate" it into the memory format by recording all the information of the module in a unified data structure, module. The structure of module is shown in the following figure:

Tip: To save space and make binary files more compact, the Wasm binary format uses LEB128 (Little Endian Base 128) to encode integer values such as list lengths and indices. LEB128 is a variable-length encoding. A 32-bit integer occupies 1 to 5 bytes when encoded, and a 64-bit integer occupies 1 to 10 bytes. The smaller the integer, the fewer bytes its encoding takes. Since integers such as list lengths and indices are usually small, LEB128 saves space.
LEB128 has two characteristics: 1. it is little-endian, that is, the low-order byte comes first and the high-order byte comes last; 2. it is base 128, that is, each group consists of 7 bits (the low 7 bits of a byte), and the remaining highest bit is a flag: 1 means more bytes follow, 0 means this is the last byte.
LEB128 has two variants, used to encode unsigned and signed integers respectively. The project's implementation can be found in the read_LEB function in https://github.com/mcuking/wasmc/blob/master/source/utils.c .
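As an illustration of the two variants, here is a minimal C sketch of LEB128 decoding. It is not the project's read_LEB function: the names read_uleb128 and read_sleb128 are chosen for this example, and the sketch assumes the input is well formed and does not check the buffer length.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes an unsigned LEB128 integer starting at bytes[*pos] and advances *pos. */
uint64_t read_uleb128(const uint8_t *bytes, size_t *pos) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = bytes[(*pos)++];
        result |= (uint64_t)(byte & 0x7F) << shift;  /* low 7 bits carry the data */
        shift += 7;
    } while ((byte & 0x80) && shift < 64);           /* highest bit set: more bytes follow */
    return result;
}

/* Decodes a signed LEB128 integer starting at bytes[*pos] and advances *pos. */
int64_t read_sleb128(const uint8_t *bytes, size_t *pos) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = bytes[(*pos)++];
        result |= (uint64_t)(byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 64);
    /* Bit 6 of the last byte is the sign bit: if set, fill the upper bits with ones. */
    if (shift < 64 && (byte & 0x40)) {
        result |= ~(uint64_t)0 << shift;
    }
    return (int64_t)result;
}
```

For example, the three bytes 0xE5 0x8E 0x26 decode to the unsigned value 624485, and 0xC0 0xBB 0x78 decode to the signed value -123456.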

Finally, a screenshot of part of the actual decoding-stage code is shown below:

For more details, see the load_module function in https://github.com/mcuking/wasmc/blob/master/source/module.c , which is richly commented.

Execution stage

After the decoding stage above, we have obtained from the Wasm binary file a memory format that contains all the information needed in the execution stage. Next, let's explore how to implement the execution stage on top of that memory format. Before we start, we first need to introduce some background on stack-based virtual machines.

The official website defines Wasm as a binary instruction format for a stack-based virtual machine. In other words, Wasm is not only a programming language but also a virtual machine architecture specification. So what is a virtual machine, and what is a stack-based virtual machine?

Virtual machine concept

A virtual machine is a software simulation of hardware. It simulates how the hardware works with the help of facilities provided by the operating system and the compiler; here, this mainly means simulating the hardware CPU. A virtual machine executes an instruction in the following 3 steps:

Fetch: read the instruction from the address in the instruction stream that the program counter (PC) points to
Decode: determine the type of the instruction and enter the corresponding handling routine
Execute: perform the corresponding operation according to the meaning of the instruction

Executing the instructions in an instruction stream means repeating these three steps in a loop. During the loop, something must keep track of which instruction has been reached. That is the program counter (PC), which holds the address of the next instruction to be executed.
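The fetch, decode, and execute loop can be sketched in a few lines of C. This is a deliberately tiny illustration, not the project's interpreter: the function name run is made up, only four instructions are supported, and the immediate of i32.const is assumed to be a single byte in the range 0 to 63 (a real decoder would read a signed LEB128 value). The opcode values are the ones the Wasm specification assigns to nop, end, i32.const, and i32.add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_NOP = 0x01, OP_END = 0x0B, OP_I32_CONST = 0x41, OP_I32_ADD = 0x6A };

/* Runs `code` until an `end` instruction and returns the value on top of the stack. */
int32_t run(const uint8_t *code, size_t len) {
    int32_t stack[16];
    int sp = 0;      /* number of values currently on the operand stack */
    size_t pc = 0;   /* program counter: offset of the next instruction */

    while (pc < len) {
        uint8_t opcode = code[pc++];               /* fetch, then advance the PC */
        switch (opcode) {                          /* decode */
        case OP_NOP:                               /* execute: nothing to do */
            break;
        case OP_I32_CONST:
            assert(pc < len && sp < 16);
            stack[sp++] = (int32_t)code[pc++];     /* single-byte immediate (assumption) */
            break;
        case OP_I32_ADD:
            assert(sp >= 2);
            stack[sp - 2] = (int32_t)((uint32_t)stack[sp - 2] + (uint32_t)stack[sp - 1]);
            sp--;
            break;
        case OP_END:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unsupported opcode");
            return 0;
        }
    }
    assert(!"missing end instruction");
    return 0;
}
```

Running it on the byte sequence 0x41 13, 0x41 42, 0x6A, 0x01, 0x0B (i32.const 13, i32.const 42, i32.add, nop, end) returns 55.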

Tip: What is fed to the Wasm virtual machine is not platform-specific machine code but bytecode, which consists of a set of instructions defined by Wasm. The main goal is portability. The software simulates a CPU and defines a custom instruction set similar to a CPU instruction set, so that only the virtual machine program itself has to be adapted to each platform, while programs running on the virtual machine do not need to care which platform they run on.

Wasm instruction set

Wasm instructions fall into 5 main categories:

Control instructions: function calls, jumps, loops, etc.
Parametric instructions: discarding the top of the stack, etc.
Variable instructions: reading and writing global/local variables
Memory instructions: memory load/store
Numeric instructions: numeric computation

Each instruction consists of two parts: an opcode and operands.

Opcode (operation code): the ID of the instruction, which determines the operation the instruction performs. It is fixed at 1 byte, so the instruction set can contain at most 256 instructions. This kind of code is therefore also called bytecode. The Wasm specification defines a total of 178 instructions. Since an opcode is an integer, which is convenient for machines but not friendly to humans, the Wasm specification also defines a mnemonic for each opcode.

The figure below is an enumeration of the opcode mnemonics of some Wasm instructions. For the complete version, see https://github.com/mcuking/wasmc/blob/master/source/opcode.h .

In addition, there is a visual table on GitHub that lays out all the Wasm opcodes intuitively. Interested readers can take a look:

https://pengowray.github.io/wasm-ops/

Operands are covered in the stack-based virtual machine section below.

Stack-based virtual machine

Virtual machines are roughly divided into two types: register-based virtual machines and stack-based virtual machines.

Register-based virtual machine: modeled on how a hardware CPU works, the virtual machine also simulates registers internally, and operands and instruction results can be stored in registers. Real-world examples are V8 and the Lua virtual machine.
Because the number of registers is limited, assigning an unbounded number of variables to a limited set of registers without conflicts requires a register allocation algorithm, such as the classic graph coloring algorithm. A register-based virtual machine is therefore somewhat harder to implement, but has greater optimization potential.
Stack-based virtual machine: instruction results are stored on a simulated operand stack, which makes it simpler to implement than a register-based virtual machine. Real-world examples are the JVM, QuickJS, and Wasmer.

Next, we describe in detail how a stack-based virtual machine works.

Operand

The main feature of a stack-based virtual machine is that it has an operand stack. Most Wasm instructions perform some operation on the operand stack, for example:

f32.sub: pops two 32-bit floating-point numbers from the operand stack, computes their difference, and pushes the result onto the top of the operand stack.

The two 32-bit floating-point numbers popped from the operand stack are the operands. The precise definition is:

Operand, also called dynamic operand: a value located at the top of the operand stack at run time and manipulated by an instruction.
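The behavior of f32.sub on the operand stack can be sketched in C like this. The type and function names (OperandStack, push_f32, pop_f32, exec_f32_sub) are invented for the illustration, and the stack holds only 32-bit floats to keep the example short.

```c
#include <assert.h>

/* A fixed-size operand stack that holds only f32 values (simplification). */
typedef struct {
    float values[32];
    int   sp;          /* number of values currently on the stack */
} OperandStack;

void push_f32(OperandStack *s, float v) {
    assert(s->sp < 32);
    s->values[s->sp++] = v;
}

float pop_f32(OperandStack *s) {
    assert(s->sp > 0);
    return s->values[--s->sp];
}

/* f32.sub: pops two operands, subtracts, and pushes the result.
   The value pushed last is the right-hand operand. */
void exec_f32_sub(OperandStack *s) {
    float rhs = pop_f32(s);
    float lhs = pop_f32(s);
    push_f32(s, lhs - rhs);
}
```

Pushing 10.5 and then 0.25 and executing f32.sub leaves a single value, 10.25, on the stack. Note the operand order: the value that was pushed first is the one being subtracted from.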

Immediate

Let's look at another example of instructions:

i32.const 3: pushes the 32-bit integer constant 3 onto the top of the operand stack.

The value 3 here is an immediate. The precise definition is:

Immediate, also called static immediate argument or static operand: a value hard-coded directly in the instruction (that is, in the bytecode), immediately after the opcode. Most Wasm instructions have no immediates. To see which Wasm instructions have immediates, see the skip_immediate function in https://github.com/mcuking/wasmc/blob/master/source/module.c .
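To show how immediates sit right after the opcode in the bytecode, here is a small C sketch that computes the total size of an instruction. It is not the project's skip_immediate function: the names leb128_length and instruction_size are hypothetical, and only a handful of opcodes are covered (everything else is treated as having no immediate, which is not true of the full instruction set).

```c
#include <stddef.h>
#include <stdint.h>

enum {
    OP_LOCAL_GET = 0x20,
    OP_I32_CONST = 0x41,
    OP_F32_CONST = 0x43,
    OP_F64_CONST = 0x44,
    OP_I32_ADD   = 0x6A
};

/* Number of bytes occupied by the LEB128-encoded integer starting at p. */
size_t leb128_length(const uint8_t *p) {
    size_t n = 1;
    while (*p++ & 0x80) n++;   /* highest bit set: another byte follows */
    return n;
}

/* Size in bytes of the instruction at code[pc]: 1 opcode byte plus its immediate. */
size_t instruction_size(const uint8_t *code, size_t pc) {
    switch (code[pc]) {
    case OP_LOCAL_GET:                      /* immediate: local index (LEB128) */
    case OP_I32_CONST:                      /* immediate: constant (LEB128) */
        return 1 + leb128_length(code + pc + 1);
    case OP_F32_CONST:                      /* immediate: 4-byte float */
        return 1 + 4;
    case OP_F64_CONST:                      /* immediate: 8-byte float */
        return 1 + 8;
    default:                                /* e.g. i32.add: no immediate */
        return 1;
    }
}
```

For i32.const 3, encoded as the two bytes 0x41 0x03, the function returns 2: one byte for the opcode and one for the immediate. For i32.add (0x6A) it returns 1, because its operands come from the operand stack rather than from the bytecode.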

The discussion above covers only the execution of a single instruction. Now let's look at how a function is executed on a stack-based virtual machine:

The caller pushes the arguments onto the operand stack
On entering the function, the parameters are initialized
The instructions in the function body are executed
The function's result is pushed onto the top of the operand stack and the function returns
The caller obtains the function's return value from the operand stack

As shown below:

As can be seen, argument passing and return value retrieval during a function call, as well as the execution of the instructions in the function body, are all done through the operand stack.

Call stack and stack frame

As the description above suggests, function calls are often nested. For example, function A calls function B, and function B calls function C. Therefore another stack is needed to keep track of the call relationships between functions: the call stack.

The call stack is made up of independent stack frames. Each time a function is called, a stack frame is pushed onto the call stack. (Note: for simplicity and clarity only functions are discussed here; control blocks such as if/loop are not covered in this article.) Each time a function finishes executing, its stack frame is popped from the call stack and destroyed. A chain of function calls is thus a process of continuously creating and destroying stack frames. At any moment, however, only the stack frame at the top of the call stack is active; it is called the current stack frame.

Each stack frame contains the following:

The function structure associated with the stack frame, which stores all the information about that function.
The operand stack, used to store the parameters, local variables, and operands used while executing the function body's instructions.
Note that the stack frames of all functions share one complete operand stack. Each stack frame occupies a certain part of this operand stack, and only needs one pointer, holding the address of the bottom of its own part, to distinguish it from the parts belonging to other stack frames.
The advantage is that the parts belonging to the caller's and the callee's stack frames are adjacent within the overall operand stack, which makes it convenient for the caller to pass arguments to the callee and for the callee to pass its return value back to the caller when it finishes.
The function's return address, which stores the address of the instruction following the call instruction that created the stack frame. When the stack frame is popped from the call stack, execution continues from that instruction. In other words, once the function corresponding to the current stack frame has finished and exited, control returns to the place where the function was called and the following instructions continue to execute.

Tip: Currently, the stack frame defined by this interpreter has no local variable table like the one in a JVM stack frame. Instead, parameters, local variables, and operands are all placed on the operand stack. There are two main reasons:
The implementation is simple: no extra local variable table needs to be defined, which simplifies the code considerably.
Argument passing becomes a no-op (NOP): part of the data in the operand stacks of two stack frames can overlap. That overlapping data is the arguments, so argument passing between functions happens naturally.
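The idea of stack frames that share a single operand stack can be sketched as follows. This is an illustration only, not the project's data structures: the names Frame, VM, push_frame, and pop_frame are made up, all values are 32-bit integers, and every function is assumed to return exactly one result.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int func_index;  /* the function this frame belongs to */
    int fp;          /* bottom of this frame's part of the shared operand stack */
    int return_pc;   /* where the caller resumes after this function returns */
} Frame;

typedef struct {
    int32_t stack[64];   /* the single operand stack shared by all frames */
    int     sp;
    Frame   frames[16];  /* the call stack */
    int     depth;
} VM;

/* Enters a function. The arguments are already on top of the operand stack
   (pushed by the caller), so they simply become the callee's first locals. */
void push_frame(VM *vm, int func_index, int param_count, int local_count, int return_pc) {
    assert(vm->depth < 16 && vm->sp >= param_count && vm->sp + local_count <= 64);
    Frame *f = &vm->frames[vm->depth++];
    f->func_index = func_index;
    f->fp = vm->sp - param_count;   /* the frame starts at the first argument */
    f->return_pc = return_pc;
    for (int i = 0; i < local_count; i++) {
        vm->stack[vm->sp++] = 0;    /* non-parameter locals start at zero */
    }
}

/* Leaves a function: moves the result down to where the arguments were,
   drops the rest of the frame, and returns the address to resume at. */
int pop_frame(VM *vm) {
    assert(vm->depth > 0 && vm->sp > 0);
    Frame *f = &vm->frames[--vm->depth];
    int32_t result = vm->stack[vm->sp - 1];
    vm->sp = f->fp;
    vm->stack[vm->sp++] = result;
    return f->return_pc;
}
```

Because the callee's frame begins exactly at the caller's topmost values, no copying is needed to pass arguments, and after pop_frame the result sits where the caller expects to find it.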

Practical example

After the groundwork above, you should now have a reasonable understanding of stack-based virtual machines. Finally, let's use a practical example to illustrate the whole execution process.

The following Wasm text format contains two functions: compute and add. The add function takes two numbers (a 32-bit integer and a 32-bit floating-point number) and computes their sum. The compute function calls add twice. Note that at the time of the second call, the result of the first call to add is already on the operand stack (which again confirms that the stack frames of the two functions share one complete operand stack, making it easy to pass arguments between functions), so this time only the second argument needs to be pushed.

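(module
    (func $compute (result i32)
        i32.const 13    ;; push 13 onto the operand stack
        f32.const 42.0  ;; push 42.0 onto the operand stack
        call $add       ;; call $add, which yields 55
        f32.const 10.0  ;; push 10.0 onto the operand stack
        call $add       ;; call $add again, which yields 65
    )
    (func $add (param $a i32) (param $b f32) (result i32)
        local.get $a      ;; push the local variable $a (a 32-bit integer) onto the operand stack
        local.get $b      ;; push the local variable $b (a 32-bit float) onto the operand stack
        i32.trunc_f32_s   ;; truncate the 32-bit float $b on top of the operand stack to a signed 32-bit integer (dropping the fractional part)
        i32.add           ;; pop the top two 32-bit integers from the operand stack, add them, and push the sum
    )
    (export "compute" (func $compute))
    (export "add" (func $add))
)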
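The module above can be traced with a small C sketch that hand-encodes the two function bodies as bytecode and interprets them on one shared operand stack. This is a simplified illustration, not the project's interpreter. The names Value, Func, call_function, and run_compute are made up; nested calls use C recursion instead of an explicit call stack; i32.const and call immediates are assumed to fit in one byte; and f32.const decoding assumes a little-endian host with IEEE 754 floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef union { int32_t i32; float f32; } Value;
typedef struct { const uint8_t *code; int param_count; } Func;

static const uint8_t compute_code[] = {
    0x41, 13,                      /* i32.const 13 */
    0x43, 0x00, 0x00, 0x28, 0x42,  /* f32.const 42.0 */
    0x10, 1,                       /* call $add (function index 1) */
    0x43, 0x00, 0x00, 0x20, 0x41,  /* f32.const 10.0 */
    0x10, 1,                       /* call $add */
    0x0B                           /* end */
};
static const uint8_t add_code[] = {
    0x20, 0,                       /* local.get $a */
    0x20, 1,                       /* local.get $b */
    0xA8,                          /* i32.trunc_f32_s */
    0x6A,                          /* i32.add */
    0x0B                           /* end */
};
static const Func funcs[] = { {compute_code, 0}, {add_code, 2} };

static Value stack[64];            /* operand stack shared by all calls */
static int sp = 0;

void call_function(int index) {
    const Func *f = &funcs[index];
    const uint8_t *pc = f->code;
    int fp = sp - f->param_count;  /* the arguments are the caller's top values */
    for (;;) {
        uint8_t op = *pc++;
        switch (op) {
        case 0x41: stack[sp++].i32 = (int32_t)*pc++; break;
        case 0x43: memcpy(&stack[sp++].f32, pc, 4); pc += 4; break;
        case 0x20: stack[sp] = stack[fp + *pc++]; sp++; break;
        case 0x10: call_function(*pc++); break;
        case 0xA8: stack[sp - 1].i32 = (int32_t)stack[sp - 1].f32; break;
        case 0x6A: stack[sp - 2].i32 += stack[sp - 1].i32; sp--; break;
        case 0x0B: stack[fp] = stack[sp - 1]; sp = fp + 1; return;  /* result replaces the arguments */
        default: assert(!"unsupported opcode"); return;
        }
    }
}

/* Runs $compute on an empty operand stack and returns its i32 result. */
int32_t run_compute(void) {
    sp = 0;
    call_function(0);
    return stack[0].i32;
}
```

With this sketch, run_compute() returns 65: the first call to $add leaves 55 on the shared stack, and the second call picks it up as its first argument together with the newly pushed 10.0.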

Correspondingly, the schematic diagram of its execution process is as follows:

Finally, a screenshot of the actual execution-stage code is shown below:

You can see that the virtual machine's three steps of fetch, decode, and execute can be implemented simply with a while loop and a switch-case statement. For more details, see the interpreter function in https://github.com/mcuking/wasmc/blob/master/source/interpreter.c , which is richly commented.

Concluding remarks

The above is the core of the Wasm interpreter implementation. Of course, this is only the most basic functionality of a Wasm interpreter: parsing and executing instructions one by one. It does not provide JIT compilation as more professional runtimes do, where the bytecode is first interpreted to start quickly and is then compiled by a JIT into platform-specific machine code to speed up subsequent execution (note: the details of JIT implementation vary from runtime to runtime).

So interpreting a Wasm file with this project's interpreter offers no great advantage in speed. But precisely because the implementation is relatively simple, the source code is easy to read and richly commented, which makes it well suited to readers who are interested in Wasm and want to quickly understand the core principles of the technology.

It should be pointed out that this article does not cover how to use Wasm in practice. As it happens, the author is developing a video player based on Wasm and FFmpeg that supports H.265. The related article is:

"In-depth WebAssembly Video Player Application"

Once the video player has been put into production, that article will be gradually improved, focusing on how to better apply Wasm in front-end projects, so stay tuned.

https://github.com/mcuking/blog

Reference

This article was published by the big front-end team of NetEase Cloud Music. Reproduction in any form without authorization is prohibited. We recruit front-end, iOS, and Android engineers all year round. If you are ready for a new job and happen to like Cloud Music, join us at grp.music-fe(at)corp.netease.com!
