Wasmi's New Execution Engine - Faster Than Ever

Wasmi v0.32 Overview: After months of research and development, Wasmi's most significant update so far is ready for production. Wasmi is an efficient and versatile WebAssembly (Wasm) interpreter aimed at embedded environments, and its API mirrors the Wasmtime API. Install the CLI tool via cargo install wasmi_cli, or use Wasmi as a library.
Startup Performance: Wasmi is a rewriting interpreter: it translates WebAssembly bytecode into its own internal bytecode before executing it. Fast translation means fast startup, which matters for translation-intensive workloads. Lazy translation translates only the parts of a module that are actually needed. Wasmi supports three translation modes: Eager (the default), Lazy, and LazyTranslation. It also offers unchecked translation, which skips Wasm validation, and non-streaming translation. In addition, linker caching improves performance when dozens of host functions are in use. Benchmarks show significant speedups in startup time.
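The idea behind lazy translation can be sketched with a toy model. This is not Wasmi's implementation: ToyModule, LazyFunc, and the placeholder translate step are hypothetical names invented for illustration, and the "translation" here is a trivial byte rewrite. The sketch only shows that a function body is translated on its first call and never again.

```rust
use std::cell::{Cell, OnceCell};

/// A function body that is translated on first use only (hypothetical type).
struct LazyFunc {
    raw: Vec<u8>,                   // untranslated input bytes
    translated: OnceCell<Vec<u8>>,  // filled in on the first call
}

/// A toy module that counts how many functions were actually translated.
struct ToyModule {
    funcs: Vec<LazyFunc>,
    translations: Cell<usize>,
}

impl ToyModule {
    fn new(bodies: Vec<Vec<u8>>) -> Self {
        let funcs = bodies
            .into_iter()
            .map(|raw| LazyFunc { raw, translated: OnceCell::new() })
            .collect();
        Self { funcs, translations: Cell::new(0) }
    }

    /// Placeholder "translation"; a real engine would validate and rewrite here.
    fn translate(raw: &[u8]) -> Vec<u8> {
        raw.iter().map(|b| b.wrapping_add(1)).collect()
    }

    /// Returns the translated body, translating it if this is the first call.
    fn call(&self, index: usize) -> &[u8] {
        let func = &self.funcs[index];
        func.translated.get_or_init(|| {
            self.translations.set(self.translations.get() + 1);
            Self::translate(&func.raw)
        })
    }
}

fn main() {
    let module = ToyModule::new(vec![vec![1, 2], vec![3], vec![4, 5, 6]]);
    module.call(0);
    module.call(0); // already translated, no extra work
    module.call(2);
    // Function 1 is never called, so it is never translated.
    println!(
        "translated {} of {} functions",
        module.translations.get(),
        module.funcs.len()
    );
}
```

In this toy model an eager mode would correspond to calling translate for every body inside new, which pays the full translation cost up front even for functions that never run.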
Benchmarks: With these techniques combined, Wasmi speeds up startup time by several orders of magnitude. Across benchmarks such as ERC-20 and Argon2, Wasmi and Wasm3 perform well thanks to lazy compilation. Single-pass JITs are slower to start up, and Stitch's translation performance needs improvement.
Execution Speed: Wasmi v0.31 suffered from slow execution. Starting with v0.32, Wasmi executes a register-based IR, which needs fewer instructions for the same work and tends to favor compute-intensive workloads. Memory consumption is reduced as well. Benchmarks show Wasmi's performance on different CPUs, with weaker results on Apple silicon. Stitch performs well, and the Coremark scores do not fully reflect Wasmi's performance.
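To make the "fewer instructions" point concrete, here is a minimal sketch that evaluates a + b * c once on a stack machine and once on a register machine, counting dispatched instructions. The instruction sets StackOp and RegOp are made up for this illustration and are not Wasmi's actual bytecode; both loops use a plain loop-and-match dispatch.

```rust
/// Stack-machine instructions (Wasm-like): operands live on an implicit stack.
enum StackOp {
    LocalGet(usize),
    Mul,
    Add,
}

/// Register-machine instructions: operands and results are named registers.
enum RegOp {
    Mul { dst: usize, lhs: usize, rhs: usize },
    Add { dst: usize, lhs: usize, rhs: usize },
}

/// Runs stack code; returns the top of the stack and the instruction count.
fn run_stack(code: &[StackOp], locals: &[i64]) -> (i64, usize) {
    let mut stack: Vec<i64> = Vec::new();
    let mut executed = 0;
    for op in code {
        executed += 1;
        match op {
            StackOp::LocalGet(i) => stack.push(locals[*i]),
            StackOp::Mul => {
                let rhs = stack.pop().unwrap();
                let lhs = stack.pop().unwrap();
                stack.push(lhs.wrapping_mul(rhs));
            }
            StackOp::Add => {
                let rhs = stack.pop().unwrap();
                let lhs = stack.pop().unwrap();
                stack.push(lhs.wrapping_add(rhs));
            }
        }
    }
    (stack.pop().unwrap(), executed)
}

/// Runs register code; returns `regs[result]` and the instruction count.
fn run_reg(code: &[RegOp], regs: &mut [i64], result: usize) -> (i64, usize) {
    let mut executed = 0;
    for op in code {
        executed += 1;
        match op {
            RegOp::Mul { dst, lhs, rhs } => regs[*dst] = regs[*lhs].wrapping_mul(regs[*rhs]),
            RegOp::Add { dst, lhs, rhs } => regs[*dst] = regs[*lhs].wrapping_add(regs[*rhs]),
        }
    }
    (regs[result], executed)
}

fn main() {
    // a = 2, b = 3, c = 4; compute a + b * c.
    let stack_code = [
        StackOp::LocalGet(0),
        StackOp::LocalGet(1),
        StackOp::LocalGet(2),
        StackOp::Mul,
        StackOp::Add,
    ];
    let (s, s_count) = run_stack(&stack_code, &[2, 3, 4]);

    // Registers 0..=2 hold a, b, c; register 3 is a temporary.
    let reg_code = [
        RegOp::Mul { dst: 3, lhs: 1, rhs: 2 },
        RegOp::Add { dst: 3, lhs: 0, rhs: 3 },
    ];
    let mut regs = [2, 3, 4, 0];
    let (r, r_count) = run_reg(&reg_code, &mut regs, 3);

    println!("stack: {s} in {s_count} instructions; register: {r} in {r_count} instructions");
}
```

The register version dispatches two instructions instead of five because operands are addressed directly rather than pushed first. The trade-off is that each register instruction carries more operands, and producing such code from stack-based Wasm takes more work in the translator.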
Benchmark Suite: The benchmarks are generated using the wasmi-benchmarks repository. Contributions to add more runtimes or test cases are welcome.
Summary & Outlook: This article presents the highlights and improvements of Wasmi v0.32. Many WebAssembly proposals still await implementation. Wasmi shows potential, especially on AMD server chips, and its performance on Apple silicon is expected to improve. Work on implementing the Wasm C-API is planned.
Special Thanks: Thanks to Parity Technologies, the Bytecode Alliance, OLUWAMUYIWA, yamt, and Neopallium for their contributions.
Wasm Workload Types: There are two types of Wasm workloads: compute-intensive and translation-intensive. Runtimes like Wasmtime target compute-intensive workloads, while Wasmi and others target translation-intensive ones. Balanced runtimes exist too.
Other Notes: The new translation from stack-based to register-based bytecode is complex. An outdated benchmark for startup and memory consumption exists. Coremark scores for different modes are explained. Wasmi's inferior performance on Apple silicon may be due to its loop-switch dispatch and to Apple's M4 chip enhancements. Stitch's approach has the downside of relying on LLVM's optimizer. The paper "The Structure and Performance of Efficient Interpreters" is relevant.