- Announcement: An alpha release of a new compacting Zig tokenizer is available in the repository Validark/Accelerated-Zig-Parser. Starring the repository supports the developer.
- Current status: Not ready for prime time. Only x86-64 (AMD64) machines with AVX-512 instructions are supported. The new implementation is 2.75 times faster than the mainline implementation and uses 2.47 times less memory, currently running at 1.4 GB/s on a single core.
- Benchmarked tokenizers: Three Zig tokenizers are benchmarked: the one in the Zig Standard Library used by the 0.14 compiler, a heat-seeking tokenizer (temporarily removed but will be added back by July), and the compacting tokenizer. On the author's laptop with a Ryzen AI 9 HX 370, the compacting tokenizer is significantly faster and uses less memory than the Standard Library tokenizer.
- Headline features: The new compacting tokenizer processes 64-byte chunks of source code at once (soon to be 512!). It includes various SIMDized and branchless routines such as a UTF-8 validator, bit-manipulation for escaped characters, parallel parsing of lines, multi-purpose vectorized table-lookup, a mini non-deterministic finite state machine, SIMD hasher, and token-matching logic using bit-manipulation and SIMD operations.
- Simplified explanation: By processing an entire chunk at once and representing it as bitstrings (one bit per byte), the tokenizer can quickly determine the start and end positions of tokens. For example, it builds a bitstring for each character class and combines them with bit manipulations to find where identifiers start and end. Vector compaction then converts the set bits into the actual positions of all tokens in the chunk.
- Future plans: More work is needed, including processing 512 bytes at once, optimizing the handling of loop-carried variables, and providing comptime switches for the different ways a consumer may want to read the tokenizer's output. A talk explaining how the components work together will be given at Utah Zig in July.
- Running the benchmark: Only x86-64 machines with the AVX-512 instruction set are supported. Clone the repository and some Zig projects, install Zig 0.14 using either the one-off script or the helper script, then build and run the benchmark. On Linux, enable performance mode and bind the process to a single core if needed.
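
The bitstring idea from the simplified explanation can be sketched in a few lines of scalar Python. This is only an illustration under stated assumptions, not the tokenizer's actual code: the function names (`char_class_mask`, `compact_positions`, `identifier_spans`) are hypothetical, any run of ASCII letters, digits, or underscores is treated as one "identifier", and the per-lane loops stand in for what the real implementation does with AVX-512 instructions on a whole chunk at once.

```python
CHUNK_SIZE = 64  # one bit per byte of a 64-byte chunk


def is_ident_byte(b):
    """True for ASCII letters, digits, and underscore (a simplification)."""
    return (b == 0x5F or 0x30 <= b <= 0x39
            or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A)


def char_class_mask(chunk, predicate):
    """Build a bitstring where bit i is set when predicate(chunk[i]) holds."""
    mask = 0
    for i, byte in enumerate(chunk[:CHUNK_SIZE]):
        if predicate(byte):
            mask |= 1 << i
    return mask


def compact_positions(mask):
    """Scalar stand-in for vector compaction: keep the lane indices whose
    mask bit is set, packed together in ascending order."""
    return [i for i in range(CHUNK_SIZE) if (mask >> i) & 1]


def identifier_spans(chunk):
    """Return (start, last) index pairs for each identifier-like run."""
    ident = char_class_mask(chunk, is_ident_byte)
    # A run starts where a bit is set and the previous bit is clear.
    starts = ident & ~(ident << 1)
    # A run ends where a bit is set and the next bit is clear.
    # (A run touching the chunk edge would need state carried over
    # from the neighbouring chunk; that is ignored here.)
    ends = ident & ~(ident >> 1)
    return list(zip(compact_positions(starts), compact_positions(ends)))


if __name__ == "__main__":
    source = b"const foo = bar_2;"
    for start, last in identifier_spans(source):
        print(start, last, source[start:last + 1].decode())
    # prints:
    # 0 4 const
    # 6 8 foo
    # 12 16 bar_2
```

The point of the shape is that `starts` and `ends` are each computed with a shift and an AND over the whole chunk, with no per-character branching; only the final compaction turns bits into positions.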