Translating bzip2 with c2rust - Blog - Tweede golf

  • Project Overview: Over the past few months, the authors have been working on libbzip2-rs, a bzip2 implementation written entirely in Rust that aims to be compatible with the original C library. The project is funded by the NLnet Foundation and uses c2rust to produce the initial translation from C to Rust. The post describes how c2rust was used and how its output was cleaned up.

    • Using c2rust: c2rust, by Immunant and Galois, translates C code into Rust code. It is useful, but the generated code is often unsafe and not idiomatic. c2rust was chosen for this project because the bzip2 C code base is straightforward.
  • Cleaning up c2rust output:

    • Loops: C for loops are translated to Rust while loops, because that conversion is valid for every C for loop, whereas a Rust for loop fits only some of them. Some loops were cleaned up by hand, and in some cases more idiomatic iteration methods were used.
    • Types and Casting: The output contains many explicit type annotations and casts. Removing them, and changing types so that the casts become unnecessary, is manual and time-consuming work.
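The loop and cast cleanup described above can be sketched as follows. This is a hand-written approximation of the style c2rust tends to emit, not actual c2rust output, and the function names `sum_translated` and `sum_cleaned` are hypothetical and do not come from libbzip2-rs.

```rust
/// Approximates what a translated version of
/// `for (i = 0; i < n; i++) sum += buf[i];` can look like:
/// a `while` loop, explicit type annotations, and casts everywhere.
fn sum_translated(buf: &[u8], n: i32) -> u32 {
    let mut sum: u32 = 0 as u32;
    let mut i: i32 = 0 as i32;
    while i < n {
        // C unsigned arithmetic wraps on overflow, hence `wrapping_add`.
        sum = sum.wrapping_add(buf[i as usize] as u32);
        i += 1;
    }
    sum
}

/// Hand-cleaned version: the slice carries its own length, so the
/// separate `n` parameter, the index variable, and the casts disappear.
fn sum_cleaned(buf: &[u8]) -> u32 {
    buf.iter()
        .fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}
```

Both functions return the same value when `n` equals `buf.len()`. The cleaned version also cannot index out of bounds, because the iterator never yields more elements than the slice holds.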
  • Making it safe:

    • Insertion sort example: The generated Rust code for insertion sort was initially unsafe because it relied on pointer arithmetic. Replacing the raw pointer with a Rust slice made the code safe.
    • Complex control flow: C constructs such as goto and switch cases that fall through have no direct equivalent in Rust. The generated output can be messy; in some cases better names improve it, though some code duplication may be needed.
    • libc functions: Translating functions that call into libc, such as bzip2's fopen_output_safely, is challenging. Conditional compilation is lost in the translation, and using libc directly from Rust is not ergonomic. A Rust standard library call was used to open the file instead, but fdopen was still needed.
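To illustrate the pointer-to-slice change from the insertion sort example, here is a minimal sketch. It is not the code from bzip2 or libbzip2-rs: both functions are written by hand for illustration, and the names `insertion_sort_raw` and `insertion_sort` are hypothetical.

```rust
/// Pointer-based form, close in spirit to translated C.
///
/// # Safety
/// `ptr` must point to `len` valid, initialized `u32` values.
unsafe fn insertion_sort_raw(ptr: *mut u32, len: usize) {
    unsafe {
        let mut i = 1;
        while i < len {
            let key = *ptr.add(i);
            let mut j = i;
            // Shift larger elements one position to the right.
            while j > 0 && *ptr.add(j - 1) > key {
                *ptr.add(j) = *ptr.add(j - 1);
                j -= 1;
            }
            *ptr.add(j) = key;
            i += 1;
        }
    }
}

/// Slice-based form: the same algorithm, but every access is
/// bounds-checked and no `unsafe` is needed.
fn insertion_sort(values: &mut [u32]) {
    for i in 1..values.len() {
        let key = values[i];
        let mut j = i;
        while j > 0 && values[j - 1] > key {
            values[j] = values[j - 1];
            j -= 1;
        }
        values[j] = key;
    }
}
```

The slice version moves the safety obligation from every caller to the type system: a `&mut [u32]` always carries a valid length, so the function no longer depends on the caller passing a matching pointer and count.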
  • Testing:

    • Original test suite: Maintaining compatibility with the original C code is important. Porting the test cases later was valuable, though not fun.
    • Fuzzing: Differential fuzzing, where the same input is fed to both implementations and the results are compared, is useful. It requires care, because older bzip2 versions also have to be handled.
    • Benchmarking: A dashboard was set up to track performance. Compression speed shows some improvements, while decompression speed shows both improvements and regressions. The benchmarks are somewhat unreliable because of bzip2's memory usage and slowness.
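The idea behind differential fuzzing can be sketched with a small comparison harness. This is only an illustration under stated assumptions: `differential_check`, `Verdict`, and the two toy `reverse_*` functions are hypothetical stand-ins for the C and Rust implementations, and a real setup would let a fuzzer generate the inputs instead of hard-coding them.

```rust
/// Outcome of running one input through both implementations.
#[derive(Debug, PartialEq)]
enum Verdict {
    Agree,
    Mismatch,
}

/// Runs `input` through a reference and a candidate implementation
/// and reports whether they behave the same way.
fn differential_check<R, C>(input: &[u8], reference: R, candidate: C) -> Verdict
where
    R: Fn(&[u8]) -> Result<Vec<u8>, String>,
    C: Fn(&[u8]) -> Result<Vec<u8>, String>,
{
    match (reference(input), candidate(input)) {
        // Both succeed with identical output.
        (Ok(a), Ok(b)) if a == b => Verdict::Agree,
        // Both reject the input; the error messages may differ.
        (Err(_), Err(_)) => Verdict::Agree,
        // Different output, or only one side fails.
        _ => Verdict::Mismatch,
    }
}

/// Toy stand-in for the reference (C) implementation.
fn reverse_reference(input: &[u8]) -> Result<Vec<u8>, String> {
    if input.is_empty() {
        return Err("empty input".to_string());
    }
    Ok(input.iter().rev().copied().collect())
}

/// Toy stand-in for the candidate (Rust) implementation.
fn reverse_candidate(input: &[u8]) -> Result<Vec<u8>, String> {
    if input.is_empty() {
        return Err("nothing to do".to_string());
    }
    let mut out = input.to_vec();
    out.reverse();
    Ok(out)
}
```

Calling `differential_check(data, reverse_reference, reverse_candidate)` on each generated input flags any case where the two sides diverge. Treating two errors as agreement is a design choice in this sketch; a stricter harness could also compare error codes.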
  • Conclusion: For the library portion, using c2rust was a success. For the binary portion, the cleanup effort was not worth it. c2rust is better than translating by hand and continues to improve. The libbzip2-rs source code is available on GitHub and can be used with the libbz2-rs-sys feature gate.
  • Support: Trifecta Tech Foundation's Data Compression initiative aims to create memory-safe compression libraries and relies on sponsorships.