NumPy vs BLAS: Losing 90% of Throughput

  • NumPy Overview: Downloaded over 5 billion times, NumPy is the most popular Python library for numerical computing. It wraps low-level HPC libraries such as BLAS and LAPACK in a high-level matrix-operation interface. BLAS implementations are mostly written in C, Fortran, or Assembly and are available for most modern chips.
  • BLAS Performance in NumPy: Calling BLAS through NumPy loses up to 90% of the throughput on 1536-dimensional OpenAI Ada embeddings, and even more on shorter vectors. SimSIMD can partly fix this.
  • Baseline Benchmarks: The comparison pits NumPy's default PyPI distribution against OpenBLAS called directly from the C layer. In single-threaded dot-product operations, NumPy is 3.53x to 8.73x slower for 1536-dimensional real and 768-dimensional complex vectors. Sources of the slowdown include dynamic dispatch, type checking, and memory allocations.
  • SimSIMD: The v4 release added complex dot products, including half-precision complex numbers, which neither NumPy nor most BLAS implementations support. The post compares SimSIMD with NumPy in both functionality and performance. In Python, SimSIMD is significantly faster than NumPy in most cases; for half precision, NumPy is reported as 8x slower.
  • Replicating Results: Instructions to clone repos, install BLAS, and run C and Python benchmarks. C++ benchmark logs show various performance details with different numeric types and SIMD extensions. Python benchmark logs show performance improvements of SimSIMD over NumPy for different data types and operations.
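
The point about shorter vectors losing more throughput can be checked with a rough sketch like the one below. It is not the post's benchmark harness: `dot_throughput` is a hypothetical helper, the vector lengths and repeat counts are arbitrary choices, and the numbers it prints depend entirely on the machine and the BLAS build that NumPy links against. It only measures `np.dot` from Python, so it shows how the fixed per-call cost (dispatch, type checks, allocating the result) weighs on small inputs, not the gap to calling BLAS from C.

```python
import timeit

import numpy as np


def dot_throughput(ndim: int, dtype=np.float32, repeats: int = 2000) -> float:
    """Return np.dot throughput in billions of scalar multiply-adds per second.

    Hypothetical helper for illustration; not part of NumPy or SimSIMD.
    """
    rng = np.random.default_rng(0)
    a = rng.random(ndim).astype(dtype)
    b = rng.random(ndim).astype(dtype)
    # Keep the best of several runs to reduce noise from the OS scheduler.
    seconds = min(timeit.repeat(lambda: np.dot(a, b), number=repeats, repeat=5))
    return ndim * repeats / seconds / 1e9


if __name__ == "__main__":
    # 1536 matches the embedding size mentioned above; the other sizes are arbitrary.
    for ndim in (16, 128, 1536, 16384):
        rate = dot_throughput(ndim)
        print(f"ndim={ndim:>6}  {rate:8.3f} G multiply-adds/s")
```

If the per-call overhead is significant, the reported rate should climb as `ndim` grows, since the same fixed cost is spread over more arithmetic. Passing `dtype=np.float16` or `dtype=np.complex64` gives a similar rough view for other numeric types.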