- NumPy Overview: With over 5 billion downloads, NumPy is the most popular Python library for numerical computing. It wraps low-level HPC libraries such as BLAS and LAPACK behind a high-level matrix-operation interface. BLAS implementations are written mainly in C, Fortran, or Assembly and are available for most modern chips.
- BLAS Performance in NumPy: Up to 90% of BLAS throughput is lost when going through NumPy on 1536-dimensional OpenAI Ada embeddings, and more on shorter vectors. SimSIMD can partly recover it.
- Baseline Benchmarks: Compared NumPy's default PyPI distribution against the underlying OpenBLAS called from the C layer. In single-threaded dot products, NumPy is 3.53x to 8.73x slower on 1536-dimensional real and 768-dimensional complex vectors. Sources of the slowdown include dynamic dispatch, type checking, and memory allocations.
- SimSIMD: The v4 release added complex dot products, including half-precision complex numbers, which neither NumPy nor most BLAS implementations support. The comparison with NumPy covers both functionality and performance. In Python, SimSIMD is significantly faster than NumPy in most cases; half-precision stands out, with NumPy 8x slower.
- Replicating Results: Instructions cover cloning the repos, installing BLAS, and running the C and Python benchmarks. The C++ benchmark logs report performance for different numeric types and SIMD extensions. The Python benchmark logs show SimSIMD's improvements over NumPy for different data types and operations.
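
The Python side of the baseline above can be approximated with a short timing loop. This is a minimal sketch, not the benchmark used for the reported numbers: the helper `calls_per_second`, the repeat counts, and the random inputs are assumptions made for illustration, and it measures only NumPy, so it shows the per-call cost but not the gap to OpenBLAS or SimSIMD. It pairs 1536 real values with 768 complex values because both occupy the same number of bytes.

```python
import timeit

import numpy as np


def calls_per_second(fn, repeats=3, number=1_000):
    """Hypothetical helper: best-of-N rate of calling fn()."""
    best = min(timeit.repeat(fn, repeat=repeats, number=number))
    return number / best


rng = np.random.default_rng(0)

# 1536-dimensional real vectors, the size of OpenAI Ada embeddings.
a = rng.random(1536, dtype=np.float32)
b = rng.random(1536, dtype=np.float32)

# 768 complex64 values hold the same 1536 float32 scalars.
ac = (rng.random(768, dtype=np.float32)
      + 1j * rng.random(768, dtype=np.float32)).astype(np.complex64)
bc = (rng.random(768, dtype=np.float32)
      + 1j * rng.random(768, dtype=np.float32)).astype(np.complex64)

# np.dot is the plain product; np.vdot conjugates its first argument.
rates = {
    "dot f32 x1536": calls_per_second(lambda: np.dot(a, b)),
    "dot c64 x768": calls_per_second(lambda: np.dot(ac, bc)),
    "vdot c64 x768": calls_per_second(lambda: np.vdot(ac, bc)),
}

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} calls/s")
```

Each call here includes Python-level dispatch, type checks, and allocation of the result, which is where the summary above locates the overhead. Absolute numbers depend on the machine and on the BLAS build NumPy was linked against.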