Writing fast string ufuncs for NumPy 2.0

NumPy 2.0 is set to be released after a lot of work. Alongside changes to the Python and C APIs and improved documentation, the release includes work on improving the performance of string operations.

How it started: Last July, the D.E. Shaw group reported poor performance for string operations. For every element of a string array, NumPy created a Python string object and called the corresponding method on it, which was costly.
Creating the first ufunc - isalpha: The solution was to use ufuncs that operate directly on the raw C data in the NumPy array buffer. A ufunc (universal function) operates element-wise on ndarrays. The isalpha loop was relatively easy to write: it gets pointers to the input and output buffers and loops over the items, writing one result per item. Writing this ufunc confirmed that this was the right approach, and work on ufuncs for the other string operations began.
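To make the loop structure concrete, here is a minimal pure-Python model of what an isalpha inner loop does. It is not NumPy's C implementation: the raw array buffer is modeled as a `bytes` object of fixed-width UTF-32 items, the helpers `pack_utf32` and `isalpha_loop` are hypothetical, and Python's `str.isalpha` applied to each code point stands in for the C-level character check.

```python
import struct


def pack_utf32(strings, width):
    """Pack strings into one buffer of fixed-width UTF-32-LE items,
    padding each item with NUL code points (a model of a '<U{width}' array)."""
    buf = bytearray()
    for s in strings:
        if len(s) > width:
            raise ValueError(f"{s!r} does not fit in {width} code points")
        codepoints = [ord(c) for c in s] + [0] * (width - len(s))
        buf += struct.pack(f"<{width}I", *codepoints)
    return bytes(buf)


def isalpha_loop(in_buf, width, n):
    """Model of an isalpha inner loop: walk the input buffer item by item
    and write one boolean byte per item into the output buffer."""
    itemsize = 4 * width        # UTF-32: 4 bytes per code point
    out = bytearray(n)          # stands in for the boolean output buffer
    for i in range(n):
        # The offset of item i plays the role of the advancing input pointer.
        codepoints = struct.unpack_from(f"<{width}I", in_buf, i * itemsize)
        # Trailing NUL code points are padding, not part of the string.
        length = width
        while length > 0 and codepoints[length - 1] == 0:
            length -= 1
        result = length > 0     # an empty string is not alphabetic
        for cp in codepoints[:length]:
            if not chr(cp).isalpha():
                result = False
                break
        out[i] = result         # True/False are stored as 1/0
    return out


buf = pack_utf32(["abc", "ab1", ""], width=4)
print(list(isalpha_loop(buf, width=4, n=3)))  # [1, 0, 0]
```

Because trailing NULs are treated as padding, an empty item produces 0, consistent with `str.isalpha('')` returning `False`.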
Fixed-length string dtypes: As of NumPy 1.26, there are two string dtypes, both fixed-length: one for Unicode strings (stored as UTF-32) and one for byte strings (ASCII). The dtype specifies the maximum length of each array item, and shorter items are padded with null characters.
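The memory layout of these dtypes can be modeled in a few lines. This sketch assumes `U` for the Unicode dtype (4 bytes per code point) and `S` for bytes (1 byte per character), following NumPy's dtype character codes. The helpers `itemsize`, `to_fixed_buffer` and `from_fixed_buffer` are hypothetical, and truncating over-long values mirrors what NumPy does when a too-long string is assigned to a fixed-width array.

```python
def itemsize(kind, width):
    """Bytes per item for a fixed-width string dtype."""
    if kind == "U":             # Unicode: UTF-32, 4 bytes per code point
        return 4 * width
    if kind == "S":             # bytes: 1 byte per character
        return width
    raise ValueError(f"unknown string kind {kind!r}")


def to_fixed_buffer(strings, kind, width):
    """Store strings as fixed-width items: truncate to `width` characters,
    then pad with NUL bytes up to the item size."""
    encoding = "utf-32-le" if kind == "U" else "ascii"
    size = itemsize(kind, width)
    buf = bytearray()
    for s in strings:
        raw = s[:width].encode(encoding)
        buf += raw.ljust(size, b"\x00")
    return bytes(buf)


def from_fixed_buffer(buf, kind, width):
    """Read items back, treating trailing NULs as padding."""
    encoding = "utf-32-le" if kind == "U" else "ascii"
    size = itemsize(kind, width)
    return [
        buf[offset:offset + size].decode(encoding).rstrip("\x00")
        for offset in range(0, len(buf), size)
    ]


buf = to_fixed_buffer(["hi", "hello"], "U", 3)
print(len(buf))                          # 24: two items of 3 * 4 bytes
print(from_fixed_buffer(buf, "U", 3))    # ['hi', 'hel']
```

Every item occupies the same number of bytes, which is what lets a C loop find item `i` by simple pointer arithmetic.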
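Next up was add: Implementing add as a ufunc was not straightforward. Because the string dtypes are parametric (the maximum length is part of the dtype), registering the loop required a resolve_descriptors function. This function specifies the exact descriptors, including lengths, for the inputs and the output; for add, the output length is the sum of the two input lengths. With it registered alongside the loop, add worked correctly.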
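The role of resolve_descriptors for add can be sketched as a plain function from input descriptors to an output descriptor. This is a simplified model, not NumPy's C signature: `FixedStr`, `resolve_add_descriptors` and `add_loop` are hypothetical names, and the sketch only accepts two inputs of the same string kind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedStr:
    """Hypothetical descriptor for a parametric string dtype."""
    kind: str    # "U" (UTF-32) or "S" (bytes)
    length: int  # maximum characters per item


def resolve_add_descriptors(a, b):
    """Given the input descriptors, return descriptors for (in1, in2, out)."""
    if a.kind != b.kind:
        raise TypeError("this sketch only adds strings of the same kind")
    # The longest possible result is both inputs at full length.
    return a, b, FixedStr(a.kind, a.length + b.length)


def add_loop(xs, ys, out_desc):
    """Element-wise concatenation into items of out_desc.length."""
    out = []
    for x, y in zip(xs, ys):
        result = x + y
        # Holds as long as each input fits its own descriptor.
        assert len(result) <= out_desc.length
        out.append(result)
    return out


_, _, out_desc = resolve_add_descriptors(FixedStr("U", 3), FixedStr("U", 4))
print(out_desc)                                           # FixedStr(kind='U', length=7)
print(add_loop(["ab", "abc"], ["cd", "defg"], out_desc))  # ['abcd', 'abcdefg']
```

Resolving the output size before the loop runs means the loop never has to grow or reallocate its output.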
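find / rfind: Nothing new here, right?: find and rfind are new ufuncs with a fixed output dtype (the integer index of the match). They take four input arguments: the strings, the substring to search for, and the start and end positions. The loop only handles int64 start and end values, so a promoter function is needed to specify how other input dtypes, such as smaller integer types, are promoted so that the same loop can be used.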
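A promoter can be thought of as a function that rewrites the input dtypes into a signature for which a loop is registered. The registry, dtype names and helpers below (`REGISTERED_LOOPS`, `find_promoter`, `find_loop`) are hypothetical, and Python's `str.find` stands in for the C loop; the point is that only a single (string, string, int64, int64) loop has to exist.

```python
# Hypothetical registry: only one find loop exists, for int64 start/end.
REGISTERED_LOOPS = {("U", "U", "int64", "int64"): "find_unicode_int64"}

# Integer dtypes this sketch promotes losslessly to int64.
PROMOTABLE_INTS = {"int8", "int16", "int32", "int64", "uint8", "uint16", "uint32"}


def find_promoter(dtypes):
    """Rewrite (str, substr, start, end) dtypes into a registered signature."""
    a, sub, start, end = dtypes
    for dt in (start, end):
        if dt not in PROMOTABLE_INTS:
            raise TypeError(f"start/end must be integers, got {dt}")
    return (a, sub, "int64", "int64")


def find_loop(strings, subs, starts, ends):
    """Model of the int64 loop: lowest index of sub in s[start:end], or -1."""
    return [s.find(sub, start, end)
            for s, sub, start, end in zip(strings, subs, starts, ends)]


signature = find_promoter(("U", "U", "int32", "int16"))
print(REGISTERED_LOOPS[signature])                                 # find_unicode_int64
print(find_loop(["hello", "world"], ["l", "z"], [0, 0], [5, 5]))  # [2, -1]
```

Promotion happens once per call, before the loop runs, so the loop itself never needs to know which integer dtype the caller passed.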
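Last stop: replace: The replace ufunc required three pieces: a loop for the string dtypes and int64 (the maximum number of replacements), a promoter that promotes other integer dtypes to int64, and a resolve_descriptors function that specifies the size of the output dtype. In general, however, the output size cannot be determined from the input dtypes alone, because it depends on how many occurrences are actually replaced. A Python wrapper handles this case by computing the required output size before calling the ufunc.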
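The wrapper's job can be sketched as "size first, then fill". This sketch assumes the output width is found by counting occurrences of `old` in each element; `replace_output_length` and `replace_wrapper` are hypothetical names, Python's `str.count` and `str.replace` stand in for the ufunc loops, and the returned `"<U{width}"` string is only a NumPy-style label for the computed dtype.

```python
def replace_output_length(strings, old, new, count=-1):
    """Longest result length across all elements; count < 0 means replace all."""
    longest = 0
    for s in strings:
        n = s.count(old)
        if count >= 0:
            n = min(n, count)
        longest = max(longest, len(s) + n * (len(new) - len(old)))
    return longest


def replace_wrapper(strings, old, new, count=-1):
    """Size the output from the data first, then fill it element by element."""
    width = replace_output_length(strings, old, new, count)
    out = [s.replace(old, new, count) for s in strings]
    return f"<U{width}", out


print(replace_wrapper(["aaa", "ab"], "a", "xyz"))
# ('<U9', ['xyzxyzxyz', 'xyzb'])
print(replace_wrapper(["aaa", "ab"], "a", "xyz", count=1))
# ('<U5', ['xyzaa', 'xyzb'])
```

For `["aaa", "ab"]` with `old="a"` and `new="xyz"`, the longest result has 9 characters, so the output needs width 9 even though neither input dtype carries that information.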
Results: Benchmarks showed large performance improvements, especially for larger arrays: speed-ups of 150x to 492x for 1000-element arrays, and 4x to 11x for small two-element arrays.
Conclusion: This work led to a complete rework of how string operations are implemented in NumPy. It also introduced a new numpy.strings namespace with fast ufuncs for most string operations, supporting the different string dtypes.
Acknowledgements: Thanks to Nathan Goldbaum, Sebastian Berg, Marten van Kerkwijk, and Matti Picus for their reviews and patience. And a big thank you to the D.E. Shaw group for sponsoring the work.