Elasticsearch supports int8 scalar quantization to shrink the float32 vectors produced by embedding models. Elasticsearch 8.16 and Lucene introduce Better Binary Quantization (BBQ), which reduces each float32 dimension to a single bit, delivering roughly 95% memory reduction while maintaining high ranking quality, and it outperforms traditional approaches such as Product Quantization in indexing and query speed with no loss of accuracy.

BBQ normalizes vectors around a centroid, stores multiple error-correction values, uses asymmetric quantization (queries are quantized at higher fidelity than stored documents), and enables fast search through bit-wise operations. Indexing with BBQ is simple, and segment merging reuses previously calculated centroids. Extensive testing shows good recall across different datasets and configurations.

BBQ ships as a tech preview in Elasticsearch 8.16 and is available now in Elasticsearch Serverless. Related content includes articles on GPU-accelerated vector search, scaling late-interaction models, searching complex documents with ColPali, and semantic text.
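The core idea of centroid-centered binarization and bit-wise distance can be illustrated with a minimal sketch. This is not Elasticsearch's or Lucene's actual implementation (which adds error-correction values and asymmetric query quantization); the function names here are hypothetical, and NumPy stands in for the real bit-packed data structures:

```python
import numpy as np

def binary_quantize(vectors, centroid):
    """Sketch of centroid-based binarization: one bit per dimension."""
    # Center each vector around the shared centroid, then keep only
    # the sign of each component: 1 if positive, else 0.
    bits = ((vectors - centroid) > 0).astype(np.uint8)
    # Pack 8 dimensions per byte, so a 768-dim float32 vector
    # (3072 bytes) shrinks to 96 bytes -- a ~32x reduction.
    return np.packbits(bits, axis=-1)

def hamming_distance(a, b):
    """Bit-wise distance between two packed bit vectors."""
    # XOR the packed bytes and count the set bits (popcount);
    # real implementations do this with SIMD popcount instructions.
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Toy usage: quantize a few random 64-dim vectors.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(4, 64)).astype(np.float32)
centroid = vectors.mean(axis=0)
packed = binary_quantize(vectors, centroid)
print(packed.shape)  # 4 vectors, 8 bytes each
print(hamming_distance(packed[0], packed[1]))
```

Centering around the centroid before taking signs is what makes the single sign bit informative: without it, dimensions with a large mean would all quantize to the same bit.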