Microsoft's Native 1-Bit LLM Could Bring Efficient Generative AI to Everyday CPUs

  • Microsoft's BitNet b1.58 2B4T: The first LLM trained natively with "1-bit" (technically 1-trit, i.e. ternary, ~1.58-bit) weights rather than quantized from a model trained in floating point. It shows performance comparable to full-precision LLMs of similar size, with lower compute cost and lighter hardware requirements.
  • Barriers to LLM adoption: State-of-the-art open LLMs have large memory footprints, consume substantial energy, and exhibit notable inference latency, making them impractical for many edge devices and real-time applications.
  • Training details: Trained from scratch on a 4-trillion-token corpus using native 1.58-bit weights. Replaces standard full-precision linear layers with custom BitLinear layers that encode weights as ternary values (~1.58 bits), and incorporates techniques such as activation quantization and normalization (see the sketch after this list). Relies on large-scale pre-training, supervised fine-tuning, and direct preference optimization.
  • Inference and library: The model cannot be run with standard deep learning libraries. Microsoft developed an open-source inference library [bitnet.cpp] that provides optimized kernels for fast, lossless inference on CPUs, with NPU and GPU support planned next. Future research includes training larger models, adding multilingual and multimodal capabilities, and extending the context window length.
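To make the BitLinear idea concrete, below is a minimal, hypothetical PyTorch sketch (not Microsoft's code) of a BitLinear-style layer: weights are quantized to ternary {-1, 0, +1} with absmean scaling and activations to 8 bits per token, with a straight-through estimator so the latent full-precision weights can still be trained. The exact scaling rules, quantization granularity, and training tricks in BitNet b1.58 2B4T may differ; this only illustrates the general technique.

```python
# Hypothetical sketch of a BitLinear-style layer (illustrative, not Microsoft's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """Drop-in replacement for nn.Linear with ternary (~1.58-bit) weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Absmean scaling: round scaled weights to {-1, 0, +1}.
        scale_w = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale_w).round().clamp(-1, 1)
        # Straight-through estimator: gradients flow to the latent fp weights.
        w_q = w + (w_q * scale_w - w).detach()

        # Per-token absmax quantization of activations to 8-bit range.
        scale_x = x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5) / 127.0
        x_q = (x / scale_x).round().clamp(-128, 127)
        x_q = x + (x_q * scale_x - x).detach()

        return F.linear(x_q, w_q, self.bias)


# Usage: swap the projection layers of a Transformer block for BitLinear.
layer = BitLinear(1024, 4096, bias=False)
out = layer(torch.randn(2, 16, 1024))
print(out.shape)  # torch.Size([2, 16, 4096])
```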