- Microsoft's BitNet b1.58 2B4T: The first LLM trained natively with "1-bit" (technically ternary, i.e. 1-trit) weights rather than being quantized after training from a full-precision model. It performs comparably to full-precision LLMs of similar size while requiring less compute and more modest hardware.
- Barriers to LLM adoption: State-of-the-art open LLMs have large memory footprints, consume significant energy and exhibit high inference latency, making them impractical for many edge devices and real-time applications.
- Training details: Trained from scratch on a 4-trillion-token corpus with native low-bit weights. Standard full-precision linear layers are replaced with custom BitLinear layers whose weights are ternary (each weight is one of {-1, 0, +1}, i.e. about 1.58 bits per weight, since log2(3) ≈ 1.58), combined with techniques such as activation quantization and normalization. The training pipeline comprises large-scale pre-training, supervised fine-tuning and direct preference optimization; a minimal BitLinear sketch follows this list.
- Inference and library: The model cannot be run efficiently with standard deep learning libraries, so Microsoft developed an open-source inference library, bitnet.cpp, which offers optimized kernels for fast and lossless inference on CPU, with NPU and GPU support planned next. Future research includes training larger models, adding multilingual and multimodal capabilities, and extending the context window length; a hedged loading example is sketched below.
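
Below is a minimal PyTorch sketch of the BitLinear idea described above, assuming the general recipe reported for BitNet b1.58: per-tensor absmean quantization of weights to {-1, 0, +1} and per-token 8-bit absmax quantization of activations. The class and function names are illustrative, the computation is "fake-quantized" floating point rather than the packed integer kernels a real implementation would use, and this is not Microsoft's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def weight_quant_ternary(w: torch.Tensor):
    """Absmean quantization of weights to ternary codes {-1, 0, +1} (~1.58 bits each)."""
    scale = w.abs().mean().clamp(min=1e-5)          # per-tensor absmean scale
    w_q = (w / scale).round().clamp(-1, 1)          # ternary codes
    return w_q, scale


def act_quant_int8(x: torch.Tensor):
    """Per-token absmax quantization of activations to the 8-bit integer range."""
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    x_q = (x * scale).round().clamp(-128, 127)
    return x_q, scale


class BitLinear(nn.Module):
    """Drop-in replacement for nn.Linear with ternary weights and 8-bit activations.

    Inference-style sketch: values are quantized and then multiplied in floating
    point. An optimized kernel (e.g. in bitnet.cpp) would instead operate on
    packed ternary weights with integer arithmetic.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)
        self.norm = nn.LayerNorm(in_features)       # normalize before quantizing activations

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm(x)
        x_q, x_scale = act_quant_int8(x)
        w_q, w_scale = weight_quant_ternary(self.weight)
        # Matmul over quantized values, then undo both quantization scales.
        return F.linear(x_q, w_q) * (w_scale / x_scale)


if __name__ == "__main__":
    layer = BitLinear(64, 128)
    out = layer(torch.randn(2, 10, 64))
    print(out.shape)  # torch.Size([2, 10, 128])
```

During training, a straight-through estimator is typically used so gradients flow through the rounding operations; that detail is omitted here.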
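
And a hedged sketch of loading the released checkpoint through the Hugging Face transformers API. The repository id below is an assumption to verify against the official model card (as is the minimum transformers version with BitNet support), and running the model this way demonstrates its output but does not deliver the efficiency gains, which require the optimized kernels in bitnet.cpp.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face identifier; check the official model card for the exact name.
model_id = "microsoft/bitnet-b1.58-2B-4T"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Explain 1.58-bit weights in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=64)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```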