- Abstract Summary: Proposes Tensor Product Attention (TPA), a new attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, shrinking the KV cache size during inference (see the illustrative sketch after this list). TPA integrates seamlessly with RoPE while improving both model quality and memory efficiency. Based on TPA, the paper introduces the Tensor ProducT ATTenTion Transformer (T6), which outperforms standard Transformer baselines on language-modeling tasks across various metrics. TPA's memory efficiency allows processing longer sequences under fixed resource budgets. Code available at [https://github.com/tensorgi/T6].
- Details: 31 pages with 6 figures. Subjects include Computation and Language (cs.CL), Artificial Intelligence (cs.AI), and Machine Learning (cs.LG). Cited as [arXiv:2501.06425] or [arXiv:2501.06425v2], with an arXiv-issued DOI via DataCite. Submission history: submitted by Yifeng Liu; v1 on Sat, 11 Jan 2025, v2 on Fri, 7 Feb 2025.
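To make the core idea concrete, below is a minimal, hedged sketch (not the authors' implementation) of caching low-rank tensor-product factors for keys and values instead of the full per-head tensors: each token stores a rank-R pair of factors (a head factor in R^H and a dimension factor in R^{d_h}), and the full K/V tensors are reconstructed on the fly. All names, shapes, and the rank parameter `R` are illustrative assumptions; RoPE and causal masking are omitted for brevity.

```python
# Illustrative sketch of tensor-product-factored K/V caching (assumed parameterization,
# not the official T6 code). Per token, the cache holds R*(H + d_h) numbers for K and
# again for V, instead of H*d_h each for a standard KV cache.
import math
import torch

B, T, d_model = 2, 16, 64          # batch, sequence length, model dim
H, d_h, R = 4, 16, 2               # heads, head dim, tensor-product rank

x = torch.randn(B, T, d_model)

# Hypothetical factor projections: each token yields R head-factors and R dim-factors.
Wa_k = torch.randn(d_model, R * H) / math.sqrt(d_model)
Wb_k = torch.randn(d_model, R * d_h) / math.sqrt(d_model)
Wa_v = torch.randn(d_model, R * H) / math.sqrt(d_model)
Wb_v = torch.randn(d_model, R * d_h) / math.sqrt(d_model)
Wq = torch.randn(d_model, H * d_h) / math.sqrt(d_model)

# Only these factors would be cached during autoregressive decoding.
a_k = (x @ Wa_k).view(B, T, R, H)
b_k = (x @ Wb_k).view(B, T, R, d_h)
a_v = (x @ Wa_v).view(B, T, R, H)
b_v = (x @ Wb_v).view(B, T, R, d_h)

# Reconstruct full per-head K, V as averaged rank-R outer products a_r ⊗ b_r.
K = torch.einsum('btrh,btrd->bthd', a_k, b_k) / R   # (B, T, H, d_h)
V = torch.einsum('btrh,btrd->bthd', a_v, b_v) / R

# Standard scaled dot-product attention on the reconstructed K, V.
Q = (x @ Wq).view(B, T, H, d_h)
scores = torch.einsum('bthd,bshd->bhts', Q, K) / math.sqrt(d_h)
attn = scores.softmax(dim=-1)
out = torch.einsum('bhts,bshd->bthd', attn, V).reshape(B, T, H * d_h)

full_cache = 2 * H * d_h            # floats per token for a standard K/V cache
factored_cache = 2 * R * (H + d_h)  # floats per token when caching the factors
print(out.shape, full_cache, factored_cache)
```

With these toy sizes the factored cache stores 80 floats per token versus 128 for the standard cache; the actual savings in the paper depend on the chosen ranks and model dimensions.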