Google's Gemma 3: An open generative AI model family with vision-language understanding, long-context handling, and improved multilinguality.
- New features (blog post): Announced by the Google DeepMind and AI Studio teams; includes KV-cache memory reduction, a new tokenizer, and a better-performing, higher-resolution vision encoder.
- Technical report summary: A custom SigLIP (Sigmoid loss for Language-Image Pre-training) vision encoder, a Pan & Scan algorithm for handling varied image aspect ratios, images treated as sequences of "soft tokens", and bi-directional attention over image inputs.
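As a rough illustration of the Pan & Scan idea, the sketch below splits a wide or tall image into square-ish crops so each crop is closer to the encoder's native aspect ratio before being resized to a fixed input size. The crop-count heuristic and the `max_crops` parameter are assumptions for illustration, not Gemma 3's exact published algorithm:

```python
def pan_and_scan_crops(width, height, max_crops=4):
    """Return (left, top, right, bottom) boxes covering the image.

    Hypothetical Pan & Scan style cropper: the number of crops along
    the long axis grows with the aspect ratio, so each crop stays
    roughly square before being resized for the vision encoder.
    """
    if width >= height:
        n_w = min(max_crops, max(1, round(width / height)))
        n_h = 1
    else:
        n_w = 1
        n_h = min(max_crops, max(1, round(height / width)))
    boxes = []
    for j in range(n_h):
        for i in range(n_w):
            boxes.append((
                i * width // n_w,        # left
                j * height // n_h,       # top
                (i + 1) * width // n_w,  # right
                (j + 1) * height // n_h, # bottom
            ))
    return boxes

# A 3:1 panorama yields three square crops; a square image is untouched.
print(pan_and_scan_crops(1800, 600))
print(pan_and_scan_crops(500, 500))
```

In the described scheme, each crop (plus, typically, the full downscaled image) is encoded separately into soft tokens, so fine detail in extreme aspect ratios is not lost to aggressive resizing.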
- Memory-efficiency changes: Reduced KV-cache memory usage at long context lengths, allowing analysis of longer documents and conversations.
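To see why KV-cache memory matters at long context, here is a back-of-the-envelope size estimate. The formula is the standard one for transformer decoders; the layer count, head count, and head dimension in the example are illustrative assumptions, not Gemma 3's published configuration:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_param=2):
    """Estimate KV-cache size for a decoder-only transformer.

    2x accounts for storing both keys and values, per layer, per
    KV head, per cached position.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_param

# Example: a hypothetical 32-layer model with 8 KV heads of dim 128,
# holding a 128k-token context in a 2-byte dtype (e.g. bfloat16).
size = kv_cache_bytes(32, 8, 128, 128 * 1024)
print(f"{size / 2**30:.1f} GiB")  # → 16.0 GiB
```

Because the cost grows linearly with `seq_len` and `n_layers`, designs that let most layers attend only within a short local window cache far fewer positions per layer, which is the kind of reduction the blog post highlights.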
- Improved tokenizer: Vocabulary grown to 262k entries using the same SentencePiece tokenizer as Gemini, giving more balanced coverage of non-English languages.
- Multilingual capabilities: A revisited data mixture with more multilingual data, plus a revised pre-training and post-training process.
- Performance comparison: Both the pre-trained and instruction-tuned versions outperform Gemma 2 across benchmarks; the instruction-tuned model ranks among the top 10 on LM Arena by Elo score as of Apr 12, 2025.
- Long-context handling: Generalizes to a 128k-token context length via Rotary Position Embedding (RoPE) rescaling during pre-training.
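A minimal sketch of what RoPE rescaling does: dividing positions by a scale factor compresses a long sequence into the angle range the model saw during shorter-context pre-training. The generic positional-interpolation form shown here is an assumption for illustration, not Gemma 3's exact recipe (the report also changes the RoPE base frequency):

```python
import math

def rope_angles(position, head_dim, base=10_000.0, scale=1.0):
    """Rotation angles for one position; scale > 1 compresses positions
    so a longer sequence maps into the pre-training angle range."""
    return [
        (position / scale) / (base ** (2 * i / head_dim))
        for i in range(head_dim // 2)
    ]

def apply_rope(vec, position, base=10_000.0, scale=1.0):
    """Rotate consecutive (even, odd) pairs of a query/key vector."""
    out = list(vec)
    for i, theta in enumerate(rope_angles(position, len(vec), base, scale)):
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out

# With scale=8, position 8*p gets the same angles position p had
# unscaled, so (illustratively) a 128k window reuses the angle range
# of a 16k window.
assert rope_angles(8, 4, scale=8.0) == rope_angles(1, 4, scale=1.0)
```

The appeal of this family of techniques is that attention scores between rescaled positions stay within the distribution the model was trained on, which is why context can be extended with relatively little additional training.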
- For more information: See the developer guide, model card, meme generator, and the Gemmaverse.