AI code generators such as OpenAI's Codex and DeepMind's AlphaCode have become very popular in recent years. However, neither model is open source: AlphaCode has released only a handful of test examples, while Codex is accessible only through an API.
"Despite the great success of large language code models, none of the strongest models have been made public," said Carnegie Mellon researchers. "This prevents the adoption of these models outside well-resourced companies and limits resources." Organizations lack research in this area."
Therefore, several researchers from Carnegie Mellon University have released PolyCoder, an open-source automatic code generation model with 2.7B parameters, based on the GPT-2 architecture and trained on a 249 GB code corpus spanning 12 programming languages.
The 12 programming languages are: C, C#, C++, Go, Java, JavaScript, PHP, Python, Ruby, Rust, Scala, and TypeScript.
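Because PolyCoder uses the standard GPT-2 architecture, it can in principle be queried like any other causal language model. The sketch below shows what that might look like with the Hugging Face transformers library; the checkpoint id `NinedayWang/PolyCoder-2.7B` is an assumption not confirmed by this article, so substitute whichever checkpoint the authors actually publish.

```python
# Minimal sketch: sampling a code completion from a GPT-2-style code model.
# The checkpoint id below is an assumption, not confirmed by the article.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "NinedayWang/PolyCoder-2.7B"  # assumed Hugging Face checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Prompt with the start of a C function and let the model complete it.
prompt = "int binary_search(int *arr, int n, int target) {"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```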
Evaluation results show that PolyCoder achieves lower perplexity on C code than all known models, including Codex. Among the open-source models, PolyCoder also outperforms the similarly sized GPT-Neo 2.7B in C, JavaScript, Rust, Scala, and TypeScript, though Codex still outperforms PolyCoder in the other languages.
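Perplexity here measures how well a model predicts each next token of held-out code, with lower values meaning better. A minimal sketch of computing it for a single snippet, reusing the assumed checkpoint id from the previous example:

```python
# Minimal sketch of perplexity under a causal language model:
# exp(mean cross-entropy of next-token predictions) over a snippet.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "NinedayWang/PolyCoder-2.7B"  # assumed checkpoint id, as above

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

snippet = "for (int i = 0; i < n; i++) { sum += arr[i]; }"
inputs = tokenizer(snippet, return_tensors="pt")
with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean
    # next-token cross-entropy loss over the snippet.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"perplexity: {math.exp(loss.item()):.2f}")
```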