- OpenAI funded FrontierMath, a leading AI math benchmark, which only became known when OpenAI announced its record-breaking performance on the test.
- FrontierMath, introduced in November 2024, tests AI systems' ability to handle complex math problems. Its problems were created by over 60 leading mathematicians.
- The connection between OpenAI and FrontierMath emerged on December 20 when OpenAI unveiled its new o3 model, achieving a 25.2% success rate on the benchmark's problems.
- Epoch AI, the benchmark's developer, had an agreement preventing them from revealing OpenAI's support until o3's announcement. They acknowledged the connection in a footnote.
- More than 60 mathematicians who created the benchmark problems were unaware of OpenAI's involvement even after o3's announcement.
- Tamay Besiroglu of Epoch AI admits mistakes were made and says OpenAI had access to many of the math problems and solutions before o3's announcement, though Epoch AI kept a separate holdout set of problems private.
- They have a verbal agreement with OpenAI prohibiting the company from using the materials to train models.
- The episode has prompted calls for more transparency in AI benchmarking, especially since mathematical reasoning remains a known weakness of language models.
- Epoch AI lead mathematician Elliot Glazer believes OpenAI's reported results are accurate but says Epoch AI needs to independently evaluate the model using the holdout set.
- The situation highlights the complexity of AI benchmarking and the importance of test results in attracting attention and investment.