OpenAI's GPT-5 Debuts with Cost Commoditization and Heightened Scrutiny

August 7, 2025: GPT-5 Launch
- OpenAI rolled out GPT-5 to ChatGPT users and the API, adding a model router, new model sizes, and pricing aimed at production use.
- The product page advertised a 400K token context and 128K maximum output tokens.
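
The advertised limits can be turned into a simple budget check. The following is a minimal sketch, assuming the 400K context window is shared between input and output tokens; the function names are hypothetical and token counting itself is left to the caller.

```python
CONTEXT_WINDOW = 400_000      # advertised context size, in tokens
MAX_OUTPUT_TOKENS = 128_000   # advertised output cap, in tokens


def max_input_tokens(requested_output: int) -> int:
    """Return how many input tokens remain once output space is reserved."""
    if requested_output < 0:
        raise ValueError("requested_output must be non-negative")
    if requested_output > MAX_OUTPUT_TOKENS:
        raise ValueError("requested_output exceeds the advertised output cap")
    # Assumption: input and output share one window.
    return CONTEXT_WINDOW - requested_output


def fits(input_tokens: int, requested_output: int) -> bool:
    """True when the prompt plus the reserved output stays inside the window."""
    return 0 <= input_tokens <= max_input_tokens(requested_output)
```

Under that assumption, reserving the full 128K output leaves 272K tokens for the prompt.
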
Visual Contradictions in Bar Charts
- OpenAI presented bar charts to show GPT-5's improvements on deception and benchmark performance, but the bars visually contradicted the numbers printed on them.
- For example, in the "coding deception" chart, the bar for GPT-5 was shorter than its written figure implied, even after that figure was corrected.
API Updates and Features
- The API surface consolidated around the Responses API, introduced in March and expanded in May.
- It enables "agentic" apps with multimodal prompting and built-in tools.
- Updates include direct access to image generation, Code Interpreter, improved file search, and remote Model Context Protocol servers.
- It also adds background mode, reasoning summaries, and encrypted reasoning items.
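
To make the feature list concrete, here is a rough sketch of how a request body combining these options might be assembled. It only builds a dictionary and makes no network call. The field names reflect one reading of the Responses API and should be checked against the current API reference; the MCP server label and URL are hypothetical. Treating background mode and encrypted reasoning items as mutually exclusive is also an assumption of this sketch.

```python
def build_request(prompt: str, *, background: bool = False,
                  encrypted_reasoning: bool = False) -> dict:
    """Assemble a Responses-style request body as a plain dict (no network call)."""
    if background and encrypted_reasoning:
        # Assumption: background runs rely on stored responses, while encrypted
        # reasoning items target stateless use, so this sketch refuses to mix them.
        raise ValueError("choose background or encrypted_reasoning, not both")

    body = {
        "model": "gpt-5",
        "input": prompt,
        "tools": [
            {"type": "image_generation"},
            {"type": "code_interpreter", "container": {"type": "auto"}},
            {
                "type": "mcp",
                "server_label": "example",                # hypothetical label
                "server_url": "https://mcp.example.com",  # hypothetical server
            },
        ],
        "reasoning": {"summary": "auto"},  # request reasoning summaries
    }
    if background:
        body["background"] = True
    if encrypted_reasoning:
        body["store"] = False
        body["include"] = ["reasoning.encrypted_content"]
    return body
```

A call such as `build_request("Plot sales by month", background=True)` returns a body that could then be passed to whichever HTTP client or SDK the application already uses.
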
Model Performance on Different Tasks
- For MLE-Bench and Kaggle-like GPU workloads, the ChatGPT agent scores highest with a 9% bronze pass rate.
- On SWE-Lancer, the ChatGPT agent is also the best performer.
- GPT-5's reasoning model is strong on code-centric debugging and replication, while the routed agent is better on long-horizon, multi-skill workloads.
Pricing and Network Effects
- GPT-5 is priced at $1.25 per million input tokens and $10 per million output tokens, about half the input cost of GPT-4o.
- Opening GPT-5 to everyone at launch positions OpenAI to lock in large network effects, as new and existing users upgrade and spend more.
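
The listed rates translate directly into a per-request cost estimate. The sketch below uses only the two prices quoted above; the function name is hypothetical, and it ignores any discounts or cached-input pricing that may apply in practice.

```python
from decimal import Decimal

INPUT_USD_PER_MILLION = Decimal("1.25")   # quoted input price
OUTPUT_USD_PER_MILLION = Decimal("10")    # quoted output price
MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in US dollars at the quoted rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_USD_PER_MILLION
             + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION)
    return total / MILLION
```

For example, a request with 200,000 input tokens and 20,000 output tokens comes to $0.25 + $0.20 = $0.45, which shows how quickly output tokens dominate the bill at an 8:1 price ratio.
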
Model Reliability Improvements
- Against open-ended factuality sets, GPT-5 models show lower hallucination rates than OpenAI o3 and prior baselines.
- METR's autonomy review concludes that GPT-5 is unlikely to pose a catastrophic risk under the threat models it examined.
Structured Outputs Maturity
- OpenAI's structured outputs have matured, with strict JSON Schema enforcement enabled by a single flag.
- It pairs well with function calling and reduces glue code in extraction and integration pipelines.
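
A schema can be checked locally before it is sent with the strict flag. The sketch below assumes two commonly documented strict-mode constraints: every object sets `additionalProperties` to false and lists all of its properties under `required`. The `INVOICE_SCHEMA` example and the `strict_problems` helper are hypothetical, and the `FORMAT` wrapper shows only one plausible shape for the request field.

```python
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "lines": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"sku": {"type": "string"}, "qty": {"type": "integer"}},
                "required": ["sku", "qty"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["vendor", "total", "lines"],
    "additionalProperties": False,
}

# Assumed wrapper: the single "strict" flag turns enforcement on.
FORMAT = {"type": "json_schema", "name": "invoice", "strict": True, "schema": INVOICE_SCHEMA}


def strict_problems(schema: dict) -> list[str]:
    """List places where a schema breaks the assumed strict-mode rules."""
    problems: list[str] = []

    def walk(node, path: str) -> None:
        if not isinstance(node, dict):
            return
        if node.get("type") == "object":
            props = node.get("properties", {})
            if node.get("additionalProperties") is not False:
                problems.append(f"{path}: additionalProperties must be false")
            missing = sorted(set(props) - set(node.get("required", [])))
            if missing:
                problems.append(f"{path}: not listed in required: {missing}")
            for name, child in props.items():
                walk(child, f"{path}.{name}")
        elif node.get("type") == "array":
            walk(node.get("items"), f"{path}[]")

    walk(schema, "$")
    return problems
```

Running `strict_problems(INVOICE_SCHEMA)` returns an empty list, while a schema with an optional field or open-ended objects yields one message per violation. Catching these locally is part of how strict schemas reduce glue code: downstream parsers can rely on every field being present.
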
User Response and Ecosystem Impact
- User response on Reddit has been volatile since the GPT-5 rollout, with disappointment over tone changes, rate limits, and model removals.
- Tech press reported OpenAI restoring GPT-4o as an option.
- For teams shipping against the ChatGPT runtime, it is important to instrument user sentiment and plan for reversibility.
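
One way to combine sentiment instrumentation with reversibility is a rollout switch that falls back to the previous model when recent feedback turns negative. The sketch below is illustrative only: the class name, thresholds, and thumbs-up/down signal are all assumptions, and a production system would likely persist ratings and alert a human rather than switch automatically.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ModelRollout:
    primary: str = "gpt-5"
    fallback: str = "gpt-4o"
    window: int = 50                 # number of recent ratings to consider
    min_samples: int = 10            # do not react before this many ratings
    max_negative_rate: float = 0.4   # roll back above this share of negatives
    rolled_back: bool = False
    _ratings: deque = field(default_factory=deque, repr=False)

    def record(self, positive: bool) -> None:
        """Store one user rating and roll back if sentiment is too negative."""
        self._ratings.append(positive)
        while len(self._ratings) > self.window:
            self._ratings.popleft()
        if (len(self._ratings) >= self.min_samples
                and self.negative_rate() > self.max_negative_rate):
            self.rolled_back = True

    def negative_rate(self) -> float:
        """Share of negative ratings in the current window."""
        if not self._ratings:
            return 0.0
        return sum(1 for r in self._ratings if not r) / len(self._ratings)

    def active_model(self) -> str:
        """Model name the application should call right now."""
        return self.fallback if self.rolled_back else self.primary

    def restore(self) -> None:
        """Return to the primary model and start a fresh window."""
        self.rolled_back = False
        self._ratings.clear()
```

The application calls `active_model()` on each request and `record()` on each rating, so switching back is a state change rather than a redeploy.
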
Learning Resources
- Developers can refer to the GPT-5 system card and follow other OpenAI coverage on InfoQ.