OpenAI's GPT-5 Debuts with Cost Commoditization and Heightened Scrutiny

  • August 7, 2025: GPT-5 Launch

    • GPT-5 was rolled out to ChatGPT users and the API with a router, new model sizes, and pricing for production use.
    • The product page advertised a 400K token context and 128K maximum output tokens.
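The advertised limits can be turned into a simple pre-flight check. The sketch below is illustrative only: the function names are hypothetical, and it assumes that input and output tokens share the 400K window, which the summary does not state explicitly.

```python
# Limits as advertised on the product page, per the summary above.
CONTEXT_WINDOW = 400_000     # total context tokens
MAX_OUTPUT_TOKENS = 128_000  # maximum output tokens


def fits_budget(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request fits the advertised limits.

    Assumption: prompt and completion tokens share the 400K window.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return input_tokens + requested_output_tokens <= CONTEXT_WINDOW


def max_input_tokens(requested_output_tokens: int) -> int:
    """Largest prompt that still leaves room for the requested output."""
    if requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    capped = min(requested_output_tokens, MAX_OUTPUT_TOKENS)
    return CONTEXT_WINDOW - capped
```

Under that assumption, reserving the full 128K for output leaves 272K tokens for the prompt.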
  • Visual Contradictions in Bar Charts

    • OpenAI presented bar charts to illustrate GPT-5's improvements on deception and benchmark metrics, but the bar heights visually contradicted the printed numbers.
    • For example, in the "coding deception" chart, the bar for GPT-5 was shorter than expected despite a corrected written figure.
  • API Updates and Features

    • The API surface consolidated around the Responses API, introduced in March 2025 and expanded in May 2025.
    • It enables "agentic" apps with multimodal prompting and built-in tools.
    • Updates include direct access to image generation, Code Interpreter, improved file search, and remote Model Context Protocol servers.
    • It also adds background mode, reasoning summaries, and encrypted reasoning items.
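To make the feature list above concrete, here is a rough sketch of request bodies that combine the built-in tools with background mode, reasoning summaries, and encrypted reasoning items. It only builds dictionaries and sends nothing. The field names follow my reading of the public Responses API documentation and should be verified against the current reference. The function name, server label, URL, and vector store ID are hypothetical placeholders. The sketch also assumes that background mode relies on stored responses while encrypted reasoning items are meant for stateless use, so it does not combine the two.

```python
import json


def build_responses_payload(prompt, mcp_server_url, vector_store_id, stateless=False):
    """Build a Responses API request body (hypothetical helper, not an SDK call)."""
    payload = {
        "model": "gpt-5",
        "input": prompt,
        "tools": [
            {"type": "image_generation"},
            {"type": "code_interpreter", "container": {"type": "auto"}},
            {"type": "file_search", "vector_store_ids": [vector_store_id]},
            {
                "type": "mcp",
                "server_label": "example_server",  # placeholder label
                "server_url": mcp_server_url,
            },
        ],
        # Ask for a summary of the model's reasoning alongside the answer.
        "reasoning": {"summary": "auto"},
    }
    if stateless:
        # Assumption: encrypted reasoning items are requested when the
        # response is not stored server-side.
        payload["store"] = False
        payload["include"] = ["reasoning.encrypted_content"]
    else:
        # Assumption: background mode runs asynchronously on a stored response.
        payload["background"] = True
    return payload


if __name__ == "__main__":
    body = build_responses_payload(
        "Summarize the attached report and chart the key metric.",
        "https://mcp.example.com/sse",  # placeholder URL
        "vs_example123",                # placeholder vector store ID
    )
    print(json.dumps(body, indent=2))
```

Keeping payload construction separate from the network call makes it easy to unit-test which features a request opts into.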
  • Model Performance on Different Tasks

    • For MLE-Bench and Kaggle-like GPU workloads, the ChatGPT agent scores highest with a 9% bronze pass rate.
    • On SWE-Lancer, the ChatGPT agent is also the best performer.
    • GPT-5's reasoning model is strong on code-centric debugging and replication, while the routed agent is better on long-horizon, multi-skill workloads.
  • Pricing and Network Effects

    • GPT-5 is priced at $1.25 per million input tokens and $10 per million output tokens, about half the input cost of GPT-4o.
    • Opening GPT-5 to everyone immediately locks in massive network effects, with new and existing users upgrading and spending more.
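The quoted prices translate directly into a per-request cost estimate. The helper below is a minimal sketch using only the two figures above. The function name is hypothetical, and it ignores any discounts such as cached-input or batch pricing, which the summary does not cover.

```python
# Prices quoted in the summary above, in USD per one million tokens.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # A 20,000-token prompt with a 2,000-token answer.
    print(f"${request_cost_usd(20_000, 2_000):.4f}")
```

Because output tokens cost eight times as much as input tokens at these rates, long completions dominate the bill even when prompts are large.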
  • Model Reliability Improvements

    • Against open-ended factuality sets, GPT-5 models show lower hallucination rates than OpenAI o3 and prior baselines.
    • METR's autonomy review concludes that GPT-5 is unlikely to pose catastrophic risks through AI R&D acceleration, rogue replication, or sabotage.
  • Structured Outputs Maturity

    • OpenAI's structured outputs have matured, with strict JSON Schema enforcement enabled by a single flag.
    • It pairs well with function calling and reduces glue code in extraction and integration pipelines.
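As an illustration of what strict enforcement asks of a schema, the sketch below defines an example schema, wraps it in the `response_format` shape used by Chat Completions with `"strict": True`, and checks two strict-mode rules as I understand them: every object must set `additionalProperties` to `false` and must list all of its properties under `required`. The checker `strict_violations` and the invoice schema are hypothetical, and the check is not a full validator.

```python
def strict_violations(schema, path="$"):
    """List places where a schema breaks the two strict-mode rules checked here."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        # Recurse into nested property schemas.
        for name, sub in props.items():
            problems.extend(strict_violations(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_violations(schema["items"], f"{path}[]"))
    return problems


# Hypothetical extraction schema for illustration.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["vendor", "total", "line_items"],
    "additionalProperties": False,
}

# The single "strict" flag sits next to the schema in the request.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "strict": True, "schema": INVOICE_SCHEMA},
}
```

Running a check like this in CI catches schemas that the API would reject before they reach production.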
  • User Response and Ecosystem Impact

    • User response on Reddit has been volatile since the GPT-5 rollout, with disappointment over tone changes, rate limits, and model removals.
    • Tech press reported OpenAI restoring GPT-4o as an option.
    • For teams shipping against the ChatGPT runtime, it's important to instrument user sentiment and plan reversibility.
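One way to act on the advice about sentiment and reversibility is a small switch that tracks recent user feedback and falls back to a previous model when negative feedback crosses a threshold. This is a sketch under stated assumptions: the class name, thresholds, and the thumbs-up/down signal are all hypothetical, and the model names are placeholders.

```python
from collections import deque


class ModelSwitch:
    """Pick a model based on a rolling window of user feedback."""

    def __init__(self, primary, fallback, window=200,
                 max_negative_rate=0.4, min_samples=50):
        self.primary = primary
        self.fallback = fallback
        self.max_negative_rate = max_negative_rate
        self.min_samples = min_samples
        # Only the most recent `window` votes are kept.
        self._votes = deque(maxlen=window)

    def record(self, positive: bool) -> None:
        """Store one thumbs-up (True) or thumbs-down (False)."""
        self._votes.append(bool(positive))

    def negative_rate(self) -> float:
        if not self._votes:
            return 0.0
        return self._votes.count(False) / len(self._votes)

    def active_model(self) -> str:
        """Return the fallback only when enough recent feedback is negative."""
        enough = len(self._votes) >= self.min_samples
        if enough and self.negative_rate() > self.max_negative_rate:
            return self.fallback
        return self.primary


if __name__ == "__main__":
    switch = ModelSwitch("gpt-5", "gpt-4o", window=10, min_samples=4)
    for vote in (False, False, True, False):
        switch.record(vote)
    print(switch.active_model(), switch.negative_rate())
```

Because the window is rolling, the switch returns to the primary model on its own once recent feedback improves, which keeps the rollback reversible.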
  • Learning Resources

    • Developers can refer to the GPT-5 system card and follow other OpenAI coverage on InfoQ.