DeepSeek FAQ

  • Date and Topic: Monday, January 27. Discussion about DeepSeek and its models.
  • Previous Post: Wrote about R1 last Tuesday (https://stratechery.com/2025/...)
  • Huawei Mate 60 Pro and Chip Ban: In September 2023, Huawei announced the Mate 60 Pro with a SMIC-manufactured 7nm chip. Washington D.C.'s reaction was overwrought and led to an expansion of the chip ban; the surprise stemmed from a widespread lack of understanding of how chips are actually produced.
  • DeepSeek's Models:

    • R1: A reasoning model comparable to OpenAI's o1, able to think through problems and produce high-quality results. R1-Zero is the more significant model: it was developed through pure reinforcement learning, without human feedback, and reasoning behaviors and "Aha moments" emerged on their own. R1 itself was then trained with a small amount of cold-start data and a multi-stage training pipeline.
    • V3: Introduced two important breakthroughs: DeepSeekMoE and DeepSeekMLA. DeepSeekMoE splits the model into experts and optimizes load-balancing during training; DeepSeekMLA compresses the key-value cache used during inference. V3 was cheap to train: 2.788 million H800 GPU hours at $2/GPU-hour, roughly $5.6 million.
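The headline training-cost figure is simple arithmetic and can be sanity-checked directly. The numbers come from the summary above; the variable names are mine:

```python
# Sanity check of the V3 training-cost claim:
# 2.788 million H800 GPU hours at $2 per GPU hour.
gpu_hours = 2_788_000       # 2,788 thousand GPU hours
cost_per_gpu_hour = 2.00    # USD, as stated above

total_cost_usd = gpu_hours * cost_per_gpu_hour
print(f"${total_cost_usd:,.0f}")  # → $5,576,000
```

Note this covers only the final training run, not research, ablations, or hardware capital costs.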
  • Model Training and Distillation: Distillation is a means of extracting understanding from another model, typically by training on its outputs. It is widely used in model training, and it is likely that DeepSeek distilled from models such as OpenAI's GPT-4o. This is a concern for leading-edge models, since it lets others free-ride on the original training investment.
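To make the distillation idea concrete, here is a minimal sketch of the standard soft-target technique: the student is trained to match the teacher's output distribution by minimizing a KL divergence over temperature-softened probabilities. This is the generic textbook formulation, not DeepSeek's or OpenAI's actual pipeline; the temperature and logit values are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature (>1 flattens them)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student): minimizing this trains the student to
    mimic the teacher's full output distribution, not just its top answer."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, [2.0, 1.0, 0.1]))       # → 0.0 (student matches teacher)
print(distillation_loss(teacher, [0.1, 1.0, 2.0]) > 0)   # → True (mismatch gives positive loss)
```

The soft targets are the point: a teacher's near-miss probabilities carry far more signal per example than hard labels, which is why distillation is such an efficient way to transfer capability.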
  • Impact on Big Tech: In the long run, model commoditization and cheaper inference are great for Big Tech. Microsoft benefits from providing cheaper inference, Amazon can serve high-quality open-source models at lower cost, and Apple's hardware is well-suited for inference. Meta is the biggest winner, as cheaper inference makes its AI vision more achievable. Google may be in worse shape, since decreased hardware requirements lessen its relative advantage.
  • Nvidia's Situation: Nvidia has two moats: CUDA and its ability to link multiple chips together. DeepSeek's efficiency shows there is an alternative to simply paying Nvidia ever more. Still, factors remain in Nvidia's favor: DeepSeek's approach could be even more potent on H100s and GB100s, and lower inference costs should increase overall usage. In the near term, though, DeepSeek's efficiency casts doubt on Nvidia's growth story.
  • Chip Ban and Its Implications: The chip ban's importance has only been accentuated by the U.S.'s loss of its software lead. Earlier iterations of the ban may well have spurred DeepSeek's efficiency innovations, and the ban's primary outcome to date may be the crash in Nvidia's stock price. The mindset behind the ban is concerning: it focuses on denying innovation to others rather than out-competing them.
  • OpenAI and Its Actions: OpenAI's closed source approach and lobbying for regulations have been criticized. The 2023 Biden Executive Order on AI is seen as an attempt to control AI. OpenAI's gambit for control has failed, and more innovation could have been achieved with open weights.
  • China and Open Source: DeepSeek is open-sourcing its models, which attracts talent and reflects a broader cultural posture. This contrasts with most U.S. companies' focus on differentiated products. China benefits from access to DeepSeek and may see a further unleashing of innovation.
  • Overall Impact: DeepSeek has provided a gift to nearly everyone. Consumers and businesses will benefit from free AI products and services. Big consumer tech companies are winners, and China is also a big winner in the long run. America has a choice between doubling down on defensive measures or competing.