How Airbnb Used Large Language Models to Accelerate Test Migration

  • Airbnb's Codebase Update with LLMs: Thanks to the right mix of workflow automation and large language models, Airbnb accelerated the process of updating their codebase to adopt React Testing Library (RTL) and converted nearly 3.5K React test files originally using Enzyme.

    • Success Factors in Code Generation: When applying LLMs to code generation, success depends on variables like the prompt and context, which may include source files, examples, etc.
    • Lesson Learned: Prompt engineering was less effective than retrying conversion multiple times until it worked (brute force approach).
  • Migration Process Steps: Airbnb broke the migration into discrete steps: refactoring Enzyme to RTL, fixing Jest test errors, running the linter, and running the TypeScript compiler. This step-based approach provided a solid foundation for the automation pipeline, making it possible to track each file's progress and rerun individual steps as needed.
  • Concurrent Migration Advantage: The step-based pipeline also made it possible to run the migration concurrently on hundreds of files at a time.
  • Retry Process at Each Step: If a validation step failed, the LLM was prompted with the errors and asked to fix them. This loop repeated until no errors remained or a maximum number of retries was reached.
  • Prompt Expansion: By the end of the migration, prompts had grown to between 40,000 and 100,000 tokens, pulling in as many as 50 related files as well as examples.
  • Effectiveness of Retry-Loop Approach: The retry loop proved effective for files of simple to medium complexity, migrating 75% of them in four hours with ten iterations, but it left a long tail of about 900 files.
  • Less Automated Approach for Long-Tail Files: For the long-tail files, Airbnb adopted a "sample, tune, sweep" strategy: analyzing each failed case, updating prompts and scripts, and rerunning. These files required 50 to 100 retries, which made the process slower. After four days, 97% of the files had been converted, and the remaining files, fewer than 100, were fixed manually.
  • Overall Impact: Leveraging LLMs condensed an estimated 1.5-year engineering project into six weeks while preserving original test intent and code coverage.
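
The step-based pipeline, the per-step retry loop, and the concurrent runs described above can be sketched roughly as follows. This is a minimal illustration, not Airbnb's actual implementation: every name here (`FileState`, `run_step`, `migrate_file`, `migrate_all`, the toy validators, and `toy_fix`) is hypothetical, and `toy_fix` merely stands in for the LLM call that would receive the file and its errors.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A validator returns a list of error messages; an empty list means the step passed.
Validator = Callable[[str], List[str]]
# A fixer stands in for the LLM call: given content and errors, return revised content.
Fixer = Callable[[str, List[str]], str]


@dataclass
class FileState:
    """Tracks one file's progress so that failed files can be rerun later."""
    path: str
    content: str
    completed_steps: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    attempts: int = 0  # total number of fix attempts across all steps


def run_step(state: FileState, name: str, validate: Validator,
             fix: Fixer, max_retries: int) -> bool:
    """Validate; on failure, ask the fixer to repair, up to max_retries times."""
    errors = validate(state.content)
    retries = 0
    while errors and retries < max_retries:
        state.content = fix(state.content, errors)
        state.attempts += 1
        retries += 1
        errors = validate(state.content)
    if errors:
        state.failed_step = name
        return False
    state.completed_steps.append(name)
    return True


def migrate_file(state: FileState, steps: List[Tuple[str, Validator]],
                 fix: Fixer, max_retries: int = 10) -> FileState:
    """Run the steps in order, stopping at the first step that cannot be fixed."""
    for name, validate in steps:
        if not run_step(state, name, validate, fix, max_retries):
            break
    return state


def migrate_all(files: List[FileState], steps: List[Tuple[str, Validator]],
                fix: Fixer, max_retries: int = 10, workers: int = 8) -> List[FileState]:
    """Process many files concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: migrate_file(s, steps, fix, max_retries), files))


# Toy validators (assumptions for illustration only).
def check_no_enzyme(content: str) -> List[str]:
    return ["enzyme import found"] if "enzyme" in content else []

def check_tests_pass(content: str) -> List[str]:
    return ["test failure"] if "FLAKY" in content else []

def check_trailing_newline(content: str) -> List[str]:
    return [] if content.endswith("\n") else ["missing trailing newline"]

def toy_fix(content: str, errors: List[str]) -> str:
    """Placeholder for the LLM: handles two known error kinds, ignores the rest."""
    for err in errors:
        if "enzyme" in err:
            content = content.replace("import { mount } from 'enzyme';",
                                      "import { render } from '@testing-library/react';")
        if "newline" in err:
            content += "\n"
    return content


STEPS = [("refactor", check_no_enzyme),
         ("jest", check_tests_pass),
         ("lint", check_trailing_newline)]

files = [
    FileState("a.test.tsx", "import { mount } from 'enzyme';"),
    FileState("b.test.tsx", "import { mount } from 'enzyme'; // FLAKY\n"),
]
results = migrate_all(files, STEPS, toy_fix, max_retries=3, workers=2)
for r in results:
    print(r.path, r.completed_steps, r.failed_step, r.attempts)
```

In this sketch, a file whose errors the fixer cannot resolve ends with `failed_step` set. That record is what a long-tail pass in the spirit of "sample, tune, sweep" would use to pick out failed files, adjust the prompt or scripts, and rerun with a higher retry limit.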