Konrad Hinsen's blog

  • Why computational reproducibility matters: After thirty years of dealing with computational reproducibility issues, much has improved, but many people have lost touch with the daily reality of the problem. Reproducibility is important, yet it is turning into a checklist item. The example of Alice and Bob shows the need for effective debugging when results fail to reproduce.
  • Two popular recipes for reproducibility: Docker merely packages code and its environment as a container image. That solves one issue but makes it impossible to explore replicability. conda is not fundamentally different from Debian's package manager, and it fares worse in practice because its users tend to focus on bleeding-edge code.
  • The fundamental reason for reproducibility issues: "Debian 12" is not a precise specification of a computational environment. Packages get updated, and installation order matters. A precisely specified environment would solve reproducibility issues.
  • How to provide a precisely specified computational environment: A container image can do it but does not allow exploring replicability. A precisely specified and reproducible environment is one you can run as-is to get the same results, and also change to see what difference the change makes. Bit-for-bit reproducibility is easy in theory but difficult in practice, because current infrastructure is not designed for it.
  • Solution: A computational environment can be made bit-for-bit reproducible with tools such as Debian snapshots or Guix, but these lack user-friendly interfaces, and more work is needed. Bit-for-bit reproducibility should be pushed into the infrastructure to solve the issue once and for all.
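
To make the point about "Debian 12" concrete, here is a minimal Python sketch of why a release name underspecifies an environment, and of what a bit-for-bit comparison amounts to. It is only an illustration under stated assumptions: the package names and versions are hypothetical, the helpers `environment_fingerprint` and `same_bits` are invented for this example, and the fingerprint covers version drift only, not installation order. Tools such as Guix track far more of the environment than this.

```python
import hashlib


def environment_fingerprint(packages):
    """Return a SHA-256 hex digest over sorted (name, version) pairs.

    `packages` maps package names to exact version strings.
    This is a hypothetical helper, not part of any real tool.
    """
    h = hashlib.sha256()
    for name, version in sorted(packages.items()):
        h.update(f"{name}={version}\n".encode("utf-8"))
    return h.hexdigest()


def same_bits(a: bytes, b: bytes) -> bool:
    """Bit-for-bit check: two outputs match only if their digests match."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


# Two hypothetical machines that both describe themselves as "Debian 12".
alice = {"libfoo": "1.2.3-1", "python3-bar": "0.9-2"}
bob = {"libfoo": "1.2.3-1", "python3-bar": "0.9-3"}  # one package was updated

# Same release name, different environments.
print(environment_fingerprint(alice) == environment_fingerprint(bob))  # False
```

A fingerprint like this only detects that two environments differ. Rebuilding the exact environment, and then changing it in a controlled way, is the part that needs infrastructure support.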