圆周率 - 5 万亿位

  • 5 Trillion Digits of Pi Record: In 2010, Shigeru Kondo and Alexander J. Yee computed 5 trillion digits of Pi. The main computation took 90 days on Kondo's desktop with specific hardware specs. One error was detected and corrected via software ECC.

    • Hardware: 2 x Intel Xeon X5680 @ 3.33 GHz (12 physical cores, 24 hyperthreaded), 96 GB DDR3 @ 1066 MHz, Asus Z8PE-D12 motherboard, 1 TB SATA II (boot drive), 3 x 2 TB SATA II (store Pi output), 16 x 2 TB SATA II (computation), 2 x LSI MegaRaid SAS 9260-8i raid controller, Windows Server 2008 R2 Enterprise x64.
    • Software: y-cruncher v0.5.4.9138 Alpha (PB) - x64 SSE4.1 ~ Ushio. For verification, y-cruncher BBP v1.0.119 was used.
  • Computation Details:

    • Steps: Series Summation (73 days), Square Root (63.3 hours), Final Multiply (41.6 hours), Base Conversion (8.2 days), Verify Base Conversion (45.6 hours).
    • Verification: Hexadecimal digits verified by direct computation and Modular Hash Check. Decimal digits verified using specific formulas.
  • Arithmetic Algorithms:

    • Multiplication: Uses Basecase, Karatsuba, Floating-Point Fast Fourier Transform (FFT), and Hybridized Number-Theoretic Transforms (Hybrid NTT) algorithms. FFT is highly optimized with SSE2, SSE3, and SSE4.1 instructions. Hybrid NTT is multi-threaded and uses both integer and floating-point instructions.
    • Division and Square Roots: Use straight forward First-Order Newton's Method with no particular optimizations.
    • Radix Conversion: Uses Scaled Remainder Trees.
  • Scalability:

    • Multi-Threading: Uses Binary Splitting and FFT-based Multiplication with Binary Recursive Thread-Spawning for parallelism.
    • Multiple Hard Drive Support: Supports software RAID 0 with 16 hard drives for increased disk bandwidth.
    • Minimizing Bandwidth Consumption: Uses various techniques like in-place additions, minimizing Modular Hash Checks, and using Hybrid NTT instead of FFT for large operations.
    • Thread-Spamming and Load-Balancing: Uses Size-Balancing and Thread-Spamming to handle load-balancing issues in Binary Splitting.
  • Error-Detection and Correction: Uses a simple "Automatic Repeat Request" model with Modular Hash Check for redundancy. Redundancy checks are added at various places like Binary Splitting, Radix Conversion, and Multiplication.
  • Checkpointing and Restartability: Has a checkpointing feature to handle power outages and other issues. Can restart from checkpoints.
  • Opportunities for Improvement:

    • Sub-optimal Cache Use: Written for small computations, has problems with memory-thrashing for large computations.
    • Instruction Set Improvements: Does not fully utilize SSE instructions.
    • Mathematical Improvements: Does not use Middle Product, GCD Factorization, and Reuse Redundant Transforms for FFT-based Multiplication optimizations.
    • Algorithmic Improvements: Considers Classic Number-Theoretic Transform, Carry-Delay, and Hybridized Number-Theoretic Transform (version 2) but has not implemented them.
  • Detailed Timeline: Computation started on May 4, 2010, with various checkpoints and progress updates. Final verification completed on August 2, 2010.
  • Frequently Asked Questions: Answers various questions related to the computation, software, and author.
  • Special Thanks: Thanks Kei Simmel for translation and Shigeru Kondo for help.
  • About Me: 22-year-old senior-undergraduate student from Northwestern University, double major in Computer Science and Electrical Engineering, with hobbies like Bowling, Piano, Video Games, and subtitled Japanese Anime. Can be contacted via email.
阅读 6
0 条评论