圆周率 - 5 万亿位

  • 5 Trillion Digits of Pi Record: In 2010, Shigeru Kondo and Alexander J. Yee computed 5 trillion digits of Pi. The main computation took 90 days on Kondo's desktop with specific hardware specs. One error was detected and corrected via software ECC.

    • Hardware: 2 x Intel Xeon X5680 @ 3.33 GHz (12 physical cores, 24 hyperthreaded), 96 GB DDR3 @ 1066 MHz, Asus Z8PE-D12 motherboard, 1 TB SATA II (boot drive), 3 x 2 TB SATA II (store Pi output), 16 x 2 TB SATA II (computation), 2 x LSI MegaRaid SAS 9260-8i raid controller, Windows Server 2008 R2 Enterprise x64.
    • Software: y-cruncher v0.5.4.9138 Alpha (PB) - x64 SSE4.1 ~ Ushio. For verification, y-cruncher BBP v1.0.119 was used.
  • Computation Details:

    • Steps: Series Summation (73 days), Square Root (63.3 hours), Final Multiply (41.6 hours), Base Conversion (8.2 days), Verify Base Conversion (45.6 hours).
    • Verification: Hexadecimal digits verified by direct computation and Modular Hash Check. Decimal digits verified using specific formulas.
  • Arithmetic Algorithms:

    • Multiplication: Uses Basecase, Karatsuba, Floating-Point Fast Fourier Transform (FFT), and Hybridized Number-Theoretic Transforms (Hybrid NTT) algorithms. FFT is highly optimized with SSE2, SSE3, and SSE4.1 instructions. Hybrid NTT is multi-threaded and uses both integer and floating-point instructions.
    • Division and Square Roots: Use straight forward First-Order Newton's Method with no particular optimizations.
    • Radix Conversion: Uses Scaled Remainder Trees.
  • Scalability:

    • Multi-Threading: Uses Binary Splitting and FFT-based Multiplication with Binary Recursive Thread-Spawning for parallelism.
    • Multiple Hard Drive Support: Supports software RAID 0 with 16 hard drives for increased disk bandwidth.
    • Minimizing Bandwidth Consumption: Uses various techniques like in-place additions, minimizing Modular Hash Checks, and using Hybrid NTT instead of FFT for large operations.
    • Thread-Spamming and Load-Balancing: Uses Size-Balancing and Thread-Spamming to handle load-balancing issues in Binary Splitting.
  • Error-Detection and Correction: Uses a simple "Automatic Repeat Request" model with Modular Hash Check for redundancy. Redundancy checks are added at various places like Binary Splitting, Radix Conversion, and Multiplication.
  • Checkpointing and Restartability: Has a checkpointing feature to handle power outages and other issues. Can restart from checkpoints.
  • Opportunities for Improvement:

    • Sub-optimal Cache Use: Written for small computations, has problems with memory-thrashing for large computations.
    • Instruction Set Improvements: Does not fully utilize SSE instructions.
    • Mathematical Improvements: Does not use Middle Product, GCD Factorization, and Reuse Redundant Transforms for FFT-based Multiplication optimizations.
    • Algorithmic Improvements: Considers Classic Number-Theoretic Transform, Carry-Delay, and Hybridized Number-Theoretic Transform (version 2) but has not implemented them.
  • Detailed Timeline: Computation started on May 4, 2010, with various checkpoints and progress updates. Final verification completed on August 2, 2010.
  • Frequently Asked Questions: Answers various questions related to the computation, software, and author.
  • Special Thanks: Thanks Kei Simmel for translation and Shigeru Kondo for help.
  • About Me: 22-year-old senior-undergraduate student from Northwestern University, double major in Computer Science and Electrical Engineering, with hobbies like Bowling, Piano, Video Games, and subtitled Japanese Anime. Can be contacted via email.
阅读 17
0 条评论