圆周率 - 5 万亿位 - SegmentFault 思否

5 Trillion Digits of Pi Record: In 2010, Shigeru Kondo and Alexander J. Yee computed 5 trillion digits of Pi. The main computation took 90 days on Kondo's desktop with specific hardware specs. One error was detected and corrected via software ECC.
- Hardware: 2 x Intel Xeon X5680 @ 3.33 GHz (12 physical cores, 24 hyperthreaded), 96 GB DDR3 @ 1066 MHz, Asus Z8PE-D12 motherboard, 1 TB SATA II (boot drive), 3 x 2 TB SATA II (store Pi output), 16 x 2 TB SATA II (computation), 2 x LSI MegaRaid SAS 9260-8i raid controller, Windows Server 2008 R2 Enterprise x64.
- Software: y-cruncher v0.5.4.9138 Alpha (PB) - x64 SSE4.1 ~ Ushio. For verification, y-cruncher BBP v1.0.119 was used.
Computation Details:
- Steps: Series Summation (73 days), Square Root (63.3 hours), Final Multiply (41.6 hours), Base Conversion (8.2 days), Verify Base Conversion (45.6 hours).
- Verification: Hexadecimal digits verified by direct computation and Modular Hash Check. Decimal digits verified using specific formulas.
Arithmetic Algorithms:
- Multiplication: Uses Basecase, Karatsuba, Floating-Point Fast Fourier Transform (FFT), and Hybridized Number-Theoretic Transforms (Hybrid NTT) algorithms. FFT is highly optimized with SSE2, SSE3, and SSE4.1 instructions. Hybrid NTT is multi-threaded and uses both integer and floating-point instructions.
- Division and Square Roots: Use straight forward First-Order Newton's Method with no particular optimizations.
- Radix Conversion: Uses Scaled Remainder Trees.
Scalability:
- Multi-Threading: Uses Binary Splitting and FFT-based Multiplication with Binary Recursive Thread-Spawning for parallelism.
- Multiple Hard Drive Support: Supports software RAID 0 with 16 hard drives for increased disk bandwidth.
- Minimizing Bandwidth Consumption: Uses various techniques like in-place additions, minimizing Modular Hash Checks, and using Hybrid NTT instead of FFT for large operations.
- Thread-Spamming and Load-Balancing: Uses Size-Balancing and Thread-Spamming to handle load-balancing issues in Binary Splitting.
Error-Detection and Correction: Uses a simple "Automatic Repeat Request" model with Modular Hash Check for redundancy. Redundancy checks are added at various places like Binary Splitting, Radix Conversion, and Multiplication.
Checkpointing and Restartability: Has a checkpointing feature to handle power outages and other issues. Can restart from checkpoints.
Opportunities for Improvement:
- Sub-optimal Cache Use: Written for small computations, has problems with memory-thrashing for large computations.
- Instruction Set Improvements: Does not fully utilize SSE instructions.
- Mathematical Improvements: Does not use Middle Product, GCD Factorization, and Reuse Redundant Transforms for FFT-based Multiplication optimizations.
- Algorithmic Improvements: Considers Classic Number-Theoretic Transform, Carry-Delay, and Hybridized Number-Theoretic Transform (version 2) but has not implemented them.
Detailed Timeline: Computation started on May 4, 2010, with various checkpoints and progress updates. Final verification completed on August 2, 2010.
Frequently Asked Questions: Answers various questions related to the computation, software, and author.
Special Thanks: Thanks Kei Simmel for translation and Shigeru Kondo for help.
About Me: 22-year-old senior-undergraduate student from Northwestern University, double major in Computer Science and Electrical Engineering, with hobbies like Bowling, Piano, Video Games, and subtitled Japanese Anime. Can be contacted via email.