正在加载世界!在 4 小时内导入 OpenStreetMap | Crunchy Data 博客

  • OpenStreetMap (OSM) Database: Builds almost 750GB of location data from a single file download and takes a full day to run. A fresh load involves a massive write process and large index builds, and is a great performance stress-test for any Postgres system.
  • Tune Your Instrument: Use bare metal hardware with specific Postgres settings for loading, such as max_wal_size = 256GB, shared_buffers = 48GB, effective_cache_size = 64GB, maintenance_work_mem = 20GB, work_mem = 1GB, and prioritize bulk load with settings like checkpoint_timeout = 60min, synchronous_commit = off, wal_level = minimal, max_wal_senders = 0, fsync = off, autovacuum = off, and full_page_writes = off.
  • How PostgreSQL has Improved: The biggest speed gains are from improvements in the GIST index building code in PostgreSQL 15, which pre-sorts index pages before merging and can speed up performance significantly. Tests showed a 16% speedup, 15% size reduction, and 86% GIST index build speedup from PostgreSQL 14 to 15. There have been further improvements in PostgreSQL 16 and 17 in B-Tree index building, but the osm2pgsql benchmark doesn't show them as much due to the GIST index time build times.
  • How osm2pgsql has improved: In Q3 2022, osm2pgsql 1.7 made the [Middle Way Node Index ID Shift] the new default, which compresses the database's largest index and trades off lookup and update performance for a smaller footprint. This improvement dropped loading times by 37% and reduced the database size from 1000GB to under 650GB.
  • How hardware has improved: During data import, the osm2pgsql workload writes heavily at medium queue depths. The best results come from SSDs with oversized SLC caches. Increasing the --number-processes parameter in osm2pgsql can improve performance, but be careful not to go too far and run out of memory. Processor advances have seen an Intel i7-13600K overtaking the AMD R5 7700X in 2023 and an i9-14900K in 2024, but the i9-14900K had voltage bugs. AMD's 9600X was able to complete the test.
  • My best OSM import results to date: Completed in less than 4 hours with PostgreSQL 17, which is about 3% better than PostgreSQL 16 when replication is used. Looking forward to following up on this benchmark after the Intel system is fixed.
阅读 13
0 条评论