查询计划多久是最优的？

发布于 2025-08-03

Query Optimizer Basic Promise: Picks the “optimal” query plan based on cost estimates from selectivity and basic resource costs, but often makes mistakes.
Uniform Data Set: For selectivities ~1 - 5%, planner picks index scan which is a poor choice as simple sequential scan is faster. Index scans never win for this query with bitmap scans dominating. The reason is bitmap scans perform prefetching while index scans don't.
Cyclic (fuzz = 1) Data Set: A bit better than uniform data set but still makes mistakes like temporarily flipping to index scan and early seqscan.
Linear (fuzz = 10) Data Set: Not great, planner is more persistent in picking index scan even for low selectivities. Bitmap scan would be a better choice and only at very high selectivities (~90%) does the plan correctly flip to seqscan.
Linear (no fuzz) Data Set: The only case where consistently picks the right (fastest) plan. Little difference between index and bitmap scans and data set is good for kernel readahead.
Summary: Results from Ryzen 9900X and NVMe RAID. Runs with cold cache show differences between plans. With warm cache some differences disappear. Planner picks based on stats and cost model which may not be the best. Adjusting cost parameters may not work for all data sets. Complex queries have more complicated behavior. Improving planning involves better stats and cost model but always a lossy compression. Cost model is a rough approximation. Cost-based planning is the best approach but emphasizes plan robustness and the need to adjust if not right. Feedback can be sent to [mailto:tomas@vondra.me].

阅读 50