《SQL 查询优化之艺术》

  • SQL Overview: SQL is declarative, specifying query results only. The DBMS determines exact steps. Multiple ways exist to calculate query results, like using an index or sequential scan. The query optimizer finds the most efficient plan.
  • Plan Explorer: A tool that iterates over a two-dimensional search space and executes SQL queries for each parameter combination. It generates visualizations showing plan changes and tuple counts. It's based on Picasso's ideas and implemented as a modern web app.

    • Server Mode: Initially a standalone website using PGlite in the browser. Added an optional server mode to send queries to a REST endpoint on a web server, which forwards them to a PostgreSQL server. The architecture involves a browser with a WebUI and PGlite, and a proxy server in server mode communicating with a PostgreSQL database. In server mode, queries can be executed with or without analysis.
  • Generated Drawings: The tool creates drawings including used query plans, expected costs, actual execution time, estimated and actual number of result tuples, and the difference between them.
  • Example Query Discussion: For a self-join query on the data table with specific filter conditions and a search space, the query plan explorer reveals five different query plans used by PostgreSQL. Two of these plans are discussed in detail.

    • Query Plan 1: Has a fingerprint of Hash Join > Seq Scan(d1) > Hash > Seq Scan(d2). It uses a hash join with a sequential scan on one side and a hash operator reading from the data table on the other. PostgreSQL changes the join type from LEFT JOIN to INNER JOIN as no join partners with NULL key attributes would pass the filter.
    • Query Plan 2: Similar to the first plan but uses an index scan for the input of the hash operation instead of a sequential scan.
  • Expected and Actual Values: Drawings for expected total cost, actual time, expected and actual tuples, and the difference between them are shown. PostgreSQL's predictions can be incorrect, and using extended statistics might help.
  • Conclusion: The plan explorer is open source at GitHub. It gives insights into PostgreSQL's query optimizer decisions and mispredictions. The new server mode allows analysis of large datasets and custom PostgreSQL extensions.
阅读 11
0 条评论