Less is more: PostgreSQL queries from minutes to milliseconds

English version published on Flexport Engineering Blog

Introduction

Using more indexes does not necessarily improve database performance. The principle of "less is more" also applies to the database world. This article will describe our experience in speeding up a PostgreSQL query thousands of times, from minutes to milliseconds, by using fewer indexes.

question

The Flexport platform enables Flexport customers, operators and other users to communicate by messaging. The data for this function is stored in a message table on AWS Aurora PostgreSQL that scales to tens of millions of rows.

The main query on the message table is usually extremely fast, but recently it has experienced some intermittent slow query timeouts. The slow query not only affects the user experience of the message function, but also increases the load of the entire system and slows down the user experience of other functions.

The query looks like this:

 SELECT messages.* FROM messages
 WHERE messages.deleted_at IS NULL AND messages.namespace = ?
 AND (
       jsonb_extract_path_text(context, 'topic') IN (?, ?)
       OR jsonb_extract_path_text(context, 'topic') LIKE ?
     )
 AND ( context @> '{"involved_parties":[{"id":1,"type":1}]}'::jsonb )
ORDER BY messages.created_at ASC

It uses some PostgreSQL-specific syntax. So what is this query doing?

That @> symbol is PostgreSQL's " include " operator. Please refer to the official documentation. In short, it is a subset operation that supports multiple levels to determine whether the left side "contains" the right side (for example, whether a large object contains a small object). This operator can be used on some structured types, such as arrays, ranges, or JSONB (the JSON binary representation in PostgreSQL).

The message table has an integer id column, a JSONB context column, and other columns. The JSONB object in the context column contains descriptive properties for a message, for example, the topic property describes the subject of the message, and the involved_parties property describes the list of legal entities that have permission to view the message.

There are two indexes on the context column:

GIN index on context column
BTREE expression index on jsonb_extract_path_text(context, 'topic') expression

GIN is an indexing engine provided by PostgreSQL for complex values, generally used for data structures such as arrays, JSON or text. GIN is designed to index data whose internal structure can be subdivided so that subdata within the data can be found. BTREE is PostgreSQL's default indexing engine and can perform equality comparisons or range queries on simple values. Expression indexes are a powerful index type provided by PostgreSQL that can index an expression (rather than a column). The JSONB type can only use an indexing engine such as GIN, because BTREE only supports scalar types (which can be understood as "simple value types without internal structure"). Therefore, the jsonb_extract_path_text(context, 'topic') expression on the context column can be indexed with BTREE because it returns a string type. Unlike the uniform and consistent representation format of BTREE indexes, the contents of GIN indexes can vary widely depending on the data types and operator types used . And considering the high diversity of query parameters, GIN indexes are more suitable for some specific queries, unlike BTREE indexes, which are widely used for equality comparison and range queries.

Initial investigation

A query usually performs an index scan to filter first, and then performs a table scan on the filtered range (a special case is that when the index scan is sufficient to cover all the required data columns, a table scan is not required). In order to maximize performance, the index should have a good selectivity to narrow the scope to reduce or even avoid subsequent table scans. Condition context @> '{"involved_parties":[{"id":1,"type":1}]}'::jsonb can use GIN index on context column, but this is not a good choice because {" id":1,"type":1} This value is a special value that exists in most rows (this number is very special, like the administrator's number). Therefore, GIN indexes are poorly selective for this condition. In fact, the other conditions in this query already provide a good degree of selectivity, so there is never any need to use an index for this condition. Now we need to know whether this query actually uses a GIN index on this condition.

We can get the query plan by executing a statement of the form "EXPLAIN ANALYZE {the query statement}" in the SQL console. This statement actually executes the query and returns the selected query plan (with the actual time spent on it).

However, we only get a fast query plan, which does not use this GIN index. The investigation ran into trouble.

Dive into technical details

First, read this fast query plan through a visual representation.

Quick query plan

This link is highlighted for better reading

 QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=667.75..667.76 rows=3 width=911) (actual time=0.093..0.094 rows=7 loops=1)
   Sort Key: created_at
   Sort Method: quicksort  Memory: 35kB
   ->  Bitmap Heap Scan on messages  (cost=14.93..667.73 rows=3 width=911) (actual time=0.054..0.077 rows=7 loops=1)
         Recheck Cond: ((jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) = ANY ('{?,?}'::text[])) OR (jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) ~~ '?%'::text))
         Filter: ((deleted_at IS NULL) AND (context @> '{"involved_parties": [{"id": 1, "type": 1}]}'::jsonb) AND ((namespace)::text = '?'::text) AND ((jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) = ANY ('{?,?}'::text[])) OR (jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) ~~ '?%'::text)))
         Heap Blocks: exact=7
         ->  BitmapOr  (cost=14.93..14.93 rows=163 width=0) (actual time=0.037..0.037 rows=0 loops=1)
               ->  Bitmap Index Scan on index_messages_on_topic_key_string  (cost=0.00..10.36 rows=163 width=0) (actual time=0.029..0.029 rows=4 loops=1)
                     Index Cond: (jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) = ANY ('{?,?}'::text[]))
               ->  Bitmap Index Scan on index_messages_on_topic_key_string  (cost=0.00..4.57 rows=1 width=0) (actual time=0.007..0.007 rows=7 loops=1)
                     Index Cond: ((jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) ~>=~ '?'::text) AND (jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) ~<~ '?'::text))
 Planning time: 0.313 ms
 Execution time: 0.138 ms

Visual representation of fast query plans

As shown in the figure below, the query plan is like a call stack. The query is expanded top-down into a tree, which is then evaluated bottom-up. For example, two leaf nodes (Bitmap Index Scan) are evaluated first, the result is merged by the parent node (BitmapOr), then returned to the ancestor node (Bitmap Heap Scan), and so on. Note that index 1 and index 2 in the leaf node are actually the same BTREE expression index, but are used for two scans with different conditions.

A PostgreSQL query can scan multiple indexes at the same time and combine the results . The slowest index determines the overall performance of the index scan. This is usually more efficient than a single index scan, because multiple indexes typically filter the dataset to a greater extent, resulting in fewer disk reads (further scans of the filtered dataset, if the data is not in the memory cache, read from disk). There is a step called Bitmap Heap Scan, which combines the results of index scans, so you can see the words "BItmap Heap Scan" in the query plan.

Further investigation

Based on the available information, the main focus is on these points:

It's unlikely to be due to resource contention, as contention should affect all SQL statements equally, not just SELECT statements, but only slow SELECTs are observed?
It may be that the same query occasionally chooses a different plan even with the same parameters .
Could also be a GIN related issue. Gitlab has had issues with GIN .

One conjecture is that "data and statistics updates cause the query planner to occasionally choose a slow query plan that uses a GIN index".

Conjecture or not, for this situation, the best course of action is to improve observability. Missing some data? OK, collect it! We use some code to log query execution time and only grab slow query plans when the query is really slow. Then it turns out that the cause is indeed an unstable query plan: the query planner usually chooses a fast query plan, but occasionally a slow query plan.

slow query plan

This GitHub link is highlighted for better reading

 QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=540.08..540.09 rows=3 width=915)
   Sort Key: created_at
   ->  Bitmap Heap Scan on messages  (cost=536.03..540.06 rows=3 width=915)
         Recheck Cond: (((jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) = ANY ('{?,?}'::text[])) OR (jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) ~~ '?%'::text)) AND (context @> '{"involved_parties": [{"id": 1, "type": 1}]}'::jsonb))
         Filter: ((deleted_at IS NULL) AND ((namespace)::text = '?'::text) AND ((jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) = ANY ('{?,?}'::text[])) OR (jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) ~~ '?%'::text)))
         ->  BitmapAnd  (cost=536.03..536.03 rows=1 width=0)
               ->  BitmapOr  (cost=20.13..20.13 rows=249 width=0)
                     ->  Bitmap Index Scan on index_messages_on_topic_key_string  (cost=0.00..15.55 rows=249 width=0)
                           Index Cond: (jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) = ANY ('{?,?}'::text[]))
                     ->  Bitmap Index Scan on index_messages_on_topic_key_string  (cost=0.00..4.57 rows=1 width=0)
                           Index Cond: ((jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) ~>=~ '?'::text) AND (jsonb_extract_path_text(context, VARIADIC '{topic}'::text[]) ~<~ '?'::text))
               ->  Bitmap Index Scan on index_messages_on_context  (cost=0.00..515.65 rows=29820 width=0)
                     Index Cond: (context @> '{"involved_parties": [{"id": 1, "type": 1}]}'::jsonb)

(This query plan is from EXPLAIN, due to EXPLAIN ANALYZE timeout)

Visual representation of slow query plans

As can be seen from the figure below, this slow query plan is more complex than the previous fast query plan. It has an additional "BitmapAnd" and a "Bitmap Index Scan" node that scans index 3 (index 3 is the GIN index on the context column). If index 3 is inefficient, the overall performance will decrease.

The query planner works well when the cost estimates are accurate. But cost estimates for GIN indexes on JSONB are inaccurate. From the observations, it considers this index to have a selectivity of 0.001 (which is a hard-coded fixed value), which means it assumes that any relevant query will select 0.1% of all rows in the table, but in our scenario it actually will select 90% of the rows, so this assumption does not hold. Incorrect assumptions cause the query planner to underestimate the cost of slow query plans. Although the JSONB type column also has some statistics, it seems to have no effect.

solution

We cannot drop this GIN index because other queries depend on it. However, there is a simple solution to avoid using the GIN index for this query. In general, if the left-hand side of a conditional operator is not a simple column name but an expression, then no index is used unless there is a corresponding expression index. For example, if there is an index on the id column, the condition "id = ?" will use the index, but the condition "id+0 = ?" will not use the index. In our scenario, the left side of the @> operator is the context column, so it can be changed to the path access expression "context->'involved_parties'" (parameter values are updated accordingly).

The original query condition is
context @> '{"involved_parties":[{"id":1,"type":1}]}'::jsonb

The modified query condition is
context->'involved_parties' @> '[{"id":1,"type":1}]'::jsonb

experiment

We can test this problematic condition with a simple example. This experiment stably reproduces the different effects of the two conditions, giving us the confidence to bring this optimization online.

Design two simple query statements:

Statement 1 before optimization

 SELECT * FROM messages
WHERE context @> ‘{“involved_parties”:[{“id”:1,”type”:1}]}’::jsonb LIMIT 100

Optimized Statement 2

 SELECT * FROM messages
WHERE context->”involved_parties” @> ‘[{“id”:1,”type”:1}]’::jsonb LIMIT 100

Because statement 2 only does a table scan and not an index scan, use LIMIT 100 to ensure that it stops when it finds 100 rows that satisfy the condition without scanning the entire table. In statement 1, the GIN index scan is always an unordered bitmap index scan, which must scan the entire index and cannot take advantage of LIMIT 100. Note that a LIMIT value like 10 will be below a certain threshold to only use table scans, and only above this threshold will index scans be enabled, so use a larger LIMIT value like 100 (this threshold depends on cost estimates, according to We test maybe 20).

The resulting query plan is:

Statement 1 before optimization

 QUERY PLAN
------------------------------------------------------------------------------
Limit  (cost=2027.11..2399.74 rows=100 width=915) (actual time=6489.987..6490.102 rows=100 loops=1)
 ->  Bitmap Heap Scan on messages  (cost=2027.11..113145.85 rows=29820 width=915) (actual time=6489.986..6490.093 rows=100 loops=1)
       Recheck Cond: (context @> '{"involved_parties": [{"id": 1, "type": 1}]}'::jsonb)
       Heap Blocks: lossy=10
       ->  Bitmap Index Scan on index_messages_on_context  (cost=0.00..2019.65 rows=29820 width=0) (actual time=6477.838..6477.839 rows="millions(confidential number)" loops=1)
             Index Cond: (context @> '{"involved_parties": [{"id": 1, "type": 1}]}'::jsonb)
 Planning time: 0.076 ms
 Execution time: 6490.920 ms

Optimized Statement 2

 QUERY PLAN
------------------------------------------------------------------------------
Limit  (cost=0.00..13700.25 rows=100 width=915) (actual time=0.013..0.114 rows=100 loops=1)
  ->  Seq Scan on messages  (cost=0.00..4085414.08 rows=29820 width=915) (actual time=0.013..0.106 rows=100 loops=1)
        Filter: ((context -> 'involved_parties'::text) @> '[{"id": 1, "type": 1}]'::jsonb)
Planning time: 0.058 ms
Execution time: 0.135 ms

For this case, the optimized query is 50,000 times faster.

Real improvement in production environment

The chart below shows the dramatic improvement after going live in production.

About the same number of API requests per day (5~10k requests per 6 hour time window)

Significant reduction in errors since Dec 19

Significant reduction in delays since December 19 (Dec 19)

some useful principles

We summarize some principles of database performance (especially applicable to PostgreSQL).

Principle 1: Less is more

Managed Indexes <br>More indexes don't mean better performance. In fact, each additional index degrades the performance of write operations. If the query planner chooses an inefficient index, the query will still be slow.
Do not stack indexes (eg indexing every column is not advisable). Try removing as many indexes as possible. And every time you change an index, you need to monitor its impact on performance.

Prefer simple database design
The data in RDBMS (relational database system) is generally suitable for normalized design. JSON or JSONB is a NoSQL-style de-normalization design.
Which is better, normalization or denormalization? From a business point of view, it is necessary to analyze the specific situation. From an RDBMS perspective, normalization is always simpler and better, while denormalization can be complementary in some cases.

Recommendation 1: Consider designing your data model from a DDD (Domain Driven Design) perspective.

Entities can always be modeled as tables, and value objects can always be embedded in entities (and sometimes, for performance reasons, large value objects can also be modeled as tables).
If the target entity of an association is an aggregate root, it must not be embedded and stored elsewhere (but must be a table of its own). But if the target entity of the association is not an aggregate root, and the source entity of the association is a self-contained aggregate root, then the target entity can be embedded and saved.

Recommendation 2: Nullable columns in modern RDBMS are very efficient , don't worry too much about performance, if multiple nullable columns are the most concise way to model optional attributes, don't hesitate, and more Don't think of JSONB as "optimized" for nullable columns .

Principle 2: Statistics need to be accurate

PostgreSQL maintains statistics for each table , including but not limited to tuple number, page number, most common values, histogram bounds, and different values number of distinct values (may be equivalent to set cardinality). Some statistics are sampled and not accurate enough. The query planner generates multiple possible plans for the query, estimates the cost based on statistics and rules, and selects the most efficient plan. The quality of the query plan depends on the accuracy of the statistics. Accurate data leads to excellent execution (which is also a good principle in data science and data-driven business).

As mentioned, the cost estimates for GINs on JSONB are inaccurate. Whereas the cost estimates for BTREE on scalar types are much more accurate. So JSONB is not suitable for some situations . In order to pursue efficiency, as a workaround, you can create a BTREE expression index for a scalar type property of JSONB. This article from ScaleGrid is a good introduction to how to use JSONB and GIN efficiently .

Recommendation: PostgreSQL has features such as expression indexes and partial indexes that are both powerful and cost-effective. As long as it is considered to be beneficial based on data analysis, it is worth choosing.

Principle 3: Improve observability

Whether or not we have speculation about the underlying root cause of the problem, improving observability is the best thing to do. Query logs can prove that slow queries, not application code or connection waits, are causing slow requests. Automatic EXPLAIN captures the actual query plan used by slow queries.

APM (Application Performance Management) like Datadog is also an important tool. It provides a lot of insight:

Is this problem caused by insufficient resources? No, insufficient resources should affect any SQL CRUD statement equally, but we only observed slow SELECTs.
Does this problem occur at the same time every day? No, it can happen anytime.
Is each occurrence an independent event? No, multiple events will aggregate within a small time window. Something must have happened then that caused the problem.

Thanks

Thanks to Vinod Kumar, Dylan Irlbeck, David Goussev, BJ Terry, Kyle Kinsey and other Flexport colleagues who participated in the review.

refer to

Understanding Postgres GIN Indexes: The Good and the Bad
Postgres Planner not using GIN index Occasionally
Gitlab once faced a GIN related issue
Understanding Postgres query planner behaviour on GIN index
Statistics used by the query planner
When To Avoid JSONB In A PostgreSQL Schema
Using JSONB in PostgreSQL: How to Effectively Store & Index JSON Data in PostgreSQL

Less is more: PostgreSQL queries from minutes to milliseconds

Introduction

question

Initial investigation

Dive into technical details

Quick query plan

Visual representation of fast query plans

Further investigation

slow query plan

Visual representation of slow query plans

solution

experiment

Real improvement in production environment

some useful principles

Principle 1: Less is more

Principle 2: Statistics need to be accurate

Principle 3: Improve observability

Thanks

refer to

sorra

引用和评论

Orca ORM的开发状态和技术路线

MySQL慢查询日志：性能优化的终极指南

做到真正0丢失、0重复：Apache SeaTunnel 实现万亿级数据一致性全解密

Devin 发布 DeepWiki，2 星的项目直接装出万星的气场

好用的开源埋点方案-ClkLog埋点用户分析系统

DNS服务器地址大全

实战分享：DolphinScheduler 中 Shell 任务环境变量最佳配置方式