Wire Protocol of PostgreSQL Queries in a Nutshell

I was working on a pull request to improve the performance of executemany() in asyncpg, who talks to the PostgreSQL server directly in its wire protocol (comparing to psycopg2 who uses libpq to speak the wire language for itself). Though the documentation for the protocol explained mostly everything, I found it a bit boring to read and it's easy to miss some very important details if you don't read every word. So I shall try to explain the message flow again here, hopefully in a nutshell.

I've also wrote some PoC code in Python for this post, and the source of the diagrams can be found here. please feel free to download and play with it.

The Start

Most PostgreSQL clients exchange data with the server through a single streaming connection - a TCP connection or UNIX domain socket connection for example. Each connection keeps exactly one Session on the server side. Sessions are maintained by separate processes, so they are isolated from each other, like this:

Clients and Sessions

When the connection is established, the client must authenticate itself to the server, and tell the server which database to use for this Session, by sending some Messages defined in the protocol back and forth. Once the authentication process is done successfully, the server always send a ReadyForQuery Message to the client indicating that it's good to move forward.

Startup

In practice, the actual start-up message flow may be more complex than this. I'm skipping the details here and jumping to the query message flow.

Simple Query

Client may initiate two types of queries: the Simple Query or the Extended Query. We'll start with the simple one. It is actually as simple as this:

Simple Query

As you can see, Simple Query starts with the client sending a Query Message with the actual SQL command. Once received, the server will process the SQL and execute it, and return the result status string back to the client in a CommandComplete Message. Likewise, the server will send a ReadyForQuery Message at the end, telling the client that it is ready for more queries.

Pipelining, Batching and Transactions

However, it is unnecessary for the client to wait for ReadyForQuery to send the next query - server sends it only to indicate that the job for the query is done, there won't be further responses returned for that query. The server can handle all the queries coming from the stream like a pipeline worker, returning the responses in the same order. So as far as the client could recognize who is who, it is actually okay to do this:

Pipeline

Simple Query also allows one Query Message with multiple SQL commands separated with semicolons (;). So here is a similar way to insert 3 rows with a single Query Message:

Multiple SQL

Much less ReadyForQuery noises, nice! However, this approach is actually different than the previous one. The server shall execute all SQL commands from the same Query Message in an implicit transaction, unless there is an explicit transaction in current Session. In this case, failure from any of the three SQL commands will rollback the implicit transaction, thus none of the commands could insert anything. For example:

Multiple SQL Rollback

Here the second INSERT ... VALUES (8) fails with a unique constraint check error, then the implicit transaction gets rolled back, skipping the third INSERT ... VALUES (9) at all.

Instead in previous example with three Query Messages, each SQL command lives in their own transaction, so one failing command couldn't stop another from inserting successfully. In order to achieve the same transactional effect, we need to create an explicit transaction with the BEGIN...END SQL commands:

Explicit Transaction

As you might have noticed, the ReadyForQuery Messages now provide different values: INTRANS means the Session has an active transaction when the server sends the Message, INERROR means the Session is in a failed transaction (indicating that further queries like INSERT ... 11 will be rejected until the transaction is closed), and IDLE means the Session is not in a transaction block any more.

You might also have noticed that, this example had two ErrorResponses, but previous example had only one. This is because ReadyForQuery cleared the "error returned, skipping all messages" state on the server side. For example, if we combine the INSERT ... 11 into previous Message like this: Query("INSERT ... VALUES (10); INSERT ... VALUES (11)"), then we'll have one ErrorResponse again. We'll get to this again later.

Row Sets

Now let's take a look at the SQL that actually returns data, e.g. a SELECT command:

Select Result

In this diagram, there are two more types of Messages before the usual CommandComplete:

RowDescription contains all information of each column of the result, including column name, column type and more.
DataRow is simply one data row of the result, including all values of each column in the same order as described in the RowDescription Message.

Obviously DataRow Message may repeat as many times as needed. Also, there can be no DataRow Message at all, indicating an empty result with "table header" (RowDescription) only.

Different types of SQL commands may also output other types of responses like CopyInResponse, CopyOutResponse or NoticeResponse, but I'm not covering them here.

Behind the Scenes

On the backend, the PostgreSQL server is actually doing more to generate such responses. The following diagram describes roughly how this process looks like:

Select Detail

On receiving a Query Message, the server will firstly start an implicit transaction if there is no transaction in the Session.
parse - PostgreSQL server will then parse the given SQL string into a prepared statement in the Session. And yes it is the same kind of prepared statement as you can generate with SQL PREPARE, only that this one here has no name. Thus it is also called unnamed prepared statement, or unnamed statement in short.
bind - Then the server will create a portal in current transaction with above prepared statement, and run the query planning. Likewise, portal is also known as server-side cursor, which you can create with SQL DECLARE CURSOR. Similarly the portal in Simple Query is an unnamed portal.
With a portal, the server has enough knowledge about the result, so it will send over a RowDescription Message.
execute - Eventually, the server executes the portal, and result rows get generated and sent to the client. When the cursor hits the end, a CommandComplete Message is sent.
sync - At last, the server closes the implicit transaction - that means either to commit if all good, or rollback on error. However, it will not close an explicit transaction, because in that case Query("BEGIN") won't work. Finally, a ReadyForQuery Message is sent to the client after all this.

If there are multiple SQL commands in one Query Message, above process is simply repeated several times, but sync runs only once at the very end so that all SQL commands share the same implicit transaction.

Extended Query

For a PostgreSQL server, Extended Query runs in the same way as above - only that it breaks down each step into separate Messages, allowing fine control to gain much more possibilities. For the same example with Extended Query:

Extended Query

In brief, we parsed the SQL string into stmt1, bound it into portal1, described the portal for RowDescription, executed the portal for DataRows, then put an end with Sync. The difference is, the prepared statement and portal now have names. It is possible to create multiple prepared statements or portals and use them alternatively.

Pipelining

Actually in previous diagram, the client wasn't waiting for any responses to send the next Message. Because the responses won't arrive at all until Sync is issued (if everything goes well). We put it that way to better explain the consequences of each Message. In fact the network sequence is more like this:

Extended Query Pipelining

It is okay for the client to send each Message one by one, or concatenate them into a big buffer and send altogether. Because the server will do the same buffering thing to save round-trip time, and only flush the buffer into the wire when Sync or Flush is issued, or as soon as an error occurred.

Comparing to Sync generates ReadyForQuery, Flush generates nothing at all but only flushes the server-side buffer.

Transaction

Because Extended Query is simply Simple Query extended, some similar rules we discussed previously also apply here. Take implicit transaction for example: within a Session when the PostgreSQL server receives a Message (one of Parse, Bind, Execute or Describe) from the client, it will firstly check current transaction state. If there was no transaction (either explicit or implicit - in IDLE state), it will start an implicit one immediately before handling the Message.

Handling Messages may cause errors. In implicit transactions, errors lead to rollbacks; but in explicit transactions, they put the Session into INERROR state:

Implicit Transaction

In addition for extended query, you may issue as many Messages as you want before Sync - you can Parse 5 different SQLs, Bind 3 of them, and Execute them all. But a failing Message will set the Session into ignore_till_sync mode, meaning that the server will ignore any Message until Sync is found. This is similar to using one Simple Query Message to execute many SQLs joint with semicolons (;), one failing SQL cause the rest ignored. At last, errors always cause an immediate flush of the server-side buffer.

If the Message turns out to be a Sync, it shall clear the ignore_till_sync mode first. Then the server will check if the current transaction is an implicit one. If yes, it will try to commit the transaction. Sync won't touch an explicit transaction, but it will send a ReadyForQuery response and flush the server-side buffer anyhow:

Implicit Transaction

Therefore, Sync is the normal way closing an implicit transaction - it would commit a good transaction, or reset an aborted transaction to normal:

Implicit Transaction

You may also issue an explicit COMMIT or ROLLBACK SQL command to end an implicit transaction in the way you want. But 1), the PostgreSQL server shall complain about it with an extra NoticeResponse warning, and 2), in ignore_till_sync mode, the server will ignore any Messages other than Sync, even if it is a ROLLBACK. Therefore, it is only meaningful to rollback an implicit transaction with an explicit ROLLBACK while there was no error:

Implicit Transaction

Behind the scene, the PostgreSQL server is using a much more complex state machine to maintain the transaction block state. This diagram only shows an essential part of it:

Server-side States

Note that the purple tags are actual transaction boundaries. The transaction state we get from ReadyForQuery Message are simply judged by the colors of each state. Please note that, if you get a ReadyForQuery from an implicit transaction (the two states in dashed frame), it always states that the transaction is IDLE because Sync does the commit first. I guess that's why the STARTED state is actually categorized as an IDLE state in PostgreSQL source code, but I'm marking it as INTRANS anyway here because the transaction did start beforehand. It is also interesting to see that, any transaction starts as an implicit one - only BEGIN commands (or its varient) could mark it as an explicit transaction.

In explicit transactions, Extended Query may also trigger the ignore_till_sync mode with a failing Message. As Sync won't rollback the explicit transaction, it is only used to clear the ignore_till_sync mode, making space for ROLLBACK command. Therefore, it is rare to execute multiple SQL commands within one Sync. The usual pattern is Bind -> Execute -> Sync, with optional Parse and Describe. Externally, the state transitioning looks like this:

Server-side States

If an operation is not seen for a state in this diagram, it means this missing operation won't lead to a state change. For example, issuing a Sync in explicit INTRANS has no effect but flushing the buffer. And success means any other Message that runs successfully (e.g. executing SELECT now()), while error means it results with an error (e.g. executing SELECT 1/0).

Additionally, please be aware that, using Query is identical to call Parse -> Bind -> Describe -> Execute -> Sync. If you are mixing Simple Query in an Extended Query, the Query will close the implicit transaction in Extended Query.

Parameter

You may wonder why having the portal trouble at all when there is already prepared statement - why didn't PostgreSQL merge the cursor feature into prepared statements? The reason is about caching prepared statements. It is a waste to repeatingly parse the same SQL over and over again: say for a web server we need to load user data for every request, we may use SELECT * FROM users WHERE id = 235 for user 235, and SELECT * FROM users WHERE id = 169 for user 169. The only difference in SQL is the user's ID - if we can turn that into a parameter, we can then parse the SQL string only once and reuse the prepared statement for different users.

This is exactly how portal helps: it allows binding the actual parameter values with the same prepared statements into different portals. For example:

Parameters

The parameter placeholder is $1, indicating the first parameter in the list provided in Bind. Each portal is an actual execution handle of the query with specific parameter values. You can have as many portals as you need, as far as the stmt5 lives.

Lifetime

Named prepared statements live until the end of the Session, unless explicitly freed with the Close("S", stmt1") Message ("S" for Statement). That means if not closed, you may use stmt1 whenever you want within the same TCP connection. But before closing, you cannot create another prepared statement with the same again in the same Session, doing so will end up with an ErrorResponse.

Prepared statements are usually managed by PostgreSQL drivers like asyncpg in a way of LRU cache, in order to save resources parsing the same SQL strings.

However, named portals cannot live that long - they lasts until the end of the transaction if not explicitly closed with Close("P", "portal1") Message ("P" for Portal). In previous example, there was only one implicit transaction that ended at Sync(), so it is possible to reuse the names portal1 and portal2 after that.

It is possible to manually create or close unnamed prepared statements and portals in Extended Query - just setting the name to an empty string ("") will do. They are identical to named statements and portals except for one thing: it is not an error to create a new unnamed statement or portal if an existing one wasn't closed - it's simply overwritten. This is extremely convenient if you need some one-time use statements or portals - just create one and use it at once, then forget about it. But be aware that, Simple Query also creates unnamed statements and portals, mixing using it will cause unnamed statements or portals you created overwritten.

Server-side Cursor

If not closed, portals may live throughout a transaction. It is designed this way so that it can yield results incrementally like a cursor. Note the second parameter 0 in Execute Messages previously - it meant to fetch all results. If given a positive number, it will fetch at most that many rows from the result set. For other SQL commands that doesn't return results, this parameter has no effect - the command is always executed to completion.

Cursor

If the limit is reached before the end of the result, a PortalSuspended Message will be returned, like the first Execute in this diagram. But, receiving a PortalSuspended doesn't necessarily mean there are more rows to receive - the second fetch may also get an empty list if the limit number of the first fetch is exactly the same as the row count. It is also worth noting that, the final row count in CommandComplete Message reflects only the number of rows in the last fetch, not the total number of all the rows returned.

There is no MOVE equivalent in Extended Query to move the cursor without returning. Instead you may directly use the SQL command, using the portal name as cursor name.

Describe

The Describe Message can be used on either prepared statements or portals, in order to get information of them. This selection is specified by the first parameter, where P means portal, and S means statement.

Describing a portal generates a RowDescription response like the one we got in Simple Query. It is usually essential for processing following DataRows if any. It returns NoData if describing a portal that returns nothing like an UPDATE command without RETURNING.

Describing a statement is a bit complicated - it returns two Messages. The first is ParameterDescription which describes the parameters and their types in the given statement. The second is still a RowDescription (or NoData), but it may lack some typing information because the statement is not bound to specific parameters thus some types of variables are still unknown.

The End

There are still a lot more I didn't cover here like COPY, NOTIFY, TLS and more. But it's already been days to write those above and I'm already getting messy with part of it, so I'll end this with two more closing topics.

Cancel a Running Query

PostgreSQL allows canceling a running query from outside, by establishing a new connection to the server and send over a CancelRequest Message instead of the normal StartupMessage:

Cancel

The Canceller may or may not be a a part of Client, as far as the CancelRequest contains essential subprocess ID and secret key of the target Session, which the Client could obtain during its initial authentication. Because the signal may arrive at any time, the cancellation is not a consistent event. That is to say, after sending the CancelRequest, the Client should be ready to face any possible responses including successful results or errors.

Bye

For a graceful shutdown, it's usually the Client sending a Terminate Message and close the connection. On receiving Terminate, the server will also close the connection and terminate the Session, rolling back any pending transactions. It is worth noting that, even though a broken connection without Terminate may do the same, the server would firstly finish any query at hand before noticing the disconnection, therefore separate SQL commands without transaction may still succeed during ungraceful shutdown.

References

https://www.pgcon.org/2014/sc...
https://www.postgresql.org/do...
https://www.postgresql.org/do...

Wire Protocol of PostgreSQL Queries in a Nutshell

The Start

Simple Query

Pipelining, Batching and Transactions

Row Sets

Behind the Scenes

Extended Query

Pipelining

Transaction

Parameter

Lifetime

Server-side Cursor

Describe

The End

Cancel a Running Query

Bye

References

fantix

引用和评论

被 Manus 带火的 MCP 是什么｜一文看懂

python与nodejs哪个性能高

Anaconda安装教程以及Anaconda和pip配置国内镜像

如何减少跨团队交付摩擦？——基于 DevOps 与敏捷的最佳实践

Python 描述符

科学计算编程涉及到的技术栈简介

使用 chardet 判断文件编码需要注意的坑——过大的文件会导致高耗时