MySQL isolation level: repeatable read

I hope that I can write some articles that are helpful to Mengxin. If there are any mistakes, I hope that you can enlighten me.

In MySQL, when we specify the isolation level as repeatable read* (actually MySQL's default transaction isolation level), then run a transaction, and then start another transaction:

The first query of transaction A;
B transaction is then added;
Transaction A is then updated with the new row of transaction B, and queried again;
The results vary, which does not seem to avoid intuitive hallucinations.

In this case, this article is to explore what kind of phantom reading can be solved when MySQL repeats reading?

Disclaimer: In any SQL system, the Repeatable Read isolation level does not require resolution of phantom reads. We are just exploring: what MySQL does to resolve phantom reads* (or part of phantom reads) at this level.

What is phantom reading

The first is the intuitive definition: before a transaction is completed, although there may be other transactions executing, the result of each read I read in this transaction should be consistent.

But then we read Wikipedia's definition: when the read WHERE condition does not add a range lock, because another transaction adds, deletes, or modifies the data in the range, the result of this transaction is inconsistent when it is read twice.

An obvious difference will be found: when WHERE is not range locked , what does this sentence mean:

If you want to avoid phantom reading, you must add a range lock - lock all data used in this transaction - reduce performance bottlenecks;
Whether there is a range lock or a similar mechanism to ensure the consistency of the relevant data of this transaction has become a major basis for phantom reading.

With the definitions cleared up, let's take a look at what MySQL does.

Some mechanisms of MySQL

First, when MySQL performs a read, two reads occur.

Snapshot reads (consistent non-locking reads)

According to the definition of MySQL, when we query, InnoDB presents a snapshot of the database at a certain point in time to the query, so as to achieve a consistent non-locking read mechanism .

When the isolation level is upgraded to repeatable read (the default isolation level of MySQL transactions), the snapshot used at the beginning of the transaction shall prevail during the entire transaction process.

Try the chestnuts officially given by MySQL:

 # 事件A
SET autocommit=0;

# 事件B
SET autocommit=0;

# 事件A
SELECT * FROM t;
# 空结果

# 事件B
INSERT INTO t VALUES (1, 2);

# 事件A
SELECT * FROM t;
# 空结果

# 事件B
INSERT INTO t VALUES (1, 2);

# 事件A
SELECT * FROM t;
# 空结果

# 事件B
COMMIT;

# 事件A
SELECT * FROM t;
# 空结果

COMMIT;

SELECT * FROM t;
# 有结果

It sounds like the phantom read problem is solved in one go, but we keep reading for more:

While consistent reads sound good, different effects occur for some DML statements:

First, snapshots apply to SELECT statements within a transaction, when transaction A updates new rows committed by other transactions that were generated during the transaction , and become visible even if they are not in the snapshot.

Besides that, when you run some substatements in a DML statement:

By default, InnoDB use more robust locks to process these statements, and these read statements appear to be active at the level of read committed, although still in the same transaction, each statement's read Fetching also results in setting and updating its own snapshot.

Let's modify the official chestnut slightly, assuming two id and val :

 # 事件A
SET autocommit=0;

# 事件B
SET autocommit=0;

# 事件A
SELECT * FROM t;
# 一个结果

# 事件B
INSERT INTO t VALUES (2, 3);

# 事件A
SELECT * FROM t;
# 一个结果

# 事件B
COMMIT;

# 事件A
SELECT * FROM t;
# 一个结果

INSERT INTO t VALUES (3, 4);
SELECT * FROM t;
# 两个结果，快照的一条 + 新增一条

UPDATE t SET val = 0 WHERE id > 0;
# 注意语句的影响行数
SELECT * FROM t;
# 三个结果——因为 DML 语句影响（发现）了一些额外的行，快照被更新了，事务尚未提交的一条也包含在内

There is also a concept of MVVC (multi-version concurrency control), you can study it yourself if you are interested: InnoDB Multi-Versioning - MySQL Doc

(Because MVVC involves many other things, at least I don't know enough about it at the moment to describe its concept and implementation here - every modern database system has its own implementation of MVVC based on its own positioning)

current read (locked read)

For the problem of snapshot read, locked read provides a lock mechanism for you to safely operate/will operate the target data.

Shared lock: SELECT ... FOR SHARE , other transactions can read these data, but cannot modify them.
Update lock: SELECT ... FOR UPDATE , lock the rows, indexes and associated index information related to these data to ensure that the data related to this transaction cannot be modified in shared locks and other isolation levels.

There are some small notes:

Locked reads can only be used if autocommit is disabled;
Subqueries will not be locked by association. If you want to lock the data of subqueries, remember to put the corresponding substatements in it;
If you don't want to wait for other transactions to end the lock, you can use SELECT ... FOR ... SKIP LOCKED , which will ignore the locked row and return the remaining results; or SELECT ... FOR ... NOWAIT , which directly returns the locked error.
At the end of the transaction, all locks holding reads are released.

One last chestnut left:

 # 事件A
SET autocommit=0;

# 事件B
SET autocommit=0;

# 事件A
SELECT * FROM t FOR SHARE;
# 一个结果，没有 WHERE 条件，SHARE 将锁加到全表

# 事件B
INSERT INTO t VALUES (2, 3);
# 被阻塞

gap lock

The row lock ensures the consistency of the modification, and when the data is added, the row lock is powerless.

Gap Lock is used to lock the index of the range to ensure that the gap associated with the target position is firm. Therefore, when we want to add a row with val = 5, we only need to lock the index of about 5 to ensure the new increased consistency.

 # 事件A
SET autocommit=0;

# 事件B
SET autocommit=0;

# 事件A
SELECT * FROM t WHERE id = 5 FOR SHARE;# 这会锁定 5，没有范围的成本
# 或
# SELECT * FROM t WHERE id > 4 FOR SHARE;# 这会锁定 5 ~ ∞ 的范围

# 事件B
INSERT INTO t VALUES (5, 3);

# 事件A
INSERT INTO t VALUES (5, 12);
# 成功

A little supplement: If we operate on ID = 5, we will directly take the record lock (lock a certain record), after all, there is no need for Gap.

We confirmed the behavior of gap locks on indexes, what happens if the condition column doesn't belong to the index at all?

If id is not indexed or has a nonunique index, the statement does lock the preceding gap.

If the id column has no index or only a non-unique index, the contiguous range is locked, which is explained in the documentation. I won't leave the code here, you can try it if you are interested.

Next-Key Lock

When we combine gap locks and record locks, we get Next-Key locks.

In the repeatable read level, InnoDB will use this lock to ensure that related additions and changes will block the same-target additions and changes of other transactions. When only we can perform this operation at this time, phantom reads are naturally avoided.

Finally, let's go back to the original chestnut:

 # 事件A
SET autocommit=0;

# 事件B
SET autocommit=0;

# 事件A
UPDATE t SET val = 0 WHERE id > 0;
# 成功，同时 Next-Key 锁锁住所有 WHERE 条件相关的记录

# 事件B
INSERT INTO t VALUES (2, 3);
# 被阻塞

in conclusion

In the end, it can be seen that MySQL's repeatable read isolation level solves the phantom read problem caused by homogeneous reads, but does not solve the phantom read problem caused by non-homogeneous reads - of course, we can also see from reading Encyclopedia: for repeatable reads By definition of the read level, avoiding phantom reads is not something that must be dealt with at this level.

quote

The following sources are referenced or cited in this article, thanks to:

Repeatable-read isolation violated in UPDATE - MySQL Bug Report
Consistent Nonlocking Reads - MySQL Document
The relationship between transaction isolation level and lock in Innodb - Meituan Technical Team
Database Kernel Monthly Report - 2017/06 - Alibaba Cloud PolarDB-Database Kernel Group
Isolation (database systems) - Wikipedia
High Performance MySQL - O'Reilly Book Series

Extension: There are also several shared/exclusive/intent/locks, each of which is useful. My English is not very good, and I am a little tired to read it (it doesn't seem to be too related to the theme of this article), everyone is interested in picking it up: InnoDB Locking - MySQL Document
(If it is related, please leave a message, and I will add their concept and relationship to this article)

MySQL isolation level: repeatable read

What is phantom reading

Some mechanisms of MySQL

Snapshot reads (consistent non-locking reads)

current read (locked read)

gap lock

Next-Key Lock

in conclusion

quote

UioSun

引用和评论

简单比较：PHP 函数、闭包的性能

MySQL慢查询日志：性能优化的终极指南

MySQL 备份 Shell 脚本：支持远程同步与阿里云 OSS 备份

《SQL应用场景解析：如何通过SQL解决实际业务问题》

Devin 发布 DeepWiki，2 星的项目直接装出万星的气场

php+mysql 搭建一个在线游戏网站目前已有2000+游戏【代码解析一】

一键实现 Oracle 数据整库同步至 Apache Doris

MySQL isolation level: repeatable read

What is phantom reading

Some mechanisms of MySQL

Snapshot reads (consistent non-locking reads)

current read (locked read)

gap lock

Next-Key Lock

in conclusion

quote

UioSun

引用和评论

简单比较：PHP 函数、闭包的性能

MySQL慢查询日志：性能优化的终极指南

MySQL 备份 Shell 脚本：支持远程同步与阿里云 OSS 备份

《SQL应用场景解析：如何通过SQL解决实际业务问题》

Devin 发布 DeepWiki，2 星的项目直接装出万星的气场

php+mysql 搭建一个在线游戏网站目前已有2000+游戏【代码解析 一】

一键实现 Oracle 数据整库同步至 Apache Doris

php+mysql 搭建一个在线游戏网站目前已有2000+游戏【代码解析一】