I hope that I can write some articles that are helpful to Mengxin. If there are any mistakes, I hope that you can enlighten me.
In MySQL, when we specify the isolation level as repeatable read* (actually MySQL's default transaction isolation level), then run a transaction, and then start another transaction:
The first query of transaction A;
B transaction is then added;
Transaction A is then updated with the new row of transaction B, and queried again;
The results vary, which does not seem to avoid intuitive hallucinations.
In this case, this article is to explore what kind of phantom reading can be solved when MySQL repeats reading?
Disclaimer: In any SQL system, the Repeatable Read isolation level does not require resolution of phantom reads. We are just exploring: what MySQL does to resolve phantom reads* (or part of phantom reads) at this level.
What is phantom reading
The first is the intuitive definition: before a transaction is completed, although there may be other transactions executing, the result of each read I read in this transaction should be consistent.
But then we read Wikipedia's definition: when the read WHERE condition does not add a range lock, because another transaction adds, deletes, or modifies the data in the range, the result of this transaction is inconsistent when it is read twice.
An obvious difference will be found: when WHERE is not range locked , what does this sentence mean:
- If you want to avoid phantom reading, you must add a range lock - lock all data used in this transaction - reduce performance bottlenecks;
- Whether there is a range lock or a similar mechanism to ensure the consistency of the relevant data of this transaction has become a major basis for phantom reading.
With the definitions cleared up, let's take a look at what MySQL does.
Some mechanisms of MySQL
First, when MySQL performs a read, two reads occur.
Snapshot reads (consistent non-locking reads)
According to the definition of MySQL, when we query, InnoDB presents a snapshot of the database at a certain point in time to the query, so as to achieve a consistent non-locking read mechanism .
When the isolation level is upgraded to repeatable read (the default isolation level of MySQL transactions), the snapshot used at the beginning of the transaction shall prevail during the entire transaction process.
Try the chestnuts officially given by MySQL:
# 事件A
SET autocommit=0;
# 事件B
SET autocommit=0;
# 事件A
SELECT * FROM t;
# 空结果
# 事件B
INSERT INTO t VALUES (1, 2);
# 事件A
SELECT * FROM t;
# 空结果
# 事件B
INSERT INTO t VALUES (1, 2);
# 事件A
SELECT * FROM t;
# 空结果
# 事件B
COMMIT;
# 事件A
SELECT * FROM t;
# 空结果
COMMIT;
SELECT * FROM t;
# 有结果
It sounds like the phantom read problem is solved in one go, but we keep reading for more:
While consistent reads sound good, different effects occur for some DML statements:
First, snapshots apply to SELECT
statements within a transaction, when transaction A updates new rows committed by other transactions that were generated during the transaction , and become visible even if they are not in the snapshot.
Besides that, when you run some substatements in a DML statement:
By default, InnoDB
use more robust locks to process these statements, and these read statements appear to be active at the level of read committed, although still in the same transaction, each statement's read Fetching also results in setting and updating its own snapshot.
Let's modify the official chestnut slightly, assuming two id
and val
:
# 事件A
SET autocommit=0;
# 事件B
SET autocommit=0;
# 事件A
SELECT * FROM t;
# 一个结果
# 事件B
INSERT INTO t VALUES (2, 3);
# 事件A
SELECT * FROM t;
# 一个结果
# 事件B
COMMIT;
# 事件A
SELECT * FROM t;
# 一个结果
INSERT INTO t VALUES (3, 4);
SELECT * FROM t;
# 两个结果,快照的一条 + 新增一条
UPDATE t SET val = 0 WHERE id > 0;
# 注意语句的影响行数
SELECT * FROM t;
# 三个结果——因为 DML 语句影响(发现)了一些额外的行,快照被更新了,事务尚未提交的一条也包含在内
There is also a concept of MVVC (multi-version concurrency control), you can study it yourself if you are interested: InnoDB Multi-Versioning - MySQL Doc
(Because MVVC involves many other things, at least I don't know enough about it at the moment to describe its concept and implementation here - every modern database system has its own implementation of MVVC based on its own positioning)
current read (locked read)
For the problem of snapshot read, locked read provides a lock mechanism for you to safely operate/will operate the target data.
- Shared lock:
SELECT ... FOR SHARE
, other transactions can read these data, but cannot modify them. - Update lock:
SELECT ... FOR UPDATE
, lock the rows, indexes and associated index information related to these data to ensure that the data related to this transaction cannot be modified in shared locks and other isolation levels.
There are some small notes:
- Locked reads can only be used if autocommit is disabled;
- Subqueries will not be locked by association. If you want to lock the data of subqueries, remember to put the corresponding substatements in it;
- If you don't want to wait for other transactions to end the lock, you can use
SELECT ... FOR ... SKIP LOCKED
, which will ignore the locked row and return the remaining results; orSELECT ... FOR ... NOWAIT
, which directly returns the locked error. - At the end of the transaction, all locks holding reads are released.
One last chestnut left:
# 事件A
SET autocommit=0;
# 事件B
SET autocommit=0;
# 事件A
SELECT * FROM t FOR SHARE;
# 一个结果,没有 WHERE 条件,SHARE 将锁加到全表
# 事件B
INSERT INTO t VALUES (2, 3);
# 被阻塞
gap lock
The row lock ensures the consistency of the modification, and when the data is added, the row lock is powerless.
Gap Lock is used to lock the index of the range to ensure that the gap associated with the target position is firm. Therefore, when we want to add a row with val = 5, we only need to lock the index of about 5 to ensure the new increased consistency.
# 事件A
SET autocommit=0;
# 事件B
SET autocommit=0;
# 事件A
SELECT * FROM t WHERE id = 5 FOR SHARE;# 这会锁定 5,没有范围的成本
# 或
# SELECT * FROM t WHERE id > 4 FOR SHARE;# 这会锁定 5 ~ ∞ 的范围
# 事件B
INSERT INTO t VALUES (5, 3);
# 事件A
INSERT INTO t VALUES (5, 12);
# 成功
A little supplement: If we operate on ID = 5, we will directly take the record lock (lock a certain record), after all, there is no need for Gap.
We confirmed the behavior of gap locks on indexes, what happens if the condition column doesn't belong to the index at all?
If id
is not indexed or has a nonunique index, the statement does lock the preceding gap.
If the id
column has no index or only a non-unique index, the contiguous range is locked, which is explained in the documentation. I won't leave the code here, you can try it if you are interested.
Next-Key Lock
When we combine gap locks and record locks, we get Next-Key locks.
In the repeatable read level, InnoDB will use this lock to ensure that related additions and changes will block the same-target additions and changes of other transactions. When only we can perform this operation at this time, phantom reads are naturally avoided.
Finally, let's go back to the original chestnut:
# 事件A
SET autocommit=0;
# 事件B
SET autocommit=0;
# 事件A
UPDATE t SET val = 0 WHERE id > 0;
# 成功,同时 Next-Key 锁锁住所有 WHERE 条件相关的记录
# 事件B
INSERT INTO t VALUES (2, 3);
# 被阻塞
in conclusion
In the end, it can be seen that MySQL's repeatable read isolation level solves the phantom read problem caused by homogeneous reads, but does not solve the phantom read problem caused by non-homogeneous reads - of course, we can also see from reading Encyclopedia: for repeatable reads By definition of the read level, avoiding phantom reads is not something that must be dealt with at this level.
quote
The following sources are referenced or cited in this article, thanks to:
Repeatable-read isolation violated in UPDATE - MySQL Bug Report
Consistent Nonlocking Reads - MySQL Document
The relationship between transaction isolation level and lock in Innodb - Meituan Technical Team
Database Kernel Monthly Report - 2017/06 - Alibaba Cloud PolarDB-Database Kernel Group
Isolation (database systems) - Wikipedia
High Performance MySQL - O'Reilly Book Series
Extension: There are also several shared/exclusive/intent/locks, each of which is useful. My English is not very good, and I am a little tired to read it (it doesn't seem to be too related to the theme of this article), everyone is interested in picking it up: InnoDB Locking - MySQL Document
(If it is related, please leave a message, and I will add their concept and relationship to this article)
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。