The interviewer asked about MySQL transactions, locks and MVCC in one breath, I

Interviewer : How do you understand the transactions in the InnoDB engine?

candidate : In my understanding, a transaction can make "a set of operations" either all succeed or all fail

candidate : The purpose of the transaction is to "guarantee the final consistency of the data."

candidate : For example, I sent you Alipay and transferred 888 red envelopes. Then naturally my Alipay balance will be deducted by 888 yuan, and your Alipay balance will be increased by 888 yuan.

candidate : The transaction is to ensure that my balance deduction and your balance increase are successful or failed at the same time, so this transfer will be normal

Interviewer : Well, do you know the characteristics of business?

candidate : Well, it is ACID, which are Atomicity, Consistency, Isolation, and Durability.

candidate : Atomicity refers to: the operation of the current transaction either succeeds at the same time or fails at the same time. The atomicity is guaranteed by the undo log, because the undo log records the information before the data is modified.

candidate : For example, if we want to insert a piece of data, the undo log will record a corresponding delete log. When we want to update a record, the undo log will record the update record of the previous "old value".

Candidate : If an abnormal situation occurs during the execution of the transaction, perform a "rollback". The InnoDB engine uses the data recorded in the undo log to "restore" the data to before the start of the transaction

candidate : I will talk about consistency a bit later, let me talk about isolation first

Interviewer : Well...

Candidate : Isolation refers to: When transactions are executed concurrently, their internal operations cannot interfere with each other. If multiple transactions can operate on one data at the same time, problems such as dirty reads, repeated reads, and phantom reads will occur.

candidate : Therefore, there needs to be a "certain" isolation between transactions. In the InnoDB engine, four isolation levels are defined for us to use:

Candidate : They are: read uncommit (read uncommitted), read commit (read submitted), repeatable read (repeatable read), serializable (serial)

candidate : Different isolation levels have different isolation between transactions (the higher the level, the better the transaction isolation, but the lower the performance), and the isolation is achieved by various locks of MySQL , But it shields the details of the lock.

candidate : Persistence refers to: once the transaction is committed, its changes to the database should be permanent. To put it bluntly, the data will be persisted on the hard disk.

candidate : The persistence is guaranteed by the redo log log. When we want to modify the data, MySQL first finds the "page" where the record is located, then loads the page into memory, and modifies the corresponding record .

candidate : In order to prevent the memory modification from being completed, MySQL will hang up (if the memory is changed, it will hang up directly, then this modification is equivalent to lost).

candidate : MySQL has introduced redo log. After the memory is written, a redo log will be written. This redo log records what changes have been made on a certain page this time.

candidate : Even if MySQL hangs in the middle, we can still restore the data according to the redo log.

candidate : redo log is written sequentially, and the writing speed is very fast. And what it records is physical modification (xxx modification is done on the xxxx page), the file size is small, and the recovery speed is also fast.

Candidate talk about consistency again. "Consistency" can be understood as the "purpose" of our transaction, and "isolation", "atomicity" and "persistence" are all means to ensure "consistency". To ensure consistency needs to be guaranteed by the application code

candidate : For example, if an abnormal situation occurs during the transaction, you have to roll back the transaction instead of forcibly committing the transaction to cause data inconsistency.

Interviewer : Well, it’s good, I talked a lot

Interviewer : Just now you mentioned isolation, Then you said that there are four isolation levels in MySQL. Can you introduce them separately?

candidate : Well, in order to clarify the isolation level, let me talk about MySQL lock-related knowledge by the way.

candidate : Under the InnoDB engine, according to the granularity of locks, it can be simply divided into row locks and table locks.

candidate : The row lock actually acts on the index (the index has been mentioned last time, so I won’t repeat it here). When our SQL hits the index, the index node in the hit condition is locked (this is the row lock). If the index is not hit, then we lock the entire index tree (table lock).

candidate : Simply put: whether the entire tree or a few nodes are locked depends entirely on whether the SQL condition hits the corresponding index node.

candidate : Row locks can be simply divided into read locks (shared locks, S locks) and write locks (exclusive locks, X locks).

candidate : The read lock is shared, multiple transactions can read the same resource at the same time, but other transactions are not allowed to modify. Write locks are exclusive, and write locks block other write locks and read locks.

candidate : I will go back to the isolation level now, just use an example to illustrate.

Interviewer : Well...

candidate read uncommit (read uncommitted). For example: A transfers money to B, A executes the transfer statement, but A has not yet submitted the transaction, B reads the data and finds that his account has increased! B told A that I have received the money. A rolls back the transaction [rollback], and when B looks at the money in the account again, he finds that there is not much money.

candidate : The simple definition is: Transaction B reads data that has not yet been committed by Transaction A. This is called "dirty read" in technical terms.

candidate : As far as the dimension of the lock is concerned, in fact, under the read uncommit isolation level, no lock will be added for reading, and an exclusive lock will be added for writing. Do not add any locks to read, which makes exclusive locks unable to exclude them.

candidate : And we know that InnoDB will definitely add a write lock for update operations (it is impossible for the database to allow the same record to be updated at the same time). The read operation, if no lock is added, it will cause the above dirty read.

candidate : Dirty reads are definitely unacceptable in a production environment. If the read is locked, it means that when the data is updated, there is no way to read, which will greatly reduce the performance of the database.

candidate : At the MySQL InnoDB engine level, there is a new solution (solving read and write performance problems after locking), called MVCC (Multi-Version Concurrency Control) multi-version concurrency control

candidate : Under MVCC, you can achieve read and write without blocking and avoid problems like dirty reads. How does MVCC do it?

candidate : MVCC generates a data snapshot (Snapshot) and uses this snapshot to provide a certain level (statement level or transaction level) of consistent reads

candidate : Back to the transaction isolation level, for the read commit (read committed) isolation level, it generates statement-level snapshots, and for repeatable read (repeatable read), it generates transaction-level Snapshot.

candidate : As mentioned earlier, dirty reads occur under the read uncommit isolation level, while the read commit (read submitted) isolation level solves dirty reads. The idea is actually very simple: generate a "version number" when reading, and only read the latest committed "version number" data after other transactions are committed.

candidate : For example: Transaction A reads the record (generates the version number), transaction B modifies the record (write lock is added at this time), when transaction A reads again, it is read based on the latest version number (When transaction B executes commit, a new version number will be generated). If transaction B has not yet committed, then transaction A reads the data of the previous version number.

candidate : Through the concept of "version", this solves the problem of dirty reading, and the "version" is actually the data corresponding to the snapshot.

candidate : read commit (read has been submitted) solves the dirty read, but there will be other concurrency problems. "Non-repeatable read": One transaction reads data that has been committed by another transaction, which means that one transaction can see the changes made by other transactions.

Candidate : Example of non-repeatable reading: A queries the database to get the data, and B modifies the data in the database, causing A to query the database multiple times and the results are different [Harm: the result of each query of A is affected by B of】

candidate : After understanding the basics of MVCC, it is easy to think of how the repeatable read isolation level avoids the problem of non-repeatable read (mentioned earlier).

candidate : repeatable read isolation level is a snapshot of the "transaction level"! Each read is the "current transaction version", even if the current data is modified by other transactions (commit), only the data of the current transaction version will be read.

candidate : The repeatable read (repeatable read) isolation level will have the problem of phantom reading. "Phantom reading" refers to reading data inserted by other transactions within one transaction, resulting in inconsistent reads before and after.

candidate : Under the repeatable read (repeatable read) isolation level under the InnoDB engine, under the influence of snapshot read MVCC, the problem of phantom reading has been solved (because it is reading historical version data)

candidate : If it is the current read (referring to select * from table for update), you need to cooperate with the gap lock to solve the problem of phantom reading.

candidate : The rest is the serializable (serial) isolation level. Its highest isolation level is equivalent to not allowing concurrent transactions. The execution between transactions is serial, which is the least efficient, but It is also the safest.

Interviewer : Well, yes. I think you mentioned MVCC, why not talk about his principle?

candidate : MVCC is mainly realized through read view and undo log

candidate : The undo log is also mentioned earlier, it will record the information before the data is modified, and the atomicity in the transaction is achieved through the undo log. So, undo log can help us find the "version" data

candidate : The read view is actually in the query, InnoDB will generate a read view, read view has several important fields, namely: trx_ids (the collection of transaction version numbers that have not yet submitted commit), low_limit_id (next time The transaction ID value to be generated), low_limit_id (the minimum transaction ID that has not yet submitted the version number), and creator_trx_id (the current transaction version number)

candidate : There are two hidden fields in each row of data, namely DB_TRX_ID (recording the current ID) and DB_ROLL_PTR (pointer to the location of the previous version data in the undo log)

candidate : At this point, it is easy to find that MVCC actually relies on "compare versions" to achieve read and write non-blocking, and the version data exists in the undo log.

candidate : For different isolation levels (read commit and repeatable read), it is nothing more than read commit isolation level, each time a new read view is obtained, and repeatable read isolation level only obtains one read per transaction view

Interviewer : Well, OK. The details will not be exquisite, so let's stop here today.

This article summarizes :

In order to ensure the final consistency of the data
There are four major characteristics of transactions, namely, atomicity, consistency, isolation, and durability
- Atomicity is guaranteed by undo log
- Durability is guaranteed by redo log
- Isolation is available for us to choose from the database isolation level, which are read uncommit, read commit, repeatable read, serializable
- Consistency is the purpose of transactions, and consistency is guaranteed by the application
There are various problems with transaction concurrency, including dirty reads, repeated reads, and phantom reads. The above different isolation levels can solve the problems caused by concurrent transactions, and the isolation level is actually implemented by MySQL locks
Frequent locking will lead to low database performance. MVCC multi-version control is introduced to achieve non-blocking read and write and improve database performance
The principle of MVCC is realized through read view and undo log