Reorganize my understanding of affairs. I wrote a large part of this series of articles in 20 years, and later turned to articles with higher priority. When I wanted to continue writing, I found that the file was not uploaded to the network, so I went to I searched on another computer and couldn't open it. This story tells us that the core data needs to be disaster-tolerant.
foreword
The writing ideas of this series of articles, first put forward the general outline, lead all the articles in the database transaction series, and then focus on other database transactions, like the following:
Business Profile
The real world is mapped to the software world. Sometimes there will be some problems that are exclusive to the software world. To give a simple and classic example of transferring money, A borrows 10 yuan from B. Assuming that there is no online payment yet, A is Take 10 yuan from B's wallet to A, even if multiple people borrow money from B, there is no problem. B will process the loan request in turn. The more people who borrow money in the same time period, the slower B borrows money. It is possible to also consider various factors such as friendship.
But if we introduce this transfer into the software world, it will lead to a problem that does not exist in the real world. If you have fifty in your wallet, you can only borrow fifty. It is impossible that if you have fifty, you lend one hundred. Then a minus fifty appeared in the wallet, which means you are consistent. In addition, there is generally no intermediate state in lending operations. Either the borrowing succeeds or the money does not reach the borrower. This is atomicity. Some operations in the real world are mapped to the software world, and the situation will become more complicated. The operation of A borrowing money from B will be divided into several operations in the database operation. Let us briefly introduce the transfer operation in the database world. How is it, at the beginning of the story we prepare an account table:
CREATE TABLE `accounts` (
`id` bigint(20) NOT NULL COMMENT '主键',
`userId` bigint(20) NOT NULL COMMENT '用户ID',
`money` int(255) NOT NULL COMMENT '钱款',
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci;
This table creation statement is the table creation statement under MySQL, and a transfer operation corresponds to the following two SQL statements:
UPDATE accounts SET money = money - 5 WHERE id = 1; // 1是B的ID 2是A的ID,此时的场景是A向B借5块钱
UPDATE accounts SET money = money + 5 WHERE id = 2;
The actual execution process of the above two statements in the database will be more complicated. To illustrate the problem, we simplify the transfer model to the following operations:
1. Read the account balance of small B into variable A, this step is abbreviated as read(A)
2. Subtract the account balance from the balance of small B, abbreviated as A = A - 5
3. Write the modified balance of small B to the disk. This step is simply written as write(A)
- Read the balance of the small A account to the variable B, this step is abbreviated as read(B)
- Add the account balance of small A to the balance transferred from small B, and simply record it as B = B + 5.
- Write the modified balance of the small A account to the disk, this step is abbreviated as write(B)
There is no problem in the real world for small A to borrow money from small B twice. It is nothing more than withdrawing money from the wallet twice. However, the steps corresponding to the two operations in the database may be executed concurrently, not queued. . To illustrate the problem, we denote the two transfer operations as T1 and T2. In the concurrent execution scenario, T1 executes read(A) soon after read(A). We assume that before the transfer operation, small B only has 10 yuan, that is, T1 and T2 are conducting the transfer operation. When I think that B has ten yuan, then assume that after T1 starts to execute 2, 3, 4, 5, and 6, T2 continues to execute. In this case, B only transfers five yuan, and A’s account Is there really an extra ten yuan, and the extra five yuan should be made up by the bank? This is obviously unreasonable. In order to avoid the problems caused by the alternate execution of T1 and T2, the easiest way is to queue up T1 and T2 in the database, which will be slow to death. Therefore, for some database operations corresponding to state transitions in the real world, it is not only necessary to ensure that these operations are performed in an atomic manner, but also to ensure that other state transitions will not affect this state transition. This rule is called It is isolation.
In the above discussion, we have found that the problem caused by modifying a table in the database, just doing the query operation has no effect on the database management system. We call these limited operation sequences on the database. It is a database transaction.
the status of the transaction
Now we already know that a transaction is actually an abstract concept, consisting of a limited sequence of database operations, corresponding to one or more database operations. The different stages of these operations roughly have the following states:
active
When the database operation corresponding to the transaction is being executed, we say that the transaction is in an active state
partially committed
When the last operation in the transaction is completed, but the effect is not flushed to disk because the operations are all performed in memory, we say that the transaction is in a partially committed state
failed
When a transaction is in an active or partially committed state, it may encounter some errors (database errors, operating system errors, or direct power failure, etc.) and cannot continue to execute, or artificially suspend the execution of the current transaction. Says the transaction is in a failed state.
aborted
If the transaction is half executed and becomes a failed state, such as the transfer transaction we talked about above, when the money of small B is deducted and the money of small A is not increased, an error occurs and the current transaction is in a failed state, then It is necessary to adjust the account balance of small B to the amount before the transfer, in other words, to undo the impact of the failed transaction on the current database. We call this undo process a rollback. When the rollback operation is completed, that is, the database is restored to the state before the transaction was executed, we say that the transaction is in the state of the middle finger.
submitted
When a transaction in a partially committed state synchronizes all modified data to disk, we can say that the transaction is in a committed state.
As the data operations corresponding to the transaction are executed to different stages, the state of the transaction is constantly changing. A basic state transition is shown in the following figure:
Problems encountered with transaction parallelism
Letting T1 and T2 queue to execute at the expense of performance is not what we want. What we want is to maintain the isolation of transactions, and to make the server as high as possible when processing multiple transactions that access the same data, and discard some of them. isolation to improve performance. We know that the database is a client/server architecture software. For the same server, there can be several clients connected to it. After each client is connected to the server, it can be called a session (Session) . Each client can issue a request statement to the server in its own session, and a request statement may be part of a transaction, that is, the server may process multiple transactions at the same time. Now let's look at what problems can arise from assuming parallel execution of transactions:
- dirty writing
If a transaction modifies data modified by another uncommitted transaction, it means that a dirty write has occurred. As shown below:
Session A and Session B each start a transaction. The transaction in Session B first changes the userId from 1 to 50, and then Session A changes the userId line of data to 80, and then submits it. Session B encounters an error or other situation during the execution process and performs a rollback, and then the update of Session A no longer exists. This phenomenon is generally called dirty writing, which is a very serious phenomenon.
- dirty read
If a transaction reads data modified by another transaction that has not been committed by the transaction, it means that a dirty read has occurred. As shown below:
As shown in the figure above, Session A and Session B have each opened a transaction. The transaction of Session B first changes the money column of the row whose userId is 1 to 1. Session A queries the records that Session B has not yet submitted, and then Session B B performs a rollback, then Session A seems to have read data that does not exist. This phenomenon is called dirty reading.
- non-repeatable read
If a transaction can only read the data modified by another transaction that has been committed, and every time the other transaction modifies and commits the data, the transaction can query the latest value, which means that a non-repeatable occurs. read.
The modification statement in Session B is an implicit transaction. Implicit transaction means that the transaction is automatically committed after the statement ends. These transactions all modify the value of the record column money whose userId is 1. After each transaction is committed, if the latest value can be obtained from the query in Session A, this phenomenon means that a non-repeatable read occurs.
- hallucinations
If a transaction first queries some records according to certain conditions, and then another transaction inserts records that meet these conditions into the table, when the original transaction queries again according to the conditions, the records inserted by another transaction can also be read. out, it means that a phantom reading has occurred. The schematic diagram is as follows:
Session A first finds some records according to the condition that money is greater than 0, and the records that meet the conditions are inserted into Session B. The transaction in Session A then checks the records inserted in Session B according to the conditions. This phenomenon is called magic. read. If we delete the userId in Session B, and Session A finds that there are fewer when querying according to the conditions, is this a phantom read? No, phantom reading emphasizes that when a transaction reads records multiple times under the same condition, it reads records that have not been read before. For the records read before, which cannot be read later, this should be counted as non-repeatable read.
The isolation level of the transaction is derived from the isolation of the transaction
Above we have introduced the problems that may be caused by concurrent execution of transactions. These problems are also prioritized, and we will sort them according to the severity of the problems:
dirty write > dirty read > non-repeatable read > phantom read
The performance improvement that we mentioned above by giving up part of the isolation is to establish an isolation level to solve the problems caused by concurrent transaction execution. The lower the isolation level is registered, the more serious the problem is likely to occur. The SQL standard specifies the following isolation levels:
- READ UNCOMMITED: uncommitted read
- READ COMMITED: read has been committed
- REPEATABLE READ: Repeatable read
- SERIALIZABLE: Serializable
The SQL standard stipulates that for different isolation levels, concurrent transactions can have problems of different severity, as follows:
Dirty writes are a super serious problem in either case, so under any isolation level, dirty writes are unlikely to happen.
Different database vendors support the four isolation levels specified by the SQL standard differently. For example, Oracle supports the READ COMMITED and SERIALIZABLE isolation levels. Although MySQL supports four isolation levels, under the condition of repeatable read isolation level, MySQL can prevent the occurrence of phantom read problems.
SQL Server supports the SNAPSHOT level in addition to the standard. PostgreSQL only supports READ COMMITED, REPEATABLE READ, and SERIALIZABLE internally. PostgreSQL treats READ UNCOMMITED as READ COMMITED. The higher the isolation level, the stricter the request locking of the read operation, the longer the lock is held, the higher the consistency, and the lower the concurrency performance.
in conclusion
Real-world state modifications are mapped to the database world. We need to ensure that:
atomicity
For indivisible operations, it either succeeds or fails.
isolation
For state transitions in the real world corresponding to certain database operations, it is not only necessary to ensure that these operations are performed in an atomic manner, but also to ensure that other state transitions will not affect this state transition. This rule is called for isolation.
consistency
Some constraints in the real world must be maintained in the software world, such as the maximum value of the RMB. If the data in the database all conform to the constraints of the real world, we say that the data is consistent.
Persistence
When a state in the real world is completed, the result of this transition will be retained forever, this rule we call persistence. When mapping real-world state transitions to the database world, persistence means that all data modified by database operations corresponding to the transition should be preserved on disk.
From this, we lead to the concept of transaction. We call one or more database operations that need to ensure atomicity, isolation, consistency and durability as a transaction. From the isolation of the transaction, we derive the isolation level of the transaction, sacrificing a little isolation in exchange for performance improvement. Transactions may correspond to multiple complex operations in the database from which we derive the state of the transaction.
References
- How MySQL Works: Understanding MySQL from the Roots by 4919 Kids
- Introduction and Changes of Transaction Isolation Level of PostgreSQL
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。