There are many ways to store data today, and hard drives are the first choice for most users because of their advantages in price and data durability. However, compared with memory, hard disks are several orders of magnitude slower at IO, so why are hard disks still preferred, and how can we make them fast enough?

The first thing to note is that disk operations are slow mainly because each read or write is time-consuming. The time breaks down into three parts: seek time + rotational latency + transfer time, of which seek time is the largest. Seeking means moving the head to the target track, and the actuator arm is driven by a motor; this is a mechanical movement and therefore takes a long time. On top of that, our disk accesses are usually random reads and writes, so the head has to move to a different track frequently, which further stretches the total time and keeps performance low.
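To get a feel for the numbers, here is a rough back-of-the-envelope sketch. The drive parameters below are typical values I am assuming for a 7200 RPM disk, not figures from this article:

```java
// Back-of-the-envelope estimate of one random 4 KB read on a 7200 RPM HDD.
// All figures are assumed "typical" values, not measurements.
public class DiskAccessEstimate {
    public static void main(String[] args) {
        double seekMs = 9.0;                           // average seek time, assumed
        double rotationMs = 0.5 * 60_000.0 / 7200;     // half a rotation at 7200 RPM ≈ 4.17 ms
        double transferMs = 4.0 / (200 * 1024) * 1000; // 4 KB at an assumed 200 MB/s ≈ 0.02 ms

        double randomReadMs = seekMs + rotationMs + transferMs;
        System.out.printf("random 4 KB read     ≈ %.2f ms (seek %.1f + rotation %.2f + transfer %.2f)%n",
                randomReadMs, seekMs, rotationMs, transferMs);
        // Reading the next 4 KB sequentially pays only the transfer cost,
        // which is why avoiding seeks matters so much.
        System.out.printf("sequential 4 KB read ≈ %.2f ms (transfer only)%n", transferMs);
    }
}
```

Under these assumptions a random access costs roughly a thousand times more than reading the same amount of data sequentially from the current head position.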

In other words, if we avoid random reads and writes, or at least reduce how often they happen, we can effectively speed up disk IO. How do we do that?

Sequential read and write

Let's start with the first method: replacing random reads and writes with sequential ones. As mentioned above, seek time is the largest component, so the most intuitive idea is to eliminate it, and sequential IO does exactly that.

Append-only writing is a typical form of sequential IO, and a typical product optimized around this idea is the message queue. Take the popular Kafka as an example: to achieve high-performance IO, Kafka applies many optimizations, and sequential writing is one of them.

Kafka provides message persistence with O(1) time complexity and guarantees constant-time access performance even for TB-scale data and beyond. For each partition, it writes the messages received from producers sequentially to the corresponding log file, and only opens a new file once the current one is full. When consuming, it likewise starts from a given global position, that is, a particular offset in a particular log file, and reads the messages in order.
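To make the idea concrete, here is a minimal append-only segment log sketch. It is my own simplification of the pattern, not Kafka's actual code: writes always go to the end of the active segment, and a new segment is opened once the current one is full.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal append-only segment log: not Kafka's implementation, just the idea.
public class SegmentLog {
    private final Path dir;
    private final long segmentSize;
    private FileChannel active;    // segment currently being written
    private int segmentIndex = 0;

    public SegmentLog(Path dir, long segmentSize) throws IOException {
        this.dir = dir;
        this.segmentSize = segmentSize;
        this.active = openSegment(segmentIndex);
    }

    private FileChannel openSegment(int index) throws IOException {
        // Each segment is opened in append mode, so every write is sequential.
        return FileChannel.open(dir.resolve(String.format("%06d.log", index)),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    public void append(byte[] record) throws IOException {
        // Roll over to a new segment once the current file is "full".
        if (active.size() + record.length > segmentSize) {
            active.close();
            active = openSegment(++segmentIndex);
        }
        active.write(ByteBuffer.wrap(record)); // sequential append, no seek
    }

    public void close() throws IOException {
        active.close();
    }
}
```

Because every write lands at the end of the current file, the head never has to seek between writes, and a reader that remembers a (segment, offset) position can scan forward just as sequentially.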

Reduce random writes

Having looked at sequential writes, let's turn to the second method: reducing the number of random writes. In many scenarios, to make later reads and lookups convenient, the data written to disk needs to be kept in order. For example, in MySQL's InnoDB engine the index is organized as a B+ tree, and the primary key is a clustered index (an index type in which the row data is stored together with the index). Because data and index live together, inserting or updating a row means first finding the position where it belongs and then writing it there, which produces random IO. If we wrote data into the .ibd file on every insert or update, the disk would have to locate the corresponding record each time before modifying it; the IO and lookup costs of that whole process are very high, and the database's performance and efficiency would drop sharply.
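As a contrast to the append-only sketch above, index-ordered writes land at positions scattered across the file, roughly like the following simplification (this only illustrates random IO; it is not InnoDB's actual IO path, and the file name is made up):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Simplified illustration of index-ordered writes: each record lands at the
// position the index dictates, so on a mechanical disk each write may seek.
public class PositionedWrites {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(Path.of("data-file-example"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Positions scattered across the file, as a B+ tree insert order might produce.
            long[] positions = {4096 * 700, 4096 * 3, 4096 * 512, 4096 * 90};
            for (long pos : positions) {
                file.write(ByteBuffer.wrap("row".getBytes()), pos); // positioned (random) write
            }
        }
    }
}
```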

To solve this write-performance problem, InnoDB introduced the WAL (write-ahead logging) mechanism, or more precisely, the redo log. Next, I will briefly introduce the redo log.

The InnoDB redo log is a sequentially written, fixed-size circular log. It serves two main purposes:

  • improving the efficiency with which the InnoDB storage engine writes data
  • guaranteeing crash-safe capability

Here we only care about how it improves write efficiency. The figure below is a schematic diagram of the redo log.

As can be seen from the figure, the redo log is written sequentially: there is no need to locate a specific index position; records are simply appended at the write-pos pointer.
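A toy version of such a fixed-size ring log might look like the sketch below (my own simplification; real redo records, mini-transactions, and group commit are far more involved). write-pos advances as records are appended and wraps around, and it may not overtake the checkpoint:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Toy fixed-size circular (ring) log: append at writePos, wrap at the end,
// and refuse to overwrite records that the checkpoint has not yet released.
public class RingLog {
    private final FileChannel file;
    private final long capacity;
    private long writePos = 0;   // where the next record is appended
    private long checkpoint = 0; // everything before this has reached the data files

    public RingLog(Path path, long capacity) throws IOException {
        this.file = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        this.capacity = capacity;
    }

    /** Append a record; returns false if the ring is full and the checkpoint must advance first. */
    public boolean append(byte[] record) throws IOException {
        if (writePos - checkpoint + record.length > capacity) {
            return false; // log full: dirty pages must be flushed so the checkpoint can move forward
        }
        // Positions advance sequentially within the fixed-size file.
        // (This toy ignores records that straddle the wrap point.)
        file.write(ByteBuffer.wrap(record), writePos % capacity);
        writePos += record.length;
        return true;
    }

    /** Called after dirty pages covering the log up to 'pos' have been written to the data files. */
    public void advanceCheckpoint(long pos) {
        checkpoint = Math.max(checkpoint, Math.min(pos, writePos));
    }
}
```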

Second, when an insert or update transaction executes, InnoDB first fetches the corresponding page into memory and modifies it there. When the transaction commits, the redo log buffer in memory is forcibly flushed to disk. Leaving the binlog aside, the transaction can then be considered successfully committed, while the write to the data files is carried out asynchronously by another thread.

Later, the InnoDB master thread periodically flushes the dirty pages in the buffer pool, that is, the pages we modified above, to disk. Only at that point is the modified data actually written to the .ibd file.
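Putting the pieces together, the write path described above can be sketched roughly as follows (hypothetical class and method names, not InnoDB's internal API): the page is modified only in memory, the redo record is the only thing forced to disk at commit, and dirty pages go to the data file later in the background.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of the write path described above (hypothetical names and
// file layout, not InnoDB's internals).
public class WriteAheadFlow {
    private static final int PAGE_SIZE = 16 * 1024;

    static class Page { final byte[] data = new byte[PAGE_SIZE]; boolean dirty; }

    private final Map<Long, Page> bufferPool = new HashMap<>();
    private final FileChannel redoLog;  // append-only: sequential IO on the commit path
    private final FileChannel dataFile; // page-addressed: random IO, done in the background

    public WriteAheadFlow(Path redoPath, Path dataPath) throws IOException {
        redoLog = FileChannel.open(redoPath,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        dataFile = FileChannel.open(dataPath,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    public void update(long pageNo, int offset, byte value) throws IOException {
        Page page = bufferPool.computeIfAbsent(pageNo, n -> new Page()); // fetch page into the buffer pool
        page.data[offset] = value;                                       // modify it in memory only
        page.dirty = true;
        String redo = "page=" + pageNo + " off=" + offset + " val=" + value + "\n";
        redoLog.write(ByteBuffer.wrap(redo.getBytes()));                 // append redo record (not yet durable)
    }

    public void commit() throws IOException {
        redoLog.force(true); // force the redo log to disk: the only IO the client waits for
    }

    /** What a periodic background thread would do: write dirty pages to the data file. */
    public void flushDirtyPages() throws IOException {
        for (Map.Entry<Long, Page> e : bufferPool.entrySet()) {
            if (e.getValue().dirty) {
                dataFile.write(ByteBuffer.wrap(e.getValue().data), e.getKey() * PAGE_SIZE);
                e.getValue().dirty = false;
            }
        }
    }
}
```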

Summary

In many cases we have to store data on hard disks, yet disks are very "slow" compared with memory, so we have to find ways to improve performance. This article summarized two approaches: the first is append-only writing, which relies on sequential IO to write quickly; the second is the WAL mechanism that many databases adopt to improve write performance. Kafka and MySQL's InnoDB storage engine served as the examples. Beyond these two, interested readers can also look at how LSM trees and etcd achieve fast writes.
