The zero copy discussed in this article is considered in the context of network transmission.
What is zero copy
Zero copy does not mean that no copying happens at all; it means reducing the number of unnecessary copies.
Traditional IO process
Usually, when we need to access data on the hard disk, the user process has to go through the kernel: the user process informs the kernel by invoking system calls such as read() and write(), and the kernel does the corresponding work on its behalf.
read();
The traditional process of reading data:
The copy process before DMA existed, shown in the figure above, goes like this:
- The user process calls the read() system call
- After the CPU receives the read request, it issues the corresponding command to the disk
- The disk prepares the data, places it in the disk controller's buffer, and raises an I/O interrupt to the CPU
- When the CPU receives the interrupt, it pauses its current work and copies the data from the disk buffer into the kernel buffer
- The CPU then copies the data from the kernel buffer into the user buffer
- At this point, the user process can access the data
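From the user process's point of view, the whole sequence above is triggered by a single read() call. A minimal sketch using Python's thin wrappers over the same system calls (the temporary file is an assumption of the sketch, standing in for data already on disk):

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained;
# it stands in for data that already lives on the disk.
tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)
with open(path, "wb") as f:
    f.write(b"hello zero copy")

# read() drives the whole path above:
# disk -> kernel buffer -> user buffer (two copies, two context switches).
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 4096)
os.close(fd)
os.remove(path)
print(data)  # b'hello zero copy'
```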
In the process above, every data copy has to be performed by the CPU. The CPU is a very precious resource: while it is copying data it cannot do anything else, and if the amount of data transferred is large, the CPU spends all its time copying instead of doing other work, which is very costly.
DMA
In essence, DMA is an independent chip on the computer motherboard. When the computer needs to transfer data between memory and an I/O device, the time-consuming I/O work no longer has to be performed by the CPU; it is handled by the DMA controller instead. The process is as follows.
As the figure above shows, the data copy is performed by DMA, so the CPU no longer has to perform the time-consuming I/O operations itself.
The following figure illustrates the file-transfer process more vividly:
The steps are explained as follows:
- The user process calls the system function read()
- After the kernel receives the request, it reads the file from the disk into the kernel buffer; once the data is ready, an I/O interrupt is raised
- After the CPU receives the I/O interrupt signal, it stops its current work and copies the data from the kernel buffer to the user buffer
- The user process then calls the system function write(), and the CPU copies the data from the user buffer into the socket buffer
- Finally, the DMA controller copies the data from the socket buffer to the network card for transmission.
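The four steps above can be sketched as the classic read()/write() loop. In this sketch a second temporary file stands in for the socket (an assumption made so the example is self-contained); the copy pattern, kernel buffer to user buffer to destination buffer, is the same:

```python
import os
import tempfile

src_fd, src = tempfile.mkstemp()
dst_fd0, dst = tempfile.mkstemp()  # stands in for the socket in the diagram
os.close(src_fd)
os.close(dst_fd0)
with open(src, "wb") as f:
    f.write(b"payload" * 100)

in_fd = os.open(src, os.O_RDONLY)
out_fd = os.open(dst, os.O_WRONLY)
while True:
    chunk = os.read(in_fd, 4096)   # CPU copy: kernel buffer -> user buffer
    if not chunk:
        break
    os.write(out_fd, chunk)        # CPU copy: user buffer -> "socket" buffer
os.close(in_fd)
os.close(out_fd)
```

Each iteration costs two system calls, hence four context switches, plus two CPU copies on top of the two DMA copies.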
The traditional I/O data path above leaves plenty of room for performance improvement.
As the figure shows, in the file-transfer case the data is copied into the user buffer, but the user process sends the file onward without processing the data at all. This step is therefore redundant and can be omitted.
Achieving zero copy
Zero copy is achieved mainly by optimizing two things: the number of context switches and the number of data copies. The goal is reached by reducing both.
Implementation method 1: mmap(..) + write(..)
What is mmap
mmap stands for memory-mapped files, a method of mapping files into memory. It maps a file (or another object) into the process's address space, establishing a one-to-one mapping between the file's disk address and a virtual address in the process's virtual address space. Once the mapping is established, the user process can manipulate the file data in memory through a pointer, and the system automatically writes the modified data back to disk, with no need to call read(), write(), or other system calls to manipulate the data.
Implementation process
The mmap() function replaces the read() function: mmap maps the data in the kernel buffer into user space, so user space and the kernel can share the data without copying it between them.
As can be seen in the figure, the data is no longer copied into a separate user buffer:
- After the user process calls the system function mmap(), the DMA controller copies the data from the disk into the kernel buffer, which the user process then shares with the kernel;
- The user process calls the write() function, and the CPU copies the data from the kernel buffer to the socket buffer;
- Finally, DMA copies the data in the socket buffer to the network card for data transmission.
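The three steps above can be sketched with Python's mmap module, which wraps the same mmap() system call. As before, a temporary destination file stands in for the socket (an assumption of the sketch):

```python
import mmap
import os
import tempfile

src_fd, src = tempfile.mkstemp()
dst_fd0, dst = tempfile.mkstemp()  # stands in for the socket
os.close(src_fd)
os.close(dst_fd0)
with open(src, "wb") as f:
    f.write(b"mapped data")

in_fd = os.open(src, os.O_RDONLY)
out_fd = os.open(dst, os.O_WRONLY)

# mmap() maps the kernel buffer (page cache) into our address space,
# so no extra CPU copy into a private user buffer is needed.
mapped = mmap.mmap(in_fd, 0, prot=mmap.PROT_READ)
os.write(out_fd, mapped)  # CPU copy: kernel buffer -> "socket" buffer
mapped.close()
os.close(in_fd)
os.close(out_fd)
```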
mmap eliminates one data copy and improves performance, but there are still four switches between user mode and kernel mode, so it is not the most ideal zero copy.
How to reduce context switching?
The user process has no permission to manipulate the data on the disk directly; only the kernel has that power. The user process therefore hands the task over to the kernel by making system calls such as read() and write().
A system call incurs two context switches: first a switch from user mode to kernel mode to execute the task, then, once the task completes, a switch from kernel mode back to user mode so the user process can continue its logic.
Context switching takes time. Each switch costs from a few nanoseconds to a few microseconds; that sounds very short, but under high concurrency it multiplies.
Therefore, to reduce the number of context switches, we need to reduce the number of system calls.
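A rough way to feel this cost is to time a loop of cheap system calls. The absolute numbers vary widely by machine and kernel, so treat this only as an illustration:

```python
import os
import time

N = 100_000
start = time.perf_counter()
for _ in range(N):
    os.getpid()  # roughly one system call each: user -> kernel -> user
elapsed = time.perf_counter() - start

# Average cost per call, in nanoseconds (machine-dependent).
per_call_ns = elapsed / N * 1e9
print(f"~{per_call_ns:.0f} ns per call")
```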
Implementation method 2: sendfile function
Since version 2.1, Linux has provided a system call for sending files, sendfile(). Its signature is as follows:
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
Parameter description:
out_fd: the destination file descriptor
in_fd: the source file descriptor
offset: the offset within the source file
count: the number of bytes to copy
The return value is the number of bytes actually copied.
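Python exposes the same system call as os.sendfile(), with the same four parameters. On recent Linux kernels out_fd may also be a regular file, so this sketch uses a temporary file in place of the socket (an assumption made so the example is self-contained):

```python
import os
import tempfile

src_fd, src = tempfile.mkstemp()
dst_fd0, dst = tempfile.mkstemp()  # stands in for the socket
os.close(src_fd)
os.close(dst_fd0)
with open(src, "wb") as f:
    f.write(b"sendfile demo")

in_fd = os.open(src, os.O_RDONLY)
out_fd = os.open(dst, os.O_WRONLY)
# One system call replaces the read()/write() pair; the kernel moves
# the data without ever copying it into user space.
sent = os.sendfile(out_fd, in_fd, 0, os.path.getsize(src))
os.close(in_fd)
os.close(out_fd)
print(sent)  # number of bytes actually copied
```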
The sendfile function replaces the read/write pair, so one system call, and with it the overhead of two context switches, is saved. Furthermore, if the network card supports SG-DMA (scatter-gather DMA), even the CPU copy from the kernel buffer to the socket buffer can be eliminated. Whether the card supports scatter-gather can be checked with ethtool:
$ ethtool -k eth0 | grep scatter-gather
scatter-gather: on
- DMA copies the data from the disk into the kernel buffer
- Only the file descriptor and the data length are passed to the socket buffer; the data itself is not copied there
- The network card's SG-DMA controller copies the data straight from the kernel buffer to the network card, completing the transmission
The above process involves only one system call, two context switches, and two DMA data copies; the CPU never has to copy the data, achieving true zero copy.
Comparison of mmap and sendfile
- Both are implemented through API functions provided by the operating system
- mmap memory-maps the file: the user process can read and write the mapped memory, and the changes are ultimately reflected on disk
- sendfile reads the data into the kernel buffer, and the network card then copies it out via its SG-DMA controller
- Zero copy via mmap + write still involves two system calls, four context switches, and three data copies, so it is not true zero copy
- sendfile needs only one system call, two context switches, and two necessary data copies, achieving zero copy in the true sense
- mmap is better suited to optimizing write requests, while sendfile is better suited to optimizing read requests
Kernel buffer (PageCache)
PageCache is the disk's cache in memory. Since locating data on the disk is a very time-consuming operation, part of the disk data is cached in the PageCache, turning disk reads and writes into memory operations and improving read and write efficiency.
The PageCache is much smaller than the disk, so we cannot keep all of the disk's data in it. Which data should be read into memory, and how much should be read?
PageCache uses read-ahead. If we ask for 32 KB of data, the kernel does not load only those 32 KB: it reads in page-sized units and reads ahead, so in addition to the 0-32 KB range it may also fetch the 32-64 KB range. Reading that extra range up front costs very little, and if the process uses it before the pages are evicted from memory, the payoff is large.
So PageCache has two main benefits:
- Cache recently accessed data
- Pre-reading function
To put it bluntly, PageCache exists to improve disk read and write performance.
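The read-ahead behaviour can be nudged from user space with posix_fadvise(). POSIX_FADV_SEQUENTIAL tells the kernel we intend to read the file front to back, so it may read ahead more aggressively into the PageCache (the effect is a hint, not a guarantee):

```python
import os
import tempfile

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)
with open(path, "wb") as f:
    f.write(b"x" * (1 << 20))  # a 1 MiB file

fd = os.open(path, os.O_RDONLY)
# Hint: sequential access ahead, so read-ahead is worthwhile.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
first_chunk = os.read(fd, 32 * 1024)  # later reads are likely cache hits
os.close(fd)
os.remove(path)
```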
Summary
- Zero copy does not mean no copying at all; it means reducing unnecessary copies and avoiding CPU-performed data copies wherever possible
- DMA copying is a good substitute for CPU copying
- sendfile() achieves zero copy in the true sense, requiring only 2 DMA copies, 1 system call, and 2 context switches
Discussion
- PageCache memory is limited. If we read a very large file, the PageCache fills up quickly, and if the large file occupies the PageCache for a long time, other hot data can no longer benefit from it, so disk performance degrades. What should we do?
Answer, in short: in this case use asynchronous I/O plus direct I/O; that is, find a way to bypass the PageCache. Large files should not go through the PageCache. Direct I/O bypasses the PageCache entirely, and because direct I/O reads are blocking, consider pairing them with asynchronous I/O.
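One concrete tool on this path is posix_fadvise() with POSIX_FADV_DONTNEED, which tells the kernel that the pages of a large file may be dropped from the PageCache once we are done with them. (O_DIRECT itself is not shown here, because it requires aligned buffers and is not supported on every filesystem.) A sketch using a temporary file as a stand-in for a "large" file:

```python
import os
import tempfile

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)
with open(path, "wb") as f:
    f.write(b"y" * (1 << 20))  # stand-in for a "large" file

fd = os.open(path, os.O_RDONLY)
total = 0
while True:
    chunk = os.read(fd, 64 * 1024)
    if not chunk:
        break
    total += len(chunk)
# We will not touch these pages again: let the kernel evict them
# so they do not crowd hot data out of the PageCache.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
os.close(fd)
os.remove(path)
```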
- Why does RocketMQ use mmap instead of sendfile?
Everyone is welcome to discuss.
Text/Carpenter
Follow Dewu Technology, and let's walk to the cloud hand in hand.