Author: Liu Kaiyang
A DBA on the Aikesheng delivery service team in Beijing, with a strong interest in databases and related technologies, who enjoys reading and keeping up with new technology.
Source of this article: original contribution
*This original content is produced by the Aikesheng open source community and may not be used without authorization. For reprints, please contact the editor and credit the source.
1. Problem
A few days ago, while I was chatting with a customer, they complained that the tool they used to transfer data between servers was not easy to use, and asked me which transfer tool is the most efficient. I was stumped, since I had never actually compared them, so today I will test it.
A quick search turns up quite a few tools.
Data transfer tools: ftp, sftp, scp, rsync, tftp
2. Preparation
Setting aside other factors such as network bandwidth, let's compare file transfers between Linux servers to see which tool is the fastest and best suited to my scenario.
We compare the speed and ease of use of the tools above for two cases, one large file and many small files:
First prepare the environment: set up SSH mutual trust between two machines, create two folders, and create one large file of about 50G and 51200 small files of 1M each, using a different method for each:
# Create a 50G large file
[root@yang-01 big]# fallocate -l 50G 50g_file
[root@yang-01 big]# ll
total 52428856
-rw-r--r-- 1 root root 53687091200 Apr 10 17:55 50g_file
[root@yang-01 big]# du -sh *
51G 50g_file
# Create 51200 files of 1M each
[root@yang-01 many]# seq 51200 | xargs -i dd if=/dev/zero of=1m_file{} bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00468854 s, 224 MB/s
······
[root@yang-01 many]# ls | wc -l
51200
[root@yang-01 test]# du -sh many
51G many
Tips: Two ways of creating large files are shown here for reference. The generated file is indeed 50G, and the byte count matches (53687091200 bytes). As for why du reports 51G: this is not a matter of converting between 1024 and 1000. du counts the blocks that the file occupies in the file system, which can slightly exceed the size of the file content (here 52428856 KiB allocated versus 52428800 KiB of content), and du -h rounds up to the next whole unit, so anything slightly over 50G is displayed as 51G.
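The rounding described in the tip can be checked with a little arithmetic. The sketch below is only a rough model of how `du -h` rounds up for sizes of this magnitude, not its actual implementation, and `ceil_gib` is a hypothetical helper name introduced for illustration:

```shell
#!/bin/sh
# Round a size given in KiB up to whole GiB, roughly modelling
# the way `du -h` rounds up instead of rounding to nearest.
# ceil_gib is a hypothetical helper, not part of any tool.
ceil_gib() {
    kib=$1
    kib_per_gib=$((1024 * 1024))
    echo $(( (kib + kib_per_gib - 1) / kib_per_gib ))
}

# 50G of file content is exactly 52428800 KiB.
ceil_gib 52428800
# The listing above reports 52428856 KiB allocated, 56 KiB more.
ceil_gib 52428856
```

With GNU coreutils, `du --apparent-size -sh 50g_file` reports the size of the file content rather than the allocated blocks, which is a quick way to see the two figures side by side.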
2.1. Test SCP
The first test is scp, the remote transfer tool I use most often. It is usually installed on Linux systems by default.
####### 1 * 50G file test
[root@yang-02 big]# scp /opt/test/big/50g_file root@yang-01:/opt/test/re/
50g_file 100% 50GB 135.5MB/s 06:17
####### 51200 * 1M file test
[root@yang-02 many]# time scp /opt/test/many/1m_file* root@yang-01:/opt/test/re/
1m_file1 100% 1024KB 22.5MB/s 00:00
1m_file10 100% 1024KB 35.8MB/s 00:00
1m_file100 100% 1024KB 14.8MB/s 00:00
1m_file1000 100% 1024KB 32.9MB/s 00:00
1m_file10000 100% 1024KB 35.7MB/s 00:00
······
1m_file9998 100% 1024KB 113.4MB/s 00:00
1m_file9999 100% 1024KB 96.5MB/s 00:00
real 20m43.875s
user 4m2.448s
sys 2m52.604s
[root@yang-01 re]# ls | wc -l
51200
Notes:
- scp works like the cp command; because files are encrypted in transit between machines, it is somewhat slower than a plain copy;
- If the ssh protocol cannot be used in the production environment, nc can be used for file transfer instead;
- Advantages: the tool uses few system resources, has little impact on the machine, and is easy to use.
2.2. Test FTP
Let's test ftp and see how it works.
####### 1 * 50G file test
[root@yang-01 re]# ftp yang-02
Connected to yang-02 (192.168.88.72).
220 (vsFTPd 3.0.2)
Name (yang-02:root): root
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> get /opt/test/big/50g_file /opt/test/re/50g_file
local: /opt/test/re/50g_file remote: /opt/test/big/50g_file
227 Entering Passive Mode (192,168,88,72,38,232).
150 Opening BINARY mode data connection for /opt/test/big/50g_file (53687091200 bytes).
226 Transfer complete.
53687091200 bytes received in 150 secs (359091.49 Kbytes/sec)
ftp> quit
221 Goodbye.
####### 51200 * 1M file test
[root@yang-01 re]# time ftp yang-02
Connected to yang-02 (192.168.88.72).
220 (vsFTPd 3.0.2)
Name (yang-02:root):
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> prompt off
Interactive mode off.
ftp> cd /opt/test/many
250 Directory successfully changed.
ftp> mget * .*
local: 1m_file1 remote: 1m_file1
227 Entering Passive Mode (192,168,88,72,156,228).
150 Opening BINARY mode data connection for 1m_file1 (1048576 bytes).
226 Transfer complete.
······
1048576 bytes received in 0.00337 secs (311057.86 Kbytes/sec)
local: . remote: .
227 Entering Passive Mode (192,168,88,72,213,4).
550 Failed to open file.
Warning: embedded .. in .. (changing to !!)
local: !! remote: !!
227 Entering Passive Mode (192,168,88,72,223,91).
550 Failed to open file.
ftp> quit
221 Goodbye.
real 14m32.032s
user 0m12.857s
sys 3m21.131s
[root@yang-01 re]# ls | wc -l
51200
Notes:
- FTP runs over TCP; the ftp client sends commands to the server to download or upload files or to change directories;
- It is suitable for collecting files on an intranet and for access to public files;
- Because FTP sends credentials and data unencrypted, it is generally avoided for occasional file transfers in order to keep the hosts secure.
2.3. Test SFTP
sftp offers ftp-like file operations, but it is a separate protocol that runs over an encrypted ssh connection. Let's see how its transfer speed differs from ftp:
####### 1 * 50G file test
[root@yang-01 re]# sftp root@yang-02
Connected to yang-02.
sftp> get ./big/50g_file /opt/test/re/50g_file
Fetching /./big/50g_file to /opt/test/re/50g_file
/./big/50g_file 100% 50GB 128.7MB/s 06:37
sftp> quit
####### 51200 * 1M file test
[root@yang-01 re]# time sftp root@yang-02
Connected to yang-02.
sftp> get ./many/1m_file* /opt/test/re/
Fetching /./many/1m_file1 to /opt/test/re/1m_file1
/./many/1m_file1 100% 1024KB 77.3MB/s 00:00
······
Fetching /./many/1m_file9999 to /opt/test/re/1m_file9999
/./many/1m_file9999 100% 1024KB 118.0MB/s 00:00
sftp>
sftp> quit
real 19m43.154s
user 4m52.309s
sys 4m47.476s
[root@yang-01 re]# ls | wc -l
51200
Notes:
- Compared with ftp, sftp encrypts the transfer over ssh, which raises the security level but lowers the transfer rate: for the 50G file, sftp was roughly 60 to 65% slower (128.7 MB/s versus about 350 MB/s for ftp);
- Both sftp and ftp need a service on the server side, which makes them more cumbersome to use; the transfer rate of sftp is similar to that of scp.
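The comparison between ftp and sftp can be reproduced from the figures in the transcripts. The following is a minimal sketch that assumes awk is available; `rate_mib_s` and `pct_slower` are hypothetical helper names, and the inputs are the 53687091200 bytes, the 150 s reported by ftp, and the 06:37 (397 s) shown by sftp:

```shell
#!/bin/sh
# Average rate in MiB/s for a transfer of <bytes> that took <seconds>.
rate_mib_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1048576 }'
}

# Percentage by which the slower run's average rate falls below the
# faster run's rate, for the same amount of data.
# Arguments are the elapsed seconds of the fast and the slow run.
pct_slower() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.0f\n", (1 - fast / slow) * 100 }'
}

rate_mib_s 53687091200 150   # ftp, 50G file
rate_mib_s 53687091200 397   # sftp, 50G file
pct_slower 150 397           # how much slower sftp was than ftp
```

Working from elapsed time rather than from each tool's own progress meter keeps the comparison consistent, since the tools do not all report rates in the same way.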
2.4. Test RSYNC
Next, the tests of the rsync tool:
####### 1 * 50G file test
[root@yang-02 big]# time rsync -av ./50g_file root@yang-01:/opt/test/re/50g_file
sending incremental file list
50g_file
sent 53,700,198,488 bytes received 35 bytes 107,940,097.53 bytes/sec
total size is 53,687,091,200 speedup is 1.00
real 8m17.039s
user 5m36.160s
sys 2m41.196s
####### 51200 * 1M file test
[root@yang-02 many]# time rsync -av ./1m_file* root@yang-01:/opt/test/re/
sending incremental file list
1m_file1
1m_file10
1m_file100
······
1m_file9998
1m_file9999
sent 53,702,886,375 bytes received 972,872 bytes 58,278,740.37 bytes/sec
total size is 53,687,091,200 speedup is 1.00
real 15m21.548s
user 5m46.497s
sys 2m38.581s
Notes:
- In the small-file test rsync was faster than scp (15m21s versus 20m43s), although it was slower for the single 50G file (8m17s versus 6m17s); it is installed by default on CentOS;
- rsync can use less bandwidth because it can compress data on the sending side and decompress it on the receiving side (this requires the -z option, which was not used in this test);
- Advantages: rsync only synchronizes files that have changed. Unchanged files are not overwritten, so rsync is well suited to incremental synchronization (not demonstrated here because it does not fit this test scenario);
- When a large number of files are transferred, rsync may cause high disk I/O, which can affect a database on the same file system.
2.5. Test TFTP
As with sftp, let's measure the speed of tftp:
####### 1 * 50G file test
[root@yang-01 re]# time tftp yang-02
tftp> get ./big/50g_file
real 10m30.114s
user 0m6.029s
sys 1m16.888s
[root@yang-01 re]# ll
total 1805832
-rw-r--r-- 1 root root 1849168384 Apr 11 17:54 50g_file
[root@yang-01 re]# du -sh *
1.8G 50g_file
####### 1 * 1G file test
[root@yang-01 re]# time tftp yang-02
tftp> get ./many/1g_file1
tftp> quit
real 5m54.090s
user 0m29.190s
sys 2m58.866s
[root@yang-01 re]# ll
total 1048576
-rw-r--r-- 1 root root 1073741824 Apr 11 18:09 1g_file1
Notes:
- tftp transfers data over UDP and also requires a service to be configured, which makes it cumbersome to use;
- The 50G transfer failed: it was interrupted by a timeout after only 1.8G had been received, and even that took about 10m30s;
- Transferring a 1G file took 354s, so 50G would take an estimated 50 times as long (about 17700s, nearly 5 hours); testing of this tool was therefore abandoned.
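The 50G estimate for tftp is a simple linear extrapolation from the 1G measurement. As a sketch, assuming the transfer time scales linearly with size (which a real tftp transfer may not do), with `estimate_time` being a hypothetical helper name:

```shell
#!/bin/sh
# Scale a measured transfer time linearly to a larger size and print it
# as seconds plus whole hours and minutes.
# Arguments: measured seconds, measured size in GB, target size in GB.
estimate_time() {
    measured_s=$1
    measured_gb=$2
    target_gb=$3
    total=$(( measured_s * target_gb / measured_gb ))
    echo "${total}s ($(( total / 3600 ))h$(( total % 3600 / 60 ))m)"
}

# 1G took 354s in the test above; estimate 50G.
estimate_time 354 1 50
```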
2.6. Supplementary NC
As a supplement, the nc tool is worth knowing for scenarios where none of the tools above can be used to transfer data. Let's measure its speed:
####### 1 * 50G file test
[root@yang-02 big]# nc 192.168.88.71 10086 < /opt/test/big/50g_file
[root@yang-01 re]# time nc -l 10086 > 50G_file
real 2m30.663s
user 0m9.232s
sys 2m16.370s
####### 51200 * 1M file test
[root@yang-01 many]# tar cfz - *|nc 192.168.88.71 10086
[root@yang-01 re]# time nc -l 10086|tar xfvz -
1m_file1
1m_file10
1m_file100
···
1m_file9997
1m_file9998
1m_file9999
real 11m38.400s
user 3m47.051s
sys 2m33.923s
Notes:
- nc is very powerful and can listen on or scan any TCP/UDP port;
- Advantages: it is much faster than scp, because the data is sent as a raw, unencrypted stream with almost no protocol overhead on top of the network connection;
- Transferring files across machines is only one of its functions; the others are left for you to explore. Reportedly it can also be used to measure network speed.
2.7. Supplementary python tools
If there are scenarios that even nc cannot handle, we can try Python's SimpleHTTPServer module:
####### 1 * 50G file test
[root@yang-02 big]# python -m SimpleHTTPServer 10086
Serving HTTP on 0.0.0.0 port 10086 ...
192.168.88.71 - - [13/Apr/2022 16:02:15] "GET /50g_file HTTP/1.1" 200 -
[root@yang-01 re]# wget http://192.168.88.72:10086/50g_file
--2022-04-13 16:02:15-- http://192.168.88.72:10086/50g_file
Connecting to 192.168.88.72:10086... connected.
HTTP request sent, awaiting response... 200 OK
Length: 53687091200 (50G) [application/octet-stream]
Saving to: ‘50g_file’
100%[==================================================================================>] 53,687,091,200 358MB/s in 2m 35s
2022-04-13 16:04:50 (330 MB/s) - ‘50g_file’ saved [53687091200/53687091200]
####### 51200 * 1M file test
[root@yang-02 many]# python -m SimpleHTTPServer 10086
Serving HTTP on 0.0.0.0 port 10086 ...
192.168.88.71 - - [13/Apr/2022 19:46:02] "GET /1m_file1 HTTP/1.1" 200 -
192.168.88.71 - - [13/Apr/2022 19:46:02] "GET /1m_file2 HTTP/1.1" 200 -
······
192.168.88.71 - - [13/Apr/2022 19:55:21] "GET /1m_file51200 HTTP/1.1" 200 -
[root@yang-01 re]# cat liu.sh
#!/bin/bash
for ((i=1;i<=51200;i++))
do
echo "http://192.168.88.72:10086/1m_file$i"
done
[root@yang-01 re]# bash liu.sh > liu.list
[root@yang-01 re]# cat liu.list | wc -l
51200
[root@yang-01 re]# wget -i liu.list
--2022-04-13 19:46:02-- http://192.168.88.72:10086/1m_file1
Connecting to 192.168.88.72:10086... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576 (1.0M) [application/octet-stream]
Saving to: ‘1m_file1’
100%[=====================================================================================>] 1,048,576 --.-K/s in 0.05s
2022-04-13 19:46:02 (19.9 MB/s) - ‘1m_file1’ saved [1048576/1048576]
·······
100%[=====================================================================================>] 1,048,576 --.-K/s in 0.08s
2022-04-13 19:55:21 (11.9 MB/s) - ‘1m_file51200’ saved [1048576/1048576]
FINISHED --2022-04-13 19:55:21--
Total wall clock time: 9m 19s
Downloaded: 51200 files, 50G in 8m 11s (104 MB/s)
Notes:
- A lightweight HTTP web server can be started with Python 2's SimpleHTTPServer module or Python 3's http.server module;
- Almost all Linux distributions ship with Python, so this tool is also convenient to use;
- Multi-file transfers are usually somewhat troublesome: the file names may be irregular, in which case each file has to be handled individually.
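When file names do not follow a numeric pattern like the one in liu.sh, the URL list can be built from a directory listing instead. The following is a minimal sketch that assumes the script runs on the machine that holds the files and that the names contain no characters that need URL encoding (such as spaces); `make_url_list` is a hypothetical helper name:

```shell
#!/bin/sh
# Print one download URL per regular file in <dir>, prefixed with <base_url>.
# Subdirectories are skipped. Assumes names need no URL encoding.
make_url_list() {
    base_url=$1
    dir=$2
    for path in "$dir"/*; do
        [ -f "$path" ] || continue
        name=${path##*/}          # strip the directory part
        printf '%s/%s\n' "$base_url" "$name"
    done
}
```

For example, `make_url_list http://192.168.88.72:10086 /opt/test/many > liu.list` on the serving host would produce a list that the client can fetch and then pass to `wget -i liu.list`.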
3. Summary
- Large file transfer rate: FTP > NC > python tools > SCP > SFTP > RSYNC.
- Small file transfer rate: python tools > NC > FTP > RSYNC > SFTP > SCP.
- Different tools suit different scenarios. RSYNC is very fast for incremental synchronization or regular archiving; FTP is troublesome to set up and deploy, is suitable for file collection on an intranet and for public file retrieval, and offers low security.
- If SSH connections or port 22 are disabled because of production security restrictions, a connection based on another protocol can be used instead; the nc tool is recommended.
- For the same total volume, many small files force each tool to process per-file information over and over, which raises the CPU load and the number of I/O operations and creates an I/O bottleneck. A single large file therefore has a speed advantage over many small files of the same total size.
- When transferring a large number of files, you can split them into batches by file-name pattern and run several transfer processes at once, so that the source server transfers concurrently and makes better use of the network bandwidth.
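The batching idea in the last point can be sketched with xargs. This was not part of the tests above; it assumes an xargs that supports the `-0`, `-P`, and `-n` options (GNU findutils does) and file names without embedded newlines. `run_in_batches` is a hypothetical helper name:

```shell
#!/bin/sh
# Read file names (one per line) on stdin and run "<cmd...> <files...>"
# in up to <njobs> parallel processes, <batch> files per invocation.
run_in_batches() {
    njobs=$1
    batch=$2
    shift 2
    # Convert newlines to NULs so names with spaces survive xargs parsing.
    tr '\n' '\0' | xargs -0 -P "$njobs" -n "$batch" "$@"
}
```

Because xargs appends the file names at the end of the command, a tool such as scp that expects the destination last needs a small wrapper, for example `cd /opt/test/many && ls | run_in_batches 4 500 sh -c 'scp "$@" root@yang-01:/opt/test/re/' _`. The job count and batch size here are illustrative values, not tuned recommendations.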
4. Remarks
- Due to operational limitations, the time spent on interactive login is ignored in the tests above.
- The tests were run in a virtual machine environment, so variables such as network bandwidth, throughput, and disk performance could not be fully controlled; the timings given in this article are for reference only.
- Usage instructions and further options for each tool are not covered here. There is plenty of material online, and readers can look it up as needed.
Easter egg: I found that every tool transferred 50 files of 1G faster than one 50G file. So, in the same environment, is there some combination of file size and file count, for the same 50G total, at which the transfer is fastest? Interested readers can investigate further.