Hello everyone, this is Liang Xu.
Creating, deleting, and modifying files are very common operations on Linux systems. Everyone knows that deleting a single file with the rm command is almost instant. But if the number of files is large, the delete operation can take a long time to complete.
Have you ever thought about how long it takes to delete 500,000 small files?
The purpose of this article is to find the fastest way to delete a huge number of files in Linux. Testing shows that the rm command is really weak at this!
We will start with some simple file deletion methods, and then compare how quickly each one completes the task to see which is fastest.
1. Several ways to delete files
To delete files in a Linux system, the most commonly used command is rm. I believe everyone is familiar with it. Let's briefly review some examples of the rm command:
$ rm -f testfile
The -f option in the above command forces the deletion without asking for confirmation.
$ rm -rf testdirectory
This command deletes testdirectory and all its contents (the -r option deletes files recursively).
To delete a directory there is another command, rmdir, but it only deletes the directory when it is empty.
$ rmdir testdirectory
Now let's look at some other ways to delete files in Linux.
One of my favorite methods is to use the find command to locate files and then delete them. find is a very convenient tool that can search for files by type, size, modification date, and many other conditions.
Let's look at a find command that uses -exec to call the rm command.
$ find /test -type f -exec rm {} \;
The above command deletes all files in the /test directory. First, the find command finds all regular files under the directory, and then it executes the rm command once for each result.
Let's look at some other tests that can be combined with the find command to delete files.
$ find /test -mtime +7 -exec rm {} \;
In the above example, the find command searches the /test directory for files that were last modified more than 7 days ago, and then deletes each one.
$ find /test -size +7M -exec rm {} \;
In the above example, /test is searched for files larger than 7M, and then they are deleted.
In all of the find examples listed so far, the rm command is called once for each file found. For example, in the last find command above, if 50 files larger than 7M are found, the rm command is called 50 times to delete them. Starting a new process for every file is what makes this approach slow.
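The per-file cost described above can be observed at a small scale. The following is a minimal sketch, assuming a scratch directory created with mktemp and three illustrative file names (none of this is part of the tests in this article). It substitutes echo for rm so that nothing is deleted, and counts how many times the command is started. It also shows the `-exec ... {} +` form of find, which passes many file names to a single invocation; that form is not benchmarked here.

```shell
# Scratch directory with three sample files (names are illustrative).
scratch=$(mktemp -d)
touch "$scratch/a.txt" "$scratch/b.txt" "$scratch/c.txt"

# With "\;" the command runs once per file, so echo prints one line per file.
per_file=$(find "$scratch" -type f -exec echo {} \; | wc -l | tr -d ' ')

# With "+" find appends many file names to a single command line,
# so for this small set echo runs only once and prints a single line.
batched=$(find "$scratch" -type f -exec echo {} + | wc -l | tr -d ' ')

echo "commands started, one per file: $per_file"
echo "commands started, batched:      $batched"

rm -rf "$scratch"
```

Replacing echo with rm in either form performs the actual deletion.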
Besides using -exec to call the rm command, find offers a better option: -delete. For example:
$ find /test -size +7M -delete
The effect achieved is the same as the previous command.
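Because -delete removes files immediately, it can help to preview the matches first by running the same expression with -print. A minimal sketch, assuming a hypothetical scratch directory that holds one small file and one 8 MiB file:

```shell
demo=$(mktemp -d)
echo small > "$demo/small.txt"
# Create an 8 MiB file so that it matches -size +7M.
dd if=/dev/zero of="$demo/big.bin" bs=1024 count=8192 2>/dev/null

# Dry run: list what would be deleted.
find "$demo" -type f -size +7M -print

# Same expression, with -delete in place of -print.
find "$demo" -type f -size +7M -delete

# Only the small file should remain.
remaining=$(ls "$demo")
echo "left after delete: $remaining"
rm -rf "$demo"
```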
2. What is the fastest command to use when deleting a huge number of files?
Without further ado, let's go straight to the test.
First create 500,000 files with a simple bash for loop.
$ for i in $(seq 1 500000); do echo testing >> $i.txt; done
In the above command, 500,000 txt files are created in the current working directory, with names ranging from 1.txt to 500000.txt. Each file contains the single line testing, so it holds only 8 bytes of data, although on many filesystems it still occupies a full block (often 4 KB) on disk.
After creating the 500,000 files, we will try to delete them in several ways to see which method is fastest for a huge number of files.
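Before running the test at full size, the loop can be tried at a smaller scale. The sketch below assumes a scratch directory from mktemp and 1,000 files instead of 500,000 (both choices are mine, not the article's test setup), then counts the files and checks the byte size of one of them.

```shell
workdir=$(mktemp -d)

# Same loop as above, scaled down to 1,000 files and run in a subshell
# so the current directory of the calling shell is unchanged.
(
  cd "$workdir" || exit 1
  for i in $(seq 1 1000); do echo testing >> "$i.txt"; done
)

# Count the files and report the size of one of them in bytes.
file_count=$(find "$workdir" -type f -name '*.txt' | wc -l | tr -d ' ')
bytes=$(wc -c < "$workdir/1.txt" | tr -d ' ')
echo "created $file_count files, each $bytes bytes"

rm -rf "$workdir"
```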
Round 1: rm command
First, let's use the simple rm command, and time it with the time command.
$ time rm -f *
-bash: /bin/rm: Argument list too long
real 0m11.126s
user 0m9.673s
sys 0m1.278s
We can see that the rm command failed with Argument list too long. The shell expands * into 500,000 file names, which exceeds the system limit on the size of a command's argument list, so rm was never started and nothing was deleted. It just lay flat and went on strike.
Do not pay attention to the time displayed by the time command here, because the rm command did not complete its operation. The time command only reports how long the command line took to run, and does not care about the final result.
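The limit that rm ran into can be inspected with getconf ARG_MAX, which reports the maximum combined size in bytes of the arguments and environment passed to a new process. The sketch below, run on a hypothetical scratch directory with 1,000 files, also shows one common way to stay under that limit: find streams the names to xargs, which splits them into batches. This xargs variant was not part of the timed tests in this article.

```shell
# Maximum size, in bytes, of arguments plus environment for a new process.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"

# Scratch directory with 1,000 empty files (illustrative scale).
bulk=$(mktemp -d)
( cd "$bulk" && for i in $(seq 1 1000); do : > "$i.txt"; done )

# find prints NUL-terminated names; xargs -0 reads them and calls rm
# with as many names per invocation as fit under the limit.
find "$bulk" -maxdepth 1 -type f -print0 | xargs -0 rm -f

left=$(find "$bulk" -type f | wc -l | tr -d ' ')
echo "files left: $left"
rmdir "$bulk"
```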
Round 2: find command using the -exec parameter
Next, let's use the find command with the -exec parameter we saw earlier.
$ time find ./ -type f -exec rm {} \;
real 14m51.735s
user 2m24.330s
sys 9m48.743s
From the output of the time command, we can see that it takes 14 minutes and 51 seconds to delete 500,000 files from a single directory. This is quite a long time, because a separate rm command is executed for each file until all files are deleted.
Round 3: find command with -delete parameter
Now let's measure the elapsed time when using the -delete option of the find command.
$ time find ./ -type f -delete
real 5m11.937s
user 0m1.259s
sys 0m28.441s
The deletion speed improved greatly: it took only 5 minutes and 11 seconds, roughly a third of the time needed with -exec! This is a big improvement when you delete hundreds of thousands of files in Linux.
Round 4: Perl language
Now let's look at how deleting files using the Perl language works, and how fast it compares to other deletion methods we have seen before.
$ time perl -e 'for(<*>){((stat)[9]<(unlink))}'
real 1m0.488s
user 0m7.023s
sys 0m27.403s
As can be seen from the results, Perl took only about 1 minute to delete 500,000 files in the directory. Compared with the find and rm commands we have seen before, this is very fast!
However, if you want to use more complex options with Perl, you need some understanding of Perl syntax, including its regular expressions.
Round 5: rsync command
There is also a less-used and little-known method for deleting a large number of files in a folder: our famous tool rsync. Its basic usage is to transfer and synchronize files between two locations, local or remote.
Now let's take a look at how to use the rsync command to delete all files in a folder. It's actually very simple: we empty the target directory that holds a large number of files by synchronizing it with an empty directory.
In our example, the test directory (target directory) has 500,000 files, and we create an empty directory (source directory) named blanktest.
Now we use the --delete option of the rsync command, which deletes from the target directory all files that do not exist in the source directory.
$ time rsync -a --delete blanktest/ test/
real 2m52.502s
user 0m2.772s
sys 0m32.649s
As you can see, it only took 2 minutes and 52 seconds to complete the deletion.
Therefore, if you want to empty a directory containing a huge number of files, rsync is a better choice than the find command.
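The same technique can be tried safely at a small scale. The sketch below assumes a scratch directory from mktemp that holds both blanktest and test, with only 100 files in test (the layout and scale are illustrative). It skips the sync if rsync is not installed.

```shell
base=$(mktemp -d)
mkdir "$base/blanktest" "$base/test"
( cd "$base/test" && for i in $(seq 1 100); do echo testing > "$i.txt"; done )

if command -v rsync >/dev/null 2>&1; then
  # The trailing slashes make rsync sync the directories' contents;
  # --delete removes anything in test/ that is absent from blanktest/.
  rsync -a --delete "$base/blanktest/" "$base/test/"
  rsync_left=$(find "$base/test" -type f | wc -l | tr -d ' ')
  echo "files left in test/: $rsync_left"
else
  echo "rsync is not installed; skipping"
fi
rm -rf "$base"
```

Note that the test directory itself remains after the sync; only its contents are removed.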
3. Summary
The following table summarizes the speed of deleting 500,000 files in different ways in Linux for your reference.
Command | Time taken |
---|---|
rm command | Failed (Argument list too long) |
find command with -exec parameter | 14 minutes 51 seconds |
find command with -delete parameter | 5 minutes 11 seconds |
Perl | 1 minute |
rsync command | 2 minutes 52 seconds |
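To compare the results at a glance, the real times reported by the time command can be converted to seconds and expressed relative to the fastest run. The sketch below only does arithmetic on the numbers already measured in this article; the helper name to_seconds is my own.

```shell
# Convert a "real" time such as 14m51.735s into seconds.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

# The Perl run was the fastest, so use it as the baseline.
fastest=$(to_seconds 1m0.488s)

for entry in "find -exec:14m51.735s" "find -delete:5m11.937s" \
             "perl:1m0.488s" "rsync:2m52.502s"; do
  name=${entry%%:*}
  secs=$(to_seconds "${entry##*:}")
  awk -v n="$name" -v s="$secs" -v f="$fastest" \
    'BEGIN { printf "%-13s %8.1f s  (%.1fx the Perl time)\n", n, s, s / f }'
done
```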