Hello everyone, this is Liang Xu.

Creating, deleting and modifying files are very common operations performed by users in Linux systems. Everyone knows that when you use the rm command to delete a single file in a Linux system, it is done almost instantly. But if the number of files is large, the delete operation will take a long time to complete.

Have you ever thought about how long it takes to delete 500,000 small files?

The purpose of this article is to find the fastest way to delete a huge number of files in Linux. Testing shows that the rm command is really weak!

We will start with some simple file deletion methods, then compare how quickly each method completes the task to see which one deletes fastest.

1. Several ways to delete files

To delete files in a Linux system, the most commonly used command is the rm command. I believe everyone is familiar with it. Let's briefly review some examples of the rm command.

$ rm -f testfile

The -f option in the above command forces the deletion: rm does not ask for confirmation, and it does not complain if the file does not exist.

$ rm -rf testdirectory

This command will delete testdirectory and all its contents (the -r option used is to delete files recursively).

To delete a directory there is another command, rmdir, but it only deletes the directory when the directory is empty.

$ rmdir testdirectory
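
To try these commands without risking real data, here is a minimal sketch that works inside a throwaway directory created with mktemp. The names emptydir, a.txt, b.txt and no-such-file are placeholders chosen for illustration, not part of the original examples.

```shell
#!/usr/bin/env bash
# Practice rm and rmdir inside a scratch directory so nothing important is touched.
demo=$(mktemp -d)            # scratch directory (its path is chosen by mktemp)
cd "$demo" || exit 1

touch testfile
mkdir -p testdirectory/sub emptydir
touch testdirectory/a.txt testdirectory/sub/b.txt

rm -f testfile               # remove one file without prompting
rm -f no-such-file           # -f also suppresses the error for a missing file

rmdir emptydir               # succeeds: the directory is empty
if rmdir testdirectory 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"   # rmdir will not remove a non-empty directory
fi

rm -rf testdirectory         # -r removes the directory and everything below it
remaining=$(ls -A | wc -l)   # number of entries left in the scratch directory
echo "rmdir on non-empty directory: $rmdir_result; entries left: $remaining"
```

Because rm -rf never asks for confirmation, practicing in a scratch directory like this is a cheap habit before running it on a real path.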

Now we look at some other different ways to delete files in Linux.

One of my favorite methods is to use the find command to locate the files and then delete them. The find command is a very convenient tool that can search for files by type, size, creation date, modification date, and many other conditions.

Let's look at a find command that uses -exec to call the rm command.

$ find /test -type f -exec rm {} \;

The above command deletes all files under the /test directory. First, the find command locates every regular file in the directory and its subdirectories, and then it executes the rm command for each search result.

Let's look at some different methods that can be used with the find command to delete files.

$ find /test -mtime +7 -exec rm {} \;

In the above example, the find command searches the /test directory for files that were last modified more than 7 days ago, and then deletes each one.

$ find /test -size +7M -exec rm {} \;

In the above example, the /test directory is searched for files larger than 7 MB, and each file found is deleted.
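
Before attaching rm to a find expression, it can help to run the same tests with -print and read the list of matches. The sketch below assumes GNU touch and dd (for the -d and bs=1M arguments); the file names new.log, old.log and big.bin are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Preview what find would match before attaching -exec rm or -delete.
dir=$(mktemp -d)
touch "$dir/new.log"
touch -d '10 days ago' "$dir/old.log"      # GNU touch: back-date the modification time
dd if=/dev/zero of="$dir/big.bin" bs=1M count=8 2>/dev/null   # write an 8 MiB file

old_matches=$(find "$dir" -type f -mtime +7 -print)   # modified more than 7 days ago
big_matches=$(find "$dir" -type f -size +7M -print)   # larger than 7 MiB

echo "older than 7 days: $old_matches"
echo "larger than 7M:    $big_matches"
rm -rf "$dir"
```

Only when the printed list looks right would you replace -print with the deletion action.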

In all of the find examples listed so far, the rm command is called once for each file found. For example, in the last find command above, if 50 files larger than 7M are found, the rm command is called 50 times to delete them. Such an operation takes longer.

Besides calling the rm command through the -exec parameter of find, there is a better choice: the -delete option. For example:
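
The per-file cost can be made visible by counting how often find starts the command. The sketch below is not from the original tests: it substitutes a harmless "echo run" for rm, and it also shows the standard "{} +" terminator of -exec, which passes many file names to one command instead of one name per command.

```shell
#!/usr/bin/env bash
# Count how many times find starts the command given to -exec.
# "echo run" stands in for rm, so nothing is deleted during the comparison.
dir=$(mktemp -d)
for i in $(seq 1 50); do : > "$dir/$i.txt"; done    # 50 empty sample files

# "\;" form: the command is started once for every file found.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# "+" form: find appends as many file names as fit onto a single command line.
batched=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l)

echo "invocations with \\; : $per_file"
echo "invocations with +  : $batched"
rm -rf "$dir"
```

With rm in place of the echo, the "+" form would look like find /test -size +7M -exec rm {} + and would start far fewer processes than the "\;" form.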

$ find /test -size +7M -delete

The effect achieved is the same as the previous command.

2. What is the fastest command for deleting a huge number of files?

Not much to say, we go directly to the test.

First create 500,000 files with a simple bash for loop.

$ for i in $(seq 1 500000); do echo testing >> $i.txt; done

In the above command, 500,000 txt files are created in the current working directory, with names ranging from 1.txt to 500000.txt. Each file contains only the word testing, so each one holds just 8 bytes of data, although on many filesystems it still occupies a full block (commonly 4 KB) on disk.

After creating the 500,000 files, we will try to delete them in several ways to see which method is the fastest at deleting a huge number of files.
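
For a quick dry run on your own machine, a scaled-down variant of the loop may be more practical. This is only a sketch: the count of 1,000 files and the use of a mktemp scratch directory are assumptions made here, not the setup used for the timings in this article.

```shell
#!/usr/bin/env bash
# Create a scaled-down test set in a scratch directory instead of the current one.
count=1000                      # assumption: 1,000 files rather than 500,000
dir=$(mktemp -d)

for ((i = 1; i <= count; i++)); do
    echo testing > "$dir/$i.txt"    # each file holds the 8 bytes "testing\n"
done

created=$(find "$dir" -type f | wc -l)
bytes=$(wc -c < "$dir/1.txt")
echo "created $created files of $bytes bytes each in $dir"
```

The scratch directory is left in place so that a deletion method can be tried on it; remove it with rm -rf when finished.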

Round 1: rm command

First, let us use the simple rm command, together with the time command to measure how long it takes.

$ time rm -f *
-bash: /bin/rm: Argument list too long
real    0m11.126s
user    0m9.673s
sys     0m1.278s

We can see that the result of the rm command is Argument list too long, which means nothing was deleted. The shell expands * into 500,000 file names, and that argument list is larger than the system allows for a single command, so rm never even started. It just lay flat and went on strike.

Do not pay attention to the time displayed by the time command here, because the rm command did not complete its operation. The time command only reports how long your command ran and does not care about its final result.
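
The limit behind this error can be inspected with getconf, and one common workaround is to let xargs split the names into several rm commands. The sketch below is an illustration at a smaller scale (2,000 files in a scratch directory), not one of the timed rounds, and it assumes an xargs that supports the -0 option, as the GNU and BSD versions do.

```shell
#!/usr/bin/env bash
# Show the limit behind "Argument list too long" and one way around it.
getconf ARG_MAX                 # max bytes of arguments plus environment for one command

dir=$(mktemp -d)
cd "$dir" || exit 1
for ((i = 1; i <= 2000; i++)); do : > "$i.txt"; done   # scaled-down sample set

# printf is a shell builtin, so expanding * here never starts an external program.
# xargs -0 then splits the NUL-separated names into rm command lines that fit the limit.
printf '%s\0' * | xargs -0 rm -f --

remaining=$(ls -A | wc -l)
echo "files left: $remaining"
cd / && rmdir "$dir"
```

The -- after rm -f marks the end of options, so a file name that begins with a dash is not mistaken for an option.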

Round 2: find command using the -exec parameter

Next, let's use the find command with the -exec parameter that we saw earlier.

$ time find ./ -type f -exec rm {} \;
real    14m51.735s
user    2m24.330s
sys     9m48.743s

From the output we got using the time command, we can see that it takes 14 minutes and 51 seconds to delete 500,000 files from a single directory. This is quite a long time, because for each file, a separate rm command will be executed until all files are deleted.

Round 3: find command with -delete parameter

Now let us measure the elapsed time when using the -delete option of the find command.

$ time find ./ -type f -delete
real    5m11.937s
user    0m1.259s
sys     0m28.441s

The deletion speed was greatly improved, and it only took 5 minutes and 11 seconds! This is an amazing improvement in speed when you delete millions of files in Linux.

Round 4: Perl language

Now let's look at how deleting files using the Perl language works, and how fast it compares to other deletion methods we have seen before.

$ time perl -e 'for(<*>){((stat)[9]<(unlink))}'
real    1m0.488s
user    0m7.023s
sys     0m27.403s

As can be seen from the results, Perl only took about 1 minute to delete 500,000 files in the directory. Compared with the other find and rm commands we have seen before, this speed is very fast!

However, if you are interested in using more complex options when using Perl, you need to have a certain understanding of Perl regular expressions.
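
The one-liner above is terse: it loops over every name matched by <*>, and the comparison between the stat result and unlink is simply discarded, so the useful work is the unlink call. A more readable variant, written here as a sketch rather than as the command that was timed, reads the directory with readdir and unlinks plain files. The scratch directory and the count of 1,000 files are assumptions for the demo.

```shell
#!/usr/bin/env bash
# A more readable Perl variant: walk the directory with readdir and unlink plain files.
dir=$(mktemp -d)
for ((i = 1; i <= 1000; i++)); do : > "$dir/$i.txt"; done

if command -v perl >/dev/null 2>&1; then
    perl -e '
        my $dir = shift;                          # directory to empty, passed as argument
        opendir(my $dh, $dir) or die "opendir $dir: $!";
        while (defined(my $name = readdir $dh)) {
            my $path = "$dir/$name";
            unlink $path if -f $path;             # skips ".", ".." and subdirectories
        }
        closedir $dh;
    ' "$dir"
else
    echo "perl is not installed; nothing was deleted" >&2
fi

remaining=$(find "$dir" -type f | wc -l)
echo "files left: $remaining"
rm -rf "$dir"
```

Unlike <*>, readdir also returns names that begin with a dot, so this variant removes hidden files as well.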

Round 5: rsync command

There is also a lesser-known method for deleting a large number of files in a folder: the well-known tool rsync. Its basic use is to transfer and synchronize files between two locations, local or remote.

Now let's take a look at how to use the rsync command to delete all files in a folder. It's actually very simple. We can empty a target directory that holds a large number of files by synchronizing it with an empty directory.

In our example, the test directory (the target directory) holds 500,000 files, and we create an empty directory (the source directory) named blanktest. We then use the --delete option of the rsync command, which deletes from the target directory every file that does not exist in the source directory.

$ time rsync -a --delete blanktest/ test/
real    2m52.502s
user    0m2.772s
sys     0m32.649s

As you can see, it only took 2 minutes and 52 seconds to complete the deletion.

Therefore, if you want to empty a directory containing millions of files, rsync is a better choice than the find command.
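
To see the rsync trick on a small scale first, the following sketch builds both directories under a mktemp scratch directory and then syncs the empty one over the full one. The count of 1,000 files and the scratch location are assumptions for the demo.

```shell
#!/usr/bin/env bash
# Empty a directory by syncing an empty directory over it (scaled-down demo).
work=$(mktemp -d)
mkdir "$work/blanktest" "$work/test"          # empty source, populated target
for ((i = 1; i <= 1000; i++)); do : > "$work/test/$i.txt"; done

if command -v rsync >/dev/null 2>&1; then
    # The trailing slashes mean "the contents of"; --delete removes from test/
    # everything that is absent from blanktest/.
    rsync -a --delete "$work/blanktest/" "$work/test/"
else
    echo "rsync is not installed; nothing was deleted" >&2
fi

remaining=$(find "$work/test" -type f | wc -l)
echo "files left in test/: $remaining"
rm -rf "$work"
```

Note that the target directory itself survives; only its contents are removed. Swapping the two arguments by mistake would leave the full directory untouched, so it is worth checking the order before running this on real data.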

3. Summary

The following table summarizes the speed of deleting 500,000 files in different ways in Linux for your reference.

Command                        Time spent
rm command                     Unable to delete a large number of files
find command with -exec        14 minutes 51 seconds
find command with -delete      5 minutes 11 seconds
Perl                           1 minute
rsync command                  2 minutes 52 seconds
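
If you want to repeat the comparison yourself, the rounds can be wrapped in a small script. The sketch below is an assumption-laden harness, not the procedure used for the numbers above: it defaults to 2,000 files (adjustable through a hypothetical COUNT variable), relies on bash's time keyword and TIMEFORMAT, and skips Perl or rsync if they are not installed. Timings depend heavily on the filesystem and hardware, so only the relative order is meaningful.

```shell
#!/usr/bin/env bash
# Re-run the deletion rounds at a smaller scale inside a scratch directory.
count=${COUNT:-2000}             # assumption: far fewer files than 500,000
work=$(mktemp -d)
mkdir "$work/empty"              # empty source directory for the rsync round
TIMEFORMAT="  elapsed: %3R s"    # format used by bash's time keyword
total_left=0

populate() {                     # recreate the test files before each round
    mkdir -p "$work/test"
    for ((i = 1; i <= count; i++)); do : > "$work/test/$i.txt"; done
}

run_round() {                    # usage: run_round LABEL COMMAND [ARGS...]
    local label=$1 left
    shift
    populate
    echo "$label"
    time ( cd "$work/test" && "$@" )     # run the command inside the test directory
    left=$(find "$work/test" -type f | wc -l)
    echo "  files left: $left"
    total_left=$((total_left + left))
}

run_round "find -exec rm" find . -type f -exec rm {} \;
run_round "find -delete"  find . -type f -delete
command -v perl  >/dev/null 2>&1 && run_round "perl"  perl -e 'for(<*>){((stat)[9]<(unlink))}'
command -v rsync >/dev/null 2>&1 && run_round "rsync" rsync -a --delete "$work/empty/" ./
rm -rf "$work"
```

Running it as COUNT=100000 bash script.sh (the script name is up to you) gives numbers closer in spirit to the table, at the cost of a longer run.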


