Foreword
Linux has two comparison commands, `comm` and `diff`, that are often useful when comparing versions of text files. This article describes how they differ and shows simple usage of each.
comm command
This command compares two text files and displays the lines unique to each file and the lines they have in common.
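Suppose we have two files: file1.txt contains the lines a, b, c and d, and file2.txt contains the lines b, c, d and e, one per line.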
Running `comm file1.txt file2.txt` prints every line once, indented by zero, one or two tabs. The output of `comm` is a bit ugly in my opinion, but it forms three columns.
The first column contains lines specific to the first file parameter, the second column contains lines specific to the second file parameter, and the third column contains lines common to both files.
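The three columns can be reproduced with a short shell sketch. The file contents used here (a to d and b to e, one per line) are an assumption inferred from the `diff` output shown later in this article, not a listing taken from the original post.

```shell
# Run in a scratch directory: this overwrites file1.txt and file2.txt.
# Assumed contents, inferred from the diff output later in the article.
printf '%s\n' a b c d > file1.txt
printf '%s\n' b c d e > file2.txt

# comm expects both inputs to be sorted; these already are.
comm file1.txt file2.txt
# Output, with <TAB> standing for one tab character:
# a
# <TAB><TAB>b
# <TAB><TAB>c
# <TAB><TAB>d
# <TAB>e
```

No leading tab means the line is unique to the first file, one tab means it is unique to the second, and two tabs mean it appears in both.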
We can hide columns with the option `-n`, where `n` is 1, 2 or 3; the digits can be combined. To output only the lines common to both files, we can use `comm -12 file1.txt file2.txt`, which hides columns 1 and 2.
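As a small sketch of column suppression, the commands below hide two columns at a time so that only one remains. The file contents are again the assumed a to d and b to e.

```shell
# Run in a scratch directory: this overwrites file1.txt and file2.txt.
printf '%s\n' a b c d > file1.txt
printf '%s\n' b c d e > file2.txt

# Hide columns 1 and 2: only the lines common to both files remain.
comm -12 file1.txt file2.txt   # prints b, c, d

# Hide columns 2 and 3: only the lines unique to file1.txt remain.
comm -23 file1.txt file2.txt   # prints a

# Hide columns 1 and 3: only the lines unique to file2.txt remain.
comm -13 file1.txt file2.txt   # prints e
```

When a column is hidden, its tab indentation disappears as well, so each result starts at the left margin.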
diff command
`diff` is a more complex tool. It supports multiple output formats and can handle large sets of text files at once. `diff` is often used to create diff files (patches) that programs such as `patch` use to convert one version of one or more files into another version. Let's run `diff` on the previous two files: `diff file1.txt file2.txt`.
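For reference, here is a sketch that recreates the two files (contents assumed from the change commands discussed in this section) and runs the command:

```shell
# Run in a scratch directory: this overwrites file1.txt and file2.txt.
printf '%s\n' a b c d > file1.txt
printf '%s\n' b c d e > file2.txt

# diff exits with status 1 when the files differ; that is not an error
# here, so "|| true" keeps scripts running under "set -e" from aborting.
diff file1.txt file2.txt || true
# Output:
# 1d0
# < a
# 4a4
# > e
```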
This is the default output style. In this format, each set of changes is preceded by a change command of the form `range operation range`, where the operation is `a` (add), `d` (delete) or `c` (change). The command describes the location and type of change needed to convert the first file into the second.
First look at:
1d0
< a
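This tells us that we have to delete line 1 of file1.txt, which is the line containing `a`. The `0` after the `d` is the position in the second file where the deleted line would have appeared.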
Next, look at:
4a4
> e
This tells us that we have to add a line after line 4 of the first file. The text to add is line 4 of the second file, shown as `> e`.
I know it's confusing, and frankly the default style is used far less often than the context and unified formats. Let's take a look at those and explain further.
We can use the context format by adding the `-c` option:
diff -c file1.txt file2.txt
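A sketch of that command on the assumed files follows. The output file name `context.diff` is a hypothetical choice for illustration, and the timestamps in the two header lines will differ on every machine, so they are left out of the comment.

```shell
# Run in a scratch directory: this overwrites file1.txt and file2.txt.
printf '%s\n' a b c d > file1.txt
printf '%s\n' b c d e > file2.txt

# Save the context diff; diff exits with status 1 when the files differ.
diff -c file1.txt file2.txt > context.diff || true
cat context.diff
# Output, after two header lines naming each file with its timestamp:
# ***************
# *** 1,4 ****
# - a
#   b
#   c
#   d
# --- 1,4 ----
#   b
#   c
#   d
# + e
```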
At the top we can see the names and timestamps of the two files; the first is marked with asterisks and the second with dashes. Throughout the rest of the listing, `diff` uses asterisks or dashes to show which file it is talking about.
Next we see a line of asterisks, which is just a separator.
Then we get a series of changes. In the first set we can see:
*** 1,4 ****
This means lines 1 to 4 in the first file.
Then you can see:
- a
b
c
d
This is the content of the file. Only `a` has a `-` in front of it, which means we want to delete it.
Indicator | Meaning |
---|---|
blank | No change required |
`-` | Line needs to be deleted |
`+` | Line needs to be added |
`!` | Line needs to be changed |
In the first set of changes, we can see that the line marked `- a` needs to be removed from the first file. The second set of changes is:
--- 1,4 ----
b
c
d
+ e
`--- 1,4 ----` is the range in the second file. `+ e` means we need to add this line to the first file; remember, our goal is to make the first file match the second.
We can also use the unified format, which is similar to the context format but more concise: it eliminates duplicate lines of context. It is selected with the `-u` option: `diff -u file1.txt file2.txt`.
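The sketch below produces the unified diff for the assumed files and, if the `patch` program happens to be installed, applies it to turn the first file into the second. The file name `changes.patch` is a hypothetical choice for illustration.

```shell
# Run in a scratch directory: this overwrites file1.txt and file2.txt.
printf '%s\n' a b c d > file1.txt
printf '%s\n' b c d e > file2.txt

# Save the unified diff; diff exits with status 1 when the files differ.
diff -u file1.txt file2.txt > changes.patch || true
cat changes.patch
# Output (the two header lines also carry timestamps, omitted here):
# --- file1.txt
# +++ file2.txt
# @@ -1,4 +1,4 @@
# -a
#  b
#  c
#  d
# +e

# Optional: apply the patch so that file1.txt matches file2.txt.
if command -v patch >/dev/null 2>&1; then
  patch file1.txt < changes.patch
  cmp file1.txt file2.txt && echo "file1.txt now matches file2.txt"
fi
```

Each context line appears only once here, with `-` and `+` marking the lines to remove and add, which is why this format is shorter than the context format.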