Linux series: comparison commands

Foreword

Linux has two commonly used commands for comparing text files, comm and diff, which are especially handy when comparing versions of a file. This article describes how they differ and shows their basic usage.

comm command

This command compares two sorted text files line by line and displays the lines unique to each file and the lines they have in common.

Suppose we have two files:
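The diff output later in this article implies that file1.txt contains the lines a, b, c, d and that file2.txt contains b, c, d, e. Assuming those contents, here is a minimal sketch to recreate both files. The scratch directory from mktemp is only a convenience and not part of the original setup:

```shell
# Work in a throwaway directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Assumed contents, inferred from the diff output shown later in the article.
printf '%s\n' a b c d > file1.txt
printf '%s\n' b c d e > file2.txt

# Show both files.
cat file1.txt
cat file2.txt
```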

When we run comm file1.txt file2.txt we get:

The output of comm is a bit ugly in my opinion, but it is laid out in three columns.

The first column contains lines unique to the first file argument, the second column contains lines unique to the second file argument, and the third column contains lines common to both files.
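As a sketch, assuming file1.txt contains a, b, c, d and file2.txt contains b, c, d, e (the contents implied by the diff output later on), the three columns look like this. comm indents column 2 with one tab and column 3 with two tabs:

```shell
cd "$(mktemp -d)"
printf '%s\n' a b c d > file1.txt   # assumed contents
printf '%s\n' b c d e > file2.txt   # assumed contents

comm file1.txt file2.txt
# Output (tabs drawn here as 8 spaces each):
# a
#                 b
#                 c
#                 d
#         e
```

Here a is only in file1.txt, e is only in file2.txt, and b, c, d are in both.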

We can suppress columns with the option -n, where n is 1, 2, or 3, and the digits can be combined. For example, to output only the lines common to both files, we can use comm -12 file1.txt file2.txt, which hides columns 1 and 2.
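The column options can be tried out with a short sketch, again assuming file1.txt contains a, b, c, d and file2.txt contains b, c, d, e:

```shell
cd "$(mktemp -d)"
printf '%s\n' a b c d > file1.txt   # assumed contents
printf '%s\n' b c d e > file2.txt   # assumed contents

comm -12 file1.txt file2.txt   # hide columns 1 and 2: lines in both files
# b
# c
# d

comm -23 file1.txt file2.txt   # hide columns 2 and 3: lines only in file1.txt
# a

comm -13 file1.txt file2.txt   # hide columns 1 and 3: lines only in file2.txt
# e
```

When a column is hidden, comm also drops the tab that would have indented the remaining columns, so the surviving column starts at the left margin.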

diff command

diff is a more complex tool. It supports multiple output formats and can process large sets of text files at once. diff is often used to create diff files (patches), which programs such as patch use to convert one version of one or more files into another version. Let's run diff on the previous two files: diff file1.txt file2.txt.
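A minimal sketch of that run, assuming file1.txt contains a, b, c, d and file2.txt contains b, c, d, e, which matches the output fragments discussed in the rest of this section:

```shell
cd "$(mktemp -d)"
printf '%s\n' a b c d > file1.txt   # assumed contents
printf '%s\n' b c d e > file2.txt   # assumed contents

# diff exits with status 1 when the files differ; "|| true" keeps that
# from aborting a script that runs under "set -e".
diff file1.txt file2.txt || true
# 1d0
# < a
# 4a4
# > e
```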

This is the default output style. In this format, each set of changes is preceded by a change command of the form range operation range, which describes the location and type of change needed to convert the first file into the second. The operation is a (add), d (delete), or c (change).

First look at:

 1d0
< a

This tells us to delete the first line of file1.txt, which is the line containing a.

Next, look at:

 4a4
> e

This tells us to add a line after the fourth line of the first file. The text to add is line 4 of the second file, shown as > e.
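To see only the change commands without the affected lines, one option is to filter the default output for lines that start with a digit. This is a small sketch, assuming file1.txt contains a, b, c, d and file2.txt contains b, c, d, e. The grep filter is an illustration and not something diff provides itself:

```shell
cd "$(mktemp -d)"
printf '%s\n' a b c d > file1.txt   # assumed contents
printf '%s\n' b c d e > file2.txt   # assumed contents

# Capture the default output ("|| true" because diff exits 1 on differences).
changes=$(diff file1.txt file2.txt || true)

# Change commands start with a line number; content lines start with < or >.
printf '%s\n' "$changes" | grep -E '^[0-9]'
# Prints:
# 1d0
# 4a4
```

Read 1d0 as "delete line 1 of the first file" and 4a4 as "after line 4 of the first file, add line 4 of the second file".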

I know it's confusing, and frankly the default style is used far less often than the context format and the unified format. Let's take a look at those two and explain further.

We can use the context format by adding the -c option:

 diff -c file1.txt file2.txt

At the top we can see the names and timestamps of the two files. The first file is marked with asterisks and the second with dashes. Throughout the rest of the listing, diff uses asterisks or dashes to show which file it is talking about.

Next comes a row of asterisks, which is just a separator.

Then we get a series of changes. In the first set of changes we can see:

 *** 1,4 ****

This means lines 1 to 4 in the first file.

Then you can see:

 - a
  b
  c
  d

This is the content of the first file. The only addition is the - in front of a, which means that line needs to be deleted.

Indicator	Meaning
blank	No change required
-	Line needs to be deleted
+	Line needs to be added
!	Line needs to be changed
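The sample files in this article never produce the ! indicator, so here is a small sketch with two hypothetical files, old.txt and new.txt, whose second line differs. Filtering the context output for lines that start with ! shows the changed line from each file:

```shell
cd "$(mktemp -d)"
printf '%s\n' a b > old.txt   # hypothetical file
printf '%s\n' a c > new.txt   # hypothetical file

# Capture the context output ("|| true" because diff exits 1 on differences).
context=$(diff -c old.txt new.txt || true)

# Keep only the lines flagged as changed.
printf '%s\n' "$context" | grep '^!'
# Prints:
# ! b
# ! c
```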

In the first set of changes, we can see that the line marked - a needs to be removed from the first file. The second set of changes is:

 --- 1,4 ----
  b
  c
  d
+ e

--- 1,4 ---- is the range in the second file, and + e means we need to add this line to the first file. Remember, our goal is to make the first file match the second file.
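Putting the pieces together, the complete context output can be reproduced with the sketch below. It assumes GNU diff and the file contents a, b, c, d and b, c, d, e. The --label option is used here only so that the header shows fixed names instead of timestamps, which would differ on every machine:

```shell
cd "$(mktemp -d)"
printf '%s\n' a b c d > file1.txt   # assumed contents
printf '%s\n' b c d e > file2.txt   # assumed contents

# "|| true" because diff exits with status 1 when the files differ.
diff -c --label file1.txt --label file2.txt file1.txt file2.txt || true
# *** file1.txt
# --- file2.txt
# ***************
# *** 1,4 ****
# - a
#   b
#   c
#   d
# --- 1,4 ----
#   b
#   c
#   d
# + e
```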

We can also use the unified format, which is similar to the context format but more concise because it eliminates duplicated lines of context: diff -u file1.txt file2.txt.
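For comparison, here is a sketch of the unified output for the same two files, assuming GNU diff and the contents a, b, c, d and b, c, d, e. As before, --label only replaces the timestamped header lines with fixed names:

```shell
cd "$(mktemp -d)"
printf '%s\n' a b c d > file1.txt   # assumed contents
printf '%s\n' b c d e > file2.txt   # assumed contents

# "|| true" because diff exits with status 1 when the files differ.
diff -u --label file1.txt --label file2.txt file1.txt file2.txt || true
# --- file1.txt
# +++ file2.txt
# @@ -1,4 +1,4 @@
# -a
#  b
#  c
#  d
# +e
```

The shared lines b, c, d appear once here, whereas the context format lists them once for each file.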

chuck
