Problem

  1. Word Frequency

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.

Example:

Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

Points to note

Note:

  • Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
  • Could you write it in one line using Unix pipes?

Solution 1

# Read from the file words.txt and output the word frequency list to stdout.
cat words.txt | awk -F ' ' '{ for(i=1; i<=NF; i++) print $i }' | sort | uniq -c | sort -n -r | awk -F ' ' '{ print $2, $1}'  
  • NF is awk's built-in variable holding the number of fields in the current input record, so the loop prints each word of the line on its own line.
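
To make the role of NF concrete, here is a minimal sketch that prints NF for a few lines. The sample input is made up for illustration and is not part of the problem. It assumes awk's default field separator, under which runs of blanks count as a single separator.

```shell
# Made-up sample input: a normal line, an empty line, and a line with extra spaces.
# With the default field separator, leading and repeated blanks are ignored,
# and an empty line has NF = 0, so a loop from 1 to NF prints nothing for it.
nf_counts=$(printf 'the day is\n\n  sunny   the\n' | awk '{ print NF }')
echo "$nf_counts"
```

The three input lines should report 3, 0, and 2 fields respectively, which is why Solution 1 does not emit empty words even when the spacing is irregular.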

Solution 2

cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -n -r | awk '{ print $2, $1 }'
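  • tr -s ' ' '\n': translates each ' ' to '\n' and then squeezes runs of repeated '\n' into one, so several consecutive spaces yield a single '\n'.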

tr - translate or delete characters

   -s, --squeeze-repeats
          replace each sequence of a repeated character that is listed in the last specified SET, with a single occurrence of that character
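
The effect of -s can be seen by running tr with and without it on the same input. This is only an illustrative sketch; the input string and the variable names are made up.

```shell
# Made-up input with three spaces, then two spaces, between words.
squeezed=$(printf 'the   day  is\n' | tr -s ' ' '\n')   # squeeze repeated newlines
plain=$(printf 'the   day  is\n' | tr ' ' '\n')         # no squeezing

# Count the lines each variant produces.
squeezed_lines=$(printf '%s\n' "$squeezed" | awk 'END { print NR }')
plain_lines=$(printf '%s\n' "$plain" | awk 'END { print NR }')
echo "with -s: $squeezed_lines lines, without -s: $plain_lines lines"
```

With -s the output has one line per word (3 lines); without it every extra space becomes an extra empty line (6 lines in total).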

Solution 3: compared with Solution 2

cat words.txt | sed 's/\s/\n/g' | sort | uniq -c | sort -n -r | awk '{ if($2 != "") print $2, $1 }' 
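  • sed 's/\s/\n/g': replaces every single whitespace character with a single '\n'.
  • If there are several consecutive spaces, several '\n' are produced, although only one is needed.
  • The extra '\n' characters form empty lines, and those empty lines are counted by uniq -c as well.
  • if($2 != ""): when printing with awk, we check for these empty lines (bare newlines) and skip them.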
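
The sketch below shows the empty-line problem and the awk filter on a made-up input. As an assumption for portability, it uses tr ' ' '\n' (without -s) as a stand-in for the sed substitution, since '\s' and '\n' in a sed expression are, as far as I know, GNU extensions; the unsqueezed result is the same for space-only input.

```shell
# Made-up input with a double space, so one empty line appears after splitting.
counted=$(printf 'is  the is\n' | tr ' ' '\n' | sort | uniq -c)
printf '%s\n' "$counted"    # includes a count for the empty line

# Same filter as in Solution 3: skip records whose second field is empty.
filtered=$(printf '%s\n' "$counted" | sort -n -r | awk '{ if ($2 != "") print $2, $1 }')
printf '%s\n' "$filtered"
```

The first listing contains three counted entries (the empty line, "is", and "the"), while the filtered output keeps only the two real words.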

Solution 4

awk '{ for (i=1; i<=NF; i++) { ++D[$i]; } } END { for (i in D) { print i, D[i] } }' words.txt | sort -n -r -k 2
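
Solution 4 does the counting inside awk itself, using an associative array keyed by word, and leaves only the ordering to sort. A minimal check against the sample input from the problem statement might look like the following; the temporary file and the variable names are hypothetical and used only for this sketch.

```shell
# Hypothetical scratch file holding the sample input from the problem statement.
tmpfile=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmpfile"

# D[word] accumulates the count for each word. The END block prints
# "word count" pairs in unspecified order, so sort orders them by the
# numeric second column, largest first.
result=$(awk '{ for (i = 1; i <= NF; i++) ++D[$i] } END { for (w in D) print w, D[w] }' "$tmpfile" | sort -n -r -k 2)
echo "$result"

rm -f "$tmpfile"
```

Because the counts are guaranteed to be unique, the sorted output is deterministic and should match the expected listing (the 4, is 3, sunny 2, day 1).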

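This article is released under the Creative Commons license CC BY-NC 4.0, which requires attribution, non-commercial use, and keeping the same terms. It may be reposted in compliance with CC BY-NC 4.0, provided the source is credited with a hyperlink. The article reflects only the author's knowledge and views; if you see things differently, feel free to reply and discuss.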


罗济高