[Leetcode]192.word-frequency

Problem

Word Frequency

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

words.txt contains only lowercase characters and space ' ' characters.
Each word must consist of lowercase characters only.
Words are separated by one or more whitespace characters.
Example:

Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

Notes

Note:

Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
Could you write it in one line using Unix pipes?

Solution 1

# Read from the file words.txt and output the word frequency list to stdout.
cat words.txt | awk -F ' ' '{ for(i=1; i<=NF; i++) print $i }' | sort | uniq -c | sort -n -r | awk -F ' ' '{ print $2, $1}'

   The variable NF is set to the total number of fields in the input record.
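
As a small illustration of NF, the sketch below prints the field count of each line of the sample input. The function name nf_demo is made up for this example and is not part of the solution.

```shell
# Hypothetical helper: show what NF holds for each input line.
nf_demo() {
  # awk splits each line on runs of whitespace; NF is the number of pieces.
  printf 'the day is sunny the the\nthe sunny is is\n' | awk '{ print NF }'
}

nf_demo
```

The loop in Solution 1 walks i from 1 to NF, so every word on a line is printed on its own line.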

Solution 2

cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -n -r | awk '{ print $2, $1 }'

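tr -s ' ' '\n': translates every ' ' into \n and squeezes each run of repeated \n into a single \n, so several consecutive spaces yield only one newline.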

tr - translate or delete characters

   -s, --squeeze-repeats
          replace each sequence of a repeated character that is listed in the last specified SET, with a single occurrence of that character
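
The effect of -s can be checked on a line with repeated spaces. This is a minimal sketch; the function names squeeze_demo and no_squeeze_demo are illustrative only.

```shell
# Hypothetical helpers comparing tr with and without -s.
squeeze_demo() {
  # With -s, the run of newlines produced by several spaces collapses to one.
  printf 'the   day  is\n' | tr -s ' ' '\n'
}

no_squeeze_demo() {
  # Without -s, every space becomes its own newline, leaving empty lines.
  printf 'the   day  is\n' | tr ' ' '\n'
}

squeeze_demo
```

squeeze_demo prints three lines, one per word, while no_squeeze_demo also emits an empty line for each extra space.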

Solution 3: compared with Solution 2

cat words.txt | sed 's/\s/\n/g' | sort | uniq -c | sort -n -r | awk '{ if($2 != "") print $2, $1 }'

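sed 's/\s/\n/g': replaces every single whitespace character with a single \n (this is GNU sed syntax).
A run of several ' ' therefore produces several '\n', although only one is wanted.
The extra '\n' become empty lines, and uniq -c counts them as well.
if($2 != ""): when printing from awk, we skip the entry for the empty line, whose second field is empty.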
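
The empty-line problem can be reproduced without depending on a particular sed. The sketch below uses tr without -s as a stand-in for the sed substitution (an assumption made for portability); the function names are hypothetical.

```shell
# Hypothetical helpers: tr without -s stands in for sed 's/\s/\n/g'.
count_with_blanks() {
  # 'a  b a' has a double space, so one empty line reaches uniq -c
  # and gets a count of its own.
  printf 'a  b a\n' | tr ' ' '\n' | sort | uniq -c | sort -n -r
}

count_filtered() {
  # The empty line's entry has a count but no second field, so the
  # $2 != "" check drops it.
  count_with_blanks | awk '{ if ($2 != "") print $2, $1 }'
}

count_filtered
```

count_with_blanks yields three entries (a, b and the empty line), and count_filtered keeps only the two real words.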

Solution 4

awk '{ for (i=1; i<=NF; i++) { ++D[$i]; } } END { for (i in D) { print i, D[i] } }' words.txt | sort -n -r -k 2
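
For readability, the same associative-array idea can be spread over several lines with comments. This is only a sketch: the function name word_freq, the array name count, and passing the file as an argument are choices made for this example.

```shell
# Hypothetical wrapper around the associative-array approach.
# Usage: word_freq FILE
word_freq() {
  awk '
    # For every word (field) on every line, increment its counter.
    { for (i = 1; i <= NF; i++) count[$i]++ }
    # After the last line, print "word count" pairs; awk iterates in arbitrary order.
    END { for (w in count) print w, count[w] }
  ' "$1" | sort -k2,2nr   # order by the count column, numerically, descending
}
```

Calling word_freq words.txt on the sample input prints the four words in descending order of frequency. Unlike the other solutions, awk does the counting itself, so only one sort is needed.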


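This article is licensed under the Creative Commons CC BY-NC 4.0 license, which requires attribution, non-commercial use, and sharing under the same terms. It may be reposted under the terms of CC BY-NC 4.0, but please credit the source with a hyperlink. The article represents only the author's knowledge and opinions; if you have a different view, feel free to reply and discuss.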

罗济高