10 most frequent words in a string in Python


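I need to display the 10 most frequent words in a text file, ordered from most to least frequent, along with the number of times each one is used. I can't use a dictionary or the Counter function. So far I have this: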

import urllib
cnt = 0
i=0
txtFile = urllib.urlopen("http://textfiles.com/etext/FICTION/alice30.txt")
uniques = []
for line in txtFile:
    words = line.split()
    for word in words:
        if word not in uniques:
            uniques.append(word)
for word in words:
    while i<len(uniques):
        i+=1
        if word in uniques:
             cnt += 1
print cnt

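Now I think I should look up each word in the array "uniques", see how many times it is repeated in the file, and then add that to another array that counts the instances of each word. But this is where I'm stuck. I don't know how to proceed.

Any help would be appreciated. Thanks.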

Originally posted by KevinKZ; translation licensed under CC BY-SA 4.0

1 answer

The problem above can be solved easily with Python's collections module, as shown below.

from collections import Counter

# Parentheses let adjacent string literals concatenate across lines.
data_set = (
    "Welcome to the world of Geeks "
    "This portal has been created to provide well written well "
    "thought and well explained solutions for selected questions "
    "If you like Geeks for Geeks and would like to contribute "
    "here is your chance You can write article and mail your article "
    "to contribute at geeksforgeeks org See your article appearing on "
    "the Geeks for Geeks main page and help thousands of other Geeks. "
)

# split() returns a list of all the words in the string
split_it = data_set.split()

# Pass the split_it list to the Counter constructor
Counters_found = Counter(split_it)

# most_common(k) returns the k most frequently encountered
# values and their respective counts
most_occur = Counters_found.most_common(4)
print(most_occur)
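
Since the question rules out dictionaries and Counter, the parallel-array idea described there can also be sketched directly. This is only a minimal illustration, not a definitive implementation: the function name `top_words`, its parameter `k`, and the sample sentence are assumptions made for this example, and words are split on whitespace only, with no case folding or punctuation stripping.

```python
def top_words(text, k=10):
    """Return the k most frequent words as (word, count) pairs, most frequent first."""
    uniques = []  # each distinct word, in order of first appearance
    counts = []   # counts[i] is how many times uniques[i] has been seen

    for word in text.split():
        if word in uniques:
            # Known word: bump the count stored at the same index.
            counts[uniques.index(word)] += 1
        else:
            # New word: extend both lists so their indexes stay aligned.
            uniques.append(word)
            counts.append(1)

    # Pair each count with its word and sort by count, highest first.
    # sorted() is stable, so tied words keep their first-appearance order.
    pairs = sorted(zip(counts, uniques), key=lambda pair: pair[0], reverse=True)
    return [(word, count) for count, word in pairs[:k]]


# Hypothetical sample text, used only to show the call.
sample = "the cat and the dog and the bird"
for word, count in top_words(sample, k=3):
    print(word, count)
```

For the sample sentence this prints `the 3`, `and 2`, `cat 1`. To use it on the file from the question, the whole text would be passed in as one string; in Python 3 the download would go through `urllib.request.urlopen` rather than `urllib.urlopen`. The lookup `word in uniques` scans the list each time, so this is noticeably slower than a dictionary on a long text, which is the price of the restriction.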

Originally posted by Nikhil Gupta; translation licensed under CC BY-SA 4.0
