How can I match each word of one text file against the words in another file and get each word's value?

依依雨柔
  • 233

Contents of ttt.txt:
president said would bill program loan farmers
corn committee department agriculture
usda house
Contents of sss.txt:
Topic 0th:

said   0.045193
would   0.028879
bill   0.011087
program   0.010718
loan   0.008395
farmers   0.008237
corn   0.008078
committee   0.007022
department   0.006811
agriculture   0.006653
usda   0.006547
house   0.006494
president 

Topic 1th:

said   0.044315
shares   0.031928
stock   0.028001
company   0.023888
group   0.017063
offer   0.016408
share   0.016268
dlrs   0.016034
corp   0.015520
common   0.013463
president  0.000047

How can I look up each word of ttt.txt in sss.txt and get the matching word and its value under each of the two topics?

1 answer
✓ Accepted

# coding: utf8

result = {}
with open('ttt.txt') as f_t, open('sss.txt') as f_s:
    key_set = set(f_t.read().split())     # collect every word of ttt.txt into a set
    topic = ''
    for line in f_s:
        line_split = line.split()
        if not line_split:                # skip blank lines (they would break the unpacking below)
            continue
        if line.startswith('Topic'):      # start a new dict for each Topic header
            topic = line.strip()
            result[topic] = {}
        else:
            if len(line_split) < 2:
                line_split.append('None')  # placeholder for a word that has no value
            key, value = line_split[:2]

            if key in key_set:            # keep the value only if the word appears in ttt.txt
                result.setdefault(topic, {})[key] = value
print(result)
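
If you would rather see, for each word of ttt.txt, its value under every topic side by side, the same idea can be turned around so the outer key is the word. The following is only a rough sketch: the function name `match_words` and the inline sample data are made up for illustration, values are converted to `float`, and a word listed without a value (like `president` under Topic 0th) is stored as `None`.

```python
def match_words(words_text, topic_lines):
    """Map each word in words_text to {topic header: value or None}."""
    # A plain dict keeps insertion order, so words stay in ttt.txt order.
    per_word = {word: {} for word in words_text.split()}
    topic = None
    for line in topic_lines:
        parts = line.split()
        if not parts:                      # blank line
            continue
        if parts[0] == 'Topic':            # header such as "Topic 0th:"
            topic = line.strip()
            continue
        if topic is None or parts[0] not in per_word:
            continue                       # no topic yet, or word not in ttt.txt
        # Second column is the value; it may be missing.
        per_word[parts[0]][topic] = float(parts[1]) if len(parts) > 1 else None
    return per_word


# Hypothetical sample data, shortened from the files in the question.
sample_words = "president said would"
sample_topics = """Topic 0th:

said   0.045193
would   0.028879
president

Topic 1th:

said   0.044315
president  0.000047
""".splitlines()

table = match_words(sample_words, sample_topics)
for word, values in table.items():
    print(word, values)
```

With real files you could pass `open('ttt.txt').read()` and the open `sss.txt` file object instead of the sample strings. A word that does not occur under some topic simply has no entry for that topic.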