How do I convert a string into a dictionary in Python?

I'm a beginner, so please bear with me.

a = '''The/AT grand/JJ jury/NN commented/VB
on/In a/AT number/NN of/In other/AP topics/NNS ,/,
AMONG/In them/PPO the/AT Atlanta/NP and/CC'''

{'AT': ['the', 'a'], 'JJ': ['grand'], 'NN': ['jury', 'number'], 'VB': ['commented'], 'In': [ 'on', 'of', 'among'], 'AP': ['other'], 'NNS': ['topics'], ',':[','], 'PPO': ['them'], 'NP':['atlanta'], 'CC': ['and']}
I need to turn the string above into the dictionary shown after it, with the restriction that only str, list, and dict structures may be used.

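Where the problem comes from and my own approach

I split the string on whitespace to get a list, then converted the list back to a string and tried to strip out the slashes. After that I ended up with a list of separate strings, and I don't know how to group them into a dictionary.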

Relevant code

import re
a = '''The/AT grand/JJ jury/NN commented/VB
on/In a/AT number/NN of/In other/AP topics/NNS ,/,
AMONG/In them/PPO the/AT Atlanta/NP and/CC'''
b = re.findall(r'\w+/\w+', a)
c = str(b).split('/')
print(c)

What result did you expect? What error did you actually see?

My current approach may well be wrong. I'd appreciate some guidance, thanks!

3 answers
b = re.findall(r'(.+?)/(.+?) ', a) # note the trailing space in the pattern
keys = []
for _b in b:
    if _b[1] not in keys:
        keys.append(_b[1])
        
res = {}
for key in keys:
    res[key] = [_b[0] for _b in b if _b[1]==key]

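I took a look out of boredom. lambdawn's answer is laughable: the regex is wrong and drops data, the words are not lowercased, the results are not deduplicated, and yet it is presented with a perfectly straight face...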
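To see concretely which pairs that pattern drops, one can compare its matches against a plain whitespace split. This is only a quick check on the sample string from the question; the names `expected`, `found`, and `missing` are my own and do not come from either answer.

```python
import re

a = '''The/AT grand/JJ jury/NN commented/VB
on/In a/AT number/NN of/In other/AP topics/NNS ,/,
AMONG/In them/PPO the/AT Atlanta/NP and/CC'''

# Reference tokenization: split on any whitespace, then on the slash.
expected = [tuple(token.split('/')) for token in a.split()]

# The pattern from the answer above: it requires a literal space after each tag.
found = re.findall(r'(.+?)/(.+?) ', a)

# Pairs followed by a newline or by the end of the string never match.
missing = [pair for pair in expected if pair not in found]
print(missing)
```

On this sample the dropped pairs are exactly the ones at the end of each line, because `.` does not match a newline and the last token has no trailing space.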


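This is really just a map/reduce problem: first split the text into (value, key) pairs, then aggregate them by key.

For comparison, this approach gives concise and reasonably efficient code: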

def func1():
    # Split on whitespace (spaces and newlines); splitting each token on '/' gives a [value, key] pair
    b = map(lambda x: x.split('/'), a.split())

    # Reduce by key, using a set to deduplicate the values
    c = {}
    for v in b:
        if v[1] not in c:
            c[v[1]] = set()

        c[v[1]].add(v[0].lower())

    # Convert each set to a list
    d = {k: list(v) for k, v in c.items()}
    return d
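Since the question restricts the solution to str, list, and dict, here is a variant of the same grouping idea that avoids `set` altogether. It is a minimal sketch, not the answerer's code: the function name `group_by_tag` is hypothetical, and it assumes every whitespace-separated token contains at least one slash.

```python
a = '''The/AT grand/JJ jury/NN commented/VB
on/In a/AT number/NN of/In other/AP topics/NNS ,/,
AMONG/In them/PPO the/AT Atlanta/NP and/CC'''

def group_by_tag(text):
    """Group lowercased words by tag using only str, list, and dict."""
    result = {}
    for token in text.split():
        # Split on the last slash, so the tag is whatever follows it.
        word, tag = token.rsplit('/', 1)
        word = word.lower()
        # Fetch the list for this tag, creating it on first sight.
        words = result.setdefault(tag, [])
        # A membership test on the list takes the place of a set.
        if word not in words:
            words.append(word)
    return result

print(group_by_tag(a))
```

Because lists keep insertion order, each word appears in first-seen order, which matches the ordering of the dictionary in the question. The linear `in` check is fine for short lists like these; for large inputs a set would be faster.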

If you really want to follow lambdawn's approach, the code would look roughly like this:

def func2():
    b = re.findall(r'(.+?)/(.+?)\s+', a+' ')
    keys = []
    for _b in b:
        if _b[1] not in keys:
            keys.append(_b[1])

    res = {}
    for key in keys:
        for _b in b:
            if _b[1] == key:
                c = res.get(key, [])
                if _b[1] == key and _b[0].lower() not in c:
                    if not c:
                        res[key] = []
                    res[key].append(_b[0].lower())
    return res

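A quick performance comparison:

import time

def perf_test():
    start = time.perf_counter()

    for i in range(10000):
        func1()

    mid = time.perf_counter()

    print(mid - start)  # time spent in func1 only

    for i in range(10000):
        func2()

    end = time.perf_counter()

    print(end - mid)  # time spent in func2 only

Output of my original run, in which both figures were measured from the same start time, so the second one also includes the func1 loop:
0.340130693836
0.780941427825

That puts func2 alone at about 0.44 s, roughly 1.3 times as long as func1 rather than twice as long.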
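As an alternative to a hand-rolled timing loop, the standard library's `timeit` module handles the timer choice and repetition. The sketch below is an assumption-laden stand-in rather than the benchmark that produced the figures above: `group_with_sets` is a hypothetical function that mirrors the idea of func1, and the repetition counts are arbitrary.

```python
import timeit

a = '''The/AT grand/JJ jury/NN commented/VB
on/In a/AT number/NN of/In other/AP topics/NNS ,/,
AMONG/In them/PPO the/AT Atlanta/NP and/CC'''

def group_with_sets():
    # Same idea as func1: collect the lowercased words of each tag in a set.
    groups = {}
    for token in a.split():
        word, tag = token.split('/')
        groups.setdefault(tag, set()).add(word.lower())
    return {tag: list(words) for tag, words in groups.items()}

# repeat() returns one total time per run; the minimum is usually the least noisy.
timings = timeit.repeat(group_with_sets, number=1000, repeat=3)
best = min(timings)
print(f"best of 3: {best:.4f} s per 1000 calls")
```

Timing a second function the same way and comparing the two `best` values gives a per-function comparison without any shared start time.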

>>> text = '''The/AT grand/JJ jury/NN commented/VB
... on/In a/AT number/NN of/In other/AP topics/NNS ,/,
... AMONG/In them/PPO the/AT Atlanta/NP and/CC'''
... 
>>> tuples = [tuple(item.split('/')) for item in text.split()]  # split into a list of 2-tuples
>>> ret_dict = {}
>>> for value, key in tuples:  # build the dict, deduplicating each key's values with a set
...     ret_dict.setdefault(key, set()).add(value.lower())
... 
>>> ret_dict = {key: list(value) for key, value in ret_dict.items()}  # convert each set to a list
>>> ret_dict
{'AT': ['the', 'a'], 'JJ': ['grand'], 'NN': ['jury', 'number'], 'VB': ['commented'], 'In': ['of', 'on', 'among'], 'AP': ['other'], 'NNS': ['topics'], ',': [','], 'PPO': ['them'], 'NP': ['atlanta'], 'CC': ['and']}