Using a Python regular expression to extract specific values and add them to a dict

Posted on
2017-01-04

<input class="input-xlarge focused" id="listCode" name="listCode" readonly="true" type="text" value="001">
</input>
<input class="input-xlarge focused" id="type" name="type" readonly="true" type="text" value="002">
</input>
<input class="input-xlarge focused" id="yyc" name="yyc" readonly="true" type="text" value="yyzz">
</input>


How can I use regular expressions in Python 3.5 to get the name and value from each line and add them to a dictionary?
The final result should be:

dict = {

    'listCode':'001',
    'type':'002',
    'yyc':'yyzz'

}

Or can this be done with BeautifulSoup instead of a regex? Any pointers would be appreciated. Thanks.

python

6.7k views

4 answers

oliver_lv

✓ Accepted answer

Sample code below; for BeautifulSoup usage, see the official documentation.

from bs4 import BeautifulSoup  # third-party package (beautifulsoup4)

html = '<input class="input-xlarge focused" id="listCode" name="listCode" readonly="true" type="text" value="001"></input><input class="input-xlarge focused" id="type" name="type" readonly="true" type="text" value="002"></input><input class="input-xlarge focused" id="yyc" name="yyc" readonly="true" type="text" value="yyzz"></input>'

# The "lxml" parser backend requires the lxml package to be installed.
soup = BeautifulSoup(html, "lxml")
content = {}
# Select every <input> whose class attribute is "input-xlarge focused".
for tag in soup.find_all("input", class_="input-xlarge focused"):
    content[tag["name"]] = tag["value"]

print(content)

moming

Posted on
2017-01-04

Using re

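import re

txt = "content"  # placeholder: replace with the HTML text to parse

# Match each <input ...>...</input> element; re.S lets "." span newlines.
input_re = re.compile(r'<input.*?</input>', re.S)
name_re = re.compile(r'name="(.*?)"')
value_re = re.compile(r'value="(.*?)"')

content = {}

for element in input_re.findall(txt):
    # Take the first name and value found in each element.
    # Assumes both attributes are present; [0] raises IndexError otherwise.
    content[name_re.findall(element)[0]] = value_re.findall(element)[0]

print(content)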
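
The loop above indexes `[0]` on each `findall` result, so an element missing either attribute raises `IndexError`. A minimal sketch of a more defensive variant follows. The function name `extract_inputs`, the pattern names, and the decision to skip incomplete tags are assumptions made for illustration, not part of the original answer. It also assumes double-quoted attribute values that do not contain `>`.

```python
import re

# Match only the opening <input ...> tag; [^>]* keeps the match inside one tag.
INPUT_TAG = re.compile(r'<input\b[^>]*>')
# Require whitespace before the attribute name so "data-name" is not mistaken for "name".
NAME_ATTR = re.compile(r'\sname="([^"]*)"')
VALUE_ATTR = re.compile(r'\svalue="([^"]*)"')


def extract_inputs(html):
    """Return {name: value} for every <input> tag that has both attributes."""
    result = {}
    for tag in INPUT_TAG.findall(html):
        name = NAME_ATTR.search(tag)
        value = VALUE_ATTR.search(tag)
        if name and value:  # skip tags missing either attribute
            result[name.group(1)] = value.group(1)
    return result


sample = ('<input name="listCode" type="text" value="001"></input>'
          '<input type="text" value="orphan"></input>')
print(extract_inputs(sample))  # {'listCode': '001'}
```

Using `search` instead of `findall(...)[0]` returns `None` for a missing attribute, which the `if` check can handle without an exception.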

小杰控

Posted on
2017-01-04

I implemented the same thing with HTMLParser:

from html.parser import HTMLParser  # Python 3 location; the Python 2 module was named HTMLParser

class MyHTMLParser(HTMLParser):

    def __init__(self):
        super().__init__()
        self.input_tag_d = {}  # maps each input's name to its value

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attr_d = dict(attrs)  # attrs is a list of (name, value) pairs
        if 'name' in attr_d:
            # Inputs without a value attribute are stored with an empty string.
            self.input_tag_d[attr_d['name']] = attr_d.get('value', '')

parser = MyHTMLParser()
html_str = '''<input class="input-xlarge focused" id="listCode" name="listCode" readonly="true" type="text" value="001">
</input>
<input class="input-xlarge focused" id="type" name="type" readonly="true" type="text" value="002">
</input>
<input class="input-xlarge focused" id="yyc" name="yyc" readonly="true" type="text" value="yyzz">
</input>'''
parser.feed(html_str)
print(parser.input_tag_d)

>>> {'type': '002', 'yyc': 'yyzz', 'listCode': '001'}

刘布丁

Posted on
2017-01-11

Updated on
2017-01-11

Wouldn't it be easier to use just the built-in re module?

import re

s = '''<input class="input-xlarge focused" id="listCode" name="listCode" readonly="true" type="text" value="001">
</input>
<input class="input-xlarge focused" id="type" name="type" readonly="true" type="text" value="002">
</input>
<input class="input-xlarge focused" id="yyc" name="yyc" readonly="true" type="text" value="yyzz">
</input>'''

# "." does not match newlines by default, so each match stays on one line.
# This relies on every <input> tag being on its own line, with name before value.
pattern = r'name="(\S+)".*value="(\S+)"'
result = dict()
for match in re.finditer(pattern, s):
    result[match.group(1)] = match.group(2)

print(result)


{'yyc': 'yyzz', 'type': '002', 'listCode': '001'}
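
A pattern that joins the two attributes with a greedy `.*`, such as `name="(\S+)".*value="(\S+)"`, depends on each tag sitting on its own line: `.` does not match a newline by default, so the match cannot run from one tag's `name` into another tag's `value`. When the same markup arrives on a single line, that no longer holds. The sketch below shows the failure and one possible pattern that stays inside a single tag. It assumes `name` appears before `value`, that values are double-quoted, and that no attribute value contains `>`. The names `one_line` and `PAIR` are illustrative only.

```python
import re

# Two tags on a single line, with no newline between them.
one_line = ('<input id="listCode" name="listCode" type="text" value="001"></input>'
            '<input id="type" name="type" type="text" value="002"></input>')

# The greedy ".*" runs from the first name to the last value on the line.
greedy = re.findall(r'name="(\S+)".*value="(\S+)"', one_line)
print(greedy)  # [('listCode', '002')]

# [^>]*? cannot cross the ">" that closes the opening tag,
# so each name is paired with the value from the same tag.
PAIR = re.compile(r'<input\b[^>]*?\sname="([^"]*)"[^>]*?\svalue="([^"]*)"')
result = dict(PAIR.findall(one_line))
print(result)  # {'listCode': '001', 'type': '002'}
```

If the attribute order can vary, parsing each tag separately or using an HTML parser is a safer choice than a single combined pattern.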
