How do I extract a domain name with a Python regex?

Posted on
2017-06-19

<script type="application/ld+json">{
    "@context": "http://schema.org",
    "@type": "SaleEvent",
    "name": "10% Off First Orders",
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "image": "https://mvp.tribesgds.com/dyn/oh/Ow/ohOwXIWglMg/_/mQR5xLX5go8/m0Ys/coggles-logo.png",
    "startDate": "2017-02-17",
    "endDate": "2017-12-31",
    "location": {
        "@type": "Place",
        "name": "Coggles",
        "url": "coggles.co.uk",
        "address": "Coggles"
    },
    "description": "Get the top branded fashion items from Coggles at discounted prices. Apply this code and enjoy savings on your purchase.",
    "eventStatus": "EventScheduled"
}</script>

How can I use a Python regex to extract the domain coggles.co.uk from this script? Any pointers from the experts would be appreciated.

python


2 Answers

CodeUnsolved

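With a regex, all you need is for the marker you anchor on to be unique. Here, though, the "url" key is not unique: it appears both at the top level and inside "location". In that case @prolifes's approach works well.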

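If you really must do it with a regex, you need zero-width assertions. The term is a rather literal rendering and tends to confuse people. All it means is a match at a specified position, and a position has a width of 0.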
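As a small illustration of what "zero width" means, here is a sketch with a made-up sample string (the text and variable names are assumptions for the demo, not from the question). The assertion checks the surrounding text but contributes nothing to the match itself:

```python
import re

# Hypothetical sample text, used only to illustrate lookaround.
text = "price: 30 USD, discount: 10%"

# Lookahead (?=%): digits immediately followed by "%"; the "%" is not consumed.
percent = re.findall(r"\d+(?=%)", text)

# Lookbehind (?<=price: ): digits immediately preceded by "price: ";
# the prefix is checked but is not part of the match.
price = re.findall(r"(?<=price: )\d+", text)

print(percent)  # ['10']
print(price)    # ['30']
```

Note that Python's `re` module requires a lookbehind pattern to have a fixed width, which a literal such as `price: ` satisfies.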

Here we can see that the "url" we want sits inside "location", so we can use that as the positional anchor.

The code:

re.search('(?<=location).+?"url": "([^"]+)"', string, re.DOTALL).group(1)

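A brief explanation:
(?<=location) requires that the text immediately before this position is location. To require it after the position instead, write (?=location).
re.DOTALL is required because the string spans multiple lines. It widens what . matches so that newline characters are included.
"([^"]+)" is just my habit: [^"] means any character other than ", so the group captures everything between the double quotes.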
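To make the one-liner above easy to try, here is a self-contained sketch run against a trimmed copy of the snippet from the question. The variable names (`snippet`, `all_urls`, `domain`) are my own, and the `None` fallback is an added precaution rather than part of the original answer:

```python
import re

# Trimmed copy of the JSON-LD snippet from the question.
snippet = '''<script type="application/ld+json">{
    "name": "10% Off First Orders",
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "location": {
        "@type": "Place",
        "name": "Coggles",
        "url": "coggles.co.uk",
        "address": "Coggles"
    },
    "eventStatus": "EventScheduled"
}</script>'''

# Without an anchor, "url" matches twice, so it is not a unique marker.
all_urls = re.findall(r'"url": "([^"]+)"', snippet)

# Anchor on "location", then lazily skip ahead (across newlines, thanks to
# re.DOTALL) to the first "url" that follows it.
match = re.search(r'(?<=location).+?"url": "([^"]+)"', snippet, re.DOTALL)
domain = match.group(1) if match else None

print(all_urls)  # ['https://www.myvouchercodes.co.uk/coggles', 'coggles.co.uk']
print(domain)    # coggles.co.uk
```

The pattern relies on the exact `"url": "` spacing used in this snippet. If the JSON were minified or formatted differently, the literal would need `\s*` around the colon.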

prolifes


Posted on
2017-06-19

Updated on
2017-06-19

This is fairly standard JSON, so the blunt approach is to parse it as JSON directly.

import json
import re

# Named html rather than str, so the built-in str type is not shadowed.
html = '''
<script type="application/ld+json">{
    "@context": "http://schema.org",
    "@type": "SaleEvent",
    "name": "10% Off First Orders",
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "image": "https://mvp.tribesgds.com/dyn/oh/Ow/ohOwXIWglMg/_/mQR5xLX5go8/m0Ys/coggles-logo.png",
    "startDate": "2017-02-17",
    "endDate": "2017-12-31",
    "location": {
        "@type": "Place",
        "name": "Coggles",
        "url": "coggles.co.uk",
        "address": "Coggles"
    },
    "description": "Get the top branded fashion items from Coggles at discounted prices. Apply this code and enjoy savings on your purchase.",
    "eventStatus": "EventScheduled"
}</script>
'''

# Greedily capture from the first "{" to the last "}" and parse that as JSON.
d = json.loads(re.search(r'(\{[\s\S]*\})', html).group(1))
print(d['location']['url'])
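One caveat with the greedy capture above: if the page contains more text with a `}` after the JSON block, the match runs to the last `}` and `json.loads` fails. As a possible alternative (a sketch, not part of the original answer), `json.JSONDecoder.raw_decode` can parse one JSON value starting at a given index and report where it ended. The `page` string below is a shortened, hypothetical page with an extra script tag added to show the problem:

```python
import json

# Hypothetical page: the JSON-LD block is followed by another script
# that also contains braces.
page = '''<script type="application/ld+json">{
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "location": {"@type": "Place", "name": "Coggles", "url": "coggles.co.uk"}
}</script>
<script>var unrelated = {};</script>'''

# Start at the first "{" and let the JSON decoder find the matching end itself.
start = page.index("{")
data, end = json.JSONDecoder().raw_decode(page, start)
location_url = data["location"]["url"]

print(location_url)  # coggles.co.uk
```

This assumes the first `{` on the page opens the JSON block you want. On a real page it would be safer to locate the `application/ld+json` script tag first.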
