Extract part of a regex match

I want a regular expression to extract the title from an HTML page. Currently I have this:

title = re.search('<title>.*</title>', html, re.IGNORECASE).group()
if title:
    title = title.replace('<title>', '').replace('</title>', '')

Is there a regular expression that extracts just the contents of <title>, so I don't have to remove the tags?

Original post by hoju; translation licensed under CC BY-SA 4.0.

Answer 1 (accepted; community wiki, posted 2022-09-21)

Use ( ) in the regexp and group(1) in Python to retrieve the captured string. re.search returns None if it finds no match, so don't call group() on its result directly:

import re

title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)
if title_search:
    # group(1) is the text captured by the parentheses, without the tags
    title = title_search.group(1)

Original post by Krzysztof Krasoń; translation licensed under CC BY-SA 4.0.
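
The capture-group approach from the accepted answer can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name `extract_title`, the non-greedy `(.*?)`, the `re.DOTALL` flag, and the `strip()` call are illustrative choices that are not part of the original answer. They are meant to stop at the first closing tag and to tolerate a title that spans several lines.

```python
import re
from typing import Optional

# Assumed variant of the answer's pattern: non-greedy capture, and DOTALL so
# that "." also matches newlines inside the title.
TITLE_RE = re.compile(r'<title>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> Optional[str]:
    """Return the text between <title> and </title>, or None if absent.

    Hypothetical helper for illustration; not from the original thread.
    """
    match = TITLE_RE.search(html)
    if match is None:
        # re.search found nothing, so there is no group to read.
        return None
    return match.group(1).strip()


print(extract_title('<html><TITLE>My page</TITLE></html>'))  # My page
print(extract_title('<p>no title here</p>'))                 # None
```

Returning None rather than raising keeps the "check before calling group()" advice in one place, so callers only need a single `is None` test.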

Answer 2 (community wiki, posted 2022-09-21)

Note that starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), Krzysztof Krasoń's solution can be improved slightly: capture the match result as a variable directly in the if condition and reuse it in the body of the condition:

import re

pattern = '<title>(.*)</title>'
text = '<title>hello</title>'
if match := re.search(pattern, text, re.IGNORECASE):
    title = match.group(1)
    # title == 'hello'

Original post by Xavier Guihot; translation licensed under CC BY-SA 4.0.
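
To make the difference between the whole match and the captured part concrete, here is a minimal sketch that requires Python 3.8 or later. The sample string and the named group `title` are assumptions added for illustration; the thread itself only uses the numbered group.

```python
import re

text = '<title>hello</title>'  # sample input, assumed for illustration

# (?P<title>...) is a named capture group; it is also reachable as group(1).
if match := re.search(r'<title>(?P<title>.*?)</title>', text, re.IGNORECASE):
    whole = match.group(0)        # the entire match, tags included
    inner = match.group(1)        # only the first capture group
    named = match.group('title')  # the same group, accessed by name
    print(whole)
    print(inner)
    print(named)
else:
    print('no title found')
```

Here group(0), which is what a bare group() call returns, still carries the tags. That is why the question's code needed the extra replace() calls, while group(1) does not.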
原文由 hoju 发布,翻译遵循 CC BY-SA 4.0 许可协议"},"sessionInfo":{"key":"ed600765eb008e6f5d197f40d8e889b4","login":false,"id":null},"singleNotice":"","currentRoute":{"noLayout":false,"customLayout":false,"headerType":"default","platform":"","action":"","param":""},"letterNum":0,"noticeNum":0,"serverData":{"Token":"","userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","platform":""},"userStat":{},"newTask":{},"authChannel":"","followedTags":[],"globalCss":"","baiduOxAppState":{"isShow":false,"copyUrl":""},"wxShareCover":"","isBaiduOxApp":false,"showProductAuthModal":{"isShow":false,"title":""},"routeInterceptor":{"path":""},"safeCheckModal":{"isShow":false,"pageSource":""},"followSFState":{"isShow":false},"messageNotice":{"event":{"general":0,"ranked":0,"followed":0,"inbox":0,"comment":0,"marketing":0},"message":0,"badge":0},"badgeModalState":{},"asidesData":{},"metaQuestions":{},"notices":{"rows":[],"size":3},"recommendSites":[{"id":1140000000592323,"url":"/site/thinking","thumbnail_url":"https://avatar-static.segmentfault.com/159/147/1591470715-67498d9667680_tiny24","name":"极客观点","slug":"thinking"},{"id":1140000044937282,"url":"/site/rdmanagement","thumbnail_url":"https://avatar-static.segmentfault.com/297/875/2978758509-67498f579c239_tiny24","name":"项目管理","slug":"rdmanagement"},{"id":1140000044557488,"url":"/site/harmonyos","thumbnail_url":"https://avatar-static.segmentfault.com/112/093/1120939266-65aa32e446da8_tiny24","name":"HarmonyOS","slug":"harmonyos"}],"adOptions":{"tag":"python,html,正则表达式,html-content-extraction"},"serverTime":1743240972}}},"__N_SSP":true},"page":"/Questions/Detail","query":{"qid":"1010000042527904"},"buildId":"W6mkEYg65e8jb51xMvCnI","assetPrefix":"https://static.segmentfault.com/main_site_next/prod","runtimeConfig":{"publicPath":"https://static.segmentfault.com/main_site_next/prod/","appVersion":"25.03.28"},"isFallback":false,"isExperimentalCompile":false,"gssp":true,"scriptLoader":[]}</s
cript></body></html>