How can a regular expression match the content of multiple identical tags?

<Keyword MajorTopicYN="N">chromatin remodeling complex</Keyword>
<Keyword MajorTopicYN="N">ciliopathy</Keyword>
<Keyword MajorTopicYN="N">dynein arm assembly factor</Keyword>

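I want to match all of the content between the Keyword tags.
My current code is below, but it only matches the content of the first Keyword tag. How should I change it?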

if(/\<Keyword .*?\>(.*?)\<\/Keyword\>/g.test(str)){
    console.log(RegExp.$1);
}
2 answers

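Both of the regex approaches below are worth trying.
Alternatively, you can use DOM operations: query all the Keyword elements, then read each one's innerHTML.

DOM example: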

let result = (() => {
  let str = `
    <Keyword MajorTopicYN="N">chromatin remodeling complex</Keyword>
    <Keyword MajorTopicYN="N">ciliopathy</Keyword>
    <Keyword MajorTopicYN="N">dynein arm assembly factor</Keyword>
  `
  let parse = new DOMParser()

  let dom = parse.parseFromString(`<div>${str}</div>`, 'text/xml')

  let keywordList = dom.querySelectorAll('Keyword')

  // querySelectorAll always returns a NodeList (never null), so checking length is enough
  return keywordList.length ? Array.from(keywordList, el => el.innerHTML) : null
})()

console.log(result)

Regex examples:

let result1 = (() => {
  let str = `
    <Keyword MajorTopicYN="N">chromatin remodeling complex</Keyword>
    <Keyword MajorTopicYN="N">ciliopathy</Keyword>
    <Keyword MajorTopicYN="N">dynein arm assembly factor</Keyword>
  `
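  // Lazy (.*?) stops at the nearest closing tag, even when several tags share a line
  let reg = /<Keyword[^>]*>(.*?)<\/Keyword>/gi
  let tagReg = /^<Keyword[^>]*>(.*?)<\/Keyword>$/i
  // match() returns null when nothing matches, so fall back to an empty array
  let result = str.match(reg) || []

  return result.length ? result.map(tag => tag.match(tagReg)[1]) : null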
})()

let result2 = (() => {
  let str = `
    <Keyword MajorTopicYN="N">chromatin remodeling complex</Keyword>
    <Keyword MajorTopicYN="N">ciliopathy</Keyword>
    <Keyword MajorTopicYN="N">dynein arm assembly factor</Keyword>
  `
  // Lazy (.*?) stops at the nearest closing tag, even when several tags share a line
  let reg = /<Keyword[^>]*>(.*?)<\/Keyword>/gi

  let result = []
  str.replace(reg, (_, $1) => result.push($1))

  return result
})()

console.log(result1, result2)
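
As for why the code in the question prints only one value: test() is called once, so RegExp.$1 holds only the captures from that single match. A third variant, sketched below, collects every capture with String.prototype.matchAll. It assumes an ES2020 environment, and extractKeywords is a hypothetical helper name used only for illustration. The pattern uses [\s\S] instead of . so that a keyword spanning several lines would still match.

```javascript
// Sketch: collect the body of every <Keyword> tag with matchAll (ES2020).
// extractKeywords is a hypothetical helper name, not taken from the answers above.
function extractKeywords(str) {
  // Lazy ([\s\S]*?) stops at the nearest closing tag and may span line breaks.
  // matchAll requires the g flag on the regular expression.
  const reg = /<Keyword\b[^>]*>([\s\S]*?)<\/Keyword>/g;
  // Each match object m carries the first capture group in m[1].
  return Array.from(str.matchAll(reg), (m) => m[1]);
}

const sample = `
  <Keyword MajorTopicYN="N">chromatin remodeling complex</Keyword>
  <Keyword MajorTopicYN="N">ciliopathy</Keyword>
  <Keyword MajorTopicYN="N">dynein arm assembly factor</Keyword>
`;

console.log(extractKeywords(sample));
```

If matchAll is not available, calling reg.exec(str) in a loop until it returns null should give the same captures, because a regex with the g flag resumes from its lastIndex on each call.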
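# Requires the third-party beautifulsoup4 package
from bs4 import BeautifulSoup

html = '''<Keyword MajorTopicYN="N">chromatin remodeling complex</Keyword>
<Keyword MajorTopicYN="N">ciliopathy</Keyword>
<Keyword MajorTopicYN="N">dynein arm assembly factor</Keyword>'''

# html.parser lowercases tag names, so search for 'keyword' rather than 'Keyword'
soup = BeautifulSoup(html, 'html.parser')
for tag in soup.find_all('keyword'):
    print(tag.get_text())  # print() call syntax for Python 3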