How can I parse multi-level (nested) nodes in an XML file with Python? I have an XML file named "test" with the following content:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd">
<eSearchResult>
<Count>11</Count>
<RetMax>11</RetMax>
<RetStart>0</RetStart>
<QueryKey>1</QueryKey>
<WebEnv>NCID_1_22456141_130.14.22.215_9001_1482583903_1954957455_0MetA0_S_MegaStore_F_1</WebEnv>
<IdList>
<Id>25256062</Id>
<Id>24081686</Id>
<Id>23962761</Id>
<Id>23524110</Id>
<Id>20092289</Id>
<Id>19845767</Id>
<Id>17158054</Id>
<Id>17077549</Id>
<Id>15643558</Id>
<Id>9741881</Id>
<Id>11038888</Id>
</IdList>
</eSearchResult>
I am using SAX and really cannot work out how to change my code. Here is my SAX code:
#!/usr/bin/python3
# -*- coding: UTF-8 -*-
# Build the list of items from the itemList file
import xml.sax

class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        self.CurrentData = ""
        self.Count = ""
        self.RetMax = ""
        self.RetStart = ""
        self.IdList = {}

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "eSearchResult":
            print("*****eSearchResult*****")
            #title = attributes["title"]
            #print ("Title:"+title)
        elif tag == "IdList":
            print("*****IdList*****")
            self.IdList = {}

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "Count":
            print("Count:" + self.Count)
        elif self.CurrentData == "RetMax":
            print("RetMax:" + self.RetMax)
        elif self.CurrentData == "RetStart":
            print("RetStart:" + self.RetStart)
        elif self.CurrentData == "IdList":
            #print ("IdList:"+self.Id)
            print("-----IdList------")
        self.CurrentData = ""

    # Called for character data
    def characters(self, content):
        if self.CurrentData == "Count":
            self.Count = content
        elif self.CurrentData == "RetMax":
            self.RetMax = content
        elif self.CurrentData == "RetStart":
            self.RetStart = content
        elif self.CurrentData == "Id":
            #self.IdList.append(content)
            self.IdList = {}

if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    # Install our ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)
    parser.parse("test")
I really cannot find a way to make this work. With ElementTree, however, it is easy; the code is:
#!/usr/bin/python3
from xml.etree.ElementTree import parse

doc = parse("test")
for item in doc.iterfind('IdList/Id'):
    id_text = item.text
    print(id_text)
I would appreciate it if someone could help me parse this XML file with SAX.
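Add a tag stack to MovieHandler that holds the tags from the root down to the current node, and maintain it in startElement (push) and endElement (pop).
Then check that stack inside characters. For example, print the content when the stack ends with
['IdList', 'Id']
(the full stack at that point is ['eSearchResult', 'IdList', 'Id']).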
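A minimal sketch of the stack idea, under a few assumptions: the class name IdListHandler, the helper parse_ids, and the shortened SAMPLE document (no DOCTYPE, three Ids) are illustrative and not from the original post. It compares only the last two stack entries, because the root tag eSearchResult is also on the stack, and it buffers text because characters() may be called more than once for a single text node.

```python
import xml.sax


class IdListHandler(xml.sax.ContentHandler):
    """Collect the text of every <Id> that sits directly under <IdList>."""

    def __init__(self):
        super().__init__()
        self.path = []    # tags from the root down to the current element
        self.buffer = []  # text chunks of the <Id> currently being read
        self.ids = []     # collected Id values

    def _in_id(self):
        # True while the current element is IdList/Id
        return self.path[-2:] == ["IdList", "Id"]

    def startElement(self, tag, attributes):
        self.path.append(tag)
        if self._in_id():
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks
        if self._in_id():
            self.buffer.append(content)

    def endElement(self, tag):
        if self._in_id():
            self.ids.append("".join(self.buffer).strip())
        self.path.pop()


def parse_ids(xml_bytes):
    """Hypothetical helper: parse XML bytes and return the list of Id strings."""
    handler = IdListHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.ids


# Shortened stand-in for the "test" file from the question
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<eSearchResult>
<Count>3</Count>
<IdList>
<Id>25256062</Id>
<Id>24081686</Id>
<Id>23962761</Id>
</IdList>
</eSearchResult>"""

if __name__ == "__main__":
    for id_text in parse_ids(SAMPLE):
        print(id_text)
```

To read the real file instead, the same handler can be passed to xml.sax.parse("test", handler). The other fields (Count, RetMax, RetStart) could be handled the same way by testing the stack for their paths.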