Abstract: We often need to parse data written in different languages. Python provides many libraries to parse or split data written in other languages. In this Python XML Parser tutorial, you will learn how to parse XML using Python.
This article is shared from the HUAWEI CLOUD Community "Learn Python from scratch | How to parse and modify XML in Python?", original author: Yuchuan.
Here are all the topics covered in this tutorial:
What is XML?
Python XML Parsing Modules
xml.etree.ElementTree Module
- Using parse() function
- Using fromstring() function
- Finding Elements of Interest
- Modifying XML files
- Adding to XML
- Deleting from XML
xml.dom.minidom Module
- Using parse() function
- Using parseString() function
- Finding Elements of Interest
Let's start. :)
What is XML?
XML stands for Extensible Markup Language. It resembles HTML in appearance, but XML is used to describe and carry data, whereas HTML is used to define how data is displayed. XML is designed to store and transport data, for example between a client and a server. Take a look at the following example:
example:
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
<item name="breakfast">Idly</item>
<price>$2.5</price>
<description>
Two idly's with chutney
</description>
<calories>553</calories>
</food>
<food>
<item name="breakfast">Paper Dosa</item>
<price>$2.7</price>
<description>
...
</description>
<calories>700</calories>
</food>
<food>
<item name="breakfast">Upma</item>
<price>$3.65</price>
<description>
Rava upma with bajji
</description>
<calories>600</calories>
</food>
<food>
<item name="breakfast">Bisi Bele Bath</item>
<price>$4.50</price>
<description>
Bisi Bele Bath with sev
</description>
<calories>400</calories>
</food>
<food>
<item name="breakfast">Kesari Bath</item>
<price>$1.95</price>
<description>
Sweet rava with saffron
</description>
<calories>950</calories>
</food>
</metadata>
The example above shows the content of the file I named "sample.xml", and I will use the same content for all upcoming examples in this Python XML parser tutorial.
Python XML Parsing Modules
This tutorial covers two modules from Python's standard library for parsing XML documents: xml.etree.ElementTree and xml.dom.minidom (a minimal DOM implementation). Parsing means reading information from a file and splitting it into its component parts, such as tags, attributes, and text. Let us look at how to use these modules to parse XML data.
xml.etree.ElementTree module:
This module represents XML data as a tree, which is the most natural representation of hierarchical data. The Element type stores a hierarchical data structure in memory, and each element has the following properties: a tag that names the element, a dictionary of attributes, a text string, an optional tail string, and a sequence of child elements.
ElementTree is a class that wraps an element structure and converts it to and from XML. Now let's try to parse the XML file above using this module.
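As a small illustration of what an element holds, the sketch below inspects one element of a made-up snippet. The snippet, the id attribute, and the trailing "served hot" text are assumptions for this example only and are not part of sample.xml.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up document used only for illustration.
snippet = '<food id="1"><item name="breakfast">Idly</item> served hot</food>'
food = ET.fromstring(snippet)
item = food[0]

print(food.tag)     # name of the element
print(food.attrib)  # attributes as a dictionary
print(item.text)    # text inside the <item> element
print(item.tail)    # text that follows </item> inside <food>
print(len(food))    # number of child elements
```

The tail is the part that surprises most people: text that comes after a closing tag belongs to the element that was just closed, not to its parent.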
There are two ways to parse XML with the ElementTree module: the parse() function and the fromstring() function. parse() reads an XML document from a file, while fromstring() parses XML supplied as a string, typically a triple-quoted literal.
Use the parse() function:
As mentioned earlier, this function takes a file name (or a file object) and parses its contents. Look at the following example:
example:
import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()
As you can see, the first thing to do is import the xml.etree.ElementTree module. The parse() function then parses the "sample.xml" file, and the getroot() method returns its root element.
When you execute the code above, you will see no output, but the absence of errors indicates that the code ran successfully. To check the root element, you can print it as follows:
example:
import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()
print(myroot)
Output: <Element 'metadata' at 0x033589F0>
The above output indicates that the root element in our XML document is "metadata".
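If the document is not well-formed, parsing does not fail silently: ElementTree raises xml.etree.ElementTree.ParseError. The sketch below shows one way to catch it. The broken string is a made-up example, and fromstring() is used here only so that the example needs no file on disk.

```python
import xml.etree.ElementTree as ET

# Made-up input for illustration: <food> is never closed.
broken = '<metadata><food></metadata>'

try:
    ET.fromstring(broken)
    parsed = True
except ET.ParseError as err:
    parsed = False
    print('Could not parse XML:', err)
```

The same exception is raised by ET.parse() when the file it reads is malformed.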
Use the fromstring() function:
You can also use the fromstring() function to parse string data. To do this, pass the XML as a string (here enclosed in triple quotes so that it can span several lines), like this:
import xml.etree.ElementTree as ET
data='''<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
<item name="breakfast">Idly</item>
<price>$2.5</price>
<description>
Two idly's with chutney
</description>
<calories>553</calories>
</food>
</metadata>
'''
myroot = ET.fromstring(data)
#print(myroot)
print(myroot.tag)
The code above prints the tag of the root element, metadata, which is the same root as in the previous example. Note that the XML string is only part of "sample.xml", shortened to keep the example readable; you can also use the complete document.
You can retrieve the root tag through the "tag" attribute as follows:
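One difference is worth keeping in mind: fromstring() returns the root Element directly, whereas parse() returns an ElementTree. If you later need write(), you can wrap the element yourself. The following is a minimal sketch of that idea; the shortened XML string and the in-memory buffer (used in place of a real file) are assumptions for the example.

```python
import io
import xml.etree.ElementTree as ET

data = '<metadata><food><item name="breakfast">Idly</item></food></metadata>'

root = ET.fromstring(data)   # an Element, not an ElementTree
tree = ET.ElementTree(root)  # wrap it to get write() and getroot()

buffer = io.BytesIO()        # in-memory stand-in for a file on disk
tree.write(buffer, encoding='utf-8', xml_declaration=True)
print(buffer.getvalue().decode('utf-8'))
```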
example:
print(myroot.tag)
Output: metadata
You can also slice the tag string to show only part of it in the output.
example:
print(myroot.tag[0:4])
Output: meta
Tags can also carry attributes, which ElementTree exposes as a dictionary. To check whether the root tag has any attributes, use the "attrib" attribute as follows:
example:
print(myroot.attrib)
Output: {}
As you can see, the output is an empty dictionary because our root tag has no attributes.
Find the elements of interest:
The root also contains child tags. To retrieve the first child of the root tag, you can index the root as follows:
example:
print(myroot[0].tag)
Output: food
Now, if you want to retrieve all the child tags of the first food element, you can iterate over it with a for loop, as shown below:
example:
for x in myroot[0]:
    print(x.tag, x.attrib)
Output:
item {'name': 'breakfast'}
price {}
description {}
calories {}
All the items returned are the child tags of the first food element, together with their attributes.
To extract text from XML using ElementTree, use the text attribute. For example, to retrieve all the information about the first food, use the following code:
example:
for x in myroot[0]:
    print(x.text)
Output:
Idly
$2.5
Two idly's with chutney
553
The text of each child of the first food element has been returned as output. To display every item together with its price, use findall() to select each food element and find() to locate its item and price children. (The get() method, by contrast, reads an attribute of an element.)
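Because the description text sits on its own line in the XML, its text value includes the surrounding newlines. A minimal sketch of one way to tidy this up, assuming you want plain trimmed strings, is to strip each value:

```python
import xml.etree.ElementTree as ET

data = '''<metadata>
<food>
<item name="breakfast">Idly</item>
<price>$2.5</price>
<description>
Two idly's with chutney
</description>
<calories>553</calories>
</food>
</metadata>'''

root = ET.fromstring(data)
# text can be None for an empty element, so fall back to an empty string
values = [(child.text or '').strip() for child in root[0]]
print(values)
```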
example:
for x in myroot.findall('food'):
    item = x.find('item').text
    price = x.find('price').text
    print(item, price)
Output:
Idly $2.5
Paper Dosa $2.7
Upma $3.65
Bisi Bele Bath $4.50
Kesari Bath $1.95
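The get() method itself is useful when you want to filter on an attribute. The sketch below keeps only the items whose name attribute is "breakfast". The second food entry ("lunch", "Thali") is a made-up addition for this example and does not appear in sample.xml.

```python
import xml.etree.ElementTree as ET

# Shortened, partly hypothetical data: the "lunch" entry is invented here.
data = '''<metadata>
<food><item name="breakfast">Idly</item><price>$2.5</price></food>
<food><item name="lunch">Thali</item><price>$5.0</price></food>
</metadata>'''

root = ET.fromstring(data)
breakfast = []
for food in root.findall('food'):
    item = food.find('item')
    # get() returns the attribute value, or None if the attribute is missing
    if item.get('name') == 'breakfast':
        breakfast.append((item.text, food.find('price').text))

print(breakfast)
```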
The findall() loop prints all five items and the price of each. With ElementTree, you can also modify XML files.
Modify the XML file:
You can manipulate the elements in the XML file. For example, you can assign to an element's text and use the set() method to add or change an attribute. Let's first look at how to add something to XML.
Add to XML:
The following example shows how to add content to the description of each item.
example:
for description in myroot.iter('description'):
    # append a note to the existing text and mark the element as updated
    new_desc = str(description.text) + ' will be served'
    description.text = new_desc
    description.set('updated', 'yes')
mytree.write('new.xml')
The write() method creates a new XML file, here new.xml, and writes the updated tree to it. You can also pass the original file name to overwrite the original file. After executing the code above, you will see that a new file with the updated results has been created.
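To check that a change really reached the file, you can write the tree and parse the result again. The following is a rough sketch of such a round trip; the shortened XML string and the temporary directory are assumptions chosen so that the example leaves no files behind.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

data = "<metadata><food><description>Two idly's with chutney</description></food></metadata>"
tree = ET.ElementTree(ET.fromstring(data))

for description in tree.getroot().iter('description'):
    description.text = description.text + ' will be served'
    description.set('updated', 'yes')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new.xml')
    tree.write(path)                    # write the modified tree
    reloaded = ET.parse(path).getroot() # read it back from disk

desc = reloaded.find('food/description')
print(desc.text, desc.attrib)
```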
The new file contains the modified descriptions of our food items. To add a new child tag, you can use the SubElement() function. For example, to add a new speciality tag to the first item, Idly, you can do the following:
example:
ET.SubElement(myroot[0], 'speciality')
for x in myroot.iter('speciality'):
    new_desc = 'South Indian Special'
    x.text = str(new_desc)
mytree.write('output5.xml')
In output5.xml, a new speciality tag has been added under the first food tag. By changing the index within the [] brackets, you can add the tag to a different food element. Now let's take a look at how to use this module to delete items.
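SubElement() also returns the element it creates, so the text can be set directly without searching for the new tag afterwards. A minimal sketch of this shorter variant, using a shortened XML string and tostring() instead of a file so the result is visible immediately:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<metadata><food><item name="breakfast">Idly</item></food></metadata>'
)

# SubElement appends the new child to root[0] and returns it
speciality = ET.SubElement(root[0], 'speciality')
speciality.text = 'South Indian Special'

result = ET.tostring(root, encoding='unicode')
print(result)
```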
Remove from XML:
To delete an attribute with ElementTree, you can call pop() on the element's attrib dictionary. It removes the named attribute and, when given a default such as None, does nothing if the attribute is absent.
example:
myroot[0][0].attrib.pop('name', None)
# create a new XML file with the results
mytree.write('output5.xml')
In output5.xml, the name attribute has been removed from the first item tag. To delete a complete tag, use the remove() method of its parent element, as follows:
example:
myroot[0].remove(myroot[0][0])
mytree.write('output6.xml')
In output6.xml, the first child element of the first food tag has been deleted. If you want to delete everything inside a tag, you can use the clear() method, as shown below:
example:
myroot[0].clear()
mytree.write('output7.xml')
When the code above is executed, the first food tag is emptied: its child tags, attributes, and text are all removed, although the empty food tag itself remains. So far, we have been using the xml.etree.ElementTree module in this Python XML parser tutorial. Now let's see how to use minidom to parse XML.
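As a recap of the deletion methods, the sketch below contrasts remove() on a child, clear() on an element, and remove() on the parent. The shortened XML string and the kind attribute are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Made-up data: the kind attribute exists only to show what clear() resets.
data = ('<metadata>'
        '<food kind="a"><item>Idly</item><price>$2.5</price></food>'
        '<food><item>Upma</item></food>'
        '</metadata>')
root = ET.fromstring(data)
first = root[0]

first.remove(first.find('price'))  # drops only the <price> child
after_remove = len(first)

first.clear()                      # empties the element but keeps it in the tree
after_clear = (len(root), len(first), first.attrib)

root.remove(first)                 # detaches the whole element from its parent
after_detach = len(root)

print(after_remove, after_clear, after_detach)
```

Only the last call actually takes the food tag out of the document; clear() leaves an empty tag behind.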
xml.dom.minidom module:
This module is mainly used by people who are familiar with the DOM (Document Object Model). DOM applications usually start by parsing XML into a DOM. In xml.dom.minidom, this can be achieved in the following ways:
Use the parse() function:
The first method is to use the parse() function, passing the XML file to be parsed as a parameter. For example:
example:
from xml.dom import minidom
p1 = minidom.parse("sample.xml")
After doing this, you will be able to navigate the parsed document and get the required data. You can also use this function to parse files that are already open.
example:
with open('sample.xml') as dat:
    p2 = minidom.parse(dat)
In this case, the file object for the opened file is passed as a parameter to the parse() function.
Use the parseString() method:
This method is used when you want to provide XML to be parsed as a string.
example:
p3 = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')
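To get a feel for what the parsed string looks like as a DOM, the following sketch lists the children of the document element. It reuses the same string as above and only assumes the standard documentElement, tagName, childNodes, and nodeName members of minidom nodes.

```python
from xml.dom import minidom

p3 = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')
root = p3.documentElement  # the top-level <myxml> element

print(root.tagName)
# Text nodes are children too, and report the node name '#text'
names = [child.nodeName for child in root.childNodes]
print(names)
```

Unlike ElementTree, minidom stores text as separate child nodes, which is why firstChild.data is used later to read the text of a tag.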
You can use any of the above methods to parse XML. Now let's try to use this module to get data.
Find the elements of interest:
After my file is parsed, if I try to print it, the returned output will show a message stating that the variable storing the parsed data is a DOM object.
example:
dat=minidom.parse('sample.xml')
print(dat)
Output:
<xml.dom.minidom.Document object at 0x03B5A308>
Use getElementsByTagName() to access the element:
example:
tagname= dat.getElementsByTagName('item')[0]
print(tagname)
If I take the first element returned by the getElementsByTagName() method, I will see the following output:
Output:
<DOM Element: item at 0xc6bd00>
Note that only one element is returned because I used the [0] index for convenience; it is dropped in the following examples.
To access the value of an attribute, I have to use the value attribute as follows:
example:
dat = minidom.parse('sample.xml')
tagname= dat.getElementsByTagName('item')
print(tagname[0].attributes['name'].value)
Output: breakfast
To retrieve the data present in these tags, you can use the data attribute as shown below:
example:
print(tagname[1].firstChild.data)
Output: Paper Dosa
You can apply the same value attribute to any other element in the list, for example the second item:
Example:
items = dat.getElementsByTagName('item')
print(items[1].attributes['name'].value)
Output: breakfast
To print out all the items available in our menu, you can iterate through these items and return all items.
example:
for x in items:
    print(x.firstChild.data)
Output:
Idly
Paper Dosa
Upma
Bisi Bele Bath
Kesari Bath
To count the number of items on the menu, you can use the len() function as follows:
example:
print(len(items))
The output specifies that our menu contains 5 items.
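The minidom steps above can be combined into one short, self-contained sketch. It uses a shortened XML string instead of sample.xml, and getAttribute() as an alternative to attributes['name'].value; both choices are assumptions made to keep the example runnable on its own.

```python
from xml.dom import minidom

data = '''<metadata>
<food><item name="breakfast">Idly</item></food>
<food><item name="breakfast">Paper Dosa</item></food>
</metadata>'''

doc = minidom.parseString(data)
items = doc.getElementsByTagName('item')

names = [node.firstChild.data for node in items]        # text of each <item>
kinds = [node.getAttribute('name') for node in items]   # value of the name attribute

print(len(items), names, kinds)
```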
This brings us to the end of this Python XML parser tutorial. I hope you have understood everything clearly.