I am reading an XHTML file:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<meta name="calibre:cover" content="true"/>
<title>Cover</title>
<style type="text/css" title="override_css">
@page {padding: 0pt; margin:0pt}
body { text-align: center; padding:0pt; margin: 0pt; }
</style>
</head>
<body>
<div>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 200 266" preserveAspectRatio="none">
<image width="200" height="266" xlink:href="cover1.jpeg"/>
</svg>
</div>
</body>
</html>
I want to save the contents of its body to a new HTML file:
from lxml import etree

with open("test.xhtml", 'r', encoding='utf8') as html:
    tree = etree.parse(html)

# Use a relative path ('.//'); a leading '//' on an ElementTree raises a FutureWarning
body = tree.find(
    './/xhtml:body',
    namespaces={'xhtml': 'http://www.w3.org/1999/xhtml'}
)
nsmap = body.nsmap
# Without nsmap here, every child tag would carry its own namespace declaration
page_xml = etree.Element('div', nsmap=nsmap)
# Copy the child list first, because append() moves nodes out of body
for child in list(body):
    page_xml.append(child)
etree.ElementTree(page_xml).write(
    "new.html",
    pretty_print=True,
    encoding='utf-8',
    method='html'
)
The resulting new.html has an extra xmlns on the outer div. How can I remove it?
<div xmlns="http://www.w3.org/1999/xhtml">
<div>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 200 266" preserveAspectRatio="none">
<image width="200" height="266" xlink:href="cover1.jpeg"/>
</svg>
</div>
</div>
Parse the file with the HTML parser instead, and no namespace is attached:
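A minimal sketch of that idea, assuming lxml's `etree.HTMLParser` is used in place of the default XML parser. The `XHTML` sample and the `body_as_div` helper are hypothetical names introduced here for illustration; the sample is a shortened version of the cover page from the question. Because the HTML parser is not namespace-aware, the elements come back without a namespace, so the wrapper `div` needs no `nsmap`.

```python
from lxml import etree

# Hypothetical sample input, shortened from the cover page in the question.
XHTML = b"""<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Cover</title></head>
<body>
<div>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" viewBox="0 0 200 266">
<image width="200" height="266" xlink:href="cover1.jpeg"/>
</svg>
</div>
</body>
</html>"""


def body_as_div(source: bytes) -> bytes:
    """Wrap the children of <body> in a plain <div> and serialize as HTML."""
    # Parse with the HTML parser: tags are plain 'body', 'div', ... with no
    # XHTML namespace, so find() needs no namespaces mapping.
    root = etree.fromstring(source, parser=etree.HTMLParser())
    body = root.find('.//body')
    # No nsmap argument: nothing in this tree is namespaced.
    wrapper = etree.Element('div')
    # Copy the child list first, because append() moves nodes out of body.
    for child in list(body):
        wrapper.append(child)
    return etree.tostring(
        wrapper, pretty_print=True, encoding='utf-8', method='html'
    )


if __name__ == '__main__':
    print(body_as_div(XHTML).decode('utf-8'))
```

To work from a file as in the question, `etree.parse("test.xhtml", etree.HTMLParser())` followed by `.getroot()` should give the same kind of tree. The input is passed as bytes here because lxml rejects `str` input that carries an XML encoding declaration.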