How do I remove namespaces with lxml?

I am reading an XHTML file:

<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
        <meta name="calibre:cover" content="true"/>
        <title>Cover</title>
        <style type="text/css" title="override_css">
            @page {padding: 0pt; margin:0pt}
            body { text-align: center; padding:0pt; margin: 0pt; }
        </style>
    </head>
    <body>
        <div>
            <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 200 266" preserveAspectRatio="none">
                <image width="200" height="266" xlink:href="cover1.jpeg"/>
            </svg>
        </div>
    </body>
</html>

I want to save the contents of its body to a new HTML file:

from lxml import etree
with open("test.xhtml", 'r', encoding='utf8') as html:
    tree = etree.parse(html)
    body = tree.find(
        './/xmlns:body',
         namespaces={'xmlns': 'http://www.w3.org/1999/xhtml'}
    )
    nsmap = body.nsmap
    # Without nsmap here, every moved tag gets its own namespace declaration
    page_xml = etree.Element('div', nsmap=nsmap)
    for child in body.iterchildren():
        page_xml.append(child)
    etree.ElementTree(page_xml).write(
        "new.html",
        pretty_print=True,
        encoding='utf-8',
        method='html'
    )

The resulting new.html ends up with an extra xmlns declaration on the root div. How can I get rid of it?

<div xmlns="http://www.w3.org/1999/xhtml">
    <div>
        <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 200 266" preserveAspectRatio="none">
            <image width="200" height="266" xlink:href="cover1.jpeg"/>
        </svg>
    </div>
</div>
1 Answer

Parse the document with the HTML parser instead of the XML parser; elements parsed as HTML carry no namespace:

# -*- coding: utf-8 -*-

from lxml import etree

content = '''
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
        <meta name="calibre:cover" content="true"/>
        <title>Cover</title>
        <style type="text/css" title="override_css">
            @page {padding: 0pt; margin:0pt}
            body { text-align: center; padding:0pt; margin: 0pt; }
        </style>
    </head>
    <body>
        <div>
            <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 200 266" preserveAspectRatio="none">
                <image width="200" height="266" xlink:href="cover1.jpeg"/>
            </svg>
        </div>
    </body>
</html>
'''

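# Parse with the HTML parser, then serialize the first child of <body>.
# Passing bytes sidesteps lxml's restriction on str input that carries
# an encoding declaration.
first_child = etree.HTML(content.encode('utf-8')).xpath('//body/*')[0]
print(etree.tostring(first_child, encoding='unicode'))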
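If you would rather keep the XML parser, another option is to strip the XHTML namespace from the element tags yourself and then let lxml drop the unused declaration. The following is only a minimal sketch under a few assumptions: `strip_namespace` is a hypothetical helper name (not part of lxml), the sample document is a trimmed version of the one in the question, and only element tags are rewritten (namespaced attributes such as `xml:lang` are left alone).

```python
from lxml import etree

XHTML_NS = "http://www.w3.org/1999/xhtml"


def strip_namespace(root, namespace=XHTML_NS):
    """Hypothetical helper: move every element in `namespace` to no namespace, in place."""
    prefix = "{%s}" % namespace
    for el in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            el.tag = el.tag[len(prefix):]
    # Remove namespace declarations that no element or attribute uses any more.
    etree.cleanup_namespaces(root)
    return root


# Trimmed version of the XHTML from the question (bytes, so the
# encoding declaration is accepted by the XML parser).
xhtml = b"""<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml"><body><div>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 200 266">
<image width="200" height="266" xlink:href="cover1.jpeg"/>
</svg></div></body></html>"""

root = etree.fromstring(xhtml)
body = root.find("{%s}body" % XHTML_NS)

page = etree.Element("div")   # plain <div>, no nsmap needed
page.extend(list(body))       # moves the children of <body> into the new root
strip_namespace(page)

result = etree.tostring(page, encoding="unicode")
print(result)
```

The SVG subtree keeps its own `xmlns` and `xmlns:xlink` declarations, because those namespaces are still in use; only the XHTML one disappears. To write a file instead of printing, `etree.ElementTree(page).write(...)` can be called as in the question.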