BeautifulSoup innerhtml?

Question

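Let's say I have a page with a div. I can easily get that div with soup.find().

Now that I have the result, I'd like to print the entire innerHTML of that div. By that I mean I need a string containing all the HTML tags and text, exactly like the string I would get in JavaScript with obj.innerHTML. Is this possible?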

Originally posted by Matteo Monti; translation licensed under CC BY-SA 4.0.

python html beautifulsoup innerhtml

1 answer

Community wiki

Posted on 2022-12-19

TL;DR

For BeautifulSoup 4, use element.encode_contents() if you want a UTF-8 encoded bytestring, or element.decode_contents() if you want a Python Unicode string. For example, an equivalent of the DOM's innerHTML property might look something like this:

def innerHTML(element):
    """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
    return element.encode_contents()
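A minimal sketch of the helper in use, assuming bs4 is installed and using Python's built-in "html.parser" backend. The sample markup and the `id="main"` lookup are illustrative assumptions, not part of the original question. For comparison it also calls `decode_contents()` and joins `str()` of each child, which for this markup produces the same text.

```python
from bs4 import BeautifulSoup


def innerHTML(element):
    """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
    return element.encode_contents()


# Illustrative markup (assumption); any page with a <div> works the same way.
html = '<html><body><div id="main"><p>Hello <b>world</b></p></div></body></html>'
soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
div = soup.find("div", id="main")

inner_bytes = innerHTML(div)        # bytes: b'<p>Hello <b>world</b></p>'
inner_text = div.decode_contents()  # str:   '<p>Hello <b>world</b></p>'

# With the default "minimal" formatter this matches joining each child's str().
joined = "".join(str(child) for child in div.contents)

print(inner_bytes)
print(inner_text)
```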

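At the time of writing, these functions were not in the online documentation, so I'll quote the current function definitions and docstrings from the code.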

`encode_contents` - since 4.0.4

def encode_contents(
    self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
    formatter="minimal"):
    """Renders the contents of this tag as a bytestring.

    :param indent_level: Each line of the rendering will be
       indented this many spaces.

    :param encoding: The bytestring will be in this encoding.

    :param formatter: The output formatter responsible for converting
       entities to Unicode characters.
    """

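See also the documentation on formatters. You will most likely use formatter="minimal" (the default) or formatter="html" (to get HTML entities), unless you want to process the text manually in some way.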
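To illustrate the difference between the two formatters, here is a small sketch on assumed sample text containing accented characters and an ampersand (again assuming bs4 with "html.parser"). The exact "html" output can vary slightly between bs4 versions, so the comments only describe its general effect.

```python
from bs4 import BeautifulSoup

# Sample markup is an assumption chosen to contain non-ASCII characters and "&".
soup = BeautifulSoup('<div><p>café &amp; crème</p></div>', "html.parser")
div = soup.div

# "minimal" (default): only &, < and > are escaped; é and è stay as Unicode.
minimal_out = div.decode_contents(formatter="minimal")

# "html": characters that have named HTML entities are converted, e.g. é -> &eacute;
html_out = div.decode_contents(formatter="html")

print(minimal_out)
print(html_out)
```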

encode_contents returns an encoded bytestring. If you want a Python Unicode string, use decode_contents instead.
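A short sketch of the bytes-versus-str distinction, on assumed sample markup with bs4 and "html.parser". `encode_contents` takes the `encoding` parameter from the signature above; `"latin-1"` is just an example value.

```python
from bs4 import BeautifulSoup

# Illustrative markup (assumption) with one non-ASCII character.
soup = BeautifulSoup("<div><p>café</p></div>", "html.parser")
div = soup.div

as_bytes = div.encode_contents()                      # UTF-8 bytes by default
as_latin1 = div.encode_contents(encoding="latin-1")  # same markup, other encoding
as_str = div.decode_contents()                        # Python str, no encoding step

print(type(as_bytes).__name__, as_bytes)
print(type(as_str).__name__, as_str)
```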

`decode_contents` - since 4.0.1

decode_contents does the same thing as encode_contents, but returns a Python Unicode string instead of an encoded bytestring.

def decode_contents(self, indent_level=None,
                    eventual_encoding=DEFAULT_OUTPUT_ENCODING,
                    formatter="minimal"):
    """Renders the contents of this tag as a Unicode string.

    :param indent_level: Each line of the rendering will be
       indented this many spaces.

    :param eventual_encoding: The tag is destined to be
       encoded into this encoding. This method is _not_
       responsible for performing that encoding. This information
       is passed in so that it can be substituted in if the
       document contains a <META> tag that mentions the document's
       encoding.

    :param formatter: The output formatter responsible for converting
       entities to Unicode characters.
    """
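Since the question compares this to JavaScript, it may help to contrast inner and outer markup. A minimal sketch, assuming bs4 with "html.parser" and illustrative markup: `str(tag)` renders the tag itself plus its contents (roughly outerHTML), while `decode_contents()` renders only the children (roughly innerHTML).

```python
from bs4 import BeautifulSoup

# Illustrative markup (assumption).
soup = BeautifulSoup('<div id="main"><p>Hi</p></div>', "html.parser")
div = soup.find("div")

outer = str(div)               # like outerHTML: includes the <div ...> tag itself
inner = div.decode_contents()  # like innerHTML: only what is inside the div

print(outer)
print(inner)
```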

BeautifulSoup 3

BeautifulSoup 3 doesn't have the functions above; instead it has renderContents:

def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                   prettyPrint=False, indentLevel=0):
    """Renders the contents of this tag as a string in the given
    encoding. If encoding is None, returns a Unicode string."""

For compatibility with BS3, this function was added back to BeautifulSoup 4 (in 4.0.4).
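A small sketch of that compatibility in a current bs4 install (assumed, with "html.parser" and illustrative markup): with its defaults, `renderContents()` returns the same UTF-8 bytestring as `encode_contents()`. The warning filter is only a precaution, since newer releases may emit a deprecation warning for the BS3-style name.

```python
import warnings

from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>Hi</p></div>", "html.parser")
div = soup.div

with warnings.catch_warnings():
    # Newer bs4 releases may flag renderContents as deprecated; silence that here.
    warnings.simplefilter("ignore", DeprecationWarning)
    legacy = div.renderContents()  # BS3-style name, returns UTF-8 bytes by default

modern = div.encode_contents()
print(legacy == modern)
```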

Originally posted by ChrisD; translation licensed under CC BY-SA 4.0.


This content is translated from Stack Overflow.
