Getting the current node: etree.tostring
Displaying Chinese correctly
Method 1: use the unescape function from the html library
html.unescape
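from lxml import etree
import html

# Read the saved page as UTF-8 text
with open('list.html', 'r', encoding='utf-8') as f:
    text = f.read()

tree = etree.HTML(text)
node = tree.xpath('//*[@id="scroll_marquee"]')[0]
# By default tostring() returns ASCII bytes, with Chinese characters written
# as &#NNNN; references; html.unescape converts them back to characters
r = html.unescape(etree.tostring(node).decode('utf-8'))
print(r)
print(type(r))  # <class 'str'>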
Reference: "Fixing garbled Chinese ("&#number;") when calling tostring() while scraping web pages"
Method 2: use the etree.tostring method from the lxml library, passing encoding="utf-8"
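from lxml import etree
import requests

response = requests.get('https://www.baidu.com/').text
tree = etree.HTML(response)
# xpath() returns a list; take the first (and only) <body> element
body = tree.xpath("//body")[0]
# encoding="utf-8" makes tostring() emit UTF-8 bytes instead of &#NNNN; references
strs = str(etree.tostring(body, encoding="utf-8"), encoding='utf-8')
print(strs)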
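As a variation on Method 2, tostring() also accepts encoding="unicode", which returns a str directly and skips the decode step. The sketch below uses a hypothetical inline string (SAMPLE) rather than a live request, so it can be run offline; it is an illustration, not part of the original notes.

```python
from lxml import etree

# Hypothetical inline document standing in for a downloaded page.
SAMPLE = '<html><body><p>你好世界</p></body></html>'

body = etree.HTML(SAMPLE).xpath('//body')[0]

# encoding="utf-8" returns UTF-8 bytes that still need decoding.
as_bytes = etree.tostring(body, encoding='utf-8')

# encoding="unicode" returns a str with the characters intact.
as_text = etree.tostring(body, encoding='unicode')

print(type(as_bytes), type(as_text))
print(as_text)
```

Both calls keep the Chinese characters as-is; the only difference is whether the result is bytes or str.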