Chinese characters come back garbled when fetching with urllib in Python

# translation.py
import urllib.request
import urllib.parse


url = 'http://fanyi.baidu.com/v2transapi'
data = {}


data['from'] = 'en'
data['to'] = 'zh'
data['query'] = 'Most solar heating systems use large aluminum or alloy sheets, painted black to absorb the sun\'s heat. '
data['transtype'] = 'realtime'
data['simple_means_flag'] = '3'


data = urllib.parse.urlencode(data).encode('utf-8')



response = urllib.request.urlopen(url, data)


html = response.read().decode('utf-8')


with open('fanyi.txt', 'w', encoding='utf-8') as fanyi:
    fanyi.write(html)
print(html)

After running this code, all of the returned Chinese characters show up as Unicode escape codes: \u5927\u591a\u6570\u592a\u9633\u80fd\u52a0\u70ed\u7cfb\u7edf\u4f7f\u7528\u5927\u7684\u94dd\u6216\u5408\u91d1\u677f\uff0c\u6d82\u4e0a\u9ed1\u8272\u4ee5\u5438\u6536\u592a\u9633\u7684\u70ed\u91cf\u3002

How can I change the code so that the Chinese characters display properly?

2 answers

The response is JSON data, so it has to be parsed before the text is usable:

import urllib.request
import urllib.parse
import json


url = 'http://fanyi.baidu.com/v2transapi'
data = {}


data['from'] = 'en'
data['to'] = 'zh'
data['query'] = 'Most solar heating systems use large aluminum or alloy sheets, painted black to absorb the sun\'s heat. '
data['transtype'] = 'realtime'
data['simple_means_flag'] = '3'


data = urllib.parse.urlencode(data).encode('utf-8')



response = urllib.request.urlopen(url, data)


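# The body is JSON, so Chinese characters arrive as \uXXXX escape sequences.
html = response.read().decode('utf-8')
# json.loads turns those escapes back into real characters.
_html = json.loads(html)


with open('fanyi.txt', 'w', encoding='utf-8') as fanyi:
    # ensure_ascii=False writes the characters themselves, not the escapes.
    fanyi.write(json.dumps(_html, ensure_ascii=False))
print(_html)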
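
The effect of the JSON step can be checked without calling the translation service. The sketch below is only an illustration: the thread does not show the layout of the real response, so the payload and its "dst" key are made up. It shows that json.loads converts the \uXXXX escapes into characters, and that json.dumps escapes them again unless ensure_ascii=False is passed.

```python
import json

# Hypothetical payload: the real response layout is not shown in the thread,
# so the "dst" key is an assumption made for this example only.
raw = '{"dst": "\\u5927\\u591a\\u6570\\u592a\\u9633\\u80fd"}'

# Parsing the JSON text converts \uXXXX escapes into real characters.
parsed = json.loads(raw)
text = parsed["dst"]

# Serializing again: the default re-escapes non-ASCII characters,
# while ensure_ascii=False keeps them readable.
escaped = json.dumps(parsed)
readable = json.dumps(parsed, ensure_ascii=False)

print(text)
print(escaped)
print(readable)
```

If the goal is a readable fanyi.txt, the ensure_ascii=False form is probably the one to write to the file; the raw response text still contains the escapes.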

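Getting this output means your program is running correctly: the \uXXXX sequences are just the escaped form of the Chinese text, as this Python 2 session shows.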

>>> print u'\u5927\u591a\u6570\u592a\u9633\u80fd\u52a0\u70ed\u7cfb\u7edf\u4f7f\u7528\u5927\u7684\u94dd\u6216\u5408\u91d1\u677f\uff0c\u6d82\u4e0a\u9ed1\u8272\u4ee5\u5438\u6536\u592a\u9633\u7684\u70ed\u91cf\u3002'
大多数太阳能加热系统使用大的铝或合金板,涂上黑色以吸收太阳的热量。
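
The same check in Python 3, which the question's code targets, might look like the sketch below. It assumes the escaped text is pure ASCII, which is the only case where the unicode_escape codec is safe to use this way; for a full JSON response, json.loads is the more robust route.

```python
# In a Python 3 string literal, \uXXXX escapes are already the characters.
literal = '\u5927\u591a\u6570'

# Text read from the response contains literal backslashes instead,
# so each character takes six characters on screen.
raw = r'\u5927\u591a\u6570'

# Assumption: raw is ASCII-only, so it can be decoded with unicode_escape.
decoded = raw.encode('ascii').decode('unicode_escape')

print(literal)
print(len(raw))
print(decoded)
```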