A Python script that prints jieba word-segmentation results, as shown below:
test.py
# -*- coding: utf-8 -*-
import jieba
print("/".join(jieba.cut("hello world")))
print("/".join(jieba.cut("你好世界")))
Running the script works as expected:
python test.py
Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/1k/p6dwx_0x28b6y752c1269hhw0000gn/T/jieba.cache
Loading model cost 0.398 seconds.
Prefix dict has been built succesfully.
hello/ /world
你好/世界
I didn't want to see jieba's extra log messages, so I hid them with Linux output redirection:
python test.py 2>/dev/null
hello/ /world
你好/世界
From this I reasoned that if I suppress the normal output instead (1>/dev/null), I should see only the difference
python test.py 1>/dev/null
= python test.py
- python test.py 2>/dev/null
that is:
Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/1k/p6dwx_0x28b6y752c1269hhw0000gn/T/jieba.cache
Loading model cost 0.398 seconds.
Prefix dict has been built succesfully.
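The reasoning above treats what appears in the terminal as stdout plus stderr, with jieba's log lines going to stderr (which is what hiding them with 2>/dev/null suggests). A minimal sketch of that model using subprocess follows; it runs a hypothetical stand-in script instead of test.py, so jieba is not needed, and the stand-in prints only ASCII text.

```python
import subprocess
import sys

# Hypothetical stand-in for test.py: one log line on stderr (like jieba's
# "Building prefix dict ..." messages) and one result line on stdout.
STAND_IN = (
    "import sys\n"
    "sys.stderr.write('Building prefix dict ...\\n')\n"
    "print('hello/ /world')\n"
)

def run(stdout, stderr):
    """Run the stand-in with the given targets for stdout and stderr."""
    return subprocess.run([sys.executable, "-c", STAND_IN],
                          stdout=stdout, stderr=stderr)

# Like `python test.py 1>/dev/null`: discard stdout, capture stderr.
only_err = run(subprocess.DEVNULL, subprocess.PIPE)
# Like `python test.py 2>/dev/null`: capture stdout, discard stderr.
only_out = run(subprocess.PIPE, subprocess.DEVNULL)

print(only_err.stderr.decode())  # only the log line
print(only_out.stdout.decode())  # only the result line
```

With ASCII-only output, discarding one stream leaves exactly the other.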
But the actual output contains more:
python test.py 1>/dev/null
Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/1k/p6dwx_0x28b6y752c1269hhw0000gn/T/jieba.cache
Loading model cost 0.379 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    print("/".join(jieba.cut("你好世界")))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
Where does this extra output come from? And why doesn't it appear when I simply run python test.py?
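Are you on Python 2? In Python 2, when standard output is redirected to a file (including /dev/null) instead of a terminal, printing unicode text falls back to ASCII encoding, so printing Chinese characters raises an error. When stdout is a terminal, as in the plain python test.py and 2>/dev/null runs, Python uses the terminal's encoding (typically UTF-8), which is why the error does not appear there.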
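To see the failing step in isolation, here is a small Python 3 sketch that performs the ASCII encoding explicitly. The assumption is that this approximates what Python 2 does implicitly when printing the unicode string u"你好/世界" (the joined jieba output) to a redirected stdout.

```python
# The string jieba produces for "你好世界" once joined with "/".
text = "你好/世界"

message = None
try:
    # Approximately what Python 2 does implicitly when printing unicode
    # to a redirected stdout: encode with the ASCII codec.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

# UTF-8 can represent the same text: four 3-byte CJK characters plus "/".
print(len(text.encode("utf-8")))  # 13
```

The message matches the traceback in the question: positions 0-1 are "你好", the first run of non-ASCII characters before the "/".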
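The fix is to set the PYTHONIOENCODING environment variable to utf-8 (for example, PYTHONIOENCODING=utf-8 python test.py 1>/dev/null), or to switch to Python 3.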
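A sketch of the PYTHONIOENCODING effect, assuming a Python 3 interpreter: it runs a hypothetical one-line child program with stdout piped (so not a terminal), first with PYTHONIOENCODING=ascii to simulate Python 2's ASCII fallback, then with utf-8 as suggested above.

```python
import os
import subprocess
import sys

# Hypothetical child program: prints "你好/世界" (written with escapes so the
# command line itself stays ASCII).
CHILD = "print('\\u4f60\\u597d/\\u4e16\\u754c')"
expected = "\u4f60\u597d/\u4e16\u754c"

def run_child(io_encoding):
    """Run CHILD with stdout piped and PYTHONIOENCODING set to io_encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run([sys.executable, "-c", CHILD], env=env,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Simulated Python 2 behaviour: ASCII stdout encoding -> UnicodeEncodeError.
broken = run_child("ascii")
# The suggested fix: force UTF-8 for the standard streams.
fixed = run_child("utf-8")

print("ascii:", broken.returncode, b"UnicodeEncodeError" in broken.stderr)
print("utf-8:", fixed.returncode, fixed.stdout.decode("utf-8").strip() == expected)
```

Checking the decoded output with a comparison, rather than printing it, keeps the demo itself safe on a stdout that cannot encode Chinese.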