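I am using jieba to segment a set of danmaku (bullet comment) texts, saved from Excel as a txt file, so that they can be classified. The code is as follows: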
import jieba
jieba.load_userdict("/Users/arthurccli/Downloads/弹幕/user_dict2.txt")
with open('/Users/arthurccli/Downloads/弹幕/corpus_seg.txt','w+') as fw:
    name2label = {'A1': '__label__A1', 'A2': '__label__A2', 'A3': '__label__A3', 'A4': '__label__A4'}
    for lines in open( "/Users/arthurccli/Downloads/弹幕/corpus.txt", 'r'):
        name,content =lines.strip().split('\t')
        label = name2label.get(name)
        words = jieba.cut(content)
        fw.write('%s\t%s'%(label,' '.join(words))+'\n')
    fw.close()
Running it produces this error:
Traceback (most recent call last):
  File "model_train2.py", line 12, in <module>
    name,content =lines.strip().split('\t')
ValueError: too many values to unpack
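I looked up some answers online, and some people say the number of variables does not match, but I still don't understand why that would cause an error here.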
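
For reference, here is a minimal sketch of what the "number of variables does not match" explanation would mean in this case. It assumes that at least one line in corpus.txt contains more than one tab (for example, because the Excel sheet had extra columns or a cell with a tab in it), so that split('\t') returns three or more pieces while only two names sit on the left-hand side. The sample lines and the helper parse_line are hypothetical; the real corpus.txt is not shown here.

```python
# Hypothetical sample lines standing in for corpus.txt (the real file is not shown).
sample_lines = [
    "A1\tfirst comment\n",         # one tab  -> 2 fields, unpacks fine
    "A2\tsecond\tcomment\n",       # two tabs -> 3 fields, "too many values to unpack"
]

# Count how many fields each line yields with an unrestricted split.
field_counts = [len(line.strip().split('\t')) for line in sample_lines]

def parse_line(line):
    """Split on the first tab only, so extra tabs stay inside the content."""
    parts = line.strip().split('\t', 1)
    if len(parts) != 2:
        return None  # line has no tab at all; caller can skip it
    return parts[0], parts[1]

parsed = [parse_line(line) for line in sample_lines]
```

If this assumption holds, checking the field count line by line should point to the offending rows, and limiting the split to the first tab is one possible way to keep the rest of the line as content.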