NLTK Chinese word segmentation error (StanfordSegmenter)

from nltk.tokenize import StanfordSegmenter
segmenter = StanfordSegmenter(
    path_to_sihan_corpora_dict="E:/NLP/NLP_code/Installation/base/stanford-segmenter-2017-06-09/data",
    path_to_model="E:/NLP/NLP_code/Installation/base/stanford-segmenter-2017-06-09/data/pku.gz",
    path_to_dict="E:/NLP/NLP_code/Installation/base/stanford-segmenter-2017-06-09/data/dict-chris6.ser.gz")
res = segmenter.segment(u"北海已成为中国对外开放中升起的一颗明星")
print(res)

C:\Users\lybroman\AppData\Local\Programs\Python\Python36-32\python.exe D:/programming/leetcode/test.py
D:/programming/leetcode/test.py:3: DeprecationWarning: 
The StanfordTokenizer will be deprecated in version 3.2.5.
Please use nltk.parse.corenlp.CoreNLPTokenizer instead.'
  path_to_sihan_corpora_dict="E:/NLP/NLP_code/Installation/base/stanford-segmenter-2017-06-09/data",   path_to_model="E:/NLP/NLP_code/Installation/base/stanford-segmenter-2017-06-09/data/pku.gz",   path_to_dict="E:/NLP/NLP_code/Installation/base/stanford-segmenter-2017-06-09/data/dict-chris6.ser.gz")
Traceback (most recent call last):
  File "D:/programming/leetcode/test.py", line 4, in <module>
    res = segmenter.segment(u"北海已成为中国对外开放中升起的一颗明星")
  File "C:\Users\lybroman\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nltk\tokenize\stanford_segmenter.py", line 182, in segment
    return self.segment_sents([tokens])
  File "C:\Users\lybroman\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nltk\tokenize\stanford_segmenter.py", line 210, in segment_sents
    stdout = self._execute(cmd)
  File "C:\Users\lybroman\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nltk\tokenize\stanford_segmenter.py", line 229, in _execute
    stdout, _stderr = java(cmd, classpath=self._stanford_jar, stdout=PIPE, stderr=PIPE)
  File "C:\Users\lybroman\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nltk\internals.py", line 129, in java
    p = subprocess.Popen(cmd, stdin=stdin, stdout=stdout, stderr=stderr)
  File "C:\Users\lybroman\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\lybroman\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 971, in _execute_child
    args = list2cmdline(args)
  File "C:\Users\lybroman\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 461, in list2cmdline
    needquote = (" " in arg) or ("\t" in arg) or not arg
TypeError: argument of type 'NoneType' is not iterable
3 Answers

Some parameters are missing; java_class is required. Without it, the first element of the Java command line that the segmenter builds is None, which is exactly what triggers the TypeError in list2cmdline.
segmenter = StanfordSegmenter(
    java_class='edu.stanford.nlp.ie.crf.CRFClassifier',
    path_to_jar='/home/kenwood/stanford/segmenter/stanford-segmenter.jar',
    path_to_slf4j='/home/kenwood/stanford/segmenter/slf4j-api.jar',
    path_to_sihan_corpora_dict='/home/kenwood/stanford/segmenter/data',
    path_to_model='/home/kenwood/stanford/segmenter/data/pku.gz',
    path_to_dict='/home/kenwood/stanford/segmenter/data/dict-chris6.ser.gz'
)
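
For completeness: with java_class and the jar paths supplied, the call from the question should go through, as long as NLTK can locate the java binary (on PATH, or via the JAVAHOME environment variable). A minimal usage sketch reusing the segmenter constructed above:

# segment() returns the segmented sentence as one string, tokens separated by spaces.
res = segmenter.segment(u"北海已成为中国对外开放中升起的一颗明星")
print(res)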


I added java_class='edu.stanford.nlp.ie.crf.CRFClassifier' and it still raises an error.

Stanford segmenter setup on Windows (with the required environment variables set beforehand). Line 153 of the source shows that java_class must be passed; see https://github.com/nltk/nltk/...

segmenter = StanfordSegmenter(
    # Use raw strings (or forward slashes) for Windows paths so the
    # backslashes are not treated as escape sequences.
    path_to_sihan_corpora_dict=r"E:\stanford_nlp\stanford-segmenter-2018-10-16\data",
    path_to_model=r"E:\stanford_nlp\stanford-segmenter-2018-10-16\data\pku.gz",
    path_to_dict=r"E:\stanford_nlp\stanford-segmenter-2018-10-16\data\dict-chris6.ser.gz",
    java_class='edu.stanford.nlp.ie.crf.CRFClassifier')
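
Putting the Windows pieces together, here is a minimal end-to-end sketch. It assumes NLTK finds the Java binary via the JAVAHOME environment variable (or PATH); the JDK path and the jar file names are placeholders, so check the actual file names in your stanford-segmenter download (some releases ship a versioned jar such as stanford-segmenter-3.9.2.jar):

import os
from nltk.tokenize import StanfordSegmenter

# Hypothetical JDK location -- adjust to your machine, or skip this line
# if java.exe is already on PATH.
os.environ['JAVAHOME'] = r"C:\Program Files\Java\jdk1.8.0_161\bin"

seg_home = r"E:\stanford_nlp\stanford-segmenter-2018-10-16"

segmenter = StanfordSegmenter(
    java_class='edu.stanford.nlp.ie.crf.CRFClassifier',
    # Check these jar names against the files actually present in seg_home.
    path_to_jar=os.path.join(seg_home, "stanford-segmenter.jar"),
    path_to_slf4j=os.path.join(seg_home, "slf4j-api.jar"),
    path_to_sihan_corpora_dict=os.path.join(seg_home, "data"),
    path_to_model=os.path.join(seg_home, "data", "pku.gz"),
    path_to_dict=os.path.join(seg_home, "data", "dict-chris6.ser.gz"))

res = segmenter.segment(u"北海已成为中国对外开放中升起的一颗明星")
print(res)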