from nltk.tokenize.stanford_segmenter import StanfordSegmenter

# The paths below point at the author's local install; adjust them to your own.
segmenter = StanfordSegmenter(
    path_to_jar="E:/anaconda/StanfordNLTK/stanford-segmenter.jar",
    java_class="edu.stanford.nlp.ie.crf.CRFClassifier",
    path_to_slf4j="E:/anaconda/StanfordNLTK/slf4j-api.jar",
    path_to_sihan_corpora_dict="E:/anaconda/StanfordNLTK/data",
    path_to_model="E:/anaconda/StanfordNLTK/data/pku.gz",
    path_to_dict="E:/anaconda/StanfordNLTK/data/dict-chris6.ser.gz",
)
Running this produces the following warning:
E:pycharmpycharmpycharm2017pjbpycharm-professional-2017.2.3helperspydevpydevconsole.py:6: DeprecationWarning:
The StanfordTokenizer will be deprecated in version 3.2.5.
Please use nltk.parse.corenlp.CoreNLPTokenizer instead.'
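Upgrading nltk to version 3.3 or 3.4 makes the warning go away. If the environment variables have been set beforehand, path_to_jar and path_to_slf4j do not need to be passed. The remaining four parameters (java_class, path_to_sihan_corpora_dict, path_to_model and path_to_dict) still have to be passed.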
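
The rule about which arguments can be dropped could be sketched roughly as below. This is only an illustration, not part of NLTK: the helper segmenter_kwargs is hypothetical, and the environment variable names STANFORD_SEGMENTER and SLF4J are an assumption about what NLTK looks up, so check them (and whether they should hold a directory or a jar path) against your installed version.

```python
import os


def segmenter_kwargs(jar_dir, data_dir, env=None):
    """Build keyword arguments for StanfordSegmenter (hypothetical helper).

    The two jar paths are added only when the matching environment
    variable is missing; the other four arguments are always included.
    """
    env = os.environ if env is None else env
    kwargs = {
        "java_class": "edu.stanford.nlp.ie.crf.CRFClassifier",
        "path_to_sihan_corpora_dict": data_dir,
        "path_to_model": data_dir + "/pku.gz",
        "path_to_dict": data_dir + "/dict-chris6.ser.gz",
    }
    # Assumed variable names; verify them for your NLTK version.
    if "STANFORD_SEGMENTER" not in env:
        kwargs["path_to_jar"] = jar_dir + "/stanford-segmenter.jar"
    if "SLF4J" not in env:
        kwargs["path_to_slf4j"] = jar_dir + "/slf4j-api.jar"
    return kwargs


kwargs = segmenter_kwargs("E:/anaconda/StanfordNLTK", "E:/anaconda/StanfordNLTK/data")
# With NLTK, Java and the segmenter files installed, this would then be:
# segmenter = StanfordSegmenter(**kwargs)
```

Building the arguments as a dict keeps the same call working on machines with and without the environment variables set.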