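Since the question is tagged nltk, you are probably asking how to call the NLTK API:

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# One-time setup if the corpus is missing: import nltk; nltk.download('stopwords')
tokenizer = RegexpTokenizer(r'\w+')  # keep runs of word characters, drop punctuation
stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))

# Example input, added for illustration; replace it with your own text
input_str = 'The quick brown foxes are jumping over the lazy dogs'
input_str = input_str.lower()
raw_tokens = tokenizer.tokenize(input_str)

# Drop stop words before stemming: the stop-word list holds unstemmed forms,
# and stemming can change them (e.g. "this" becomes "thi"), so filtering
# after stemming would let some stop words through
tokens_without_stopword = [token for token in raw_tokens if token not in stop_words]

# Stem what is left; list(map(stemmer.stem, tokens_without_stopword)) is equivalent
stemmed_tokens_without_stopword = [stemmer.stem(token) for token in tokens_without_stopword]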
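If you need this in more than one place, the same steps (lowercase, tokenize, drop stop words, stem) can be wrapped in a small function. The following is only a sketch under a few assumptions of mine: `preprocess`, `toy_stem` and `toy_stop_words` are hypothetical names, not part of NLTK; tokenizing with `re.findall(r"\w+", ...)` is meant to behave like `RegexpTokenizer(r'\w+')` as far as I know; and the toy stemmer and stop-word set are stand-ins so the example runs without NLTK or its corpora. It filters stop words before stemming so that the list is compared against unstemmed tokens.

```python
import re
from typing import Callable, Iterable, List


def preprocess(text: str,
               stem: Callable[[str], str],
               stop_words: Iterable[str]) -> List[str]:
    """Lowercase, split into word-character tokens, drop stop words, then stem."""
    stop_set = set(stop_words)
    # Assumption: findall on word characters stands in for RegexpTokenizer
    tokens = re.findall(r"\w+", text.lower())
    # Filter first, stem second, so stop words are matched in unstemmed form
    return [stem(token) for token in tokens if token not in stop_set]


def toy_stem(token: str) -> str:
    # Hypothetical stand-in: strips one trailing "s"; NOT the Porter algorithm
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


# Hypothetical stop-word set, much smaller than NLTK's English list
toy_stop_words = {"the", "is", "a", "on"}

result = preprocess("The cats sit on a mat", toy_stem, toy_stop_words)
print(result)
```

With NLTK installed, you would pass `PorterStemmer().stem` as `stem` and `stopwords.words('english')` as `stop_words` instead of the toy versions.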