# Perplexity Explained

jasperyang

## Oh my

Wikipedia describes three kinds of perplexity. Below is my short translation of them; if you'd rather not read it, just skip ahead.

1. Perplexity of a probability distribution
2. Perplexity of a probability model
3. Perplexity per word (this is the method used below)
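
To make the three definitions concrete, here is a minimal sketch in Python. It assumes the standard definitions: the perplexity of a distribution p is 2 raised to its entropy in bits, the perplexity of a model is 2 raised to the average negative log2-probability it assigns to the test samples, and perplexity per word is the same quantity averaged over words (written here with natural logs, which gives the same value). The function names are illustrative and not taken from any library.

```python
import math


def distribution_perplexity(p):
    """Perplexity of a discrete distribution: 2 ** H(p), with H in bits."""
    entropy = -sum(px * math.log2(px) for px in p if px > 0)
    return 2 ** entropy


def model_perplexity(q_of_samples):
    """Perplexity of a model, given the probability it assigns to each test sample."""
    n = len(q_of_samples)
    avg_neg_log = -sum(math.log2(q) for q in q_of_samples) / n
    return 2 ** avg_neg_log


def per_word_perplexity(word_probs):
    """Perplexity per word: exp of the average negative log-probability per word."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)


# A uniform distribution over 8 outcomes is as uncertain as a fair 8-sided die
print(distribution_perplexity([1 / 8] * 8))
# A model that gives every test word probability 1/4: roughly 4, up to rounding
print(per_word_perplexity([0.25, 0.25, 0.25]))
```

A lower value means the model is less "surprised" by the test data.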

## Main text

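```python
import math

import matplotlib.pyplot as plt
import numpy as np


def f_testset_word_count(testset):
    '''Return the number of words in testset, the denominator of the perplexity formula.'''
    return len(testset.split())


def graph_draw(topic, perplexity):
    '''Draw a line chart of perplexity against the number of topics.'''
    x = topic
    y = perplexity
    plt.plot(x, y, marker="*", color="red", linewidth=2)
    plt.xlabel("Number of Topics")
    plt.ylabel("Perplexity")
    plt.show()


word_topic = {}
with open('test_data/model-final.tassign') as f:
    tassign = f.read()

# Assumption: the original listing does not show how `patterns`,
# `testset_word_count` and `phi` are built. Here every whitespace-separated
# token of the .tassign file is taken as one "word_id:topic_id" pattern, and
# phi (one row per topic, one column per word) is read from a hypothetical
# 'test_data/model-final.phi' file.
patterns = tassign.split()
testset_word_count = f_testset_word_count(tassign)
phi = np.loadtxt('test_data/model-final.phi', ndmin=2)

# Lists to fill when looping over several models
_topic = []
perplexity_list = []

_topic.append(10)  # topic count for this model
for pattern in patterns:
    word = int(pattern.split(':')[0])
    topic = int(pattern.split(':')[1])
    pattern = pattern.replace(':', '_')
    # Keep one probability per distinct word_topic pair
    if pattern not in word_topic:
        word_topic[pattern] = phi[topic][word]

duishu = 0.0  # running sum of negative log-probabilities
for frequency in word_topic.values():
    duishu += -math.log(frequency)
kuohaoli = duishu / testset_word_count  # the exponent of the formula
perplexity = math.exp(kuohaoli)
perplexity_list.append(perplexity)

graph_draw(_topic, perplexity_list)
```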
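
The script above depends on model files on disk. To check the arithmetic without them, here is a minimal sketch of the same computation on in-memory data. The function name `perplexity_from_assignments` and the toy `phi` matrix are made up for illustration. Like the script, it keeps one probability per distinct `word:topic` pair and divides by the total word count.

```python
import math


def perplexity_from_assignments(patterns, phi):
    """Mirror the script's logic: patterns are "word_id:topic_id" strings,
    and phi[topic][word] is the probability of the word under the topic."""
    word_topic = {}
    for pattern in patterns:
        word, topic = (int(part) for part in pattern.split(':'))
        # Keep one entry per distinct word:topic pair, as the script does
        word_topic.setdefault((word, topic), phi[topic][word])

    duishu = sum(-math.log(p) for p in word_topic.values())
    return math.exp(duishu / len(patterns))


# Toy model (hypothetical numbers): 2 topics over a 3-word vocabulary
phi = [
    [0.5, 0.25, 0.25],
    [0.25, 0.25, 0.5],
]
patterns = "0:0 1:0 2:1 0:0".split()
print(perplexity_from_assignments(patterns, phi))  # roughly 2, up to rounding
```

Because repeated pairs enter the sum once while every occurrence counts in the denominator, this value is never higher than the one obtained by averaging over every token.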
