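Recommender System Evaluation Metrics

I. Evaluation Metrics

User satisfaction, prediction accuracy, coverage, diversity, novelty, serendipity, trust, real-time performance, robustness, business goals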

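1. User Satisfaction

User satisfaction is the most important metric for evaluating a recommender system. It can only be obtained through user surveys or online experiments. Surveys mainly take the form of questionnaires, which need to ask users about different aspects of how they perceive the recommendation results.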

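2. Prediction Accuracy

Prediction accuracy is the most important offline evaluation metric for recommender systems, and it is computed through offline experiments. For example, recall and precision of top-N recommendation lists: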

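def Recall(train, test, N):
    # Fraction of the users' test items that appear in their top-N lists.
    # GetRecommendation(user, N) is assumed to be defined elsewhere and to
    # return (item, predicted_interest) pairs.
    hit = 0
    total = 0
    for user in train.keys():
        tu = test.get(user, {})  # avoid a KeyError for users with no test items
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            if item in tu:
                hit += 1
        total += len(tu)
    return hit / (total * 1.0)

def Precision(train, test, N):
    # Fraction of the recommended items that appear in the users' test items.
    hit = 0
    total = 0
    for user in train.keys():
        tu = test.get(user, {})
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            if item in tu:
                hit += 1
        total += N
    return hit / (total * 1.0)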
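In formula form, the two functions above compute the following. The notation is an assumption introduced here for illustration: R(u) is the list of N items recommended to user u, and T(u) is the set of items that u interacted with in the test set. Since every list has length N, the precision denominator equals N times the number of users, which is what the code accumulates.

```latex
\mathrm{Recall} = \frac{\sum_{u} |R(u) \cap T(u)|}{\sum_{u} |T(u)|},
\qquad
\mathrm{Precision} = \frac{\sum_{u} |R(u) \cap T(u)|}{\sum_{u} |R(u)|}
```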

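2.1 Rating Prediction Accuracy

Rating prediction accuracy is generally measured by root mean squared error (RMSE) and mean absolute error (MAE). Each test record holds a user u, an item i, the actual rating rui, and the predicted rating pui. RMSE and MAE are then defined as: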

import math

def RMSE(records):
    # records: list of (user, item, actual rating rui, predicted rating pui)
    return math.sqrt(
        sum([(rui - pui) * (rui - pui) for u, i, rui, pui in records])
        / float(len(records)))

def MAE(records):
    return (sum([abs(rui - pui) for u, i, rui, pui in records])
            / float(len(records)))
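Written as formulas, the two functions above correspond to the expressions below. The symbols are assumptions chosen to match the code: T is the set of test records, r_{ui} is the actual rating (rui), and \hat{r}_{ui} is the predicted rating (pui).

```latex
\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i) \in T} \left(r_{ui} - \hat{r}_{ui}\right)^{2}}{|T|}},
\qquad
\mathrm{MAE} = \frac{\sum_{(u,i) \in T} \left|r_{ui} - \hat{r}_{ui}\right|}{|T|}
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE does.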

2.2 TopN Recommendation

The prediction accuracy of TopN recommendation is generally measured by precision and recall.

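def PrecisionRecall(test, N):
    # test maps each user to the set of items held out for that user.
    # Recommend(user, N) is assumed to be defined elsewhere and to return
    # a set of N items.
    hit = 0
    n_recall = 0
    n_precision = 0
    for user, items in test.items():
        rank = Recommend(user, N)
        hit += len(rank & items)
        n_recall += len(items)
        n_precision += N
    # Note the order of the result: [recall, precision].
    return [hit / (1.0 * n_recall), hit / (1.0 * n_precision)]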

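To evaluate the precision and recall of TopN recommendation comprehensively, one usually picks several recommendation list lengths N, computes a precision/recall pair for each,
and then plots the precision/recall curve.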
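The sweep over N described above could be sketched as follows. This is a minimal, self-contained sketch rather than a definitive implementation: `toy_recommend`, `RANKINGS`, and `TEST` are hypothetical stand-ins invented for illustration, and the recommender is passed in as a callable so that the example runs on its own.

```python
def precision_recall(test, recommend, n):
    """Return (precision, recall) of top-n lists over all test users."""
    hit = 0
    n_recall = 0
    n_precision = 0
    for user, items in test.items():
        rank = set(recommend(user, n))
        hit += len(rank & items)
        n_recall += len(items)
        n_precision += n
    return hit / n_precision, hit / n_recall


def precision_recall_curve(test, recommend, n_values):
    """Compute one (n, precision, recall) point per list length."""
    return [(n,) + precision_recall(test, recommend, n) for n in n_values]


# Toy data (assumed): a fixed ranking per user and the held-out test items.
RANKINGS = {"u1": ["a", "b", "c", "d"], "u2": ["c", "a", "e", "f"]}
TEST = {"u1": {"a", "d"}, "u2": {"e"}}


def toy_recommend(user, n):
    # Hypothetical recommender: the first n items of a fixed ranking.
    return RANKINGS[user][:n]


curve = precision_recall_curve(TEST, toy_recommend, [1, 2, 4])
for n, p, r in curve:
    print(f"N={n}: precision={p:.3f}, recall={r:.3f}")
```

The resulting points can be handed to any plotting tool. On this toy data recall reaches 1.0 at N=4 because every test item then appears in some list, while precision does not, since most recommended items are still misses.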

3. Coverage

Coverage describes a recommender system's ability to surface items from the long tail.

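import math

def Popularity(train, test, N):
    # Average log-popularity of the recommended items; a lower value means
    # the lists contain more long-tail items.
    # GetRecommendation(user, N) is assumed to be defined elsewhere.
    item_popularity = dict()
    for user, items in train.items():
        for item in items.keys():
            if item not in item_popularity:
                item_popularity[item] = 0
            item_popularity[item] += 1
    ret = 0
    n = 0
    for user in train.keys():
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            ret += math.log(1 + item_popularity[item])
            n += 1
    ret /= n * 1.0
    return ret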
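A common, simple way to quantify coverage itself is the fraction of catalog items that appear in at least one user's recommendation list. The sketch below illustrates that definition under stated assumptions: the catalog is taken to be the set of items seen in the training data, recommended items are assumed to come from that catalog, and `toy_recommend`, `TRAIN`, and `RECS` are hypothetical names made up for the example.

```python
def coverage(train, recommend, n):
    """Fraction of catalog items that appear in at least one top-n list."""
    all_items = set()
    recommended_items = set()
    for user, items in train.items():
        all_items.update(items)  # catalog = items seen in training (assumed)
        recommended_items.update(recommend(user, n))
    return len(recommended_items) / len(all_items)


# Toy data (assumed): user -> {item: rating}, plus fixed recommendation lists.
TRAIN = {
    "u1": {"a": 1, "b": 1},
    "u2": {"b": 1, "c": 1},
    "u3": {"c": 1, "d": 1},
}
RECS = {"u1": ["c", "d"], "u2": ["a", "d"], "u3": ["a", "b"]}


def toy_recommend(user, n):
    # Hypothetical recommender: the first n items of a fixed list.
    return RECS[user][:n]


print(coverage(TRAIN, toy_recommend, 1))
print(coverage(TRAIN, toy_recommend, 2))
```

With lists of length 1 only items a and c are ever recommended, so half of the four-item catalog is covered; with length 2 every item is recommended to someone.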

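4. Diversity

To satisfy users' broad interests, a recommendation list needs to cover the different areas a user is interested in.

The overall diversity of a recommender system can then be defined as the average diversity of all users' recommendation lists.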
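One way to make this concrete is to define the diversity of a single list as one minus the average pairwise similarity of its items, and then average over users as described above. The sketch below follows that assumption; the genre-based `genre_similarity` function and the toy data are hypothetical, and any item similarity with values in [0, 1] could be substituted.

```python
from itertools import combinations


def list_diversity(rec_list, similarity):
    """1 minus the average pairwise similarity of the items in one list."""
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0  # a list with fewer than two items has no pairs to compare
    avg_sim = sum(similarity(i, j) for i, j in pairs) / len(pairs)
    return 1.0 - avg_sim


def system_diversity(rec_lists, similarity):
    """Average list diversity over all users."""
    scores = [list_diversity(r, similarity) for r in rec_lists.values()]
    return sum(scores) / len(scores)


# Toy data (assumed): items are similar only if they share a genre.
GENRE = {"a": "rock", "b": "rock", "c": "jazz", "d": "pop"}


def genre_similarity(i, j):
    # Hypothetical similarity: 1.0 for the same genre, 0.0 otherwise.
    return 1.0 if GENRE[i] == GENRE[j] else 0.0


REC_LISTS = {"u1": ["a", "b", "c"], "u2": ["a", "c", "d"]}
print(list_diversity(REC_LISTS["u1"], genre_similarity))
print(system_diversity(REC_LISTS, genre_similarity))
```

Here u1's list contains two rock items and therefore scores lower than u2's list, whose three items all belong to different genres.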

5. Novelty

A novel recommendation is one that gives users items they have not heard of before.

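6. Serendipity

If a recommendation is not similar to the user's historical interests but the user is nevertheless satisfied with it, the recommendation can be said to have high serendipity.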

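7. Trust

Trust in a recommender system can only be measured through questionnaires that ask users whether they trust the system's recommendations.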

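8. Real-Time Performance

Some items (news, microblog posts, and so on) are highly time-sensitive, so they need to be recommended to users while they are still timely.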

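9. Robustness

If the recommendation lists after an attack do not change much compared with those before the attack, the algorithm is relatively robust.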
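The text does not say how to measure the change between the two sets of lists. One possible sketch, assuming Jaccard overlap as the measure (a choice made here for illustration, not one taken from the source), compares each user's list before and after a simulated attack and averages the result. The names `list_stability`, `BEFORE`, and `AFTER` are hypothetical.

```python
def list_stability(before, after):
    """Mean Jaccard overlap between each user's lists before and after an attack."""
    users = before.keys() & after.keys()
    if not users:
        raise ValueError("no users in common between the two runs")
    total = 0.0
    for user in users:
        b, a = set(before[user]), set(after[user])
        union = b | a
        # Two empty lists are treated as identical.
        total += len(b & a) / len(union) if union else 1.0
    return total / len(users)


# Toy data (assumed): top-N lists before and after injecting fake profiles.
BEFORE = {"u1": ["a", "b", "c"], "u2": ["c", "d"]}
AFTER = {"u1": ["a", "b", "d"], "u2": ["c", "d"]}

stability = list_stability(BEFORE, AFTER)
print(stability)
```

A value close to 1.0 means the lists barely moved under the attack. The threshold for calling an algorithm robust depends on the application and is not fixed by this sketch.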

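10. Business Goals

A website evaluating a recommender system cares more about whether its business goals are met, and those goals are closely tied to the website's revenue model.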