
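I'm using scikit-learn to classify some data, and there are 13 different class values to assign. I've managed to run cross-validation and print the confusion matrix, but it only shows the counts (TP, FP, etc.) without class labels, so I can't tell which class is which. Here is my code and its output: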

def classify_data(df, feature_cols, file):
    nbr_folds = 5
    RANDOM_STATE = 0
    attributes = df.loc[:, feature_cols]  # Also known as x
    class_label = df['task']  # Class label, also known as y.
    file.write("\nFeatures used: ")
    for feature in feature_cols:
        file.write(feature + ",")
    print("Features used", feature_cols)

    sampler = RandomOverSampler(random_state=RANDOM_STATE)
    print("RandomForest")
    file.write("\nRandomForest")
    rfc = RandomForestClassifier(max_depth=2, random_state=RANDOM_STATE)
    pipeline = make_pipeline(sampler, rfc)
    class_label_predicted = cross_val_predict(pipeline, attributes, class_label, cv=nbr_folds)
    conf_mat = confusion_matrix(class_label, class_label_predicted)
    print(conf_mat)
    accuracy = accuracy_score(class_label, class_label_predicted)
    print("Rows classified: " + str(len(class_label_predicted)))
    print("Accuracy: {0:.3f}%\n".format(accuracy * 100))
    file.write("\nClassifier settings:" + str(pipeline) + "\n")
    file.write("\nRows classified: " + str(len(class_label_predicted)))
    file.write("\nAccuracy: {0:.3f}%\n".format(accuracy * 100))
    file.writelines('\t'.join(str(j) for j in i) + '\n' for i in conf_mat)

#Output
Rows classified: 23504
Accuracy: 17.925%
0   372 46  88  5   73  0   536 44  317 0   200 127
0   501 29  85  0   136 0   655 9   154 0   172 67
0   97  141 78  1   56  0   336 37  429 0   435 198
0   135 74  416 5   37  0   507 19  323 0   128 164
0   247 72  145 12  64  0   424 21  296 0   304 223
0   190 41  36  0   178 0   984 29  196 0   111 43
0   218 13  71  7   52  0   917 139 177 0   111 103
0   215 30  84  3   71  0   1175    11  55  0   102 62
0   257 55  156 1   13  0   322 184 463 0   197 160
0   188 36  104 2   34  0   313 99  827 0   69  136
0   281 80  111 22  16  0   494 19  261 0   313 211
0   207 66  87  18  58  0   489 23  157 0   464 239
0   113 114 44  6   51  0   389 30  408 0   338 315

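As you can see, there is no way to tell which column is which, and the printed output is misaligned, which makes it hard to read.

Is there a way to print the labels?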

Originally posted by fall2; translation licensed under CC BY-SA 4.0

python machine-learning scikit-learn confusion-matrix


2 Answers


Community wiki

Posted on
2023-01-10

✓ Accepted

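Judging from the documentation, confusion_matrix has no option for printing row and column labels. You can, however, fix the label order with the labels=... parameter.

Example: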

from sklearn.metrics import confusion_matrix

y_true = ['yes','yes','yes','no','no','no']
y_pred = ['yes','no','no','no','no','no']
print(confusion_matrix(y_true, y_pred))
# Output:
# [[3 0]
#  [2 1]]
print(confusion_matrix(y_true, y_pred, labels=['yes', 'no']))
# Output:
# [[1 2]
#  [0 3]]

If you want to print the confusion matrix with labels, you can wrap it in a pandas DataFrame and set its index and columns.

import pandas as pd
cmtx = pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=['yes', 'no']),
    index=['true:yes', 'true:no'],
    columns=['pred:yes', 'pred:no']
)
print(cmtx)
# Output:
#           pred:yes  pred:no
# true:yes         1        2
# true:no          0        3

Or, deriving the labels from the data:

import numpy as np

# Sorted unique labels seen in either the true or the predicted values
unique_label = np.unique([y_true, y_pred])
cmtx = pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=unique_label),
    index=['true:{:}'.format(x) for x in unique_label],
    columns=['pred:{:}'.format(x) for x in unique_label]
)
print(cmtx)
# Output:
#           pred:no  pred:yes
# true:no         3         0
# true:yes        2         1
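
For more than two classes, the same idea can be wrapped in a small helper. The sketch below is only an illustration: `labeled_confusion_matrix` is a hypothetical name, the class names are made up, and it assumes the labels should be taken from the data with `sklearn.utils.multiclass.unique_labels`, which returns them in sorted order.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels


def labeled_confusion_matrix(y_true, y_pred):
    """Return the confusion matrix as a DataFrame with labeled rows and columns.

    Hypothetical helper: rows are true classes, columns are predicted classes.
    """
    # Sorted union of the labels that occur in y_true or y_pred
    labels = unique_labels(y_true, y_pred)
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    return pd.DataFrame(
        cm,
        index=[f"true:{label}" for label in labels],
        columns=[f"pred:{label}" for label in labels],
    )


# Made-up example data with three classes
y_true = ['walk', 'run', 'sit', 'sit', 'run', 'walk']
y_pred = ['walk', 'sit', 'sit', 'sit', 'run', 'run']

cmtx = labeled_confusion_matrix(y_true, y_pred)
# to_string() pads the columns, so the printed table stays aligned
print(cmtx.to_string())
```

Since `to_string()` returns plain text, the same aligned table could also be written to a log file with something like `file.write(cmtx.to_string())` instead of joining the raw rows with tabs.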

Originally posted by pe-perry; translation licensed under CC BY-SA 4.0

Community wiki

Posted on
2023-01-10

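It is important to make sure that the way you label the rows and columns of the confusion matrix corresponds exactly to the way sklearn encodes the classes. The true order of the labels can be revealed with the classifier's .classes_ attribute. You can use the code below to prepare a confusion matrix data frame.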

labels = rfc.classes_  # needs a fitted rfc; cross_val_predict only fits clones
conf_df = pd.DataFrame(confusion_matrix(class_label, class_label_predicted, labels=labels),
                       columns=labels, index=labels)
conf_df.index.name = 'True labels'
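
One caveat worth illustrating: `cross_val_predict` works on clones of the estimator, so the object passed in is not fitted afterwards and has no `.classes_` until it is fitted explicitly. The sketch below shows this on synthetic data. The data set, the class names, and the plain `RandomForestClassifier` without the oversampling pipeline are all assumptions made to keep the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Synthetic 3-class data with made-up string labels (illustrative only)
X, y_num = make_classification(n_samples=300, n_features=8, n_informative=5,
                               n_classes=3, random_state=0)
names = np.array(['run', 'sit', 'walk'])
y = names[y_num]

rfc = RandomForestClassifier(max_depth=2, random_state=0)
y_pred = cross_val_predict(rfc, X, y, cv=5)

# The original estimator is still unfitted: only its clones were trained
fitted_before = hasattr(rfc, "classes_")

# Fit once on the full data to expose the label order sklearn uses
rfc.fit(X, y)
labels = rfc.classes_

conf_df = pd.DataFrame(confusion_matrix(y, y_pred, labels=labels),
                       index=labels, columns=labels)
conf_df.index.name = 'True labels'
conf_df.columns.name = 'Predicted labels'
print(conf_df)
```

If fitting the model a second time is too expensive, `np.unique(y)` should give the same sorted order that `.classes_` reports for a classifier trained on `y`.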

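The second thing to note is that your classifier is not predicting the labels well. The number of correctly predicted labels is shown on the main diagonal of the confusion matrix. You have non-zero values all over the matrix, and some classes have not been predicted at all: their columns are all zero. It might be a good idea to run the classifier with its default parameters and then try to optimise them.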
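
These checks can be read directly off a labeled confusion matrix. The following is a minimal sketch on a small made-up matrix (the counts and the class names `a`, `b`, `c` are invented, not taken from the question): the diagonal gives the correct predictions, row sums give the per-class recall, and all-zero columns reveal classes that were never predicted.

```python
import numpy as np
import pandas as pd

# Made-up 3-class confusion matrix: rows are true labels, columns are predictions
labels = ['a', 'b', 'c']
conf_df = pd.DataFrame([[5, 3, 0],
                        [2, 6, 0],
                        [4, 4, 0]], index=labels, columns=labels)

# Correct predictions sit on the main diagonal
correct = np.diag(conf_df.values)
accuracy = correct.sum() / conf_df.values.sum()

# Per-class recall: correct predictions divided by the number of true samples
recall = pd.Series(correct / conf_df.sum(axis=1).values, index=labels)

# Classes whose column sums to zero were never predicted
never_predicted = conf_df.columns[conf_df.sum(axis=0) == 0].tolist()

print("Accuracy: {0:.3f}".format(accuracy))
print(recall)
print("Never predicted:", never_predicted)
```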

Originally posted by KRKirov; translation licensed under CC BY-SA 4.0
