
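Below is only a snippet of the decision tree, because the full tree is very large.

How can I make the tree stop growing when the lowest value in a node is below 5? The code that generates the decision tree is shown below. According to the scikit-learn decision tree documentation, the only way to do this seems to be min_impurity_decrease, but I am not sure how exactly it works.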

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Create a DataFrame with one column per feature plus the class label
df = pd.DataFrame({'Feature 1': X[:, 0],
                   'Feature 2': X[:, 1],
                   'Feature 3': X[:, 2],
                   'Feature 4': X[:, 3],
                   'Feature 5': X[:, 4],
                   'Feature 6': X[:, 5],
                   'Class': y})

y_train = df['Class']
X_train = df.drop('Class', axis=1)

dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)

import os
from subprocess import check_call

import pydotplus
from IPython.display import display, Image
from sklearn import tree

# Make the Graphviz binaries discoverable (this path is specific to my machine)
os.environ["PATH"] += os.pathsep + 'C:\\Anaconda3\\Library\\bin\\graphviz'

# Write the tree to a .dot file; export_graphviz returns None when out_file is set
tree.export_graphviz(dt, out_file='thisIsTheImagetree.dot',
                     feature_names=X_train.columns,
                     filled=True,
                     rounded=True,
                     special_characters=True)

graph = pydotplus.graph_from_dot_file('thisIsTheImagetree.dot')

thisIsTheImage = Image(graph.create_png())
display(thisIsTheImage)
#print(dt.tree_.feature)

# Also render the .dot file to a PNG on disk
check_call(['dot', '-Tpng', 'thisIsTheImagetree.dot', '-o', 'thisIsTheImagetree.png'])

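Update

I think min_impurity_decrease can help reach the goal to some extent, since tuning min_impurity_decrease does in fact prune the tree. Can anyone explain min_impurity_decrease?

I am trying to understand the equation in scikit-learn, but I am not sure what the values of right_impurity and left_impurity are.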

N = 256
N_t = 256
impurity = ??
N_t_R = 242
N_t_L = 14
right_impurity = ??
left_impurity = ??

New_Value = N_t / N * (impurity - ((N_t_R / N_t) * right_impurity)
                    - ((N_t_L / N_t) * left_impurity))
New_Value
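
For reference, the quantities in this formula can be read off a fitted tree. The following is a minimal sketch, assuming the default Gini criterion and unit sample weights; the helper name weighted_impurity_decrease is made up for illustration and is not part of scikit-learn. It uses the tree_ attributes impurity, weighted_n_node_samples, children_left and children_right to evaluate the expression for the root split.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
dt = DecisionTreeClassifier(random_state=42).fit(X, y)
t = dt.tree_


def weighted_impurity_decrease(t, node):
    """Evaluate the documented formula for one internal node (illustrative helper)."""
    left, right = t.children_left[node], t.children_right[node]
    N = t.weighted_n_node_samples[0]         # all training samples (root)
    N_t = t.weighted_n_node_samples[node]    # samples reaching this node
    N_t_L = t.weighted_n_node_samples[left]  # samples sent to the left child
    N_t_R = t.weighted_n_node_samples[right] # samples sent to the right child
    return N_t / N * (t.impurity[node]
                      - N_t_R / N_t * t.impurity[right]
                      - N_t_L / N_t * t.impurity[left])


root_decrease = weighted_impurity_decrease(t, 0)
print(t.impurity[0], root_decrease)
```

Here impurity, left_impurity and right_impurity are simply the Gini values of the node and of its two children. According to the documentation, a node is split only if this weighted decrease is greater than or equal to min_impurity_decrease.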

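Update 2

Rather than pruning at a specific value, we want to prune under a specific condition. For example, we do split at 6/4 and 5/5, but not at 6000/4 or 5000/5. In other words, the test is whether one value falls below a certain percentage of its neighbouring value in the node, rather than below a fixed value.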

       11/9
   /       \
  6/4       5/5
 /   \     /   \
6/0  0/4  2/2  3/3

Originally posted by user9238790; translation licensed under CC BY-SA 4.0

python scikit-learn


2 Answers


Community wiki

Posted on 2023-01-09

✓ Accepted

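You cannot directly restrict the lowest value in a leaf (the number of occurrences of a particular class) with min_impurity_decrease or any other built-in stopping criterion.

I think the only way to accomplish this without changing the scikit-learn source code is to post-prune the tree. To do that, you just traverse the tree and remove all children of every node whose minimum class count is less than 5 (or that meets any other condition you can think of). I will continue your example: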

from sklearn.tree._tree import TREE_LEAF

def prune_index(inner_tree, index, threshold):
    # note: this assumes tree_.value holds per-class sample counts
    if inner_tree.value[index].min() < threshold:
        # turn node into a leaf by "unlinking" its children
        inner_tree.children_left[index] = TREE_LEAF
        inner_tree.children_right[index] = TREE_LEAF
    # if there are children, visit them as well
    if inner_tree.children_left[index] != TREE_LEAF:
        prune_index(inner_tree, inner_tree.children_left[index], threshold)
        prune_index(inner_tree, inner_tree.children_right[index], threshold)

# number of nodes without children before pruning
print(sum(dt.tree_.children_left < 0))
# start pruning from the root
prune_index(dt.tree_, 0, 5)
# number of nodes without children after pruning
print(sum(dt.tree_.children_left < 0))

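This code first prints 74 and then 91, which means it has created 17 new leaf nodes (in practice, by cutting the links from those nodes to their descendants). The tree, which previously looked like

[image: the tree before pruning]

now looks like

[image: the tree after pruning]

so you can see that it has indeed shrunk considerably.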
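
One caveat worth hedging: as far as I know, recent scikit-learn releases (1.4 and later) store class fractions rather than class counts in tree_.value, in which case comparing the raw values against 5 would prune at the root. The sketch below is one possible way around that, assuming unit sample weights; class_counts, prune and count_leaves are hypothetical helper names, not scikit-learn API. Passing the condition as a function also allows a relative rule of the kind described in Update 2 of the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree._tree import TREE_LEAF


def class_counts(inner_tree, index):
    """Per-class sample counts at a node (illustrative helper)."""
    value = inner_tree.value[index].ravel()
    n = inner_tree.weighted_n_node_samples[index]
    # Assumption: a row summing to ~1 in a node with more than one sample
    # holds fractions, so scale it back to (integer) counts.
    if n > 1 and np.isclose(value.sum(), 1.0):
        return np.rint(value * n)
    return value


def prune(inner_tree, should_prune, index=0):
    """Turn every node for which should_prune(counts) is true into a leaf."""
    if inner_tree.children_left[index] == TREE_LEAF:
        return
    if should_prune(class_counts(inner_tree, index)):
        inner_tree.children_left[index] = TREE_LEAF
        inner_tree.children_right[index] = TREE_LEAF
        return
    prune(inner_tree, should_prune, inner_tree.children_left[index])
    prune(inner_tree, should_prune, inner_tree.children_right[index])


def count_leaves(inner_tree, index=0):
    """Count the leaves that are still reachable from the given node."""
    if inner_tree.children_left[index] == TREE_LEAF:
        return 1
    return (count_leaves(inner_tree, inner_tree.children_left[index])
            + count_leaves(inner_tree, inner_tree.children_right[index]))


X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
dt = DecisionTreeClassifier(random_state=42).fit(X, y)

leaves_before = count_leaves(dt.tree_)
# absolute rule: stop at nodes where some class has fewer than 5 samples
prune(dt.tree_, lambda counts: counts.min() < 5)
# a relative rule could be used instead, e.g. (1% is an arbitrary choice):
# prune(dt.tree_, lambda counts: counts.min() < 0.01 * counts.max())
leaves_after = count_leaves(dt.tree_)
print(leaves_before, leaves_after)
```

Note that count_leaves counts only leaves still reachable from the root, so its numbers are not directly comparable to the 74 and 91 above, which also include unlinked nodes.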

Originally posted by David Dale; translation licensed under CC BY-SA 3.0

Community wiki

Posted on 2023-01-09

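Edit: As @SBylemans and @Viktor pointed out in the comments, this is not correct. I am not deleting the answer because other people may also think this is the solution.

Set min_samples_leaf to 5.

min_samples_leaf:

The minimum number of samples required to be at a leaf node.

Update: I don't think it can be done with min_impurity_decrease. Consider the following scenario: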
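
A quick way to see why this setting does not do what the question asks: min_samples_leaf bounds the total number of samples in a leaf, not the count of each class. The sketch below, which reuses the data from the question and counts classes per leaf directly from the training labels, is only an illustration of that point.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
dt5 = DecisionTreeClassifier(min_samples_leaf=5, random_state=42).fit(X, y)

# total number of training samples in every leaf
is_leaf = dt5.tree_.children_left == -1
leaf_sizes = dt5.tree_.n_node_samples[is_leaf]

# smallest per-class count in every leaf, computed from the labels themselves
leaf_ids = dt5.apply(X)
min_class_counts = [np.bincount(y[leaf_ids == leaf], minlength=2).min()
                    for leaf in np.unique(leaf_ids)]

print(leaf_sizes.min(), min(min_class_counts))
```

Every leaf holds at least 5 samples, yet a leaf can still contain fewer than 5 samples (or none) of one class.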

       11/9
   /         \
  6/4       5/5
 /   \     /   \
6/0  0/4  2/2  3/3

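According to your rule, you do not want to split the 6/4 node because 4 is less than 5, but you do want to split the 5/5 node. However, splitting the 6/4 node has an information gain of 0.48, while splitting the 5/5 node has an information gain of 0.

Originally posted by Seljuk Gulcan; translation licensed under CC BY-SA 3.0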
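
The two figures can be checked by hand. This is a small sketch that assumes the gain is measured as the decrease in Gini impurity within the node (without the additional N_t / N weighting that min_impurity_decrease applies); gini and gain are illustrative helper names.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gain(parent, left, right):
    """Impurity decrease obtained by splitting parent into left and right."""
    n = sum(parent)
    return (gini(parent)
            - sum(left) / n * gini(left)
            - sum(right) / n * gini(right))


gain_6_4 = gain((6, 4), (6, 0), (0, 4))  # both children are pure
gain_5_5 = gain((5, 5), (2, 2), (3, 3))  # both children stay perfectly mixed
print(gain_6_4, gain_5_5)
```

Any threshold low enough to allow the 5/5 split would therefore also allow the 6/4 split.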
