Background knowledge required for this article: a little programming knowledge.
1. Introduction
In daily life, whenever mealtime comes around we silently ask ourselves, "What should we eat?" After a long day of work we may not want to go far, so we decide the restaurant should be no more than 200 meters away; then, looking at the twenty dollars in our wallet, we decide the meal should cost no more than twenty; and finally we order Lanzhou ramen. As this example shows, today's Lanzhou ramen is the result of a series of prior decisions.
<center> Figure 1-1 </center>
As shown in Figure 1-1, the decision process above can be represented as a binary tree, which is called a decision tree. In machine learning, a decision tree model like the one in Figure 1-1 can also be trained from a data set; this algorithm is called Decision Tree Learning 1 .
2. Introduction to the model
Model
A decision tree learning algorithm first requires a tree structure composed of internal nodes and leaf nodes: an internal node represents a dimension (feature), and a leaf node represents a classification. Nodes are connected under certain conditions, so a decision tree can be regarded as a collection of if...else... rules.
<center> Figure 2-1 </center>
Figure 2-1 shows the basic data structure of a decision tree and the decision process it encodes.
Feature selection
Since a decision has to be made, we must choose which dimension (feature) to decide on, such as the distance to the restaurant and the change in the wallet in the previous example. In machine learning we need a quantitative indicator for choosing the more appropriate feature, namely one that yields subsets of higher "purity" after the split. Three indicators are introduced to solve this problem: Information Gain, the Gini Index, and Mean Squared Error (MSE).
Information Gain
Equation 2-1 defines an indicator of the purity of a sample set, called Information Entropy, where D is the sample set, K is the number of classes in the sample set, and p_k is the proportion of samples of the k-th class. The smaller the value of Ent(D), the higher the purity of the sample set.
$$ \operatorname{Ent}(D)=-\sum_{k=1}^{K} p_{k} \log _{2} p_{k} $$
<center> Equation 2-1 </center>
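For intuition, here is a minimal sketch of Equation 2-1 in code (not part of the implementations in Section 5; the helper name entropy and the toy labels are only illustrative):

```python
import numpy as np

def entropy(labels):
    """Information entropy Ent(D) of a label array, per Equation 2-1."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A toy sample set with three samples of class 0 and one of class 1
print(entropy(np.array([0, 0, 0, 1])))  # ~0.811; a perfectly pure set gives 0.0
```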
Equation 2-2 measures the effect of splitting the sample set on a discrete attribute and is called Information Gain, where D is the sample set, a is the discrete attribute, V is the number of possible values of a, and D^v is the subset of samples that take the v-th value.
$$ \operatorname{Gain}(D, a)=\operatorname{Ent}(D)-\sum_{v=1}^{V} \frac{\left|D^{v}\right|}{|D|} \operatorname{Ent}\left(D^{v}\right) $$
<center> Equation 2-2 </center>
When the attribute is continuous, its possible values are not limited the way a discrete attribute's are. In this case the values of the continuous attribute in the sample set can be sorted and the average of each adjacent pair used as a candidate split point. Rewriting Equation 2-2 gives Equation 2-3, where T_a is the set of candidate split points (pairwise averages), and D_t^v is a subset of the samples: v = - denotes the samples smaller than the split point t, and v = + denotes the samples larger than t. The largest information gain over all split points is taken as the information gain of the attribute.
$$ \begin{aligned} T_{a} &=\left\{\frac{a^{i}+a^{i+1}}{2} \mid 1 \leq i \leq n-1\right\} \\ \operatorname{Gain}(D, a) &=\max _{t \in T_{a}} \operatorname{Gain}(D, a, t) \\ &=\max _{t \in T_{a}} \operatorname{Ent}(D)-\sum_{v \in\{-,+\}} \frac{\left|D_{t}^{v}\right|}{|D|} \operatorname{Ent}\left(D_{t}^{v}\right) \end{aligned} $$
<center> Equation 2-3 </center>
The larger the value of Gain(D, a), the higher the purity of the sample set after splitting on this attribute. From this, the most suitable split attribute can be found, as shown in Equation 2-4:
$$ a_{\text {best }}=\underset{a}{\operatorname{argmax}} \operatorname{Gain}(D, a) $$
<center> Equation 2-4 </center>
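The following sketch illustrates Equations 2-3 and 2-4 for a single continuous attribute: sort the values, take the pairwise averages as candidate split points, and keep the split with the largest gain. The function names entropy and best_split_by_gain are only illustrative:

```python
import numpy as np

def entropy(labels):
    """Information entropy Ent(D), as in Equation 2-1."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split_by_gain(x, labels):
    """Try each pairwise average of x as a split point t and return the
    largest information gain together with the corresponding t."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2  # T_a: pairwise averages
    base = entropy(labels)                       # Ent(D)
    best_gain, best_t = -np.inf, None
    for t in candidates:
        lt, gt = labels[x < t], labels[x > t]
        gain = base - len(lt) / len(x) * entropy(lt) - len(gt) / len(x) * entropy(gt)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t

# x is one feature column, labels are the class labels of the samples
print(best_split_by_gain(np.array([1.0, 2.0, 3.0, 4.0]), np.array([0, 0, 1, 1])))  # (1.0, 2.5)
```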
Gini Index
Equation 2-5 is another indicator of the purity of a sample set, called the Gini value (Gini), where D is the sample set, K is the number of classes in the sample set, and p_k is the proportion of samples of the k-th class. The smaller the value of Gini(D), the higher the purity of the sample set.
$$ \operatorname{Gini}(D)=1-\sum_{k=1}^{K} p_{k}^{2} $$
<center> Equation 2-5 </center>
Equation 2-6 measures the effect of splitting the sample set on a discrete attribute and is called the Gini Index, where D is the sample set, a is the discrete attribute, V is the number of possible values of a, and D^v is the subset of samples that take the v-th value.
$$ \operatorname{Gini\_index}(D, a)=\sum_{v=1}^{V} \frac{\left|D^{v}\right|}{|D|} \operatorname{Gini}\left(D^{v}\right) $$
<center> Equation 2-6 </center>
As in Equation 2-3, take the pairwise averages of the continuous attribute's values as candidate split points and rewrite Equation 2-6 to obtain Equation 2-7, where T_a is the set of candidate split points and D_t^v is a subset of the samples: v = - denotes the samples smaller than the split point t, and v = + denotes the samples larger than t. The smallest Gini index over all split points is taken as the Gini index of the attribute.
$$ \operatorname{Gini\_index}(D, a)=\min _{t \in T_{a}} \sum_{v \in\{-,+\}} \frac{\left|D_{t}^{v}\right|}{|D|} \operatorname{Gini}\left(D_{t}^{v}\right) $$
<center> Equation 2-7 </center>
The smaller the value of Gini_index(D, a), the higher the purity of the sample set after splitting on this attribute. From this, the most suitable split attribute can be found, as shown in Equation 2-8:
$$ a_{\text {best }}=\underset{a}{\operatorname{argmin}} \operatorname{Gini\_index}(D, a) $$
<center> Equation 2-8 </center>
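Analogously, a minimal sketch of Equations 2-5 to 2-8 for a single continuous attribute (the function names gini and best_split_by_gini are only illustrative):

```python
import numpy as np

def gini(labels):
    """Gini value of a label array, per Equation 2-5."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_by_gini(x, labels):
    """Return the smallest weighted Gini index over all pairwise-average
    split points of x, together with the corresponding split point."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2
    best_index, best_t = np.inf, None
    for t in candidates:
        lt, gt = labels[x < t], labels[x > t]
        index = len(lt) / len(x) * gini(lt) + len(gt) / len(x) * gini(gt)
        if index < best_index:
            best_index, best_t = index, t
    return best_index, best_t
```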
Mean Squared Error (MSE)
The first two indicators allow the decision tree to be used for classification problems. If the decision tree is used for regression, a different indicator is needed to choose the split feature: the mean squared error (MSE) shown in Equation 2-9, where T_a is the set of candidate split points and y_t^v are the labels of a subset of the samples: v = - denotes the labels of the samples smaller than the split point t, and v = + denotes the labels of the samples larger than t. The term with a hat is the mean of the corresponding subset's labels.
$$ \operatorname{MSE}(D, a)=\min _{t \in T_{a}} \sum_{v \in\{-,+\}}\left(y_{t}^{v}-\hat{y_{t}^{v}}\right)^{2} $$
<center> Equation 2-9 </center>
The smaller the value of MSE(D, a), the better the decision tree fits the sample set. From this, the most suitable split attribute can be found, as shown in Equation 2-10:
$$ a_{\text {best }}=\underset{a}{\operatorname{argmin}} \operatorname{MSE}(D, a) $$
<center> Equation 2-10 </center>
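For regression, the corresponding sketch of Equations 2-9 and 2-10 picks the split point whose two subsets have the smallest total squared error around their own means (the function name best_split_by_mse is only illustrative):

```python
import numpy as np

def best_split_by_mse(x, targets):
    """Return the smallest total squared error over all pairwise-average
    split points of x, together with the corresponding split point."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2
    best_err, best_t = np.inf, None
    for t in candidates:
        lt, gt = targets[x < t], targets[x > t]
        err = np.sum((lt - lt.mean()) ** 2) + np.sum((gt - gt.mean()) ** 2)
        if err < best_err:
            best_err, best_t = err, t
    return best_err, best_t
```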
Now that we know the decision tree's data structure and how to find the best split of the data set, let's look at how to generate a decision tree.
3. Algorithm steps
Since the data structure of a decision tree is a tree, its child nodes are themselves trees, so a decision tree can be generated recursively. The steps are as follows:

1. Generate a new node;
2. If only one class C exists in the sample set, mark the node as a leaf of class C and return it;
3. Iterate over all features and compute each feature's information gain, Gini index, or mean squared error;
4. Record the best split feature in the node;
5. After splitting on the best feature, the left part is built by a recursive call to the current method and becomes the node's left child;
6. Likewise, the right part is built by a recursive call and becomes the node's right child;
7. Return the node.
4. Regularization
When a decision tree is generated recursively without any limit, the model classifies the training data very accurately, but its performance on unseen data will not be ideal. This is the so-called overfitting phenomenon. As with the solution to overfitting in linear regression earlier, the model can be regularized.
Depth of decision tree
Limiting the maximum depth of the decision tree has a regularizing effect and prevents overfitting. We only need to add a parameter to the algorithm that records the depth of the current recursion: when the preset maximum depth is reached, no new child nodes are generated; the current node is marked with the class that accounts for the largest proportion of its samples, and the current recursion returns.
Leaf size of decision tree
Another way to regularize a decision tree is to limit the minimum number of samples contained in a leaf node, which also prevents overfitting. When the number of samples in a node falls below this limit, the current node is marked with the class that accounts for the largest proportion of its samples, and the current recursion returns.
Pruning of decision trees
A decision tree can also be prevented from overfitting by pruning it, that is, cutting off redundant subtrees. There are two pruning methods: pre-pruning and post-pruning.
Pre-pruning
As the name suggests, pre-pruning decides whether to generate child nodes while the decision tree is being built. The judgment uses a validation data set to compare the accuracy with and without the child nodes: if generating the child nodes improves accuracy, they are generated; otherwise they are not.
<center> Figure 4-1 Image from Zhou Zhihua's "Machine Learning" </center>
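The core of the judgment is just an accuracy comparison on the validation set. Below is a minimal sketch of that comparison only, not of the full pruning procedure; the function and its arguments are illustrative:

```python
import numpy as np

def keep_split(y_val, leaf_prediction, child_predictions):
    """Pre-pruning test: generate the child nodes only if they classify the
    validation samples more accurately than keeping a single leaf."""
    acc_leaf = np.mean(y_val == leaf_prediction)        # accuracy without the split
    acc_children = np.mean(y_val == child_predictions)  # accuracy with the split
    return acc_children > acc_leaf

# A leaf predicting class 0 vs. children predicting [0, 1, 1, 0] on four validation samples
print(keep_split(np.array([0, 1, 1, 1]), 0, np.array([0, 1, 1, 0])))  # True -> keep the split
```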
Post-pruning
Post-pruning first generates a complete decision tree and then works upward from the leaf nodes, using the same judgment as pre-pruning: if the child nodes improve accuracy on the validation set, they are retained; otherwise the subtree is cut off.
<center> Figure 4-2 Image from Zhou Zhihua's "Machine Learning" </center>
5. Code implementation
Use Python to implement decision tree classification based on information gain:
```python
import numpy as np

class GainNode:
    """
    Node of a classification decision tree
    Based on Information Gain
    """

    def __init__(self, feature=None, threshold=None, gain=None, left=None, right=None):
        # Index of the feature used to split at this node
        self.feature = feature
        # Split threshold; for a leaf node this holds the predicted class
        self.threshold = threshold
        # Information gain of the split
        self.gain = gain
        # Left child node
        self.left = left
        # Right child node
        self.right = right

class GainTree:
    """
    Classification decision tree
    Based on Information Gain
    """

    def __init__(self, max_depth=None, min_samples_leaf=None):
        # Maximum depth of the tree
        self.max_depth = max_depth
        # Minimum number of samples in a leaf node
        self.min_samples_leaf = min_samples_leaf

    def fit(self, X, y):
        """
        Fit the classification decision tree (information gain)
        """
        y = np.array(y)
        self.root = self.buildNode(X, y, 0)
        return self

    def buildNode(self, X, y, depth):
        """
        Recursively build a tree node (information gain)
        """
        node = GainNode()
        # Return directly when there are no samples
        if len(y) == 0:
            return node
        y_classes = np.unique(y)
        # When only one class remains, return it as a leaf
        if len(y_classes) == 1:
            node.threshold = y_classes[0]
            return node
        # When the maximum depth is reached, return the majority class
        if self.max_depth is not None and depth >= self.max_depth:
            node.threshold = max(y_classes, key=y.tolist().count)
            return node
        # When the minimum leaf size is reached, return the majority class
        if self.min_samples_leaf is not None and len(y) <= self.min_samples_leaf:
            node.threshold = max(y_classes, key=y.tolist().count)
            return node
        max_gain = -np.inf
        max_middle = None
        max_feature = None
        # Iterate over all features to find the one with the largest information gain
        for i in range(X.shape[1]):
            # Information gain of the current feature
            gain, middle = self.calcGain(X[:, i], y, y_classes)
            if max_gain < gain:
                max_gain = gain
                max_middle = middle
                max_feature = i
        # No valid split found (e.g. all feature values identical): fall back to a leaf
        if max_feature is None:
            node.threshold = max(y_classes, key=y.tolist().count)
            return node
        # Feature with the largest information gain
        node.feature = max_feature
        # Split threshold
        node.threshold = max_middle
        # Information gain
        node.gain = max_gain
        X_lt = X[:, max_feature] < max_middle
        X_gt = X[:, max_feature] > max_middle
        # Recursively build the left subtree
        node.left = self.buildNode(X[X_lt, :], y[X_lt], depth + 1)
        # Recursively build the right subtree
        node.right = self.buildNode(X[X_gt, :], y[X_gt], depth + 1)
        return node

    def calcMiddle(self, x):
        """
        Pairwise averages of the sorted values of a continuous feature
        """
        middle = []
        if len(x) == 0:
            return np.array(middle)
        start = x[0]
        for i in range(len(x) - 1):
            if x[i] == x[i + 1]:
                continue
            middle.append((start + x[i + 1]) / 2)
            start = x[i + 1]
        return np.array(middle)

    def calcEnt(self, y, y_classes):
        """
        Information entropy
        """
        ent = 0
        for j in range(len(y_classes)):
            p = len(y[y == y_classes[j]]) / len(y)
            if p != 0:
                ent = ent + p * np.log2(p)
        return -ent

    def calcGain(self, x, y, y_classes):
        """
        Information gain of a continuous feature
        """
        x_sort = np.sort(x)
        middle = self.calcMiddle(x_sort)
        max_middle = -np.inf
        max_gain = -np.inf
        ent = self.calcEnt(y, y_classes)
        # Try every candidate split point
        for i in range(len(middle)):
            y_gt = y[x > middle[i]]
            y_lt = y[x < middle[i]]
            ent_gt = self.calcEnt(y_gt, y_classes)
            ent_lt = self.calcEnt(y_lt, y_classes)
            # Information gain of this split point
            gain = ent - (ent_gt * len(y_gt) / len(x) + ent_lt * len(y_lt) / len(x))
            if max_gain < gain:
                max_gain = gain
                max_middle = middle[i]
        return max_gain, max_middle

    def predict(self, X):
        """
        Predict with the classification decision tree
        """
        y = np.zeros(X.shape[0])
        root_value = self.checkNode(X, y, self.root)
        # When the root itself is a leaf, every sample gets its class value
        if root_value is not None:
            y[:] = root_value
        return y

    def checkNode(self, X, y, node, cond=None):
        """
        Route samples through a node to determine their class
        """
        # A node without children is a leaf; return its class value
        if node.left is None and node.right is None:
            return node.threshold
        X_lt = X[:, node.feature] < node.threshold
        if cond is not None:
            X_lt = X_lt & cond
        # Recurse into the left child
        lt = self.checkNode(X, y, node.left, X_lt)
        if lt is not None:
            y[X_lt] = lt
        X_gt = X[:, node.feature] > node.threshold
        if cond is not None:
            X_gt = X_gt & cond
        # Recurse into the right child
        gt = self.checkNode(X, y, node.right, X_gt)
        if gt is not None:
            y[X_gt] = gt
```
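A minimal usage sketch, assuming the GainTree class above and randomly generated toy data:

```python
import numpy as np

# Toy data: two 2-D point clusters labeled 0 and 1 (illustrative only)
np.random.seed(0)
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 3])
y = np.array([0] * 20 + [1] * 20)

clf = GainTree(max_depth=3, min_samples_leaf=5)
clf.fit(X, y)
print(clf.predict(X[:5]))  # predicted classes for the first five samples
```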
Use Python to implement decision tree classification based on Gini index:
```python
import numpy as np

class GiniNode:
    """
    Node of a classification decision tree
    Based on the Gini Index
    """

    def __init__(self, feature=None, threshold=None, gini_index=None, left=None, right=None):
        # Index of the feature used to split at this node
        self.feature = feature
        # Split threshold; for a leaf node this holds the predicted class
        self.threshold = threshold
        # Gini index of the split
        self.gini_index = gini_index
        # Left child node
        self.left = left
        # Right child node
        self.right = right

class GiniTree:
    """
    Classification decision tree
    Based on the Gini Index
    """

    def __init__(self, max_depth=None, min_samples_leaf=None):
        # Maximum depth of the tree
        self.max_depth = max_depth
        # Minimum number of samples in a leaf node
        self.min_samples_leaf = min_samples_leaf

    def fit(self, X, y):
        """
        Fit the classification decision tree (Gini index)
        """
        y = np.array(y)
        self.root = self.buildNode(X, y, 0)
        return self

    def buildNode(self, X, y, depth):
        """
        Recursively build a tree node (Gini index)
        """
        node = GiniNode()
        # Return directly when there are no samples
        if len(y) == 0:
            return node
        y_classes = np.unique(y)
        # When only one class remains, return it as a leaf
        if len(y_classes) == 1:
            node.threshold = y_classes[0]
            return node
        # When the maximum depth is reached, return the majority class
        if self.max_depth is not None and depth >= self.max_depth:
            node.threshold = max(y_classes, key=y.tolist().count)
            return node
        # When the minimum leaf size is reached, return the majority class
        if self.min_samples_leaf is not None and len(y) <= self.min_samples_leaf:
            node.threshold = max(y_classes, key=y.tolist().count)
            return node
        min_gini_index = np.inf
        min_middle = None
        min_feature = None
        # Iterate over all features to find the one with the smallest Gini index
        for i in range(X.shape[1]):
            # Gini index of the current feature
            gini_index, middle = self.calcGiniIndex(X[:, i], y, y_classes)
            if min_gini_index > gini_index:
                min_gini_index = gini_index
                min_middle = middle
                min_feature = i
        # No valid split found (e.g. all feature values identical): fall back to a leaf
        if min_feature is None:
            node.threshold = max(y_classes, key=y.tolist().count)
            return node
        # Feature with the smallest Gini index
        node.feature = min_feature
        # Split threshold
        node.threshold = min_middle
        # Gini index
        node.gini_index = min_gini_index
        X_lt = X[:, min_feature] < min_middle
        X_gt = X[:, min_feature] > min_middle
        # Recursively build the left subtree
        node.left = self.buildNode(X[X_lt, :], y[X_lt], depth + 1)
        # Recursively build the right subtree
        node.right = self.buildNode(X[X_gt, :], y[X_gt], depth + 1)
        return node

    def calcMiddle(self, x):
        """
        Pairwise averages of the sorted values of a continuous feature
        """
        middle = []
        if len(x) == 0:
            return np.array(middle)
        start = x[0]
        for i in range(len(x) - 1):
            if x[i] == x[i + 1]:
                continue
            middle.append((start + x[i + 1]) / 2)
            start = x[i + 1]
        return np.array(middle)

    def calcGiniIndex(self, x, y, y_classes):
        """
        Gini index of a continuous feature
        """
        x_sort = np.sort(x)
        middle = self.calcMiddle(x_sort)
        min_middle = np.inf
        min_gini_index = np.inf
        # Try every candidate split point
        for i in range(len(middle)):
            y_gt = y[x > middle[i]]
            y_lt = y[x < middle[i]]
            gini_gt = self.calcGini(y_gt, y_classes)
            gini_lt = self.calcGini(y_lt, y_classes)
            gini_index = gini_gt * len(y_gt) / len(x) + gini_lt * len(y_lt) / len(x)
            if min_gini_index > gini_index:
                min_gini_index = gini_index
                min_middle = middle[i]
        return min_gini_index, min_middle

    def calcGini(self, y, y_classes):
        """
        Gini value
        """
        gini = 1
        for j in range(len(y_classes)):
            p = len(y[y == y_classes[j]]) / len(y)
            gini = gini - p * p
        return gini

    def predict(self, X):
        """
        Predict with the classification decision tree
        """
        y = np.zeros(X.shape[0])
        root_value = self.checkNode(X, y, self.root)
        # When the root itself is a leaf, every sample gets its class value
        if root_value is not None:
            y[:] = root_value
        return y

    def checkNode(self, X, y, node, cond=None):
        """
        Route samples through a node to determine their class
        """
        # A node without children is a leaf; return its class value
        if node.left is None and node.right is None:
            return node.threshold
        X_lt = X[:, node.feature] < node.threshold
        if cond is not None:
            X_lt = X_lt & cond
        # Recurse into the left child
        lt = self.checkNode(X, y, node.left, X_lt)
        if lt is not None:
            y[X_lt] = lt
        X_gt = X[:, node.feature] > node.threshold
        if cond is not None:
            X_gt = X_gt & cond
        # Recurse into the right child
        gt = self.checkNode(X, y, node.right, X_gt)
        if gt is not None:
            y[X_gt] = gt
```
Use Python to implement decision tree regression based on mean squared error:
```python
import numpy as np

class RegressorNode:
    """
    Node of a regression decision tree
    """

    def __init__(self, feature=None, threshold=None, mse=None, left=None, right=None):
        # Index of the feature used to split at this node
        self.feature = feature
        # Split threshold; for a leaf node this holds the predicted value
        self.threshold = threshold
        # Mean squared error of the split
        self.mse = mse
        # Left child node
        self.left = left
        # Right child node
        self.right = right

class RegressorTree:
    """
    Regression decision tree
    """

    def __init__(self, max_depth=None, min_samples_leaf=None):
        # Maximum depth of the tree
        self.max_depth = max_depth
        # Minimum number of samples in a leaf node
        self.min_samples_leaf = min_samples_leaf

    def fit(self, X, y):
        """
        Fit the regression decision tree
        """
        y = np.array(y)
        self.root = self.buildNode(X, y, 0)
        return self

    def buildNode(self, X, y, depth):
        """
        Recursively build a tree node
        """
        node = RegressorNode()
        # Return directly when there are no samples
        if len(y) == 0:
            return node
        y_classes = np.unique(y)
        # When all labels are identical, return that value as a leaf
        if len(y_classes) == 1:
            node.threshold = y_classes[0]
            return node
        # When the maximum depth is reached, return the mean of the labels
        if self.max_depth is not None and depth >= self.max_depth:
            node.threshold = np.average(y)
            return node
        # When the minimum leaf size is reached, return the mean of the labels
        if self.min_samples_leaf is not None and len(y) <= self.min_samples_leaf:
            node.threshold = np.average(y)
            return node
        min_mse = np.inf
        min_middle = None
        min_feature = None
        # Iterate over all features to find the one with the smallest squared error
        for i in range(X.shape[1]):
            # Squared error of the current feature
            mse, middle = self.calcMse(X[:, i], y)
            if min_mse > mse:
                min_mse = mse
                min_middle = middle
                min_feature = i
        # No valid split found (e.g. all feature values identical): fall back to a leaf
        if min_feature is None:
            node.threshold = np.average(y)
            return node
        # Feature with the smallest squared error
        node.feature = min_feature
        # Split threshold
        node.threshold = min_middle
        # Squared error
        node.mse = min_mse
        X_lt = X[:, min_feature] < min_middle
        X_gt = X[:, min_feature] > min_middle
        # Recursively build the left subtree
        node.left = self.buildNode(X[X_lt, :], y[X_lt], depth + 1)
        # Recursively build the right subtree
        node.right = self.buildNode(X[X_gt, :], y[X_gt], depth + 1)
        return node

    def calcMiddle(self, x):
        """
        Pairwise averages of the sorted values of a continuous feature
        """
        middle = []
        if len(x) == 0:
            return np.array(middle)
        start = x[0]
        for i in range(len(x) - 1):
            if x[i] == x[i + 1]:
                continue
            middle.append((start + x[i + 1]) / 2)
            start = x[i + 1]
        return np.array(middle)

    def calcMse(self, x, y):
        """
        Squared error of a continuous feature
        """
        x_sort = np.sort(x)
        middle = self.calcMiddle(x_sort)
        min_middle = np.inf
        min_mse = np.inf
        # Try every candidate split point
        for i in range(len(middle)):
            y_gt = y[x > middle[i]]
            y_lt = y[x < middle[i]]
            avg_gt = np.average(y_gt)
            avg_lt = np.average(y_lt)
            mse = np.sum((y_lt - avg_lt) ** 2) + np.sum((y_gt - avg_gt) ** 2)
            if min_mse > mse:
                min_mse = mse
                min_middle = middle[i]
        return min_mse, min_middle

    def predict(self, X):
        """
        Predict with the regression decision tree
        """
        y = np.zeros(X.shape[0])
        root_value = self.checkNode(X, y, self.root)
        # When the root itself is a leaf, every sample gets its value
        if root_value is not None:
            y[:] = root_value
        return y

    def checkNode(self, X, y, node, cond=None):
        """
        Route samples through a node to determine their predicted value
        """
        # A node without children is a leaf; return its value
        if node.left is None and node.right is None:
            return node.threshold
        X_lt = X[:, node.feature] < node.threshold
        if cond is not None:
            X_lt = X_lt & cond
        # Recurse into the left child
        lt = self.checkNode(X, y, node.left, X_lt)
        if lt is not None:
            y[X_lt] = lt
        X_gt = X[:, node.feature] > node.threshold
        if cond is not None:
            X_gt = X_gt & cond
        # Recurse into the right child
        gt = self.checkNode(X, y, node.right, X_gt)
        if gt is not None:
            y[X_gt] = gt
```
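A minimal usage sketch, assuming the RegressorTree class above and a noisy toy data set:

```python
import numpy as np

# Toy data: a noisy sine curve (illustrative only)
np.random.seed(0)
X = np.sort(np.random.rand(80, 1) * 6, axis=0)
y = np.sin(X[:, 0]) + np.random.randn(80) * 0.1

reg = RegressorTree(max_depth=3, min_samples_leaf=5)
reg.fit(X, y)
print(reg.predict(X[:5]))  # predicted values for the first five samples
```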
6. Third-party library implementation
scikit-learn 2 decision tree classification implementation
```python
from sklearn import tree

# Decision tree classification
clf = tree.DecisionTreeClassifier()
# Fit the data
clf = clf.fit(X, y)
```
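The regularization settings discussed in Section 4 map to constructor parameters of the same names; for example, the values used for the regularized tree in Section 7 (X and y are assumed to be defined as above):

```python
from sklearn import tree

# Decision tree classification with regularization
clf = tree.DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)
# Fit the data
clf = clf.fit(X, y)
```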
scikit-learn 3 decision tree regression implementation
```python
from sklearn import tree

# Decision tree regression
clf = tree.DecisionTreeRegressor()
# Fit the data
clf = clf.fit(X, y)
```
7. Animation demonstration
Figure 7-1 shows the classification result of an unregularized decision tree, and Figure 7-2 shows the classification result of a regularized decision tree (max_depth = 3, min_samples_leaf = 5).
<center> Figure 7-1 </center>
<center> Figure 7-2 </center>
Figure 7-3 shows the regression result of an unregularized decision tree, and Figure 7-4 shows the regression result of a regularized decision tree (max_depth = 3, min_samples_leaf = 5).
<center> Figure 7-3 </center>
<center> Figure 7-4 </center>
It can be seen that the unregularized decision tree clearly overfits the training data set, while the regularized decision tree performs relatively better.
8. Mind map
<center> Figure 8-1 </center>
9. References
- https://en.wikipedia.org/wiki/Decision_tree_learning
- https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
- https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
For the full demo, please click here
Note: This article strives to be accurate and easy to understand, but since the author is also a beginner with limited knowledge, readers are urged to point out any errors or omissions by leaving a comment.
This article was first published on AI map; you are welcome to follow it.