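In scikit-learn, GradientBoostingRegressor and GradientBoostingClassifier are the concrete implementations of the gradient boosted trees algorithm. They expose a set of parameters that control model initialization, the training process, and prediction behavior, which lets us tune the balance between fitting performance and generalization. Next, we look at the most important parameters of these classes and what each one does.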

class GradientBoostingClassifier(ClassifierMixin, BaseGradientBoosting):
    def __init__(
        self,
        *,
        loss="log_loss",
        learning_rate=0.1,
        n_estimators=100,
        subsample=1.0,
        criterion="friedman_mse",
        min_samples_split=2,
        min_samples_leaf=1,
        min_weight_fraction_leaf=0.0,
        max_depth=3,
        min_impurity_decrease=0.0,
        init=None,
        random_state=None,
        max_features=None,
        verbose=0,
        max_leaf_nodes=None,
        warm_start=False,
        validation_fraction=0.1,
        n_iter_no_change=None,
        tol=1e-4,
        ccp_alpha=0.0,
    ):


class GradientBoostingRegressor(RegressorMixin, BaseGradientBoosting):
    def __init__(
        self,
        *,
        loss="squared_error",
        learning_rate=0.1,
        n_estimators=100,
        subsample=1.0,
        criterion="friedman_mse",
        min_samples_split=2,
        min_samples_leaf=1,
        min_weight_fraction_leaf=0.0,
        max_depth=3,
        min_impurity_decrease=0.0,
        init=None,
        random_state=None,
        max_features=None,
        alpha=0.9,
        verbose=0,
        max_leaf_nodes=None,
        warm_start=False,
        validation_fraction=0.1,
        n_iter_no_change=None,
        tol=1e-4,
        ccp_alpha=0.0,
    ):
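
The two signatures above are almost identical. As a quick check, the following sketch compares the default parameters of both estimators through `get_params()`. The variable names are illustrative only, and the exact result depends on the installed scikit-learn version.

```python
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Instantiate both estimators with their default parameters
clf = GradientBoostingClassifier()
reg = GradientBoostingRegressor()

clf_params = clf.get_params()
reg_params = reg.get_params()

# Parameters that exist on only one of the two estimators
only_in_regressor = sorted(set(reg_params) - set(clf_params))
only_in_classifier = sorted(set(clf_params) - set(reg_params))

# Shared parameters whose default values differ
differing_defaults = {
    name: (clf_params[name], reg_params[name])
    for name in set(clf_params) & set(reg_params)
    if clf_params[name] != reg_params[name]
}

print("only in regressor:", only_in_regressor)
print("only in classifier:", only_in_classifier)
print("different defaults:", differing_defaults)
```

With signatures like the ones listed above, the differences should come down to the default `loss` and the regressor-only `alpha`.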

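1. Global parameters of gradient boosted trees

Parameter	Description
loss	Loss function. Classification: `log_loss`, `exponential`. Regression: `squared_error`, `absolute_error`, `huber`, `quantile`
alpha	Regressor only: the quantile used by the `huber` and `quantile` losses
learning_rate	Learning rate; shrinks the contribution of each tree
n_estimators	Number of boosting iterations (number of weak learners)
subsample	Fraction of the training samples used to fit each tree; values below 1.0 give stochastic gradient boosting
init	Initial model that provides the starting predictions (the string `zero` or a custom estimator)
validation_fraction	Fraction of the training data held out as a validation set for early stopping; only used when `n_iter_no_change` is set
n_iter_no_change	Stop training once the validation score has not improved for this many consecutive iterations; `None` disables early stopping
tol	Tolerance for early stopping; an improvement smaller than `tol` does not count as an improvement
warm_start	Whether to keep the already fitted trees and add new ones on the next call to `fit`
random_state	Controls randomness (feature permutation at each split, sample subsampling, the train/validation split, etc.)
verbose	Verbosity of the training progress output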
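
The early stopping parameters in the table work together, so a small example may help. The sketch below is only an illustration: the synthetic data from `make_regression` and all parameter values are assumptions chosen for the demo, not recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data, used only for illustration
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=500,         # upper bound on the number of boosting iterations
    learning_rate=0.1,        # shrinks the contribution of each tree
    subsample=0.8,            # each tree is fitted on 80% of the training samples
    validation_fraction=0.2,  # 20% of the data is held out for early stopping
    n_iter_no_change=5,       # stop after 5 iterations without improvement
    tol=1e-4,                 # minimum improvement that still counts
    random_state=0,
)
model.fit(X, y)

# n_estimators_ is the number of trees actually fitted; it can be
# smaller than n_estimators when early stopping is triggered
print("requested:", model.n_estimators, "fitted:", model.n_estimators_)
```

Setting `n_estimators` generously and letting `n_iter_no_change` decide when to stop is one common way to avoid tuning the number of trees by hand.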

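2. Parameters for building the individual decision trees

Parameter	Description
criterion	Measure of split quality: `friedman_mse` or `squared_error`
max_depth	Maximum depth of each tree; controls tree complexity
min_samples_split	Minimum number of samples required to split an internal node
min_samples_leaf	Minimum number of samples required at a leaf node
min_weight_fraction_leaf	Minimum fraction of the total sample weight required at a leaf node
min_impurity_decrease	A node is split only if the split decreases the impurity by at least this value
max_features	Number of features considered when searching for the best split
max_leaf_nodes	Maximum number of leaf nodes per tree
ccp_alpha	Complexity parameter for minimal cost-complexity pruning; the default 0.0 means no pruning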
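
To see how the tree-level parameters constrain the weak learners, the following sketch fits a small classifier and inspects the fitted trees. The dataset and parameter values are assumptions for illustration. Note that the weak learners are regression trees even in the classifier, which is why `criterion` only offers MSE-based options.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary classification data, used only for illustration
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=20,
    max_depth=2,          # each tree has at most 2 levels of splits
    min_samples_leaf=5,   # every leaf must contain at least 5 samples
    max_features="sqrt",  # consider sqrt(n_features) features per split
    random_state=0,
)
model.fit(X, y)

# estimators_ is a 2-D array of fitted DecisionTreeRegressor objects;
# for binary classification there is one tree per boosting iteration
trees = model.estimators_[:, 0]
max_observed_depth = max(tree.get_depth() for tree in trees)
max_observed_leaves = max(tree.get_n_leaves() for tree in trees)

print("deepest tree:", max_observed_depth)
print("most leaves in a tree:", max_observed_leaves)
```

Because `max_depth=2`, no tree can have more than four leaves, so the two printed values stay within those limits.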
