  • How to use the lgb.cv function

    The official online documentation has everything; only the commonly used parameters are covered here.

    Parameters

    params: parameters for the base learners.
    train_set: the training set.
    nfold: n-fold cross-validation.
    metrics: evaluation metric(s).
    num_boost_round: maximum number of boosting iterations.
    early_stopping_rounds: number of rounds for early stopping.
    verbose_eval: print progress every n iterations.
    stratified: default True; whether to use stratified sampling (recommended).
    shuffle: default True; whether to shuffle the data (not recommended here).
    seed: equivalent to random_state.

    Parameters that go inside params

    objective: the task type. Regression: regression; binary classification: binary; multiclass: multiclass; ranking; etc.
    boosting: gbdt, rf, dart.
    n_jobs
    learning_rate
    num_leaves
    max_depth
    subsample
    colsample_bytree

    Return value

    The return value is a dict; print it once and it is self-explanatory. The usual idioms:
    len(cv_results['l2-mean']) gives the number of boosting rounds for the base learners.
    cv_results['l2-mean'][-1] gives the final cross-validated score.
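
    A minimal sketch tying these pieces together, assuming the pre-4.0 lgb.cv keyword arguments described above; the make_regression data is only for illustration:

    import lightgbm as lgb
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=500, n_features=10, random_state=42)
    train_set = lgb.Dataset(X, y)
    params = {'objective': 'regression', 'learning_rate': 0.05,
              'num_leaves': 31, 'verbose': -1}

    cv_results = lgb.cv(params, train_set,
                        num_boost_round=200, nfold=5, metrics='l2',
                        early_stopping_rounds=10,
                        stratified=False,  # stratification only makes sense for classification
                        seed=42)

    n_rounds = len(cv_results['l2-mean'])    # number of boosting rounds to keep
    final_score = cv_results['l2-mean'][-1]  # cross-validated score at that round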

  • A detailed look at LGBM for machine learning

    With the native API, the data must first be wrapped in the Dataset format; training is then done with lgb.train:

    # coding: utf-8
    import json
    import lightgbm as lgb
    import pandas as pd
    from sklearn.metrics import mean_squared_error

    # Load the data
    print('Loading data...')
    df_train = pd.read_csv('../data/regression.train.txt', header=None, sep='\t')
    df_test = pd.read_csv('../data/regression.test.txt', header=None, sep='\t')

    # Set up the training and test sets
    y_train = df_train[0].values
    y_test = df_test[0].values
    X_train = df_train.drop(0, axis=1).values
    X_test = df_test.drop(0, axis=1).values

    # Build LightGBM's Dataset format
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

    # Settle on a set of parameters
    params = {
        'task': 'train',
        'boosting_type': 'gbdt',
        'objective': 'regression',
        'metric': {'l2', 'auc'},
        'num_leaves': 31,
        'learning_rate': 0.05,
        'feature_fraction': 0.9,
        'bagging_fraction': 0.8,
        'bagging_freq': 5,
        'verbose': 0
    }

    print('Starting training...')
    # Train
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=lgb_eval,
                    early_stopping_rounds=5)

    # Save the model to a file
    print('Saving model...')
    gbm.save_model('../../tmp/model.txt')

    print('Starting prediction...')
    # Predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

    # Evaluate
    print('RMSE of the prediction:')
    print(mean_squared_error(y_test, y_pred) ** 0.5)

    Training with sample weights

    # coding: utf-8
    import json
    import lightgbm as lgb
    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_squared_error
    import warnings
    warnings.filterwarnings("ignore")

    # Load the data
    print('Loading data...')
    df_train = pd.read_csv('../data/binary.train', header=None, sep='\t')
    df_test = pd.read_csv('../data/binary.test', header=None, sep='\t')
    W_train = pd.read_csv('../data/binary.train.weight', header=None)[0]
    W_test = pd.read_csv('../data/binary.test.weight', header=None)[0]

    y_train = df_train[0].values
    y_test = df_test[0].values
    X_train = df_train.drop(0, axis=1).values
    X_test = df_test.drop(0, axis=1).values
    num_train, num_feature = X_train.shape

    # Attach the sample weights while loading the data
    lgb_train = lgb.Dataset(X_train, y_train,
                            weight=W_train, free_raw_data=False)
    lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train,
                           weight=W_test, free_raw_data=False)

    # Set the parameters
    params = {
        'boosting_type': 'gbdt',
        'objective': 'binary',
        'metric': 'binary_logloss',
        'num_leaves': 31,
        'learning_rate': 0.05,
        'feature_fraction': 0.9,
        'bagging_fraction': 0.8,
        'bagging_freq': 5,
        'verbose': 0
    }

    # Generate feature names
    feature_name = ['feature_' + str(col) for col in range(num_feature)]

    print('Starting training...')
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=10,
                    valid_sets=lgb_train,  # evaluate on the training set
                    feature_name=feature_name,
                    categorical_feature=[21])

    Loading the model and predicting

    # Check the feature names
    print('Finished 10 rounds of training...')
    print('The 7th feature is:')
    print(repr(lgb_train.feature_name[6]))

    # Save the model
    gbm.save_model('../../tmp/lgb_model.txt')

    # Feature names
    print('Feature names:')
    print(gbm.feature_name())

    # Feature importances
    print('Feature importances:')
    print(list(gbm.feature_importance()))

    # Load the model
    print('Loading the model for prediction')
    bst = lgb.Booster(model_file='../../tmp/lgb_model.txt')

    # Predict
    y_pred = bst.predict(X_test)

    # Evaluate on the test set
    print('RMSE on the test set:')
    print(mean_squared_error(y_test, y_pred) ** 0.5)

    Continuing training from a previous model

    # Continue training,
    # initializing from the model saved at ../../tmp/lgb_model.txt
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=10,
                    init_model='../../tmp/lgb_model.txt',
                    valid_sets=lgb_eval)
    print('Finished rounds 10-20, initialized from the old model...')

    # Adjust hyperparameters while training;
    # here, decay the learning rate per iteration
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=10,
                    init_model=gbm,
                    learning_rates=lambda iter: 0.05 * (0.99 ** iter),
                    valid_sets=lgb_eval)
    print('Finished rounds 20-30 with a decaying learning rate...')

    # Adjust other hyperparameters
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=10,
                    init_model=gbm,
                    valid_sets=lgb_eval,
                    callbacks=[lgb.reset_parameter(bagging_fraction=[0.7] * 5 + [0.6] * 5)])
    print('Finished rounds 30-40 with a stepped bagging fraction...')

    Custom loss function

    # Similar to the interface in xgboost:
    # a custom objective returns the gradient and hessian
    def loglikelihood(preds, train_data):
        labels = train_data.get_label()
        preds = 1. / (1. + np.exp(-preds))
        grad = preds - labels
        hess = preds * (1. - preds)
        return grad, hess

    # A custom evaluation function returns (name, value, is_higher_better)
    def binary_error(preds, train_data):
        labels = train_data.get_label()
        return 'error', np.mean(labels != (preds > 0.5)), False

    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=10,
                    init_model=gbm,
                    fobj=loglikelihood,
                    feval=binary_error,
                    valid_sets=lgb_eval)
    print('Finished rounds 40-50 with a custom objective and evaluation metric...')

    Using LightGBM together with sklearn

    Build the model with LightGBM, evaluate with sklearn

    # coding: utf-8
    import lightgbm as lgb
    import pandas as pd
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import GridSearchCV

    # Load the data
    print('Loading data...')
    df_train = pd.read_csv('../data/regression.train.txt', header=None, sep='\t')
    df_test = pd.read_csv('../data/regression.test.txt', header=None, sep='\t')

    # Split into features and labels
    y_train = df_train[0].values
    y_test = df_test[0].values
    X_train = df_train.drop(0, axis=1).values
    X_test = df_test.drop(0, axis=1).values

    print('Starting training...')
    # Initialize an LGBMRegressor directly;
    # it behaves like any other sklearn regressor
    gbm = lgb.LGBMRegressor(objective='regression',
                            num_leaves=31,
                            learning_rate=0.05,
                            n_estimators=20)

    # Fit with the sklearn-style fit method
    gbm.fit(X_train, y_train,
            eval_set=[(X_test, y_test)],
            eval_metric='l1',
            early_stopping_rounds=5)

    # Predict
    print('Starting prediction...')
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration_)

    # Evaluate
    print('RMSE of the prediction:')
    print(mean_squared_error(y_test, y_pred) ** 0.5)

    Grid search for the best hyperparameters

    # Use scikit-learn's grid-search cross-validation to pick hyperparameters
    estimator = lgb.LGBMRegressor(num_leaves=31)
    param_grid = {
        'learning_rate': [0.01, 0.1, 1],
        'n_estimators': [20, 40]
    }
    gbm = GridSearchCV(estimator, param_grid)
    gbm.fit(X_train, y_train)
    print('Best hyperparameters found by grid search:')
    print(gbm.best_params_)
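
    With refit=True (the default) GridSearchCV retrains the best configuration on the full training data, so the fitted searcher can predict directly; a small follow-on sketch, mirroring the rmse check used above:

    y_pred = gbm.predict(X_test)  # delegates to the refitted best estimator
    print('RMSE with the best hyperparameters:')
    print(mean_squared_error(y_test, y_pred) ** 0.5)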

    Plotting and interpretation

    %matplotlib inline
    # coding: utf-8
    import lightgbm as lgb
    import pandas as pd

    try:
        import matplotlib.pyplot as plt
    except ImportError:
        raise ImportError('You need to install matplotlib for plotting.')

    # Load the data
    print('Loading data...')
    df_train = pd.read_csv('../data/regression.train.txt', header=None, sep='\t')
    df_test = pd.read_csv('../data/regression.test.txt', header=None, sep='\t')

    # Split into features and labels
    y_train = df_train[0].values
    y_test = df_test[0].values
    X_train = df_train.drop(0, axis=1).values
    X_test = df_test.drop(0, axis=1).values

    # Build LightGBM's Dataset format
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_test = lgb.Dataset(X_test, y_test, reference=lgb_train)

    # Set the parameters
    params = {
        'num_leaves': 5,
        'metric': ('l1', 'l2'),
        'verbose': 0
    }

    evals_result = {}  # to record eval results for plotting

    print('Starting training...')
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=100,
                    valid_sets=[lgb_train, lgb_test],
                    feature_name=['f' + str(i + 1) for i in range(28)],
                    categorical_feature=[21],
                    evals_result=evals_result,
                    verbose_eval=10)

    print('Plotting the metrics recorded during training...')
    ax = lgb.plot_metric(evals_result, metric='l1')
    plt.show()

    print('Plotting feature importances...')
    ax = lgb.plot_importance(gbm, max_num_features=10)
    plt.show()

    print('Plotting the 84th tree...')
    ax = lgb.plot_tree(gbm, tree_index=83, figsize=(20, 8), show_info=['split_gain'])
    plt.show()

    # print('Plotting the 84th tree with graphviz...')
    # graph = lgb.create_tree_digraph(gbm, tree_index=83, name='Tree84')
    # graph.render(view=True)

  • Run 1.main_train_lmh.ipynb first, then 1.main_test_lmh.ipynb. The models folder holds the trained lgb (or other) models, the features folder holds the precomputed features, and submission results land in results; before submitting, open the answer file and replace every “ with the empty string
  • LGB, XGB, and CBT parameters

    LGB:
    1. lgb.Dataset()

    train_data = lgb.Dataset(data, label=label, feature_name=['c1', 'c2', 'c3'], categorical_feature=['c3'], weight=w)

    LightGBM can use categorical features directly as input; they do not need to be one-hot encoded, and skipping one-hot encoding is also faster (roughly 8x).

    Note: convert categorical features to int values before constructing the Dataset.

    Sample weights can be set with weight= when needed, or afterwards with train_data.set_weight(w).
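
    A runnable sketch of the above; the column names c1-c3 match the snippet, while the random data is only for illustration:

    import lightgbm as lgb
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    data = pd.DataFrame({'c1': rng.random(100),
                         'c2': rng.random(100),
                         'c3': rng.integers(0, 4, 100)})  # categories already int-coded
    label = rng.integers(0, 2, 100)
    w = np.ones(100)

    train_data = lgb.Dataset(data, label=label,
                             feature_name=['c1', 'c2', 'c3'],
                             categorical_feature=['c3'])
    train_data.set_weight(w)  # same effect as passing weight=w at construction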
    

    2. Cross-validation with cv

    num_round = 10

    lgb.cv(param, train_data, num_round, nfold=5)

    3. End-to-end example

    # Build LightGBM's Dataset format
    lgb_train = lgb.Dataset(X_train, y_train)  # saving the data as a LightGBM binary file makes loading faster
    lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)  # create the validation data

    # Write the parameters as a dict
    params = {
        'task': 'train',
        'boosting_type': 'gbdt',    # boosting type
        'objective': 'regression',  # objective function
        'metric': {'l2', 'auc'},    # evaluation metrics
        'num_leaves': 31,           # number of leaves
        'learning_rate': 0.05,      # learning rate
        'feature_fraction': 0.9,    # fraction of features sampled per tree
        'bagging_fraction': 0.8,    # fraction of samples used per tree
        'bagging_freq': 5,          # perform bagging every k iterations
        'verbose': 1                # <0: fatal only, =0: errors (warnings), >0: info
    }

    print('Start training...')
    # cv and train
    gbm = lgb.train(params, lgb_train, num_boost_round=20, valid_sets=lgb_eval, early_stopping_rounds=5)  # training needs a parameter dict and a Dataset

    print('Save model...')
    gbm.save_model('model.txt')  # save the trained model to a file

    print('Start predicting...')
    # Predict; if early stopping was enabled during training,
    # best_iteration gives the prediction from the best iteration
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

    # Evaluate: root mean squared error between truth and prediction
    print('The rmse of prediction is:', mean_squared_error(y_test, y_pred) ** 0.5)

    # Read the modeling data
    import pandas as pd
    df = pd.read_csv('data.csv')

    # Drop variables irrelevant to the model and perfectly collinear variables

    # Split into train_X, test_X, train_y, test_y
    from sklearn.model_selection import train_test_split
    train_X, test_X, train_y, test_y = train_test_split(X, Y, train_size=0.8, random_state=123)

    import lightgbm as lgb
    lgb_train = lgb.Dataset(train_X, train_y, free_raw_data=False)
    lgb_eval = lgb.Dataset(test_X, test_y, reference=lgb_train, free_raw_data=False)

    param = {
        'task': 'train',
        'boosting_type': 'gbdt',
        'objective': 'binary',
        'metric': {'l2', 'auc'},
        'num_leaves': 40,
        'learning_rate': 0.01,
        'feature_fraction': 0.8,
        'bagging_fraction': 0.8,
        'bagging_freq': 5,
        'verbose': 0
    }
    param['is_unbalance'] = 'true'
    param['metric'] = 'auc'

    # Use cv to find the best number of rounds, then train with that many
    bst = lgb.cv(param, lgb_train, num_boost_round=1000, nfold=6, early_stopping_rounds=100)
    estimators = lgb.train(param, lgb_train, num_boost_round=len(bst['auc-mean']))

    print('Start training...')

    y_pred = estimators.predict(test_X, num_iteration=estimators.best_iteration)

    from sklearn import metrics
    print('The roc of prediction is:', metrics.roc_auc_score(test_y, y_pred))

    print('Feature names:', estimators.feature_name())
    print('Feature importances:', list(estimators.feature_importance()))

    XGB

    import numpy as np
    import xgboost as xgb

    def XGB_predict(train_x, train_y, val_X, val_Y, test_x, res):
        print("XGB test")
        # create DMatrix datasets for xgboost
        xgb_val = xgb.DMatrix(val_X, label=val_Y)
        xgb_train = xgb.DMatrix(train_x, label=train_y)
        xgb_test = xgb.DMatrix(test_x)
        # specify your configurations as a dict
        params = {
                  'booster': 'gbtree',
                  # 'objective': 'multi:softmax',  # multiclass labels
                  # 'objective': 'multi:softprob', # multiclass probabilities
                  'objective': 'binary:logistic',
                  'eval_metric': 'auc',
                  # 'num_class': 9, # number of classes, used with multi:softmax
                  'gamma': 0.1,  # post-pruning control; larger is more conservative, typically 0.1 or 0.2
                  'max_depth': 8,  # tree depth; deeper trees overfit more easily
                  'alpha': 0,  # L1 regularization coefficient
                  'lambda': 10,  # L2 regularization on the weights; larger values resist overfitting
                  'subsample': 0.7,  # random row subsampling of the training data
                  'colsample_bytree': 0.5,  # column subsampling when growing each tree
                  'min_child_weight': 3,
                  # Defaults to 1: the minimum sum of hessians (h) in a leaf. For imbalanced
                  # 0-1 classification with h around 0.01, min_child_weight=1 means a leaf
                  # needs roughly 100 samples. This parameter matters a lot: the smaller it
                  # is, the easier it is to overfit.
                  'silent': 0,  # 1 silences run-time output; 0 is usually better
                  'eta': 0.03,  # acts like a learning rate
                  'seed': 1000,
                  'nthread': -1,  # number of CPU threads
                  'missing': 1,
                  'scale_pos_weight': (np.sum(train_y == 0) / np.sum(train_y == 1))  # handles class imbalance; usually sum(negative cases) / sum(positive cases)
                  # 'eval_metric': 'auc'
                  }

        plst = list(params.items())
        num_rounds = 5000  # number of boosting rounds
        watchlist = [(xgb_train, 'train'), (xgb_val, 'val')]
        # cross-validation
        # result = xgb.cv(plst, xgb_train, num_boost_round=200, nfold=4, early_stopping_rounds=200, verbose_eval=True, folds=StratifiedKFold(n_splits=4).split(X, y))
        # train the model and save it; with a large round count,
        # early_stopping_rounds stops training once the metric has not
        # improved for that many rounds
        model = xgb.train(plst, xgb_train, num_rounds, watchlist, early_stopping_rounds=200)
        res['score'] = model.predict(xgb_test)
        res['score'] = res['score'].apply(lambda x: float('%.6f' % x))
        return res
    

    CBT

  • Cross-validation with lgb

    import gc
    import lightgbm as lgb
    import numpy as np
    from sklearn.model_selection import KFold

    def Model_lgb(train, test, features, target, params, n_estimators):
        X_train = train[features].values
        y_train = train[target].values
        X_test = test[features].values

        folds = KFold(n_splits=5, shuffle=True, random_state=1)
        oof_lgb = np.zeros((X_train.shape[0]))  # out-of-fold predictions
        predictions = np.zeros((len(test)))     # test predictions, averaged over folds
        lgb_importance = []

        for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train, y_train)):
            print("fold n°{}".format(fold_ + 1))
            print(X_train[trn_idx].shape, type(X_train))
            x_tr, y_tr = X_train[trn_idx], y_train[trn_idx]
            x_va, y_va = X_train[val_idx], y_train[val_idx]

            trn_data = lgb.Dataset(x_tr, y_tr)
            val_data = lgb.Dataset(x_va, y_va)
            clf = lgb.train(params, trn_data, num_boost_round=n_estimators,
                            valid_sets=[trn_data, val_data],
                            verbose_eval=1, early_stopping_rounds=50)
            oof_lgb[val_idx] = clf.predict(x_va, num_iteration=clf.best_iteration)
            lgb_importance.append(clf.feature_importance())
            predictions += clf.predict(X_test, num_iteration=clf.best_iteration) / folds.n_splits
        # note: despite the name, this prints the out-of-fold mean absolute error
        print('valid_ACC = {}'.format(abs(y_train - oof_lgb).mean()))
        gc.collect()
        return predictions, lgb_importance, oof_lgb
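
    A hypothetical call, assuming train and test DataFrames with the columns listed in features plus a 'label' target column; the params values are illustrative:

    params = {'objective': 'regression', 'metric': 'mae',
              'learning_rate': 0.05, 'num_leaves': 31, 'verbose': -1}
    preds, importances, oof = Model_lgb(train, test, features,
                                        target='label', params=params,
                                        n_estimators=2000)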
    
  •  header=None)[0] y_train = df_train[0] y_test = df_test[0] X_train = df_train.drop(0, axis=1) X_test = df_test.drop(0, axis=1) num_train, num_feature = X_train.shape # create dataset ...
  • This post mainly lists the lgb training function from the previous article, which walked through the preprocessing and post-processing in detail. import lightgbm as lgb import numpy as np ...def base_train(x_train, y_train, x_test, y_test, cate_cols=..
  • lgb parameters

     model = lgb.train(params,  train_set=d_train,  valid_sets=watchlist,  verbose_eval=10) # num_iterations, default=100, type=int, alias=num_iteration, num_tree, num_trees, num_round, num_rounds, ...
  • Parameter settings for lgb multiclass classification

    Data split: train_x, test_x, train_y, test_y = train_test_split(data, target, shuffle = ...
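
    The excerpt cuts off before the parameters themselves; a minimal multiclass configuration along the lines the title promises (the values are illustrative, not the author's):

    params = {
        'objective': 'multiclass',
        'num_class': 3,             # must be set for multiclass objectives
        'metric': 'multi_logloss',
        'num_leaves': 31,
        'learning_rate': 0.1,
    }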
  • Mainly covers the basics of lgb and how to use it, including grid-search tuning: import pandas as pd # data-science toolkit import numpy as np # numerical computing import matplotlib.pyplot as plt # visualization import seaborn as sns # matplotlib...
  • lgb tuning

    Taking lgb.LGBMRegressor as an example: model_lgb = lgb.LGBMRegressor(objective='regression', max_depth = 3, learning_rate=0.1, n_estimators=3938, metric='rmse', bagging_fraction = 0.8,
  • Building an Lgb model

    dataset = pd.read_csv("./data/train.csv") # training set d_x = dataset.iloc[:, 2:].values d_y = dataset['type'].values dataset_future = pd.read_csv("./data/test.csv") # test set (for submitting results online) d_...
  • Source: Microstrong; editor: AI有道. 1. Introduction to LightGBM: GBDT (Gradient Boosting De...
  • Notes on using LGB + K-fold cross-validation

    import lightgbm as lgb from sklearn.model_selection import StratifiedKFold # assume the training data train_data is ready here: a pandas dataframe with the feature columns and a score column train_label = tra...
  • lgb usage and tuning

    This post draws on the lgb Chinese documentation and on lgb tuning notes. import lightgbm as lgb. lgb parameters, type 1, parameters that need no tuning: boosting_type (1. gbdt 2. rf); objective (1. regression 2. binary for 0/1 classification 3. multiclass for multi...lgb_train
  • Tianchi & DataWhale: Task04, modeling and tuning. Project address: ... Setup before modeling: from sklearn.model_selection import KFold # split the dataset for cross-validation X_train = data.loc[data['sample']=='train', :...
  • Differences in custom evaluation functions between xgb and lgb

    Custom evaluation functions actually differ between xgb and lgb, though the steps are the same. XGB: # custom evaluation function for XGBClassifier # preds holds predicted probabilities and must be converted to labels # dtrain is the xgb matrix; get_label() returns the true labels ...
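
    A side-by-side sketch of the difference the post refers to (the function names are illustrative): a custom eval for xgb.train returns (name, value), while one for lgb.train returns a third element saying whether higher is better:

    import numpy as np

    # xgb-style: returns (metric_name, value); preds are probabilities, so threshold them
    def xgb_binary_error(preds, dtrain):
        labels = dtrain.get_label()
        return 'error', float(np.mean(labels != (preds > 0.5)))

    # lgb-style: returns (metric_name, value, is_higher_better)
    def lgb_binary_error(preds, train_data):
        labels = train_data.get_label()
        return 'error', float(np.mean(labels != (preds > 0.5))), False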
  • [Day 4: modeling and tuning]

    Contents: 1. Introduction 2. Learning goals 3. Content overview 4. Model fundamentals 5. Model comparison and performance evaluation (1. logistic regression 2. decision trees ...). This is the Task 4 modeling-and-tuning part of the zero-to-one introduction to financial risk control, covering the main models as well as model evaluation and tuning strate
    lgb_train = lgb.Dataset(X_train, y_train) lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train) # specify your configurations as a dict params = { 'boosting_...
  • lgb_train = lgb.Dataset(X_train, y_train) lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train) # specify your configurations as a dict params = { 'boosting_type': 'gbdt', 'objective': 'binary...
  • A small machine-learning goal: Task3

    lgb_train = lgb.Dataset(train, y_train) lgb_eval = lgb.Dataset(test, y_test, reference=lgb_train) ...
  • 1. Preface: if you want to work in data mining or machine learning, mastering the common machine-learning algorithms is essential. A quick rundown of the usual ones. Supervised learning: logistic regression, linear regression, decision trees, naive Bayes, ...
  • LGB cross-validation with KFold

    import lightgbm as lgb from sklearn.model_selection import StratifiedKFold # assume the training data train_data is ready here: a pandas dataframe with the feature columns and a score column train_label = train_data['score'] # ...
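
    The excerpt breaks off after the label column; a sketch of where that pattern usually goes (the 'score' column comes from the excerpt, everything else is assumed, and StratifiedKFold presumes the score is a class label):

    train_label = train_data['score']
    train_feat = train_data.drop(columns=['score'])

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    for trn_idx, val_idx in skf.split(train_feat, train_label):
        trn_set = lgb.Dataset(train_feat.iloc[trn_idx], train_label.iloc[trn_idx])
        val_set = lgb.Dataset(train_feat.iloc[val_idx], train_label.iloc[val_idx],
                              reference=trn_set)
        # lgb.train(params, trn_set, valid_sets=[val_set], ...) per fold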
  • Your main.py script would look like this: import pandas as pd import lightgbm as lgb from sklearn.model_selection import train_test_split data = pd.read_csv('data/train.csv', nrows=10000) X = data.drop(['ID_code...
  • Boosting algorithms: lightGBM study notes

    Chinese official docs: http://lightgbm.apachecn.org/cn/latest/Installation-Guide.html English official docs: ... 1. Installing lightGBM: in anaconda, run pip install lightGBM; test with import lightgbm as lgb...
  • split, y_train_split, X_val, y_val = X_train.iloc[train_index], y_train[train_index], X_train.iloc[valid_index], y_train[valid_index] train_matrix = lgb.Dataset(X_train_split, label=y_train_split) ...
  • LGBM functions and parameters in detail

    lgb.update(train_set=None, fobj=None) Update for one iteration. Train API  lightgbm.train lightgbm.train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, fobj=None...
