  • Q&A: tuning LGB model parameters with Bayesian global optimization

    For the question about LGB model parameters, you can use Bayesian global optimization to tune them:

    import lightgbm as lgb
    from bayes_opt import BayesianOptimization
    import warnings
    warnings.filterwarnings("ignore")
    
    from sklearn.datasets import make_classification
    
    X, y = make_classification(n_samples=10000,n_features=20,n_classes=2,random_state=2)
    data = lgb.Dataset(X,y)
    
    def lgb_cv(feature_fraction, bagging_fraction, min_data_in_leaf, max_depth, min_split_gain, num_leaves, lambda_l1, lambda_l2, num_iterations=1000):
        # l1 (MAE) serves as the cross-validation metric here
        params = {'objective': 'binary', 'num_iterations': num_iterations, 'early_stopping_round': 50, 'metric': 'l1'}
        params['feature_fraction'] = max(min(feature_fraction, 1), 0)
        params['bagging_fraction'] = max(min(bagging_fraction, 1), 0)
        params['min_data_in_leaf'] = int(round(min_data_in_leaf))
        params['max_depth'] = int(round(max_depth))
        params['min_split_gain'] = min_split_gain
        params['num_leaves'] = int(round(num_leaves))
        params['lambda_l1'] = max(lambda_l1, 0)
        params['lambda_l2'] = max(lambda_l2, 0)

        cv_result = lgb.cv(params, data, nfold=5, seed=2, stratified=True, verbose_eval=50)
        # BayesianOptimization maximizes, so return the negated best (lowest) l1
        return -(min(cv_result['l1-mean']))
    
    lgb_bo = BayesianOptimization(
        lgb_cv,
        {'feature_fraction': (0.5, 1),
         'bagging_fraction': (0.5, 1),
         'min_data_in_leaf': (1, 100),
         'max_depth': (3, 15),
         'min_split_gain': (0, 5),
         'num_leaves': (16, 128),
         'lambda_l1': (0, 100),
         'lambda_l2': (0, 100)}
    )
    
    lgb_bo.maximize(init_points=21, n_iter=90)  # init_points: number of initial random probes; n_iter: number of optimization iterations (i.e., samples)
    print(lgb_bo.max)
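    lgb_bo.max then holds the best score and its parameters. A minimal follow-up sketch (not part of the original answer) for turning them back into a trained model; the integer-valued parameters must be rounded again:

    best = lgb_bo.max['params']
    final_params = {
        'objective': 'binary',
        'feature_fraction': best['feature_fraction'],
        'bagging_fraction': best['bagging_fraction'],
        'min_data_in_leaf': int(round(best['min_data_in_leaf'])),
        'max_depth': int(round(best['max_depth'])),
        'min_split_gain': best['min_split_gain'],
        'num_leaves': int(round(best['num_leaves'])),
        'lambda_l1': best['lambda_l1'],
        'lambda_l2': best['lambda_l2'],
    }
    # retrain on the full dataset with the tuned parameters
    final_model = lgb.train(final_params, data, num_boost_round=1000)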
    
  • Saving and Applying an LGB Model

    1,000+ reads · 2020-06-09 15:45:26
    1. Native API
    import lightgbm as lgb

    # Train the model (params, lgb_train, lgb_eval and X_test are assumed to be prepared beforehand)
    gbm = lgb.train(params, lgb_train, num_boost_round=20, valid_sets=lgb_eval, early_stopping_rounds=5)

    # Save the model
    gbm.save_model('model.txt')

    # Load the model
    gbm = lgb.Booster(model_file='model.txt')

    # Predict with the model
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
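    Besides a file on disk, the native Booster can also round-trip through an in-memory string, which is convenient when the model is stored in a database; a small sketch using the Booster API's model_to_string and the model_str constructor argument:

    # Serialize the booster to a string instead of a file
    model_str = gbm.model_to_string(num_iteration=gbm.best_iteration)
    # Rebuild a booster from that string
    gbm = lgb.Booster(model_str=model_str)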
    
    2. scikit-learn API
    from lightgbm import LGBMRegressor
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    import joblib  # sklearn.externals.joblib was removed from newer scikit-learn; import joblib directly

    # Prepare some data so the snippet is self-contained
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

    # Train the model
    gbm = LGBMRegressor(objective='regression', num_leaves=31, learning_rate=0.05, n_estimators=20)
    gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)], eval_metric='l1', early_stopping_rounds=5)

    # Save the model
    joblib.dump(gbm, 'loan_model.pkl')
    # Load the model
    gbm = joblib.load('loan_model.pkl')

    # Predict with the model
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration_)
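    If you also need the text-format model file while working through the sklearn interface, the fitted wrapper exposes the underlying native Booster; a brief sketch (booster_ is the attribute on a fitted LGBMModel):

    import lightgbm as lgb

    # The sklearn wrapper holds a native Booster after fit()
    gbm.booster_.save_model('model.txt')
    # ...which can then be reloaded through the native API as in section 1
    bst = lgb.Booster(model_file='model.txt')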

     

  • Building an LGB Model

    1,000+ reads · 2018-10-31 20:35:10

    LightGBM is an open-source project from Microsoft; its biggest advantage is speed!

    I used LightGBM in a competition a while back; it worked quite well and is very simple, so I'm sharing it here for reference.

    Official documentation (Chinese translation): http://lightgbm.apachecn.org/cn/latest/index.html

    In the training set used here, the first column is the Y label and every following column is a preprocessed feature; which features to use depends on your actual situation.
     

    # -*- coding: utf-8 -*-
    # author: Yu Sun
     
    import pandas as pd
    import lightgbm as lgb
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
     
    params = {
        'task': 'train',
        'boosting_type': 'gbdt',  # GBDT as the base algorithm
        'objective': 'binary',  # we predict whether a user buys or not, so it's binary: 0 = no purchase, 1 = purchase
        'metric': 'auc',  # evaluation metric
        'max_bin': 255,  # larger is more accurate but slower
        'learning_rate': 0.1,  # learning rate
        'num_leaves': 64,  # larger is more accurate but may overfit
        'max_depth': -1,  # on small datasets, limiting depth helps prevent overfitting; < 0 means no limit
        'feature_fraction': 0.8,  # guards against overfitting
        'bagging_freq': 5,  # guards against overfitting
        'bagging_fraction': 0.8,  # guards against overfitting
        'min_data_in_leaf': 21,  # guards against overfitting
        'min_sum_hessian_in_leaf': 3.0,  # guards against overfitting
        'header': True  # whether the input file has a header row (used when loading data from text files)
    }
     
     
    # Train the model and predict
    def train_predict_model(model_file='./model.txt'):
        dataset = pd.read_csv("./data/train.csv")  # training set
        d_x = dataset.iloc[:, 2:].values
        d_y = dataset['type'].values
        dataset_future = pd.read_csv("./data/test.csv")  # test set (for submitting results online)
        d_future_x = dataset_future.iloc[:, 2:].values
        train_X, valid_X, train_Y, valid_Y = train_test_split(
            d_x, d_y, test_size=0.2, random_state=2)  # split the training data into train + validation sets
        lgb_train = lgb.Dataset(train_X, label=train_Y)
        lgb_eval = lgb.Dataset(valid_X, label=valid_Y, reference=lgb_train)
        print("Training...")
        bst = lgb.train(
            params,
            lgb_train,
            categorical_feature=list(range(1, 17)),  # specify which features are categorical
            valid_sets=[lgb_eval],
            num_boost_round=500,
            early_stopping_rounds=200)
        print("Saving Model...")
        bst.save_model(model_file)  # save the model
        print("Predicting...")
        predict_result = bst.predict(d_future_x)  # predictions lie in [0, 1]; larger values mean the user is more likely to buy

        return predict_result
     
     
    # Evaluate the importance of each selected feature (shown as a plot)
    def plot_feature_importance(dataset, model_bst):
        list_feature_name = list(dataset.columns[2:])
        list_feature_importance = list(model_bst.feature_importance(
            importance_type='split', iteration=-1))
        dataframe_feature_importance = pd.DataFrame(
            {'feature_name': list_feature_name, 'importance': list_feature_importance})
        print(dataframe_feature_importance)
        x = range(len(list_feature_name))
        plt.xticks(x, list_feature_name, rotation=90, fontsize=14)
        plt.plot(x, list_feature_importance)
        for i in x:
            plt.axvline(i)
        plt.show()
     
     
    if __name__ == "__main__":
        train_predict_model()
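    plot_feature_importance is defined above but never called in __main__. One possible way to wire it up (an illustrative sketch reusing this script's file paths, not part of the original) is to reload the saved model together with the training DataFrame:

    # Illustrative usage: reload the saved model and plot feature importances
    dataset = pd.read_csv("./data/train.csv")
    bst = lgb.Booster(model_file='./model.txt')
    plot_feature_importance(dataset, bst)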
    

     

  • Using an LGB model with n-fold validation

    1,000+ reads · 2019-07-20 15:49:18
    import numpy as np
    import pandas as pd
    import lightgbm as lgb
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import roc_auc_score

    # train, train_feature and test_feature are DataFrames prepared elsewhere

    # LGB parameters
    lgb_params = {
        "boosting_type": "gbdt",
        "objective": "binary",
        'metric': {'binary_logloss', 'auc'},  # binary log loss plus AUC
        "learning_rate": 0.01,
        "max_depth": 7,
        "num_leaves": 105,
        "feature_fraction": 1,
        "bagging_fraction": 1,
        'min_data_in_leaf': 100,
        'bagging_freq': 6,
        "nthread": 30
    }
    
    
    def pred_select(label_name,column_name):
        if label_name == 'favorite':
            labels = train.favorite
        else:
            labels = train.purchase
        x = train_feature.values
        y = labels
        y_val = np.zeros((train_feature.shape[0]))  # out-of-fold predictions for the training rows
        y_test = np.zeros((test_feature.shape[0]))  # accumulated (averaged) test-set predictions
        score_valid=[]
        
        skf = StratifiedKFold(n_splits=5,shuffle=True,random_state=42)
        for train_index,valid_index in skf.split(x,y):
            # skf.split yields index arrays for each fold; indexing the underlying values with them selects the fold's rows
            x_train, x_valid, y_train, y_valid = x[train_index], x[valid_index], y[train_index], y[valid_index]
            
            train_data = lgb.Dataset(x_train,label=y_train)
            valid_data = lgb.Dataset(x_valid,label=y_valid)
            model = lgb.train(lgb_params,train_data,valid_sets=[valid_data],verbose_eval=1)
            y_val[valid_index] = model.predict(x_valid)
            score_valid.append(roc_auc_score(y_valid,y_val[valid_index]))
            y_test += np.array(model.predict(test_feature) / 5)  # average the 5 folds' test predictions
        score_valid = np.array(score_valid)
        y_test = pd.DataFrame(y_test,columns=[column_name])
        print(label_name + ' validation score: {}'.format(score_valid.mean()))
        return y_test  
    
    fav_test = pred_select('favorite','pred_favorite')   
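    Since y_val accumulates one out-of-fold prediction for every training row, an overall AUC can be reported alongside the per-fold mean. A small sketch of what could sit at the end of pred_select, just before the return (not in the original):

    # y_val now holds an out-of-fold prediction for each training row,
    # so a single overall AUC complements the per-fold average
    print(label_name + ' overall OOF AUC: {}'.format(roc_auc_score(y, y_val)))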

     

  • Machine Learning: Notes on the LGB Model

    10,000+ reads · 2018-05-16 16:23:12
    Reposted from my Jianshu blog.
  • No deep model was used; a traditional LGB classifier was used instead. The code here only uses a very basic TF-IDF feature; once the model is built, you can add features following your own ideas. I wanted to do the entity part first, so sentiment is not included; you can add a file features / emo_...
  • LGB model tuning, with regression as the example; Bayesian tuning is worth trying (see the sketch after this list)
  • Model training with LGB

    2021-08-25 09:46:47
    import lightgbm as lgb from sklearn import metrics from woe.eval import eval_segment_metrics # usually kept as-is, no changes needed params = { 'boosting_type': 'gbdt', 'objective': 'binary', 'metric': 'au
  • This post mainly explains: training a model with LGB and saving it, then treating the training features as the parameters to tune and using AutoML-NNI to optimize them. Main idea: train and save an LGB model; treat the training features as tunable parameters and use the saved model to predict for those parameters...
  • This post mainly explains: using AutoML-NNI to tune injection-molding process parameters after training an LGB model. Main idea: train and save an LGB model that predicts the 3D dimensions of injection-molded parts; treat the training features as tunable parameters and use the saved model to predict for them; the predicted results...
  • Predicting the Titanic dataset with LGB

    2021-07-28 16:32:46
    I searched CSDN and found no post using an LGB model on the Titanic problem, so I wrote one. The commented-out parts of the code are my various attempts at fixing errors, left in place, haha; feel free to take the code. import numpy as np import pandas as pd import lightgbm ...
  • GBDT, XGB, LGB decision-tree models

    千次阅读 2020-06-08 19:18:43
    This post briefly summarizes GBDT, XGBoost, LightGBM and CatBoost, covering model principles, strengths and weaknesses, etc. 2. Embedding representations: common ways to build embeddings include word2vec embeddings, neural network embeddings and graph embeddings
  • lgb

    2020-10-10 14:28:49
    #importing libraries ...import lightgbm as lgb import joblib from sklearn.datasets import load_breast_cancer,load_boston,load_wine from sklearn.model_selection import train_test_split fro.
  • 35. A summary comparison of the GBDT, XGB, LGB and RF models. Since I like deriving the theory by hand on paper, photos are attached; suggestions are welcome
  • Common prediction models for data competitions: LGB, XGB and ANN

    10,000+ reads · highly upvoted · 2018-06-19 21:09:33
    To get a good ranking in today's competitions, model ensembling is a must. Here is a summary of three basic models: - lightgbm: competition datasets keep getting larger, and to reach a fairly high prediction accuracy while reducing memory usage and speeding up training, light...
  • [Model Tuning] Adjusting LGB parameters

    1,000+ reads · 2020-08-13 21:50:16
    Table of contents (excerpt): Model selection · 3 Model tuning · 3.1 Set initial parameters · 3.2 Tune n_estimators · 3.3 max_depth/num_leaves · 3.4 min_child_samples/min_child_weight · 3.5 subsample/...
  • IJCAI-18 Alimama Search Advertising Conversion Prediction, runner-up solution. -1 Problem introduction -2 Data download; preliminary-round data link: https://share.weiyun.com/56y91Fx ... -3 The file folder contains feature importances, offline test results for feature groups, the competition write-up, and the defense PPT -4 Code walkthrough
  • MNIST2_LGB_XGB training and prediction

    2020-08-20 02:45:19
    XGB/LGB model training and prediction on the MNIST dataset. Part of the script is shown below; see the author's GitHub for the full script. lgb_param = { 'boosting': 'gbdt', 'num_iterations': 145, 'num_threads' : 8, 'verbosity': 0, 'learning_rate': 0.2, '...
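    For the regression-tuning item earlier in this list, here is a minimal sketch of the same Bayesian approach as the first answer, switched to a regression objective (assumptions: synthetic data from make_regression, the l2 metric, a reduced search space, and the pre-4.0 LightGBM result keys used throughout this page; stratified=False is required for a continuous target):

    import lightgbm as lgb
    from bayes_opt import BayesianOptimization
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=10000, n_features=20, random_state=2)
    data = lgb.Dataset(X, y)

    def lgb_cv_reg(num_leaves, feature_fraction, lambda_l2):
        params = {
            'objective': 'regression',
            'metric': 'l2',
            'num_leaves': int(round(num_leaves)),
            'feature_fraction': max(min(feature_fraction, 1), 0),
            'lambda_l2': max(lambda_l2, 0),
        }
        # stratified folds are not valid for a continuous target
        cv_result = lgb.cv(params, data, num_boost_round=200, nfold=5, seed=2, stratified=False)
        # BayesianOptimization maximizes, so negate the best (lowest) l2
        return -min(cv_result['l2-mean'])

    lgb_bo_reg = BayesianOptimization(
        lgb_cv_reg,
        {'num_leaves': (16, 128),
         'feature_fraction': (0.5, 1),
         'lambda_l2': (0, 100)})
    lgb_bo_reg.maximize(init_points=5, n_iter=25)
    print(lgb_bo_reg.max)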
