• 2021-06-16 15:42:21

Using the Breast Cancer Dataset as an example, the following Python code trains a model with lightgbm:

import numpy as np
from collections import Counter
import pandas as pd
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error,roc_auc_score,precision_score
pd.options.display.max_columns = 999

#load the data (assumed here to be the Breast Cancer Dataset bundled with scikit-learn)
X=load_breast_cancer()
Y=X.target

#train_test_split
X_train,X_test,y_train,y_test=train_test_split(X.data,Y,test_size=0.3,random_state=0)

#converting the dataset into proper LGB format
d_train=lgb.Dataset(X_train, label=y_train)
d_test= lgb.Dataset(X_test, label=y_test)

#Specifying the parameter
params={}
params['learning_rate']=0.03 #learning rate
params['objective']='binary' #binary classification
params['metric']='binary_logloss' #log loss for binary classification
params['max_depth']=8 #maximum tree depth
params['num_leaves']=256
params['feature_fraction']=0.8
params['bagging_fraction']=0.8
params['verbosity']=20
params['early_stopping_round']=5
#train the model
clf=lgb.train(params,d_train,200,valid_sets=[d_train, d_test]) #train for up to 200 boosting rounds; early stopping may end training sooner

#prediction on the test set
y_pred=clf.predict(X_test, num_iteration=clf.best_iteration)
clf.save_model('model.txt')
threshold = 0.5
#convert predicted probabilities into 0/1 class labels
pred_result=(y_pred > threshold).astype(int)
#accuracy on the test set
print(np.sum(pred_result == y_test)/y_test.shape[0])
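
The script above optimizes `binary_logloss` and reports accuracy at a 0.5 threshold. As a rough illustration of what those numbers mean, here is a minimal pure-Python sketch of the log loss, the thresholding step, accuracy and precision. The helper names and the toy labels and probabilities are made up for this example; they are not taken from the training run, and LightGBM's internal implementation may differ in detail.

```python
import math

def binary_logloss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (hypothetical helper)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def threshold_labels(y_prob, threshold=0.5):
    """Turn probabilities into 0/1 labels, mirroring `mypred > threshold`."""
    return [1 if p > threshold else 0 for p in y_prob]

def accuracy(y_true, y_hat):
    """Fraction of predictions that match the true labels."""
    return sum(int(a == b) for a, b in zip(y_true, y_hat)) / len(y_true)

def precision(y_true, y_hat):
    """True positives divided by all predicted positives."""
    tp = sum(1 for a, b in zip(y_true, y_hat) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y_true, y_hat) if a == 0 and b == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Toy data, invented for illustration only
toy_labels = [1, 0, 1, 1, 0]
toy_probs = [0.9, 0.2, 0.6, 0.4, 0.7]
toy_pred = threshold_labels(toy_probs)

print("accuracy :", accuracy(toy_labels, toy_pred))
print("precision:", precision(toy_labels, toy_pred))
print("logloss  :", binary_logloss(toy_labels, toy_probs))
```

With real predictions, the same quantities could presumably be obtained from `sklearn.metrics` (for example the `precision_score` and `roc_auc_score` functions that the script already imports) by passing `y_test` together with `pred_result` or `y_pred`.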

The training output is as follows:

[LightGBM] [Info] Number of positive: 249, number of negative: 149
[LightGBM] [Debug] Dataset::GetMultiBinFromAllFeatures: sparse rate 0.003518
[LightGBM] [Debug] init for col-wise cost 0.000009 seconds, init for row-wise cost 0.000308 seconds
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000582 seconds.
You can set force_col_wise=true to remove the overhead.
[LightGBM] [Info] Total Bins 3978
[LightGBM] [Info] Number of data points in the train set: 398, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.625628 -> initscore=0.513507
[LightGBM] [Info] Start training from score 0.513507
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 6 and depth = 4
[1]	training's binary_logloss: 0.637099	valid_1's binary_logloss: 0.635929
Training until validation scores don't improve for 5 rounds
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 12 and depth = 7
[2]	training's binary_logloss: 0.614854	valid_1's binary_logloss: 0.614422
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[3]	training's binary_logloss: 0.593595	valid_1's binary_logloss: 0.594355
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[4]	training's binary_logloss: 0.573574	valid_1's binary_logloss: 0.576194
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[5]	training's binary_logloss: 0.554915	valid_1's binary_logloss: 0.558234
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[6]	training's binary_logloss: 0.537236	valid_1's binary_logloss: 0.541329
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[7]	training's binary_logloss: 0.520261	valid_1's binary_logloss: 0.525344
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[8]	training's binary_logloss: 0.50415	valid_1's binary_logloss: 0.510902
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[9]	training's binary_logloss: 0.489039	valid_1's binary_logloss: 0.496493
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[10]	training's binary_logloss: 0.474772	valid_1's binary_logloss: 0.482647
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[11]	training's binary_logloss: 0.46088	valid_1's binary_logloss: 0.469516
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[12]	training's binary_logloss: 0.447623	valid_1's binary_logloss: 0.457724
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[13]	training's binary_logloss: 0.435023	valid_1's binary_logloss: 0.4458
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[14]	training's binary_logloss: 0.422836	valid_1's binary_logloss: 0.434193
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[15]	training's binary_logloss: 0.411056	valid_1's binary_logloss: 0.422647
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[16]	training's binary_logloss: 0.400121	valid_1's binary_logloss: 0.412698
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[17]	training's binary_logloss: 0.389486	valid_1's binary_logloss: 0.402639
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[18]	training's binary_logloss: 0.379253	valid_1's binary_logloss: 0.393552
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[19]	training's binary_logloss: 0.368862	valid_1's binary_logloss: 0.383841
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[20]	training's binary_logloss: 0.358789	valid_1's binary_logloss: 0.374633
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[21]	training's binary_logloss: 0.349929	valid_1's binary_logloss: 0.366389
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[22]	training's binary_logloss: 0.340764	valid_1's binary_logloss: 0.357705
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[23]	training's binary_logloss: 0.33238	valid_1's binary_logloss: 0.349571
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[24]	training's binary_logloss: 0.324111	valid_1's binary_logloss: 0.342498
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[25]	training's binary_logloss: 0.315687	valid_1's binary_logloss: 0.334863
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[26]	training's binary_logloss: 0.308073	valid_1's binary_logloss: 0.327378
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[27]	training's binary_logloss: 0.300772	valid_1's binary_logloss: 0.320921
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[28]	training's binary_logloss: 0.293271	valid_1's binary_logloss: 0.314473
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[29]	training's binary_logloss: 0.285343	valid_1's binary_logloss: 0.307776
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[30]	training's binary_logloss: 0.278194	valid_1's binary_logloss: 0.301608
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[31]	training's binary_logloss: 0.271275	valid_1's binary_logloss: 0.295728
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[32]	training's binary_logloss: 0.265241	valid_1's binary_logloss: 0.289948
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[33]	training's binary_logloss: 0.259076	valid_1's binary_logloss: 0.283951
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[34]	training's binary_logloss: 0.252781	valid_1's binary_logloss: 0.279285
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[35]	training's binary_logloss: 0.247148	valid_1's binary_logloss: 0.274897
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[36]	training's binary_logloss: 0.241241	valid_1's binary_logloss: 0.269207
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[37]	training's binary_logloss: 0.235034	valid_1's binary_logloss: 0.262967
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[38]	training's binary_logloss: 0.229855	valid_1's binary_logloss: 0.258816
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[39]	training's binary_logloss: 0.224032	valid_1's binary_logloss: 0.254141
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[40]	training's binary_logloss: 0.218463	valid_1's binary_logloss: 0.248447
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[41]	training's binary_logloss: 0.213459	valid_1's binary_logloss: 0.243892
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[42]	training's binary_logloss: 0.20825	valid_1's binary_logloss: 0.238368
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[43]	training's binary_logloss: 0.203643	valid_1's binary_logloss: 0.234943
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[44]	training's binary_logloss: 0.198778	valid_1's binary_logloss: 0.230943
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[45]	training's binary_logloss: 0.194485	valid_1's binary_logloss: 0.226699
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[46]	training's binary_logloss: 0.190259	valid_1's binary_logloss: 0.223144
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[47]	training's binary_logloss: 0.185813	valid_1's binary_logloss: 0.218368
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[48]	training's binary_logloss: 0.181977	valid_1's binary_logloss: 0.214225
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[49]	training's binary_logloss: 0.178122	valid_1's binary_logloss: 0.211552
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 13 and depth = 8
[50]	training's binary_logloss: 0.174413	valid_1's binary_logloss: 0.20768
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[51]	training's binary_logloss: 0.170787	valid_1's binary_logloss: 0.204712
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[52]	training's binary_logloss: 0.167322	valid_1's binary_logloss: 0.20178
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[53]	training's binary_logloss: 0.163606	valid_1's binary_logloss: 0.198179
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[54]	training's binary_logloss: 0.160253	valid_1's binary_logloss: 0.194558
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[55]	training's binary_logloss: 0.156964	valid_1's binary_logloss: 0.191423
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[56]	training's binary_logloss: 0.153701	valid_1's binary_logloss: 0.188552
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[57]	training's binary_logloss: 0.150384	valid_1's binary_logloss: 0.18526
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[58]	training's binary_logloss: 0.147348	valid_1's binary_logloss: 0.182651
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[59]	training's binary_logloss: 0.144237	valid_1's binary_logloss: 0.17956
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[60]	training's binary_logloss: 0.141067	valid_1's binary_logloss: 0.177025
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[61]	training's binary_logloss: 0.138078	valid_1's binary_logloss: 0.174932
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[62]	training's binary_logloss: 0.135064	valid_1's binary_logloss: 0.172666
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[63]	training's binary_logloss: 0.132053	valid_1's binary_logloss: 0.169564
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[64]	training's binary_logloss: 0.129503	valid_1's binary_logloss: 0.167971
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[65]	training's binary_logloss: 0.126807	valid_1's binary_logloss: 0.166067
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[66]	training's binary_logloss: 0.124472	valid_1's binary_logloss: 0.164573
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[67]	training's binary_logloss: 0.121803	valid_1's binary_logloss: 0.162297
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[68]	training's binary_logloss: 0.119218	valid_1's binary_logloss: 0.160165
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[69]	training's binary_logloss: 0.116834	valid_1's binary_logloss: 0.158635
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[70]	training's binary_logloss: 0.114698	valid_1's binary_logloss: 0.157089
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[71]	training's binary_logloss: 0.11267	valid_1's binary_logloss: 0.155189
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[72]	training's binary_logloss: 0.110581	valid_1's binary_logloss: 0.154139
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[73]	training's binary_logloss: 0.108455	valid_1's binary_logloss: 0.151963
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[74]	training's binary_logloss: 0.106526	valid_1's binary_logloss: 0.150431
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[75]	training's binary_logloss: 0.104603	valid_1's binary_logloss: 0.148599
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 13 and depth = 8
[76]	training's binary_logloss: 0.102858	valid_1's binary_logloss: 0.147094
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[77]	training's binary_logloss: 0.100941	valid_1's binary_logloss: 0.145486
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[78]	training's binary_logloss: 0.0990606	valid_1's binary_logloss: 0.144956
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[79]	training's binary_logloss: 0.0972208	valid_1's binary_logloss: 0.143309
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[80]	training's binary_logloss: 0.0955612	valid_1's binary_logloss: 0.141914
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[81]	training's binary_logloss: 0.0936662	valid_1's binary_logloss: 0.140107
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[82]	training's binary_logloss: 0.0918074	valid_1's binary_logloss: 0.13822
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[83]	training's binary_logloss: 0.0902194	valid_1's binary_logloss: 0.136428
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[84]	training's binary_logloss: 0.0887508	valid_1's binary_logloss: 0.135742
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[85]	training's binary_logloss: 0.0868116	valid_1's binary_logloss: 0.133592
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[86]	training's binary_logloss: 0.0854032	valid_1's binary_logloss: 0.132641
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[87]	training's binary_logloss: 0.0839658	valid_1's binary_logloss: 0.131172
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[88]	training's binary_logloss: 0.0821433	valid_1's binary_logloss: 0.129201
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[89]	training's binary_logloss: 0.0806435	valid_1's binary_logloss: 0.12819
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[90]	training's binary_logloss: 0.0790972	valid_1's binary_logloss: 0.126917
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[91]	training's binary_logloss: 0.0776276	valid_1's binary_logloss: 0.125677
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[92]	training's binary_logloss: 0.0759841	valid_1's binary_logloss: 0.123773
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[93]	training's binary_logloss: 0.0744986	valid_1's binary_logloss: 0.122142
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[94]	training's binary_logloss: 0.0731048	valid_1's binary_logloss: 0.120618
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[95]	training's binary_logloss: 0.0715668	valid_1's binary_logloss: 0.11887
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[96]	training's binary_logloss: 0.0702457	valid_1's binary_logloss: 0.117539
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[97]	training's binary_logloss: 0.0687897	valid_1's binary_logloss: 0.115905
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[98]	training's binary_logloss: 0.0673354	valid_1's binary_logloss: 0.11401
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[99]	training's binary_logloss: 0.0658814	valid_1's binary_logloss: 0.112051
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[100]	training's binary_logloss: 0.0646956	valid_1's binary_logloss: 0.11111
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[101]	training's binary_logloss: 0.0633999	valid_1's binary_logloss: 0.110353
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[102]	training's binary_logloss: 0.0621868	valid_1's binary_logloss: 0.108952
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[103]	training's binary_logloss: 0.060923	valid_1's binary_logloss: 0.10819
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[104]	training's binary_logloss: 0.0595603	valid_1's binary_logloss: 0.106797
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[105]	training's binary_logloss: 0.0583284	valid_1's binary_logloss: 0.105589
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[106]	training's binary_logloss: 0.0571175	valid_1's binary_logloss: 0.104555
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[107]	training's binary_logloss: 0.0558811	valid_1's binary_logloss: 0.103554
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[108]	training's binary_logloss: 0.054666	valid_1's binary_logloss: 0.102321
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[109]	training's binary_logloss: 0.0534981	valid_1's binary_logloss: 0.101348
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[110]	training's binary_logloss: 0.0522027	valid_1's binary_logloss: 0.100143
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[111]	training's binary_logloss: 0.0511505	valid_1's binary_logloss: 0.0990465
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[112]	training's binary_logloss: 0.050289	valid_1's binary_logloss: 0.0984799
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[113]	training's binary_logloss: 0.0492806	valid_1's binary_logloss: 0.0976036
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[114]	training's binary_logloss: 0.0481638	valid_1's binary_logloss: 0.0970437
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[115]	training's binary_logloss: 0.0472125	valid_1's binary_logloss: 0.0959391
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[116]	training's binary_logloss: 0.0462155	valid_1's binary_logloss: 0.0952578
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[117]	training's binary_logloss: 0.0450958	valid_1's binary_logloss: 0.0944288
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[118]	training's binary_logloss: 0.0441526	valid_1's binary_logloss: 0.093485
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[119]	training's binary_logloss: 0.0430816	valid_1's binary_logloss: 0.0927758
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[120]	training's binary_logloss: 0.0420753	valid_1's binary_logloss: 0.0921918
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[121]	training's binary_logloss: 0.0413452	valid_1's binary_logloss: 0.0914901
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[122]	training's binary_logloss: 0.0403758	valid_1's binary_logloss: 0.0905335
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[123]	training's binary_logloss: 0.0394884	valid_1's binary_logloss: 0.0898742
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[124]	training's binary_logloss: 0.038686	valid_1's binary_logloss: 0.0893582
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[125]	training's binary_logloss: 0.037697	valid_1's binary_logloss: 0.0885564
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[126]	training's binary_logloss: 0.0367451	valid_1's binary_logloss: 0.0877966
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[127]	training's binary_logloss: 0.0360618	valid_1's binary_logloss: 0.0872113
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 24 and depth = 8
[128]	training's binary_logloss: 0.0352308	valid_1's binary_logloss: 0.0863429
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[129]	training's binary_logloss: 0.0345178	valid_1's binary_logloss: 0.0856196
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[130]	training's binary_logloss: 0.0337563	valid_1's binary_logloss: 0.0851076
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[131]	training's binary_logloss: 0.0329809	valid_1's binary_logloss: 0.084401
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 22 and depth = 7
[132]	training's binary_logloss: 0.0322279	valid_1's binary_logloss: 0.0835122
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[133]	training's binary_logloss: 0.0315716	valid_1's binary_logloss: 0.0827184
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[134]	training's binary_logloss: 0.0310195	valid_1's binary_logloss: 0.0823835
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[135]	training's binary_logloss: 0.0304041	valid_1's binary_logloss: 0.0816694
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 6
[136]	training's binary_logloss: 0.0298142	valid_1's binary_logloss: 0.0808947
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 6
[137]	training's binary_logloss: 0.0292435	valid_1's binary_logloss: 0.0798641
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 23 and depth = 7
[138]	training's binary_logloss: 0.0286536	valid_1's binary_logloss: 0.0791714
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 22 and depth = 7
[139]	training's binary_logloss: 0.0280507	valid_1's binary_logloss: 0.0785432
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[140]	training's binary_logloss: 0.0274645	valid_1's binary_logloss: 0.0778465
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[141]	training's binary_logloss: 0.0269174	valid_1's binary_logloss: 0.0774328
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 24 and depth = 8
[142]	training's binary_logloss: 0.0263429	valid_1's binary_logloss: 0.0769741
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[143]	training's binary_logloss: 0.0258156	valid_1's binary_logloss: 0.0765868
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[144]	training's binary_logloss: 0.025373	valid_1's binary_logloss: 0.0760364
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 22 and depth = 7
[145]	training's binary_logloss: 0.0248701	valid_1's binary_logloss: 0.0755277
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[146]	training's binary_logloss: 0.0244912	valid_1's binary_logloss: 0.0751147
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 22 and depth = 7
[147]	training's binary_logloss: 0.023934	valid_1's binary_logloss: 0.0744406
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[148]	training's binary_logloss: 0.0234691	valid_1's binary_logloss: 0.0739813
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[149]	training's binary_logloss: 0.0229897	valid_1's binary_logloss: 0.0731026
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[150]	training's binary_logloss: 0.0225124	valid_1's binary_logloss: 0.0729205
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[151]	training's binary_logloss: 0.022092	valid_1's binary_logloss: 0.0725503
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 15 and depth = 8
[152]	training's binary_logloss: 0.0216892	valid_1's binary_logloss: 0.0723016
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 23 and depth = 8
[153]	training's binary_logloss: 0.0212022	valid_1's binary_logloss: 0.0716951
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[154]	training's binary_logloss: 0.0207975	valid_1's binary_logloss: 0.0713447
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[155]	training's binary_logloss: 0.020463	valid_1's binary_logloss: 0.0712453
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[156]	training's binary_logloss: 0.020018	valid_1's binary_logloss: 0.0702304
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[157]	training's binary_logloss: 0.019635	valid_1's binary_logloss: 0.0696673
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[158]	training's binary_logloss: 0.0192115	valid_1's binary_logloss: 0.069452
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[159]	training's binary_logloss: 0.0188222	valid_1's binary_logloss: 0.0692833
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[160]	training's binary_logloss: 0.0183735	valid_1's binary_logloss: 0.0687872
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[161]	training's binary_logloss: 0.0180016	valid_1's binary_logloss: 0.0685565
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[162]	training's binary_logloss: 0.0176148	valid_1's binary_logloss: 0.068289
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[163]	training's binary_logloss: 0.0171976	valid_1's binary_logloss: 0.0678446
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[164]	training's binary_logloss: 0.0169007	valid_1's binary_logloss: 0.0679637
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 25 and depth = 8
[165]	training's binary_logloss: 0.0164975	valid_1's binary_logloss: 0.0677298
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 24 and depth = 8
[166]	training's binary_logloss: 0.0160983	valid_1's binary_logloss: 0.0674051
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 24 and depth = 8
[167]	training's binary_logloss: 0.0157118	valid_1's binary_logloss: 0.0671737
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[168]	training's binary_logloss: 0.0153781	valid_1's binary_logloss: 0.0663582
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[169]	training's binary_logloss: 0.0151332	valid_1's binary_logloss: 0.0659222
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 21 and depth = 8
[170]	training's binary_logloss: 0.0148225	valid_1's binary_logloss: 0.0652532
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 17 and depth = 8
[171]	training's binary_logloss: 0.0145917	valid_1's binary_logloss: 0.0652072
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 21 and depth = 8
[172]	training's binary_logloss: 0.014312	valid_1's binary_logloss: 0.0653298
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[173]	training's binary_logloss: 0.0139953	valid_1's binary_logloss: 0.0648877
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[174]	training's binary_logloss: 0.0136745	valid_1's binary_logloss: 0.0644962
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[175]	training's binary_logloss: 0.013423	valid_1's binary_logloss: 0.063908
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[176]	training's binary_logloss: 0.0131673	valid_1's binary_logloss: 0.0633065
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[177]	training's binary_logloss: 0.0128575	valid_1's binary_logloss: 0.0629446
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 23 and depth = 8
[178]	training's binary_logloss: 0.0125607	valid_1's binary_logloss: 0.0625482
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 21 and depth = 8
[179]	training's binary_logloss: 0.0123309	valid_1's binary_logloss: 0.0617762
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[180]	training's binary_logloss: 0.0121218	valid_1's binary_logloss: 0.0611003
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[181]	training's binary_logloss: 0.011888	valid_1's binary_logloss: 0.0611477
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[182]	training's binary_logloss: 0.0116615	valid_1's binary_logloss: 0.0607085
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[183]	training's binary_logloss: 0.0114217	valid_1's binary_logloss: 0.0607112
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[184]	training's binary_logloss: 0.0112255	valid_1's binary_logloss: 0.0603716
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[185]	training's binary_logloss: 0.0110376	valid_1's binary_logloss: 0.059733
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 23 and depth = 8
[186]	training's binary_logloss: 0.0107713	valid_1's binary_logloss: 0.0593237
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[187]	training's binary_logloss: 0.0106065	valid_1's binary_logloss: 0.058974
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 22 and depth = 8
[188]	training's binary_logloss: 0.0104208	valid_1's binary_logloss: 0.0586195
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[189]	training's binary_logloss: 0.0102498	valid_1's binary_logloss: 0.0579846
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[190]	training's binary_logloss: 0.0100509	valid_1's binary_logloss: 0.0574762
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[191]	training's binary_logloss: 0.0098558	valid_1's binary_logloss: 0.05762
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 19 and depth = 8
[192]	training's binary_logloss: 0.0096644	valid_1's binary_logloss: 0.0571114
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and depth = 8
[193]	training's binary_logloss: 0.00949739	valid_1's binary_logloss: 0.0571114
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[194]	training's binary_logloss: 0.00935833	valid_1's binary_logloss: 0.056752
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[195]	training's binary_logloss: 0.0091587	valid_1's binary_logloss: 0.0562458
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 18 and depth = 8
[196]	training's binary_logloss: 0.00898332	valid_1's binary_logloss: 0.055768
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 21 and depth = 8
[197]	training's binary_logloss: 0.00878578	valid_1's binary_logloss: 0.055594
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[198]	training's binary_logloss: 0.00858836	valid_1's binary_logloss: 0.0555947
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 20 and depth = 8
[199]	training's binary_logloss: 0.00844921	valid_1's binary_logloss: 0.0551059
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and depth = 8
[200]	training's binary_logloss: 0.00833818	valid_1's binary_logloss: 0.055061
Did not meet early stopping. Best iteration is:
[200]	training's binary_logloss: 0.00833818	valid_1's binary_logloss: 0.055061
[0.97076023]
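The final line, `[0.97076023]`, is the test-set accuracy at a threshold of 0.5. It prints as a one-element array because the script divides by `y_test.shape`, which is a tuple rather than a number. A minimal sketch of the same calculation that returns a plain float is shown below; the function name and the toy arrays are illustrative assumptions, not part of the original script.

```python
import numpy as np

def accuracy_from_probabilities(y_prob, y_true, threshold=0.5):
    """Threshold predicted probabilities and return the share of correct labels."""
    y_prob = np.asarray(y_prob, dtype=float)
    y_true = np.asarray(y_true)
    # Same rule as the loop in the script: strictly above the threshold means class 1.
    y_hat = (y_prob > threshold).astype(int)
    return float(np.mean(y_hat == y_true))

# Toy values standing in for clf.predict(X_test) and y_test.
probs = np.array([0.91, 0.12, 0.67, 0.40, 0.55])
labels = np.array([1, 0, 1, 1, 1])
print(accuracy_from_probabilities(probs, labels))
```

With the real script, the call would be `accuracy_from_probabilities(y_pred, y_test)`.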


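LightGBM extends the gradient boosting algorithm by adding a type of automatic feature selection and by focusing on boosting examples with larger gradients. This can result in a dramatic speedup of training and improved predictive performance.

As such, LightGBM has become a de facto algorithm for machine learning competitions when working with tabular data for regression and classification predictive modeling tasks. Along with Extreme Gradient Boosting (XGBoost), it deserves part of the credit for the popularity and wide adoption of gradient boosting methods in general.

In this tutorial, you will discover how to develop Light Gradient Boosted Machine ensembles for classification and regression.

After completing this tutorial, you will know:

• Light Gradient Boosted Machine (LightGBM) is an efficient open-source implementation of the stochastic gradient boosting ensemble algorithm.

• How to develop LightGBM ensembles for classification and regression with the scikit-learn API.

• How to explore the effect of LightGBM model hyperparameters on model performance.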

Let's get started.

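## Tutorial Overview

This tutorial is divided into three parts; they are:

1. Light Gradient Boosted Machine Algorithm

2. LightGBM Scikit-Learn API
   1. LightGBM Ensemble for Classification
   2. LightGBM Ensemble for Regression

3. LightGBM Hyperparameters
   1. Explore Number of Trees
   2. Explore Tree Depth
   3. Explore Learning Rate
   4. Explore Boosting Type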

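## Light Gradient Boosted Machine Algorithm

Gradient boosting refers to a class of ensemble machine learning algorithms that can be used for classification or regression predictive modeling problems.

Ensembles are constructed from decision tree models. Trees are added to the ensemble one at a time and fit to correct the prediction errors made by prior models. This is a type of ensemble machine learning model referred to as boosting.

Models are fit using any arbitrary differentiable loss function and a gradient descent optimization algorithm. This gives the technique its name, "gradient boosting," because the loss is minimized by following its gradient as the model is fit, much like a neural network.

For more on gradient boosting, see the tutorial:

• A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning

The name LightGBM refers at once to an open-source project, a software library, and a machine learning algorithm. In this respect it is very similar to Extreme Gradient Boosting, or XGBoost.

LightGBM was described by Guolin Ke, et al. in the 2017 paper titled "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." The implementation introduces two key ideas: GOSS and EFB.

Gradient-based One-Side Sampling, or GOSS for short, is a modification to gradient boosting that focuses attention on the training examples that produce a larger gradient, which speeds up learning and reduces the computational complexity of the method.

"With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size."

From "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," 2017.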
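The sampling step that GOSS describes can be sketched in a few lines of NumPy. This is an illustration of the idea from the paper, not LightGBM's internal code: the function name `goss_sample` and its `top_rate`, `other_rate`, and `seed` arguments are assumptions chosen for this example. It keeps the fraction `top_rate` of instances with the largest absolute gradient, draws a random fraction `other_rate` from the rest, and re-weights the sampled small-gradient instances by `(1 - top_rate) / other_rate` so that their total contribution stays roughly unbiased.

```python
import numpy as np

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) for a GOSS-style subsample of the training rows."""
    rng = np.random.default_rng(seed)
    gradients = np.asarray(gradients, dtype=float)
    n = gradients.shape[0]
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    # Rank rows by absolute gradient, largest first.
    order = np.argsort(-np.abs(gradients))
    top_idx = order[:n_top]      # always kept
    rest_idx = order[n_top:]     # candidates for random sampling

    # Randomly keep a share of the small-gradient rows.
    other_idx = rng.choice(rest_idx, size=n_other, replace=False)

    # Up-weight the sampled small-gradient rows to compensate for those dropped.
    weights = np.ones(n_top + n_other)
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return np.concatenate([top_idx, other_idx]), weights

g = np.linspace(-1.0, 1.0, 100)   # toy gradients
idx, w = goss_sample(g, top_rate=0.2, other_rate=0.1, seed=0)
print(len(idx), w[:3], w[-3:])
```

Only 30 of the 100 toy rows would be passed on to the split search, which is where the speedup comes from.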

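Exclusive Feature Bundling, or EFB for short, is an approach for bundling sparse (mostly zero) mutually exclusive features, such as categorical inputs that have been one-hot encoded. As such, it acts as a type of automatic feature selection.

"... we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously), to reduce the number of features."

From "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," 2017.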
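To make the bundling idea concrete, the sketch below merges two mutually exclusive features into one column by shifting the second feature's values past the range of the first, so each original feature can still be told apart. The helper name `bundle_exclusive` and the toy integer-coded columns are assumptions for illustration; LightGBM's real implementation works on histogram bins and also tolerates a small rate of conflicts.

```python
import numpy as np

def bundle_exclusive(feature_a, feature_b):
    """Merge two features that are never nonzero on the same row into one column."""
    a = np.asarray(feature_a)
    b = np.asarray(feature_b)
    if np.any((a != 0) & (b != 0)):
        raise ValueError("features are not mutually exclusive")
    # Shift b's nonzero values above a's largest value so the ranges do not overlap.
    offset = a.max()
    return np.where(b != 0, b + offset, a)

a = np.array([1, 0, 2, 0, 0])   # values 1..2 belong to feature a
b = np.array([0, 3, 0, 0, 1])   # values 1..3 belong to feature b
bundled = bundle_exclusive(a, b)
print(bundled)
```

In the bundled column, values 1 to 2 still identify feature a and values 3 to 5 identify feature b, so no information is lost while the feature count drops from two to one.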

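Together, these two changes can accelerate the training time of the algorithm by up to 20x. As such, LightGBM can be thought of as gradient boosting decision trees (GBDT) with the addition of GOSS and EFB.

"We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that, LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy."

From "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," 2017.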

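## LightGBM Scikit-Learn API

LightGBM can be installed as a standalone library, and LightGBM models can be developed using the scikit-learn API.

The first step is to install the LightGBM library, if it is not already installed. This can be achieved using the pip Python package manager on most platforms; for example:

    sudo pip install lightgbm

You can then confirm that the LightGBM library was installed correctly and can be used by running the following script.

    # check lightgbm version
    import lightgbm
    print(lightgbm.__version__)

Running the script will print the version of the LightGBM library you have installed.

Your version should be the same as or newer than the one this tutorial was written with. If it is older, upgrade the LightGBM library.

If you require specific instructions for your development environment, see the tutorial:

• LightGBM Installation Guide

The LightGBM library has its own custom API, although we will use it through the scikit-learn wrapper classes: LGBMRegressor and LGBMClassifier. This will allow us to use the full suite of tools from the scikit-learn machine learning library to prepare data and evaluate models.

Both models operate the same way and take the same arguments that influence how the decision trees are created and added to the ensemble.

Randomness is used in the construction of the model. This means that each time the algorithm is run on the same data, it will produce a slightly different model.

When using machine learning algorithms that have a stochastic learning algorithm, it is good practice to evaluate them by averaging their performance across multiple runs or repeats of cross-validation. When fitting a final model, it may be desirable either to increase the number of trees until the variance of the model is reduced across repeated evaluations, or to fit multiple final models and average their predictions.

Let's take a look at how to develop a LightGBM ensemble for both classification and regression.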

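### LightGBM Ensemble for Classification

In this section, we will look at using LightGBM for a classification problem.

First, we can use the make_classification() function to create a synthetic binary classification problem with 1,000 examples and 20 input features.

The complete example is listed below.

    # test classification dataset
    from sklearn.datasets import make_classification
    # define dataset
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
    # summarize the dataset
    print(X.shape, y.shape)

Running the example creates the dataset and summarizes the shape of the input and output components.

Next, we can evaluate a LightGBM algorithm on this dataset.

We will evaluate the model using repeated stratified k-fold cross-validation, with three repeats and 10 folds. We will report the mean and standard deviation of the accuracy of the model across all repeats and folds.

    # evaluate lightgbm algorithm for classification
    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedStratifiedKFold
    from lightgbm import LGBMClassifier
    # define dataset
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
    # define the model
    model = LGBMClassifier()
    # evaluate the model
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # report performance
    print('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Running the example reports the mean and standard deviation of the model's accuracy.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the LightGBM ensemble with default hyperparameters achieves a classification accuracy of about 92.5 percent on this test dataset.

We can also use the LightGBM model as a final model and make predictions for classification.

First, the LightGBM ensemble is fit on all available data, then the predict() function can be called to make predictions on new data.

The example below demonstrates this on our binary classification dataset.

    # make predictions using lightgbm for classification
    from sklearn.datasets import make_classification
    from lightgbm import LGBMClassifier
    # define dataset
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
    # define the model
    model = LGBMClassifier()
    # fit the model on the whole dataset
    model.fit(X, y)
    # make a single prediction
    row = [0.2929949,-4.21223056,-1.288332,-2.17849815,-0.64527665,2.58097719,0.28422388,-7.1827928,-1.91211104,2.73729512,0.81395695,3.96973717,-2.66939799,3.34692332,4.19791821,0.99990998,-0.30201875,-4.43170633,-2.82646737,0.44916808]
    yhat = model.predict([row])
    print('Predicted Class: %d' % yhat[0])

Running the example fits the LightGBM ensemble model on the entire dataset, and the model is then used to make a prediction on a new row of data, as we might when using the model in an application.

Now that we are familiar with using LightGBM for classification, let's look at the API for regression.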

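### LightGBM Ensemble for Regression

In this section, we will look at using LightGBM for a regression problem.

First, we can use the make_regression() function to create a synthetic regression problem with 1,000 examples and 20 input features.

The complete example is listed below.

    # test regression dataset
    from sklearn.datasets import make_regression
    # define dataset
    X, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=7)
    # summarize the dataset
    print(X.shape, y.shape)

Running the example creates the dataset and summarizes the shape of the input and output components.

Next, we can evaluate a LightGBM algorithm on this dataset.

As we did in the last section, we will evaluate the model using repeated k-fold cross-validation, with three repeats and 10 folds. We will report the mean absolute error (MAE) of the model across all repeats and folds. The scikit-learn library makes the MAE negative so that it is maximized instead of minimized. This means that a larger negative MAE (closer to zero) is better, and a perfect model has an MAE of 0.

The complete example is listed below.

    # evaluate lightgbm ensemble for regression
    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedKFold
    from lightgbm import LGBMRegressor
    # define dataset
    X, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=7)
    # define the model
    model = LGBMRegressor()
    # evaluate the model
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    n_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')
    # report performance
    print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Running the example reports the mean and standard deviation of the model's MAE.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the LightGBM ensemble with default hyperparameters achieves an MAE of about 60.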
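Because the scorer returns negated errors, the raw mean printed by the script carries a minus sign. A small sketch of flipping the sign before reporting is shown below; the three fold scores are made-up placeholder values used only to show the arithmetic, not results from the model.

```python
import numpy as np

# Hypothetical per-fold scores, shaped like the output of
# cross_val_score(..., scoring='neg_mean_absolute_error').
neg_mae_scores = np.array([-61.2, -58.7, -60.4])

# Negate once to recover the ordinary, non-negative MAE per fold.
mae_per_fold = -neg_mae_scores
print('MAE: %.3f (%.3f)' % (mae_per_fold.mean(), mae_per_fold.std()))
```

The standard deviation is unaffected by the sign flip; only the mean changes sign.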

We can also use the LightGBM model as a final model and make predictions for regression.

First, the LightGBM ensemble is fit on all available data, then the predict() function can be called to make predictions on new data.

The example below demonstrates this on our regression dataset.

    # gradient lightgbm for making predictions for regression
    from sklearn.datasets import make_regression
    from lightgbm import LGBMRegressor
    # define dataset
    X, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=7)
    # define the model
    model = LGBMRegressor()
    # fit the model on the whole dataset
    model.fit(X, y)
    # make a single prediction
    row = [0.20543991,-0.97049844,-0.81403429,-0.23842689,-0.60704084,-0.48541492,0.53113006,2.01834338,-0.90745243,-1.85859731,-1.02334791,-0.6877744,0.60984819,-0.70630121,-1.29161497,1.32385441,1.42150747,1.26567231,2.56569098,-0.11154792]
    yhat = model.predict([row])
    print('Prediction: %d' % yhat[0])

Running the example fits the LightGBM ensemble model on the entire dataset, and the model is then used to make a prediction on a new row of data, as we might when using the model in an application.

Now that we are familiar with using the scikit-learn API to evaluate and use LightGBM ensembles, let's look at configuring the model.

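## LightGBM Hyperparameters

In this section, we will take a closer look at some of the hyperparameters you should consider tuning for the LightGBM ensemble and their effect on model performance.

There are many hyperparameters we could look at for LightGBM; in this case, we will look at the number of trees, tree depth, learning rate, and boosting type.

For general advice on tuning LightGBM hyperparameters, see the documentation:

• LightGBM Parameters Tuning.

### Explore Number of Trees

An important hyperparameter for the LightGBM ensemble algorithm is the number of decision trees used in the ensemble.

Recall that decision trees are added to the model sequentially in an effort to correct and improve upon the predictions made by prior trees. As such, more trees are often better.

The number of trees can be set via the "n_estimators" argument and defaults to 100.

The example below explores the effect of the number of trees with values between 10 and 5,000.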

    # explore lightgbm number of trees effect on performance
    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedStratifiedKFold
    from lightgbm import LGBMClassifier
    from matplotlib import pyplot

    # get the dataset
    def get_dataset():
        X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
        return X, y

    # get a list of models to evaluate
    def get_models():
        models = dict()
        trees = [10, 50, 100, 500, 1000, 5000]
        for n in trees:
            models[str(n)] = LGBMClassifier(n_estimators=n)
        return models

    # evaluate a given model using cross-validation
    def evaluate_model(model):
        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
        scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
        return scores

    # define dataset
    X, y = get_dataset()
    # get the models to evaluate
    models = get_models()
    # evaluate the models and store results
    results, names = list(), list()
    for name, model in models.items():
        scores = evaluate_model(model)
        results.append(scores)
        names.append(name)
        print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
    # plot model performance for comparison
    pyplot.boxplot(results, labels=names, showmeans=True)
    pyplot.show()


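Running the example first reports the mean accuracy for each configured number of decision trees.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that performance improves on this dataset until about 500 trees, after which it appears to level off.

    >10 0.857 (0.033)
    >50 0.916 (0.032)
    >100 0.925 (0.031)
    >500 0.938 (0.026)
    >1000 0.938 (0.028)
    >5000 0.937 (0.028)

A box and whisker plot is created for the distribution of accuracy scores for each configured number of trees.

We can see the general trend of model performance increasing with ensemble size.

Figure: Box Plots of LightGBM Ensemble Size vs. Classification Accuracy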

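### Explore Tree Depth

Varying the depth of each tree added to the ensemble is another important hyperparameter for gradient boosting.

Gradient boosting generally performs well with trees of modest depth, finding a balance between skill and generality.

Tree depth is controlled via the "max_depth" argument. It is left unrestricted by default, because the default mechanism for controlling tree complexity is the number of leaf nodes.

There are two main ways to control tree complexity: the maximum depth of the trees and the maximum number of terminal nodes (leaves) in the tree. Here we are exploring tree depth, so we must also raise the number of leaves via the "num_leaves" argument so that deeper trees are possible; the example sets num_leaves to 2 to the power of max_depth.

The example below explores tree depths between 1 and 10 and the effect on model performance.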

    # explore lightgbm tree depth effect on performance
    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedStratifiedKFold
    from lightgbm import LGBMClassifier
    from matplotlib import pyplot

    # get the dataset
    def get_dataset():
        X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
        return X, y

    # get a list of models to evaluate
    def get_models():
        models = dict()
        for i in range(1,11):
            models[str(i)] = LGBMClassifier(max_depth=i, num_leaves=2**i)
        return models

    # evaluate a given model using cross-validation
    def evaluate_model(model):
        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
        scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
        return scores

    # define dataset
    X, y = get_dataset()
    # get the models to evaluate
    models = get_models()
    # evaluate the models and store results
    results, names = list(), list()
    for name, model in models.items():
        scores = evaluate_model(model)
        results.append(scores)
        names.append(name)
        print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
    # plot model performance for comparison
    pyplot.boxplot(results, labels=names, showmeans=True)
    pyplot.show()


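Running the example first reports the mean accuracy for each configured tree depth.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that performance improves with tree depth, rising quickly up to about five levels and only marginally after that, perhaps all the way to 10 levels. It might be interesting to explore even deeper trees.

    >1 0.833 (0.028)
    >2 0.870 (0.033)
    >3 0.899 (0.032)
    >4 0.912 (0.026)
    >5 0.925 (0.031)
    >6 0.924 (0.029)
    >7 0.922 (0.027)
    >8 0.926 (0.027)
    >9 0.925 (0.028)
    >10 0.928 (0.029)

A box and whisker plot is created for the distribution of accuracy scores for each configured tree depth.

We can see the general trend of model performance increasing with tree depth up to a depth of five levels, after which performance becomes fairly flat.

Figure: Box Plots of LightGBM Ensemble Tree Depth vs. Classification Accuracy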

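### Explore Learning Rate

The learning rate controls how much each model contributes to the ensemble prediction.

Smaller rates may require more decision trees in the ensemble.

The learning rate can be controlled via the "learning_rate" argument and defaults to 0.1.

The example below explores the learning rate and compares the effect of values between 0.0001 and 1.0.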

    # explore lightgbm learning rate effect on performance
    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedStratifiedKFold
    from lightgbm import LGBMClassifier
    from matplotlib import pyplot

    # get the dataset
    def get_dataset():
        X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
        return X, y

    # get a list of models to evaluate
    def get_models():
        models = dict()
        rates = [0.0001, 0.001, 0.01, 0.1, 1.0]
        for r in rates:
            key = '%.4f' % r
            models[key] = LGBMClassifier(learning_rate=r)
        return models

    # evaluate a given model using cross-validation
    def evaluate_model(model):
        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
        scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
        return scores

    # define dataset
    X, y = get_dataset()
    # get the models to evaluate
    models = get_models()
    # evaluate the models and store results
    results, names = list(), list()
    for name, model in models.items():
        scores = evaluate_model(model)
        results.append(scores)
        names.append(name)
        print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
    # plot model performance for comparison
    pyplot.boxplot(results, labels=names, showmeans=True)
    pyplot.show()


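Running the example first reports the mean accuracy for each configured learning rate.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that a larger learning rate results in better performance on this dataset. We would expect that adding more trees to the ensemble for the smaller learning rates would further lift performance.

    >0.0001 0.800 (0.038)
    >0.0010 0.811 (0.035)
    >0.0100 0.859 (0.035)
    >0.1000 0.925 (0.031)
    >1.0000 0.928 (0.025)

A box and whisker plot is created for the distribution of accuracy scores for each configured learning rate.

We can see the general trend of model performance increasing with the learning rate, all the way up to the large value of 1.0.

Figure: Box Plots of LightGBM Learning Rate vs. Classification Accuracy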
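The interaction between learning rate and number of trees can be seen in a toy calculation. The sketch below makes a strong simplifying assumption that is not true of real trees: every tree predicts the current residual exactly, and only a `learning_rate` share of that prediction is added to the ensemble. Under that assumption the remaining error shrinks by a factor of `(1 - learning_rate)` per tree. The function name `remaining_error` is hypothetical.

```python
def remaining_error(learning_rate, n_trees, initial_error=1.0):
    """Error left after n_trees rounds if each tree fits the residual exactly."""
    error = initial_error
    for _ in range(n_trees):
        # The tree predicts the whole residual; only a learning_rate share is applied.
        error -= learning_rate * error
    return error

# Same rates as the experiment above, with the default of 100 trees.
for rate in [0.0001, 0.001, 0.01, 0.1, 1.0]:
    print(rate, remaining_error(rate, n_trees=100))
```

With 100 trees the smallest rates barely reduce the error in this idealized setting, which is consistent with the expectation that they need many more trees to catch up.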

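### Explore Boosting Type

A feature of LightGBM is that it supports a number of different boosting algorithms, referred to as boosting types.

The boosting type can be specified via the "boosting_type" argument, which takes a string naming the type. The options include:

• 'gbdt': Gradient Boosting Decision Tree (GBDT).

• 'dart': Dropouts meet Multiple Additive Regression Trees (DART).

• 'goss': Gradient-based One-Side Sampling (GOSS).

The default is GBDT, which is the classical gradient boosting algorithm.

DART is described in the 2015 paper titled "DART: Dropouts meet Multiple Additive Regression Trees" and, as its name suggests, adds the concept of dropout from deep learning to the Multiple Additive Regression Trees (MART) algorithm, a precursor to gradient boosting decision trees.

"This algorithm is known by many names, including Gradient TreeBoost, boosted trees, and Multiple Additive Regression Trees (MART). We use the latter to refer to this algorithm."

From "DART: Dropouts meet Multiple Additive Regression Trees," 2015.

GOSS was introduced with the LightGBM paper and library. The approach seeks to update the model using only the instances that produce a large error gradient and to drop the rest.

"... we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain."

From "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," 2017.

The example below compares LightGBM on the synthetic classification dataset with the three key boosting techniques.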

    # explore lightgbm boosting type effect on performance
    from numpy import arange
    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedStratifiedKFold
    from lightgbm import LGBMClassifier
    from matplotlib import pyplot

    # get the dataset
    def get_dataset():
        X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
        return X, y

    # get a list of models to evaluate
    def get_models():
        models = dict()
        types = ['gbdt', 'dart', 'goss']
        for t in types:
            models[t] = LGBMClassifier(boosting_type=t)
        return models

    # evaluate a given model using cross-validation
    def evaluate_model(model):
        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
        scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
        return scores

    # define dataset
    X, y = get_dataset()
    # get the models to evaluate
    models = get_models()
    # evaluate the models and store results
    results, names = list(), list()
    for name, model in models.items():
        scores = evaluate_model(model)
        results.append(scores)
        names.append(name)
        print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
    # plot model performance for comparison
    pyplot.boxplot(results, labels=names, showmeans=True)
    pyplot.show()


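Running the example first reports the mean accuracy for each configured boosting type.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the default boosting method performed better than the other two techniques that were evaluated.

    >gbdt 0.925 (0.031)
    >dart 0.912 (0.028)
    >goss 0.918 (0.027)

A box and whisker plot is created for the distribution of accuracy scores for each configured boosting method, allowing the techniques to be compared directly.

Figure: Box Plots of LightGBM Boosting Type vs. Classification Accuracy

## Summary

In this tutorial, you discovered how to develop Light Gradient Boosted Machine ensembles for classification and regression.

Specifically, you learned:

• Light Gradient Boosted Machine (LightGBM) is an efficient open-source implementation of the stochastic gradient boosting ensemble algorithm.

• How to develop LightGBM ensembles for classification and regression with the scikit-learn API.

• How to explore the effect of LightGBM model hyperparameters on model performance.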


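## 1 Introduction to LightGBM

LightGBM is a scalable machine learning system released by Microsoft in 2017. It is an open-source project under Microsoft's DMTK, and its development was led by Guolin Ke, one of the winners of the first Alibaba big data competition in 2014. It is a distributed gradient boosting framework based on the GBDT (gradient boosting decision tree) algorithm. To shorten model computation time, LightGBM's design focuses on reducing the memory and compute that the data requires, and on reducing communication cost when training in parallel across machines.

LightGBM can be seen as an upgraded, deluxe version of XGBoost: it reaches accuracy close to XGBoost while training faster and using less memory. As the "Light" in its name suggests, LightGBM runs gracefully on large datasets, and soon after release it became a go-to tool for topping leaderboards in data competitions.

Main advantages of LightGBM:

1. Easy to use. It provides interfaces for the mainstream languages Python, C++, and R, so users can build models with little effort and get good results.
2. Efficient and scalable. It handles large datasets quickly and accurately, with modest demands on memory and other hardware resources.
3. Robust. Compared with deep learning models, it reaches comparable results without fine-grained hyperparameter tuning.
4. It supports missing values and categorical features directly, with no extra preprocessing of the data.

Main disadvantages of LightGBM:

1. Unlike deep learning models, it cannot model spatial or temporal position, so it does not capture high-dimensional data such as images, speech, and text well.
2. When massive training data is available and a suitable deep learning model can be found, deep learning can be far more accurate than LightGBM.

## 2 Applications of LightGBM

LightGBM is used very widely in machine learning and data mining. By one count, between 2016 and 2019 LightGBM models placed in the top three of Kaggle data competitions more than thirty times, including well-known competitions such as CIKM2017 AnalytiCup and IEEE Fraud Detection. These competitions are drawn from real business problems across many industries, and the results show that LightGBM scales well and performs strongly on many kinds of problems.

LightGBM has also been applied successfully to a range of problems in industry and academia, for example financial risk control, purchase behavior recognition, traffic flow prediction, environmental sound classification, gene classification, and biological component analysis. Domain-specific data analysis and feature engineering also play an important part in these solutions, but the consistent choice of LightGBM by learners and practitioners shows the influence and importance of the package.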

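## 3 Important LightGBM Parameters

### 3.1 Basic parameter tuning

1. num_leaves: the main parameter controlling the complexity of the tree model. We generally set num_leaves below 2^max_depth to prevent overfitting. Because LightGBM grows trees leaf-wise, unlike the depth-wise growth in XGBoost, num_leaves has more influence than depth.
2. min_data_in_leaf: a very important parameter for dealing with overfitting. Its value depends on the number of training samples and on num_leaves. Setting it larger avoids growing a tree that is too deep, but may cause underfitting. In practice, for large datasets, a value in the hundreds or thousands is enough.
3. max_depth: the depth of the tree. The notion of depth matters less for leaf-wise trees, because there is no sensible mapping from leaves to depth.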
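The rule of thumb for num_leaves can be checked mechanically. The helper below is a hypothetical convenience function, not part of LightGBM; it only inspects a parameter dictionary and assumes LightGBM's documented defaults of `num_leaves=31` and `max_depth=-1` (no limit) when a key is missing. The first dictionary mirrors the training script at the top of this page, which sets `max_depth` to 8 and `num_leaves` to 256.

```python
def check_tree_params(params):
    """Return warnings for a LightGBM-style parameter dict (illustrative checks only)."""
    warnings = []
    max_depth = params.get('max_depth', -1)    # -1 means no depth limit
    num_leaves = params.get('num_leaves', 31)
    # A tree of depth d can hold at most 2**d leaves, so a larger budget never binds.
    if max_depth > 0 and num_leaves >= 2 ** max_depth:
        warnings.append('num_leaves=%d is not below 2**max_depth=%d'
                        % (num_leaves, 2 ** max_depth))
    return warnings

script_params = {'max_depth': 8, 'num_leaves': 256}
print(check_tree_params(script_params))
print(check_tree_params({'max_depth': 8, 'num_leaves': 64}))
```

In the log above the trees end up with roughly 14 to 25 leaves each, so on that small dataset other limits stop growth long before the 256-leaf budget is reached.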

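### 3.2 Tuning for training speed

1. Use bagging by setting bagging_fraction and bagging_freq.
2. Use feature subsampling by setting feature_fraction.
3. Choose a smaller max_bin.
4. Use save_binary to speed up data loading in future training runs.

### 3.3 Tuning for accuracy

1. Use a larger max_bin (training may become slower).
2. Use a smaller learning_rate with a larger num_iterations.
3. Use a larger num_leaves (may cause overfitting).
4. Use more training data.
5. Try dart mode.

### 3.4 Tuning against overfitting

1. Use a smaller max_bin.
2. Use a smaller num_leaves.
3. Use min_data_in_leaf and min_sum_hessian_in_leaf.
4. Use bagging by setting bagging_fraction and bagging_freq.
5. Use feature subsampling by setting feature_fraction.
6. Use more training data.
7. Use lambda_l1, lambda_l2, and min_gain_to_split for regularization.
8. Set max_depth to avoid growing trees that are too deep.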
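As a starting point, the overfitting advice above can be collected into one parameter dictionary. The parameter names are LightGBM's own; every value below is an illustrative assumption to be tuned, not a recommendation from the source. The small helper is hypothetical and encodes one detail from the LightGBM documentation: bagging only takes effect when `bagging_freq` is nonzero as well as `bagging_fraction` being below 1.

```python
# Illustrative values only; tune them for the dataset at hand.
overfit_params = {
    'objective': 'binary',
    'max_bin': 63,                     # fewer histogram bins
    'num_leaves': 15,                  # smaller trees
    'max_depth': 6,                    # cap the depth
    'min_data_in_leaf': 50,
    'min_sum_hessian_in_leaf': 1e-2,
    'bagging_fraction': 0.8,
    'bagging_freq': 1,                 # needed for bagging_fraction to apply
    'feature_fraction': 0.8,
    'lambda_l1': 0.1,
    'lambda_l2': 1.0,
    'min_gain_to_split': 0.01,
}

def bagging_is_active(params):
    """True if row subsampling would actually be used with these settings."""
    return params.get('bagging_freq', 0) > 0 and params.get('bagging_fraction', 1.0) < 1.0

print(bagging_is_active(overfit_params))
print(bagging_is_active({'bagging_fraction': 0.8}))
```

The second call mirrors the script at the top of this page, which sets `bagging_fraction` to 0.8 without `bagging_freq`, so row subsampling is not applied there.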

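## 4 How LightGBM works

Under the hood LightGBM implements the GBDT algorithm and adds a series of new features:

1. Optimization based on the histogram algorithm, which makes data storage more convenient, computation faster, and the model more robust and stable.
2. A leaf-wise algorithm with a depth limit. It drops the level-wise tree growth strategy used by most GBDT tools in favor of leaf-wise growth with a depth limit, which can reduce error and give better accuracy.
3. Gradient-based one-side sampling, which excludes most samples with small gradients and computes information gain on the remaining samples only. It is an algorithm that balances reducing the amount of data against preserving accuracy.
4. Exclusive feature bundling. High-dimensional data is often sparse, and this sparsity motivates a lossless way to reduce the number of features. Bundled features are usually mutually exclusive (they are never non-zero at the same time, as with one-hot encoding), so bundling two such features loses no information.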
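To make item 3 more concrete, here is a minimal sketch of a one-side sampling step in plain Python. It is not LightGBM's implementation. It assumes the scheme described in the LightGBM paper: keep the rows with the largest absolute gradients, draw a random subset of the rest, and up-weight that subset by (1 - top_rate) / other_rate so the gradient sums stay roughly unbiased. The function name, the argument names `top_rate` and `other_rate`, and the toy gradients are illustrative.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) chosen by a GOSS-style sampling step.

    Keeps the top_rate fraction of rows with the largest |gradient|,
    randomly draws other_rate * n rows from the remainder, and gives the
    randomly drawn rows the weight (1 - top_rate) / other_rate.
    """
    n = len(gradients)
    # Row indices sorted by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top_idx = order[:n_top]
    rest = order[n_top:]
    rng = random.Random(seed)
    other_idx = rng.sample(rest, min(n_other, len(rest)))
    amplify = (1.0 - top_rate) / other_rate
    indices = top_idx + other_idx
    weights = [1.0] * len(top_idx) + [amplify] * len(other_idx)
    return indices, weights


# Toy gradients (made up): rows 3 and 0 have by far the largest magnitude.
grads = [0.9, -0.05, 0.02, -1.2, 0.01, 0.3, -0.02, 0.04, 0.6, -0.01]
idx, w = goss_sample(grads)
print(idx, w)
```

Only 3 of the 10 rows are used to evaluate splits here; the two large-gradient rows are always kept, which is what lets the method shrink the data without discarding the most informative samples.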

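LightGBM is an ensemble model built on CART trees. The idea is to chain several decision trees so that they make the decision jointly.

How are the trees chained? LightGBM chains them by iteratively predicting the error. A simple example: suppose we want to predict the price of a car that is actually worth 3000 yuan. Decision tree 1 is trained and predicts 2600 yuan, leaving an error of 400 yuan, so the training target of decision tree 2 is 400 yuan. Decision tree 2 predicts 350 yuan, which leaves an error of 50 yuan for the third tree, and so on. Each tree estimates the error of all the trees before it, and the sum of all tree predictions is the final prediction.

The base model of LightGBM is the CART regression tree, which has two properties: (1) a CART tree is a binary tree; (2) a regression tree fits a continuous value.

The LightGBM model can be written as follows, where $f_t(x)$ denotes the sum of the first $t$ trees and $h_t(x)$ denotes the $t$-th decision tree:

$$f_t(x) = \sum_{k=1}^{t} h_k(x)$$

Because the model is built recursively, the model at step $t$ is formed from the model at step $t-1$:

$$f_t(x) = f_{t-1}(x) + h_t(x)$$

The tree added at each step fits the error of the sum of the previous trees:

$$h_t(x) \approx y - f_{t-1}(x)$$

At each step we only need to fit a CART tree whose output is $y - f_{t-1}(x)$ and add it to $f_{t-1}(x)$.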
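The recursion above can be sketched in a few lines of plain Python. This is a toy illustration, not LightGBM's algorithm: each tree is a one-split stump fitted to the current residuals by squared error, and the data points are made up.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump (a depth-1 CART tree) by squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees):
    """Additive model f_t = f_{t-1} + h_t, with each h_t fitted to y - f_{t-1}."""
    trees = []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + tree(x) for p, x in zip(preds, xs)]
    return trees, preds


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up data: one feature and a price-like target.
xs = [1, 2, 3, 4, 5, 6]
ys = [1200.0, 1500.0, 2100.0, 2600.0, 3000.0, 3300.0]
_, preds_1 = boost(xs, ys, 1)
_, preds_5 = boost(xs, ys, 5)
print(mse(ys, preds_1), mse(ys, preds_5))
```

Each extra stump is trained on what the previous ones left unexplained, so the training error with five trees is lower than with one, just as the second and third trees in the car example absorb the remaining 400 and 50 yuan.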


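LightGBM is an efficient, fast, distributed gradient boosted tree algorithm open-sourced by Microsoft in late 2016. It can be used for classification, regression, and ranking. Compared with the XGBoost algorithm developed by Tianqi Chen, it is faster and uses less memory.
This article covers the following parts:

• Supervised learning basics
• Gradient boosted tree algorithm
• LightGBM compared with XGBoost
• A closer look at LightGBM's histogram algorithm and histogram-subtraction speedup
• LightGBM's tree growth strategy: leaf-wise
• LightGBM parameters and usage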

# 1. Supervised learning basics

In supervised learning, a model is learned from training data and labels, and the model is then used to predict samples whose labels are unknown.

• Training data

$$\mathcal D = \{(X_i,y_i)\} \quad (|\mathcal D|=n,\ X_i \in \mathbb{R}^d,\ y_i \in \mathbb{R})$$

• Model

Linear model: $\hat{y}_i=\sum_j w_j x_{ij}$, where $x_i \in \mathbb{R}^d$

• Parameters: $w$ holds the weight of each feature, $\Theta = \{w_j \mid j=1,2,\dots,d\}$. The parameters are what we learn from the training data, that is, the unknowns to solve for.

• Loss function

• Squared loss: $l(y_i,\hat{y}_i)=(y_i-\hat{y}_i)^2$

• Logistic loss: $l(y_i,\hat{y}_i)=y_i\ln(1+e^{-\hat{y}_i})+(1-y_i)\ln(1+e^{\hat{y}_i})$, which is essentially obtained by passing the score through the sigmoid function.

• Regularization

• L2 regularization: $\Omega(w) = \lambda\|w\|^2$
• L1 regularization: $\Omega(w) = \lambda\|w\|_1$

For supervised learning, we define an objective function Obj of the parameters $\Theta$ as the sum of a loss function and a regularization term. The loss measures how well the model fits the training data, and the regularization term measures the complexity of the model:

$$Obj(\Theta) = L(\Theta) + \Omega(\Theta)$$
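The pieces above can be put together in a short sketch for a linear model. The function names and the toy data are illustrative; the formulas follow the squared loss, logistic loss, L1 and L2 definitions given in this section.

```python
import math


def squared_loss(y, y_hat):
    """l(y, y_hat) = (y - y_hat)**2."""
    return (y - y_hat) ** 2


def logistic_loss(y, y_hat):
    """Logistic loss for y in {0, 1}; y_hat is the raw score before the sigmoid."""
    return y * math.log(1 + math.exp(-y_hat)) + (1 - y) * math.log(1 + math.exp(y_hat))


def l2_penalty(w, lam):
    """Omega(w) = lam * ||w||^2."""
    return lam * sum(wj ** 2 for wj in w)


def l1_penalty(w, lam):
    """Omega(w) = lam * ||w||_1."""
    return lam * sum(abs(wj) for wj in w)


def objective(X, y, w, lam, loss=squared_loss, penalty=l2_penalty):
    """Obj(Theta) = L(Theta) + Omega(Theta) for the linear model y_hat = sum_j w_j x_ij."""
    preds = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    data_term = sum(loss(yi, pi) for yi, pi in zip(y, preds))
    return data_term + penalty(w, lam)


# Toy data on which w = [1, 2] fits exactly, so only the penalty remains.
X = [[1.0, 2.0], [3.0, 4.0]]
y = [5.0, 11.0]
w = [1.0, 2.0]
print(objective(X, y, w, lam=0.1))  # 0.1 * (1 + 4) = 0.5
```

With a perfect fit the loss term is zero, so the objective equals the penalty alone. This shows that the regularizer still prefers smaller weights even when the training data is fitted exactly.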

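# 2. Gradient boosted tree algorithm

For an ensemble of regression trees, define the model as follows, assuming there are $K$ trees in total and each tree belongs to the space of regression trees:

$$\hat{y}_i = \sum_{k=1}^K f_k(x_i), \quad f_k \in \mathcal{F}, \quad \text{where } \mathcal F= \{f(X)=w_{q(X)}\}$$

Here $q(X)$ maps a sample to the index of the leaf it falls into, and $w$ holds the leaf weights. The parameters consist of two parts: the tree structure, and the weight (prediction value) of each leaf node. Both are folded into a single $f_k$, so we do not learn a separate weight vector $w \in \mathbb{R}^d$ as in the linear model:

$$\Theta = \{f_1,f_2,\dots,f_K\}$$

Next, define the objective function:

$$Obj = \sum_{i=1}^n l(y_i,\hat{y}_i)+ \sum_{k=1}^K\Omega(f_k)$$

The next question is how to define the complexity of a tree. Decision tree learning is heuristic: it splits by information gain, limits the maximum depth (max depth), prunes the tree, and smooths leaf values. Each of these heuristics corresponds to part of a training objective. In order, the four correspond to the training loss, limiting the size of the function space, regularizing by the number of nodes, and L2 regularization. So we can borrow from decision trees to define the regularization term for the tree ensemble. In the end it is defined by two terms, the number of leaf nodes and L2 regularization, although other definitions are possible.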
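One common way to write such a two-term regularizer is the XGBoost-style form $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_j w_j^2$, where $T$ is the number of leaves. The symbols $\gamma$ and $\lambda$ and the function below are assumptions introduced for illustration; the text above only says that the leaf count and an L2 term are used.

```python
def tree_complexity(leaf_weights, gamma=1.0, lam=1.0):
    """Omega(f) = gamma * T + 0.5 * lam * sum_j w_j**2, with T = number of leaves."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w ** 2 for w in leaf_weights)


# A tree with three leaves: the penalty grows with both the number of
# leaves and the magnitude of the leaf weights.
print(tree_complexity([2.0, 0.1, -1.0]))
print(tree_complexity([2.0, 0.1, -1.0, 0.5]))
```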

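Now that we have the objective function, the next question is how to learn it. We do not use SGD (which is how LR models are trained), because the parameters are tree models rather than just a vector. Learning tree structure is harder than traditional optimization, where gradient methods apply, and learning all the trees at once is very difficult. So we use an additive approach: keep what has already been learned, correct it, and add one tree at a time.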

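In theory the Gain of a split is positive. If it is negative, the reduction in loss is already smaller than the regularization term, and at that point we stop early.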
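The Gain mentioned here is not defined in this excerpt. Assuming the usual second-order (XGBoost-style) formulation, where $G$ and $H$ are the sums of first and second derivatives of the loss over the samples in a node, the split gain can be sketched as follows. The names `lam` and `gamma` are illustrative regularization constants.

```python
def leaf_score(G, H, lam):
    """Structure score G^2 / (H + lam) of a node with gradient sums G and H."""
    return G * G / (H + lam)


def split_gain(G_left, H_left, G_right, H_right, lam=1.0, gamma=0.0):
    """Loss reduction from a split, minus the penalty gamma for the extra leaf."""
    parent = leaf_score(G_left + G_right, H_left + H_right, lam)
    children = leaf_score(G_left, H_left, lam) + leaf_score(G_right, H_right, lam)
    return 0.5 * (children - parent) - gamma


# A split that separates strongly opposite gradients is worth making ...
good = split_gain(-4.0, 2.0, 4.0, 2.0, lam=1.0, gamma=0.5)
# ... while one that barely reduces the loss does not pay for the new leaf.
bad = split_gain(-0.2, 2.0, 0.2, 2.0, lam=1.0, gamma=0.5)
print(good, bad)
```

Under this assumed formula, a negative value means the loss reduction did not cover the cost `gamma` of adding a leaf, which matches the stopping rule described above.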

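In practice, the additive model is $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \epsilon f_t(x_i)$, where $\epsilon$ is the step size, also called shrinkage, and is typically 0.1. Each step deliberately does not optimize fully. A smaller $\epsilon$ helps prevent overfitting, because each iteration adds only a small correction.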
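A small numeric illustration of the shrinkage update, under the simplifying assumption that each new tree fits the current residual exactly: the residual then shrinks by a factor of $(1-\epsilon)$ per iteration. The function name is made up for this sketch.

```python
def residual_after(steps, epsilon, initial_residual=1.0):
    """Residual left after `steps` updates if each tree fits the residual exactly."""
    residual = initial_residual
    for _ in range(steps):
        # y_hat moves a fraction epsilon of the way toward y.
        residual -= epsilon * residual
    return residual


for eps in (1.0, 0.1):
    print(eps, [round(residual_after(t, eps), 3) for t in (1, 10, 50)])
```

With $\epsilon = 1$ a single tree removes the whole residual, while with $\epsilon = 0.1$ many small steps are needed. This is why a smaller learning rate is usually paired with a larger number of iterations.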

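Summary:

• Add one new tree in each iteration.

• Before each iteration, compute $g_i$ and $h_i$ for every sample. One open question here is whether it is $\hat{y}_i$ or $w_i$ that gets initialized.

• Grow each tree greedily.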
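As a sketch of the second bullet, $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the current prediction $\hat{y}_i$. The code below works them out for the squared loss and the logistic loss defined earlier. It assumes that the predictions $\hat{y}_i$ are what gets initialized (here to 0), which is an assumption rather than something this article states.

```python
import math


def sigmoid(z):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def grad_hess_squared(y, y_hat):
    """First and second derivative of (y - y_hat)**2 with respect to y_hat."""
    return 2.0 * (y_hat - y), 2.0


def grad_hess_logistic(y, y_hat):
    """Derivatives of the logistic loss with respect to the raw score y_hat."""
    p = sigmoid(y_hat)
    return p - y, p * (1.0 - p)


labels = [1, 0, 1, 1]
scores = [0.0] * len(labels)  # assumed initial prediction y_hat^(0) = 0
stats = [grad_hess_logistic(y, s) for y, s in zip(labels, scores)]
print(stats)
```

After a tree is added, the scores are updated and $g_i$, $h_i$ are recomputed from the new predictions before the next tree is grown.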


...