Chapter 5: Contents
- 5.1 Installation & Quick Start
- 5.2 Core Parameters — Complete Reference
- 5.3 The Bias-Variance Tradeoff in LightGBM
- 5.4 Early Stopping — The Key to Optimal M
- 5.5 Categorical Feature Handling
- 5.6 Feature Importance
- 5.7 Hyperparameter Tuning Strategy
- 5.8 Cross-Validation
- 5.9 sklearn API
- 5.10 Saving and Loading Models
- 5.11 Common Pitfalls & Fixes
- 5.12 Complete Parameter Cheat Sheet
5.1 Installation & Quick Start
pip install lightgbm
import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# LightGBM Dataset format (memory-efficient; features are pre-binned for histogram building)
train_data = lgb.Dataset(X_train, label=y_train)
val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)
# Parameters
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'n_estimators': 100,
    'verbose': -1
}
# Train
model = lgb.train(
    params,
    train_data,
    valid_sets=[val_data],
    callbacks=[lgb.early_stopping(stopping_rounds=20)]
)
# Predict
y_pred = (model.predict(X_val) > 0.5).astype(int)
print(f"Accuracy: {accuracy_score(y_val, y_pred):.4f}")
5.2 Core Parameters — Complete Reference
Task / Objective
| Parameter | Value | Use Case |
|---|---|---|
| objective | 'regression' | Regression (MSE loss) |
| | 'regression_l1' | Regression (MAE loss) |
| | 'binary' | Binary classification |
| | 'multiclass' | Multi-class classification |
| | 'lambdarank' | Learning to rank |
| num_class | integer | Required for multiclass |
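For instance, a multiclass configuration needs both keys together; a minimal sketch (the class count of 3 is just an example):
# Hypothetical 3-class setup: 'multiclass' must be paired with num_class
params_multi = {
    'objective': 'multiclass',
    'num_class': 3,
    'metric': 'multi_logloss',
}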
Tree Structure
| Parameter | Default | Meaning | Tuning Advice |
|---|---|---|---|
| num_leaves | 31 | Max leaves per tree | Primary complexity control; keep ≤ 2^max_depth. Start at 31. |
| max_depth | -1 (no limit) | Max tree depth | Set if num_leaves alone is too permissive. |
| min_data_in_leaf | 20 | Min samples in a leaf | Increase for large or noisy datasets. |
| min_sum_hessian_in_leaf | 1e-3 | Min sum of Hessians in a leaf | Weighted analogue of min_data_in_leaf. |
| max_bin | 255 | Histogram bins B | Higher = more accurate splits, slower training. |
| min_gain_to_split | 0 | γ in the gain formula | Increase to prune low-gain splits. |
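To keep num_leaves and max_depth consistent, cap the leaf count at what the depth allows; continuing with the params dict from 5.1 (the depth value here is illustrative):
max_depth = 7
params['max_depth'] = max_depth
params['num_leaves'] = min(127, 2 ** max_depth)  # a depth-7 tree has at most 2^7 = 128 leaves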
Learning
| Parameter | Default | Meaning | Tuning Advice |
|---|---|---|---|
| learning_rate (ν) | 0.1 | Shrinkage per tree | Lower = better generalization, needs more trees. |
| n_estimators | 100 | Number of trees M | Use early stopping instead of a fixed value. |
| early_stopping_rounds | — | Stop if no improvement for k rounds | Set to ~50–100. |
Regularization
| Parameter | Default | Meaning | Effect |
|---|---|---|---|
| lambda_l1 | 0 | L1 regularization on leaf weights | Sparse leaf weights |
| lambda_l2 | 0 | L2 regularization (λ) | Smooths leaf weights, reduces overfitting |
| min_gain_to_split (γ) | 0 | Minimum split gain | Prunes unprofitable splits |
| num_leaves | 31 | Max leaves per tree | Lower = simpler model |
Sampling (GOSS & Bagging)
| Parameter | Default | Meaning |
|---|---|---|
| bagging_fraction | 1.0 | Fraction of rows sampled per tree (like random forest) |
| bagging_freq | 0 | Perform bagging every k iterations (0 = disabled) |
| feature_fraction | 1.0 | Fraction of features per tree (column subsampling) |
| top_rate | 0.2 | GOSS a: fraction of largest-gradient samples kept |
| other_rate | 0.1 | GOSS b: random fraction sampled from the rest |
| boosting | 'gbdt' | Set to 'goss' to enable GOSS |
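A configuration sketch for GOSS follows. Note that newer LightGBM releases select GOSS via a data_sample_strategy parameter, while older ones use boosting='goss' as in the table, so treat this as a version-dependent template:
# GOSS: keep the a=20% largest-gradient rows, sample b=10% of the rest
params_goss = {
    'objective': 'binary',
    'boosting': 'gbdt',
    'data_sample_strategy': 'goss',  # newer LightGBM; on older versions set 'boosting': 'goss'
    'top_rate': 0.2,                 # a: top gradient fraction kept
    'other_rate': 0.1,               # b: random fraction from the remainder
}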
5.3 The Bias-Variance Tradeoff in LightGBM
🔧 To reduce overfitting (high variance):
- Decrease num_leaves
- Increase min_data_in_leaf
- Decrease learning_rate and increase n_estimators
- Increase lambda_l1, lambda_l2
- Decrease feature_fraction, bagging_fraction
🚀 To reduce underfitting (high bias):
- Increase num_leaves
- Decrease min_data_in_leaf
- Increase learning_rate
- Decrease regularization
- Increase feature_fraction, bagging_fraction toward 1.0
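Putting the variance-reduction knobs together, a conservatively regularized starting point might look like this (illustrative values, not tuned):
# Illustrative anti-overfitting configuration (starting points, not tuned values)
params_regularized = {
    'objective': 'binary',
    'num_leaves': 15,         # decreased from the default 31
    'min_data_in_leaf': 50,   # increased from the default 20
    'learning_rate': 0.02,    # lowered; compensate with more trees + early stopping
    'lambda_l1': 0.1,
    'lambda_l2': 1.0,
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 1,
}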
5.4 Early Stopping — The Key to Optimal M
Choosing the number of trees M is critical: too few → underfitting; too many → overfitting.
model = lgb.train(
    params,
    train_data,
    num_boost_round=2000,  # Upper bound on trees (remove any n_estimators entry from params so this takes effect)
    valid_sets=[train_data, val_data],
    valid_names=['train', 'val'],
    callbacks=[
        lgb.early_stopping(stopping_rounds=50),  # Stop after 50 rounds without improvement
        lgb.log_evaluation(period=50)  # Print every 50 rounds
    ]
)
print(f"Best iteration: {model.best_iteration}")
print(f"Best val score: {model.best_score}")
The optimal M is model.best_iteration. This is the most important practical trick for LightGBM.
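Predictions made after early stopping use the best iteration by default, but it never hurts to be explicit:
# Truncate the ensemble at the early-stopped iteration when predicting
y_prob = model.predict(X_val, num_iteration=model.best_iteration)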
5.5 Categorical Feature Handling
# Specify categorical columns
train_data = lgb.Dataset(X_train, label=y_train,
                         categorical_feature=[0, 3, 7])  # column indices
# Or with pandas DataFrames
import pandas as pd
df = pd.DataFrame(X_train)
df[0] = df[0].astype('category') # Mark as categorical type
train_data = lgb.Dataset(df, label=y_train)
Internally, LightGBM searches for an optimal partition of categories. Instead of a threshold split (feature ≤ s), it splits the category set {0, 1, ..., K-1} into a left subset C and its complement:
\[
\text{Gain}(C) = \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G^2}{H + \lambda}
\]
LightGBM finds the optimal C using a sorted-gradient heuristic in O(K log K) time.
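A toy sketch of that heuristic (my own illustration, not LightGBM's actual implementation, which also applies smoothing such as cat_smooth and caps like max_cat_threshold): sort categories by their gradient/Hessian ratio, then scan prefix partitions just like a numeric split:
import numpy as np

def best_categorical_split(G, H, lam=1.0):
    # G[k], H[k]: summed gradients / Hessians for category k
    order = np.argsort(G / (H + lam))   # sort categories by gradient/Hessian ratio
    G_s, H_s = G[order], H[order]
    G_tot, H_tot = G.sum(), H.sum()
    parent = G_tot**2 / (H_tot + lam)
    best_gain, best_k = -np.inf, 0
    G_L = H_L = 0.0
    for i in range(len(order) - 1):     # scan prefix partitions left to right
        G_L += G_s[i]; H_L += H_s[i]
        G_R, H_R = G_tot - G_L, H_tot - H_L
        gain = G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam) - parent
        if gain > best_gain:
            best_gain, best_k = gain, i + 1
    return order[:best_k], best_gain    # categories routed left, best Gain(C)

# Example: 4 categories with per-category gradient/Hessian sums
G = np.array([-4.0, 1.5, -0.5, 3.0])
H = np.array([2.0, 1.0, 1.0, 2.0])
left, gain = best_categorical_split(G, H)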
5.6 Feature Importance
# 'split': how many times a feature was used to split
# 'gain': total gain from all splits using this feature
importance_split = model.feature_importance(importance_type='split')
importance_gain = model.feature_importance(importance_type='gain')
# Plot
lgb.plot_importance(model, importance_type='gain', max_num_features=20)
Prefer gain importance: it weights features by how much they improve the objective, not just how often they are used (raw split counts tend to inflate high-cardinality features).
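A handy pattern (assuming pandas is available) is to pair importances with feature names and sort by gain:
import pandas as pd

imp = pd.DataFrame({
    'feature': model.feature_name(),
    'gain': model.feature_importance(importance_type='gain'),
}).sort_values('gain', ascending=False)
print(imp.head(10))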
5.7 Hyperparameter Tuning Strategy
Phase 1 — Set learning rate high, find rough M: learning_rate=0.1, use early stopping.
Phase 2 — Tune tree structure: num_leaves ∈ [15,31,63,127], min_data_in_leaf ∈ [10,20,50,100]
Phase 3 — Tune regularization: lambda_l2 ∈ [0,0.1,1,10], lambda_l1 ∈ [0,0.1,1], min_gain_to_split ∈ [0,0.1,1]
Phase 4 — Tune sampling: feature_fraction, bagging_fraction, bagging_freq
Phase 5 — Lower learning rate, retrain: learning_rate=0.01 or 0.05
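Phase 2 can be run as a small manual grid before reaching for an automated tuner; a sketch using lgb.cv (the result-key naming matches Section 5.8 below and assumes a recent LightGBM):
import itertools

best_score, best_combo = float('inf'), None
for num_leaves, min_data in itertools.product([15, 31, 63, 127], [10, 20, 50, 100]):
    trial = {**params, 'num_leaves': num_leaves, 'min_data_in_leaf': min_data}
    cv = lgb.cv(trial, train_data, num_boost_round=1000, nfold=5,
                callbacks=[lgb.early_stopping(50)])
    score = cv['valid binary_logloss-mean'][-1]
    if score < best_score:
        best_score, best_combo = score, (num_leaves, min_data)
print(f"Best (num_leaves, min_data_in_leaf): {best_combo}, logloss: {best_score:.4f}")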
Using Optuna for Automated Tuning
import optuna
optuna.logging.set_verbosity(optuna.logging.WARNING)
def objective(trial):
    params = {
        'objective': 'binary',
        'metric': 'auc',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'lambda_l2': trial.suggest_float('lambda_l2', 1e-3, 10.0, log=True),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'verbose': -1
    }
    model = lgb.train(params, train_data,
                      valid_sets=[val_data],
                      callbacks=[lgb.early_stopping(stopping_rounds=50, verbose=False)])
    return model.best_score['valid_0']['auc']
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
print(study.best_params)
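Once the study finishes, a typical follow-up is to merge the winning values into a full parameter set and retrain with early stopping (a sketch, reusing the same data objects as above):
final_params = {'objective': 'binary', 'metric': 'auc', 'verbose': -1,
                **study.best_params}
final_model = lgb.train(final_params, train_data,
                        valid_sets=[val_data],
                        callbacks=[lgb.early_stopping(50)])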
5.8 Cross-Validation
cv_results = lgb.cv(
    params,
    train_data,
    num_boost_round=1000,
    nfold=5,
    stratified=True,  # For classification
    callbacks=[lgb.early_stopping(50)]
)
best_rounds = len(cv_results['valid binary_logloss-mean'])
print(f"Best rounds: {best_rounds}")
print(f"CV score: {cv_results['valid binary_logloss-mean'][-1]:.4f} "
      f"± {cv_results['valid binary_logloss-stdv'][-1]:.4f}")
5.9 sklearn API
from lightgbm import LGBMClassifier, LGBMRegressor
clf = LGBMClassifier(
    n_estimators=1000,
    learning_rate=0.05,
    num_leaves=31,
    early_stopping_rounds=50,
    verbose=-1
)
clf.fit(X_train, y_train,
        eval_set=[(X_val, y_val)])
y_pred = clf.predict(X_val)
y_prob = clf.predict_proba(X_val)[:, 1]
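Recent LightGBM releases also accept early stopping as a fit callback, which keeps training-time options out of the constructor; a variant of the snippet above:
clf = LGBMClassifier(n_estimators=1000, learning_rate=0.05,
                     num_leaves=31, verbose=-1)
clf.fit(X_train, y_train,
        eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(stopping_rounds=50)])
print(clf.best_iteration_)  # tree count chosen by early stopping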
5.10 Saving and Loading Models
# Save
model.save_model('model.lgb')
# Load
model_loaded = lgb.Booster(model_file='model.lgb')
y_pred = model_loaded.predict(X_val)
# Dump to JSON (human-readable, useful for inspection)
import json
with open('model.json', 'w') as f:
    json.dump(model.dump_model(), f)  # dump_model() returns a dict rather than writing a file
5.11 Common Pitfalls & Fixes
| Problem | Symptom | Fix |
|---|---|---|
| Overfitting | Train loss ↓, val loss ↑ | Reduce `num_leaves`, increase `min_data_in_leaf`, add regularization |
| Underfitting | Both losses high | Increase `num_leaves`, more trees, decrease regularization |
| Slow training | Long wall time | Enable GOSS, reduce `max_bin`, use `feature_fraction < 1` |
| Memory error | OOM | Reduce `max_bin`, use `two_round_loading=True` for large files |
| Categoricals not working | High error on cat features | Ensure `categorical_feature` param set, or use pandas category dtype |
| NaN predictions | NaN in output | Check for NaN in input features; LightGBM treats NaN as a separate bin |
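For the NaN row in the table, a quick check of the inputs (assuming a numeric numpy matrix) narrows down which columns are affected:
import numpy as np

# Locate columns containing NaN before blaming the model
nan_cols = np.where(np.isnan(X_train).any(axis=0))[0]
print(f"Columns with NaN: {nan_cols.tolist()}")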
5.12 Complete Parameter Cheat Sheet
params = {
    # Task
    'objective': 'binary',            # or 'regression', 'multiclass'
    'metric': 'auc',                  # or 'rmse', 'binary_logloss', 'multi_logloss'
    'num_class': 1,                   # set > 1 only for multiclass
    # Tree structure
    'num_leaves': 31,                 # ↑ more complex, ↑ overfit risk
    'max_depth': -1,                  # -1 = no limit
    'min_data_in_leaf': 20,           # ↑ more regularization
    'min_sum_hessian_in_leaf': 1e-3,
    # Learning
    'learning_rate': 0.05,            # ↓ = better generalization
    'n_estimators': 1000,             # use early stopping
    'early_stopping_rounds': 50,
    # Regularization
    'lambda_l1': 0.0,                 # L1 on leaf weights
    'lambda_l2': 0.0,                 # L2 on leaf weights
    'min_gain_to_split': 0.0,         # γ: minimum gain to split
    # Sampling
    'feature_fraction': 0.8,          # column subsampling per tree
    'bagging_fraction': 0.8,          # row subsampling
    'bagging_freq': 5,                # bagging every 5 iterations
    # Speed & hardware
    'max_bin': 255,                   # histogram bins
    'num_threads': 0,                 # 0 = OpenMP default (all cores)
    'device_type': 'cpu',             # 'gpu' if built with GPU support
    'boosting': 'gbdt',               # 'gbdt', 'goss', 'dart', 'rf'
    # Output
    'verbose': -1,                    # suppress output
    'seed': 42,
}
📘 Next Chapter: Mathematical Appendix & Summary
References & Further Reading
- Ke, G., et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS 2017.
- LightGBM Official Documentation
- Optuna: Hyperparameter Optimization Framework
- My GitHub: Machine Learning repository