Chapter 5: Contents
- 5.1 Installation & Quick Start
- 5.2 Core Parameters — Complete Reference
- 5.3 The Bias-Variance Tradeoff in LightGBM
- 5.4 Early Stopping — The Key to Optimal M
- 5.5 Categorical Feature Handling
- 5.6 Feature Importance
- 5.7 Hyperparameter Tuning Strategy
- 5.8 Cross-Validation
- 5.9 sklearn API
- 5.10 Saving and Loading Models
- 5.11 Common Pitfalls & Fixes
- 5.12 Complete Parameter Cheat Sheet
5.1 Installation & Quick Start
pip install lightgbm
import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# LightGBM Dataset format (memory-efficient; features are pre-binned for histogram building)
train_data = lgb.Dataset(X_train, label=y_train)
val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)
# Parameters
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'n_estimators': 100,
    'verbose': -1
}
# Train
model = lgb.train(
    params,
    train_data,
    valid_sets=[val_data],
    callbacks=[lgb.early_stopping(stopping_rounds=20)]
)
# Predict
y_pred = (model.predict(X_val) > 0.5).astype(int)
print(f"Accuracy: {accuracy_score(y_val, y_pred):.4f}")
5.2 Core Parameters — Complete Reference
Task / Objective
| Parameter | Value | Use Case |
|---|---|---|
| objective | 'regression' | Regression (MSE loss) |
| | 'regression_l1' | Regression (MAE loss) |
| | 'binary' | Binary classification |
| | 'multiclass' | Multi-class classification |
| | 'lambdarank' | Learning to rank |
| num_class | integer | Required for multiclass |
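For instance, a multiclass configuration needs both keys together; a minimal sketch (the class count of 3 is just an example):
# Hypothetical 3-class setup: 'multiclass' must be paired with num_class
params_multi = {
    'objective': 'multiclass',
    'num_class': 3,
    'metric': 'multi_logloss',
}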
Tree Structure
| Parameter | Default | Meaning | Tuning Advice |
|---|---|---|---|
| num_leaves | 31 | Max leaves per tree | Primary complexity control; keep ≤ 2^max_depth. Start at 31. |
| max_depth | -1 (no limit) | Max tree depth | Set if num_leaves alone is too permissive. |
| min_data_in_leaf | 20 | Min samples in a leaf | Increase for large or noisy datasets. |
| min_sum_hessian_in_leaf | 1e-3 | Min sum of Hessians in a leaf | Weighted analogue of min_data_in_leaf. |
| max_bin | 255 | Histogram bins B | Higher = more accurate splits, slower training. |
| min_gain_to_split | 0 | γ in the gain formula | Increase to prune low-gain splits. |
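To keep num_leaves and max_depth consistent, cap the leaf count at what the depth allows; continuing with the params dict from 5.1 (the depth value here is illustrative):
max_depth = 7
params['max_depth'] = max_depth
params['num_leaves'] = min(127, 2 ** max_depth)  # a depth-7 tree has at most 2^7 = 128 leaves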
Learning
| Parameter | Default | Meaning | Tuning Advice |
|---|---|---|---|
| learning_rate (ν) | 0.1 | Shrinkage per tree | Lower = better generalization, needs more trees. |
| n_estimators | 100 | Number of trees M | Use early stopping instead of a fixed value. |
| early_stopping_rounds | — | Stop if no improvement for k rounds | Set to ~50–100. |
Regularization
| Parameter | Default | Meaning | Effect |
|---|---|---|---|
| lambda_l1 | 0 | L1 regularization on leaf weights | Sparse leaf weights |
| lambda_l2 | 0 | L2 regularization (λ) | Smooths leaf weights, reduces overfitting |
| min_gain_to_split (γ) | 0 | Minimum split gain | Prunes unprofitable splits |
| num_leaves | 31 | Max leaves per tree | Lower = simpler model |
Sampling (GOSS & Bagging)
| Parameter | Default | Meaning |
|---|---|---|
| bagging_fraction | 1.0 | Fraction of rows sampled per tree (like random forest) |
| bagging_freq | 0 | Perform bagging every k iterations (0 = disabled) |
| feature_fraction | 1.0 | Fraction of features per tree (column subsampling) |
| top_rate | 0.2 | GOSS a: fraction of largest-gradient samples kept |
| other_rate | 0.1 | GOSS b: random fraction sampled from the rest |
| boosting | 'gbdt' | Set to 'goss' to enable GOSS |
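A configuration sketch for GOSS follows. Note that newer LightGBM releases select GOSS via a data_sample_strategy parameter, while older ones use boosting='goss' as in the table, so treat this as a version-dependent template:
# GOSS: keep the a=20% largest-gradient rows, sample b=10% of the rest
params_goss = {
    'objective': 'binary',
    'boosting': 'gbdt',
    'data_sample_strategy': 'goss',  # newer LightGBM; on older versions set 'boosting': 'goss'
    'top_rate': 0.2,                 # a: top gradient fraction kept
    'other_rate': 0.1,               # b: random fraction from the remainder
}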
5.3 The Bias-Variance Tradeoff in LightGBM
🔧 To reduce overfitting (high variance):
- Decrease num_leaves
- Increase min_data_in_leaf
- Decrease learning_rate and increase n_estimators
- Increase lambda_l1, lambda_l2
- Decrease feature_fraction, bagging_fraction
🚀 To reduce underfitting (high bias):
- Increase num_leaves
- Decrease min_data_in_leaf
- Increase learning_rate
- Decrease regularization
- Increase feature_fraction, bagging_fraction toward 1.0
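Putting the variance-reduction knobs together, a conservatively regularized starting point might look like this (illustrative values, not tuned):
# Illustrative anti-overfitting configuration (starting points, not tuned values)
params_regularized = {
    'objective': 'binary',
    'num_leaves': 15,         # decreased from the default 31
    'min_data_in_leaf': 50,   # increased from the default 20
    'learning_rate': 0.02,    # lowered; compensate with more trees + early stopping
    'lambda_l1': 0.1,
    'lambda_l2': 1.0,
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 1,
}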
5.4 Early Stopping — The Key to Optimal M
Choosing the number of trees M is critical: too few → underfitting; too many → overfitting.
model = lgb.train(
    params,
    train_data,
    num_boost_round=2000,  # Upper bound on trees (remove any n_estimators entry from params so this takes effect)
    valid_sets=[train_data, val_data],
    valid_names=['train', 'val'],
    callbacks=[
        lgb.early_stopping(stopping_rounds=50),  # Stop after 50 rounds without improvement
        lgb.log_evaluation(period=50)  # Print every 50 rounds
    ]
)
print(f"Best iteration: {model.best_iteration}")
print(f"Best val score: {model.best_score}")
The optimal M is model.best_iteration. This is the most important practical trick for LightGBM.
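Predictions made after early stopping use the best iteration by default, but it never hurts to be explicit:
# Truncate the ensemble at the early-stopped iteration when predicting
y_prob = model.predict(X_val, num_iteration=model.best_iteration)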
5.5 Categorical Feature Handling
# Specify categorical columns
train_data = lgb.Dataset(X_train, label=y_train,
                         categorical_feature=[0, 3, 7])  # column indices
# Or with pandas DataFrames
import pandas as pd
df = pd.DataFrame(X_train)
df[0] = df[0].astype('category') # Mark as categorical type
train_data = lgb.Dataset(df, label=y_train)
Internally, LightGBM searches for an optimal partition of categories. Instead of a threshold split (feature ≤ s), it splits the category set {0, 1, ..., K-1} into a left subset C and its complement:
\[
\text{Gain}(C) = \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G^2}{H + \lambda}
\]
LightGBM finds the optimal C using a sorted-gradient heuristic in O(K log K) time.
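A toy sketch of that heuristic (my own illustration, not LightGBM's actual implementation, which also applies smoothing such as cat_smooth and caps like max_cat_threshold): sort categories by their gradient/Hessian ratio, then scan prefix partitions just like a numeric split:
import numpy as np

def best_categorical_split(G, H, lam=1.0):
    # G[k], H[k]: summed gradients / Hessians for category k
    order = np.argsort(G / (H + lam))   # sort categories by gradient/Hessian ratio
    G_s, H_s = G[order], H[order]
    G_tot, H_tot = G.sum(), H.sum()
    parent = G_tot**2 / (H_tot + lam)
    best_gain, best_k = -np.inf, 0
    G_L = H_L = 0.0
    for i in range(len(order) - 1):     # scan prefix partitions left to right
        G_L += G_s[i]; H_L += H_s[i]
        G_R, H_R = G_tot - G_L, H_tot - H_L
        gain = G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam) - parent
        if gain > best_gain:
            best_gain, best_k = gain, i + 1
    return order[:best_k], best_gain    # categories routed left, best Gain(C)

# Example: 4 categories with per-category gradient/Hessian sums
G = np.array([-4.0, 1.5, -0.5, 3.0])
H = np.array([2.0, 1.0, 1.0, 2.0])
left, gain = best_categorical_split(G, H)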
5.6 Feature Importance
# 'split': how many times a feature was used to split
# 'gain': total gain from all splits using this feature
importance_split = model.feature_importance(importance_type='split')
importance_gain = model.feature_importance(importance_type='gain')
# Plot
lgb.plot_importance(model, importance_type='gain', max_num_features=20)
Prefer gain importance: it weights features by how much they improve the objective, not just how often they are used (raw split counts tend to inflate high-cardinality features).
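A handy pattern (assuming pandas is available) is to pair importances with feature names and sort by gain:
import pandas as pd

imp = pd.DataFrame({
    'feature': model.feature_name(),
    'gain': model.feature_importance(importance_type='gain'),
}).sort_values('gain', ascending=False)
print(imp.head(10))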
5.7 Hyperparameter Tuning Strategy
Phase 1 — Set learning rate high, find rough M: learning_rate=0.1, use early stopping.
Phase 2 — Tune tree structure: num_leaves ∈ [15,31,63,127], min_data_in_leaf ∈ [10,20,50,100]
Phase 3 — Tune regularization: lambda_l2 ∈ [0,0.1,1,10], lambda_l1 ∈ [0,0.1,1], min_gain_to_split ∈ [0,0.1,1]
Phase 4 — Tune sampling: feature_fraction, bagging_fraction, bagging_freq
Phase 5 — Lower learning rate, retrain: learning_rate=0.01 or 0.05
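Phase 2 can be run as a small manual grid before reaching for an automated tuner; a sketch using lgb.cv (the result-key naming matches Section 5.8 below and assumes a recent LightGBM):
import itertools

best_score, best_combo = float('inf'), None
for num_leaves, min_data in itertools.product([15, 31, 63, 127], [10, 20, 50, 100]):
    trial = {**params, 'num_leaves': num_leaves, 'min_data_in_leaf': min_data}
    cv = lgb.cv(trial, train_data, num_boost_round=1000, nfold=5,
                callbacks=[lgb.early_stopping(50)])
    score = cv['valid binary_logloss-mean'][-1]
    if score < best_score:
        best_score, best_combo = score, (num_leaves, min_data)
print(f"Best (num_leaves, min_data_in_leaf): {best_combo}, logloss: {best_score:.4f}")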
Using Optuna for Automated Tuning
import optuna
optuna.logging.set_verbosity(optuna.logging.WARNING)
def objective(trial):
    params = {
        'objective': 'binary',
        'metric': 'auc',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'lambda_l2': trial.suggest_float('lambda_l2', 1e-3, 10.0, log=True),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'verbose': -1
    }
    model = lgb.train(params, train_data,
                      valid_sets=[val_data],
                      callbacks=[lgb.early_stopping(stopping_rounds=50, verbose=False)])
    return model.best_score['valid_0']['auc']
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
print(study.best_params)
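Once the study finishes, a typical follow-up is to merge the winning values into a full parameter set and retrain with early stopping (a sketch, reusing the same data objects as above):
final_params = {'objective': 'binary', 'metric': 'auc', 'verbose': -1,
                **study.best_params}
final_model = lgb.train(final_params, train_data,
                        valid_sets=[val_data],
                        callbacks=[lgb.early_stopping(50)])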
5.8 Cross-Validation
cv_results = lgb.cv(
    params,
    train_data,
    num_boost_round=1000,
    nfold=5,
    stratified=True,  # For classification
    callbacks=[lgb.early_stopping(50)]
)
best_rounds = len(cv_results['valid binary_logloss-mean'])
print(f"Best rounds: {best_rounds}")
print(f"CV score: {cv_results['valid binary_logloss-mean'][-1]:.4f} "
      f"± {cv_results['valid binary_logloss-stdv'][-1]:.4f}")
5.9 sklearn API
from lightgbm import LGBMClassifier, LGBMRegressor
clf = LGBMClassifier(
    n_estimators=1000,
    learning_rate=0.05,
    num_leaves=31,
    early_stopping_rounds=50,
    verbose=-1
)
clf.fit(X_train, y_train,
        eval_set=[(X_val, y_val)])
y_pred = clf.predict(X_val)
y_prob = clf.predict_proba(X_val)[:, 1]
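Recent LightGBM releases also accept early stopping as a fit callback, which keeps training-time options out of the constructor; a variant of the snippet above:
clf = LGBMClassifier(n_estimators=1000, learning_rate=0.05,
                     num_leaves=31, verbose=-1)
clf.fit(X_train, y_train,
        eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(stopping_rounds=50)])
print(clf.best_iteration_)  # tree count chosen by early stopping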
5.10 Saving and Loading Models
# Save
model.save_model('model.lgb')
# Load
model_loaded = lgb.Booster(model_file='model.lgb')
y_pred = model_loaded.predict(X_val)
# Dump to JSON (human-readable, useful for inspection)
import json
with open('model.json', 'w') as f:
    json.dump(model.dump_model(), f)  # dump_model() returns a dict rather than writing a file
5.11 Common Pitfalls & Fixes
| Problem | Symptom | Fix |
|---|---|---|
| Overfitting | Train loss ↓, val loss ↑ | Reduce `num_leaves`, increase `min_data_in_leaf`, add regularization |
| Underfitting | Both losses high | Increase `num_leaves`, more trees, decrease regularization |
| Slow training | Long wall time | Enable GOSS, reduce `max_bin`, use `feature_fraction < 1` |
| Memory error | OOM | Reduce `max_bin`, use `two_round_loading=True` for large files |
| Categoricals not working | High error on cat features | Ensure `categorical_feature` param set, or use pandas category dtype |
| NaN predictions | NaN in output | Check for NaN in input features; LightGBM treats NaN as a separate bin |
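For the NaN row in the table, a quick check of the inputs (assuming a numeric numpy matrix) narrows down which columns are affected:
import numpy as np

# Locate columns containing NaN before blaming the model
nan_cols = np.where(np.isnan(X_train).any(axis=0))[0]
print(f"Columns with NaN: {nan_cols.tolist()}")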
5.12 Complete Parameter Cheat Sheet
params = {
    # Task
    'objective': 'binary',            # or 'regression', 'multiclass'
    'metric': 'auc',                  # or 'rmse', 'binary_logloss', 'multi_logloss'
    'num_class': 1,                   # set > 1 only for multiclass
    # Tree structure
    'num_leaves': 31,                 # ↑ more complex, ↑ overfit risk
    'max_depth': -1,                  # -1 = no limit
    'min_data_in_leaf': 20,           # ↑ more regularization
    'min_sum_hessian_in_leaf': 1e-3,
    # Learning
    'learning_rate': 0.05,            # ↓ = better generalization
    'n_estimators': 1000,             # use early stopping
    'early_stopping_rounds': 50,
    # Regularization
    'lambda_l1': 0.0,                 # L1 on leaf weights
    'lambda_l2': 0.0,                 # L2 on leaf weights
    'min_gain_to_split': 0.0,         # γ: minimum gain to split
    # Sampling
    'feature_fraction': 0.8,          # column subsampling per tree
    'bagging_fraction': 0.8,          # row subsampling
    'bagging_freq': 5,                # bagging every 5 iterations
    # Speed & hardware
    'max_bin': 255,                   # histogram bins
    'num_threads': 0,                 # 0 = OpenMP default (all cores)
    'device_type': 'cpu',             # 'gpu' if built with GPU support
    'boosting': 'gbdt',               # 'gbdt', 'goss', 'dart', 'rf'
    # Output
    'verbose': -1,                    # suppress output
    'seed': 42,
}
📘 Next Chapter: Mathematical Appendix & Summary
References & Further Reading
- Ke, G., et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS 2017.
- LightGBM Official Documentation
- Optuna: Hyperparameter Optimization Framework
- My GitHub: Machine Learning repository