4 min read · from Machine Learning

[P] ibu-boost: a GBDT library where splits are *absolutely* rejected, not just relatively ranked

I built a small gradient-boosted tree library based on the screening transform from "Screening Is Enough" (Nakanishi 2026, arXiv:2604.01178). The paper was originally written for Transformers, but the core idea — replacing relative comparison with absolute-threshold rejection — maps naturally onto GBDT split selection.

Disclaimer: I'm not affiliated with the paper's author. This is an independent implementation that applies the screening idea to GBDTs.

The idea in one paragraph

Every GBDT implementation picks the split with the highest gain among all candidates. This means the tree always splits, even if the best candidate is nearly useless. min_gain_to_split is the standard workaround, but it's an arbitrary hyperparameter that needs tuning per dataset.

ibu-boost replaces this with a screening transform:

```
raw_gain  = G_L^2/(H_L+λ) + G_R^2/(H_R+λ) - G_total^2/(H_total+λ)
norm_gain = raw_gain / H_total       # N-invariant, O(1) regardless of dataset size
s = 1 - exp(-norm_gain / τ)          # bounded similarity in [0, 1)
ρ = max(1 - r*(1-s), 0)^2            # Trim-and-Square
```

If max(ρ) == 0 across all (feature, bin) candidates, the node becomes a leaf automatically — no split is issued. There is no min_gain_to_split to tune.
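The leaf-or-split decision can be sketched in NumPy. This is my reading of the formulas above, not the library's internals; `tau` and `r` stand in for the temperature and acceptance width (which ibu-boost stores in log-space as `s_w` and `s_r`), and all numeric values are illustrative:

```python
import numpy as np

def screen_candidates(raw_gains, H_total, tau, r):
    """Sketch of the screening transform described above (illustrative,
    not ibu-boost's actual code). raw_gains holds one gain per
    (feature, bin) candidate; returns (rho, accepted split index or None)."""
    norm_gain = raw_gains / H_total                   # N-invariant normalisation
    s = 1.0 - np.exp(-norm_gain / tau)                # bounded similarity in [0, 1)
    rho = np.maximum(1.0 - r * (1.0 - s), 0.0) ** 2   # Trim-and-Square
    if rho.max() == 0.0:                              # every candidate rejected...
        return rho, None                              # ...node becomes a leaf
    return rho, int(np.argmax(rho))

# Two near-useless candidates are all rejected -> node becomes a leaf:
_, best = screen_candidates(np.array([1e-6, 2e-6]), H_total=100.0, tau=0.1, r=1.5)
# A strong candidate clears the absolute threshold:
_, best2 = screen_candidates(np.array([1e-6, 50.0]), H_total=100.0, tau=0.1, r=1.5)
```

Note that with this parameterisation a candidate is only fully rejected when r·(1−s) ≥ 1, so r > 1 is what makes outright rejection of weak splits possible.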

The threshold behaviour is controlled by s_w (temperature) and s_r (acceptance width), both stored in log-space, and will become learnable in a future release.

What's implemented

  • Two tree types: non-oblivious (standard per-node splits) and oblivious (CatBoost-style symmetric splits — all nodes at the same depth share one split)
  • Gradient boosting with MSE regression and binary log-loss
  • Missing value handling: XGBoost-style learned default direction per split
  • Triton GPU kernels: fused histogram scatter + screening transform, batched multi-node dispatch, full on-device gradient normalisation
  • ScreeningDiagnostics: accept_rate per round — a built-in health check for over/under-rejection
  • ScreeningParamSearch: K-fold grid search over (s_w, s_r)
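For the missing-value bullet, the XGBoost-style trick can be sketched as follows (hypothetical helper names, not ibu-boost's API): each candidate split is scored twice, with the gradient/hessian sums of the missing samples folded into the left or the right child, and the better direction is stored as the split's default.

```python
def split_gain(G_L, H_L, G_R, H_R, G_tot, H_tot, lam=1.0):
    # Standard second-order split gain (same form as raw_gain above).
    return G_L**2/(H_L+lam) + G_R**2/(H_R+lam) - G_tot**2/(H_tot+lam)

def learn_default_direction(G_L, H_L, G_R, H_R, G_miss, H_miss, lam=1.0):
    """Sketch of an XGBoost-style learned default direction (illustrative):
    try routing the missing samples each way, keep whichever direction
    yields the higher gain."""
    G_tot, H_tot = G_L + G_R + G_miss, H_L + H_R + H_miss
    g_left  = split_gain(G_L + G_miss, H_L + H_miss, G_R, H_R, G_tot, H_tot, lam)
    g_right = split_gain(G_L, H_L, G_R + G_miss, H_R + H_miss, G_tot, H_tot, lam)
    return ("left", g_left) if g_left >= g_right else ("right", g_right)

# Missing samples carry negative gradient, so they fit better with the left child:
direction, g = learn_default_direction(G_L=-10, H_L=5, G_R=10, H_R=5,
                                       G_miss=-3, H_miss=2)
```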

Benchmark (California Housing, 100 rounds, oblivious tree)

| Model | RMSE | Train time |
| --- | --- | --- |
| LightGBM (default) | 0.4711 ± 0.0042 | |
| ibu-boost (CPU) | 0.5286 ± 0.0039 | 5.34 s |
| ibu-boost (RTX 4060 Ti) | 0.5286 ± 0.0039 | 1.70 s (3.15x) |

Gap to LightGBM is ~12% RMSE. Honest take: this is an early alpha. Part of the gap comes from s_w/s_r being fixed scalars — once they become learnable (Phase 2), the threshold should adapt per dataset. But I also suspect the gap will persist on small, clean datasets like California Housing where over-splitting isn't a real problem. The hypothesis is that absolute rejection pays off more on high-dimensional or noisy data where standard GBDTs tend to overfit via spurious splits. I haven't tested this rigorously yet — if you have a go-to tabular benchmark suite, I'd love to hear about it.

Kernel-level speedup (N=65536, F=8, B=255): 51x over NumPy reference.

Install

```
pip install ibu-boost              # NumPy reference only
pip install "ibu-boost[triton]"    # + Triton GPU kernels (Linux / Windows CUDA)
```

Quick start

```python
from ibu_boost import ScreeningBooster

model = ScreeningBooster(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=6,
    tree_type="oblivious",   # CatBoost-style symmetric splits
    device="cuda",           # requires [triton] extra
)
model.fit(X_train, y_train)
print(f"Accept rate: {model.mean_accept_rate():.1%}")  # screening health check
```

What I'd like feedback on

  • Screening calibration: Does the absolute-rejection idea feel useful in practice, or does it just move the tuning problem from min_gain_to_split to (s_w, s_r)?
  • Benchmark suggestions: Which tabular datasets or benchmark suites would best stress-test the "auto-stop on noise" property?
  • Triton kernel design: The histogram scatter uses sample-parallel atomic_add, which is non-deterministic. Any tips on deterministic alternatives that don't kill throughput?
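On the last bullet: one deterministic alternative worth considering (my suggestion, not something the library does) is to replace the atomic scatter with a sort-by-bin followed by a segmented reduction, which fixes the accumulation order. A CPU sketch:

```python
import numpy as np

def histogram_deterministic(bins, grads, n_bins):
    """Sort samples by bin id, then segment-sum per bin; the float
    accumulation order is fixed, so results are reproducible run to run.
    Assumes at least one sample."""
    order = np.argsort(bins, kind="stable")                # deterministic ordering
    b, g = bins[order], grads[order]
    starts = np.flatnonzero(np.r_[True, b[1:] != b[:-1]])  # segment boundaries
    hist = np.zeros(n_bins)
    hist[b[starts]] = np.add.reduceat(g, starts)           # one sum per occupied bin
    return hist
```

The GPU analogue is a device-wide radix sort plus segmented reduce (e.g. CUB's DeviceRadixSort and DeviceSegmentedReduce); it usually costs more than atomics, but the result is bitwise reproducible.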

Happy to discuss the theory or implementation details.

submitted by /u/Pleasant_Yard_8879