Boruta -- Shadow-Feature Statistical Test
Boruta is a statistically principled, "all-relevant" feature selection method. Unlike most other modules that select a fixed-size subset, Boruta asks a different question: which features are genuinely more important than random noise? It answers this by creating "shadow" copies of every feature, training a model on both real and shadow features, and using a statistical test to decide which real features carry true signal.
How it works
- Hyperparameter optimization. A `GridSearchCV` tunes the tree-based model (RandomForest, ExtraTrees, or XGBoost) on the full feature set. Only tree-based models are supported because Boruta relies on `feature_importances_` from the trained model.
- Shadow feature generation. For every real feature, a "shadow" copy is created by randomly permuting its values across samples. This destroys any relationship with the target while preserving the marginal distribution.
- Iterative importance comparison. Over multiple rounds:
- A model is trained on the combined real + shadow feature set.
- The maximum importance among all shadow features in this round is recorded (the "shadow max").
- Each real feature's importance is compared to the shadow max.
- A hit counter tracks how often each real feature exceeds the shadow max.
- Statistical testing. After all rounds, a binomial test (with Bonferroni correction for multiple testing) is applied to each real feature's hit count:
- Confirmed: The feature is significantly more important than random noise at the `alpha` significance level.
- Tentative: The evidence is inconclusive.
- Rejected: The feature is not significantly better than noise.
Only Confirmed features are returned.
- Post-selection evaluation. The selected features are evaluated on the dev (cross-validated) and test sets using both a refit and a grid-search + refit strategy, matching the pattern used by RFE and SFS.
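The shadow-permutation loop and the binomial decision rule described above can be sketched with plain scikit-learn and SciPy. Everything below (the synthetic data, the round count, the model settings) is an illustrative assumption, not the module's actual implementation or defaults:

```python
import numpy as np
from scipy.stats import binomtest
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_real = 200, 8
X = rng.normal(size=(n_samples, n_real))
# The target depends only on the first two features; the rest are noise.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

n_rounds, alpha = 25, 0.05
hits = np.zeros(n_real, dtype=int)
for _ in range(n_rounds):
    # Shadow copies: each real column independently permuted across samples.
    shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_real)])
    X_both = np.hstack([X, shadows])
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_both, y)
    imp = model.feature_importances_
    shadow_max = imp[n_real:].max()       # the "shadow max" bar for this round
    hits += imp[:n_real] > shadow_max     # a hit = real feature beats every shadow

# Binomial test per feature, Bonferroni-corrected across the n_real tests.
alpha_corr = alpha / n_real
confirmed = np.array([
    binomtest(int(h), n_rounds, 0.5, alternative="greater").pvalue < alpha_corr
    for h in hits
])
```

With this setup the two informative features rack up hits in nearly every round and end up Confirmed, while the noise features rarely beat the shadow max and are Rejected.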
Key parameters
| Parameter | Default | Description |
|---|---|---|
| `model` | `None` (auto) | Tree-based model only (RandomForest, ExtraTrees, or XGB) |
| `n_inner_splits` | 5 | Cross-validation splits for hyperparameter tuning |
| `threshold` | 100 | Percentile threshold for shadow-feature comparison (100 = max shadow importance) |
| `alpha` | 0.05 | Significance level for the statistical test |
When to use
Boruta is particularly well-suited when:
- You want to find all relevant features rather than a fixed-size subset. This is valuable for interpretability or when downstream models benefit from having every informative feature available.
- The dataset has many noise features and you want a principled way to separate signal from noise.
- You are uncertain about how many features to keep and prefer letting a statistical test decide.
Limitations
- Only supports tree-based models (RandomForest, ExtraTrees, XGBoost). CatBoost is not supported because the BorutaPy implementation requires sklearn-style `feature_importances_`.
- Runtime grows with the number of features (shadow features double the feature space) and with the number of Boruta iterations.
- The `threshold` parameter (percentile of shadow importances) affects sensitivity: lowering it below 100 lowers the bar that real features must beat, making the test less conservative.
- Does not support time-to-event targets.
Note
There is currently no runnable example for Boruta in the documentation. See the Workflow & Modules overview for how to include Boruta in a pipeline.