Boruta -- Shadow-Feature Statistical Test
Boruta is a statistically principled, "all-relevant" feature selection method. Unlike most other modules that select a fixed-size subset, Boruta asks a different question: which features are genuinely more important than random noise? It answers this by creating "shadow" copies of every feature, training a model on both real and shadow features, and using a statistical test to decide which real features carry true signal.
How it works
- Hyperparameter optimization. A `GridSearchCV` tunes the tree-based model (RandomForest, ExtraTrees, or XGBoost) on the full feature set. Only tree-based models are supported because Boruta relies on `feature_importances_` from the trained model.
- Shadow feature generation. For every real feature, a "shadow" copy is created by randomly permuting its values across samples. This destroys any relationship with the target while preserving the marginal distribution.
- Iterative importance comparison. Over multiple rounds:
- A model is trained on the combined real + shadow feature set.
- The maximum importance among all shadow features in this round is recorded (the "shadow max").
- Each real feature's importance is compared to the shadow max.
- A hit counter tracks how often each real feature exceeds the shadow max.
- Statistical testing. After all rounds, a binomial test (with Bonferroni correction for multiple testing) is applied to each real feature's hit count:
- Confirmed: The feature is significantly more important than random noise at the `alpha` significance level.
- Tentative: The evidence is inconclusive.
- Rejected: The feature is not significantly better than noise.
Only Confirmed features are returned.
- Post-selection evaluation. The selected features are evaluated on the dev (cross-validated) and test sets using both a refit and a grid-search + refit strategy, matching the pattern used by RFE and SFS.
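The shadow-permutation loop and the binomial decision rule described above can be sketched with plain scikit-learn and SciPy. Everything below (the synthetic data, the round count, the model settings) is an illustrative assumption, not the module's actual implementation or defaults:

```python
import numpy as np
from scipy.stats import binomtest
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_real = 200, 8
X = rng.normal(size=(n_samples, n_real))
# The target depends only on the first two features; the rest are noise.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

n_rounds, alpha = 25, 0.05
hits = np.zeros(n_real, dtype=int)
for _ in range(n_rounds):
    # Shadow copies: each real column independently permuted across samples.
    shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_real)])
    X_both = np.hstack([X, shadows])
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_both, y)
    imp = model.feature_importances_
    shadow_max = imp[n_real:].max()       # the "shadow max" bar for this round
    hits += imp[:n_real] > shadow_max     # a hit = real feature beats every shadow

# Binomial test per feature, Bonferroni-corrected across the n_real tests.
alpha_corr = alpha / n_real
confirmed = np.array([
    binomtest(int(h), n_rounds, 0.5, alternative="greater").pvalue < alpha_corr
    for h in hits
])
```

With this setup the two informative features rack up hits in nearly every round and end up Confirmed, while the noise features rarely beat the shadow max and are Rejected.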
Key parameters
| Parameter | Default | Description |
|---|---|---|
| `model` | `None` (auto) | Tree-based model only (RandomForest, ExtraTrees, or XGB) |
| `n_inner_splits` | 5 | Cross-validation splits for hyperparameter tuning |
| `threshold` | 100 | Percentile threshold for shadow-feature comparison (100 = max shadow importance) |
| `alpha` | 0.05 | Significance level for the statistical test |
When to use
Boruta is particularly well-suited when:
- You want to find all relevant features rather than a fixed-size subset. This is valuable for interpretability or when downstream models benefit from having every informative feature available.
- The dataset has many noise features and you want a principled way to separate signal from noise.
- You are uncertain about how many features to keep and prefer letting a statistical test decide.
Limitations
- Only supports tree-based models (RandomForest, ExtraTrees, XGBoost). CatBoost is not supported because the BorutaPy implementation requires sklearn-style `feature_importances_`.
- Runtime grows with the number of features (shadow features double the feature space) and with the number of Boruta iterations.
- The `threshold` parameter (percentile of shadow importances) affects sensitivity: lowering it below 100 lowers the bar that real features must beat, making the test less conservative.
- Does not support time-to-event targets.
Note
There is currently no runnable example for Boruta in the documentation. See the Workflow & Modules overview for how to include Boruta in a pipeline.