# Octo
Octo is the core ML module of Octopus. It combines Optuna-based hyperparameter optimization, cross-validated model training, optional ensemble selection, and multiple feature-importance methods into a single task. Most workflows begin and end with an Octo task -- the first to establish a baseline and produce feature importances, the last to train a final model on the refined feature set.
## How it works

### Hyperparameter optimization

- **Inner cross-validation setup.** The train+dev data is divided into inner splits (controlled by `n_inner_splits` and `inner_split_seeds`). Each Optuna trial trains a bag of models, one per inner split, and evaluates them on the held-out dev splits. See Nested Cross-Validation for the full picture.
- **Optuna optimization.** A TPE (Tree-structured Parzen Estimator) sampler explores the hyperparameter space over `n_trials` trials. The first `n_startup_trials` use random sampling; the rest use multivariate TPE with grouping and constant-liar parallelism. The optimization target is either the combined or averaged dev-set performance across inner splits (controlled by `scoring_method`).
- **MRMR feature subsets (optional).** When `n_mrmr_features` is set, Octo pre-computes MRMR feature subsets of various sizes. Optuna can then sample from these subsets during optimization, effectively searching over both hyperparameters and feature counts simultaneously.
- **Constrained HPO (optional).** When `max_features > 0`, the optimization penalizes trials that use more features than the constraint. The `penalty_factor` controls how aggressively excess features are penalized. Only models flagged as `chpo_compatible` support this mode.
### Best bag construction

- **Build the best bag.** After optimization, the best trial's hyperparameters are used to train a fresh bag of models (one per inner split) on the full train+dev data. This "best bag" is the primary output model.
- **Feature importance calculation.** Feature importances are computed on the best bag using the methods specified in `fi_methods`:
  - `"permutation"`: permutation importance on the dev partition.
  - `"shap"`: SHAP values on the dev partition.
  - `"constant"`: a baseline method that returns equal importance for all features.
- **Feature selection.** Features are selected based on the computed importances, typically those with positive permutation importance.
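The permutation-importance step can be sketched with scikit-learn. A minimal, self-contained example assuming a single model rather than a bag, with a toy dataset in place of Octo's dev partition:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy stand-in for a train/dev partition.
X, y = make_classification(n_samples=400, n_features=10, n_informative=4, random_state=0)
X_tr, X_dev, y_tr, y_dev = train_test_split(X, y, random_state=0)

model = ExtraTreesClassifier(random_state=0).fit(X_tr, y_tr)

# Permutation importance on the dev partition, as with fi_methods=["permutation"].
result = permutation_importance(model, X_dev, y_dev, n_repeats=10, random_state=0)

# Keep features with positive mean permutation importance.
selected = np.where(result.importances_mean > 0)[0]
```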
### Ensemble selection (optional)

- **Ensemble selection.** When `ensemble_selection=True`, the top `n_ensemble_candidates` trial bags are used as candidates. An ensemble optimization procedure (hill-climbing with replacement) finds the combination of trial bags that maximizes dev-set performance. The resulting ensemble bag replaces the best bag as the primary output.
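Hill-climbing with replacement is a standard greedy procedure: repeatedly add whichever candidate most improves the running ensemble average, allowing the same candidate to be picked more than once. A toy sketch with synthetic dev-set predictions in place of real trial-bag outputs (the metric here is a negated squared error, not Octo's actual scoring):

```python
import numpy as np

rng = np.random.default_rng(0)
y_dev = rng.integers(0, 2, size=200).astype(float)
# Synthetic dev-set probability predictions from four candidate "trial bags".
candidates = [np.clip(y_dev + rng.normal(0, s, 200), 0, 1) for s in (0.3, 0.5, 0.8, 1.0)]

def score(pred):
    return -np.mean((pred - y_dev) ** 2)  # higher is better

chosen = []                 # indices may repeat: selection "with replacement"
ensemble = np.zeros(200)
for _ in range(10):         # fixed number of hill-climbing rounds
    best_i, best_s = None, -np.inf
    for i, p in enumerate(candidates):
        # Score the ensemble average if candidate i were added.
        s = score((ensemble * len(chosen) + p) / (len(chosen) + 1))
        if s > best_s:
            best_i, best_s = i, s
    chosen.append(best_i)
    ensemble = (ensemble * (len(chosen) - 1) + candidates[best_i]) / len(chosen)
```

Because re-adding the current best candidate is always an option, the final ensemble never scores worse on the dev set than the best single candidate.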
## Key parameters

| Parameter | Default | Description |
|---|---|---|
| `models` | `["ExtraTreesClassifier"]` | Models to train (e.g., ExtraTrees, RandomForest, XGB, CatBoost) |
| `n_trials` | `200` | Number of Optuna hyperparameter optimization trials |
| `n_inner_splits` | `5` | Inner cross-validation splits |
| `inner_split_seeds` | `[0]` | Seeds for inner splits; more seeds = more robust |
| `max_features` | `0` | Constrain maximum features during HPO (`0` = no constraint) |
| `penalty_factor` | `1.0` | Penalty for exceeding `max_features` |
| `ensemble_selection` | `False` | Enable ensemble selection over top trials |
| `n_ensemble_candidates` | `50` | Number of top trials saved for ensemble selection |
| `fi_methods` | `["permutation"]` | Feature importance methods: `"permutation"`, `"shap"`, `"constant"` |
| `n_startup_trials` | `15` | Random trials before the TPE sampler kicks in |
| `max_outliers` | `3` | Maximum outlier samples to optimize/remove |
| `n_mrmr_features` | `[]` | Feature counts for integrated MRMR feature selection |
| `scoring_method` | `"combined"` | Bag performance mode: `"combined"` or `"average"` |
## When to use
Octo is the workhorse of Octopus and should be used:
- As the first task to get a baseline and feature importances that downstream modules (e.g., MRMR) can consume.
- As the last task to train a final model on a refined feature set.
- When you need ensemble selection over multiple Optuna trials for maximum performance.
- When you want constrained HPO to limit the number of features used during optimization.
## Limitations

- Computationally expensive: `n_trials` × `n_inner_splits` model fits, plus feature importance computation.
- The constrained HPO mode requires models with `chpo_compatible=True` in their model configuration.
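To make the cost concrete, a back-of-the-envelope fit count using the defaults from the table above (one inner-split seed, ignoring MRMR pre-computation and feature-importance passes):

```python
# Rough count of model fits for one Octo run with default settings.
n_trials, n_inner_splits, n_seeds = 200, 5, 1
hpo_fits = n_trials * n_inner_splits * n_seeds  # one bag (5 models) per trial
best_bag_fits = n_inner_splits * n_seeds        # refit the best bag after HPO
total_fits = hpo_fits + best_bag_fits           # → 1005
```

Each extra entry in `inner_split_seeds` multiplies this count accordingly.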
## Examples
- Basic Classification — simplest Octo setup with a single task.
- Multi-Step Classification — Octo as first and last task in a multi-step pipeline.
- Use Own Hyperparameters — overriding Octo's default search space.