# AutoGluon
Based on: AutoGluon
AutoGluon wraps the AutoGluon TabularPredictor to provide fully automated model selection, hyperparameter tuning, and stacking/ensembling within an Octopus workflow. Unlike Octo, which exposes fine-grained control over optimization, AutoGluon aims for a hands-off experience: you configure a quality preset and a time budget, and AutoGluon handles the rest.
## How it works
1. Initialize the TabularPredictor. A `TabularPredictor` is created with the target column, the evaluation metric (mapped from Octopus metric names to AutoGluon scorers), and the verbosity level.
2. Fit on training data. AutoGluon's `fit()` method is called with the combined feature + target DataFrame. Internally, AutoGluon:
    - Performs automatic feature engineering (type inference, missing-value handling, encoding).
    - Trains a portfolio of model types (controlled by `included_model_types`, or the full default set).
    - Tunes hyperparameters using the strategy defined by the `presets`.
    - Builds multi-layer stacking ensembles when using higher-quality presets (`"good_quality"` and above).
    - Uses `n_bag_splits` for bagging/cross-validation within each model.
3. Evaluate performance. After training, the module evaluates on the train, dev (out-of-split), and test partitions. Scores are computed using both AutoGluon's built-in metrics and Octopus's metric implementations for cross-comparison.
4. Feature importance. Permutation feature importance is computed on the test set using AutoGluon's `feature_importance()` method with confidence bands (15 shuffle sets, 95% confidence). If feature groups are defined, group-level importances are also calculated.
5. Sklearn-compatible model. The fitted AutoGluon predictor is wrapped in a sklearn-compatible class (`SklearnClassifier` or `SklearnRegressor`) so that downstream Octopus code (e.g., feature importance methods) can use it seamlessly.
6. No feature selection. AutoGluon does not perform feature selection; it returns all input features. To select features, place AutoGluon after a feature-selection module in the workflow.
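The metric mapping in the initialization step amounts to a lookup from Octopus metric names to AutoGluon scorer names. The sketch below is illustrative only: the Octopus-side keys (`"AUCROC"`, `"RMSE"`, ...) and the helper `to_autogluon_metric` are hypothetical, not Octopus's actual code. The values are standard AutoGluon `eval_metric` strings.

```python
# Hypothetical mapping: keys stand in for Octopus metric names,
# values are AutoGluon eval_metric strings.
OCTOPUS_TO_AUTOGLUON = {
    "AUCROC": "roc_auc",
    "ACC": "accuracy",
    "ACCBAL": "balanced_accuracy",
    "F1": "f1",
    "LOGLOSS": "log_loss",
    "MAE": "mean_absolute_error",
    "RMSE": "root_mean_squared_error",
    "R2": "r2",
}


def to_autogluon_metric(octopus_metric: str) -> str:
    """Hypothetical helper: translate an Octopus metric name for AutoGluon."""
    try:
        return OCTOPUS_TO_AUTOGLUON[octopus_metric]
    except KeyError:
        # Fail loudly instead of silently falling back to a default metric.
        raise ValueError(
            f"no AutoGluon scorer mapped for metric {octopus_metric!r}"
        ) from None
```

The returned string is the kind of value that would be passed as `eval_metric` to `TabularPredictor(label=..., eval_metric=..., verbosity=...)`.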
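Permutation importance with confidence bands, as used in the feature-importance step, can be sketched without AutoGluon. The procedure is: shuffle a column, or a whole feature group using one shared row permutation, and measure the drop in score. Repeat this 15 times and put a Student-t 95% interval around the mean drop. This is a standalone, stdlib-only illustration under those assumptions. The function name and return keys are hypothetical, and AutoGluon's own `feature_importance()` may compute its bands differently.

```python
import math
import random
import statistics
from typing import Callable, Sequence

Row = list[float]


def permutation_importance(
    predict: Callable[[list[Row]], list[float]],
    score: Callable[[Sequence[float], Sequence[float]], float],
    X: list[Row],
    y: list[float],
    columns: Sequence[int],
    num_shuffle_sets: int = 15,
    t_crit: float = 2.1448,
    seed: int = 0,
) -> dict[str, float]:
    """Mean drop in score(y_true, y_pred) (higher is better) when the given
    columns are shuffled together.

    One column index gives per-feature importance; several indices share a
    single row permutation per shuffle set (group importance). The default
    t_crit is the two-sided 95% Student-t value for 14 degrees of freedom,
    so it matches the default of 15 shuffle sets.
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))  # score on the unshuffled data
    drops = []
    for _ in range(num_shuffle_sets):
        order = list(range(len(X)))
        rng.shuffle(order)
        X_perm = [row[:] for row in X]  # copy rows so X stays untouched
        for i, j in enumerate(order):
            for c in columns:
                # Same permutation for every column in the group.
                X_perm[i][c] = X[j][c]
        drops.append(baseline - score(y, predict(X_perm)))
    mean = statistics.mean(drops)
    stddev = statistics.stdev(drops)
    half_width = t_crit * stddev / math.sqrt(num_shuffle_sets)
    return {
        "importance": mean,
        "stddev": stddev,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }
```

Passing `[0]` scores a single feature, while `[0, 1]` treats two columns as one group. A mean drop near zero with an interval that straddles zero suggests the feature contributes little.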
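A wrapper is needed because AutoGluon predicts from a DataFrame with named columns, whereas sklearn-style code typically passes positional 2-D arrays. The adapter below is a hypothetical, dependency-free sketch of that translation, with plain dicts standing in for DataFrame rows. It is not the actual `SklearnClassifier`/`SklearnRegressor` implementation, and `SklearnStyleWrapper` and `RecordPredictor` are illustrative names.

```python
from typing import Any, Protocol, Sequence


class RecordPredictor(Protocol):
    """Hypothetical interface: predicts from rows keyed by column name."""

    def predict(self, records: list[dict[str, Any]]) -> list[Any]: ...


class SklearnStyleWrapper:
    """Hypothetical adapter that gives a fitted, name-based predictor the
    sklearn-style fit/predict interface expected by positional code."""

    def __init__(self, predictor: RecordPredictor, feature_names: Sequence[str]):
        self.predictor = predictor
        self.feature_names_in_ = list(feature_names)
        self.n_features_in_ = len(self.feature_names_in_)

    def fit(self, X: Any, y: Any = None) -> "SklearnStyleWrapper":
        # The wrapped predictor is already trained, so fitting is a no-op;
        # returning self follows the sklearn convention.
        return self

    def _to_records(self, X: Sequence[Sequence[Any]]) -> list[dict[str, Any]]:
        # Re-attach column names: the wrapped predictor looks features up
        # by name, while sklearn-style callers pass positional rows.
        records = []
        for row in X:
            if len(row) != self.n_features_in_:
                raise ValueError(
                    f"expected {self.n_features_in_} features, got {len(row)}"
                )
            records.append(dict(zip(self.feature_names_in_, row)))
        return records

    def predict(self, X: Sequence[Sequence[Any]]) -> list[Any]:
        return list(self.predictor.predict(self._to_records(X)))
```

A fuller wrapper would presumably also forward `predict_proba` for classifiers and build a pandas DataFrame rather than dicts.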
## Supported model types
When `included_model_types` is not set, AutoGluon considers all available model families:
| Code | Model |
|---|---|
| `GBM` | LightGBM |
| `CAT` | CatBoost |
| `XGB` | XGBoost |
| `RF` | Random Forest |
| `XT` | Extra Trees |
| `KNN` | K-Nearest Neighbors |
| `LR` | Linear/Logistic Regression |
| `NN_TORCH` | PyTorch Neural Network |
| `FASTAI` | FastAI Neural Network |
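
A configuration layer might check `included_model_types` against the codes in this table before training starts. The sketch below assumes exactly the nine codes listed above; `resolve_model_types` is a hypothetical helper, not part of Octopus or AutoGluon.

```python
from typing import Iterable, Optional

# Model-family codes from the table above (code -> description).
MODEL_TYPES = {
    "GBM": "LightGBM",
    "CAT": "CatBoost",
    "XGB": "XGBoost",
    "RF": "Random Forest",
    "XT": "Extra Trees",
    "KNN": "K-Nearest Neighbors",
    "LR": "Linear/Logistic Regression",
    "NN_TORCH": "PyTorch Neural Network",
    "FASTAI": "FastAI Neural Network",
}


def resolve_model_types(included: Optional[Iterable[str]] = None) -> list[str]:
    """Hypothetical helper: return the model codes to train.

    None means "all families"; otherwise unknown codes are rejected and
    duplicates are dropped while keeping the caller's order.
    """
    if included is None:
        return list(MODEL_TYPES)
    codes = list(included)
    unknown = sorted(set(codes) - set(MODEL_TYPES))
    if unknown:
        raise ValueError(f"unknown model type code(s): {unknown}")
    return list(dict.fromkeys(codes))
```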
## Key parameters
| Parameter | Default | Description |
|---|---|---|
| `presets` | `["medium_quality"]` | Quality presets: `"best_quality"`, `"high_quality"`, `"good_quality"`, `"medium_quality"` |
| `time_limit` | `None` | Total training time in seconds |
| `infer_limit` | `None` | Per-row inference time limit in seconds |
| `n_bag_splits` | `5` | Number of bagging/cross-validation splits used within each model |
| `included_model_types` | `None` | Restrict training to specific model types (see table above) |
| `memory_limit` | `"auto"` | Memory limit in GB |
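
The parameters above could be collected in a small settings object that rejects invalid values early. The dataclass below is a hypothetical sketch. Its name and its validation rules (positive time limits, at least two bagging splits, a positive or `"auto"` memory limit) are assumptions, not documented Octopus behavior.

```python
from dataclasses import dataclass, field
from typing import Optional, Union

# Preset names listed in the parameter table above.
ALLOWED_PRESETS = {"best_quality", "high_quality", "good_quality", "medium_quality"}


@dataclass
class AutoGluonSettings:
    """Hypothetical settings object mirroring the parameter table."""

    presets: list[str] = field(default_factory=lambda: ["medium_quality"])
    time_limit: Optional[float] = None  # total training time, seconds
    infer_limit: Optional[float] = None  # inference time per row, seconds
    n_bag_splits: int = 5  # bagging / cross-validation splits per model
    included_model_types: Optional[list[str]] = None  # None = all families
    memory_limit: Union[str, float] = "auto"  # GB, or "auto"

    def __post_init__(self) -> None:
        # Assumed validation rules, not documented Octopus behavior.
        unknown = set(self.presets) - ALLOWED_PRESETS
        if unknown:
            raise ValueError(f"unknown preset(s): {sorted(unknown)}")
        for name in ("time_limit", "infer_limit"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")
        if self.n_bag_splits < 2:
            raise ValueError("n_bag_splits must be at least 2 for k-fold bagging")
        if self.memory_limit != "auto" and not (
            isinstance(self.memory_limit, (int, float)) and self.memory_limit > 0
        ):
            raise ValueError("memory_limit must be 'auto' or a positive number of GB")
```

For example, `AutoGluonSettings(presets=["good_quality"], time_limit=600)` requests a stacked ensemble within a ten-minute budget. AutoGluon's own `TabularPredictor.fit()` accepts `presets`, `time_limit`, and `infer_limit` directly and controls bagging through `num_bag_folds`; how Octopus forwards `n_bag_splits` is not shown here.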
## When to use
AutoGluon is ideal when:
- You want a fully automated baseline with minimal configuration effort.
- You want to compare Octo's manually-configured pipeline against an AutoML approach.
- You need access to model types not available in Octo (e.g., neural networks, KNN, linear models, LightGBM).
- You are working under time constraints, where setting a `time_limit` and a `presets` level is sufficient.
## Limitations
- AutoGluon does not perform feature selection. All input features are passed through. Combine it with upstream feature-selection modules if needed.
- Requires the `autogluon` optional dependency (`pip install octopus[autogluon]`).
- Higher-quality presets (`"best_quality"`, `"high_quality"`) use multi-layer stacking, which is memory-intensive and can be slow.
- The module integrates with Ray for resource management, which can conflict with Octo's own Ray usage if not configured carefully.
## Examples
- Octo & AutoGluon: runs Octo and AutoGluon side by side on the same dataset.