Octopus
Octopus is a lightweight AutoML framework designed for small datasets (fewer than 1,000 samples) with high dimensionality (a large number of features). Its goal is to speed up machine learning projects and to make results obtained on small datasets more reliable.
Why Octopus?
| Feature | What it does |
| --- | --- |
| Nested cross-validation | Separates hyperparameter tuning from performance estimation, giving you honest metrics even on 100-sample datasets. |
| No information leakage | Feature selection, imputation, and scaling happen inside each CV fold. Correlated observations are automatically grouped. |
| Multi-step workflows | Chain feature-selection modules (ROC, MRMR, Boruta) with ML modules (Octo, AutoGluon) into pipelines that progressively refine the feature set. |
| Ensembling for small data | Combines models across inner CV splits and Optuna trials into robust ensembles, optimized for the nested CV setting. |
| Classification, regression & survival | Supports binary/multiclass classification, regression, and time-to-event analysis out of the box. |
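
To make the first two rows of the table concrete, here is a minimal, standard-library-only sketch of how nested, group-aware splits can be laid out. It is not Octopus's API or implementation: the helper names `group_k_fold` and `nested_splits`, the fold counts, and the toy `groups` list are all assumptions chosen for illustration. The point it shows is structural: the inner folds used for tuning are drawn only from the outer training portion, and all samples that share a group stay on the same side of every split.

```python
from collections import defaultdict


def group_k_fold(indices, groups, n_splits):
    """Yield (train, test) index lists; samples sharing a group share a fold.

    Hypothetical helper for illustration, not part of Octopus.
    """
    members = defaultdict(list)
    for i in indices:
        members[groups[i]].append(i)

    # Greedy balancing: place the largest groups first, always into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        min(folds, key=len).extend(members[g])

    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


def nested_splits(groups, n_outer=5, n_inner=3):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    Inner splits partition outer_train only, so hyperparameter tuning
    never sees the outer test samples used for performance estimation.
    """
    all_idx = list(range(len(groups)))
    for outer_train, outer_test in group_k_fold(all_idx, groups, n_outer):
        inner = list(group_k_fold(outer_train, groups, n_inner))
        yield outer_train, outer_test, inner


# Toy data (assumed): 20 samples from 10 subjects, two samples per subject.
groups = [i // 2 for i in range(20)]
splits = list(nested_splits(groups, n_outer=5, n_inner=3))

for outer_train, outer_test, inner in splits:
    print(len(outer_train), len(outer_test), [len(val) for _, val in inner])
```

In a real pipeline, any imputation, scaling, or feature selection would be fitted on the training indices of each split and only applied to the held-out indices. For production use, scikit-learn's `GroupKFold` provides a tested group-aware splitter.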
Where to go from here?
- Getting Started — Install Octopus and run your first study in five minutes.
- User Guide — Hands-on, step-by-step guides that show you how to configure and run each task type (classification, regression, survival analysis) with all available options.
- Concepts — Understand why Octopus works the way it does: nested CV, workflows, information leakage prevention, and feature importance.
- Examples — Runnable end-to-end workflows from basic to advanced.
- API Reference — Auto-generated reference for all public classes and functions.