Octopus

Octopus is a lightweight AutoML framework designed for small datasets (fewer than 1,000 samples) with high dimensionality (a large number of features). Its goal is to speed up machine learning projects and to make results more reliable when data is scarce.

Why Octopus?


Nested cross-validation	Separates hyperparameter tuning from performance estimation, giving you honest metrics even on 100-sample datasets.
No information leakage	Feature selection, imputation, and scaling happen inside each CV fold. Correlated observations are automatically grouped.
Multi-step workflows	Chain feature-selection modules (ROC, MRMR, Boruta) with ML modules (Octo, AutoGluon) into pipelines that progressively refine the feature set.
Ensembling for small data	Combines models across inner CV splits and Optuna trials into robust ensembles, optimized for the nested CV setting.
Classification, regression & survival	Supports binary/multiclass classification, regression, and time-to-event analysis out of the box.
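
To make the first two points concrete, here is a minimal sketch of nested cross-validation with leakage-free preprocessing and grouped observations. It uses plain scikit-learn rather than the Octopus API, and the synthetic dataset, the `groups` array, the pipeline steps, and the parameter grid are all assumptions chosen for illustration. It shows the general technique, not how Octopus implements it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic small, wide dataset (assumption): 100 samples, 200 features.
X, y = make_classification(
    n_samples=100, n_features=200, n_informative=10, random_state=0
)

# Hypothetical grouping: every two consecutive rows come from the same subject,
# so they must never be split across training and test data.
groups = np.repeat(np.arange(50), 2)

# Imputation, scaling, and feature selection live inside the pipeline,
# so they are re-fitted on the training part of every fold.
pipeline = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif)),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)
param_grid = {"select__k": [5, 20], "model__C": [0.1, 1.0]}

outer_cv = GroupKFold(n_splits=5)
outer_scores = []

for train_idx, test_idx in outer_cv.split(X, y, groups):
    # Inner loop: hyperparameter tuning sees only the outer training data.
    search = GridSearchCV(
        pipeline, param_grid, cv=GroupKFold(n_splits=3), scoring="roc_auc"
    )
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])

    # Outer loop: the tuned model is scored on data it has never seen.
    proba = search.predict_proba(X[test_idx])[:, 1]
    outer_scores.append(roc_auc_score(y[test_idx], proba))

print(f"Outer-fold ROC AUC: mean={np.mean(outer_scores):.3f}, "
      f"std={np.std(outer_scores):.3f}")
```

The spread of the outer-fold scores is as informative as the mean: on datasets this small, a large standard deviation signals that a single train/test split would have given an unreliable estimate.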

Where to go from here?

Getting Started — Install Octopus and run your first study in five minutes.
User Guide — Hands-on, step-by-step guides that show you how to configure and run each task type (classification, regression, survival analysis) with all available options.
Concepts — Understand why Octopus works the way it does: nested CV, workflows, information leakage prevention, and feature importance.
Examples — Runnable end-to-end workflows from basic to advanced.
API Reference — Auto-generated reference for all public classes and functions.