Octopus

Octopus is a lightweight AutoML framework designed for small datasets (<1k samples) with high dimensionality (a large number of features). Its goal is to speed up machine learning projects and to make results obtained on small datasets more reliable.

What distinguishes Octopus from others

  • Nested cross-validation (CV)
  • Performance on small datasets
  • No information leakage
  • No data split mistakes
  • Constrained regularization
  • Ensembling, optimized for (nested) CV
  • Simplicity
  • Time-to-event (survival) analysis
  • Testing system (branching workflows)
  • Reporting based on nested CV
  • Test predictions over all samples
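
To make the nested CV idea in the list above concrete, here is a minimal, library-free sketch of how nested splits can be generated. It is not Octopus code and does not use the Octopus API; the function names (`k_fold`, `nested_cv_splits`) and the 5 x 5 layout are assumptions chosen for illustration. It shows two of the listed properties: the outer test fold is never used for training or validation, and every sample ends up in exactly one outer test fold.

```python
import random


def k_fold(indices, k):
    """Split a list of indices into k near-equal folds (strided, no shuffling)."""
    return [indices[i::k] for i in range(k)]


def nested_cv_splits(n_samples, n_outer=5, n_inner=5, seed=0):
    """Yield (outer, inner, train, val, test) index lists for nested CV.

    Hypothetical helper for illustration only; not part of Octopus.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    outer_folds = k_fold(indices, n_outer)
    for o, test in enumerate(outer_folds):
        # Everything outside the outer test fold is available for model development.
        dev = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold(dev, n_inner)
        for j, val in enumerate(inner_folds):
            # Inner training data excludes both the validation and the test fold.
            train = [i for f, fold in enumerate(inner_folds) if f != j for i in fold]
            yield o, j, train, val, test


splits = list(nested_cv_splits(n_samples=100, n_outer=5, n_inner=5))
print(len(splits))  # 25 (outer, inner) combinations
```

Hyperparameters would be tuned on the inner train/validation pairs only, and each outer test fold would be scored once, which is what keeps the reported performance free of leakage from model selection.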

Hardware

For maximum speed, run Octopus on a compute node with $n \times m$ CPUs for an $n \times m$ nested cross-validation, so that every combination of outer and inner fold can be processed in parallel. Octopus itself is developed on, for example, an AWS c5.9xlarge EC2 instance (36 vCPUs).
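
As a rough sizing aid for the recommendation above, the following sketch compares the number of CPUs a given nested CV layout would ideally use with the number available on the current machine. The helper names (`cpus_needed`, `fits_on_machine`) and the 5 x 5 and 7 x 6 layouts are assumptions for illustration, not Octopus settings; the figure of 36 vCPUs is the size of a c5.9xlarge instance.

```python
import os


def cpus_needed(n_outer, n_inner):
    """One CPU per (outer fold, inner fold) combination."""
    return n_outer * n_inner


def fits_on_machine(n_outer, n_inner, available=None):
    """Return True if the machine has at least n_outer * n_inner CPUs.

    If `available` is not given, the local CPU count is used.
    """
    if available is None:
        available = os.cpu_count() or 1
    return available >= cpus_needed(n_outer, n_inner)


# A 5 x 5 nested CV fits on a 36-vCPU node, a 7 x 6 layout does not.
print(cpus_needed(5, 5))                    # 25
print(fits_on_machine(5, 5, available=36))  # True
print(fits_on_machine(7, 6, available=36))  # False
```

When the layout does not fit, the run still works in principle but some folds have to wait for a free CPU, so wall-clock time increases.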

Installation

Octopus can be installed via pip or any other standard Python package manager:

# Install with recommended dependencies (includes optional packages such as AutoGluon)
pip install "octopus-automl[recommended]"

# Explicitly specify optional dependencies
pip install "octopus-automl[autogluon]"     # AutoGluon
pip install "octopus-automl[boruta]"        # Boruta feature selection
pip install "octopus-automl[sfs]"           # SequentialFeatureSelector feature selection
pip install "octopus-automl[survival]"      # Support time-to-event / survival analysis
pip install "octopus-automl[examples]"      # Dependencies for running examples

# Install with more than one extra, e.g.
pip install "octopus-automl[autogluon,examples]"

For contributors and Octopus developers, a dedicated dependency group exists. It contains code sanitization and quality tools.

pip install "octopus-automl[dev]"
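
After installing, one way to confirm that the distribution is visible to the active Python environment is to query its metadata with the standard library. This is a generic sketch, not an Octopus utility: `installed_version` is a hypothetical helper, and only the distribution name `octopus-automl` is taken from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="octopus-automl"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("octopus-automl is not installed in this environment")
else:
    print(f"octopus-automl {found} is installed")
```

Querying the distribution metadata avoids importing the package itself, so the check stays fast and works regardless of which optional extras were selected.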