Understanding the Output

After study.fit() completes, Octopus writes all results to a timestamped directory. This page explains the directory layout, what each file contains, and how to load results programmatically.

Output directory

The root output directory is created inside studies_directory (default: ./studies/) with the pattern:

{studies_directory}/{study_name}-{YYYYMMDD_HHMMSS}/

You can access the path after fitting via study.output_path.
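
Because the timestamp is part of the directory name, earlier runs of the same study can be located and ordered without opening any files. The following is a minimal sketch using only the standard library. `parse_study_dir` and `latest_run` are hypothetical helper names, not part of Octopus, and the sketch assumes the timestamp is the text after the last hyphen in the directory name.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple


def parse_study_dir(name: str) -> Tuple[str, datetime]:
    """Split '{study_name}-{YYYYMMDD_HHMMSS}' into its two parts."""
    # rsplit keeps any hyphens inside the study name intact.
    study_name, stamp = name.rsplit("-", 1)
    return study_name, datetime.strptime(stamp, "%Y%m%d_%H%M%S")


def latest_run(studies_directory: str, study_name: str) -> Optional[Path]:
    """Return the most recent output directory for study_name, or None."""
    runs = []
    for path in Path(studies_directory).iterdir():
        if not path.is_dir():
            continue
        try:
            name, created = parse_study_dir(path.name)
        except ValueError:
            continue  # directory name does not follow the pattern
        if name == study_name:
            runs.append((created, path))
    if not runs:
        return None
    # Pick the entry with the newest timestamp.
    return max(runs, key=lambda run: run[0])[1]
```

For example, `latest_run("studies", "my_study")` would return the newest `my_study-...` directory under `./studies/`, if one exists.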

Directory tree

A typical study with 5 outer splits and a two-task workflow (e.g., ROC → Octo) produces the following structure:

my_study-20260409_143000/
├── study_config.json
├── study_meta.json
├── data_raw.parquet
├── data_prepared.parquet
├── health_check_report.csv
├── study.log
│
├── outersplit0/
│   ├── split_row_ids.json
│   ├── task0/
│   │   ├── config/
│   │   │   ├── task_config.json
│   │   │   ├── feature_cols.json
│   │   │   └── feature_groups.json
│   │   └── results/
│   │       └── best/
│   │           ├── selected_features.json
│   │           ├── scores.parquet
│   │           ├── predictions.parquet
│   │           ├── feature_importances.parquet
│   │           └── model/
│   │               ├── model.joblib
│   │               └── predictor.json
│   └── task1/
│       ├── config/
│       │   └── ...
│       └── results/
│           ├── best/
│           │   └── ...
│           ├── ensemble_selection/
│           │   └── ...
│           └── optuna_results.parquet
├── outersplit1/
│   └── ...
└── outersplit4/
    └── ...
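
The layout above can be discovered programmatically by matching the `outersplitN/` and `taskN/` directory names. The sketch below is an illustration using only the standard library. `list_tasks` is a hypothetical helper, not an Octopus function. It sorts by the numeric suffix so that `outersplit10` follows `outersplit9` rather than `outersplit1`.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple


def _numbered_dirs(parent: Path, prefix: str) -> List[Tuple[int, Path]]:
    """Return (N, path) pairs for sub-directories named '<prefix><N>', sorted by N."""
    pattern = re.compile(rf"{re.escape(prefix)}(\d+)")
    found = []
    for child in parent.iterdir():
        match = pattern.fullmatch(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    return sorted(found, key=lambda item: item[0])


def list_tasks(study_dir: str) -> Dict[int, List[int]]:
    """Map each outer split index to the task indices found beneath it."""
    layout = {}
    for split_idx, split_dir in _numbered_dirs(Path(study_dir), "outersplit"):
        layout[split_idx] = [idx for idx, _ in _numbered_dirs(split_dir, "task")]
    return layout
```

For the study shown in the tree, the result would have keys 0 through 4, each mapping to `[0, 1]`.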

Study-level files

File	Description
`study_config.json`	Complete study configuration (parameters, workflow definition, feature columns, target assignments). Useful for reproducibility.
`study_meta.json`	Metadata: Octopus version, Python version, and creation timestamp.
`data_raw.parquet`	The original input DataFrame as passed to `fit()`.
`data_prepared.parquet`	Cleaned DataFrame after deduplication and the addition of internal columns (`row_id`).
`health_check_report.csv`	Data quality report with one row per issue found. See Data Health Check.
`study.log`	Full execution log.
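
The JSON and CSV files listed above can be read with the standard library alone. The sketch below is an illustration, not an Octopus API: `load_study_files` is a hypothetical helper, and it makes no assumption about the keys inside the JSON files or the column names of the health check report.

```python
import csv
import json
from pathlib import Path


def load_study_files(study_dir: str) -> dict:
    """Read the study-level JSON files and the health check report."""
    root = Path(study_dir)
    loaded = {}
    for name in ("study_config.json", "study_meta.json"):
        with open(root / name, encoding="utf-8") as fh:
            loaded[name] = json.load(fh)
    # One dict per reported issue; keys come from the CSV header row.
    with open(root / "health_check_report.csv", newline="", encoding="utf-8") as fh:
        loaded["health_check_report.csv"] = list(csv.DictReader(fh))
    return loaded
```

The two Parquet files can be opened with `pandas.read_parquet`, which requires a Parquet engine such as pyarrow to be installed.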

Outer split level

Each outersplitN/ directory corresponds to one iteration of the outer cross-validation loop.

split_row_ids.json contains the train+dev and test row IDs for this split:

{
  "row_id_col": "row_id",
  "traindev_row_ids": [0, 1, 3, 5, ...],
  "test_row_ids": [2, 4, 7, ...]
}
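
A quick sanity check on a split is to load this file and confirm that no row ID appears in both partitions. The sketch below uses the three keys shown above. `load_split_row_ids` is a hypothetical helper, and the disjointness check reflects the usual expectation for cross-validation rather than a documented Octopus guarantee.

```python
import json
from pathlib import Path
from typing import Set, Tuple


def load_split_row_ids(split_dir: str) -> Tuple[str, Set, Set]:
    """Return (row_id_col, traindev_ids, test_ids) for one outersplitN/ directory."""
    with open(Path(split_dir) / "split_row_ids.json", encoding="utf-8") as fh:
        payload = json.load(fh)
    traindev = set(payload["traindev_row_ids"])
    test = set(payload["test_row_ids"])
    # Assumption: a row should never be in both train+dev and test.
    overlap = traindev & test
    if overlap:
        raise ValueError(f"{len(overlap)} row IDs appear in both partitions")
    return payload["row_id_col"], traindev, test
```

The returned column name can then be used to select the matching rows from `data_prepared.parquet`.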

Task level

Each taskN/ directory holds the configuration and results for one workflow task.

config/

File	Description
`task_config.json`	Task configuration: module type, `task_id`, `depends_on`, description.
`feature_cols.json`	Input feature columns received by this task.
`feature_groups.json`	Correlation-based feature groups (if applicable).

results/

Results are organized by result type:

best/ — the best single-trial result (always present for Octo tasks).
ensemble_selection/ — the ensemble result when ensemble_selection=True is set on the Octo task.

Each result directory contains:

File	Description
`selected_features.json`	Features selected by this task (JSON list).
`scores.parquet`	Performance metrics across inner CV folds (columns include `outer_split_id`, `inner_split_id`, `task_id`, and one column per metric).
`predictions.parquet`	Predictions on train and test partitions. For classification: `pred_proba_0`, `pred_proba_1`, `pred_class`. For regression: `prediction`.
`feature_importances.parquet`	Feature importance scores (columns: `feature`, `importance_score`, plus metadata).
`model/model.joblib`	Serialized fitted model (a Bag of inner-split models).
`model/predictor.json`	Predictor metadata (selected features).
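
Since `selected_features.json` is a plain JSON list, feature stability across outer splits can be inspected without loading any models. The following sketch counts in how many outer splits a task selected each feature. `selection_counts` is a hypothetical helper, not part of Octopus, and it relies only on the directory layout described on this page.

```python
import json
from collections import Counter
from pathlib import Path


def selection_counts(study_dir: str, task_id: int, result_type: str = "best") -> Counter:
    """Count the outer splits in which each feature was selected by one task."""
    counts = Counter()
    pattern = f"outersplit*/task{task_id}/results/{result_type}/selected_features.json"
    for path in Path(study_dir).glob(pattern):
        with open(path, encoding="utf-8") as fh:
            # set() guards against a feature being listed twice in one file.
            counts.update(set(json.load(fh)))
    return counts
```

Calling `selection_counts(study_dir, task_id=1).most_common()` lists the features from most to least frequently selected. Pass `result_type="ensemble_selection"` to inspect the ensemble result instead.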

optuna_results.parquet (Octo tasks only) sits directly under results/ and contains all Optuna trial results: trial number, objective value, model type, and hyperparameter values.
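
To work with the Parquet files directly, one option is to glob for the same file in every outer split and concatenate the results. The sketch below is an illustration under the layout described above. `result_files` and `load_parquet_results` are hypothetical helpers, and pandas with a Parquet engine such as pyarrow is assumed to be installed.

```python
from pathlib import Path
from typing import List


def result_files(study_dir: str, task_id: int, relative: str) -> List[Path]:
    """Find one file per outer split for a task.

    `relative` is a path below results/, for example
    'optuna_results.parquet' or 'best/scores.parquet'.
    """
    pattern = f"outersplit*/task{task_id}/results/{relative}"
    return sorted(Path(study_dir).glob(pattern))


def load_parquet_results(study_dir: str, task_id: int, relative: str):
    """Concatenate the matching Parquet files into a single DataFrame."""
    import pandas as pd  # imported lazily so result_files works without pandas

    paths = result_files(study_dir, task_id, relative)
    if not paths:
        raise FileNotFoundError(f"no '{relative}' files found for task {task_id}")
    return pd.concat([pd.read_parquet(p) for p in paths], ignore_index=True)
```

Note that `sorted` orders the paths as strings, so `outersplit10` sorts before `outersplit2`. This does not affect the concatenated content, only the row order.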

Loading results programmatically

StudyDiagnostics

The simplest way to access results across all outer splits:

from octopus.diagnostics import StudyDiagnostics

diag = StudyDiagnostics("studies/my_study-20260409_143000")

# Aggregated DataFrames across all outer splits and tasks
diag.predictions       # all predictions
diag.fi                # all feature importances
diag.optuna_trials     # all Optuna trial results
diag.scores            # all performance scores

TaskPredictorTest

To load predictions and models for a specific task:

from octopus.predict import TaskPredictorTest

predictor = TaskPredictorTest(
    study_path="studies/my_study-20260409_143000",
    task_id=1,
)

See the API Reference for full details.