Skip to content

Time to Event (Survival Analysis)

Octopus supports time-to-event (survival analysis) modeling as a first-class task type. This guide covers how to set up and run a survival analysis study.

Installation note Time-to-event modeling depends on optional survival dependencies (including lifelines). Install them with:

pip install "octopus-automl[survival]"

Overview

Time-to-event analysis models the time until an event of interest occurs (e.g., disease progression, equipment failure, customer churn), while accounting for censored observations — subjects where the event has not yet been observed.

Octopus provides two gradient boosting models with Cox proportional hazards objectives:

Model Description
CatBoostCoxSurvival CatBoost with Cox loss function. Supports native categoricals.
XGBoostCoxSurvival XGBoost with Cox survival objective.

Both models output risk scores, where higher values indicate higher risk (shorter expected survival). The exact scale of these scores (e.g., log-hazard ratio vs hazard ratio) may differ by implementation but is monotonic in risk.

Data Format

Your dataset must contain:

  • Feature columns: Numeric or categorical predictors
  • Duration column: Non-negative numeric, the time to event or censoring
  • Event column: Binary (0/1 or True/False), where 1 = event observed, 0 = censored
  • Sample ID column: Sample identifier column
import pandas as pd

df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "age": [55, 62, 48, 71, 59],
    "biomarker": [1.2, 0.8, 1.5, 0.3, 1.1],
    "duration": [12.5, 8.3, 24.0, 5.1, 18.7],
    "event": [1, 1, 0, 1, 0],  # 1=event, 0=censored
})

Basic Usage

from octopus.study import OctoTimeToEvent
from octopus.modules import Octo
from octopus.types import ModelName

study = OctoTimeToEvent(
    name="my_survival_study",
    target_metric="CI",
    feature_cols=["age", "biomarker"],
    duration_col="duration",
    event_col="event",
    sample_id_col="patient_id",
    metrics=["CI"],
    path="./results",
    workflow=[
        Octo(
            task_id=0,
            depends_on=None,
            description="survival_step",
            models=[ModelName.CatBoostCoxSurvival],
            n_trials=20,
            max_features=5,
            ensemble_selection=True,
            ensel_n_save_trials=10,
        )
    ],
)

study.fit(data=df)

Available Models

CatBoostCoxSurvival

CatBoost gradient boosting with Cox proportional hazards loss. Handles categorical features natively.

Tunable hyperparameters:

Parameter Range Scale
learning_rate [0.001, 0.1] log
depth [3, 10] linear
l2_leaf_reg [2, 10] linear
random_strength [2, 10] linear
rsm [0.1, 1] linear

Fixed: iterations=500, logging_level="Silent", task_type="CPU"

XGBoostCoxSurvival

XGBoost gradient boosting with Cox partial likelihood objective.

Tunable hyperparameters:

Parameter Range Scale
learning_rate [0.0001, 0.3] log
min_child_weight [2, 15] linear
subsample [0.15, 1.0] linear
n_estimators [30, 500] linear
max_depth [3, 9] linear

Available Metrics

Metric Key Description Direction
CI Harrell's concordance index maximize
CI_UNO Uno's concordance index (IPCW-corrected) maximize

Harrell's C-index (CI) measures discrimination — how well the model ranks subjects by risk. A value of 1.0 means perfect ranking, 0.5 means random.

Uno's C-index (CI_UNO) applies Inverse Probability of Censoring Weighting to correct for bias under heavy or informative censoring.

Using Multiple Models

Octo(
    task_id=0,
    depends_on=None,
    description="compare_models",
    models=[ModelName.CatBoostCoxSurvival, ModelName.XGBoostCoxSurvival],
    n_trials=20,
    max_features=5,
    ensemble_selection=True,
    ensel_n_save_trials=10,
)

Feature Importance

The following feature importance methods are supported for T2E models via fi_methods_bestbag:

  • permutation — Permutation importance using concordance index as scoring
  • shap — SHAP-based feature importance
  • constant — Constant (baseline) feature importance

Additionally, tree-based internal feature importances are always computed automatically by the underlying models.

Octo(
    ...,
    fi_methods_bestbag=["permutation"],
)