Modules
Init modules.
AutoGluon
Placeholder for the AutoGluon module, used when AutoGluon is not installed.
Source code in octopus/modules/__init__.py
Boruta
Bases: Task
Boruta module for feature selection.
Uses the Boruta algorithm to identify all relevant features by comparing importance scores with shadow features.
Configuration
- model: Model to use for Boruta (defaults based on ml_type)
- n_inner_splits: Number of CV folds
- threshold: Percentile threshold for shadow feature comparison (0-100)
- alpha: Significance level for p-values (0-1)
Source code in octopus/modules/boruta/module.py
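The shadow-feature comparison described above can be sketched for a single round as follows. This is a minimal illustration, not the octopus implementation: it assumes that threshold is the percentile of the shadow importances a real feature must exceed (the usual Boruta formulation), and the helper names percentile and shadow_hits are hypothetical. A full Boruta run repeats this over many rounds and applies a significance test controlled by alpha, which is omitted here.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def shadow_hits(real_importances, shadow_importances, threshold=100):
    """Return the real features whose importance exceeds the shadow cutoff.

    Assumption: threshold=100 means "beat the best shadow feature".
    """
    cutoff = percentile(shadow_importances, threshold)
    return {name for name, imp in real_importances.items() if imp > cutoff}


# Made-up importances for one round.
real = {"age": 0.30, "bmi": 0.12, "noise": 0.02}
shadow = [0.01, 0.03, 0.05, 0.10]

hits_strict = shadow_hits(real, shadow, threshold=100)  # cutoff is the shadow maximum
hits_loose = shadow_hits(real, shadow, threshold=0)     # cutoff is the shadow minimum
```

Lowering threshold makes the comparison more permissive, so more features register a hit in each round.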
alpha = field(default=0.05, validator=[validators.instance_of(float), validators.gt(0), validators.lt(1)])
class-attribute
instance-attribute
Significance level used to reject the corrected p-values. Must lie strictly between 0 and 1.
model = field(default=None, converter=(lambda v: ModelName(v) if v is not None else None))
class-attribute
instance-attribute
Model used by Boruta. If None, defaults are resolved at fit time based on ml_type.
n_inner_splits = field(validator=[validators.instance_of(int)], default=5)
class-attribute
instance-attribute
Number of inner folds.
threshold = field(default=100, validator=[validators.instance_of(int), validators.ge(0), validators.le(100)])
class-attribute
instance-attribute
Percentile threshold for comparison between shadow and real features (0-100).
create_module()
Create BorutaModule execution instance.
Source code in octopus/modules/boruta/module.py
DataPartition
FIResultLabel
Bases: StrEnum
Labels used in feature-importance result DataFrames.
Every module writes a fi_method column into its result DataFrame.
Use these members as the column values so downstream code can filter
and aggregate results reliably.
Source code in octopus/types.py
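A small sketch of why enum members are useful as fi_method values. The class FILabel and its members are hypothetical stand-ins, since the actual members of FIResultLabel are not listed here; a str-based Enum is used so the sketch also runs on Python versions without StrEnum.

```python
from enum import Enum


class FILabel(str, Enum):
    """Hypothetical stand-in for FIResultLabel; member names are invented."""

    PERMUTATION = "permutation"
    SHAP = "shap"


# Rows as they might appear in a feature-importance result table.
rows = [
    {"feature": "age", "fi_method": FILabel.PERMUTATION.value, "importance": 0.30},
    {"feature": "age", "fi_method": FILabel.SHAP.value, "importance": 0.25},
    {"feature": "bmi", "fi_method": FILabel.PERMUTATION.value, "importance": 0.12},
]

# Because the enum subclasses str, members compare equal to the stored strings.
permutation_rows = [r for r in rows if r["fi_method"] == FILabel.PERMUTATION]
```

Filtering against an enum member instead of a hand-typed string avoids silent mismatches caused by spelling differences.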
ModuleExecution
Bases: ABC
Base execution class. Created on worker via config.create_module().
Source code in octopus/modules/base.py
fit(*, data_traindev, data_test, feature_cols, study_context, outer_split_id, results_dir, scratch_dir, n_assigned_cpus, feature_groups, dependency_results, **kwargs)
abstractmethod
Fit the module. Returns dict mapping ResultType to ModuleResult.
Source code in octopus/modules/base.py
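The keyword-only fit contract can be illustrated with a stand-in base class. ExecutionSketch and KeepAllFeatures are hypothetical names, the "best" key is an assumed result type, and a plain dict stands in for ModuleResult; only the parameter names are taken from the signature above.

```python
from abc import ABC, abstractmethod


class ExecutionSketch(ABC):
    """Stand-in for ModuleExecution, reduced to the fit signature."""

    @abstractmethod
    def fit(self, *, data_traindev, data_test, feature_cols, study_context,
            outer_split_id, results_dir, scratch_dir, n_assigned_cpus,
            feature_groups, dependency_results, **kwargs):
        """Return a dict mapping a result type to a result object."""


class KeepAllFeatures(ExecutionSketch):
    """Trivial module that selects every feature it is given."""

    def fit(self, *, data_traindev, data_test, feature_cols, study_context,
            outer_split_id, results_dir, scratch_dir, n_assigned_cpus,
            feature_groups, dependency_results, **kwargs):
        # A plain dict stands in for ModuleResult in this sketch.
        return {"best": {"selected_features": list(feature_cols)}}


result = KeepAllFeatures().fit(
    data_traindev=[], data_test=[], feature_cols=["age", "bmi"],
    study_context=None, outer_split_id=0, results_dir="results",
    scratch_dir="scratch", n_assigned_cpus=1, feature_groups={},
    dependency_results={},
)
```

Since every argument is keyword-only, a call that passes arguments positionally fails immediately instead of silently binding values to the wrong parameter.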
ModuleResult
Unified result container for a single result type from a module.
Carries all five artifacts (selected_features, scores, predictions, fi, model) and knows how to save and load itself. Each result_type gets its own directory on disk.
Source code in octopus/modules/result.py
load(result_dir, result_type, module)
classmethod
Load a ModuleResult from a saved directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result_dir | UPath | Directory containing saved result files | required |
| result_type | ResultType | The ResultType for this directory | required |
| module | str | Module name | required |
Returns:
| Type | Description |
|---|---|
| ModuleResult | Reconstructed ModuleResult instance |
Source code in octopus/modules/result.py
save(result_dir)
Save this result to a directory.
Stamps module and result_type columns onto the DataFrames, then writes the parquet files, selected_features.json, and, if model is not None, a model/ subdirectory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result_dir | UPath | Directory to save into (e.g. task0/best/) | required |
Source code in octopus/modules/result.py
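The one-directory-per-result-type layout can be sketched with the standard library alone. This is not ModuleResult itself: the helper functions are hypothetical, the parquet and model artifacts are left out, and storing selected_features.json as a plain JSON list is an assumption.

```python
import json
import tempfile
from pathlib import Path


def save_selected_features(result_dir, selected_features):
    """Write selected_features.json into result_dir, creating it if needed."""
    result_dir = Path(result_dir)
    result_dir.mkdir(parents=True, exist_ok=True)
    (result_dir / "selected_features.json").write_text(json.dumps(selected_features))


def load_selected_features(result_dir):
    """Read selected_features.json back from result_dir."""
    return json.loads((Path(result_dir) / "selected_features.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    # Mirrors the task0/best/ example path from the save() parameter table.
    target = Path(tmp) / "task0" / "best"
    save_selected_features(target, ["age", "bmi"])
    restored = load_selected_features(target)
```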
Mrmr
Bases: Task
MRMR module for feature selection based on mutual information and redundancy.
Uses the maximum relevance minimum redundancy algorithm to select features that are maximally relevant to the target while minimizing redundancy among selected features.
Configuration
- n_features: Number of features to select
- correlation_type: Type of correlation to measure redundancy
- relevance_type: Method to calculate relevance (MRMRRelevance.FROM_DEPENDENCY or MRMRRelevance.F_STATISTICS)
- feature_importance_type: FI aggregation type (only used with FROM_DEPENDENCY relevance)
- feature_importance_method: FI method to filter from dependency task (only used with FROM_DEPENDENCY relevance)
Source code in octopus/modules/mrmr/module.py
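The greedy selection behind maximum relevance minimum redundancy can be sketched as below. The function mrmr_select is hypothetical, the relevance and correlation values are made up, and the quotient form (relevance divided by mean absolute correlation with the already selected features) is one common variant; the scoring rule octopus actually uses is not stated here.

```python
def mrmr_select(relevance, correlation, n_features):
    """Greedy MRMR: repeatedly add the feature with the best
    relevance-to-redundancy ratio until n_features are chosen."""
    selected = []
    remaining = sorted(relevance)  # sorted for deterministic tie-breaking

    def score(name):
        if not selected:
            return relevance[name]  # first pick: relevance only
        redundancy = sum(abs(correlation[name][s]) for s in selected) / len(selected)
        return relevance[name] / max(redundancy, 1e-9)

    while remaining and len(selected) < n_features:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up inputs: "b" is almost as relevant as "a" but highly correlated with it.
relevance = {"a": 0.90, "b": 0.85, "c": 0.40}
correlation = {
    "a": {"b": 0.95, "c": 0.10},
    "b": {"a": 0.95, "c": 0.20},
    "c": {"a": 0.10, "b": 0.20},
}

top_two = mrmr_select(relevance, correlation, n_features=2)
```

In this example the less relevant but nearly independent feature "c" is chosen ahead of "b", which is the behaviour the redundancy term is meant to produce.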
correlation_type = field(converter=CorrelationType, validator=(validators.in_([CorrelationType.PEARSON, CorrelationType.SPEARMAN, CorrelationType.RDC])), default=(CorrelationType.SPEARMAN))
class-attribute
instance-attribute
Correlation measure used to quantify redundancy (Pearson, Spearman, or RDC).
feature_importance_method = field(converter=FIComputeMethod, validator=(validators.in_([FIComputeMethod.PERMUTATION, FIComputeMethod.SHAP, FIComputeMethod.LOFO])), default=(FIComputeMethod.PERMUTATION))
class-attribute
instance-attribute
FI method to filter from the dependency task's results. Only used when relevance_type is FROM_DEPENDENCY.
feature_importance_type = field(converter=MRMRFIAggregation, validator=(validators.in_(list(MRMRFIAggregation))), default=(MRMRFIAggregation.MEAN))
class-attribute
instance-attribute
FI aggregation type. Only used when relevance_type is FROM_DEPENDENCY.
n_features = field(validator=[validators.instance_of(int)], default=(Factory(lambda: 30)))
class-attribute
instance-attribute
Number of features selected by MRMR.
relevance_type = field(converter=MRMRRelevance, validator=(validators.in_(list(MRMRRelevance))), default=(MRMRRelevance.FROM_DEPENDENCY))
class-attribute
instance-attribute
Method to calculate relevance (feature importances from the dependency task, or F-statistics).
create_module()
Create MrmrModule execution instance.
ResultType
Roc
Bases: Task
ROC module for removing correlated features.
This module identifies groups of correlated features and selects the most informative feature from each group, removing the rest. Uses correlation analysis (Spearman or RDC) combined with feature relevance scoring (mutual information or F-statistics) to determine which features to keep.
Configuration
- correlation_threshold: Correlation threshold above which features are considered correlated
- correlation_type: Type of correlation measure (CorrelationType.SPEARMAN or CorrelationType.RDC)
- relevance_method: Method to select best feature in group (RelevanceMethod.MUTUAL_INFO or RelevanceMethod.F_STATISTICS)
Source code in octopus/modules/roc/module.py
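A minimal sketch of the group-then-pick idea described above. It assumes that groups are the connected components of the "correlation above threshold" graph, which the text does not confirm; the function remove_correlated and all input values are hypothetical.

```python
def remove_correlated(features, correlation, relevance, threshold=0.8):
    """Group features linked by |correlation| > threshold and keep the
    most relevant feature of each group."""
    parent = {f: f for f in features}

    def find(f):
        # Union-find root lookup with path compression.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for i, a in enumerate(features):
        for b in features[i + 1:]:
            if abs(correlation[a][b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for f in features:
        groups.setdefault(find(f), []).append(f)
    return sorted(max(members, key=lambda f: relevance[f]) for members in groups.values())


features = ["a", "b", "c", "d"]
pairs = {("a", "b"): 0.90, ("b", "c"): 0.85, ("a", "c"): 0.50,
         ("a", "d"): 0.10, ("b", "d"): 0.00, ("c", "d"): 0.20}
correlation = {f: {} for f in features}
for (x, y), value in pairs.items():  # build a symmetric lookup
    correlation[x][y] = value
    correlation[y][x] = value
relevance = {"a": 0.2, "b": 0.7, "c": 0.5, "d": 0.1}

kept = remove_correlated(features, correlation, relevance, threshold=0.8)
```

Under the connected-components assumption, "a" and "c" end up in the same group through "b" even though their direct correlation is below the threshold.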
correlation_threshold = field(validator=[validators.instance_of(float)], default=0.8)
class-attribute
instance-attribute
Correlation threshold for feature removal (features with correlation > threshold are grouped).
correlation_type = field(converter=CorrelationType, validator=(validators.in_([CorrelationType.SPEARMAN, CorrelationType.RDC])), default=(CorrelationType.SPEARMAN))
class-attribute
instance-attribute
Correlation measure used to group features (Spearman or RDC).
relevance_method = field(converter=RelevanceMethod, validator=(validators.in_([RelevanceMethod.MUTUAL_INFO, RelevanceMethod.F_STATISTICS])), default=(RelevanceMethod.F_STATISTICS))
class-attribute
instance-attribute
Method to score feature relevance within correlated groups.
create_module()
Create RocModule execution instance.
StudyContext
Immutable runtime context passed to modules during fit().
Contains only the finalized/prepared values needed by modules. No OctoStudy dependency - only attrs + upath.
Source code in octopus/modules/context.py
feature_cols
instance-attribute
Prepared feature columns (from PreparedData.feature_cols).
log_dir
instance-attribute
Directory where logs are stored.
ml_type
instance-attribute
MLType enum (e.g. MLType.BINARY, MLType.REGRESSION, MLType.TIMETOEVENT).
output_path
instance-attribute
Full output path for this study.
positive_class
instance-attribute
Positive class label for binary classification. None for regression/multiclass.
row_id_col
instance-attribute
Prepared row identifier (from PreparedData.row_id_col).
sample_id_col
instance-attribute
Identifier for sample instances.
stratification_col
instance-attribute
Column used for stratification during data splitting.
target_assignments
instance-attribute
Target column assignments (e.g. {'default': 'target'} or {'duration': ..., 'event': ...}).
target_metric
instance-attribute
Primary metric for model evaluation.
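The immutability of the context can be illustrated with a frozen dataclass. The real class is built on attrs; a standard-library dataclass is swapped in here so the sketch has no dependencies. ContextSketch is a hypothetical name, and only a subset of the attributes listed above is included.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ContextSketch:
    """Stand-in for StudyContext with a few of its documented attributes."""

    ml_type: str
    target_metric: str
    target_assignments: dict
    feature_cols: tuple
    positive_class: object = None  # None for regression/multiclass


ctx = ContextSketch(
    ml_type="regression",
    target_metric="MAE",
    target_assignments={"default": "target"},
    feature_cols=("age", "bmi"),
)

try:
    ctx.ml_type = "binary"  # any attribute assignment is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```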
Tako
Bases: Task
Tako module for feature selection and model optimization.
Uses Optuna for hyperparameter optimization with cross-validation, supporting:
- Multiple ML models
- MRMR feature selection
- Ensemble selection
- Bag-based model ensembling
Configuration
- models: List of model names to optimize
- n_inner_splits: Number of inner CV splits
- n_trials: Number of Optuna trials
- ensemble_selection: Whether to perform ensemble selection
- n_mrmr_features: Number-of-feature options for MRMR-based Optuna search
Source code in octopus/modules/tako/module.py
ensemble_selection = field(validator=[validators.in_([True, False])], default=False)
class-attribute
instance-attribute
Whether to perform ensemble selection.
fi_methods = field(default=(Factory(lambda: [FIComputeMethod.PERMUTATION])), converter=(lambda vs: [(FIComputeMethod(v)) for v in vs]), validator=(validators.deep_iterable(member_validator=(validators.in_([FIComputeMethod.PERMUTATION, FIComputeMethod.SHAP, FIComputeMethod.CONSTANT])), iterable_validator=(validators.instance_of(list)))))
class-attribute
instance-attribute
Feature importance methods for best bag.
hyperparameters = field(validator=[validators.instance_of(dict)], default=(Factory(dict)))
class-attribute
instance-attribute
User-supplied hyperparameter space. An empty dict (default) means none is supplied.
inner_split_seeds = field(default=(Factory(lambda: [0])), validator=(validators.deep_iterable(member_validator=(validators.instance_of(int)), iterable_validator=(validators.instance_of(list)))))
class-attribute
instance-attribute
List of integers used as seeds for data splitting.
max_features = field(validator=[validators.instance_of(int)], default=0)
class-attribute
instance-attribute
Maximum number of features allowed during hyperparameter optimization. The default of 0 disables the constraint.
max_outliers = field(validator=[validators.instance_of(int)], default=3)
class-attribute
instance-attribute
Maximum number of outliers, optimized by Optuna.
models = field(default=None, converter=_convert_models)
class-attribute
instance-attribute
Models for ML. If None, defaults are resolved at fit time based on ml_type.
n_ensemble_candidates = field(validator=[validators.instance_of(int), validators.ge(1)], default=50)
class-attribute
instance-attribute
Number of top-performing bags to keep as candidates for ensemble selection.
n_inner_splits = field(validator=[validators.instance_of(int)], default=5)
class-attribute
instance-attribute
Number of inner splits.
n_mrmr_features = field(validator=[validators.instance_of(list)], default=(Factory(list)))
class-attribute
instance-attribute
Number-of-feature options for MRMR pre-selection during Optuna optimization.
Each integer specifies a number of top features to pre-select via MRMR (Max-Relevance Min-Redundancy). The resulting subsets become an additional Optuna hyperparameter, so each trial may use a different subset size. The full feature set is always included as an option.
Example: [10, 20, 50] pre-computes the top-10, top-20, and top-50
MRMR features; Optuna then chooses among these three subsets plus all
features. An empty list (default) disables MRMR and uses all features
in every trial.
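How the list of sizes turns into a set of per-trial choices can be sketched as follows. The function feature_subset_options and its key names are hypothetical, and the MRMR ranking is assumed to be precomputed; in a real study each trial would pick one of these keys, for example through a categorical hyperparameter.

```python
def feature_subset_options(ranked_features, n_mrmr_features):
    """Map each requested size to the top-n MRMR features and always
    add the full feature set as a further option."""
    options = {}
    for n in n_mrmr_features:
        options[f"mrmr_top_{n}"] = ranked_features[:n]
    options["all"] = list(ranked_features)
    return options


# Assumed MRMR ranking, best feature first.
ranked = ["f0", "f1", "f2", "f3", "f4", "f5"]

with_mrmr = feature_subset_options(ranked, [2, 4])
without_mrmr = feature_subset_options(ranked, [])
```

With an empty list only the "all" option remains, which matches the documented default of using every feature in every trial.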
n_startup_trials = field(validator=[validators.instance_of(int)], default=15)
class-attribute
instance-attribute
Number of Optuna startup trials (random sampler).
n_trials = field(validator=[validators.instance_of(int)], default=(200 if not _RUNNING_IN_TESTSUITE else 3))
class-attribute
instance-attribute
Number of Optuna trials.
penalty_factor = field(validator=[validators.instance_of(float)], default=1.0)
class-attribute
instance-attribute
Penalty multiplier for the feature-count constraint in Optuna optimization.
When max_features > 0, Optuna penalises trials that use more features
than allowed:
penalty = penalty_factor * excess_features / total_features
This penalty is subtracted from the optimisation target in the same numeric
space as the target metric. The default of 1.0 works well for metrics
bounded between 0 and 1 (AUCROC, ACCBAL, R2, …). For metrics on a larger
scale (MAE, MSE, RMSE, …) the penalty becomes negligible relative to the
score and feature constraining has no effect. In that case, increase
penalty_factor to match the metric's magnitude: for example, if MAE ≈ 100,
try penalty_factor=100.0.
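The penalty formula can be worked through in a few lines. The function feature_penalty is hypothetical, and it assumes that excess_features is the number of features used beyond max_features, clamped at zero.

```python
def feature_penalty(n_used, max_features, total_features, penalty_factor=1.0):
    """penalty = penalty_factor * excess_features / total_features."""
    if max_features <= 0:
        return 0.0  # constraint is off
    # Assumption: excess is how far the trial overshoots max_features.
    excess = max(0, n_used - max_features)
    return penalty_factor * excess / total_features


# A trial using 30 of 100 features with max_features=20 overshoots by 10.
small_scale = feature_penalty(30, 20, 100)                        # 0.1
large_scale = feature_penalty(30, 20, 100, penalty_factor=100.0)  # 10.0
```

A penalty of 0.1 is substantial against an AUCROC near 0.8 but invisible against an MAE near 100, which is why the factor needs to be scaled up for such metrics.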
scoring_method = field(default=(ScoringMethod.COMBINED), converter=ScoringMethod, validator=(validators.in_(list(ScoringMethod))))
class-attribute
instance-attribute
How to calculate the bag performance for the Optuna optimization target.
create_module()
Create TakoModule execution instance.
Source code in octopus/modules/tako/module.py
Task
Bases: ABC
Base config class for all workflow tasks.
Source code in octopus/modules/base.py
module
property
Module name derived from class name.
create_module()
abstractmethod
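The split between a config object and the execution object it creates can be sketched as below. TaskSketch, Execution, and Demo are hypothetical names, and deriving the module name by lowercasing the class name is an assumption; the source only says the name is derived from the class name.

```python
from abc import ABC, abstractmethod


class TaskSketch(ABC):
    """Stand-in for Task: holds configuration, builds the execution object."""

    @property
    def module(self):
        # Assumption: the module name is the lowercased class name.
        return type(self).__name__.lower()

    @abstractmethod
    def create_module(self):
        """Return the execution instance for this config."""


class Execution:
    """Stand-in for a ModuleExecution subclass."""

    def __init__(self, config):
        self.config = config


class Demo(TaskSketch):
    def create_module(self):
        return Execution(self)


config = Demo()
execution = config.create_module()
```

Keeping the config free of runtime state means it can be passed to a worker, where create_module() builds the object that does the actual fitting.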
rdc_correlation_matrix(df)
Calculate RDC correlation matrix.