octopus.modules
Init modules.
AutoGluon
AutoGluon module placeholder when AutoGluon is not installed.
Source code in octopus/modules/__init__.py
Boruta
Bases: Task
Boruta module for feature selection.
Uses the Boruta algorithm to identify all relevant features by comparing importance scores with shadow features.
Configuration
- model: Model to use for Boruta (defaults based on ml_type)
- cv: Number of CV folds
- perc: Percentile threshold for shadow feature comparison
- alpha: Significance level for p-values
Source code in octopus/modules/boruta/module.py
alpha = field(validator=[validators.instance_of(float)], default=0.05)
class-attribute
instance-attribute
Level at which the corrected p-values will get rejected.
cv = field(validator=[validators.instance_of(int)], default=5)
class-attribute
instance-attribute
Number of folds for CV.
model = field(validator=[validators.instance_of(str)], default='')
class-attribute
instance-attribute
Model used by Boruta.
perc = field(validator=[validators.instance_of(int)], default=100)
class-attribute
instance-attribute
Percentile (threshold) for comparison between shadow and real features.
create_module()
Create BorutaModule execution instance.
Source code in octopus/modules/boruta/module.py
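The shadow-feature comparison described for Boruta can be illustrated with a minimal sketch. It is not the module's implementation: the helper names `shadow_threshold` and `boruta_hits` are hypothetical, the importance values are made up, and the nearest-rank percentile is an assumption. It covers a single comparison round only and omits the repeated rounds and the corrected p-value test governed by `alpha`.

```python
def shadow_threshold(shadow_importances, perc=100):
    """Nearest-rank percentile of the shadow importances (perc=100 is the maximum)."""
    ordered = sorted(shadow_importances)
    rank = max(1, -(-perc * len(ordered) // 100))  # ceil(perc / 100 * n)
    return ordered[rank - 1]


def boruta_hits(real_importances, shadow_importances, perc=100):
    """Mark each real feature that beats the shadow threshold in this round."""
    threshold = shadow_threshold(shadow_importances, perc)
    return {name: imp > threshold for name, imp in real_importances.items()}


# Hypothetical importances for three real features and three shuffled copies.
real = {"age": 0.40, "bmi": 0.12, "noise": 0.03}
shadow = [0.02, 0.05, 0.04]
hits = boruta_hits(real, shadow, perc=100)
```

Lowering `perc` below 100 relaxes the threshold, so more real features count as hits in each round.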
DataPartition
Efs
Bases: Task
EFS module for ensemble feature selection.
Creates multiple models on random feature subsets and uses ensemble optimization to select the best combination of models.
Configuration
- model: Model to use for EFS (defaults to CatBoost based on ml_type)
- subset_size: Number of features in each random subset
- n_subsets: Number of random subsets to create
- cv: Number of CV folds
- max_n_iterations: Maximum iterations for ensemble optimization
- max_n_models: Maximum number of models to consider
Source code in octopus/modules/efs/module.py
cv = field(validator=[validators.instance_of(int)], default=5)
class-attribute
instance-attribute
Number of CV folds for EFS.
max_n_iterations = field(validator=[validators.instance_of(int)], default=50)
class-attribute
instance-attribute
Number of iterations for ensemble optimization.
max_n_models = field(validator=[validators.instance_of(int)], default=30)
class-attribute
instance-attribute
Maximum number of models considered in the ensemble optimization (used for pruning).
model = field(validator=[validators.instance_of(str)], default='')
class-attribute
instance-attribute
Model used by EFS (empty string uses default for ml_type).
n_subsets = field(validator=[validators.instance_of(int)], default=100)
class-attribute
instance-attribute
Number of subsets.
subset_size = field(validator=[validators.instance_of(int)], default=30)
class-attribute
instance-attribute
Number of features in the subset.
create_module()
Create EfsModule execution instance.
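As a rough illustration of how `subset_size` and `n_subsets` interact, the sketch below draws random feature subsets. The function `draw_feature_subsets` and its `seed` parameter are hypothetical and not part of the Efs API; model training and the ensemble optimization step are left out.

```python
import random


def draw_feature_subsets(feature_cols, subset_size=30, n_subsets=100, seed=0):
    """Draw n_subsets random subsets, each without repeated features."""
    rng = random.Random(seed)
    # Assumption: subsets are capped at the number of available features.
    size = min(subset_size, len(feature_cols))
    return [sorted(rng.sample(feature_cols, size)) for _ in range(n_subsets)]


features = [f"f{i}" for i in range(50)]
subsets = draw_feature_subsets(features, subset_size=10, n_subsets=5)
```

One model would then be fitted per subset, and the ensemble optimization would choose among those models.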
FIResultLabel
Bases: StrEnum
Labels used in feature-importance result DataFrames.
Every module writes a fi_method column into its result DataFrame.
Use these members as the column values so downstream code can filter
and aggregate results reliably.
Source code in octopus/types.py
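The sketch below shows why string-valued enum members are convenient as `fi_method` column values. The member names are hypothetical, since the real members of FIResultLabel are not listed here, and the stand-in uses `str, Enum` so that it also runs on Python versions before 3.11, where `StrEnum` is unavailable.

```python
from enum import Enum


class FILabelSketch(str, Enum):
    """Hypothetical stand-in for FIResultLabel; member names are assumptions."""

    PERMUTATION = "permutation"
    SHAP = "shap"


# Rows as they might appear in a feature-importance result table.
rows = [
    {"feature": "age", "importance": 0.40, "fi_method": FILabelSketch.PERMUTATION.value},
    {"feature": "age", "importance": 0.35, "fi_method": FILabelSketch.SHAP.value},
    {"feature": "bmi", "importance": 0.10, "fi_method": FILabelSketch.PERMUTATION.value},
]

# Members compare equal to their string values, so filtering works on plain strings.
permutation_rows = [r for r in rows if r["fi_method"] == FILabelSketch.PERMUTATION]
```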
ModuleExecution
Bases: ABC
Base execution class. Created on worker via config.create_module().
Source code in octopus/modules/base.py
fit(*, data_traindev, data_test, feature_cols, study_context, outersplit_id, results_dir, scratch_dir, num_assigned_cpus, feature_groups, prior_results)
abstractmethod
Fit the module. Returns dict mapping ResultType to ModuleResult.
Source code in octopus/modules/base.py
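The contract of an abstract, keyword-only `fit` can be sketched as follows. `ExecutionSketch` and `KeepAllFeatures` are hypothetical classes, the signature is shortened, and a plain string key stands in for ResultType and a plain dict for ModuleResult.

```python
from abc import ABC, abstractmethod


class ExecutionSketch(ABC):
    """Stand-in for an execution base class with a keyword-only fit()."""

    @abstractmethod
    def fit(self, *, data_traindev, data_test, feature_cols, **kwargs):
        """Return a dict mapping a result type to a result object."""


class KeepAllFeatures(ExecutionSketch):
    """Trivial subclass that selects every feature it is given."""

    def fit(self, *, data_traindev, data_test, feature_cols, **kwargs):
        return {"best": {"selected_features": list(feature_cols)}}


results = KeepAllFeatures().fit(data_traindev=[], data_test=[], feature_cols=["a", "b"])
```

Because `fit` is abstract, the base class cannot be instantiated directly; only subclasses that implement `fit` can.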
ModuleResult
Unified result container for a single result type from a module.
Carries all 5 artifacts (selected_features, scores, predictions, feature_importances, model) and knows how to save/load itself. Each result_type gets its own directory on disk.
Source code in octopus/modules/result.py
load(result_dir, result_type, module)
classmethod
Load a ModuleResult from a saved directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result_dir | UPath | Directory containing saved result files | required |
| result_type | ResultType | The ResultType for this directory | required |
| module | str | Module name | required |
Returns:
| Type | Description |
|---|---|
| ModuleResult | Reconstructed ModuleResult instance |
Source code in octopus/modules/result.py
save(result_dir)
Save this result to a directory.
Stamps module + result_type columns on DataFrames, saves parquets, selected_features.json, and model/ subdirectory if model is not None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result_dir | UPath | Directory to save into (e.g. task0/best/) | required |
Source code in octopus/modules/result.py
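The save/load round trip with one directory per result type can be sketched with the standard library alone. This is a simplified stand-in, not the ModuleResult code: JSON replaces the parquet files, `scores.json` is a hypothetical file name, and `save_result` and `load_result` are illustrative helpers.

```python
import json
import tempfile
from pathlib import Path


def save_result(result_dir, module, result_type, selected_features, scores):
    """Write one result type into its own directory."""
    result_dir = Path(result_dir)
    result_dir.mkdir(parents=True, exist_ok=True)
    # Stamp module and result_type on every row, as save() is described to do.
    stamped = [{**row, "module": module, "result_type": result_type} for row in scores]
    (result_dir / "selected_features.json").write_text(json.dumps(selected_features))
    (result_dir / "scores.json").write_text(json.dumps(stamped))


def load_result(result_dir):
    """Rebuild the saved pieces from a result directory."""
    result_dir = Path(result_dir)
    return {
        "selected_features": json.loads((result_dir / "selected_features.json").read_text()),
        "scores": json.loads((result_dir / "scores.json").read_text()),
    }


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "task0" / "best"
    save_result(target, "octo", "best", ["age", "bmi"], [{"metric": "AUCROC", "value": 0.81}])
    loaded = load_result(target)
```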
Mrmr
Bases: Task
MRMR module for feature selection based on mutual information and redundancy.
Uses the maximum relevance minimum redundancy algorithm to select features that are maximally relevant to the target while minimizing redundancy among selected features.
Configuration
- n_features: Number of features to select
- correlation_type: Type of correlation to measure redundancy
- relevance_type: Method to calculate relevance (MRMRRelevance.PERMUTATION or MRMRRelevance.INTERNAL)
- results_module: Module name to filter prior results' feature importances (for permutation relevance)
- feature_importance_type: Type of FI aggregation (MRMRFIAggregation.MEAN or MRMRFIAggregation.COUNT)
- feature_importance_method: FI calculation method (FIComputeMethod.PERMUTATION, FIComputeMethod.SHAP, FIComputeMethod.INTERNAL, FIComputeMethod.LOFO)
Source code in octopus/modules/mrmr/module.py
correlation_type = field(converter=CorrelationType, validator=(validators.in_([CorrelationType.PEARSON, CorrelationType.SPEARMAN, CorrelationType.RDC])), default=(CorrelationType.SPEARMAN))
class-attribute
instance-attribute
Selection of correlation type.
feature_importance_method = field(converter=FIComputeMethod, validator=(validators.in_([FIComputeMethod.PERMUTATION, FIComputeMethod.SHAP, FIComputeMethod.INTERNAL, FIComputeMethod.LOFO])), default=(FIComputeMethod.PERMUTATION))
class-attribute
instance-attribute
Selection of feature importance method.
feature_importance_type = field(converter=MRMRFIAggregation, validator=(validators.in_(list(MRMRFIAggregation))), default=(MRMRFIAggregation.MEAN))
class-attribute
instance-attribute
Selection of feature importance type.
n_features = field(validator=[validators.instance_of(int)], default=(Factory(lambda: 30)))
class-attribute
instance-attribute
Number of features selected by MRMR.
relevance_type = field(converter=MRMRRelevance, validator=(validators.in_(list(MRMRRelevance))), default=(MRMRRelevance.PERMUTATION))
class-attribute
instance-attribute
Selection of relevance measure.
results_module = field(validator=(validators.instance_of(str)), default='octo')
class-attribute
instance-attribute
Module name from which feature importances were created.
create_module()
Create MrmrModule execution instance.
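A greedy maximum-relevance, minimum-redundancy selection can be sketched as below. It uses the generic difference criterion (relevance minus mean absolute correlation with the features already selected), which is an assumption and may differ from the criterion the module applies. The function name `mrmr_select` and the toy values are hypothetical.

```python
def mrmr_select(relevance, pair_corr, n_features):
    """Greedily pick features with high relevance and low redundancy."""

    def redundancy(candidate, selected):
        if not selected:
            return 0.0
        total = sum(
            abs(pair_corr.get((candidate, s), pair_corr.get((s, candidate), 0.0)))
            for s in selected
        )
        return total / len(selected)

    selected = []
    remaining = sorted(relevance)
    while remaining and len(selected) < n_features:
        best = max(remaining, key=lambda f: relevance[f] - redundancy(f, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy inputs: "b" is nearly as relevant as "a" but highly correlated with it.
relevance = {"a": 0.90, "b": 0.85, "c": 0.50}
pair_corr = {("a", "b"): 0.95, ("a", "c"): 0.10, ("b", "c"): 0.20}
chosen = mrmr_select(relevance, pair_corr, n_features=2)
```

In this toy example the less relevant but non-redundant feature "c" is preferred over "b" as the second pick.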
Octo
Bases: Task
Octo module for feature selection and model optimization.
Uses Optuna for hyperparameter optimization with cross-validation, supporting:
- Multiple ML models
- MRMR feature selection
- Ensemble selection
- Bag-based model ensembling
Configuration
- models: List of model names to optimize
- n_folds_inner: Number of inner CV folds
- n_trials: Number of Optuna trials
- ensemble_selection: Whether to perform ensemble selection
- mrmr_feature_numbers: Feature counts for MRMR feature selection
Source code in octopus/modules/octo/module.py
datasplit_seeds_inner = field(default=(Factory(lambda: [0])), validator=(validators.deep_iterable(member_validator=(validators.instance_of(int)), iterable_validator=(validators.instance_of(list)))))
class-attribute
instance-attribute
List of integers used as seeds for data splitting.
ensel_n_save_trials = field(validator=[validators.instance_of(int)], default=50)
class-attribute
instance-attribute
Number of top trials to be saved for ensemble selection (bags).
ensemble_selection = field(validator=[validators.in_([True, False])], default=False)
class-attribute
instance-attribute
Whether to perform ensemble selection.
fi_methods_bestbag = field(default=(Factory(lambda: [FIComputeMethod.PERMUTATION])), converter=(lambda vs: [(FIComputeMethod(v)) for v in vs]), validator=(validators.deep_iterable(member_validator=(validators.in_([FIComputeMethod.PERMUTATION, FIComputeMethod.SHAP, FIComputeMethod.CONSTANT])), iterable_validator=(validators.instance_of(list)))))
class-attribute
instance-attribute
Feature importance methods for best bag.
hyperparameters = field(validator=[validators.instance_of(dict)], default=(Factory(dict)))
class-attribute
instance-attribute
User-supplied hyperparameter search space.
inner_parallelization = field(validator=[validators.instance_of(bool)], default=True)
class-attribute
instance-attribute
Enable inner parallelization. Default is True.
max_features = field(validator=[validators.instance_of(int)], default=0)
class-attribute
instance-attribute
Maximum features to constrain hyperparameter optimization. Default is zero (off).
max_outl = field(validator=[validators.instance_of(int)], default=3)
class-attribute
instance-attribute
Maximum number of outliers, optimized by Optuna.
model_seed = field(validator=[validators.instance_of(int)], default=0)
class-attribute
instance-attribute
Model seed.
models = field(default=None, converter=_convert_models)
class-attribute
instance-attribute
Models for ML. If None, defaults are resolved at fit time based on ml_type.
mrmr_feature_numbers = field(validator=[validators.instance_of(list)], default=(Factory(list)))
class-attribute
instance-attribute
List of feature counts to be investigated by MRMR.
n_folds_inner = field(validator=[validators.instance_of(int)], default=5)
class-attribute
instance-attribute
Number of inner folds.
n_jobs = field(validator=[validators.instance_of(int)], default=1)
class-attribute
instance-attribute
Number of CPUs used for each model training.
n_optuna_startup_trials = field(validator=[validators.instance_of(int)], default=15)
class-attribute
instance-attribute
Number of Optuna startup trials (random sampler).
n_trials = field(validator=[validators.instance_of(int)], default=(200 if not _RUNNING_IN_TESTSUITE else 3))
class-attribute
instance-attribute
Number of Optuna trials.
n_workers = field(default=None)
class-attribute
instance-attribute
Number of workers.
optuna_return = field(default=(OptunaReturnType.POOL), converter=OptunaReturnType, validator=(validators.in_(list(OptunaReturnType))))
class-attribute
instance-attribute
How to calculate the bag performance for the optuna optimization target.
optuna_seed = field(validator=[validators.instance_of(int)], default=0)
class-attribute
instance-attribute
Seed for the Optuna TPESampler. Default is 0.
penalty_factor = field(validator=[validators.instance_of(float)], default=1.0)
class-attribute
instance-attribute
Factor to penalize optuna target related to feature constraint.
create_module()
Create OctoModule execution instance.
Source code in octopus/modules/octo/module.py
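How `n_folds_inner` and `datasplit_seeds_inner` might combine into inner train/dev splits is sketched below. The assumption that each seed yields one shuffled K-fold partition is not confirmed by this reference, and `inner_splits` is a hypothetical helper, not an Octo function.

```python
import random


def inner_splits(row_ids, n_folds_inner=5, datasplit_seeds_inner=(0,)):
    """Build one shuffled K-fold partition per seed (assumed behaviour)."""
    splits = []
    for seed in datasplit_seeds_inner:
        shuffled = list(row_ids)
        random.Random(seed).shuffle(shuffled)
        # Deal the shuffled rows into n_folds_inner folds.
        folds = [shuffled[i::n_folds_inner] for i in range(n_folds_inner)]
        for i, dev in enumerate(folds):
            train = [r for j, fold in enumerate(folds) if j != i for r in fold]
            splits.append({"seed": seed, "train": train, "dev": dev})
    return splits


splits = inner_splits(range(10), n_folds_inner=5, datasplit_seeds_inner=[0, 1])
```

Under this assumption, two seeds and five folds give ten train/dev pairs.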
ResultType
Rfe
Bases: Task
RFE module for recursive feature elimination.
Uses sklearn's RFECV with hyperparameter optimization to recursively eliminate features based on feature importances.
Configuration
- model: Model to use for RFE (defaults to CatBoost based on ml_type)
- step: Number of features to remove at each iteration
- min_features_to_select: Minimum number of features to keep
- cv: Number of CV folds for RFECV
- mode: RFEMode.FIXED (use optimized model) or RFEMode.REFIT (reoptimize at each step)
Source code in octopus/modules/rfe/module.py
cv = field(validator=[validators.instance_of(int)], default=5)
class-attribute
instance-attribute
Number of CV folds for RFECV.
min_features_to_select = field(validator=[validators.instance_of(int)], default=1)
class-attribute
instance-attribute
Minimum number of features to be selected.
mode = field(converter=RFEMode, validator=(validators.in_(list(RFEMode))), default=(RFEMode.FIXED))
class-attribute
instance-attribute
Mode used by RFE: fixed=optimized model, refit=reoptimize each step.
model = field(validator=[validators.instance_of(str)], default='')
class-attribute
instance-attribute
Model used by RFE (empty string uses default for ml_type).
step = field(validator=[validators.instance_of(int)], default=1)
class-attribute
instance-attribute
Number of features to remove at each iteration.
create_module()
Create RfeModule execution instance.
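The effect of `step` and `min_features_to_select` on the elimination loop can be shown with a small sketch. It is a simplification that only produces the sequence of candidate feature sets; the cross-validated choice among them, as done by RFECV, is omitted. `recursive_elimination` and the fixed toy importances are hypothetical.

```python
def recursive_elimination(features, importance_fn, step=1, min_features_to_select=1):
    """Repeatedly drop the `step` least important features."""
    current = list(features)
    history = [list(current)]
    while len(current) > min_features_to_select:
        importances = importance_fn(current)
        # Never go below the minimum number of features.
        n_remove = min(step, len(current) - min_features_to_select)
        weakest = sorted(current, key=lambda f: importances[f])[:n_remove]
        current = [f for f in current if f not in weakest]
        history.append(list(current))
    return history


# Toy importances that do not change between iterations.
fixed = {"age": 0.4, "bmi": 0.3, "height": 0.2, "noise": 0.1}
history = recursive_elimination(
    list(fixed),
    lambda feats: {f: fixed[f] for f in feats},
    step=2,
    min_features_to_select=1,
)
```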
Rfe2
Bases: Octo
Rfe2 module for recursive feature elimination with Octo optimization.
Extends Octo to add RFE functionality. First runs Octo optimization to get a best bag, then iteratively removes features based on feature importances.
Configuration
(inherits all Octo configuration)
- min_features_to_select: Minimum number of features to keep
- fi_method_rfe: Feature importance method for RFE
- selection_method: Method to select best solution (best or parsimonious)
- abs_on_fi: Convert negative feature importances to positive
Source code in octopus/modules/rfe2/module.py
abs_on_fi = field(validator=[validators.instance_of(bool)], default=False)
class-attribute
instance-attribute
Convert negative feature importances to positive (abs()).
fi_method_rfe = field(converter=FIComputeMethod, validator=(validators.in_([FIComputeMethod.PERMUTATION, FIComputeMethod.SHAP])), default=(FIComputeMethod.PERMUTATION))
class-attribute
instance-attribute
Feature importance method for RFE.
min_features_to_select = field(validator=[validators.instance_of(int)], default=1)
class-attribute
instance-attribute
Minimum number of features to be selected.
selection_method = field(converter=RFE2SelectionMethod, validator=(validators.in_(list(RFE2SelectionMethod))), default=(RFE2SelectionMethod.BEST))
class-attribute
instance-attribute
Method to select the best solution. Parsimonious selects the smallest solution within the SEM (standard error of the mean).
create_module()
Create Rfe2Module execution instance.
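The difference between the two selection methods can be sketched as follows. The sketch assumes that higher scores are better and that "within sem" means within one standard error of the best mean score; neither is confirmed here. `pick_solution` and the result rows are hypothetical.

```python
def pick_solution(results, selection_method="best"):
    """Choose among RFE steps by score, or by size within the SEM band."""
    best = max(results, key=lambda r: r["mean"])
    if selection_method == "best":
        return best
    # Parsimonious: smallest feature set whose mean is within one SEM of the best.
    cutoff = best["mean"] - best["sem"]
    candidates = [r for r in results if r["mean"] >= cutoff]
    return min(candidates, key=lambda r: r["n_features"])


results = [
    {"n_features": 20, "mean": 0.84, "sem": 0.02},
    {"n_features": 10, "mean": 0.83, "sem": 0.02},
    {"n_features": 5, "mean": 0.80, "sem": 0.03},
]
best = pick_solution(results, "best")
parsimonious = pick_solution(results, "parsimonious")
```

Here the parsimonious choice trades a small drop in mean score for half as many features.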
Roc
Bases: Task
ROC module for removing correlated features.
This module identifies groups of correlated features and selects the most informative feature from each group, removing the rest. Uses correlation analysis (Spearman or RDC) combined with feature filtering (mutual information or F-statistics) to determine which features to keep.
Configuration
- threshold: Correlation threshold above which features are considered correlated
- correlation_type: Type of correlation measure (CorrelationType.SPEARMAN or CorrelationType.RDC)
- filter_type: Method to select best feature in group (ROCFilterMethod.MUTUAL_INFO or ROCFilterMethod.F_STATISTICS)
Source code in octopus/modules/roc/module.py
correlation_type = field(converter=CorrelationType, validator=(validators.in_([CorrelationType.SPEARMAN, CorrelationType.RDC])), default=(CorrelationType.SPEARMAN))
class-attribute
instance-attribute
Selection of correlation type.
filter_type = field(converter=ROCFilterMethod, validator=(validators.in_([ROCFilterMethod.MUTUAL_INFO, ROCFilterMethod.F_STATISTICS])), default=(ROCFilterMethod.F_STATISTICS))
class-attribute
instance-attribute
Selection of filter type for correlated features.
threshold = field(validator=[validators.instance_of(float)], default=0.8)
class-attribute
instance-attribute
Threshold for feature removal (features with correlation > threshold are grouped).
create_module()
Create RocModule execution instance.
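Grouping correlated features and keeping one representative per group can be sketched as below. The transitive grouping (a chain of pairwise correlations above the threshold forms one group) is an assumption, and `remove_correlated`, the pairwise correlations, and the relevance scores are made up for illustration. In the module, relevance would come from mutual information or F-statistics.

```python
def remove_correlated(features, pair_corr, relevance, threshold=0.8):
    """Keep the most relevant feature from each group of correlated features."""
    parent = {f: f for f in features}

    def find(f):
        # Follow parent links up to the group's root.
        while parent[f] != f:
            f = parent[f]
        return f

    for i, a in enumerate(features):
        for b in features[i + 1:]:
            corr = pair_corr.get((a, b), pair_corr.get((b, a), 0.0))
            if abs(corr) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for f in features:
        groups.setdefault(find(f), []).append(f)
    return sorted(max(group, key=lambda f: relevance[f]) for group in groups.values())


features = ["a", "b", "c", "d"]
pair_corr = {("a", "b"): 0.90, ("b", "c"): 0.85, ("c", "d"): 0.10}
relevance = {"a": 0.2, "b": 0.5, "c": 0.3, "d": 0.4}
kept = remove_correlated(features, pair_corr, relevance, threshold=0.8)
```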
Sfs
Bases: Task
SFS module for sequential feature selection.
Uses sequential feature selection (forward, backward, or floating variants) to find the optimal feature subset.
Configuration
- model: Model to use for SFS (defaults based on ml_type)
- cv: Number of CV folds
- sfs_type: Type of SFS (forward, backward, floating_forward, floating_backward)
Source code in octopus/modules/sfs/module.py
cv = field(validator=[validators.instance_of(int)], default=5)
class-attribute
instance-attribute
Number of CV folds for SFS.
model = field(validator=[validators.instance_of(str)], default='')
class-attribute
instance-attribute
Model used by SFS.
sfs_type = field(converter=SFSDirection, validator=(validators.in_(list(SFSDirection))), default=(SFSDirection.BACKWARD))
class-attribute
instance-attribute
SFS type used.
create_module()
Create SfsModule execution instance.
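Backward selection, the default `sfs_type`, can be sketched in a few lines. This is a generic illustration and not the module's code: `backward_selection`, its `n_keep` parameter, and the additive toy score are hypothetical, and in practice the score would be a cross-validated model metric.

```python
def backward_selection(features, score_fn, n_keep):
    """Drop one feature at a time, keeping the best-scoring remainder."""
    current = list(features)
    while len(current) > n_keep:
        # Each candidate is the current set with exactly one feature removed.
        candidates = [[f for f in current if f != drop] for drop in current]
        current = max(candidates, key=score_fn)
    return current


# Toy score: each feature contributes a fixed amount; "noise" hurts.
weights = {"age": 3, "bmi": 2, "height": 1, "noise": -1}
selected = backward_selection(
    list(weights), lambda feats: sum(weights[f] for f in feats), n_keep=2
)
```

Forward selection runs the same loop in the opposite direction, starting from an empty set and adding the feature that improves the score most.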
StudyContext
Immutable runtime context passed to modules during fit().
Contains only the finalized/prepared values needed by modules. No OctoStudy dependency - only attrs + upath.
Source code in octopus/modules/context.py
feature_cols
instance-attribute
Prepared feature columns (from PreparedData.feature_cols).
log_dir
instance-attribute
Directory where logs are stored.
ml_type
instance-attribute
MLType enum (e.g. MLType.BINARY, MLType.REGRESSION, MLType.TIMETOEVENT).
output_path
instance-attribute
Full output path for this study.
positive_class
instance-attribute
Positive class label for binary classification. None for regression/multiclass.
row_id_col
instance-attribute
Prepared row identifier (from PreparedData.row_id_col).
sample_id_col
instance-attribute
Identifier for sample instances.
stratification_col
instance-attribute
Column used for stratification during data splitting.
target_assignments
instance-attribute
Target column assignments (e.g. {'default': 'target'} or {'duration': ..., 'event': ...}).
target_metric
instance-attribute
Primary metric for model evaluation.
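An immutable context object can be sketched with a frozen dataclass. The real StudyContext is described as attrs-based, so the standard-library dataclass is only a stand-in. `ContextSketch` carries a subset of the attributes listed above, and the field types and example values are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ContextSketch:
    """Read-only bundle of prepared values passed to fit()."""

    ml_type: str
    target_metric: str
    feature_cols: tuple
    target_assignments: dict
    positive_class: object = None


ctx = ContextSketch(
    ml_type="binary",
    target_metric="AUCROC",
    feature_cols=("age", "bmi"),
    target_assignments={"default": "target"},
    positive_class=1,
)

# Assigning to a field of a frozen instance raises instead of mutating it.
try:
    ctx.target_metric = "ACC"
    mutated = True
except FrozenInstanceError:
    mutated = False
```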
Task
Bases: ABC
Base config class for all workflow tasks.
Source code in octopus/modules/base.py
module
property
Module name derived from class name.
create_module()
abstractmethod
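A config base class with a name property and an abstract factory method might look like the sketch below. The rule used for `module` (lowercased class name) is a guess suggested by the 'octo' default of `results_module`; the actual derivation is not stated here. `TaskSketch` and its subclass are hypothetical.

```python
from abc import ABC, abstractmethod


class TaskSketch(ABC):
    """Stand-in for a config base class."""

    @property
    def module(self):
        # Assumption: the module name is the lowercased class name.
        return type(self).__name__.lower()

    @abstractmethod
    def create_module(self):
        """Return the execution object for this config."""


class Octo(TaskSketch):
    """Hypothetical concrete config used only for this illustration."""

    def create_module(self):
        return f"execution instance for {self.module}"


task = Octo()
description = task.create_module()
```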
rdc_correlation_matrix(df)
Calculate RDC correlation matrix.