Adding Analyses and Artifacts

This page describes the current extension path implemented in mdfactory.analysis.

Current registry shape

Two registries drive execution:

mdfactory.analysis.simulation.ANALYSIS_REGISTRY
mdfactory.analysis.artifacts.ARTIFACT_REGISTRY

Both registries are keyed by simulation_type. In the current codebase, entries exist for bilayer and mixedbox.
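
As a rough mental model, and assuming both registries are plain nested dictionaries (which the registration pattern used on this page suggests), resolution is a two-step lookup. The dictionary and the placeholder function below are stand-ins for illustration, not the real mdfactory objects.

```python
from typing import Callable

# Stand-in for mdfactory.analysis.simulation.ANALYSIS_REGISTRY.
# Assumed shape: simulation_type -> {analysis name -> callable}.
ANALYSIS_REGISTRY: dict[str, dict[str, Callable]] = {
    "bilayer": {},
    "mixedbox": {},
}


def placeholder_analysis(simulation, **kwargs):
    """Hypothetical analysis used only to populate the stand-in registry."""
    return None


ANALYSIS_REGISTRY["bilayer"]["placeholder_analysis"] = placeholder_analysis

# Resolution is a two-step lookup: simulation type first, then name.
resolved = ANALYSIS_REGISTRY["bilayer"]["placeholder_analysis"]

# An entry registered for one simulation type is not visible under another.
available_for_mixedbox = sorted(ANALYSIS_REGISTRY["mixedbox"])
```

Under this assumption, an analysis that should be available for both bilayer and mixedbox would need to be registered under each key separately.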

Add a new analysis

An analysis function should accept a Simulation object as its first argument and return a pandas.DataFrame.

Minimal shape:

import pandas as pd

def my_analysis(simulation, **kwargs) -> pd.DataFrame:
    ...  # build the result DataFrame from the simulation
    return df

Then:

1. Implement the function in the appropriate analysis module.
2. Register it in ANALYSIS_REGISTRY[simulation_type].
3. Add tests for direct execution and saved output.

Example registration pattern:

ANALYSIS_REGISTRY["bilayer"]["my_analysis"] = my_analysis

What the framework does for you

When Simulation.run_analysis("my_analysis") is called, MDFactory:

1. Resolves the function from ANALYSIS_REGISTRY.
2. Filters unsupported keyword arguments based on the function signature.
3. Executes the function.
4. Writes .analysis/my_analysis.parquet.
5. Updates .analysis/metadata.json through AnalysisRegistry.

Any keyword arguments that survive filtering are stored under extras in the entry that AnalysisRegistry records for the analysis.
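
The filtering step can be approximated with the standard library's inspect module. This is a minimal sketch, not the mdfactory implementation: filter_kwargs and windowed_analysis are hypothetical names, and passing everything through when a function declares **kwargs is an assumption, since this page does not say how that case is handled.

```python
import inspect


def filter_kwargs(func, kwargs):
    """Keep only the keyword arguments that func can accept.

    Illustrative helper, not the mdfactory implementation.
    """
    params = inspect.signature(func).parameters
    # Assumption: a **kwargs parameter means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    # Skip the first parameter, which is reserved for the Simulation object.
    accepted = {
        name
        for name, p in list(params.items())[1:]
        if p.kind in keyword_kinds
    }
    return {k: v for k, v in kwargs.items() if k in accepted}


def windowed_analysis(simulation, start_ns=0.0, stride=1):
    """Hypothetical analysis that declares only two of the shared options."""
    return {"start_ns": start_ns, "stride": stride}


requested = {"start_ns": 50.0, "stride": 10, "max_residues": 200}
extras = filter_kwargs(windowed_analysis, requested)
result = windowed_analysis(None, **extras)
```

Here max_residues is dropped before the call, so extras holds only start_ns and stride, which are the values that would end up recorded alongside the result.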

Add a new artifact

An artifact producer should accept a Simulation object as its first argument and return either:

Path
list[Path]

Minimal shape:

from pathlib import Path

def my_artifact(simulation, **kwargs) -> Path | list[Path]:
    ...  # write the artifact file(s) to disk
    return output_path

Then:

1. Implement the producer.
2. Register it in ARTIFACT_REGISTRY[simulation_type].
3. Add tests for file creation and registry metadata.

Example registration pattern:

ARTIFACT_REGISTRY["bilayer"]["my_artifact"] = my_artifact
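
A concrete producer might look like the sketch below. It is illustrative only: summary_artifact is a hypothetical name, the simulation.path attribute is an assumption about the Simulation object, and a SimpleNamespace stands in for a real Simulation so the example runs on its own.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def summary_artifact(simulation, **kwargs) -> Path:
    """Hypothetical producer: write a small text summary and return its path."""
    # `simulation.path` is an assumed attribute, used here for illustration only.
    sim_dir = Path(simulation.path)
    output_path = sim_dir / "summary.txt"
    output_path.write_text(f"summary for {sim_dir.name}\n")
    return output_path


# Stand-in for a Simulation object, backed by a temporary directory.
workdir = tempfile.mkdtemp()
fake_simulation = SimpleNamespace(path=workdir)
produced = summary_artifact(fake_simulation)
```

The producer only has to create the file and return where it is; a producer that generates several files would return a list of paths instead.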

What the framework does for you

When Simulation.run_artifact("my_artifact") is called, MDFactory:

1. Resolves the producer from ARTIFACT_REGISTRY.
2. Executes it.
3. Moves the returned files into .analysis/artifacts/my_artifact/.
4. Computes checksums.
5. Records file metadata in .analysis/metadata.json.
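
The move, checksum, and record steps can be pictured with the standard library alone. The sketch below is not the mdfactory code: store_artifact is a hypothetical helper, SHA-256 is an assumed checksum algorithm, and the metadata keys (artifacts, file, sha256, size) are invented for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def store_artifact(sim_dir: Path, name: str, files: list[Path]) -> dict:
    """Move files under .analysis/artifacts/<name>/ and record them.

    Illustrative only; the checksum algorithm and metadata keys are assumptions.
    """
    target_dir = sim_dir / ".analysis" / "artifacts" / name
    target_dir.mkdir(parents=True, exist_ok=True)

    records = []
    for src in files:
        dest = target_dir / src.name
        shutil.move(str(src), str(dest))
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        records.append(
            {"file": dest.name, "sha256": digest, "size": dest.stat().st_size}
        )

    # Merge into any existing metadata instead of overwriting it.
    metadata_path = sim_dir / ".analysis" / "metadata.json"
    metadata = json.loads(metadata_path.read_text()) if metadata_path.exists() else {}
    metadata.setdefault("artifacts", {})[name] = records
    metadata_path.write_text(json.dumps(metadata, indent=2))
    return metadata


sim_dir = Path(tempfile.mkdtemp())
source = sim_dir / "plot.txt"
source.write_bytes(b"demo\n")
metadata = store_artifact(sim_dir, "my_artifact", [source])
```

Because the returned files are moved rather than copied, a producer should not expect its output to remain at the path it originally wrote to.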

Good test targets

Use the existing tests as patterns:

mdfactory/tests/test_simulation.py
mdfactory/tests/test_analysis_registry.py
mdfactory/tests/test_submit.py

Notes on CLI integration

The CLI forwards a shared set of analysis parameters, such as start_ns, end_ns, last_ns, stride, and max_residues, to every analysis. An analysis that does not declare one of these arguments never receives it, because Simulation.run_analysis() filters unsupported kwargs before calling the function.
