Adding Analyses and Artifacts
Extend the analysis registry with code paths that exist today
This page describes the current extension path implemented in mdfactory.analysis.
Current registry shape
Two registries drive execution:
- mdfactory.analysis.simulation.ANALYSIS_REGISTRY
- mdfactory.analysis.artifacts.ARTIFACT_REGISTRY
Both registries are keyed by simulation_type. In the current codebase, entries exist for bilayer and mixedbox.
Add a new analysis
An analysis function should accept a Simulation object as its first argument and return a pandas.DataFrame.
Minimal shape:
def my_analysis(simulation, **kwargs) -> pd.DataFrame:
    ...
    return df

Then:
- Implement the function in the appropriate analysis module.
- Register it in ANALYSIS_REGISTRY[simulation_type].
- Add tests for direct execution and saved output.
Example registration pattern:
ANALYSIS_REGISTRY["bilayer"]["my_analysis"] = my_analysis

What the framework does for you
When Simulation.run_analysis("my_analysis") is called, MDFactory:
- Resolves the function from ANALYSIS_REGISTRY
- Filters unsupported keyword arguments based on the function signature
- Executes the function
- Writes .analysis/my_analysis.parquet
- Updates .analysis/metadata.json through AnalysisRegistry
Any keyword arguments that survive filtering are stored in the registry entry under extras.
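The resolve, filter, and execute steps above can be sketched with the standard library alone. This is an illustrative sketch, not MDFactory's implementation: REGISTRY, filter_kwargs, and run_analysis are hypothetical stand-ins, the toy analysis returns plain rows instead of a pandas.DataFrame, and the Parquet and metadata.json writes are left out. It also assumes that a function declaring a **kwargs catch-all receives every argument unfiltered.

```python
import inspect

# Hypothetical stand-in for ANALYSIS_REGISTRY (keyed by simulation_type).
REGISTRY = {"bilayer": {}}


def my_analysis(simulation, start_ns=0.0, stride=1):
    """Toy analysis: returns plain rows rather than a pandas.DataFrame."""
    return [{"start_ns": start_ns, "stride": stride}]


REGISTRY["bilayer"]["my_analysis"] = my_analysis


def filter_kwargs(func, kwargs):
    """Keep only the keyword arguments that func declares in its signature."""
    params = inspect.signature(func).parameters
    # Assumption: a **kwargs catch-all means nothing needs to be dropped.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    # Skip the first parameter, which is the simulation object itself.
    declared = list(params)[1:]
    return {k: v for k, v in kwargs.items() if k in declared}


def run_analysis(simulation, simulation_type, name, **kwargs):
    """Resolve, filter, and execute; return the result and a registry-style entry."""
    func = REGISTRY[simulation_type][name]
    accepted = filter_kwargs(func, kwargs)
    result = func(simulation, **accepted)
    # The surviving keyword arguments are what would be stored under "extras".
    entry = {"name": name, "extras": accepted}
    return result, entry


# max_residues is not declared by my_analysis, so it is dropped before the call.
result, entry = run_analysis(
    None, "bilayer", "my_analysis", start_ns=10.0, max_residues=50
)
```

Filtering on the declared signature lets one caller pass the same set of parameters to every analysis without each function having to accept arguments it does not use.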
Add a new artifact
An artifact producer should accept a Simulation object as its first argument and return either:
- Path
- list[Path]
Minimal shape:
def my_artifact(simulation, **kwargs) -> Path | list[Path]:
    ...
    return output_path

Then:
- Implement the producer.
- Register it in ARTIFACT_REGISTRY[simulation_type].
- Add tests for file creation and registry metadata.
Example registration pattern:
ARTIFACT_REGISTRY["bilayer"]["my_artifact"] = my_artifact

What the framework does for you
When Simulation.run_artifact("my_artifact") is called, MDFactory:
- Resolves the producer from ARTIFACT_REGISTRY
- Executes it
- Moves the returned files into .analysis/artifacts/my_artifact/
- Computes checksums
- Records file metadata in .analysis/metadata.json
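The artifact steps above can be sketched as follows, to show what a producer's return value has to support. This is a rough sketch under stated assumptions, not MDFactory's code: collect_artifact is a hypothetical helper, the simulation argument is stood in for by a plain directory Path, SHA-256 is an assumed checksum algorithm, and the metadata layout (an "artifacts" key holding per-file records) is invented for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def my_artifact(simulation, **kwargs) -> Path:
    """Toy producer. Here `simulation` is just a working-directory Path."""
    out = simulation / "summary.txt"
    out.write_bytes(b"example artifact\n")
    return out


def collect_artifact(simulation_dir: Path, name: str, producer, **kwargs):
    """Run a producer, move its files, checksum them, and record metadata."""
    returned = producer(simulation_dir, **kwargs)
    # Producers may return a single Path or a list of Paths.
    paths = [returned] if isinstance(returned, Path) else list(returned)

    target = simulation_dir / ".analysis" / "artifacts" / name
    target.mkdir(parents=True, exist_ok=True)

    records = []
    for src in paths:
        dest = target / src.name
        shutil.move(str(src), str(dest))
        # Assumed algorithm: SHA-256 over the moved file's bytes.
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        records.append(
            {"file": dest.name, "sha256": digest, "size": dest.stat().st_size}
        )

    # Hypothetical metadata layout; the real schema may differ.
    metadata_path = simulation_dir / ".analysis" / "metadata.json"
    metadata = json.loads(metadata_path.read_text()) if metadata_path.exists() else {}
    metadata.setdefault("artifacts", {})[name] = records
    metadata_path.write_text(json.dumps(metadata, indent=2))
    return records


with tempfile.TemporaryDirectory() as tmp:
    sim_dir = Path(tmp)
    records = collect_artifact(sim_dir, "my_artifact", my_artifact)
    moved = sim_dir / ".analysis" / "artifacts" / "my_artifact" / "summary.txt"
    moved_exists = moved.exists()
    original_gone = not (sim_dir / "summary.txt").exists()
    saved = json.loads((sim_dir / ".analysis" / "metadata.json").read_text())
```

Because the returned files are moved rather than copied, a producer is free to write to any scratch location and does not need to know the final artifact directory.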
Good test targets
Use the existing tests as patterns:
- mdfactory/tests/test_simulation.py
- mdfactory/tests/test_analysis_registry.py
- mdfactory/tests/test_submit.py
Notes on CLI integration
The CLI forwards a shared set of analysis parameters such as start_ns, end_ns, last_ns, stride, and max_residues. Analyses that do not declare one of those arguments simply ignore it because Simulation.run_analysis() filters unsupported kwargs before calling the function.
