MDFactory

SimulationStore

Discovers and manages multiple simulations across root directories.

Provides discovery, caching, and aggregation capabilities for working with multiple simulation directories.

Attributes

attribute roots: list[Path] = [Path(roots)]

List of root paths to search

attribute trajectory_file: str = trajectory_file

Trajectory filename to discover

attribute structure_file: str = structure_file

Structure filename to discover

attribute _simulations: dict[str, Simulation] = {}

Dict mapping hash -> Simulation instance (cache)

attribute _discovery_df: pd.DataFrame | None = None

Cached discovery DataFrame

Functions

func __init__(self, roots, trajectory_file='prod.xtc', structure_file='system.pdb')

Initialize store with one or more root paths.

param self
param roots: list[Path | str] | Path | str

Single path or list of paths to search

param trajectory_file: str = 'prod.xtc'

Trajectory filename to discover

param structure_file: str = 'system.pdb'

Structure filename to discover

Returns

None

func discover(self, refresh=False) -> pd.DataFrame

Discover simulations under all roots using the existing discover_simulations() function.

Returns a DataFrame with columns: ["hash", "path", "simulation"]

param self
param refresh: bool = False

If True, re-discover; if False, use cached result

Returns

pandas.DataFrame

DataFrame with discovered simulations
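
The refresh behaviour described above can be illustrated with a minimal, standard-library sketch. It is not the MDFactory implementation: the class name CachedDiscovery, the scan_count counter, and the rule that a directory counts as a simulation when it contains the structure file are all assumptions made for illustration.

```python
from pathlib import Path
import tempfile


class CachedDiscovery:
    """Toy stand-in for the store's discovery cache (not the MDFactory class)."""

    def __init__(self, roots, structure_file="system.pdb"):
        # Accept a single path or a list of paths, as the constructor docs describe.
        if isinstance(roots, (str, Path)):
            roots = [roots]
        self.roots = [Path(r) for r in roots]
        self.structure_file = structure_file
        self._cache = None   # filled on the first discover() call
        self.scan_count = 0  # only here to make the caching visible

    def _scan(self):
        self.scan_count += 1
        found = []
        for root in self.roots:
            # Assumption: a directory is a simulation if it holds the structure file.
            for match in sorted(root.rglob(self.structure_file)):
                found.append(match.parent)
        return found

    def discover(self, refresh=False):
        # Re-scan only when asked to, or when nothing is cached yet.
        if refresh or self._cache is None:
            self._cache = self._scan()
        return self._cache


with tempfile.TemporaryDirectory() as tmp:
    sim_dir = Path(tmp) / "run_a"
    sim_dir.mkdir()
    (sim_dir / "system.pdb").touch()

    store = CachedDiscovery(tmp)
    first = store.discover()
    second = store.discover()             # served from the cache
    third = store.discover(refresh=True)  # forces a new scan
    scans = store.scan_count
    n_found = len(third)
```

In this sketch the second call returns the very same cached object, and only the call with refresh=True triggers another walk of the directory tree.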

func get_simulation(self, hash) -> Simulation

Get Simulation instance for given hash.

Returns the cached Simulation instance from the discovery DataFrame.

param self
param hash: str

Simulation hash (primary identifier)

Returns

Simulation

Simulation instance

func list_simulations(self) -> list[str]

Return sorted list of discovered simulation hashes.

param self

Returns

list

Sorted list of hash strings

func build_metadata_table(self, flatten_fn) -> pd.DataFrame

Build flattened metadata table across all simulations.

Applies user-provided flatten function to each BuildInput to extract desired fields.

param self
param flatten_fn: Callable[[BuildInput], dict]

Function that takes BuildInput and returns flat dict

Returns

pandas.DataFrame

DataFrame with one row per simulation, including:

  • hash, path (from discovery)
  • flattened fields (from flatten_fn)
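
As a rough illustration of what a flatten function might look like, the sketch below flattens a nested input object into one dict of scalar values. FakeBuildInput and Species are hypothetical stand-ins defined only for this example; the real BuildInput fields are not documented here and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Species:
    # Hypothetical stand-in; real species entries may carry other fields.
    resname: str
    count: int


@dataclass
class FakeBuildInput:
    # Hypothetical stand-in for BuildInput, used only in this sketch.
    simulation_type: str
    temperature: float
    species: list = field(default_factory=list)


def flatten_fn(build_input):
    """Return one flat dict of scalar values for a single simulation."""
    row = {
        "simulation_type": build_input.simulation_type,
        "temperature": build_input.temperature,
    }
    # Nested entries become prefixed scalar columns.
    for sp in build_input.species:
        row[f"{sp.resname}_count"] = sp.count
    return row


example = FakeBuildInput("bilayer", 310.0, [Species("POPC", 128), Species("CHL", 32)])
flat = flatten_fn(example)
```

A function of this shape could then be passed as flatten_fn, with each returned dict contributing the flattened columns of one row.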

func load_analysis_with_metadata(self, analysis_name, flatten_fn, missing_ok=False) -> pd.DataFrame

Eager load: join analysis data with metadata across simulations.

Loads the specified analysis from all simulations and joins with flattened metadata. Adds 'simulation_hash' column to track source.

param self
param analysis_name: str

Name of analysis to load

param flatten_fn: Callable[[BuildInput], dict]

Function to flatten BuildInput for each simulation

param missing_ok: bool = False

If True, skip simulations without this analysis; if False, raise error

Returns

pandas.DataFrame

DataFrame with analysis data joined with metadata. Includes 'simulation_hash' column.
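
The join and the missing_ok switch can be sketched with plain dicts in place of DataFrames. This is an illustration under assumptions, not the library code: the function name join_with_metadata, the input layout, and the choice of KeyError for a missing analysis are all hypothetical.

```python
def join_with_metadata(analysis_by_hash, metadata_by_hash, missing_ok=False):
    """Attach per-simulation metadata to every analysis row.

    analysis_by_hash: dict of hash -> list of row dicts, or None if absent.
    metadata_by_hash: dict of hash -> flat metadata dict.
    """
    joined = []
    for sim_hash, metadata in metadata_by_hash.items():
        rows = analysis_by_hash.get(sim_hash)
        if rows is None:
            if missing_ok:
                continue  # skip simulations without this analysis
            # Assumption: the real error type is not documented.
            raise KeyError(f"analysis missing for simulation {sim_hash}")
        for row in rows:
            # Every analysis row is tagged with its source simulation.
            joined.append({**row, **metadata, "simulation_hash": sim_hash})
    return joined


analysis = {
    "abc123": [{"frame": 0, "area": 62.1}, {"frame": 1, "area": 62.4}],
    "def456": None,  # this simulation has no stored analysis
}
metadata = {
    "abc123": {"temperature": 310.0},
    "def456": {"temperature": 298.0},
}
table = join_with_metadata(analysis, metadata, missing_ok=True)
```

With missing_ok=True the simulation without data is silently dropped; with the default, the same call would raise instead.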

func remove_all_analyses(self, simulation_type=None) -> pd.DataFrame

Remove all analyses from discovered simulations.

Useful for cleaning up storage or resetting analysis state.

param self
param simulation_type: str | None = None

If specified, only remove analyses for this simulation type.

Returns

pandas.DataFrame

Summary DataFrame with columns:

  • hash: str - Simulation hash
  • simulation_type: str - Type (bilayer, mixedbox)
  • status: str - "success" or "failed"
  • error: str | None - Error message if failed

func list_analyses_status(self, simulation_type=None) -> pd.DataFrame

List available and completed analyses across all simulations.

Shows which analyses CAN be run for each simulation type (from ANALYSIS_REGISTRY) and which HAVE been run (from AnalysisRegistry).

param self
param simulation_type: str | None = None

If specified, filter to only this simulation type

Returns

pandas.DataFrame

Long-form DataFrame with columns:

  • hash: str - Simulation hash
  • simulation_type: str - Type (bilayer, mixedbox)
  • analysis_name: str - Name of analysis
  • status: str - "available" or "completed"
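
The long-form layout can be pictured as one row per (simulation, analysis) pair. The sketch below builds such rows from a made-up registry; REGISTRY_SKETCH, the analysis names in it, and the tuple layout of the input are assumptions for illustration and do not reflect the actual ANALYSIS_REGISTRY contents.

```python
# Hypothetical registry contents; the real ANALYSIS_REGISTRY entries may differ.
REGISTRY_SKETCH = {
    "bilayer": ["area_per_lipid", "thickness"],
    "mixedbox": ["rdf"],
}


def analyses_status(simulations, simulation_type=None):
    """simulations: list of (hash, simulation_type, set of completed names)."""
    rows = []
    for sim_hash, sim_type, completed in simulations:
        # Optional filter on simulation type.
        if simulation_type is not None and sim_type != simulation_type:
            continue
        for name in REGISTRY_SKETCH.get(sim_type, []):
            rows.append({
                "hash": sim_hash,
                "simulation_type": sim_type,
                "analysis_name": name,
                "status": "completed" if name in completed else "available",
            })
    return rows


sims = [
    ("abc123", "bilayer", {"thickness"}),
    ("def456", "mixedbox", set()),
]
all_rows = analyses_status(sims)
bilayer_rows = analyses_status(sims, simulation_type="bilayer")
```

Keeping one row per pair makes it straightforward to filter or pivot the result afterwards.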

func list_artifacts_status(self, simulation_type=None) -> pd.DataFrame

List available and completed artifacts across all simulations.

Shows which artifacts CAN be run for each simulation type (from ARTIFACT_REGISTRY) and which HAVE been run (from AnalysisRegistry).

param self
param simulation_type: str | None = None

If specified, filter to only this simulation type

Returns

pandas.DataFrame

Long-form DataFrame with columns:

  • hash: str - Simulation hash
  • simulation_type: str - Type (bilayer, mixedbox)
  • artifact_name: str - Name of artifact
  • status: str - "not yet run" or "completed"

func run_artifacts_batch(self, artifact_names=None, simulation_type=None, skip_existing=True, output_prefix=None, **kwargs) -> pd.DataFrame

Run artifacts across multiple simulations in batch mode.

Executes artifact producers for all (or filtered) simulations, with options to skip already-completed artifacts and handle errors gracefully.

param self
param artifact_names: list[str] | None = None

Specific artifacts to run. If None, run all registered artifacts for each simulation's type.

param simulation_type: str | None = None

If specified, only run for simulations of this type

param skip_existing: bool = True

If True, skip artifacts that have already been run (default: True)

param output_prefix: str | None = None
param kwargs = {}

Returns

pandas.DataFrame

Summary DataFrame with columns:

  • hash: str - Simulation hash
  • simulation_type: str - Type (bilayer, mixedbox)
  • artifact_name: str - Name of artifact
  • status: str - "success", "skipped", or "failed"
  • error: str | None - Error message if failed
  • files: int | None - Number of files produced (if successful)
  • duration_seconds: float | None - Execution time

func remove_all_artifacts(self, simulation_type=None) -> pd.DataFrame

Remove all artifacts across simulations.

param self
param simulation_type: str | None = None

If specified, only remove artifacts for simulations of this type

Returns

pandas.DataFrame

Summary DataFrame with columns:

  • hash: str - Simulation hash
  • simulation_type: str - Type (bilayer, mixedbox)
  • status: str - "success" or "failed"
  • error: str | None - Error message if failed

func run_analyses_batch(self, analysis_names=None, simulation_type=None, skip_existing=True, **kwargs) -> pd.DataFrame

Run analyses across multiple simulations in batch mode.

Executes analyses for all (or filtered) simulations, with options to skip already-completed analyses and handle errors gracefully.

param self
param analysis_names: list[str] | None = None

Specific analyses to run. If None, run all registered analyses for each simulation's type.

param simulation_type: str | None = None

If specified, only run for simulations of this type

param skip_existing: bool = True

If True, skip analyses that have already been run (default: True)

param kwargs = {}

Returns

pandas.DataFrame

Summary DataFrame with columns:

  • hash: str - Simulation hash
  • simulation_type: str - Type (bilayer, mixedbox)
  • analysis_name: str - Name of analysis
  • status: str - "success", "skipped", or "failed"
  • error: str | None - Error message if failed
  • rows: int | None - Number of rows in result (if successful)
  • duration_seconds: float | None - Execution time
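
The batch semantics (skip what is already done, record failures instead of aborting) can be sketched as a plain loop that emits one summary record per simulation and analysis. This is only an illustration: run_batch, fake_analysis, and the input layout are hypothetical, and the records carry a subset of the documented columns.

```python
import time


def run_batch(simulations, analysis_names, run_one, skip_existing=True):
    """simulations: dict of hash -> set of already-completed analysis names."""
    summary = []
    for sim_hash, completed in simulations.items():
        for name in analysis_names:
            record = {"hash": sim_hash, "analysis_name": name,
                      "status": None, "error": None, "duration_seconds": None}
            if skip_existing and name in completed:
                record["status"] = "skipped"
            else:
                start = time.perf_counter()
                try:
                    run_one(sim_hash, name)
                    record["status"] = "success"
                    completed.add(name)
                except Exception as exc:
                    # A failure is recorded and the batch keeps going.
                    record["status"] = "failed"
                    record["error"] = str(exc)
                record["duration_seconds"] = time.perf_counter() - start
            summary.append(record)
    return summary


def fake_analysis(sim_hash, name):
    # Hypothetical analysis runner: one analysis name always fails.
    if name == "broken":
        raise ValueError("no trajectory frames")


sims = {"abc123": {"area_per_lipid"}, "def456": set()}
summary = run_batch(sims, ["area_per_lipid", "broken"], fake_analysis)
statuses = [(r["hash"], r["analysis_name"], r["status"]) for r in summary]
```

Because every outcome ends up as a row, failed runs can be inspected through the error field after the whole batch has finished.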

func _ensure_discovered(self) -> None

Ensure discovery has been run at least once.

param self

Returns

None

func build_lnp_chemistry_table(self) -> pd.DataFrame

Build table of LNP chemistry for all discovered simulations.

Extracts HL, CHL, and IL (ILN+ILP combined) counts, fractions, and SMILES for each simulation.

param self

Returns

pandas.DataFrame

DataFrame with columns:

  • hash, path
  • HL_count, HL_fraction, HL_smiles
  • CHL_count, CHL_fraction, CHL_smiles
  • IL_count, IL_fraction, ILN_smiles, ILP_smiles

func load_analysis_with_lnp_chemistry(self, analysis_name, missing_ok=False) -> pd.DataFrame

Load analysis data joined with LNP chemistry metadata.

Loads the specified analysis from all simulations and joins with LNP chemistry (HL, CHL, IL counts/fractions/SMILES).

param self
param analysis_name: str

Name of analysis to load

param missing_ok: bool = False

If True, skip simulations without this analysis; if False, raise error

Returns

pandas.DataFrame

DataFrame with analysis data joined with LNP chemistry. Includes 'hash' column for tracking source simulation.

func build_all_species_table(self) -> pd.DataFrame

Build table of all species for all discovered simulations.

Extracts count, fraction, and SMILES for every species defined in each simulation's YAML file. No filtering or grouping.

param self

Returns

pandas.DataFrame

DataFrame with columns:

  • hash, path
  • {resname}_count, {resname}_fraction, {resname}_smiles (per species)
  • total_species_count, total_molecule_count

func load_analysis_with_all_species(self, analysis_name, missing_ok=False) -> pd.DataFrame

Load analysis data joined with all species metadata.

Loads the specified analysis from all simulations and joins with all species data (count, fraction, SMILES for every species).

param self
param analysis_name: str

Name of analysis to load

param missing_ok: bool = False

If True, skip simulations without this analysis; if False, raise error

Returns

pandas.DataFrame

DataFrame with analysis data joined with all species. Includes 'hash' column for tracking source simulation.

func build_chemistry_table(self, mode='all', species_groups=None) -> pd.DataFrame

Build chemistry table with configurable extraction mode.

param self
param mode: str = 'all'

Extraction mode:

  • "all": Extract all species from YAML (default)
  • "lnp": Use LNP-specific grouping (HL, CHL, IL)
  • "custom": Use custom species_groups

param species_groups: dict[str, list[str]] | None = None

Required when mode="custom". Mapping of group names to resnames.

Returns

pandas.DataFrame

DataFrame with hash, path, and chemistry columns.
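
For the "custom" mode, species_groups maps each group name to the resnames it combines. The sketch below shows one plausible way such a mapping could be turned into per-group counts and fractions. The helper group_species and the choice to compute fractions over all molecules in the system are assumptions, not documented behaviour.

```python
def group_species(species_counts, species_groups):
    """Combine per-resname counts into per-group counts and fractions."""
    # Assumption: fractions are taken over all molecules in the system.
    total = sum(species_counts.values())
    row = {}
    for group, resnames in species_groups.items():
        count = sum(species_counts.get(resname, 0) for resname in resnames)
        row[f"{group}_count"] = count
        row[f"{group}_fraction"] = count / total if total else 0.0
    return row


# Example mapping that mirrors the LNP grouping (IL = ILN + ILP).
counts = {"ILN": 30, "ILP": 20, "CHL": 40, "HL": 10}
groups = {"IL": ["ILN", "ILP"], "CHL": ["CHL"], "HL": ["HL"]}
row = group_species(counts, groups)
```

A mapping like groups above is the kind of value that would be passed as species_groups together with mode="custom".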

func load_analysis_with_chemistry(self, analysis_name, mode='all', species_groups=None, missing_ok=False) -> pd.DataFrame

Load analysis data joined with chemistry metadata.

param self
param analysis_name: str

Name of analysis to load

param mode: str = 'all'

Extraction mode: "all", "lnp", or "custom"

param species_groups: dict[str, list[str]] | None = None

Required when mode="custom"

param missing_ok: bool = False

If True, skip simulations without this analysis

Returns

pandas.DataFrame

DataFrame with analysis data joined with chemistry.
