SimulationStore

Discovers and manages multiple simulations across root directories.

Provides discovery, caching, and aggregation capabilities for working with multiple simulation directories.

Attributes

attribute roots: list[Path]

= [Path(roots)]

List of root paths to search

attribute trajectory_file: str

= trajectory_file

Trajectory filename to discover

attribute structure_file: str

= structure_file

Structure filename to discover

attribute _simulations: dict[str, Simulation]

= {}

Dict mapping hash -> Simulation instance (cache)

attribute _discovery_df: pd.DataFrame | None

= None

Cached discovery DataFrame

Functions

func __init__(self, roots, trajectory_file='prod.xtc', structure_file='system.pdb')

Initialize store with one or more root paths.

param self

param roots: list[Path | str] | Path | str

Single path or list of paths to search

param trajectory_file: str

= 'prod.xtc'

Trajectory filename to discover

param structure_file: str

= 'system.pdb'

Structure filename to discover

Returns

None
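
The `roots` parameter accepts a single path or a list, while the `roots` attribute is always a `list[Path]`. A minimal sketch of that normalization follows; `normalize_roots` is a hypothetical helper written for illustration and is not part of the documented API.

```python
from pathlib import Path


def normalize_roots(roots):
    """Hypothetical helper: coerce a single path or a list of paths to list[Path]."""
    if isinstance(roots, (str, Path)):
        return [Path(roots)]
    return [Path(r) for r in roots]


# Both call styles end up as a list of Path objects.
single = normalize_roots("runs")
many = normalize_roots(["runs/a", Path("runs/b")])
```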

func discover(self, refresh=False) → pd.DataFrame

Discover simulations under all roots using existing discover_simulations().

Returns DataFrame with columns: ["hash", "path", "simulation"]

param self

param refresh: bool

= False

If True, re-discover; if False, use cached result

Returns

pandas.DataFrame

DataFrame with discovered simulations
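
The `refresh` flag describes a cache-or-rescan pattern. The sketch below illustrates that pattern only; `CachedDiscovery` and `fake_scan` are hypothetical stand-ins, and the real method returns a DataFrame rather than a list of dicts.

```python
class CachedDiscovery:
    """Illustrative stand-in, not the real SimulationStore."""

    def __init__(self, scan):
        self._scan = scan      # hypothetical callable that performs the scan
        self._cached = None    # plays the role of _discovery_df

    def discover(self, refresh=False):
        # Re-scan only on first use or when the caller asks for a refresh.
        if refresh or self._cached is None:
            self._cached = self._scan()
        return self._cached


calls = []


def fake_scan():
    calls.append(1)  # record each real scan
    return [{"hash": "abc123", "path": "runs/a"}]


store = CachedDiscovery(fake_scan)
store.discover()              # first call scans
store.discover()              # served from the cache
store.discover(refresh=True)  # forces a second scan
```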

func get_simulation(self, hash) → Simulation

Get Simulation instance for given hash.

Returns cached Simulation instance from discovery DataFrame.

param self

param hash: str

Simulation hash (primary identifier)

Returns

Simulation

Simulation instance

func list_simulations(self) → list[str]

Return sorted list of discovered simulation hashes.

param self

Returns

list

Sorted list of hash strings

func build_metadata_table(self, flatten_fn) → pd.DataFrame

Build flattened metadata table across all simulations.

Applies user-provided flatten function to each BuildInput to extract desired fields.

param self

param flatten_fn: Callable[[BuildInput], dict]

Function that takes BuildInput and returns flat dict

Returns

pandas.DataFrame

DataFrame with one row per simulation, including:

hash, path (from discovery)
flattened fields (from flatten_fn)
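
A `flatten_fn` only has to turn one `BuildInput` into a flat dict. The fields of `BuildInput` are not listed on this page, so the sketch below uses a stand-in dataclass whose fields (`simulation_type`, `temperature_k`, `species_counts`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBuildInput:
    """Stand-in for BuildInput; every field here is hypothetical."""
    simulation_type: str
    temperature_k: float
    species_counts: dict = field(default_factory=dict)


def flatten_fn(build_input):
    """Return one flat dict (scalar values only) for a simulation."""
    row = {
        "simulation_type": build_input.simulation_type,
        "temperature_k": build_input.temperature_k,
    }
    # Nested data is spread into prefixed scalar columns.
    for resname, count in build_input.species_counts.items():
        row[f"{resname}_count"] = count
    return row


row = flatten_fn(FakeBuildInput("bilayer", 310.0, {"CHL": 40, "HL": 60}))
```

With a real store, a function of this shape would be passed as `store.build_metadata_table(flatten_fn)`.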

func load_analysis_with_metadata(self, analysis_name, flatten_fn, missing_ok=False) → pd.DataFrame

Eager load: join analysis data with metadata across simulations.

Loads the specified analysis from all simulations and joins with flattened metadata. Adds 'simulation_hash' column to track source.

param self

param analysis_name: str

Name of analysis to load

param flatten_fn: Callable[[BuildInput], dict]

Function to flatten BuildInput for each simulation

param missing_ok: bool

= False

If True, skip simulations without this analysis; if False, raise error

Returns

pandas.DataFrame

DataFrame with analysis data joined with metadata. Includes 'simulation_hash' column.
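
The `missing_ok` behaviour and the added `simulation_hash` column can be pictured with plain dicts. This is a sketch under stated assumptions: `collect_rows` is hypothetical, the analysis name `"density"` is invented, and `KeyError` is a placeholder because the page does not say which exception is raised.

```python
def collect_rows(analyses_by_hash, analysis_name, missing_ok=False):
    """Illustrative only: gather rows per simulation and tag each with its source hash."""
    rows = []
    for sim_hash, analyses in analyses_by_hash.items():
        if analysis_name not in analyses:
            if missing_ok:
                continue  # skip simulations without this analysis
            # The real exception type is not documented; KeyError is a placeholder.
            raise KeyError(f"{analysis_name!r} missing for simulation {sim_hash}")
        for record in analyses[analysis_name]:
            rows.append({**record, "simulation_hash": sim_hash})
    return rows


data = {
    "abc123": {"density": [{"z": 0.0, "value": 1.2}]},
    "def456": {},  # this simulation has no "density" analysis
}

rows = collect_rows(data, "density", missing_ok=True)

# With the default missing_ok=False the same call raises instead of skipping.
try:
    collect_rows(data, "density")
    raised = False
except KeyError:
    raised = True
```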

func remove_all_analyses(self, simulation_type=None) → pd.DataFrame

Remove all analyses from discovered simulations.

Useful for cleaning up storage or resetting analysis state.

param self

param simulation_type: str | None

= None

If specified, only remove analyses for this simulation type.

Returns

pandas.DataFrame

Summary DataFrame with columns:

hash: str - Simulation hash
simulation_type: str - Type (bilayer, mixedbox)
status: str - "success" or "failed"
error: str | None - Error message if failed

func list_analyses_status(self, simulation_type=None) → pd.DataFrame

List available and completed analyses across all simulations.

Shows which analyses CAN be run for each simulation type (from ANALYSIS_REGISTRY) and which HAVE been run (from AnalysisRegistry).

param self

param simulation_type: str | None

= None

If specified, filter to only this simulation type

Returns

pandas.DataFrame

Long-form DataFrame with columns:

hash: str - Simulation hash
simulation_type: str - Type (bilayer, mixedbox)
analysis_name: str - Name of analysis
status: str - "available" or "completed"
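
The long-form layout has one row per (simulation, analysis) pair. A minimal sketch of how such rows could be derived from a set of registered names and a set of completed names is shown below; `status_rows` and the analysis names are hypothetical, and the real method reads from the registries named above.

```python
def status_rows(sim_hash, simulation_type, registered, completed):
    """Illustrative only: one row per registered analysis, marked by whether it has run."""
    return [
        {
            "hash": sim_hash,
            "simulation_type": simulation_type,
            "analysis_name": name,
            "status": "completed" if name in completed else "available",
        }
        for name in sorted(registered)
    ]


rows = status_rows("abc123", "bilayer", {"density", "thickness"}, {"density"})
```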

func list_artifacts_status(self, simulation_type=None) → pd.DataFrame

List available and completed artifacts across all simulations.

Shows which artifacts CAN be run for each simulation type (from ARTIFACT_REGISTRY) and which HAVE been run (from AnalysisRegistry).

param self

param simulation_type: str | None

= None

If specified, filter to only this simulation type

Returns

pandas.DataFrame

Long-form DataFrame with columns:

hash: str - Simulation hash
simulation_type: str - Type (bilayer, mixedbox)
artifact_name: str - Name of artifact
status: str - "not yet run" or "completed"

func run_artifacts_batch(self, artifact_names=None, simulation_type=None, skip_existing=True, output_prefix=None, **kwargs) → pd.DataFrame

Run artifacts across multiple simulations in batch mode.

Executes artifact producers for all (or filtered) simulations, with options to skip already-completed artifacts and handle errors gracefully.

param self

param artifact_names: list[str] | None

= None

Specific artifacts to run. If None, run all registered artifacts for each simulation's type.

param simulation_type: str | None

= None

If specified, only run for simulations of this type

param skip_existing: bool

= True

If True, skip artifacts that have already been run (default: True)

param output_prefix: str | None

= None

param kwargs

= {}

Returns

pandas.DataFrame

Summary DataFrame with columns:

hash: str - Simulation hash
simulation_type: str - Type (bilayer, mixedbox)
artifact_name: str - Name of artifact
status: str - "success", "skipped", or "failed"
error: str | None - Error message if failed
files: int | None - Number of files produced (if successful)
duration_seconds: float | None - Execution time

func remove_all_artifacts(self, simulation_type=None) → pd.DataFrame

Remove all artifacts across simulations.

param self

param simulation_type: str | None

= None

If specified, only remove artifacts for simulations of this type

Returns

pandas.DataFrame

Summary DataFrame with columns:

hash: str - Simulation hash
simulation_type: str - Type (bilayer, mixedbox)
status: str - "success" or "failed"
error: str | None - Error message if failed

func run_analyses_batch(self, analysis_names=None, simulation_type=None, skip_existing=True, **kwargs) → pd.DataFrame

Run analyses across multiple simulations in batch mode.

Executes analyses for all (or filtered) simulations, with options to skip already-completed analyses and handle errors gracefully.

param self

param analysis_names: list[str] | None

= None

Specific analyses to run. If None, run all registered analyses for each simulation's type.

param simulation_type: str | None

= None

If specified, only run for simulations of this type

param skip_existing: bool

= True

If True, skip analyses that have already been run (default: True)

param kwargs

= {}

Returns

pandas.DataFrame

Summary DataFrame with columns:

hash: str - Simulation hash
simulation_type: str - Type (bilayer, mixedbox)
analysis_name: str - Name of analysis
status: str - "success", "skipped", or "failed"
error: str | None - Error message if failed
rows: int | None - Number of rows in result (if successful)
duration_seconds: float | None - Execution time
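
The summary columns suggest a loop that records one row per (simulation, analysis) pair and keeps going after a failure. The sketch below shows that pattern with plain dicts. `run_batch`, the analysis names, and the `already_done` set are hypothetical, and whether the real method times skipped entries is not stated on this page.

```python
import time


def run_batch(simulations, analyses, already_done, skip_existing=True):
    """Illustrative only. `analyses` maps name -> callable(sim) returning a list of rows."""
    summary = []
    for sim in simulations:
        for name, fn in analyses.items():
            row = {"hash": sim["hash"], "simulation_type": sim["type"],
                   "analysis_name": name, "status": None, "error": None,
                   "rows": None, "duration_seconds": None}
            if skip_existing and (sim["hash"], name) in already_done:
                row["status"] = "skipped"
            else:
                start = time.perf_counter()
                try:
                    result = fn(sim)
                    row["status"] = "success"
                    row["rows"] = len(result)
                except Exception as exc:  # record the failure and keep going
                    row["status"] = "failed"
                    row["error"] = str(exc)
                row["duration_seconds"] = time.perf_counter() - start
            summary.append(row)
    return summary


def broken(sim):
    raise ValueError("no trajectory")


sims = [{"hash": "abc123", "type": "bilayer"}]
summary = run_batch(
    sims,
    {"density": lambda sim: [1, 2, 3], "thickness": broken, "order": lambda sim: []},
    already_done={("abc123", "order")},
)
```

Because the result is a table of statuses rather than an exception, a caller can filter for `status == "failed"` afterwards and inspect the `error` column.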

func _ensure_discovered(self) → None

Ensure discovery has been run at least once.

param self

Returns

None

func build_lnp_chemistry_table(self) → pd.DataFrame

Build table of LNP chemistry for all discovered simulations.

Extracts HL, CHL, and IL (ILN+ILP combined) counts, fractions, and SMILES for each simulation.

param self

Returns

pandas.DataFrame

DataFrame with columns:

hash, path
HL_count, HL_fraction, HL_smiles
CHL_count, CHL_fraction, CHL_smiles
IL_count, IL_fraction, ILN_smiles, ILP_smiles
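
The IL columns combine ILN and ILP into a single group. A minimal sketch of that arithmetic is given below; `lnp_fractions` is hypothetical, and taking the fraction denominator as the sum of the HL, CHL, and IL counts is an assumption, since the page does not define it.

```python
def lnp_fractions(counts):
    """Illustrative only: `counts` maps resname -> molecule count."""
    groups = {
        "HL": counts.get("HL", 0),
        "CHL": counts.get("CHL", 0),
        "IL": counts.get("ILN", 0) + counts.get("ILP", 0),  # ILN and ILP combined
    }
    # Assumed denominator: total over the three LNP groups.
    total = sum(groups.values())
    row = {}
    for name, count in groups.items():
        row[f"{name}_count"] = count
        row[f"{name}_fraction"] = count / total if total else 0.0
    return row


row = lnp_fractions({"HL": 20, "CHL": 40, "ILN": 30, "ILP": 10})
```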

func load_analysis_with_lnp_chemistry(self, analysis_name, missing_ok=False) → pd.DataFrame

Load analysis data joined with LNP chemistry metadata.

Loads the specified analysis from all simulations and joins with LNP chemistry (HL, CHL, IL counts/fractions/SMILES).

param self

param analysis_name: str

Name of analysis to load

param missing_ok: bool

= False

If True, skip simulations without this analysis; if False, raise error

Returns

pandas.DataFrame

DataFrame with analysis data joined with LNP chemistry. Includes 'hash' column for tracking source simulation.

func build_all_species_table(self) → pd.DataFrame

Build table of all species for all discovered simulations.

Extracts count, fraction, and SMILES for every species defined in each simulation's YAML file. No filtering or grouping.

param self

Returns

pandas.DataFrame

DataFrame with columns:

hash, path
{resname}_count, {resname}_fraction, {resname}_smiles (per species)
total_species_count, total_molecule_count

func load_analysis_with_all_species(self, analysis_name, missing_ok=False) → pd.DataFrame

Load analysis data joined with all species metadata.

Loads the specified analysis from all simulations and joins with all species data (count, fraction, SMILES for every species).

param self

param analysis_name: str

Name of analysis to load

param missing_ok: bool

= False

If True, skip simulations without this analysis; if False, raise error

Returns

pandas.DataFrame

DataFrame with analysis data joined with all species. Includes 'hash' column for tracking source simulation.

func build_chemistry_table(self, mode='all', species_groups=None) → pd.DataFrame

Build chemistry table with configurable extraction mode.

param self

param mode: str

= 'all'

Extraction mode:

"all": Extract all species from YAML (default)
"lnp": Use LNP-specific grouping (HL, CHL, IL)
"custom": Use custom species_groups

param species_groups: dict[str, list[str]] | None

= None

Required when mode="custom". Mapping of group names to resnames.

Returns

pandas.DataFrame

DataFrame with hash, path, and chemistry columns.
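
For `mode="custom"`, `species_groups` maps each group name to the resnames it covers. The sketch below shows one plausible way such a mapping could be applied to per-resname counts. `group_counts`, the resnames, and the `{group}_count` naming are assumptions for illustration, not the documented output schema.

```python
def group_counts(counts, species_groups):
    """Illustrative only: sum per-resname counts into named groups."""
    return {
        f"{group}_count": sum(counts.get(resname, 0) for resname in resnames)
        for group, resnames in species_groups.items()
    }


# Hypothetical resnames; anything not listed in a group (here SOL) is ignored.
species_groups = {"lipid": ["POPC", "POPE"], "sterol": ["CHL"]}
grouped = group_counts(
    {"POPC": 64, "POPE": 32, "CHL": 16, "SOL": 5000},
    species_groups,
)
```

With a real store, the same mapping would be passed as `build_chemistry_table(mode="custom", species_groups=species_groups)`.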

func load_analysis_with_chemistry(self, analysis_name, mode='all', species_groups=None, missing_ok=False) → pd.DataFrame

Load analysis data joined with chemistry metadata.

param self

param analysis_name: str

Name of analysis to load

param mode: str

= 'all'

Extraction mode: "all", "lnp", or "custom"

param species_groups: dict[str, list[str]] | None

= None

Required when mode="custom"

param missing_ok: bool

= False

If True, skip simulations without this analysis

Returns

pandas.DataFrame

DataFrame with analysis data joined with chemistry.
