SimulationStore
Discovers and manages multiple simulations across root directories.
Provides discovery, caching, and aggregation capabilities for working with multiple simulation directories.
Attributes
attribute roots: list[Path] = [Path(roots)] - List of root paths to search
attribute trajectory_file: str = trajectory_file - Trajectory filename to discover
attribute structure_file: str = structure_file - Structure filename to discover
attribute _simulations: dict[str, Simulation] = {} - Dict mapping hash -> Simulation instance (cache)
attribute _discovery_df: pd.DataFrame | None = None - Cached discovery DataFrame
Functions
func __init__(self, roots, trajectory_file='prod.xtc', structure_file='system.pdb')
Initialize store with one or more root paths.
param self
param roots: list[Path | str] | Path | str - Single path or list of paths to search
param trajectory_file: str = 'prod.xtc' - Trajectory filename to discover
param structure_file: str = 'system.pdb' - Structure filename to discover
Returns
None
func discover(self, refresh=False) → pd.DataFrame
Discover simulations under all roots using existing discover_simulations().
Returns DataFrame with columns: ["hash", "path", "simulation"]
param self
param refresh: bool = False - If True, re-discover; if False, use cached result
Returns
pandas.DataFrame - DataFrame with discovered simulations
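The `refresh` flag describes a cache-or-rescan pattern. The following is a minimal sketch of that pattern only, not the library's implementation: `CachedDiscovery`, `_scan`, and `scan_count` are hypothetical names, and the scan is a placeholder that returns plain dicts instead of a DataFrame.

```python
from pathlib import Path


class CachedDiscovery:
    """Hypothetical stand-in that mimics the refresh/cache behavior only."""

    def __init__(self, roots):
        # Accept a single path or a list of paths, as the documented constructor does.
        if isinstance(roots, (str, Path)):
            roots = [roots]
        self.roots = [Path(r) for r in roots]
        self._cache = None
        self.scan_count = 0  # illustrative counter, not part of the documented API

    def _scan(self):
        # Placeholder for the real directory walk; returns one record per root.
        self.scan_count += 1
        return [{"hash": f"h{i}", "path": str(root)} for i, root in enumerate(self.roots)]

    def discover(self, refresh=False):
        # Re-scan only when asked to, or when nothing has been cached yet.
        if refresh or self._cache is None:
            self._cache = self._scan()
        return self._cache


store = CachedDiscovery("runs")
first = store.discover()
second = store.discover()              # served from the cache
third = store.discover(refresh=True)   # forces a new scan
```

In this sketch the second call performs no work, which is the behavior the `refresh=False` default suggests for the real method.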
func get_simulation(self, hash) → Simulation
Get Simulation instance for given hash.
Returns cached Simulation instance from discovery DataFrame.
param self
param hash: str - Simulation hash (primary identifier)
Returns
Simulation - Simulation instance
func list_simulations(self) → list[str]
Return sorted list of discovered simulation hashes.
param self
Returns
list - Sorted list of hash strings
func build_metadata_table(self, flatten_fn) → pd.DataFrame
Build flattened metadata table across all simulations.
Applies user-provided flatten function to each BuildInput to extract desired fields.
param self
param flatten_fn: Callable[[BuildInput], dict] - Function that takes BuildInput and returns flat dict
Returns
pandas.DataFrame - DataFrame with one row per simulation, including:
- hash, path (from discovery)
- flattened fields (from flatten_fn)
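A flatten function of the shape `Callable[[BuildInput], dict]` might look like the sketch below. The fields of the real `BuildInput` are not listed in this reference, so `FakeBuildInput` and its attributes (`simulation_type`, `temperature_k`, `species`) are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBuildInput:
    """Hypothetical stand-in; the real BuildInput fields are not documented here."""
    simulation_type: str
    temperature_k: float
    species: dict = field(default_factory=dict)


def flatten_build_input(build_input):
    # Return only scalar values so that each key can become one table column.
    return {
        "simulation_type": build_input.simulation_type,
        "temperature_k": build_input.temperature_k,
        "n_species": len(build_input.species),
    }


example = FakeBuildInput("bilayer", 310.0, {"HL": 64, "CHL": 32})
row = flatten_build_input(example)
```

With a real store, a function like this would be passed as the `flatten_fn` argument, for example `store.build_metadata_table(flatten_build_input)`.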
func load_analysis_with_metadata(self, analysis_name, flatten_fn, missing_ok=False) → pd.DataFrame
Eager load: join analysis data with metadata across simulations.
Loads the specified analysis from all simulations and joins with flattened metadata. Adds 'simulation_hash' column to track source.
param self
param analysis_name: str - Name of analysis to load
param flatten_fn: Callable[[BuildInput], dict] - Function to flatten BuildInput for each simulation
param missing_ok: bool = False - If True, skip simulations without this analysis; if False, raise error
Returns
pandas.DataFrame - DataFrame with analysis data joined with metadata. Includes 'simulation_hash' column.
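The join and the `missing_ok` switch can be illustrated with plain dicts. This is a sketch under assumptions: `join_analysis_with_metadata` and both input mappings are hypothetical, rows are dicts instead of DataFrame rows, and `KeyError` is an assumed exception type since the reference does not say which error is raised.

```python
def join_analysis_with_metadata(analysis_by_hash, metadata_by_hash, missing_ok=False):
    """Attach metadata to every analysis row; both inputs are hypothetical dicts."""
    joined = []
    for sim_hash, metadata in metadata_by_hash.items():
        rows = analysis_by_hash.get(sim_hash)
        if rows is None:
            if missing_ok:
                continue  # skip simulations that lack this analysis
            # Assumed exception type; the real method may raise something else.
            raise KeyError(f"analysis missing for simulation {sim_hash}")
        for row in rows:
            # Tag each row with its source simulation.
            joined.append({**row, **metadata, "simulation_hash": sim_hash})
    return joined


analysis_by_hash = {"abc": [{"frame": 0, "value": 1.5}, {"frame": 1, "value": 1.7}]}
metadata_by_hash = {"abc": {"temperature_k": 310.0}, "def": {"temperature_k": 298.0}}
joined = join_analysis_with_metadata(analysis_by_hash, metadata_by_hash, missing_ok=True)
```

Here simulation "def" has no analysis data, so it is silently dropped with `missing_ok=True` and would raise otherwise.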
func remove_all_analyses(self, simulation_type=None) → pd.DataFrame
Remove all analyses from discovered simulations.
Useful for cleaning up storage or resetting analysis state.
param self
param simulation_type: str | None = None - If specified, only remove analyses for this simulation type.
Returns
pandas.DataFrame - Summary DataFrame with columns:
- hash: str - Simulation hash
- simulation_type: str - Type (bilayer, mixedbox)
- status: str - "success" or "failed"
- error: str | None - Error message if failed
func list_analyses_status(self, simulation_type=None) → pd.DataFrame
List available and completed analyses across all simulations.
Shows which analyses CAN be run for each simulation type (from ANALYSIS_REGISTRY) and which HAVE been run (from AnalysisRegistry).
param self
param simulation_type: str | None = None - If specified, filter to only this simulation type
Returns
pandas.DataFrame - Long-form DataFrame with columns:
- hash: str - Simulation hash
- simulation_type: str - Type (bilayer, mixedbox)
- analysis_name: str - Name of analysis
- status: str - "available" or "completed"
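The long-form layout described above has one row per (simulation, analysis) pair. The sketch below shows how such rows could be assembled; `REGISTRY`, `status_rows`, the analysis names, and the `completed` mapping are all hypothetical, since the contents of `ANALYSIS_REGISTRY` are not shown in this reference.

```python
# Hypothetical registry contents, keyed by simulation type.
REGISTRY = {"bilayer": ["thickness", "area_per_lipid"], "mixedbox": ["rdf"]}


def status_rows(simulations, completed, simulation_type=None):
    """Build one row per (simulation, analysis) pair."""
    rows = []
    for sim_hash, sim_type in simulations.items():
        # Optional filter on simulation type.
        if simulation_type is not None and sim_type != simulation_type:
            continue
        for name in REGISTRY.get(sim_type, []):
            done = name in completed.get(sim_hash, set())
            rows.append({
                "hash": sim_hash,
                "simulation_type": sim_type,
                "analysis_name": name,
                "status": "completed" if done else "available",
            })
    return rows


simulations = {"abc": "bilayer", "def": "mixedbox"}
completed = {"abc": {"thickness"}}
rows = status_rows(simulations, completed)
bilayer_only = status_rows(simulations, completed, simulation_type="bilayer")
```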
func list_artifacts_status(self, simulation_type=None) → pd.DataFrame
List available and completed artifacts across all simulations.
Shows which artifacts CAN be run for each simulation type (from ARTIFACT_REGISTRY) and which HAVE been run (from AnalysisRegistry).
param self
param simulation_type: str | None = None - If specified, filter to only this simulation type
Returns
pandas.DataFrame - Long-form DataFrame with columns:
- hash: str - Simulation hash
- simulation_type: str - Type (bilayer, mixedbox)
- artifact_name: str - Name of artifact
- status: str - "not yet run" or "completed"
func run_artifacts_batch(self, artifact_names=None, simulation_type=None, skip_existing=True, output_prefix=None, **kwargs) → pd.DataFrame
Run artifacts across multiple simulations in batch mode.
Executes artifact producers for all (or filtered) simulations, with options to skip already-completed artifacts and handle errors gracefully.
param self
param artifact_names: list[str] | None = None - Specific artifacts to run. If None, run all registered artifacts for each simulation's type.
param simulation_type: str | None = None - If specified, only run for simulations of this type
param skip_existing: bool = True - If True, skip artifacts that have already been run (default: True)
param output_prefix: str | None = None
param kwargs = {}
Returns
pandas.DataFrame - Summary DataFrame with columns:
- hash: str - Simulation hash
- simulation_type: str - Type (bilayer, mixedbox)
- artifact_name: str - Name of artifact
- status: str - "success", "skipped", or "failed"
- error: str | None - Error message if failed
- files: int | None - Number of files produced (if successful)
- duration_seconds: float | None - Execution time
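A summary with these columns is typically inspected by tallying statuses and pulling out the failures. The sketch below does that on hand-written rows; the artifact names and error text are hypothetical, and plain dicts stand in for DataFrame rows.

```python
from collections import Counter

# Hypothetical summary rows shaped like the documented columns.
summary = [
    {"hash": "abc", "artifact_name": "snapshot", "status": "success", "error": None},
    {"hash": "abc", "artifact_name": "density_map", "status": "skipped", "error": None},
    {"hash": "def", "artifact_name": "snapshot", "status": "failed", "error": "missing trajectory"},
]

# Count how many rows ended in each status.
status_counts = Counter(row["status"] for row in summary)

# Keep only the failed rows, with enough context to retry them.
failures = [
    (row["hash"], row["artifact_name"], row["error"])
    for row in summary
    if row["status"] == "failed"
]
```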
func remove_all_artifacts(self, simulation_type=None) → pd.DataFrame
Remove all artifacts across simulations.
param self
param simulation_type: str | None = None - If specified, only remove artifacts for simulations of this type
Returns
pandas.DataFrame - Summary DataFrame with columns:
- hash: str - Simulation hash
- simulation_type: str - Type (bilayer, mixedbox)
- status: str - "success" or "failed"
- error: str | None - Error message if failed
func run_analyses_batch(self, analysis_names=None, simulation_type=None, skip_existing=True, **kwargs) → pd.DataFrame
Run analyses across multiple simulations in batch mode.
Executes analyses for all (or filtered) simulations, with options to skip already-completed analyses and handle errors gracefully.
param self
param analysis_names: list[str] | None = None - Specific analyses to run. If None, run all registered analyses for each simulation's type.
param simulation_type: str | None = None - If specified, only run for simulations of this type
param skip_existing: bool = True - If True, skip analyses that have already been run (default: True)
param kwargs = {}
Returns
pandas.DataFrame - Summary DataFrame with columns:
- hash: str - Simulation hash
- simulation_type: str - Type (bilayer, mixedbox)
- analysis_name: str - Name of analysis
- status: str - "success", "skipped", or "failed"
- error: str | None - Error message if failed
- rows: int | None - Number of rows in result (if successful)
- duration_seconds: float | None - Execution time
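The batch behavior described above (skip what is done, capture errors instead of aborting, record timing) can be sketched as a simple loop. This is an illustration under assumptions, not the library's code: `run_batch`, `fake_thickness`, `fake_rdf`, and the `completed` mapping are hypothetical, and whether skipped rows carry a duration in the real summary is not stated in this reference.

```python
import time


def run_batch(simulations, analyses, completed, skip_existing=True):
    """Run each analysis per simulation and record one summary row per attempt."""
    summary = []
    for sim_hash in simulations:
        for name, func in analyses.items():
            row = {"hash": sim_hash, "analysis_name": name, "status": None,
                   "error": None, "rows": None, "duration_seconds": None}
            if skip_existing and name in completed.get(sim_hash, set()):
                row["status"] = "skipped"
            else:
                start = time.perf_counter()
                try:
                    result = func(sim_hash)
                    row["status"] = "success"
                    row["rows"] = len(result)
                except Exception as exc:  # record the failure and keep going
                    row["status"] = "failed"
                    row["error"] = str(exc)
                row["duration_seconds"] = time.perf_counter() - start
            summary.append(row)
    return summary


def fake_thickness(sim_hash):
    # Hypothetical analysis that succeeds with three result rows.
    return [1.0, 2.0, 3.0]


def fake_rdf(sim_hash):
    # Hypothetical analysis that always fails.
    raise ValueError("no trajectory")


summary = run_batch(["abc", "def"], {"thickness": fake_thickness, "rdf": fake_rdf},
                    completed={"abc": {"thickness"}})
statuses = [(r["hash"], r["analysis_name"], r["status"]) for r in summary]
```

One failing analysis does not stop the loop in this sketch, so every (simulation, analysis) pair still yields a row in the summary.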
func _ensure_discovered(self) → None
Ensure discovery has been run at least once.
param self
Returns
None
func build_lnp_chemistry_table(self) → pd.DataFrame
Build table of LNP chemistry for all discovered simulations.
Extracts HL, CHL, and IL (ILN+ILP combined) counts, fractions, and SMILES for each simulation.
param self
Returns
pandas.DataFrame - DataFrame with columns:
- hash, path
- HL_count, HL_fraction, HL_smiles
- CHL_count, CHL_fraction, CHL_smiles
- IL_count, IL_fraction, ILN_smiles, ILP_smiles
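The IL columns combine the ILN and ILP species into one count and one fraction. A minimal sketch of that grouping follows; `lnp_chemistry_row` is a hypothetical helper, SMILES columns are omitted, and the choice of denominator (the three LNP groups only) is an assumption that the real method may not share.

```python
def lnp_chemistry_row(species_counts):
    """Group raw per-resname counts into HL, CHL, and IL (ILN + ILP)."""
    grouped = {
        "HL": species_counts.get("HL", 0),
        "CHL": species_counts.get("CHL", 0),
        "IL": species_counts.get("ILN", 0) + species_counts.get("ILP", 0),
    }
    total = sum(grouped.values())
    row = {}
    for name, count in grouped.items():
        row[f"{name}_count"] = count
        # Assumption: fractions are taken over the three LNP groups only.
        row[f"{name}_fraction"] = count / total if total else 0.0
    return row


row = lnp_chemistry_row({"HL": 40, "CHL": 20, "ILN": 25, "ILP": 15})
```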
func load_analysis_with_lnp_chemistry(self, analysis_name, missing_ok=False) → pd.DataFrame
Load analysis data joined with LNP chemistry metadata.
Loads the specified analysis from all simulations and joins with LNP chemistry (HL, CHL, IL counts/fractions/SMILES).
param self
param analysis_name: str - Name of analysis to load
param missing_ok: bool = False - If True, skip simulations without this analysis; if False, raise error
Returns
pandas.DataFrame - DataFrame with analysis data joined with LNP chemistry. Includes 'hash' column for tracking source simulation.
func build_all_species_table(self) → pd.DataFrame
Build table of all species for all discovered simulations.
Extracts count, fraction, and SMILES for every species defined in each simulation's YAML file. No filtering or grouping.
param self
Returns
pandas.DataFrame - DataFrame with columns:
- hash, path
- {resname}_count, {resname}_fraction, {resname}_smiles (per species)
- total_species_count, total_molecule_count
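The `{resname}_*` column pattern produces a wide row whose width depends on the species present. The sketch below shows one way such a row could be built; `all_species_row` and the input layout are hypothetical, the residue names are illustrative, and taking fractions over the total molecule count is an assumption.

```python
def all_species_row(species):
    """Flatten {resname: {"count": int, "smiles": str}} into one wide row."""
    total = sum(entry["count"] for entry in species.values())
    row = {}
    for resname, entry in species.items():
        row[f"{resname}_count"] = entry["count"]
        # Assumption: fraction is relative to the total molecule count.
        row[f"{resname}_fraction"] = entry["count"] / total if total else 0.0
        row[f"{resname}_smiles"] = entry["smiles"]
    row["total_species_count"] = len(species)
    row["total_molecule_count"] = total
    return row


# Hypothetical species entries: water ("O") and ethanol ("CCO").
species = {"SOL": {"count": 900, "smiles": "O"}, "ETH": {"count": 100, "smiles": "CCO"}}
row = all_species_row(species)
```

Simulations with different species sets would yield rows with different columns, so a table built from such rows would contain missing values wherever a species is absent.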
func load_analysis_with_all_species(self, analysis_name, missing_ok=False) → pd.DataFrame
Load analysis data joined with all species metadata.
Loads the specified analysis from all simulations and joins with all species data (count, fraction, SMILES for every species).
param self
param analysis_name: str - Name of analysis to load
param missing_ok: bool = False - If True, skip simulations without this analysis; if False, raise error
Returns
pandas.DataFrame - DataFrame with analysis data joined with all species. Includes 'hash' column for tracking source simulation.
func build_chemistry_table(self, mode='all', species_groups=None) → pd.DataFrame
Build chemistry table with configurable extraction mode.
param self
param mode: str = 'all' - Extraction mode:
- "all": Extract all species from YAML (default)
- "lnp": Use LNP-specific grouping (HL, CHL, IL)
- "custom": Use custom species_groups
param species_groups: dict[str, list[str]] | None = None - Required when mode="custom". Mapping of group names to resnames.
Returns
pandas.DataFrame - DataFrame with hash, path, and chemistry columns.
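The three modes can be read as three ways of choosing a group-to-resnames mapping. The sketch below illustrates that dispatch under assumptions: `resolve_groups` and `LNP_GROUPS` are hypothetical names, and `ValueError` is an assumed exception type for a missing `species_groups` or an unknown mode.

```python
# Hypothetical mapping that mirrors the documented LNP grouping (IL = ILN + ILP).
LNP_GROUPS = {"HL": ["HL"], "CHL": ["CHL"], "IL": ["ILN", "ILP"]}


def resolve_groups(resnames, mode="all", species_groups=None):
    """Return the group -> resnames mapping implied by the extraction mode."""
    if mode == "all":
        # Every species forms its own group.
        return {name: [name] for name in resnames}
    if mode == "lnp":
        return LNP_GROUPS
    if mode == "custom":
        if not species_groups:
            # Assumed error type; the real method may behave differently.
            raise ValueError('species_groups is required when mode="custom"')
        return species_groups
    raise ValueError(f"unknown mode: {mode!r}")


all_groups = resolve_groups(["HL", "SOL"])
lnp_groups = resolve_groups(["HL", "SOL"], mode="lnp")
custom_groups = resolve_groups(
    ["HL", "SOL"], mode="custom", species_groups={"lipids": ["HL", "CHL"]}
)
```

With a real store, the custom case would correspond to a call such as `store.build_chemistry_table(mode="custom", species_groups={"lipids": ["HL", "CHL"]})`.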
func load_analysis_with_chemistry(self, analysis_name, mode='all', species_groups=None, missing_ok=False) → pd.DataFrame
Load analysis data joined with chemistry metadata.
param self
param analysis_name: str - Name of analysis to load
param mode: str = 'all' - Extraction mode: "all", "lnp", or "custom"
param species_groups: dict[str, list[str]] | None = None - Required when mode="custom"
param missing_ok: bool = False - If True, skip simulations without this analysis
Returns
pandas.DataFrame - DataFrame with analysis data joined with chemistry.
