utils
Shared analysis utilities for simulation discovery and metadata extraction.
attribute LNP_SPECIES_GROUPS = {'HL': ['HL'], 'CHL': ['CHL'], 'IL': ['ILN', 'ILP']}
attribute extract_lnp_chemistry = make_chemistry_extractor(LNP_SPECIES_GROUPS)
func discover_simulations(base_dir, trajectory_file='prod.xtc', structure_file='system.pdb', min_status=None) → pd.DataFrame
Discover simulation directories and create Simulation instances.
Scans a base directory for subdirectories containing simulation files (structure and YAML BuildInput) and creates Simulation instances for each discovered simulation.
param base_dir (Path | str): Base directory to scan for simulations
param trajectory_file (str, default 'prod.xtc'): Name of the trajectory file
param structure_file (str, default 'system.pdb'): Name of the structure file
param min_status (str | None, default None): Minimum status to include. One of: "build", "equilibrated", "production", "completed". If None, defaults to "production" for backward compatibility (requires the trajectory to exist).
Returns
pandas.DataFrame: DataFrame with columns ['hash', 'path', 'simulation', 'status'], where:
- hash: Simulation hash (primary identifier)
- path: Absolute path to simulation directory
- simulation: Simulation instance
- status: Simulation status string
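The min_status filter can be pictured as a threshold on an ordered list of statuses. The sketch below is illustrative only: the ordering is inferred from the order in which the allowed values are documented, and STATUS_ORDER, meets_min_status, and filter_rows are hypothetical names, not part of this module.

```python
# Assumption: statuses are ordered as they are listed in the min_status
# documentation. The library may rank them differently.
STATUS_ORDER = ["build", "equilibrated", "production", "completed"]


def meets_min_status(status, min_status=None):
    """Return True if `status` is at or beyond `min_status`."""
    # None falls back to "production", as documented for min_status.
    threshold = "production" if min_status is None else min_status
    return STATUS_ORDER.index(status) >= STATUS_ORDER.index(threshold)


def filter_rows(rows, min_status=None):
    """Keep only the rows whose 'status' passes the threshold."""
    return [row for row in rows if meets_min_status(row["status"], min_status)]


# Hypothetical discovery results, reduced to the columns needed here.
rows = [
    {"hash": "a1", "status": "build"},
    {"hash": "b2", "status": "production"},
    {"hash": "c3", "status": "completed"},
]
kept_default = [row["hash"] for row in filter_rows(rows)]
kept_all = [row["hash"] for row in filter_rows(rows, min_status="build")]
```

Under this assumed ordering, the default threshold drops simulations that have only been built, while min_status="build" keeps every discovered directory.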
func flatten_species_composition(build_input, prefix='') → dict[str, Any]
Flatten species composition into a flat dictionary.
Extracts species counts and fractions as separate columns with naming convention: {prefix}{resname}_{metric}
param build_input (BuildInput): BuildInput instance to extract metadata from
param prefix (str, default ''): Optional prefix for column names
Returns
dict: Flattened dict with keys:
- simulation_type: str
- total_count: int
- {prefix}{resname}_count: int
- {prefix}{resname}_fraction: float
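A minimal sketch of the {prefix}{resname}_{metric} naming convention follows. It takes a plain mapping of resname to count as a stand-in for BuildInput, whose real attributes are not documented here; flatten_counts is a hypothetical helper, and computing the fraction as count / total is an assumption.

```python
def flatten_counts(counts, simulation_type, prefix=""):
    """Flatten {resname: count} into prefixed count and fraction columns."""
    total = sum(counts.values())
    flat = {"simulation_type": simulation_type, "total_count": total}
    for resname, count in counts.items():
        flat[f"{prefix}{resname}_count"] = count
        # Assumption: fraction is the share of the total molecule count.
        flat[f"{prefix}{resname}_fraction"] = count / total if total else 0.0
    return flat


# Hypothetical composition: 60 HL and 40 CHL molecules in a bilayer.
flat = flatten_counts({"HL": 60, "CHL": 40}, "bilayer", prefix="comp_")
```

Each resname contributes two columns, so the result can serve directly as one row of a wide-format table.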
func flatten_system_parameters(build_input) → dict[str, Any]
Flatten system-specific parameters (z_padding, target_density, etc.).
Useful for analyses focusing on system configuration rather than composition.
param build_input (BuildInput): BuildInput instance to extract metadata from
Returns
dict: Flattened dict with system-specific parameters
func make_chemistry_extractor(species_groups) → Callable
Create a flatten function that extracts chemistry based on species groupings.
Allows flexible mapping of resnames to output groups, with support for merging multiple resnames into one group (e.g., protonation states).
param species_groups (dict[str, list[str]]): Mapping of output group name to list of resnames to include. Single-item lists extract that resname directly. Multi-item lists merge those resnames (summing counts/fractions).
Example
    {
        "HL": ["HL"],           # single resname
        "CHL": ["CHL"],         # single resname
        "IL": ["ILN", "ILP"],   # merged group
    }
Returns
callable: A function(build_input) -> dict that extracts chemistry.
Output columns for single-resname groups:
- {group}_count, {group}_fraction, {group}_smiles
Output columns for merged groups:
- {group}_count, {group}_fraction (summed totals)
- {resname}_count, {resname}_fraction, {resname}_smiles (per member)
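The closure pattern and the two output layouts described above can be sketched as follows. This is not the library's implementation: make_group_extractor is a hypothetical stand-in that reads a plain dict of per-species records instead of a BuildInput, and the SMILES strings are placeholders rather than real lipid structures.

```python
def make_group_extractor(species_groups):
    """Build a function that maps per-species records to grouped columns."""

    def extract(species):
        # `species` is a hypothetical stand-in for BuildInput:
        # {resname: {"count": int, "fraction": float, "smiles": str | None}}
        out = {}
        for group, resnames in species_groups.items():
            members = [name for name in resnames if name in species]
            out[f"{group}_count"] = sum(species[n]["count"] for n in members)
            out[f"{group}_fraction"] = sum(species[n]["fraction"] for n in members)
            if len(resnames) == 1:
                # Single-resname group: the SMILES belongs to the group itself.
                record = species.get(resnames[0], {})
                out[f"{group}_smiles"] = record.get("smiles")
            else:
                # Merged group: keep per-member columns next to the totals.
                for n in members:
                    out[f"{n}_count"] = species[n]["count"]
                    out[f"{n}_fraction"] = species[n]["fraction"]
                    out[f"{n}_smiles"] = species[n].get("smiles")
        return out

    return extract


extract = make_group_extractor({"HL": ["HL"], "CHL": ["CHL"], "IL": ["ILN", "ILP"]})
columns = extract({
    "HL": {"count": 50, "fraction": 0.5, "smiles": "CCO"},     # placeholder SMILES
    "CHL": {"count": 30, "fraction": 0.3, "smiles": "CCC"},    # placeholder SMILES
    "ILN": {"count": 15, "fraction": 0.15, "smiles": "CCN"},   # placeholder SMILES
    "ILP": {"count": 5, "fraction": 0.05, "smiles": "CC[NH3+]"},
})
```

In this sketch a merged group gets no {group}_smiles column, since a sum over several molecules has no single SMILES string; the per-member columns carry that information instead.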
func extract_all_species(build_input) → dict[str, Any]
Extract all species data from BuildInput without any grouping or filtering.
Automatically extracts count, fraction, and SMILES for every species defined in the YAML file. No configuration needed.
param build_input (BuildInput): BuildInput instance to extract chemistry from
Returns
dict: Dict with keys for each species resname:
- {resname}_count: int
- {resname}_fraction: float
- {resname}_smiles: str | None
- total_species_count: int (number of species types)
- total_molecule_count: int (sum of all counts)
func get_chemistry_extractor(mode='all', species_groups=None) → Callable
Get a chemistry extractor function based on mode.
Convenience function for selecting between extraction modes.
param mode (str, default 'all'): Extraction mode:
- "all": Extract all species from YAML (no filtering/grouping)
- "lnp": Use LNP-specific grouping (HL, CHL, IL with ILN+ILP merged)
- "custom": Use custom species_groups (requires species_groups param)
param species_groups (dict[str, list[str]] | None, default None): Required when mode="custom". Mapping of group names to resnames.
Returns
callable: A function(build_input) -> dict that extracts chemistry.
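The mode selection can be sketched as a small dispatch over the three documented values. resolve_groups is a hypothetical helper that returns the grouping a mode implies rather than an extractor, and raising ValueError for a missing species_groups or an unknown mode is an assumption; the documentation does not say how the real function reports those cases.

```python
LNP_SPECIES_GROUPS = {"HL": ["HL"], "CHL": ["CHL"], "IL": ["ILN", "ILP"]}


def resolve_groups(mode="all", species_groups=None):
    """Return the species grouping implied by `mode`, or None for no grouping."""
    if mode == "all":
        # No filtering or grouping: every species is extracted as-is.
        return None
    if mode == "lnp":
        return LNP_SPECIES_GROUPS
    if mode == "custom":
        if species_groups is None:
            # Assumed error type; the real function may behave differently.
            raise ValueError('mode="custom" requires species_groups')
        return species_groups
    raise ValueError(f"unknown mode: {mode!r}")


lnp_groups = resolve_groups("lnp")
custom_groups = resolve_groups("custom", {"LIPID": ["HL", "CHL"]})
```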
func system_chemistry(simulation, **kwargs) → pd.DataFrame
Extract species composition as a long-format DataFrame.
One row per species in the simulation. Reads only from BuildInput metadata (no trajectory data required).
param simulation (Simulation): Simulation instance (only build_input is accessed)
param kwargs (default {})
Returns
pandas.DataFrame: DataFrame with columns:
- resname: str - Residue name
- smiles: str | None - SMILES string (None for base Species)
- count: int - Molecule count
- fraction: float - Mole fraction
- simulation_type: str - e.g. "bilayer", "mixedbox"
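To illustrate the long format, the sketch below builds one record per species as a plain list of dicts; such a list could then be passed to pandas.DataFrame. species_rows is a hypothetical helper, the input dict is a stand-in for the build_input metadata, and deriving the fraction from the counts is an assumption.

```python
def species_rows(species, simulation_type):
    """Build one long-format record per species."""
    total = sum(entry["count"] for entry in species.values())
    return [
        {
            "resname": resname,
            "smiles": entry.get("smiles"),  # None when no SMILES is given
            "count": entry["count"],
            # Assumption: fraction is count divided by the total count.
            "fraction": entry["count"] / total if total else 0.0,
            "simulation_type": simulation_type,
        }
        for resname, entry in species.items()
    ]


# Hypothetical two-species bilayer; "CCO" is a placeholder SMILES.
rows = species_rows(
    {"HL": {"count": 75, "smiles": "CCO"}, "CHL": {"count": 25}},
    "bilayer",
)
```

In contrast to the wide, flattened dicts returned by the extractor functions, adding a species here adds a row rather than new columns.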
