MDFactory

utils

Shared analysis utilities for simulation discovery and metadata extraction.

attribute LNP_SPECIES_GROUPS = {'HL': ['HL'], 'CHL': ['CHL'], 'IL': ['ILN', 'ILP']}
attribute extract_lnp_chemistry = make_chemistry_extractor(LNP_SPECIES_GROUPS)
func discover_simulations(base_dir, trajectory_file='prod.xtc', structure_file='system.pdb', min_status=None) -> pd.DataFrame

Discover simulation directories and create Simulation instances.

Scans a base directory for subdirectories that contain simulation files (a structure file and a YAML BuildInput) and creates a Simulation instance for each one found.

param base_dir: Path | str

Base directory to scan for simulations

param trajectory_file: str = 'prod.xtc'

Name of trajectory file (default: prod.xtc)

param structure_file: str = 'system.pdb'

Name of structure file (default: system.pdb)

param min_status: str | None = None

Minimum status to include. One of: "build", "equilibrated", "production", "completed". If None, defaults to "production" for backward compatibility, which requires the trajectory file to exist.

Returns

pandas.DataFrame

DataFrame with columns ['hash', 'path', 'simulation', 'status'] where:

  • hash: Simulation hash (primary identifier)
  • path: Absolute path to simulation directory
  • simulation: Simulation instance
  • status: Simulation status string
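The min_status filter can be pictured as a threshold over an ordered list of statuses. The following is a minimal stand-alone sketch, not MDFactory's implementation: it assumes the four statuses are ranked in the order they are listed above, and the names `STATUS_ORDER` and `meets_min_status` are hypothetical.

```python
from typing import Optional

# Assumption: statuses progress in the order given in the documentation.
STATUS_ORDER = ["build", "equilibrated", "production", "completed"]


def meets_min_status(status: str, min_status: Optional[str] = None) -> bool:
    """Return True if `status` is at or beyond `min_status`.

    None falls back to "production", mirroring the documented default.
    """
    threshold = "production" if min_status is None else min_status
    return STATUS_ORDER.index(status) >= STATUS_ORDER.index(threshold)


# Illustrative records shaped like rows of the returned DataFrame.
records = [
    {"hash": "a1", "status": "build"},
    {"hash": "b2", "status": "production"},
    {"hash": "c3", "status": "completed"},
]
kept = [r["hash"] for r in records if meets_min_status(r["status"])]
print(kept)  # ['b2', 'c3']
```

With min_status="build", all three illustrative records would pass the filter.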
func flatten_species_composition(build_input, prefix='') -> dict[str, Any]

Flatten species composition into a flat dictionary.

Extracts species counts and fractions as separate columns named {prefix}{resname}_{metric}, where metric is count or fraction.

param build_input: BuildInput

BuildInput instance to extract metadata from

param prefix: str = ''

Optional prefix for column names (default: "")

Returns

dict

Flattened dict with keys:

  • simulation_type: str
  • total_count: int
  • {prefix}{resname}_count: int
  • {prefix}{resname}_fraction: float
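To make the key naming concrete, here is a small sketch of the flattening step. It works on a plain dict of resname to count rather than a real BuildInput, and `flatten_counts` is a hypothetical helper written for illustration only.

```python
from typing import Any


def flatten_counts(counts: dict[str, int], simulation_type: str, prefix: str = "") -> dict[str, Any]:
    """Flatten per-resname counts into {prefix}{resname}_{metric} keys."""
    total = sum(counts.values())
    flat: dict[str, Any] = {"simulation_type": simulation_type, "total_count": total}
    for resname, count in counts.items():
        flat[f"{prefix}{resname}_count"] = count
        # Guard against an empty system to avoid division by zero.
        flat[f"{prefix}{resname}_fraction"] = count / total if total else 0.0
    return flat


row = flatten_counts({"HL": 30, "CHL": 70}, "bilayer", prefix="lipid_")
print(sorted(row))
```

Note that in this sketch, as in the documented key list, the prefix applies only to the per-species keys; simulation_type and total_count stay unprefixed.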
func flatten_system_parameters(build_input) -> dict[str, Any]

Flatten system-specific parameters (z_padding, target_density, etc.).

Useful for analyses focusing on system configuration rather than composition.

param build_input: BuildInput

BuildInput instance to extract metadata from

Returns

dict

Flattened dict with system-specific parameters

func make_chemistry_extractor(species_groups) -> Callable

Create a flatten function that extracts chemistry based on species groupings.

Allows flexible mapping of resnames to output groups, with support for merging multiple resnames into one group (e.g., protonation states).

param species_groups: dict[str, list[str]]

Mapping of output group name to list of resnames to include. Single-item lists extract that resname directly. Multi-item lists merge those resnames (summing counts/fractions).

Example

    {
        "HL": ["HL"],          # single resname
        "CHL": ["CHL"],        # single resname
        "IL": ["ILN", "ILP"],  # merged group
    }

Returns

callable

A function(build_input) -> dict that extracts chemistry.

Output columns for single-resname groups:

  • {group}_count, {group}_fraction, {group}_smiles

Output columns for merged groups:

  • {group}_count, {group}_fraction (summed totals)
  • {resname}_count, {resname}_fraction, {resname}_smiles (per member)
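The single-versus-merged column layout described above can be sketched as a closure over species_groups. This is an illustrative approximation, not the library's code: the `SpeciesTable` stand-in for BuildInput, the name `make_group_extractor`, and the choice to count missing resnames as zero are all assumptions.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-in for the species data held by a BuildInput:
# resname -> (molecule count, SMILES or None).
SpeciesTable = dict[str, tuple[int, Optional[str]]]


def make_group_extractor(
    species_groups: dict[str, list[str]],
) -> Callable[[SpeciesTable], dict[str, Any]]:
    """Build a function that flattens a species table according to species_groups."""

    def extract(species: SpeciesTable) -> dict[str, Any]:
        total = sum(count for count, _ in species.values())
        out: dict[str, Any] = {}

        def add(name: str, count: int, smiles: Optional[str]) -> None:
            out[f"{name}_count"] = count
            out[f"{name}_fraction"] = count / total if total else 0.0
            out[f"{name}_smiles"] = smiles

        for group, resnames in species_groups.items():
            # Assumption: resnames absent from the system contribute a zero count.
            members = [(r, *species.get(r, (0, None))) for r in resnames]
            if len(resnames) == 1:
                # Single-resname group: columns are named after the group.
                add(group, members[0][1], members[0][2])
            else:
                # Merged group: summed totals plus one set of columns per member.
                group_count = sum(count for _, count, _ in members)
                out[f"{group}_count"] = group_count
                out[f"{group}_fraction"] = group_count / total if total else 0.0
                for resname, count, smiles in members:
                    add(resname, count, smiles)
        return out

    return extract


extract = make_group_extractor({"HL": ["HL"], "CHL": ["CHL"], "IL": ["ILN", "ILP"]})
row = extract({"HL": (10, None), "CHL": (40, None), "ILN": (30, None), "ILP": (20, None)})
print(row["IL_count"], row["IL_fraction"])  # 50 0.5
```

A merged group gets no {group}_smiles column in this sketch, matching the documented output, since a summed group has no single SMILES string.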
func extract_all_species(build_input) -> dict[str, Any]

Extract all species data from BuildInput without any grouping or filtering.

Automatically extracts count, fraction, and SMILES for every species defined in the YAML file. No configuration needed.

param build_input: BuildInput

BuildInput instance to extract chemistry from

Returns

dict

Dict with keys for each species resname:

  • {resname}_count: int
  • {resname}_fraction: float
  • {resname}_smiles: str | None
  • total_species_count: int (number of species types)
  • total_molecule_count: int (sum of all counts)
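The two totals are easy to confuse, so a short sketch may help: total_species_count counts distinct resnames, while total_molecule_count sums their counts. The helper `summarize_species` and its plain-dict input are hypothetical stand-ins for the real function and BuildInput.

```python
from typing import Any, Optional


def summarize_species(species: dict[str, tuple[int, Optional[str]]]) -> dict[str, Any]:
    """Emit per-resname columns plus the two totals, with no grouping."""
    total_molecules = sum(count for count, _ in species.values())
    out: dict[str, Any] = {}
    for resname, (count, smiles) in species.items():
        out[f"{resname}_count"] = count
        out[f"{resname}_fraction"] = count / total_molecules if total_molecules else 0.0
        out[f"{resname}_smiles"] = smiles
    out["total_species_count"] = len(species)        # number of species types
    out["total_molecule_count"] = total_molecules    # sum of all counts
    return out


summary = summarize_species({"HL": (25, None), "CHL": (75, None)})
print(summary["total_species_count"], summary["total_molecule_count"])  # 2 100
```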
func get_chemistry_extractor(mode='all', species_groups=None) -> Callable

Get a chemistry extractor function based on mode.

Convenience function for selecting between extraction modes.

param mode: str = 'all'

Extraction mode:

  • "all": Extract all species from YAML (no filtering/grouping)
  • "lnp": Use LNP-specific grouping (HL, CHL, IL with ILN+ILP merged)
  • "custom": Use custom species_groups (requires species_groups param)
param species_groups: dict[str, list[str]] | None = None

Required when mode="custom". Mapping of group names to resnames.

Returns

callable

A function(build_input) -> dict that extracts chemistry.
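The mode selection amounts to a small dispatch. The sketch below only resolves which grouping a mode implies; it is not the library's implementation. The helper name `select_groups` is hypothetical, and raising ValueError for a missing species_groups or an unknown mode is an assumption, since the documentation states the requirement but not how a violation is reported.

```python
from typing import Optional

LNP_SPECIES_GROUPS = {"HL": ["HL"], "CHL": ["CHL"], "IL": ["ILN", "ILP"]}


def select_groups(
    mode: str = "all",
    species_groups: Optional[dict[str, list[str]]] = None,
) -> Optional[dict[str, list[str]]]:
    """Resolve the grouping implied by a mode; None means 'no grouping'."""
    if mode == "all":
        return None
    if mode == "lnp":
        return LNP_SPECIES_GROUPS
    if mode == "custom":
        if species_groups is None:
            # Assumed error type; the docs only say the parameter is required.
            raise ValueError('mode="custom" requires species_groups')
        return species_groups
    raise ValueError(f"unknown mode: {mode!r}")


print(select_groups("lnp"))
```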

func system_chemistry(simulation, **kwargs) -> pd.DataFrame

Extract species composition as a long-format DataFrame.

One row per species in the simulation. Reads only from BuildInput metadata (no trajectory data required).

param simulation: Simulation

Simulation instance (only build_input is accessed)

param kwargs = {}

Returns

pandas.DataFrame

DataFrame with columns:

  • resname: str - Residue name
  • smiles: str | None - SMILES string (None for base Species)
  • count: int - Molecule count
  • fraction: float - Mole fraction
  • simulation_type: str - e.g. "bilayer", "mixedbox"
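In contrast to the flattened (wide) dictionaries returned by the extractors, the long format holds one record per species. A minimal sketch of that shape, using a hypothetical `species_rows` helper and a plain dict in place of a Simulation:

```python
from typing import Any, Optional


def species_rows(
    species: dict[str, tuple[int, Optional[str]]],
    simulation_type: str,
) -> list[dict[str, Any]]:
    """Return one record per species, with the documented column names."""
    total = sum(count for count, _ in species.values())
    return [
        {
            "resname": resname,
            "smiles": smiles,
            "count": count,
            "fraction": count / total if total else 0.0,
            "simulation_type": simulation_type,
        }
        for resname, (count, smiles) in species.items()
    ]


rows = species_rows({"HL": (20, None), "CHL": (60, None), "ILN": (20, None)}, "bilayer")
print(len(rows), rows[0]["resname"], rows[0]["fraction"])  # 3 HL 0.2
```

A list of records like this can be passed directly to the pandas.DataFrame constructor to obtain the tabular form.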