MDFactory
User Guide

CSV Format

Bulk input specification for high-throughput simulation setup

Use CSV files to define multiple simulation systems at once. Each row becomes a separate BuildInput YAML file via mdfactory prepare-build.

Basic structure

simulation_type,parametrization,engine,system.species.SOL.smiles,system.species.SOL.count,system.species.ETH.smiles,system.species.ETH.count,system.target_density
mixedbox,smirnoff,gromacs,O,900,CCO,100,1.0
mixedbox,cgenff,gromacs,O,800,CCO,200,1.0
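Each row is one system, so high-throughput sweeps are easiest to generate programmatically. Below is a minimal sketch that uses Python's standard csv module (not part of MDFactory). The helper `write_sweep` and its parameters are hypothetical. The column names come from the example above, and the first data row it writes matches the first row shown there.

```python
import csv
import io

# Column names follow the dot-notation convention used in the example above.
FIELDNAMES = [
    "simulation_type",
    "parametrization",
    "engine",
    "system.species.SOL.smiles",
    "system.species.SOL.count",
    "system.species.ETH.smiles",
    "system.species.ETH.count",
    "system.target_density",
]


def write_sweep(stream, eth_counts, total=1000, parametrization="smirnoff"):
    """Hypothetical helper: write one mixedbox row per ethanol count."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for eth in eth_counts:
        writer.writerow({
            "simulation_type": "mixedbox",
            "parametrization": parametrization,
            "engine": "gromacs",
            "system.species.SOL.smiles": "O",
            "system.species.SOL.count": total - eth,  # water fills the rest
            "system.species.ETH.smiles": "CCO",
            "system.species.ETH.count": eth,
            "system.target_density": 1.0,
        })


if __name__ == "__main__":
    buf = io.StringIO()
    write_sweep(buf, [100, 200, 300])
    print(buf.getvalue())
```

To write a file instead of a string buffer, pass a handle opened with open("sample_input.csv", "w", newline=""). This is the mode the csv module documentation recommends.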

Required columns

  • simulation_type (mixedbox, bilayer, or lnp)
  • parametrization (smirnoff or cgenff)
  • engine (gromacs)
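You can run a quick pre-flight check on these three columns before invoking MDFactory. The sketch below only mirrors the allowed values listed on this page. It is not MDFactory's validator, and `mdfactory check-csv` remains the authoritative check. The function name `check_required` is hypothetical.

```python
import csv
import io

# Allowed values as listed under "Required columns".
ALLOWED = {
    "simulation_type": {"mixedbox", "bilayer", "lnp"},
    "parametrization": {"smirnoff", "cgenff"},
    "engine": {"gromacs"},
}


def check_required(stream):
    """Hypothetical pre-check: return a list of problems (empty if none)."""
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [col for col in ALLOWED if col not in header]
    if missing:
        return [f"missing required column(s): {', '.join(missing)}"]
    problems = []
    # start=2 because line 1 is the header (assumes no multi-line quoted cells)
    for line_no, row in enumerate(reader, start=2):
        for col, allowed in ALLOWED.items():
            value = (row[col] or "").strip()  # short rows yield None
            if value not in allowed:
                problems.append(
                    f"line {line_no}: {col}={value!r} not in {sorted(allowed)}"
                )
    return problems


if __name__ == "__main__":
    sample = io.StringIO("simulation_type,parametrization,engine\nmixedbox,amber,gromacs\n")
    for problem in check_required(sample):
        print(problem)
```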

Column naming convention

Columns map to the nested BuildInput YAML structure using dot notation:

system.species.{RESNAME}.{property}

For example:

  • system.species.SOL.smiles → the SMILES string for a species with resname SOL
  • system.species.SOL.count → the count for that species
  • system.species.SOL.fraction → the molar fraction for that species
  • system.target_density → the target density for a mixedbox system
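One way to picture the dot-notation mapping is to split each header on "." and nest the cell value under the resulting keys. Below is a minimal sketch. The helper `unflatten_row` is hypothetical, values are left as strings, and skipping blank cells is an assumption. MDFactory's own conversion and type handling may differ.

```python
def unflatten_row(row):
    """Hypothetical helper: {"system.species.SOL.count": "900"} -> nested dicts."""
    nested = {}
    for column, value in row.items():
        if value is None or value == "":
            continue  # assumption: a blank cell means "not set"
        keys = column.split(".")
        node = nested
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create intermediate levels
        node[keys[-1]] = value
    return nested


if __name__ == "__main__":
    row = {
        "simulation_type": "mixedbox",
        "system.species.SOL.smiles": "O",
        "system.species.SOL.count": "900",
        "system.target_density": "1.0",
    }
    print(unflatten_row(row))
```

This sketch keys species by resname (SOL, ETH). The actual BuildInput YAML may lay out species differently.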

Species specification

Define molecules using the pattern system.species.{RESNAME}.{property}:

Property   Type      Description
smiles     string    SMILES structure
count      integer   Absolute molecule count
fraction   float     Molar fraction (0-1)

Either count or fraction must be provided for each species. If using fractions, system.total_count specifies the total number of molecules.
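For a mixedbox row that combines counts and fractions, the rule above can be illustrated with a small resolver. This is a sketch under stated assumptions, not MDFactory's algorithm. A species with a `count` keeps it. A species with a `fraction` gets `round(fraction * total_count)`. A species with neither raises an error. This page does not document how MDFactory rounds, or how it handles a species that specifies both. The function name `resolve_counts` is hypothetical.

```python
def _present(value):
    """Treat missing and blank cells alike."""
    return value not in (None, "")


def resolve_counts(species, total_count=None):
    """Hypothetical resolver: resname -> molecule count.

    Assumptions: fractions use round(fraction * total_count), and a
    species that gives both count and fraction keeps its count.
    """
    counts = {}
    for resname, spec in species.items():
        if _present(spec.get("count")):
            counts[resname] = int(spec["count"])
        elif _present(spec.get("fraction")):
            if not _present(total_count):
                raise ValueError(f"{resname}: fraction given but system.total_count is missing")
            fraction = float(spec["fraction"])
            if not 0.0 <= fraction <= 1.0:
                raise ValueError(f"{resname}: fraction {fraction} is outside 0-1")
            counts[resname] = round(fraction * int(total_count))
        else:
            raise ValueError(f"{resname}: provide either count or fraction")
    return counts


if __name__ == "__main__":
    # Values from the mixed-box example: SOL by fraction, ETH by count.
    species = {
        "SOL": {"smiles": "O", "fraction": "0.95"},
        "ETH": {"smiles": "CCO", "count": "500"},
    }
    print(resolve_counts(species, total_count="10000"))
```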

Example: Mixed box with multiple components

simulation_type,parametrization,engine,system.species.SOL.smiles,system.species.SOL.fraction,system.species.ETH.smiles,system.species.ETH.count,system.total_count,system.target_density
mixedbox,smirnoff,gromacs,O,0.95,CCO,500,10000,1.0

Example: Bilayer

simulation_type,parametrization,engine,system.species.POPC.smiles,system.species.POPC.count,system.z_padding
bilayer,cgenff,gromacs,POPC_SMILES_HERE,128,20.0

Example: LNP

For LNP systems, core and shell species use separate column prefixes:

simulation_type,parametrization,engine,system.radius,system.shell_thickness,system.core.species.ILN.smiles,system.core.species.ILN.fraction,system.shell.species.DSP.smiles,system.shell.species.DSP.fraction
lnp,smirnoff,gromacs,60.0,28.0,ILN_SMILES,0.5,DSP_SMILES,0.5
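Core and shell species differ only in their column prefix, so you can separate them by matching on `system.core.species.` and `system.shell.species.`. The sketch below lists the resnames per region from a header row. The function name `species_by_region` is hypothetical.

```python
import csv
import io

REGION_PREFIXES = {
    "core": "system.core.species.",
    "shell": "system.shell.species.",
}


def species_by_region(header):
    """Hypothetical helper: {"core": [resnames], "shell": [resnames]} in column order."""
    regions = {name: [] for name in REGION_PREFIXES}
    for column in header:
        for name, prefix in REGION_PREFIXES.items():
            if column.startswith(prefix):
                # "system.core.species.ILN.smiles" -> "ILN"
                resname = column[len(prefix):].split(".", 1)[0]
                if resname not in regions[name]:
                    regions[name].append(resname)
    return regions


if __name__ == "__main__":
    header_line = (
        "simulation_type,parametrization,engine,system.radius,"
        "system.shell_thickness,system.core.species.ILN.smiles,"
        "system.core.species.ILN.fraction,system.shell.species.DSP.smiles,"
        "system.shell.species.DSP.fraction"
    )
    header = next(csv.reader(io.StringIO(header_line)))
    print(species_by_region(header))  # {'core': ['ILN'], 'shell': ['DSP']}
```

Note that system.shell_thickness is not picked up, because it lacks the full `system.shell.species.` prefix.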

Processing the CSV

Convert CSV to individual system directories:

mdfactory prepare-build sample_input.csv output_systems

Each row generates a directory named by the system's hash, containing the BuildInput YAML file.
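To collect the generated inputs afterwards (for example, to queue builds), you can walk the output directory. The sketch assumes only the layout described above: one subdirectory per row. Because this page does not give the YAML file name, it matches any `*.yaml` or `*.yml` file. The function `find_build_inputs` and the demo names `abc123` and `build.yaml` are hypothetical.

```python
import tempfile
from pathlib import Path


def find_build_inputs(output_dir):
    """Hypothetical helper: {system_dir_name: [yaml paths]} for each row directory."""
    found = {}
    for system_dir in sorted(Path(output_dir).iterdir()):
        if not system_dir.is_dir():
            continue  # skip stray files next to the system directories
        # File name is undocumented here, so accept both common YAML extensions.
        yamls = sorted(system_dir.glob("*.yaml")) + sorted(system_dir.glob("*.yml"))
        if yamls:
            found[system_dir.name] = yamls
    return found


if __name__ == "__main__":
    # Demo on a throwaway layout; point this at "output_systems" in practice.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "abc123").mkdir()
        (Path(tmp) / "abc123" / "build.yaml").write_text("engine: gromacs\n")
        print({k: [p.name for p in v] for k, v in find_build_inputs(tmp).items()})
```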

Validation

Check that a CSV is valid without building:

mdfactory check-csv sample_input.csv

This parses every row into a BuildInput model and reports validation errors.
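In a scripted pipeline, you can chain the two commands so that no build inputs are written when validation fails. Below is a minimal sketch using Python's subprocess module. It assumes `mdfactory` is on PATH and that `check-csv` exits with a nonzero status when a row is invalid, which this page does not state explicitly. The function `validate_and_prepare` is hypothetical, and its `runner` parameter exists only so the sketch can be exercised without MDFactory installed.

```python
import subprocess


def validate_and_prepare(csv_path, output_dir, runner=subprocess.run):
    """Hypothetical wrapper: run check-csv, then prepare-build only if it passed."""
    # check=True raises CalledProcessError on a nonzero exit status, so
    # prepare-build is skipped when validation fails (assumed exit behavior).
    runner(["mdfactory", "check-csv", str(csv_path)], check=True)
    runner(["mdfactory", "prepare-build", str(csv_path), str(output_dir)], check=True)
```

Typical use would be validate_and_prepare("sample_input.csv", "output_systems"), which mirrors the two commands shown above.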


Column reference

Top-level settings

Column            Type     Required   Description
simulation_type   string   Yes        mixedbox, bilayer, or lnp
parametrization   string   Yes        smirnoff or cgenff
engine            string   Yes        gromacs

System fields

These map directly to BuildInput.system fields using dot notation:

Column                        Type      Description
system.total_count            integer   Total molecule count (for use with fractions)
system.target_density         float     Target density in g/cm³ (mixedbox)
system.z_padding              float     Water padding in Å (bilayer)
system.monolayer              bool      Build a monolayer (bilayer)
system.radius                 float     LNP radius in Å
system.shell_thickness        float     Shell thickness in Å (LNP)
system.padding                float     Water padding in Å (LNP)
system.core.target_density    float     Core density in g/cm³ (LNP)
system.shell.z0               float     Pivotal plane offset in Å (LNP)
system.shell.area_per_lipid   float     Area per lipid in Å² (LNP)
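CSV cells are read as text, so each value must be converted to the type given in the tables above. The sketch below coerces the system fields from this table and the species properties from the species table. The accepted boolean spellings (true/false, 1/0, yes/no) are an assumption that this page does not specify, and MDFactory's BuildInput model may apply its own, different rules. The function `coerce` is hypothetical.

```python
# Types taken from the "System fields" table.
SYSTEM_FIELD_TYPES = {
    "system.total_count": int,
    "system.target_density": float,
    "system.z_padding": float,
    "system.monolayer": bool,
    "system.radius": float,
    "system.shell_thickness": float,
    "system.padding": float,
    "system.core.target_density": float,
    "system.shell.z0": float,
    "system.shell.area_per_lipid": float,
}
# Types taken from the species property table.
SPECIES_PROPERTY_TYPES = {"smiles": str, "count": int, "fraction": float}

# Assumed boolean spellings; not specified by the MDFactory docs.
TRUE_WORDS = {"true", "1", "yes"}
FALSE_WORDS = {"false", "0", "no"}


def coerce(column, text):
    """Hypothetical helper: convert one CSV cell to its documented type."""
    target = SYSTEM_FIELD_TYPES.get(column)
    if target is None and ".species." in column:
        # e.g. "system.core.species.ILN.fraction" -> "fraction"
        target = SPECIES_PROPERTY_TYPES.get(column.rsplit(".", 1)[1])
    if target is None or target is str:
        return text  # unknown column or string field: leave unchanged
    text = text.strip()
    if target is bool:
        lowered = text.lower()
        if lowered in TRUE_WORDS:
            return True
        if lowered in FALSE_WORDS:
            return False
        raise ValueError(f"{column}: cannot read {text!r} as a boolean")
    return target(text)  # int("10000") or float("20.0")
```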

Parametrization config (optional)

Column                                 Type     Description
parametrization_config.forcefield      string   OpenFF force field file (SMIRNOFF)
parametrization_config.water_model     string   Water model file (SMIRNOFF)
parametrization_config.charge_method   string   Charge assignment method (SMIRNOFF)

Invalid SMILES strings will cause parametrization to fail. Use mdfactory check-csv to validate before building.
