YAML Format
Complete guide to the BuildInput YAML schema used by mdfactory build
This page documents the YAML input format used by:
```bash
mdfactory build <input.yaml> <output_dir>
```

The YAML is parsed into the `BuildInput` model, which combines:

- top-level run settings (`simulation_type`, `engine`, `parametrization`)
- the `system` composition (varies by simulation type)
- an optional `parametrization_config`
Top-level schema
| Key | Required | Type | Notes |
|---|---|---|---|
| `simulation_type` | Yes | string | One of `mixedbox`, `bilayer`, `lnp` |
| `system` | Yes | object | Shape depends on `simulation_type` |
| `parametrization` | No | string | `cgenff` or `smirnoff` (default: `cgenff`) |
| `parametrization_config` | No | object | If omitted, defaults are injected based on `parametrization` |
| `engine` | No | string | Currently only `gromacs` (default: `gromacs`) |
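
The table translates into a small check-and-default step. Below is a minimal sketch of that step on an already-loaded dict; `normalize_top_level` is a hypothetical helper, not part of mdfactory, and it covers only the rules in the table (it leaves `parametrization_config` untouched).

```python
from typing import Any

# Allowed values and defaults transcribed from the top-level schema table.
SIMULATION_TYPES = {"mixedbox", "bilayer", "lnp"}
PARAMETRIZATIONS = {"cgenff", "smirnoff"}
ENGINES = {"gromacs"}


def normalize_top_level(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: check required keys and fill documented defaults."""
    for key in ("simulation_type", "system"):
        if key not in raw:
            raise ValueError(f"missing required key: {key}")
    if raw["simulation_type"] not in SIMULATION_TYPES:
        raise ValueError(f"unknown simulation_type: {raw['simulation_type']!r}")
    if not isinstance(raw["system"], dict):
        raise ValueError("system must be a mapping")

    out = dict(raw)  # shallow copy so the caller's dict is not modified
    out.setdefault("parametrization", "cgenff")
    out.setdefault("engine", "gromacs")
    if out["parametrization"] not in PARAMETRIZATIONS:
        raise ValueError(f"unknown parametrization: {out['parametrization']!r}")
    if out["engine"] not in ENGINES:
        raise ValueError(f"unsupported engine: {out['engine']!r}")
    return out


doc = normalize_top_level({"simulation_type": "mixedbox", "system": {}})
print(doc["parametrization"], doc["engine"])  # cgenff gromacs
```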
How `system` works
The `system` block is type-dispatched from `simulation_type`:

- `mixedbox` -> `MixedBoxComposition`
- `bilayer` -> `BilayerComposition`
- `lnp` -> `LNPComposition`

Each type has different required fields and validation rules.
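
A minimal sketch of this dispatch: a lookup table from `simulation_type` to a model class. The dataclasses here are hypothetical stand-ins for mdfactory's models, carrying only the fields and defaults documented on this page; the real classes have more fields and stricter validation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MixedBoxComposition:
    species: list
    total_count: Optional[int] = None
    target_density: float = 1.0
    ionization: Optional[dict] = None


@dataclass
class BilayerComposition:
    species: list
    z_padding: float = 20.0
    monolayer: bool = False
    ionization: Optional[dict] = None


@dataclass
class LNPComposition:
    radius: float
    core: dict
    shell: dict
    shell_thickness: float = 28.0
    padding: float = 25.0
    ionization: Optional[dict] = None


# simulation_type -> system model
SYSTEM_MODELS = {
    "mixedbox": MixedBoxComposition,
    "bilayer": BilayerComposition,
    "lnp": LNPComposition,
}


def parse_system(simulation_type: str, system: dict[str, Any]):
    """Hypothetical sketch: pick the model class, then build it from the block."""
    try:
        model = SYSTEM_MODELS[simulation_type]
    except KeyError:
        raise ValueError(f"unknown simulation_type: {simulation_type!r}") from None
    # Missing required fields or unknown keys raise TypeError from __init__.
    return model(**system)


print(parse_system("bilayer", {"species": []}))
```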
Species block basics
Most system types include species entries with this pattern:
```yaml
system:
  species:
    - smiles: "CCO"
      resname: ETH
      count: 100
```

Species-level rules:

- `resname` is required and uppercased automatically.
- `resname` max length is 3 characters.
- `smiles` is canonicalized and validated; ambiguous stereochemistry is rejected.
- Each species must include `count` or `fraction`.
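
The species-level rules can be sketched as a per-entry check. `validate_species` is a hypothetical helper; SMILES canonicalization and stereochemistry checks need a cheminformatics toolkit and are deliberately left out.

```python
from typing import Any


def validate_species(entry: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch of the species-level rules (SMILES checks omitted)."""
    out = dict(entry)
    resname = out.get("resname")
    if not resname:
        raise ValueError("resname is required")
    out["resname"] = str(resname).upper()  # uppercased automatically
    if len(out["resname"]) > 3:
        raise ValueError(f"resname {out['resname']!r} is longer than 3 characters")
    if "count" not in out and "fraction" not in out:
        raise ValueError("each species must include count or fraction")
    return out


print(validate_species({"smiles": "CCO", "resname": "eth", "count": 100}))
```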
Composition-level rules:

- Use either counts for all species or fractions for all species.
- If fractions are used, they must sum exactly to `1.0` and `total_count` is required.
- Zero-count species are removed after validation.
Parametrization config
Default behavior
If `parametrization_config` is omitted:

- `parametrization: smirnoff` -> default `SmirnoffConfig`
- `parametrization: cgenff` -> default `CgenffConfig`
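
A minimal sketch of this default injection. `inject_parametrization_config` is hypothetical, and it only fills in the `type` field that appears in the config examples that follow; the actual field defaults of `SmirnoffConfig` and `CgenffConfig` (force field files and so on) are not reproduced.

```python
from typing import Any


def inject_parametrization_config(doc: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: add a parametrization_config only when it is omitted."""
    out = dict(doc)
    param = out.setdefault("parametrization", "cgenff")  # table default
    if "parametrization_config" not in out:
        # Placeholder for the real default config object.
        out["parametrization_config"] = {"type": param}
    return out


print(inject_parametrization_config({"parametrization": "smirnoff"}))
```

An explicit `parametrization_config` is passed through unchanged.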
SMIRNOFF config example
```yaml
parametrization: smirnoff
parametrization_config:
  type: smirnoff
  forcefield: openff-2.2.0.offxml
  water_model: opc3.offxml
  charge_method: openff-gnn-am1bcc-0.1.0-rc.3.pt
```

CGenFF config example
```yaml
parametrization: cgenff
parametrization_config:
  type: cgenff
```

The CGenFF installation path is not set per input; it is configured globally through `SILCSBIODIR`, either via `mdfactory config init` or in the config file.
Ionization block
`mixedbox`, `bilayer`, and `lnp` all support:

```yaml
ionization:
  neutralize: true
  concentration: 0.15
  min_distance: 5.0
  seed: 42
```

Defaults:

- `neutralize: true`
- `concentration: 0.15`
- `min_distance: 5.0`
- `seed: null`
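
Resolving the ionization block amounts to overlaying whatever the user wrote on top of these defaults. A minimal sketch; `resolve_ionization` is a hypothetical helper, not mdfactory's implementation.

```python
from typing import Any, Optional

# Defaults listed above (YAML null -> Python None).
IONIZATION_DEFAULTS: dict[str, Any] = {
    "neutralize": True,
    "concentration": 0.15,
    "min_distance": 5.0,
    "seed": None,
}


def resolve_ionization(block: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical sketch: user-provided keys win, missing keys get defaults."""
    return {**IONIZATION_DEFAULTS, **(block or {})}


print(resolve_ionization({"neutralize": True}))
```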
Full examples by simulation type
These examples intentionally reuse compounds from the repository's example inputs under `examples/`, so the docs stay aligned with tested, public examples.
```yaml
simulation_type: mixedbox
engine: gromacs
parametrization: smirnoff
system:
  species:
    - smiles: "O"
      resname: SOL
      fraction: 0.90
    - smiles: "CCO"
      resname: ETH
      fraction: 0.10
  total_count: 2000
  target_density: 1.0
  ionization:
    neutralize: true
    concentration: 0.15
```

`MixedBoxComposition` fields:

- `species` (`SingleMoleculeSpecies[]`): required
- `total_count`: optional if using counts, required if using fractions
- `target_density`: optional, default `1.0`
- `ionization`: optional, defaults applied
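
To get a feel for what `target_density` implies for this example, the sketch below estimates the edge of a cubic box. It rests on several assumptions not stated on this page: that `target_density` is in g/cm³, that the box is cubic, and that the fractions resolve to 1800 SOL and 200 ETH. The molar masses are standard values for water and ethanol; `cubic_box_edge` is a hypothetical helper, not how mdfactory necessarily sizes the box.

```python
# Avogadro constant (exact SI value) and cm^3 -> Angstrom^3 conversion.
N_A = 6.02214076e23
CM3_TO_A3 = 1e24


def cubic_box_edge(counts: dict[str, int], molar_mass: dict[str, float],
                   density_g_cm3: float) -> float:
    """Hypothetical sketch: edge length (Angstrom) of a cubic box at a target density."""
    mass_g = sum(n * molar_mass[name] for name, n in counts.items()) / N_A
    volume_a3 = mass_g / density_g_cm3 * CM3_TO_A3
    return volume_a3 ** (1.0 / 3.0)


counts = {"SOL": 1800, "ETH": 200}          # 0.90 / 0.10 of total_count 2000
molar_mass = {"SOL": 18.015, "ETH": 46.069}  # g/mol
edge = cubic_box_edge(counts, molar_mass, density_g_cm3=1.0)
print(f"box edge ~ {edge:.1f} Angstrom")
```

Under these assumptions the box comes out at roughly 41 Å per side.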
```yaml
simulation_type: bilayer
engine: gromacs
parametrization: smirnoff
system:
  species:
    - smiles: "CCCCCCCC/C=C\\CCCCCCCC(=O)OC[C@H](CO[P@](=O)([O-])OCC[N+](C)(C)C)OC(=O)CCCCCCC/C=C\\CCCCCCCC"
      resname: POC
      count: 128
  z_padding: 20.0
  monolayer: false
  ionization:
    neutralize: true
```

`BilayerComposition` fields:

- `species` (`LipidSpecies[]`): required
- `z_padding`: optional, default `20.0`
- `monolayer`: optional, default `false`
- `ionization`: optional, defaults applied
Validation note:

- When `monolayer: false`, each lipid species count must be even, so that it can be split equally between the two leaflets.
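
The even-count rule can be sketched as a per-species leaflet split. `leaflet_counts` is hypothetical; for `monolayer: true` it simply assigns every lipid to a single layer, which is an assumption about intent, and it does not model how mdfactory actually places lipids.

```python
from typing import Any


def leaflet_counts(species: list[dict[str, Any]],
                   monolayer: bool = False) -> dict[str, tuple[int, int]]:
    """Hypothetical sketch: per-resname (layer_a, layer_b) lipid counts."""
    result = {}
    for s in species:
        n = s["count"]
        if monolayer:
            result[s["resname"]] = (n, 0)  # assumed: all lipids in one layer
        elif n % 2:
            raise ValueError(f"{s['resname']}: count {n} is odd but monolayer is false")
        else:
            result[s["resname"]] = (n // 2, n // 2)
    return result


print(leaflet_counts([{"resname": "POC", "count": 128}]))  # {'POC': (64, 64)}
```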
```yaml
simulation_type: lnp
engine: gromacs
parametrization: smirnoff
system:
  radius: 120.0
  shell_thickness: 28.0
  padding: 25.0
  core:
    species:
      - smiles: "CCCCCCCC[C@@H](CCCCCC)C(=O)OCCCCCCN(CCCCO)CCCCCCOC(=O)[C@@H](CCCCCC)CCCCCCCC"
        resname: ALN
        fraction: 0.50
      - smiles: "CCCCCCCC[C@@H](CCCCCC)C(=O)OCCCCCC[N@H+](CCCCO)CCCCCCOC(=O)[C@@H](CCCCCC)CCCCCCCC"
        resname: ALP
        fraction: 0.20
      - smiles: "CC(C)CCC[C@@H](C)[C@H]1CC[C@H]2[C@@H]3CC=C4C[C@@H](O)CC[C@]4(C)[C@H]3CC[C@]12C"
        resname: CHL
        fraction: 0.30
    target_density: 0.95
  shell:
    species:
      - smiles: "CCCCCCCC[C@@H](CCCCCC)C(=O)OCCCCCCN(CCCCO)CCCCCCOC(=O)[C@@H](CCCCCC)CCCCCCCC"
        resname: ALN
        fraction: 0.50
      - smiles: "CCCCCCCCCCCCCCCCCC(=O)OC[C@H](CO[P@](=O)([O-])OCC[N+](C)(C)C)OC(=O)CCCCCCCCCCCCCCCCC"
        resname: DSP
        fraction: 0.10
      - smiles: "CC(C)CCC[C@@H](C)[C@H]1CC[C@H]2[C@@H]3CC=C4C[C@@H](O)CC[C@]4(C)[C@H]3CC[C@]12C"
        resname: CHL
        fraction: 0.38
      - smiles: "CCCCCCCCCCCCCCCN(CCCCCCCCCCCCCC)C(=O)COCCOC"
        resname: PEG
        fraction: 0.02
    z0: 10.0
    area_per_lipid: 65.0
  ionization:
    neutralize: true
```

`LNPComposition` fields:

- `radius`: required (> 0)
- `shell_thickness`: optional, default `28.0`
- `padding`: optional, default `25.0`
- `core`: required
- `shell`: required
- `ionization`: optional, defaults applied
LNP-specific behavior:

- Core and shell species are fraction-based, and each region's fractions must sum to `1.0`.
- Counts are computed automatically from geometry/density.
- `core_radius = radius - shell_thickness` must stay positive.
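
A minimal sketch of these LNP checks applied to the example above. `check_lnp` is a hypothetical helper that returns `core_radius`; the sum check uses `math.isclose` as an assumed tolerance, and the geometry/density count calculation is not reproduced.

```python
import math
from typing import Any


def check_lnp(system: dict[str, Any]) -> float:
    """Hypothetical sketch of the LNP-specific checks; returns core_radius."""
    radius = system["radius"]
    shell_thickness = system.get("shell_thickness", 28.0)
    if radius <= 0:
        raise ValueError("radius must be > 0")
    core_radius = radius - shell_thickness
    if core_radius <= 0:
        raise ValueError(f"radius ({radius}) must exceed shell_thickness ({shell_thickness})")
    for region in ("core", "shell"):
        species = system[region]["species"]
        if any("fraction" not in s for s in species):
            raise ValueError(f"{region} species must be fraction-based")
        total = sum(s["fraction"] for s in species)
        if not math.isclose(total, 1.0):  # assumed tolerance
            raise ValueError(f"{region} fractions sum to {total}, not 1.0")
    return core_radius


example = {
    "radius": 120.0,
    "shell_thickness": 28.0,
    "core": {"species": [{"fraction": f} for f in (0.50, 0.20, 0.30)]},
    "shell": {"species": [{"fraction": f} for f in (0.50, 0.10, 0.38, 0.02)]},
}
print(check_lnp(example))  # 92.0
```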
Compounds used in docs YAML examples
The YAML examples on this page use these repository compounds:
| Resname | SMILES source |
|---|---|
| `SOL` | `examples/mixedbox/water_ethanol_smirnoff.yaml` |
| `ETH` | `examples/mixedbox/water_ethanol_smirnoff.yaml` |
| `POC` | `examples/bilayer/popc_smirnoff.yaml` |
| `ALN` | `examples/lnp/lnp_alc0315_smirnoff.yaml` |
| `ALP` | `examples/lnp/lnp_alc0315_smirnoff.yaml` |
| `DSP` | `examples/lnp/lnp_alc0315_smirnoff.yaml` |
| `CHL` | `examples/lnp/lnp_alc0315_smirnoff.yaml` |
| `PEG` | `examples/lnp/lnp_alc0315_smirnoff.yaml` |
What happens during parsing
When YAML is loaded:

- `simulation_type` selects the correct `system` model.
- Validation checks species, fractions/counts, and type-specific constraints.
- Missing defaults are injected (engine, parametrization config, ionization defaults).
- A canonical hash is generated from the final normalized model payload.

This is why small YAML changes (including defaults becoming explicit) can change the resulting hash.
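
The sketch below illustrates why a hash over a normalized payload behaves this way: key order does not matter, but any change in value does. It assumes SHA-256 over key-sorted, compact JSON; `canonical_hash` is hypothetical, and mdfactory's actual serialization and hash function may differ.

```python
import hashlib
import json
from typing import Any


def canonical_hash(payload: dict[str, Any]) -> str:
    """Hypothetical sketch: SHA-256 of a key-sorted, compact JSON dump."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = {"simulation_type": "mixedbox", "engine": "gromacs", "system": {"total_count": 2000}}
b = {"system": {"total_count": 2000}, "engine": "gromacs", "simulation_type": "mixedbox"}
c = {"simulation_type": "mixedbox", "engine": "gromacs", "system": {"total_count": 2001}}

print(canonical_hash(a) == canonical_hash(b))  # True: key order is irrelevant
print(canonical_hash(a) == canonical_hash(c))  # False: one value changed
```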
Common validation failures
- Invalid or ambiguous stereochemistry in SMILES
- `resname` longer than 3 characters
- Mixed usage of counts and fractions in one composition
- Fraction sums not equal to `1.0`
- Missing `total_count` when using fractions in mixedbox/bilayer
- Odd lipid counts in a bilayer when `monolayer: false`
- LNP `radius <= shell_thickness`
