Tutorial: 20 Mixed-Box Campaign
Hands-on walkthrough for a 20-system water plus alcohol chain-length campaign, local analysis, SQLite sync, and notebook aggregation
This tutorial walks through an end-to-end example for 20 different mixedbox systems using only workflows that exist in the current codebase.
The campaign varies alcohol chain length and composition at the same time:
- methanol
- ethanol
- 1-propanol
- 1-butanol
- 1-pentanol
For each alcohol, the tutorial uses four solvent fractions. That gives a total of 20 systems while keeping the setup easy to inspect.
- prepare 20 systems from one CSV
- build all simulation directories
- optionally run the checked-in GROMACS Nextflow workflow
- run the current mixed-box analysis
- push systems and analysis results into SQLite
- pull the data back out
- aggregate and plot it in a notebook with the Python API
In the current codebase, mixedbox exposes one registered analysis: system_chemistry. That analysis reads from the validated BuildInput metadata and does not require a trajectory. This makes it a good tutorial target for an end-to-end setup, analysis, and sync workflow.
Files already in the repository
Reference mixed-box YAML examples already exist in:
examples/mixedbox/water_ethanol_smirnoff.yamlexamples/mixedbox/water_ethanol_cgenff.yaml
This tutorial adds a larger CSV-style campaign example. The full 20-system input file is available here:
Campaign design
The example campaign holds these settings fixed:
simulation_type: mixedboxengine: gromacsparametrization: smirnoffsystem.total_count: 2000system.target_density: 1.0
The campaign varies:
- alcohol identity through SMILES
- alcohol residue name
- alcohol fraction in water
The 20 systems are a 5 x 4 grid:
- 5 alcohols: methanol, ethanol, 1-propanol, 1-butanol, 1-pentanol
- 4 fractions: 0.05, 0.15, 0.25, 0.35
First few rows:
simulation_type,parametrization,engine,system.species.SOL.smiles,system.species.SOL.fraction,system.species.MEOH.smiles,system.species.MEOH.fraction,system.total_count,system.target_density
mixedbox,smirnoff,gromacs,O,0.95,CO,0.05,2000,1.0
mixedbox,smirnoff,gromacs,O,0.85,CO,0.15,2000,1.0
mixedbox,smirnoff,gromacs,O,0.75,CO,0.25,2000,1.0
mixedbox,smirnoff,gromacs,O,0.65,CO,0.35,2000,1.0
mixedbox,smirnoff,gromacs,O,0.95,CCO,0.05,2000,1.01. Configure SQLite
Initialize config with the wizard:
mdfactory config initFor this tutorial, choose the sqlite backend. You can verify the active config path with:
mdfactory config pathYour config should include SQLite paths similar to:
[database]
TYPE = sqlite
[sqlite]
RUN_DB_PATH = <platform data dir>/runs.db
ANALYSIS_DB_PATH = <platform data dir>/analysis.db2. Prepare the 20 build directories
mkdir -p tutorial_campaign
mdfactory prepare-build \
docs/public/tutorials/mixedbox_chain_length_series_20.csv \
tutorial_campaignThis creates:
- one hash directory per row
- one summary YAML named after the CSV stem
For this tutorial, the summary YAML will be:
tutorial_campaign/mixedbox_chain_length_series_20.yaml3. Build all 20 systems
One simple local loop is:
for yml in tutorial_campaign/*/*.yaml; do
mdfactory build "$yml" "$(dirname "$yml")"
doneAfter this step, each hash directory should contain the standard build outputs such as:
system.pdbtopology.topem.mdpnvt.mdpnpt.mdpmd.mdp
4. Optional: run the checked-in GROMACS workflow
If you want to continue through a full simulation campaign, use the stage-2 Nextflow workflow:
nextflow run workflows/simulate.nf \
-c workflows/simulate.config \
--base_dir tutorial_campaign \
--config_yaml tutorial_campaign/mixedbox_chain_length_series_20.yamlCluster-specific configuration
The shipped simulate.config contains SLURM settings tuned for a specific cluster. Edit this file to match your HPC environment before running. See Running on HPC Clusters for details.
This executes the fixed sequence implemented in workflows/simulate.nf:
- minimization
- NVT equilibration
- NPT equilibration
- production
You do not need to finish the MD run to continue with the analysis example below, because system_chemistry does not require a trajectory.
5. Run the current mixed-box analysis
Run the registered mixed-box analysis on the full campaign:
mdfactory analysis run \
tutorial_campaign/mixedbox_chain_length_series_20.yaml \
--analysis system_chemistryInspect status:
mdfactory analysis info tutorial_campaign/mixedbox_chain_length_series_20.yamlThe analysis output is written per simulation under:
.analysis/system_chemistry.parquet6. Initialize the SQLite sync tables
mdfactory sync init systems
mdfactory sync init analysis
mdfactory sync init artifactsFor this tutorial, the important tables are the run database and the analysis tables.
7. Push systems and analysis into SQLite
Push the system metadata:
mdfactory sync push systems tutorial_campaign/mixedbox_chain_length_series_20.yamlPush the system_chemistry analysis:
mdfactory sync push analysis \
tutorial_campaign/mixedbox_chain_length_series_20.yaml \
--analysis-name system_chemistry8. Pull the results back out
Pull the system summary:
mdfactory sync pull systems \
--simulation-type mixedbox \
--output tutorial_systems.csvPull the analysis table:
mdfactory sync pull analysis \
--analysis-name system_chemistry \
--simulation-type mixedbox \
--output tutorial_system_chemistry.csvSee the analysis overview table:
mdfactory sync pull analysis --overview9. Aggregate the campaign in a notebook
If you completed the simulation run step, each directory should now include prod.xtc, so the campaign can be discovered directly with SimulationStore.
Notebook cell: load and join analysis data
import matplotlib.pyplot as plt
import pandas as pd
from mdfactory.analysis.store import SimulationStore
from mdfactory.analysis.utils import flatten_system_parameters
store = SimulationStore("tutorial_campaign")
store.discover()
chemistry = store.load_analysis_with_metadata(
"system_chemistry",
flatten_fn=flatten_system_parameters,
missing_ok=True,
)
chemistry.head()This gives you:
- one row per species per simulation from
system_chemistry - the simulation hash
- flattened metadata such as
simulation_type,engine,parametrization, andtarget_density
Notebook cell: annotate alcohol chain length
chain_length = {
"MEOH": 1,
"ETH": 2,
"PRO": 3,
"BUT": 4,
"PEN": 5,
}
alcohols = chemistry[chemistry["resname"] != "SOL"].copy()
alcohols["chain_length"] = alcohols["resname"].map(chain_length)
alcohols[["hash", "resname", "chain_length", "count", "fraction"]].head()Notebook cell: plot count versus fraction by chain length
fig, ax = plt.subplots(figsize=(7, 4))
for chain_len, frame in alcohols.groupby("chain_length"):
frame = frame.sort_values("fraction")
ax.plot(
frame["fraction"],
frame["count"],
marker="o",
label=f"C{chain_len} alcohol",
)
ax.set_xlabel("Alcohol mole fraction")
ax.set_ylabel("Alcohol count")
ax.set_title("20-system water/alcohol chain-length campaign")
ax.legend()
plt.tight_layout()
plt.show()Alternative notebook path: work from the pulled SQLite CSV
If you skipped the simulation run and only want to work from the synchronized analysis export:
import matplotlib.pyplot as plt
import pandas as pd
chemistry = pd.read_csv("tutorial_system_chemistry.csv")
chain_length = {
"MEOH": 1,
"ETH": 2,
"PRO": 3,
"BUT": 4,
"PEN": 5,
}
alcohols = chemistry[chemistry["resname"] != "SOL"].copy()
alcohols["chain_length"] = alcohols["resname"].map(chain_length)
fig, ax = plt.subplots(figsize=(7, 4))
for chain_len, frame in alcohols.groupby("chain_length"):
frame = frame.sort_values("fraction")
ax.plot(frame["fraction"], frame["count"], marker="o", label=f"C{chain_len}")
ax.set_xlabel("Alcohol mole fraction")
ax.set_ylabel("Alcohol count")
ax.set_title("20-system water/alcohol chain-length campaign")
ax.legend(title="Series")
plt.tight_layout()
plt.show()Where this fits in the architecture
This tutorial exercises the main architectural layers:
Concretely:
- build input validation is handled by
BuildInputand the composition models - system construction is dispatched from
workflows.pyintobuild.py - analysis execution and local storage run through
Simulation - push and pull flows use the configured backend through
DataManager
For the broader code structure, see Architecture.
