MDFactory
User Guide

Sync & Databases

Push and pull systems, analyses, and artifacts across supported backends

The sync commands use the backend configured in your active config file (shown by mdfactory config path). Three backends are currently supported:

SQLite

Default local backend using `runs.db` and `analysis.db` files.

CSV

File-based backend using `runs.csv` plus per-table analysis/artifact CSV files.

Foundry

Enterprise backend with configured dataset paths for runs, analyses, and artifacts.

Initialize backend storage

Initialization creates the backend objects used by sync commands: SQL tables for SQLite, CSV files/directories for CSV mode, or datasets for Foundry mode. You only need to run these when setting up a new backend or after intentionally resetting storage.

mdfactory sync init systems
mdfactory sync init analysis
mdfactory sync init artifacts

Use --force to reset and recreate existing tables or datasets.
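For SQLite mode, initialization boils down to creating the tables described under "Output shape (table structure)" below. The sketch here is illustrative only, assuming the RUN_DATABASE column names from that section; the actual DDL used by mdfactory sync init systems may differ:

```python
import sqlite3

# Column names follow the RUN_DATABASE schema described in
# "Output shape (table structure)"; types are assumptions.
RUN_DATABASE_DDL = """
CREATE TABLE IF NOT EXISTS RUN_DATABASE (
    hash            TEXT PRIMARY KEY,
    engine          TEXT,
    parametrization TEXT,
    simulation_type TEXT,
    directory       TEXT,
    status          TEXT,
    timestamp_utc   TEXT,
    input_data      TEXT,
    input_data_type TEXT
)
"""

def init_run_database(db_path: str, force: bool = False) -> None:
    """Create the runs table; with force=True, drop and recreate it."""
    with sqlite3.connect(db_path) as conn:
        if force:
            conn.execute("DROP TABLE IF EXISTS RUN_DATABASE")
        conn.execute(RUN_DATABASE_DDL)
```

The force parameter mirrors the --force flag: a destructive reset rather than an in-place migration.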

For Foundry, configure and validate first:

mdfactory config init
mdfactory sync init check

sync init check verifies Foundry connectivity and checks that configured paths exist (BASE_PATH, RUN_DB_PATH, ANALYSIS_DB_PATH, ARTIFACT_DB_PATH).

Push commands

Push commands scan local simulation folders and write structured metadata/results to the configured backend. They support duplicate-handling modes:

  • default: raise an error on conflicting duplicate keys
  • --diff: skip records whose keys already exist
  • --force: overwrite records whose keys already exist
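The three modes amount to a simple upsert policy. A minimal sketch of that logic (apply_records is a hypothetical helper, not the actual implementation):

```python
def apply_records(existing: dict, incoming: dict, mode: str = "default") -> dict:
    """Merge incoming records (keyed, e.g., by hash) into existing ones.

    mode="default": raise on duplicate keys
    mode="diff":    keep the existing row, skip the duplicate
    mode="force":   overwrite the existing row
    """
    merged = dict(existing)
    for key, record in incoming.items():
        if key in merged:
            if mode == "default":
                raise ValueError(f"duplicate key: {key}")
            if mode == "diff":
                continue
        merged[key] = record  # new key, or mode == "force"
    return merged
```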

For sync push systems, sync push analysis, and sync push artifacts, supply exactly one input source:

  • a positional SOURCE
  • the --csv option

Push systems

sync push systems discovers simulation folders, validates BuildInput YAML, computes simulation status, and writes run-level records into RUN_DATABASE.

From a build summary YAML:

mdfactory sync push systems build_summary.yaml

From a directory root:

mdfactory sync push systems /path/to/simulations

From a CSV plus an explicit search root:

mdfactory sync push systems \
  --csv sample_input.csv \
  --csv-root /path/to/output_systems

Conflict-handling flags:

mdfactory sync push systems build_summary.yaml --diff
mdfactory sync push systems build_summary.yaml --force

Push analyses

sync push analysis reads completed analysis outputs from .analysis/<analysis_name>.parquet, stores row/column metadata, and serializes the analysis frame into data_csv for backend retrieval.

mdfactory sync push analysis build_summary.yaml
mdfactory sync push analysis build_summary.yaml --analysis-name area_per_lipid
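The record this produces can be sketched as follows. In the real command the rows come from .analysis/<analysis_name>.parquet; this dependency-free sketch takes already-loaded rows, and build_analysis_record is a hypothetical helper, not mdfactory's internal API:

```python
import csv, io, json
from datetime import datetime, timezone

def build_analysis_record(hash_: str, directory: str, simulation_type: str,
                          rows: list[dict]) -> dict:
    """Build an ANALYSIS_<name>-shaped record from analysis rows."""
    columns = list(rows[0].keys()) if rows else []
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return {
        "hash": hash_,
        "directory": directory,
        "simulation_type": simulation_type,
        "row_count": len(rows),
        "columns": json.dumps(columns),  # stored as a JSON-encoded string
        "data_csv": buf.getvalue(),      # serialized frame for backend retrieval
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }
```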

Push artifacts

sync push artifacts reads artifact entries from analysis metadata and stores file lists plus checksums in artifact tables.

mdfactory sync push artifacts build_summary.yaml
mdfactory sync push artifacts build_summary.yaml --artifact-name bilayer_snapshot
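The files/checksums payload can be sketched like this. SHA-256 is an assumption; the guide does not specify which checksum algorithm mdfactory uses, and build_artifact_record is a hypothetical helper:

```python
import hashlib, json
from pathlib import Path

def build_artifact_record(hash_: str, directory: str, files: list[str]) -> dict:
    """Build an ARTIFACT_<name>-shaped record: file list plus checksums."""
    checksums = {
        name: hashlib.sha256(Path(directory, name).read_bytes()).hexdigest()
        for name in files
    }
    return {
        "hash": hash_,
        "directory": directory,
        "file_count": len(files),
        "files": json.dumps(files),          # JSON-encoded string
        "checksums": json.dumps(checksums),  # JSON-encoded string
    }
```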

Pull commands

Pull commands query the backend and print to the CLI by default. When you pass --output, results are written to disk as .csv (the default) or .json (JSON Lines) and include the full column set.

Systems

mdfactory sync pull systems --status completed --simulation-type bilayer

Write to disk:

mdfactory sync pull systems --output systems.csv
mdfactory sync pull systems --full --output systems.json

Without --full, CLI output is a summary view (hash, simulation_type, parametrization, status, directory).
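Because .json output is JSON Lines (one record per line), pulled files can be loaded without pandas. A small sketch (load_jsonl is a hypothetical helper, and the field names in the usage comment are illustrative):

```python
import json

def load_jsonl(path: str) -> list[dict]:
    """Read a JSON Lines file (one JSON object per line) into a list of dicts."""
    with open(path) as fh:
        return [json.loads(line) for line in fh if line.strip()]

# e.g. rows = load_jsonl("systems.json"); [r["hash"] for r in rows]
```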

Analyses

Overview:

mdfactory sync pull analysis --overview

Specific analysis table:

mdfactory sync pull analysis \
  --analysis-name area_per_lipid \
  --hash ABC123 \
  --output apl.csv

Use --overview to inspect completion status across analyses; use --analysis-name to retrieve records from a specific ANALYSIS_* table.

Artifacts

Overview:

mdfactory sync pull artifacts --overview

Specific artifact table:

mdfactory sync pull artifacts \
  --artifact-name bilayer_snapshot \
  --output snapshots.csv

Use --overview for completion/status-level tracking and --artifact-name for detailed file/checksum metadata.

Output shape (table structure)

The sync backend stores four logical record types:

Logical table      Primary key                   Key columns
RUN_DATABASE       hash                          hash, engine, parametrization, simulation_type, directory, status, timestamp_utc, input_data, input_data_type
ANALYSIS_OVERVIEW  (hash, item_type, item_name)  hash, simulation_type, directory, item_type, item_name, status, row_count, file_count, updated_at
ANALYSIS_<name>    hash                          hash, directory, simulation_type, row_count, columns, data_csv, data_path, timestamp_utc
ARTIFACT_<name>    hash                          hash, directory, simulation_type, file_count, files, checksums, timestamp_utc

ANALYSIS_<name>.columns, ARTIFACT_<name>.files, and ARTIFACT_<name>.checksums are JSON-encoded strings. ANALYSIS_<name>.data_csv stores the serialized analysis dataframe so results can be pulled directly from the backend without local parquet access.
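Decoding a pulled ANALYSIS_<name> record is therefore two steps: parse the JSON-encoded columns field and re-parse data_csv. A sketch (decode_analysis_record is a hypothetical helper; note that CSV round-tripping yields string values):

```python
import csv, io, json

def decode_analysis_record(record: dict) -> tuple[list[str], list[dict]]:
    """Recover column names and rows from a pulled ANALYSIS_<name> record."""
    columns = json.loads(record["columns"])          # JSON-encoded string
    rows = list(csv.DictReader(io.StringIO(record["data_csv"])))
    return columns, rows
```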

Backend file/dataset layout

  • SQLite: RUN_DATABASE is stored in sqlite.RUN_DB_PATH; analysis/artifact/overview tables are stored in sqlite.ANALYSIS_DB_PATH.
  • CSV: RUN_DATABASE is csv.RUN_DB_PATH; analysis/artifact/overview are per-table files under csv.ANALYSIS_DB_PATH (for example ANALYSIS_OVERVIEW.csv and ANALYSIS_area_per_lipid.csv).
  • Foundry: Runs use foundry.RUN_DB_PATH; analysis tables are under foundry.ANALYSIS_DB_PATH; artifact tables are under foundry.ARTIFACT_DB_PATH.
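Put together, the backend section of a config might look like the following. The exact file format written by mdfactory config init is not shown in this guide; YAML is assumed here, and every path is a placeholder:

```yaml
sqlite:
  RUN_DB_PATH: /data/mdfactory/runs.db
  ANALYSIS_DB_PATH: /data/mdfactory/analysis.db

csv:
  RUN_DB_PATH: /data/mdfactory/runs.csv
  ANALYSIS_DB_PATH: /data/mdfactory/analysis/

foundry:
  BASE_PATH: /Company/project
  RUN_DB_PATH: /Company/project/runs
  ANALYSIS_DB_PATH: /Company/project/analyses
  ARTIFACT_DB_PATH: /Company/project/artifacts
```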

Clear commands

Clear commands delete rows from selected sync tables/datasets after confirmation prompts.

Clear runs:

mdfactory sync clear systems

Clear specific analysis or artifact tables:

mdfactory sync clear analysis --analysis-name area_per_lipid
mdfactory sync clear analysis --artifact-name bilayer_snapshot

Clear grouped analysis storage:

mdfactory sync clear analysis --overview
mdfactory sync clear analysis --analyses
mdfactory sync clear analysis --artifacts
mdfactory sync clear analysis --all

Clear everything, including runs:

mdfactory sync clear all
