Sync & Databases
Push and pull systems, analyses, and artifacts across supported backends
The sync commands use the backend configured in your active config file (see `mdfactory config path`). The current code supports:
- SQLite: Default local backend using `runs.db` and `analysis.db` files.
- CSV: File-based backend using `runs.csv` plus per-table analysis/artifact CSV files.
- Foundry: Enterprise backend with configured dataset paths for runs, analyses, and artifacts.
Initialize backend storage
Initialization creates the backend objects used by sync commands: SQL tables for SQLite, CSV files/directories for CSV mode, or datasets for Foundry mode. You only need to run these when setting up a new backend or after intentionally resetting storage.
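For the SQLite backend, the effect of `sync init systems` can be sketched as plain DDL. The statements below are illustrative, derived from the schema table later on this page; the exact DDL mdfactory issues may differ.

```python
import sqlite3

# Illustrative DDL for the run table; column names follow the schema
# table on this page, types are assumptions.
RUN_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS RUN_DATABASE (
    hash            TEXT PRIMARY KEY,
    engine          TEXT,
    parametrization TEXT,
    simulation_type TEXT,
    directory       TEXT,
    status          TEXT,
    timestamp_utc   TEXT,
    input_data      TEXT,
    input_data_type TEXT
)
"""

def init_run_table(conn: sqlite3.Connection, force: bool = False) -> None:
    """Create RUN_DATABASE; force=True drops and recreates it (like --force)."""
    if force:
        conn.execute("DROP TABLE IF EXISTS RUN_DATABASE")
    conn.execute(RUN_TABLE_DDL)
    conn.commit()

conn = sqlite3.connect(":memory:")  # the real backend uses runs.db
init_run_table(conn)
```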
mdfactory sync init systems
mdfactory sync init analysis
mdfactory sync init artifacts

Use `--force` to reset and recreate existing tables or datasets.
For Foundry, configure and validate first:
mdfactory config init
mdfactory sync init check

`sync init check` verifies Foundry connectivity and checks that configured paths exist (BASE_PATH, RUN_DB_PATH, ANALYSIS_DB_PATH, ARTIFACT_DB_PATH).
Push commands
Push commands scan local simulation folders and write structured metadata/results to the configured backend. They support duplicate-handling modes:
- default: raises on conflicting duplicate keys
- `--diff`: skip existing keys
- `--force`: overwrite existing keys
For `sync push systems`, `sync push analysis`, and `sync push artifacts`, the code expects exactly one of:
- a positional SOURCE argument
- the `--csv` option
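The three duplicate-handling modes can be sketched with a plain dict standing in for a backend table keyed by hash; `push_record` and its `mode` argument are hypothetical names, not mdfactory API.

```python
# Hypothetical sketch of the duplicate-handling modes described above.
def push_record(table: dict, key: str, record: dict, mode: str = "default") -> bool:
    """Return True if the record was written to the table."""
    if key in table:
        if mode == "default":
            raise ValueError(f"duplicate key: {key}")  # default: raise on conflict
        if mode == "diff":
            return False                               # --diff: skip existing keys
        # mode == "force" falls through                # --force: overwrite
    table[key] = record
    return True

table = {"ABC123": {"status": "completed"}}
push_record(table, "ABC123", {"status": "failed"}, mode="diff")   # skipped
push_record(table, "ABC123", {"status": "failed"}, mode="force")  # overwritten
```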
Push systems
sync push systems discovers simulation folders, validates BuildInput YAML, computes simulation status, and writes run-level records into RUN_DATABASE.
From a build summary YAML:
mdfactory sync push systems build_summary.yaml

From a directory root:
mdfactory sync push systems /path/to/simulations

From a CSV plus an explicit search root:
mdfactory sync push systems \
--csv sample_input.csv \
--csv-root /path/to/output_systems

Conflict-handling flags:
mdfactory sync push systems build_summary.yaml --diff
mdfactory sync push systems build_summary.yaml --force

Push analyses
sync push analysis reads completed analysis outputs from .analysis/<analysis_name>.parquet, stores row/column metadata, and serializes the analysis frame into data_csv for backend retrieval.
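The record-building step can be sketched with pandas; the frame below stands in for the parquet contents, and the hash value is hypothetical. Field names follow the ANALYSIS_<name> schema later on this page.

```python
import io
import json
import pandas as pd

# Stand-in for the contents of .analysis/area_per_lipid.parquet.
frame = pd.DataFrame({"lipid": ["POPC", "POPE"], "area_per_lipid": [0.64, 0.59]})

# Sketch of the record `sync push analysis` might build for the backend.
record = {
    "hash": "ABC123",                            # hypothetical run hash
    "row_count": len(frame),
    "columns": json.dumps(list(frame.columns)),  # JSON-encoded, per the schema
    "data_csv": frame.to_csv(index=False),       # serialized frame for pulls
}

# The frame can later be recovered without local parquet access:
restored = pd.read_csv(io.StringIO(record["data_csv"]))
```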
mdfactory sync push analysis build_summary.yaml
mdfactory sync push analysis build_summary.yaml --analysis-name area_per_lipid

Push artifacts
sync push artifacts reads artifact entries from analysis metadata and stores file lists plus checksums in artifact tables.
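The file-list/checksum bookkeeping might look like the sketch below; the choice of SHA-256 and the helper names are assumptions, not mdfactory internals.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 of a file, read in chunks. The algorithm is an assumption."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def artifact_record(files: list) -> dict:
    """Sketch of the per-artifact row: counts, file list, and checksums."""
    return {
        "file_count": len(files),
        "files": json.dumps([str(p) for p in files]),                  # JSON-encoded
        "checksums": json.dumps({str(p): checksum(p) for p in files}), # JSON-encoded
    }

with tempfile.TemporaryDirectory() as d:
    snap = Path(d) / "bilayer_snapshot.png"
    snap.write_bytes(b"fake image data")
    record = artifact_record([snap])
```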
mdfactory sync push artifacts build_summary.yaml
mdfactory sync push artifacts build_summary.yaml --artifact-name bilayer_snapshot

Pull commands
Pull commands query the backend and print results to the terminal by default.
When you pass --output, results are written to .csv (default) or .json (JSON lines) and include full columns.
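A `.json` output written as JSON Lines holds one record per line and can be consumed with the standard library; the records below are illustrative, not real pull output.

```python
import json

# Stand-in for the contents of a pulled systems.json (one object per line).
lines = [
    '{"hash": "ABC123", "status": "completed", "simulation_type": "bilayer"}',
    '{"hash": "DEF456", "status": "failed", "simulation_type": "bilayer"}',
]

# Parse line by line, as with any JSON Lines file.
records = [json.loads(line) for line in lines]
completed = [r["hash"] for r in records if r["status"] == "completed"]
```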
Systems
mdfactory sync pull systems --status completed --simulation-type bilayer

Write to disk:
mdfactory sync pull systems --output systems.csv
mdfactory sync pull systems --full --output systems.json

Without --full, CLI output is a summary view (hash, simulation_type, parametrization, status, directory).
Analyses
Overview:
mdfactory sync pull analysis --overview

Specific analysis table:
mdfactory sync pull analysis \
--analysis-name area_per_lipid \
--hash ABC123 \
--output apl.csv

Use --overview to inspect completion status across analyses; use --analysis-name to retrieve records from a specific ANALYSIS_* table.
Artifacts
Overview:
mdfactory sync pull artifacts --overview

Specific artifact table:
mdfactory sync pull artifacts \
--artifact-name bilayer_snapshot \
--output snapshots.csv

Use --overview for completion/status-level tracking and --artifact-name for detailed file/checksum metadata.
Output shape (table structure)
The sync backend stores four logical record types:
| Logical table | Primary key behavior | Key columns |
|---|---|---|
| RUN_DATABASE | keyed by hash | hash, engine, parametrization, simulation_type, directory, status, timestamp_utc, plus input_data and input_data_type |
| ANALYSIS_OVERVIEW | keyed by (hash, item_type, item_name) | hash, simulation_type, directory, item_type, item_name, status, row_count, file_count, updated_at |
| ANALYSIS_<name> | keyed by hash | hash, directory, simulation_type, row_count, columns, data_csv, data_path, timestamp_utc |
| ARTIFACT_<name> | keyed by hash | hash, directory, simulation_type, file_count, files, checksums, timestamp_utc |
ANALYSIS_<name>.columns, ARTIFACT_<name>.files, and ARTIFACT_<name>.checksums are JSON-encoded strings.
ANALYSIS_<name>.data_csv stores the serialized analysis dataframe so results can be pulled directly from the backend without local parquet access.
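A pulled analysis row can therefore be decoded with only the standard library; the one-row CSV text below mimics pulled ANALYSIS_<name> output with illustrative values.

```python
import csv
import io
import json

# Stand-in for a pulled ANALYSIS_<name> CSV: note the JSON-encoded `columns`
# field and the nested CSV text in `data_csv`, both quoted per RFC-4180 rules.
pulled = io.StringIO(
    "hash,row_count,columns,data_csv\n"
    'ABC123,2,"[""lipid"", ""area_per_lipid""]","lipid,apl\nPOPC,0.64\n"\n'
)

row = next(csv.DictReader(pulled))        # csv handles the quoted newlines
columns = json.loads(row["columns"])      # JSON string back to a Python list
```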
Backend file/dataset layout
- SQLite: RUN_DATABASE is stored in `sqlite.RUN_DB_PATH`; analysis/artifact/overview tables are stored in `sqlite.ANALYSIS_DB_PATH`.
- CSV: RUN_DATABASE is `csv.RUN_DB_PATH`; analysis/artifact/overview are per-table files under `csv.ANALYSIS_DB_PATH` (for example `ANALYSIS_OVERVIEW.csv` and `ANALYSIS_area_per_lipid.csv`).
- Foundry: Runs use `foundry.RUN_DB_PATH`; analysis tables are under `foundry.ANALYSIS_DB_PATH`; artifact tables are under `foundry.ARTIFACT_DB_PATH`.
Clear commands
Clear commands delete rows from selected sync tables/datasets after confirmation prompts.
Clear runs:
mdfactory sync clear systems

Clear specific analysis or artifact tables:
mdfactory sync clear analysis --analysis-name area_per_lipid
mdfactory sync clear analysis --artifact-name bilayer_snapshot

Clear grouped analysis storage:
mdfactory sync clear analysis --overview
mdfactory sync clear analysis --analyses
mdfactory sync clear analysis --artifacts
mdfactory sync clear analysis --all

Clear everything, including runs:
mdfactory sync clear all