AnalysisRegistry
Manages the analysis registry (.analysis/metadata.json) for a simulation.
The registry tracks analysis results stored as Parquet files, maintaining metadata about each analysis including row counts, columns, and timestamps.
Attributes
attribute SCHEMA_VERSION = '1.0'
attribute REGISTRY_FILENAME = 'metadata.json'
attribute analysis_dir Path = Path(analysis_dir)
Path to .analysis directory
attribute _registry dict[str, Any] | None = None
Dict holding the in-memory registry state
attribute registry_path Path
Path to metadata.json file.
Functions
func __init__(self, analysis_dir)
Initialize the registry for the given .analysis directory.
Does not auto-load; call load() explicitly to read from disk.
param self
param analysis_dir Path | str
Path to the .analysis directory
Returns
None
func load(self) → dict[str, Any]
Load the registry from disk.
If the file doesn't exist or is corrupted, returns a default empty registry with a warning.
param self
Returns
dict
The registry dict with 'schema_version' and 'analyses' keys
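The fallback behavior described for load() can be sketched as follows. This is a minimal illustration and not the class's actual implementation: the function names `default_registry` and `load_registry` are hypothetical, the use of the `warnings` module is an assumption (the real class may log instead), and storing 'analyses' as a dict keyed by analysis name is also an assumption.

```python
import json
import warnings
from pathlib import Path
from typing import Any

SCHEMA_VERSION = "1.0"


def default_registry() -> dict[str, Any]:
    """Empty registry with the two keys named in the load() documentation."""
    # Assumption: 'analyses' maps analysis names to metadata dicts.
    return {"schema_version": SCHEMA_VERSION, "analyses": {}}


def load_registry(registry_path: Path) -> dict[str, Any]:
    """Read the registry file, falling back to an empty registry on failure."""
    try:
        with registry_path.open("r", encoding="utf-8") as fh:
            data = json.load(fh)
    except FileNotFoundError:
        warnings.warn(f"{registry_path} not found; using empty registry")
        return default_registry()
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        warnings.warn(f"{registry_path} is corrupted ({exc}); using empty registry")
        return default_registry()
    if not isinstance(data, dict):
        # Valid JSON that is not an object is treated as corrupted too.
        warnings.warn(f"{registry_path} does not contain a JSON object")
        return default_registry()
    return data
```

Returning a fresh default on failure, rather than raising, lets callers proceed with an empty registry, which matches the documented behavior of load().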
func save(self) → None
Save the current registry state to disk.
Creates the .analysis directory if it doesn't exist.
param self
Returns
None
func add_entry(self, name, df, **extras) → None
Add a new analysis entry to the registry.
param self
param name str
Analysis name (without .parquet extension)
param df pd.DataFrame
DataFrame to extract metadata from
param extras = {}
Returns
None
func update_entry(self, name, df, **extras) → None
Update an existing analysis entry, or create it if it doesn't exist.
Preserves the 'created_at' timestamp if the entry exists and updates 'updated_at'.
param self
param name str
Analysis name (without .parquet extension)
param df pd.DataFrame
DataFrame to extract metadata from
param extras = {}
Returns
None
func get_entry(self, name) → dict[str, Any]
Retrieve an analysis entry by name.
param self
param name str
Analysis name
Returns
dict
Dict with analysis metadata
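The timestamp handling documented for update_entry (keep 'created_at' for an existing entry, refresh 'updated_at') can be illustrated with a plain dict. The helper names `utc_timestamp` and `upsert_entry` are hypothetical, and merging `extras` directly into the entry is an assumption about how the extra keyword arguments are stored.

```python
from datetime import datetime, timezone
from typing import Any


def utc_timestamp() -> str:
    """Return the current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


def upsert_entry(
    analyses: dict[str, dict[str, Any]],
    name: str,
    metadata: dict[str, Any],
    **extras: Any,
) -> None:
    """Create or update an entry, preserving its original 'created_at'."""
    now = utc_timestamp()
    existing = analyses.get(name, {})
    # Keep the original creation time when the entry already exists.
    created_at = existing.get("created_at", now)
    # Assumption: extras are stored alongside the extracted metadata.
    analyses[name] = {
        **metadata,
        **extras,
        "created_at": created_at,
        "updated_at": now,
    }
```

Calling `upsert_entry` twice with the same name leaves 'created_at' unchanged while the other fields, including 'updated_at', reflect the second call.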
func list_analyses(self) → list[str]
Return sorted list of analysis names.
param self
Returns
list
Sorted list of analysis names
func list_artifacts(self) → list[str]
Return sorted list of artifact names.
param self
Returns
list
Sorted list of artifact names
func add_artifact_entry(self, name, files, checksums, **extras) → None
Add a new artifact entry to the registry.
param self
param name str
Artifact name
param files list[str]
Relative file paths under .analysis
param checksums dict[str, str]
Mapping of relative paths to sha256 checksums
param extras = {}
Returns
None
func update_artifact_entry(self, name, files, checksums, **extras) → None
Update or create an artifact entry in the registry.
param self
param name str
Artifact name
param files list[str]
Relative file paths under .analysis
param checksums dict[str, str]
Mapping of relative paths to sha256 checksums
param extras = {}
Returns
None
func get_artifact_entry(self, name) → dict[str, Any]
Retrieve artifact entry by name.
param self
param name str
Artifact name
Returns
dict
Dict with artifact metadata
func remove_artifact_entry(self, name) → None
Remove artifact entry from registry.
param self
param name str
Artifact name to remove
Returns
None
func check_integrity(self) → dict[str, Any]
Verify registry integrity against actual filesystem.
Checks for:
- Missing files: In registry but file doesn't exist
- Extra files: Parquet file exists but not in registry
- Row count mismatches: File row count differs from registry
- Artifact missing files: Files in artifact entries that are missing
- Artifact checksum mismatches: sha256 mismatch for artifact files
param self
Returns
dict
Dict with:
- valid: bool - True if no issues found
- missing_files: list[str] - Analyses in registry but file missing
- extra_files: list[str] - Parquet files not in registry
- row_count_mismatches: list[dict] - Analyses with row count mismatches
- artifact_missing_files: list[dict] - Artifact missing file entries
- artifact_checksum_mismatches: list[dict] - Artifact checksum mismatches
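The artifact half of the integrity check can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual check_integrity code: the function names `sha256_of` and `check_artifacts` are hypothetical, artifact entries are assumed to store their paths and checksums under 'files' and 'checksums' keys, and the shape of each reported item is invented for the example. The Parquet checks (missing files, extra files, row counts) are left out because they need a Parquet reader.

```python
import hashlib
from pathlib import Path
from typing import Any


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the sha256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_artifacts(
    analysis_dir: Path, artifacts: dict[str, dict[str, Any]]
) -> dict[str, Any]:
    """Report artifact files that are missing or whose checksum differs."""
    missing: list[dict[str, str]] = []
    mismatched: list[dict[str, str]] = []
    for name, entry in sorted(artifacts.items()):
        checksums = entry.get("checksums", {})  # assumed key name
        for rel_path in entry.get("files", []):  # assumed key name
            path = analysis_dir / rel_path
            if not path.is_file():
                missing.append({"artifact": name, "file": rel_path})
                continue
            expected = checksums.get(rel_path)
            actual = sha256_of(path)
            if expected is not None and actual != expected:
                mismatched.append(
                    {"artifact": name, "file": rel_path,
                     "expected": expected, "actual": actual}
                )
    return {
        "valid": not missing and not mismatched,
        "artifact_missing_files": missing,
        "artifact_checksum_mismatches": mismatched,
    }
```

Reading files in fixed-size chunks keeps memory use flat for large artifacts. A file that is missing is reported once, under artifact_missing_files, and is not also counted as a checksum mismatch.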
func _create_default_registry(self) → dict[str, Any]
Create default empty registry structure.
param self
Returns
dict[str, Any]
func _ensure_registry_keys(self) → None
Ensure expected top-level keys exist in the registry.
param self
Returns
None
func _extract_metadata(self, df) → dict[str, Any]
Extract metadata from a DataFrame.
param self
param df pd.DataFrame
DataFrame to extract metadata from.
Returns
dict
Dictionary with 'row_count' and 'columns' keys.
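A metadata extractor that produces the two documented keys might look like the sketch below. The function name `extract_metadata` is hypothetical. It is written against any object that supports `len()` and exposes a `.columns` iterable, as a pandas DataFrame does, so the block itself does not import pandas. Converting column labels to strings is an assumption made here so the result can be written to JSON.

```python
from typing import Any


def extract_metadata(df: Any) -> dict[str, Any]:
    """Return the row count and column names of a DataFrame-like object."""
    return {
        # len() of a pandas DataFrame is its number of rows.
        "row_count": len(df),
        # Assumption: labels are stringified to stay JSON-serializable.
        "columns": [str(col) for col in df.columns],
    }
```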
func _get_timestamp(self) → str
Get ISO 8601 UTC timestamp.
param self
Returns
str
func _calculate_checksum(self, path) → str
Calculate sha256 checksum for a file.
param self
param path Path
Returns
str
func remove_entry(self, name) → None
Remove analysis entry from registry.
param self
param name str
Analysis name to remove
Returns
None