MDFactory

AnalysisRegistry

Manages the analysis registry (.analysis/metadata.json) for a simulation.

The registry tracks analysis results stored as Parquet files, maintaining metadata about each analysis including row counts, columns, and timestamps.

Attributes

attribute SCHEMA_VERSION = '1.0'
attribute REGISTRY_FILENAME = 'metadata.json'
attribute analysis_dir: Path = Path(analysis_dir)

Path to .analysis directory

attribute _registry: dict[str, Any] | None = None

Dict holding the in-memory registry state

attribute registry_path: Path

Path to metadata.json file.

Functions

func __init__(self, analysis_dir)

Initialize the registry for the given .analysis directory.

Does not auto-load; call load() explicitly to read from disk.

param self
param analysis_dir: Path | str

Path to .analysis directory

Returns

None
func load(self) -> dict[str, Any]

Load the registry from disk.

If the file does not exist or is corrupted, returns a default empty registry and emits a warning.

param self

Returns

dict

The registry dict with 'schema_version' and 'analyses' keys
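The fallback behaviour described for load() can be sketched roughly as follows. This is a minimal standalone illustration, not MDFactory's actual implementation: the function names `default_registry` and `load_registry` are hypothetical, and treating 'analyses' as a name-to-metadata mapping is an assumption.

```python
import json
import warnings
from pathlib import Path
from typing import Any

SCHEMA_VERSION = "1.0"


def default_registry() -> dict[str, Any]:
    """Empty registry with the two keys named in the documentation."""
    # Assumption: 'analyses' maps analysis names to their metadata dicts.
    return {"schema_version": SCHEMA_VERSION, "analyses": {}}


def load_registry(registry_path: Path) -> dict[str, Any]:
    """Read the registry JSON, falling back to an empty default on failure."""
    if not registry_path.exists():
        warnings.warn(f"Registry not found at {registry_path}; using empty registry")
        return default_registry()
    try:
        with registry_path.open("r", encoding="utf-8") as fh:
            return json.load(fh)
    except (ValueError, OSError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        warnings.warn(f"Could not read {registry_path}: {exc}; using empty registry")
        return default_registry()
```

Returning a default instead of raising keeps a missing or damaged metadata.json from blocking work on the simulation, while the warning keeps the fallback visible.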

func save(self) -> None

Save the current registry state to disk.

Creates the .analysis directory if it doesn't exist.

param self

Returns

None
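As a rough illustration of the save step, the sketch below creates the directory on demand and writes the registry as JSON. The function name `save_registry` and the indentation setting are assumptions made for this example, not details taken from MDFactory.

```python
import json
from pathlib import Path
from typing import Any


def save_registry(analysis_dir: Path, registry: dict[str, Any]) -> Path:
    """Write the registry to <analysis_dir>/metadata.json and return its path."""
    # Create the .analysis directory (and any parents) if it is missing.
    analysis_dir.mkdir(parents=True, exist_ok=True)
    registry_path = analysis_dir / "metadata.json"
    with registry_path.open("w", encoding="utf-8") as fh:
        json.dump(registry, fh, indent=2)  # indent chosen only for readability
    return registry_path
```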
func add_entry(self, name, df, **extras) -> None

Add a new analysis entry to the registry.

param self
param name: str

Analysis name (without .parquet extension)

param df: pd.DataFrame

DataFrame to extract metadata from

param extras = {}

Returns

None
func update_entry(self, name, df, **extras) -> None

Update an existing analysis entry, or create it if it does not exist.

Preserves the 'created_at' timestamp if the entry exists and updates 'updated_at'.

param self
param name: str

Analysis name (without .parquet extension)

param df: pd.DataFrame

DataFrame to extract metadata from

param extras = {}

Returns

None
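The update-or-create rule for timestamps can be sketched as below. This is only an illustration of the documented behaviour: `upsert_entry` is a hypothetical helper that works on a plain dict, and the way metadata and extras are merged is an assumption.

```python
from datetime import datetime, timezone
from typing import Any


def upsert_entry(
    analyses: dict[str, dict[str, Any]],
    name: str,
    metadata: dict[str, Any],
    **extras: Any,
) -> None:
    """Create or refresh an entry, keeping the original 'created_at' if present."""
    now = datetime.now(timezone.utc).isoformat()
    previous = analyses.get(name, {})
    analyses[name] = {
        **metadata,
        **extras,
        # Reuse the stored creation time; a brand-new entry gets the current time.
        "created_at": previous.get("created_at", now),
        "updated_at": now,
    }
```

For a new entry both timestamps are identical; on later calls only 'updated_at' moves forward.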
func get_entry(self, name) -> dict[str, Any]

Retrieve analysis entry by name.

param self
param name: str

Analysis name

Returns

dict

Dict with analysis metadata

func list_analyses(self) -> list[str]

Return sorted list of analysis names.

param self

Returns

list

Sorted list of analysis names

func list_artifacts(self) -> list[str]

Return sorted list of artifact names.

param self

Returns

list

Sorted list of artifact names

func add_artifact_entry(self, name, files, checksums, **extras) -> None

Add a new artifact entry to the registry.

param self
param name: str

Artifact name

param files: list[str]

Relative file paths under .analysis

param checksums: dict[str, str]

Mapping of relative paths to sha256 checksums

param extras = {}

Returns

None
func update_artifact_entry(self, name, files, checksums, **extras) -> None

Update or create an artifact entry in the registry.

param self
param name: str

Artifact name

param files: list[str]

Relative file paths under .analysis

param checksums: dict[str, str]

Mapping of relative paths to sha256 checksums

param extras = {}

Returns

None
func get_artifact_entry(self, name) -> dict[str, Any]

Retrieve artifact entry by name.

param self
param name: str

Artifact name

Returns

dict

Dict with artifact metadata

func remove_artifact_entry(self, name) -> None

Remove artifact entry from registry.

param self
param name: str

Artifact name to remove

Returns

None
func check_integrity(self) -> dict[str, Any]

Verify registry integrity against the actual filesystem.

Checks for:

  • Missing files: In registry but file doesn't exist
  • Extra files: Parquet file exists but not in registry
  • Row count mismatches: File row count differs from registry
  • Artifact missing files: Files in artifact entries that are missing
  • Artifact checksum mismatches: sha256 mismatch for artifact files

param self

Returns

dict

Dict with:

  • valid: bool - True if no issues found
  • missing_files: list[str] - Analyses in registry but file missing
  • extra_files: list[str] - Parquet files not in registry
  • row_count_mismatches: list[dict] - Analyses with row count mismatches
  • artifact_missing_files: list[dict] - Artifact missing file entries
  • artifact_checksum_mismatches: list[dict] - Artifact checksum mismatches
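A partial sketch of such a check, using only the standard library, might look like the following. It covers the file-presence and checksum checks but deliberately omits the row count comparison, which would need a Parquet reader. The function name `check_files`, the 'artifacts' key, the 'files' and 'checksums' fields inside each artifact entry, and the shape of the dicts in the mismatch lists are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Any


def check_files(analysis_dir: Path, registry: dict[str, Any]) -> dict[str, Any]:
    """Compare registry entries with the files actually present on disk."""
    # Assumption: analysis <name> is stored as <analysis_dir>/<name>.parquet.
    registered = set(registry.get("analyses", {}))
    on_disk = {p.stem for p in analysis_dir.glob("*.parquet")}

    artifact_missing: list[dict[str, str]] = []
    checksum_mismatches: list[dict[str, str]] = []
    # Assumption: artifacts live under a top-level 'artifacts' key.
    for name, entry in registry.get("artifacts", {}).items():
        for rel_path in entry.get("files", []):
            path = analysis_dir / rel_path
            if not path.is_file():
                artifact_missing.append({"artifact": name, "file": rel_path})
                continue
            actual = hashlib.sha256(path.read_bytes()).hexdigest()
            if actual != entry.get("checksums", {}).get(rel_path):
                checksum_mismatches.append({"artifact": name, "file": rel_path})

    report: dict[str, Any] = {
        "missing_files": sorted(registered - on_disk),
        "extra_files": sorted(on_disk - registered),
        "artifact_missing_files": artifact_missing,
        "artifact_checksum_mismatches": checksum_mismatches,
    }
    # The registry is valid only when every issue list is empty.
    report["valid"] = not any(report.values())
    return report
```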
func _create_default_registry(self) -> dict[str, Any]

Create default empty registry structure.

param self

Returns

dict[str, typing.Any]
func _ensure_registry_keys(self) -> None

Ensure expected top-level keys exist in the registry.

param self

Returns

None
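One simple way to guarantee top-level keys is dict.setdefault, as in this sketch. The helper name `ensure_keys` is hypothetical, and the 'artifacts' key is an assumption inferred from the artifact methods; only 'schema_version' and 'analyses' are named explicitly in this reference.

```python
from typing import Any


def ensure_keys(registry: dict[str, Any]) -> None:
    """Add any missing top-level keys in place without touching existing values."""
    registry.setdefault("schema_version", "1.0")
    registry.setdefault("analyses", {})
    registry.setdefault("artifacts", {})  # assumed key name, not confirmed by the docs
```

Because setdefault leaves existing values alone, a registry written by an older version gains the new keys without losing data.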
func _extract_metadata(self, df) -> dict[str, Any]

Extract metadata from a DataFrame.

param self
param df: pd.DataFrame

DataFrame to extract metadata from.

Returns

dict

Dictionary with 'row_count' and 'columns' keys.
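A minimal sketch of the extraction, written against any object that supports len() and has a .columns attribute (as a pandas DataFrame does), could be as follows. The function name `extract_metadata` and the choice to store column labels as strings are assumptions.

```python
from typing import Any


def extract_metadata(df: Any) -> dict[str, Any]:
    """Return the row count and column labels of a DataFrame-like object."""
    # For a pandas DataFrame, len(df) is the number of rows.
    return {
        "row_count": len(df),
        "columns": [str(col) for col in df.columns],  # assumption: labels stored as str
    }
```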

func _get_timestamp(self) -> str

Get ISO 8601 UTC timestamp.

param self

Returns

str
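An ISO 8601 UTC timestamp can be produced with the standard library as sketched here. The function name `utc_timestamp` is hypothetical, and the exact string format MDFactory emits (for example a 'Z' suffix versus '+00:00') is not stated in this reference.

```python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Current time as a timezone-aware ISO 8601 string in UTC."""
    # A timezone-aware datetime makes isoformat() include the '+00:00' offset.
    return datetime.now(timezone.utc).isoformat()
```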
func _calculate_checksum(self, path) -> str

Calculate sha256 checksum for a file.

param self
param path: Path

Returns

str
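A sha256 file checksum is commonly computed by hashing the file in chunks, as in the sketch below. The function name `sha256_of_file` and the chunk size are illustrative choices, not details from MDFactory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use bounded even for large artifact files.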
func remove_entry(self, name) -> None

Remove analysis entry from registry.

param self
param name: str

Analysis name to remove

Returns

None
