foundry_dev_tools.cached_foundry_client module#

Cached Foundry Client, a high-level client to work with Foundry.

class foundry_dev_tools.cached_foundry_client.CachedFoundryClient[source]#

Bases: object

Initialize CachedFoundryClient.

A config dict can be passed to overwrite values; otherwise Configuration reads them from the ~/.foundry-dev-tools/config file.

Parameters:
  • config – config dict to overwrite values from config file

  • ctx – FoundryContext to use; if supplied, the config parameter is ignored

__init__(config=None, ctx=None)[source]#

Initialize CachedFoundryClient.

A config dict can be passed to overwrite values; otherwise Configuration reads them from the ~/.foundry-dev-tools/config file.

Parameters:
  • config (dict | None) – config dict to overwrite values from config file

  • ctx (FoundryContext | FoundryRestClient | None) – FoundryContext to use; if supplied, the config parameter is ignored
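
A minimal instantiation sketch; the config key shown is an illustrative assumption, not a complete reference:

    from foundry_dev_tools.cached_foundry_client import CachedFoundryClient

    # read all values from ~/.foundry-dev-tools/config
    client = CachedFoundryClient()

    # or overwrite selected values from the config file (key shown is illustrative)
    client = CachedFoundryClient(config={"jwt": "<your-token>"})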

load_dataset(dataset_path_or_rid, branch='master')[source]#

Loads the complete dataset from Foundry and stores it in the cache.

The cache is invalidated once a new transaction is present in Foundry. The last two transactions are kept in the cache; older transactions are cleaned up.

Parameters:
  • dataset_path_or_rid (str) – Path to dataset or the rid of the dataset

  • branch (str) – The branch of the dataset

Returns:

The dataset as a Spark DataFrame

Return type:

pyspark.sql.DataFrame
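
For example (the dataset path below is hypothetical):

    df = client.load_dataset("/Global/example/my_dataset", branch="master")
    df.show()  # df is a pyspark.sql.DataFrame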

fetch_dataset(dataset_path_or_rid, branch='master')[source]#

Downloads the complete dataset from Foundry and stores it in the cache.

Returns the local path to the dataset together with its dataset identity.

Parameters:
  • dataset_path_or_rid (str) – Path to dataset or the rid of the dataset

  • branch (str) – The branch of the dataset

Returns:

tuple of (local path to the dataset, dataset_identity)

Return type:

Tuple[str, dict]
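
For example (the dataset path below is hypothetical):

    local_path, dataset_identity = client.fetch_dataset("/Global/example/my_dataset")
    print(local_path)  # directory in the local cache holding the dataset files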

save_dataset(df, dataset_path_or_rid, branch='master', exists_ok=False, mode='SNAPSHOT')[source]#

Saves a dataframe to Foundry. If the dataset does not exist in Foundry, it is created.

If the branch does not exist, it is created. If the dataset already exists and exists_ok is False (the default), an exception is raised; pass exists_ok=True to overwrite it. Creates SNAPSHOT transactions by default.

Parameters:
  • df (pandas.DataFrame | pyspark.sql.DataFrame) – A pyspark or pandas DataFrame to upload

  • dataset_path_or_rid (str) – Path or Rid of the dataset in which the object should be stored.

  • branch (str) – Branch of the dataset in which the object should be stored

  • exists_ok (bool) – By default, this method creates a new dataset. Pass exists_ok=True to write to an existing dataset, following the strategy given by the ‘mode’ parameter

  • mode (str) – Foundry transaction type: SNAPSHOT (only the newly written files are present after the transaction), UPDATE (replace files with the same filename, keep other existing files), APPEND (only add files that are not yet present)

Returns:

tuple with (dataset_rid, transaction_rid)

Return type:

Tuple[str, str]

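A minimal save sketch; the output dataset path is hypothetical:

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3]})
    dataset_rid, transaction_rid = client.save_dataset(
        df,
        "/Global/example/output_dataset",
        branch="master",
        exists_ok=True,   # overwrite if the dataset already exists
        mode="SNAPSHOT",  # only the newly written files remain after the transaction
    )
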
save_model(model_obj, dataset_path_or_rid, branch='master', exists_ok=False, mode='SNAPSHOT')[source]#

Saves a Python object to a Foundry dataset.

The Python object is pickled and uploaded to the path model.pickle. The uploaded model can be loaded for performing predictions inside Foundry pipelines.

Parameters:
  • model_obj (object) – Any python object that can be pickled

  • dataset_path_or_rid (str) – Path or Rid of the dataset in which the object should be stored.

  • branch (str) – Branch of the dataset in which the object should be stored

  • exists_ok (bool) – By default, this method creates a new dataset. Pass exists_ok=True to write to an existing dataset, following the strategy given by the ‘mode’ parameter

  • mode (str) – Foundry transaction type: SNAPSHOT (only the newly written files are present after the transaction), UPDATE (replace files with the same filename, keep other existing files), APPEND (only add files that are not yet present)

Raises:

ValueError – When model_obj or branch is None

Returns:

tuple with (dataset_rid, transaction_rid)

Return type:

Tuple[str, str]
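
A minimal sketch; the dataset path is hypothetical and the dict is a stand-in for any picklable object:

    model = {"coefficients": [0.1, 0.2]}  # any picklable Python object
    dataset_rid, transaction_rid = client.save_model(
        model,
        "/Global/example/model_dataset",
        branch="master",
        exists_ok=True,
    )
    # the object is uploaded as model.pickle inside the dataset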