foundry_dev_tools.cached_foundry_client module#
Cached Foundry Client, high level client to work with Foundry.
- class foundry_dev_tools.cached_foundry_client.CachedFoundryClient[source]#
Bases: object
Initialize CachedFoundryClient.
A config dict can be passed to overwrite values; otherwise Configuration is used to read them from the ~/.foundry-dev-tools/config file.
- Parameters:
config – config dict to overwrite values from config file
ctx – FoundryContext to use; if supplied, the config parameter will be ignored
- __init__(config=None, ctx=None)[source]#
Initialize CachedFoundryClient.
A config dict can be passed to overwrite values; otherwise Configuration is used to read them from the ~/.foundry-dev-tools/config file.
- Parameters:
config (dict | None) – config dict to overwrite values from config file
ctx (FoundryContext | FoundryRestClient | None) – FoundryContext to use; if supplied, the config parameter will be ignored
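A minimal instantiation sketch (not part of the original reference): the config key and token value shown are placeholder assumptions about your local setup, not required values.

```python
from foundry_dev_tools.cached_foundry_client import CachedFoundryClient

# Default: credentials and settings are read from ~/.foundry-dev-tools/config.
client = CachedFoundryClient()

# Optionally overwrite selected values; the "jwt" key is only an illustrative
# assumption about the config layout.
client = CachedFoundryClient(config={"jwt": "<token>"})
```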
- load_dataset(dataset_path_or_rid, branch='master')[source]#
Loads the complete dataset from Foundry and stores it in the cache.
The cache is invalidated once a new transaction is present in Foundry. The last two transactions are kept in the cache; older transactions are cleaned up.
- Parameters:
dataset_path_or_rid (str) – Path or Rid of the dataset to load
branch (str) – Branch of the dataset
- Returns:
the dataset as a Spark DataFrame
- Return type:
pyspark.sql.DataFrame
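A hedged usage sketch: the dataset path and branch are hypothetical, and the returned object is assumed to be a Spark DataFrame as noted above.

```python
# Hypothetical dataset path; repeated calls hit the local cache until a new
# transaction appears on the branch in Foundry.
df = client.load_dataset("/Global/sandbox/my_dataset", branch="master")
df.show(5)
```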
- fetch_dataset(dataset_path_or_rid, branch='master')[source]#
Downloads the complete dataset from Foundry and stores it in the cache.
Returns the local path to the dataset.
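A sketch with a hypothetical path, contrasting fetch_dataset (download only) with load_dataset (download and load):

```python
# Downloads the dataset files into the local cache and returns where they are
# stored, without loading them into a DataFrame. Path is hypothetical.
local_path = client.fetch_dataset("/Global/sandbox/my_dataset", branch="master")
print(local_path)
```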
- save_dataset(df, dataset_path_or_rid, branch='master', exists_ok=False, mode='SNAPSHOT')[source]#
Saves a DataFrame to Foundry. If the dataset in Foundry does not exist, it is created.
If the branch does not exist, it is created. If the dataset already exists, an exception is thrown unless exists_ok=True is passed, in which case the dataset is overwritten according to the mode parameter. Creates SNAPSHOT transactions by default.
- Parameters:
df (pandas.DataFrame | pyspark.sql.DataFrame) – A pyspark or pandas DataFrame to upload
dataset_path_or_rid (str) – Path or Rid of the dataset in which the object should be stored.
branch (str) – Branch of the dataset in which the object should be stored
exists_ok (bool) – By default, this method creates a new dataset. Pass exists_ok=True to overwrite it according to the strategy given by the mode parameter
mode (str) – Foundry Transaction type: SNAPSHOT (only new files are present after transaction), UPDATE (replace files with same filename, keep present files), APPEND (add files that are not present yet)
- Returns:
tuple with (dataset_rid, transaction_rid)
- Return type:
Tuple
- Raises:
ValueError – when dataframe is None
ValueError – when branch is None
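A save_dataset sketch; the target path is hypothetical, and exists_ok=True is used so re-running the example overwrites the dataset instead of raising:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Hypothetical target path; SNAPSHOT replaces the dataset contents entirely.
dataset_rid, transaction_rid = client.save_dataset(
    df,
    "/Global/sandbox/my_dataset",
    branch="master",
    exists_ok=True,
    mode="SNAPSHOT",
)
```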
- save_model(model_obj, dataset_path_or_rid, branch='master', exists_ok=False, mode='SNAPSHOT')[source]#
Saves a Python object to a Foundry dataset.
The object is pickled and uploaded to the path model.pickle. The uploaded model can be loaded for performing predictions inside Foundry pipelines.
- Parameters:
model_obj (object) – Any python object that can be pickled
dataset_path_or_rid (str) – Path or Rid of the dataset in which the object should be stored.
branch (str) – Branch of the dataset in which the object should be stored
exists_ok (bool) – By default, this method creates a new dataset. Pass exists_ok=True to overwrite it according to the strategy given by the mode parameter
mode (str) – Foundry Transaction type: SNAPSHOT (only new files are present after transaction), UPDATE (replace files with same filename, keep present files), APPEND (add files that are not present yet)
- Raises:
ValueError – When model_obj or branch is None
- Returns:
Tuple with (dataset_rid, transaction_rid)
- Return type:
Tuple
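A save_model sketch; any picklable object is accepted, so a plain dict stands in for a trained model here, and the dataset path is hypothetical:

```python
# Illustrative stand-in for a trained model; any picklable object is accepted
# and is uploaded to the dataset as model.pickle.
model_obj = {"coefficients": [0.4, 1.2], "intercept": 0.1}

dataset_rid, transaction_rid = client.save_model(
    model_obj,
    "/Global/sandbox/my_model",  # hypothetical dataset path
    branch="master",
    exists_ok=True,
    mode="SNAPSHOT",
)
```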