foundry_dev_tools.foundry_api_client module#
Contains FoundryRestClient and FoundrySqlClient and exception classes.
One of the goals of this module is to be self-contained so that it can be dropped into any Python installation with 'requests' as the only required dependency. Optional dependencies for the SQL functionality to work are pandas and pyarrow.
- class foundry_dev_tools.foundry_api_client.FoundryRestClient[source]#
Bases:
object
Create an instance of FoundryRestClient.
- Parameters:
config – config dictionary that gets parsed into the v2 configuration, for backwards compatibility
ctx – alternatively, pass the v2 FoundryContext directly instead of the 'old' configuration; the config dict will then be ignored
Examples
>>> fc = FoundryRestClient()
>>> fc = FoundryRestClient(config={"jwt": "<token>"})
>>> fc = FoundryRestClient(config={"client_id": "<client_id>"})
>>> ctx = FoundryContext()
>>> fc = FoundryRestClient(ctx=ctx)
- __init__(config=None, ctx=None)[source]#
Create an instance of FoundryRestClient.
- Parameters:
config (dict | None) – config dictionary that gets parsed into the v2 configuration, for backwards compatibility
ctx (FoundryContext | None) – alternatively, pass the v2 FoundryContext directly instead of the 'old' configuration; the config dict will then be ignored
Examples
>>> fc = FoundryRestClient()
>>> fc = FoundryRestClient(config={"jwt": "<token>"})
>>> fc = FoundryRestClient(config={"client_id": "<client_id>"})
>>> ctx = FoundryContext()
>>> fc = FoundryRestClient(ctx=ctx)
- get_dataset(dataset_rid)[source]#
Gets dataset_rid and fileSystemId.
- Parameters:
dataset_rid (str) – Dataset rid
- Returns:
with the keys rid and fileSystemId
- Return type:
- Raises:
DatasetNotFoundError – if dataset does not exist
- delete_dataset(dataset_rid)[source]#
Deletes a dataset in Foundry and moves it to trash.
- Parameters:
dataset_rid (str) – Unique identifier of the dataset
- Raises:
DatasetNotFoundError – if dataset does not exist
- move_resource_to_trash(rid)[source]#
Moves a Compass resource (e.g. dataset or folder) to trash.
- Parameters:
rid (str) – rid of the resource
- create_branch(dataset_rid, branch, parent_branch_id=None, parent_branch=None)[source]#
Creates a new branch in a dataset.
If the dataset is 'new', only the parameters dataset_rid and branch are required.
- Parameters:
- Returns:
the response as a json object
- Return type:
- update_branch(dataset_rid, branch, parent_branch=None)[source]#
Updates the latest transaction of branch ‘branch’ to the latest transaction of branch ‘parent_branch’.
- Parameters:
- Returns:
example below for the branch response
- Return type:
{
    "id": "..",
    "rid": "ri.foundry.main.branch...",
    "ancestorBranchIds": [],
    "creationTime": "",
    "transactionRid": "ri.foundry.main.transaction....",
}
- get_branch(dataset_rid, branch)[source]#
Returns branch information.
- Parameters:
- Returns:
with keys id (name) and rid (unique id) of the branch.
- Return type:
- Raises:
BranchNotFoundError – if branch does not exist.
- open_transaction(dataset_rid, mode='SNAPSHOT', branch='master')[source]#
Opens a new transaction on a dataset.
- Parameters:
- Returns:
the transaction ID
- Return type:
- Raises:
BranchNotFoundError – if branch does not exist
DatasetNotFoundError – if dataset does not exist
DatasetHasOpenTransactionError – if dataset has an open transaction
- remove_dataset_file(dataset_rid, transaction_id, logical_path, recursive=False)[source]#
Removes the given file from an open transaction.
If the logical path matches a file exactly then only that file will be removed, regardless of the value of recursive. If the logical path represents a directory, then all files prefixed with the logical path followed by ‘/’ will be removed when recursive is true and no files will be removed when recursive is false. If the given logical path does not match a file or directory then this call is ignored and does not throw an exception.
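The matching rules above can be modeled as a small pure function. This is an illustrative sketch of the documented behavior, not code from the library; `files` stands in for the logical paths currently in the transaction's view:

```python
# Illustrative model of the documented matching rules.
def paths_to_remove(files, logical_path, recursive=False):
    """Return which of `files` the call would remove."""
    if logical_path in files:
        # Exact file match: remove just that file, regardless of `recursive`.
        return [logical_path]
    prefix = logical_path.rstrip("/") + "/"
    matches = [f for f in files if f.startswith(prefix)]
    if matches and recursive:
        # Directory match: remove its contents only when recursive is True.
        return matches
    # No match, or recursive=False on a directory: the call is a no-op.
    return []

files = ["spark/part-0.parquet", "spark/part-1.parquet", "README.md"]
```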
- add_files_to_delete_transaction(dataset_rid, transaction_id, logical_paths)[source]#
Adds files in an open DELETE transaction.
Files added to DELETE transactions affect the dataset view by removing files from the view.
- commit_transaction(dataset_rid, transaction_id)[source]#
Commits a transaction; should be called after the file upload is complete.
- abort_transaction(dataset_rid, transaction_id)[source]#
Aborts a transaction. Dataset will remain on transaction N-1.
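The open/upload/commit/abort methods above are typically used together. A minimal sketch of that lifecycle follows; the helper and the FakeClient are illustrative, not part of the library, and only assume the methods behave as documented:

```python
def write_snapshot(client, dataset_rid, path_file_dict, branch="master"):
    """Open a SNAPSHOT transaction, upload files, commit; abort on any error."""
    transaction_id = client.open_transaction(dataset_rid, mode="SNAPSHOT", branch=branch)
    try:
        client.upload_dataset_files(dataset_rid, transaction_id, path_file_dict)
        client.commit_transaction(dataset_rid, transaction_id)
    except Exception:
        # Dataset remains on transaction N-1, as documented for abort_transaction.
        client.abort_transaction(dataset_rid, transaction_id)
        raise
    return transaction_id

class FakeClient:
    """Records calls so the flow can be checked without a Foundry instance."""
    def __init__(self, fail_upload=False):
        self.calls, self.fail_upload = [], fail_upload
    def open_transaction(self, rid, mode="SNAPSHOT", branch="master"):
        self.calls.append("open")
        return "transaction-1"
    def upload_dataset_files(self, rid, txn, path_file_dict, parallel_processes=None):
        self.calls.append("upload")
        if self.fail_upload:
            raise RuntimeError("upload failed")
    def commit_transaction(self, rid, txn):
        self.calls.append("commit")
    def abort_transaction(self, rid, txn):
        self.calls.append("abort")
```

With a real FoundryRestClient in place of FakeClient, the same helper performs an atomic snapshot write.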
- get_dataset_transactions(dataset_rid, branch='master', last=50, include_open_exclusive_transaction=False)[source]#
Returns the transactions of a dataset / branch combination.
Returns the last 50 transactions by default; pagination is not implemented.
- Parameters:
- Returns:
of transaction information.
- Return type:
- Raises:
BranchNotFoundError – if branch not found
DatasetHasNoTransactionsError – if the dataset has no transactions
- get_dataset_last_transaction(dataset_rid, branch='master')[source]#
Returns the last transaction of a dataset / branch combination.
- get_dataset_last_transaction_rid(dataset_rid, branch='master')[source]#
Returns the last transaction rid of a dataset / branch combination.
- upload_dataset_file(dataset_rid, transaction_rid, path_or_buf, path_in_foundry_dataset)[source]#
Uploads a file-like object to a path in a Foundry dataset.
- Parameters:
- Return type:
- upload_dataset_files(dataset_rid, transaction_rid, path_file_dict, parallel_processes=None)[source]#
Uploads multiple local files to a foundry dataset.
- Parameters:
{
    '<path_in_foundry_dataset>': '<local_file_path>',
    ...
}
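One way to construct that mapping from a local folder; this is a sketch, and only the shape of the mapping comes from the docstring above:

```python
from pathlib import Path

def build_path_file_dict(local_dir):
    """Map each file's dataset-relative path to its local path, recursively."""
    root = Path(local_dir)
    return {
        path.relative_to(root).as_posix(): str(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

The result can then be passed as path_file_dict to upload_dataset_files inside an open transaction.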
- get_dataset_details(dataset_path_or_rid)[source]#
Returns the resource information of a dataset.
- Parameters:
dataset_path_or_rid (str) – The full path or rid to the dataset
- Returns:
the json response of the api
- Return type:
- Raises:
DatasetNotFoundError – if dataset not found
- get_child_objects_of_folder(folder_rid, page_size=None)[source]#
Returns the child objects of a compass folder.
- Parameters:
- Yields:
dict – information about child objects
- Raises:
FolderNotFoundError – if folder not found
- Return type:
Iterator[dict]
- get_dataset_path(dataset_rid)[source]#
Returns the path of a dataset as str.
- Parameters:
dataset_rid (str) – The rid of the dataset
- Returns:
the dataset_path
- Return type:
- Raises:
DatasetNotFoundError – if dataset was not found
- get_dataset_paths(dataset_rids)[source]#
Returns a list of paths for a list of dataset rids.
- get_dataset_schema(dataset_rid, transaction_rid=None, branch='master')[source]#
Returns the foundry dataset schema for a dataset, transaction, branch combination.
- Parameters:
- Returns:
with foundry dataset schema
- Return type:
- Raises:
DatasetNotFoundError – if dataset was not found
DatasetHasNoSchemaError – if dataset has no schema
BranchNotFoundError – if branch was not found
KeyError – if the combination of dataset_rid, transaction_rid and branch was not found
- upload_dataset_schema(dataset_rid, transaction_rid, schema, branch='master')[source]#
Uploads the foundry dataset schema for a dataset, transaction, branch combination.
- Parameters:
- Return type:
- infer_dataset_schema(dataset_rid, branch='master')[source]#
Calls the foundry-schema-inference service to infer the dataset schema.
Returns a dict with the foundry schema if status == SUCCESS.
- Parameters:
- Returns:
with dataset schema, that can be used to call upload_dataset_schema
- Return type:
- Raises:
ValueError – if foundry schema inference failed
- get_dataset_identity(dataset_path_or_rid, branch='master', check_read_access=True)[source]#
Returns the identity of this dataset (dataset_path, dataset_rid, last_transaction_rid, last_transaction).
- Parameters:
dataset_path_or_rid (str) – Path to dataset (e.g. /Global/…) or rid of dataset (e.g. ri.foundry.main.dataset…)
branch (str) – branch of the dataset
check_read_access (bool) – default is True; checks if the user has read access ('gatekeeper:view-resource') to the dataset, otherwise an exception is thrown
- Returns:
with the keys ‘dataset_path’, ‘dataset_rid’, ‘last_transaction_rid’, ‘last_transaction’
- Return type:
- Raises:
DatasetNoReadAccessError – if you have no read access for that dataset
- list_dataset_files(dataset_rid, exclude_hidden_files=True, view='master', logical_path=None, detail=False, *, include_open_exclusive_transaction=False)[source]#
Returns list of internal filenames of a dataset.
- Parameters:
dataset_rid (str) – the dataset rid
exclude_hidden_files (bool) – if hidden files should be excluded (e.g. _log files)
view (str) – branch or transaction rid of the dataset
logical_path (str) – If logical_path is absent, returns all files in the view. If logical_path matches a file exactly, returns just that file. Otherwise, returns all files in the “directory” of logical_path: (a slash is added to the end of logicalPath if necessary and a prefix-match is performed)
detail (bool) – if passed as True, returns complete response from catalog API, otherwise only returns logicalPath
include_open_exclusive_transaction (bool) – if files added in open transaction should be returned as well in the response
- Returns:
filenames
- Return type:
- Raises:
DatasetNotFoundError – if dataset was not found
- get_dataset_stats(dataset_rid, view='master')[source]#
Returns response from foundry catalogue stats endpoint.
- foundry_stats(dataset_rid, end_transaction_rid, branch='master')[source]#
Returns row counts and size of the dataset/view.
- Parameters:
- Returns:
With the following structure:
{
    "datasetRid": str,
    "branch": str,
    "endTransactionRid": str,
    "schemaId": str,
    "computedDatasetStats": {
        "rowCount": str | None,
        "sizeInBytes": str,
        "columnStats": {
            "...": {
                "nullCount": str | None,
                "uniqueCount": str | None,
                "avgLength": str | None,
                "maxLength": str | None,
            }
        },
    },
}
- Return type:
- download_dataset_file(dataset_rid, output_directory, foundry_file_path, view='master')[source]#
Downloads a single foundry dataset file into a directory.
Creates subfolders if necessary.
- Parameters:
- Returns:
local file path in case output_directory was passed or file content as bytes
- Return type:
- Raises:
ValueError – If download failed
- download_dataset_files(dataset_rid, output_directory, files=None, view='master', parallel_processes=None)[source]#
Downloads files of a dataset (in parallel) to a local output directory.
- Parameters:
dataset_rid (str) – the dataset rid
files (list | None) – list of files, or None, in which case all files are downloaded
output_directory (str) – the output directory for the files
view (str) – branch or transaction rid of the dataset
parallel_processes (int | None) – number of parallel download threads; the default value is calculated as multiprocessing.cpu_count() - 1
- Returns:
path to downloaded files
- Return type:
List[str]
- download_dataset_files_temporary(dataset_rid, files=None, view='master', parallel_processes=None)[source]#
Downloads all files of a dataset to a temporary directory.
The temporary directory is deleted when the context is closed. The function returns the temporary directory. Example usage:
>>> import pandas as pd
>>> from pathlib import Path
>>> from pyarrow import parquet
>>> with client.download_dataset_files_temporary(dataset_rid='ri.foundry.main.dataset.1', view='master') as temp_folder:
...     # Read using pandas
...     df = pd.read_parquet(temp_folder)
...     # Read using pyarrow; pass only the files, which are normally in subfolder 'spark'
...     pq = parquet.ParquetDataset(path_or_paths=[x for x in Path(temp_folder).glob('**/*') if x.is_file()])
- Parameters:
- Yields:
Iterator[str] – path to temporary folder containing root of dataset files
- Return type:
Iterator[str]
- get_dataset_as_raw_csv(dataset_rid, branch='master')[source]#
Uses the CSV API to download a dataset as CSV.
- query_foundry_sql_legacy(query: str, return_type: Literal['pandas'], branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) pd.core.frame.DataFrame [source]#
- query_foundry_sql_legacy(query: str, return_type: Literal['spark'], branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) pyspark.sql.DataFrame
- query_foundry_sql_legacy(query: str, return_type: Literal['arrow'], branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) pa.Table
- query_foundry_sql_legacy(query: str, return_type: Literal['raw'], branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) tuple[dict, list[list]]
- query_foundry_sql_legacy(query: str, return_type: api_types.SQLReturnType = 'raw', branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) tuple[dict, list[list]] | pd.core.frame.DataFrame | pa.Table | pyspark.sql.DataFrame
Queries the dataproxy query API with spark SQL.
Example
- query_foundry_sql_legacy(query="SELECT * FROM `/Global/Foundry Operations/Foundry Support/iris`",
branch="master")
- Parameters:
query – the sql query
branch – the branch of the dataset / query
return_type – See
foundry_dev_tools.utils.api_types.SQLReturnType
sql_dialect – the SQL dialect used for the query
timeout – the query request timeout
- Returns:
- (foundry_schema, data)
data contains the data matrix; foundry_schema contains the foundry schema (fieldSchemaList key). The result can be converted to a pandas DataFrame, see below:
foundry_schema, data = self.query_foundry_sql_legacy(query, branch)
df = pd.DataFrame(
    data=data,
    columns=[e["name"] for e in foundry_schema["fieldSchemaList"]],
)
- Return type:
- Raises:
ValueError – if return_type is not in SQLReturnType
DatasetHasNoSchemaError – if dataset has no schema
BranchNotFoundError – if branch was not found
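With return_type='raw', the (foundry_schema, data) pair can also be reshaped without pandas. A small sketch; only the fieldSchemaList key is taken from the description above, and the sample schema and rows are made up for illustration:

```python
def rows_as_dicts(foundry_schema, data):
    """Pair each data row with the column names from the schema."""
    columns = [field["name"] for field in foundry_schema["fieldSchemaList"]]
    return [dict(zip(columns, row)) for row in data]

# Hypothetical schema and data matrix in the documented shape.
schema = {"fieldSchemaList": [{"name": "sepal_length"}, {"name": "species"}]}
rows = rows_as_dicts(schema, [[5.1, "setosa"], [6.2, "virginica"]])
```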
- query_foundry_sql(query: str, return_type: Literal['pandas'], branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) pd.core.frame.DataFrame [source]#
- query_foundry_sql(query: str, return_type: Literal['spark'], branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) pyspark.sql.DataFrame
- query_foundry_sql(query: str, return_type: Literal['arrow'], branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) pa.Table
- query_foundry_sql(query: str, return_type: Literal['raw'], branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) tuple[dict, list[list]]
- query_foundry_sql(query: str, return_type: api_types.SQLReturnType = 'pandas', branch: api_types.Ref = 'master', sql_dialect: api_types.SqlDialect = 'SPARK', timeout: int = 600) tuple[dict, list[list]] | pd.core.frame.DataFrame | pa.Table | pyspark.sql.DataFrame
Queries the Foundry SQL server with spark SQL dialect.
Uses Arrow IPC to communicate with the Foundry SQL Server Endpoint.
Falls back to query_foundry_sql_legacy in case pyarrow is not installed or the query does not return Arrow Format.
Example
df1 = client.query_foundry_sql("SELECT * FROM `/Global/Foundry Operations/Foundry Support/iris`")
query = (
    "SELECT col1 FROM {start_transaction_rid}:{end_transaction_rid}@{branch}.`{dataset_path_or_rid}` "
    "WHERE filterColumns = 'value1' LIMIT 1"
)
df2 = client.query_foundry_sql(query)
- Parameters:
query – The SQL Query in Foundry Spark Dialect (use backticks instead of quotes)
branch – the dataset branch
sql_dialect – the sql dialect
return_type – See foundry_dev_tools.utils.api_types.SQLReturnType
timeout – Query Timeout, default value is 600 seconds
- Returns:
A pandas DataFrame, Spark DataFrame or pyarrow.Table with the result.
- Return type:
- Raises:
ValueError – Only direct read eligible queries can be returned as arrow Table.
- get_user_info()[source]#
Returns the multipass user info.
- Return type:
{
    "id": "<multipass-id>",
    "username": "<username>",
    "attributes": {
        "multipass:email:primary": ["<email>"],
        "multipass:given-name": ["<given-name>"],
        "multipass:organization": ["<your-org>"],
        "multipass:organization-rid": ["ri.multipass..organization. ..."],
        "multipass:family-name": ["<family-name>"],
        "multipass:upn": ["<upn>"],
        "multipass:realm": ["<your-company>"],
        "multipass:realm-name": ["<your-org>"],
    },
}
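Since every attribute value in the structure above is a list, a small helper can pull single values out safely. The helper is illustrative, not part of the library, and the sample user_info is made up in the documented shape:

```python
def first_attribute(user_info, key, default=None):
    """Return the first value of a multipass attribute list, or `default`."""
    values = user_info.get("attributes", {}).get(key) or []
    return values[0] if values else default

# Hypothetical response in the documented shape.
user_info = {
    "id": "abc",
    "username": "jdoe",
    "attributes": {"multipass:email:primary": ["jdoe@example.com"]},
}
```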
- get_group(group_id)[source]#
Returns the multipass group information.
{
    'id': '<id>',
    'name': '<groupname>',
    'attributes': {
        'multipass:realm': ['palantir-internal-realm'],
        'multipass:organization': ['<your-org>'],
        'multipass:organization-rid': ['ri.multipass..organization.<...>'],
        'multipass:realm-name': ['Palantir Internal'],
    },
}
- delete_group(group_id)[source]#
Deletes multipass group.
- Parameters:
group_id (str) – the group id to delete
- Return type:
- create_third_party_application(client_type, display_name, description, grant_types, redirect_uris, logo_uri, organization_rid, allowed_organization_rids=None, resources=None, operations=None, marking_ids=None, role_set_id=None, role_grants=None, **kwargs)[source]#
Creates Foundry Third Party application (TPA).
See https://www.palantir.com/docs/foundry/platform-security-third-party/third-party-apps-overview/. The user must have 'Manage OAuth 2.0 clients' workflow permissions.
- Parameters:
client_type (Literal['CONFIDENTIAL', 'PUBLIC']) – Server Application (CONFIDENTIAL) or Native or single-page application (PUBLIC)
display_name (str) – Display name of the TPA
description (str | None) – Long description of the TPA
grant_types (list[Literal['AUTHORIZATION_CODE', 'CLIENT_CREDENTIALS', 'REFRESH_TOKEN']]) – Usually, [“AUTHORIZATION_CODE”, “REFRESH_TOKEN”] (authorization code grant) or [“REFRESH_TOKEN”, “CLIENT_CREDENTIALS”] (client credentials grant)
redirect_uris (list | None) – Redirect URLs of TPA, used in combination with AUTHORIZATION_CODE grant
logo_uri (str | None) – URI or embedded image ‘data:image/png;base64,<…>’
organization_rid (str) – Parent Organization of this TPA
allowed_organization_rids (list | None) – Passing None or empty list means TPA is activated for all Foundry organizations
resources (list[str] | None) – Resources allowed to access by the client, otherwise no resource restrictions
operations (list[str] | None) – Operations the client can be granted, otherwise no operation restrictions
marking_ids (list[str] | None) – Markings allowed to access by the client, otherwise no marking restrictions
role_set_id (str | None) – roles allowed for this client, defaults to oauth2-client
role_grants (dict[str, list[str]] | None) – mapping between role ids and principal ids: dict[role id, list[principal id]]
**kwargs – gets passed to
APIClient.api_request()
- Return type:
See below for the structure
{
    "clientId": "<...>",
    "clientSecret": "<...>",
    "clientType": "<CONFIDENTIAL/PUBLIC>",
    "organizationRid": "<...>",
    "displayName": "<...>",
    "description": null,
    "logoUri": null,
    "grantTypes": [<"AUTHORIZATION_CODE", "REFRESH_TOKEN", "CLIENT_CREDENTIALS">],
    "redirectUris": [],
    "allowedOrganizationRids": []
}
- delete_third_party_application(client_id)[source]#
Deletes a Third Party Application.
- Parameters:
client_id (str) – The unique identifier of the TPA.
- Return type:
- update_third_party_application(client_id, client_type, display_name, description, grant_types, redirect_uris, logo_uri, organization_rid, allowed_organization_rids=None, resources=None, operations=None, marking_ids=None, role_set_id=None, **kwargs)[source]#
Updates Foundry Third Party application (TPA).
See https://www.palantir.com/docs/foundry/platform-security-third-party/third-party-apps-overview/. The user must have 'Manage OAuth 2.0 clients' workflow permissions.
- Parameters:
client_id (str) – The unique identifier of the TPA.
client_type (Literal['CONFIDENTIAL', 'PUBLIC']) – Server Application (CONFIDENTIAL) or Native or single-page application (PUBLIC)
display_name (str) – Display name of the TPA
description (str | None) – Long description of the TPA
grant_types (list[Literal['AUTHORIZATION_CODE', 'CLIENT_CREDENTIALS', 'REFRESH_TOKEN']]) – Usually, [“AUTHORIZATION_CODE”, “REFRESH_TOKEN”] (authorization code grant) or [“REFRESH_TOKEN”, “CLIENT_CREDENTIALS”] (client credentials grant)
redirect_uris (list | None) – Redirect URLs of TPA, used in combination with AUTHORIZATION_CODE grant
logo_uri (str | None) – URI or embedded image ‘data:image/png;base64,<…>’
organization_rid (str) – Parent Organization of this TPA
allowed_organization_rids (list | None) – Passing None or empty list means TPA is activated for all Foundry organizations
resources (list[str] | None) – Resources allowed to access by the client, otherwise no resource restrictions
operations (list[str] | None) – Operations the client can be granted, otherwise no operation restrictions
marking_ids (list[str] | None) – Markings allowed to access by the client, otherwise no marking restrictions
role_set_id (str | None) – roles allowed for this client, defaults to oauth2-client
**kwargs – gets passed to
APIClient.api_request()
- Return type:
Response in the following structure:
{
    "clientId": "<...>",
    "clientType": "<CONFIDENTIAL/PUBLIC>",
    "organizationRid": "<...>",
    "displayName": "<...>",
    "description": null,
    "logoUri": null,
    "grantTypes": [<"AUTHORIZATION_CODE", "REFRESH_TOKEN", "CLIENT_CREDENTIALS">],
    "redirectUris": [],
    "allowedOrganizationRids": []
}
- rotate_third_party_application_secret(client_id)[source]#
Rotates Foundry Third Party application (TPA) secret.
- Parameters:
client_id (str) – The unique identifier of the TPA.
- Returns:
See below for the structure
- Return type:
{
    "clientId": "<...>",
    "clientSecret": "<...>",
    "clientType": "<CONFIDENTIAL/PUBLIC>",
    "organizationRid": "<...>",
    "displayName": "<...>",
    "description": null,
    "logoUri": null,
    "grantTypes": [<"AUTHORIZATION_CODE", "REFRESH_TOKEN", "CLIENT_CREDENTIALS">],
    "redirectUris": [],
    "allowedOrganizationRids": []
}
- enable_third_party_application(client_id, operations=None, resources=None, marking_ids=None, grant_types=None, require_consent=True, **kwargs)[source]#
Enables Foundry Third Party application (TPA).
- Parameters:
client_id (str) – The unique identifier of the TPA.
operations (list | None) – Scopes that this TPA is allowed to use (to be confirmed); if None or an empty list is passed, all scopes will be activated.
resources (list | None) – Compass Project RIDs that this TPA is allowed to access; if None or an empty list is passed, unrestricted access will be given.
marking_ids (list[str] | None) – Marking IDs that this TPA is allowed to access; if None or an empty list is passed, unrestricted access will be given.
grant_types (list[Literal['AUTHORIZATION_CODE', 'CLIENT_CREDENTIALS', 'REFRESH_TOKEN']] | None) – Grant types that this TPA is allowed to use to access resources; if None is passed, there are no grant type restrictions; if an empty list is passed, no grant types are allowed for this TPA
require_consent (bool) – Whether users need to provide consent for this application to act on their behalf, defaults to True
**kwargs – gets passed to
APIClient.api_request()
- Return type:
Response with the following structure:
{
    "client": {
        "clientId": "<...>",
        "organizationRid": "ri.multipass..organization.<...>",
        "displayName": "<...>",
        "description": None,
        "logoUri": None,
    },
    "installation": {"resources": [], "operations": [], "markingIds": None},
}
- start_checks_and_build(repository_id, ref_name, commit_hash, file_paths)[source]#
Starts checks and builds.
- get_s3fs_storage_options()[source]#
Get the foundry s3 credentials in the s3fs storage_options format.
Example
>>> fc = FoundryRestClient()
>>> storage_options = fc.get_s3fs_storage_options()
>>> df = pd.read_parquet(
...     "s3://ri.foundry.main.dataset.<uuid>/spark", storage_options=storage_options
... )
- Return type:
- get_boto3_s3_client(**kwargs)[source]#
Returns the boto3 s3 client with credentials applied and endpoint url set.
See foundry_dev_tools.clients.s3_client.api_assume_role_with_webidentity.
Example
>>> from foundry_dev_tools import FoundryRestClient
>>> fc = FoundryRestClient()
>>> s3_client = fc.get_boto3_s3_client()
>>> s3_client
- Parameters:
**kwargs – gets passed to boto3.session.Session.client(); endpoint_url will be overwritten