foundry_dev_tools.clients.data_proxy module

The DataProxy API client.

class foundry_dev_tools.clients.data_proxy.DataProxyClient

Bases: APIClient

DataProxyClient class that implements methods from the ‘foundry-data-proxy’ API.

api_name: ClassVar[str] = 'foundry-data-proxy'

upload_dataset_file(dataset_rid, transaction_rid, path, path_in_foundry_dataset)

Uploads a single local file into a Foundry dataset.

Parameters:

dataset_rid (DatasetRid) – Unique identifier of the dataset
transaction_rid (TransactionRid) – transaction rid
path (Path) – File to upload
path_in_foundry_dataset (PathInDataset) – The destination path in the dataset

Return type:

requests.Response

upload_dataset_files(dataset_rid, transaction_rid, path_file_dict, max_workers=None)

Uploads multiple local files to a Foundry dataset.

Parameters:

dataset_rid (DatasetRid) – dataset rid
transaction_rid (TransactionRid) – transaction rid
max_workers (int | None) – number of threads used for the upload
path_file_dict (dict[PathInDataset, Path]) – a dictionary that maps each destination path in the dataset to the local file to upload, with the following structure:

{
'<path_in_foundry_dataset>': '<local_file_path>',
...
}
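A path_file_dict for a whole local folder can be assembled with the standard library. The following is a minimal sketch and not part of foundry_dev_tools: build_path_file_dict and its dataset_prefix parameter are hypothetical names, and it assumes every file below the folder should be uploaded with its relative path preserved.

```python
from pathlib import Path


def build_path_file_dict(local_dir: Path, dataset_prefix: str = "") -> dict[str, Path]:
    """Map every file below local_dir to a destination path in the dataset.

    Hypothetical helper: keys are dataset paths, values are local paths,
    which is the structure described for path_file_dict.
    """
    prefix = dataset_prefix.strip("/")
    mapping: dict[str, Path] = {}
    for local_path in sorted(local_dir.rglob("*")):
        if not local_path.is_file():
            continue  # directories themselves are not uploaded
        # Use forward slashes for the dataset path on every operating system.
        relative = local_path.relative_to(local_dir).as_posix()
        key = f"{prefix}/{relative}" if prefix else relative
        mapping[key] = local_path
    return mapping
```

The resulting dictionary could then be passed as path_file_dict to upload_dataset_files, together with the dataset rid and the transaction rid.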

query_foundry_sql_legacy(query: str, return_type: Literal['pandas'], branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) → pd.core.frame.DataFrame

query_foundry_sql_legacy(query: str, return_type: Literal['spark'], branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) → pyspark.sql.DataFrame

query_foundry_sql_legacy(query: str, return_type: Literal['arrow'], branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) → pa.Table

query_foundry_sql_legacy(query: str, return_type: Literal['raw'], branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) → tuple[dict, list[list]]

query_foundry_sql_legacy(query: str, return_type: SQLReturnType = 'raw', branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) → tuple[dict, list[list]] | pd.core.frame.DataFrame | pa.Table | pyspark.sql.DataFrame

Queries the data proxy query API with Spark SQL.

Example

query_foundry_sql_legacy(query="SELECT * FROM `/Global/Foundry Operations/Foundry Support/iris`", branch="master")

Parameters:

query – the SQL query
branch – the branch of the dataset / query
return_type – See foundry_dev_tools.utils.api_types.SQLReturnType
sql_dialect – the SQL dialect used for the query
timeout – the query request timeout

Returns:

With return_type='raw', a tuple (foundry_schema, data): foundry_schema is the Foundry schema (its fieldSchemaList key lists the columns) and data is the data matrix, with one list per row. The raw result can be converted to a pandas DataFrame:

foundry_schema, data = self.query_foundry_sql_legacy(query, return_type="raw", branch=branch)
df = pd.DataFrame(
    data=data, columns=[e["name"] for e in foundry_schema["fieldSchemaList"]]
)

Return type:

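tuple[dict, list[list]] for return_type='raw'; otherwise a pandas DataFrame, a pyarrow Table or a pyspark DataFrame, matching return_type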

Raises:

ValueError – if return_type is not a valid SQLReturnType
DatasetHasNoSchemaError – if dataset has no schema
BranchNotFoundError – if branch was not found
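When pandas is not available, the raw return value can also be turned into plain dictionaries. This is a minimal sketch: raw_result_to_records is a hypothetical helper name, and the only assumption about the schema is the one documented above, namely that fieldSchemaList is a list of entries with a name key.

```python
def raw_result_to_records(foundry_schema: dict, data: list[list]) -> list[dict]:
    """Pair each row of the data matrix with the column names from the schema."""
    columns = [field["name"] for field in foundry_schema["fieldSchemaList"]]
    # zip() matches values to columns by position, one dict per row.
    return [dict(zip(columns, row)) for row in data]


# Illustrative, made-up schema and rows in the documented shape.
example_schema = {"fieldSchemaList": [{"name": "id"}, {"name": "species"}]}
example_data = [[1, "setosa"], [2, "virginica"]]
example_records = raw_result_to_records(example_schema, example_data)
```

With a real client, the two arguments would come from a call with return_type="raw".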

download_dataset_file(dataset_rid, output_directory, foundry_file_path, view='master')

Downloads a single Foundry dataset file into a directory.

The local folder (and its parents) will be created if it does not exist.

If you want the bytes instead of a downloaded file, use DataProxyClient.api_get_file_in_view() directly.

Parameters:

dataset_rid (DatasetRid) – the dataset rid
output_directory (Path) – the local output directory for the file
foundry_file_path (PathInDataset) – the file_path on the foundry file system
view (View) – branch or transaction rid of the dataset

Returns:

local file path

Return type:

Path

download_dataset_files(dataset_rid, output_directory, files=None, view='master', max_workers=None)

Downloads files of a dataset (in parallel) to a local output directory.

Parameters:

dataset_rid (DatasetRid) – the dataset rid
files (set[PathInDataset] | None) – set of files to download, or None, in which case all files are downloaded
output_directory (Path) – the output directory for the files
view (View) – branch or transaction rid of the dataset
max_workers (int | None) – number of threads used for the download

Returns:

paths to the downloaded files

Return type:

list[str]

api_put_file(dataset_rid, transaction_rid, logical_path, file_data, overwrite=None, **kwargs)

Opens, writes, and closes a file under the specified dataset and transaction.

Parameters:

dataset_rid (DatasetRid) – dataset rid
transaction_rid (TransactionRid) – transaction rid
logical_path (PathInDataset) – file path in dataset
file_data (str | bytes | IO[AnyStr]) – content of the file
overwrite (bool | None) – if true, overwrite the file if it already exists in the transaction; defaults to false
**kwargs – gets passed to APIClient.api_request()

Return type:

requests.Response

api_get_file(dataset_rid, transaction_rid, logical_path, range_header=None, requests_stream=True, **kwargs)

Returns a file from the specified dataset and transaction.

Parameters:

dataset_rid (DatasetRid) – dataset rid
transaction_rid (TransactionRid) – transaction rid
logical_path (PathInDataset) – path in dataset
range_header (str | None) – HTTP range header
requests_stream (bool) – passed to requests.Session.request() as stream
**kwargs – gets passed to APIClient.api_request()

Return type:

requests.Response
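The range_header argument makes it possible to request only part of a file. The sketch below builds such a value in the standard HTTP byte-range syntax; byte_range_header is a hypothetical helper, and it is an assumption that api_get_file forwards the string unchanged as the value of the HTTP Range header.

```python
def byte_range_header(start: int, length: int) -> str:
    """Build an HTTP byte-range value such as 'bytes=0-1023'.

    HTTP byte ranges are inclusive on both ends, so the last byte
    position is start + length - 1.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


# First kibibyte of a file, e.g. to peek at a header row.
first_kib = byte_range_header(0, 1024)
```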

api_get_file_in_view(dataset_rid, end_ref, logical_path, start_transaction_rid=None, range_header=None, **kwargs)

Returns a file from the specified dataset and end ref.

Parameters:

dataset_rid (DatasetRid) – dataset rid
end_ref (View) – end ref/view
logical_path (PathInDataset) – path in dataset
start_transaction_rid (TransactionRid | None) – start transaction rid
range_header (str | None) – HTTP range header
**kwargs – gets passed to APIClient.api_request()

Return type:

requests.Response

api_get_files(dataset_rid, transaction_rid, logical_paths, requests_stream=True, **kwargs)

Returns specified files as a zip archive.

If logical_paths is an empty set, it will return all files of the transaction.

Parameters:

dataset_rid (DatasetRid) – dataset rid
transaction_rid (TransactionRid) – transaction rid
logical_paths (set[PathInDataset]) – a set with paths in the dataset
requests_stream (bool) – passed to requests.Session.request() as stream
**kwargs – gets passed to APIClient.api_request()

Return type:

requests.Response
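Because the files arrive as a zip archive, the response body can be unpacked in memory with the standard library. This is a minimal sketch: read_zip_members is a hypothetical helper, and it assumes the complete response body (for example response.content) is a regular zip archive that fits into memory.

```python
import io
import zipfile


def read_zip_members(content: bytes) -> dict[str, bytes]:
    """Return the files of a zip archive as a mapping of member name to bytes."""
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        }
```

For large transactions, writing the streamed response to disk first and opening that file with zipfile.ZipFile may be the safer choice.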

api_get_files_in_view(dataset_rid, end_ref, logical_paths, start_transaction_rid=None, stream=True, **kwargs)

Returns specified files by logical_paths and end_ref in a zip archive.

Parameters:

dataset_rid (DatasetRid) – dataset rid
end_ref (View) – end ref/view
logical_paths (set[PathInDataset]) – set of paths in the dataset
start_transaction_rid (TransactionRid | None) – start transaction rid
stream (bool) – passed to requests.Session.request() as stream
**kwargs – gets passed to APIClient.api_request()

Return type:

requests.Response

api_get_dataset_as_csv2(dataset_rid, branch_id, start_transaction_rid=None, end_transaction_rid=None, include_column_names=True, include_bom=True, **kwargs)

Gets dataset data with each record as a CSV line.

Parameters:

dataset_rid (DatasetRid) – the dataset rid
branch_id (Ref) – branch of the dataset
start_transaction_rid (TransactionRid | None) – start transaction rid
end_transaction_rid (TransactionRid | None) – end transaction rid
include_column_names (bool) – include the column names as the first CSV line
include_bom (bool) – include a byte order mark (BOM) at the start of the stream
**kwargs – gets passed to APIClient.api_request()

Returns:

Response with the CSV stream. Can be converted to a pandas DataFrame with pd.read_csv(io.BytesIO(response.content)).

Return type:

requests.Response
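The CSV body can also be parsed without pandas. The following is a sketch under stated assumptions: csv_bytes_to_records is a hypothetical helper, the body is assumed to be UTF-8 encoded, and the request is assumed to have used include_column_names=True so that the first line holds the column names.

```python
import csv
import io


def csv_bytes_to_records(content: bytes) -> list[dict[str, str]]:
    """Parse a CSV response body into one dict per record."""
    # "utf-8-sig" drops a leading BOM if present and is harmless otherwise,
    # so the helper works for include_bom=True and include_bom=False.
    text = content.decode("utf-8-sig")
    # newline="" leaves line-ending handling to the csv module.
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

All values come back as strings; any type conversion is left to the caller.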

api_query_with_fallbacks2(query, fallback_branch_ids, dialect='SPARK', **kwargs)

Queries for data from 1 or more tables and returns the results as JSON.

Parameters:

query (str) – the SQL query
fallback_branch_ids (list[str]) – fallback branch ids
dialect (SqlDialect) – the SQL dialect of the query, see foundry_dev_tools.utils.api_types.SqlDialect
**kwargs – gets passed to APIClient.api_request()

Return type:

requests.Response
