foundry_dev_tools.clients.data_proxy module
The DataProxy API client.
- class foundry_dev_tools.clients.data_proxy.DataProxyClient
Bases: APIClient
DataProxyClient class that implements methods from the ‘foundry-data-proxy’ API.
- upload_dataset_file(dataset_rid, transaction_rid, path, path_in_foundry_dataset)
Uploads a file into a foundry dataset.
- Parameters:
dataset_rid (DatasetRid) – Unique identifier of the dataset
transaction_rid (TransactionRid) – transaction id
path (Path) – File to upload
path_in_foundry_dataset (PathInDataset) – The destination path in the dataset
- Return type:
- upload_dataset_files(dataset_rid, transaction_rid, path_file_dict, max_workers=None)
Uploads multiple local files to a foundry dataset.
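- Parameters:
path_file_dict – a dictionary mapping each destination path in the dataset to the local file to upload:
{ '<path_in_foundry_dataset>': '<local_file_path>', ... }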
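The dictionary shape described above can be built from a local directory. The following is a minimal sketch, not part of the library: build_path_file_dict is a hypothetical helper, and it assumes that Path objects are accepted as the local file values and that dataset paths use forward slashes.

```python
from __future__ import annotations

from pathlib import Path


def build_path_file_dict(local_dir: Path) -> dict[str, Path]:
    """Map every file under local_dir to a dataset path relative to local_dir.

    Hypothetical helper: keys are POSIX-style relative paths (the destination
    in the dataset), values are the local files to upload.
    """
    return {
        file.relative_to(local_dir).as_posix(): file
        for file in sorted(local_dir.rglob("*"))
        if file.is_file()  # skip directories, only files are uploaded
    }


# Assumed usage; `client`, `dataset_rid` and `transaction_rid` must already exist:
# client.upload_dataset_files(dataset_rid, transaction_rid, build_path_file_dict(Path("exports")))
```

Keeping the keys relative to the uploaded directory preserves the local folder layout inside the dataset.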
- query_foundry_sql_legacy(query: str, return_type: Literal['pandas'], branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) pd.core.frame.DataFrame
- query_foundry_sql_legacy(query: str, return_type: Literal['spark'], branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) pyspark.sql.DataFrame
- query_foundry_sql_legacy(query: str, return_type: Literal['arrow'], branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) pa.Table
- query_foundry_sql_legacy(query: str, return_type: Literal['raw'], branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) tuple[dict, list[list]]
- query_foundry_sql_legacy(query: str, return_type: SQLReturnType = 'raw', branch: Ref = 'master', sql_dialect: SqlDialect = 'SPARK', timeout: int = 600) tuple[dict, list[list]] | pd.core.frame.DataFrame | pa.Table | pyspark.sql.DataFrame
Queries the dataproxy query API with spark SQL.
Example
>>> query_foundry_sql_legacy(query="SELECT * FROM `/Global/Foundry Operations/Foundry Support/iris`", branch="master")
- Parameters:
query – the sql query
branch – the branch of the dataset / query
return_type – See foundry_dev_tools.utils.api_types.SQLReturnType
sql_dialect – the SQL dialect used for the query
timeout – the query request timeout
- Returns:
- (foundry_schema, data) when return_type is 'raw'; otherwise a pandas DataFrame, Arrow Table, or Spark DataFrame, according to return_type.
data: contains the data matrix. foundry_schema: the foundry schema (fieldSchemaList key). The raw result can be converted to a pandas DataFrame:
foundry_schema, data = self.query_foundry_sql_legacy(query, return_type="raw", branch=branch)
df = pd.DataFrame(data=data, columns=[e["name"] for e in foundry_schema["fieldSchemaList"]])
- Return type:
- Raises:
ValueError – if return_type is not in SQLReturnType
DatasetHasNoSchemaError – if dataset has no schema
BranchNotFoundError – if branch was not found
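When pandas is not available, the raw (foundry_schema, data) pair can also be turned into plain Python records. This is a sketch under stated assumptions: rows_as_dicts is a hypothetical helper, and the sample schema below is invented for illustration and contains only the "name" key that the conversion relies on.

```python
from __future__ import annotations


def rows_as_dicts(foundry_schema: dict, data: list[list]) -> list[dict]:
    """Pair each row of the data matrix with the column names from the schema."""
    columns = [field["name"] for field in foundry_schema["fieldSchemaList"]]
    return [dict(zip(columns, row)) for row in data]


# Invented sample in the shape described for return_type='raw'.
sample_schema = {"fieldSchemaList": [{"name": "species"}, {"name": "sepal_length"}]}
sample_data = [["setosa", 5.1], ["virginica", 6.3]]
records = rows_as_dicts(sample_schema, sample_data)
```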
- download_dataset_file(dataset_rid, output_directory, foundry_file_path, view='master')
Downloads a single foundry dataset file into a directory.
The local folder (and its parents) will be created if it does not exist.
If you want the bytes instead of downloading to a file, use DataProxyClient.api_get_file_in_view() directly.
- Parameters:
dataset_rid (DatasetRid) – the dataset rid
output_directory (Path) – the local output directory for the file
foundry_file_path (PathInDataset) – the file_path on the foundry file system
view (View) – branch or transaction rid of the dataset
- Returns:
local file path
- Return type:
Path
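Reading a file into memory instead of writing it to disk might look like the sketch below. read_file_bytes is a hypothetical helper; it assumes that api_get_file_in_view() returns a requests-style response object exposing the body as .content, which this page does not state explicitly.

```python
def read_file_bytes(client, dataset_rid: str, foundry_file_path: str, view: str = "master") -> bytes:
    """Return the content of one dataset file as bytes.

    Assumption: the client call returns a response object with a `.content`
    attribute (as requests.Response has).
    """
    response = client.api_get_file_in_view(dataset_rid, view, foundry_file_path)
    return response.content
```

The argument order follows the signature api_get_file_in_view(dataset_rid, end_ref, logical_path), with the view passed as end_ref.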
- download_dataset_files(dataset_rid, output_directory, files=None, view='master', max_workers=None)
Downloads files of a dataset (in parallel) to a local output directory.
- Parameters:
dataset_rid (DatasetRid) – the dataset rid
files (set[PathInDataset] | None) – set of files, or None, in which case all files are downloaded
output_directory (Path) – the output directory for the files
view (View) – branch or transaction rid of the dataset
max_workers (int | None) – Set number of threads for download
- Returns:
path to downloaded files
- Return type:
- api_put_file(dataset_rid, transaction_rid, logical_path, file_data, overwrite=None, **kwargs)
Opens, writes, and closes a file under the specified dataset and transaction.
- Parameters:
dataset_rid (DatasetRid) – dataset rid
transaction_rid (TransactionRid) – transaction rid
logical_path (PathInDataset) – file path in dataset
overwrite (bool | None) – defaults to false; if true, overwrites the file if it already exists in the transaction
**kwargs – gets passed to APIClient.api_request()
- Return type:
- api_get_file(dataset_rid, transaction_rid, logical_path, range_header=None, requests_stream=True, **kwargs)
Returns a file from the specified dataset and transaction.
- Parameters:
dataset_rid (DatasetRid) – dataset rid
transaction_rid (TransactionRid) – transaction rid
logical_path (PathInDataset) – path in dataset
range_header (str | None) – HTTP range header
requests_stream (bool) – passed to requests.Session.request() as stream
**kwargs – gets passed to APIClient.api_request()
- Return type:
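The range_header parameter allows fetching only part of a file. The sketch below builds a standard HTTP byte-range value; byte_range is a hypothetical helper, and it is an assumption that range_header expects the full header value in the form "bytes=first-last".

```python
def byte_range(start: int, length: int) -> str:
    """Build an HTTP Range header value covering `length` bytes from offset `start`.

    HTTP byte ranges are inclusive on both ends, hence the `- 1`.
    """
    if start < 0 or length < 1:
        raise ValueError("start must be >= 0 and length must be >= 1")
    return f"bytes={start}-{start + length - 1}"


# Assumed usage; `client` and the rids must already exist:
# client.api_get_file(dataset_rid, transaction_rid, "data.csv", range_header=byte_range(0, 1024))
```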
- api_get_file_in_view(dataset_rid, end_ref, logical_path, start_transaction_rid=None, range_header=None, **kwargs)
Returns a file from the specified dataset and end ref.
- Parameters:
dataset_rid (DatasetRid) – dataset rid
end_ref (View) – end ref/view
logical_path (PathInDataset) – path in dataset
start_transaction_rid (TransactionRid | None) – start transaction rid
range_header (str | None) – HTTP range header
**kwargs – gets passed to APIClient.api_request()
- Return type:
- api_get_files(dataset_rid, transaction_rid, logical_paths, requests_stream=True, **kwargs)
Returns specified files as a zip archive.
If logical_paths is an empty set, it will return all files of the transaction.
- Parameters:
dataset_rid (DatasetRid) – dataset rid
transaction_rid (TransactionRid) – transaction rid
logical_paths (set[PathInDataset]) – a set with paths in the dataset
requests_stream (bool) – passed to requests.Session.request() as stream
**kwargs – gets passed to APIClient.api_request()
- Return type:
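Since the files come back as a zip archive, they have to be unpacked before use. A minimal sketch using only the standard library follows; unzip_to_memory is a hypothetical helper, and it assumes the archive bytes have already been read from the response (for example via response.content on a requests-style response).

```python
from __future__ import annotations

import io
import zipfile


def unzip_to_memory(archive_bytes: bytes) -> dict[str, bytes]:
    """Unpack a zip archive held in memory into {path_in_archive: file_bytes}."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()  # directory entries carry no content
        }
```

For large transactions, extracting to disk with zipfile.ZipFile.extractall() may be preferable to holding every file in memory.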
- api_get_files_in_view(dataset_rid, end_ref, logical_paths, start_transaction_rid=None, stream=True, **kwargs)
Returns specified files by logical_paths and end_ref in a zip archive.
- Parameters:
dataset_rid (DatasetRid) – dataset rid
end_ref (View) – end ref/view
logical_paths (set[PathInDataset]) – set of paths in the dataset
start_transaction_rid (TransactionRid | None) – start transaction rid
stream (bool) – passed to requests.Session.request() as stream
**kwargs – gets passed to APIClient.api_request()
- Return type:
- api_get_dataset_as_csv2(dataset_rid, branch_id, start_transaction_rid=None, end_transaction_rid=None, include_column_names=True, include_bom=True, **kwargs)
Gets dataset data with each record as a CSV line.
- Parameters:
dataset_rid (DatasetRid) – the dataset rid
branch_id (Ref) – branch of the dataset
start_transaction_rid (TransactionRid | None) – start transaction rid
end_transaction_rid (TransactionRid | None) – end transaction rid
include_column_names (bool) – include column names
include_bom (bool) – include bom
**kwargs – gets passed to APIClient.api_request()
- Returns:
Response with the csv stream. Can be converted to a pandas DataFrame:
>>> pd.read_csv(io.BytesIO(response.content))
- Return type:
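The CSV payload can also be parsed without pandas. The sketch below assumes that the byte order mark added by include_bom=True is a UTF-8 BOM and that include_column_names=True puts a header row first; parse_csv_bytes is a hypothetical helper, not part of the client.

```python
from __future__ import annotations

import csv
import io


def parse_csv_bytes(content: bytes) -> list[dict[str, str]]:
    """Parse CSV bytes with a header row into a list of row dictionaries."""
    # "utf-8-sig" drops a leading UTF-8 BOM if present and is a no-op otherwise.
    text = content.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Assumed usage; `response` is the object returned by api_get_dataset_as_csv2:
# rows = parse_csv_bytes(response.content)
```

All values come back as strings; type conversion is left to the caller.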
- api_query_with_fallbacks2(query, fallback_branch_ids, dialect='SPARK', **kwargs)
Queries for data from 1 or more tables and returns the results as JSON.
- Parameters:
query (str) – the SQL query
dialect (SqlDialect) – the SqlDialect of the query, see foundry_dev_tools.utils.api_types.SqlDialect
**kwargs – gets passed to APIClient.api_request()
- Return type: