S3#

Palantir Foundry released a S3-compatible API for Foundry datasets, which lets you use the AWS Cli, boto3 and other s3 compatible libraries. As the authentication via a JWT does not work directly, but needs another API call in-between, we created these methods and the Cli to get you easily started with the API.

CLI#

Init#

For the AWS Cli, Foundry DevTools can create a custom profile, which dynamically hands over the S3 credentials to the AWS Cli.

To create this profile in your config run:

fdt s3 init

This will do the following things:

  1. Read your current AWS config file, if it does not exist, it will create one.

  2. Add the ‘foundry’ profile programmatically, which uses the credential_config option to call the fdt s3 auth cli.

  3. Show a diff to the previous config and create a backup of the old config, if a change was made.

After running the command, please check the diff and your AWS config file, you can always recover the config via the backup file, the path to the backup will be printed, and is in the same directory as your AWS config.

Auth#

fdt s3 auth

The auth command gets used by the configuration profile as mentioned above, it adheres to the output schema defined at https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sourcing-external.html Usually, you do not have to call this manually as the AWS Cli or other tools will perform this call for you internally.

Methods you can use in python#

Examples#

S3Client#

from foundry_dev_tools import FoundryContext

ctx = FoundryContext()
s3_client = ctx.s3.get_boto3_client()
resp = s3_client.head_object(
    Bucket="ri.foundry.main.dataset.2ce7cb50-41f3-4e22-a6b7-ae4deaf3985e", # replace with a dataset RID of yours
    Key="some-file-in-the-dataset.txt"
)
print(resp['LastModified']) # returns 2023-04-13 09:57:09+00:00

S3Resource#

from foundry_dev_tools import FoundryContext

ctx = FoundryContext()
s3_resource = ctx.s3.get_boto3_resource()
obj = s3_resource.Object(
    bucket_name="ri.foundry.main.dataset.2ce7cb50-41f3-4e22-a6b7-ae4deaf3985e",  # replace with a dataset RID of yours
    key="some-file-in-the-dataset.txt"
)
print(obj.last_modified) # returns 2023-04-13 09:57:09+00:00

Pandas#

import pandas as pd
from foundry_dev_tools import FoundryContext

ctx = FoundryContext()

storage_options = ctx.s3.get_s3fs_storage_options()
df = pd.read_parquet(
    "s3://ri.foundry.main.dataset.2ce7cb50-41f3-4e22-a6b7-ae4deaf3985e/spark",
    storage_options=storage_options,
)  # replace with a dataset RID of yours, should be a dataset with parquet files

print(df)

Polars#

import polars as pl
from foundry_dev_tools import FoundryContext

ctx = FoundryContext()
storage_options = ctx.s3.get_polars_storage_options()
df = pl.read_parquet(
    "s3://ri.foundry.main.dataset.2ce7cb50-41f3-4e22-a6b7-ae4deaf3985e/**/*.parquet",
    storage_options=storage_options,
)  # replace with a dataset RID of yours, should be a dataset with parquet files

print(df.head())

DuckDb#

import duckdb
from foundry_dev_tools import FoundryContext

ctx = FoundryContext()

con = duckdb.connect()
con.execute(ctx.s3.get_duckdb_create_secret_string())

df = con.execute("SELECT * FROM read_parquet('s3://ri.foundry.main.dataset.d5308d45-9822-4e02-afb9-2704636308ee/**/*.parquet') LIMIT 1;").df()

print(df.head())

fsspec#

import fsspec
from foundry_dev_tools import FoundryContext

ctx = FoundryContext()

dataset_rid = "ri.foundry.main.dataset.2ce6cb10-59f4-4e19-a3b8-ae3deaf5985e"
with fsspec.open(f"s3://{dataset_rid}/test.csv", "r", **ctx.s3.get_s3fs_storage_options()) as f:
    print(f.read())

# -------------------

fs = fsspec.filesystem("s3", **ctx.s3.get_s3fs_storage_options())
with fs.open(f"{dataset_rid}/fsspec_test_write.txt", "w") as f:
    f.write("hihi")
with fs.open(f"{dataset_rid}/fsspec_test_write.txt", "r") as f:
    print(f.read())