foundry_dev_tools.utils.converter.foundry_spark module#

Helper functions for converting data structures between Foundry and Spark.

foundry_dev_tools.utils.converter.foundry_spark.foundry_schema_to_spark_schema(foundry_schema)[source]#

Converts the Foundry JSON schema format to a Spark StructType schema.

See the table below for supported field types:

Type      | FieldType          | Python type           | Aliases
----------|--------------------|-----------------------|---------------------
Array     | ArrayFieldType     | list, tuple, or array |
Boolean   | BooleanFieldType   | bool                  | bool, boolean
Binary    | BinaryFieldType    | bytearray             | binary, bytes
Byte      | ByteFieldType      | int or long           | byte, int8
Date      | DateFieldType      | datetime.date         | date
Decimal   | DecimalFieldType   | decimal.Decimal       | decimal
Double    | DoubleFieldType    | float                 | double, float64
Float     | FloatFieldType     | float                 | float, float32
Integer   | IntegerFieldType   | int or long           | integer, int, int32
Long      | LongFieldType      | long                  | long, int64
Map       | MapFieldType       | dict                  |
Short     | ShortFieldType     | int or long           | short, int16
String    | StringFieldType    | string                | string, str
Struct    | StructFieldType    | list or tuple         |
Timestamp | TimestampFieldType | datetime.timestamp    | timestamp, datetime

Parameters:

foundry_schema (dict) – output from foundry’s schema API

Returns:

Spark schema from foundry schema

Return type:

StructType
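A minimal sketch of the conversion, assuming a Foundry schema payload with a fieldSchemaList of entries carrying name and type keys (an assumed layout). The real function returns a pyspark StructType; this simplified stand-in maps scalar field types to Spark type-name strings instead, to keep the example dependency-free:

```python
# Simplified sketch of foundry_schema_to_spark_schema. The real function
# builds a pyspark StructType; here we only map Foundry scalar field types
# to Spark SQL type names. Payload layout (fieldSchemaList/name/type) is
# an assumption for illustration.
FOUNDRY_TO_SPARK = {
    "BOOLEAN": "boolean",
    "BYTE": "byte",
    "DATE": "date",
    "DOUBLE": "double",
    "FLOAT": "float",
    "INTEGER": "integer",
    "LONG": "long",
    "SHORT": "short",
    "STRING": "string",
    "TIMESTAMP": "timestamp",
}

def foundry_schema_to_spark_types(foundry_schema: dict) -> list:
    """Return (column, spark_type) pairs for the scalar field types above."""
    return [
        (field["name"], FOUNDRY_TO_SPARK[field["type"]])
        for field in foundry_schema["fieldSchemaList"]
    ]

schema = {
    "fieldSchemaList": [
        {"name": "id", "type": "LONG"},
        {"name": "label", "type": "STRING"},
    ]
}
print(foundry_schema_to_spark_types(schema))
# [('id', 'long'), ('label', 'string')]
```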

foundry_dev_tools.utils.converter.foundry_spark.spark_schema_to_foundry_schema(spark_schema, file_format='parquet')[source]#

Converts spark_schema to foundry schema API compatible payload.

Parameters:
  • spark_schema (StructType) – Spark schema to convert

  • file_format (str) – file format of the dataset, defaults to 'parquet'

Returns:

foundry schema from spark schema

Return type:

dict
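A hedged sketch of the reverse direction, using the same assumed payload shape as above; the real function additionally derives reader metadata from file_format, which is only stubbed here:

```python
# Simplified sketch of spark_schema_to_foundry_schema. Field names and types
# come in as (name, spark_type) pairs; the payload keys are assumptions
# modelled on the Foundry schema API, not verified against it.
SPARK_TO_FOUNDRY = {
    "boolean": "BOOLEAN",
    "double": "DOUBLE",
    "long": "LONG",
    "string": "STRING",
    "timestamp": "TIMESTAMP",
}

def spark_types_to_foundry_schema(fields, file_format="parquet"):
    return {
        "fieldSchemaList": [
            {"name": name, "type": SPARK_TO_FOUNDRY[spark_type]}
            for name, spark_type in fields
        ],
        # Placeholder: the real payload names a concrete reader class here.
        "dataFrameReaderClass": f"{file_format}-reader",
    }

payload = spark_types_to_foundry_schema([("id", "long"), ("label", "string")])
print(payload["fieldSchemaList"][0])
# {'name': 'id', 'type': 'LONG'}
```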

foundry_dev_tools.utils.converter.foundry_spark.infer_dataset_format_from_foundry_schema(foundry_schema, list_of_files)[source]#

Infers the dataset format from the Foundry schema dict by looking at the key dataFrameReaderClass.

Parameters:
  • foundry_schema (api_types.FoundrySchema | None) – Schema from foundry schema API

  • list_of_files (list) – files of the dataset; as a fallback, the first file is checked for its file extension

Returns:

parquet, csv or unknown

Return type:

str | None
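The inference order described above can be sketched as follows. The concrete reader-class strings matched against are assumptions; the documented behaviour is only that dataFrameReaderClass is consulted first, then the extension of the first file:

```python
def infer_format(foundry_schema, list_of_files):
    """Sketch of infer_dataset_format_from_foundry_schema.
    Reader-class substrings below are illustrative assumptions."""
    # Prefer the schema's reader class when a schema is present.
    if foundry_schema is not None:
        reader = foundry_schema.get("dataFrameReaderClass", "")
        if "Parquet" in reader:
            return "parquet"
        if "Csv" in reader or "TextDataFrameReader" in reader:
            return "csv"
    # Fallback: look at the file extension of the first file.
    if list_of_files:
        first = list_of_files[0]
        if first.endswith(".parquet"):
            return "parquet"
        if first.endswith(".csv"):
            return "csv"
    return "unknown"

print(infer_format(None, ["spark/part-0001.parquet"]))  # parquet
print(infer_format({"dataFrameReaderClass": "TextDataFrameReader"}, []))  # csv
```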

foundry_dev_tools.utils.converter.foundry_spark.foundry_sql_data_to_spark_dataframe(data, spark_schema)[source]#

Converts the result of a foundry sql API query to a spark dataframe.

Parameters:
  • data – result of the Foundry SQL API query

  • spark_schema (StructType) – Spark schema of the result

Returns:

spark dataframe from foundry sql data

Return type:

DataFrame
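As a rough illustration of the conversion (the real function returns a pyspark DataFrame, presumably via something like spark.createDataFrame; the row-list payload shape is an assumption), the SQL API's row-oriented data can be zipped with the schema's field names:

```python
def sql_rows_to_records(data, field_names):
    """Pair each row (a list of values) with the schema's column names.
    Stand-in for foundry_sql_data_to_spark_dataframe, which builds a
    Spark DataFrame instead of plain dicts."""
    return [dict(zip(field_names, row)) for row in data]

rows = [[1, "a"], [2, "b"]]
print(sql_rows_to_records(rows, ["id", "label"]))
# [{'id': 1, 'label': 'a'}, {'id': 2, 'label': 'b'}]
```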

foundry_dev_tools.utils.converter.foundry_spark.foundry_schema_to_read_options(foundry_schema)[source]#

Converts Foundry Schema Metadata to Spark Read Options.

Parameters:

foundry_schema (dict) – output from foundry’s schema API

Returns:

Key/value pairs that can be passed to the options() call of a PySpark reader

Return type:

dict
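A sketch of the kind of mapping involved, assuming a CSV schema carries parser settings under customMetadata. The metadata key names used here (textParserParams, fieldDelimiter, skipRows) are assumptions chosen for illustration, not taken verbatim from the Foundry API:

```python
def foundry_csv_read_options(foundry_schema: dict) -> dict:
    """Sketch of foundry_schema_to_read_options for a CSV schema.
    Metadata keys are hypothetical; the point is the returned dict,
    which can be splatted into reader.options(**opts)."""
    params = foundry_schema.get("customMetadata", {}).get("textParserParams", {})
    options = {}
    if "fieldDelimiter" in params:
        options["sep"] = params["fieldDelimiter"]
    if params.get("parser") == "CSV_PARSER":
        # Treat a skipped first row as a header row.
        options["header"] = str(params.get("skipRows", 0) > 0).lower()
    return options

schema = {
    "customMetadata": {
        "textParserParams": {
            "parser": "CSV_PARSER",
            "fieldDelimiter": ",",
            "skipRows": 1,
        }
    }
}
print(foundry_csv_read_options(schema))
# {'sep': ',', 'header': 'true'}
```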

foundry_dev_tools.utils.converter.foundry_spark.foundry_schema_to_dataset_format(foundry_schema)[source]#

Infers from Foundry Schema Metadata one of ‘parquet’, ‘avro’, ‘csv’, ‘json’.

Parameters:

foundry_schema (dict) – output from foundry’s schema API

Returns:

value indicating the Spark reader required

Return type:

str

Raises:

ValueError – If the dataset format can’t be inferred from the schema
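A hedged sketch of the lookup and the ValueError behaviour; the reader-class names in the mapping are placeholders, as the source only documents that dataFrameReaderClass determines one of the four formats:

```python
# Placeholder reader-class substrings; the real function inspects
# dataFrameReaderClass in the Foundry schema metadata.
_READER_TO_FORMAT = {
    "ParquetDataFrameReader": "parquet",
    "AvroDataFrameReader": "avro",
    "TextDataFrameReader": "csv",
    "JsonDataFrameReader": "json",
}

def dataset_format(foundry_schema: dict) -> str:
    """Sketch of foundry_schema_to_dataset_format."""
    reader = foundry_schema.get("dataFrameReaderClass", "")
    for needle, fmt in _READER_TO_FORMAT.items():
        if needle in reader:
            return fmt
    raise ValueError(f"Cannot infer dataset format from reader class {reader!r}")

# Fully qualified class name below is a made-up example value.
print(dataset_format({"dataFrameReaderClass": "com.example.ParquetDataFrameReader"}))
# parquet
```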

foundry_dev_tools.utils.converter.foundry_spark.arrow_stream_to_spark_dataframe(stream_reader)[source]#

Dumps an Arrow stream to a Parquet file in a temporary directory and reads the Parquet file back with Spark.

Parameters:

stream_reader (pa.ipc.RecordBatchStreamReader) – Arrow Stream

Returns:

converted to a Spark DataFrame

Return type:

DataFrame