foundry_dev_tools.utils.converter.foundry_spark module
Helper functions for converting between Foundry and Spark data structures.
- foundry_dev_tools.utils.converter.foundry_spark.foundry_schema_to_spark_schema(foundry_schema)
Converts a Foundry JSON schema to a Spark StructType schema.
See the table below for supported field types:
| Type | FieldType | Python type | Aliases |
|---|---|---|---|
| Array | ArrayFieldType | list, tuple, or array | |
| Boolean | BooleanFieldType | bool | bool, boolean |
| Binary | BinaryFieldType | bytearray | binary, bytes |
| Byte | ByteFieldType | int or long | byte, int8 |
| Date | DateFieldType | datetime.date | date |
| Decimal | DecimalFieldType | decimal.Decimal | decimal |
| Double | DoubleFieldType | float | double, float64 |
| Float | FloatFieldType | float | float, float32 |
| Integer | IntegerFieldType | int or long | integer, int, int32 |
| Long | LongFieldType | long | long, int64 |
| Map | MapFieldType | dict | |
| Short | ShortFieldType | int or long | short, int16 |
| String | StringFieldType | str | string, str |
| Struct | StructFieldType | list or tuple | |
| Timestamp | TimestampFieldType | datetime.datetime | timestamp, datetime |
- Parameters:
foundry_schema (dict) – the output of Foundry’s schema API
- Returns:
the Spark schema derived from the Foundry schema
- Return type:
pyspark.sql.types.StructType
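The type mapping in the table above can be illustrated with a minimal, standard-library-only sketch. It is not the foundry_dev_tools implementation: the function name `sketch_foundry_to_spark_json`, the `fieldSchemaList` key, the upper-case Foundry type names, and the `precision`/`scale` keys are assumptions made for illustration. The output follows the JSON layout that Spark uses to serialize a StructType, and only primitive types are handled.

```python
# Hypothetical sketch, NOT the library's implementation.
# Assumed upper-case Foundry type names mapped to Spark JSON type names.
FOUNDRY_TO_SPARK = {
    "BOOLEAN": "boolean",
    "BINARY": "binary",
    "BYTE": "byte",
    "DATE": "date",
    "DOUBLE": "double",
    "FLOAT": "float",
    "INTEGER": "integer",
    "LONG": "long",
    "SHORT": "short",
    "STRING": "string",
    "TIMESTAMP": "timestamp",
}


def sketch_foundry_to_spark_json(foundry_schema: dict) -> dict:
    """Convert an assumed Foundry schema dict into a Spark JSON schema dict."""
    fields = []
    # "fieldSchemaList" is an assumed key name for the list of columns.
    for field in foundry_schema["fieldSchemaList"]:
        foundry_type = field["type"]
        if foundry_type == "DECIMAL":
            # Spark serializes decimals as "decimal(precision,scale)".
            spark_type = f"decimal({field['precision']},{field['scale']})"
        elif foundry_type in FOUNDRY_TO_SPARK:
            spark_type = FOUNDRY_TO_SPARK[foundry_type]
        else:
            # Nested types (array, map, struct) are out of scope for this sketch.
            raise ValueError(f"Unsupported field type: {foundry_type}")
        fields.append(
            {
                "name": field["name"],
                "type": spark_type,
                "nullable": field.get("nullable", True),
                "metadata": {},
            }
        )
    return {"type": "struct", "fields": fields}


example_schema = {
    "fieldSchemaList": [
        {"type": "STRING", "name": "id", "nullable": False},
        {"type": "DECIMAL", "name": "amount", "precision": 10, "scale": 2},
    ]
}
spark_json = sketch_foundry_to_spark_json(example_schema)
```

With PySpark installed, a dict of this shape could be passed to `pyspark.sql.types.StructType.fromJson` to obtain a StructType.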
- foundry_dev_tools.utils.converter.foundry_spark.spark_schema_to_foundry_schema(spark_schema, file_format='parquet')
Converts a Spark schema to a payload compatible with the Foundry schema API.
- Parameters:
spark_schema (pyspark.sql.types.StructType | dict) – the Spark schema to convert
file_format (str) – the dataset file format; currently only 'parquet' is supported
- Returns:
the Foundry schema derived from the Spark schema
- Return type:
dict
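The reverse direction can be sketched in the same spirit. Everything below is an assumption for illustration rather than the library's code: the function name `sketch_spark_json_to_foundry`, the payload keys (`fieldSchemaList`, `dataFrameReaderClass`, `customMetadata`), and the reader class string. The input is the JSON (dict) form of a Spark schema, and only primitive column types are handled.

```python
# Hypothetical sketch, NOT the library's implementation.
# Spark JSON type names mapped to assumed upper-case Foundry type names.
SPARK_TO_FOUNDRY = {
    "boolean": "BOOLEAN",
    "binary": "BINARY",
    "byte": "BYTE",
    "date": "DATE",
    "double": "DOUBLE",
    "float": "FLOAT",
    "integer": "INTEGER",
    "long": "LONG",
    "short": "SHORT",
    "string": "STRING",
    "timestamp": "TIMESTAMP",
}


def sketch_spark_json_to_foundry(spark_schema: dict, file_format: str = "parquet") -> dict:
    """Build an assumed Foundry schema payload from a Spark JSON schema dict."""
    if file_format != "parquet":
        # Mirrors the documented restriction: only parquet is supported.
        raise NotImplementedError(f"Unsupported file format: {file_format}")
    field_list = []
    for field in spark_schema["fields"]:
        spark_type = field["type"]
        # Nested Spark types are serialized as dicts; this sketch skips them.
        if not isinstance(spark_type, str) or spark_type not in SPARK_TO_FOUNDRY:
            raise ValueError(f"Unsupported Spark type: {spark_type!r}")
        field_list.append(
            {
                "type": SPARK_TO_FOUNDRY[spark_type],
                "name": field["name"],
                "nullable": field.get("nullable", True),
                "customMetadata": {},
            }
        )
    return {
        "fieldSchemaList": field_list,
        # Assumed reader class name for parquet-backed datasets.
        "dataFrameReaderClass": "com.palantir.foundry.spark.input.ParquetDataFrameReader",
        "customMetadata": {"format": "parquet"},
    }


spark_json_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "label", "type": "string", "nullable": True, "metadata": {}},
    ],
}
foundry_payload = sketch_spark_json_to_foundry(spark_json_schema)
```

With PySpark, the dict form of an existing StructType is available through its `jsonValue()` method.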
- foundry_dev_tools.utils.converter.foundry_spark.infer_dataset_format_from_foundry_schema(foundry_schema, list_of_files)
Infers the dataset format from the Foundry schema dict by inspecting its dataFrameReaderClass key.
- foundry_dev_tools.utils.converter.foundry_spark.foundry_sql_data_to_spark_dataframe(data, spark_schema)
Converts the result of a Foundry SQL API query to a Spark DataFrame.
- Parameters:
spark_schema (pyspark.sql.types.StructType) – the Spark schema to apply
- Returns:
a Spark DataFrame built from the Foundry SQL data
- Return type:
pyspark.sql.DataFrame
- foundry_dev_tools.utils.converter.foundry_spark.foundry_schema_to_read_options(foundry_schema)
Converts Foundry Schema Metadata to Spark Read Options.
- foundry_dev_tools.utils.converter.foundry_spark.foundry_schema_to_dataset_format(foundry_schema)
Infers the dataset format from the Foundry schema metadata: one of ‘parquet’, ‘avro’, ‘csv’, or ‘json’.
- Parameters:
foundry_schema (dict) – the output of Foundry’s schema API
- Returns:
the format name, indicating which Spark reader is required
- Return type:
str
- Raises:
ValueError – If the dataset format can’t be inferred from the schema
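A minimal sketch of how a format could be inferred from the dataFrameReaderClass key is given below. The reader class name fragments, the `customMetadata` lookup for JSON, and the function name `sketch_dataset_format` are assumptions for illustration and may not match what the library actually checks.

```python
# Hypothetical sketch, NOT the library's implementation.
# Assumed fragments of reader class names and the format each one implies.
ASSUMED_READER_MARKERS = [
    ("ParquetDataFrameReader", "parquet"),
    ("AvroDataFrameReader", "avro"),
    ("TextDataFrameReader", "csv"),
]


def sketch_dataset_format(foundry_schema: dict) -> str:
    """Return 'parquet', 'avro', 'csv' or 'json', or raise ValueError."""
    reader_class = foundry_schema.get("dataFrameReaderClass", "")
    for marker, dataset_format in ASSUMED_READER_MARKERS:
        if marker in reader_class:
            return dataset_format
    # Assumption: JSON datasets use a generic reader and declare the
    # format in the schema's custom metadata.
    custom_metadata = foundry_schema.get("customMetadata", {})
    if "DataSourceDataFrameReader" in reader_class and custom_metadata.get("format") == "json":
        return "json"
    raise ValueError(f"Cannot infer dataset format from reader class {reader_class!r}")


parquet_format = sketch_dataset_format(
    {"dataFrameReaderClass": "com.palantir.foundry.spark.input.ParquetDataFrameReader"}
)
```

Raising ValueError for an unrecognized reader class, rather than returning a default, matches the documented behavior of failing when the format cannot be inferred.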
- foundry_dev_tools.utils.converter.foundry_spark.arrow_stream_to_spark_dataframe(stream_reader)
Dumps an Arrow stream to a Parquet file in a temporary directory, then reads the Parquet file with Spark.
- Parameters:
stream_reader (pa.ipc.RecordBatchStreamReader) – Arrow Stream
- Returns:
the Arrow stream converted to a Spark DataFrame
- Return type:
pyspark.sql.DataFrame