foundry_dev_tools.utils.converter.foundry_spark module
Helper functions for converting data structures.
- foundry_dev_tools.utils.converter.foundry_spark.foundry_schema_to_spark_schema(foundry_schema)
Converts a Foundry JSON schema to a Spark StructType schema.
See the table below for supported field types:
Type       FieldType           Python type            Aliases
Array      ArrayFieldType      list, tuple, or array
Boolean    BooleanFieldType    bool                   bool, boolean
Binary     BinaryFieldType     bytearray              binary, bytes
Byte       ByteFieldType       int or long            byte, int8
Date       DateFieldType       datetime.date          date
Decimal    DecimalFieldType    decimal.Decimal        decimal
Double     DoubleFieldType     float                  double, float64
Float      FloatFieldType      float                  float, float32
Integer    IntegerFieldType    int or long            integer, int, int32
Long       LongFieldType       long                   long, int64
Map        MapFieldType        dict
Short      ShortFieldType      int or long            short, int16
String     StringFieldType     string                 string, str
Struct     StructFieldType     list or tuple
Timestamp  TimestampFieldType  datetime.timestamp     timestamp, datetime
- Parameters:
foundry_schema (dict) – output from foundry’s schema API
- Returns:
Spark schema from foundry schema
- Return type:
pyspark.sql.types.StructType
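The alias column of the table above can be read as a lookup from alias to type name. The following is a minimal sketch of that lookup in plain Python. The names `FIELD_TYPE_ALIASES`, `ALIAS_TO_TYPE`, and `resolve_alias` are hypothetical and not part of foundry_dev_tools, and the case-insensitive matching is an assumption made for illustration.

```python
# Alias table transcribed from the documentation above.
# The helper below is a hypothetical illustration, not the library's code.
FIELD_TYPE_ALIASES = {
    "Array": [],
    "Boolean": ["bool", "boolean"],
    "Binary": ["binary", "bytes"],
    "Byte": ["byte", "int8"],
    "Date": ["date"],
    "Decimal": ["decimal"],
    "Double": ["double", "float64"],
    "Float": ["float", "float32"],
    "Integer": ["integer", "int", "int32"],
    "Long": ["long", "int64"],
    "Map": [],
    "Short": ["short", "int16"],
    "String": ["string", "str"],
    "Struct": [],
    "Timestamp": ["timestamp", "datetime"],
}

# Invert the table so that each alias points at its type name.
ALIAS_TO_TYPE = {
    alias: type_name
    for type_name, aliases in FIELD_TYPE_ALIASES.items()
    for alias in aliases
}


def resolve_alias(alias: str) -> str:
    """Return the type name for an alias (assumed case-insensitive here)."""
    key = alias.strip().lower()
    if key not in ALIAS_TO_TYPE:
        raise ValueError(f"unknown field type alias: {alias!r}")
    return ALIAS_TO_TYPE[key]


print(resolve_alias("int64"))
```

Types without aliases in the table (Array, Map, Struct) are deliberately absent from the inverted mapping, so looking them up by alias raises ValueError in this sketch.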
- foundry_dev_tools.utils.converter.foundry_spark.spark_schema_to_foundry_schema(spark_schema, file_format='parquet')
Converts a Spark schema to a payload compatible with the Foundry schema API.
- Parameters:
spark_schema (pyspark.sql.types.StructType | dict) – the Spark schema to convert
file_format (str) – currently only parquet is supported
- Returns:
Foundry schema from Spark schema
- Return type:
dict
- foundry_dev_tools.utils.converter.foundry_spark.infer_dataset_format_from_foundry_schema(foundry_schema, list_of_files)
Infers the dataset format from the Foundry schema dict by inspecting its dataFrameReaderClass key.
- foundry_dev_tools.utils.converter.foundry_spark.foundry_sql_data_to_spark_dataframe(data, spark_schema)
Converts the result of a Foundry SQL API query to a Spark DataFrame.
- Parameters:
spark_schema (pyspark.sql.types.StructType) – the spark schema to apply
- Returns:
Spark DataFrame from Foundry SQL data
- Return type:
pyspark.sql.DataFrame
- foundry_dev_tools.utils.converter.foundry_spark.foundry_schema_to_read_options(foundry_schema)
Converts Foundry Schema Metadata to Spark Read Options.
- foundry_dev_tools.utils.converter.foundry_spark.foundry_schema_to_dataset_format(foundry_schema)
Infers the dataset format from Foundry schema metadata: one of ‘parquet’, ‘avro’, ‘csv’, or ‘json’.
- Parameters:
foundry_schema (dict) – output from foundry’s schema API
- Returns:
the format name indicating which Spark reader is required
- Return type:
str
- Raises:
ValueError – If the dataset format can’t be inferred from the schema
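As a rough illustration of format inference, the sketch below reads the dataFrameReaderClass key and looks for one of the four format names in the class name, raising ValueError otherwise. This is an assumption-laden sketch, not the library's actual logic. The function name `sketch_dataset_format`, the substring matching, and the reader class names in the usage lines are all hypothetical, and the real mapping from reader class to format may differ.

```python
SUPPORTED_FORMATS = ("parquet", "avro", "csv", "json")


def sketch_dataset_format(foundry_schema: dict) -> str:
    """Guess the dataset format from the reader class name (hypothetical helper)."""
    # Assumption: the schema dict carries the reader class under this key.
    reader_class = str(foundry_schema.get("dataFrameReaderClass") or "")
    # Only the simple class name (after the last dot) is inspected.
    simple_name = reader_class.rsplit(".", 1)[-1].lower()
    for fmt in SUPPORTED_FORMATS:
        if fmt in simple_name:
            return fmt
    raise ValueError(
        f"cannot infer dataset format from reader class {reader_class!r}"
    )


# Hypothetical reader class names, used for illustration only.
print(sketch_dataset_format(
    {"dataFrameReaderClass": "example.readers.ParquetDataFrameReader"}
))
```

Failing loudly with ValueError, as the documented function does, keeps an unrecognised schema from silently selecting the wrong Spark reader.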
- foundry_dev_tools.utils.converter.foundry_spark.arrow_stream_to_spark_dataframe(stream_reader)
Dumps an Arrow stream to a parquet file in a temporary directory, then reads the parquet file with Spark.
- Parameters:
stream_reader (pa.ipc.RecordBatchStreamReader) – Arrow Stream
- Returns:
the Arrow stream converted to a Spark DataFrame
- Return type:
pyspark.sql.DataFrame