I have a dataframe with the schema:
root
 |-- col2: integer (nullable = true)
 |-- col1: integer (nullable = true)
 |-- structCol3: struct (nullable = true)
 |    |-- structField2: boolean (nullable = true)
 |    |-- structField1: string (nullable = true)
 |-- structCol4: struct (nullable = true)
 |    |-- nestedArray: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- elem3: double (nullable = true)
 |    |    |    |-- elem2: string (nullable = true)
 |    |    |    |-- elem1: string (nullable = true)
 |    |-- structField2: integer (nullable = true)
Desired schema:
root
 |-- col1: integer (nullable = true)
 |-- col2: integer (nullable = true)
 |-- structCol3: struct (nullable = true)
 |    |-- structField1: string (nullable = true)
 |    |-- structField2: boolean (nullable = true)
 |-- structCol4: struct (nullable = true)
 |    |-- nestedArray: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- elem1: string (nullable = true)
 |    |    |    |-- elem2: string (nullable = true)
 |    |    |    |-- elem3: double (nullable = true)
 |    |-- structField2: integer (nullable = true)
So far I've managed to reorder the top-level columns and the fields inside the structs like this:
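from pyspark.sql.functions import col, struct

# Reorder the top-level columns and rebuild each struct with its fields in the desired order
dfParquetOutput = df.select(
    "col1",
    "col2",
    struct(
        col("structCol3.structField1"),
        col("structCol3.structField2")
    ).alias("structCol3"),
    struct(
        # nestedArray is copied as-is, so the fields of its element structs keep their original order
        col("structCol4.nestedArray"),
        col("structCol4.structField2")
    ).alias("structCol4")
)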
Unfortunately, I can't find a way to reorder the fields of the StructType that sits inside the array. I thought about using a UDF, but I haven't had any success with it.
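For reference, a Python UDF can do this as long as its return type is declared explicitly, since PySpark maps returned tuples onto struct fields by position. The sketch below is only the Python body of such a UDF; `reorder_elems` and `ELEM_ORDER` are hypothetical names, and it assumes the array elements arrive as PySpark `Row` objects, which support lookup by field name (`row["elem1"]`), so plain dicts can stand in for them when testing locally.

```python
from typing import Any, List, Optional, Sequence, Tuple

# Hypothetical constant: field order wanted inside each array element (from the desired schema)
ELEM_ORDER = ("elem1", "elem2", "elem3")


def reorder_elems(arr: Optional[Sequence[Any]]) -> Optional[List[Optional[Tuple[Any, ...]]]]:
    """Return the array with each element rebuilt as a tuple in ELEM_ORDER.

    Fields are looked up by name, so this works for Row objects and for plain
    dicts. A null array stays null, and null elements stay null.
    """
    if arr is None:
        return None
    return [None if e is None else tuple(e[name] for name in ELEM_ORDER) for e in arr]
```

To use it, wrap it with `udf(reorder_elems, ArrayType(StructType([StructField("elem1", StringType()), StructField("elem2", StringType()), StructField("elem3", DoubleType())])))` (imports from `pyspark.sql.functions` and `pyspark.sql.types`) and apply the result to `col("structCol4.nestedArray")`, aliased as `"nestedArray"`. Keep in mind that a Python UDF ships every row to a Python worker, so it is usually slower than built-in expressions.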
Is there a simple way to reorder the fields of the struct inside the array?
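One UDF-free option, assuming Spark 2.4 or later, is the SQL higher-order function `transform`, which applies a lambda to every array element; the lambda can rebuild each element with `named_struct` in the desired field order. The helper below is a hypothetical sketch that only builds that SQL string; it assumes plain field names that need no backtick quoting. The `CASE` keeps null elements null (and keeps `containsNull = true`), while a null array stays null because `transform` returns null for a null input.

```python
from typing import Sequence


def reorder_array_of_structs_expr(array_path: str, field_order: Sequence[str], var: str = "x") -> str:
    """Build a Spark SQL expression that rebuilds every struct in `array_path`
    with its fields in `field_order` (hypothetical helper, Spark 2.4+)."""
    # "'name', x.name" pairs in the requested order, as named_struct expects
    pairs = ", ".join(f"'{name}', {var}.{name}" for name in field_order)
    # Preserve null elements instead of turning them into structs of nulls
    rebuilt = f"CASE WHEN {var} IS NULL THEN NULL ELSE named_struct({pairs}) END"
    # transform(array, lambda) applies the lambda to each element
    return f"transform({array_path}, {var} -> {rebuilt})"
```

In the select above, `col("structCol4.nestedArray")` could then be replaced with `expr(reorder_array_of_structs_expr("structCol4.nestedArray", ["elem1", "elem2", "elem3"])).alias("nestedArray")`, with `expr` imported from `pyspark.sql.functions`. On Spark 3.1+, `pyspark.sql.functions.transform` exposes the same higher-order function directly from the Python API.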