I have a column in dataframe as list of objects(Array of structs) like
column: [{key1:value1}, {key2:value2}, {key3:value3}]
I want to split this column into separate columns with key name as column name and value as column value in same row.
Final result like
key1:value1, key2:value2, key3:value3
How to achieve this in pyspark ?
E.g.
Sample data to create dataframe:
my_new_schema = StructType([
StructField('id', LongType()),
StructField('countries', ArrayType(StructType([
StructField('name', StringType()),
StructField('capital', StringType())
])))
])
l = [(1, [
{'name': 'Italy', 'capital': 'Rome'},
{'name': 'Spain', 'capital': 'Madrid'}
])
]
dz = spark.createDataFrame(l, schema=my_new_schema)
# we have array of structs:
dz.show(truncate=False)
+---+--------------------------------+
|id |countries |
+---+--------------------------------+
|1 |[{Italy, Rome}, {Spain, Madrid}]|
+---+--------------------------------+
Expected output:
+---+--------+---------+
|id |Italy | Spain |
+---+------------------+
|1 |Rome | Madrid |
+---+--------+---------+