How to convert multiple columns to maps after grouping and pivoting in pyspark. Code below is not working as expected.
Input dataframe:
| ID | TYPE | Order | Address | Phone |
|---|---|---|---|---|
| 1 | A | Primary | abc | 111 |
| 1 | B | Primary | def | 222 |
| 1 | A | Secondary | ghi | 333 |
| 1 | B | Secondary | jkl | 444 |
| 2 | A | Primary | mno | 555 |
| 2 | A | Secondary | pqr | 666 |
| 2 | B | Primary | stu | 777 |
| 2 | B | Secondary | vwy | 888 |
Expected Output dataframe:
| ID | Primary_A_attributes | Primary_B_attributes | Secondary_A_attributes | Secondary_B_attributes |
|---|---|---|---|---|
| 1 | {"Address" : "abc", "phone" : "111"} | {"Address" : "def", phone" : "222"} | {"Address" : "ghi", "phone" : "333"} | {"Address" : "jkl", phone" : "444"} |
| 2 | {"Address":"mno", "phone":"555" | {"Address" : "pqr", phone" : "666"} | {"Address":"stu", "phone":"777" | {"Address" : "vwy", phone" : "888"} |
code used:
df.withColumn("collection",F.upper(F.concat_ws('_attributes_','order','type')))\
.groupBy('id').pivot("collection").agg(F.create_map(F.lit("Address"),F.col("Address")\
F.lit("phone"),F.col("phone"))).display()