I found a similiar situation with mine in this line, but he is using SQL server, not pyspark/python: Pivoting Multiple Columns Based On a Single Column
I have a dateset as below:
ID Date Class
1 2021/01/01 math, english
1 2021/01/02 math, english
1 2021/01/03 chinese
1 2021/01/04 math, chemistry
1 2021/01/05 math, english
1 2021/01/06 Chinese
2 2021/01/01 PE
2 2021/01/02 math, chinese
2 2021/01/03 math, english
2 2021/01/04 math, chinese
.......
the desire output should be:
ID Date_1 schedule_1 Date_2 schedule_2 Date_3 schedule_3
1 2021/01/01 math, english 2021/01/03 chinese 2021/01/05 math, chemistry...
1 2021/01/02 math, english 2021/01/06 chinese...
1 2021/01/05 math, english....
2 2021/01/01 PE 2021/01/02 math, chinese 2021/01/03 math, english
2 2021/01/04 math, chinese
I am planning using pivot and groupby, this is my current code, which is not working, and I have no idea on how to solve it.
line2 = line\
.select("ID")\
.groupBy("ID","Class")\
.pivot("Date","Class")\
.agg(expr("coalesce(first(Class), \" \")"))
Any help or ideas or suggestions will be appreciated.