I need help converting multiple rows into a single row per key pair. Advice on how to do this with group by would be appreciated. I am using PySpark version 2.
l = [(1, 1, '', 'add1'),
     (1, 1, 'name1', ''),
     (1, 2, '', 'add2'),
     (1, 2, 'name2', ''),
     (2, 1, '', 'add21'),
     (2, 1, 'name21', ''),
     (2, 2, '', 'add22'),
     (2, 2, 'name22', '')]
df = sqlContext.createDataFrame(l, ['Key1', 'Key2','Name', 'Address'])
df.show()
+----+----+------+-------+
|Key1|Key2|  Name|Address|
+----+----+------+-------+
|   1|   1|      |   add1|
|   1|   1| name1|       |
|   1|   2|      |   add2|
|   1|   2| name2|       |
|   2|   1|      |  add21|
|   2|   1|name21|       |
|   2|   2|      |  add22|
|   2|   2|name22|       |
+----+----+------+-------+
I am stuck. The output I am looking for is:
+----+----+------+-------+
|Key1|Key2|  Name|Address|
+----+----+------+-------+
|   1|   1| name1|   add1|
|   1|   2| name2|   add2|
|   2|   1|name21|  add21|
|   2|   2|name22|  add22|
+----+----+------+-------+
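One possible direction, sketched under the assumption that each (Key1, Key2) pair has at most one non-empty value per column: group by the two keys and take the maximum of each string column, since the empty string sorts before any non-empty string. The helper names merge_rows and merge_with_spark below are hypothetical and only for illustration. merge_rows is a plain-Python reference of the idea that runs without Spark; merge_with_spark applies the same idea with groupBy and agg and is defined but not called here.

```python
# Sample rows from the question: (Key1, Key2, Name, Address)
rows = [(1, 1, '', 'add1'),
        (1, 1, 'name1', ''),
        (1, 2, '', 'add2'),
        (1, 2, 'name2', ''),
        (2, 1, '', 'add21'),
        (2, 1, 'name21', ''),
        (2, 2, '', 'add22'),
        (2, 2, 'name22', '')]


def merge_rows(rows):
    """Plain-Python reference: per (Key1, Key2), keep the largest string
    seen in each column. Assumes at most one non-empty value per column
    and key pair, so the largest string is the non-empty one."""
    merged = {}
    for key1, key2, name, address in rows:
        cur_name, cur_address = merged.get((key1, key2), ('', ''))
        merged[(key1, key2)] = (max(cur_name, name), max(cur_address, address))
    # Return one tuple per key pair, sorted by the keys
    return [(k1, k2, n, a) for (k1, k2), (n, a) in sorted(merged.items())]


def merge_with_spark(df):
    """Same idea on a Spark DataFrame with columns Key1, Key2, Name, Address.
    Sketch only: relies on the same one-non-empty-value assumption."""
    # Imported lazily so the reference function above runs without Spark
    from pyspark.sql import functions as F
    return (df.groupBy('Key1', 'Key2')
              .agg(F.max('Name').alias('Name'),
                   F.max('Address').alias('Address'))
              .orderBy('Key1', 'Key2'))


for row in merge_rows(rows):
    print(row)
```

With the DataFrame from the question, merge_with_spark(df).show() would be the call to try. If the blanks were nulls rather than empty strings, F.first with ignorenulls=True might be a better fit than F.max.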