I have a Spark DataFrame as follows:
+--+--------+-----------+
|id| account|       time|
+--+--------+-----------+
| 4|      aa| 01/01/2017|
| 2|      bb| 03/01/2017|
| 6|      cc| 04/01/2017|
| 1|      bb| 05/01/2017|
| 5|      bb| 09/01/2017|
| 3|      aa| 02/01/2017|
+--+--------+-----------+
and I want to get the data as follows:
+---+---+-------+
|id1|id2|account|
+---+---+-------+
|  4|  3|     aa|
|  2|  5|     bb|
|  1|  5|     bb|
|  2|  1|     bb|
+---+---+-------+
So I need to find every possible pair of ids within an account, where id1 is the id with the earlier time and id2 is the id with the later time. An account with only one row (such as cc) produces no pairs.
I'm very new to PySpark; I think a self join may be a good start.
Can anyone help me with it?