I have a dataframe like:
## +---+---+
## | id|num|
## +---+---+
## |  2|3.0|
## |  3|6.0|
## |  3|2.0|
## |  3|1.0|
## |  2|9.0|
## |  4|7.0|
## +---+---+
I want to remove consecutive repetitions of id, keeping only the first row of each run, and obtain:
## +---+---+
## | id|num|
## +---+---+
## |  2|3.0|
## |  3|6.0|
## |  2|9.0|
## |  4|7.0|
## +---+---+
I found ways of doing this in pandas, but nothing for PySpark.
num. Is this correct, or do you also want this for a distribution of ids like [1,2,1,1,2], resulting in [1,2,1,2]?