I have a spark data frame in scala called df with two columns, say a and b. Column a contains letters and column b contains numbers giving the below.
a b
----------
g 0
f 0
g 0
f 1
I can get the distinct rows using
val dfDistinct=df.select("a","b").distinct
which gives the following:
a b
----------
g 0
f 0
f 1
I want to add another column with the number of times these distinct combinations occurs in the first dataframe so I'd end up with
a b count
----------
g 0 2
f 0 1
f 1 1
I don't mind if that modifies the original command or I have a separate operation on dfDistinct giving another data frame.
Any advice greatly appreciated and I apologise for the trivial nature of this question but I'm not the most experienced with this kind of operation in scala or spark.
Thanks
Dean