As the title states, I'd like to create a normalized version of an existing Double column.
As I'm quite new to pyspark, this was my attempt at solving this:
df2 = df.groupBy('id').count().toDF(*['id','count_trans'])
df2 = df2.withColumn('count_trans_norm', F.col('count_trans) / (F.max(F.col('count_trans'))))
When I do this, I get the following error:
"grouping expressions sequence is empty, and '`movie_id`' is not an aggregate function.
Any help would be much appreciated.