I need to add a column to a Spark DataFrame that holds a repeated sequence number, such as [1, 1, 1, 2, 2, 2, 3, 3, 3, ..., 10000, 10000, 10000]. I know that monotonically_increasing_id can be used to add an increasing ID as a new column:
import org.apache.spark.sql.functions.monotonically_increasing_id
val df_new = df.withColumn("id", monotonically_increasing_id())
How can I extend this to get the repeated sequence number instead? Thanks!
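
For illustration, here is a minimal sketch of one possible approach, not necessarily the best one. Note that monotonically_increasing_id only guarantees unique, increasing values, not consecutive ones, so the sketch first derives a consecutive row number with a window function and then integer-divides it by the repeat count. The sample data, the column name "value", the object name DuplicateSequence, and the repeat count of 3 are assumptions made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{floor, monotonically_increasing_id, row_number}

object DuplicateSequence {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration only (assumption: Spark SQL is on the classpath).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("duplicate-sequence")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real DataFrame.
    val df = Seq("a", "b", "c", "d", "e", "f", "g").toDF("value")

    // How many times each sequence number should repeat (assumed to be 3 here).
    val repeat = 3

    // monotonically_increasing_id is increasing but not consecutive,
    // so it is used only to order the rows; row_number then yields 1, 2, 3, ...
    val w = Window.orderBy(monotonically_increasing_id())

    // (rowNumber - 1) / repeat, floored, plus 1 maps rows 1..3 to 1, rows 4..6 to 2, and so on.
    val dfNew = df.withColumn("id", floor((row_number().over(w) - 1) / repeat) + 1)

    dfNew.show()
    spark.stop()
  }
}
```

With the seven sample rows and a repeat count of 3, the "id" values come out as 1, 1, 1, 2, 2, 2, 3. One caveat: a window without a partitionBy clause moves all rows into a single partition, so this may be slow on large data; indexing through the underlying RDD with zipWithIndex is an alternative worth considering in that case.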