Add a new column in a PySpark DataFrame based on matching values from a list

Question

I am looking for help with PySpark: I want to add a new column based on values that match a list.

I have a list of values stored in the variable unique_ids:

[Row(card_id=1), Row(card_id=2)]

For each value in the list, if it matches the value in the card_id column, I want to count the number of rows that match and then add a new column containing that count.

This is how I get the list:

unique_ids = data.select('card_id').distinct().collect()

Example DataFrame:

card_id
1
1
2
1
2
1

Required DataFrame:

card_id	Count
1	4
1	4
2	2
1	4
2	2
1	4

Thanks

AdibP · Accepted Answer · 2021-07-17 15:46:52Z

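Use count as a window function partitioned by card_id, so that every row receives the size of its group without the rows being collapsed: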

import pyspark.sql.functions as F
from pyspark.sql.window import Window

# Count the rows in each card_id partition and attach that count to every row.
unique_ids = data.withColumn('count', F.count('card_id').over(Window.partitionBy('card_id')))
unique_ids.show()

+-------+-----+
|card_id|count|
+-------+-----+
|      1|    4|
|      1|    4|
|      1|    4|
|      1|    4|
|      2|    2|
|      2|    2|
+-------+-----+
