I am looking for help with PySpark: I want to add a new column that holds, for each row, the number of rows sharing that row's card_id.

I have a list of the distinct card_id values, stored in the variable unique_ids:

[Row(card_id=1), Row(card_id=2)]

For each value in the list, I want to count the number of rows whose card_id matches that value, and then add a new column containing that count.

This is how I am getting the list:

unique_ids = data.select('card_id').distinct().collect()

Example df:

card_id
1
1
2
1
2
1

Required dataframe:

card_id Count
1 4
1 4
2 2
1 4
2 2
1 4

Thanks

1 Answer

Use count as a window function, partitioned by card_id. You do not need to collect the distinct ids first:

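import pyspark.sql.functions as F
from pyspark.sql.window import Window

# Count the rows in each card_id partition and attach that count to every row.
# Note: unique_ids here is the resulting DataFrame, not the collected list from the question.
unique_ids = data.withColumn('count', F.count('card_id').over(Window.partitionBy('card_id')))
unique_ids.show()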

+-------+-----+
|card_id|count|
+-------+-----+
|      1|    4|
|      1|    4|
|      1|    4|
|      1|    4|
|      2|    2|
|      2|    2|
+-------+-----+
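
If it helps to see what the partitioned count computes, here is a rough plain-Python sketch of the same idea, with no Spark involved. It is an illustration only, not how Spark executes the window: the names card_ids, counts and rows_with_count are made up for this example, and the values are copied from the question's example df.

```python
from collections import Counter

# Sample values copied from the question's example df (assumption: one value per row).
card_ids = [1, 1, 2, 1, 2, 1]

# Step 1: count how many rows fall into each card_id group (the "partition").
counts = Counter(card_ids)

# Step 2: attach the group's count to every row, keeping one output row per input row.
rows_with_count = [(card_id, counts[card_id]) for card_id in card_ids]

for card_id, count in rows_with_count:
    print(card_id, count)
```

Unlike a groupBy aggregation, which would return one row per card_id, this keeps every original row and repeats the group count on each of them, which is what the question asks for.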