adding values from two different rows into one using pyspark

Question

I have two rows with the exact same data but columns changing between those two rows:

id	product	class	cost
1	table	large	5.12
1	table	medium	2.20

so I'm trying to get the following:

id	product	class	cost
1	table	large, Medium	7.32

I'm currently using the following code to get this:

df.groupBy("id", "product").agg(collect_list("class"),
    (
       F.sum("cost")
    ).alias("Sum")

The issue with this snippet code is that when doing the grouping is the first value it finds in class, and the addition doesn't seem to be correct (I'm not sure if it because is getting the first value and adding it the times it encounters class on that same id throughout the rows), so I'm getting something like this

id	product	class	cost
1	table	large, large	10.24

this is another snippet code I used, so I could get all my other fields while performing the addition on those two columns:

df.withColumn("total", F.sum("cost").over(Window.partitionBy("id")))

will it be the same to apply the F.array_join() function ?

过过招 · Accepted Answer · 2022-06-02 00:53:09Z

1

You need to use the array_join function to join the results of collect_list with commas (,).

df = df.groupBy('id', 'product').agg(
    F.array_join(F.collect_list('class'), ',').alias('class'),
    F.sum('cost').alias('cost')
)

answered Jun 2, 2022 at 0:53

过过招

4,3372 gold badges7 silver badges14 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Wendy Velasquez Over a year ago

I update my original post. I also tried what you post, for some reason in my data it only takes the first value it encounters. so instead of listing (large, medium) as in the example. It list (large, large) for all the additions is doing even if the class is actually different. is that because of the logic of the code?

Collectives™ on Stack Overflow

adding values from two different rows into one using pyspark

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related