
I have a dataset "events" that includes an array of maps. I want to turn it into one map which is the aggregation of the amounts and counts

Currently, I'm running the following statement:

events.select(functions.col("totalAmounts")).collectAsList()

which returns the following:

[
    [
        Map(totalCreditAmount -> 10, totalDebitAmount -> 50)
    ],
    [
        Map(totalCreditAmount -> 50, totalDebitAmount -> 100)
    ]   
]

I want to aggregate the amounts and counts and have it return:

[
    Map(totalCreditAmount -> 60, totalDebitAmount -> 150)
]   

1 Answer

You can try using the explode function on the array-of-maps column to flatten it into one map per row, then explode each map into key/value rows and take the sum aggregate per key:

from pyspark.sql import functions as F

# Flatten the array-of-maps column into one map per row
df = events.select(F.explode("totalAmounts").alias("flattenedAmounts"))

# Explode each map into (key, value) rows, then sum the values per key
df = df.select(F.explode(df.flattenedAmounts)).groupBy("key").agg(F.sum("value").alias("value"))

# Collect the resulting (key, value) rows back into a single Python dict
final_result_as_map = df.rdd.collectAsMap()

final_result_as_map should then have the shape and form you are expecting.
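For intuition, the reduction this Spark job performs is equivalent to the following plain-Python fold over the collected rows. The sample data here is hypothetical, mirroring the output shown in the question:

```python
from collections import Counter

# Hypothetical rows mirroring the question's collectAsList() output:
# each row holds an array containing one map of amount totals.
rows = [
    [{"totalCreditAmount": 10, "totalDebitAmount": 50}],
    [{"totalCreditAmount": 50, "totalDebitAmount": 100}],
]

# Sum the values per key across every map in every row --
# the same reduction the explode + groupBy + sum pipeline performs.
totals = Counter()
for row in rows:
    for amounts in row:
        totals.update(amounts)

print(dict(totals))  # {'totalCreditAmount': 60, 'totalDebitAmount': 150}
```

The difference is that Spark distributes this reduction across partitions, which matters once the dataset no longer fits on one machine.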
