
I have a dataset "events" that includes an array of maps. I want to turn it into one map which is the aggregation of the amounts and counts

Currently, I'm running the following statement:

events.select(functions.col("totalAmounts")).collectAsList()

which returns the following:

[
    [
        Map(totalCreditAmount -> 10, totalDebitAmount -> 50)
    ],
    [
        Map(totalCreditAmount -> 50, totalDebitAmount -> 100)
    ]   
]

I want to aggregate the amounts and counts and have it return:

[
    Map(totalCreditAmount -> 60, totalDebitAmount -> 150)
]   

1 Answer

You can try using the explode function on the array-of-maps column to flatten it into one map per row, then explode each map into key/value rows and take the sum aggregate per key:

from pyspark.sql import functions as F

# Flatten the array-of-maps column into one map per row
df = events.select(F.explode("totalAmounts").alias("flattenedAmounts"))

# Explode each map into (key, value) rows, then sum the values per key
df = df.select(F.explode(df.flattenedAmounts)).groupBy("key").agg(F.sum("value").alias("value"))

# Collect the resulting (key, value) rows back into a single Python dict
final_result_as_map = df.rdd.collectAsMap()

final_result_as_map should then have the shape and form you are expecting.
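For intuition, the reduction this Spark job performs is equivalent to the following plain-Python fold over the collected rows. The sample data here is hypothetical, mirroring the output shown in the question:

```python
from collections import Counter

# Hypothetical rows mirroring the question's collectAsList() output:
# each row holds an array containing one map of amount totals.
rows = [
    [{"totalCreditAmount": 10, "totalDebitAmount": 50}],
    [{"totalCreditAmount": 50, "totalDebitAmount": 100}],
]

# Sum the values per key across every map in every row --
# the same reduction the explode + groupBy + sum pipeline performs.
totals = Counter()
for row in rows:
    for amounts in row:
        totals.update(amounts)

print(dict(totals))  # {'totalCreditAmount': 60, 'totalDebitAmount': 150}
```

The difference is that Spark distributes this reduction across partitions, which matters once the dataset no longer fits on one machine.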
