Consider the following DataFrame, where I want each array of maps in the greek column merged into a single map without using UDFs.
+---+------------------------------------+
|id |greek                               |
+---+------------------------------------+
|1  |[{alpha -> beta}, {gamma -> delta}] |
|2  |[{epsilon -> zeta}, {etha -> theta}]|
+---+------------------------------------+
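What I'm after is each array collapsed into a single map:
+---+--------------------------------+
|id |greek                           |
+---+--------------------------------+
|1  |{alpha -> beta, gamma -> delta} |
|2  |{epsilon -> zeta, etha -> theta}|
+---+--------------------------------+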
I think I've tried every mapping function in the PySpark 3 docs. I expected map_from_entries to work, but it throws a type-mismatch exception because it expects an array of key/value structs rather than an array of maps (the failing call is reproduced below).
Although I'm aware that this is easily done using a UDF (sketched at the bottom for reference), I find it hard to believe there is no easier way.
Runnable Python code:
from pyspark.sql import SparkSession

spark = (
    SparkSession
    .builder
    .getOrCreate()
)

df = spark.createDataFrame(
    [
        (1, [{"alpha": "beta"}, {"gamma": "delta"}]),
        (2, [{"epsilon": "zeta"}, {"etha": "theta"}]),
    ],
    schema=["id", "greek"],
)
# Each Python dict is inferred as map<string,string>, so greek is
# an array<map<string,string>> column; this prints the table above.
df.show(truncate=False)
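For reference, this is the failing map_from_entries attempt (the exact exception text may vary by Spark version):

from pyspark.sql import functions as F

# Fails with a data-type-mismatch AnalysisException:
# map_from_entries expects array<struct<key, value>>,
# but greek is array<map<string,string>>.
df.select(F.map_from_entries("greek")).show()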
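And the UDF approach I'd like to avoid; a minimal sketch, assuming string keys and values throughout (merge_maps is just an illustrative name):

from pyspark.sql import functions as F
from pyspark.sql import types as T

# Merge the array of single-entry maps into one dict on the Python side.
@F.udf(returnType=T.MapType(T.StringType(), T.StringType()))
def merge_maps(maps):
    merged = {}
    for m in maps:
        merged.update(m)
    return merged

df.select("id", merge_maps("greek").alias("greek")).show(truncate=False)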