How to explode a PySpark column that has multiple dictionaries in one row

Question

I have a Spark DataFrame in which one column holds multiple dictionaries per row:

id	result
1	{'key1':'a', 'key2':'b'}, {'key1':'d', 'key2':'e'}, {'key1':'m', 'key2':'n'}
2	{'key1':'r', 'key2':'s'}, {'key1':'t', 'key2':'u'}

I need the final output as:

id	key1	key2
1	a	b
1	d	e
1	m	n
2	r	s
2	t	u
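To pin down the reshaping being asked for, here is a minimal plain-Python sketch over the two sample rows. It is not Spark code. It assumes each result string is a comma-separated sequence of Python-style dict literals, so wrapping it in brackets yields a list literal that `ast.literal_eval` can parse. The names `rows` and `flatten` are hypothetical.

```python
import ast

# Sample rows copied from the input table: (id, result string)
rows = [
    (1, "{'key1':'a', 'key2':'b'}, {'key1':'d', 'key2':'e'}, {'key1':'m', 'key2':'n'}"),
    (2, "{'key1':'r', 'key2':'s'}, {'key1':'t', 'key2':'u'}"),
]


def flatten(rows):
    """Hypothetical helper: emit one (id, key1, key2) tuple per dictionary."""
    out = []
    for row_id, result in rows:
        # Wrapping the comma-separated dicts in brackets makes a list literal
        records = ast.literal_eval("[" + result + "]")
        for rec in records:
            out.append((row_id, rec["key1"], rec["key2"]))
    return out


# Print tab-separated, matching the layout of the expected output table
for r in flatten(rows):
    print(*r, sep="\t")
```

Each dictionary becomes its own row and keeps the id of the row it came from, which is exactly what the Spark version has to do.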

I am planning to explode this twice to get the result.

However, the result column is of StringType(), so I am unable to explode it with the explode function:

df.withColumn("output", explode(col("result")))

Error:

AnalysisException: cannot resolve 'explode(result)' due to data type mismatch: input to function explode should be array or map type, not string; 'Project [result#9651, explode(result#9651) AS output#9660] +- Relation[result#9651] json

How can I resolve this?

过过招 · Accepted Answer · 2022-05-12 07:21:19Z


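First convert the result column to an array of structs with the from_json function, then expand it into rows with the inline function. Wrapping the string in [ and ] turns the comma-separated objects into a JSON array, and the single-quoted keys and values still parse because Spark's JSON option allowSingleQuotes defaults to true.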

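from pyspark.sql import functions as F

# Target type: an array of structs with two string fields
json_schema = 'array<struct<key1:string,key2:string>>'

# Wrap the comma-separated objects in [ ] so the string becomes a JSON array,
# parse it with from_json, then emit one row per struct with inline
df = df.withColumn('result', F.from_json(F.concat(F.lit('['), 'result', F.lit(']')), json_schema)) \
    .selectExpr('id', 'inline(result)')
df.show(truncate=False)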
