I'm using PySpark 3.0.1 and I have a JSON file where I need to parse a JSON column. The column looks as follows:
df1.select("mycol").show()
[
{"l1": 0, "l2": "abc", "l3": "xyz"},
{"l1": 1, "l2": "def", "l3": "xzz"},
{"l1": 2, "l2": "ghi", "l3": "yyy"},
]
I want either a DataFrame column or a string that returns, for each struct in the array, output in the form "l2.value: l3.value":
abc: xyz
def: xzz
ghi: yyy
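Independently of Spark, the desired transformation can be sketched in plain Python over the parsed JSON (the variable names here are illustrative, not from my actual code):

```python
import json

# Illustrative stand-in for the column contents shown above
raw = '''[
{"l1": 0, "l2": "abc", "l3": "xyz"},
{"l1": 1, "l2": "def", "l3": "xzz"},
{"l1": 2, "l2": "ghi", "l3": "yyy"}
]'''

rows = json.loads(raw)
# Join "l2: l3" for each object, separated by newlines
result = "\n".join(f'{r["l2"]}: {r["l3"]}' for r in rows)
print(result)
```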
So far I have this:
df1.createOrReplaceTempView("MY_TEST")
select mc.l2 ||": "|| mc.l3 from (select explode(mycol) as mc from MY_TEST)
and it does give me the result I want, but because of the explode each line ends up in a different row. I need it all in one single row or one single string (including line breaks):
| concat(concat(mc.l2 AS l2, : ), mc.l3 AS l3) |
|---|
| abc: xyz |
| def: xzz |
| ghi: yyy |
desired output:
| result |
|---|
| abc: xyz\ndef: xzz\nghi: yyy |
I also wonder if there's anything more efficient, perhaps without having to go through a temp view.