I'm using PySpark 3.0.1 and I have a JSON file where I need to parse a JSON column. The column looks as follows:
df1.select("mycol").show()
[
{"l1": 0, "l2": "abc", "l3": "xyz"},
{"l1": 1, "l2": "def", "l3": "xzz"},
{"l1": 2, "l2": "ghi", "l3": "yyy"},
]
I want either a DataFrame column or a string that returns, for each struct in the array, output in the form "l2.value: l3.value":
abc: xyz
def: xzz
ghi: yyy
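Independently of Spark, the desired transformation can be sketched in plain Python over the parsed JSON (the variable names here are illustrative, not from my actual code):

```python
import json

# Illustrative stand-in for the column contents shown above
raw = '''[
{"l1": 0, "l2": "abc", "l3": "xyz"},
{"l1": 1, "l2": "def", "l3": "xzz"},
{"l1": 2, "l2": "ghi", "l3": "yyy"}
]'''

rows = json.loads(raw)
# Join "l2: l3" for each object, separated by newlines
result = "\n".join(f'{r["l2"]}: {r["l3"]}' for r in rows)
print(result)
```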
So far I have this:
df1.createOrReplaceTempView("MY_TEST")
select mc.l2 ||": "|| mc.l3 from (select explode(mycol) as mc from MY_TEST)
and it does give me the result I want, but because of the explode each line ends up in a different row. I need it all in one single row or one single string (including line breaks):
| concat(concat(mc.l2 AS l2, : ), mc.l3 AS l3) |
|---|
| abc: xyz |
| def: xzz |
| ghi: yyy |
desired output:
| result |
|---|
| abc: xyz\ndef: xzz\nghi: yyy |
I also wonder if there's anything more efficient, perhaps without having to go through a temp view.