I'd like to create a new column that holds a JSON representation of some other columns, as a list of key/value pairs (one single-key object per column).
Source:
| origin | destination | count |
|---|---|---|
| toronto | ottawa | 5 |
| montreal | vancouver | 10 |
What I want:
| origin | destination | count | json |
|---|---|---|---|
| toronto | ottawa | 5 | [{"origin":"toronto"},{"destination":"ottawa"}, {"count": "5"}] |
| montreal | vancouver | 10 | [{"origin":"montreal"},{"destination":"vancouver"}, {"count": "10"}] |
(Everything can be a string; the value types don't matter.)
I've tried something like:
from pyspark.sql.functions import col, struct, to_json
df.withColumn('json', to_json(struct(col('origin'), col('destination'), col('count'))))
But that creates the column with all the key/value pairs in a single object, rather than a list of objects:
{"origin":"United States","destination":"Romania"}
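To make the difference concrete, here is a plain-Python sketch of the two shapes for one sample row. It does not use Spark at all. `row` is a hypothetical stand-in for a single record with every value already cast to a string, and `json.dumps` only mimics the layout of the output, not Spark's actual serializer.

```python
import json

# Hypothetical stand-in for one row of the source table (all values as strings).
row = {"origin": "toronto", "destination": "ottawa", "count": "5"}

# Shape I currently get: one object holding every key/value pair.
as_object = json.dumps(row, separators=(",", ":"))

# Shape I want: a list with one single-key object per column.
as_list = json.dumps([{k: v} for k, v in row.items()], separators=(",", ":"))

print(as_object)  # {"origin":"toronto","destination":"ottawa","count":"5"}
print(as_list)    # [{"origin":"toronto"},{"destination":"ottawa"},{"count":"5"}]
```

So the goal is the second form, produced per row inside the DataFrame.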
Is this possible without a UDF?