
I have column values of the form {"1":"mediaMaaadadeftch||OAISAOID|true|ModsVersio|67900|clk|true|PPOOOS|20220501164113|34958|38177557..}

This is not JSON: some values are separated by a single pipe and some by a double pipe. How can we write a UDF that breaks this value apart and converts it into multiple columns, e.g.:

col_1|col_2|col_3|col_4|..
1|mediaMaaadadeftch|OAISAOID|true| ..
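
For reference, a minimal sketch of how such a value could be split without a UDF, using Spark's built-in split function. The DataFrame, the column name raw, the simplified sample value, and the fixed field count are all assumptions for illustration only:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# hypothetical single-row DataFrame; the real column name is assumed to be "raw"
df = spark.createDataFrame([("1|mediaMaaadadeftch||OAISAOID|true",)], ["raw"])

# split on single pipes; a double pipe "||" shows up as an empty string at that position
parts = F.split(F.col("raw"), r"\|")

# assuming a fixed number of fields (5 for this sample), project each piece into its own column
df = df.select([parts.getItem(i).alias(f"col_{i + 1}") for i in range(5)])
df.show()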

1 Answer

Instead of writing a UDF, you can do this with Spark's CSV reader: store the data in a CSV file and load it with "|" as the separator.

>>> df1 = spark.read.load("/path_to/sample.csv",format="csv", sep="|")
>>> df1.show()
+--------------------+----+--------+----+----------+-----+---+----+------+--------------+-----+--------+
|                 _c0| _c1|     _c2| _c3|       _c4|  _c5|_c6| _c7|   _c8|           _c9| _c10|    _c11|
+--------------------+----+--------+----+----------+-----+---+----+------+--------------+-----+--------+
|"1":"mediaMaaadad...|null|OAISAOID|true|ModsVersio|67900|clk|true|PPOOOS|20220501164113|34958|38177557|
+--------------------+----+--------+----+----------+-----+---+----+------+--------------+-----+--------+

Columns corresponding to a double pipe "||" will be null, so if you don't need them you can drop them.
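
For example, the all-null column from the sample above could be dropped (its name _c1 is taken from the show() output; adjust if your data lands differently):

>>> df1 = df1.drop("_c1")   # _c1 is the all-null column produced by the "||" in this sample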


1 Comment

The data is very big; creating a CSV file is not possible.
