
I found several helpful answers, but they were all converting a JSON file to a df. In my case, I have a df with columns that contain JSON strings, like this:

s-timestamp: 2019-10-10

content: {"META":{"testA":"1","TABLENAME":"some_table_name"},"PINACOLADA":{"sampleID":"0","itemInserted":"2019-10-10","sampleType":"BASE",}"

I need to normalize the content column. How can I do that?

  • What do you mean by normalization? Do you need to extract some columns from the JSON column into the initial df? Commented Oct 12, 2019 at 7:41
  • Probably similar to stackoverflow.com/questions/58037893/… Commented Oct 12, 2019 at 16:26

1 Answer


Welcome! There are a few ways of dealing with JSON strings in Spark DataFrame columns. You can use functions like get_json_object to extract specific fields from the JSON, or from_json to parse the column into a StructType with a given schema. Another option is to use spark.read.json to parse the column's contents and create a separate dataframe from them. Have a look at my solution here and let me know if it helps.
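A minimal PySpark sketch of the approaches mentioned above, not the linked solution itself. The column and field names (content, META, PINACOLADA, TABLENAME, sampleID) come from the question; the sample DataFrame, schema, and output column names are illustrative assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, get_json_object
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.getOrCreate()

    # Toy DataFrame shaped like the question's data (assumed, for illustration).
    df = spark.createDataFrame(
        [("2019-10-10",
          '{"META":{"testA":"1","TABLENAME":"some_table_name"},'
          '"PINACOLADA":{"sampleID":"0","itemInserted":"2019-10-10","sampleType":"BASE"}}')],
        ["s-timestamp", "content"],
    )

    # Option 1: pull out individual fields with get_json_object (JSONPath syntax).
    df_fields = (
        df.withColumn("tablename", get_json_object(col("content"), "$.META.TABLENAME"))
          .withColumn("sampleID", get_json_object(col("content"), "$.PINACOLADA.sampleID"))
    )

    # Option 2: parse the whole column with from_json and an explicit schema,
    # then flatten the nested structs into top-level columns.
    schema = StructType([
        StructField("META", StructType([
            StructField("testA", StringType()),
            StructField("TABLENAME", StringType()),
        ])),
        StructField("PINACOLADA", StructType([
            StructField("sampleID", StringType()),
            StructField("itemInserted", StringType()),
            StructField("sampleType", StringType()),
        ])),
    ])
    df_parsed = (
        df.withColumn("parsed", from_json(col("content"), schema))
          .select("s-timestamp", "parsed.META.*", "parsed.PINACOLADA.*")
    )
    df_parsed.show(truncate=False)

    # Option 3: let Spark infer the schema by reading the column's strings as JSON,
    # producing a separate DataFrame (no link back to s-timestamp).
    df_inferred = spark.read.json(df.select("content").rdd.map(lambda r: r[0]))

Option 2 is the most predictable if the JSON is inconsistent: records that do not match the schema simply yield nulls instead of failing, whereas schema inference in option 3 depends on whatever fields happen to appear in the data.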


1 Comment

Thanks, it looks like I'll have the same problem as the solution post: my data may be inconsistent. I am going to experiment with your approach. Thanks so much.
