
I have a dataframe with the following schema.

root
 |-- id: string (nullable = true)
 |-- type: string (nullable = true)
 |-- key: string (nullable = true)
 |-- value: string (nullable = true)

It is first pivoted on the "type" column, taking the first "value" for each ("id", "key") group.

val pivot_df = df.groupBy("id","key").pivot("type").agg(first("value"))

The "value" column's data may contains of nested json . So earlier i was converting the whole "value" column to get the required schema .

val schema = spark.read.json(df.select("value").as[String]).schema

val res_df = df.select($"id",$"type",$"key",from_json($"value",schema).as("s")).select("id","type","key","s.*")

But now I need to pivot first and then parse the JSON in each of the newly generated columns. Can someone suggest what code changes should be applied?
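
For what it's worth, a minimal sketch of that pivot-then-parse approach, assuming spark and df are as above and "id"/"key" are the grouping columns (names taken from the question; adjust to your data):

import org.apache.spark.sql.functions.{col, first, from_json}
import spark.implicits._ // assumes the usual SparkSession named spark

// Pivot first, exactly as in the question.
val pivot_df = df.groupBy("id", "key").pivot("type").agg(first("value"))

// Every column except the grouping keys was produced by the pivot.
val pivotedCols = pivot_df.columns.filterNot(Seq("id", "key").contains)

// For each pivoted column, infer a schema from its non-null JSON strings
// (same trick as above), then replace the string column with the parsed struct.
val parsed_df = pivotedCols.foldLeft(pivot_df) { (acc, c) =>
  val jsonStrings = acc.select(col(c)).as[String].filter(_ != null)
  val schema = spark.read.json(jsonStrings).schema
  acc.withColumn(c, from_json(col(c), schema))
}

If the struct fields then need to be flattened, selecting c + ".*" per column works, but watch for field-name collisions across the different structs.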

  • Could you share the sample input data or df.show()? It will be easier for me to investigate. Commented Aug 6, 2020 at 4:50
  • You are using the Spark JSON reader wrong: spark.read.json takes a path to the data, not the string data itself. You should specify your JSON schema manually, as in this question: stackoverflow.com/questions/64764317/…, and then pass it to the from_json function. Commented Nov 21, 2020 at 9:57
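
For reference, the manual-schema route that comment describes might look roughly like this; the fields in valueSchema below are placeholders, not the real structure of your JSON:

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
import spark.implicits._ // assumes the usual SparkSession named spark

// Hypothetical schema -- replace the fields with whatever your JSON holds.
val valueSchema = StructType(Seq(
  StructField("field1", StringType, nullable = true),
  StructField("field2", LongType, nullable = true)
))

val res_df = df
  .select($"id", $"type", $"key", from_json($"value", valueSchema).as("s"))
  .select("id", "type", "key", "s.*")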
