I have a dataframe with the following schema:
root
|-- id: string (nullable = true)
|-- type: string (nullable = true)
|-- key: string (nullable = true)
|-- value: string (nullable = true)
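For concreteness, here is a hypothetical frame matching that schema; the rows are made up purely so the snippets below can be run as-is in a spark-shell session (where `spark` and its implicits are in scope):

// Assuming a spark-shell session, so `spark` is already available.
import org.apache.spark.sql.functions.{col, first, from_json}
import spark.implicits._

// Made-up rows matching the schema above, purely for experimentation.
val df = Seq(
  ("1", "a", "k1", """{"x": 1, "w": 2}"""),
  ("1", "b", "k1", """{"y": "z"}""")
).toDF("id", "type", "key", "value")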
It is first pivoted on the "type" column, taking the first "value" for each (id, key) group.
// One output column per distinct "type"; each cell holds that type's "value".
val pivot_df = df.groupBy("id", "key").pivot("type").agg(first("value"))
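With the made-up rows above, pivot_df comes out with columns id, key, a, and b, where a and b hold the raw JSON strings for each group.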
The "value" column's data may contains of nested json . So earlier i was converting the whole "value" column to get the required schema .
// spark.read.json also accepts a Dataset[String] (since Spark 2.2), so the
// schema can be inferred from the JSON strings themselves, then parsed and flattened.
val schema = spark.read.json(df.select("value").as[String]).schema
val res_df = df.select($"id", $"type", $"key", from_json($"value", schema).as("s")).select("id", "type", "key", "s.*")
But now I need to pivot first, and then parse the JSON in each of the newly generated columns. Can someone suggest what code changes should be applied?
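A minimal sketch of one possible approach, assuming each pivoted column holds JSON strings and that per-column schema inference (the same trick as above) is acceptable; jsonCols and the fold are names and structure I've made up here, not an established recipe:

// The newly generated columns are everything except the grouping keys.
val jsonCols = pivot_df.columns.filterNot(Seq("id", "key").contains)

// For each generated column, infer a schema from its non-null JSON strings
// (one extra Spark job per column) and replace the string with a parsed struct.
val res_df = jsonCols.foldLeft(pivot_df) { (acc, c) =>
  val colSchema = spark.read.json(acc.select(col(c)).na.drop.as[String]).schema
  acc.withColumn(c, from_json(col(c), colSchema))
}

Note that this fires one extra job per pivoted column for the inference, and if the "type" values can contain dots or spaces, the col(c) references would need backticks.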
Comments:

"Could you add df.show()? It would make this easier to investigate."

"spark.read.json takes a string path to data, not string data. You should specify your JSON schema manually, as in this question: stackoverflow.com/questions/64764317/… and afterwards pass it to the from_json function."