I have multiple JSON payloads coming from various REST APIs, and I don't know their schema in advance. I am unable to use the DataFrame explode function, because I don't know the column names that the Spark API generates.
1. Can we recover the keys of the nested array elements by decoding dataframe.schema.fields? Spark only exposes the value part in the DataFrame rows and uses the top-level key as the column name.
DataFrame:
+--------------------+
| stackoverflow|
+--------------------+
|[[[Martin Odersky...|
+--------------------+
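Without a Spark session handy, here is a sketch of how the nested keys could be recovered. It assumes the schema is available as the dict that PySpark's `df.schema.jsonValue()` returns (a real `jsonValue()` dict also carries `nullable`/`metadata` entries, which the walk simply ignores); the field names below mirror the sample JSON further down.

```python
def collect_keys(field_type, prefix=""):
    """Recursively collect dotted key paths from a schema dict
    shaped like PySpark's df.schema.jsonValue()."""
    keys = []
    if isinstance(field_type, dict):
        if field_type.get("type") == "struct":
            for f in field_type["fields"]:
                path = f"{prefix}.{f['name']}" if prefix else f["name"]
                keys.append(path)
                keys.extend(collect_keys(f["type"], path))
        elif field_type.get("type") == "array":
            # array elements keep the parent's path
            keys.extend(collect_keys(field_type["elementType"], prefix))
    return keys

# Schema dict shaped like df.schema.jsonValue() for the sample payload
schema = {
    "type": "struct",
    "fields": [{
        "name": "stackoverflow",
        "type": {"type": "array", "elementType": {"type": "struct", "fields": [{
            "name": "tag",
            "type": {"type": "struct", "fields": [
                {"name": "id", "type": "long"},
                {"name": "name", "type": "string"},
                {"name": "author", "type": "string"},
                {"name": "frameworks", "type": {
                    "type": "array",
                    "elementType": {"type": "struct", "fields": [
                        {"name": "id", "type": "long"},
                        {"name": "name", "type": "string"},
                    ]}}},
            ]}}]}},
    }],
}

print(collect_keys(schema))
# -> ['stackoverflow', 'stackoverflow.tag', 'stackoverflow.tag.id', ...]
```

This way the nested array-element keys are known before touching any rows, so the follow-up explode/select calls can be built from strings at run time.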
2. Is there an optimal way to flatten the JSON using DataFrame methods, determining the schema at run time?
Sample JSON:
{
  "stackoverflow": [{
      "tag": {
        "id": 1,
        "name": "scala",
        "author": "Martin Odersky",
        "frameworks": [
          {
            "id": 1,
            "name": "Play Framework"
          },
          {
            "id": 2,
            "name": "Akka Framework"
          }
        ]
      }
    },
    {
      "tag": {
        "id": 2,
        "name": "java",
        "author": "James Gosling",
        "frameworks": [
          {
            "id": 1,
            "name": "Apache Tomcat"
          },
          {
            "id": 2,
            "name": "Spring Boot"
          }
        ]
      }
    }
  ]
}
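As a sketch of the run-time flattening I have in mind (again without a Spark session, so the DataFrame steps are recorded as strings rather than executed): walk a schema dict shaped like `df.schema.jsonValue()`, `explode()` every ArrayType column, replace every StructType column with its children, and repeat until only atomic columns remain. The field names are taken from the sample payload; in a real job each recorded step would be a `withColumn`/`select` call on the DataFrame instead.

```python
def flatten_plan(fields):
    """Return (steps, final_columns) for a list of schema-dict fields."""
    steps = []
    cols = {f["name"]: f["type"] for f in fields}
    while True:
        nested = {n: t for n, t in cols.items()
                  if isinstance(t, dict) and t["type"] in ("array", "struct")}
        if not nested:
            return steps, sorted(cols)
        name, t = next(iter(nested.items()))
        if t["type"] == "array":
            # real job: df = df.withColumn(name, explode(col(name)))
            steps.append(f"explode({name})")
            cols[name] = t["elementType"]
        else:
            # real job: select all other columns plus
            # col(name + "." + child).alias(name + "_" + child) for each child
            steps.append(f"expand({name})")
            del cols[name]
            for child in t["fields"]:
                cols[f"{name}_{child['name']}"] = child["type"]

# Fields list shaped like df.schema.jsonValue()["fields"] for the sample
sample_fields = [{
    "name": "stackoverflow",
    "type": {"type": "array", "elementType": {"type": "struct", "fields": [{
        "name": "tag",
        "type": {"type": "struct", "fields": [
            {"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
            {"name": "author", "type": "string"},
            {"name": "frameworks", "type": {
                "type": "array",
                "elementType": {"type": "struct", "fields": [
                    {"name": "id", "type": "long"},
                    {"name": "name", "type": "string"},
                ]}}},
        ]}}]}},
}]

steps, final_cols = flatten_plan(sample_fields)
print(steps)
print(final_cols)
```

The point is that the whole plan is derived from the schema alone, so no per-record JSON parsing is needed outside Spark.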
Note: we need to do all the operations in DataFrames, because a huge amount of data is coming in and we cannot parse each JSON individually.