I am a newbie trying to solve the following problem. Any help is highly appreciated.
I have the following JSON:
{
  "index": "identity",
  "type": "identity",
  "id": "100000",
  "source": {
    "link_data": {
      "source_Id": "0011245"
    },
    "attribute_data": {
      "first": {
        "val": [
          true
        ],
        "updated_at": "2011"
      },
      "second": {
        "val": [
          true
        ],
        "updated_at": "2010"
      }
    }
  }
}
The attributes under "attribute_data" may vary; there can be another attribute, say "third".
I am expecting the result in below format:
_index  _type  _id       source_Id  attribute_data  val   updated_at
ID      ID     randomid  00000      first           true  2000-08-08T07:51:14Z
ID      ID     randomid  00000      second          true  2010-08-08T07:51:14Z
I tried the following approach.
val df = spark.read.json("sample.json")
val res = df.select("index","id","type","source.attribute_data.first.updated_at", "source.attribute_data.first.val", "source.link_data.source_id");
This only adds new columns rather than one row per attribute, as shown below:
index     id      type      updated_at  val     source_id
identity  100000  identity  2011        [true]  0011245
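For reference, one possible way to get a row per attribute is sketched below. This is not a definitive solution, only a sketch under some assumptions: every entry under "attribute_data" has the same shape ("val" and "updated_at" with matching types, otherwise `array` will reject the mixed structs), attribute names contain no dots or other special characters, and the file is pretty-printed across several lines (hence the `multiLine` option). The names `FlattenAttributes` and `flatten` are made up for illustration. The idea is to read the attribute names from the inferred schema, wrap each attribute in a struct tagged with its name, and `explode` the resulting array.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object FlattenAttributes {
  // Hypothetical helper: returns one row per key found under source.attribute_data.
  def flatten(df: DataFrame): DataFrame = {
    // Attribute names ("first", "second", ...) taken from the inferred schema,
    // so a new attribute such as "third" is picked up without code changes.
    val attrNames = df.select("source.attribute_data.*").columns

    // One struct per attribute, tagged with the attribute's name.
    // Assumption: all attributes share the same "val" and "updated_at" types.
    val attrStructs = attrNames.map { name =>
      struct(
        lit(name).as("attribute_data"),
        col(s"source.attribute_data.$name.val").as("val"),
        col(s"source.attribute_data.$name.updated_at").as("updated_at")
      )
    }

    df.select(
        col("index"), col("type"), col("id"),
        col("source.link_data.source_Id").as("source_Id"),
        // explode turns the array of structs into one row per struct
        explode(array(attrStructs: _*)).as("attr"))
      .select("index", "type", "id", "source_Id",
        "attr.attribute_data", "attr.val", "attr.updated_at")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("flatten-attributes")
      .master("local[*]")
      .getOrCreate()

    // multiLine is needed when a single JSON document spans several lines
    val df = spark.read.option("multiLine", true).json("sample.json")
    flatten(df).show(false)

    spark.stop()
  }
}
```

Note that "val" stays an array here (for example [true]); taking the first element or exploding it again would be a separate step. With several records, an attribute missing from one record would still produce a row with null "val" and "updated_at", which may need filtering.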