I'm working on a Spark Structured Streaming app and I'm trying to parse JSON in the format below.
{"name":"xyz","age":29,"details":["city":"mumbai","country":"India"]}
{"name":"abc","age":25,"details":["city":"mumbai","country":"India"]}
Below is my Spark code to parse the JSON:
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

// Expected schema: two top-level fields plus a nested "details" struct
val schema = new StructType()
  .add("name", DataTypes.StringType)
  .add("age", DataTypes.IntegerType)
  .add("details",
    new StructType()
      .add("city", DataTypes.StringType)
      .add("country", DataTypes.StringType)
  )

// Convert the binary value column to text
val dfLogLines = dfRawData.selectExpr("CAST(value AS STRING)")
// Parse each line into a struct column named "person"
val personNestedDf = dfLogLines.select(from_json($"value", schema).as("person"))
val personFlattenedDf = personNestedDf.selectExpr("person.name", "person.age")
personFlattenedDf.printSchema()

personFlattenedDf.writeStream
  .format("console")
  .option("checkpointLocation", checkpoint_loc3)
  .start()
  .awaitTermination()
Output:
root
|-- name: string (nullable = true)
|-- age: integer (nullable = true)
-------------------------------------------
Batch: 0
-------------------------------------------
+----+----+
|name| age|
+----+----+
|null|null|
|null|null|
+----+----+
The code does not throw any error, but every column in the output is null. What am I doing wrong here? Thanks in advance.
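For anyone who wants to reproduce this without a streaming source, here is a minimal batch sketch of the same parsing step. It assumes a local SparkSession; the object name `FromJsonRepro` and the `sample` DataFrame are made up for illustration and stand in for `dfRawData` after the cast to string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

// Hypothetical standalone repro; not part of the original streaming job.
object FromJsonRepro {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to exercise from_json.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("from-json-repro")
      .getOrCreate()
    import spark.implicits._

    // Same schema as in the streaming code above.
    val schema = new StructType()
      .add("name", StringType)
      .add("age", IntegerType)
      .add("details",
        new StructType()
          .add("city", StringType)
          .add("country", StringType))

    // One input line copied verbatim from the sample data in the question.
    val sample = Seq(
      """{"name":"xyz","age":29,"details":["city":"mumbai","country":"India"]}"""
    ).toDF("value")

    // Keep the raw string next to the parsed fields so they can be compared.
    sample
      .select($"value", from_json($"value", schema).as("person"))
      .selectExpr("value", "person.name", "person.age")
      .show(truncate = false)

    spark.stop()
  }
}
```

Keeping the raw `value` column in the result makes it easy to see which input lines `from_json` could not map onto the schema.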