
I stored some JSON documents in MongoDB. Each document looks like: {"businessData":{"capacity":{"fuelCapacity":282}, ..}.

After reading all the documents, I want to export them as a valid JSON file. Specifically:

// Read JSON data from the DB
val df: DataFrame = MongoSpark.load(sparkSession, readConfig)
df.show
// Export into the file system
df.coalesce(1).write.mode(SaveMode.Overwrite).json("export.json")
// The show command only shows the .json values
+--------------------+
|        businessData|
+--------------------+
|[[282],0,[true,20...|
|[[280],0,[true,20...|
|[[290],0,[true,20...|
|[[292],0,[true,20...|
|[[282],16,[true,2...|
+--------------------+

// export.json
{"businessData":{"capacity":{"fuelCapacity":282}, ..}
{"businessData":{"capacity":{"fuelCapacity":280}, ..}
{"businessData":{"capacity":{"fuelCapacity":290}, ..}
{"businessData":{"capacity":{"fuelCapacity":292}, ..}
{"businessData":{"capacity":{"fuelCapacity":282}, ..}

But when I export to the file system I want to combine these 5 rows into an Array and also add some custom meta-data. For example:

{
  "metadata" : { "exportTime": "20/20/2020" , ...} 
  "allBusinessData" : [
    {"businessData":{"capacity":{"fuelCapacity":282}, ..},
    // all 5 rows from above
  ]
}

I have seen questions here and here advising against it. They also only partially answer the question, as they don't add a custom JSON structure to the export.

Assuming however that this is the only way I can proceed, how can I do it?

Many thanks!

1 Answer


From Spark 2.2+:

You can use to_json, or build a struct field containing an array, in Spark and then write the DataFrame in JSON format to get the required output.

  • For the sample data I used current_timestamp() as the export time.

Example:

val df = spark.read.json(Seq("""[{"businessData":{"capacity":{"fuelCapacity":282}}},{"businessData":{"capacity":{"fuelCapacity":456}}}]""").toDS)

// Create a struct field called metadata and write the data in JSON format
df.selectExpr("struct(current_timestamp() as exporttime, struct(collect_list(businessData) as businessData) as allBusinessData) as metadata").write.format("json").mode("overwrite").save("json_path")

// Use to_json to create a JSON object column in the DataFrame
df.selectExpr("to_json(struct(current_timestamp() as exporttime, struct(collect_list(businessData) as businessData) as allBusinessData)) as metadata").show(false)

//+-------------------------------------------------------------------------------------------------------------------------------------------------------+
//|metadata                                                                                                                                               |
//+-------------------------------------------------------------------------------------------------------------------------------------------------------+
//|{"exporttime":"2020-03-21T15:17:54.769-05:00","allBusinessData":{"businessData":[{"capacity":{"fuelCapacity":282}},{"capacity":{"fuelCapacity":456}}]}}|
//+-------------------------------------------------------------------------------------------------------------------------------------------------------+

// Use .toJSON to view the JSON in the shell (non-prod use only)
df.selectExpr("struct(current_timestamp() as exporttime, struct(collect_list(businessData) as businessData) as allBusinessData) as metadata").toJSON.collect()

//Array[String] = Array({"metadata":{"exporttime":"2020-03-21T15:19:35.890-05:00","allBusinessData":{"businessData":[{"capacity":{"fuelCapacity":282}},{"capacity":{"fuelCapacity":456}}]}}})

2 Comments

Hello, I have to read using the MongoSpark driver (MongoSpark.load), which does not have a .toDS method. I still tried your solution, and it almost answers my question. I get "allBusinessData":{ instead of the desired "allBusinessData":[{... (note the opening square bracket [).
@user1485864, you don't have to worry about .toDS; I only used it to read the sample data as JSON. In your case your df is already loaded in the right format.
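As the comment notes, wrapping collect_list in an extra struct makes allBusinessData a JSON object rather than an array. When the result set is small enough to collect to the driver, an alternative is to export the rows with df.toJSON.collect() and assemble the final top-level object by hand (df.write.json always emits line-delimited JSON, never a single wrapping object). A minimal sketch of that assembly step; the wrapExport helper and the sample rows are hypothetical, not from the original answer:

```scala
object ExportSketch {
  // Assemble one top-level JSON object from already-serialized row strings.
  // rows: JSON strings as produced by df.toJSON.collect()
  // exportTime: any metadata value to embed alongside the data
  def wrapExport(rows: Seq[String], exportTime: String): String =
    s"""{"metadata":{"exportTime":"$exportTime"},"allBusinessData":[${rows.mkString(",")}]}"""

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      """{"businessData":{"capacity":{"fuelCapacity":282}}}""",
      """{"businessData":{"capacity":{"fuelCapacity":456}}}"""
    )
    // The resulting string can be written to a single file with java.nio.file.Files
    println(wrapExport(rows, "20/20/2020"))
  }
}
```

Only collect to the driver when the row count is small; for large exports, stick with the aggregated selectExpr approach in the answer above.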
