
I have as input a set of files formatted as a single JSON object per line. The problem, however, is that one field in these JSON objects is a JSON-escaped string. Example:

{"clientAttributes":{"backfillId":null,"clientPrimaryKey":"abc"},"escapedJsonPayload":"{\"name\":\"Akash\",\"surname\":\"Patel\",\"items\":[{\"itemId\":\"abc\",\"itemName\":\"xyz\"}"}

When I create a DataFrame by reading the JSON file, it comes out like this:

val df = spark.sqlContext.read.json("file:///home/akaspate/sample.json")
df: org.apache.spark.sql.DataFrame = [clientAttributes: struct<backfillId: string, clientPrimaryKey: string>, escapedJsonPayload: string]

As we can see, "escapedJsonPayload" is a string, but I need it to be a struct.

Note: I found a similar question on Stack Overflow and followed it (How to let Spark parse a JSON-escaped String field as a JSON Object to infer the proper structure in DataFrames?), but it gives me "[_corrupt_record: string]".

I have tried the steps below:

  1. val df = spark.sqlContext.read.json("file:///home/akaspate/sample.json") (works fine)

  2. val escapedJsons: RDD[String] = sc.parallelize(Seq("""df""")) (see the note after this list)

  3. val unescapedJsons: RDD[String] = escapedJsons.map(_.replace("\"{", "{").replace("\"}", "}").replace("\\\"", "\""))

  4. val dfJsons: DataFrame = spark.sqlContext.read.json(unescapedJsons) (This results in [_corrupt_record: string])
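
(A note on step 2: as written, sc.parallelize(Seq("""df""")) parallelizes the literal string "df", not the DataFrame's contents. What I actually want to feed in is the payload column as plain strings; a minimal sketch of that extraction, assuming the DataFrame from step 1:)

import org.apache.spark.rdd.RDD

// Pull the payload column out of the step-1 DataFrame as raw strings;
// Spark has already unescaped the \" sequences while reading the outer JSON
val escapedJsons: RDD[String] = df.select("escapedJsonPayload").rdd.map(_.getString(0))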

Any help would be appreciated.

1 Answer


First of all, the JSON you have provided is syntactically invalid: the escaped payload never closes its items array or its outer object. The corrected JSON is as follows:

{"clientAttributes":{"backfillId":null,"clientPrimaryKey":"abc"},"escapedJsonPayload":{\"name\":\"Akash\",\"surname\":\"Patel\",\"items\":[{\"itemId\":\"abc\",\"itemName\":\"xyz\"}]}}

Next, to parse the JSON correctly from the above file, you have to use the following code:

val rdd = spark.read.textFile("file:///home/akaspate/sample.json")
  .toJSON                         // wraps each line as {"value":"<line, with quotes escaped>"}
  .map(value => value
    .replace("\\", "")            // strip every backslash escape
    .replace("{\"value\":\"", "") // strip the {"value":" prefix added by .toJSON
    .replace("}\"}", "}"))        // collapse the wrapper's trailing }"} back into }
  .rdd

val df = spark.read.json(rdd)

The above code will give you the following output:

df.show(false)

+----------------+-------------------------------------+
|clientAttributes|escapedJsonPayload                   |
+----------------+-------------------------------------+
|[null,abc]      |[WrappedArray([abc,xyz]),Akash,Patel]|
+----------------+-------------------------------------+

With the following schema:

df.printSchema

root
 |-- clientAttributes: struct (nullable = true)
 |    |-- backfillId: string (nullable = true)
 |    |-- clientPrimaryKey: string (nullable = true)
 |-- escapedJsonPayload: struct (nullable = true)
 |    |-- items: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- itemId: string (nullable = true)
 |    |    |    |-- itemName: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- surname: string (nullable = true)
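
If the upstream file really keeps the payload as a JSON-escaped string (the case raised in the comments below, e.g. "escapedJsonPayload":"{\"name\":\"Akash\"}"), note that Spark already unescapes it while reading the outer record, so you can parse the column directly with from_json instead of rewriting the raw text. A minimal sketch, assuming the payload schema from the sample above:

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

// Explicit schema for the embedded payload (taken from the sample record)
val payloadSchema = StructType(Seq(
  StructField("name", StringType),
  StructField("surname", StringType),
  StructField("items", ArrayType(StructType(Seq(
    StructField("itemId", StringType),
    StructField("itemName", StringType)))))))

val raw = spark.read.json("file:///home/akaspate/sample.json")
// Replace the string column with a parsed struct column
val parsed = raw.withColumn("escapedJsonPayload", from_json(raw("escapedJsonPayload"), payloadSchema))

from_json (available since Spark 2.1) returns null for rows whose payload is malformed, such as the truncated sample at the top of the question, instead of failing the whole read.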

I hope this helps!


4 Comments

Thanks @himanshuIIITian for the detailed answer. Regarding "the JSON you have provided is syntactically invalid": we do not have control over this input file; we are getting this format from an upstream service, so the field arrives like this: "escapedJsonPayload":"{\"name\":\"Akash\"}". Can you please let us know how to handle this in Spark?
Simple... just apply replace("\\", "") on the JSON value, as I mentioned in my answer.
@AkashPatel Please accept the answer or provide feedback.
Accepted the answer. Thanks for your help.
