0

I have several columns to be extracted from json string. However one field has nested values. Not sure how to deal with that?

Need to explode into multiple rows to get values of field name, Value1, Value2.

import spark.implicits._

val df = Seq(
  ("1", """{"k": "foo", "v": 1.0}""", "some_other_field_1"),
  ("2", """{"p": "bar", "q": 3.0}""", "some_other_field_2"),
  ("3",
    """{"nestedKey":[ {"field name":"name1","Value1":false,"Value2":true},
      |                 {"field name":"name2","Value1":"100","Value2":"200"}
      |]}""".stripMargin, "some_other_field_3")

).toDF("id","json","other")

df.show(truncate = false)
val df1= df.withColumn("id1",col("id"))
  .withColumn("other1",col("other"))
  .withColumn("k",get_json_object(col("json"),"$.k"))
  .withColumn("v",get_json_object(col("json"),"$.v"))
  .withColumn("p",get_json_object(col("json"),"$.p"))
  .withColumn("q",get_json_object(col("json"),"$.q"))
  .withColumn("nestedKey",get_json_object(col("json"),"$.nestedKey"))
    .select("id1","other1","k","v","p","q","nestedKey")
df1.show(truncate = false)

2 Answers 2

2

You can parse the nestedKey using from_json and explode it:

val df2 = df1.withColumn(
    "nestedKey", 
    expr("explode_outer(from_json(nestedKey, 'array<struct<`field name`:string, Value1:string, Value2:string>>'))")
).select("*", "nestedKey.*").drop("nestedKey")

df2.show
+---+------------------+----+----+----+----+----------+------+------+
|id1|            other1|   k|   v|   p|   q|field name|Value1|Value2|
+---+------------------+----+----+----+----+----------+------+------+
|  1|some_other_field_1| foo| 1.0|null|null|      null|  null|  null|
|  2|some_other_field_2|null|null| bar| 3.0|      null|  null|  null|
|  3|some_other_field_3|null|null|null|null|     name1| false|  true|
|  3|some_other_field_3|null|null|null|null|     name2|   100|   200|
+---+------------------+----+----+----+----+----------+------+------+
Sign up to request clarification or add additional context in comments.

2 Comments

Thank you so much . Is it possible to alias those nested keys like field name as 'My Field' etc
yes, you can rename columns using .withColumnRenamed
0

i did it in one dataframe

 val df1= df.withColumn("id1",col("id"))
    .withColumn("other1",col("other"))
    .withColumn("k",get_json_object(col("json"),"$.k"))
    .withColumn("v",get_json_object(col("json"),"$.v"))
    .withColumn("p",get_json_object(col("json"),"$.p"))
    .withColumn("q",get_json_object(col("json"),"$.q"))
    .withColumn("nestedKey",get_json_object(col("json"),"$.nestedKey"))
  .withColumn(
    "nestedKey",
    expr("explode_outer(from_json(nestedKey, 'array<struct<`field name`:string, Value1:string, Value2:string>>'))")
  ).withColumn("fieldname",col("nestedKey.field name"))
    .withColumn("valueone",col("nestedKey.Value1"))
    .withColumn("valuetwo",col("nestedKey.Value2"))
   .select("id1","other1","k","v","p","q","fieldname","valueone","valuetwo")```


still working to make it more elegant

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.