Spark dataframe from Json string with nested key

Question

I have several columns to be extracted from json string. However one field has nested values. Not sure how to deal with that?

Need to explode into multiple rows to get values of field name, Value1, Value2.

import spark.implicits._

val df = Seq(
  ("1", """{"k": "foo", "v": 1.0}""", "some_other_field_1"),
  ("2", """{"p": "bar", "q": 3.0}""", "some_other_field_2"),
  ("3",
    """{"nestedKey":[ {"field name":"name1","Value1":false,"Value2":true},
      |                 {"field name":"name2","Value1":"100","Value2":"200"}
      |]}""".stripMargin, "some_other_field_3")

).toDF("id","json","other")

df.show(truncate = false)
val df1= df.withColumn("id1",col("id"))
  .withColumn("other1",col("other"))
  .withColumn("k",get_json_object(col("json"),"$.k"))
  .withColumn("v",get_json_object(col("json"),"$.v"))
  .withColumn("p",get_json_object(col("json"),"$.p"))
  .withColumn("q",get_json_object(col("json"),"$.q"))
  .withColumn("nestedKey",get_json_object(col("json"),"$.nestedKey"))
    .select("id1","other1","k","v","p","q","nestedKey")
df1.show(truncate = false)

mck · Accepted Answer · 2021-04-25 08:47:30Z

2

You can parse the nestedKey using from_json and explode it:

val df2 = df1.withColumn(
    "nestedKey", 
    expr("explode_outer(from_json(nestedKey, 'array<struct<`field name`:string, Value1:string, Value2:string>>'))")
).select("*", "nestedKey.*").drop("nestedKey")

df2.show
+---+------------------+----+----+----+----+----------+------+------+
|id1|            other1|   k|   v|   p|   q|field name|Value1|Value2|
+---+------------------+----+----+----+----+----------+------+------+
|  1|some_other_field_1| foo| 1.0|null|null|      null|  null|  null|
|  2|some_other_field_2|null|null| bar| 3.0|      null|  null|  null|
|  3|some_other_field_3|null|null|null|null|     name1| false|  true|
|  3|some_other_field_3|null|null|null|null|     name2|   100|   200|
+---+------------------+----+----+----+----+----------+------+------+

answered Apr 25, 2021 at 8:47

mck

42.7k13 gold badges44 silver badges62 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

CoolGoose Over a year ago

Thank you so much . Is it possible to alias those nested keys like field name as 'My Field' etc

mck Over a year ago

yes, you can rename columns using .withColumnRenamed

CoolGoose · Accepted Answer · 2021-04-26 23:37:19Z

i did it in one dataframe

 val df1= df.withColumn("id1",col("id"))
    .withColumn("other1",col("other"))
    .withColumn("k",get_json_object(col("json"),"$.k"))
    .withColumn("v",get_json_object(col("json"),"$.v"))
    .withColumn("p",get_json_object(col("json"),"$.p"))
    .withColumn("q",get_json_object(col("json"),"$.q"))
    .withColumn("nestedKey",get_json_object(col("json"),"$.nestedKey"))
  .withColumn(
    "nestedKey",
    expr("explode_outer(from_json(nestedKey, 'array<struct<`field name`:string, Value1:string, Value2:string>>'))")
  ).withColumn("fieldname",col("nestedKey.field name"))
    .withColumn("valueone",col("nestedKey.Value1"))
    .withColumn("valuetwo",col("nestedKey.Value2"))
   .select("id1","other1","k","v","p","q","fieldname","valueone","valuetwo")```


still working to make it more elegant

Collectives™ on Stack Overflow

Spark dataframe from Json string with nested key

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related