3

Let the schema of my json is :

       root
     |-- data: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: string (containsNull = true)

JSON is like this

{
  "data": [
    [
      10429183,
      "4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6",
      10429183,
      1454527245,
      "386824",
      1454527245,
      "386824",
      null,
      "6702002",
      "HM193685",
      "2006-02-21T21:00:00",
      "078XX S VERNON AVE",
      "2092",
      "NARCOTICS",
      "SOLICIT NARCOTICS ON PUBLICWAY",
      "STREET",
      true,
      false,
      "0624",
      "006",
      "6",
      "69",
      "26",
      null,
      null,
      "2006",
      "2015-08-17T15:03:40",
      null,
      null,
      [
        null,
        null,
        null,
        null,
        null
      ]
    ]
  ]
}
val df2 = 
df1
.withColumn("data", explode(array(jsonElements: _*)))
.withColumn("id", $"data" (0)).select("data.*") 

error :

Exception in thread "main" org.apache.spark.sql.AnalysisException: Can only star expand struct data types. Attribute: ArrayBuffer(data);

Need to create a data frame for every element of data?

2 Answers 2

1

As I understood, you are trying to split each json element in the array as seperate column...

one way is as below

import org.apache.spark.sql._

object JsonTest extends App {
  val jsonStr =
    """
      |{
      |  "data": [
      |    [
      |      10429183,
      |      "4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6",
      |      10429183,
      |      1454527245,
      |      "386824",
      |      1454527245,
      |      "386824",
      |      null,
      |      "6702002",
      |      "HM193685",
      |      "2006-02-21T21:00:00",
      |      "078XX S VERNON AVE",
      |      "2092",
      |      "NARCOTICS",
      |      "SOLICIT NARCOTICS ON PUBLICWAY",
      |      "STREET",
      |      true,
      |      false,
      |      "0624",
      |      "006",
      |      "6",
      |      "69",
      |      "26",
      |      null,
      |      null,
      |      "2006",
      |      "2015-08-17T15:03:40",
      |      null,
      |      null,
      |      [
      |        null,
      |        null,
      |        null,
      |        null,
      |        null
      |      ]
      |    ]
      |  ]
      |}
    """.stripMargin
  private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()

  spark.sparkContext.setLogLevel("ERROR")

  import org.apache.spark.sql.functions._
  import spark.implicits._

  val df1 = spark.read.json(Seq(jsonStr).toDS)
  println("before explode")
  df1.show(false)
  println(df1.schema)
  println("after explode")
  //  import org.apache.spark.sql.functions.schema_of_json
  //  val schema = df1.select(schema_of_json($"data")).as[String].first
  //  df1.withColumn("jsonData", from_json($"data", schema, Map[String, String]())).show
  val df2 = df1
    .withColumn("data", explode(col("data")))
  println(df2.schema)
  df2.show(false)

  val nElements = 35
  df2.select(Range(0, nElements).map(idx => $"data" (idx) as "data" + (idx + 2)): _*).show(false)

}

Result :

before explode
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|data                                                                                                                                                                                                                                                                                                                    |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[[10429183, 4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6, 10429183, 1454527245, 386824, 1454527245, 386824,, 6702002, HM193685, 2006-02-21T21:00:00, 078XX S VERNON AVE, 2092, NARCOTICS, SOLICIT NARCOTICS ON PUBLICWAY, STREET, true, false, 0624, 006, 6, 69, 26,,, 2006, 2015-08-17T15:03:40,,, [null,null,null,null,null]]]|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

StructType(StructField(data,ArrayType(ArrayType(StringType,true),true),true))
after explode
StructType(StructField(data,ArrayType(StringType,true),true))
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|data                                                                                                                                                                                                                                                                                                                  |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[10429183, 4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6, 10429183, 1454527245, 386824, 1454527245, 386824,, 6702002, HM193685, 2006-02-21T21:00:00, 078XX S VERNON AVE, 2092, NARCOTICS, SOLICIT NARCOTICS ON PUBLICWAY, STREET, true, false, 0624, 006, 6, 69, 26,,, 2006, 2015-08-17T15:03:40,,, [null,null,null,null,null]]|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

+--------+------------------------------------+--------+----------+------+----------+------+-----+-------+--------+-------------------+------------------+------+---------+------------------------------+------+------+------+------+------+------+------+------+------+------+------+-------------------+------+------+--------------------------+------+------+------+------+------+
|data2   |data3                               |data4   |data5     |data6 |data7     |data8 |data9|data10 |data11  |data12             |data13            |data14|data15   |data16                        |data17|data18|data19|data20|data21|data22|data23|data24|data25|data26|data27|data28             |data29|data30|data31                    |data32|data33|data34|data35|data36|
+--------+------------------------------------+--------+----------+------+----------+------+-----+-------+--------+-------------------+------------------+------+---------+------------------------------+------+------+------+------+------+------+------+------+------+------+------+-------------------+------+------+--------------------------+------+------+------+------+------+
|10429183|4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6|10429183|1454527245|386824|1454527245|386824|null |6702002|HM193685|2006-02-21T21:00:00|078XX S VERNON AVE|2092  |NARCOTICS|SOLICIT NARCOTICS ON PUBLICWAY|STREET|true  |false |0624  |006   |6     |69    |26    |null  |null  |2006  |2015-08-17T15:03:40|null  |null  |[null,null,null,null,null]|null  |null  |null  |null  |null  |
+--------+------------------------------------+--------+----------+------+----------+------+-----+-------+--------+-------------------+------------------+------+---------+------------------------------+------+------+------+------+------+------+------+------+------+------+------+-------------------+------+------+--------------------------+------+------+------+------+------+

you can change column names with withColumn and you can dropColumns which are not needed..

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you so much. It was really helpful and it works
1

If I understood correctly, you are trying to explode the external array into a new column data. Then get the first value of that array into a new field id. If that is the case then the next code should help you:

df.withColumn("data", explode($"data"))
  .withColumn("id", $"data".getItem(0))
  .show()

Output:

+--------------------+--------+
|                data|      id|
+--------------------+--------+
|[10429183, 4057F5...|10429183|
+--------------------+--------+

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.