
My schema structure is as follows. I need to concatenate #VALUE, @DescriptionCode and @LanguageCode, which are nested inside an array of structs.

root
 |-- partnumber: string (nullable = true)
 |-- brandlabel: string (nullable = true)
 |-- availabledate: string (nullable = true)
 |-- description: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- #VALUE: string (nullable = true)
 |    |    |-- @DescriptionCode: string (nullable = true)
 |    |    |-- @LanguageCode: string (nullable = true)

I have tried a lot, but nothing has worked for me. I need the following schema:

root
 |-- partnumber: string (nullable = true)
 |-- brandlabel: string (nullable = true)
 |-- availabledate: string (nullable = true)
 |-- descriptions: array (nullable = true)
 |    |-- element: string (containsNull = true)
1
  • Can you share what you have tried, please? Commented Aug 6, 2016 at 19:04

2 Answers

1

I believe you need to create a user-defined function (UDF):

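import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._

// Concatenate the three fields of each struct in the array into one string.
val func: Seq[Row] => Seq[String] = {
  _.map(
    element =>
      element.getAs[String]("#VALUE") +
      element.getAs[String]("@DescriptionCode") +
      element.getAs[String]("@LanguageCode")
  )
}

val myUDF = udf(func)

// Add the flattened column, then drop the original array of structs.
df.withColumn("descriptions", myUDF(col("description"))).drop(col("description"))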

For more information about UDFs, you can read this article.
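
One thing to watch with the UDF above: a null field is concatenated as the literal text "null", and a row whose description array is itself null makes the function throw. As an alternative, here is a minimal sketch that uses only built-in functions. It assumes Spark 3.0 or later, where `transform` is available in the Scala API. The names `DescriptionsSketch` and `flattenDescriptions` are made up for illustration. Note that `concat_ws` with an empty separator skips null fields, which is a different result from the UDF.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat_ws, transform}

// Hypothetical helper, not part of Spark; assumes Spark 3.0+.
object DescriptionsSketch {
  def flattenDescriptions(df: DataFrame): DataFrame =
    df.withColumn(
        "descriptions",
        // Map every struct in the array to a single string.
        transform(col("description"), d =>
          // An empty separator gives plain concatenation; null fields are skipped.
          concat_ws("",
            d.getField("#VALUE"),
            d.getField("@DescriptionCode"),
            d.getField("@LanguageCode"))))
      .drop("description")
}
```

Calling `DescriptionsSketch.flattenDescriptions(df)` should give the schema asked for in the question, with `descriptions` as an array of strings. A null `description` array stays null instead of raising an exception.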


0
root
 |-- partnumber: string (nullable = true)
 |-- brandlabel: string (nullable = true)
 |-- availabledate: string (nullable = true)
 |-- description: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- #VALUE: string (nullable = true)
 |    |    |-- @DescriptionCode: string (nullable = true)
 |    |    |-- @LanguageCode: string (nullable = true)
 |    |    |-- @Language: string (nullable = true)

Suppose we want to concatenate two of the struct fields into one string, separated by ":", and the next two struct fields into another column.


root
 |-- partnumber: string (nullable = true)
 |-- brandlabel: string (nullable = true)
 |-- availabledate: string (nullable = true)
 |-- descriptions: array (nullable = true)
 |    |-- element1: string (containsNull = true)
 |    |-- element2: string (containsNull = true)
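
The schema above can be read in more than one way. The sketch below takes one reading: each array element becomes a struct with two string fields, `element1` and `element2`. The pairing of fields is an assumption, since the post does not say which fields belong together: #VALUE with @DescriptionCode, and @LanguageCode with @Language. It also assumes Spark 3.0 or later for `transform`. The names `PairedDescriptionsSketch` and `pairDescriptions` are made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat_ws, struct, transform}

// Hypothetical helper, not part of Spark; assumes Spark 3.0+.
object PairedDescriptionsSketch {
  def pairDescriptions(df: DataFrame): DataFrame =
    df.withColumn(
        "descriptions",
        transform(col("description"), d =>
          struct(
            // Assumed pairing: first two fields joined with ":".
            concat_ws(":", d.getField("#VALUE"), d.getField("@DescriptionCode")).as("element1"),
            // Assumed pairing: last two fields joined with ":".
            concat_ws(":", d.getField("@LanguageCode"), d.getField("@Language")).as("element2"))))
      .drop("description")
}
```

If the two strings should be separate top-level columns instead, the same `concat_ws` expressions could go into two `transform` calls, one per output column.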
