I am trying to add null columns to an embedded array[struct] column; this way I will be able to transform a similarly complex column:
case class Additional(id: String, item_value: String)
case class Element(income: String, currency: String, additional: Additional)
case class Additional2(id: String, item_value: String, extra2: String)
case class Element2(income: String, currency: String, additional: Additional2)
import org.apache.spark.sql.{functions => fx}

val my_uDF = fx.udf((data: Seq[Element]) => {
  data.map(x => Element2(x.income, x.currency,
    Additional2(x.additional.id, x.additional.item_value, null)))
})
sparkSession.udf.register("transformElements", my_uDF)
val result = sparkSession.sql("select transformElements(myElements), line_number, country, idate from entity where line_number = '1'")
The goal is to add an extra field called extra2 to Element.additional, which is why I map the column with a UDF, but it fails with:
org.apache.spark.SparkException: Failed to execute user defined function(anonfun$1: (array<struct<income:string,currency:string,additional:struct<id:string,item_value:string>>>) => array<struct<income:string,currency:string,additional:struct<id:string,item_value:string,extra2:string>>>)
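From the stack trace, the failure seems to happen when Spark hands the array elements to the function: as far as I can tell, Spark passes struct values into a Scala UDF as org.apache.spark.sql.Row objects, not as case class instances, so declaring the parameter as Seq[Element] causes a cast failure at execution time. A minimal sketch of a Row-based variant (untested; field names taken from the schema below):

import org.apache.spark.sql.Row

// Read each incoming struct as a Row and rebuild it as a case class,
// which Spark can encode back into the target struct schema.
val rowBasedUDF = fx.udf((data: Seq[Row]) => {
  data.map { x =>
    val add = x.getAs[Row]("additional")
    Element2(x.getAs[String]("income"), x.getAs[String]("currency"),
      Additional2(add.getAs[String]("id"), add.getAs[String]("item_value"), null))
  }
})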
If I print the schema of the 'myElements' field, it shows:
|-- myElements: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- income: string (nullable = true)
| | |-- currency: string (nullable = true)
| | |-- additional: struct (nullable = true)
| | | |-- id: string (nullable = true)
| | | |-- item_value: string (nullable = true)
And this is the schema I am trying to convert it into:
|-- myElements: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- income: string (nullable = true)
| | |-- currency: string (nullable = true)
| | |-- additional: struct (nullable = true)
| | | |-- id: string (nullable = true)
| | | |-- item_value: string (nullable = true)
| | | |-- extra2: string (nullable = true)
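For comparison, a sketch of the same transformation with the typed Dataset API, which lets Spark deserialize the array straight into the case classes instead of Rows (assuming the array column is selected on its own; the extra columns from my query are left out for brevity):

import sparkSession.implicits._

// Deserialize the single array<struct> column into Seq[Element],
// then rebuild each element with the extra2 field set to null.
val transformed = sparkSession
  .sql("select myElements from entity where line_number = '1'")
  .as[Seq[Element]]
  .map(_.map(x => Element2(x.income, x.currency,
    Additional2(x.additional.id, x.additional.item_value, null))))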