I want to implement an functionality to remove an element from an array of struct in spark scala.For the date "2019-01-26" I want to remove the entire struct from the array column. Following is my code :
import org.apache.spark.sql.types._
val df=Seq(("123","Jack",Seq(("2020-04-26","200","72","ABC"),("2020-05-26","300","71","ABC"),("2019-01-26","200","70","DEF"),("2019-01-26","200","70","DEF"),("2019-01-26","200","70","DEF"))),("124","jones",Seq(("2020-04-26","200","72","ABC"),("2020-05-26","300","71","ABC"),("2020-06-26","200","70","ABC"),("2020-08-26","300","69","ABC"),("2020-08-26","300","69","ABC"))),("125","daniel",Seq(("2019-01-26","200","70","DEF"),("2019-01-26","200","70","DEF"),("2019-01-26","200","70","DEF"),("2019-01-26","200","70","DEF"),("2019-01-26","200","70","DEF")))).toDF("id","name","history").withColumn("history",$"history".cast("array<struct<infodate:Date,amount1:Integer,amount2:Integer,detail:string>>"))
scala> df.printSchema
root
|-- id: string (nullable = true)
|-- name: string (nullable = true)
|-- history: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- infodate: date (nullable = true)
| | |-- amount1: integer (nullable = true)
| | |-- amount2: integer (nullable = true)
| | |-- detail: string (nullable = true)
So the for the date 2019-01-26 , I want to remove the struct in which it is present so that it is removed from the array column.I want a solution like this.
I manage to find the solution but it involves lot of hardcoding and I'm searching for a solution/suggestion that is optimal.
Hardcoded solution:
val dfnew=df
.withColumn( "history" ,
array_except(
col("history"),
array(
struct(
lit("2019-01-26").cast(DataTypes.DateType).alias("infodate"),
lit("200").cast(DataTypes.IntegerType).alias("amount1"),
lit("70").cast(DataTypes.IntegerType).alias("amount2"),
lit("DEF").alias("detail")
)
)
)
)
Is there any way of optimally doing it with one filter condition only on date "2019-01-26", which removes the struct/array from the array column.

