I need to conditionally modify a value of a nested field in a Dataframe (or create a new field with the nested values). I would like to do it without having to use UDF, but I really would want to avoid RDD/map since the production tables can have many hundred millions of records and map in that condition dosen't ring as efficient/fast to me.
Bellow is the test case:
case class teste(var testID: Int = 0, var testDesc: String = "", var testValue: String = "")
val DFMain = Seq( ("A",teste(1, "AAA", "10")),("B",teste(2, "BBB", "20")),("C",teste(3, "CCC", "30"))).toDF("F1","F2")
val DFNewData = Seq( ("A",teste(1, "AAA", "40")),("B",teste(2, "BBB", "50")),("C",teste(3, "CCC", "60"))).toDF("F1","F2")
val DFJoined = DFMain.join(DFNewData,DFMain("F2.testID")===DFNewData("F2.testID"),"left").
select(DFMain("F1"), DFMain("F2"), DFNewData("F2.testValue").as("NewValue")).
withColumn("F2.testValue",$"NewValue")
DFJoined.show()
This will add a new column, but I need that F2.testValue to be equal to the value of NewValue inside the Struct when its above 50.
Original Data:
+---+------------+
| F1| F2|
+---+------------+
| A|[1, AAA, 10]|
| B|[2, BBB, 20]|
| C|[3, CCC, 30]|
+---+------------+
Desired Result:
+---+------------+
| F1| F2|
+---+------------+
| A|[1, AAA, 10]|
| B|[2, BBB, 50]|
| C|[3, CCC, 60]|
+---+------------+