I need to add a new column to a dataframe with a boolean value, evaluating a column inside the dataframe. For example, I have a dataframe of
+----+----+----+----+----+-----------+----------------+
|colA|colB|colC|colD|colE|colPRODRTCE| colCOND|
+----+----+----+----+----+-----------+----------------+
| 1| 1| 1| 1| 3| 39|colA=1 && colB>0|
| 1| 1| 1| 1| 3| 45| colD=1|
| 1| 1| 1| 1| 3| 447|colA>8 && colC=1|
+----+----+----+----+----+-----------+----------------+
In my new column I need to evaluate if the expression of colCOND is true or false.
It's easy if you have something like this:
val df = List(
(1,1,1,1,3),
(2,2,3,4,4)
).toDF("colA", "colB", "colC", "colD", "colE")
val myExpression = "colA<colC"
import org.apache.spark.sql.functions.expr
df.withColumn("colRESULT",expr(myExpression)).show()
+----+----+----+----+----+---------+
|colA|colB|colC|colD|colE|colRESULT|
+----+----+----+----+----+---------+
| 1| 1| 1| 1| 3| false|
| 2| 2| 3| 4| 4| true|
+----+----+----+----+----+---------+
But I have to evaluate a different expression in each row and it is inside the column colCOND.
I thought in create a UDF function with all columns, but my real dataframe have a lot of columns. How can I do it?
Thanks to everyone