Is this even possible with a Spark DataFrame (1.6/2.1)?
val data="some variable"
df.filter("column1"> data)
I can do this with a static value, but I can't figure out how to filter by a variable.
I'm not sure how you accomplished that with a literal either, since what you have doesn't match any of the filter method signatures.
So yes, you can work with a non-literal value; try this:
// sparkSession is your SparkSession instance
import sparkSession.implicits._
df.filter($"column1" > data)
Note the $, which uses an implicit conversion to turn the String into the Column with that name. Column has a > method that takes an Any and returns a new Column; that Any will be your data value.
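Put together as a minimal, self-contained sketch (the column name and sample values here are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("filter-by-variable")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(1, 5, 10, 20).toDF("column1")
val data = 5 // the non-literal value to filter by

// $"column1" is implicitly converted to a Column;
// > builds a Column predicate from the variable
df.filter($"column1" > data).show()
```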
In Java, you can do it like this:
int i = 10;
// for an equality condition
df.select("column1", "column2").filter(functions.col("column1").equalTo(i)).show();
// for greater-than or less-than
df.select("no", "name").filter(functions.col("no").gt(i)).show();
df.select("no", "name").filter(functions.col("no").lt(i)).show();
You can simply do it using string interpolation:
val data="some variable"
df.filter(s"column1 > $data")
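One caveat worth noting: with interpolation the variable is spliced into a SQL expression string, so a String value must also be quoted inside the expression. A sketch, assuming a DataFrame df with hypothetical columns column1 (numeric) and name (string):

```scala
val threshold = 5
df.filter(s"column1 > $threshold").show() // numbers interpolate as-is

val target = "alice"
df.filter(s"name = '$target'").show()     // string values need the quotes
```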
Here is a complete demo of filtering with <, >, and === on a numeric column, where mysearchid is a number declared as a val below...
scala> val numRows = 10
scala> val ds = spark.range(0, numRows)
ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> val df = ds.toDF("index")
df: org.apache.spark.sql.DataFrame = [index: bigint]
scala> df.show
+-----+
|index|
+-----+
| 0|
| 1|
| 2|
| 3|
| 4|
| 5|
| 6|
| 7|
| 8|
| 9|
+-----+
scala> val mysearchid = 9
mysearchid: Int = 9
scala> println("filter with less than ")
filter with less than
scala> df.filter(df("index") < mysearchid).show
+-----+
|index|
+-----+
| 0|
| 1|
| 2|
| 3|
| 4|
| 5|
| 6|
| 7|
| 8|
+-----+
scala> println("filter with greater than ")
filter with greater than
scala> df.filter(df("index") > mysearchid).show
+-----+
|index|
+-----+
+-----+
scala> println("filter with equals ")
filter with equals
scala> df.filter(df("index") === mysearchid).show
+-----+
|index|
+-----+
| 9|
+-----+