9

Is this even possible with a Spark DataFrame (1.6/2.1)?

val data="some variable"

df.filter("column1"> data)

I can do this with a static value but can't figure out how to filter by a variable.

1
  • Can you provide an example of val data ? Commented Apr 23, 2017 at 14:10

8 Answers

6
import org.apache.spark.sql.functions._

val data="some variable"
df.filter(col("column1") > lit(data))
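A quick note: lit(data) wraps the Scala value in a literal Column. Since Column's > operator accepts Any, df.filter(col("column1") > data) also works without lit.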


4

I'm not sure how you accomplished that with a literal either, since what you have doesn't match any of the filter method signatures.

So yes, you can work with a non-literal, but try this:

import sparkSession.implicits._
df.filter($"column1" > data)

Note the $, which uses an implicit conversion to turn the String into the Column with that name. That Column has a > method that takes an Any and returns a new Column; the Any will be your data value.
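If it helps, here is a minimal self-contained sketch of that approach, assuming a local SparkSession and made-up sample data for column1:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("filter-by-variable").getOrCreate()
import spark.implicits._

// made-up sample data with a numeric column1
val df = Seq(1, 5, 10, 20).toDF("column1")

val data = 5                        // the variable to filter by
df.filter($"column1" > data).show() // keeps the rows where column1 > 5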


1

In Java, you can do it like this:

import org.apache.spark.sql.functions;

int i = 10;

// equality
df.select("column1", "column2").filter(functions.col("column1").equalTo(i)).show();

// greater than / less than
df.select("no", "name").filter(functions.col("no").gt(i)).show();
df.select("no", "name").filter(functions.col("no").lt(i)).show();


1

Yes, you can use a variable to filter a Spark DataFrame.

import org.apache.spark.sql.functions.lower

val keyword = "my_key_word" // use var instead of val if it needs to be reassigned

df.filter($"column1".contains(keyword))
df.filter(lower($"column1").contains(keyword.toLowerCase)) // case-insensitive match


1

You can simply do it using string interpolation:

val data="some variable"
df.filter(s"column1 > $data")
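One caveat: this builds a SQL expression string, so if data holds a string value rather than a number, it has to be quoted inside the expression, for example (hypothetical string value):

val data = "some value"
df.filter(s"column1 = '$data'") // string literals need quotes inside the expression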


1
import org.apache.spark.sql.functions._
import spark.implicits._ // for the $ column syntax

val portfolio_name = "Product"

spark.sql("SELECT * FROM Test")
  .filter($"portfolio_name" === s"$portfolio_name")
  .show(100)


0

Here is a complete demo of filtering with <, >, and === on a numeric column, where mysearchid is a number declared as a val below:

scala> val numRows = 10

scala> val ds = spark.range(0, numRows)
ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> val df = ds.toDF("index")
df: org.apache.spark.sql.DataFrame = [index: bigint]

scala> df.show
+-----+
|index|
+-----+
|    0|
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
|    7|
|    8|
|    9|
+-----+


scala> val mysearchid = 9
mysearchid: Int = 9

scala> println("filter with less than")
filter with less than

scala> df.filter(df("index") < mysearchid).show
+-----+
|index|
+-----+
|    0|
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
|    7|
|    8|
+-----+


scala> println("filter with greater than ")
filter with greater than

scala> df.filter(df("index") > mysearchid).show
+-----+
|index|
+-----+
+-----+


scala> println("filter with equals ")
filter with equals

scala> df.filter(df("index") ===  mysearchid).show
+-----+
|index|
+-----+
|    9|
+-----+


0
val x = "2020-05-01"
df.filter($"column_name" === x).show()

This works if you want to compare a variable against every value in the column.
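As an aside, === is Column's equality operator (a plain == would compare the Column objects themselves rather than build a filter expression), and the $"..." syntax requires import spark.implicits._ to be in scope.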

