
Is it possible to query Spark SQL by a list of ids?

Ideally, what I'm looking for is something like this:

val ids = Seq("123", "345", "456", "972")
df.filter(df("id") in ids)

Another ideal scenario would be if I could pass in a DataFrame that has a single column of ids.

val ids = df.map(r => r.getString(0))
dataDf.filter(dataDf("id") in ids)
  • @cheseaux DataFrames have a collection behind them; it's more like: create a DataFrame, then for a specific value, query values by that id. Commented Nov 2, 2016 at 18:37

2 Answers


I'm not sure I understood your question correctly, but you can use isin to filter based on a list of values. Here is an example:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(Seq(1,2,3)).toDF("id")
df.show

+---+
| id|
+---+
|  1|
|  2|
|  3|
+---+

Then you can filter using a List/Seq/Array, which you have to expand into varargs like this:

val ids = Array(1,2)
df.filter(df("id").isin(ids:_*)).show

+---+
| id|
+---+
|  1|
|  2|
+---+

Or you can also write the values inline: df.filter(df("id").isin(1, 2)).
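
If the ids live in another single-column DataFrame, as in the second scenario from the question, you can avoid collecting them to the driver with a left semi join. A minimal sketch, assuming a hypothetical idsDf holding the ids and dataDf holding the data (it reuses the implicits import from above):

// hypothetical single-column DataFrame of ids
val idsDf = Seq("123", "345", "456", "972").toDF("id")

// a left semi join keeps only the rows of dataDf whose id appears in idsDf,
// without collecting the ids to the driver
val filtered = dataDf.join(idsDf, Seq("id"), "leftsemi")
filtered.show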


You can do it with something like df.filter($"id".isin(ids: _*)). For more information, look at the documentation of isin() defined on the org.apache.spark.sql.Column class.
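
For completeness, here is a minimal self-contained sketch of that approach; the sample values are made up:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._ // enables the $"col" column syntax

val df = sc.parallelize(Seq(1, 2, 3)).toDF("id")
val ids = Seq(1, 2)

// isin takes varargs, so the Seq is expanded with : _*
df.filter($"id".isin(ids: _*)).show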

