I want to filter a SchemaRDD with language-integrated SQL, using SQL functions in the filter condition. For example, I want to run the equivalent of:

SELECT name FROM people WHERE name LIKE '%AHSAN%' AND name regexp '^[A-Z]{20}$'

How can I use such SQL functions in people.where()?

Reference:

For language-integrated SQL, I am following this example:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val people: RDD[Person] = ... // An RDD of case class objects, from the first example.
// The following is the same as 'SELECT name FROM people WHERE age >= 10 AND age <= 19'
val teenagers = people.where('age >= 10).where('age <= 19).select('name)
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)

Thanks in advance!

1 Answer

You can use SQL functions in where() the same way you use the numeric operators. For example:

people.where('name like "%AHSAN%").where('name rlike "^[A-Z]{20}$").select('name)

There is no regexp in Spark SQL, but rlike does the same thing.
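To see which names the two conditions would keep, here is a minimal plain-Scala sketch that needs no Spark. It assumes that LIKE '%AHSAN%' behaves as a substring test and that rlike behaves as a Java regex search; PatternCheck, its methods, and the sample names are hypothetical and only for illustration.

```scala
// Plain-Scala stand-ins for the two predicates; no Spark required.
object PatternCheck {
  // Assumption: LIKE '%AHSAN%' is treated as a substring test.
  def likeAhsan(name: String): Boolean = name.contains("AHSAN")

  // Assumption: rlike is treated as a Java regex search; the ^ and $ anchors
  // force the whole string to be exactly 20 uppercase letters.
  private val twentyUpper = "^[A-Z]{20}$".r
  def rlikeTwentyUpper(name: String): Boolean =
    twentyUpper.findFirstIn(name).isDefined

  // Both conditions together, like the two chained where() calls.
  def matchesBoth(name: String): Boolean =
    likeAhsan(name) && rlikeTwentyUpper(name)

  // Hypothetical sample data, not taken from the question.
  val sampleNames: Seq[String] = Seq(
    "AHSAN",                 // contains AHSAN but is only 5 letters long
    "ABCDEFGHIJKLMNOPQRST",  // 20 uppercase letters, no AHSAN
    "AHSANABCDEFGHIJKLMNO",  // 20 uppercase letters and contains AHSAN
    "ahsanABCDEFGHIJKLMNO"   // lowercase prefix fails both tests
  )

  val selected: Seq[String] = sampleNames.filter(matchesBoth)
}
```

Because of the anchors, only names that are exactly 20 uppercase letters long and contain AHSAN pass both tests.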

2 Comments

Can we have two where conditions?
Yep. Multiple where conditions are okay.
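As a rough analogy on ordinary Scala collections, chaining filters keeps the same rows as a single filter whose condition joins the parts with AND, which matches the comment in the question's example. This sketch assumes where() acts like filter; the Person data and the ChainedFilters object are hypothetical.

```scala
object ChainedFilters {
  // Hypothetical record type and data, standing in for the people RDD.
  case class Person(name: String, age: Int)

  val people: Seq[Person] = Seq(
    Person("Ann", 9), Person("Bob", 13), Person("Cid", 19), Person("Dee", 20)
  )

  // Two chained conditions, like people.where(...).where(...).
  val chained: Seq[String] =
    people.filter(_.age >= 10).filter(_.age <= 19).map(_.name)

  // One combined condition, like WHERE age >= 10 AND age <= 19.
  val combined: Seq[String] =
    people.filter(p => p.age >= 10 && p.age <= 19).map(_.name)
}
```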
