2

I am trying to join two dataframe with condition like "Wo" in "Hello World" i.e (dataframe1 col contains dataframe2 col1 value).

In HQL, we can use instr(t1.col1,t2.col1)>0

How can I achieve this same condtition in Dataframe in Scala ? I tried

df1.join(df2,df1("col1").indexOfSlice(df2("col1")) > 0)

But it throwing me the below error

error: value indexOfSlice is not a member of org.apache.spark.sql.Column

I just want to achive the below hql query using DataFrames.

select t1.*,t2.col1 from t1,t2 where instr(t1.col1,t2.col1)>0
2
  • First, this is a spark specific question and it should be tagged with the apache-spark. Secondly, It's not very clear what it is you are doing. Are you actually trying to perform a cross join with a condition ? Also wouldn't you care presenting a MVCE so we can try to help ? The error is obvious thought Commented Nov 2, 2017 at 13:39
  • @eliasah Yes, I am trying to do cross join. I will add sql example in to the question... Commented Nov 2, 2017 at 13:41

1 Answer 1

2

The following solution is tested with spark 2.2. You'll be needing to define a UDF and you can specify a join condition as part of where filter :

val indexOfSlice_ = (c1: String, c2: String) => c1.indexOfSlice(c2)
val islice = udf(indexOfSlice_)

val df10: DataFrame = Seq(("Hello World", 2), ("Foo", 3)).toDF("c1", "c2")
val df20: DataFrame = Seq(("Wo", 2), ("Bar", 3)).toDF("c3", "c4")

df10.crossJoin(df20).where(islice(df10.col("c1"), df20.col("c3")) > 0).show
// +-----------+---+---+---+
// |         c1| c2| c3| c4|
// +-----------+---+---+---+
// |Hello World|  2| Wo|  2|
// +-----------+---+---+---+

PS: Beware ! Using a cross-join is an expensive operation as it yields a cartesian join.

EDIT: Consider reading this when you want to use this solution.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.