
I am trying to join two tables of user events. I want to join table_a with table_b on the user ID (uid), keeping only pairs whose timestamps differ by less than 5 s (5000 ms).

Here is what I am doing:

table_a = (
  table_a
  .join(
  table_b,
    table_a.uid == table_b.uid 
     & abs(table_b.b_timestamp - table_a.a_timestamp) < 5000 
     & table_a.a_timestamp.isNotNull()
  ,
  how = 'left'
  )
) 

I am getting 2 errors:

Error 1) ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

Error 2) If I remove the 2nd condition from the join and leave only the 1st and 3rd: org.apache.spark.sql.AnalysisException: cannot resolve '(uid AND (a_timestamp IS NOT NULL))' due to data type mismatch: differing types in '(uid AND (a_timestamp IS NOT NULL))' (string and boolean).;;

Any help is much appreciated!

Your conditions need parentheses, e.g. ((condition1) & (condition2) & ...). This is a common issue with PySpark. For the second error, consider first filling NA values with 0 and, in the second condition, casting your two timestamp values to the same type (double, decimal, integer, or similar). Then you can just use "not equal to zero" for your last condition. Commented Apr 13, 2020 at 20:00

1 Answer

You just need parentheses around each filtering condition. For example, the following works:

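# Assumes an active SparkSession named `spark`.
# Column-aware abs; note that this shadows Python's built-in abs.
from pyspark.sql.functions import abs

df1 = spark.createDataFrame([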
    (1, 20),
    (1, 21),
    (1, 25),
    (1, 30),
    (2, 21),
], ['id', 'val'])

df2 = spark.createDataFrame([
    (1, 21),
    (2, 30),
], ['id', 'val'])

df1.join(
    df2, 
    (df1.id == df2.id) 
    & (abs(df1.val - df2.val) < 5)
).show()
# +---+---+---+---+
# | id|val| id|val|
# +---+---+---+---+
# |  1| 20|  1| 21|
# |  1| 21|  1| 21|
# |  1| 25|  1| 21|
# +---+---+---+---+

But without parens:

df1.join(
    df2, 
    df1.id == df2.id
    & abs(df1.val - df2.val) < 5
).show()
# ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
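
To see why the parentheses matter, it can help to look at how Python itself groups the unparenthesized condition, since `&` binds more tightly than `==` and `<`. The sketch below uses only the standard-library `ast` module, so it runs without Spark. The helper `show_grouping` and the names `uid_a`, `uid_b`, `diff`, and `ts_not_null` are placeholders invented for this illustration, standing in for the column expressions in the question.

```python
import ast

# Only the operators that appear in the join condition are mapped here.
_SYMBOLS = {ast.Eq: "==", ast.Lt: "<", ast.BitAnd: "&"}


def show_grouping(source: str) -> str:
    """Return `source` with the parser's implicit grouping made explicit."""
    return _fmt(ast.parse(source, mode="eval").body)


def _fmt(node: ast.AST) -> str:
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.BinOp):
        return f"({_fmt(node.left)} {_SYMBOLS[type(node.op)]} {_fmt(node.right)})"
    if isinstance(node, ast.Compare):
        # A chained comparison such as `a == b < c` is a single Compare node.
        parts = [_fmt(node.left)]
        for op, right in zip(node.ops, node.comparators):
            parts += [_SYMBOLS[type(op)], _fmt(right)]
        return "(" + " ".join(parts) + ")"
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


# All three conditions, no parentheses (the shape behind Error 1).
print(show_grouping("uid_a == uid_b & diff < 5000 & ts_not_null"))
# (uid_a == (uid_b & diff) < (5000 & ts_not_null))

# Only the 1st and 3rd conditions (the shape behind Error 2).
print(show_grouping("uid_a == uid_b & ts_not_null"))
# (uid_a == (uid_b & ts_not_null))

# Each condition parenthesized, as in the working example.
print(show_grouping("(uid_a == uid_b) & (diff < 5000) & ts_not_null"))
# (((uid_a == uid_b) & (diff < 5000)) & ts_not_null)
```

The first result is a chained comparison, which Python evaluates as `(x == y) and (y < z)`; the `and` asks for a plain bool, and a PySpark Column refuses that conversion, which produces the ValueError. The second result has the same shape as the expression in the AnalysisException: the string column `uid` ends up as an operand of AND next to the boolean IS NOT NULL check.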
1 Comment

Nailed it! Was bound to lose a few more hours before I found this one. Many thanks.