I am trying to join two tables of user events. I want to join table_a with table_b on the user ID (uid), keeping only matches where the difference between the two timestamps is smaller than 5 s (5000 ms).
Here is what I am doing:
table_a = (
    table_a
    .join(
        table_b,
        table_a.uid == table_b.uid
        & abs(table_b.b_timestamp - table_a.a_timestamp) < 5000
        & table_a.a_timestamp.isNotNull()
        ,
        how = 'left'
    )
)
I am getting 2 errors:
Error 1)
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
Error 2) This appears when I remove the 2nd condition from the join and leave only the 1st and 3rd:
org.apache.spark.sql.AnalysisException: cannot resolve '(`uid` AND (`a_timestamp` IS NOT NULL))' due to data type mismatch: differing types in '(`uid` AND (`a_timestamp` IS NOT NULL))' (string and boolean).;;
Any help is much appreciated!
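
One likely cause of both errors, offered as a sketch rather than a confirmed diagnosis: in Python, `&` binds more tightly than `==` and `<`, so without parentheses the join condition is grouped differently from what was intended. The snippet below uses only the standard-library `ast` module (no Spark needed) to show how Python parses the condition. The names `original`, `parenthesized`, and `top_level` are made up for illustration.

```python
import ast

# The join condition from the question, written as plain Python source text.
original = (
    "table_a.uid == table_b.uid"
    " & abs(table_b.b_timestamp - table_a.a_timestamp) < 5000"
    " & table_a.a_timestamp.isNotNull()"
)

# The same condition with each comparison wrapped in its own parentheses.
parenthesized = (
    "(table_a.uid == table_b.uid)"
    " & (abs(table_b.b_timestamp - table_a.a_timestamp) < 5000)"
    " & table_a.a_timestamp.isNotNull()"
)


def top_level(expr: str) -> ast.expr:
    """Return the root AST node of a single Python expression."""
    return ast.parse(expr, mode="eval").body


orig_node = top_level(original)
fixed_node = top_level(parenthesized)

# Without parentheses the whole condition is ONE chained comparison:
#   table_a.uid == (table_b.uid & abs(...)) < (5000 & ...isNotNull())
print(type(orig_node).__name__, [type(op).__name__ for op in orig_node.ops])
# → Compare ['Eq', 'Lt']

# With parentheses the root is a bitwise AND of separate conditions.
print(type(fixed_node).__name__, type(fixed_node.op).__name__)
# → BinOp BitAnd
```

Python evaluates a chained comparison with an implicit `and`, which would call `bool()` on a Column and so matches Error 1. The same grouping would turn the shorter condition into `table_a.uid == (table_b.uid & table_a.a_timestamp.isNotNull())`, which matches the string/boolean mismatch in Error 2. Wrapping each comparison in its own parentheses, as in `parenthesized`, should avoid both.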