I am trying to add a column to a PySpark DataFrame that is 'true' when the value of one of its columns appears in a column of another DataFrame. I have tried the following, but I believe the isin() call is wrong because I am passing it a single-column DataFrame.
customers:
customer_id  name
1            John
2            Mary
3            Jane
4            Jack
5            Emma
customer_referred_customer:
from  to
1     3
2     4
Desired result:
customer_id  name  is_referral
1            John  false
2            Mary  false
3            Jane  true
4            Jack  true
5            Emma  false
Attempt:
customers.withColumn(
    "is_referral",
    F.when(
        F.col("customer_id").isin(
            customer_referred_customer.select("to")
        ),
        F.lit("true"),
    ).otherwise(F.lit("false")),
)
How can I fix this?