I have a dataframe that consists of two columns:

+--------------+------------+
|             A|           B|
+--------------+------------+
|       [b,  c]|   [a, b, c]|
|           [a]|      [c, d]|
|       [a,  c]|   [b, c, e]|
|       [b,  c]|      [a, b]|
|           [a]|   [a, d, e]|
|       [a,  c]|         [b]|
+--------------+------------+

Schema:

 |-- A: string (nullable = true)
 |-- B: array (nullable = true)
 |    |-- element: string (containsNull = true)

I want to add a new column which must be 0 if the intersection of A and B is an empty list ([]) and 1 otherwise. I tried the code below, but it does not work at all:

df.withColumn('Check', when (list((set(col('A'))&set(col('B')))) !=[] , 0).otherwise(1)).show()

Thank you for your help

1 Answer

I want to add a new column which must be 0 if the intersection of A and B is an empty list ([]) and 1 otherwise.

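You can use array_intersect (available since Spark 2.4) together with size, and then either cast the boolean comparison to an integer or use when + otherwise. Both columns must be arrays for this to work: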

import pyspark.sql.functions as F
df.withColumn("Check",(F.size(F.array_intersect("A","B"))!=0).cast("Integer")).show()

or:

df.withColumn("Check",F.when(F.size(F.array_intersect("A","B"))==0,0).otherwise(1)).show()

+------+---------+-----+
|     A|        B|Check|
+------+---------+-----+
|[b, c]|[a, b, c]|    1|
|   [a]|   [c, d]|    0|
|[a, c]|[b, c, e]|    1|
|[b, c]|   [a, b]|    1|
|   [a]|[a, d, e]|    1|
|[a, c]|      [b]|    0|
+------+---------+-----+

4 Comments

Thank you for your answer. However, it shows me this error message: cannot resolve 'array_intersect(values, values2)' due to data type mismatch
Are those columns arrays or strings? Can you please print the schema and post it in the body of the question?
Yes, it's my fault: column A is a string. I must convert it to a list. Thank you
@Mus No worries. Something like this might help you, or, if you control the input, changing the type before consuming it would be even better :)
