I have a PySpark DataFrame that looks like this:

df

num11   num21 
10     10
20     30 
5      25

I want to filter this DataFrame on every column, keeping only the rows where each value is greater than or equal to 10. The number of columns can be more than two.

from pyspark.sql.functions import col
col_list = df.schema.names
df_fltered = df.where(col(c) >= 10 for c in col_list)

The desired output is:

num11    num21
10       10
20       30

How can I filter on multiple columns by iterating over the column list, as attempted above? All efforts are appreciated.

The error I receive is: condition should be string or column

2 Answers

As an alternative, if you are not averse to SQL-like snippets of code, the following should work:

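# Join with " AND " (spaces included) so the keyword is separated from the conditions.
# The loop variable is c so it does not shadow the imported col function.
df.where(" AND ".join("(%s >= 10)" % c for c in col_list))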
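
To check what string the join produces before handing it to Spark, here is a minimal sketch in plain Python. The helper name build_filter_expr is hypothetical, and the column names are the ones from the question; it assumes the column names need no quoting.

```python
def build_filter_expr(columns, threshold):
    """Return a SQL predicate requiring every listed column to be >= threshold."""
    # Each column gets its own parenthesised comparison, joined by AND.
    return " AND ".join("({} >= {})".format(c, threshold) for c in columns)


expr = build_filter_expr(["num11", "num21"], 10)
print(expr)  # (num11 >= 10) AND (num21 >= 10)
```

The resulting string can then be passed to df.where(expr), since where accepts a SQL expression string as its condition.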

You can use functools.reduce to combine the per-column conditions with &, which simulates an "all" condition: reduce(lambda x, y: x & y, ...).

import pyspark.sql.functions as F
from functools import reduce

df.where(reduce(lambda x, y: x & y,  (F.col(x) >= 10 for x in df.columns))).show()
+-----+-----+
|num11|num21|
+-----+-----+
|   10|   10|
|   20|   30|
+-----+-----+
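
The same folding can be wrapped in a small helper. This is only a sketch: all_of and where_all_at_least are hypothetical names, and the second function assumes df is a PySpark DataFrame whose columns are all numeric. It uses df[c] to obtain each Column, so the block itself needs no pyspark import.

```python
from functools import reduce
from operator import and_


def all_of(conditions):
    """AND together an iterable of conditions (Spark Columns, booleans, ...)."""
    conditions = list(conditions)
    if not conditions:
        # reduce() without an initial value fails on an empty sequence.
        raise ValueError("need at least one condition")
    return reduce(and_, conditions)


def where_all_at_least(df, threshold):
    """Keep the rows of df in which every column is >= threshold."""
    return df.where(all_of(df[c] >= threshold for c in df.columns))


# The folding itself can be checked with plain booleans:
print(all_of([True, True, True]))   # True
print(all_of([True, False, True]))  # False
```

With the DataFrame from the question, where_all_at_least(df, 10).show() should give the same two rows as above.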

2 Comments

Thanks for the reply. I tried the above logic as df_pass = df3.where(reduce(lambda x, y: x | y, (col(x).cast('integer') == 2 | col('status1').cast('string') == 'present in source only' for x in columns1))), but I get an error when I pass two columns: "Py4JException: Method or([class java.lang.Integer]) does not exist"
You need to wrap each comparison in parentheses, because | has higher operator precedence than ==: df3.where(reduce(lambda x, y: x | y, ((col(x).cast('integer') == 2) | (col('status1').cast('string') == 'present in source only') for x in columns1)))
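
The precedence point can be seen without Spark by asking Python how it parses the two forms. This is a small illustration using the standard ast module; the names a, b and c are placeholders, not columns from the question.

```python
import ast

# Without parentheses, | binds first: the expression is a chained
# comparison  a == (2 | b) == c.
unparenthesized = ast.parse("a == 2 | b == c", mode="eval").body
print(type(unparenthesized).__name__)  # Compare

# With parentheses, the top-level node is the | operator, and each
# operand is one complete comparison.
parenthesized = ast.parse("(a == 2) | (b == c)", mode="eval").body
print(type(parenthesized).__name__)  # BinOp
```

In the unparenthesized form, the sub-expression 2 | b is evaluated first, which is presumably why Spark complains about calling or with an Integer argument.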
