How do you filter rows in a pandas dataframe conditional on columns existing?

Question

I have a dataframe containing counts of two things, which I've put in columns numA and numB. I want to find the rows where numA < x and numB < y, which can be done like so:

filtered_df = df[(df.numA < x) & (df.numB < y)]

This works when both numA and numB are present. However, neither column is guaranteed to appear in the dataframe. If only one column exists, I would still like to filter the rows based on it. This could be coded with something along the lines of

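filtered_df = df  # start from the full frame so the name exists even if numA is absent
if "numA" in df.columns:
    filtered_df = filtered_df[filtered_df.numA < x]
if "numB" in df.columns:
    filtered_df = filtered_df[filtered_df.numB < y]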

But this seems very repetitive, especially since in reality I have 9 columns like this, and each of them requires the same check. Is there a way to achieve the same thing with code that is more readable, easier to maintain and less tedious to write out?

You could fill in the missing entries with a default value (one that is lower than your check value, so the comparison evaluates to True) - pandas.DataFrame.fillna. Use a value that wouldn't otherwise occur and/or work on a copy of the data so that the fact there are missing entries doesn't get permanently wiped.
– Alan, Commented Feb 4, 2021 at 18:17
Theoretically you could chain operators together with isnull(). Not at my desk at the moment to test in full, hence a comment not an answer. e.g. (df.numA < x) | (df.numA.isnull())
– Alan, Commented Feb 4, 2021 at 18:20
@Alan there are no null values; the entire columns are potentially absent from the dataframe. For some context: I begin with a large dataframe containing more columns, and then delete some of the columns, potentially removing numA and numB along the way depending on certain conditions. So if numA isn't present then df.numA.isnull() raises an error: 'DataFrame' object has no attribute 'numA'.
– Alira, Commented Feb 4, 2021 at 19:24

noah · Accepted Answer · 2021-02-04 18:11:13Z

If you want an all-or-nothing type comparison I think a fairly easy way is to use set comparisons:

if set(list_of_cols_to_check).issubset(df.columns):
    filtered_df = df[(df.numA < x) & ... & (df.numB < y)]
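
For reference, here is a runnable sketch of the all-or-nothing check with just two columns. The sample data, the thresholds, and the choice to leave the frame unfiltered when a column is missing are assumptions made for illustration, not something the question specifies.

```python
import pandas as pd

# Hypothetical data and thresholds.
df = pd.DataFrame({"numA": [1, 5, 2], "numB": [0, 1, 3]})
x, y = 4, 2
list_of_cols_to_check = ["numA", "numB"]

if set(list_of_cols_to_check).issubset(df.columns):
    # Every required column is present, so the combined condition is safe.
    filtered_df = df[(df.numA < x) & (df.numB < y)]
else:
    # Assumption: leave the frame unfiltered when any column is missing.
    filtered_df = df
```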

If you want to perform the comparisons for every column that does exist, it gets a bit more complicated. It is not very different from what you have, but I'd probably do it as follows:

import pandas as pd

mask = pd.Series(True, index=df.index)  # all True, whatever the index type
mask = mask & (df.numA < 4) if 'numA' in df else mask
mask = mask & (df.numB < 2) if 'numB' in df else mask
mask = mask & (df.numC < 1) if 'numC' in df else mask
df[mask]
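
The same pattern generalises to any number of columns by keeping the thresholds in a dict and looping over it. The following is a minimal sketch rather than part of the original answer; the `filter_existing` helper, the `limits` mapping, and the sample data are hypothetical names chosen for illustration.

```python
import pandas as pd


def filter_existing(df, limits):
    """Keep rows where every listed column that exists is below its limit."""
    # Start with an all-True mask aligned to the frame's index.
    mask = pd.Series(True, index=df.index)
    for col, limit in limits.items():
        if col in df.columns:  # skip columns that were dropped earlier
            mask &= df[col] < limit
    return df[mask]


# Hypothetical data: numB is absent, so only numA and numC are checked.
df = pd.DataFrame({"numA": [1, 5, 2], "numC": [0, 0, 3]})
filtered_df = filter_existing(df, {"numA": 4, "numB": 2, "numC": 1})
```

Adding a tenth column then means adding one entry to the dict instead of another line of code.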


engineer-x · Accepted Answer · 2023-09-06 02:13:25Z

You can use a simpler solution if you're not sure whether the columns will be there when you filter for them:

df.loc[:, df.columns.isin(["numA", "numB", "numC"])]

df.columns.isin() returns a boolean array with one entry per column, True where the column name is in the list
.loc[] takes row and column selectors as arguments
the : means all rows
the second argument is the boolean array that picks the columns
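
To make the behaviour concrete, here is a small sketch on hypothetical data (the frame, its `other` column, and the single threshold `x` are made up for illustration). Note that the expression above picks out whichever of the named columns exist; it does not drop any rows by itself, so a row condition still has to be applied afterwards. The last two lines show one possible way to do that, assuming every column shares the same threshold.

```python
import pandas as pd

# Hypothetical frame in which numB and numC have been dropped.
df = pd.DataFrame({"numA": [1, 5, 2], "other": ["a", "b", "c"]})

wanted = ["numA", "numB", "numC"]
# Boolean array with one entry per column label: True where the label is in `wanted`.
present = df.columns.isin(wanted)
subset = df.loc[:, present]  # only the numA column survives

# Assumption: one shared threshold; keep rows where every surviving column is below it.
x = 4
filtered_df = df[(subset < x).all(axis=1)]
```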
