I have seen similar questions, but my case is slightly different. I am trying to select a subset of the columns in my dataframe, keeping only the columns that have fewer than 300 nulls.

df[df.columns[df.isnull().any()]].isnull().sum()<300

I have succeeded in creating this boolean Series, but how do I pass it back to select only the df columns where it is True?

  • You can only accept a single answer... just fyi Commented Jun 4, 2018 at 22:43

2 Answers

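Use the thresh parameter of dropna. From the docs: "Require that many non-NA values." With thresh = len(df) - 300, a column is kept only if it has at least len(df) - 300 non-NA values, that is, at most 300 nulls. For strictly fewer than 300 nulls, use len(df) - 299.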

df.dropna(axis = 1,thresh = len(df)-300)
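
To make the thresh arithmetic concrete, here is a minimal sketch on a toy frame. The column names, the data, and the `max_nulls` variable are made up for illustration, with a limit of 2 standing in for 300. It compares "at most N nulls" with "strictly fewer than N nulls".

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 5 rows, with 0, 2 and 4 nulls per column.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1, np.nan, 3, np.nan, 5],
    "c": [np.nan, np.nan, np.nan, np.nan, 5],
})

max_nulls = 2  # illustrative stand-in for the 300 in the question

# thresh counts non-NA values, so this keeps columns with AT MOST max_nulls nulls.
at_most = df.dropna(axis=1, thresh=len(df) - max_nulls)

# Raising thresh by one keeps columns with STRICTLY FEWER than max_nulls nulls.
fewer = df.dropna(axis=1, thresh=len(df) - max_nulls + 1)

print(list(at_most.columns))
print(list(fewer.columns))
```

Column "b" has exactly 2 nulls, so it survives the first call but not the second, which is the boundary case the question's "less than 300" wording cares about.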

The any is redundant; you can do this with just isnull/isna and sum:

v = df.isna().sum().lt(300)
df[v.index[v]]

Or,

df.loc[:, df.isna().sum().lt(300)]
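
As a rough illustration of why both forms work, the sketch below builds the mask step by step on a toy frame. The data and the `limit` variable are invented for the example, with a limit of 3 standing in for 300. The per-column null counts form a Series indexed by column name, so the boolean result of lt lines up with the columns and can be passed straight to loc.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 5 rows, with 0, 2 and 4 nulls per column.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1, np.nan, 3, np.nan, 5],
    "c": [np.nan, np.nan, np.nan, np.nan, 5],
})

limit = 3  # illustrative stand-in for the 300 in the question

null_counts = df.isna().sum()   # Series: column name -> number of nulls
mask = null_counts.lt(limit)    # Series: column name -> True if nulls < limit

selected = df.loc[:, mask]      # boolean selection along the column axis
same = df[mask.index[mask]]     # equivalent: select by the surviving labels

print(null_counts)
print(list(selected.columns))
```

Unlike the dropna approach, this states the condition directly in terms of null counts, so switching between lt and le is enough to move between "fewer than" and "at most".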
