Select Dataframe Columns based on Number of Nulls in Each

Question

I have seen similar questions, but what I am facing is slightly different. I am trying to select a subset of the columns in my dataframe, based on whether each column has fewer than 300 nulls.

df[df.columns[df.isnull().any()]].isnull().sum()<300

I have succeeded at creating this boolean array, but how would I pass this info back to select only df columns where this is True?

You can only accept a single answer... just fyi

– cs95, Commented Jun 4, 2018 at 22:43

BENY · Accepted Answer · 2018-06-04 02:56:06Z

4

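Use the thresh parameter of dropna, which the docs describe as "Require that many non-NA values." With thresh set to len(df) minus 300, a column is kept when it has at most 300 nulls: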

df.dropna(axis=1, thresh=len(df) - 300)
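As a minimal sketch of how thresh behaves, the example below uses a hypothetical five-row frame and a limit of 2 nulls in place of 300 (the column names and values are made up for illustration). Because thresh counts non-NA values, subtracting the limit from len(df) keeps columns with at most that many nulls; requiring one more non-NA value gives the strictly-fewer-than variant the question asks about.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 5 rows; columns a, b, c hold 0, 2, and 3 nulls.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1.0, np.nan, 3.0, np.nan, 5.0],
    "c": [np.nan, np.nan, np.nan, 4.0, 5.0],
})

max_nulls = 2  # stand-in for the 300 used in the answer

# thresh is a minimum count of non-NA values, so this keeps
# columns with at most max_nulls nulls.
at_most = df.dropna(axis=1, thresh=len(df) - max_nulls)

# For strictly fewer than max_nulls nulls, require one more non-NA value.
fewer_than = df.dropna(axis=1, thresh=len(df) - max_nulls + 1)

print(list(at_most.columns))
print(list(fewer_than.columns))
```

On this toy data, column b sits exactly on the limit, so it is the one whose presence differs between the two calls.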


cs95 · Accepted Answer · 2018-06-04 02:53:44Z

1

The any call is redundant; you can do this with just isnull/isna and sum:

v = df.isna().sum().lt(300)
df[v.index[v]]

Or,

df.loc[:, df.isna().sum().lt(300)]
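To see what each intermediate step produces, here is a small sketch on a hypothetical five-row frame, with a limit of 3 nulls standing in for 300 (the data and the names null_counts, mask, by_index, and by_loc are illustrative, not from the answer). It checks that both selection forms return the same columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 5 rows; columns a, b, c hold 0, 2, and 3 nulls.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1.0, np.nan, 3.0, np.nan, 5.0],
    "c": [np.nan, np.nan, np.nan, 4.0, 5.0],
})

limit = 3  # stand-in for the 300 used in the answer

null_counts = df.isna().sum()   # Series: column label -> number of nulls
mask = null_counts.lt(limit)    # True where a column has fewer than `limit` nulls

by_index = df[mask.index[mask]]  # select by the labels where the mask is True
by_loc = df.loc[:, mask]         # pass the boolean mask to .loc directly

print(null_counts)
print(list(by_loc.columns))
```

The .loc form avoids the intermediate variable, while the two-step form can be easier to inspect when debugging which columns were dropped.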
