
I'm trying to get the first non-null value from multiple pandas Series in a DataFrame.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, np.nan, np.nan, np.nan],
                   'b': [np.nan, 5, np.nan, np.nan],
                   'c': [np.nan, 55, 13, 14],
                   'd': [np.nan, np.nan, np.nan, 4],
                   'e': [12, np.nan, np.nan, 22],
                   })

     a    b     c    d     e
0  2.0  NaN   NaN  NaN  12.0
1  NaN  5.0  55.0  NaN   NaN
2  NaN  NaN  13.0  NaN   NaN
3  NaN  NaN  14.0  4.0  22.0

In this df I want to create a new column 'f' and set it equal to 'a' where 'a' is not null, otherwise 'b' where 'b' is not null, and so on down to 'e'.

I could write a bunch of nested np.where statements, which is inefficient:

df['f'] = np.where(df.a.notnull(), df.a,
              np.where(df.b.notnull(), df.b,
                   np.where(df.c.notnull(), df.c,
                        np.where(df.d.notnull(), df.d, df.e))))

I looked into doing df.a or df.b or df.c etc.
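
That fails outright, because Python's or needs a single truth value per operand and a Series doesn't have one; a quick sketch of the error:

try:
    df['f'] = df.a or df.b or df.c
except ValueError as e:
    print(e)  # "The truth value of a Series is ambiguous. ..."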

The result should look like this:

     a    b     c    d     e   f
0  2.0  NaN   NaN  NaN  12.0   2
1  NaN  5.0  55.0  NaN   NaN   5
2  NaN  NaN  13.0  NaN   NaN  13
3  NaN  NaN  14.0  4.0  22.0  14

3 Answers


One solution:

df.groupby(['f']*df.shape[1], axis=1).first()
Out[385]: 
      f
0   2.0
1   5.0
2  13.0
3  14.0
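
To get the new column onto the original frame, the result can be assigned back; the repeated group key 'f' becomes the resulting column name (a sketch; note that axis=1 in groupby is deprecated in recent pandas versions):

df['f'] = df.groupby(['f'] * df.shape[1], axis=1).first()['f']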

The other:

df.bfill(1)['a']
Out[388]: 
0     2.0
1     5.0
2    13.0
3    14.0
Name: a, dtype: float64
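
To land that in a new column 'f' as asked, just assign it back; using iloc instead of ['a'] avoids hard-coding the first column's name (a small sketch):

df['f'] = df.bfill(axis=1).iloc[:, 0]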

2 Comments

Nice, but no need for a numpy array: df.groupby(['f']*df.shape[1], axis=1).first(); I also name axis=1 because I don't know the order of the arguments of groupby by heart :)
df.bfill(1)['a'] seems to be the most efficient though!!

You could also use first_valid_index

In [336]: df.apply(lambda x: x.loc[x.first_valid_index()], axis=1)
Out[336]:
0     2.0
1     5.0
2    13.0
3    14.0
dtype: float64

Or, stack and groupby

In [359]: df.stack().groupby(level=0).first()
Out[359]:
0     2.0
1     5.0
2    13.0
3    14.0
dtype: float64
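
This works because stack drops NaN by default, so grouping on the original row index (level=0) and taking first() returns the leftmost surviving value per row. The stacked intermediate looks roughly like this:

df.stack()
0  a     2.0
   e    12.0
1  b     5.0
   c    55.0
2  c    13.0
3  c    14.0
   d     4.0
   e    22.0
dtype: float64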

Or, first_valid_index with lookup

In [355]: df.lookup(df.index, df.apply(pd.Series.first_valid_index, axis=1))
Out[355]: array([ 2.,  5., 13., 14.])
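
Note that DataFrame.lookup was deprecated and has been removed in newer pandas releases; a roughly equivalent sketch using plain numpy indexing (the variable name cols is just illustrative):

cols = df.apply(pd.Series.first_valid_index, axis=1)
df.to_numpy()[np.arange(len(df)), df.columns.get_indexer(cols)]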

1 Comment

Note that df.first_valid_index() is to be used with df.loc (and not df.iloc).

You can also use numpy for this:

# position of the first non-NaN value in each row
first_valid = (~np.isnan(df.values)).argmax(1)

Then use indexing:

df.assign(valid=df.values[range(len(first_valid)), first_valid])

     a    b     c    d     e  valid
0  2.0  NaN   NaN  NaN  12.0    2.0
1  NaN  5.0  55.0  NaN   NaN    5.0
2  NaN  NaN  13.0  NaN   NaN   13.0
3  NaN  NaN  14.0  4.0  22.0   14.0

