129

I have a pandas DataFrame like this:

                    a         b
2011-01-01 00:00:00 1.883381  -0.416629
2011-01-01 01:00:00 0.149948  -1.782170
2011-01-01 02:00:00 -0.407604 0.314168
2011-01-01 03:00:00 1.452354  NaN
2011-01-01 04:00:00 -1.224869 -0.947457
2011-01-01 05:00:00 0.498326  0.070416
2011-01-01 06:00:00 0.401665  NaN
2011-01-01 07:00:00 -0.019766 0.533641
2011-01-01 08:00:00 -1.101303 -1.408561
2011-01-01 09:00:00 1.671795  -0.764629

Is there an efficient way to find the "integer" index of rows with NaNs? In this case the desired output should be [3, 6].

2 Comments
  • 16
    If you just want to select the rows with nan, you can do df[np.isnan(df['b'])] Commented Dec 24, 2012 at 3:38
  • 4
    Following up from @lazy1 - instead of using numpy's isnan you can also use df['b'].isnull() Commented Mar 31, 2015 at 20:42
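For reference, a minimal sketch of the two comment suggestions, on a frame rebuilt to match the question (the random values are illustrative, not the question's exact numbers):

import numpy as np
import pandas as pd

# Rebuild a frame shaped like the question's, with NaNs at positions 3 and 6
idx = pd.date_range('2011-01-01', periods=10, freq='h')
df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)}, index=idx)
df.loc[idx[[3, 6]], 'b'] = np.nan

rows_np = df[np.isnan(df['b'])]   # numpy-based check from the first comment
rows_pd = df[df['b'].isnull()]    # pandas-native check from the second comment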

16 Answers

160

Here is a simpler solution:

inds = pd.isnull(df).any(1).nonzero()[0]

In [9]: df
Out[9]: 
          0         1
0  0.450319  0.062595
1 -0.673058  0.156073
2 -0.871179 -0.118575
3  0.594188       NaN
4 -1.017903 -0.484744
5  0.860375  0.239265
6 -0.640070       NaN
7 -0.535802  1.632932
8  0.876523 -0.153634
9 -0.686914  0.131185

In [10]: pd.isnull(df).any(1).nonzero()[0]
Out[10]: array([3, 6])

7 Comments

I ended up using this: np.where(df['b'].isnull())[0]
You could probably simplify this further: r, _ = np.where(df.isna())
add .to_numpy() to convert in numpy array first - pd.isnull(df).any(1).to_numpy().nonzero()
AttributeError: 'Series' object has no attribute 'nonzero'
for pandas version 0.25 and on use pd.isnull(df).any(1).to_numpy().nonzero() as 7bStan mentioned. This will fix Joe Huang's problem.
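To make that concrete for pandas 0.25 and later, where Series.nonzero() is gone, an equivalent of the accepted one-liner might look like this (a sketch, not part of the original answer):

import numpy as np
import pandas as pd

df = pd.DataFrame({0: [0.45, -0.67, 0.59], 1: [0.06, np.nan, np.nan]})

# Route through a NumPy array, since Series.nonzero() was removed
inds = pd.isnull(df).any(axis=1).to_numpy().nonzero()[0]

# np.flatnonzero is a slightly shorter equivalent
inds = np.flatnonzero(df.isna().any(axis=1))
print(inds)  # [1 2]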
55

For DataFrame df:

import numpy as np
index = df['b'].index[df['b'].apply(np.isnan)]

will give you back the index (here a DatetimeIndex) that you can use to index back into df, e.g.:

df['a'].ix[index[0]]
>>> 1.452354

For the integer index:

df_index = df.index.values.tolist()
[df_index.index(i) for i in index]
>>> [3, 6]

1 Comment

As intuitive as ix sounds, it has since been deprecated in favour of loc/iloc.
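On recent pandas, the label-to-position loop above can also be replaced with Index.get_indexer, which maps labels to integer positions in one vectorized call (a sketch; the frame is rebuilt to match the question):

import numpy as np
import pandas as pd

idx = pd.date_range('2011-01-01', periods=10, freq='h')
df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)}, index=idx)
df.loc[idx[[3, 6]], 'b'] = np.nan

# Label index of the NaN rows, as in the answer above
index = df['b'].index[df['b'].apply(np.isnan)]

# Integer positions without a Python-level list.index() loop
print(df.index.get_indexer(index))  # [3 6]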
30

A one-line solution. However, it works for one column only, and it returns index labels rather than integer positions.

df.loc[pandas.isna(df["b"]), :].index

1 Comment

This is what I was looking for. I made it into a list by wrapping it in list(...), like this: list(df.loc[pandas.isna(df["b"]), :].index)
12

And in case you want to find the coordinates of NaN for all the columns instead (assuming they are all numerical), here you go:

df = pd.DataFrame([[0,1,3,4,np.nan,2],[3,5,6,np.nan,3,3]])

df
   0  1  2    3    4  5
0  0  1  3  4.0  NaN  2
1  3  5  6  NaN  3.0  3

np.where(np.asanyarray(np.isnan(df)))
(array([0, 1]), array([4, 3]))
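If row/column pairs are handier than the two parallel arrays, the same result can be zipped together (a small sketch on the same frame):

import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 3, 4, np.nan, 2], [3, 5, 6, np.nan, 3, 3]])

# Pair up the row and column positions returned by np.where
rows, cols = np.where(np.isnan(df.to_numpy()))
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)  # [(0, 4), (1, 3)]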

Comments

12

Don't know if this is too late, but you can use np.where to find the indices of NaN values as such:

indices = list(np.where(df['b'].isna())[0])

Comments

6

In case you have a datetime index and you want the index values:

df.loc[pd.isnull(df).any(axis=1), :].index.values

Comments

5

Here are tests for a few methods:

%timeit np.where(np.isnan(df['b']))[0]
%timeit pd.isnull(df['b']).nonzero()[0]
%timeit np.where(df['b'].isna())[0]
%timeit df.loc[pd.isna(df['b']), :].index

And their corresponding timings:

333 µs ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
280 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
313 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
6.84 ms ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It would appear that pd.isnull(df['b']).nonzero()[0] wins the day in terms of timing, though the top three methods all have comparable performance.
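The answer does not show the frame being timed; a minimal setup sketch for rerunning the benchmark (the size and NaN fraction here are assumptions) could be:

import numpy as np
import pandas as pd

# Hypothetical test frame; the original benchmark data is not specified
n = 100_000
df = pd.DataFrame({'b': np.random.randn(n)})
df.loc[df.sample(frac=0.01, random_state=0).index, 'b'] = np.nan

%timeit np.where(df['b'].isna())[0]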

Comments

3

Another simple solution is list(np.where(df['b'].isnull())[0])

Comments

3

This will give you the index labels of rows with NaN in any column:

df.loc[pd.isna(df).any(axis=1), :].index

1 Comment

This creates a new data frame with all rows containing NaN values, then returns its index.

1

Here is another, simpler take:

df = pd.DataFrame([[0,1,3,4,np.nan,2],[3,5,6,np.nan,3,3]])

inds = np.asarray(df.isnull()).nonzero()

(array([0, 1], dtype=int64), array([4, 3], dtype=int64))

Comments

1

I was looking for all indexes of rows with NaN values.
My working solution:

def get_nan_indexes(data_frame):
    indexes = []
    for column in data_frame:
        # Collect the labels of every row where this column is NaN
        index = data_frame[column].index[data_frame[column].apply(np.isnan)]
        indexes.extend(index)
    # Map the labels back to integer positions
    df_index = data_frame.index.values.tolist()
    return [df_index.index(i) for i in set(indexes)]
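A quick usage sketch on a frame shaped like the question's (the random values are illustrative):

import numpy as np
import pandas as pd

idx = pd.date_range('2011-01-01', periods=10, freq='h')
df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)}, index=idx)
df.loc[idx[[3, 6]], 'b'] = np.nan

print(get_nan_indexes(df))  # [3, 6] (order may vary, since a set is used)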

Comments

0

Let the dataframe be named df, and let the column of interest (i.e. the column in which we are trying to find nulls) be 'b'. Then the following snippet prints the integer index of each null in that column:

for i in range(df.shape[0]):
    if df['b'].isnull().iloc[i]:
        print(i)

Comments

0
index_nan = []
for index, bool_v in df["b"].isna().items():
    if bool_v:
        index_nan.append(index)
print(index_nan)

Comments

0

A quick solution to the question is:

# Find the integer index of nulls
nan_idx = np.where(df['column_name'].isnull())[0]

# Map back to the actual (label) index of the NaNs
nan_idx = df.iloc[nan_idx].index

Comments

0

Easy solution:

# Find the index of rows with nulls in any column

indx = df[df.isnull().any(axis=1)].index

# Find the index of nulls of a single column

indx_A = df[df['A'].isnull()].index

# Find the index of rows with nulls in a group of columns

col_list = ['A','B','C']

indx_col_list = df[df[col_list].isnull().any(axis=1)].index

Comments

0

A DataFrame object has a built-in method isna() these days, which means you could also solve it as follows:

In case one NaN value is sufficient to return the index:

index_na = df.index[df.isna().any(axis=1)]

In case all of them have to be NaN:

index_na = df.index[df.isna().all(axis=1)]

To return the numeric index for the first case:

index_na_num = np.where(df.isna().any(axis=1))[0]

Comments
