Dropping infinite values from dataframes in pandas?

Question

How do I drop nan, inf, and -inf values from a DataFrame without resetting mode.use_inf_as_null?

Can I tell dropna to include inf in its definition of missing values so that the following works?

df.dropna(subset=["col1", "col2"], how="all")

Mateen Ulhaq · Accepted Answer · 2022-06-20 01:43:58Z

716

First replace() infs with NaN:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

and then drop NaNs via dropna():

df.dropna(subset=["col1", "col2"], how="all", inplace=True)

For example:

>>> df = pd.DataFrame({"col1": [1, np.inf, -np.inf], "col2": [2, 3, np.nan]})
>>> df
   col1  col2
0   1.0   2.0
1   inf   3.0
2  -inf   NaN

>>> df.replace([np.inf, -np.inf], np.nan, inplace=True)
>>> df
   col1  col2
0   1.0   2.0
1   NaN   3.0
2   NaN   NaN

>>> df.dropna(subset=["col1", "col2"], how="all", inplace=True)
>>> df
   col1  col2
0   1.0   2.0
1   NaN   3.0

The same method also works for Series.
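As a minimal sketch of the Series case, assuming a small made-up Series `s` (the data and the name `cleaned` are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample Series containing both infinities and a NaN.
s = pd.Series([1.0, np.inf, -np.inf, np.nan, 5.0])

# Without inplace=True, replace() returns a new Series: inf and -inf
# become NaN, and dropna() then removes every NaN.
cleaned = s.replace([np.inf, -np.inf], np.nan).dropna()

print(cleaned.tolist())
```

Chaining the two calls avoids `inplace=True` and leaves `s` itself unchanged.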

edited Jun 20, 2022 at 1:43

Mateen Ulhaq

27.9k21 gold badges121 silver badges155 bronze badges

answered Jul 4, 2013 at 21:50

Andy Hayden

378k110 gold badges640 silver badges546 bronze badges


3 Comments

3kstc Over a year ago

How can one "exchange" the inf values to a predefined int such as 0, in a certain column?

Andy Hayden Over a year ago

@3kstc use .replace(..., 0). To do this for only certain columns, update just those columns, i.e. df[cols] = df[cols].replace(..., 0)

Marco Over a year ago

Maybe it's worth specifying that replace does not work in place by default, so a new DataFrame is returned unless inplace=True is passed.
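The two points raised in these comments can be sketched together. This assumes a made-up frame and a hypothetical column list `cols`; it is an illustration, not code from the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.inf, -np.inf], "col2": [np.inf, 3.0, 4.0]})
cols = ["col1"]  # hypothetical choice of columns to clean

# Without inplace=True, replace() leaves df untouched and returns a copy,
# so the result has to be assigned back to the selected columns.
df[cols] = df[cols].replace([np.inf, -np.inf], 0)

print(df)
```

Only `col1` is changed here; the `inf` in `col2` is left alone because that column is not listed in `cols`.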

score 90 · Accepted Answer · 2024-06-05 17:03:55Z

90

DEPRECATED

With option context, this is possible without permanently setting use_inf_as_na. For example:

with pd.option_context('mode.use_inf_as_na', True):
    df = df.dropna(subset=['col1', 'col2'], how='all')

Of course it can be set to treat inf as NaN permanently with

pd.set_option('use_inf_as_na', True)

For older versions, replace use_inf_as_na with use_inf_as_null.

edited Jun 5, 2024 at 17:03

answered Aug 17, 2017 at 23:10

user2285236

4 Comments

ijoseph Over a year ago

This is the most readable answer and is consequently the best, even though it violates in letter (but not in spirit) the original question.

Håkon T. Over a year ago

As of pandas 0.24 (at least), use_inf_as_null has been deprecated and will be removed in a future version. Use use_inf_as_na instead. Add to/update answer?

TaoPR Over a year ago

This one is a better choice for treating inf as null at the global settings level instead of at the operation level. It could potentially save the time spent imputing the values first.

user2229219 Over a year ago

Note that this solution now throws a FutureWarning:

FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
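Following the warning's advice, one option-free alternative is to convert inf to NaN only on a temporary copy of the target columns and use the result as a row mask. This is a sketch under assumptions: the frame, the extra `other` column, and the names `subset`, `keep`, and `result` are all made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, np.inf, -np.inf],
    "col2": [2.0, 3.0, np.nan],
    "other": [np.inf, 0.0, 0.0],
})
subset = ["col1", "col2"]

# Convert inf to NaN on a temporary copy of the subset only, then keep
# rows where at least one subset value is present (same as how="all").
keep = df[subset].replace([np.inf, -np.inf], np.nan).notna().any(axis=1)
result = df[keep]

print(result)
```

Because the replacement happens on a temporary object, the surviving rows keep their original inf values, and neither a pandas option nor `df` itself is modified.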

Markus Dutschke · Accepted Answer · 2019-03-18 18:41:47Z

33

Use (fast and simple):

df = df[np.isfinite(df).all(1)]

This answer is based on DougR's answer to another question. Here is some example code:

import pandas as pd
import numpy as np
df=pd.DataFrame([1,2,3,np.nan,4,np.inf,5,-np.inf,6])
print('Input:\n',df,sep='')
df = df[np.isfinite(df).all(1)]
print('\nDropped:\n',df,sep='')

Result:

Input:
    0
0  1.0000
1  2.0000
2  3.0000
3     NaN
4  4.0000
5     inf
6  5.0000
7    -inf
8  6.0000

Dropped:
     0
0  1.0
1  2.0
2  3.0
4  4.0
6  5.0
8  6.0

answered Mar 18, 2019 at 18:41

Markus Dutschke

10.8k5 gold badges73 silver badges67 bronze badges

2 Comments

user13116294 Over a year ago

I am getting this error - TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Markus Dutschke Over a year ago

Not with my code, I guess? You are probably trying to process a column with an unsupported type, such as strings.
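If the frame mixes numeric and string columns, one way around the error reported above is to run the finiteness check on the numeric columns only. A minimal sketch, assuming a made-up frame with a string column `name`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [1.0, np.inf, 3.0, np.nan],
    "y": [1, 2, 3, 4],
})

# np.isfinite is not defined for string columns, so restrict the
# check to the numeric columns before building the row mask.
numeric = df.select_dtypes(include="number")
finite_rows = np.isfinite(numeric).all(axis=1)
df_clean = df[finite_rows]

print(df_clean)
```

The mask is computed from the numeric columns but applied to the whole frame, so the string column is carried along for the rows that survive.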

Alexander · Accepted Answer · 2016-03-04 00:20:30Z

18

Here is another method using .loc to replace inf with nan on a Series:

s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan

So, in response to the original question:

df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))

for i in range(3): 
    df.iat[i, i] = np.inf

df
          A         B         C
0       inf  1.000000  1.000000
1  1.000000       inf  1.000000
2  1.000000  1.000000       inf

df.sum()
A    inf
B    inf
C    inf
dtype: float64

df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
A    2
B    2
C    2
dtype: float64

edited Mar 4, 2016 at 0:20

answered Mar 3, 2016 at 21:52

Alexander

111k32 gold badges212 silver badges208 bronze badges


has2k1 · Accepted Answer · 2019-02-12 18:00:58Z

9

The replace() call in the accepted answer also modifies infs that are not in the target columns. To restrict it to the target columns:

lst = [np.inf, -np.inf]
to_replace = {v: lst for v in ['col1', 'col2']}
df = df.replace(to_replace, np.nan)

edited Feb 12, 2019 at 18:00

answered Aug 10, 2014 at 2:27

has2k1

2,42520 silver badges17 bronze badges


Ted Petrou · Accepted Answer · 2017-11-03 18:34:37Z

8

Yet another solution would be to use the isin method. Use it to determine whether each value is infinite or missing and then chain the all method to determine if all the values in the rows are infinite or missing.

Finally, use the negation of that result to select the rows that don't have all infinite or missing values via boolean indexing.

all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
df[~all_inf_or_nan]
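A worked sketch of the same idea, restricted to the two columns from the question. The sample data and the names `subset` and `result` are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.inf, -np.inf], "col2": [2.0, 3.0, np.nan]})
subset = ["col1", "col2"]

# True where a value is inf, -inf or NaN; all(axis="columns") flags rows
# in which every subset value is infinite or missing.
all_inf_or_nan = df[subset].isin([np.inf, -np.inf, np.nan]).all(axis="columns")
result = df[~all_inf_or_nan]

print(result)
```

Only the last row is dropped, since it is the only one where both columns are infinite or missing. Unlike the replace-based approach, the `inf` in row 1 is kept as-is.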

answered Nov 3, 2017 at 18:34

Ted Petrou

62.4k19 gold badges139 silver badges139 bronze badges


Pulkit Bansal · Accepted Answer · 2021-07-20 16:10:12Z

5

To remove both NaN and inf with a single command, use:

df = df[ np.isfinite( df ).all( axis = 1) ]

If for some reason the above doesn't work for you, please try the following 2 steps:

df = df[ ~( df.isnull().any( axis = 1 ) ) ] #to remove nan
df = df[ ~( df.isin( [np.inf, -np.inf]).any(axis =1) )] #to remove inf

answered Jul 20, 2021 at 16:10

Pulkit Bansal

2,0931 gold badge18 silver badges12 bronze badges


Ian Thompson · Accepted Answer · 2023-02-03 19:26:56Z

5

You can use pd.DataFrame.mask with np.isinf. First make sure every column in your DataFrame has a float dtype. Then use dropna with your existing logic.

print(df)

       col1      col2
0 -0.441406       inf
1 -0.321105      -inf
2 -0.412857  2.223047
3 -0.356610  2.513048

df = df.mask(np.isinf)

print(df)

       col1      col2
0 -0.441406       NaN
1 -0.321105       NaN
2 -0.412857  2.223047
3 -0.356610  2.513048

edited Feb 3, 2023 at 19:26

Ian Thompson

3,3252 gold badges22 silver badges36 bronze badges

answered Jun 28, 2018 at 15:42

jpp

166k37 gold badges301 silver badges363 bronze badges


Hari Krishnan · Accepted Answer · 2022-02-01 10:08:21Z

3

Unlike the other answers here, this one-liner worked for me.

import numpy as np
df= df[df['required_column_name']!= np.inf]
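Note that the comparison above filters out only positive infinity. A sketch of two variants that also catch `-inf`, assuming a made-up frame that reuses the placeholder column name from the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"required_column_name": [1.0, np.inf, -np.inf, np.nan, 2.0]})
col = df["required_column_name"]

# np.isinf is True for both inf and -inf, but False for NaN,
# so this drops the infinities and keeps the NaN row.
no_inf = df[~np.isinf(col)]

# np.isfinite is False for inf, -inf and NaN,
# so this drops all three kinds of value.
finite_only = df[np.isfinite(col)]

print(no_inf)
print(finite_only)
```

Which variant fits depends on whether NaN rows should survive the filter.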

answered Feb 1, 2022 at 10:08

Hari Krishnan

311 bronze badge


Thomas Moreau · Accepted Answer · 2021-09-16 21:06:40Z

2

Just stumbled upon this one and found a one-liner that needs neither replace nor numpy:

df = pd.DataFrame(
    [[1, np.inf],
     [1, -np.inf],
     [1, 2]],
    columns=['a', 'b']
)
df.query("b not in [inf, -inf]")
   a    b
2  1  2.0

In some versions of pandas, you may need to wrap the column name b in backticks (`b`).

edited Sep 16, 2021 at 21:06

answered Sep 16, 2021 at 16:43

Thomas Moreau

4,4671 gold badge22 silver badges33 bronze badges
