
I have a large dataframe with inf and -inf values in different columns. I want to replace all inf and -inf values with NaN.

I can do so column by column. So this works:

df['column name'] = df['column name'].replace(np.inf, np.nan)

But my code to do so in one go across the dataframe does not.

df.replace([np.inf, -np.inf], np.nan)

The output does not replace the inf values.

2 Answers


TL;DR

  • df.replace is fastest for replacing ±inf
  • but you can avoid replacing altogether by just setting mode.use_inf_as_na (deprecated in v2.1.0)

Replacing inf and -inf

df = df.replace([np.inf, -np.inf], np.nan)

Just make sure to assign the result back. (Don't use the inplace approach, which PDEP-8 proposes to deprecate.)
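A small sketch of the assign-back pattern, assuming a hypothetical frame with two float columns and one string column. replace only touches cells equal to inf or -inf, so the string column comes through unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame: two float columns and one string column.
df = pd.DataFrame({
    "price": [1.5, np.inf, 3.0],
    "ratio": [-np.inf, 0.5, np.inf],
    "label": ["a", "b", "c"],
})

# Assign the returned frame back; cells that are not +/-inf (including strings) are left alone.
df = df.replace([np.inf, -np.inf], np.nan)

print(df.isna().sum().to_dict())  # {'price': 1, 'ratio': 2, 'label': 0}
```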

There are also df.applymap options (DataFrame.applymap was deprecated in pandas 2.1.0 in favor of DataFrame.map), but df.replace is fastest:

  • df = df.applymap(lambda x: np.nan if x in [np.inf, -np.inf] else x)
  • df = df.applymap(lambda x: np.nan if np.isinf(x) else x)
  • df = df.applymap(lambda x: x if np.isfinite(x) else np.nan)
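The sketch below, on a hypothetical all-float frame, checks that the element-wise route gives the same result as replace. It uses DataFrame.map where it exists (pandas 2.1+) and falls back to applymap otherwise. It also shows a vectorized alternative that is not part of the answer: DataFrame.mask with np.isinf. That variant assumes every column is numeric, because np.isinf rejects strings.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame.
df = pd.DataFrame({
    "x": [1.0, np.inf, -np.inf],
    "y": [np.inf, 2.0, 3.0],
})

# Baseline from the answer.
via_replace = df.replace([np.inf, -np.inf], np.nan)

# Element-wise: DataFrame.map (pandas >= 2.1) replaces the deprecated applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
via_map = elementwise(lambda x: x if np.isfinite(x) else np.nan)

# Vectorized alternative (not from the answer): blank out cells where np.isinf is True.
# np.isinf only works on numeric data, so this assumes an all-numeric frame.
via_mask = df.mask(np.isinf(df))

print(via_replace.equals(via_map), via_replace.equals(via_mask))  # True True
```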


Setting mode.use_inf_as_na (deprecated)

  • Deprecated in pandas 2.1.0
  • Will be removed in pandas 3.0

Note that we don't actually have to modify df at all. Setting mode.use_inf_as_na will simply change the way inf and -inf are interpreted:

  • True: treat None, nan, -inf, and inf as null
  • False (default): None and nan are null, but inf and -inf are not

  • Either enable globally

    pd.set_option('mode.use_inf_as_na', True)
    
  • Or locally via context manager

    with pd.option_context('mode.use_inf_as_na', True):
        ...
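A minimal sketch of the context-manager route, assuming a hypothetical one-column frame. It enables the option only if this pandas still has it (pd.get_option raises OptionError, a KeyError subclass, for unknown options) and silences the deprecation warning that pandas 2.1+ emits. It also shows the point raised in the comments below: with the option on, isna counts ±inf as null, but the stored values are still ±inf.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical frame: one real value, +inf, -inf, and one real NaN.
df = pd.DataFrame({"a": [1.0, np.inf, -np.inf, np.nan]})


def nulls_with_inf_as_na(frame):
    """Count nulls with mode.use_inf_as_na enabled; None if this pandas lacks the option."""
    with warnings.catch_warnings():
        # pandas 2.1+ warns that the option is deprecated; silence that for the demo.
        warnings.simplefilter("ignore")
        try:
            pd.get_option("mode.use_inf_as_na")
        except KeyError:
            # Unknown options raise OptionError, which subclasses KeyError
            # (expected once the option is removed in pandas 3.0).
            return None
        with pd.option_context("mode.use_inf_as_na", True):
            return int(frame.isna().sum().sum())


print(int(df.isna().sum().sum()))    # 1: by default only the real NaN is null
print(nulls_with_inf_as_na(df))      # 3 where the option exists, else None
print(int(np.isinf(df["a"]).sum()))  # 2: the stored values are still +/-inf
```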
    

Comments

Use case: when I had set mode.use_inf_as_na, I got the error "ValueError: Input X contains infinity or a value too large for dtype('float64')." from MinMaxScaler. After that, I went back to df.replace().
mode.use_inf_as_na changes only how np.inf and np.NINF are interpreted. Under the hood they are still stored as ±inf, so if you want to get rid of them, you need to use replace().
mode.use_inf_as_na is flagged as deprecated (see github.com/pandas-dev/pandas/issues/34093 and github.com/pandas-dev/pandas/issues/51684), so it is better not to use it anymore.

pandas.DataFrame.replace (like pandas.Series.replace) does not modify the object in place by default; it returns a new object.

So your code for the whole dataframe does not work because you need to assign the result back or add inplace=True as a parameter. That's also why your column-by-column version works: you are assigning the result back to the column with df['column name'] = ...
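A minimal reproduction of the question's problem, using a hypothetical two-column frame: replace hands back a new DataFrame and leaves df itself unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one +inf and one -inf.
df = pd.DataFrame({"a": [1.0, np.inf], "b": [-np.inf, 2.0]})

# replace() builds and returns a new DataFrame; df is not touched.
result = df.replace([np.inf, -np.inf], np.nan)

print(int(np.isinf(df).sum().sum()))      # 2: the original still holds both infs
print(int(np.isinf(result).sum().sum()))  # 0: only the returned copy was cleaned
```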

Therefore, change df.replace([np.inf, -np.inf], np.nan) to either:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

Or assign the result back to df:

df = df.replace([np.inf, -np.inf], np.nan)
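A quick sketch contrasting the two fixes on a hypothetical one-column frame. One pitfall worth showing: with inplace=True the method itself returns None, so combining it with assignment (df = df.replace(..., inplace=True)) would overwrite df with None.

```python
import numpy as np
import pandas as pd

# Option 1: modify df in place; the call returns None.
df = pd.DataFrame({"a": [np.inf, 1.0, -np.inf]})
returned = df.replace([np.inf, -np.inf], np.nan, inplace=True)
print(returned)                    # None
print(int(df["a"].isna().sum()))   # 2

# Option 2: leave inplace at its default (False) and assign the new frame back.
df2 = pd.DataFrame({"a": [np.inf, 1.0, -np.inf]})
df2 = df2.replace([np.inf, -np.inf], np.nan)
print(int(df2["a"].isna().sum()))  # 2
```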


Comments

Hmm... I am getting a TypeError: unhashable type: 'list' for both of the options you gave.
Very strange, I am currently running it on my machine and it works. What pandas version are you using?
Version - Python 3.8.0
And of pandas? pd.__version__?
I believe it has something to do with your pandas version. I use 1.2.0. Maybe it's time to update it :). I posted a picture in my answer to illustrate.