handling zeros in pandas DataFrames column divisions in Python

Question

What's the best way to handle zero denominators when dividing pandas DataFrame columns by each other in Python? for example:

df = pandas.DataFrame({"a": [1, 2, 0, 1, 5], "b": [0, 10, 20, 30, 50]})
df.a / df.b  # yields error

I'd like the ratios where the denominator is zero to be registered as NA (numpy.nan). How can this be done efficiently in pandas?

Casting to float64 does not work at level of columns:

In [29]: df
Out[29]: 
   a   b
0  1   0
1  2  10
2  0  20
3  1  30
4  5  50

In [30]: df["a"].astype("float64") / df["b"].astype("float64")
...

FloatingPointError: divide by zero encountered in divide

How can I do it just for particular columns and not entire df?

If you just use floats, you'll get +/-inf

askewchan
– askewchan

2013-04-26 20:06:32 +00:00
Commented Apr 26, 2013 at 20:06 — askewchan
– askewchan, Commented Apr 26, 2013 at 20:06
@askewchan: it does not work for me - see edit

user248237
– user248237

2013-04-26 20:32:37 +00:00
Commented Apr 26, 2013 at 20:32 — user248237
– user248237, Commented Apr 26, 2013 at 20:32

Jeff · Accepted Answer · 2013-04-26 20:05:19Z

48

You need to work in floats, otherwise you will have integer division, prob not what you want

In [12]: df = pandas.DataFrame({"a": [1, 2, 0, 1, 5], 
                                "b": [0, 10, 20, 30, 50]}).astype('float64')

In [13]: df
Out[13]: 
   a   b
0  1   0
1  2  10
2  0  20
3  1  30
4  5  50

In [14]: df.dtypes
Out[14]: 
a    float64
b    float64
dtype: object

Here's one way

In [15]: x = df.a/df.b

In [16]: x
Out[16]: 
0         inf
1    0.200000
2    0.000000
3    0.033333
4    0.100000
dtype: float64

In [17]: x[np.isinf(x)] = np.nan

In [18]: x
Out[18]: 
0         NaN
1    0.200000
2    0.000000
3    0.033333
4    0.100000
dtype: float64

Here's another way

In [20]: df.a/df.b.replace({ 0 : np.nan })
Out[20]: 
0         NaN
1    0.200000
2    0.000000
3    0.033333
4    0.100000
dtype: float64

answered Apr 26, 2013 at 20:05

Jeff

130k21 gold badges223 silver badges189 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

user248237 Over a year ago

Can I cast just a column of the df into float64 and not the whole dataframe? See my edit. I still get the error with float64 dtype

Jeff Over a year ago

yes in 0.11; 0.10.1 or earlier should work, what version are u using?

Jeff Over a year ago

also are you doing a np.seterr() anywhere in your codebase?

Amelio Vazquez-Reina Over a year ago

To clarify, couldn't one handle this now with pd.DataFrame.div?

Jeff Over a year ago

No, that just broadcasts. The issue here is how to handle 0/0 (and turn it into a nan, as by default it will be inf)

Kyr · Accepted Answer · 2020-07-09 10:36:08Z

Just for completeness, I would like to add the following way of division that uses DataFrame.apply like:

df.loc[:, 'c'] = df.apply(div('a', 'b'), axis=1)

In full:

In [1]:
df = pd.DataFrame({"a": [1, 2, 0, 1, 5, 0], "b": [0, 10, 20, 30, 50, 0]}).astype('float64')

def div(numerator, denominator):
  return lambda row: 0.0 if row[denominator] == 0 else float(row[numerator]/row[denominator])

df.loc[:, 'c'] = df.apply(div('a', 'b'), axis=1)

Out[1]:
      a     b         c
0   1.0   0.0  0.000000
1   2.0  10.0  0.200000
2   0.0  20.0  0.000000
3   1.0  30.0  0.033333
4   5.0  50.0  0.100000
5   0.0   0.0  0.000000

This solution is slower than the one proposed by Jeff:

df.loc[:, 'c'] = df.apply(div('a', 'b'), axis=1)
# 1.27 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

df.loc[:, 'c'] = df.a/df.b.replace({ 0 : np.inf })
# 651 µs ± 44.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

kenorb · Accepted Answer · 2020-10-18 18:31:05Z

1

Normally when dividing by zero in Panda the value is set to infinite (np.inf). To avoid infinite values, use divide and replace, e.g.

df['one'].div(df['two']).replace(np.inf, 0)

See:

edited Oct 18, 2020 at 18:31

answered Oct 18, 2020 at 13:56

kenorb

169k95 gold badges712 silver badges796 bronze badges

Comments

constantstranger · Accepted Answer · 2022-07-12 15:15:14Z

1

To spell out the different behavior of pandas for columns of different dtypes, note that pandas supports division by zero for columns with numeric dtype (such as float and int64) by returning a result of inf, but for columns of object type, it raises a ZeroDivisionError exception.

See my answer to a related question for examples.

answered Jul 12, 2022 at 15:15

constantstranger

9,4072 gold badges9 silver badges20 bronze badges

Collectives™ on Stack Overflow

handling zeros in pandas DataFrames column divisions in Python

4 Answers 4

5 Comments

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

5 Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related