Division algorithm for zeros in Pandas dataframe

Question

Say I have a dataframe:

And I want to divide: x = df.a / df.b

Obviously, wherever b is zero I'll get inf or NaN as a result. Instead, I want to use the following algorithm for the division (pseudocode):

def CalcRatio(a, b):
    ratio = a / b
    if (isinf(ratio) or isnan(ratio)):
        ratio = (1 + a) / (1 + b)
    return ratio

How can I do this with pandas? Thanks.

EdChum · Accepted Answer · 2017-01-12 17:02:09Z

3

You can use np.isinf and np.isnan in your code to do what you want using apply row-wise:

In [207]:

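import numpy as np

def CalcRatio(a, b):
    ratio = a / b  # a and b are NumPy scalars here, so x / 0 gives inf or nan
    if np.isinf(ratio) or np.isnan(ratio):
        ratio = (1 + a) / (1 + b)
    return ratio

# axis=1 calls the function once per row
df.apply(lambda x: CalcRatio(x['a'], x['b']), axis=1)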

Out[207]:
0   -5.0
1   -2.0
2    1.0
3    2.0
4    0.0
dtype: float64
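Since the original frame is not shown, here is a minimal, self-contained sketch of the row-wise approach. The data is hypothetical, chosen only so that row 2 is 0/0 and row 3 is 1/0 as described in the comments; the lowercase name calc_ratio and the explicit np.float64 conversion are assumptions added so the function also behaves when called with plain Python numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the asker's original frame): row 2 is 0/0, row 3 is 1/0.
df = pd.DataFrame({"a": [-5, -4, 0, 1, 0], "b": [1, 2, 0, 0, 3]})


def calc_ratio(a, b):
    # NumPy floats return inf/nan on division by zero instead of raising.
    a, b = np.float64(a), np.float64(b)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = a / b
    if np.isinf(ratio) or np.isnan(ratio):
        ratio = (1 + a) / (1 + b)
    return ratio


result = df.apply(lambda row: calc_ratio(row["a"], row["b"]), axis=1)
print(result.tolist())  # [-5.0, -2.0, 1.0, 2.0, 0.0]
```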

A vectorised method is to use np.where: pass the condition (the ratio is inf or NaN) as the first argument, the alternate result as the value to use where the condition is True, and the plain division as the value to use otherwise:

In [208]:
np.where(np.isinf(df['a']/df['b']) | pd.isnull(df['a']/df['b']), (1 + df['a']) / (1 + df['b']), df['a']/df['b'])

Out[208]:
array([-5., -2.,  1.,  2.,  0.])
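The one-liner evaluates df['a']/df['b'] three times. A possible variant, sketched here on hypothetical data, computes the ratio once and uses Series.where in place of np.where so that the result stays a Series with the original index. The variable names are illustrative, not from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [-5, -4, 0, 1, 0], "b": [1, 2, 0, 0, 3]})

ratio = df["a"] / df["b"]                 # inf or NaN where b is zero
fallback = (1 + df["a"]) / (1 + df["b"])  # the alternate formula
bad = np.isinf(ratio) | ratio.isna()      # rows that need the fallback

# Series.where keeps values where the condition is True, hence ~bad.
vectorised = ratio.where(~bad, fallback)
print(vectorised.tolist())  # [-5.0, -2.0, 1.0, 2.0, 0.0]
```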

timings

For a 5K row df:

In [213]:
%timeit df.apply(lambda x: CalcRatio(x['a'],x['b']), axis=1)
%timeit np.where(np.isinf(df['a']/df['b']) | pd.isnull(df['a']/df['b']), (1 + df['a']) / (1 + df['b']), df['a']/df['b'])

1 loops, best of 3: 225 ms per loop
1000 loops, best of 3: 1.32 ms per loop

We can see that the vectorised method is much faster than apply, which just iterates over each row: here it is ~170x faster. I expect the NumPy method to scale much better for large datasets.
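Outside IPython, a comparison along these lines can be run with the standard timeit module. This is only a sketch: the 5,000-row frame is randomly generated (an assumption, since the benchmark data is not shown), and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 5,000-row frame; small integers so zeros in b are common.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.integers(-5, 6, 5000), "b": rng.integers(0, 4, 5000)})


def calc_ratio(a, b):
    a, b = np.float64(a), np.float64(b)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = a / b
    return (1 + a) / (1 + b) if (np.isinf(ratio) or np.isnan(ratio)) else ratio


def with_apply():
    return df.apply(lambda row: calc_ratio(row["a"], row["b"]), axis=1)


def with_where():
    ratio = df["a"] / df["b"]
    bad = np.isinf(ratio) | ratio.isna()
    return np.where(bad, (1 + df["a"]) / (1 + df["b"]), ratio)


# Both methods should agree before their speed is compared.
assert np.allclose(with_apply().to_numpy(), with_where())

t_apply = min(timeit.repeat(with_apply, number=1, repeat=3))
t_where = min(timeit.repeat(with_where, number=10, repeat=3)) / 10
print(f"apply: {t_apply * 1e3:.1f} ms, np.where: {t_where * 1e3:.2f} ms")
```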

new timings

In [218]:
%%timeit 
d1 = df.a / df.b
d2 = df.a.add(1) / df.b.add(1)    
d1.replace(np.inf, np.nan).fillna(d2)

1000 loops, best of 3: 1.06 ms per loop

In [219]:
%%timeit
d1 = df.add(df.b == 0, 0)
d1.a / d1.b

1000 loops, best of 3: 691 µs per loop

The above are @piRSquared's answers, which are noticeably faster.

edited Jan 12, 2017 at 17:02

answered Jan 12, 2017 at 16:38

EdChum


5 Comments

shda Over a year ago

I don't want NaN or inf for rows 2 and 3. For row 2: 0 / 0 = NaN, so I want to get (1 + 0) / (1 + 0) = 1. For row 3: 1 / 0 = inf, so I want to get (1 + 1) / (1 + 0) = 2.

EdChum Over a year ago

What do you want instead?

shda Over a year ago

Thanks for the answer, though. Sorry for my first incomplete comment; I accidentally posted it.

EdChum Over a year ago

Normally it's useful to post the desired output. I took your code and modified it to suit your needs. I'll see if I can figure out a vectorised way of doing this.

shda Over a year ago

Thanks for the timing measurements.

piRSquared · Accepted Answer · 2017-01-12 17:00:23Z

2

You can take this approach:

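import numpy as np

d1 = df.a / df.b                # inf, -inf or NaN where b is zero
d2 = df.a.add(1) / df.b.add(1)  # fallback ratio

# replace both infinities so that a negative numerator over zero is also caught
d1.replace([np.inf, -np.inf], np.nan).fillna(d2)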

0   -5.0
1   -2.0
2    1.0
3    2.0
4    0.0
dtype: float64
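One edge case worth checking with this approach: a negative numerator over a zero denominator gives -inf, which a replacement that targets only np.inf leaves in place. The sketch below uses a hypothetical three-row frame to compare replacing np.inf alone with replacing both infinities.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: every denominator is zero, one numerator is negative.
df = pd.DataFrame({"a": [-3, 1, 0], "b": [0, 0, 0]})

d1 = df.a / df.b                # [-inf, inf, nan]
d2 = df.a.add(1) / df.b.add(1)  # fallback ratio: [-2.0, 2.0, 1.0]

only_pos = d1.replace(np.inf, np.nan).fillna(d2)
both = d1.replace([np.inf, -np.inf], np.nan).fillna(d2)

print(only_pos.tolist())  # [-inf, 2.0, 1.0]
print(both.tolist())      # [-2.0, 2.0, 1.0]
```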

Another approach:
df.b == 0 evaluates to True where b is zero. Adding this boolean column to the frame along axis 0 treats True as 1, so it adds 1 to both a and b, but only in the rows where b is zero. Then you do the division.

d1 = df.add(df.b == 0, 0)
d1.a / d1.b
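A step-by-step sketch of the same idea, on hypothetical data, is shown below. It assumes the inputs are finite and non-missing, so that an inf or NaN ratio can only come from b being zero. Because df.add shifts every column, this version selects a and b explicitly; the extra id column and the name bump are illustrative assumptions.

```python
import pandas as pd

# Hypothetical frame with an extra column that should not be shifted.
df = pd.DataFrame(
    {"a": [-5, -4, 0, 1, 0], "b": [1, 2, 0, 0, 3], "id": [10, 11, 12, 13, 14]}
)

bump = (df["b"] == 0).astype(int)           # 1 where b is zero, else 0
shifted = df[["a", "b"]].add(bump, axis=0)  # add row-wise to a and b only

result = shifted["a"] / shifted["b"]
print(result.tolist())  # [-5.0, -2.0, 1.0, 2.0, 0.0]
```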

edited Jan 12, 2017 at 17:00

answered Jan 12, 2017 at 16:53

piRSquared


4 Comments

EdChum Over a year ago

Actually this is much faster than my method: 1000 loops, best of 3: 346 µs per loop, so +1, about 4X faster. That is timed on just the replace call, though. If we time the whole thing: 1000 loops, best of 3: 1.06 ms per loop, so the gain is smaller, but it is still faster by about 0.3 ms.

EdChum Over a year ago

The last one is even faster: 1000 loops, best of 3: 691 µs per loop.

piRSquared Over a year ago

Thanks @EdChum :-)

shda Over a year ago

Used the last approach. Thanks.
