Say I have a dataframe:

 a   b
-5   1
 4  -2
 0   0
 1   0
 0   3

And I want to divide: x = df.a / df.b

Obviously, I'll get inf or NaN wherever b is zero. Instead, I want to divide using the following algorithm (pseudocode):

def CalcRatio(a, b):
    ratio = a / b
    if (isinf(ratio) or isnan(ratio)):
        ratio = (1 + a) / (1 + b)
    return ratio

How can I do this with pandas? Thanks.

2 Answers

You can use np.isinf and np.isnan in your code to do what you want using apply row-wise:

In [207]:

def CalcRatio(a, b):
    ratio = a / b
    if (np.isinf(ratio) or np.isnan(ratio)):
        ratio = (1 + a) / (1 + b)
    return ratio

df.apply(lambda x: CalcRatio(x['a'], x['b']), axis=1)

Out[207]:
0   -5.0
1   -2.0
2    1.0
3    2.0
4    0.0
dtype: float64

A vectorised method is to use np.where: where the plain ratio is inf or NaN, return the alternate result; everywhere else, return the plain ratio as before:

In [208]:
np.where(np.isinf(df['a']/df['b']) | pd.isnull(df['a']/df['b']), (1 + df['a']) / (1 + df['b']), df['a']/df['b'])

Out[208]:
array([-5., -2.,  1.,  2.,  0.])
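
The one-liner above computes df['a']/df['b'] three times. A minimal sketch of the same idea that computes each ratio once is shown below; the helper name calc_ratio and the use of np.isfinite (which is False for inf, -inf and NaN alike) are choices made for this illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"a": [-5, 4, 0, 1, 0], "b": [1, -2, 0, 0, 3]})

def calc_ratio(a, b):
    """Return a / b, using (1 + a) / (1 + b) wherever a / b is inf or NaN."""
    ratio = a / b                  # may contain inf, -inf or NaN
    fallback = (1 + a) / (1 + b)   # alternate result for those rows
    # np.isfinite is False for inf, -inf and NaN, so one test covers all cases.
    values = np.where(np.isfinite(ratio), ratio, fallback)
    return pd.Series(values, index=ratio.index)

result = calc_ratio(df["a"], df["b"])
print(result.tolist())  # [-5.0, -2.0, 1.0, 2.0, 0.0]
```

Wrapping the array back into a Series keeps the original index, which np.where on its own drops.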

timings

For a 5K row df:

In [213]:
%timeit df.apply(lambda x: CalcRatio(x['a'],x['b']), axis=1)
%timeit np.where(np.isinf(df['a']/df['b']) | pd.isnull(df['a']/df['b']), (1 + df['a']) / (1 + df['b']), df['a']/df['b'])

1 loops, best of 3: 225 ms per loop
1000 loops, best of 3: 1.32 ms per loop

We can see here that the vectorised method is ~170x faster than apply, which just iterates over each row. I expect the gap to grow for larger datasets.

new timings

In [218]:
%%timeit 
d1 = df.a / df.b
d2 = df.a.add(1) / df.b.add(1)
d1.replace(np.inf, np.nan).fillna(d2)

1000 loops, best of 3: 1.06 ms per loop

In [219]:
%%timeit
d1 = df.add(df.b == 0, 0)
d1.a / d1.b

1000 loops, best of 3: 691 µs per loop

The above are @piRSquared's answers, which are noticeably faster.


5 Comments

I don't want NaN or inf for rows 2 and 3. For row 2: 0 / 0 = NaN, so I want to get (1 + 0) / (1 + 0) = 1. For row 3: 1 / 0 = inf, so I want to get (1 + 1) / (1 + 0) = 2.
What do you want instead?
Thanks for the answer, though. Sorry for my first incomplete comment; I accidentally posted it.
Normally it's useful to post the desired output. I took your code and modified it to suit your needs. I'll see if I can figure out a vectorised way of doing this.
Thanks for the timing measurements.

You can take this approach:

d1 = df.a / df.b
d2 = df.a.add(1) / df.b.add(1)

d1.replace(np.inf, np.nan).fillna(d2)

0   -5.0
1   -2.0
2    1.0
3    2.0
4    0.0
dtype: float64
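
Note that replace(np.inf, np.nan) only catches positive infinity; a negative numerator over zero gives -inf, which the question's isinf check would also catch. A hedged sketch that handles both is below. The extra row (-3, 0) is hypothetical data added only to show the -inf case.

```python
import numpy as np
import pandas as pd

# The question's frame plus one hypothetical row (-3, 0) that yields -inf.
df = pd.DataFrame({"a": [-5, 4, 0, 1, 0, -3], "b": [1, -2, 0, 0, 3, 0]})

d1 = df["a"] / df["b"]              # plain ratio: may contain inf, -inf, NaN
d2 = (df["a"] + 1) / (df["b"] + 1)  # fallback ratio

# Treat both infinities as missing, then fill every gap from the fallback.
result = d1.replace([np.inf, -np.inf], np.nan).fillna(d2)
print(result.tolist())  # [-5.0, -2.0, 1.0, 2.0, 0.0, -2.0]
```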

Another approach:
df.b == 0 evaluates to True where b is zero. Adding this boolean column to df along the rows (axis 0) adds 1 to both a and b, but only in the rows where b is zero; the other rows are left unchanged. Then you do the division.

d1 = df.add(df.b == 0, 0)
d1.a / d1.b
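
To make the intermediate step visible, here is a small self-contained sketch of the same trick on the question's data. The names bump and shifted, and the explicit cast of the boolean mask to int, are additions for this illustration only.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"a": [-5, 4, 0, 1, 0], "b": [1, -2, 0, 0, 3]})

# 1 where b is zero, 0 elsewhere (explicit int cast is an assumption made here).
bump = (df["b"] == 0).astype(int)

# axis=0 aligns bump with the row index, so it is added to every column.
shifted = df.add(bump, axis=0)
print(shifted["a"].tolist())  # [-5, 4, 1, 2, 0]
print(shifted["b"].tolist())  # [1, -2, 1, 1, 3]

result = shifted["a"] / shifted["b"]
print(result.tolist())  # [-5.0, -2.0, 1.0, 2.0, 0.0]
```

This matches the pseudocode here because a / b is inf or NaN exactly when b is zero, as long as the inputs themselves contain no NaN or inf values.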

4 Comments

Actually, this is much faster than my method: 1000 loops, best of 3: 346 µs per loop, so +1, about 4x faster. That is timed on just the replace call, though. Timing the whole thing gives 1000 loops, best of 3: 1.06 ms per loop, so only marginally faster, but still faster by 0.30 ms.
1000 loops, best of 3: 691 µs per loop, even faster for the last one.
Thanks @EdChum :-)
Used the last approach. Thanks.
