Division in pandas: multiple columns by another column of the same DataFrame

Question

There are several questions around this topic on SO, but none seem to raise the issue I am having. I call:

df.div(df.col_name, axis='index')

on a DataFrame with 7 columns and 3596 rows, and the result is invariably:

ValueError                                Traceback (most recent call last)
<ipython-input-55-5797510566fc> in <module>()

[... several long calls ...]

C:\Users\Ataturk\Anaconda\lib\site-packages\pandas\core\ops.pyc in na_op(x, y)
    752             result = result.reshape(x.shape)
    753
--> 754         result = com._fill_zeros(result, x, y, name, fill_zeros)
    755
    756         return result

C:\Users\Ataturk\Anaconda\lib\site-packages\pandas\core\common.pyc in _fill_zeros(result, x, y, name, fill)
   1252                 signs = np.sign(result)
  1253                 nans = np.isnan(x.ravel())
-> 1254                 np.putmask(result, mask & ~nans, fill)
   1255
   1256                 # if we have a fill of inf, then sign it

ValueError: operands could not be broadcast together with shapes (3596,) (25172,)

Division across specific columns works fine:

df.one_column / df.col_name

But as soon as I move to multiple columns, I get the same error (with a different number in the last set of parentheses):

df[['one_column_name', 'another_column_name']] / df.col_name

I've tried the various possible syntaxes (.div and /, referencing columns through [] as well as by attribute, .name), and the result is always the same. The dimensions fit, but pandas seems to append all the columns being divided to each other, which creates the second number (for the full frame, 7 × 3596 = 25172). That number is, of course, larger by a factor of the number of columns than the column it then tries to divide by. What am I doing wrong?
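One thing worth checking separately from the bug: in current pandas, the / operator between a DataFrame and a Series aligns the Series index against the DataFrame's columns, not its rows, so only .div(..., axis='index') performs row-wise division. A minimal sketch on a small hypothetical frame (the column names a and b are made up for illustration; bal_cast stands in for the divisor):

```python
import pandas as pd

# Hypothetical two-row frame; bal_cast plays the role of the divisor column.
df = pd.DataFrame({"a": [2, 4], "b": [6, 8], "bal_cast": [2, 4]})

# .div with axis="index" lines the divisor up with the row labels 0 and 1.
rowwise = df[["a", "b"]].div(df["bal_cast"], axis="index")
print(rowwise)

# "/" lines the Series labels (0, 1) up with the column labels ("a", "b")
# instead; nothing matches, so every cell of the result is NaN.
colwise = df[["a", "b"]] / df["bal_cast"]
print(colwise)
```

The all-NaN result has four columns (a, b, 0 and 1), which is a quick way to spot accidental column-wise alignment.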

df.info():

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3596 entries, 0 to 3595
Data columns (total 7 columns):
bal_cast    3596 non-null int64
Degt        3596 non-null int64
Meln        3596 non-null int64
Levich      3596 non-null int64
Navu        3596 non-null int64
Mitr        3596 non-null int64
Sob         3596 non-null int64
dtypes: int64(7)

bal_cast is the name of the column I am trying to divide by; here is the exact division call, where the relevant DataFrame is called result:

In [58]: result.div(result.bal_cast, axis='index')

Current conda install:

         platform : win-64
    conda version : 3.5.2
   python version : 2.7.6.final.0

Pandas: 0.14.0; Numpy: 1.8.1

EDIT: Following the discussion in the comments, smaller slices of the same table divide through without issue.

Can you show df.info(), your exact division call, and your numpy/pandas versions and platform? – Jeff, Commented Jun 3, 2014 at 21:10
Thanks for the pointer, added all of the above into the question. – Bacchus, Commented Jun 3, 2014 at 21:16
Yes, result is the name actually used in the code; I wrote df for the purposes of the question because it is more comprehensible. – Bacchus, Commented Jun 3, 2014 at 21:22

Jeff · Accepted Answer · 2014-06-03 22:03:52Z


Workaround is this:

df.astype('float').div(df['column'].astype('float'), axis='index')
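A minimal sketch of the workaround on a small hypothetical frame that, like the one in the question, has int64 columns and a zero in the divisor column (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical int64 frame with a zero in the divisor column bal_cast.
df = pd.DataFrame({
    "bal_cast": [2, 0, 4],
    "Degt": [4, 1, 8],
    "Meln": [6, 0, 2],
})

# Workaround from the answer: cast numerator and divisor to float first,
# so float division yields inf (x / 0) and NaN (0 / 0) for the zero row.
result = df.astype("float").div(df["bal_cast"].astype("float"), axis="index")
print(result)
```

In pandas releases after the fix, the plain call without the casts is expected to give the same values; that is worth confirming on the version you actually run.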

The filling algorithm (_fill_zeros in the traceback) is choking on this. When you divide integers by 0, pandas fills the result with infs, and there is a bug in that code. See here.
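Since the bug only bites when the divisor contains zeros, a quick way to find the offending rows before dividing is a boolean mask on the divisor column. A sketch on a hypothetical miniature of the result frame from the question (the values are invented):

```python
import pandas as pd

# Hypothetical miniature of the question's `result` frame.
result = pd.DataFrame({"bal_cast": [5, 0, 3, 0], "Degt": [1, 2, 3, 4]})

# Boolean mask of rows whose divisor is zero.
is_zero = result["bal_cast"] == 0
zero_rows = result.index[is_zero].tolist()
print(zero_rows)  # [1, 3]

# Keep only the rows with a non-zero divisor.
safe = result[~is_zero]
```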

Casting to float 'solves' this problem because float / 0 is handled by numpy directly. Side note: the reason pandas handles the division itself is that numpy integer division truncates and gives you back an integer (which is odd).

Integers give a weird/odd result in numpy.

In [10]: Series([1])/0
Out[10]: 
0    inf
dtype: float64

In [11]: Series([1]).values/0
Out[11]: array([0])

Floats are correct in numpy

In [12]: Series([1.])/0
Out[12]: 
0    inf
dtype: float64

In [14]: Series([1.]).values/0
Out[14]: array([ inf])
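The array([0]) in Out[11] comes from Python 2, where / on integer arrays performed integer division. A sketch, assuming Python 3 and a recent NumPy, showing that / now promotes integers to float (giving inf) while // keeps the old integer behavior:

```python
import numpy as np
import pandas as pd

ints = pd.Series([1]).values  # int64 array, as in In [11]

# numpy warns on division by zero; silence that for the demo.
with np.errstate(divide="ignore"):
    true_div = ints / 0    # true division: result is float64, [inf]
    floor_div = ints // 0  # integer floor division: stays int64, [0]

print(true_div, true_div.dtype)
print(floor_div, floor_div.dtype)
```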

answered Jun 3, 2014 at 22:03

Jeff



2 Comments

Bacchus Over a year ago

Interesting, thanks. Frankly, I'm just learning the language and its most prominent tools, so I can stick to series that don't have zeroes in them, but it's good to have a workaround handy, and maybe others will find this useful. I actually didn't expect this series to have zeroes. Any idea why the error seems to reference incorrect dimensions? It does so quite consistently, both in my case and in DSM's. Or is it just reporting the dimensions of each series for information?

Jeff Over a year ago

This is a bug; the error is a generic one. It will be fixed for 0.14.1.
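Where the two numbers in the error come from: the question's frame has 3596 rows and 7 columns, and 3596 × 7 = 25172. The traceback shows _fill_zeros combining a mask with np.isnan(x.ravel()), i.e. the flattened numerator block. The sketch below mimics that step in plain NumPy; it is an approximation of what the traceback suggests, not the actual pandas 0.14 code, and it assumes the mask has one entry per row of the divisor:

```python
import numpy as np

# Stand-ins for the question's frame: 3596 rows, 7 int64 columns.
n_rows, n_cols = 3596, 7
x = np.ones((n_rows, n_cols), dtype=np.int64)  # numerator block, all 7 columns
y = np.ones(n_rows, dtype=np.int64)            # divisor column (bal_cast)

mask = y == 0               # assumed zero-divisor mask: one flag per row
nans = np.isnan(x.ravel())  # every cell of the block, flattened
print(mask.shape, nans.shape)  # (3596,) (25172,)

# Combining the two, as the traceback's putmask line does, cannot broadcast.
error_message = None
try:
    mask & ~nans
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```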
