Multiplying multiple columns in a DataFrame

Question

I'm trying to multiply N columns in a DataFrame by N other columns in the same DataFrame, and then divide the results by a single column. I'm having trouble with the first part; see the example below.

import pandas as pd
from numpy import random

foo = pd.DataFrame({'A':random.rand(10),
                    'B':random.rand(10),
                    'C':random.rand(10),
                    'N':random.randint(1,100,10),
                    'X':random.rand(10),
                    'Y':random.rand(10),
                    'Z':random.rand(10), })

foo[['A','B','C']].multiply(foo[['X','Y','Z']], axis=0).divide(foo['N'], axis=0)

What I'm trying to get is column-wise multiplication (i.e. A*X, B*Y, C*Z). The result is not an N-column matrix but a 2N-column one, where the columns I'm trying to multiply by are added to the DataFrame and all the entries are NaN, like so:

    A   B   C   X   Y   Z
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN NaN

What's going on here, and how do I do column-wise multiplication?

The problem here is alignment: pandas aligns on both index values and column names. The index values are fine here; the issue is the column names. Because you are multiplying columns whose names do not agree, the result is NaN. You need to convert to NumPy array values by accessing the .values attribute, as in Alexander's answer.
– EdChum, Commented May 15, 2015 at 7:58
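To make the alignment behaviour described in the comment concrete, here is a minimal sketch on a smaller frame. The frame `small` and its four columns are made up for illustration; only `multiply` and the label-alignment behaviour come from the discussion above.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame, smaller than the one in the question.
rng = np.random.default_rng(0)
small = pd.DataFrame({'A': rng.random(4), 'B': rng.random(4),
                      'X': rng.random(4), 'Y': rng.random(4)})

# DataFrame * DataFrame aligns on column labels. No label is shared
# between ['A', 'B'] and ['X', 'Y'], so the result has the union of
# the columns and every entry is NaN.
aligned = small[['A', 'B']].multiply(small[['X', 'Y']])
print(aligned)

# Dropping the labels on one side makes the multiplication positional:
# A*X and B*Y, with the labels of the left-hand frame kept.
positional = small[['A', 'B']].multiply(small[['X', 'Y']].values)
print(positional)
```

The first result has four columns and no usable values; the second has two columns, A and B, holding the element-wise products.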

Alexander · Accepted Answer · 2015-05-15 01:31:04Z

This will work using the values from columns X, Y, Z and N, but perhaps it will help you see what the issue is:

>>> (foo[['A','B','C']]
     .multiply(foo[['X','Y','Z']].values)
     .divide(foo['N'].values, axis=0))
          A         B         C
0  0.000452  0.004049  0.010364
1  0.004716  0.001566  0.012881
2  0.001488  0.000296  0.004415
3  0.000269  0.001168  0.000327
4  0.001386  0.008267  0.012048
5  0.000084  0.009588  0.003189
6  0.000099  0.001063  0.006493
7  0.009958  0.035766  0.012618
8  0.001252  0.000860  0.000420
9  0.006422  0.005013  0.004108

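The result keeps the column labels A, B, C. In your original call, pandas tries to match columns A, B, C against X, Y, Z by label. No labels match, so it is unclear what the resulting columns should be, which is why you are getting the NaNs.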

Appending .values to the expression above will give you the result you desire as a plain NumPy array, but it is then up to you to restore the index and columns.

>>> (foo[['A','B','C']]
     .multiply(foo[['X','Y','Z']].values)
     .divide(foo['N'].values, axis=0)).values
array([[  4.51754797e-04,   4.04911292e-03,   1.03638836e-02],
       [  4.71588457e-03,   1.56556402e-03,   1.28805803e-02],
       [  1.48820116e-03,   2.95700572e-04,   4.41516179e-03],
       [  2.68791866e-04,   1.16836123e-03,   3.27217820e-04],
       [  1.38648301e-03,   8.26692582e-03,   1.20482313e-02],
       [  8.38762247e-05,   9.58768066e-03,   3.18903965e-03],
       [  9.94132918e-05,   1.06267623e-03,   6.49315435e-03],
       [  9.95764539e-03,   3.57657737e-02,   1.26179014e-02],
       [  1.25210929e-03,   8.59735215e-04,   4.20124326e-04],
       [  6.42175897e-03,   5.01250179e-03,   4.10783492e-03]])
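If you want labels back on the array, one possible way is to wrap it in a new DataFrame. This is only a sketch: the output column names (AX, BY, CZ) are an assumption chosen for illustration, and `to_numpy()` is used in place of `.values`, which assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Small stand-in for the frame in the question (5 rows instead of 10).
rng = np.random.default_rng(0)
foo = pd.DataFrame({'A': rng.random(5), 'B': rng.random(5), 'C': rng.random(5),
                    'N': rng.integers(1, 100, 5),
                    'X': rng.random(5), 'Y': rng.random(5), 'Z': rng.random(5)})

left, right = ['A', 'B', 'C'], ['X', 'Y', 'Z']

# Positional (label-free) multiplication: A*X, B*Y, C*Z.
products = foo[left].to_numpy() * foo[right].to_numpy()

# foo[['N']] has shape (5, 1), so it broadcasts across the three columns.
scaled = products / foo[['N']].to_numpy()

# Reattach the original index and give the columns hypothetical names.
result = pd.DataFrame(scaled, index=foo.index,
                      columns=[l + r for l, r in zip(left, right)])
print(result)
```

Doing the arithmetic on plain arrays and labelling once at the end keeps pandas alignment out of the calculation entirely.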
