I'm trying to multiply N columns in a DataFrame by N columns in the same DataFrame, and then divide the results by a single column. I'm having trouble with the first part, see example below.

import pandas as pd
from numpy import random

foo = pd.DataFrame({'A':random.rand(10),
                    'B':random.rand(10),
                    'C':random.rand(10),
                    'N':random.randint(1,100,10),
                    'X':random.rand(10),
                    'Y':random.rand(10),
                    'Z':random.rand(10), })

foo[['A','B','C']].multiply(foo[['X','Y','Z']], axis=0).divide(foo['N'], axis=0)

What I'm trying to get is column-wise multiplication (i.e. A*X, B*Y, C*Z). The result is not an N-column matrix but a 2N-column one: the columns I'm trying to multiply by are added to the DataFrame, and every entry is NaN, like so:

    A   B   C   X   Y   Z
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN NaN

What's going on here, and how do I do column-wise multiplication?

  • The problem is label alignment: pandas aligns on index values and column names before operating. The index values are fine here, but the column names do not agree, so the result is NaN. You need to convert one operand to a NumPy array via the .values attribute, as in Alexander's answer. Commented May 15, 2015 at 7:58

1 Answer

This works by using the raw values from columns X, Y, Z and N, and perhaps it will help you see what the issue is:

>>> (foo[['A','B','C']]
     .multiply(foo[['X','Y','Z']].values)
     .divide(foo['N'].values, axis=0))
          A         B         C
0  0.000452  0.004049  0.010364
1  0.004716  0.001566  0.012881
2  0.001488  0.000296  0.004415
3  0.000269  0.001168  0.000327
4  0.001386  0.008267  0.012048
5  0.000084  0.009588  0.003189
6  0.000099  0.001063  0.006493
7  0.009958  0.035766  0.012618
8  0.001252  0.000860  0.000420
9  0.006422  0.005013  0.004108

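The result is indexed on columns A, B, C. In your original attempt, pandas tried to align A, B, C with X, Y, Z by column name; since no names match, it is unclear what the resulting columns should be, which is why you are getting the NaNs.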
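
To make the alignment behaviour concrete, here is a minimal sketch on a tiny frame. The frame `df` and its values are made up for illustration and stand in for `foo`; `.to_numpy()` is the newer spelling of `.values`.

```python
import pandas as pd

# Small deterministic frame standing in for `foo` (illustrative values only).
df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0],
                   'X': [10.0, 20.0], 'Y': [30.0, 40.0]})

# DataFrame-with-DataFrame arithmetic aligns on index and column labels first.
# No column label appears on both sides, so every cell of the union is NaN.
aligned = df[['A', 'B']].multiply(df[['X', 'Y']])
print(aligned)

# Passing a bare NumPy array skips label alignment: the product is taken
# element by element, by position, and the left-hand labels are kept.
positional = df[['A', 'B']].multiply(df[['X', 'Y']].to_numpy())
print(positional)
```

The first result has columns for all four labels and contains only NaN; the second keeps columns A and B and holds A*X and B*Y.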

Appending .values to the expression above gives you the result as a plain NumPy array, but it is then up to you to restore the index and columns.

>>> (foo[['A','B','C']]
     .multiply(foo[['X','Y','Z']].values)
     .divide(foo['N'].values, axis=0)).values
array([[  4.51754797e-04,   4.04911292e-03,   1.03638836e-02],
       [  4.71588457e-03,   1.56556402e-03,   1.28805803e-02],
       [  1.48820116e-03,   2.95700572e-04,   4.41516179e-03],
       [  2.68791866e-04,   1.16836123e-03,   3.27217820e-04],
       [  1.38648301e-03,   8.26692582e-03,   1.20482313e-02],
       [  8.38762247e-05,   9.58768066e-03,   3.18903965e-03],
       [  9.94132918e-05,   1.06267623e-03,   6.49315435e-03],
       [  9.95764539e-03,   3.57657737e-02,   1.26179014e-02],
       [  1.25210929e-03,   8.59735215e-04,   4.20124326e-04],
       [  6.42175897e-03,   5.01250179e-03,   4.10783492e-03]])
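
If you go the plain-array route, one possible way to put the labels back is to wrap the array in a new DataFrame. This is only a sketch on made-up data: the frame `df`, the lists `left` and `right`, and the column names such as 'A*X' are hypothetical choices, not something the original code prescribes.

```python
import pandas as pd

# Illustrative stand-in for `foo`, with small hand-picked values.
df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0],
                   'N': [2, 4],
                   'X': [10.0, 20.0], 'Y': [30.0, 40.0]})

left, right = ['A', 'B'], ['X', 'Y']

# Pure NumPy arithmetic; N is reshaped into a column vector so that each
# row of the product is divided by that row's N.
raw = df[left].to_numpy() * df[right].to_numpy() / df['N'].to_numpy()[:, None]

# Re-attach the original index and give the columns descriptive names.
result = pd.DataFrame(raw, index=df.index,
                      columns=[f'{l}*{r}' for l, r in zip(left, right)])
print(result)
```

Reusing `df.index` keeps the rows lined up with the original frame, so `result` can be joined back onto it if needed.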