Python Pandas: Find Sum of Column Based on Value of Two other Columns

Question

For each row, I want to generate a new column holding the sum of values over every row where either variableA or variableB equals the current row's variableA. Example data:

   values  variableA  variableB
0     134          1          3
1      12          2          6
2      43          1          2
3      54          3          1
4      16          2          7

I can select the sum of values whenever variableA matches the current row of variableA using:

df.groupby('variableA')['values'].transform('sum')

but selecting the sum of values whenever variableB matches the current row of variableA eludes me. I tried .loc but it doesn't seem to play well with .groupby. The expected output would be as follows:

   values  variableA  variableB  result
0     134          1          3     231
1      12          2          6      71
2      43          1          2     231
3      54          3          1     188
4      16          2          7      71

Thanks!

piRSquared · Accepted Answer · 2017-01-18 00:51:56Z

A vectorized approach with numpy broadcasting

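pairs = df[['variableA', 'variableB']].values  # renamed from vars, which shadows a builtin
# matches[i, j] is True when variableA or variableB in row i equals variableA in row j
matches = (pairs[:, None] == pairs[:, [0]]).any(-1)

df.assign(result=df['values'].values @ matches)  # the @ operator requires Python 3.5+
# on Python 2, use .dot instead:
# df.assign(result=df['values'].values.dot(matches))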
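
For anyone who wants to run the broadcasting idea end to end, here is a minimal self-contained sketch built on the sample data from the question. The names pairs, a and out are only illustrative, and the shapes are spelled out explicitly, so it is not character-for-character the snippet above. Note that the intermediate boolean array has n × n entries, so memory grows quadratically with the number of rows.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "values": [134, 12, 43, 54, 16],
    "variableA": [1, 2, 1, 3, 2],
    "variableB": [3, 6, 2, 1, 7],
})

pairs = df[["variableA", "variableB"]].to_numpy()  # shape (n, 2)
a = df["variableA"].to_numpy()                     # shape (n,)

# Compare every (row i, column k) entry against every row j's variableA.
# The comparison broadcasts to shape (n, n, 2); any() collapses the last axis,
# so matches[i, j] is True when row i's variableA or variableB equals a[j].
matches = (pairs[:, None, :] == a[None, :, None]).any(axis=-1)

# Column j of matches selects the rows whose values are summed for row j.
result = df["values"].to_numpy() @ matches

out = df.assign(result=result)
print(out)
```

The result column should agree with the expected output in the question.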


juanpa.arrivillaga · Accepted Answer · 2017-01-17 23:07:55Z

Well, you could always use .apply, but be warned that it can be slow:

>>> df
   values  variableA  variableB
0     134          1          3
1      12          2          6
2      43          1          2
3      54          3          1
4      16          2          7
>>> df.apply(lambda S: df.loc[(df.variableA == S.variableA) | (df.variableB == S.variableA), 'values'].sum(), axis=1)
0    231
1     71
2    231
3    188
4     71
dtype: int64

Of course, you would have to assign it...

>>> df['result'] = df.apply(lambda S: df.loc[(df.variableA == S.variableA) | (df.variableB == S.variableA), 'values'].sum(), axis=1)
>>> df
   values  variableA  variableB  result
0     134          1          3     231
1      12          2          6      71
2      43          1          2     231
3      54          3          1     188
4      16          2          7      71
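
If the row-wise .apply turns out to be too slow, one possible alternative (not taken from either answer, just a sketch under stated assumptions) is to combine two groupby sums and look them up with .map. The helper names sum_a, sum_b and both are hypothetical. The idea: the total for a key k is the sum over rows with variableA == k, plus the sum over rows with variableB == k, minus the rows where both columns equal k, which would otherwise be counted twice. The final cast assumes values holds integers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "values": [134, 12, 43, 54, 16],
    "variableA": [1, 2, 1, 3, 2],
    "variableB": [3, 6, 2, 1, 7],
})

# Sum of values per distinct variableA and per distinct variableB.
sum_a = df.groupby("variableA")["values"].sum()
sum_b = df.groupby("variableB")["values"].sum()

# Rows where variableA == variableB appear in both sums above,
# so their contribution is subtracted once to avoid double counting.
both = (
    df["values"]
    .where(df["variableA"] == df["variableB"], 0)
    .groupby(df["variableA"])
    .sum()
)

key = df["variableA"]
result = (
    key.map(sum_a)
    + key.map(sum_b).fillna(0)  # a variableA value may never occur in variableB
    - key.map(both)
).astype("int64")  # assumption: values are integers

df_out = df.assign(result=result)
print(df_out)
```

This avoids both the per-row Python call and the n × n intermediate array, at the cost of being a little less direct to read.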
