Sum of columns based on grouping

Question

import pandas as pd

d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)

I have a DataFrame and a dictionary that assigns each column to a group. I'm having trouble writing a function that adds a total (sum) column for each group. The first two rows of the output should look like this:

a	b	c	d	total_top	total_bot
0	1	0	1	0	2
1	1	0	0	1	1

and so on. Ideally, the function would take the DataFrame and the name of the group as inputs.

In the dictionary, a and c have the value 'top', so in the DataFrame I want to sum columns a and c.
– chicagobeast12, Commented Nov 22, 2021 at 19:48

Akshay Sehgal · Accepted Answer · 2021-11-22 19:57:18Z

Try the following:

sums = df.rename(columns=dct).stack().groupby(level=(0,1)).sum().unstack()
df_new = df.join(sums)
df_new

   a  b  c  d  bot  top
0  0  1  0  1    2    0
1  1  1  0  0    1    1
2  0  1  0  1    2    0
3  1  1  0  1    2    1
4  1  1  0  1    2    1
5  1  1  1  0    1    2

Explanation:

df.rename(columns=dct) relabels each column as top or bot according to the dictionary dct, so the column labels are now duplicated.

df.rename(columns=dct)

   top  bot  top  bot
0    0    1    0    1
1    1    1    0    0
2    0    1    0    1
3    1    1    0    1
4    1    1    0    1
5    1    1    1    0

Then stack() moves the column labels into an inner index level, so you get a single Series of 1s and 0s indexed by (row, group label).

df.rename(columns=dct).stack()

0  top    0
   bot    1
   top    0
   bot    1
1  top    1
   bot    1
   top    0
   bot    0
2  top    0
   bot    1
   top    0
   bot    1
3  top    1
   bot    1
   top    0
   bot    1
4  top    1
   bot    1
   top    0
   bot    1
5  top    1
   bot    1
   top    1
   bot    0
dtype: int64

groupby(level=(0,1)) groups on level 0 (the row index) and level 1 (the top/bot labels), and sum() adds up the values within each (row, label) pair.

df.rename(columns=dct).stack().groupby(level=(0,1)).sum()

0  bot    2
   top    0
1  bot    1
   top    1
2  bot    2
   top    0
3  bot    2
   top    1
4  bot    2
   top    1
5  bot    1
   top    2
dtype: int64

unstack() then pivots the inner index level back into columns, giving one column each for bot and top.

df.rename(columns=dct).stack().groupby(level=(0,1)).sum().unstack()

   bot  top
0    2    0
1    1    1
2    2    0
3    2    1
4    2    1
5    1    2

Finally, join the original df with this new sums DataFrame on the index.

df.join(sums)

   a  b  c  d  bot  top
0  0  1  0  1    2    0
1  1  1  0  0    1    1
2  0  1  0  1    2    0
3  1  1  0  1    2    1
4  1  1  0  1    2    1
5  1  1  1  0    1    2
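
Note that this output names the new columns bot and top, while the question asks for total_top and total_bot. The sketch below is one possible variant that produces those names and avoids calling stack() on duplicated column labels, which I am assuming may not be supported by every pandas version. It reshapes to long format with melt() instead. The names long, row, column, value and group are illustrative and not part of the original answer.

```python
import pandas as pd

d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)

# Long format: one row per (original row, original column) pair.
long = (
    df.rename_axis("row")
      .reset_index()
      .melt(id_vars="row", var_name="column", value_name="value")
)

# Attach the group of each original column, keeping the column labels unique.
long["group"] = long["column"].map(dct)

# Sum per (row, group), then pivot the groups back into columns.
sums = long.groupby(["row", "group"])["value"].sum().unstack()

# Prefix the group names as requested in the question and join on the index.
df_new = df.join(sums.add_prefix("total_"))
print(df_new)
```

The grouping logic is the same as in the stack() version; only the reshaping step differs.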

not_speshal · Accepted Answer · 2021-11-22 19:56:03Z

1. map the column names to their groups
2. groupby the resulting values along axis=1 and sum
3. add_prefix to the columns for naming
4. join to df

>>> df.join(df.groupby(df.columns.map(dct), axis=1).sum().add_prefix("total_"))

   a  b  c  d  total_bot  total_top
0  0  1  0  1          2          0
1  1  1  0  0          1          1
2  0  1  0  1          2          0
3  1  1  0  1          2          1
4  1  1  0  1          2          1
5  1  1  1  0          1          2
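
As far as I know, groupby(..., axis=1) is deprecated in recent pandas releases, so the one-liner above may warn or fail depending on the installed version. A minimal sketch of the same idea that avoids axis=1 is to transpose, group the rows, and transpose back. The variable names totals and result are illustrative.

```python
import pandas as pd

d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)

# After transposing, the original columns a..d become the row index,
# so an ordinary row-wise groupby on the mapped labels does the job.
totals = (
    df.T
      .groupby(df.columns.map(dct))  # group labels: top, bot, top, bot
      .sum()
      .T                             # back to one row per original row
      .add_prefix("total_")
)

result = df.join(totals)
print(result)
```

Transposing copies the data, so for very wide or very large frames this may cost more memory than the axis=1 form.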

sammywemmy · Accepted Answer · 2021-11-22 20:15:32Z

Group the columns by dct and join the result back:

grouped = df.groupby(dct, axis = 1).sum().add_prefix('total_')

pd.concat([df, grouped], axis = 1)
 
   a  b  c  d  total_bot  total_top
0  0  1  0  1          2          0
1  1  1  0  0          1          1
2  0  1  0  1          2          0
3  1  1  0  1          2          1
4  1  1  0  1          2          1
5  1  1  1  0          1          2
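
The question also asks for a function that takes the DataFrame and the name of a group as inputs. None of the answers wraps the logic that way, so here is a hedged sketch of one possible shape. The function name add_group_total and its parameters frame, mapping and group are hypothetical, chosen only for illustration.

```python
import pandas as pd

def add_group_total(frame, mapping, group):
    """Return a copy of frame with a total_<group> column added.

    mapping assigns column names to group names, as dct does in the question.
    """
    # Columns of the frame that belong to the requested group.
    cols = [col for col in frame.columns if mapping.get(col) == group]
    # Row-wise sum over just those columns.
    return frame.assign(**{f"total_{group}": frame[cols].sum(axis=1)})

d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)

with_top = add_group_total(df, dct, "top")
with_both = add_group_total(with_top, dct, "bot")
print(with_both)
```

Because the new column name is not a key in the mapping, calling the function a second time for another group leaves the previously added total out of the sum.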
