0
import pandas as pd

d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)

I have a DataFrame and a dictionary that assigns each column to a group. I'm having trouble writing a function that adds a total (sum) column for each group. For the first two rows, the output should be:

a  b  c  d  total_top  total_bot
0  1  0  1          0          2
1  1  0  0          1          1

and so on. Ideally, the function would take the DataFrame and the name of the group as inputs.

  • I don't understand this. How are you calculating the sums? Commented Nov 22, 2021 at 19:42
  • In the dictionary, a and c have the value 'top', so in the DataFrame I want to sum columns a and c. Commented Nov 22, 2021 at 19:48

3 Answers

3

Try the following -

sums = df.rename(columns=dct).stack().groupby(level=(0,1)).sum().unstack()
df_new = df.join(sums)
df_new
   a  b  c  d  bot  top
0  0  1  0  1    2    0
1  1  1  0  0    1    1
2  0  1  0  1    2    0
3  1  1  0  1    2    1
4  1  1  0  1    2    1
5  1  1  1  0    1    2

Explanation -

  1. df.rename(columns=dct) renames each column to its group label (top or bot) from the dictionary dct, so the frame ends up with duplicate column names.
df.rename(columns=dct)

   top  bot  top  bot
0    0    1    0    1
1    1    1    0    0
2    0    1    0    1
3    1    1    0    1
4    1    1    0    1
5    1    1    1    0
  2. stack() then moves the column labels into an inner index level, giving a single Series of 1s and 0s with a two-level index (row, group).
df.rename(columns=dct).stack()

0  top    0
   bot    1
   top    0
   bot    1
1  top    1
   bot    1
   top    0
   bot    0
2  top    0
   bot    1
   top    0
   bot    1
3  top    1
   bot    1
   top    0
   bot    1
4  top    1
   bot    1
   top    0
   bot    1
5  top    1
   bot    1
   top    1
   bot    0
dtype: int64
  3. groupby(level=(0,1)).sum() groups on level 0 (the row index) and level 1 (the top/bot labels) and sums the values within each group.
df.rename(columns=dct).stack().groupby(level=(0,1)).sum()

0  bot    2
   top    0
1  bot    1
   top    1
2  bot    2
   top    0
3  bot    2
   top    1
4  bot    2
   top    1
5  bot    1
   top    2
dtype: int64
  4. unstack() pivots the group labels back into columns, giving the two columns bot and top.
df.rename(columns=dct).stack().groupby(level=(0,1)).sum().unstack()

   bot  top
0    2    0
1    1    1
2    2    0
3    2    1
4    2    1
5    1    2
  5. Finally, join the original df with this new sums dataframe on the index.
df.join(sums)

   a  b  c  d  bot  top
0  0  1  0  1    2    0
1  1  1  0  0    1    1
2  0  1  0  1    2    0
3  1  1  0  1    2    1
4  1  1  0  1    2    1
5  1  1  1  0    1    2
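
The question also asks for a function that takes the dataframe and the name of a group. A minimal sketch of that, assuming the same d and dct as in the question; the name add_group_total and its parameters are hypothetical, not an existing pandas API:

```python
import pandas as pd

def add_group_total(df, mapping, group):
    """Return a copy of df with a total_<group> column (hypothetical helper)."""
    # Columns whose mapped group equals the requested one; unmapped columns are skipped
    cols = [c for c in df.columns if mapping.get(c) == group]
    # Row-wise sum over just those columns
    return df.assign(**{f"total_{group}": df[cols].sum(axis=1)})

d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)

# Call once per group; columns added earlier are not in dct, so they are ignored
out = add_group_total(add_group_total(df, dct, 'top'), dct, 'bot')
print(out)
```

This handles one group at a time, so it is less compact than the stack/groupby chain, but it adds the columns in whatever order you call it.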

2
  1. Map the column names to their groups with df.columns.map(dct).
  2. Group by the resulting labels along axis=1 and sum.
  3. Use add_prefix to name the new columns total_<group>.
  4. Join the result back to df.
>>> df.join(df.groupby(df.columns.map(dct), axis=1).sum().add_prefix("total_"))

   a  b  c  d  total_bot  total_top
0  0  1  0  1          2          0
1  1  1  0  0          1          1
2  0  1  0  1          2          0
3  1  1  0  1          2          1
4  1  1  0  1          2          1
5  1  1  1  0          1          2
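
As far as I know, the axis=1 argument to groupby was deprecated in pandas 2.1, so the one-liner above may warn or fail on newer versions. A sketch of one way around that, assuming the same df and dct as in the question, is to transpose, group the rows, and transpose back:

```python
import pandas as pd

d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)

# One group label per original column: ['top', 'bot', 'top', 'bot']
labels = df.columns.map(dct)

# After transposing, the original columns are rows, so a plain row-wise groupby works
totals = df.T.groupby(labels).sum().T.add_prefix("total_")

result = df.join(totals)
print(result)
```

The transpose copies the data, which is fine for a small frame like this one but may matter for very wide or mixed-dtype frames.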

0

Group by the dct and join back:

grouped = df.groupby(dct, axis=1).sum().add_prefix('total_')

pd.concat([df, grouped], axis=1)

   a  b  c  d  total_bot  total_top
0  0  1  0  1          2          0
1  1  1  0  0          1          1
2  0  1  0  1          2          0
3  1  1  0  1          2          1
4  1  1  0  1          2          1
5  1  1  1  0          1          2
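
The grouped columns come out sorted by label (bot before top), whereas the question lists total_top first. If the order matters, one possible sketch is to derive the group order from the dictionary and reorder before concatenating. The variable order is my own addition, and this version groups on the transposed frame instead of passing axis=1, on the assumption that you want it to run on newer pandas as well:

```python
import pandas as pd

d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)

# Group labels in order of first appearance across the columns
order = list(dict.fromkeys(dct[col] for col in df.columns))

# Sum per group on the transposed frame, then restore the orientation
totals = df.T.groupby(dct).sum().T

# Reorder the group columns, prefix them, and attach them to the original frame
result = pd.concat([df, totals[order].add_prefix('total_')], axis=1)
print(result)
```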
