I have a dataframe and would like to add sums of specific rows to it. For example, I have

import pandas as pd

df = pd.DataFrame({'prod':['a','a','a','b','b','b','c','c','c'], 'attribute':['x','y','z','x','y','z','x','y','z'],
                  'number1':[1,2,2,3,4,3,5,1,1], 'number2':[10,2,3,3,1,2,3,1,1], 'number3':[1,4,3,5,7,1,3,0,1]})

How can I add, for each prod (a, b and c), a new row with the sums of number1, number2 and number3 over the attributes y and z, so that the result looks like this:

    prod    attribute   number1 number2 number3
0   a       x           1       10      1
1   a       y           2       2       4
2   a       z           2       3       3
3   a       sum_yz      4       5       7
4   b       x           3       3       5
5   b       y           4       1       7
6   b       z           3       2       1
7   b       sum_yz      7       3       8
8   c       x           5       3       3
9   c       y           1       1       0
10  c       z           1       1       1
11  c       sum_yz      2       2       1

6 Answers

You need concat with a conditional groupby.

You can filter the dataframe using isin and add the new attribute column with assign.

First, select the target columns to sum:

cols = [col for col in df.columns if 'number' in col]

df1 = pd.concat(
    [
        df,
        df[df["attribute"].isin(["y", "z"])]
        .groupby("prod")[cols]
        .sum()
        .assign(attribute="sum_yz")
        .reset_index(),
    ]
).sort_values("prod", kind="stable")  # a stable sort keeps each sum_yz row after its product's rows


print(df1)

  prod attribute  number1  number2  number3
0    a         x        1       10        1
1    a         y        2        2        4
2    a         z        2        3        3
0    a    sum_yz        4        5        7
3    b         x        3        3        5
4    b         y        4        1        7
5    b         z        3        2        1
1    b    sum_yz        7        3        8
6    c         x        5        3        3
7    c         y        1        1        0
8    c         z        1        1        1
2    c    sum_yz        2        2        1
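
If the same step is needed for other attribute combinations, the concat/groupby idea above can be wrapped in a function. This is only a sketch: add_subtotal_rows and its parameters are hypothetical names, it assumes every numeric column should be summed, and kind="stable" is an addition so that each subtotal row stays below its product's original rows.

```python
import pandas as pd


def add_subtotal_rows(df, group_col, label_col, labels, new_label):
    """Hypothetical helper: append one subtotal row per group.

    The subtotal sums every numeric column over the rows whose
    label_col value is in labels.
    """
    value_cols = list(df.select_dtypes("number").columns)
    subtotals = (
        df[df[label_col].isin(labels)]
        .groupby(group_col, as_index=False)[value_cols]
        .sum()
        .assign(**{label_col: new_label})
    )
    # A stable sort keeps the original rows ahead of the appended
    # subtotal row within each group.
    return pd.concat([df, subtotals]).sort_values(
        group_col, kind="stable", ignore_index=True
    )


df = pd.DataFrame({'prod': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'attribute': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
                   'number1': [1, 2, 2, 3, 4, 3, 5, 1, 1],
                   'number2': [10, 2, 3, 3, 1, 2, 3, 1, 1],
                   'number3': [1, 4, 3, 5, 7, 1, 3, 0, 1]})

out = add_subtotal_rows(df, "prod", "attribute", ["y", "z"], "sum_yz")
print(out)
```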

You could make a separate DataFrame and append it back to the original DataFrame, something like this (this code is untested):

# Filter to the desired attributes (.copy() avoids a SettingWithCopyWarning
# when the new 'attribute' value is assigned below)
sum_yz = df[df['attribute'].isin(['y', 'z'])].copy()
# Set the new 'attribute' value
sum_yz['attribute'] = 'sum_yz'
# Group by and sum
sum_yz = sum_yz.groupby(['prod', 'attribute']).sum().reset_index()

# Add it to the end of the DataFrame
df = pd.concat([df, sum_yz])
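
The concat above leaves all three sum_yz rows at the bottom of the frame. To get the layout from the question, with each sum row directly under its product, one option is a stable sort on prod afterwards. A minimal sketch on the question's data; the final sort_values call and the result name are additions, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({'prod': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'attribute': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
                   'number1': [1, 2, 2, 3, 4, 3, 5, 1, 1],
                   'number2': [10, 2, 3, 3, 1, 2, 3, 1, 1],
                   'number3': [1, 4, 3, 5, 7, 1, 3, 0, 1]})

# Same steps as the answer: filter, relabel, group and sum
sum_yz = df[df['attribute'].isin(['y', 'z'])].copy()
sum_yz['attribute'] = 'sum_yz'
sum_yz = sum_yz.groupby(['prod', 'attribute']).sum().reset_index()

# Added step: a stable sort on prod moves each sum row under its product
# while keeping the original x, y, z order within each product
result = pd.concat([df, sum_yz]).sort_values('prod', kind='stable', ignore_index=True)
print(result)
```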

You can use df.groupby() and then combine the groupby result with the original df:

# Create groupby DataFrame
df_grp = df[df['attribute'].isin(['y', 'z'])].groupby(['prod']).sum()
df_grp.reset_index(inplace=True)
df_grp['attribute'] = 'sum_yz'

# Combine with original dataframe
df = pd.concat([df, df_grp])
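
Note that from pandas 2.0 on, groupby(...).sum() defaults to numeric_only=False, so the attribute column is also aggregated (its strings are concatenated) before being overwritten with 'sum_yz'. A sketch that sums only the numeric columns instead, assuming all numeric columns should be totalled (num_cols and combined are names introduced here):

```python
import pandas as pd

df = pd.DataFrame({'prod': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'attribute': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
                   'number1': [1, 2, 2, 3, 4, 3, 5, 1, 1],
                   'number2': [10, 2, 3, 3, 1, 2, 3, 1, 1],
                   'number3': [1, 4, 3, 5, 7, 1, 3, 0, 1]})

# Aggregate only the numeric columns, so non-numeric ones such as
# 'attribute' are never summed
num_cols = df.select_dtypes('number').columns.tolist()
df_grp = (df[df['attribute'].isin(['y', 'z'])]
          .groupby('prod', as_index=False)[num_cols]
          .sum())
df_grp['attribute'] = 'sum_yz'

# The sum rows are appended after the original nine rows
combined = pd.concat([df, df_grp], ignore_index=True)
print(combined.tail(3))
```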

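You can use pandas concat after the groupby:

cols = ["number1", "number2", "number3"]
# Boolean key (True for y/z rows), renamed so it does not clash with the 'attribute' column
mask = df["attribute"].isin(["y", "z"]).rename("is_yz")
result = df.groupby(["prod", mask])[cols].sum().xs(True, level="is_yz")
result = result.reset_index()
result.insert(1, "attribute", "sum_yz")
pd.concat([df, result]).sort_values("prod", kind="stable", ignore_index=True)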

   prod attribute  number1  number2  number3
0     a         x        1       10        1
1     a         y        2        2        4
2     a         z        2        3        3
3     a    sum_yz        4        5        7
4     b         x        3        3        5
5     b         y        4        1        7
6     b         z        3        2        1
7     b    sum_yz        7        3        8
8     c         x        5        3        3
9     c         y        1        1        0
10    c         z        1        1        1
11    c    sum_yz        2        2        1
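
Grouping by a boolean Series as the second key splits each product into a True group (attribute y or z) and a False group (everything else); the answer keeps only the True part. A sketch of that intermediate result; the is_yz name is introduced here so the mask does not collide with the existing attribute column:

```python
import pandas as pd

df = pd.DataFrame({'prod': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'attribute': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
                   'number1': [1, 2, 2, 3, 4, 3, 5, 1, 1],
                   'number2': [10, 2, 3, 3, 1, 2, 3, 1, 1],
                   'number3': [1, 4, 3, 5, 7, 1, 3, 0, 1]})

cols = ['number1', 'number2', 'number3']
mask = df['attribute'].isin(['y', 'z']).rename('is_yz')

# Two index levels: prod and is_yz (False sorts before True)
grouped = df.groupby(['prod', mask])[cols].sum()
print(grouped)

# Cross-sections on the boolean level: True gives the y+z sums,
# False gives the remaining (x) rows
yz = grouped.xs(True, level='is_yz')
others = grouped.xs(False, level='is_yz')
print(yz)
print(others)
```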

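One idea uses dictionaries, but it is slower for large DataFrames:

def f(x):
    # Sum the number columns over the 'y' and 'z' rows of this group
    d = x.loc[x['attribute'].isin(['y', 'z']), ['number1', 'number2', 'number3']].sum().to_dict()
    d1 = {'prod': x.name, 'attribute': 'sum_yz'}
    # DataFrame.append was removed in pandas 2.0, so concat a one-row frame instead
    return pd.concat([x, pd.DataFrame([{**d1, **d}])], ignore_index=True)

df = df.groupby('prod', sort=False).apply(f).reset_index(drop=True)
print(df)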
   prod attribute  number1  number2  number3
0     a         x        1       10        1
1     a         y        2        2        4
2     a         z        2        3        3
3     a    sum_yz        4        5        7
4     b         x        3        3        5
5     b         y        4        1        7
6     b         z        3        2        1
7     b    sum_yz        7        3        8
8     c         x        5        3        3
9     c         y        1        1        0
10    c         z        1        1        1
11    c    sum_yz        2        2        1
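
The same per-group logic can also be written as an explicit loop over the groupby object, collecting the pieces and calling pd.concat once at the end. This avoids groupby.apply, whose handling of the grouping column has been changing in recent pandas releases. A sketch, assuming the three number columns are the ones to sum (pieces and totals are names introduced here):

```python
import pandas as pd

df = pd.DataFrame({'prod': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'attribute': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
                   'number1': [1, 2, 2, 3, 4, 3, 5, 1, 1],
                   'number2': [10, 2, 3, 3, 1, 2, 3, 1, 1],
                   'number3': [1, 4, 3, 5, 7, 1, 3, 0, 1]})

cols = ['number1', 'number2', 'number3']
pieces = []
# sort=False keeps the products in order of first appearance
for prod, group in df.groupby('prod', sort=False):
    totals = group.loc[group['attribute'].isin(['y', 'z']), cols].sum()
    sum_row = pd.DataFrame([{'prod': prod, 'attribute': 'sum_yz', **totals.to_dict()}])
    # Original rows of this product, then its sum row
    pieces.extend([group, sum_row])

result = pd.concat(pieces, ignore_index=True)
print(result)
```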

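Alternatively, if the rows may be sorted by product: filter with Series.isin, aggregate with sum, add the result to the original, replace the missing attribute values with DataFrame.fillna, and finally sort with DataFrame.sort_values, using ignore_index=True for a default index:

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
df = (pd.concat([df, df[df['attribute'].isin(['y', 'z'])]
                        .groupby('prod', as_index=False)[['number1', 'number2', 'number3']]
                        .sum()])
        .fillna({'attribute': 'sum_yz'})
        .sort_values('prod', kind='stable', ignore_index=True))
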
print (df)
   prod attribute  number1  number2  number3
0     a         x        1       10        1
1     a         y        2        2        4
2     a         z        2        3        3
3     a    sum_yz        4        5        7
4     b         x        3        3        5
5     b         y        4        1        7
6     b         z        3        2        1
7     b    sum_yz        7        3        8
8     c         x        5        3        3
9     c         y        1        1        0
10    c         z        1        1        1
11    c    sum_yz        2        2        1

This is simple and works fine:

dr = df[df['attribute'] != 'x'].groupby('prod').sum().reset_index()
dr['attribute'] = 'sum_yz'
result = pd.concat([df, dr]).sort_values('prod', kind='stable')
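
One caveat: filtering with != 'x' selects exactly y and z only as long as x, y and z are the only attributes. A quick check with a hypothetical extra attribute w shows how the two filters diverge:

```python
import pandas as pd

df = pd.DataFrame({'prod': ['a', 'a', 'a', 'a'],
                   'attribute': ['x', 'y', 'z', 'w'],  # 'w' is a hypothetical extra attribute
                   'number1': [1, 2, 2, 100]})

# "Everything except x" also picks up the w row
not_x = df[df['attribute'] != 'x'].groupby('prod')['number1'].sum()
# An explicit list keeps the sum restricted to y and z
only_yz = df[df['attribute'].isin(['y', 'z'])].groupby('prod')['number1'].sum()
print(not_x['a'], only_yz['a'])
```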
