1

this dataframe has been reordered to assure that the categories in column 'x' are grouped together with column 'y' in an ascending order. once I performed these changes, I am trying to sum column 'y' and 'z' for every category 'x' so that each row is always summed to the previous rows within the same category. The output displays the correct result but I am now trying to find a way to simplify the last 12 lines instead of writing a line for each row of the dataframe.

import numpy as np
import pandas as pd
data=[['A', 1,2 ],
        ['B', 5,5],
        ['C', 4,6]
        ,['A', 6,5],
        ['B',9,9],
        ['C', 7,9]
        ,['A', 2,3],
        ['B', 5,1],
        ['C',2,10]
        ,['B', 8,2],
        ['B', 5,4],
        ['C', 8,5 ]]
df = pd.DataFrame(data, columns=['x','y','z'])
df = df.sort_values(by =['x', 'y'], ascending=True)
pd.set_option('display.max_rows',14)
print(df.head(14))
print (sum(df.iloc[0:1]['y'] + df.iloc[0:1]['z']))
print (sum(df.iloc[0:2]['y'] + df.iloc[0:2]['z']))
print (sum(df.iloc[0:3]['y'] + df.iloc[0:3]['z']))

print (sum(df.iloc[3:4]['y'] + df.iloc[3:4]['z']))
print (sum(df.iloc[3:5]['y'] + df.iloc[3:5]['z']))
print (sum(df.iloc[3:6]['y'] + df.iloc[3:6]['z']))
print (sum(df.iloc[3:7]['y'] + df.iloc[3:7]['z']))
print (sum(df.iloc[3:8]['y'] + df.iloc[3:8]['z']))

print (sum(df.iloc[8:9]['y'] + df.iloc[8:9]['z']))
print (sum(df.iloc[8:10]['y'] + df.iloc[8:10]['z']))      
print (sum(df.iloc[8:11]['y'] + df.iloc[8:11]['z']))
print (sum(df.iloc[8:12]['y'] + df.iloc[8:12]['z']))
1
  • Do you just want the final sums themselves and not the whole cumulative sum? Commented Jun 21, 2021 at 22:29

2 Answers 2

3

Use cumsum and sum the result along the columns:

>>> df.groupby('x').cumsum().sum(axis=1)
0      3
6      8
3     19
1     10
7     16
10    25
9     35
4     53
8     12
2     22
5     38
11    51
dtype: int64
Sign up to request clarification or add additional context in comments.

Comments

2

I think what you're looking for is something like:

>>> df.groupby('x')[['y', 'z']].cumsum().eval('sum_yz = y + z')

     y   z  sum_yz
0    1   2       3
6    3   5       8
3    9  10      19
1    5   5      10
7   10   6      16
10  15  10      25
9   23  12      35
4   32  21      53
8    2  10      12
2    6  16      22
5   13  25      38
11  21  30      51

You can make it simpler to understand by having a second line assigning a new column as df['sum_yz'] = df['y'] + df['z'] but df.eval works here. If you want to return this with the rest of the data, I would drop the y and z columns in the new data frame, then pd.concat passing axis=1 with the original data frame.

If you just want the final sums themselves, that is even easier:

>>> df.groupby('x')[['y', 'z']].sum().eval('sum_yz = y + z')

    y   z  sum_yz
x                
A   9  10      19
B  32  21      53
C  21  30      51

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.