python: iteration through a grouped dataframe

Question

this dataframe has been reordered to assure that the categories in column 'x' are grouped together with column 'y' in an ascending order. once I performed these changes, I am trying to sum column 'y' and 'z' for every category 'x' so that each row is always summed to the previous rows within the same category. The output displays the correct result but I am now trying to find a way to simplify the last 12 lines instead of writing a line for each row of the dataframe.

import numpy as np
import pandas as pd
data=[['A', 1,2 ],
        ['B', 5,5],
        ['C', 4,6]
        ,['A', 6,5],
        ['B',9,9],
        ['C', 7,9]
        ,['A', 2,3],
        ['B', 5,1],
        ['C',2,10]
        ,['B', 8,2],
        ['B', 5,4],
        ['C', 8,5 ]]
df = pd.DataFrame(data, columns=['x','y','z'])
df = df.sort_values(by =['x', 'y'], ascending=True)
pd.set_option('display.max_rows',14)
print(df.head(14))
print (sum(df.iloc[0:1]['y'] + df.iloc[0:1]['z']))
print (sum(df.iloc[0:2]['y'] + df.iloc[0:2]['z']))
print (sum(df.iloc[0:3]['y'] + df.iloc[0:3]['z']))

print (sum(df.iloc[3:4]['y'] + df.iloc[3:4]['z']))
print (sum(df.iloc[3:5]['y'] + df.iloc[3:5]['z']))
print (sum(df.iloc[3:6]['y'] + df.iloc[3:6]['z']))
print (sum(df.iloc[3:7]['y'] + df.iloc[3:7]['z']))
print (sum(df.iloc[3:8]['y'] + df.iloc[3:8]['z']))

print (sum(df.iloc[8:9]['y'] + df.iloc[8:9]['z']))
print (sum(df.iloc[8:10]['y'] + df.iloc[8:10]['z']))      
print (sum(df.iloc[8:11]['y'] + df.iloc[8:11]['z']))
print (sum(df.iloc[8:12]['y'] + df.iloc[8:12]['z']))

Do you just want the final sums themselves and not the whole cumulative sum? — ifly6
– ifly6, Commented Jun 21, 2021 at 22:29

Corralien · Accepted Answer · 2021-06-21 22:30:10Z

3

Use cumsum and sum the result along the columns:

>>> df.groupby('x').cumsum().sum(axis=1)
0      3
6      8
3     19
1     10
7     16
10    25
9     35
4     53
8     12
2     22
5     38
11    51
dtype: int64

answered Jun 21, 2021 at 22:30

Corralien

121k8 gold badges44 silver badges69 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

ifly6 · Accepted Answer · 2021-06-21 22:33:59Z

I think what you're looking for is something like:

>>> df.groupby('x')[['y', 'z']].cumsum().eval('sum_yz = y + z')

     y   z  sum_yz
0    1   2       3
6    3   5       8
3    9  10      19
1    5   5      10
7   10   6      16
10  15  10      25
9   23  12      35
4   32  21      53
8    2  10      12
2    6  16      22
5   13  25      38
11  21  30      51

You can make it simpler to understand by having a second line assigning a new column as df['sum_yz'] = df['y'] + df['z'] but df.eval works here. If you want to return this with the rest of the data, I would drop the y and z columns in the new data frame, then pd.concat passing axis=1 with the original data frame.

If you just want the final sums themselves, that is even easier:

>>> df.groupby('x')[['y', 'z']].sum().eval('sum_yz = y + z')

    y   z  sum_yz
x                
A   9  10      19
B  32  21      53
C  21  30      51

Collectives™ on Stack Overflow

python: iteration through a grouped dataframe

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related