
Let's say I have:

df = pd.DataFrame({'a' : [1, 2, 3, 4, 5] , 'b' : ['cat_1', 'cat_1', 'cat_2', 'cat_2', 'cat_2']})

I perform a groupby:

df.groupby(['b']).agg(['count', 'median'])

I would like to iterate through the rows that this call returns, for example:

for row in ?:
    print(row)

should print something like:

('cat_1', 2, 1.5)
('cat_2', 3, 4)

2 Answers

You've misunderstood: df.groupby(['b']).agg(['count', 'median']) returns an in-memory dataframe, not an iterator of groupwise results.

The result you describe is usually computed by selecting the column before aggregating:

res = df.groupby('b')['a'].agg(['count', 'median'])

print(res)

#        count  median
# b                   
# cat_1      2     1.5
# cat_2      3     4.0

Iterating a dataframe is possible via iterrows or, more efficiently, itertuples:

for row in df.groupby('b')['a'].agg(['count', 'median']).itertuples():
    print((row.Index, row.count, row.median))

# ('cat_1', 2, 1.5)
# ('cat_2', 3, 4.0)
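
For comparison, a sketch of the iterrows version of the same loop. iterrows yields (index label, row) pairs where each row is a Series; because the row mixes the integer count with the float median, the count comes back upcast to a float:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': ['cat_1', 'cat_1', 'cat_2', 'cat_2', 'cat_2']})
res = df.groupby('b')['a'].agg(['count', 'median'])

# iterrows yields (index_label, row_as_Series) pairs;
# each row Series is upcast to float64 to hold both columns
for idx, row in res.iterrows():
    print((idx, row['count'], row['median']))
```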

If you are looking to calculate lazily, iterate a groupby object and perform your calculations on each group independently. For data that fits comfortably in memory, you should expect this to be slower than iterating a dataframe of results.

for key, group in df.groupby('b'):
    print((key, group['a'].count(), group['a'].median()))

# ('cat_1', 2, 1.5)
# ('cat_2', 3, 4.0)
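
That per-group loop can also be wrapped in a generator so results are produced lazily, one group at a time (a sketch; `group_stats` is a made-up helper name):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': ['cat_1', 'cat_1', 'cat_2', 'cat_2', 'cat_2']})

def group_stats(frame, by, col):
    """Yield (key, count, median) for each group, computed on demand."""
    for key, group in frame.groupby(by):
        yield (key, group[col].count(), group[col].median())

for row in group_stats(df, 'b', 'a'):
    print(row)
```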

If you do face memory issues, consider dask.dataframe for such tasks.


1 Comment

Thanks for the help!

This will do the trick:

for item in df.groupby(['b']).agg(['count', 'median']).reset_index().values:
    # Perform operation on 'item' ...
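
Expanded slightly, assuming the same df as in the question: `reset_index()` turns the group key back into an ordinary column, and `.values` then yields each row as a plain NumPy array (object dtype here, since it mixes the string key with numbers), so column labels are lost:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': ['cat_1', 'cat_1', 'cat_2', 'cat_2', 'cat_2']})

# Each 'item' is a plain NumPy array: [key, count, median]
for item in df.groupby(['b']).agg(['count', 'median']).reset_index().values:
    key, count, median = item
    print((key, count, median))
```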

