pandas groupby: can I select an agg function by one level of a column MultiIndex?

Question

I have a pandas DataFrame with a MultiIndex of columns:

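import numpy as np
import pandas as pd

# Two top-level column groups ('a', 'b'), each with sub-columns 0, 1, 2
columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
# The row index contains duplicates on purpose: rows are grouped by it below
df = pd.DataFrame(np.random.randn(4, 6),
                  index=[0, 0, 1, 1],
                  columns=columns)
print(df)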

          a                             b                    
          0         1         2         0         1         2
0  0.582804  0.753118 -0.900950 -0.914657 -0.333091 -0.965912
0  0.498002 -0.842624  0.155783  0.559730 -0.300136 -1.211412
1  0.727019  1.522160  1.679025  1.738350  0.593361  0.411907
1  1.253759 -0.806279 -2.177582 -0.099210 -0.839822 -0.211349

I want to group by the index, and use the 'min' aggregation on the a columns, and the 'sum' aggregation on the b columns.

I know I can do this by creating a dict that specifies the agg function for each column:

agg_dict = {'a': 'min', 'b': 'sum'}
full_agg_dict = {(c, i): agg_dict[c] for c in ['a', 'b'] for i in range(3)}
print(df.groupby(level=0).agg(full_agg_dict))

          a                             b                    
          0         1         2         0         1         2
0  0.498002 -0.842624 -0.900950 -0.354927 -0.633227 -2.177324
1  0.727019 -0.806279 -2.177582  1.639140 -0.246461  0.200558

Is there a simpler way? It seems like there should be a way to do this with agg_dict without using full_agg_dict.

I don't know if there's anything simpler. You could probably just make the dictionary more flexible (and easier to read) in case the columns don't follow a perfect pattern: {x: agg_dict[x[0]] for x in df.columns}
– ALollz, Commented Sep 5, 2019 at 18:50
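
The comprehension suggested in the comment above can be sketched as follows. This is only an illustration: it swaps the random data for deterministic values from np.arange so the result is easy to check by hand, and it assumes every top-level column label has an entry in agg_dict.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
# Deterministic values (an assumption for illustration, not the original data)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                  index=[0, 0, 1, 1],
                  columns=columns)

agg_dict = {'a': 'min', 'b': 'sum'}
# Each column label is a tuple; its first element picks the aggregation
full_agg_dict = {col: agg_dict[col[0]] for col in df.columns}

result = df.groupby(level=0).agg(full_agg_dict)
print(result)
```

Because the dictionary is built from df.columns itself, it keeps working if the sub-columns under 'a' and 'b' differ in number or name.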

Quang Hoang · Accepted Answer · 2019-09-05 17:40:56Z


I would use your approach as well. But here's another way that (should) work:

(df.stack(level=1)
   .groupby(level=[0,1])
   .agg({'a':'min','b':'sum'})
   .unstack(-1)
)

For some reason groupby(level=[0,1]) doesn't work for me, so I came up with:

(df.stack(level=1)
   .reset_index()
   .groupby(['level_0','level_1'])
   .agg({'a':'min','b':'sum'})
   .unstack('level_1')
)
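
If the stack/unstack round trip is a concern, one more possibility (not part of the original answer, just a sketch) is to aggregate each top-level column block separately and concatenate the pieces. The data below is deterministic for illustration, and the names parts and result are hypothetical.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
# Deterministic values (an assumption for illustration, not the original data)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                  index=[0, 0, 1, 1],
                  columns=columns)

agg_dict = {'a': 'min', 'b': 'sum'}

# df[[top]] keeps the top level in the column labels, so the pieces
# line up again as MultiIndex columns once they are concatenated
parts = [df[[top]].groupby(level=0).agg(func)
         for top, func in agg_dict.items()]
result = pd.concat(parts, axis=1)
print(result)
```

This runs one groupby per top-level label, so it trades a little speed for not having to spell out every (label, sub-column) pair.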


3 Comments

stahamtan Over a year ago

The first solution does not work for this DataFrame because the original DataFrame's index is a 1D array of shape (4,). If a 2D array (index=[[0, 0, 1, 1]]) were passed instead, it would work just fine.

Quang Hoang Over a year ago

@ALollz agreed. That's what I said at the very beginning as well.

user3483203 Over a year ago

@SIA it's a bug with stack. Codes are created incorrectly (which are then used in the groupby) when the index has duplicate values. stack currently just uses new_codes = [np.arange(N).repeat(levsize)] to generate new codes, which ignores dupes.
