Need help turning pandas dataframe into multiindex by grouping just one column.

Question

I have a pandas dataframe df that looks like this:

>>>df
group A B C
1     1 2 3
1     2 3 6
1     4 9 9
2     8 1 2
2     5 6 4
3     6 5 7

I would like it multi-indexed so it looks like

group 
      A B C
1     1 2 3
      2 3 6
      4 9 9
2     8 1 2
      5 6 4
3     6 5 7

I'd like to be able to access each group number and get back a DataFrame of just the values for that group. What I mean is, if I type df[0], I get

>>>df[0]
A B C
1 2 3
2 3 6
4 9 9

and I can use the usual functions on it, like taking the mean via df[0].mean().

I'm sure this is possible, but the solutions I found in the pandas help pages and on forums are all for people who have already created multi-indexed DataFrames from tuples.

cs95 · Accepted Answer · 2018-04-20 13:24:01Z


set_index will do this for you.

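import pandas as pd

# Move 'group' into the index, then append a per-group row counter as a second level
df = df.set_index('group').set_index(
    df.groupby('group').cumcount(), append=True
)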

df
         A  B  C
group           
1     0  1  2  3
      1  2  3  6
      2  4  9  9
2     0  8  1  2
      1  5  6  4
3     0  6  5  7
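To see where the inner level comes from, here is a minimal sketch that rebuilds the sample frame from the question by hand (assuming every column holds plain integers) and prints the cumcount values that end up as the second index level. cumcount numbers the rows within each group starting at 0, which is what makes every (group, position) pair unique.

```python
import pandas as pd

# Sample data reconstructed from the question's printout (assumed integer columns)
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 3],
    "A": [1, 2, 4, 8, 5, 6],
    "B": [2, 3, 9, 1, 6, 5],
    "C": [3, 6, 9, 2, 4, 7],
})

# Position of each row within its group: 0, 1, 2, ... restarting for every group
inner = df.groupby("group").cumcount()
print(inner.tolist())

# Appending that counter as a second level gives each row a unique (group, position) key
indexed = df.set_index("group").set_index(inner, append=True)
print(indexed.index.tolist())
```

The first print should show [0, 1, 2, 0, 1, 0], matching the inner level in the output above.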

Alternatively, build a MultiIndex object yourself and assign it to df.index. This modifies df in place instead of creating intermediate copies through chained set_index calls, so it should be lighter on memory.

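# Start from the original df (with its 'group' column), not the result above
i = df['group']
j = df.groupby(df.pop('group')).cumcount()  # pop() also removes 'group' from df

df.index = pd.MultiIndex.from_arrays([i, j])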
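A self-contained sketch of the MultiIndex.from_arrays route, again rebuilding the question's sample data by hand (an assumption). It is a slight variation that pops 'group' once and reuses the popped Series for the outer level, so only one reference to the column is needed.

```python
import pandas as pd

# Sample data reconstructed from the question's printout
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 3],
    "A": [1, 2, 4, 8, 5, 6],
    "B": [2, 3, 9, 1, 6, 5],
    "C": [3, 6, 9, 2, 4, 7],
})

# Remove 'group' from the columns and keep it as a Series for the outer level
groups = df.pop("group")
# Position of each row within its group, aligned with df on the original row labels
position = df.groupby(groups).cumcount()

# Attach both levels as the new row index in a single assignment
df.index = pd.MultiIndex.from_arrays([groups, position], names=["group", None])
print(df)
```

The resulting index should match the one produced by the chained set_index version.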

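And now, df.xs(1) (or equivalently df.loc[1]) returns a DataFrame holding just the rows of group 1, so df.xs(1).mean() works the way you wanted.

Just Like That™.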
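A minimal sketch of pulling one group back out of the MultiIndexed frame and taking its mean, assuming the index was built with the set_index approach and the sample data is rebuilt from the question. Plain df[1] would look up a column labelled 1 rather than a row group, which is why an accessor such as xs or loc is used here.

```python
import pandas as pd

# Sample data reconstructed from the question's printout
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 3],
    "A": [1, 2, 4, 8, 5, 6],
    "B": [2, 3, 9, 1, 6, 5],
    "C": [3, 6, 9, 2, 4, 7],
})
# Outer level 'group', inner level a per-group row counter
df = df.set_index("group").set_index(df.groupby("group").cumcount(), append=True)

# xs selects one key on the named level and drops that level from the result
group1 = df.xs(1, level="group")
# With a scalar, .loc on a MultiIndex matches against the outermost level
group1_loc = df.loc[1]

print(group1)
print(group1.mean())
```

Both selections return the same three-row DataFrame, so either spelling can be chained with .mean(), .sum() and so on.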

If you don't fancy the xs at the end, there's certainly the option of splitting your DataFrame into groups and dumping each one into a dictionary.

Iterating over a groupby object yields (key, group) pairs, much like itertools.groupby, so a dict comprehension does the job:

# Starts from the original df, which still has its 'group' column
df_dict = {k: g for k, g in df.drop(columns='group').groupby(df['group'])}
df_dict[1]

   A  B  C
0  1  2  3
1  2  3  6
2  4  9  9

Note that this is no longer a single DataFrame, but a dictionary of them.
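A sketch of the dictionary route, plus two related groupby calls that may cover the same need without building a dict: get_group for a single group, and mean on the grouped object for every group's column means at once. The sample data is rebuilt from the question as an assumption.

```python
import pandas as pd

# Sample data reconstructed from the question's printout
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 3],
    "A": [1, 2, 4, 8, 5, 6],
    "B": [2, 3, 9, 1, 6, 5],
    "C": [3, 6, 9, 2, 4, 7],
})

# Group the value columns by the 'group' column; drop(columns=...) also works in pandas 2.x
grouped = df.drop(columns="group").groupby(df["group"])

# One sub-DataFrame per group value, keyed by that value
df_dict = {k: g for k, g in grouped}

# Fetch a single group without building the whole dict
group1 = grouped.get_group(1)

# Column means for every group in one call, indexed by group value
means = grouped.mean()
print(means)
```

If the end goal is only per-group statistics, the last call is usually the shortest path; the dict is handy when each group needs to be passed around as its own DataFrame.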

edited Apr 20, 2018 at 13:24

answered Apr 20, 2018 at 13:08

cs95



Comments

cs95

Downvoter, please let me know why you've spat on this answer. Your feedback will help me correct any mistakes. Thank you.

AstroBen

This works! BUT!!! It's awfully complicated, as I hope you will agree. It would certainly be nice if there were a much shorter method. It would also be useful to not have to refer to the groupings with df.xs() but via a standard df[column]. By the way, I didn't downvote!

cs95

@MihaiAlexandru-Ionut Thank you, I appreciate your support. You can count on me to return the favour :)

cs95

@AstroBen Edited with a much more efficient solution.

cs95

@AstroBen No, and I will explain why. For indexing MultiIndexes in the manner you want, you will need to provide slice objects. However, slices are not hashable, so this does not work directly. You will need to go through an accessor like loc or xs to extract slices. The alternative would be keeping a dictionary of groupby objects. Wait, let me edit again :p
