
I have a pandas dataframe df that looks like this:

>>>df
group A B C
1     1 2 3
1     2 3 6
1     4 9 9
2     8 1 2
2     5 6 4
3     6 5 7

I would like it multi-indexed so it looks like

group 
      A B C
1     1 2 3
      2 3 6
      4 9 9
2     8 1 2
      5 6 4
3     6 5 7

I'd like to be able to access each group by its number and get back a dataframe of just the values for that group index. For example, if I type df[0], I get

>>>df[0]
A B C
1 2 3
2 3 6
4 9 9

and I can apply the usual functions to it, e.g. take the mean via df[0].mean().

I'm sure this is possible, but the solutions I've found in the pandas help pages and on forums seem to be for people who have already created multi-indexed dataframes from tuples.

1 Answer

set_index will do this for you.

df = df.set_index('group').set_index(
    df.groupby('group').cumcount(), append=True
)

df
         A  B  C
group           
1     0  1  2  3
      1  2  3  6
      2  4  9  9
2     0  8  1  2
      1  5  6  4
3     0  6  5  7
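A self-contained version of the above, assuming the sample data from the question's table. It then pulls out group 1 with .loc on the outer level and takes its mean, which is the closest equivalent here to the df[0].mean() the question asks for. The names indexed and group1 are just illustrative.

```python
import pandas as pd

# Sample data copied from the question, with 'group' as an ordinary column
df = pd.DataFrame({
    'group': [1, 1, 1, 2, 2, 3],
    'A': [1, 2, 4, 8, 5, 6],
    'B': [2, 3, 9, 1, 6, 5],
    'C': [3, 6, 9, 2, 4, 7],
})

# Illustrative name. Outer level: group label; inner level: row position within its group
indexed = df.set_index('group').set_index(
    df.groupby('group').cumcount(), append=True
)

# .loc with only the outer label returns that group's rows, with the outer level dropped
group1 = indexed.loc[1]
print(group1)
print(group1.mean())  # column means for group 1 only
```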

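Alternatively, create a MultiIndex object and assign it to df.index. This modifies df in place instead of building intermediate DataFrames through two chained set_index calls, so it should be lighter on memory. Start from the original df, with 'group' still a column, because df.pop removes that column.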

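import pandas as pd

i = df['group']                              # outer level: the group labels
j = df.groupby(df.pop('group')).cumcount()   # pop drops the column; cumcount numbers rows within each group

df.index = pd.MultiIndex.from_arrays([i, j])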

And now,

df.xs(1)

   A  B  C
0  1  2  3
1  2  3  6
2  4  9  9

Just Like That™.
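A sketch that checks the in-place MultiIndex.from_arrays route against the chained set_index route, assuming the question's sample data, and then computes every group's mean at once by grouping on the outer index level instead of selecting groups one by one. The names expected and means are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'group': [1, 1, 1, 2, 2, 3],
    'A': [1, 2, 4, 8, 5, 6],
    'B': [2, 3, 9, 1, 6, 5],
    'C': [3, 6, 9, 2, 4, 7],
})

# Illustrative reference result from set_index, computed before df is modified
expected = df.set_index('group').set_index(
    df.groupby('group').cumcount(), append=True
)

# In-place route: pop removes 'group' from df and also feeds the groupby
i = df['group']
j = df.groupby(df.pop('group')).cumcount()
df.index = pd.MultiIndex.from_arrays([i, j])

# Raises AssertionError if the two routes disagree
pd.testing.assert_frame_equal(df, expected)

# Illustrative name: one row of column means per group, keyed by the outer level
means = df.groupby(level=0).mean()
print(means)
```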


If you don't fancy the xs at the end, there's certainly the option of splitting your DataFrame into groups and dumping each one into a dictionary.

Iterating over a groupby object yields (key, group) pairs, much like itertools.groupby, so the groups can be collected with a dict comprehension:

df_dict = {k: g for k, g in df.drop(columns='group').groupby(df['group'])}
df_dict[1]

   A  B  C
0  1  2  3
1  2  3  6
2  4  9  9

Note that this is no longer a single DataFrame, but a dictionary of them.
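A runnable sketch of the dictionary route, assuming the question's sample data. It uses drop(columns=...) because pandas 2.0 no longer accepts the axis as a second positional argument to drop, and then applies mean() to each piece. The name group_means is illustrative.

```python
import pandas as pd

# Sample data copied from the question, with 'group' as an ordinary column
df = pd.DataFrame({
    'group': [1, 1, 1, 2, 2, 3],
    'A': [1, 2, 4, 8, 5, 6],
    'B': [2, 3, 9, 1, 6, 5],
    'C': [3, 6, 9, 2, 4, 7],
})

# One DataFrame per group label, with the 'group' column dropped from each piece
df_dict = {k: g for k, g in df.drop(columns='group').groupby(df['group'])}

# Each value is an ordinary DataFrame, so the usual methods apply directly
group_means = {k: g.mean() for k, g in df_dict.items()}  # illustrative name
print(df_dict[1])
print(group_means[1])
```

Each piece keeps the original row labels (0, 1, 2 for group 1), matching the output shown above.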


Comments

Downvoter, please let me know why you've spat on this answer. Your feedback will help me correct any mistakes. Thank you.
This works! BUT!!! It's awfully complicated, as I hope you will agree. It would certainly be nice if there were a much shorter method. It would also be useful not to have to refer to the groupings with df.xs(), but to use a standard df[column] instead. By the way, I didn't downvote!
@MihaiAlexandru-Ionut Thank you, I appreciate your support. You can count on me to return the favour :)
@AstroBen Edited with a much more efficient solution.
@AstroBen No, and I will explain why. On a DataFrame, df[key] looks up column labels, so selecting a group from the row MultiIndex has to go through an accessor like loc or xs (for example df.loc[1]). The alternative would be keeping a dictionary of DataFrames, one per group. Wait, let me edit again :p
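A quick way to see why plain brackets cannot reach a group here: with a scalar key, DataFrame.__getitem__ looks in the columns, so a group label that only exists in the row index raises KeyError, while loc and xs work on the index. A sketch on the question's sample data with the MultiIndex already built; bracket_ok, via_loc and via_xs are illustrative names.

```python
import pandas as pd

# Question's data with the (group, position) MultiIndex already in place
df = pd.DataFrame(
    {'A': [1, 2, 4, 8, 5, 6], 'B': [2, 3, 9, 1, 6, 5], 'C': [3, 6, 9, 2, 4, 7]},
    index=pd.MultiIndex.from_arrays(
        [[1, 1, 1, 2, 2, 3], [0, 1, 2, 0, 1, 0]], names=['group', None]
    ),
)

# [] with a scalar is a column lookup, and 1 is not a column label
try:
    df[1]
    bracket_ok = True
except KeyError:
    bracket_ok = False

# Row-oriented accessors select the group through the index instead
via_loc = df.loc[1]
via_xs = df.xs(1)
print(bracket_ok)
print(via_loc)
```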