Sorting by one column within the groups of a grouped DataFrame

Question

I have searched and found other questions on this general topic, but I cannot find the answer to this specific question.

I have a dataframe created by appending several dataframes together, followed by groupby and agg procedures.

I have constructed an example DataFrame below by following the same process.

# constructing an example dataframe
import numpy as np
import pandas as pd

rand = np.random.RandomState(1)

df1 = pd.DataFrame({'B': ['subgroup1'] * 6})
df1['date'] = '1-1-2017'
df1['C'] = rand.rand(6)

df2 = pd.DataFrame({'B': ['subgroup2'] * 6})
df2['date'] = '1-1-2017'
df2['C'] = rand.rand(6)

df3 = pd.DataFrame({'B': ['subgroup1'] * 6})
df3['date'] = '1-2-2017'
df3['C'] = rand.rand(6)

df4 = pd.DataFrame({'B': ['subgroup2'] * 6})
df4['date'] = '1-2-2017'
df4['C'] = rand.rand(6)

# stack the four frames; DataFrame.append was removed in pandas 2.0
df7 = pd.concat([df1, df2, df3, df4])
print(df7)

           B      date         C
0  subgroup1  1-1-2017  0.417022
1  subgroup1  1-1-2017  0.720324
2  subgroup1  1-1-2017  0.000114
3  subgroup1  1-1-2017  0.302333
4  subgroup1  1-1-2017  0.146756
5  subgroup1  1-1-2017  0.092339
0  subgroup2  1-1-2017  0.186260
1  subgroup2  1-1-2017  0.345561
2  subgroup2  1-1-2017  0.396767
3  subgroup2  1-1-2017  0.538817
4  subgroup2  1-1-2017  0.419195
5  subgroup2  1-1-2017  0.685220
0  subgroup1  1-2-2017  0.204452
1  subgroup1  1-2-2017  0.878117
2  subgroup1  1-2-2017  0.027388
3  subgroup1  1-2-2017  0.670468
4  subgroup1  1-2-2017  0.417305
5  subgroup1  1-2-2017  0.558690
0  subgroup2  1-2-2017  0.140387
1  subgroup2  1-2-2017  0.198101
2  subgroup2  1-2-2017  0.800745
3  subgroup2  1-2-2017  0.968262
4  subgroup2  1-2-2017  0.313424
5  subgroup2  1-2-2017  0.692323

Next, I group by 2 columns, and add a new column consisting of the mean of column 'C', and a new column counting the values averaged.

# named aggregation (pandas 0.25+); the dict-renaming form of agg() was removed in pandas 1.0
group = df7.groupby(['date', 'B'])['C'].agg(num=len, C_mean='mean')
print(group)

                    num    C_mean
date     B                       
1-1-2017 subgroup1  6.0  0.279815
         subgroup2  6.0  0.428637
1-2-2017 subgroup1  6.0  0.459403
         subgroup2  6.0  0.518874

The DataFrame 'group' is a small example showing the same structure I have so far. In practice there would be a large number of rows in each date group.

I would like to sort the grouped DataFrame 'group' by the values in column 'C_mean' -- but sorted within the groups of the first index level 'date'. The sort should be descending.

So if we look at the two values in column 'C_mean' in the group '1-1-2017', we would see 0.428637 and 0.279815 sorted descending. And likewise in the next date group '1-2-2017', the values in 'C_mean' would be sorted descending -- 0.518874 and 0.459403.

                    num    C_mean
date     B                       
1-1-2017 subgroup2  6.0  0.428637
         subgroup1  6.0  0.279815
1-2-2017 subgroup2  6.0  0.518874
         subgroup1  6.0  0.459403

I have tried everything I can find to achieve this but, in every case, I have ended up with a sort of the whole column 'C_mean' -- I need to sort within the date groups.
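To illustrate the problem, here is a minimal sketch using a small hypothetical frame (the name 'demo' and its values are made up for illustration, they are not the data above). The values are chosen so that a plain descending sort on 'C_mean' mixes the two date groups together:

```python
import pandas as pd

# Hypothetical aggregated values, chosen so that the date groups interleave
idx = pd.MultiIndex.from_tuples(
    [('1-1-2017', 'subgroup1'), ('1-1-2017', 'subgroup2'),
     ('1-2-2017', 'subgroup1'), ('1-2-2017', 'subgroup2')],
    names=['date', 'B'],
)
demo = pd.DataFrame({'C_mean': [0.2, 0.6, 0.4, 0.8]}, index=idx)

# Sorting the whole column ignores the 'date' level entirely
whole_column = demo.sort_values('C_mean', ascending=False)
dates_in_order = list(whole_column.index.get_level_values('date'))
print(dates_in_order)
```

The dates come out alternating rather than grouped, which is the behaviour I am trying to avoid.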

Can anybody suggest a solution?

rdh9 · Accepted Answer · 2017-08-10 19:16:40Z

I got no response to this, but I did find a solution. Not very elegant, but it got the job done. I'll post it in case anybody else has a similar problem.

First copy the 'date' level of the index to a new helper column (given a name different from the index level, so that the sort key is unambiguous)

group['date_key'] = group.index.get_level_values('date')

Then sort by the helper column and the 'C_mean' column

group = group.sort_values(['date_key', 'C_mean'], ascending=[True, False])

This produces the required result

                    num    C_mean  date_key
date     B                                 
1-1-2017 subgroup2  6.0  0.428637  1-1-2017
         subgroup1  6.0  0.279815  1-1-2017
1-2-2017 subgroup2  6.0  0.518874  1-2-2017
         subgroup1  6.0  0.459403  1-2-2017

Delete the helper column, which is no longer required

del group['date_key']

                    num    C_mean
date     B                       
1-1-2017 subgroup2  6.0  0.428637
         subgroup1  6.0  0.279815
1-2-2017 subgroup2  6.0  0.518874
         subgroup1  6.0  0.459403
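
As a possible shortcut: in more recent pandas versions (0.23 and later, as far as I know), sort_values accepts index level names alongside column labels, so the temporary column may not be needed at all. The sketch below rebuilds the example data in a compact way and sorts on the 'date' level and the 'C_mean' column in one call. The names 'frames' and 'sorted_group' are just illustrative, and 'count' stands in for len on the assumption that there are no missing values.

```python
import numpy as np
import pandas as pd

rand = np.random.RandomState(1)

# Rebuild the example: four blocks of six rows, drawn in the same order as above
frames = [
    pd.DataFrame({'B': [sub] * 6, 'date': date, 'C': rand.rand(6)})
    for date in ('1-1-2017', '1-2-2017')
    for sub in ('subgroup1', 'subgroup2')
]
df = pd.concat(frames)

# Named aggregation: one output column per keyword
group = df.groupby(['date', 'B'])['C'].agg(num='count', C_mean='mean')

# 'date' is an index level and 'C_mean' is a column; both can be sort keys
sorted_group = group.sort_values(['date', 'C_mean'], ascending=[True, False])
print(sorted_group)
```

The dates stay in ascending order, and within each date the rows are ordered by 'C_mean' descending, with the 'B' labels moving along with their values.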
