I have searched and found other questions on this general topic, but I cannot find the answer to this specific question.
I have a DataFrame built by appending several DataFrames together and then applying groupby and agg.
The example below reproduces that process on a small scale.
# constructing an example dataframe
import numpy as np
import pandas as pd
rand = np.random.RandomState(1)
df1 = pd.DataFrame({'B': ['subgroup1'] * 6,})
df1['date'] = '1-1-2017'
df1['C'] = rand.rand(6)
df2 = pd.DataFrame({'B': ['subgroup2'] * 6,})
df2['date'] = '1-1-2017'
df2['C'] = rand.rand(6)
df3 = pd.DataFrame({'B': ['subgroup1'] * 6,})
df3['date'] = '1-2-2017'
df3['C'] = rand.rand(6)
df4 = pd.DataFrame({'B': ['subgroup2'] * 6,})
df4['date'] = '1-2-2017'
df4['C'] = rand.rand(6)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the same rows and keeps each frame's index
df7 = pd.concat([df1, df2, df3, df4])
print(df7)
B date C
0 subgroup1 1-1-2017 0.417022
1 subgroup1 1-1-2017 0.720324
2 subgroup1 1-1-2017 0.000114
3 subgroup1 1-1-2017 0.302333
4 subgroup1 1-1-2017 0.146756
5 subgroup1 1-1-2017 0.092339
0 subgroup2 1-1-2017 0.186260
1 subgroup2 1-1-2017 0.345561
2 subgroup2 1-1-2017 0.396767
3 subgroup2 1-1-2017 0.538817
4 subgroup2 1-1-2017 0.419195
5 subgroup2 1-1-2017 0.685220
0 subgroup1 1-2-2017 0.204452
1 subgroup1 1-2-2017 0.878117
2 subgroup1 1-2-2017 0.027388
3 subgroup1 1-2-2017 0.670468
4 subgroup1 1-2-2017 0.417305
5 subgroup1 1-2-2017 0.558690
0 subgroup2 1-2-2017 0.140387
1 subgroup2 1-2-2017 0.198101
2 subgroup2 1-2-2017 0.800745
3 subgroup2 1-2-2017 0.968262
4 subgroup2 1-2-2017 0.313424
5 subgroup2 1-2-2017 0.692323
Next, I group by 2 columns, and add a new column consisting of the mean of column 'C', and a new column counting the values averaged.
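# named aggregation (pandas >= 0.25); passing a renaming dict to agg on a single column was removed in pandas 1.0
group = df7.groupby(['date', 'B'])['C'].agg(num='size', C_mean='mean')
print(group)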
num C_mean
date B
1-1-2017 subgroup1 6.0 0.279815
subgroup2 6.0 0.428637
1-2-2017 subgroup1 6.0 0.459403
subgroup2 6.0 0.518874
The DataFrame 'group' is a small example with the same structure I have so far. In practice there would be a large number of rows in each date group.
I would like to sort the grouped DataFrame 'group' by the values in column 'C_mean', but only within each group of the first index level, 'date'. The sort should be descending.
So if we look at the two values in column 'C_mean' for the date group '1-1-2017', we would see 0.428637 and then 0.279815, in descending order. Likewise, in the next date group '1-2-2017', the values in 'C_mean' would be sorted descending: 0.518874 and then 0.459403.
num C_mean
date B
1-1-2017 subgroup2 6.0 0.428637
subgroup1 6.0 0.279815
1-2-2017 subgroup2 6.0 0.518874
subgroup1 6.0 0.459403
I have tried everything I can find to achieve this but, in every case, I have ended up with a sort of the whole column 'C_mean' -- I need to sort within the date groups.
Can anybody suggest a solution?
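One direction that might work, offered only as a sketch and not as a confirmed answer: sort on the 'date' index level (ascending) and on 'C_mean' (descending) in a single sort_values call. The sketch assumes pandas 0.25 or later (for named aggregation and for sorting by an index level name), and it treats the dates as plain strings, as in the example, so the date order is lexicographic. The names `frames`, `whole_column`, and `within_dates` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example data: two subgroups on each of two dates,
# drawing the random values in the same order as the question.
rand = np.random.RandomState(1)
frames = [
    pd.DataFrame({'B': [name] * 6, 'date': date, 'C': rand.rand(6)})
    for date, name in [('1-1-2017', 'subgroup1'), ('1-1-2017', 'subgroup2'),
                       ('1-2-2017', 'subgroup1'), ('1-2-2017', 'subgroup2')]
]
df7 = pd.concat(frames)
group = df7.groupby(['date', 'B'])['C'].agg(num='size', C_mean='mean')

# Sorting on 'C_mean' alone ignores the date level and orders the
# whole column; here the 1-2-2017 rows come out ahead of 1-1-2017.
whole_column = group.sort_values('C_mean', ascending=False)
print(whole_column)

# Sorting on the 'date' level first (ascending) and then on 'C_mean'
# (descending) keeps each date together and orders rows inside it.
within_dates = group.sort_values(['date', 'C_mean'], ascending=[True, False])
print(within_dates)
```

If the dates were real timestamps instead of strings, converting them with pd.to_datetime before grouping would presumably be needed to get chronological rather than lexicographic date order.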