I have searched and found other questions on this general topic, but I cannot find the answer to this specific question.

I have a dataframe created by appending several dataframes together, followed by groupby and agg procedures.

I have constructed an example dataframe below, just by following the process.

# constructing an example dataframe
import numpy as np
import pandas as pd

rand = np.random.RandomState(1)

df1 = pd.DataFrame({'B': ['subgroup1'] * 6})
df1['date'] = '1-1-2017'
df1['C'] = rand.rand(6)

df2 = pd.DataFrame({'B': ['subgroup2'] * 6})
df2['date'] = '1-1-2017'
df2['C'] = rand.rand(6)

df3 = pd.DataFrame({'B': ['subgroup1'] * 6,})
df3['date'] = '1-2-2017'
df3['C'] = rand.rand(6)

df4 = pd.DataFrame({'B': ['subgroup2'] * 6,})
df4['date'] = '1-2-2017'
df4['C'] = rand.rand(6)

# stack the four frames; each keeps its own 0-5 index
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df7 = pd.concat([df1, df2, df3, df4])
print(df7)

           B      date         C
0  subgroup1  1-1-2017  0.417022
1  subgroup1  1-1-2017  0.720324
2  subgroup1  1-1-2017  0.000114
3  subgroup1  1-1-2017  0.302333
4  subgroup1  1-1-2017  0.146756
5  subgroup1  1-1-2017  0.092339
0  subgroup2  1-1-2017  0.186260
1  subgroup2  1-1-2017  0.345561
2  subgroup2  1-1-2017  0.396767
3  subgroup2  1-1-2017  0.538817
4  subgroup2  1-1-2017  0.419195
5  subgroup2  1-1-2017  0.685220
0  subgroup1  1-2-2017  0.204452
1  subgroup1  1-2-2017  0.878117
2  subgroup1  1-2-2017  0.027388
3  subgroup1  1-2-2017  0.670468
4  subgroup1  1-2-2017  0.417305
5  subgroup1  1-2-2017  0.558690
0  subgroup2  1-2-2017  0.140387
1  subgroup2  1-2-2017  0.198101
2  subgroup2  1-2-2017  0.800745
3  subgroup2  1-2-2017  0.968262
4  subgroup2  1-2-2017  0.313424
5  subgroup2  1-2-2017  0.692323

Next, I group by 2 columns, and add a new column consisting of the mean of column 'C', and a new column counting the values averaged.

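# named aggregation: each keyword becomes an output column
# (the dict-renaming form of agg is no longer accepted by current pandas)
group = df7.groupby(['date', 'B'])['C'].agg(num='size', C_mean='mean')
print(group)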

                    num    C_mean
date     B                       
1-1-2017 subgroup1  6.0  0.279815
         subgroup2  6.0  0.428637
1-2-2017 subgroup1  6.0  0.459403
         subgroup2  6.0  0.518874

The DataFrame 'group' is a small example showing the same structure I have so far. In practice there would be a large number of rows in each date group.

I would like to sort the grouped DataFrame 'group' by the values in column 'C_mean' -- but sorted within the groups in the first column 'date'. The sort should be descending.

So if we look at the two values in column 'C_mean' in the group '1-1-2017', we would see 0.428637 and 0.279815 sorted descending. And likewise in the next date group '1-2-2017', the values in 'C_mean' would be sorted descending -- 0.518874 and 0.459403

                    num    C_mean
date     B                       
1-1-2017 subgroup2  6.0  0.428637
         subgroup1  6.0  0.279815
1-2-2017 subgroup2  6.0  0.518874
         subgroup1  6.0  0.459403

I have tried everything I can find to achieve this but, in every case, I have ended up with a sort of the whole column 'C_mean' -- I need to sort within the date groups.

Can anybody suggest a solution?

1 Answer

I got no response to this, but I did find a solution. Not very elegant, but it got the job done. I'll post it in case anybody else has a similar problem.

First copy the 'date' level of the index to a new column. The column needs a name other than 'date' (here 'date_key'), otherwise sort_values cannot tell the column from the index level of the same name. Copying group.index as a whole would store (date, B) tuples, and sorting on those never reaches 'C_mean'.

group['date_key'] = group.index.get_level_values('date')

Then sort by the new column and the 'C_mean' column

group = group.sort_values(['date_key', 'C_mean'], ascending=[True, False])

This produces the required result

                    num    C_mean  date_key
date     B                                 
1-1-2017 subgroup2  6.0  0.428637  1-1-2017
         subgroup1  6.0  0.279815  1-1-2017
1-2-2017 subgroup2  6.0  0.518874  1-2-2017
         subgroup1  6.0  0.459403  1-2-2017

Delete the helper column which was added -- no longer required

del group['date_key']

                    num    C_mean
date     B                       
1-1-2017 subgroup2  6.0  0.428637
         subgroup1  6.0  0.279815
1-2-2017 subgroup2  6.0  0.518874
         subgroup1  6.0  0.459403
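
For what it's worth, the extra column can probably be skipped on newer pandas versions, where sort_values accepts index level names alongside column labels (I believe this needs pandas 0.23 or later, so treat the version as an assumption). A minimal sketch: the frame below is rebuilt by hand from the printed 'group' output in the question rather than recomputed, and 'sorted_group' is just a name picked for illustration.

```python
import pandas as pd

# Rebuild a frame with the same shape as 'group' from the question
# (values copied from the printed output, not recomputed).
index = pd.MultiIndex.from_tuples(
    [('1-1-2017', 'subgroup1'), ('1-1-2017', 'subgroup2'),
     ('1-2-2017', 'subgroup1'), ('1-2-2017', 'subgroup2')],
    names=['date', 'B'])
group = pd.DataFrame(
    {'num': [6.0, 6.0, 6.0, 6.0],
     'C_mean': [0.279815, 0.428637, 0.459403, 0.518874]},
    index=index)

# 'date' is an index level and 'C_mean' is a column; both can be sort keys.
# Dates ascending, means descending within each date.
sorted_group = group.sort_values(['date', 'C_mean'], ascending=[True, False])
print(sorted_group)
```

Note that the dates here are plain strings, so they sort as text. With dates spanning more than a few days it would be safer to convert them with pd.to_datetime first.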