2

Some excercise is asking to "add a secondary name" to a df column when applying an aggregation function.

Given following DF:

stype = ['s1','s1','s1','s1',
    's2','s2','s2','s2']
dtype = ['d1','d1','d2','d2',
        'd1','d1','d2','d2']
qty = [2, 1.4, 7, 3, 
        4, 1, 2.0, 3]
df = pd.DataFrame({'s_type':stype,
                    'd_type':dtype,
                    'qty':qty})

When grouping by the first 2 columns and applying an agg function like this

new_df=df.groupby(['s_type','d_type'],sort =  False).agg({'qty':'median'})

And do print(new_df)

I get this output, notice the qty is moved up a line (I assume this is normal behavior) because is the one column where agg function was applied

               qty
s_type d_type     
s1     d1      1.7
       d2      5.0
s2     d1      2.5
       d2      2.5

Question(s ) would it be

Is it possible to setc secondary column name under qty label/name ?, to get an output like this:

               qty
s_type d_type  Median   
s1     d1      1.7
       d2      5.0
s2     d1      2.5
       d2      2.5

Also Why qty moves up 1 line ? (I'm using Spyder and I've seen this in the Anaconda/python console as well, not sure if its like a feature or pandas normal behavior).

One Last update: in case you need to use sort_values you will have to use the tuple for the column you add a secondary index/name, pretty much in the same way when applying the aggregate function (see the answer)

new_df=new_df.sort_values([('qty','MEDIAN')])

So you get the DF sorted by the column where agg function was applied and where we add an index

                 qty
              MEDIAN
s_type d_type       
s1     d1        1.7
s2     d1        2.5
       d2        2.5
s1     d2        5.0

python v3.6

pandas v0.23

Thank you.

1 Answer 1

2

Super simple. Pass "median" as a list inside agg:

result = df.groupby(['s_type','d_type'], sort=False).agg({'qty': ['median']})
result
                 qty
              median
s_type d_type       
s1     d1        1.7
       d2        5.0
s2     d1        2.5
       d2        2.5

As for "why qty moves up 1 line", that means the first two columns are actually the index in the output. Querying result.index will make it obvious:

result.index

MultiIndex(levels=[['s1', 's2'], ['d1', 'd2']],
           codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['s_type', 'd_type'])

You can instead specify as_index=False when calling groupby,

df.groupby(['s_type','d_type'], sort=False, as_index=False).agg({'qty': ['median']})

  s_type d_type    qty
                median
0     s1     d1    1.7
1     s1     d2    5.0
2     s2     d1    2.5
3     s2     d2    2.5

So the groupers remain columns in the output.


You can change the name in the output by passing a list of tuples to agg:

df.groupby(['s_type','d_type'], sort=False).agg(
    {'qty': [('MEDIAN', 'median')]}
)

                 qty
              MEDIAN
s_type d_type       
s1     d1        1.7
       d2        5.0
s2     d1        2.5
       d2        2.5
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.