groupby two columns and count unique values from a third column

Question

I have the following df1:

id period color size rate
1    01    red   12   30
1    02    red   12   30
2    01    blue  12   35
3    03    blue  12   35
4    01    blue  12   35
4    02    blue  12   35
5    01    pink  10   40
6    01    pink  10   40

I need to create a new df2 whose index combines the three columns color-size-rate, then group by 'period' and get the count of unique ids. My final df should have the following structure:

index       period   count
red-12-30    01        1
red-12-30    02        1
blue-12-35   01        2
blue-12-35   03        1
blue-12-35   02        1
pink-10-40   01        2

Thank you in advance for your help.

Umar.H · Accepted Answer · 2020-07-22 14:24:06Z

3

Try joining the three columns with .agg('-'.join) and passing the result to .groupby together with period:

# df1 is the frame from the question; the result is stored in df2
df2 = (
    df1.groupby([df1[["color", "size", "rate"]].astype(str)
                 .agg("-".join, axis=1).rename("index"), "period"])
       .agg(count=("id", "nunique"))
       .reset_index()
)

print(df2)

        index  period  count
0  blue-12-35       1      2
1  blue-12-35       2      1
2  blue-12-35       3      1
3  pink-10-40       1      2
4   red-12-30       1      1
5   red-12-30       2      1
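
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It rebuilds the question's sample data by hand; the names `df`, `key` and `result` are illustrative only, and `period` is stored as a string so the leading zeros are kept (an assumption, since the output above shows integers). Passing `sort=False` is also my addition, to keep groups in order of first appearance as in the question's desired table.

```python
import pandas as pd

# Sample data copied from the question. period is a string here so that
# "01", "02", "03" keep their leading zeros (assumption, see lead-in).
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 4, 5, 6],
    "period": ["01", "02", "01", "03", "01", "02", "01", "01"],
    "color": ["red", "red", "blue", "blue", "blue", "blue", "pink", "pink"],
    "size": [12, 12, 12, 12, 12, 12, 10, 10],
    "rate": [30, 30, 35, 35, 35, 35, 40, 40],
})

# Combined key: cast to str, join row-wise with "-", and name the Series
# "index" so that reset_index() produces a column with that name.
key = df[["color", "size", "rate"]].astype(str).agg("-".join, axis=1).rename("index")

# Group by the combined key and period, then count distinct ids per group.
result = (
    df.groupby([key, "period"], sort=False)
      .agg(count=("id", "nunique"))
      .reset_index()
)

print(result)
```

Dropping `sort=False` gives the alphabetically sorted order shown in the answer's output instead.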

edited Jul 22, 2020 at 14:24

answered Jul 22, 2020 at 14:12

Umar.H


3 Comments

savi Over a year ago

Thanks Manakin for your quick help. I need the exact structure of the final df as I am further going to use it to create a pivot_table. Any pointers for that?

Umar.H Over a year ago

just need to reset_index @savi

Umar.H Over a year ago

@anky thanks :) i forget you can directly rename an index on a series
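
On the pivot_table follow-up raised in the comments: once the index has been reset, the long table can be reshaped with `DataFrame.pivot_table`. This is only a sketch under assumptions. The frame `counts` is a hand-built copy of the desired result from the question, and the choices of `aggfunc="sum"` and `fill_value=0` are mine, not something the thread specifies.

```python
import pandas as pd

# Hand-built copy of the grouped result, so the sketch runs on its own.
counts = pd.DataFrame({
    "index": ["red-12-30", "red-12-30", "blue-12-35",
              "blue-12-35", "blue-12-35", "pink-10-40"],
    "period": ["01", "02", "01", "03", "02", "01"],
    "count": [1, 1, 2, 1, 1, 2],
})

# One row per combined key, one column per period.
# Missing (key, period) pairs are filled with 0.
wide = counts.pivot_table(index="index", columns="period", values="count",
                          aggfunc="sum", fill_value=0)

print(wide)
```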

Hicham Zouarhi · Accepted Answer · 2020-07-22 14:14:33Z

0

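You can achieve this with a groupby. Use nunique on id so that repeated ids are counted once, and cast the values to str before joining, because size and rate are numeric:

 df2 = df1.groupby(['color', 'size', 'rate', 'period'])['id'].nunique().reset_index(name='count')
 df2['index'] = df2.apply(lambda x: '-'.join([str(x['color']), str(x['size']), str(x['rate'])]), axis=1)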

answered Jul 22, 2020 at 14:14

Hicham Zouarhi
