I have the following df1:

id period color size rate
1    01    red   12   30
1    02    red   12   30
2    01    blue  12   35
3    03    blue  12   35
4    01    blue  12   35
4    02    blue  12   35
5    01    pink  10   40
6    01    pink  10   40

I need to create a new df2 whose index is the concatenation of three columns (color-size-rate), then group by 'period' and count the unique ids in each group. My final df should have the following structure:

index       period   count
red-12-30    01        1
red-12-30    02        1
blue-12-35   01        2
blue-12-35   03        1
blue-12-35   02        1
pink-10-40   01        2

Thank you in advance for your help.

2 Answers

Try .agg('-'.join) together with .groupby:

# Build the "color-size-rate" key, then count unique ids per key and period.
key = df1[["color", "size", "rate"]].astype(str).agg("-".join, axis=1).rename("index")

df2 = (
    df1.groupby([key, "period"])
       .agg(count=("id", "nunique"))
       .reset_index()
)

print(df2)

        index  period  count
0  blue-12-35       1      2
1  blue-12-35       2      1
2  blue-12-35       3      1
3  pink-10-40       1      2
4   red-12-30       1      1
5   red-12-30       2      1
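
For reference, here is a self-contained sketch of the same approach that rebuilds the question's df1. Two things in it are assumptions rather than part of the answer: period is stored as a zero-padded string (the printed output above shows it as an integer), and sort=False is passed so that groups keep their order of first appearance, which happens to match the layout requested in the question.

```python
import pandas as pd

# Sample data copied from the question; period kept as a string (assumption).
df1 = pd.DataFrame({
    "id":     [1, 1, 2, 3, 4, 4, 5, 6],
    "period": ["01", "02", "01", "03", "01", "02", "01", "01"],
    "color":  ["red", "red", "blue", "blue", "blue", "blue", "pink", "pink"],
    "size":   [12, 12, 12, 12, 12, 12, 10, 10],
    "rate":   [30, 30, 35, 35, 35, 35, 40, 40],
})

# Join the three columns row-wise into a single "color-size-rate" key.
key = df1[["color", "size", "rate"]].astype(str).agg("-".join, axis=1).rename("index")

# sort=False keeps groups in order of first appearance instead of sorting keys.
df2 = (
    df1.groupby([key, "period"], sort=False)
       .agg(count=("id", "nunique"))
       .reset_index()
)

print(df2)
```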
3 Comments

Thanks Manakin for your quick help. I need the exact structure of the final df as I am further going to use it to create a pivot_table. Any pointers for that?
just need to reset_index @savi
@anky thanks :) i forget you can directly rename an index on a series
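
As a rough sketch of the pivot_table step mentioned in the first comment: once the result is a flat frame with index, period and count columns, it can be pivoted directly. The layout below (one row per key, one column per period, missing combinations filled with 0) is only a guess at what is wanted, and df2 is typed in by hand from the desired output in the question.

```python
import pandas as pd

# df2 as described in the question's desired output (entered by hand).
df2 = pd.DataFrame({
    "index":  ["red-12-30", "red-12-30", "blue-12-35", "blue-12-35", "blue-12-35", "pink-10-40"],
    "period": ["01", "02", "01", "03", "02", "01"],
    "count":  [1, 1, 2, 1, 1, 2],
})

# One row per key, one column per period; absent combinations become 0.
pivot = df2.pivot_table(
    index="index", columns="period", values="count",
    aggfunc="sum", fill_value=0,
)

print(pivot)
```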
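You can achieve this with a groupby, counting unique ids with nunique rather than rows with count:

 # Count unique ids per color/size/rate/period combination.
 df2 = df1.groupby(['color', 'size', 'rate', 'period'])['id'].nunique().reset_index(name='count')
 # size and rate are numeric, so cast to str before joining.
 df2['index'] = df2[['color', 'size', 'rate']].astype(str).agg('-'.join, axis=1)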
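
Note that count() and nunique() give the same numbers on the question's sample data only because no id repeats within a group. A small sketch with made-up rows (hypothetical data, not from the question) shows where they diverge:

```python
import pandas as pd

# Hypothetical data: id 1 appears twice in the same color/period group.
df = pd.DataFrame({
    "id":     [1, 1, 2],
    "period": ["01", "01", "01"],
    "color":  ["red", "red", "red"],
})

grouped = df.groupby(["color", "period"])["id"]

# count() counts non-null rows; nunique() counts distinct ids.
row_count = grouped.count().reset_index(name="count")
unique_count = grouped.nunique().reset_index(name="count")

print(row_count)
print(unique_count)
```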
