Pandas groupby multiple columns with value_counts function

Question

I want to apply value_counts() to multiple columns and reuse the same dataframe further to add more columns. I have the following dataframe as an example.

    id  shop    type    status
0   1   mac      A      open
1   1   mac      B      close
2   1   ikea     B      open
3   1   ikea     A      open
4   1   meta     A      open
5   1   meta     B      close
6   2   meta     B      open
7   2   ikea     B      open
8   2   ikea     B      close
9   3   ikea     A      close
10  3   apple    B      close
11  3   apple    B      open
12  3   apple    A      open
13  4   denim    A      close
14  4   denim    A      close

I want to group by both id and shop and get the count of each type and status category per group, as shown below.

    id  shop    A    B     close   open
0   1   ikea    1    1      0       2
1   1   mac     1    1      1       1
2   1   meta    1    1      1       1
3   2   ikea    0    2      1       1
4   2   meta    0    1      0       1
5   3   apple   1    2      1       2
6   3   ikea    1    0      1       0
7   4   denim   2    0      2       0

I have tried the following, which works correctly, but I don't feel that it is efficient, especially with more data or if I want to apply two more aggregation functions to the same groupby. Also, the merge may not work in some rare cases.

import pandas as pd
from functools import reduce

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})

df = df.groupby(['id', 'shop'])
df_type = df['type'].value_counts().unstack().reset_index()
df_status = df['status'].value_counts().unstack().reset_index()

df = reduce(lambda df1, df2: pd.merge(df1, df2, how='left', on=['id', 'shop']), [df_type, df_status])

Quang Hoang · Accepted Answer · 2022-09-14 19:14:11Z

3

You can do this with groupby() and value_counts():

groups = df.groupby(['id','shop'])
pd.concat([groups['type'].value_counts().unstack(fill_value=0),
           groups['status'].value_counts().unstack(fill_value=0)], 
          axis=1).reset_index()

Or a bit more dynamic:

groups = df.groupby(['id','shop'])
count_cols = ['type','status']
out = pd.concat([groups[c].value_counts().unstack(fill_value=0) 
                for c in count_cols], axis=1).reset_index()

Or with crosstab:

count_cols = ['type','status']
out = pd.concat([pd.crosstab([df['id'],df['shop']], df[c])
                for c in count_cols], axis=1).reset_index()

Output:

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0
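
Since the question also asks about reusing the same groupby for further aggregations, here is a minimal sketch of one way that could look. The extra aggregation (a row count per group via size()) and the column name n_rows are assumptions chosen for illustration, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

# Build the groupby object once and reuse it for every piece.
groups = df.groupby(['id', 'shop'])
count_cols = ['type', 'status']

# One wide count table per column, all indexed by (id, shop).
pieces = [groups[c].value_counts().unstack(fill_value=0) for c in count_cols]

# Hypothetical extra aggregation: number of rows in each (id, shop) group.
pieces.append(groups.size().rename('n_rows'))

# All pieces share the (id, shop) index, so concat aligns them without a merge.
out = pd.concat(pieces, axis=1).reset_index()
print(out)
```

Because every piece is indexed by the same (id, shop) keys, concat along axis=1 lines the rows up by index, which avoids the chain of merges from the question.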


2 Comments

Ibrahim Sherif Over a year ago

Is it possible to get the sum of all categories for each row using your approach ? For example an extra column AB containing the sum of column A and B

Quang Hoang Over a year ago

@IbrahimSherif crosstab has option margins or you can chain pd.crosstab().assign(**{c + '_total': lambda x: x.sum(axis=1)})
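
A minimal sketch of the two options mentioned in this comment, shown for the type column only. The column name AB is taken from the question in the comment above; everything else uses the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

ct = pd.crosstab([df['id'], df['shop']], df['type'])

# Option 1: add a row-wise total of the A and B counts as a new column.
ct_total = ct.assign(AB=ct.sum(axis=1))

# Option 2: margins=True adds an 'All' column of row totals
# (and also an 'All' row of column totals at the bottom).
ct_margins = pd.crosstab([df['id'], df['shop']], df['type'], margins=True)

print(ct_total.reset_index())
```

Note that margins=True also appends a totals row, so the assign variant is probably the closer match if only the extra column is wanted.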

mozway · Accepted Answer · 2022-09-14 19:27:05Z

2

Using crosstab:

out = pd.concat([pd.crosstab([df['id'], df['shop']], df[c])
                 for c in ['type', 'status']],
                axis=1).reset_index()

Or melt+crosstab:

df2 = df.melt(['id', 'shop'])

out = (pd.crosstab([df2['id'], df2['shop']], df2['value'])
         .reset_index()
       )

Output:

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0
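
One caveat with the melt variant: melt pools the values of type and status into a single value column, so it relies on the two columns sharing no labels. A minimal sketch of one way to guard against that, assuming a collision were possible, is to prefix each value with the name of the column it came from. The prefixed names such as type_A are an illustration choice, not part of the expected output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

# Long format: one row per (id, shop, variable, value).
df2 = df.melt(['id', 'shop'])

# Prefix each value with its source column, e.g. 'type_A', 'status_open',
# so identical labels in different columns are counted separately.
labels = df2['variable'] + '_' + df2['value']

out = (pd.crosstab([df2['id'], df2['shop']], labels)
         .rename_axis(None, axis=1)   # drop the auto-generated column-axis name
         .reset_index())
print(out)
```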


Naveed · Accepted Answer · 2022-09-14 19:25:33Z

1

Here is one way to do it, using pd.get_dummies:


(pd.concat(
    [df[['id', 'shop']],  # grouping keys only, so the string columns are not summed
     pd.get_dummies(df[['type', 'status']], prefix='', prefix_sep='')  # one indicator column per value of type and status
    ], axis=1)
 .groupby(['id', 'shop'])  # group the data
 .sum()                    # summing the indicators gives the counts
 .reset_index())


   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0


tripleee · Accepted Answer · 2024-05-14 09:27:30Z

Here is the whole process, step by step; you can run it as is.

# Module import
import pandas as pd

# Data import
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})

# Data Pre-process
df_unique = df[['id','shop']].groupby(['id','shop']).count().reset_index()
df_AB = df.groupby(['id','shop','type']).count().reset_index()
df_A = df_AB.loc[df_AB['type'] =='A'].rename(columns={'status':'A'})
df_B = df_AB.loc[df_AB['type'] =='B'].rename(columns={'status':'B'})
df_OC = df.groupby(['id','shop','status']).count().reset_index()
df_O = df_OC.loc[df_OC['status'] =='open'].rename(columns={'type':'open'})
df_C = df_OC.loc[df_OC['status'] =='close'].rename(columns={'type':'close'})

# Merging for your final output
df_final = pd.merge(df_unique, df_A[['id', 'shop', 'A']], how='left', on=['id', 'shop'])
df_final = pd.merge(df_final, df_B[['id', 'shop', 'B']], how='left', on=['id', 'shop'])
df_final = pd.merge(df_final, df_C[['id', 'shop', 'close']], how='left', on=['id', 'shop'])
df_final = pd.merge(df_final, df_O[['id', 'shop', 'open']], how='left', on=['id', 'shop'])

# Data cleaning: the left merges leave NaN where a group has no match,
# so fill with 0 and cast the counts back to integers
count_cols = ['A', 'B', 'close', 'open']
df_final[count_cols] = df_final[count_cols].fillna(0).astype(int)

# Output Display
df_final
