0

I want to apply value_counts() to multiple columns and reuse the same dataframe further to add more columns. I have the following dataframe as an example.

    id  shop    type    status
0   1   mac      A      open
1   1   mac      B      close
2   1   ikea     B      open
3   1   ikea     A      open
4   1   meta     A      open
5   1   meta     B      close
6   2   meta     B      open
7   2   ikea     B      open
8   2   ikea     B      close
9   3   ikea     A      close
10  3   apple    B      close
11  3   apple    B      open
12  3   apple    A      open
13  4   denim    A      close
14  4   denim    A      close

I want to achieve, the groupby count of both id and shop for each type and status category as shown below.

    id  shop    A    B     close   open
0   1   ikea    1    1      0       2
1   1   mac     1    1      1       1
2   1   meta    1    1      1       1
3   2   ikea    0    2      1       1
4   2   meta    0    1      0       1
5   3   apple   1    2      1       2
6   3   ikea    1    0      1       0
7   4   denim   2    0      2       0

I have tried this so far which works correctly but I don't feel that it is efficient, especially if I have more data and maybe want to use an extra two aggs functions for the same groupby. Also, the merging may not always work in some rare cases.

import pandas as pd
from functools import reduce

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})

df = df.groupby(['id', 'shop'])
df_type = df['type'].value_counts().unstack().reset_index()
df_status = df['status'].value_counts().unstack().reset_index()

df = reduce(lambda df1, df2: pd.merge(df1, df2, how='left', on=['id', 'shop']), [df_type, df_status])

4 Answers 4

3

You can do with groupby() and value_counts:

groups = df.groupby(['id','shop'])
pd.concat([groups['type'].value_counts().unstack(fill_value=0),
           groups['status'].value_counts().unstack(fill_value=0)], 
          axis=1).reset_index()

Or a bit more dynamic:

groups = df.groupby(['id','shop'])
count_cols = ['type','status']
out = pd.concat([groups[c].value_counts().unstack(fill_value=0) 
                for c in count_cols], axis=1).reset_index()

Or with crosstab:

count_cols = ['type','status']
out = pd.concat([pd.crosstab([df['id'],df['shop']], df[c])
                for c in count_cols], axis=1).reset_index()

Output:

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0
Sign up to request clarification or add additional context in comments.

2 Comments

Is it possible to get the sum of all categories for each row using your approach ? For example an extra column AB containing the sum of column A and B
@IbrahimSherif crosstab has option margins or you can chain pd.crosstab().assign(**{c + '_total': lambda x: x.sum(axis=1)})
2

Using crosstab:

out = pd.concat([pd.crosstab([df['id'], df['shop']], df[c])
                 for c in ['type', 'status']],
                axis=1).reset_index()

Or melt+crosstab:

df2 = df.melt(['id', 'shop'])

out = (pd.crosstab([df2['id'], df2['shop']], df2['value'])
         .reset_index()
       )

Output:

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0

Comments

1

here is one way to do it using pd.get_dummies


(pd.concat(
    [df, #original dataframe
     pd.get_dummies(df[['type','status']], prefix="", prefix_sep='') # created 1,0 column based on the values under type and status
    ], axis=1)
 .groupby(['id','shop']) # group the data
 .sum()
 .reset_index())

id  shop    A   B   close   open
0   1   ikea    1   1   0   2
1   1   mac     1   1   1   1
2   1   meta    1   1   1   1
3   2   ikea    0   2   1   1
4   2   meta    0   1   0   1
5   3   apple   1   2   1   2
6   3   ikea    1   0   1   0
7   4   denim   2   0   2   0

Comments

0

Here is the whole process from me and you can run it from your platform.

# Module improt
import pandas as pd
import numpy as np

# Data import
    df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})

# Data Pre-process
df_unique = df[['id','shop']].groupby(['id','shop']).count().reset_index()
df_AB = df.groupby(['id','shop','type']).count().reset_index()
df_A = df_AB.loc[df_AB['type'] =='A'].rename(columns={'status':'A'})
df_B = df_AB.loc[df_AB['type'] =='B'].rename(columns={'status':'B'})
df_OC = df.groupby(['id','shop','status']).count().reset_index()
df_O = df_OC.loc[df_OC['status'] =='open'].rename(columns={'type':'open'})
df_C = df_OC.loc[df_OC['status'] =='close'].rename(columns={'type':'close'})

# Merging for your final output
df_final = pd.merge(df_unique,df_A[['id','shop','A']],how='left', left_on = ['id','shop'], right_on = ['id','shop'])
df_final = pd.merge(df_final,df_B[['id','shop','B']],how='left', left_on = ['id','shop'], right_on = ['id','shop'])
df_final = pd.merge(df_final,df_C[['id','shop','close']],how='left', left_on = ['id','shop'], right_on = ['id','shop'])
df_final = pd.merge(df_final,df_O[['id','shop','open']],how='left', left_on = ['id','shop'], right_on = ['id','shop'])

# Data Cleaning
df_final['A'] = df_final['A'].fillna(0)
df_final['B'] = df_final['B'].fillna(0)
df_final['open'] = df_final['open'].fillna(0)
df_final['close'] = df_final['close'].fillna(0)

# Output Display
df_final

Attached the picture of output from me:

enter image description here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.