How do I group by a column, and count values in separate columns (Pandas)

Question

Here's some example data:

import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'], ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]

df = pd.DataFrame(data, columns=['user', 'house', 'type'])

user house type
a1     1    a
b1     2    b
a1     3    a
c1     4    c
b1     5    a
a1     6    b
c1     7    a
a1     8    a

The final output that I want is this (the types need to be their own columns):

user houses a b c    
a1      4   3 1 0
b1      2   1 1 0
c1      2   1 0 1

Currently, I'm able to get it by using the following code:

house = df.groupby(['user']).agg(houses=('house', 'count'))
a = df[df['type']=='a'].groupby(['user']).agg(a=('type', 'count'))
b = df[df['type']=='b'].groupby(['user']).agg(b=('type', 'count'))
c = df[df['type']=='c'].groupby(['user']).agg(c=('type', 'count'))

final = house.merge(a,on='user', how='left').merge(b,on='user', how='left').merge(c,on='user', how='left')

Is there a simpler, cleaner way to do this?

Side note: well done on posting a well-structured question :)
– Erfan, Commented Nov 3, 2019 at 15:12

anky · Accepted Answer · 2019-11-03 15:03:32Z

5

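Here is one way using get_dummies() with groupby() and sum(). Note that df['house'] = 1 overwrites the original house values so that summing the column counts the rows per user.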

df['house']=1
df.drop('type',axis=1).assign(**pd.get_dummies(df['type'])).groupby('user').sum()

      house  a  b  c
user                
a1        4  3  1  0
b1        2  1  1  0
c1        2  1  0  1
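The approach above replaces the values in df['house'] in place. If the original house values need to be kept, a minimal sketch of the same get_dummies idea that leaves df untouched might look like this. The names dummies and result, and the output column name houses, are chosen here for illustration and are not from the answer.

```python
import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# One indicator column per type; dtype=int so the sums are integer counts.
dummies = pd.get_dummies(df['type'], dtype=int)

# Sum the indicators per user, then attach the number of houses per user.
result = (
    dummies.groupby(df['user']).sum()
    .assign(houses=df.groupby('user')['house'].count())
)

# Put the columns in the order requested in the question.
result = result[['houses', 'a', 'b', 'c']]
print(result)
```

Because nothing is assigned back to df, the house column still holds the values 1 to 8 afterwards.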

answered Nov 3, 2019 at 15:03

anky


BENY · Accepted Answer · 2019-11-03 15:16:22Z

5

I would use crosstab with margins=True:

pd.crosstab(df.user,df.type,margins=True,margins_name='House').drop('House')
Out[51]: 
type  a  b  c  House
user                
a1    3  1  0      4
b1    1  1  0      2
c1    1  0  1      2
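To get the exact layout requested in the question (user as a regular column, followed by houses, a, b, c), the crosstab result can be reordered and its axis name dropped. This is a sketch under the assumption that the type values are exactly a, b and c; the name out and the margin label houses are illustrative choices.

```python
import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# Count types per user; the margin column holds the row totals.
out = pd.crosstab(df['user'], df['type'], margins=True, margins_name='houses')

# Drop the totals row, reorder the columns, remove the 'type' axis label,
# and turn the 'user' index back into a column.
out = (
    out.drop('houses')[['houses', 'a', 'b', 'c']]
    .rename_axis(columns=None)
    .reset_index()
)
print(out)
```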

edited Nov 3, 2019 at 15:16

answered Nov 3, 2019 at 15:05

BENY


1 Comment

ameise Over a year ago

This code works for this case, but assumes that each house has been assigned a type (which is a fair assumption for the most part). However, if a house is of type NaN, then the total for the "House" column would be incorrect. But this code is still useful when you do need to sum categories in this way.
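The caveat in this comment can be illustrated with a small sketch. It assumes a modified copy of the example data in which the first house has no type (None); that modification is hypothetical and not part of the question. Since crosstab skips rows with a missing type, the per-type counts for a1 add up to 3, while counting rows per user separately still gives 4.

```python
import pandas as pd

# Same data as in the question, except the first house has a missing type.
data = [['a1', 1, None], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# Rows with a missing type are left out of the cross-tabulation.
type_counts = pd.crosstab(df['user'], df['type'])

# Counting rows per user independently includes the untyped house.
houses = df.groupby('user').size().rename('houses')

result = type_counts.join(houses)
print(result)
```

Computing the house total separately from the type counts, as in the join above, keeps the total correct even when some types are missing.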

Erfan · Accepted Answer · 2019-11-03 15:14:00Z

3

Using GroupBy.size with pd.crosstab and join:

grps = pd.crosstab(df['user'], df['type']).join(df.groupby('user')['house'].size())

      a  b  c  house
user                
a1    3  1  0      4
b1    1  1  0      2
c1    1  0  1      2

If you want user back as column, use reset_index:

print(grps.reset_index())

  user  a  b  c  house
0   a1  3  1  0      4
1   b1  1  1  0      2
2   c1  1  0  1      2

edited Nov 3, 2019 at 15:14

answered Nov 3, 2019 at 15:08

Erfan

