
Here's some example data:

import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'], ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]

df = pd.DataFrame(data, columns=['user', 'house', 'type'])

user house type
a1     1    a
b1     2    b
a1     3    a
c1     4    c
b1     5    a
a1     6    b
c1     7    a
a1     8    a

The final output I want is this (each type should become its own column):

user houses a b c    
a1      4   3 1 0
b1      2   1 1 0
c1      2   1 0 1

Currently, I'm able to get it by using the following code:

house = df.groupby(['user']).agg(houses=('house', 'count'))
a = df[df['type']=='a'].groupby(['user']).agg(a=('type', 'count'))
b = df[df['type']=='b'].groupby(['user']).agg(b=('type', 'count'))
c = df[df['type']=='c'].groupby(['user']).agg(c=('type', 'count'))

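# Left merges leave NaN where a user has no houses of a type; fill those with 0
final = (house.merge(a, on='user', how='left')
              .merge(b, on='user', how='left')
              .merge(c, on='user', how='left')
              .fillna(0)
              .astype(int))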

Is there a simpler, cleaner way to do this?

Comment (Nov 3, 2019 at 15:12): Side note: well done on posting a well-structured question :)

3 Answers


Here is one way using get_dummies() with groupby() and sum.

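# Overwrite house with 1 so that summing it counts houses per user (note: this modifies df in place)
df['house'] = 1
# Add one 0/1 indicator column per type, then sum every column per user
df.drop('type', axis=1).assign(**pd.get_dummies(df['type'])).groupby('user').sum()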

      house  a  b  c
user                
a1        4  3  1  0
b1        2  1  1  0
c1        2  1  0  1
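Because df['house'] = 1 overwrites the original house IDs, a variant that leaves df untouched may be preferable. The following is a minimal sketch under that assumption, rebuilding the sample data from the question; the names dummies and result are illustrative, and the .astype(int) is there because recent pandas versions return boolean dummy columns.

```python
import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# One 0/1 indicator column per type (a, b, c); cast because newer pandas returns bools
dummies = pd.get_dummies(df['type']).astype(int)

# Work on a new frame so the original df keeps its house IDs
result = (
    df[['user']]
    .assign(house=1)   # each row counts as one house
    .join(dummies)     # both share df's RangeIndex, so rows line up
    .groupby('user')
    .sum()
)
print(result)
```

The printed frame should match the output shown in this answer, while df['house'] still holds the IDs 1 to 8.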



I would use crosstab with margins=True:

# margins=True adds a 'House' total row and column; .drop('House') removes the total row
pd.crosstab(df.user, df.type, margins=True, margins_name='House').drop('House')
type  a  b  c  House
user                
a1    3  1  0      4
b1    1  1  0      2
c1    1  0  1      2
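To get exactly the layout asked for in the question (user as a column, the total named houses and placed first), the crosstab result can be post-processed. This is a sketch assuming the question's sample data; out and final are illustrative names, not part of the answer.

```python
import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# Same crosstab as in the answer: the 'House' margin column holds each user's total
out = pd.crosstab(df.user, df.type, margins=True, margins_name='House').drop('House')

# Rename the margin to match the question and move it in front of the type counts
out = out.rename(columns={'House': 'houses'})
out = out[['houses'] + [c for c in out.columns if c != 'houses']]

# Turn the user index into a column and clear the leftover 'type' axis label
final = out.reset_index().rename_axis(columns=None)
print(final)
```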

Comment:

This works for this case, but it assumes every house has been assigned a type (a fair assumption most of the time). If a house's type is NaN, crosstab leaves that row out of the counts, so the "House" total would be too low. The approach is still useful when you do need to sum categories this way.
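The NaN caveat can be reproduced with a small, hypothetical change to the sample data: the type of a1's house 8 is set to NaN. The sketch compares the margins total with a count taken from groupby().size(), which still sees the NaN-typed row; margin and counts are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data: a1's house 8 has no recorded type
data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, np.nan]]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# crosstab skips rows whose type is NaN, so the margin total undercounts a1
margin = pd.crosstab(df.user, df.type, margins=True, margins_name='House').drop('House')
print(margin['House'])  # a1 shows 3 although a1 has 4 houses

# Counting rows per user separately keeps the NaN-typed house in the total
counts = pd.crosstab(df['user'], df['type']).join(df.groupby('user').size().rename('House'))
print(counts['House'])  # a1 shows 4
```

Counting the total independently of the type columns, as in the GroupBy.size answer, avoids the undercount.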

Using GroupBy.size with pd.crosstab and join:

grps = pd.crosstab(df['user'], df['type']).join(df.groupby('user')['house'].size())

      a  b  c  house
user                
a1    3  1  0      4
b1    1  1  0      2
c1    1  0  1      2

If you want user back as a column, use reset_index():

print(grps.reset_index())

  user  a  b  c  house
0   a1  3  1  0      4
1   b1  1  1  0      2
2   c1  1  0  1      2
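A related route, swapped in here rather than taken from any of the answers, counts rows per (user, type) pair with groupby().size() and pivots the type level into columns with unstack(). This sketch assumes the question's sample data; per_type is an illustrative name.

```python
import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# Count rows per (user, type), then move type into columns;
# fill_value=0 turns missing combinations (e.g. a1 has no type c) into 0
per_type = df.groupby(['user', 'type']).size().unstack(fill_value=0)

# Total houses per user is the row count per user; insert() aligns it on the user index
per_type.insert(0, 'houses', df.groupby('user').size())
print(per_type)
```

Like the crosstab answers, this needs no per-type filtering or chained merges, so new types show up as columns automatically.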
