How can I count the frequency of repeated values in dataframe column?

Question

I have a column in a dataframe that is

UC      WR
V001    A, B, C, nan, A, C, D
C001    nan, C, D, A, nan, A
C002    C, B, B, A, A, A
C003    A, C, A, C, B, nan

I'm not sure what I'm doing wrong, but I'm not able to get rid of the nans. From this column, I want to have a different column, or a dictionary that gives me the frequency count of the different values in WR.

UC     WR Count
V001  {A: 2, B:1, C:2, D:1}
C001  {A:2, C:1, D:1}
C002  {A:3, B:2, C:1}
C003  {A:2, B:1, C:2}

or a similar dictionary. Thanks! :)

SpghttCd · Accepted Answer · 2019-03-19 05:00:04Z

3

Setting the nan entries aside for now, my approach would be:

from collections import Counter

df['WR Count'] = df.WR.str.replace(' ', '').str.split(',').apply(Counter)

#                          WR                                    WR Count
# UC                                                                                                        
# V001  A, B, C, nan, A, C, D  {'A': 2, 'B': 1, 'C': 2, 'nan': 1, 'D': 1}                               
# C001   nan, C, D, A, nan, A          {'nan': 2, 'C': 1, 'D': 1, 'A': 2}                               
# C002       C, B, B, A, A, A                    {'C': 1, 'B': 2, 'A': 3}                           
# C003     A, C, A, C, B, nan          {'A': 2, 'C': 2, 'B': 1, 'nan': 1}

Note that if you are sure that the separator is always ', ', then you can hardcode it, which leads to a shorter command:

df['WR Count'] = df.WR.str.split(', ').apply(Counter)
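The commands above still count the 'nan' tokens, which the question wants removed. A minimal sketch of one way to filter them out follows. It assumes that 'nan' appears as literal text inside each string (as in the sample) and that no cell is itself missing. The helper name `count_without_nan` is made up for illustration, and the frame is rebuilt from the question's sample.

```python
from collections import Counter

import pandas as pd

# Sample data rebuilt from the question; 'nan' is assumed to be literal text.
df = pd.DataFrame(
    {
        "UC": ["V001", "C001", "C002", "C003"],
        "WR": [
            "A, B, C, nan, A, C, D",
            "nan, C, D, A, nan, A",
            "C, B, B, A, A, A",
            "A, C, A, C, B, nan",
        ],
    }
).set_index("UC")


def count_without_nan(cell):
    """Count the ', '-separated tokens in one cell, skipping 'nan'."""
    tokens = cell.split(", ")
    return dict(Counter(t for t in tokens if t != "nan"))


df["WR Count"] = df["WR"].apply(count_without_nan)
print(df["WR Count"].to_dict())
```

If a whole cell can be missing (a real NaN rather than the text 'nan'), `cell.split` would fail, so such rows would need to be handled before the apply.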

edited Mar 19, 2019 at 5:00

answered Mar 19, 2019 at 0:33

SpghttCd


2 Comments

Ben.T Over a year ago

I think you could directly split by ', ' with a comma and a space, instead of first replacing the space and then split on the comma

SpghttCd Over a year ago

You are right. I did it that way almost automatically, because so many datasets posted here have arbitrary numbers of spaces in between... But normally you should be able to rely on a consistent pattern, so I'll edit the answer to add this, thanks.
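For the case raised in this exchange, where the spacing around the commas is not consistent, a regular expression split is one possible middle ground. This is only a sketch: the messy input and the helper name `split_loose` are invented for illustration and do not come from the question.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical messy input: spacing around the commas is inconsistent.
wr = pd.Series(["A,B ,  C, A", " B,B,C "], name="WR")


def split_loose(cell):
    """Split on commas, ignoring any whitespace around them."""
    return re.split(r"\s*,\s*", cell.strip())


counts = wr.apply(lambda cell: dict(Counter(split_loose(cell))))
print(counts.tolist())
```

Unlike removing every space first, this keeps spaces that occur inside a value, which may matter if the tokens are longer than one letter.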

BENY · Accepted Answer · 2019-03-19 01:26:38Z

1

Just don't store a dict in a pandas cell; doing so stops a lot of pandas' nice built-in functions from working.

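(df.set_index('UC').WR
   .str.split(', ', expand=True)
   .stack()
   .str.get_dummies()
   .groupby(level=0, sort=False).sum()   # replaces sum(level=0), removed in pandas 2.0
   .drop(columns='nan'))                 # replaces drop('nan', 1), positional axis removed in pandas 2.0

#       A  B  C  D
# UC              
# V001  2  1  2  1
# C001  2  0  1  1
# C002  3  2  1  0
# C003  2  1  2  0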
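The same wide table can probably also be reached through `explode` and `pd.crosstab`, which some may find easier to read. This is a sketch under the assumption that 'nan' is literal text in the strings, not the method used in this answer. The names `long` and `table` are made up, and `crosstab` sorts the index, so the row order differs from the output above.

```python
import pandas as pd

# Sample data rebuilt from the question; 'nan' is assumed to be literal text.
df = pd.DataFrame(
    {
        "UC": ["V001", "C001", "C002", "C003"],
        "WR": [
            "A, B, C, nan, A, C, D",
            "nan, C, D, A, nan, A",
            "C, B, B, A, A, A",
            "A, C, A, C, B, nan",
        ],
    }
)

# One row per (UC, token) pair, with the 'nan' tokens dropped.
long = df.assign(WR=df["WR"].str.split(", ")).explode("WR")
long = long[long["WR"] != "nan"]

# Count how often each token occurs per UC; absent combinations become 0.
table = pd.crosstab(long["UC"], long["WR"])
print(table)
```

Because the result is an ordinary numeric frame, the usual operations such as `table.sum()` or `table.idxmax(axis=1)` keep working, which is the point made above about not storing dicts in cells.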

answered Mar 19, 2019 at 1:26

BENY


Loochie · Accepted Answer · 2019-03-19 09:20:42Z

0

To get the values as dictionaries you may also try:

from collections import Counter

df['WR Count'] = df['WR'].apply(lambda x: dict(Counter(x.split(', '))))

answered Mar 19, 2019 at 9:20

Loochie
