0

I have a dataframe with a column that looks like this:

UC      WR
V001    A, B, C, nan, A, C, D
C001    nan, C, D, A, nan, A
C002    C, B, B, A, A, A
C003    A, C, A, C, B, nan

I'm not sure what I'm doing wrong, but I'm not able to get rid of the nans. From this column, I want to have a different column, or a dictionary that gives me the frequency count of the different values in WR.

UC     WR Count
V001  {A: 2, B:1, C:2, D:1}
C001  {A:2, C:1, D:1}
C002  {A:3, B:2, C:1}
C003  {A:2, B:1, C:2}

or a similar dictionary. Thanks! :)

3 Answers

3

Setting the nan entries aside for a moment, my approach would be:

from collections import Counter
df['WR Count'] = df.WR.str.replace(' ', '').str.split(',').apply(Counter)

#                          WR                                    WR Count
# UC                                                                                                        
# V001  A, B, C, nan, A, C, D  {'A': 2, 'B': 1, 'C': 2, 'nan': 1, 'D': 1}                               
# C001   nan, C, D, A, nan, A          {'nan': 2, 'C': 1, 'D': 1, 'A': 2}                               
# C002       C, B, B, A, A, A                    {'C': 1, 'B': 2, 'A': 3}                           
# C003     A, C, A, C, B, nan          {'A': 2, 'C': 2, 'B': 1, 'nan': 1} 

Note that if you are sure that the separator is always ', ', then you can hardcode it, which leads to a shorter command:

df['WR Count'] = df.WR.str.split(', ').apply(Counter)
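
To also drop the nan entries, which is what the question asks for, one option is to filter them out before counting. This is only a minimal sketch: it assumes the nans are literal 'nan' strings inside each cell (as the 'nan' keys in the output above suggest), and both the helper name count_without_nan and the sample frame are made up for illustration.

```python
from collections import Counter

import pandas as pd

# Sample frame rebuilt from the question so the snippet runs on its own.
df = pd.DataFrame({
    "UC": ["V001", "C001", "C002", "C003"],
    "WR": [
        "A, B, C, nan, A, C, D",
        "nan, C, D, A, nan, A",
        "C, B, B, A, A, A",
        "A, C, A, C, B, nan",
    ],
})


def count_without_nan(cell):
    """Count comma-separated items in one cell, skipping literal 'nan' strings."""
    items = (item.strip() for item in cell.split(","))
    return dict(Counter(item for item in items if item and item != "nan"))


# Hypothetical helper applied row by row; the result is a plain dict per cell.
df["WR Count"] = df["WR"].apply(count_without_nan)
```

If the column holds real missing values (float NaN) rather than strings, cell.split would fail on them, so those rows would need to be handled separately first.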

2 Comments

I think you could split directly on ', ' (a comma and a space) instead of first removing the spaces and then splitting on the comma.
You are right. I did it this way out of habit, because so many datasets I see here have arbitrary numbers of spaces in between... But normally you should be able to rely on a consistent pattern, so I'll edit the answer to add this, thanks.
1

Just don't store a dict in a pandas cell; doing so stops many of pandas' nice built-in functions from working.

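(df.set_index('UC').WR
   .str.split(', ', expand=True)
   .stack().str.get_dummies()
   .groupby(level=0, sort=False).sum()  # sum(level=0) was removed in pandas 2.0
   .drop(columns='nan'))                # positional axis argument was removed too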
      A  B  C  D
UC              
V001  2  1  2  1
C001  2  0  1  1
C002  3  2  1  0
C003  2  1  2  0
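
If your pandas version has Series.explode (0.25 or later), a similar count table can be built without get_dummies. This is an alternative sketch rather than the code above; it assumes literal 'nan' strings and a ', ' separator, and the sample frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "UC": ["V001", "C001", "C002", "C003"],
    "WR": [
        "A, B, C, nan, A, C, D",
        "nan, C, D, A, nan, A",
        "C, B, B, A, A, A",
        "A, C, A, C, B, nan",
    ],
})

# One row per item, indexed by UC.
exploded = df.set_index("UC")["WR"].str.split(", ").explode()

# Drop the literal 'nan' strings before counting.
exploded = exploded[exploded != "nan"]

# Count each value per UC, then pivot the values into columns (missing -> 0).
counts = exploded.groupby(level=0).value_counts().unstack(fill_value=0)
```

The rows may come out in a different order than in the original frame, because groupby sorts the group keys by default.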

0

To get the values as dictionaries you may also try:

from collections import Counter
df['WR Count'] = df['WR'].apply(lambda x: dict(Counter(x.split(', '))))
