
I have a Pandas DataFrame that has a row for each user. Each user took a survey that captured scores from 0 to 5 for a series of survey questions. It looks something like this:

import pandas as pd

df1 = pd.DataFrame({'User': ['user_a', 'user_b', 'user_c'], 'Cat1_score': [0, 5, 3], 'Cat2_score': [0, 2, 5], 'Cat3_score': [4, 5, 1]})

I want to count across each row, not down each column, and I just can't wrap my head around which method(s) to call.

If I use:

df1.count(axis='columns')

That just tells me the number of cells in each row that are non-null.
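For what it's worth, on the df1 above every cell is non-null (the User string included), so that call just returns 4 for each row rather than any per-score frequency:

print(df1.count(axis='columns'))
# 0    4
# 1    4
# 2    4
# dtype: int64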

The following is closer to what I want, but I have 100 different columns to evaluate for each row, and I don't want to have to type each one out manually.

df1.value_counts('column_name')

What I would really like is to end up with a data frame that looks something like this:

df2 = pd.DataFrame({'User': ['user_a', 'user_b', 'user_c'], 'zero': [2, 0, 0], 'one': [0, 0, 1], 'two': [0, 1, 0], 'three': [0, 0, 1], 'four': [1, 0, 0], 'five': [0, 2, 1]})

I want to count the frequency of how many of each user's responses = 0, = 1, = 5, etc. This might be a case of Friday-afternoon-at-work-lack-of-creative-thinking-brain if the answer is obvious.

UPDATE: The suggested answer found in this thread doesn't produce the best output for my needs. The code below produces a very clean data frame that I can use to join with other user tables I have and then save the resulting table to Excel.
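For context, here is a minimal sketch of that join-and-export step (the other_users table, its Team column, and the file name are all hypothetical, and out is the per-user counts frame produced by the answers below):

other_users = pd.DataFrame({'User': ['user_a', 'user_b', 'user_c'], 'Team': ['red', 'blue', 'red']})  # hypothetical user table

merged = other_users.merge(out, on='User', how='left')  # 'out' from an answer below
merged.to_excel('survey_score_counts.xlsx', index=False)  # hypothetical file name; requires an Excel writer such as openpyxl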


2 Answers


Using a crosstab:

# long format: one row per (User, question) pair; the values are the scores
s = df1.set_index('User').stack()

# cross-tabulate users against score values to get per-user frequencies
out = (pd.crosstab(s.index.get_level_values('User'), s)
         .rename_axis(index='User', columns=None).reset_index()
      )

Variant:

# same idea with melt: unpivot the score columns into a long 'variable'/'value' format
tmp = df1.melt('User')

out = (pd.crosstab(tmp['User'], tmp['value'])
         .rename_axis(columns=None).reset_index()
      )

Output:

     User  0  1  2  3  4  5
0  user_a  2  0  0  0  1  0
1  user_b  0  0  1  0  0  2
2  user_c  0  1  0  1  0  1
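If you prefer the word column names from the desired df2 ('zero' through 'five'), one small follow-up, assuming the scores are exactly 0-5, is to rename the numeric columns afterwards:

names = {0: 'zero', 1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}
out = out.rename(columns=names)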

1 Comment

When using the melt + crosstab approach, they offered out = pd.crosstab(tmp.User, tmp.value.fillna('NaN')) elsewhere, which keeps missing answers as their own 'NaN' column instead of silently dropping them.

You can do this in one line using Series.value_counts per row via DataFrame.apply:

out = df1.set_index('User').apply(lambda x: x.value_counts(), axis=1).fillna(0).astype(int)

print(out)
#        0  1  2  3  4  5
#User                    
#user_a  2  0  0  0  1  0
#user_b  0  0  1  0  0  2
#user_c  0  1  0  1  0  1
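One caveat: with value_counts a score column only appears if at least one user actually gave that score. If you need all six columns 0-5 regardless, a small variant, assuming scores always fall in that range, is to reindex inside the lambda:

# guarantee a column for every possible score 0-5, even if nobody used one
out = df1.set_index('User').apply(lambda x: x.value_counts().reindex(range(6), fill_value=0), axis=1)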

Alternatively using DataFrame.melt with DataFrame.pivot_table:

out = df1.melt('User').pivot_table(index='User', columns='value', aggfunc='size', fill_value=0)

print(out)
#value   0  1  2  3  4  5
#User                    
#user_a  2  0  0  0  1  0
#user_b  0  0  1  0  0  2
#user_c  0  1  0  1  0  1
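A note on the design choice: aggfunc='size' simply counts how many melted rows land in each (User, value) cell, which is exactly the per-user frequency the question asks for, and fill_value=0 fills the combinations that never occur.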

This answer is inspired by @jezrael; Ref.

~~Based on [this investigation](https://stackoverflow.com/a/76002061/10452700), it seems value_counts() is the better way to go in terms of efficiency!~~

2 Comments

Using apply on axis=1 is inefficient. crosstab is the equivalent of value_counts when you have groups. The scenario for the timings you referenced is different: it has no groups, and in that case value_counts is indeed the most efficient.
Oh, thanks, I was not aware of the with/without-groups scenario. I edited my answer. Thanks for this input.
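If you want to verify the grouped case yourself, a minimal benchmark sketch, with a made-up data shape and made-up user/column names, could look like this:

import numpy as np
import pandas as pd
from timeit import timeit

# hypothetical wide survey: 10,000 users x 100 score columns
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 6, size=(10_000, 100)), columns=[f'Cat{i}_score' for i in range(100)])
big.insert(0, 'User', [f'user_{i}' for i in range(10_000)])

def with_crosstab(df):
    # stack to long format, then cross-tabulate users against scores
    s = df.set_index('User').stack()
    return pd.crosstab(s.index.get_level_values('User'), s)

def with_apply(df):
    # row-wise value_counts via apply, as in this answer
    return df.set_index('User').apply(lambda x: x.value_counts(), axis=1).fillna(0).astype(int)

print('crosstab:', timeit(lambda: with_crosstab(big), number=3))
print('apply:   ', timeit(lambda: with_apply(big), number=3))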
