
I have a pandas DataFrame df that looks like this:

col1 col2
v1   i1
v1   i50
v2   i60
v2   i1
v2   i8 
v10  i8
v10  i1 
v10  i2 
..

I would like to compute how many elements of col1 have a given value of col2, and store the results in a DataFrame with this layout:

col2 frequency
i1   80
i2   195
...  ...

I tried this in pandas:

 item_frequency = pd.unique(relevant_data[relevant_data['col2'].isin(pd.unique(relevant_data['col2'].values.ravel()))]['col1'].values.ravel())

which yields the error:

raise ValueError('Lengths must match to compare')
ValueError: Lengths must match to compare

PS: I'd like to do this in a vectorized manner.

  • Could you clarify your task, with an exact small-size input and the result you want to get from it? Commented Sep 29, 2015 at 10:32
  • So the result should be col1, col2, frequency? Commented Sep 29, 2015 at 10:43
  • Your desired output doesn't match your statement: are you counting pure item frequency, or item frequency per transaction? Commented Sep 29, 2015 at 10:43
  • @RomanPekar, actually each item is unique per transaction, so it's irrelevant to include the col1 information. Commented Sep 29, 2015 at 10:46
  • @EdChum I have transactions (col1) and items (col2), and I would like to compute how many transactions have each item. Commented Sep 29, 2015 at 10:48

1 Answer


It's not quite clear what result you want to get, so if you want col1, col2, frequency, then you can use groupby() and size():

In [5]: df.groupby(['col1', 'col2']).size()
Out[5]: 
col1  col2
v1    i1      1
      i50     1
v10   i1      1
      i2      1
      i8      1
v2    i1      1
      i60     1
      i8      1
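
To get this result back as a flat DataFrame with a named frequency column (the col1, col2, frequency shape mentioned above), one option is reset_index; a small sketch, not part of the original answer, using the question's sample data:

```python
import pandas as pd

# Sample data matching the question's layout
df = pd.DataFrame({
    'col1': ['v1', 'v1', 'v2', 'v2', 'v2', 'v10', 'v10', 'v10'],
    'col2': ['i1', 'i50', 'i60', 'i1', 'i8', 'i8', 'i1', 'i2'],
})

# size() returns a Series with a MultiIndex (col1, col2);
# reset_index flattens it and names the counts column
freq = df.groupby(['col1', 'col2']).size().reset_index(name='frequency')
print(freq)
```

Here every (col1, col2) pair occurs once, so each frequency is 1; on real data with repeated pairs the counts would differ.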

If you just want to calculate the count of col2 values, then value_counts() will work:

In [6]: df['col2'].value_counts()
Out[6]: 
i1     3
i8     2
i60    1
i2     1
i50    1
dtype: int64

Update

After you updated your description, I see that value_counts() could give you a wrong answer if it's possible to have one value more than once per transaction. But you can solve this with drop_duplicates():

In [9]: df.drop_duplicates()['col2'].value_counts()
Out[9]: 
i1     3
i8     2
i60    1
i2     1
i50    1
dtype: int64
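
Since the stated goal is "how many transactions (col1) contain each item (col2)", counting distinct transactions per item with nunique() sidesteps the duplicate issue entirely; a sketch under that reading, again using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['v1', 'v1', 'v2', 'v2', 'v2', 'v10', 'v10', 'v10'],
    'col2': ['i1', 'i50', 'i60', 'i1', 'i8', 'i8', 'i1', 'i2'],
})

# For each item, count the number of distinct transactions containing it,
# then flatten the result into a two-column DataFrame
item_freq = (df.groupby('col2')['col1']
               .nunique()
               .reset_index(name='frequency'))
print(item_freq)
```

For example, i1 appears in transactions v1, v2 and v10, so its frequency is 3.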

4 Comments

Thanks for your help, I need to compute df[col1 where col2 = some value].value_counts()
See the update; I think that this will give you the desired answer.
The requirement isn't easy to wrap one's mind around, so sorry if I am not being clear. The data doesn't have duplicates when keying by (col1, col2), so there's no need for drop_duplicates().
Actually you are right, I did compute df['col2'].value_counts() before, but it gave me a number of item occurrences greater than the number of transactions, which is wrong. I based my analysis on the fact that there were no duplicates; that's where I went wrong. Thanks.
