
I have a pandas DataFrame df that looks like this:

col1 col2
v1   i1
v1   i50
v2   i60
v2   i1
v2   i8 
v10  i8
v10  i1 
v10  i2 
..

I would like to compute how many elements of col1 have a given value of col2, and store the results in a DataFrame with this layout:

col2 frequency
i1   80
i2   195
...  ...

I tried this in pandas:

 item_frequency = pd.unique(relevant_data[relevant_data['col2'].isin(pd.unique(relevant_data['col2'].values.ravel()))]['col1'].values.ravel())

which yields the error:

raise ValueError('Lengths must match to compare')
ValueError: Lengths must match to compare

PS: I'd like to do this in a vectorized manner.

  • Could you clarify your task, with an exact small-size input and the result you want to get from it? Commented Sep 29, 2015 at 10:32
  • So the result should be col1, col2, frequency? Commented Sep 29, 2015 at 10:43
  • Your desired output doesn't match your statement: are you counting pure item frequency, or item frequency per transaction? Commented Sep 29, 2015 at 10:43
  • @RomanPekar, actually each item is unique per transaction, so it's irrelevant to include the col1 information. Commented Sep 29, 2015 at 10:46
  • @EdChum I have transactions (col1) and items (col2), and I would like to compute how many transactions have each item. Commented Sep 29, 2015 at 10:48

1 Answer


It's not quite clear what result you want to get, so if you want col1, col2, frequency, then you can use groupby() and size():

In [5]: df.groupby(['col1', 'col2']).size()
Out[5]: 
col1  col2
v1    i1      1
      i50     1
v10   i1      1
      i2      1
      i8      1
v2    i1      1
      i60     1
      i8      1
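
To get this result back as a flat DataFrame with a named frequency column (the col1, col2, frequency shape mentioned above), one option is reset_index; a small sketch, not part of the original answer, using the question's sample data:

```python
import pandas as pd

# Sample data matching the question's layout
df = pd.DataFrame({
    'col1': ['v1', 'v1', 'v2', 'v2', 'v2', 'v10', 'v10', 'v10'],
    'col2': ['i1', 'i50', 'i60', 'i1', 'i8', 'i8', 'i1', 'i2'],
})

# size() returns a Series with a MultiIndex (col1, col2);
# reset_index flattens it and names the counts column
freq = df.groupby(['col1', 'col2']).size().reset_index(name='frequency')
print(freq)
```

Here every (col1, col2) pair occurs once, so each frequency is 1; on real data with repeated pairs the counts would differ.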

If you just want to calculate the count of col2 values, then value_counts() will work:

In [6]: df['col2'].value_counts()
Out[6]: 
i1     3
i8     2
i60    1
i2     1
i50    1
dtype: int64

Update

After you updated your description, I see that value_counts() could give you a wrong answer if it's possible to have one value more than once per transaction. But you can solve this with drop_duplicates():

In [9]: df.drop_duplicates()['col2'].value_counts()
Out[9]: 
i1     3
i8     2
i60    1
i2     1
i50    1
dtype: int64
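
Since the stated goal is "how many transactions (col1) contain each item (col2)", counting distinct transactions per item with nunique() sidesteps the duplicate issue entirely; a sketch under that reading, again using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['v1', 'v1', 'v2', 'v2', 'v2', 'v10', 'v10', 'v10'],
    'col2': ['i1', 'i50', 'i60', 'i1', 'i8', 'i8', 'i1', 'i2'],
})

# For each item, count the number of distinct transactions containing it,
# then flatten the result into a two-column DataFrame
item_freq = (df.groupby('col2')['col1']
               .nunique()
               .reset_index(name='frequency'))
print(item_freq)
```

For example, i1 appears in transactions v1, v2 and v10, so its frequency is 3.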

4 Comments

Thanks for your help, I need to compute df[col1 where col2 = some value].value_counts()
See the update; I think that this will give you the desired answer.
The requirement isn't easy to wrap one's mind around, so sorry if I am not being clear. The data doesn't have duplicates when keying by (col1, col2), so there's no need for drop_duplicates().
Actually you are right, I did compute df['col2'].value_counts() before, but it gave me a number of item occurrences greater than the number of transactions, which is wrong. I based my analysis on the fact that there were no duplicates; that's where I went wrong. Thanks.
