Unique values of two columns for pandas dataframe [duplicate]

Question

Suppose I have a pandas DataFrame with two columns:

df: Col1  Col2
      1     1
      1     2
      1     2
      1     2
      3     4
      3     4

Then I want to keep only the unique pairs of values (Col1, Col2) from these two columns and give their frequency:

df2: Col1  Col2  Freq
      1     1     1
      1     2     3
      3     4     2

I thought of using df['Col1', 'Col2'].value_counts(), but it works only for one column. Is there a function that handles several columns?

Ambiguous title: this does not find the unique values in either Col1 or Col2, but the unique combinations of values in both Col1 and Col2, i.e. the Cartesian product. This might not be what you want, especially for columns with higher cardinality than boolean (only two values).
– smci, Commented Apr 8, 2020 at 21:33

End genocide - save Gaza · Accepted Answer · 2019-03-20 12:57:23Z

73

You need groupby + size + Series.reset_index:

df = df.groupby(['Col1', 'Col2']).size().reset_index(name='Freq')
print (df)
   Col1  Col2  Freq
0     1     1     1
1     1     2     3
2     3     4     2
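For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame and applies the groupby + size approach. The second half is an assumption about your environment: it relies on DataFrame.value_counts, which is only available in pandas 1.1 or later, and sorts the result by index so the rows line up with the groupby output.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"Col1": [1, 1, 1, 1, 3, 3],
                   "Col2": [1, 2, 2, 2, 4, 4]})

# Count rows per (Col1, Col2) pair, then flatten the result into columns.
freq = df.groupby(["Col1", "Col2"]).size().reset_index(name="Freq")
print(freq)

# Assumption: pandas >= 1.1, where DataFrame.value_counts exists.
# value_counts orders by frequency, so sort_index restores the key order.
alt = (df.value_counts(["Col1", "Col2"])
         .sort_index()
         .reset_index(name="Freq"))
print(alt)
```

Both variants should produce the same three rows; the groupby version also works on older pandas releases.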

edited Mar 20, 2019 at 12:57

End genocide - save Gaza


answered Jul 4, 2017 at 13:04

jezrael


10 Comments

Bowen Liu Over a year ago

Thanks for the amazing answer. I'm trying to understand it by running it bit by bit, and I have a couple of questions: 1. If I only need Col1 and Col2, that is, only the unique pairs of values for the first two columns, would your answer still be the best method? 2. Why does df.groupby(['Col1', 'Col2']).size() return a Series for me? Thanks again.

jezrael Over a year ago

@BowenLiu - 1. I think it is really fast, though a numpy solution may be faster. 2. In my opinion it returns a Series by design. Unlike aggregations such as mean or sum (df.groupby(['Col1', 'Col2'])['Col3'].sum()), no other column is needed, because the output is counted by the columns defined in groupby, Col1 and Col2: it groups and counts on the same columns. For sum, it groups by Col1 and Col2 and aggregates Col3, the column(s) listed after groupby; if that list is omitted, as in df.groupby(['Col1', 'Col2']).sum(), it sums all remaining columns.
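The difference described in this reply can be checked with a small sketch. Col3 and its values are hypothetical here; the question's frame has only two columns, so a third one is added purely for illustration.

```python
import pandas as pd

# Col3 is a hypothetical extra column, added only to have something to sum.
df = pd.DataFrame({"Col1": [1, 1, 1, 1, 3, 3],
                   "Col2": [1, 2, 2, 2, 4, 4],
                   "Col3": [10, 20, 30, 40, 50, 60]})

grouped = df.groupby(["Col1", "Col2"])

# size() counts rows per group and needs no value column: a Series.
counts = grouped.size()

# Selecting one column before aggregating also gives a Series.
col3_sum = grouped["Col3"].sum()

# Aggregating without selecting a column gives a DataFrame holding
# every non-key column (here only Col3).
all_sums = grouped.sum()

print(type(counts).__name__, type(col3_sum).__name__, type(all_sums).__name__)
```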

Bowen Liu Over a year ago

Can't believe I didn't see your reply. Reading it after using pandas for several months, it makes a lot more sense to me now. The only thing I still don't get is the reset_index(name='Freq') part. In the pandas documentation, name is not a kwarg for reset_index. How did you name the column that was not an index in the groupby result this way? Thanks.

jezrael Over a year ago

@BowenLiu - oops, that was a bad link; you need Series.reset_index. The name parameter works only with a Series.
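A short sketch of the point about Series.reset_index follows. It assumes, as in the pandas versions I am aware of, that DataFrame.reset_index has no name keyword and raises TypeError when given one; the flag variable is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 1, 1, 1, 3, 3],
                   "Col2": [1, 2, 2, 2, 4, 4]})

# size() returns a Series indexed by (Col1, Col2).
sizes = df.groupby(["Col1", "Col2"]).size()

# Series.reset_index accepts name= to label the values column.
flat = sizes.reset_index(name="Freq")
print(flat.columns.tolist())

# Assumption: DataFrame.reset_index does not accept name= and raises TypeError.
try:
    sizes.to_frame("Freq").reset_index(name="Freq")
    dataframe_accepts_name = True
except TypeError:
    dataframe_accepts_name = False
print(dataframe_accepts_name)
```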

Bowen Liu Over a year ago

Thanks a lot. I realize my understanding of groupby is quite superficial, so I am trying to deduce some general rules for it. Are there any other aggregate functions, like .size(), that can generate a Series without specifying columns (in the form df.groupby(['Col1']).function())? BTW, many of your posts have proved immensely helpful to me. I wonder if you can share how you came to have such a deep and systematic understanding of pandas.


Quickbeam2k1 · Accepted Answer · 2019-11-16 22:15:18Z

15

You could try

df.groupby(['Col1', 'Col2']).size()

For a different visual output compared to jez's answer, you can extend that solution with

pd.DataFrame(df.groupby(['Col1', 'Col2']).size().rename('Freq'))

which gives

           Freq
Col1 Col2      
1    1        1
     2        3
3    4        2
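The nested layout above comes from Col1 and Col2 being kept as a two-level index rather than as columns. A minimal sketch, assuming the question's data, shows that layout, an equivalent spelling with Series.to_frame, and how to get back to flat columns:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 1, 1, 1, 3, 3],
                   "Col2": [1, 2, 2, 2, 4, 4]})

# Same construction as in the answer: Col1 and Col2 stay in the index.
nested = pd.DataFrame(df.groupby(["Col1", "Col2"]).size().rename("Freq"))

# Equivalent spelling using Series.to_frame.
same = df.groupby(["Col1", "Col2"]).size().to_frame("Freq")

# Look up a single pair through the MultiIndex.
print(nested.loc[(1, 2), "Freq"])

# reset_index moves the index levels back into ordinary columns.
flat = nested.reset_index()
print(flat)
```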

edited Nov 16, 2019 at 22:15

answered Jul 4, 2017 at 13:02

Quickbeam2k1

