pandas get frequency of values on multiple columns

Question

I have this kind of dataframe df:

df = pd.DataFrame({'c':[1,1,2,2,3,3],'L0':['a','a','b','c','d','e'],'L1':['a','b','c','e','f','e']})

I'm now trying to get the frequency of each values in columns L0 and L1 for each value of c. The expected output would be:

c    a    b    c    d    e    f
1    3    1    0    0    0    0
2    0    1    2    0    1    0
3    0    0    0    1    2    1

I thought I could use something like:

df.pivot_table(index='c', columns=np.unique(['L0','L1']), aggfunc=f)

but I can't figured out how to describe f that should be a function able to get the value_counts() over multiple columns.

Nickil Maveli · Accepted Answer · 2017-02-02 10:40:04Z

3

You could use the crosstab method which by default computes the frequency table of the factors as shown:

>>> df_1 = pd.crosstab(df.c, df.L0)
>>> df_2 = pd.crosstab(df.c, df.L1)

>>> df_1.add(df_2, fill_value=0)

   a  b  c  d  e  f
c                  
1  3  1  0  0  0  0
2  0  1  2  0  1  0
3  0  0  0  1  2  1

edited Feb 2, 2017 at 10:40

answered May 17, 2016 at 11:48

Nickil Maveli

29.8k10 gold badges86 silver badges88 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

knagaev · Accepted Answer · 2016-05-17 10:02:39Z

1

Didn't try to describe f but want to solve your problem the other way

In [356]: df.set_index('c').stack().reset_index().groupby(['c', 0]).count().unstack().fillna(0)
Out[356]: 
  level_1                         
0       a    b    c    d    e    f
c                                 
1     3.0  1.0  0.0  0.0  0.0  0.0
2     0.0  1.0  2.0  0.0  1.0  0.0
3     0.0  0.0  0.0  1.0  2.0  1.0

answered May 17, 2016 at 10:02

knagaev

2,96518 silver badges20 bronze badges

Comments

IanS · Accepted Answer · 2016-05-17 10:34:09Z

Edit: This is slightly simpler:

In[48]: df.groupby('c').apply(lambda df1: 
            df1.drop('c', axis=1).unstack().value_counts().to_frame().transpose()
        ).reset_index(level=1, drop=True).fillna(0)

See below for explanations.

The function you're looking for is groupby, not pivot. You can then use value_counts on each dataframe, grouped by c values, individually.

This is close to what you were looking for:

In[39] : df.groupby('c').apply(lambda df1: 
             df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose()
         )
Out[39]: 
       a   b   c   d   e   f
c                           
1 L0   2 NaN NaN NaN NaN NaN
  L1   1   1 NaN NaN NaN NaN
2 L0 NaN   1   1 NaN NaN NaN
  L1 NaN NaN   1 NaN   1 NaN
3 L0 NaN NaN NaN   1   1 NaN
  L1 NaN NaN NaN NaN   1   1

In order to sum the values, the end result is surprisingly complicated:

In[46]: df.groupby('c').apply(lambda df1: 
            df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose().sum().to_frame().transpose()
        ).reset_index(level=1, drop=True).fillna(0)
Out[46]: 
   a  b  c  d  e  f
c                  
1  3  1  0  0  0  0
2  0  1  2  0  1  0
3  0  0  0  1  2  1

Collectives™ on Stack Overflow

pandas get frequency of values on multiple columns

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related