2

I have this kind of dataframe df:

df = pd.DataFrame({'c':[1,1,2,2,3,3],'L0':['a','a','b','c','d','e'],'L1':['a','b','c','e','f','e']})

I'm now trying to get the frequency of each values in columns L0 and L1 for each value of c. The expected output would be:

c    a    b    c    d    e    f
1    3    1    0    0    0    0
2    0    1    2    0    1    0
3    0    0    0    1    2    1

I thought I could use something like:

df.pivot_table(index='c', columns=np.unique(['L0','L1']), aggfunc=f)

but I can't figured out how to describe f that should be a function able to get the value_counts() over multiple columns.

3 Answers 3

3

You could use the crosstab method which by default computes the frequency table of the factors as shown:

>>> df_1 = pd.crosstab(df.c, df.L0)
>>> df_2 = pd.crosstab(df.c, df.L1)

>>> df_1.add(df_2, fill_value=0)

   a  b  c  d  e  f
c                  
1  3  1  0  0  0  0
2  0  1  2  0  1  0
3  0  0  0  1  2  1
Sign up to request clarification or add additional context in comments.

Comments

1

Didn't try to describe f but want to solve your problem the other way

In [356]: df.set_index('c').stack().reset_index().groupby(['c', 0]).count().unstack().fillna(0)
Out[356]: 
  level_1                         
0       a    b    c    d    e    f
c                                 
1     3.0  1.0  0.0  0.0  0.0  0.0
2     0.0  1.0  2.0  0.0  1.0  0.0
3     0.0  0.0  0.0  1.0  2.0  1.0

Comments

1

Edit: This is slightly simpler:

In[48]: df.groupby('c').apply(lambda df1: 
            df1.drop('c', axis=1).unstack().value_counts().to_frame().transpose()
        ).reset_index(level=1, drop=True).fillna(0)

See below for explanations.


The function you're looking for is groupby, not pivot. You can then use value_counts on each dataframe, grouped by c values, individually.

This is close to what you were looking for:

In[39] : df.groupby('c').apply(lambda df1: 
             df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose()
         )
Out[39]: 
       a   b   c   d   e   f
c                           
1 L0   2 NaN NaN NaN NaN NaN
  L1   1   1 NaN NaN NaN NaN
2 L0 NaN   1   1 NaN NaN NaN
  L1 NaN NaN   1 NaN   1 NaN
3 L0 NaN NaN NaN   1   1 NaN
  L1 NaN NaN NaN NaN   1   1

In order to sum the values, the end result is surprisingly complicated:

In[46]: df.groupby('c').apply(lambda df1: 
            df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose().sum().to_frame().transpose()
        ).reset_index(level=1, drop=True).fillna(0)
Out[46]: 
   a  b  c  d  e  f
c                  
1  3  1  0  0  0  0
2  0  1  2  0  1  0
3  0  0  0  1  2  1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.