I have a data frame with 3 columns. Each column contains yes, no, or NaN.
I am trying to find the value frequencies of columns b and c, grouped by column a. I was able to get them with describe():
import numpy as np
import pandas as pd
df2 = pd.DataFrame({'a': ['yes', 'yes', 'no', 'yes', 'no', 'yes'],
                    'b': ['no', 'yes', 'no', 'yes', 'no', 'no'],
                    'c': ['yes', 'yes', 'yes', 'no', 'no', np.nan]})
df2.groupby('a').describe().transpose()
a     no                    yes
   count unique top freq  count unique  top freq
b      2      1  no    2      4      2   no    2
c      2      2  no    1      3      2  yes    2
I am having trouble selecting only the describe() columns I want. Below is an example of how I would like the result to look. The freq/total_count column is freq divided by the total count for that row, i.e. the sum of count over both groups of a. For example, for b and no it is 2/6 = 33%.
a     no                               yes
   count top freq freq/total_count  count  top freq freq/total_count
b      2  no    2              33%      4   no    2              33%
c      2  no    1              20%      3  yes    2              40%
Please let me know if more information is needed.
2+2=4 and last row is 1+2=3
2+4=6 and 2+3=5, because I wanted to do it over the total number of observations
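One possible way to build the table described in the question is sketched below. It does not slice the describe() output. Instead it recomputes count, top and freq per group directly, which avoids depending on the layout describe() returns in a given pandas version. The names value_cols, stats, share, pct and wide are made up for this illustration, and the sketch assumes every group has at least one non-NaN value in each column. When two values are tied for most frequent, which one is reported as top may differ from the table above.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'a': ['yes', 'yes', 'no', 'yes', 'no', 'yes'],
                    'b': ['no', 'yes', 'no', 'yes', 'no', 'no'],
                    'c': ['yes', 'yes', 'yes', 'no', 'no', np.nan]})

value_cols = ['b', 'c']                      # columns to summarise (illustrative name)
grouped = df2.groupby('a')[value_cols]

# Non-NaN observations per group; index is a, columns are b and c.
count = grouped.count()
# Most frequent value and how often it occurs (value_counts ignores NaN).
# Assumption: no group is entirely NaN in a column, otherwise index[0] fails.
top = grouped.agg(lambda s: s.value_counts().index[0])
freq = grouped.agg(lambda s: s.value_counts().iloc[0])

# Total observations per column across both groups: b -> 2+4 = 6, c -> 2+3 = 5.
total = count.sum(axis=0)
share = freq / total                         # aligns total with the b/c columns
pct = (share * 100).round().astype(int).astype(str) + '%'

stats = {'count': count, 'top': top, 'freq': freq, 'freq/total_count': pct}

# One block of columns per value of a, with rows b and c.
wide = pd.concat(
    {grp: pd.DataFrame({name: frame.loc[grp] for name, frame in stats.items()})
     for grp in count.index},
    axis=1,
)
print(wide)
```

The denominators follow the clarification in the comment: each row is divided by the total number of observations for that column (6 for b, 5 for c), not by the per-group count.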