
I have a DataFrame with 3 columns. Each column contains yes, no, or NaN. I am trying to find the frequency of each column's values grouped by column a, which I was able to do with describe().

import pandas as pd, numpy as np

df2 = pd.DataFrame({'a':['yes','yes','no','yes','no','yes'],
                        'b':['no','yes','no','yes','no','no'],
                        'c':['yes','yes','yes','no','no', np.nan]})

df2.groupby('a').describe().transpose()

a    no                   yes                 
  count unique top freq count unique  top freq
b     2      1  no    2     4      2   no    2
c     2      2  no    1     3      2  yes    2

I am having trouble selecting the describe() columns I want. Below is an example of how I would like it to look. The freq/total_count column is the freq divided by the total count of the row. For example, for b and no it is 2/6.

a    no                                      yes                
  count top freq freq/total_count   count top freq freq/total_count
b     2  no    2     33%             4    no    2     33% 
c     2  no    1     20%             3   yes    2     40%

Please let me know if more information is needed.

2 Comments
  • Sorry, but why aren't the expected values 50% 50% and 0.333 0.666? The first row total is 2+2=4 and the last row is 1+2=3. Commented Feb 16, 2016 at 15:50
  • I want to divide by 2+4=6 and 2+3=5 because the denominator should be the total number of observations in the row. Commented Feb 16, 2016 at 15:59

1 Answer


You're on the right track. The df2.groupby('a').describe().transpose() command gives a DataFrame with MultiIndex columns. To select or manipulate individual pieces of the DataFrame, you first select the 'no' or 'yes' top-level key, then the column key beneath it.
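For instance (a small illustration of the selection syntax only; data here refers to the transposed describe() result built below):

data['no']['freq']             # chained selection: top-level key, then stat
data['no', 'freq']             # equivalent tuple selection on the MultiIndex columns
data.loc['b', ('no', 'freq')]  # a single cell: row 'b', column ('no', 'freq')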

import pandas as pd, numpy as np

df2 = pd.DataFrame({'a':['yes','yes','no','yes','no','yes'],
                    'b':['no','yes','no','yes','no','no'],
                    'c':['yes','yes','yes','no','no', np.nan]})

data = df2.groupby('a').describe().transpose()

# Add empty percentage columns under each top-level key
data['no', 'freq/total_count'] = np.nan
data['yes', 'freq/total_count'] = np.nan

# For each row, divide freq by the row's total number of observations
# (the 'no' count plus the 'yes' count); use .loc to avoid chained assignment
for ind in data.index:
    total = data.loc[ind, ('no', 'count')] + data.loc[ind, ('yes', 'count')]
    data.loc[ind, ('no', 'freq/total_count')] = data.loc[ind, ('no', 'freq')] / total * 100
    data.loc[ind, ('yes', 'freq/total_count')] = data.loc[ind, ('yes', 'freq')] / total * 100

# Format the percentages as strings
data['no', 'freq/total_count'] = data['no', 'freq/total_count'].map('{0:.0f}%'.format)
data['yes', 'freq/total_count'] = data['yes', 'freq/total_count'].map('{0:.0f}%'.format)

The output is

a   no                          yes                           no                 yes
    count  unique  top   freq   count   unique   top   freq   freq/total_count   freq/total_count
b   2      1       no    2      4       2        no    2      33%                33%
c   2      2       no    1      3       2        yes   2      20%                40%
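As an aside (a sketch, not part of the original answer), the per-row loop can be replaced with column-wise arithmetic, since the denominator is just the sum of the two count columns; this assumes data is the transposed describe() result from above:

# Per-row total observations = 'no' count + 'yes' count
totals = data['no', 'count'].astype(float) + data['yes', 'count'].astype(float)

for key in ['no', 'yes']:
    pct = data[key, 'freq'].astype(float) / totals * 100
    data[key, 'freq/total_count'] = pct.map('{0:.0f}%'.format)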

To pretty-print this, we want to remove the 'unique' columns and then regroup the 'no' section and the 'yes' section.

del data['no','unique']
del data['yes','unique']
pd.concat([data['no'],data['yes']],axis=1,keys=['no','yes'])

Giving the final output:

a   no                                     yes
    count  top   freq   freq/total_count   count   top   freq   freq/total_count
b   2      no    2      33%                4       no    2      33%
c   2      no    1      20%                3       yes   2      40%
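Note that pd.concat returns a new DataFrame rather than modifying data in place, so assign the result if you want to keep working with it; for example (pretty is just an illustrative name):

pretty = pd.concat([data['no'], data['yes']], axis=1, keys=['no', 'yes'])
print(pretty)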

2 Comments

Thanks, but how do you get rid of unique for all columns?
The command del data['no', 'unique'] will delete the 'unique' column in the 'no' section. Do the same for 'yes'. (A one-line alternative that drops 'unique' from every section is sketched below.)
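As an addition (not from the original comments), when there are many top-level groups, the inner 'unique' level can be dropped from all of them in a single call:

# Drop the 'unique' column under every top-level key ('no', 'yes', ...)
data = data.drop('unique', axis=1, level=1)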
