I have a dataframe with a variable 'Gender' (0 or 1) indicating whether a person is male or female, and another variable 'Dis' giving the state of the disease (0, 1, 2 or 3).

> df.head()
   Gender  Dis
0     1     2
1     0     0
2     0     1
3     1     3
4     0     0
5     0     1

I want to make a bar plot of the counts for each of the 'Dis' values, separated by Gender, i.e., two bars for each state of the disease. I want this:

[image: the desired grouped bar plot]

However, I can't produce this bar plot automatically. I had to work out the count for each Gender/Dis combination separately and type the values in by hand. I produced the plot manually with the following:

import numpy as np
import matplotlib.pyplot as plt

X = ['0','1','2','3']
M = [43,9,20,11]
F = [118,21,168,20]

X_axis = np.arange(len(X))

plt.bar(X_axis - 0.2, M, 0.4, label = 'Male')
plt.bar(X_axis + 0.2, F, 0.4, label = 'Female')

plt.xticks(X_axis, X)
plt.xlabel("")
plt.ylabel("")
plt.legend()
plt.title("title")

# Helper meant to label each bar with its height; it is never called
# here and relies on an `ax` that is not defined in this snippet
def autolabel(rects):
    for rect in rects:
        h = rect.get_height()
        ax.text(rect.get_x()+rect.get_width()/2., 1.05*h, '%d'%int(h),
                ha='center', va='bottom')

plt.show()

Can I do something more "automatic" directly from the dataframe? Can I also display the count values on top of each bar?

2 Answers

1

Let's try with crosstab + DataFrame.plot:

plot_df = (
    pd.crosstab(df['Dis'], df['Gender'])
        .rename(columns={0: 'Male', 1: 'Female'})
)

ax = plot_df.plot(kind='bar', rot=0, xlabel='', ylabel='', title='title')
plt.show()

crosstab produces the Male/Female counts for each Dis value.

rename turns the column names 0/1 into Male/Female:

plot_df:

Gender  Male  Female
Dis                 
0        119     128
1        140     121
2        124     120
3        112     136
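To make it clearer what crosstab is doing, here is a minimal sketch that builds the same kind of table with groupby + size + unstack and compares it to the crosstab result. It assumes the seeded sample data listed at the end of this answer; the names counts and crosstab_counts are only illustrative.

```python
import numpy as np
import pandas as pd

# Same seeded sample data as at the end of this answer.
np.random.seed(5)
df = pd.DataFrame({'Gender': np.random.choice([0, 1], 1000),
                   'Dis': np.random.choice([0, 1, 2, 3], 1000)})

# Count rows per (Dis, Gender) pair, then pivot Gender into columns.
counts = (
    df.groupby(['Dis', 'Gender'])
      .size()
      .unstack(fill_value=0)
      .rename(columns={0: 'Male', 1: 'Female'})
)

# crosstab builds the same table in a single call.
crosstab_counts = (
    pd.crosstab(df['Dis'], df['Gender'])
      .rename(columns={0: 'Male', 1: 'Female'})
)

print(counts)
print((counts.values == crosstab_counts.values).all())
```

Either table can be passed to .plot(kind='bar'); crosstab is simply the shorter spelling.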

To move the legend outside the axes and show the values on top of the bars:

ax = plot_df.plot(kind='bar', rot=0, xlabel='', ylabel='', title='title')
for container in ax.containers:
    ax.bar_label(container)

plt.legend(title='Gender', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
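Axes.bar_label was added in matplotlib 3.4. On an older version, a rough equivalent is to annotate each bar patch by hand. The sketch below uses a small made-up counts table (the numbers are hypothetical, standing in for plot_df) so it runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts table standing in for plot_df above.
plot_df = pd.DataFrame({'Male': [3, 1, 4, 2], 'Female': [2, 5, 1, 3]},
                       index=pd.Index([0, 1, 2, 3], name='Dis'))

ax = plot_df.plot(kind='bar', rot=0, xlabel='', ylabel='', title='title')

# Each bar is a Rectangle in ax.patches; write its height just above it.
for bar in ax.patches:
    ax.annotate(
        f'{bar.get_height():.0f}',
        xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
        xytext=(0, 3),               # nudge the text 3 points upward
        textcoords='offset points',
        ha='center', va='bottom',
    )

plt.legend(title='Gender', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
```

Finish with plt.show() as before. This is close to what the autolabel helper in the question was aiming for, but it takes the bars from the axes that DataFrame.plot returns.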

To add percentages to the top of the bars:

  1. divide plot_df by the column totals, so that the percentages for each Gender sum to 100
  2. format as desired
  3. zip with containers to add bar labels

plot_df = (
    pd.crosstab(df['Dis'], df['Gender'])
        .rename(columns={0: 'Male', 1: 'Female'})
)

# Calculate Percentages and format
# (applymap is deprecated since pandas 2.1; DataFrame.map replaces it there)
labels_df = (
    plot_df.div(plot_df.sum(axis=0)).mul(100).applymap('{:.2f}%'.format)
)

ax = plot_df.plot(kind='bar', rot=0, figsize=(9, 6), width=0.8,
                  xlabel='', ylabel='', title='title')

for container, col in zip(ax.containers, labels_df):
    ax.bar_label(container, labels=labels_df[col])

plt.legend(title='Gender', bbox_to_anchor=(1.01, 1), loc='upper left')
plt.tight_layout()
plt.show()

labels_df:

Gender    Male  Female
Dis                   
0       24.04%  25.35%
1       28.28%  23.96%
2       25.05%  23.76%
3       22.63%  26.93%
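As an alternative sketch, crosstab can do the division itself through its normalize parameter. The example below again assumes the seeded sample data from this answer and contrasts the two directions of normalization; the names pct_by_gender and pct_by_dis are only illustrative.

```python
import numpy as np
import pandas as pd

# Same seeded sample data as at the end of this answer.
np.random.seed(5)
df = pd.DataFrame({'Gender': np.random.choice([0, 1], 1000),
                   'Dis': np.random.choice([0, 1, 2, 3], 1000)})

# normalize='columns' divides each column by its own total,
# so the percentages within each Gender sum to 100.
pct_by_gender = (
    pd.crosstab(df['Dis'], df['Gender'], normalize='columns')
      .rename(columns={0: 'Male', 1: 'Female'})
      .mul(100)
)

# normalize='index' divides each row by its total instead,
# giving the Male/Female split within each Dis value.
pct_by_dis = (
    pd.crosstab(df['Dis'], df['Gender'], normalize='index')
      .rename(columns={0: 'Male', 1: 'Female'})
      .mul(100)
)

print(pct_by_gender.round(2))
print(pct_by_dis.round(2))
```

The first table corresponds to dividing by plot_df.sum(axis=0); the second to dividing each row by plot_df.sum(axis=1). Note that these hold numbers, so they still need formatting before being used as bar labels.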

Sample Data and imports used:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

np.random.seed(5)
df = pd.DataFrame({'Gender': np.random.choice([0, 1], 1000),
                   'Dis': np.random.choice([0, 1, 2, 3], 1000)})

3 Comments

Thank you, is there a way to show percentages instead?
Thanks, but I was referring to showing the percentage within each Gender. For example, the blue bars would be 25% at 0; 28% at 1; 27% at 2; 20% at 3, and similarly for the orange bars.
Easy fix, just sum on axis=0 instead; see the edits. @Numbermind
0

If you want to do this with a for loop:

import pandas as pd  
import numpy as np
import matplotlib.pyplot as plt

# Build the data as a dict of lists
data = {'Gender': [1,0,0,1,0,0,1,1], 'Dis': [2,0,1,3,0,1,0,1]}

# Create the DataFrame
df = pd.DataFrame(data)

# Print it to check the contents
print(df)

Then initialize one counter per Gender/Dis combination to zero and tally the rows in a loop:

number_males_dis_0 = 0
number_females_dis_0 = 0

number_males_dis_1 = 0
number_females_dis_1 = 0

number_males_dis_2 = 0
number_females_dis_2 = 0

number_males_dis_3 = 0
number_females_dis_3 = 0

for i in range(0,len(data['Dis'])):
  #print(i)
  #dis = 0
  if data['Dis'][i] == 0 and data['Gender'][i] == 0:
    number_males_dis_0 += 1
  elif data['Dis'][i] == 0 and data['Gender'][i] == 1:
    number_females_dis_0 += 1
  #dis = 1
  elif data['Dis'][i] == 1 and data['Gender'][i] == 0:
    number_males_dis_1 += 1
  elif data['Dis'][i] == 1 and data['Gender'][i] == 1:
    number_females_dis_1 += 1
  #dis = 2
  elif data['Dis'][i] == 2 and data['Gender'][i] == 0:
    number_males_dis_2 += 1
  elif data['Dis'][i] == 2 and data['Gender'][i] == 1:
    number_females_dis_2 += 1
  #dis = 3
  elif data['Dis'][i] == 3 and data['Gender'][i] == 0:
    number_males_dis_3 += 1
  elif data['Dis'][i] == 3 and data['Gender'][i] == 1:
    number_females_dis_3 += 1
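The same tallies can also be collected without eight separate variables. As a minimal sketch using collections.Counter from the standard library (pair_counts and states are illustrative names, not part of the code above):

```python
from collections import Counter

data = {'Gender': [1, 0, 0, 1, 0, 0, 1, 1], 'Dis': [2, 0, 1, 3, 0, 1, 0, 1]}

# Tally each (Dis, Gender) pair; pairs that never occur count as 0.
pair_counts = Counter(zip(data['Dis'], data['Gender']))

states = [0, 1, 2, 3]
M = [pair_counts[(dis, 0)] for dis in states]  # Gender == 0 -> Male
F = [pair_counts[(dis, 1)] for dis in states]  # Gender == 1 -> Female

print(M)  # [2, 2, 0, 0]
print(F)  # [1, 1, 1, 1]
```

These M and F lists hold the same values as the ones built from the individual counters, so the plotting code can use them unchanged.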

Then the plot:

X = ['0','1','2','3']
M = [number_males_dis_0,number_males_dis_1,number_males_dis_2,number_males_dis_3]
F = [number_females_dis_0,number_females_dis_1,number_females_dis_2,number_females_dis_3]

X_axis = np.arange(len(X))

plt.bar(X_axis - 0.2, M, 0.4, label = 'Male')
plt.bar(X_axis + 0.2, F, 0.4, label = 'Female')

plt.xticks(X_axis, X)
plt.xlabel("")
plt.ylabel("")
plt.ylim(0,max([max(F),max(M)])+0.5)
plt.legend()
plt.title("title")
# Text on the top of each bar
for i in range(0,4):
    plt.text(x = i - 0.25 , y = M[i] + 0.05, s = M[i], size = 10)
    plt.text(x = i + 0.15 , y = F[i] + 0.05, s = F[i], size = 10)
plt.show()
