I have the following dataframe and I am trying to create a stacked bar plot:

import os
from pprint import pprint
import matplotlib.pyplot as plt
import pandas as pd


def classify_data():
    race = ['race1','race1','race1','race1','race2','race2','race2', 'race2']
    qualifier = ['last','first','first','first','last','last','first','first']
    participant = ['rat','rat','cat','cat','rat','dog','dog','dog']
    df = pd.DataFrame(
        {'race':race,
         'qualifier':qualifier,
         'participant':participant

        }
    )
    pprint(df)
    df2 = df.groupby(['race','qualifier'])['race'].count().unstack('qualifier').fillna(0)
    df2[['first','last']].plot(kind='bar', stacked=True)
    plt.show()



classify_data()

I managed to obtain a single stacked plot this way, but I want to create two plots from my dataframe.

One plot should contain the following data for the qualifier 'last':

Race1 rat 1
Race1 cat 0
Race1 dog 0 
Race2 rat 1
Race2 dog 1
Race2 cat 0

So the first bar plot would have 2 bars (one per race), with each bar split into colored segments showing the count of each participant.

Likewise, a second plot for the qualifier 'first':

EDIT:

  Race1 rat 1
  Race1 cat 2
  Race1 dog 0 
  Race2 rat 0
  Race2 dog 2
  Race2 cat 0

From the original dataframe, I have to create the above two dataframes in order to draw the stacked plots.

I am not sure how to use the groupby function to get the count of each 'participant' for each 'qualifier' within a given 'race'.

EDIT 2: For qualifier 'last', the desired plot would look like this (blue for rat, red for dog):

enter image description here

For qualifier 'first':

enter image description here

Could someone suggest how to proceed from here?

  • So for each column, you want a stacked plot? Commented Jun 27, 2019 at 15:15
  • @QuangHoang No, not for each column. For the qualifiers 'first' and 'last', I want two separate stacked plots. So the first bar plot would have 2 bars, with each bar color-coded by the count of each participant. Commented Jun 27, 2019 at 15:16
  • It's really hard to follow what you want to achieve. Is there a way you can manually create the plots for clarity's sake? Commented Jun 27, 2019 at 15:44
  • @Yuca please check the edit. Commented Jun 27, 2019 at 15:56

1 Answer

IIUC, this is what you want:

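# df is the DataFrame built in the question.
# Count rows per (race, qualifier, participant), then pivot participants
# into columns; fill_value=0 gives absent combinations a count of 0.
df2 = (df.groupby(['race','qualifier','participant'])
         .size()
         .unstack(level=-1, fill_value=0)
         .reset_index()
      )

# One stacked bar chart per qualifier, side by side with a shared y-axis
fig,axes = plt.subplots(1,2,figsize=(12,6),sharey=True)
for ax,q in zip(axes.ravel(),['first','last']):
    tmp_df = df2[df2.qualifier.eq(q)]
    tmp_df.plot.bar(x='race', ax=ax, stacked=True)
    ax.set_title(q)

plt.show()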

Output:

enter image description here
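
To check the numbers behind the plot against the tables in the question, it may help to look at the counts directly. The following is only a minimal sketch that rebuilds the question's dataframe; the names counts, first_counts and last_counts are made up for illustration and are not part of the answer's code.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'race': ['race1'] * 4 + ['race2'] * 4,
    'qualifier': ['last', 'first', 'first', 'first',
                  'last', 'last', 'first', 'first'],
    'participant': ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog'],
})

# Rows per (race, qualifier, participant); participants become columns,
# and combinations that never occur get a count of 0.
counts = (df.groupby(['race', 'qualifier', 'participant'])
            .size()
            .unstack('participant', fill_value=0))

# One table per qualifier, indexed by race (hypothetical helper names)
first_counts = counts.xs('first', level='qualifier')
last_counts = counts.xs('last', level='qualifier')

print(first_counts)
print(last_counts)
```

Each of the two tables has one row per race and one column per participant, which is the shape that a stacked bar plot expects.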

12 Comments

@Natasha see edit for correct coloring of participants.
Excellent! thanks a ton. Can this be plotted in plotly using df2?
Sorry, I'm not familiar with plotly.
If you want them all, you can replace ['first','last'] with, for example, df2.qualifier.unique(). But then you need to match the number of axes to the number of qualifiers.
It's the same as .unstack('participant') (note that participant was last in the groupby list). I want participant to be the columns so that df.plot.bar will color them. Finally, for the subplot title: ax.set_title().
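
Putting the last two comments together, one possible way to handle any number of qualifiers is sketched below. This is an illustration, not the answerer's code: the function name plot_by_qualifier is hypothetical, and it sizes the subplot grid from the qualifiers found in the data so that the number of axes always matches.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'race': ['race1'] * 4 + ['race2'] * 4,
    'qualifier': ['last', 'first', 'first', 'first',
                  'last', 'last', 'first', 'first'],
    'participant': ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog'],
})


def plot_by_qualifier(data):
    """Draw one stacked bar chart per qualifier (hypothetical helper)."""
    # Participants become columns, so each one gets its own color
    counts = (data.groupby(['race', 'qualifier', 'participant'])
                  .size()
                  .unstack('participant', fill_value=0))
    qualifiers = counts.index.get_level_values('qualifier').unique()

    # As many axes as qualifiers; squeeze=False always returns a 2-D array
    fig, axes = plt.subplots(1, len(qualifiers),
                             figsize=(6 * len(qualifiers), 6),
                             sharey=True, squeeze=False)
    for ax, q in zip(axes.ravel(), qualifiers):
        # Rows for this qualifier, indexed by race
        counts.xs(q, level='qualifier').plot.bar(ax=ax, stacked=True)
        ax.set_title(q)
    return fig, axes


fig, axes = plot_by_qualifier(df)
# plt.show()  # uncomment to display the figure
```

Because every subplot is drawn from the same set of participant columns, a given participant should get the same color in each panel.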