I have the following bar plot, which shows the distribution of the letters in my dataset (x) as a percentage (y). Inside each bar I want to show a breakdown: for example, of all the L's analyzed, 10% are 'male', 60% 'female', 10% 'neutral', 10% 'other' and 10% 'missing', and likewise for every other letter, as in the second attached plot. In other words, a stacked bar plot inside the bar plot, perhaps with the female, male, etc. percentages written as labels inside the bars, since they are on a different scale (for each letter they sum to 100%). How can I do that in Python? Thanks a lot!

The reproducible code:

data = {'L': 0.10128343899798979,
 'A': 0.04587392402453482,
 'G': 0.05204199096266515,
 'V': 0.08343212549181313,
 'E': 0.07848392694534645,
 'S': 0.03242100922632854,
 'I': 0.05353675927357696,
 'K': 0.07614727763173719,
 'R': 0.0878305241997835,
 'D': 0.05932683882274109,
 'T': 0.06166348813635036,
 'P': 0.033915777537240344,
 'N': 0.04120062539731629,
 'Q': 0.03858907616445887,
 'F': 0.033073896534542895,
 'Y': 0.04503204302183736,
 'M': 0.018126213425424805,
 'H': 0.04008384447537069,
 'C': 0.0014947683109118087,
 'W': 0.016442451420029897}

import matplotlib.pyplot as plt
plt.bar(range(len(data)), list(data.values()), align='center')
plt.xticks(range(len(data)), list(data.keys()))

#stacked bar plot data subset
index,female,male,neutral,other,missing
L,0.40816326530612246,0.30612244897959184,0.02040816326530612,0.0,0.2653061224489796
A,0.34615384615384615,0.34615384615384615,0.0,0.0,0.3076923076923077
G,0.2962962962962963,0.1111111111111111,0.037037037037037035,0.0,0.5555555555555556
V,0.20833333333333334,0.5625,0.020833333333333332,0.0,0.20833333333333334
E,0.5,0.225,0.025,0.0,0.25

2 Answers

You can multiply the detail dataframe (the per-letter shares) by the totals, row by row, so that each letter's segments add up to that letter's overall share. From there, create a stacked bar plot.

from matplotlib import pyplot as plt
import pandas as pd
from io import StringIO

df_details_str = '''index,female,male,neutral,other,missing
L,0.40816326530612246,0.30612244897959184,0.02040816326530612,0.0,0.2653061224489796
A,0.34615384615384615,0.34615384615384615,0.0,0.0,0.3076923076923077
G,0.2962962962962963,0.1111111111111111,0.037037037037037035,0.0,0.5555555555555556
V,0.20833333333333334,0.5625,0.020833333333333332,0.0,0.20833333333333334
E,0.5,0.225,0.025,0.0,0.25'''

data = {'L': 0.10128343899798979, 'A': 0.04587392402453482, 'G': 0.05204199096266515, 'V': 0.08343212549181313, 'E': 0.07848392694534645, 'S': 0.03242100922632854, 'I': 0.05353675927357696, 'K': 0.07614727763173719, 'R': 0.0878305241997835, 'D': 0.05932683882274109, 'T': 0.06166348813635036, 'P': 0.033915777537240344, 'N': 0.04120062539731629, 'Q': 0.03858907616445887, 'F': 0.033073896534542895, 'Y': 0.04503204302183736, 'M': 0.018126213425424805, 'H': 0.04008384447537069, 'C': 0.0014947683109118087, 'W': 0.016442451420029897}
df_total = pd.DataFrame(pd.Series(data, name='total'))
df_details = pd.read_csv(StringIO(df_details_str), index_col=0)

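# Scale each letter's shares by its overall frequency (aligned on the index),
# then keep only the letters that actually have a breakdown.
df_multiplied = df_details.multiply(df_total['total'], axis='index').loc[df_details.index]
# One stacked bar per letter; the bar's total height is the letter's overall share.
df_multiplied.plot.bar(stacked=True, rot=0, cmap='Set3')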
plt.xlabel('')
plt.tight_layout()
plt.show()

stacked bar plot from multiplied dataframes
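
The question also asks for the within-letter percentages to be written inside the bars. A minimal sketch of one way to do that, assuming matplotlib 3.4 or newer (for `Axes.bar_label`) and using only the L and E rows from the question; the names `totals`, `details` and `heights` are illustrative, not part of the answer above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Overall share and per-letter breakdown for two letters, copied from the question.
totals = pd.Series({"L": 0.10128343899798979, "E": 0.07848392694534645}, name="total")
details = pd.DataFrame(
    {
        "female": [0.40816326530612246, 0.5],
        "male": [0.30612244897959184, 0.225],
        "neutral": [0.02040816326530612, 0.025],
        "other": [0.0, 0.0],
        "missing": [0.2653061224489796, 0.25],
    },
    index=["L", "E"],
)

# Segment heights live on the overall scale: within-letter share * letter total.
heights = details.multiply(totals, axis="index")
ax = heights.plot.bar(stacked=True, rot=0, cmap="Set3")

# pandas draws one bar container per column, in column order, so the containers
# can be paired with the unscaled shares to build the label text.
for column, container in zip(heights.columns, ax.containers):
    labels = [f"{share:.0%}" if share > 0 else "" for share in details[column]]
    ax.bar_label(container, labels=labels, label_type="center", fontsize=8)

plt.tight_layout()
```

The bar heights stay on the overall scale while the labels show the 0 to 100% within-letter scale. Very thin segments may be too small to hold a readable label, so a minimum-height threshold for labelling could be a sensible refinement.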


This is the same idea as JohanC's answer: use pandas to prepare the data for visualisation.

import pandas as pd
import seaborn as sns
import seaborn.objects as so

sns.set_theme()

First, combine the letter distribution with the per-letter gender distribution.

df_letter_ratio = pd.DataFrame(pd.Series(data, name="ratio"))

# Letters that have no gender info are assigned 100% "missing".
# If your data is complete, you can load it straight into `df_per_letter_gender_ratio` instead.
df_per_letter_gender_ratio_missing = pd.read_csv("/path/to/per-letter-gender-ratio.csv", index_col=0)
df_per_letter_gender_ratio = pd.concat([
    df_per_letter_gender_ratio_missing,
    pd.DataFrame({"missing": 1},
        index=df_letter_ratio.index.difference(df_per_letter_gender_ratio_missing.index)
    )
]).fillna(0)

df_gender_ratio = (
    df_per_letter_gender_ratio
        .mul(df_letter_ratio.ratio, axis="index")
        .reindex(df_letter_ratio.index)   # To keep the original index order
)
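
The fill-and-scale step above can be tried without a CSV file. The following is a small in-memory sketch, assuming three letters from the question where only L and A have a posted breakdown; the variable names (`letter_ratio`, `known`, `filled`, `scaled`) are illustrative and not from the answer:

```python
import pandas as pd

# Overall share of three letters (values from the question).
letter_ratio = pd.Series(
    {"L": 0.10128343899798979, "A": 0.04587392402453482, "S": 0.03242100922632854},
    name="ratio",
)

# A gender breakdown is available for L and A only; S has none in the posted subset.
known = pd.DataFrame(
    {
        "female": [0.40816326530612246, 0.34615384615384615],
        "male": [0.30612244897959184, 0.34615384615384615],
        "neutral": [0.02040816326530612, 0.0],
        "other": [0.0, 0.0],
        "missing": [0.2653061224489796, 0.3076923076923077],
    },
    index=["L", "A"],
)

# Letters without a breakdown get a row with missing=1 and 0 everywhere else.
absent = letter_ratio.index.difference(known.index)
filled = pd.concat([known, pd.DataFrame({"missing": 1.0}, index=absent)]).fillna(0)

# Scale each row by the letter's overall share and restore the original letter order.
scaled = filled.mul(letter_ratio, axis="index").reindex(letter_ratio.index)

# Sanity check: each letter's segments should add up to its overall share again.
max_error = (scaled.sum(axis=1) - letter_ratio).abs().max()
print(max_error)
```

If the printed error is not close to zero, the per-letter shares do not sum to 1 for some letter, which is worth checking before plotting.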

Since the question is also tagged seaborn, I use seaborn to render the figure. With seaborn's objects interface, available since v0.12, we can do something like this.

Here, I convert the data frame to long form with pandas.DataFrame.melt.

(
    so.Plot(
        df_gender_ratio
            .rename_axis("letter")
            .melt(
                var_name="gender", value_name="ratio",
                ignore_index=False
            ),
        x="letter", y="ratio", color="gender"
    )
    .add(so.Bar(), so.Stack())
)
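
To see what the long-form conversion produces, here is a tiny sketch with made-up numbers (the values and the names `wide` and `long_form` are purely illustrative, not taken from the question's data):

```python
import pandas as pd

# Wide form: one row per letter, one column per gender (illustrative values).
wide = pd.DataFrame(
    {"female": [0.04, 0.02], "male": [0.03, 0.02]},
    index=pd.Index(["L", "A"], name="letter"),
)

# Long form: one row per (letter, gender) pair. ignore_index=False keeps the
# letter index, and reset_index turns it into an ordinary column for inspection.
long_form = wide.melt(
    var_name="gender", value_name="ratio", ignore_index=False
).reset_index()
print(long_form)
```

Each bar segment in the stacked plot corresponds to one row of this long-form table, which is why the plot can map `x`, `y` and `color` directly to the `letter`, `ratio` and `gender` names.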
