I am trying to create a boxplot where the x-axis is defined by two columns of a dataframe and the y-axis shows the values of a third column.

Consider this example dataframe:

  Lvl1  Lvl2  value
0    A     1      1
1    A     2      2
2    A     1      3
3    B     2      4
4    B     1      5
5    B     2      6

Now I want boxplots for the groups defined by Lvl1 and Lvl2. For example, for the group (Lvl1 = A, Lvl2 = 1) the boxplot would be computed from the values {1, 3}.

I know I can create a new column like Lvl0, which would be something like Lvl1 + Lvl2, but is there a way to create the boxplot without such an operation?
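
For reference, the workaround I mean looks roughly like this. It is only a minimal sketch: Lvl0 is a made-up helper name, and the astype(str) call is an assumption needed because Lvl2 is numeric in my example.

```python
import pandas as pd

dataset = pd.DataFrame({
    'Lvl1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Lvl2': [1, 2, 1, 2, 1, 2],
    'value': [1, 2, 3, 4, 5, 6],
})

# Combine both levels into one label, e.g. 'A' and 1 -> 'A1'.
# 'Lvl0' is a hypothetical helper column used only for plotting.
with_key = dataset.assign(Lvl0=dataset['Lvl1'] + dataset['Lvl2'].astype(str))

# One box per distinct Lvl0 label.
with_key.boxplot(column='value', by='Lvl0')
```

Outside a notebook, a call to plt.show() from matplotlib.pyplot displays the figure.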

With the following code:

import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.DataFrame(
    {'Lvl1': ['A', 'A', 'A', 'B', 'B', 'B'], 'Lvl2': [1, 2, 1, 2, 1, 2], 'value': [1, 2, 3, 4, 5, 6]})
grouped = dataset.groupby(['Lvl1', 'Lvl2'])
grouped.boxplot()
plt.show()

I get an error:
KeyError: "None of [Index(['A', 1], dtype='object')] are in the [index]"

Thank you in advance!

2 Answers

Try seaborn for an easier solution. I think this was answered in "Grouping boxplots in seaborn when input is a DataFrame".

With your data:

import seaborn as sns
import pandas as pd

data = pd.DataFrame({'lvl1': ['A', 'A', 'A', 'B', 'B', 'B'], 
                     'lvl2': [1, 2, 1, 2, 1, 2], 
                     'value': [1, 2, 3, 4, 5, 6]})

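# Keep both level columns as identifiers so that only 'value' is melted into 'result'.
df_long = pd.melt(data, id_vars=["lvl1", "lvl2"], value_name="result")

# One box per (lvl1, lvl2) pair: lvl1 on the x-axis, lvl2 as hue.
sns.boxplot(x="lvl1", hue="lvl2", y="result", data=df_long)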

We get:

[image: resulting boxplot]

If you need more levels, try combining plots with sns.FacetGrid (https://seaborn.pydata.org/generated/seaborn.FacetGrid.html). Here I propose using sns.catplot, which builds such a grid for you:

data = pd.DataFrame({'lvl1': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'B'],
                     'group': ['1', '2', '1', '2', '1', '2', '2', '1'],
                     'has_something': [True, False, False, True, True, False, True, False],
                     'before': [3, 4, 5, 5, 3, 4, 2, 6],
                     'after': [1, 2, 3, 4, 5, 6, 2, 3],
                     'baseline': [1, 0, 0, 1, 1, 0, 0, 1]})

df = pd.melt(data, ["lvl1", 'group', 'has_something'], value_name="result")

sns.catplot(data=df, x='lvl1', y='result',
            col='group', kind='box', hue='variable', col_wrap=2, margin_titles=True)

And the result:

[image: resulting catplot]

To include the 'has_something' variable in the plot, you can use FacetGrid, or split the data by 'has_something' and make two plots from the filtered subsets.
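
As a rough sketch of the FacetGrid route, sns.catplot can also put 'has_something' on the rows of the grid. This is only one possible layout, not the only way to do it; note that col_wrap is dropped here because seaborn does not allow it together with row.

```python
import pandas as pd
import seaborn as sns

data = pd.DataFrame({'lvl1': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'B'],
                     'group': ['1', '2', '1', '2', '1', '2', '2', '1'],
                     'has_something': [True, False, False, True, True, False, True, False],
                     'before': [3, 4, 5, 5, 3, 4, 2, 6],
                     'after': [1, 2, 3, 4, 5, 6, 2, 3],
                     'baseline': [1, 0, 0, 1, 1, 0, 0, 1]})

# Long format: 'before', 'after' and 'baseline' end up in 'variable' / 'result'.
df = pd.melt(data, id_vars=['lvl1', 'group', 'has_something'], value_name='result')

# Columns of the grid follow 'group', rows follow 'has_something'.
g = sns.catplot(data=df, x='lvl1', y='result', hue='variable',
                col='group', row='has_something', kind='box')
```

With this small dataset every facet holds only a few points, so the boxes are mostly illustrative.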

2 Comments

Thank you. How could I extend it if there were more than 2 levels? I don't see exactly how melt could help us here.
I added a part about more levels to the answer. Hope it helps.

You can do it with seaborn. The following code works for me on your data:

import pandas as pd
import seaborn as sns

dataset = pd.DataFrame(
    {
        'Lvl1': ['A', 'A', 'A', 'B', 'B', 'B'],
        'Lvl2': [1, 2, 1, 2, 1, 2],
        'value': [1, 2, 3, 4, 5, 6]
    }
)
ax = sns.boxplot(x='Lvl1', y='value', hue='Lvl2', data=dataset)

Expected output: [image not preserved]
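
If you would rather stay within pandas, a possible alternative (not part of the seaborn approach above, so treat it as a sketch) is DataFrame.boxplot with a list passed to by. It draws one box per (Lvl1, Lvl2) pair without a helper column. The groupby line is only there to show which values land in each box.

```python
import pandas as pd

dataset = pd.DataFrame({
    'Lvl1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Lvl2': [1, 2, 1, 2, 1, 2],
    'value': [1, 2, 3, 4, 5, 6],
})

# `by` accepts a list of columns; each (Lvl1, Lvl2) combination gets its own box.
dataset.boxplot(column='value', by=['Lvl1', 'Lvl2'])

# The values that feed each box, keyed by (Lvl1, Lvl2).
groups = dataset.groupby(['Lvl1', 'Lvl2'])['value'].apply(list)
print(groups)
```

The same by list can take a third column, which gives one box per combination of all three levels.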

1 Comment

Thank you. Is there an extension for the case where we take more than 2 columns into account? Let's say 3 levels or more?
