Plotly.Express + Pandas multi-index column

Question

I am relatively new to Pandas and Plotly. Here is an MWE of what I want to do:

import pandas
import plotly.express as px

df = pandas.DataFrame(
    {
        'n': [1,1,1,1,2,2,2,3,3,3,4,4],
        'x': [0,0,0,0,1,1,1,2,2,2,3,3],
        'y': [1,2,1,1,2,3,3,3,4,3,4,5],
    }
)

mean_df = df.groupby(by=['n']).agg(['mean','std'])

fig = px.scatter(
    mean_df,
    x = ('x','mean'),
    y = ('y','mean'),
    error_y = ('y','std'),
)
fig.show()

This code is not doing what I want. The mean_df dataframe looks like this:

     x              y          
  mean  std      mean       std
n                              
1    0  0.0  1.250000  0.500000
2    1  0.0  2.666667  0.577350
3    2  0.0  3.333333  0.577350
4    3  0.0  4.500000  0.707107

I want to plot x_mean vs y_mean, with error bars in y, using plotly.express. I am not sure how to do this when there are sub-columns in the data frame...

After some research I have found that mean_df.columns = [' '.join(col).strip() for col in mean_df.columns.values] converts the previous dataframe into

   x mean  x std    y mean     y std
n                                   
1       0    0.0  1.250000  0.500000
2       1    0.0  2.666667  0.577350
3       2    0.0  3.333333  0.577350
4       3    0.0  4.500000  0.707107

so now I can just do

fig = px.scatter(
    mean_df,
    x = 'x mean',
    y = 'y mean',
    error_y = 'y std',
)

to obtain the desired result. However, although this does exactly what I want, it does not feel like the right way to go...

Rob Raymond · Accepted Answer · 2021-10-26 18:59:31Z

4

Similar to your research, flatten the multi-level columns. You can use index slicing, which leaves mean_df unchanged.

Further updates below. Plotly Express is designed as a simple API. The typical use case is to simplify / structure a dataframe so that it fits the concept of x, y, color, hover_name, etc. as columns addressed by strings. Where multi-index columns are in use, a series or array can be passed instead. Two variations of this are demonstrated below.

import pandas
import plotly.express as px

df = pandas.DataFrame(
    {
        'n': [1,1,1,1,2,2,2,3,3,3,4,4],
        'x': [0,0,0,0,1,1,1,2,2,2,3,3],
        'y': [1,2,1,1,2,3,3,3,4,3,4,5],
    }
)

mean_df = df.groupby(by=['n']).agg(['mean','std'])

fig = px.scatter(
    # pandas is imported as "pandas" above, so the "pd" alias is not defined
    mean_df.loc[:, pandas.IndexSlice[:, "mean"]].droplevel(1, axis=1),
    x = "x",
    y = "y",
)
fig.show()
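
To make the slicing step above easier to follow, here is a small sketch that builds the same intermediate frame on its own and prints the column labels at each stage. The names only_means and flat_means are mine, chosen for illustration; nothing here needs Plotly.

```python
import pandas

df = pandas.DataFrame(
    {
        "n": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
        "x": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3],
        "y": [1, 2, 1, 1, 2, 3, 3, 3, 4, 3, 4, 5],
    }
)
mean_df = df.groupby(by=["n"]).agg(["mean", "std"])

# Keep every level-0 column whose level-1 label is "mean".
# (only_means / flat_means are illustrative names, not from the answer.)
only_means = mean_df.loc[:, pandas.IndexSlice[:, "mean"]]

# Drop level 1 so the remaining column labels are plain strings.
flat_means = only_means.droplevel(1, axis=1)

print(list(only_means.columns))  # [('x', 'mean'), ('y', 'mean')]
print(list(flat_means.columns))  # ['x', 'y']
print(mean_df.columns.nlevels)   # 2, mean_df itself is untouched
```

Because .loc and droplevel both return new objects, mean_df keeps its two-level columns and the std values are still available from it.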

using multiple level 1 keys

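# Variation 1: flattened "mean" columns plus the std values as an array
fig = px.scatter(
    mean_df.loc[:, pandas.IndexSlice[:, "mean"]].droplevel(1, axis=1),
    x="x",
    y="y",
    error_y=mean_df.loc[:, ("y", "std")].values,
)
fig.show()

# Variation 2: no dataframe argument, pass each series directly
fig = px.scatter(
    x=mean_df.loc[:, ("x", "mean")],
    y=mean_df.loc[:, ("y", "mean")],
    error_y=mean_df.loc[:, ("y", "std")],
)
fig.show()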
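
If the std columns should also stay addressable by name, one option is the join-based flattening from the question, applied to a copy so that mean_df is not modified. This is only a sketch: flat_df and make_figure are hypothetical names of mine, and the Plotly import sits inside the helper so the pandas part runs on its own.

```python
import pandas

df = pandas.DataFrame(
    {
        "n": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
        "x": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3],
        "y": [1, 2, 1, 1, 2, 3, 3, 3, 4, 3, 4, 5],
    }
)
mean_df = df.groupby(by=["n"]).agg(["mean", "std"])

# set_axis returns a new frame, so mean_df keeps its two-level columns.
# Each column label is a tuple such as ("y", "std"); join it into "y std".
flat_df = mean_df.set_axis(
    [" ".join(col) for col in mean_df.columns], axis=1
)


def make_figure(frame):
    """Hypothetical helper: mean x vs mean y with std error bars in y."""
    import plotly.express as px  # imported here so the rest runs without Plotly

    return px.scatter(frame, x="x mean", y="y mean", error_y="y std")


print(list(flat_df.columns))  # ['x mean', 'x std', 'y mean', 'y std']
```

Calling make_figure(flat_df).show() should then give the same plot as in the question, with every column referenced by a string.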

edited Oct 26, 2021 at 18:59

answered Oct 26, 2021 at 17:50


2 Comments

user171780 Over a year ago

Thanks for your answer. I will up-vote as it does what I posted in the question. However I was looking for a solution that would not drop columns as I want the std ones to plot too. I will update my question to reflect this, sorry.

Rob Raymond Over a year ago

updated to cover your question amendment. IMHO it's good to stay simple, so if there are columns that can't be referenced by a string, pass a series or array instead
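
In the same "stay simple" spirit, an alternative not shown in the answer is to avoid the multi-level columns altogether with pandas named aggregation, where each keyword becomes a plain string column. A minimal sketch, assuming only mean x, mean y and std y are needed; the column names x_mean, y_mean and y_std are my own choice.

```python
import pandas

df = pandas.DataFrame(
    {
        "n": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
        "x": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3],
        "y": [1, 2, 1, 1, 2, 3, 3, 3, 4, 3, 4, 5],
    }
)

# Named aggregation: keyword = (source column, aggregation function).
# The keywords become ordinary single-level column labels.
summary = df.groupby("n").agg(
    x_mean=("x", "mean"),
    y_mean=("y", "mean"),
    y_std=("y", "std"),
)

print(list(summary.columns))  # ['x_mean', 'y_mean', 'y_std']
```

The result can then be passed to px.scatter with x="x_mean", y="y_mean" and error_y="y_std", with no flattening step.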
