I am relatively new to Pandas and Plotly. I will pose my question directly with a MWE of what I want to do:

import pandas
import plotly.express as px

df = pandas.DataFrame(
    {
        'n': [1,1,1,1,2,2,2,3,3,3,4,4],
        'x': [0,0,0,0,1,1,1,2,2,2,3,3],
        'y': [1,2,1,1,2,3,3,3,4,3,4,5],
    }
)

mean_df = df.groupby(by=['n']).agg(['mean','std'])

fig = px.scatter(
    mean_df,
    x = ('x','mean'),
    y = ('y','mean'),
    error_y = ('y','std'),
)
fig.show()

This code is not doing what I want. The mean_df dataframe looks like this:

     x              y          
  mean  std      mean       std
n                              
1    0  0.0  1.250000  0.500000
2    1  0.0  2.666667  0.577350
3    2  0.0  3.333333  0.577350
4    3  0.0  4.500000  0.707107

I want to plot x_mean vs y_mean, with error bars in y, using plotly.express. I am not sure how to do this when there are sub-columns in the data frame...

After some research I have found that mean_df.columns = [' '.join(col).strip() for col in mean_df.columns.values] converts the previous dataframe into

   x mean  x std    y mean     y std
n                                   
1       0    0.0  1.250000  0.500000
2       1    0.0  2.666667  0.577350
3       2    0.0  3.333333  0.577350
4       3    0.0  4.500000  0.707107

so now I can just do

fig = px.scatter(
    mean_df,
    x = 'x mean',
    y = 'y mean',
    error_y = 'y std',
)

to obtain the desired result. However, although this does exactly what I want, it does not feel like the right way to go...

1 Answer

As in your own research, the idea is to flatten the multi-level columns. You can use index slicing for this, which leaves mean_df itself unchanged.

Further updates below. Plotly Express is designed as a simple API. Its typical use case is a dataframe that has been simplified / structured to fit the concept of x, y, color, hover_name, etc. as columns addressed by strings. Where multi-index columns are in use, a series or array can be passed instead. Two variations of this are demonstrated below.

import pandas
import plotly.express as px

df = pandas.DataFrame(
    {
        'n': [1,1,1,1,2,2,2,3,3,3,4,4],
        'x': [0,0,0,0,1,1,1,2,2,2,3,3],
        'y': [1,2,1,1,2,3,3,3,4,3,4,5],
    }
)

mean_df = df.groupby(by=['n']).agg(['mean','std'])

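fig = px.scatter(
    # select only the "mean" sub-columns, then drop that level so the columns are just "x" and "y"
    mean_df.loc[:, pandas.IndexSlice[:, "mean"]].droplevel(1, axis=1),
    x = "x",
    y = "y",
)
fig.show()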

Using multiple level 1 keys (both "mean" and "std"):

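# Variation 1: flattened dataframe for x and y, error bars passed as an array
fig = px.scatter(
    mean_df.loc[:, pandas.IndexSlice[:, "mean"]].droplevel(1, axis=1),
    x = "x",
    y = "y",
    error_y = mean_df.loc[:, ("y", "std")].values,
)
fig.show()

# Variation 2: no dataframe argument, every value passed as a series
fig = px.scatter(
    x = mean_df.loc[:, ("x", "mean")],
    y = mean_df.loc[:, ("y", "mean")],
    error_y = mean_df.loc[:, ("y", "std")],
)
fig.show()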
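
If the multi-level columns are not needed at all, one possible alternative is pandas named aggregation, which produces flat, string-addressable columns straight from the groupby. This is only a sketch of that idea, not part of the approach above; the names flat_df, x_mean, y_mean and y_std are my own choices for illustration.

```python
import pandas

df = pandas.DataFrame(
    {
        'n': [1,1,1,1,2,2,2,3,3,3,4,4],
        'x': [0,0,0,0,1,1,1,2,2,2,3,3],
        'y': [1,2,1,1,2,3,3,3,4,3,4,5],
    }
)

# Named aggregation: each keyword is an output column name (chosen here
# for illustration), each value is a (source column, aggregation) pair.
flat_df = df.groupby('n').agg(
    x_mean=('x', 'mean'),
    y_mean=('y', 'mean'),
    y_std=('y', 'std'),
).reset_index()  # turn the group key 'n' back into an ordinary column

print(flat_df)
```

The resulting frame has single-level columns, so it should be usable with px.scatter(flat_df, x='x_mean', y='y_mean', error_y='y_std') without any slicing or renaming.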
2 Comments

Thanks for your answer. I will up-vote as it does what I posted in the question. However I was looking for a solution that would not drop columns as I want the std ones to plot too. I will update my question to reflect this, sorry.
updated to cover your question amendment. IMHO it's good to stay simple, so if there are columns that can't be referenced by a string, pass a series or array instead
