Python/Plotly: How to make each data point on Scatter plot represent median value?

Question

Here is my dataset:

import numpy as np
import pandas as pd

ob1=np.linspace(1, 10, 13).round(2).tolist()  # 13 observations for vendor A
ob2=np.linspace(10, 1, 12).round(2).tolist()  # 12 observations for vendor B
ob=ob1+ob2

# vendor A has two observations in month 1
ex_dic={'Vendor':['A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],
        'Month':[1,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12],
        'Observation':ob
        }
ex_df=pd.DataFrame.from_dict(ex_dic)

It looks like this: [DataFrame preview image not preserved]

Here is code for my Plotly visualization:

import plotly.graph_objects as go

ex_month_list=ex_df.Month.unique().tolist()
ex_vendor_list=ex_df.Vendor.unique().tolist()

fig=go.Figure()

# one lines+markers trace per vendor
for i in ex_vendor_list:
    by_vendor_df=ex_df.loc[ex_df['Vendor']==i]
    fig.add_trace(go.Scatter(x=by_vendor_df.Month, y=by_vendor_df.Observation, name=str(i),
                             mode='lines+markers', marker_line_width=2, marker_size=8))

fig.show()

It will show something like this (plot image not preserved): the y-axis shows the observations (1 to 10) and the x-axis shows the months (1 to 12).

Here is where the problem is:

I have tried applying median() in various places but cannot get my plot to show the median observation for each month. For example, here is the logic I have come up with so far:

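import statistics as stat

# some_df is the author's test DataFrame, defined in the answer below
vendor_list=some_df.Vendor.unique().tolist()
month_list=some_df.Month.unique().tolist()

for i in vendor_list:
    vendor_df=some_df.loc[some_df['Vendor']==i]
    for m in month_list:
        month_df=vendor_df.loc[vendor_df['Month']==m]
        by_month_observations=month_df['Observation'].to_list()
        if by_month_observations:  # stat.median() raises on an empty list
            median_val=stat.median(by_month_observations)
            print(median_val)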

The code above does return the median values, but months that had two observations now collapse to a single value, so the result no longer has the same length as the original DataFrame and I cannot append it back as a column. I'm not sure this is the best way to go.
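One way around the length mismatch (an alternative, not what the asker ended up using) is pandas' groupby(...).transform('median'), which returns one value per original row, so the result lines up with the DataFrame and can be assigned as a new column. A minimal sketch, reusing the values of some_df from the answer below; the column name Median is hypothetical:

```python
import pandas as pd

# Small test frame with repeated months (same values as some_df in the answer)
df = pd.DataFrame({
    'Vendor': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Month': [6, 7, 8, 8, 6, 7, 8, 8, 8],
    'Observation': [1, 2, 3, 4, 10, 8, 6, 3, 1],
})

# transform('median') broadcasts each (Vendor, Month) group's median back onto
# every row of that group, so the result has the same length as df and can be
# assigned directly ('Median' is a hypothetical column name).
df['Median'] = df.groupby(['Vendor', 'Month'])['Observation'].transform('median')
print(df)
```

For the chart itself this keeps the duplicate rows, so identical markers would be drawn on top of each other; an aggregated one-row-per-month frame is usually the better input for the plot.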

Looking at the code above, what is a smarter way to do this, so that each plotted data point is the median observation for that month and vendor? Help is really appreciated!

I think I'm on the right path now with this:

for i in vendor_list:
    vendor_df=some_df.loc[some_df['Vendor']==i]
    a=vendor_df.groupby(vendor_df.Month)[['Observation']].median()

It seems like this will give me per-vendor DataFrames that I can plot from... – DGomonov, commented Nov 12, 2019 at 18:47
@vestland I forgot to include it in my sample code, will update now... I wish the problem were that easy. – DGomonov, commented Nov 12, 2019 at 19:00

DGomonov · Accepted Answer · 2019-11-12 19:15:56Z


Well, I figured out a way to do it myself: a simple .groupby() did the job!

Here is the DataFrame I used while trying to solve my problem:

import pandas as pd

some_dic={'Vendor':['A','A','A','A','B','B','B','B','B'],
          'Month':[6,7,8,8,6,7,8,8,8],
          'Observation':[1,2,3,4,10,8,6,3,1]
          }
some_df=pd.DataFrame.from_dict(some_dic)
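Grouping on both Vendor and Month at once gives one median per vendor and month without the nested loops. A small sketch on the same some_dic values (the name medians is illustrative); the numbers can be checked by hand: vendor A's two month-8 observations (3, 4) give 3.5, and vendor B's three month-8 observations (6, 3, 1) give 3.

```python
import pandas as pd

some_dic = {'Vendor': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
            'Month': [6, 7, 8, 8, 6, 7, 8, 8, 8],
            'Observation': [1, 2, 3, 4, 10, 8, 6, 3, 1]}
some_df = pd.DataFrame.from_dict(some_dic)

# One row per (Vendor, Month) pair; reset_index() turns the group keys
# back into ordinary columns so the frame can be filtered and plotted.
medians = (some_df.groupby(['Vendor', 'Month'])['Observation']
           .median()
           .reset_index())
print(medians)
```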

Here is the code that successfully generated the plot with median values:

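...
# median Observation per Month for this vendor (one row per month)
grouped_df=vendor_df.groupby(vendor_df.Month)[['Observation']].median()
# move Month from the index back into a regular column for plotting
grouped_df.reset_index(inplace=True)
...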

