Here is my dataset:

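import numpy as np
import pandas as pd

# 13 observations for vendor A (month 1 appears twice) and 12 for vendor B
ob1=np.linspace(1, 10, 13).round(2).tolist()
ob2=np.linspace(10, 1, 12).round(2).tolist()
ob=ob1+ob2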

ex_dic={'Vendor':['A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],
       'Month':[1,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12],
       'Observation':ob
       }
ex_df=pd.DataFrame.from_dict(ex_dic)

It looks like this:

[screenshot of the dataframe]

Here is code for my Plotly visualization:

import plotly.graph_objects as go

ex_month_list=ex_df.Month.unique().tolist()
ex_vendor_list=ex_df.Vendor.unique().tolist()

fig=go.Figure()

# one trace per vendor, using every raw observation
for i in ex_vendor_list:
    by_vendor_df=ex_df.loc[ex_df['Vendor']==i]
    fig.add_trace(go.Scatter(x=by_vendor_df.Month, y=by_vendor_df.Observation, name=str(i),
                             mode='lines+markers', marker_line_width=2, marker_size=8))

fig.show()

It shows something like this: a scatter plot where the Y-axis shows the observations (1-10) and the X-axis shows the months (1-12).

Here is where the problem is: vendor A has two observations for month 1, so the plot shows two points for that month instead of one.

I have tried applying median() in various places but cannot get my plot to show the median observation for each month. For example, here is the logic I have come up with so far:

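import statistics as stat

# vendor_list and month_list hold the unique vendors and months of some_df
for i in vendor_list:
    vendor_df=some_df.loc[some_df['Vendor']==i]
    for m in month_list:
        # all observations for this vendor in this month
        month_df=vendor_df.loc[vendor_df['Month']==m]
        by_month_observations=month_df['Observation'].to_list()
        median_val=stat.median(by_month_observations)
        print(median_val)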

The code above does return the median values, BUT because some months go from 2 observations to 1, I cannot append the result back to the dataframe: the lengths no longer match. So I am not sure this is the best way to go.

Please let me know, based on the code above, what a smart way to go about this would be, so that each data point shown is the median value for each month by vendor. Help is really appreciated!
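One way around the length mismatch described above is a rough sketch using pandas' groupby(...).transform('median'), which returns one value per original row, so the result can be written back as a new column. The small frame and the column name MedianObservation below are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical stand-in data: some vendor/month pairs have several observations
some_df = pd.DataFrame({
    'Vendor': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Month': [6, 7, 8, 8, 6, 7, 8, 8, 8],
    'Observation': [1, 2, 3, 4, 10, 8, 6, 3, 1],
})

# transform('median') broadcasts each group's median back to every row of
# that group, so the result has the same length as some_df
some_df['MedianObservation'] = (
    some_df.groupby(['Vendor', 'Month'])['Observation'].transform('median')
)

print(some_df)
```

This keeps the duplicate rows, so it suits cases where the median has to sit next to the raw observations; for plotting one point per month, an aggregated frame is probably the simpler choice.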

  • I think I'm on the right path right now with doing this: for i in vendor_list: vendor_df=some_df.loc[some_df['Vendor']==i] a=vendor_df.groupby(vendor_df.Month)[['Observation']].median() It seems like this will give the unique dataframes that I can plot from... Commented Nov 12, 2019 at 18:47
  • I think your data sample is missing a definition for ob1 Commented Nov 12, 2019 at 18:52
  • @vestland I forgot to include it in my sample code, will update now... I wish the problem was that easy Commented Nov 12, 2019 at 19:00
  • Never meant to say that it was =) Commented Nov 12, 2019 at 19:13
  • @vestland all good! I just figured the answer myself :) Commented Nov 12, 2019 at 19:17

1 Answer

Well, I figured out how to do it myself: a simple use of .groupby() did the job!

Here is the df I used while trying to solve my problem:

some_dic={'Vendor':['A','A','A','A','B','B','B','B','B'],
       'Month':[6,7,8,8,6,7,8,8,8],
       'Observation':[1,2,3,4,10,8,6,3,1]
         }
some_df=pd.DataFrame.from_dict(some_dic)

Here is the code that successfully generated the plot with median values:

...
grouped_df=vendor_df.groupby(vendor_df.Month)[['Observation']].median()
grouped_df.reset_index(inplace=True)
...
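Since the snippet above is elided, here is a minimal self-contained sketch of the same idea, under the assumption that grouping by both Vendor and Month at once is acceptable instead of looping over vendors first. The names median_df and vendor_median_df are illustrative only.

```python
import pandas as pd

some_df = pd.DataFrame({
    'Vendor': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Month': [6, 7, 8, 8, 6, 7, 8, 8, 8],
    'Observation': [1, 2, 3, 4, 10, 8, 6, 3, 1],
})

# One row per (Vendor, Month) pair holding the median observation;
# as_index=False keeps Vendor and Month as ordinary columns
median_df = some_df.groupby(['Vendor', 'Month'], as_index=False)['Observation'].median()

# Each vendor's slice now has exactly one value per month
for vendor, vendor_median_df in median_df.groupby('Vendor'):
    print(vendor,
          vendor_median_df['Month'].tolist(),
          vendor_median_df['Observation'].tolist())
```

Each vendor_median_df could then be passed to go.Scatter in place of by_vendor_df from the question, giving one marker per month.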