How to join Multi-level dataframe on values in single-level dataframe

Question

what I have so far is a normal transactional dataframe with the following columns:

store | item | year | month | day | sales

'year' can be 2015, 2016, 2017.

With that I created a summary dataframe:

store_item_years = df.groupby(
['store','item','year'])['sales'].agg(
[np.sum, np.mean, np.std, np.median, np.min, np.max]).unstack(
fill_value=0)

The last one results in a Multi-Index with 2 levels, like this:

               sum                  mean
        year | 2015 | 2016 | 2017 | 2015 | 2016 | 2017 | ... 
store | item   sum1    ...   ...    mean1  mean2  ...  | ...

Now I'd like to merge the summary table back onto the transactional one:

store | item | year | month | day | sales | + | sum+'by'+year | mean+'by'+year
               2015                              sum1              mean1
               2016                              sum2              mean2
               2017                              ...                ...

I am trying to merge with the following:

df = pd.merge(df, store_item_years, 
             left_on=['store', 'item', 'year'], 
             right_on=['store', 'item', 'year'],
             how='left')

which results in the following error:

KeyError: 'year'

Any ideas? I am just getting my head around groupby. I haven't looked into PivotTable yet.

Please keep in mind that the problem is simplified. The number of store_item combinations is 200+K and other groupbys having 300+ columns. But always the same principle.

Thanks a lot in advance.

jezrael · Accepted Answer · 2017-11-18 19:06:00Z

2

I think you need first remove unstack and then use join for left join:

store_item_years = df.groupby(
['store','item','year'])['sales'].agg(
[np.sum, np.mean, np.std, np.median, np.min, np.max])

df = df.join(store_item_years, on=['store','item','year'])

answered Nov 18, 2017 at 19:06

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

user319436 Over a year ago

Thanks! Yes, I just noticed it as well. I will accept your answer since you beat me to mine :)

user319436 · Accepted Answer · 2017-11-18 19:08:24Z

0

found the culprit. Removed .unstack().

store_item_years = df.groupby(
   ['store','item','year'])['sales'].agg(
   [np.sum, np.mean, np.std, np.median, np.min, np.max])

The following to keep the context:

store_item_years.columns = store_item_years.columns+'_by_year'

And merge like this:

pd.merge(df, store_item_years.reset_index(), 
         left_on=['store', 'item', 'year'], 
         right_on=['store', 'item', 'year'],
         how='left')

answered Nov 18, 2017 at 19:08

user319436

17312 bronze badges

Collectives™ on Stack Overflow

How to join Multi-level dataframe on values in single-level dataframe

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related