0

Python newbie here

In the dataset below:


import pandas as pd
import numpy as np

data = {'Gender':['M','M','M','M','F','F','F','F','M','M','M','M','F','F','F','F'],
        'Location':['NE','NE','NE','NE','SW','SW','SW','SW','SE','SE','SE','SE','NC','NC','NC','NC'],
        'Type':['L','L','L','L','L','L','L','L','R','R','R','R','R','R','R','R'],
         'PDP':['<10','<10','<10','<10',10,10,10,10,20,20,20,20,'>20','>20','>20','>20'],
         'PDP_code':[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
     'diff':[-1,-1,-1,-1,0,0,0,0,1,1,1,1,3,3,3,3],
     'series':[1,2,4,8,1,2,4,8,1,2,4,8,1,2,4,8],
    'Revenue_YR1':[1150.78,1162.34,1188.53,1197.69,2108.07,2117.76,2129.48,1319.51,1416.87,1812.54,1819.57,1991.97,2219.28,2414.73,2169.91,2149.19],
     'Revenue_YR2':[250.78,262.34,288.53,297.69,308.07,317.7,329.81,339.15,346.87,382.54,369.59,399.97,329.28,347.73,369.91,349.12],
     'Revenue_YR3':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
     'Revenue_YR4':[270.84,282.14,298.53,306.69,318.73,327.47,369.63,389.59,398.75,432.18,449.78,473.55,494.85,509.39,515.52,539.23],
      'Revenue_YR5':[251.78,221.34,282.53,272.69,310.07,317.7,329.81,333.15,334.87,332.54,336.59,339.97,329.28,334.73,336.91,334.12],
     'Revenue_YR6':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
     'Revenue_YR7':[27.84,28.14,29.53,30.69,18.73,27.47,36.63,38.59,38.75,24.18,24.78,21.55,13.85,9.39,15.52,39.23],
      'Revenue_YR8':[279.84,289.14,299.53,309.69,318.73,327.47,336.63,398.59,398.75,324.18,324.78,321.55,333.85,339.39,315.52,319.23],
}

df = pd.DataFrame(data,columns = ['Gender','Location','Type','PDP','PDP_code','diff','series',
                                'Revenue_YR1','Revenue_YR2','Revenue_YR3','Revenue_YR4','Revenue_YR5','Revenue_YR6',
                                'Revenue_YR7','Revenue_YR8'])
df.head(5)


I want a pythonic way of doing the following :

  1. subset df into 4 dataframes / lists based on unique Location resulting in NE,SW,SE & NC dataframes

  2. aggregating all the Revenue_YR columns while GroupBy series and PDP_code columns and export all the aggregated dataframes (NE,SW,SE & NC) as multiple sheets of one xlsx file

My attempt

### this code returns output of 1 df instead of 4 dfs, I need help aggregating each of the 4 dataframes and export them to 4 sheets of 12312021_output.xlsx

for i, part_df in df.groupby('Location'): 
    part_df.groupby(['series','PDP_code'])[['Revenue_YR1', 'Revenue_YR2','Revenue_YR3', 
        'Revenue_YR4', 'Revenue_YR5', 'Revenue_YR6', 'Revenue_YR7']].mean().unstack().style.background_gradient(cmap='Blues').to_excel('12312021_output.xlsx')
    

Please share your code.

2
  • You can define pd.ExcelWriter above the for i, part_df line and pass it as an argument to to_excel along with sheet_name parameter. Commented Jan 1, 2022 at 5:44
  • @hilberts_drinking_problem. For better understanding, please can you share your code. Thanks Commented Jan 1, 2022 at 5:47

1 Answer 1

1

You can use pandas.ExcelWriter, and your loop (which I improved slightly for readability):

import pandas as pd

with pd.ExcelWriter("output.xlsx") as writer:
    cols = df.filter(like='Revenue_YR').columns
    
    for g, d in df.groupby('Location'):
        (d.groupby(['series','PDP_code'])[cols].mean().unstack()
          .style.background_gradient(cmap='Blues')
         ).to_excel(writer, sheet_name=g)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.