Pivot table with Multi Index Dataframe

Question

I am struggling with how to pivot the dataframe with multi-indexed columns. First i import the data from an .xlsx file and from then i've tried to generate a certain Dataframe.

Note: I'm not allowed to embed images so that's the reason of the links

import pandas as pd
import numpy as np

# Read Excel file
df = pd.read_excel("myFile.xlsx", header=[0])

Output: Click

If you want, here you can see the File: Link to File

# Get Month and Year of the Dataframe
month_year = df.iloc[:, 5:-1].columns
month_list = []
year_list = []

for x in range(len(month_year)-1):
    if("Unnamed" not in month_year[x]):
        month_list.append(month_year[x].split()[0])
        year_list.append(month_year[x].split()[1])   


# Read Excel file with headers 1 & 2
df = pd.read_excel(path, header=[0,1])

Output: Click

# Join both indexes excluding the ones with Unnamed
df.columns = [str(x[0] + " " + x[1]) if("Unnamed" not in x[1]) else str(x[0]) for x in df.columns ]

Output: Click

# Adding month and list columns to the DataFrame
df['Month'] = month_list
df['Year'] = year_list

Output: Click

I want the output DataFrame to be like the following: Desire Output

FYI, instead of images of the resulting dataframes, use df.to_clipboard(excel=False) and then paste that as code-formatted text. — aneroid
– aneroid, Commented Apr 5, 2021 at 21:53

sammywemmy · Accepted Answer · 2021-04-05 22:31:21Z

You should clean it up a bit, because I do not know how the Total column should be handled.

The code below reads the excel file as a MultiIndex, a bit of column names modification, before stacking and extracting the Year and Month columns.

df = pd.read_excel("Downloads/myFile.xlsx",  header=[0,1], index_col=[0, 1, 2])
df.index.names = ['Project', 'KPI', 'Metric']
df.columns = df.columns.delete(-1).union([('Total', 'Total')])
df.columns.names = ['Month_Year', 'Values']
(df
 .stack(level = 0)
 .rename_axis(columns=None)
 .reset_index()
 .assign(Year = lambda df: df.Month_Year.str.split(" ").str[-1],
         Month = lambda df: df.Month_Year.str.split(" ").str[0])
 .drop(columns='Month_Year')
)

      Project      KPI            Metric    Real   Target  Total   Year     Month
0   Project 1  Numeric  Project 1 Metric    10.0     30.0    NaN   2019     April
1   Project 1  Numeric  Project 1 Metric   651.0  51651.0    NaN   2019  February
2   Project 1  Numeric  Project 1 Metric   200.0    215.0    NaN   2019   January
3   Project 1  Numeric  Project 1 Metric     2.0      5.0    NaN   2019     March
4   Project 1  Numeric  Project 1 Metric     NaN      NaN    9.0  Total     Total
5   Project 2  General  Project 2 Metric    20.0     10.0    NaN   2019     April
6   Project 2  General  Project 2 Metric   500.0    100.0    NaN   2019  February
7   Project 2  General  Project 2 Metric   749.0     12.0    NaN   2019   January
8   Project 2  General  Project 2 Metric     1.0      7.0    NaN   2019     March
9   Project 2  General  Project 2 Metric     NaN      NaN    7.0  Total     Total
10  Project 3  Numeric  Project 3 Metric    30.0     20.0    NaN   2019     April
11  Project 3  Numeric  Project 3 Metric   200.0     55.0    NaN   2019  February
12  Project 3  Numeric  Project 3 Metric  5583.0     36.0    NaN   2019   January
13  Project 3  Numeric  Project 3 Metric     3.0      7.0    NaN   2019     March
14  Project 3  Numeric  Project 3 Metric     NaN      NaN    4.0  Total     Total

Thanks for your help, it worked. Now as you said, i will handle the Total Column.

Collectives™ on Stack Overflow

Pivot table with Multi Index Dataframe

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related