2

I am struggling with how to pivot the dataframe with multi-indexed columns. First i import the data from an .xlsx file and from then i've tried to generate a certain Dataframe.

Note: I'm not allowed to embed images so that's the reason of the links

import pandas as pd
import numpy as np

# Read Excel file
df = pd.read_excel("myFile.xlsx", header=[0])

Output: Click

If you want, here you can see the File: Link to File

# Get Month and Year of the Dataframe
month_year = df.iloc[:, 5:-1].columns
month_list = []
year_list = []

for x in range(len(month_year)-1):
    if("Unnamed" not in month_year[x]):
        month_list.append(month_year[x].split()[0])
        year_list.append(month_year[x].split()[1])   


# Read Excel file with headers 1 & 2
df = pd.read_excel(path, header=[0,1])

Output: Click

# Join both indexes excluding the ones with Unnamed
df.columns = [str(x[0] + " " + x[1]) if("Unnamed" not in x[1]) else str(x[0]) for x in df.columns ]

Output: Click

# Adding month and list columns to the DataFrame
df['Month'] = month_list
df['Year'] = year_list

Output: Click

I want the output DataFrame to be like the following: Desire Output

2
  • 1
    FYI, instead of images of the resulting dataframes, use df.to_clipboard(excel=False) and then paste that as code-formatted text. Commented Apr 5, 2021 at 21:53
  • Thanks, i'll have it in mind for the future. Commented Apr 5, 2021 at 22:46

1 Answer 1

1

You should clean it up a bit, because I do not know how the Total column should be handled.

The code below reads the excel file as a MultiIndex, a bit of column names modification, before stacking and extracting the Year and Month columns.

df = pd.read_excel("Downloads/myFile.xlsx",  header=[0,1], index_col=[0, 1, 2])
df.index.names = ['Project', 'KPI', 'Metric']
df.columns = df.columns.delete(-1).union([('Total', 'Total')])
df.columns.names = ['Month_Year', 'Values']
(df
 .stack(level = 0)
 .rename_axis(columns=None)
 .reset_index()
 .assign(Year = lambda df: df.Month_Year.str.split(" ").str[-1],
         Month = lambda df: df.Month_Year.str.split(" ").str[0])
 .drop(columns='Month_Year')
)

      Project      KPI            Metric    Real   Target  Total   Year     Month
0   Project 1  Numeric  Project 1 Metric    10.0     30.0    NaN   2019     April
1   Project 1  Numeric  Project 1 Metric   651.0  51651.0    NaN   2019  February
2   Project 1  Numeric  Project 1 Metric   200.0    215.0    NaN   2019   January
3   Project 1  Numeric  Project 1 Metric     2.0      5.0    NaN   2019     March
4   Project 1  Numeric  Project 1 Metric     NaN      NaN    9.0  Total     Total
5   Project 2  General  Project 2 Metric    20.0     10.0    NaN   2019     April
6   Project 2  General  Project 2 Metric   500.0    100.0    NaN   2019  February
7   Project 2  General  Project 2 Metric   749.0     12.0    NaN   2019   January
8   Project 2  General  Project 2 Metric     1.0      7.0    NaN   2019     March
9   Project 2  General  Project 2 Metric     NaN      NaN    7.0  Total     Total
10  Project 3  Numeric  Project 3 Metric    30.0     20.0    NaN   2019     April
11  Project 3  Numeric  Project 3 Metric   200.0     55.0    NaN   2019  February
12  Project 3  Numeric  Project 3 Metric  5583.0     36.0    NaN   2019   January
13  Project 3  Numeric  Project 3 Metric     3.0      7.0    NaN   2019     March
14  Project 3  Numeric  Project 3 Metric     NaN      NaN    4.0  Total     Total
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for your help, it worked. Now as you said, i will handle the Total Column.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.