Python Pandas - Split Excel Spreadsheet By Empty Rows

Question

Given the following input file ("ToSplit2.xlsx"):

+-----------------+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section One     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|           |     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section Two     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|           |     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section Three   |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+

And the following Python code:

import pandas as pd
import numpy as np

spreadsheetPath = "ToSplit2.xlsx"
xls = pd.ExcelFile(spreadsheetPath)

# Iterate through worksheets in opened Excel file
for sheet in xls.sheet_names:
    # Create a Pandas dataframe from the Excel worksheet (with no headers)
    excel_data_df = pd.read_excel(
        spreadsheetPath, sheet_name=sheet, header=None)

    # Return a list of dataframe index values where entire row is blank
    indexList = excel_data_df[excel_data_df.isnull().all(1)].index.tolist()

    # Prints [11, 23]
    print(indexList)

    # Initiate a dictionary
    dataframeDictionary = {}

    # For every index value in the list
    for index in indexList:
        # Split and add the result to the dictionary of Panda's dataframes
        dataframeDictionary = np.array_split(excel_data_df, index)

    # For every pandas dataframe in the dataframe dictionary
    for dataframe in dataframeDictionary:
        # Write the pandas dataframe to Excel with a worksheet name equal to dataframe address 0,0
        dataframe.to_excel("output.xlsx",sheet_name=str(dataframe.iloc[0][0]))

I am trying to split the Excel worksheet into multiple worksheets, one per section, using the blank rows as separators. For example:

Section One: (there would also be Section Two and Section Three worksheets)

+-----------------+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section One     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+

I believe I am really close, but seem to be slipping up on the data frame splitting.

You can use a loop to find the blank rows, closing the current worksheet and starting a new one at each blank row.
– sadbro, Commented Sep 21, 2020 at 19:35
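
A minimal sketch of the loop idea in the comment above, not the commenter's own code. The helper name split_rows_by_blank and the small in-memory frame are made up for illustration; with the real file, the frame would come from pd.read_excel("ToSplit2.xlsx", header=None).

```python
import pandas as pd


def split_rows_by_blank(df):
    """Walk the rows in order and close a section at every all-blank row."""
    chunks = []    # finished sections
    current = []   # positional row numbers of the section being built
    blank = df.isnull().all(axis=1).tolist()
    for pos, is_blank in enumerate(blank):
        if is_blank:
            # A blank row ends the current section, if it has any rows
            if current:
                chunks.append(df.iloc[current])
                current = []
        else:
            current.append(pos)
    # The last section has no blank row after it, so flush it here
    if current:
        chunks.append(df.iloc[current])
    return chunks


# Hypothetical stand-in for the worksheet: two sections separated by a blank row
demo = pd.DataFrame([
    ["Section One", None],
    ["Label 1", 100],
    [None, None],
    ["Section Two", None],
    ["Label 1", 100],
])
sections = split_rows_by_blank(demo)
print([s.iloc[0, 0] for s in sections])
```

Each returned piece starts with its section title row and contains no blank separator rows, so it can be written to its own worksheet as is.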

Abhay · Accepted Answer · 2020-09-22 18:43:16Z

3

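Adjust the file names to match your own. The attempt in the question fails because np.array_split(df, index) treats an integer second argument as the number of roughly equal pieces to produce, and each pass through the loop overwrites the previous result. np.split with a list of row indices cuts at exactly those positions instead.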

import pandas as pd
import numpy as np

# Read the Excel file without treating the first row as a header
df = pd.read_excel('ToSplit2.xlsx', header=None)

# Split at the index of every completely blank row
df_list = np.split(df, df[df.isnull().all(axis=1)].index)

# Write each piece to its own worksheet in a new workbook;
# the with-block saves and closes the file on exit
with pd.ExcelWriter('Excel_one.xlsx', engine='xlsxwriter') as writer:
    for i, part in enumerate(df_list, start=1):
        # Every piece after the first begins with its blank separator row; drop it
        part = part.dropna(how='all')
        part.to_excel(writer, sheet_name='Sheet{}'.format(i), header=False, index=False)
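
The question also wanted each worksheet named after the first cell of its section. Below is a hedged sketch of one way to do that with pandas alone, which also sidesteps np.split (calling it on a DataFrame may emit deprecation warnings on newer pandas releases). The function names sections_by_title and write_sections and the demo frame are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def sections_by_title(df):
    """Map each section's first cell to its rows, using blank rows as separators."""
    blank = df.isnull().all(axis=1)
    # cumsum() goes up by one at every blank row, so rows of the same
    # section share one group id
    group_id = blank.cumsum()
    result = {}
    for _, part in df[~blank].groupby(group_id[~blank]):
        # Excel limits worksheet names to 31 characters
        title = str(part.iloc[0, 0])[:31]
        result[title] = part.reset_index(drop=True)
    return result


def write_sections(sections, path):
    """Write every section to its own worksheet (needs an Excel engine such as openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        for title, part in sections.items():
            part.to_excel(writer, sheet_name=title, header=False, index=False)


# Hypothetical stand-in for the worksheet: two sections separated by a blank row
demo = pd.DataFrame([
    ["Section One", None],
    ["Label 1", 100],
    [None, None],
    ["Section Two", None],
    ["Label 1", 100],
])
sections = sections_by_title(demo)
print(list(sections))
```

With the real file, this would be called as write_sections(sections_by_title(pd.read_excel("ToSplit2.xlsx", header=None)), "output.xlsx"). Two sections with the same title would overwrite each other in the dictionary, so the sketch assumes titles are unique.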

edited Sep 22, 2020 at 18:43

answered Sep 22, 2020 at 11:55

Abhay