I have Excel spreadsheets that I would like to concatenate into a single pandas DataFrame, but the tables in them start at irregular positions. The data might begin at, say, C5, D8, or G4, depending on the spreadsheet. In one example, it starts at B5.
I do not know in advance where the table begins in each spreadsheet, and I cannot specify the sheet for each workbook by hand, as there are a few hundred of them. I intend to compile all sheets into one DataFrame and then extract the rows of data that I need. The data is mostly in the same format, but I also need to account for any notes within the spreadsheets.
It would be simpler if the data in every spreadsheet were aligned, because then I could extract the rows I need by index label. Is there a way to shift the data in each spreadsheet so that it begins in the first column?
Here is what I have so far:
import os
import pandas as pd
import glob
import numpy as np

path = r'dir'
allFiles = glob.glob(path + "/*.xlsx")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    # Read the first sheet of each workbook, using its first row as the header
    df = pd.read_excel(file_, index_col=None, header=0)
    list_.append(df)
# Concatenate once, after all workbooks have been read
frame = pd.concat(list_)
print(list_)
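
One direction that might work is to read each sheet without a header and then trim it down to the table. This is only a minimal sketch, not a tested solution for real workbooks. The function name `align_table` and the `min_cols` parameter are hypothetical, and the heuristic is an assumption: the header is taken to be the first row with at least `min_cols` non-empty cells, so single-cell notes above the table are skipped.

```python
import numpy as np
import pandas as pd


def align_table(raw, min_cols=2):
    """Move a table that starts at an arbitrary cell to the top-left corner.

    Assumes `raw` was read with header=None. The first row holding at
    least `min_cols` non-empty cells is treated as the header row
    (a heuristic, not something pandas detects for us).
    """
    # Count the non-empty cells in every row.
    counts = raw.notna().sum(axis=1).to_numpy()
    positions = np.flatnonzero(counts >= min_cols)
    if positions.size == 0:
        # Nothing that looks like a table on this sheet.
        return pd.DataFrame()

    header_pos = positions[0]
    header = raw.iloc[header_pos]
    # Keep only the columns that carry a header label.
    keep = header.notna().to_numpy()

    # Take the rows below the header and drop the fully empty ones.
    body = raw.iloc[header_pos + 1:, keep].dropna(how="all")
    body.columns = header[keep].tolist()
    return body.reset_index(drop=True)


# Small in-memory demo: a note in column A, and a table starting at B3.
raw = pd.DataFrame([
    [None, None, None, None],
    ["note", None, None, None],
    [None, "name", "qty", None],
    [None, "a", 1, None],
    [None, None, None, None],
    [None, "b", 2, None],
])
aligned = align_table(raw)
print(aligned)
```

In the loop above, each workbook could then be read with `pd.read_excel(file_, sheet_name=None, header=None)`, which returns a dict mapping sheet names to DataFrames, and every value passed through `align_table` before being appended to `list_`. Notes below or beside the table would still come through as mostly empty rows or be dropped with their unlabeled columns, so they would need separate handling.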


