I have a list of Excel files that are read into pandas DataFrames. However, in some files (DataFrames) the header is in a different row. Therefore, I would like to ask the user for input to set the header row of each DataFrame.
Let's say my first DataFrame (from an Excel file) looks like this:
0 245 867
1 Reddit Facebook
2 ColumnNeeded ColumnNeeded
3 RedditInsight FacbookInsights
4 RedditText FacbookText
Now I want the user to look at this and enter row 2 (index 1) as the number, so that my output DataFrame looks like this:
Reddit Facebook
0 ColumnNeeded ColumnNeeded
1 RedditInsight FacbookInsights
2 RedditText FacbookText
This way, I can set the header for each DataFrame.
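For reference, the transformation itself (turning one row into the column labels and dropping that row and the rows above it) can be done in memory once the sheet has been loaded with `header=None`. A minimal sketch; `promote_row_to_header` is a hypothetical helper name, and the sample frame just recreates the table above:

```python
import pandas as pd

def promote_row_to_header(df: pd.DataFrame, row: int) -> pd.DataFrame:
    """Use positional row `row` as the column labels and keep only the rows below it."""
    out = df.iloc[row + 1:].copy()        # rows below the chosen header row
    out.columns = df.iloc[row].tolist()   # chosen row's values become the labels
    return out.reset_index(drop=True)     # renumber the remaining rows from 0

# The sample sheet from the question, as it would look when read with header=None
raw = pd.DataFrame([
    [245, 867],
    ["Reddit", "Facebook"],
    ["ColumnNeeded", "ColumnNeeded"],
    ["RedditInsight", "FacbookInsights"],
    ["RedditText", "FacbookText"],
])

result = promote_row_to_header(raw, 1)  # index 1 is the "Reddit / Facebook" row
print(result)
```

Calling `.tolist()` on the header row keeps the row's own index label from leaking into the new columns as a name.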
This is what I have so far:
import glob
import pandas as pd

excel_file_dfs = []
for file in glob.glob(r'path\*.xlsx'):
    df = pd.read_excel(file)
    ## Not sure how to show the DataFrame here so the user can select the row to be the header
    ask_user = input("Which row do you want to make the header? ")
    header_number = ask_user
    df = pd.read_excel(file, header=[header_number])
    excel_file_dfs.append(df)
I am getting this error:
ValueError: Invalid file path or buffer object type:
raised by the line df = pd.read_excel(each_file, header=[ask_user]).
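Two things may be worth checking here, hedged because only part of the error is shown. That particular `ValueError` is normally about the first argument (the file path or buffer), so `each_file` may not hold a path at that point. Separately, `input()` always returns a string, whereas `header` expects an integer row position (or a list of integers), so the answer has to be converted first. A small sketch of that conversion with a range check; `parse_header_row` is a hypothetical name:

```python
def parse_header_row(text: str, n_rows: int) -> int:
    """Convert the user's typed answer into a valid 0-based row position.

    Raises ValueError if the text is not a whole number or is out of range.
    """
    row = int(text.strip())  # input() returns a str; int() raises ValueError for e.g. "two"
    if not 0 <= row < n_rows:
        raise ValueError(f"row must be between 0 and {n_rows - 1}, got {row}")
    return row
```

For example, `parse_header_row(" 1 ", 5)` returns `1`, while `parse_header_row("two", 5)` and `parse_header_row("5", 5)` both raise `ValueError`.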
I also know I am calling pd.read_excel() twice per file, which may cost a lot of extra memory and processing time.
Anyway, I want the user to see each DataFrame and then enter the row number to use as the header. How can I do this in pandas?
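One way to avoid the second `pd.read_excel()` call is to read each file once with `header=None`, print a preview, and then promote the chosen row to the header in memory. A minimal sketch, assuming each workbook's data is on its first sheet (the `read_excel` default) and using hypothetical helper names (`promote_row_to_header`, `ask_for_header_row`, `load_excel_files`). The `ask` parameter defaults to `input` and exists only so the prompt can be swapped out, for example in tests:

```python
import glob
import pandas as pd

def promote_row_to_header(df, row):
    """Use positional row `row` as the column labels and keep only the rows below it."""
    out = df.iloc[row + 1:].copy()
    out.columns = df.iloc[row].tolist()
    return out.reset_index(drop=True)

def ask_for_header_row(df, name, ask=input, preview_rows=10):
    """Print the first rows of `df`, then keep asking until a valid row number is given."""
    print(f"\n--- {name} ---")
    print(df.head(preview_rows).to_string())  # index labels 0, 1, 2, ... are the row numbers
    while True:
        answer = ask("Which row do you want to use as the header? ")
        try:
            row = int(answer.strip())  # input() returns a str, so convert it
        except ValueError:
            print("Please enter a whole number.")
            continue
        if 0 <= row < len(df):
            return row
        print(f"Please enter a number between 0 and {len(df) - 1}.")

def load_excel_files(pattern, ask=input):
    """Read every file matching `pattern` once and let the user pick each header row."""
    dfs = []
    for path in glob.glob(pattern):
        raw = pd.read_excel(path, header=None)  # single read, every row kept as data
        row = ask_for_header_row(raw, path, ask=ask)
        dfs.append(promote_row_to_header(raw, row))
    return dfs

if __name__ == "__main__":
    excel_file_dfs = load_excel_files(r'path\*.xlsx')
```

Because every cell is read as data, a column that mixed header text with numbers will usually come back with `object` dtype; `DataFrame.infer_objects()` can be called on the result afterwards if more specific dtypes are needed.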