I need to read an Excel file into a dataframe for some analysis. The file has header and footer rows that need to be removed. How can I remove them once I have read the file into a dataframe?
3 Answers
One way is to read the whole file first, then slice out the rows you want and assign the result to a new dataframe.
For the file in the screenshot, you can do this:
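import pandas as pd
temp_data = pd.read_csv("filename.csv")  # for an .xlsx file, use pd.read_excel instead
data = temp_data[12:]  # keep everything from row position 12 onward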
See the pandas indexing documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#slicing-with-labels
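The slice above only trims the top. A minimal sketch of trimming both ends by position with iloc follows; the sample frame and the row counts (n_header, n_footer) are made up for illustration and need to be adjusted to the real file.

```python
import pandas as pd

# Made-up stand-in for a freshly read sheet: 2 junk rows on top, 1 at the bottom.
temp_data = pd.DataFrame({"value": ["title", "note", 10, 20, 30, "total"]})

n_header = 2  # assumed number of rows to cut from the top
n_footer = 1  # assumed number of rows to cut from the bottom

# Using len(...) - n_footer instead of a negative stop keeps this correct
# even when n_footer is 0 (iloc[2:-0] would return an empty frame).
data = temp_data.iloc[n_header:len(temp_data) - n_footer].reset_index(drop=True)
print(data)
```

reset_index(drop=True) renumbers the remaining rows from 0; leave it out if the original row labels matter.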
Comments
The easiest solution is to create a new sheet in Excel with only the data. :)
Option 1: Ignore the header with header offset
import pandas
excel_data_df = pandas.read_excel('File.xlsx', sheet_name='Sheet1', header=18)
Explanation:
If you pass the header value as an integer, say 3, the row at position 3 is treated as the header row. Positions are zero-based, so that is the fourth row of the sheet. Values are read from the next row onwards, and any rows before the header row are discarded.
source: https://www.journaldev.com/33306/pandas-read_excel-reading-excel-file-in-python
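A small sketch of the zero-based header offset, combined with skipfooter to cut trailing rows at read time. To stay self-contained it parses made-up CSV text with read_csv; read_excel also accepts header and skipfooter, so the commented line at the end shows the assumed Excel equivalent (file name and row counts are placeholders).

```python
import io
import pandas as pd

# Made-up file content: two title lines, the real header, data, and a total line.
raw = io.StringIO(
    "Quarterly report\n"
    "Generated by finance\n"
    "id,value\n"
    "1,10\n"
    "2,20\n"
    "3,30\n"
    "Total,60\n"
)

# header=2 -> the line at zero-based position 2 ("id,value") becomes the header.
# skipfooter=1 -> the last line is ignored; read_csv needs engine="python" for it.
df = pd.read_csv(raw, header=2, skipfooter=1, engine="python")
print(df)

# Assumed Excel equivalent:
# df = pd.read_excel("File.xlsx", sheet_name="Sheet1", header=18, skipfooter=n)
```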
Option 2: Remove header with drop
df.drop(df.head(18).index,inplace=True) # drop first 18 rows
Drop Footer
Then simply drop the last rows:
df.drop(df.tail(n).index,inplace=True) # drop last n rows
source: How to delete the last row of data of a pandas dataframe
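For reference, both drops can be done in one step without inplace=True. This is only a sketch on made-up data; n_header and n_footer are assumed counts, and because drop works by label it relies on the row index being unique.

```python
import pandas as pd

# Made-up frame: 2 header rows and 1 footer row around the real data.
df = pd.DataFrame({"value": ["title", "note", 10, 20, 30, "total"]})

n_header = 2  # assumed number of leading rows to drop
n_footer = 1  # assumed number of trailing rows to drop

# Collect the labels of the first and last rows, then drop them together.
rows_to_drop = df.head(n_header).index.union(df.tail(n_footer).index)
trimmed = df.drop(rows_to_drop).reset_index(drop=True)
print(trimmed)
```

Returning a new frame leaves df untouched, which makes it easier to re-run with different counts while you work out where the data starts and ends.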
