Pandas combines empty rows in Excel file to a single row in dataframe

Question

I have several Excel files that I am processing with pandas. I need to remove a certain number of rows from the top of each file. These extra rows may be empty or may contain text. pandas appears to merge some of the empty rows, so I cannot tell how many rows to remove.

Here is an example Excel file (represented as CSV):

,,
,,
some text,,
,,
,,
,,
name, date, task
Jason,1-Jan,swim 
Aem,2-Jan,workout

Here is my current Python script:

import pandas as pd 
xl = pd.ExcelFile('extra_rows.xlsx') 
dfs = xl.parse(xl.sheet_names[0]) 
print ("dfs: ", dfs)

Here are the results when I print the dataframe:

dfs:          Unnamed: 0           Unnamed: 1 Unnamed: 2
0  some other text                  NaN        NaN
1              NaN                  NaN        NaN
2              NaN                  NaN        NaN
3              NaN                  NaN        NaN
4             name                 date       task
5            Jason  2016-01-01 00:00:00       swim
6              Aem  2016-01-02 00:00:00    workout

From the file, I would remove the first 6 rows. However, from the dataframe I would only remove 4. Is there a way to read in the Excel file with the data in its raw state so the number of rows remains consistent?

rojeeer · Accepted Answer · 2016-10-27 01:40:37Z

I used Python 3 and pandas 0.18.1. The function for loading Excel files is pandas.read_excel. Try passing header=None to keep the leading blank rows. Here is some sample code:

(1) With the default parameters, the leading blank rows are dropped:

In [12]: pd.read_excel('test.xlsx')
Out[12]: 
  Unnamed: 0 Unnamed: 1 Unnamed: 2
0      text1        NaN        NaN
1        NaN        NaN        NaN
2         n1         t2         c3
3        NaN        NaN        NaN
4        NaN        NaN        NaN
5        jim        sum        tim

(2) With header=None, the leading blank rows are kept:

In [13]: pd.read_excel('test.xlsx', header=None)
Out[13]: 
       0    1    2
0    NaN  NaN  NaN
1    NaN  NaN  NaN
2  text1  NaN  NaN
3    NaN  NaN  NaN
4     n1   t2   c3
5    NaN  NaN  NaN
6    NaN  NaN  NaN
7    jim  sum  tim
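
Once the frame is read with header=None, its row positions match the row numbers in the file, so the count taken from the file can be applied directly. The following is a minimal sketch of that step, not part of the original answer. The helper name drop_leading_rows and its n_skip parameter are hypothetical, and the raw frame is built in memory as a stand-in for the result of pd.read_excel('extra_rows.xlsx', header=None), so the cell values here are plain strings.

```python
import pandas as pd


def drop_leading_rows(raw: pd.DataFrame, n_skip: int) -> pd.DataFrame:
    """Drop the first n_skip rows of a frame read with header=None,
    then use the next row as the column labels."""
    header = raw.iloc[n_skip]
    # Everything after the header row is data; renumber it from 0.
    body = raw.iloc[n_skip + 1:].reset_index(drop=True)
    body.columns = [str(value).strip() for value in header]
    return body


# Assumption: stand-in for pd.read_excel('extra_rows.xlsx', header=None).
raw = pd.DataFrame([
    [None, None, None],
    [None, None, None],
    ["some text", None, None],
    [None, None, None],
    [None, None, None],
    [None, None, None],
    ["name", "date", "task"],
    ["Jason", "1-Jan", "swim"],
    ["Aem", "2-Jan", "workout"],
])

# Six rows precede the header in the file, so six rows are skipped here.
df = drop_leading_rows(raw, 6)
print(df)
```

Because the blank rows are preserved, the same n_skip value works regardless of how many of the leading rows are empty.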

mkhanoyan · Accepted Answer · 2016-10-27 01:35:40Z

Here is what you are looking for:

import pandas as pd 
xl = pd.ExcelFile('extra_rows.xlsx') 
dfs = xl.parse(skiprows=6) 
print ("dfs: ", dfs)

Check the docs on ExcelFile for more details.

miriamsimone · Accepted Answer · 2016-10-27 01:37:07Z

If you read your file in with pd.read_excel and pass header=None, the blank rows should be included:

In [286]: df = pd.read_excel("test.xlsx", header=None)

In [287]: df
Out[287]:
           0     1      2
0        NaN   NaN    NaN
1        NaN   NaN    NaN
2  something   NaN    NaN
3        NaN   NaN    NaN
4       name  date  other
5          1     2      3
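
If the number of leading rows varies between files, the raw frame can also be searched for the header row instead of hard-coding a count. This is only a sketch under assumptions: the helper find_header_row is hypothetical, it assumes the expected column names are known in advance, and the raw frame below is built in memory to mirror the output above rather than read from test.xlsx.

```python
from typing import Sequence

import pandas as pd


def find_header_row(raw: pd.DataFrame, expected: Sequence[str]) -> int:
    """Return the position of the first row whose cells equal `expected`,
    ignoring case and surrounding whitespace."""
    wanted = [name.strip().lower() for name in expected]
    for pos in range(len(raw)):
        cells = [str(value).strip().lower() for value in raw.iloc[pos]]
        if cells == wanted:
            return pos
    raise ValueError("no row matches the expected header")


# Assumption: stand-in for pd.read_excel("test.xlsx", header=None).
raw = pd.DataFrame([
    [None, None, None],
    [None, None, None],
    ["something", None, None],
    [None, None, None],
    ["name", "date", "other"],
    [1, 2, 3],
])

header_pos = find_header_row(raw, ["name", "date", "other"])
print(header_pos)
```

The returned position is the number of rows that precede the header, so it can be used as the row count to drop.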
