Prevent pandas read_csv treating first row as header of column names

Question

I'm reading in a pandas DataFrame using pd.read_csv. I want to keep the first row as data, but it keeps getting converted to column names.

I tried header=False but this just deleted it entirely.

(Note on my input data: I have a string (st = '\n'.join(lst)) that I convert to a file-like object (io.StringIO(st)), then build the csv from that file object.)

EdChum · Accepted Answer · 2016-11-23 16:51:19Z

You want header=None. Passing False gets type-promoted to the int 0, so the first row is still used as the header. See the docs:

header : int or list of ints, default ‘infer’ Row number(s) to use as the column names, and the start of the data. Default behavior is as if set to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

You can see the difference in behaviour, first with header=0:

In [95]:
import io
import pandas as pd
t="""a,b,c
0,1,2
3,4,5"""
pd.read_csv(io.StringIO(t), header=0)

Out[95]:
   a  b  c
0  0  1  2
1  3  4  5

Now with None:

In [96]:
pd.read_csv(io.StringIO(t), header=None)

Out[96]:
   0  1  2
0  a  b  c
1  0  1  2
2  3  4  5

Note that in the latest version (0.19.1), passing header=False now raises a TypeError:

In [98]:
pd.read_csv(io.StringIO(t), header=False)

TypeError: Passing a bool to header is invalid. Use header=None for no header or header=int or list-like of ints to specify the row(s) making up the column names
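
Applied to the input described in the question (a list of lines joined with '\n' and wrapped in io.StringIO), this might look like the sketch below. The contents of lst are made up for illustration; only the header=None argument comes from the answer above.

```python
import io
import pandas as pd

# Hypothetical input: each list element is one CSV line, with no header row.
lst = ["1,2,3", "4,5,6", "7,8,9"]
st = "\n".join(lst)

# header=None tells read_csv there is no header row, so line 0 stays as data
# and the columns get default integer labels.
df = pd.read_csv(io.StringIO(st), header=None)

print(df.shape)             # (3, 3): all three lines are kept as rows
print(df.iloc[0].tolist())  # [1, 2, 3]
```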

jezrael · Accepted Answer · 2016-11-23 16:38:01Z

I think you need parameter header=None to read_csv:

Sample:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is no longer available in current pandas

temp=u"""a,b
2,1
1,1"""

df = pd.read_csv(StringIO(temp),header=None)
print(df)
   0  1
0  a  b
1  2  1
2  1  1
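
One consequence of the output above: because the former header row is now data, each column mixes text ('a', 'b') with numbers, so pandas does not parse those columns as numeric. A small sketch of the difference, using the sample above plus a hypothetical all-numeric variant:

```python
from io import StringIO
import pandas as pd

# Same sample as above: the first line is text.
with_text = pd.read_csv(StringIO("a,b\n2,1\n1,1"), header=None)

# Hypothetical variant whose first line is already numeric data.
all_numeric = pd.read_csv(StringIO("2,1\n1,1"), header=None)

# Column 0 holds the strings 'a', '2', '1' in the first frame...
print(pd.api.types.is_numeric_dtype(with_text[0]))    # False
# ...but plain integers in the second.
print(pd.api.types.is_integer_dtype(all_numeric[0]))  # True
```

So if the first line really is data, header=None keeps it and the columns stay numeric; if it is a row of labels, keeping it as data turns every value in those columns into a string.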

1 Comment

Everyone Over a year ago

This just removes the first row altogether, it doesn't treat it as numerical values.

mpriya · Accepted Answer · 2021-05-27 17:40:57Z

If you're using pd.ExcelFile to read the sheets of an Excel file, pass header=None to parse:

xls = pd.ExcelFile("path_to_file.xlsx")
xls.sheet_names                      # List the sheet names in the Excel file

df = xls.parse(2, header=None)       # Parse the sheet at index 2 (the third sheet; indexes are zero-based) with header=None
df

Dharman · Accepted Answer · 2021-12-02 15:48:47Z

You can set custom column names to prevent this.

For example, if your dataset has two columns:

df = pd.read_csv(your_file_path, names=['first column', 'second column'])

If you have more columns, you can also generate the names programmatically and pass the resulting list to the names parameter.
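
A minimal sketch of generating the names programmatically, assuming a plain comma-separated input with no quoted commas (so that counting fields in the first line is safe). The sample data and the col_ prefix are invented for illustration.

```python
import io
import pandas as pd

csv_text = "2,1,7\n1,1,9"  # hypothetical headerless data

# Count the fields in the first line (naive split; assumes no quoted commas).
n_cols = len(csv_text.splitlines()[0].split(","))
names = [f"col_{i}" for i in range(n_cols)]

# Passing names without header makes read_csv treat every line as data.
df = pd.read_csv(io.StringIO(csv_text), names=names)
print(df)

# If the file did have a header row to discard, header=0 together with
# names would replace that row instead of keeping it as data.
replaced = pd.read_csv(io.StringIO("a,b,c\n" + csv_text), header=0, names=names)
print(replaced)
```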

answered Dec 2, 2021 at 15:43 by Muhammad Talha (edited Dec 2, 2021 at 15:48 by Dharman)
