How to read a headerless CSV with variable-length records using pandas

Question

I have a CSV file with no header row, and its records vary in length from line to line.

Each record can have up to 398 fields, and I want to keep only the first 256 fields in my dataframe, as those are the only fields I need to process.

Below is a slim version of the file.

1,2,3,4,5,6
12,34,45,65
34,34,24

In the sample above, I would like to keep only 3 fields (analogous to the 256 above) from each line when calling read_csv.

I tried the following:

import pandas as pd
df = pd.read_csv('sample.csv',header=None)

I get the following error because pandas uses the first line to infer the number of columns:

  File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 4, saw 10

The only solution I can think of is passing

names = ['column1','column2','column3','column4','column5','column6']

while creating the data frame.

But the real files can be up to 50 MB, and I don't want to do that because it takes a lot of memory. I am running this on AWS Lambda, where that would incur more cost, and I have to process a large number of files daily.

My question is: can I create the dataframe with only the first 256 fields at the time of reading the CSV? Can that be my step one?

I am very new to pandas, so kindly bear with my ignorance. I looked for a solution for a long time but could not find one.

Try using usecols (read more in the docs). Since CSVs are just text files, pandas still has to read the full file to identify the columns; usecols only controls what is parsed into the dataframe.
– RichieV, Commented Sep 2, 2020 at 20:41
Consider using .to_hdf for quick columnar access with a binary file.
– RichieV, Commented Sep 2, 2020 at 20:43

Danila Ganchar · Accepted Answer · 2020-09-03 18:42:02Z

import pandas as pd

# keep only the first 3 columns of every line
df = pd.read_csv('sample.csv', header=None, usecols=range(3))
print(df)
#     0   1   2
# 0   1   2   3
# 1  12  34  45
# 2  34  34  24

So just change the range value, e.g. usecols=range(256) for the real files.
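
To check that this also holds when a later line is longer than the first one (the situation that triggers the ParserError in the question), here is a minimal, self-contained sketch. It assumes the default C parser, which, as far as I can tell, stops raising the "Expected N fields" error for over-long lines once usecols is given. The in-memory sample data and the names SAMPLE, N_FIELDS and raised_without_usecols are made up for illustration and are not from the question's real files.

```python
import io

import pandas as pd

# Hypothetical sample: the second line has more fields than the first,
# which is what makes a plain read_csv call fail.
SAMPLE = "1,2,3,4\n12,34,45,65,7,8\n34,34,24\n"

# Without usecols, pandas infers the width from the first line and
# rejects the longer second line.
try:
    pd.read_csv(io.StringIO(SAMPLE), header=None)
    raised_without_usecols = False
except pd.errors.ParserError:
    raised_without_usecols = True

# With usecols, only the first N_FIELDS fields of each line are kept.
N_FIELDS = 3  # stands in for the 256 fields of the real files
df = pd.read_csv(io.StringIO(SAMPLE), header=None, usecols=range(N_FIELDS))

print(raised_without_usecols)
print(df)
```

Note that the first line in this sketch has at least N_FIELDS fields; the case where it is shorter than that is not covered here.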
