
I have a big CSV file containing 16M+ lines, counted as shown below:

with open(r'file.csv') as fp:
    count = 0
    for _ in fp:
        count += 1
    print(count)

16817381

However, when I read it using pandas.read_csv, I only see 15M+ rows:

df = pd.read_csv(r'file.csv', low_memory = False, usecols = [0, 13, 4, 5, 6, 7, 8, 11])
df.shape[0]

15234809

The file quality is poor. It has 27 columns in total, but some rows have values in additional columns. I suspect this causes the discrepancy.

For example, I get the following error if I don't pass usecols at all:

Error tokenizing data. C error: Expected 27 fields in line 189, saw 28

I checked similar questions and tried adding arguments like error_bad_lines=False, but nothing worked.
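(Side note: in pandas 1.3+ the error_bad_lines argument was deprecated in favour of on_bad_lines. A minimal sketch with made-up inline data, showing that 'skip' silently drops the malformed row rather than recovering it:)

```python
import io
import pandas as pd

# Sample with a 3-column header and one malformed 4-field row.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# pandas >= 1.3 spelling; older versions used error_bad_lines=False.
df = pd.read_csv(io.StringIO(data), on_bad_lines='skip')
print(df.shape)  # (2, 3) -- the 4-field row is dropped, not kept
```

So skipping bad lines only shrinks the row count further; it does not explain where the rows went.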

Can anyone please advise? Thanks!

  • CSVs can include multiline fields, if the field is encapsulated in quotes. This means that CSVs with encapsulated text fields will have fewer rows than the count of newlines in the file. Check your data for this condition. Commented May 12, 2020 at 16:01
  • If the format is not fixed, try reading the file with read_fwf method and check if that works for you. Commented May 12, 2020 at 16:03
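The first comment's point is easy to verify with a tiny sketch: a quoted field containing an embedded newline spans multiple physical lines but counts as a single CSV record, so a raw line count overstates the number of rows.

```python
import csv
import io

# Two logical records over three physical lines: the second field of
# the data row contains an embedded newline inside quotes.
data = 'name,notes\nalice,"line one\nline two"\n'

newline_count = data.count('\n')                              # physical lines
record_count = sum(1 for _ in csv.reader(io.StringIO(data)))  # logical records
print(newline_count, record_count)  # 3 2
```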

1 Answer


Try something like this:

import pandas as pd
import csv

def read_rows(stream, max_length=None):
    """Yield each CSV record, padded with None to a uniform width."""
    rows = csv.reader(stream)
    if max_length is None:
        # No target width given: materialize all rows to find the widest one.
        rows = list(rows)
        max_length = max(len(row) for row in rows)
    for row in rows:
        yield row + [None] * (max_length - len(row))

with open('yourFile.csv', newline='') as f:  # newline='' is recommended for csv
    df = pd.DataFrame.from_records(list(read_rows(f)))
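To see what the padding does, here is a self-contained run of the same idea on a two-record ragged sample (the function is repeated so the snippet runs on its own):

```python
import csv
import io
import pandas as pd

def read_rows(stream, max_length=None):
    """Yield each CSV record, padded with None to a uniform width."""
    rows = csv.reader(stream)
    if max_length is None:
        rows = list(rows)
        max_length = max(len(row) for row in rows)
    for row in rows:
        yield row + [None] * (max_length - len(row))

# Ragged sample: the second record has one extra field.
data = "1,2,3\n4,5,6,7\n"
df = pd.DataFrame.from_records(list(read_rows(io.StringIO(data))))
print(df.shape)  # (2, 4): every record kept, shorter rows padded with None
```

Because csv.reader also handles quoted embedded newlines correctly, this approach keeps every logical record instead of erroring on the wide ones.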