16

I have data that is over 400,000 lines long. When running this code:

f=pd.read_csv(filename,error_bad_lines=False)

I get the following error:

pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 454751

My data by the end of the file looks like this:

BTC 9948    8718    1.57E+12    ASK
BTC 52      8718    1.57E+12    ASK
BTC 120     8718    1.57E+12    ASK
BTC 200     8718    1.57E+12    ASK
BTC 150     8718    1.57E+12    ASK
BTC 50      8718    1.57E+12    ASK
BTC 10      8718    1.57E+12    ASK
BTC 57      8718    1.57E+12    ASK
BTC 50      8718    1.57E+12    ASK
BTC 50191   8718    

Line 454751 is this one: BTC 50 8718 1.57E+12 ASK
I tried passing error_bad_lines=False, as shown above, but that still doesn't work. I also searched my file for quote characters, but it does not contain any.

0

3 Answers

28

Changing the Parser engine from C to Python should solve your problem. Use the following line to read your csv:

f = pd.read_csv(filename, error_bad_lines=False, engine="python")

From the read_csv documentation:

engine : {‘c’, ‘python’}, optional

Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.


3 Comments

This works, thanks! It processes the data, but it prints this message after each run: Skipping line 454752: unexpected end of data
Have you found the reason or the solution for this? @AspiringCoder
Generally speaking, the handy thing about [temporarily] swapping to the Python parser is that it may give a better idea of the underlying error. Once that is fixed, it may be possible to swap back to the C parser.
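Following up on the last comment, one way to look for the underlying problem is to scan the file directly for lines that do not match the expected shape. The sketch below is only an illustration: the helper name find_irregular_lines is hypothetical, and it assumes whitespace-separated rows with five fields, as in the sample from the question. It flags lines with an unexpected field count or an odd number of double quotes, either of which could plausibly confuse the parser.

```python
import io


def find_irregular_lines(lines, expected_fields):
    """Return (line_number, text) for every non-blank line that has an
    unexpected number of whitespace-separated fields or an odd number
    of double-quote characters. Line numbers are 1-based."""
    irregular = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        wrong_width = len(line.split()) != expected_fields
        unbalanced_quote = line.count('"') % 2 == 1
        if wrong_width or unbalanced_quote:
            irregular.append((number, line.rstrip("\n")))
    return irregular


# Small in-memory stand-in for the tail of the file in the question.
sample = io.StringIO(
    "BTC 50 8718 1.57E+12 ASK\n"
    "BTC 10 8718 1.57E+12 ASK\n"
    "BTC 50191 8718\n"
)
print(find_irregular_lines(sample, expected_fields=5))
# → [(3, 'BTC 50191 8718')]
```

For a real file, pass an open file object instead of the StringIO, for example `with open(filename) as fh: find_irregular_lines(fh, 5)`.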
0

Try reading the file in chunks, as shown below.

import pandas as pd

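# chunksize is the number of rows per chunk; 100_000 is only an example value
for chunk in pd.read_csv(filename, chunksize=100_000):
    do_processing(chunk)    # placeholder for your own per-chunk logic
    train_algorithm(chunk)  # placeholder for your own per-chunk logic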
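As a self-contained illustration of the chunked approach, here is a minimal sketch that reads a small in-memory CSV two rows at a time and aggregates across chunks. The column names (symbol, qty, price) and the data are made up for the example. Note that chunking mainly limits memory use; it is not expected to repair a malformed line by itself.

```python
import io

import pandas as pd

# Made-up sample data standing in for a large file on disk.
csv_text = (
    "symbol,qty,price\n"
    "BTC,52,8718\n"
    "BTC,120,8718\n"
    "BTC,200,8718\n"
    "BTC,150,8718\n"
    "BTC,50,8718\n"
)

total_rows = 0
total_qty = 0
# With chunksize set, read_csv returns an iterator of DataFrames
# instead of a single DataFrame.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_rows += len(chunk)
    total_qty += int(chunk["qty"].sum())

print(total_rows, total_qty)
# → 5 572
```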


0

In newer versions of pandas (1.3 and later), the argument error_bad_lines=False is replaced by on_bad_lines='skip'. So try the code below:

f = pd.read_csv(filename, on_bad_lines='skip', engine='python')

It solved my error.
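To see what on_bad_lines='skip' does in practice, here is a small sketch on made-up in-memory data, assuming pandas 1.3 or later. The second data row has too many fields, so it is dropped and the remaining rows are kept.

```python
import io

import pandas as pd

# Made-up data: the second data row has five fields instead of three.
csv_text = (
    "symbol,qty,price\n"
    "BTC,52,8718\n"
    "BTC,120,8718,EXTRA,FIELDS\n"
    "BTC,200,8718\n"
)

df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip", engine="python")
print(len(df))
# → 2
print(df["qty"].tolist())
# → [52, 200]
```

Skipped rows are silently lost, so it may be worth comparing the row count against the file's line count before relying on the result.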

