
I'm trying to read a dataset using pd.read_csv() and am getting an error, even though Excel can open the file just fine.

reviews = pd.read_csv('br.csv') raises ParserError: Error tokenizing data. C error: EOF inside string starting at line 312074

reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8') raises ParserError: unexpected end of data

What can I do to fix this?

Edit: This is the dataset - https://www.kaggle.com/gnanesh/goodreads-book-reviews

  • Can you share the data? I'm guessing that, if you were to open it in a text editor, you'd see unbalanced quotation marks. Commented Aug 30, 2018 at 21:38
  • Or maybe just share line 312074 of that file. Commented Aug 30, 2018 at 21:39
  • This is the data: kaggle.com/gnanesh/goodreads-book-reviews Commented Aug 30, 2018 at 21:41
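To test the unbalanced-quote theory from the comments before changing any read_csv options, a small scanner can report where a quoted field opens and is never closed. This is a minimal sketch, assuming comma-separated, RFC 4180-style quoting (a literal quote inside a quoted field is written as two quotes, and a quote only opens a field when it is the field's first character). The function name find_unclosed_quote is hypothetical, the UTF-8 default is an assumption about the file, and the number it reports is a physical line in the file, which may not match the line/row count in the pandas message exactly.

```python
def find_unclosed_quote(path, sep=",", quotechar='"', encoding="utf-8"):
    """Hypothetical helper: return the 1-based physical line on which a quoted
    field opens without ever being closed, or None if every quote balances."""
    in_quotes = False
    at_field_start = True  # the very first character starts a field
    start_line = None
    # newline="" keeps "\r\n" endings intact so line breaks are seen as characters
    with open(path, encoding=encoding, newline="") as f:
        for lineno, line in enumerate(f, start=1):
            i = 0
            while i < len(line):
                ch = line[i]
                if in_quotes:
                    if ch == quotechar:
                        if i + 1 < len(line) and line[i + 1] == quotechar:
                            i += 1  # doubled quote: escaped literal, stay inside the field
                        else:
                            in_quotes = False  # closing quote
                elif ch == quotechar and at_field_start:
                    in_quotes = True  # a quote at the start of a field opens it
                    start_line = lineno
                # outside quotes, a delimiter or line break means a new field starts next
                at_field_start = not in_quotes and ch in (sep, "\n", "\r")
                i += 1
    return start_line if in_quotes else None
```

Calling print(find_unclosed_quote('br.csv')) should point at the line to inspect in a text editor; a quote that appears mid-field (such as 5" screen) is ignored, mirroring how the pandas C parser treats it as an ordinary character.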

3 Answers


For me, adding this fixed it:

error_bad_lines=False

It skips the malformed line (in my case, the last line of the file) instead of raising an error. So instead of

reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8')

use:

reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8', error_bad_lines=False)
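error_bad_lines was removed in pandas 2.0, so on current versions the same idea is spelled on_bad_lines='skip'. The minimal sketch below, on a made-up in-memory sample (the data string is hypothetical), contrasts the default behaviour with skipping. Its bad row has too many fields, which is the kind of bad line the option is documented to drop; it is not a demonstration that the option cures the EOF-inside-string error from the question.

```python
import io

import pandas as pd

# Hypothetical sample: the second data row ("3,4,5") has one field too many.
data = "a,b\n1,2\n3,4,5\n6,7\n"

try:
    pd.read_csv(io.StringIO(data))  # default: on_bad_lines="error"
except pd.errors.ParserError as exc:
    print("default behaviour raised:", exc)

# pandas >= 1.3 spelling of error_bad_lines=False
df = pd.read_csv(io.StringIO(data), on_bad_lines="skip")
print(df)
```

The skipped row simply disappears from the result, so this approach only fits when losing the malformed records is acceptable.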


1 Comment

error_bad_lines is deprecated (since pandas 1.3, and removed in 2.0), so use on_bad_lines instead: on_bad_lines='warn' skips bad lines and reports them, on_bad_lines='skip' skips them silently, and on_bad_lines='error' (the default) raises an exception.
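When you need to know how many rows are being thrown away, pandas 1.4 and later also accept a callable for on_bad_lines, provided engine='python'. The callable receives the offending row as a list of strings, and returning None drops it. A minimal sketch on a made-up in-memory sample; record_bad_line and bad_lines are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical sample: the second data row has one field too many.
data = "a,b\n1,2\n3,4,5\n6,7\n"
bad_lines = []  # collects every row pandas rejects


def record_bad_line(fields):
    # Called by the python engine with the offending row split into strings;
    # returning None tells pandas to drop the row.
    bad_lines.append(fields)
    return None


df = pd.read_csv(io.StringIO(data), engine="python", on_bad_lines=record_bad_line)
print(len(df), "rows kept;", len(bad_lines), "skipped:", bad_lines)
```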

In my case, I didn't want to skip lines, because my task requires counting the number of records in the CSV file. The solution that worked for me was csv.QUOTE_NONE from the csv module. I picked this up from a website I no longer remember, but it works.

To describe my case: previously I got the error EOF .... Then I tried the parameter engine='python', but that introduced another problem in the next step where I used the DataFrame. Then I tried quoting=csv.QUOTE_NONE, and now it's OK. I hope this helps.

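import csv
import pandas as pd
# QUOTE_NONE treats quote characters as ordinary data, so an unbalanced quote cannot swallow the rest of the file
read_file = pd.read_csv(full_path, delimiter='~', encoding='utf-16-be', header=0, quoting=csv.QUOTE_NONE)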
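A minimal sketch of why QUOTE_NONE keeps every record, using a made-up in-memory sample with one stray opening quote (not the asker's file). With the default quoting, the quote opens a field that runs to the end of the input, which is the EOF-inside-string error from the question; with csv.QUOTE_NONE the quote is kept as an ordinary character and both rows survive.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: row 1 has an opening quote that is never closed.
data = 'a,b\n1,"x\n2,y\n'

try:
    pd.read_csv(io.StringIO(data))
except pd.errors.ParserError as exc:
    # the C parser reads to the end of input looking for the closing quote
    print("default quoting raised:", exc)

df = pd.read_csv(io.StringIO(data), quoting=csv.QUOTE_NONE)
print(df)  # the stray quote survives as a literal character in column b
```

The trade-off: with QUOTE_NONE, a properly quoted field that contains the delimiter (for example "Smith, John" in a comma-separated file) is split into two fields, and quote characters stay in the data, so this suits files that do not rely on quoting.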


I used the following code and my issue was solved:

df = pd.read_csv(<filename>, engine="python", encoding='utf-8', on_bad_lines='skip')
