ParserError: Error tokenizing data C error

Question

I am trying to run this code, which removes unnecessary columns from a dataframe for later processing. It loops through the first few files and then gives the error below. It was running fine before. I saw a suggestion that the cause might be a corrupted file, so I deleted all the previous files and regenerated them step by step, but I still get the error. Sorry if this is long-winded; I need to show each step for my thesis, and I am still very much a novice programmer. Can anyone help with fixing this issue?

The code is:

import pandas as pd
import os

path = ('./Sketch_grammar/weighted/')
files = os.listdir(path)
for file in files:
    df = pd.read_csv(path+file)
    df = df.drop('Hits', axis=1)
    df = df.drop('Score', axis=1)
    df = df.drop('Score.1', axis=1)
    print(df)
    filename = os.path.splitext(file)
    (f, ext) = filename
    print(f)
    df.to_csv(path+'weighted_out/'+f+'_out.csv', index=False)

The error message is as follows:

Traceback (most recent call last):
  File "/home/sandra/git/trees/trees/remove_columns.py", line 9, in <module>
    df = pd.read_csv(path+file)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 737, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'


Have you tried doing what the error suggests? df = pd.read_csv(path+file, engine='python')
– It_is_Chris, Commented Oct 10, 2018 at 12:56
@Chris you'd be shocked at how many people don't read error messages, even when they say exactly how to fix their particular problem.
– Matt Messersmith, Commented Oct 10, 2018 at 13:17
Thank you, I will try that. I had read the error message; I am sorry, I am a novice programmer and I don't always understand what is being asked of me. I was also confused by the fact that it works for the first files and then fails, although all the files have been produced by the same means. Thank you.
– Sandra Young, Commented Oct 10, 2018 at 13:30
This solution didn't work for me. The one below, however, did.
– Sandra Young, Commented Oct 10, 2018 at 13:42

Vishnudev Krishnadas · Accepted Answer · 2018-10-10 13:14:12Z

2

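This error is usually raised when the source passed to pandas cannot be read as a file, for example because it is corrupted or is not a regular file. Here, os.listdir(path) most likely also returns the weighted_out subdirectory that the script writes into, and read_csv cannot read a directory. That would also explain why the first files are processed before the failure. Restricting the loop to .csv files, as below, should work: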

import pandas as pd
import os

path = ('./Sketch_grammar/weighted/')
files = os.listdir(path)
for file in files:
    if file.endswith('.csv'):
        df = pd.read_csv(path+file)
        df = df.drop('Hits', axis=1)
        df = df.drop('Score', axis=1)
        df = df.drop('Score.1', axis=1)
        filename = os.path.splitext(file)
        (f, ext) = filename
        df.to_csv(path+'weighted_out/'+f+'_out.csv', index=False)
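
The same filtering can also be sketched with pathlib, which avoids string concatenation for paths and makes the "regular files only" check explicit. This is a minimal sketch, not the answerer's code: the function name strip_columns, the decision to create the output directory if it is missing, and the use of errors="ignore" for files lacking one of the columns are all assumptions made for illustration. The column names are the ones from the question.

```python
from pathlib import Path

import pandas as pd

# Column names taken from the question; everything else here is illustrative.
COLUMNS_TO_DROP = ["Hits", "Score", "Score.1"]


def strip_columns(in_dir, out_dir, columns=COLUMNS_TO_DROP):
    """Write a copy of every CSV file in in_dir to out_dir without `columns`."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    # Assumption: create the output directory if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # glob("*.csv") only matches names ending in .csv; is_file() additionally
    # skips a directory that happens to carry that suffix.
    for csv_path in sorted(in_dir.glob("*.csv")):
        if not csv_path.is_file():
            continue
        df = pd.read_csv(csv_path)
        # errors="ignore" skips columns that a given file does not contain.
        df = df.drop(columns=list(columns), errors="ignore")
        target = out_dir / f"{csv_path.stem}_out.csv"
        df.to_csv(target, index=False)
        written.append(target)
    return written
```

With the layout from the question, a call would look like strip_columns('./Sketch_grammar/weighted', './Sketch_grammar/weighted/weighted_out'). Because the glob is not recursive, files already written to weighted_out are not picked up again on a second run.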


3 Comments

Sandra Young Over a year ago

Thank you! That worked! So basically if it gives that and you know the files are OK, it is because Python isn't recognising what type of file they are?

Vishnudev Krishnadas Over a year ago

If you know that the file isn't corrupted and you know the delimiter used in the text file (a comma, a tab, etc.), then there shouldn't be a problem. It's not that Python is not recognizing the file type; it is that the read_csv function is not able to automatically find the separator. Read

Sandra Young Over a year ago

Thank you, your help has been very useful.
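
On the separator point raised in the comments, and separately from the directory issue, here is a small sketch of the two ways read_csv can be told about the delimiter. The sample data is hypothetical and only echoes the column names from the question. With sep=None and engine='python', pandas tries to detect the delimiter itself using Python's csv.Sniffer, which the default C engine cannot do.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the column names echo the question.
sample = "Word\tHits\tScore\nalpha\t3\t0.5\nbeta\t7\t0.9\n"

# Option 1: state the delimiter explicitly.
explicit = pd.read_csv(io.StringIO(sample), sep="\t")

# Option 2: sep=None with the Python engine asks pandas to detect the
# delimiter from the data.
sniffed = pd.read_csv(io.StringIO(sample), sep=None, engine="python")

print(list(explicit.columns))
print(list(sniffed.columns))
```

Stating the separator explicitly is the more predictable choice when the file format is known; detection is mainly useful for a mixed batch of files.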
