I am trying to run this code, which removes unnecessary columns from a dataframe for later processing. It loops through the first few files and then gives the error below. Before, it was running fine. I saw something about it maybe being a corrupted file, so I deleted all the previous files and produced them all again step by step, but I am still getting the error. Sorry if this is long-winded; I need to show each step for my thesis, and I am still very much a novice programmer. Can anyone help with fixing this issue?

The code is:

import pandas as pd
import os

path = ('./Sketch_grammar/weighted/')
files = os.listdir(path)
for file in files:
    df = pd.read_csv(path+file)
    df = df.drop('Hits', axis=1)
    df = df.drop('Score', axis=1)
    df = df.drop('Score.1', axis=1)
    print(df)
    filename = os.path.splitext(file)
    (f, ext) = filename
    print(f)
    df.to_csv(path+'weighted_out/'+f+'_out.csv', index=False)

The error message is as follows:

Traceback (most recent call last):
  File "/home/sandra/git/trees/trees/remove_columns.py", line 9, in <module>
    df = pd.read_csv(path+file)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 737, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'

  • Have you tried doing what the error suggests? df = pd.read_csv(path+file, engine='python') Commented Oct 10, 2018 at 12:56
  • Do you have any files in the folder which are not csv? Commented Oct 10, 2018 at 12:59
  • @Chris you'd be shocked at how many people don't read error messages, even when they say exactly how to fix their particular problem. Commented Oct 10, 2018 at 13:17
  • Thank you, I will try that. I had read the error message, I am sorry I am a novice programmer and I don't always understand what is being asked of me. I was also confused by the fact that it works for the first files and then fails, although all the files have been produced by the same means. Thank you. Commented Oct 10, 2018 at 13:30
  • This solution didn't work for me. The one below however did. Commented Oct 10, 2018 at 13:42

1 Answer

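This error is usually raised when the file pandas is asked to read is either corrupted or not in a readable state. In your case, os.listdir(path) most likely also returns the weighted_out subdirectory (your output is written to path+'weighted_out/'), and read_csv cannot read a directory. Restricting the loop to .csv files as below should work: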

import os
import pandas as pd

path = './Sketch_grammar/weighted/'
for file in os.listdir(path):
    # Skip anything that is not a CSV file, e.g. the 'weighted_out' subdirectory
    if file.endswith('.csv'):
        df = pd.read_csv(os.path.join(path, file))
        # Drop the columns that are not needed for later processing
        df = df.drop(['Hits', 'Score', 'Score.1'], axis=1)
        # Split 'name.csv' into ('name', '.csv') to build the output name
        f, ext = os.path.splitext(file)
        df.to_csv(os.path.join(path, 'weighted_out', f + '_out.csv'), index=False)
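
To see why the extension check helps, here is a minimal sketch using only the standard library. It assumes the folder layout implied by the question (CSV files and a weighted_out subdirectory in the same folder) and rebuilds it in a temporary directory; list_csv_files is a hypothetical helper name, not part of pandas or os.

```python
import os
import tempfile


def list_csv_files(folder):
    """Return the sorted names of regular files in `folder` ending in .csv."""
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith('.csv')
        and os.path.isfile(os.path.join(folder, name))
    )


# Assumed layout: two CSV files next to an output subdirectory.
with tempfile.TemporaryDirectory() as folder:
    os.mkdir(os.path.join(folder, 'weighted_out'))
    for name in ('a.csv', 'b.csv'):
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write('Word,Hits\nthe,10\n')

    # os.listdir returns every entry, directories included
    all_entries = sorted(os.listdir(folder))
    # the filtered list keeps only real .csv files
    csv_only = list_csv_files(folder)

print(all_entries)
print(csv_only)
```

The os.path.isfile check is an extra safeguard in case a directory name happens to end in .csv; the plain endswith test is enough for the layout shown in the question.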

3 Comments

Thank you! That worked! So basically if it gives that and you know the files are OK, it is because Python isn't recognising what type of file they are?
If you know that the file isn't corrupted and you know which delimiter the text file uses (a comma, a tab, etc.), then there shouldn't be a problem. It's not that Python doesn't recognize the file type; it's that the read_csv function is not able to automatically find the separator.
Thank you, your help has been very useful.
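
On the separator point raised in the comments above: if you are unsure which delimiter a file uses, one option is to check it with the standard library's csv.Sniffer and then pass the result to read_csv through its sep parameter. This is only a sketch; guess_delimiter is a hypothetical helper, and the sample text is made up for illustration rather than taken from the actual files.

```python
import csv


def guess_delimiter(sample, candidates=',\t;'):
    """Guess which of the candidate characters delimits the sample text."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Made-up tab-separated sample; in practice, read the first few lines of a file.
sample = 'Word\tHits\tScore\nthe\t10\t0.5\ncat\t3\t0.2\n'
delimiter = guess_delimiter(sample)
print(repr(delimiter))

# The result could then be handed to pandas, e.g.
# pd.read_csv(some_path, sep=delimiter)
```

Sniffing is a heuristic and can guess wrong on short or irregular samples, so passing sep explicitly is safer when you already know the delimiter.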
