I’m trying to read a large CSV file of unknown structure with pandas. I ran into some errors, so I added the following arguments:

df = pd.read_csv(csv_file, engine="python", error_bad_lines=False, warn_bad_lines=True)

This works well: offending lines are skipped and the errors are printed to the terminal, for example:

Skipping line 31175: field larger than field limit (131072)

However, I’d like to save all errors to a variable instead of printing them. How can I do it?

Note that this is part of a large program, and I can't redirect all of its logging from file=sys.stdout to something else. I need a case-specific solution.

Thanks!

  • So do you really need to save the errors to a variable? Or are you asking how to log the errors somewhere of your choosing rather than just printing them? For the logging part, you can look into how to redirect stderr (not stdout) to a file. Commented Feb 2, 2022 at 20:00
  • I need both: to save them to a variable (I later send the errors to the user via an API) and to keep them on stdout for my own use. Commented Feb 2, 2022 at 20:03

1 Answer

Use a callable for on_bad_lines instead (callables are supported in pandas 1.4+, and only with engine="python"):

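import pandas as pd

badlines_list = []

# list[str] needs Python 3.9+; on older versions use typing.List[str] or drop the annotations.
def badlines_collect(bad_line: list[str]) -> None:
    # bad_line is the offending row as a list of fields; returning None skips it.
    badlines_list.append(bad_line)
    return None

df = pd.read_csv(csv_file, engine="python", on_bad_lines=badlines_collect)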
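
Since the comments under the question ask for the errors both in a variable and on the console, the callback can do both. The following is a minimal sketch, not a pandas feature: `BadLineCollector`, its `bad_lines` attribute, and the message format are hypothetical names chosen for illustration. The block calls the collector directly on a sample row, so it runs without pandas.

```python
import sys


class BadLineCollector:
    """Hypothetical helper: remembers bad rows and also echoes them to a stream."""

    def __init__(self, stream=None):
        self.bad_lines = []
        # Default to stderr, which is where pandas writes its own parser warnings.
        self.stream = stream if stream is not None else sys.stderr

    def __call__(self, bad_line):
        # Keep the row (a list of string fields) for later use, e.g. an API response.
        self.bad_lines.append(bad_line)
        # Still show it on the console for debugging.
        print(f"Skipping bad line: {bad_line}", file=self.stream)
        # Returning None asks pandas to drop the row.
        return None


collector = BadLineCollector()
collector(["a", "b", "unexpected extra field"])  # simulate one bad row
```

Passing `on_bad_lines=collector` to `pd.read_csv(csv_file, engine="python")` should then fill `collector.bad_lines` while the file is parsed. Going by the documented callback signature, the function receives only the split fields of the row, not the line number or the original error message.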
   
3 Comments

At first I couldn't use it because my pandas version was outdated. After upgrading the package I wrote exactly what you suggested, except for the : list[str]) -> list[str]: part (which raised a 'type' object is not subscriptable error for me), and it didn't work... What is this part used for? Can I do it without this part?
@Itayst you need version 1.4 for that feature.
Yeah, as I said, I upgraded the package. The issue is not with pandas but with the syntax of the function. I'm getting an error on the (bad_line: list[str]) -> None: part. How can I fix it?
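
On the 'type' object is not subscriptable error from the comments: subscripting built-in types, as in `list[str]`, is only supported from Python 3.9 on, so the failure comes from the interpreter version, not from pandas. The annotations are optional type hints, and pandas does not need them to call the function. Below is a small sketch of two variants that should work on older Python 3 versions (the name `badlines_collect_plain` is made up here), exercised directly on sample rows:

```python
from typing import List

badlines_list = []


# Variant 1: keep the type hints, but use typing.List, which works before Python 3.9.
def badlines_collect(bad_line: List[str]) -> None:
    badlines_list.append(bad_line)


# Variant 2: drop the hints entirely; the behavior is identical.
def badlines_collect_plain(bad_line):
    badlines_list.append(bad_line)


# Simulate what the parser would pass in: one list of fields per bad row.
badlines_collect(["1", "2", "3"])
badlines_collect_plain(["4", "5", "6"])
```

Either function can be passed as `on_bad_lines=...` in place of the annotated original.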