2

I have Python code that loops over a file. I'm getting a UTF-8 error (invalid continuation byte) when I iterate over the file. I just want my program to ignore that.

I've tried wrapping the code inside the loop in a try/except, but that doesn't work because the error is raised by the for statement itself. I've also tried wrapping the whole loop in a try/except, but once it catches the error the loop doesn't resume.

with open(input_file_path, "r") as input_file:
    for line in input_file:
        # code irrelevant to question

It raises this error on the line for line in input_file:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 5: invalid continuation byte

I want it to skip that line and move on to the next one. Essentially, a try/except on the iteration step of my for loop.

2
  • Are you sure the file should be decoded as a UTF-8 stream? Passing the correct encoding to the open function may be the correct answer. Commented Oct 5, 2019 at 16:20
  • @chepner Yes I am. Commented Oct 5, 2019 at 16:23

3 Answers

3

Does this work? (Edited to the solution the OP found.)

with open(input_file_path, "r", encoding="utf8", errors="surrogateescape") as input_file:
    for line in input_file:
        try:
            ...  # your per-line processing goes here
        except Exception:  # a bare except would also swallow KeyboardInterrupt
            continue
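
To make the effect of the errors argument concrete, here is a small sketch comparing three of the standard error handlers on a throwaway file. The sample bytes and the read_lines helper are made up for illustration; they are not from the question. Note that none of these handlers skips the offending line: they only change how the undecodable byte is represented.

```python
import os
import tempfile

# Hypothetical sample data: in the second line, 0xe0 is followed by a
# space, which is not a valid UTF-8 continuation byte.
raw = b"first line\nbad \xe0 byte\nlast line\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name


def read_lines(errors):
    """Read the sample file as UTF-8 using the given error handler."""
    with open(path, "r", encoding="utf8", errors=errors) as f:
        return [line.rstrip("\n") for line in f]


ignored = read_lines("ignore")           # the bad byte is dropped
replaced = read_lines("replace")         # the bad byte becomes U+FFFD
escaped = read_lines("surrogateescape")  # the bad byte becomes a lone surrogate

os.remove(path)
```

With surrogateescape the original byte can be recovered by encoding with the same handler, but a string containing a lone surrogate will raise UnicodeEncodeError if it is later encoded strictly (for example when written to a UTF-8 file), so "replace" or "ignore" may be the simpler choice if the bad bytes are not needed.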

5 Comments

Yes, I did, and it doesn't work since the error is on the line for line in input_file
Then it's not a line issue but probably a file issue? Is the encoding correct?
A quick search came up with this: stackoverflow.com/questions/19699367/…
Yes I did see that.
If you change your answer to this: with open(input_file_path, "r", encoding="utf8", errors="surrogateescape") as input_file: I'll accept it. That worked for me
1

Have you tried something like this? The file is opened in binary mode, so each line is decoded explicitly; when a UnicodeDecodeError is raised, the loop continues with the next iteration.

with open(input_file_path, "rb") as input_file:
    for line in input_file:
        try:
            line_i = line.decode(encoding='utf-8')
        except UnicodeDecodeError:
            continue
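
For what it's worth, here is a self-contained sketch of the same idea that can be run as-is. It assumes a made-up sample file, and the decodable_lines helper name is hypothetical. Because the file is opened with "rb", iterating yields bytes and no decoding happens in the for statement; the decode call inside the try block is the only place the error can occur.

```python
import os
import tempfile

# Hypothetical sample data with one line that is not valid UTF-8.
raw = b"first line\nbad \xe0 byte\nlast line\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name


def decodable_lines(file_path):
    """Yield only the lines of file_path that decode cleanly as UTF-8."""
    with open(file_path, "rb") as f:
        for raw_line in f:  # bytes objects, split on b"\n"
            try:
                yield raw_line.decode("utf-8")
            except UnicodeDecodeError:
                continue  # skip the whole line


kept = list(decodable_lines(path))
os.remove(path)
```

One difference from text mode: binary mode does no newline translation, so a file with Windows line endings will keep the "\r" at the end of each decoded line.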

2 Comments

I tried that, but it doesn't work since the error is on the line for line in input_file
@SheshankS. All right, then maybe read the file content as binary and decode it at a later point.
0

You can use:

with open(input_file_path, "r", encoding="ISO-8859-1") as input_file:
    for line in input_file:
        ...  # process line here
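
A caveat, illustrated with a made-up string: ISO-8859-1 maps every possible byte to a character, so decoding never fails, but if the file really is UTF-8 (as the asker says), any non-ASCII text will be silently garbled rather than rejected. A minimal sketch:

```python
# Hypothetical sample: "café" encoded as UTF-8 is b"caf\xc3\xa9".
data = "café\n".encode("utf-8")

as_utf8 = data.decode("utf-8")         # decodes back to the original text
as_latin1 = data.decode("iso-8859-1")  # each byte becomes its own character

# Every byte value 0..255 is decodable as ISO-8859-1, so no error is possible.
every_byte = bytes(range(256)).decode("iso-8859-1")
```

So this avoids the exception, but it trades it for possibly wrong characters in the lines that would otherwise have decoded correctly.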
