
I am trying to read a tab-separated file and collect all characters except control characters. If a control character is hit, the remainder of the line should be ignored too. I've tried the following code in Python 3.5, using a for..else loop:

import curses.ascii

input_file = ...
chars = set()
with open(input_file) as file:
    for line in file.readlines():
        source, target = line.split("\t")

        for c in source.strip() + target.strip():
            if curses.ascii.iscntrl(c):
                print("Control char hit.")
                break
            chars.add(c)
        else:
            print("Line contains control character:\n" + line)
            continue

        print("Line contains no control character:\n" + line.strip())

I'd expect this to check each character for being a control character and, if it hits one (i.e. break is triggered), to skip to the next line, hence triggering the else/continue statement.

What happens instead is that continue is always triggered, even if the break statement in the if clause is never reached for a line. Consequently, the final print statement is never reached either.

What am I doing wrong?

  • The else is triggered only when break is not triggered. Commented Jun 26, 2016 at 17:38
  • Hmmm, I suggest you read more about for...else in Python: How can I make sense of the else statement in Python loops? Commented Jun 26, 2016 at 17:41
  • Check this out if it helps: stackoverflow.com/questions/9979970/… Commented Jun 26, 2016 at 18:04
  • Thanks to all of you, your hints have helped. I find the terminology somewhat confusing, though. Commented Jun 26, 2016 at 18:06
  • If you try to pair else with for, it can be confusing. I don't think else was a great choice of keyword for this syntax, but if you pair else with break, you can see it actually makes sense. Here is how it works in human language: for each person in a group of suspects, if anyone is the criminal, break off the investigation; else, report failure. Commented May 31, 2018 at 14:51
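The criminal-investigation analogy from the last comment translates directly into code. A minimal sketch; find_criminal and is_criminal are hypothetical names used only for illustration:

```python
def find_criminal(suspects, is_criminal):
    """Return the first suspect for whom is_criminal() is true, else None."""
    for person in suspects:
        if is_criminal(person):
            print("Found the criminal:", person)
            break  # stop the investigation; the else clause is skipped
    else:
        # Runs only when the loop finished without hitting break
        # (including when suspects is empty).
        print("Investigation failed: no criminal found.")
        return None
    return person
```

Calling find_criminal(["ann", "bob"], lambda p: p == "bob") prints the "Found" message and returns "bob"; with no match, or with an empty list, only the else branch runs and the function returns None.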

1 Answer


The else block of a for loop is only executed if the for loop was never interrupted by break. You'll only see the continue statement in the else block executed if there were no control characters in the line. From the for statement documentation:

When the items are exhausted (which is immediately when the sequence is empty or an iterator raises a StopIteration exception), the suite in the else clause, if present, is executed, and the loop terminates.

A break statement executed in the first suite terminates the loop without executing the else clause’s suite.
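Applied to the question's loop, this means the two messages just need to trade places: the "control character" branch belongs next to break, and the "no control character" branch belongs in else. A minimal sketch, assuming the lines are passed in as a list rather than read from input_file; collect_chars and is_control are hypothetical helpers, and is_control mirrors the definition of curses.ascii.iscntrl (codes 0-31 and 127) so the sketch also runs where curses is unavailable, e.g. on Windows:

```python
def is_control(c):
    # Same test as curses.ascii.iscntrl: ASCII codes 0-31 and 127 (DEL).
    return ord(c) < 32 or ord(c) == 127


def collect_chars(lines):
    """Collect characters from tab-separated lines, ignoring the rest of a
    line once a control character is seen."""
    chars = set()
    clean_lines = []
    for line in lines:
        source, target = line.split("\t")
        for c in source.strip() + target.strip():
            if is_control(c):
                # break skips the else clause below
                print("Line contains control character:\n" + line)
                break
            chars.add(c)
        else:
            # Reached only when the inner loop ran to completion (no break).
            print("Line contains no control character:\n" + line.strip())
            clean_lines.append(line.strip())
    return chars, clean_lines
```

Characters that appear before the control character are still collected, matching the requirement that only the remainder of the line is ignored.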

A better test to see if there are control characters in a line is to use the any() function with a generator expression:

if any(curses.ascii.iscntrl(c) for c in source.strip() + target.strip()):
    print("Line contains control character:\n" + line)
    continue
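As a self-contained sketch of the any() approach (has_control_char and is_control are hypothetical helper names; is_control uses the same definition as curses.ascii.iscntrl), the generator expression stops at the first control character, just as the break in the original loop does:

```python
def is_control(c):
    # Same test as curses.ascii.iscntrl: ASCII codes 0-31 and 127 (DEL).
    return ord(c) < 32 or ord(c) == 127


def has_control_char(line):
    """Return True if either tab-separated field contains a control character."""
    source, target = line.split("\t")
    # any() short-circuits on the first True value.
    return any(is_control(c) for c in source.strip() + target.strip())
```

Unlike the for...else loop, this test rejects the whole line, so characters that precede the control character are not collected; if those matter, keep the loop.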

or you could use a regular expression; this'll be faster as the looping over text is done in C code without having to box each individual character in a new str object:

import re

control_char = re.compile(r'[\x00-\x1f\x7f]')  # codes 0-31 and DEL (127), same set as curses.ascii.iscntrl

if control_char.search(source.strip() + target.strip()):
    print("Line contains control character:\n" + line)
    continue
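A self-contained sketch of the regular-expression approach follows (has_control_char is a hypothetical helper name). Note that the character class must end at \x1f: hex 1F is decimal 31, whereas \x31 is the digit '1', so a class running up to \x31 would also match spaces, punctuation and the digits 0 and 1.

```python
import re

# ASCII control characters: 0x00-0x1F plus DEL (0x7F), the same set
# that curses.ascii.iscntrl() accepts.
control_char = re.compile(r'[\x00-\x1f\x7f]')


def has_control_char(line):
    """Return True if either tab-separated field contains a control character."""
    source, target = line.split("\t")
    return control_char.search(source.strip() + target.strip()) is not None
```

Any ASCII character can be checked against this pattern: control_char.match(chr(i)) succeeds exactly for i in 0-31 and for 127.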