0

I'm trying to remove some items from a list:

list1 = ["CCC-C", "CCC-P", "CCC-A-P", "CCC-A-H", "CCC-J", "CCC-S-X"]
new_list = [i for i in list1 if (len(i) == 5 or len(i) == 7 or i[6] != "H")]

An item should be in new_list only if its length is 5 or 7, and an item of length 7 must not have "H" as its 7th character.

But the code above includes "CCC-A-H" in new_list. It also doesn't raise "IndexError: string index out of range" when checking i[6] for the item "CCC-C". Any ideas?

Regards,

6
  • len('CCC-A-H') = 7, that's why it is in the new list Commented Nov 30, 2011 at 7:05
  • Can there be items of length 0-4 or 6 in your list? Commented Nov 30, 2011 at 7:39
  • No, only items of length 5 and 7 are allowed. And items of length 7 shouldn't have "H" as their 7th character. Commented Nov 30, 2011 at 7:46
  • My question was not clear enough. I know that you just want to keep those, but could the others occur in your data source? Commented Nov 30, 2011 at 7:51
  • Yes, any items can occur in list1. Commented Nov 30, 2011 at 7:54

4 Answers

2

Boolean expressions in Python are evaluated left to right, and evaluation stops as soon as the result is known:

>>> A() or B()

If A() returns True, there is no need to check B()

>>> A() and B()

If A() returns False, there is no need to check B()
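To see this in action, here is a minimal sketch. The `calls` list and the `value` parameter on A and B are made up for illustration; they just record which operand actually ran.

```python
calls = []  # hypothetical log of which operands were evaluated

def A(value):
    calls.append("A")
    return value

def B(value):
    calls.append("B")
    return value

# `or` stops as soon as the left operand is truthy, so B is never called.
calls.clear()
or_result = A(True) or B(False)
or_calls = list(calls)

# `and` stops as soon as the left operand is falsy, so B is never called.
calls.clear()
and_result = A(False) and B(True)
and_calls = list(calls)

print(or_result, or_calls)    # True ['A']
print(and_result, and_calls)  # False ['A']
```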

I hope it gives you some idea.


3 Comments

Then I have to write it this way?: new_list = [i for i in ranks if ( i[6] != "H" or len(i) == 5 or len(i) == 7)]
@alwbtc: If you want to encounter an IndexError, then yes.
I don't understand what it is that you really want to do. If items of length 5 are supposed to be allowed, then isn't it better that you don't try to check i[6]?
2

Do this:

new_list = [i for i in list1 if len(i)==5 or (len(i)==7 and i[6]!="H")]

That way, you only get the items that are of length 5 (condition len(i)==5) or items that are of length 7, unless the last character is an H (condition (len(i)==7 and i[6]!="H")).

The potentially IndexError-prone condition i[6]!="H" will only be evaluated if the string is of length 7, ensuring that you're not going to get this error.
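As a quick check, here is a sketch that runs the same condition over the original list and over a list with extra lengths. The helper name `keep` and the extra items in `mixed` are made up for the example.

```python
def keep(item):
    # Hypothetical helper wrapping the same condition as the comprehension.
    return len(item) == 5 or (len(item) == 7 and item[6] != "H")

list1 = ["CCC-C", "CCC-P", "CCC-A-P", "CCC-A-H", "CCC-J", "CCC-S-X"]
# Made-up extra items of length 0, 3, 6 and 8 to exercise the guard.
mixed = list1 + ["", "CCC", "CCC-AB", "CCC-A-PQ"]

kept = [i for i in list1 if keep(i)]
kept_mixed = [i for i in mixed if keep(i)]

print(kept)                # ['CCC-C', 'CCC-P', 'CCC-A-P', 'CCC-J', 'CCC-S-X']
print(kept_mixed == kept)  # True: the extra items are dropped without an IndexError
```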

1 Comment

I think it might just be possible that you're psychic.
0

Besides, it doesn't give "IndexError: string index out of range" error when checking i[6] for item "CCC-C".

or is short-circuiting. If the first operand of an or expression is True, Python doesn't evaluate the second, because the overall result is True irrespective of the truth value of the second operand.
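A small sketch of why the order of the operands matters for "CCC-C" (the variable names here are only for illustration):

```python
item = "CCC-C"  # length 5, so item[6] does not exist

# Length test first: `or` yields True before item[6] is ever evaluated.
safe = len(item) == 5 or item[6] != "H"

# Index test first: item[6] is evaluated immediately and raises.
try:
    unsafe = item[6] != "H" or len(item) == 5
    raised = False
except IndexError:
    raised = True

print(safe, raised)  # True True
```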

Also, "CCC-A-H" matches because its length is 7. If you don't want strings that have an H in the 7th position, irrespective of their length, you should redo your logical expression:

new_list = [i for i in list1 if (len(i) == 5 or len(i) == 7) and (i[6] != "H")]

2 Comments

But items that have length 5 cannot be checked with i[6], right? Thus, there will be an error.
The bracketing should be the other way, but you still want and rather than or the second time... if I've finally figured out what you mean.
0

As others have pointed out, len("CCC-A-H") == 7 and Python employs short-circuit evaluation on Boolean operations. The end result is that:

(len("CCC-A-H") == 5 or len("CCC-A-H") == 7 or "CCC-A-H"[6] != "H")

will evaluate to True because len("CCC-A-H") == 7 is true, so "CCC-A-H"[6] != "H" is never evaluated.

This may be easier to see by using the filter(...) function instead of a list comprehension:

list1 = ["CCC-C", "CCC-P", "CCC-A-P", "CCC-A-H", "CCC-J", "CCC-S-X"]
def len57notHWrong(item):
    return len(item) == 5 or len(item) == 7 or item[6] != "H"

print("Wrong           : ", list(filter(len57notHWrong, list1)))

This is a direct translation of your list comprehension into a call to the filter(...) function.

If we were to rewrite this using if ... elif ... else constructs, it would look something like this:

def len57notHWrongExpanded(item):
    if len(item) == 5:     # first check if length is 5
        return True
    elif len(item) == 7:   # now check if length is 7
        return True        # it's 7? Short-circuit, return True
    elif item[6] != "H": # This will never get seen (on this particular dataset)
        return True

    return False

print("Wrong (Expanded): ", list(filter(len57notHWrongExpanded, list1)))

A correct expression would look like:

def len57notH(item):
    return len(item) == 5 or (len(item) == 7 and item[6] != "H")

print("Correct         : ", list(filter(len57notH, list1)))

Expanded:

def len57notHExpanded(item):
    if len(item) == 5:
        return True
    elif len(item) == 7:
        if item[6] != "H":
            return True
    return False

print("Correct (Expand): ", list(filter(len57notHExpanded, list1)))

This would make the list comprehension look like:

new_list = [i for i in list1 if (len(i) == 5 or (len(i) == 7 and i[6] != "H"))]

Your code doesn't raise an IndexError because every item in your data is either 5 or 7 characters long, so the expression short-circuits before it reaches i[6] != "H". If you run it on a list containing an item that is shorter than 7 characters and not of length 5, an IndexError is raised:

list2 = ["CCC-C", "CCC-P", "CCC", "CCC-A-P", "CCC-A-H", "CCC-J", "CCC-S-X"]
new_list = [i for i in list2 if (len(i) == 5 or len(i) == 7 or i[6] != "H")]

Traceback (most recent call last):
  File "C:/Users/xxxxxxxx/Desktop/t.py", line 44, in <module>
    new_list = [i for i in list2 if (len(i) == 5 or len(i) == 7 or i[6] != "H")]
IndexError: string index out of range
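If it helps, here is a sketch of two ways to make the check safe on list2. The first is the short-circuit guard from above; the second is an alternative that relies on slicing, which returns an empty string instead of raising when the index is out of range. The function names are made up for the example.

```python
def keep_guarded(item):
    # item[6] is only evaluated once the length is known to be 7.
    return len(item) == 5 or (len(item) == 7 and item[6] != "H")

def keep_sliced(item):
    # item[6:7] is "" for strings shorter than 7 characters, so it never raises.
    return len(item) in (5, 7) and item[6:7] != "H"

list2 = ["CCC-C", "CCC-P", "CCC", "CCC-A-P", "CCC-A-H", "CCC-J", "CCC-S-X"]

guarded = [i for i in list2 if keep_guarded(i)]
sliced = [i for i in list2 if keep_sliced(i)]

print(guarded)            # ['CCC-C', 'CCC-P', 'CCC-A-P', 'CCC-J', 'CCC-S-X']
print(guarded == sliced)  # True
```

The guarded version says more directly what is intended; the sliced one is shorter but depends on the reader knowing how out-of-range slices behave.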

Sorry, it's a bit of a long answer...
