0

I was reading an interesting post on Short-Circuiting in Python and wondered if this was true for the in operator. My simple testing would conclude that it does not:

%%timeit -n 1000
0 in list(range(10))
1000 loops, best of 3: 639 ns per loop

%%timeit -n 1000
0 in list(range(1000))
1000 loops, best of 3: 23.7 µs per loop
# larger the list, the longer it takes. however, i do notice that a higher 
# value does take longer.

%%timeit -n 1000
999 in list(range(1000))
1000 loops, best of 3: 45.1 µs per loop

Is there a detailed explanation of why 999 takes longer than 0. Is the in operator like a loop?

Also, is there a way to tell the in operator to "stop the loop" once the value is found (or is this the already defaulted behavior that I'm not seeing)?

Lastly- Is there another operator/function that I am skipping over that does what I'm talking about in regards to "short-circuiting" in?

9
  • Is the in operator like a loop? Yes, when you call in with a list, a C loop is invoked, I believe. Commented Jan 4, 2018 at 22:24
  • Also, in is a conditional operator, not a logical operator, so "short circuiting" does not apply here. Commented Jan 4, 2018 at 22:25
  • @cᴏʟᴅsᴘᴇᴇᴅ, ah! okay. I can't find that in the docs when I Google it. do you have any citations on this? Commented Jan 4, 2018 at 22:26
  • 3
    in list will iterate through the list until it finds the element. If that's what you mean, it is short-circuited: it doesn't carry on looking through the list once the element has been found. Commented Jan 4, 2018 at 22:29
  • 1
    I think this answer might be interesting too (it is slightly different, to what you're asking since you're converting to list, but it is a very interesting explanation on how range works) Commented Jan 4, 2018 at 22:32

3 Answers 3

4

Short circuiting does occur. The in operator calls the __contains__ method, which in turn is implemented differently per class (in your case list). Searching for 999 takes around double the time as searching for 0, since half of the work is creating the list, and the other half is iterating through it, which is short circuited in the case of 0.

Sign up to request clarification or add additional context in comments.

5 Comments

ah! so the double in time would be due to list(range(x)) being set up?
Yes. This is the case.
which in turn is implemented differently per class - could you elaborate on this? I feel like there is something important packed inside this statement!
When creating a new class, Python supports operator overloading. Statements such as a = A() + 5 can work by implementing A.__add__(self, other). Similarly, when Python runs into x in y, it checks if ys class implemented the __contains__ method and if so, calls y.__contains__(x). If, on the other hand, the method wasn't implemented, a TypeError exception will be thrown.
@StefanPochmann, your comment is correct, my explanation was a simplified version. If __contains__ isn't impleneted, then it'll try iterating through the object, looking for the requested value. This means it's enough to implement __iter__, or even just __getitem__. See here for details: docs.python.org/3/reference/datamodel.html#object.__contains__.
3

The implementation of in for list objects is found in list_contains. It performs a scan of the list and does exit early if the last comparison has found the element, there's no point in continuing there.

The loop involved is:

for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
    cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                       Py_EQ);

If cmp is 1 (the value returned from PyObject_RichCompareBool for a match), the for loop condition (cmp == 0 && i < Py_SIZE(a)) becomes false and terminates.

For list objects, which are built-in, what is called for in is a C function (for CPython). For other implementations of Python, this can be a different language using different language constructs.

For user-defined classes in Python, what is called is defined in the Membership test operations of the Reference Manual, take a look there for a run-down of what gets called.


You could also come to this conclusion by timing:

l = [*range(1000)]    
%timeit 1 in l
85.8 ns ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit 999 in l
22 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The furthest the element the more you need to scan. If it didn't short-circuit, all in operations would result in similar timings.

5 Comments

I believe I created a Red Herring when i wrapped list() around range(). with l already defined, this clears it up! Does in call different "functions" for different data types based on the latter half of the in operator? (please excuse my bad nomenclature if I've used the wrong terms)
@MattR yes, it does. You can take a look at Membership test operations in the Reference Manual for the full run-down of what is called (which is too much for a comment to include).
I really appreciate the links to the documentation. I'll read up some more.
You're welcome @MattR. Thankfully Python's docs are quite good and readable. It's worth reading through them.
It will also be educating to time the operations for the "range" object alone, without converting it to a list.
0

Here's another look with a hashed object, set:

from time import time

qlist = list(range(1000))
qset = set(qlist)

start = time()
for i in range(1000):
    0 in qlist
print time() - start

start = time()
for i in range(1000):
    999 in qlist
print time() - start

start = time()
for i in range(1000):
    0 in qset
print time() - start

start = time()
for i in range(1000):
    999 in qset
print time() - start

Output:

0.000172853469849    0 in list
0.0399038791656    999 in list
0.000147104263306i   0 in set
0.000195980072021  999 in set

As others have said, the list implementation must do a sequential search. Set inclusion uses a hashed value, and is on par with finding the item in the first element checked.

3 Comments

in is faster in a set than a list, but what does it show regarding short-circuiting?
I appreciate the optimization piece.
@Loquacious: this merely highlights the effect of early loop exit with a list; set in doesn't short-circuit at all, because the reference is direct.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.