1

I was playing around with some Python and came up with this code:

import time
N = 10000000
t1 = time.time()
for _ in range(N):
    if 'lol' in ['lol']:
        pass
print(time.time() - t1)

t1 = time.time()
for _ in range(N):
    if 'lol' == 'lol':
        pass
print(time.time() - t1)

So, if I use Python 2:

(test) C:\Users\test>python test.py
0.530999898911
0.5

(test) C:\Users\test>python test.py
0.531000137329
0.5

(test) C:\Users\test>python test.py
0.528000116348
0.501000165939

And that is nice: the second variant is quicker, and I should use 'lol' == 'lol' anyway, as it is the more Pythonic way to compare two strings. But look what happens if I use Python 3:

(test) C:\Users\test>python3 test.py
0.37500524520874023
0.3880295753479004

(test) C:\Users\test>python3 test.py
0.3690001964569092
0.3780345916748047

(test) C:\Users\test>python3 test.py
0.37799692153930664
0.38797974586486816

Using timeit:

(test) C:\Users\test>python3 -m timeit "'lol' in ['lol']"
100000000 loops, best of 3: 0.0183 usec per loop

(test) C:\Users\test>python3 -m timeit "'lol' == 'lol'"
100000000 loops, best of 3: 0.019 usec per loop

Oh my god! Why is the first variant quicker? So should I use an ugly style like 'lol' in ['lol'] when I use Python 3?

7
  • 7
    Show your timings with the timeit module Commented Feb 15, 2018 at 15:00
  • 3
    You can run these quickly on the command line with python3 -m timeit "'lol' == 'lol'" Commented Feb 15, 2018 at 15:01
  • 4
    You're talking about fractions of a second here. If your application is that performance critical you probably shouldn't be using python in the first place. Focus on making your code easy to understand. Commented Feb 15, 2018 at 15:02
  • 4
    The answer to your question is absolutely NO. You should code in a readable way and optimize if you need to. Commented Feb 15, 2018 at 15:02
  • 2
    "So should I use ugly style like 'lol' in ['lol'] when i use python3?" - No, no, no, no, no, no. As others have already said, your primary goal should be writing readable code. Only optimize when you have to. Remember, code is written once but read many times. Also, I doubt that using in is truly faster. Commented Feb 15, 2018 at 15:04

2 Answers

4

The bulk of your Python 2 time is spent constructing a huge list by calling range. Change it to xrange in Python 2, or use the timeit module, which is written to measure this properly. Once you've done that, you will not find an appreciable difference that would motivate writing strange-looking code.
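For reference, here is a rough sketch of how the same comparison could be run through timeit from a script instead of hand-rolled time.time() loops. It assumes Python 3; NUMBER, REPEAT and best_time are names I made up for illustration, and the absolute figures will differ from machine to machine.

```python
import timeit

NUMBER = 200_000  # evaluations per timing run (arbitrary choice)
REPEAT = 5        # timing runs per statement; the minimum is the least noisy

def best_time(stmt):
    """Return the fastest total time, in seconds, over REPEAT runs of stmt."""
    return min(timeit.repeat(stmt, number=NUMBER, repeat=REPEAT))

# The two expressions from the question, passed as source strings.
results = {stmt: best_time(stmt) for stmt in ("'lol' in ['lol']", "'lol' == 'lol'")}

for stmt, seconds in results.items():
    # Convert the total time into microseconds per single evaluation.
    print(f"{stmt:<20} {seconds / NUMBER * 1e6:.4f} usec per loop")
```

Taking the minimum of several runs is the usual way to reduce noise from other processes; differences of a few nanoseconds per loop are still too small to base a style decision on.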


4 Comments

Out of interest, do you have any insights on why the in operator is (fractionally) faster than an equality check?
@ACascarino I think the reason is that the in operator tests not only equality (==) but also identity (is), and python3 -m timeit "'lol' is 'lol'" is the fastest of all. However, is is of course not guaranteed to give the correct result for more complex strings (it depends on string interning), so it's not recommended for these purposes
@ACascarino Some evidence: 'lol' is ''.join('lol') >> False, python3 -m timeit "'lol' == ''.join('lol')" >> 0.244 and python3 -m timeit "'lol' in [''.join('lol')]" >> 0.277. So the negligible time advantage of in is reversed when the two strings being compared are equal but not the same object
There's an even finer distinction if the unidentical strings have the same length or a different length.
3

So should I use ugly style like 'lol' in ['lol'] when i use python3?

No, readability counts.

Also, as others have noted, your test case has weaknesses:

$ python3 -m timeit "'lol' == 'lol'"
>> 10000000 loops, best of 3: 0.024 usec per loop
$ python3 -m timeit "'lol' in ['lol']"
>> 10000000 loops, best of 3: 0.0214 usec per loop
$ python2 -m timeit "'lol' == 'lol'"
>> 10000000 loops, best of 3: 0.0258 usec per loop
$ python2 -m timeit "'lol' in ['lol']"
>> 10000000 loops, best of 3: 0.0212 usec per loop

There is no difference between Python 2 and Python 3 when it comes to which comparison is faster: in both, the in variant comes out marginally ahead.


Another source of confusion might be the opaque behavior of Python interpreters[1] when it comes to string caching/interning. As a rule of thumb, short string literals that look like identifiers (only letters, digits and underscores) are interned and will refer to the same object. It can be tested with something like

a = 'lol'
b = 'lol'
a is b  # tests for object id instead of applying an equality comparison
>> True

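Other strings may also be interned, but an easy counterexample is a four-character string that contains special characters (entered line by line in an interactive session; within a single script, equal constants may be merged into one object anyway):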

a = '####'
b = '####'
a is b
>> False
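If identity is what you actually want, interning can be requested explicitly rather than relying on whatever the interpreter happens to cache. A minimal sketch, assuming Python 3; the variable names are made up for illustration:

```python
import sys

literal = 'lol'
built = ''.join(['l', 'o', 'l'])  # same characters, assembled at runtime

# Equality compares contents, so this holds however the objects were created.
print(literal == built)  # True

# Whether these are the same object is an implementation detail; do not rely on it.
print(literal is built)

# sys.intern returns one canonical object per distinct string value,
# so the interned versions are guaranteed to be identical.
print(sys.intern(literal) is sys.intern(built))  # True
```

For ordinary comparisons, == remains the right tool; explicit interning only pays off in special cases such as very large numbers of repeated dictionary keys.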

Of course, testing for object identity is faster than making an actual comparison, and your test using in did just that: for a list, in checks identity first and only falls back to == if that fails. Even though the code itself looks straightforward, the actual operation was unexpected. That also means that slightly different scenarios may lead to surprising results and Funny Bugs.
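The identity shortcut taken by in can be observed directly. The sketch below is only an illustration: CountingEq is a hypothetical helper class that counts how often == is really evaluated, and NaN serves as the classic value that is not equal to itself.

```python
class CountingEq:
    """Hypothetical helper: counts how often __eq__ is actually called."""
    eq_calls = 0  # shared counter, so it does not matter which operand is asked

    def __eq__(self, other):
        CountingEq.eq_calls += 1
        return False  # claims to be unequal to everything, itself included

probe = CountingEq()

found_same = probe in [probe]          # same object: identity matches first
calls_after_same = CountingEq.eq_calls

found_other = probe in [CountingEq()]  # different object: falls back to ==
calls_after_other = CountingEq.eq_calls

print(found_same, calls_after_same)    # True 0
print(found_other, calls_after_other)  # False 1

# NaN is never equal to itself, yet list membership still finds it by identity.
nan = float('nan')
nan_equal = nan == nan
nan_in_list = nan in [nan]
print(nan_equal, nan_in_list)          # False True
```

This is exactly the kind of Funny Bug meant above: x in [x] can be True even when x == x is False.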

In conclusion, I'd repeat once more: No, you should not prefer 'lol' in ['lol'] over 'lol' == 'lol'.

[1]: Only CPython. I do not know if other Python interpreters do something similar.

2 Comments

It feels like you took this edit idea directly from my comments. Anyway, you've explained it wrong, there is no is in example. The in operator for lists tests first identity and second (if the first fails) equality, hence why it can be very marginally faster as I explained under Josh's answer
@Chris_Rands Thanks for spotting the typo. I wanted to address interning from the start, but since it is not what OP asked about I added it as an afterthought, and not in the main answer.
