How to compare strings in python 2 and 3?

Question

I was playing around with Python and came up with this code:

import time
N = 10000000
t1 = time.time()
for _ in range(N):
    if 'lol' in ['lol']:
        pass
print(time.time() - t1)

t1 = time.time()
for _ in range(N):
    if 'lol' == 'lol':
        pass
print(time.time() - t1)

If I run it with Python 2:

(test) C:\Users\test>python test.py
0.530999898911
0.5

(test) C:\Users\test>python test.py
0.531000137329
0.5

(test) C:\Users\test>python test.py
0.528000116348
0.501000165939

That looks fine: the second variant is quicker, so I should use 'lol' == 'lol', which is also the more Pythonic way to compare two strings. But here is what happens with Python 3:

(test) C:\Users\test>python3 test.py
0.37500524520874023
0.3880295753479004

(test) C:\Users\test>python3 test.py
0.3690001964569092
0.3780345916748047

(test) C:\User\test>python3 test.py
0.37799692153930664
0.38797974586486816

Using timeit:

(test) C:\Users\test>python3 -m timeit "'lol' in ['lol']"
100000000 loops, best of 3: 0.0183 usec per loop

(test) C:\Users\test>python3 -m timeit "'lol' == 'lol'"
100000000 loops, best of 3: 0.019 usec per loop

Oh my god! Why is the first variant quicker? So should I use ugly style like 'lol' in ['lol'] when i use python3?

You can run these quickly on the command line with python3 -m timeit "'lol' == 'lol'"
– Josh Lee, Commented Feb 15, 2018 at 15:01
You're talking about fractions of a second here. If your application is that performance critical, you probably shouldn't be using Python in the first place. Focus on making your code easy to understand.
– 0x5453, Commented Feb 15, 2018 at 15:02
The answer to your question is absolutely NO. You should code in a readable way and optimize only if you need to.
– Alex, Commented Feb 15, 2018 at 15:02
"So should I use ugly style like 'lol' in ['lol'] when i use python3?" - No, no, no, no, no, no. As others have already said, your primary goal should be writing readable code. Only optimize when you have to. Remember, code is written once but read many times. Also, I doubt that using in is truly faster.
– Chris, Commented Feb 15, 2018 at 15:04

Josh Lee · Accepted Answer · 2018-02-15 15:04:04Z

4

The bulk of your Python 2 time is spent constructing a huge list by calling range. Change it to xrange in Python 2, or use the timeit module, which is properly written for this kind of measurement. Once you've done that, you will not find an appreciable difference that would motivate writing strange-looking code.
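A minimal sketch of the timeit approach from within a script, in case it helps. The repeat count and the value of N are arbitrary choices for illustration, not something taken from the question, and the numbers will vary by machine and interpreter version.

```python
import timeit

N = 1000000  # executions per measurement (arbitrary choice for this sketch)

# timeit runs the statement in its own tight loop, so no list of N
# items is built the way range(N) does in Python 2.
t_in = min(timeit.repeat("'lol' in ['lol']", number=N, repeat=3))
t_eq = min(timeit.repeat("'lol' == 'lol'", number=N, repeat=3))

# Report the best of three runs, as the command-line tool does.
print("in : %.4f usec per loop" % (t_in / N * 1e6))
print("== : %.4f usec per loop" % (t_eq / N * 1e6))
```

Taking the minimum of several repeats reduces noise from other processes; differences of a few nanoseconds per loop are still within what that noise can produce.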

answered Feb 15, 2018 at 15:04

Josh Lee


4 Comments

ACascarino Over a year ago

Out of interest, do you have any insights on why the in operator is (fractionally) faster than an equality check?

Chris_Rands Over a year ago

@ACascarino I think the reason is that the in operator tests not only equality (==) but also identity (is), and python3 -m timeit "'lol' is 'lol'" is the fastest of the three. However, is is of course not guaranteed to give the correct result for more complex strings (it depends on string interning), so it's not recommended for this purpose.

Chris_Rands Over a year ago

@ACascarino Some evidence: 'lol' is ''.join('lol') >> False, python3 -m timeit "'lol' == ''.join('lol')" >> 0.244 and python3 -m timeit "'lol' in [''.join('lol')]" >> 0.277. So the negligible time advantage of in is reversed when the two strings being compared are equal but not the same object.

Josh Lee Over a year ago

There's an even finer distinction depending on whether the non-identical strings have the same length or different lengths.
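To illustrate the point made in these comments about is and interning, here is a small sketch. It assumes Python 3 (sys.intern is a builtin named intern in Python 2), and the variable names are only for illustration. Whether the is comparison returns True depends on the implementation, so no expected value is stated for it.

```python
import sys

a = "lol"                     # the literal from the question
b = "".join(["l", "o", "l"])  # equal text, but built at runtime

equal = a == b        # compares contents, always reliable
same_object = a is b  # compares identity; result is implementation-dependent

# sys.intern returns one canonical object per distinct string value,
# so the interned versions are guaranteed to be the same object.
interned_same = sys.intern(a) is sys.intern(b)

print(equal, same_object, interned_same)
```

The takeaway matches the comments: == answers the question that was asked, while is only happens to work when both strings are the same interned object.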

Arne · Accepted Answer · 2018-02-16 07:44:13Z

3

So should I use ugly style like 'lol' in ['lol'] when i use python3?

No, readability counts.

Also, as others have noted, your test case has weaknesses:

$ python3 -m timeit "'lol' == 'lol'"
>> 10000000 loops, best of 3: 0.024 usec per loop
$ python3 -m timeit "'lol' in ['lol']"
>> 10000000 loops, best of 3: 0.0214 usec per loop
$ python2 -m timeit "'lol' == 'lol'"
>> 10000000 loops, best of 3: 0.0258 usec per loop
$ python2 -m timeit "'lol' in ['lol']"
>> 10000000 loops, best of 3: 0.0212 usec per loop

There is no difference between python2 and python3 when it comes to which comparison is faster.

Another source of confusion might be the opaque behavior of Python interpreters[1] when it comes to string caching/interning. As a rule of thumb, string literals that look like identifiers (only ASCII letters, digits, and underscores) are interned and will refer to the same object; the exact rules are an implementation detail and vary between versions. It can be tested with something like

a = 'lol'
b = 'lol'
a is b  # tests for object id instead of applying an equality comparison
>> True

Other strings may also be interned, but an easy counterexample, when entered line by line in the interactive interpreter, is a four-character string that contains special characters:

a = '####'
b = '####'
a is b
>> False

Of course, testing for object ids is faster than making an actual comparison, and your test using in did just that: list membership checks identity first and only falls back to an equality comparison if that fails. Even though the code itself looks straightforward, the actual operation was unexpected. That also means that slightly different scenarios may lead to surprising results and Funny Bugs.
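The identity-before-equality behavior of in on a list can be made visible with a small sketch. The Counted class is a hypothetical helper written for this illustration; it only counts how often its __eq__ method is called.

```python
class Counted(object):
    """Hypothetical helper: counts how often == is evaluated."""
    eq_calls = 0

    def __eq__(self, other):
        Counted.eq_calls += 1
        return isinstance(other, Counted)


x, y = Counted(), Counted()

same = x in [x]                       # identity matches, __eq__ is skipped
calls_after_same = Counted.eq_calls   # still 0

other = y in [x]                      # identity fails, falls back to __eq__
calls_after_other = Counted.eq_calls  # now 1

# NaN is never equal to itself, yet membership succeeds via identity.
nan = float("nan")
nan_equal = nan == nan                # False
nan_member = nan in [nan]             # True
```

The NaN case is an example of the "surprising results" mentioned above: in and == can disagree when an object does not compare equal to itself.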

In conclusion, I'd repeat once more: No, you should not prefer 'lol' in ['lol'] over 'lol' == 'lol'.

[1]: This applies to CPython only. I do not know whether other Python interpreters do something similar.

edited Feb 16, 2018 at 7:44

answered Feb 15, 2018 at 15:09

Arne


2 Comments

Chris_Rands Over a year ago

It feels like you took this edit idea directly from my comments. Anyway, you've explained it wrong: there is no is operator in the example. The in operator for lists tests identity first and equality second (only if the identity test fails), which is why it can be very marginally faster, as I explained under Josh's answer.

Arne Over a year ago

@Chris_Rands Thanks for spotting the typo. I wanted to address interning from the start, but since it is not what OP asked about, I added it as an afterthought and not in the main answer.
