Follow up question, what changes when this code is in an interpreted language instead of compiled? Then, does the optimization matter or does it have the same result?
Let's measure! I've transcribed the examples to Python:
def IsSumInRangeWithVar(a, b):
    s = a + b
    if s > 1000 or s < -1000:
        return False
    else:
        return True

def IsSumInRangeWithoutVar(a, b):
    if a + b > 1000 or a + b < -1000:
        return False
    else:
        return True

def IsSumInRangeSuperOptimized(a, b):
    return abs(a + b) <= 1000
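Before timing anything, it's worth a quick sanity check (my own addition, not part of the original experiment) that the three variants really agree, including at the boundary values:

```python
# Equivalence check for the three variants, exercising the boundaries
# where an off-by-one mistake would hide.
def IsSumInRangeWithVar(a, b):
    s = a + b
    if s > 1000 or s < -1000:
        return False
    else:
        return True

def IsSumInRangeWithoutVar(a, b):
    if a + b > 1000 or a + b < -1000:
        return False
    else:
        return True

def IsSumInRangeSuperOptimized(a, b):
    return abs(a + b) <= 1000

for a, b in [(0, 0), (500, 500), (500, 501), (-500, -501), (42, 42)]:
    assert (IsSumInRangeWithVar(a, b)
            == IsSumInRangeWithoutVar(a, b)
            == IsSumInRangeSuperOptimized(a, b))
```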
from dis import dis
print('IsSumInRangeWithVar')
dis(IsSumInRangeWithVar)
print('\nIsSumInRangeWithoutVar')
dis(IsSumInRangeWithoutVar)
print('\nIsSumInRangeSuperOptimized')
dis(IsSumInRangeSuperOptimized)
print('\nBenchmarking')
import timeit
print('IsSumInRangeWithVar: %fs' % (min(timeit.repeat(lambda: IsSumInRangeWithVar(42, 42), repeat=50, number=100000)),))
print('IsSumInRangeWithoutVar: %fs' % (min(timeit.repeat(lambda: IsSumInRangeWithoutVar(42, 42), repeat=50, number=100000)),))
print('IsSumInRangeSuperOptimized: %fs' % (min(timeit.repeat(lambda: IsSumInRangeSuperOptimized(42, 42), repeat=50, number=100000)),))
Run with Python 3.5.2, this produces the output:
IsSumInRangeWithVar
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_ADD
              7 STORE_FAST               2 (s)

  3          10 LOAD_FAST                2 (s)
             13 LOAD_CONST               1 (1000)
             16 COMPARE_OP               4 (>)
             19 POP_JUMP_IF_TRUE        34
             22 LOAD_FAST                2 (s)
             25 LOAD_CONST               4 (-1000)
             28 COMPARE_OP               0 (<)
             31 POP_JUMP_IF_FALSE       38

  4     >>   34 LOAD_CONST               2 (False)
             37 RETURN_VALUE

  6     >>   38 LOAD_CONST               3 (True)
             41 RETURN_VALUE
             42 LOAD_CONST               0 (None)
             45 RETURN_VALUE

IsSumInRangeWithoutVar
  9           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_ADD
              7 LOAD_CONST               1 (1000)
             10 COMPARE_OP               4 (>)
             13 POP_JUMP_IF_TRUE        32
             16 LOAD_FAST                0 (a)
             19 LOAD_FAST                1 (b)
             22 BINARY_ADD
             23 LOAD_CONST               4 (-1000)
             26 COMPARE_OP               0 (<)
             29 POP_JUMP_IF_FALSE       36

 10     >>   32 LOAD_CONST               2 (False)
             35 RETURN_VALUE

 12     >>   36 LOAD_CONST               3 (True)
             39 RETURN_VALUE
             40 LOAD_CONST               0 (None)
             43 RETURN_VALUE

IsSumInRangeSuperOptimized
 15           0 LOAD_GLOBAL              0 (abs)
              3 LOAD_FAST                0 (a)
              6 LOAD_FAST                1 (b)
              9 BINARY_ADD
             10 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             13 LOAD_CONST               1 (1000)
             16 COMPARE_OP               1 (<=)
             19 RETURN_VALUE
Benchmarking
IsSumInRangeWithVar: 0.019361s
IsSumInRangeWithoutVar: 0.020917s
IsSumInRangeSuperOptimized: 0.020171s
Disassembly in Python isn't terribly interesting, since the bytecode "compiler" doesn't do much in the way of optimization.
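To be fair, the bytecode compiler is not entirely optimization-free: it does fold constant expressions at compile time. A small sketch of my own to illustrate:

```python
from dis import dis

def seconds_per_day():
    # CPython folds this constant multiplication at compile time, so the
    # bytecode contains a single LOAD_CONST of 86400 rather than two
    # BINARY_MULTIPLY operations.
    return 24 * 60 * 60

dis(seconds_per_day)
# The folded result lands in the function's constants table:
print(86400 in seconds_per_day.__code__.co_consts)  # True
```

That is about the extent of it, though; none of the three functions above contain anything it can fold.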
The performance of the three functions is nearly identical. We might be tempted to go with IsSumInRangeWithVar() for its marginal speed gain.
However, if this is really performance-critical code, an interpreted language is simply a very poor choice. Running the same program with PyPy, I get:
IsSumInRangeWithVar: 0.000180s
IsSumInRangeWithoutVar: 0.001175s
IsSumInRangeSuperOptimized: 0.001306s
Just using PyPy, whose JIT compilation eliminates much of the interpreter overhead, yields a performance improvement of one to two orders of magnitude. I was quite shocked to see that IsSumInRangeWithVar() appeared to be an order of magnitude faster than the others, so I changed the order of the benchmarks and ran again:
IsSumInRangeSuperOptimized: 0.000191s
IsSumInRangeWithoutVar: 0.001174s
IsSumInRangeWithVar: 0.001265s
So it seems it's not actually anything about the implementation that makes it fast, but rather the order in which I do the benchmarking!
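One way to rule out such ordering effects (a sketch of my own, not part of the original experiment) is to time each variant in a fresh interpreter process, so that JIT state or any other carry-over from one benchmark cannot leak into the next:

```python
# Isolate a benchmark in its own interpreter process. sys.executable
# reuses whichever interpreter runs this script (CPython or PyPy alike),
# so each variant can be timed from a cold start.
import subprocess
import sys

SNIPPET = """
import timeit
def IsSumInRangeWithVar(a, b):
    s = a + b
    if s > 1000 or s < -1000:
        return False
    else:
        return True
print(min(timeit.repeat(lambda: IsSumInRangeWithVar(42, 42),
                        repeat=50, number=100000)))
"""

out = subprocess.run([sys.executable, "-c", SNIPPET],
                     capture_output=True, text=True, check=True)
print("IsSumInRangeWithVar (fresh process): %ss" % out.stdout.strip())
```

Repeating this once per variant would give each function the same cold-start conditions regardless of ordering.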
I'd love to dig into this more deeply, because honestly I don't know why it happens. But I believe the point has been made: micro-optimizations like whether to declare an intermediate value as a variable are rarely relevant. With an interpreted language or a highly optimizing compiler, the first objective is still to write clear code.
If further optimization is required, benchmark. Remember that the best optimizations come not from the little details but from the bigger algorithmic picture: PyPy is an order of magnitude faster than CPython for repeated evaluation of the same function because it uses a faster strategy (a JIT compiler versus interpretation) to evaluate the program. And there's the coded algorithm to consider as well: a search through a B-tree is faster than one through a linked list.
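To sketch how thoroughly the algorithmic level dominates (my own illustration, using binary search on a sorted list as a stand-in for the B-tree-versus-linked-list comparison):

```python
# Looking up a value near the end of a million-element sequence:
# a linear scan does O(n) work, binary search on sorted data O(log n).
# No micro-optimization of the scan's inner loop can close that gap.
import bisect

data = list(range(1000000))
target = 999999

def linear_search(xs, x):
    for i, v in enumerate(xs):
        if v == x:
            return i
    return -1

def binary_search(xs, x):
    # bisect_left finds the leftmost insertion point, then we confirm
    # the element is actually present.
    i = bisect.bisect_left(xs, x)
    return i if i < len(xs) and xs[i] == x else -1

assert linear_search(data, target) == binary_search(data, target) == 999999
# binary_search touches ~20 elements here; linear_search touches all 10^6.
```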
After ensuring you're using the right tools and algorithms for the job, be prepared to dive deep into the details of the system. The results can be very surprising, even for experienced developers, and this is why you must have a benchmark to quantify the changes.