Follow-up question: what changes when this code is in an interpreted language instead of compiled? Then, does the optimization matter, or does it have the same result?

Let's measure! I've transcribed the examples to Python:

def IsSumInRangeWithVar(a, b):
    s = a + b
    if s > 1000 or s < -1000:
        return False
    else:
        return True

def IsSumInRangeWithoutVar(a, b):
    if a + b > 1000 or a + b < -1000:
        return False
    else:
        return True

def IsSumInRangeSuperOptimized(a, b):
    return abs(a + b) <= 1000

from dis import dis
print('IsSumInRangeWithVar')
dis(IsSumInRangeWithVar)

print('\nIsSumInRangeWithoutVar')
dis(IsSumInRangeWithoutVar)

print('\nIsSumInRangeSuperOptimized')
dis(IsSumInRangeSuperOptimized)

print('\nBenchmarking')
import timeit
print('IsSumInRangeWithVar: %fs' % (min(timeit.repeat(lambda: IsSumInRangeWithVar(42, 42), repeat=50, number=100000)),))
print('IsSumInRangeWithoutVar: %fs' % (min(timeit.repeat(lambda: IsSumInRangeWithoutVar(42, 42), repeat=50, number=100000)),))
print('IsSumInRangeSuperOptimized: %fs' % (min(timeit.repeat(lambda: IsSumInRangeSuperOptimized(42, 42), repeat=50, number=100000)),))

Run with Python 3.5.2, this produces the output:

IsSumInRangeWithVar
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_ADD
              7 STORE_FAST               2 (s)

  3          10 LOAD_FAST                2 (s)
             13 LOAD_CONST               1 (1000)
             16 COMPARE_OP               4 (>)
             19 POP_JUMP_IF_TRUE        34
             22 LOAD_FAST                2 (s)
             25 LOAD_CONST               4 (-1000)
             28 COMPARE_OP               0 (<)
             31 POP_JUMP_IF_FALSE       38

  4     >>   34 LOAD_CONST               2 (False)
             37 RETURN_VALUE

  6     >>   38 LOAD_CONST               3 (True)
             41 RETURN_VALUE
             42 LOAD_CONST               0 (None)
             45 RETURN_VALUE

IsSumInRangeWithoutVar
  9           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_ADD
              7 LOAD_CONST               1 (1000)
             10 COMPARE_OP               4 (>)
             13 POP_JUMP_IF_TRUE        32
             16 LOAD_FAST                0 (a)
             19 LOAD_FAST                1 (b)
             22 BINARY_ADD
             23 LOAD_CONST               4 (-1000)
             26 COMPARE_OP               0 (<)
             29 POP_JUMP_IF_FALSE       36

 10     >>   32 LOAD_CONST               2 (False)
             35 RETURN_VALUE

 12     >>   36 LOAD_CONST               3 (True)
             39 RETURN_VALUE
             40 LOAD_CONST               0 (None)
             43 RETURN_VALUE

IsSumInRangeSuperOptimized
 15           0 LOAD_GLOBAL              0 (abs)
              3 LOAD_FAST                0 (a)
              6 LOAD_FAST                1 (b)
              9 BINARY_ADD
             10 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             13 LOAD_CONST               1 (1000)
             16 COMPARE_OP               1 (<=)
             19 RETURN_VALUE

Benchmarking
IsSumInRangeWithVar: 0.019361s
IsSumInRangeWithoutVar: 0.020917s
IsSumInRangeSuperOptimized: 0.020171s

Disassembly in Python isn't terribly interesting, since the bytecode "compiler" doesn't do much in the way of optimization.
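
As a quick illustration of the limited optimization CPython does apply (a sketch of my own, not part of the disassembly above): the peephole optimizer folds constant expressions at compile time, but it won't eliminate the temporary variable or merge the two comparisons the way an optimizing C compiler might.

from dis import dis

# 2 * 1000 is folded at compile time, so the disassembly shows a single
# LOAD_CONST with the value 2000 instead of a BINARY_MULTIPLY.
dis(lambda: 2 * 1000)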

The performance of the three functions is nearly identical. We might be tempted to go with IsSumInRangeWithVar() due to its marginal speed gain. Though I'll add that as I was trying different parameters to timeit, sometimes IsSumInRangeSuperOptimized() came out fastest, so I suspect external factors are responsible for the difference, rather than any intrinsic advantage of any implementation.

If this is really performance-critical code, an interpreted language is simply a very poor choice. Running the same program with PyPy, I get:

IsSumInRangeWithVar: 0.000180s
IsSumInRangeWithoutVar: 0.001175s
IsSumInRangeSuperOptimized: 0.001306s

Just using PyPy, which uses JIT compilation to eliminate a lot of the interpreter overhead, has yielded a performance improvement of one or two orders of magnitude. I was quite shocked to see that IsSumInRangeWithVar() is an order of magnitude faster than the others. So I changed the order of the benchmarks and ran again:

IsSumInRangeSuperOptimized: 0.000191s
IsSumInRangeWithoutVar: 0.001174s
IsSumInRangeWithVar: 0.001265s

So it seems it's not actually anything about the implementation that makes it fast, but rather the order in which I do the benchmarking!

I'd love to dig into this more deeply, because honestly I don't know why this happens. But I believe the point has been made: micro-optimizations like whether to declare an intermediate value as a variable are rarely relevant. With an interpreted language or a highly optimized compiler, the first objective is still to write clear code.
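
If I were to chase it down, one methodological first step (a sketch of my own, assuming the three functions are saved in a hypothetical module named is_sum_in_range.py) would be to time each function in its own fresh interpreter process, so the benchmark order and any shared JIT state can't influence the comparison:

import subprocess
import sys

# Assumption: the IsSumInRange* functions live in is_sum_in_range.py
# (a hypothetical module name used only for this sketch).
SETUP = 'from is_sum_in_range import {name} as f'
STMT = 'f(42, 42)'

for name in ('IsSumInRangeWithVar', 'IsSumInRangeWithoutVar', 'IsSumInRangeSuperOptimized'):
    # One subprocess per function: each run starts with a cold interpreter,
    # so no measurement inherits JIT work done for an earlier function.
    result = subprocess.run(
        [sys.executable, '-m', 'timeit', '-s', SETUP.format(name=name), STMT],
        stdout=subprocess.PIPE, universal_newlines=True)
    print(name, result.stdout.strip())

If the order-dependent gap disappears under that scheme, the effect is something about shared interpreter state rather than the implementations themselves.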

If further optimization might be required, benchmark. Remember that the best optimizations come not from the little details but from the bigger algorithmic picture: PyPy is going to be an order of magnitude faster than CPython for repeated evaluation of the same function because it uses a faster algorithm (JIT compilation vs. interpretation) to evaluate the program. And there's the coded algorithm to consider as well: a search through a B-tree will be faster than a search through a linked list.
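
To make that concrete, here is a minimal sketch (my own illustration of the same principle, using a Python set versus a list rather than a B-tree versus a linked list): the choice of data structure dwarfs any micro-tweak inside the lookup.

import timeit

data_list = list(range(100000))
data_set = set(data_list)
needle = 99999  # worst case for the linear scan: the last element

# Membership test is O(n) for the list scan, O(1) on average for the set.
print('list scan:  %fs' % min(timeit.repeat(lambda: needle in data_list, repeat=5, number=100)))
print('set lookup: %fs' % min(timeit.repeat(lambda: needle in data_set, repeat=5, number=100)))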

After ensuring you're using the right tools and algorithms for the job, be prepared to dive deep into the details of the system. The results can be very surprising, even for experienced developers, and this is why you must have a benchmark to quantify the changes.

