3

Was writing a blog post about some python coding styles and came across something that I found very strange and I was wondering if someone understood what was going on with it. Basically I've got two versions of the same function:

a = lambda x: (i for i in range(x))
def b(x):
    for i in range(x):
        yield i

And I want to compare the performance of these two doing just being set up. In my mind this should involve a negligible amount of computation and both methods should come up pretty close to zero, however, when I actually ran the timeit:

def timing(x, number=10):
    implicit = timeit.timeit('a(%s)' % int(x), 'from __main__ import a', number=number)
    explicit = timeit.timeit('b(%s)' % int(x), 'from __main__ import b', number=number)
    return (implicit, explicit)

def plot_timings(*args, **kwargs):
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    x_vector = np.linspace(*args, **kwargs)
    timings = np.vectorize(timing)(x_vector)
    ax.plot(x_vector, timings[0], 'b--')
    ax.plot(x_vector, timings[1], 'r--')
    ax.set_yscale('log')
    plt.show()

plot_timings(1, 1000000, 20)

I get a HUGE difference between the two methods as shown below:

<code>a</code> is in blue, <code>b</code> is in red

Where a is in blue, and b is in red.

Why is the difference so huge? It looks the explicit for loop version is also growing logarithmically, while the implicit version is doing nothing (as it should).

Any thoughts?

2
  • what do the axes mean? Commented Feb 21, 2014 at 2:27
  • 1
    I think you have it backwards in a couple of your statements? The lambda version is the one growing logarithmically. Commented Feb 21, 2014 at 2:32

2 Answers 2

2

The difference is caused by range

a needs to call range when you construct it.
b doesn't need to call range until the first iteration

>>> def myrange(n):
...     print "myrange(%s)"%n
...     return range(n)
... 
>>> a = lambda x: (i for i in myrange(x))
>>> def b(x):
...     for i in myrange(x):
...         yield i
... 
>>> a(100)
myrange(100)
range(100)
<generator object <genexpr> at 0xd62d70>
>>> b(100)
<generator object b at 0xdadb90>
>>> next(_)   # <-- first iteration of b(100)
myrange(100)
range(100)
0
Sign up to request clarification or add additional context in comments.

Comments

0

The lambda call is the slow one. Check this out:

import cProfile

a = lambda x: (i for i in range(x))

def b(x):
    for i in range(x):
        yield i

def c(x):
    for i in xrange(x):
        yield i

def d(x):
    i = 0
    while i < x:
        yield i
        i += 1


N = 100000
print " -- a --"
cProfile.run("""
for x in xrange(%i):
    a(x)
""" % N)

print " -- b --"
cProfile.run("""
for x in xrange(%i):
    b(x)
""" % N)

print " -- c --"
cProfile.run("""
for x in xrange(%i):
    c(x)
""" % N)

print " -- d --"
cProfile.run("""
for x in xrange(%i):
    d(x)
""" % N)

print " -- a (again) --"
cProfile.run("""
for x in xrange(%i):
    a(x)
""" % N)

Gives me the following results:

 -- a --
         300002 function calls in 61.764 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   30.881   30.881   61.764   61.764 <string>:3(<module>)
   100000    0.051    0.000    0.051    0.000 test.py:5(<genexpr>)
   100000    0.247    0.000   30.832    0.000 test.py:5(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000   30.585    0.000   30.585    0.000 {range}


 -- b --
         100002 function calls in 0.076 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.066    0.066    0.076    0.076 <string>:3(<module>)
   100000    0.010    0.000    0.010    0.000 test.py:7(b)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


 -- c --
         100002 function calls in 0.075 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.065    0.065    0.075    0.075 <string>:3(<module>)
   100000    0.010    0.000    0.010    0.000 test.py:11(c)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


 -- d --
         100002 function calls in 0.075 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.065    0.065    0.075    0.075 <string>:3(<module>)
   100000    0.010    0.000    0.010    0.000 test.py:15(d)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


 -- a (again) --
         300002 function calls in 60.890 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   30.487   30.487   60.890   60.890 <string>:3(<module>)
   100000    0.049    0.000    0.049    0.000 test.py:5(<genexpr>)
   100000    0.237    0.000   30.355    0.000 test.py:5(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000   30.118    0.000   30.118    0.000 {range}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.