
These two questions concern using import inside a function versus at the top of a module. I do not need to be convinced to put my imports at the top; there are good reasons to do that. However, to better understand the technical issues, I would like to ask a follow-up.

Can you get the best of both worlds performance-wise by using a closure and only importing on first run of the function?


Specifically, suppose you have code such as:

import sys
def get_version():
    return sys.version

You want the import to only happen if the function ever gets called, so you move it inside:

def get_version():
    import sys
    return sys.version

But now it is slow if it does get called a lot, so you try something more complex:

def _get_version():
    import sys

    def nested():
        return sys.version

    global get_version
    get_version = nested
    return nested()
get_version = _get_version

Now at least a basic performance test indicates that this last option is slightly slower than the first (taking ~110% as long), but much faster than the second (taking ~20% as long).


First, does this actually work? Do my measurements accurately show that the second example does more work, or is that an artifact of how I measured things?

Second, is there a slowdown from the closure – beyond the first time the function is run?

  • You can use the lru_cache decorator from functools if it is a pure function. Not sure if it's a great idea or not, but it would cache the result of the function after the first time it's run. Commented Jan 5, 2017 at 15:34
  • @sytech: The lru_cache cache test is quite heavy, I doubt it'll give much of a performance boost for something as lightweight as an import (which, past the initial file load, is nothing more than binding a name). Commented Jan 5, 2017 at 15:39

1 Answer


Closure dereferencing is not any faster than global lookups:

>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=6, micro=0, releaselevel='final', serial=0)
>>> from timeit import timeit
>>> glob = 'foo'
>>> def f1(): return glob
...
>>> def closure():
...     closed_over = 'bar'
...     def f2():
...         return closed_over
...     return f2
...
>>> f2 = closure()
>>> timeit(f1, number=10**7)
0.8623221110319719
>>> timeit(f2, number=10**7)
0.872071701916866

In addition, even if it were faster, the tradeoff against readability would not be worth it, certainly not when faster options are available for when you really need to optimise code.

Locals are always the fastest option. If you really need to optimise code called from a tight loop, the proper hybrid is using function argument defaults:

import sys

# The default is evaluated once, at definition time, so each call reads a fast local.
def get_version(_sys_version=sys.version):
    return _sys_version
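To see why the three kinds of name lookup differ, you can inspect the bytecode with the dis module. This is only a sketch: the function names are illustrative, and the exact opcode names vary between CPython versions, so treat the printed lists as indicative rather than fixed.

```python
import dis

glob = "foo"


def read_global():
    # Reads a module-level name.
    return glob


def make_closure():
    closed_over = "bar"

    def read_closure():
        # Reads a name from the enclosing function's scope.
        return closed_over

    return read_closure


def read_local(_value="baz"):
    # Reads a local that was bound from the argument default.
    return _value


def load_ops(func):
    # Collect the names of the LOAD_* instructions in the function's bytecode.
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("LOAD_")
    ]


for func in (read_global, make_closure(), read_local):
    print(func.__name__, load_ops(func))
```

On CPython the global read shows up as a LOAD_GLOBAL instruction (a dictionary lookup), the closure read as LOAD_DEREF (a cell dereference), and the local read as a LOAD_FAST-style instruction (an indexed slot access), which is consistent with the timings above.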

If you are concerned with the impact of the initial file load from an import at startup time, perhaps you should look at the py-demandimport project instead, which postpones loading modules until the first time they are used.
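The standard library also has a comparable mechanism, importlib.util.LazyLoader (available since Python 3.5). The sketch below follows the lazy-import recipe from the importlib documentation; it is not py-demandimport's API, and the helper name lazy_import and the choice of colorsys as the example module are assumptions made for illustration.

```python
import importlib.util
import sys


def lazy_import(name):
    # Build a module object whose body is only executed on first attribute access.
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # Sets up the deferral; does not run the module yet.
    return module


# Example module chosen for illustration only.
colorsys = lazy_import("colorsys")


def red_as_hsv():
    # The first attribute access on `colorsys` triggers the real load.
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

With this approach the import statement stays at the top of the module, and the cost of loading is paid only if some code path actually touches the module. It works only for modules whose loader supports exec_module, so it may not apply to every import.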


2 Comments

Yes, like I wrote in the question my measurements indicated it was slightly slower. The point was to optimize speed while retaining the only-import-if-called property which your function argument example lacks.
This was just hypothetical, but that py-demandimport project seems like something I could actually use. Thanks.
