Caching in python using *args and lambda functions

Question

I recently attempted Googles foo.bar challenge. After my time was up I decided to try find a solution to the problem I couldn't do and found a solution here (includes the problem statement if you're interested). I'd previously been making a dictionary for every function I wanted to cache but it looks like in this solution any function/input can be cached using the same syntax.

Firstly I'm confused on how the code is even working, the *args variable isn't inputted as an argument (and prints to nothing). Heres an modified minimal example to illustrate my confusion:

mem = {}

def memoize(key, func, *args):
    """
    Helper to memoize the output of a function
    """

    print(args)

    if key not in mem:
        # store the output of the function in memory
        mem[key] = func(*args)

    return mem[key]

def example(n):
    return memoize(
        n,
        lambda: longrun(n),
    )

def example2(n):
     return memoize(
        n,
        longrun(n),
     )

def longrun(n):
    for i in range(10000):
        for j in range(100000):
            2**10
    return n

Here I use the same memoize function but with a print. The function example returns memoize(n, a lambda function,). The function longrun is just an identity function with lots of useless computation so it's easy to see if the cache is working (example(2) will take ~5 seconds the first time and be almost instant after).

Here are my confusions:

Why is the third argument of memoize empty? When args is printed in memoize it prints (). Yet somehow mem[key] stores func(*args) as func(key)?
Why does this behavior only work when using the lambda function (example will cache but example2 won't)? I thought lambda: longrun(n) is just a short way of giving as input a function which returns longrun(n).

As a bonus, does anyone know how you could memoize functions using a decorator?

Also I couldn't think of a more descriptive title, edits welcome. Thanks.

Have a look at docs.python.org/3/library/functools.html#functools.lru_cache — janbrohl
– janbrohl, Commented Sep 23, 2016 at 10:55
Please see the docs and use search. The *args notation provides variadic arguments. As you don't provide any arguments, *args is empty. example2 doesn't work, since you don't provide it a function, you provide the result of calling a function. It should read memoize(n, longrun, n). — MisterMiyagi
– MisterMiyagi, Commented Sep 23, 2016 at 10:56
Thanks @janbrohl, exactly what I was after regarding the decorator! — HBeel
– HBeel, Commented Sep 23, 2016 at 11:34

MisterMiyagi · Accepted Answer · 2016-09-23 13:03:07Z

2

The notation *args stands for a variable number of positional arguments. For example, print can be used as print(1), print(1, 2), print(1, 2, 3) and so on. Similarly, **kwargs stands for a variable number of keyword arguments.

Note that the names args and kwargs are just a convention - it's the * and ** symbols that make them variadic.

Anyways, memoize uses this to accept basically any input to func. If the result of func isn't cached, it's called with the arguments. In a function call, *args is basically the reverse of *args in a function definition. For example, the following are equivalent:

# provide *args explicitly
print(1, 2, 3)
# unpack iterable to *args
arguments = 1, 2, 3
print(*arguments)

If args is empty, then calling print(*args) is the same as calling print() - no arguments are passed to it.

Functions and lambda functions are the same in python. It's simply a different notation for creating a function object.

The problem is that in example2, you are not passing a function. You call a function, then pass on its result. Instead, you have to pass on the function and its argument separately.

def example2(n):
    return memoize(
        n,
        longrun,  # no () means no call, just the function object
        # all following parameters are put into *args
        n
    )

Now, some implementation details: why is args empty and why is there a separate key?

The empty args comes from your definition of the lambda. Let's write that as a function for clarity:
```
def example3(n):
    def nonlambda():
        return longrun(n)
    return memoize(n, nonlambda)
```
Note how nonlambda takes no arguments. The parameter n is bound from the containing scope as a closure, bound from the containing scope. As such, you don't have to pass it to memoize - it is already bound inside the nonlambda. Thus, args is empty in memoize, even though longrun does receive a parameter, because the two don't interact directly.
Now, why is it mem[key] = f(*args), not mem[key] = f(key)? That's actually slightly the wrong question; the right question is "why isn't it mem[f, args] = f(*args)?".

Memoization works because the same input to the same function leads to the same output. That is, f, args identifies your output. Ideally, your key would be f, args as that's the only relevant information.

The problem is you need a way to look up f and args inside mem. If you ever tried putting a list inside a dict, you know there are some types which don't work in mappings (or any other suitable lookup structure, for that matter). So if you define key = f, args, you cannot memoize functions taking mutable/unhashable types. Python's functools.lru_cache actually has this limitation.

Defining an explicit key is one way of solving this problem. It has the advantage that the caller can select an appropriate key, for example taking n without any modifications. This offers the best optimization potential. However, it breaks easily - using just n misses out the actual function called. Memoizing a second function with the same input would break your cache.

There are alternative approaches, each with pros and cons. Common is the explicit conversion of types: list to tuple, set to frozenset, and so on. This is slow, but the most precise. Another approach is to just call str or repr as in key = repr((f, args, sorted(kwargs.items()))), but it relies on every value having a proper repr.

edited Sep 23, 2016 at 13:03

answered Sep 23, 2016 at 11:02

MisterMiyagi

53.4k14 gold badges131 silver badges138 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

HBeel Over a year ago

Thanks for your answer. I understand what *args means, my confusion comes from the fact that when example is called it's given *args as empty. The dictionary should be storing mem[key] = f(key) but instead it looks like its storing mem[key] = f() (which shouldn't even work with longrun). I would have expected to have seen def example(n): return memoize( n, lambda: longrun(n), n ) (like how you have demonstrated example2, which works without the last argument too. This is my question, how does this still work without the last argument?)

HBeel Over a year ago

Thanks for pointing out the problem in example2, I missed that!

MisterMiyagi Over a year ago

@HBeel I've added an explanation for f() and key. In short, example does not pass on longrun (which needs a parameter) but the lambda function (which does not need a parameter).

HBeel Over a year ago

Thanks for the additional information @MisterMiyagi. If I understand correctly example is combining a function and its arguments into a single function (which takes no arguments); something like f_n() = f(n)? Your second point also makes a lot of sense, thanks. For this particular problem, why did the author of this code even both with *args if it's always passed empty? Perhaps a general piece of code they use often, sometimes using extra arguments?

Collectives™ on Stack Overflow

Caching in python using *args and lambda functions

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related