I am working on code that will minimize the number of ingredients necessary to make some dishes.

Each dish can be prepared in an arbitrarily large number of ways, each a combination of two ingredients. The result is a tuple of recipes, one pair per dish.

I have code that works, but with a large number of items it is very slow and causes memory errors in Python 3.9 on a laptop with 32 GB of RAM.

Here's a simple example, with 3 dishes:

import itertools as it

# Each dish lists the ingredient pairs that can produce it.
salty_egg = (("egg", "salt"), ("egg", "soy sauce"), ("egg", "mountain salt"))
meat_starter = (("egg", "ham"), ("carrot", "ham"), ("ham", "salt"))
soy_carrot = (("carrot", "soy sauce"), ("pink carrot", "soy sauce"))
recipes = (salty_egg, meat_starter, soy_carrot)

# For every way of choosing one pair per dish, count the distinct ingredients,
# then keep the combination with the smallest count.
recipe_and_sizes = [(len(set(it.chain(*x))), x) for x in it.product(*recipes)]
recipes = min(recipe_and_sizes)[1]
print(recipes)
        
(('egg', 'soy sauce'), ('carrot', 'ham'), ('carrot', 'soy sauce'))

Any idea how to speed this up or maybe another approach to solve the problem?

1 Answer

With such a small input dataset, it's already fast and I wouldn't worry about either algorithmic optimisation or micro-optimisation. I do think you should unravel your generator expression into a generator function with named variables, to make things easier to understand and to debug:

from itertools import product  # , chain
from typing import Iterator


def make_size_and_recipes() -> Iterator[tuple[int, tuple[str, ...]]]:
    salty_egg = (("egg", "salt"), ("egg", "soy sauce"), ("egg", "mountain salt"))
    meat_starter = (("egg", "ham"), ("carrot", "ham"), ("ham", "salt"))
    soy_carrot = (("carrot", "soy sauce"), ("pink carrot", "soy sauce"))
    recipes = (salty_egg, meat_starter, soy_carrot)

    for recipe in product(*recipes):
        # flat_ingredients = chain(*recipe)
        # unique = set(flat_ingredients)
        ingr_a, ingr_b, ingr_c = recipe
        unique = {*ingr_a, *ingr_b, *ingr_c}
        n_unique = len(unique)
        pair = n_unique, recipe
        yield pair


recipe_and_sizes = make_size_and_recipes()
best_size, best_recipe = min(recipe_and_sizes)
print(best_recipe)

The one critical change is that recipe_and_sizes is no longer a list comprehension. A list comprehension materialises every combination and will eat memory; a bare generator will not.
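To make that concrete, here is a rough sketch on a toy-sized input (the dishes variable and its contents are made up for illustration; the only point is that the generator's footprint does not grow with the number of combinations):

import itertools as it
import sys

dishes = (
    (("egg", "salt"), ("egg", "soy sauce")),
    (("egg", "ham"), ("carrot", "ham")),
)

# The list comprehension materialises every combination up front...
as_list = [(len(set(it.chain(*x))), x) for x in it.product(*dishes)]
# ...while the generator expression yields one combination at a time.
as_gen = ((len(set(it.chain(*x))), x) for x in it.product(*dishes))

print(sys.getsizeof(as_list))  # grows with the number of combinations
print(sys.getsizeof(as_gen))   # small and constant
print(min(as_gen)[1])          # min() consumes the generator lazily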

Also shown above: chain() can potentially be replaced with a simpler, hard-coded set literal built from the unpacked tuples.

I'm sure there are better algorithms to address this, but again, with such small data it doesn't matter; legibility and confidence in correctness are more important.
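If a much larger dataset ever does matter, one cheap direction, shown here only as an untested sketch (the greedy_pick name and the approach are my own suggestion, not taken from your code), is a greedy pass that, for each dish, keeps the pair adding the fewest new ingredients to those already chosen. It is linear in the input size but not guaranteed to find the true minimum:

def greedy_pick(recipes):
    """Greedy heuristic: for each dish, keep the pair that adds the fewest
    new ingredients to the pantry built so far. Order-dependent and not
    guaranteed optimal, but it never enumerates the full product."""
    chosen = []
    pantry = set()
    for dish in recipes:
        best = min(dish, key=lambda pair: len(set(pair) - pantry))
        chosen.append(best)
        pantry.update(best)
    return tuple(chosen)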

  • Thank you. My dataset is unfortunately not as small as in the example, it is much bigger. I have 301 elements in the variable "recipes" and each element has between 2 and 60 pairs of ingredients. With this size, it causes memory errors and it is very slow. Would your solution at least avoid the memory error? Commented Jul 4, 2022 at 20:44
  • Yeah probably. Give it a shot. Commented Jul 4, 2022 at 20:56
  • @adrienlucca.net I asked on your behalf: superset coverage question Commented Jul 4, 2022 at 21:12
