Removing redundancy with multiple loops in Python

Question

I am attempting to get all the distances between two two lists using two for loops and an outer while loop that continues until the first list is empty. I need to obtain a list of the sum of all the individual distances from items i in list1 and items j in list2. After each iteration the first item is removed from the first list and added to the second list. This results in a number of redundant calculations.

list1 = [1, 2, 3]

list2 = [5, 6]

while len(list1) != 0:
    distances = []

    for i in list1:
        distance = 0

        for j in list2:
            distance += abs(i - j)

        distances.append(distance)
    print(distances)

    first = list1.pop(0)
    list2.append(first)

The output I am getting is correct I am just looking for away to potentially speed up this process and remove these redundant calculations, potentially using memoization but I am unsure how to achieve this as I would run into issues such as (1,3) != (3,1).

The only loop that is pertinent is the outer while.

Expected output:

[9, 7, 5]
[8, 7]
[8]

JonSG · Accepted Answer · 2021-04-27 17:00:38Z

1

You might simplify and reduce calculations if you precomputed the values. Try this:

import collections

list1 = [1, 2, 3]
list2 = [5, 6]

## -------------------------
## precompute the distances between any two values
## -------------------------
set1 = set(list1 + list2)
lookup = collections.defaultdict(lambda: collections.defaultdict(int))
for (x, y) in ((x, y) for x in set1 for y in set1 if x != y):
    lookup[x][y] = abs(x - y)
## -------------------------

while list1:
    distances = [sum(lookup[x][y] for y in list2 if y != x) for x in list1]
    print(distances)
    list2.append(list1.pop(0))

Note: Sometimes comprehensions can be a little intimidating. It might not be obvious. This should do the same thing as is done in the more complicated one above.

## -------------------------
## loop through the unique (via set()) elements of our combined list (1,2,3,5,6)
for x in (x for x in set1):
    ## loop through them again for the current value from the outer loop
    ## but only when the value from the inner loop is not the same as the outer loop
    ## if x=y we don't need to remember the abs value of that.
    for y in (y for y in set1 if x != y):
        ## store the value in a nested default dictionary for future lookup
        ## defaultdict makes this much easier to figure out how to do.
        lookup[x][y] = abs(x - y)
## -------------------------

As an additional step, you might cut the lookup size down by only computing the abs() with y < x, though you would then need to account for min/max during the use of lookup. Maybe like:

import collections

list1 = [1, 2, 3]
list2 = [5, 6]

## -------------------------
## precompute the distances between any two values
## -------------------------
set1 = set(list1 + list2)
lookup = collections.defaultdict(lambda: collections.defaultdict(int))
for (x, y) in ((x, y) for x in set1 for y in set1 if x > y):
    lookup[x][y] = abs(x - y)
## -------------------------

while list1:
    distances = [
        sum(lookup[max(x,y)][min(x,y)] for y in list2 if y != x)
        for x in list1
    ]
    print(distances)
    list2.append(list1.pop(0))

edited Apr 27, 2021 at 17:00

answered Apr 27, 2021 at 16:12

JonSG

13.6k2 gold badges32 silver badges48 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

steve2020 Over a year ago

thank you for this. Would you be able to explain this line "for (x, y) in ((x, y) for x in set1 for y in set1 if x != y):"

JonSG Over a year ago

sure thing. I'll comment it in the code one sec

steve2020 Over a year ago

Thank you I am just not overly familiar with using multiple for loops on one line.

Thierry Lathuille Over a year ago

Replacing a simple subtraction by two lookups by index (and two function calls for min and max, plus a test) is much slower than a simple subtraction. Timing that with longer lists, I get 8.67 ms for the naive solution with subtractions, vs 2.12 s for your last code, which is 244 times slower. (this is already about 15 times slower for the small lists used in the question).

JonSG Over a year ago

@ThierryLathuille The second version is a space rather than time saver over the first. If I initialize both lists via [random.randint(0, 10) for x in range(1000)] then the original author's code takes 97 seconds on my laptop. The first version of the lookup takes 50 seconds and the simple abs(x-y) takes 62 seconds. So the lookup is the fastest for that shape of input. If I use [random.randint(0, 1000) for x in range(1000)] the lookup is still faster than the author's code but not as fast as the simple abs(). The shape of the data and cost of the calculations should be factored in.

Collectives™ on Stack Overflow

Removing redundancy with multiple loops in Python

1 Answer 1

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related