2

I am attempting to get all the distances between two two lists using two for loops and an outer while loop that continues until the first list is empty. I need to obtain a list of the sum of all the individual distances from items i in list1 and items j in list2. After each iteration the first item is removed from the first list and added to the second list. This results in a number of redundant calculations.

list1 = [1, 2, 3]

list2 = [5, 6]

while len(list1) != 0:
    distances = []

    for i in list1:
        distance = 0

        for j in list2:
            distance += abs(i - j)

        distances.append(distance)
    print(distances)

    first = list1.pop(0)
    list2.append(first)

The output I am getting is correct I am just looking for away to potentially speed up this process and remove these redundant calculations, potentially using memoization but I am unsure how to achieve this as I would run into issues such as (1,3) != (3,1).

The only loop that is pertinent is the outer while.

Expected output:

[9, 7, 5]
[8, 7]
[8]
0

1 Answer 1

1

You might simplify and reduce calculations if you precomputed the values. Try this:

import collections

list1 = [1, 2, 3]
list2 = [5, 6]

## -------------------------
## precompute the distances between any two values
## -------------------------
set1 = set(list1 + list2)
lookup = collections.defaultdict(lambda: collections.defaultdict(int))
for (x, y) in ((x, y) for x in set1 for y in set1 if x != y):
    lookup[x][y] = abs(x - y)
## -------------------------

while list1:
    distances = [sum(lookup[x][y] for y in list2 if y != x) for x in list1]
    print(distances)
    list2.append(list1.pop(0))

Note: Sometimes comprehensions can be a little intimidating. It might not be obvious. This should do the same thing as is done in the more complicated one above.

## -------------------------
## loop through the unique (via set()) elements of our combined list (1,2,3,5,6)
for x in (x for x in set1):
    ## loop through them again for the current value from the outer loop
    ## but only when the value from the inner loop is not the same as the outer loop
    ## if x=y we don't need to remember the abs value of that.
    for y in (y for y in set1 if x != y):
        ## store the value in a nested default dictionary for future lookup
        ## defaultdict makes this much easier to figure out how to do.
        lookup[x][y] = abs(x - y)
## -------------------------

As an additional step, you might cut the lookup size down by only computing the abs() with y < x, though you would then need to account for min/max during the use of lookup. Maybe like:

import collections

list1 = [1, 2, 3]
list2 = [5, 6]

## -------------------------
## precompute the distances between any two values
## -------------------------
set1 = set(list1 + list2)
lookup = collections.defaultdict(lambda: collections.defaultdict(int))
for (x, y) in ((x, y) for x in set1 for y in set1 if x > y):
    lookup[x][y] = abs(x - y)
## -------------------------

while list1:
    distances = [
        sum(lookup[max(x,y)][min(x,y)] for y in list2 if y != x)
        for x in list1
    ]
    print(distances)
    list2.append(list1.pop(0))
Sign up to request clarification or add additional context in comments.

5 Comments

thank you for this. Would you be able to explain this line "for (x, y) in ((x, y) for x in set1 for y in set1 if x != y):"
sure thing. I'll comment it in the code one sec
Thank you I am just not overly familiar with using multiple for loops on one line.
Replacing a simple subtraction by two lookups by index (and two function calls for min and max, plus a test) is much slower than a simple subtraction. Timing that with longer lists, I get 8.67 ms for the naive solution with subtractions, vs 2.12 s for your last code, which is 244 times slower. (this is already about 15 times slower for the small lists used in the question).
@ThierryLathuille The second version is a space rather than time saver over the first. If I initialize both lists via [random.randint(0, 10) for x in range(1000)] then the original author's code takes 97 seconds on my laptop. The first version of the lookup takes 50 seconds and the simple abs(x-y) takes 62 seconds. So the lookup is the fastest for that shape of input. If I use [random.randint(0, 1000) for x in range(1000)] the lookup is still faster than the author's code but not as fast as the simple abs(). The shape of the data and cost of the calculations should be factored in.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.