1

I've tried as much as possible to find a suitable solution/answer on the forum but I'm pulling blanks - most probably due to incorrect terminology.

I currently have the following block of code that performs very poorly when performing the below operations on the array areas[] - the size of the array is upto 150,000 elements.

areas = [x .. x1]
prices = [y .. y1]
for key, value in enumerate(areas):
    compListPrices = [prices[ind] for ind, x in enumerate(areas) if x == value and ind != key]
    # Further operations using compListPrices here

From my understanding, the execution of this code could be improved considerably by leveraging numpy, but I'm struggling to convert it.

5
  • 1
    numpy.org/doc/stable/reference/generated/numpy.array.html you are going to want to be using numpy arrays Commented Jun 5, 2022 at 22:51
  • Yes indeed - I'm not sure if just switching to numpy arrays is enough. It's more the syntax around quick querying I'm struggling with. Commented Jun 5, 2022 at 22:58
  • Syntax is important. Your are using key which is typically used for dictionaries to call the index Commented Jun 5, 2022 at 23:12
  • I would recommend to read a tutorial on numpy or pandas. Commented Jun 5, 2022 at 23:13
  • It's not really clear what you are trying to accomplish. Part of this is because you didn't create a concrete example. It looks like you are trying to process groups of things related by area, but you have creates an O(n**2) implementation which means you will look at 22.5 billion items instead of 150k items, which as you noticed is slow. You can probably make a lookup dictionary instead of Numpy, which will allow you to efficiently group by area. But it's hard to help since we don't have a reproducible example with a desired result. Commented Jun 5, 2022 at 23:22

2 Answers 2

1

Your implementation complexity is O(n²)

I would recommend using zip: O(n) then sort: O(n*log(n)) and a simple loop O(n) resulting in O(n*log(n)) complexity

Even though I'm not using numpy, a faster algorithm is usually faster than just move the logic to numpy

Here is an working example: https://abstra.show/yMFZqiUt8D

Sign up to request clarification or add additional context in comments.

Comments

0

You can do what you want in O(n). Like below:

>>> areas  = [1,2,3,4,1,1,3,2]
>>> prices = [100,10,20,50,30,40,20,20]

# zipping areas with thier indexs
>>> lst = list(zip(areas , range(len(areas))))
>>> lst
[(1, 0), (2, 1), (3, 2), (4, 3), (1, 4), (1, 5), (3, 6), (2, 7)]

# save each values exist in each indexs
>>> dct = {}
>>> for l in lst:
....    dct.setdefault(l[0], []).append(l[1])
>>> dct
{1: [0, 4, 5], 2: [1, 7], 3: [2, 6], 4: [3]}

# find prices from their indexs
>>> res = [[] for _ in range(len(areas))]
>>> for k,v in dct.items():
....    for i in v:
....        res[i] = [prices[j] for j in v if i!=j]
>>> res
[[30, 40], [20], [20], [], [100, 40], [100, 30], [20], [10]]

Benchmark on Colab:

from numpy.random import randint
areas = randint(0, 10, 10_000)
prices = randint(0, 1000, 10_000)

def method_Npow2(areas, prices):
    res = []
    for key, value in enumerate(areas):
        res.append([prices[ind] for ind, x in enumerate(areas) if x == value and ind != key])
    return res

def method_Npow1(areas, prices):
    lst = list(zip(areas , range(len(areas))))
    dct = {}
    for l in lst:
        dct.setdefault(l[0], []).append(l[1])

    res = [[] for _ in range(len(areas))]
    for k,v in dct.items():
        for i in v:
            res[i] = [prices[j] for j in v if i!=j]    
    return res

Output:

%timeit method_Npow2(areas, prices)
# 1 loop, best of 5: 13.9 s per loop

%timeit method_Npow1(areas, prices)
# 1 loop, best of 5: 1.76 s per loop

m1 = method_Npow2(areas, prices)
m2 = method_Npow1(areas, prices)
np.all(np.asarray(m1) == np.asarray(m2))
# True

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.