Speed up python looping with numpy

Question

I've tried as much as possible to find a suitable solution/answer on the forum but I'm pulling blanks - most probably due to incorrect terminology.

I currently have the following block of code that performs very poorly when performing the below operations on the array areas[] - the size of the array is upto 150,000 elements.

areas = [x .. x1]
prices = [y .. y1]
for key, value in enumerate(areas):
    compListPrices = [prices[ind] for ind, x in enumerate(areas) if x == value and ind != key]
    # Further operations using compListPrices here

From my understanding, the execution of this code could be improved considerably by leveraging numpy, but I'm struggling to convert it.

numpy.org/doc/stable/reference/generated/numpy.array.html you are going to want to be using numpy arrays — Andrew Ryan
– Andrew Ryan, Commented Jun 5, 2022 at 22:51
Yes indeed - I'm not sure if just switching to numpy arrays is enough. It's more the syntax around quick querying I'm struggling with. — Jack
– Jack, Commented Jun 5, 2022 at 22:58
Syntax is important. Your are using key which is typically used for dictionaries to call the index — Zaero Divide
– Zaero Divide, Commented Jun 5, 2022 at 23:12
It's not really clear what you are trying to accomplish. Part of this is because you didn't create a concrete example. It looks like you are trying to process groups of things related by area, but you have creates an O(n**2) implementation which means you will look at 22.5 billion items instead of 150k items, which as you noticed is slow. You can probably make a lookup dictionary instead of Numpy, which will allow you to efficiently group by area. But it's hard to help since we don't have a reproducible example with a desired result. — Mark
– Mark, Commented Jun 5, 2022 at 23:22

Bruno vieira costa · Accepted Answer · 2022-06-05 23:24:23Z

1

Your implementation complexity is O(n²)

I would recommend using zip: O(n) then sort: O(n*log(n)) and a simple loop O(n) resulting in O(n*log(n)) complexity

Even though I'm not using numpy, a faster algorithm is usually faster than just move the logic to numpy

Here is an working example: https://abstra.show/yMFZqiUt8D

answered Jun 5, 2022 at 23:24

Bruno vieira costa

687 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Mahdi F. · Accepted Answer · 2022-06-06 13:32:51Z

You can do what you want in O(n). Like below:

>>> areas  = [1,2,3,4,1,1,3,2]
>>> prices = [100,10,20,50,30,40,20,20]

# zipping areas with thier indexs
>>> lst = list(zip(areas , range(len(areas))))
>>> lst
[(1, 0), (2, 1), (3, 2), (4, 3), (1, 4), (1, 5), (3, 6), (2, 7)]

# save each values exist in each indexs
>>> dct = {}
>>> for l in lst:
....    dct.setdefault(l[0], []).append(l[1])
>>> dct
{1: [0, 4, 5], 2: [1, 7], 3: [2, 6], 4: [3]}

# find prices from their indexs
>>> res = [[] for _ in range(len(areas))]
>>> for k,v in dct.items():
....    for i in v:
....        res[i] = [prices[j] for j in v if i!=j]
>>> res
[[30, 40], [20], [20], [], [100, 40], [100, 30], [20], [10]]

Benchmark on Colab:

from numpy.random import randint
areas = randint(0, 10, 10_000)
prices = randint(0, 1000, 10_000)

def method_Npow2(areas, prices):
    res = []
    for key, value in enumerate(areas):
        res.append([prices[ind] for ind, x in enumerate(areas) if x == value and ind != key])
    return res

def method_Npow1(areas, prices):
    lst = list(zip(areas , range(len(areas))))
    dct = {}
    for l in lst:
        dct.setdefault(l[0], []).append(l[1])

    res = [[] for _ in range(len(areas))]
    for k,v in dct.items():
        for i in v:
            res[i] = [prices[j] for j in v if i!=j]    
    return res

Output:

%timeit method_Npow2(areas, prices)
# 1 loop, best of 5: 13.9 s per loop

%timeit method_Npow1(areas, prices)
# 1 loop, best of 5: 1.76 s per loop

m1 = method_Npow2(areas, prices)
m2 = method_Npow1(areas, prices)
np.all(np.asarray(m1) == np.asarray(m2))
# True

Collectives™ on Stack Overflow

Speed up python looping with numpy

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related