16

The normal ways to map a function over a numpy.ndarray, such as np.array(list(map(some_func, x))) or np.vectorize(f)(x), can't provide an index. The following code is a simple example of the kind of loop that shows up in many applications.

dis_mat = np.zeros([feature_mat.shape[0], feature_mat.shape[0]])

for i in range(feature_mat.shape[0]):
    for j in range(i, feature_mat.shape[0]):
        dis_mat[i, j] = np.linalg.norm(
            feature_mat[i, :] - feature_mat[j, :]
        )
        dis_mat[j, i] = dis_mat[i, j]

Is there a way to speed it up?


Thank you for your help! The quickest way to speed up this code turned out to be the function @user2357112 pointed out in the comments:

    from scipy.spatial.distance import pdist, squareform
    dis_mat = squareform(pdist(feature_mat))

@Julien's method is also good if feature_mat is small, but when feature_mat is 1000 by 2000 it needs nearly 40 GB of memory.
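
(Rough arithmetic, assuming float64: the broadcast difference feature_mat[:, None] - feature_mat has shape (1000, 1000, 2000), about 2 × 10^9 values ≈ 16 GB, and the squaring inside norm needs a temporary of roughly the same size, which is how the total climbs toward 40 GB.)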

10
  • 2
    Are you looking for this? Commented Nov 30, 2017 at 4:52
  • 1
    @Marco. numba sounds like overkill. This looks like it can be vectorized directly, or at least that's my hunch. My gut says to look at ufunc.outer. Commented Nov 30, 2017 at 4:53
  • 2
    As an aside, the "normal way" to apply a function over an array should not be map or vectorize; those should be reserved for cases where it's just not possible to apply more efficient methods, or where you have to get something written fast and the performance hit won't be a problem. Commented Nov 30, 2017 at 5:09
  • 3
    user167122, the etiquette here is that you don't put answers into your question. Would you consider reverting your last edit to the question (revision 5)? Commented Nov 30, 2017 at 9:59
  • 1
    If you consider user2357112's solution best, why did you accept Julien's answer? Commented Nov 30, 2017 at 20:31

6 Answers

19

SciPy comes with a function specifically to compute the kind of pairwise distances you're computing. It's scipy.spatial.distance.pdist, and it produces the distances in a condensed format that basically only stores the upper triangle of the distance matrix, but you can convert the result to square form with scipy.spatial.distance.squareform:

from scipy.spatial.distance import pdist, squareform

distance_matrix = squareform(pdist(feature_mat))

This has the benefit of avoiding the giant intermediate arrays a direct broadcasting solution requires, so it's faster and works on larger inputs. It does lose the timing race to the approach below that uses algebraic manipulation to let dot handle the heavy lifting, though.

pdist also supports a wide variety of alternate distance metrics, if you decide you want something other than Euclidean distance.

# Manhattan distance!
distance_matrix = squareform(pdist(feature_mat, 'cityblock'))

# Cosine distance!
distance_matrix = squareform(pdist(feature_mat, 'cosine'))

# Correlation distance!
distance_matrix = squareform(pdist(feature_mat, 'correlation'))

# And more! Check out the docs.
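
As an aside (not from the original answer), the condensed output of pdist is itself often all you need; it is a 1-D array holding the n*(n-1)/2 upper-triangle entries, which squareform then expands:

condensed = pdist(feature_mat)   # shape: (n*(n-1)//2,)
n = feature_mat.shape[0]
assert condensed.shape[0] == n * (n - 1) // 2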


14

You can create a new axis and broadcast:

dis_mat = np.linalg.norm(feature_mat[:,None] - feature_mat, axis=-1)

Timing:

feature_mat = np.random.rand(100,200)

def a():
    dis_mat = np.zeros([feature_mat.shape[0], feature_mat.shape[0]])
    for i in range(feature_mat.shape[0]):
        for j in range(i, feature_mat.shape[0]):
            dis_mat[i, j] = np.linalg.norm(
                feature_mat[i, :] - feature_mat[j, :]
            )
            dis_mat[j, i] = dis_mat[i, j]

def b():
    dis_mat = np.linalg.norm(feature_mat[:,None] - feature_mat, axis=-1)



%timeit a()
100 loops, best of 3: 20.5 ms per loop

%timeit b()
100 loops, best of 3: 11.8 ms per loop

3 Comments

feature_mat[:,None] - feature_mat can be huge, though. You're likely to blow your RAM on reasonably large inputs.
The usual dilemma between speed and memory... :)
Thanks for your advice. The total time dropped from 6363 ms to 967 ms, and it's really easy to understand and read.
8

Factor out what can be precomputed, and let np.dot do the heavy lifting on a k x k matrix, using only O(k^2) extra memory:

def c(m):
    xy = np.dot(m, m.T)                  # O(k^3): Gram matrix, handled by BLAS
    x2 = y2 = (m * m).sum(1)             # O(k^2): squared row norms
    d2 = np.add.outer(x2, y2) - 2 * xy   # O(k^2): ||x_i||^2 + ||x_j||^2 - 2 x_i.x_j
    d2.flat[::len(m) + 1] = 0            # Zero the diagonal (rounding issues)
    return np.sqrt(d2)                   # O(k^2)

And for comparison:

from scipy.spatial.distance import pdist, squareform

def d(m):
    return squareform(pdist(m))

Here are the timeit results for k x k initial matrices:

[Plot: runtime of c(m) versus squareform(pdist(m)) as a function of k]
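
The plot itself isn't reproduced here, but a minimal sketch of how such timings could be collected (the sizes below are illustrative, not the original ones):

import numpy as np
from timeit import timeit

for k in (100, 500, 1000, 2000):
    m = np.random.rand(k, k)
    t_c = timeit(lambda: c(m), number=3) / 3
    t_d = timeit(lambda: d(m), number=3) / 3
    print(f"k = {k}: c(m) {t_c:.3f} s, squareform(pdist(m)) {t_d:.3f} s")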

Both algorithms are O(k^3), but c(m) pushes the O(k^3) part of the work into np.dot, the core linear-algebra routine that benefits from every available optimization (multithreading, vectorized BLAS, and so on). pdist, as its source shows, is essentially plain loops.

This explains the 15x factor for big arrays, even though pdist exploits the symmetry of the matrix and computes only half of the terms.

3 Comments

Oh yeah, this solution. I'm pretty sure I've seen it in another answer before, but I didn't remember it. No giant intermediates, and you get a BLAS matrix multiply for a good chunk of the work (most of it?). Performance should be pretty good. I don't remember whether it beat pdist, but I wouldn't be surprised either way; last time I checked, pdist wasn't anywhere near as optimized as a BLAS matrix multiply.
I might be remembering it from this answer, but I feel like I saw a NumPy version. Anyway, my timings say it's competitive with pdist, winning on larger arrays but losing on smaller arrays. With a bit more optimization, it could probably beat pdist on smaller inputs.
(squareforming the pdist output takes long enough to tie with this answer on the small inputs, though.)
2

One way I thought of to avoid mixing NumPy and for loops would be to create an index array using a version of this index creator that allows for replacement:

import numpy as np
from itertools import combinations_with_replacement, chain
from scipy.special import comb

def comb_index(n, k):
    count = comb(n, k, exact=True, repetition=True)
    index = np.fromiter(chain.from_iterable(combinations_with_replacement(range(n), k)),
                        int, count=count*k)
    return index.reshape(-1, k)
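
For example (not part of the original answer), comb_index(3, 2) enumerates every index pair (i, j) with i <= j:

print(comb_index(3, 2))
# [[0 0]
#  [0 1]
#  [0 2]
#  [1 1]
#  [1 2]
#  [2 2]]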

Then, we simply take each of those array couples, compute the difference between them, reshape the resulting array, and take the norm of each of the rows of the array:

reshape_mat = np.diff(feature_mat[comb_index(feature_mat.shape[0], 2), :], axis=1).reshape(-1, feature_mat.shape[1])
dis_list = np.linalg.norm(reshape_mat, axis=-1)

Note that dis_list is simply a list of all of the n*(n+1)/2 possible norms. This runs at close to the same speed as the other answer for the feature_mat used there, and when comparing the byte sizes of our largest intermediates,

(feature_mat[:,None] - feature_mat).nbytes == 16000000

while

np.diff(feature_mat[comb_index(feature_mat.shape[0], 2), :], axis=1).reshape(-1, feature_mat.shape[1]).nbytes == 8080000

For most inputs, mine uses only half the storage: still suboptimal, but a marginal improvement.
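
If you need the full square matrix rather than the flat list, the same index pairs can be scattered back into place (a small sketch, not part of the original answer, reusing comb_index from above):

n = feature_mat.shape[0]
idx = comb_index(n, 2)
dis_mat = np.zeros((n, n))
dis_mat[idx[:, 0], idx[:, 1]] = dis_list
dis_mat[idx[:, 1], idx[:, 0]] = dis_list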


2

Based on np.triu_indices, in case you really want to do this with pure NumPy:

s = feature_mat.shape[0]
i, j = np.triu_indices(s, 1)         # All possible combinations of indices
dist_mat = np.empty((s, s))          # Don't waste time filling with zeros
np.einsum('ii->i', dist_mat)[:] = 0  # When you can just fill the diagonal
dist_mat[i, j] = dist_mat[j, i] = np.linalg.norm(feature_mat[i] - feature_mat[j], axis=-1)
                                     # Vectorized version of your original process

The benefit of this method over broadcasting is that you can do it in chunks:

n = 10000000   # Chunk size, based on your available RAM
for k in range(0, i.size, n):
    i_ = i[k: k + n]
    j_ = j[k: k + n]
    dist_mat[i_, j_] = dist_mat[j_, i_] = \
                     np.linalg.norm(feature_mat[i_] - feature_mat[j_], axis=-1)


-1

Let's begin by rewriting this in terms of a function:

def dist(mat, i, j):
    return np.linalg.norm(mat[i, :] - mat[j, :])

size = feature_mat.shape[0]
dis_mat = np.zeros((size, size))

for i in range(size):
    for j in range(size):
        dis_mat[i, j] = dist(feature_mat, i, j)

This can be rewritten in (a slightly more) vectorized form as:

v = [dist(feature_mat, i, j) for i in range(size) for j in range(size)]
dist_mat = np.array(v).reshape(size, size)

Notice that we're still relying on Python rather than NumPy for some of the computation, but it's a step towards vectorization. Also notice that dist(mat, i, j) is symmetric in i and j, so we could cut the computation roughly in half. Perhaps something like:

v = [dist(feature_mat, i, j) for i in range(size) for j in range(i + 1)]

Now the tricky bit is assigning these computed values to the correct elements of dist_mat; one way is sketched below.
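
One option (a sketch, not part of the original answer) is to generate the lower-triangle indices in the same order as the comprehension above and mirror them:

i_idx, j_idx = np.tril_indices(size)   # lower triangle incl. diagonal, row-major order
v = np.array([dist(feature_mat, i, j) for i, j in zip(i_idx, j_idx)])
dist_mat = np.zeros((size, size))
dist_mat[i_idx, j_idx] = v             # fill the lower triangle
dist_mat[j_idx, i_idx] = v             # mirror into the upper triangle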

How fast this performs depends on the cost of computing dist(i, j). For small feature_mats, the cost of recomputing is not high enough to worry about this. But for large matrices, you definitely do not want to recompute.

1 Comment

Looking at the result of your code, isn't v just a Python list?
