
I want to compute the pairwise squared distances for a batch of features in TensorFlow. I have a simple implementation using + and * operations, tiling the original tensor:

import tensorflow as tf

def pairwise_l2_norm2(x, y, scope=None):
    with tf.name_scope(scope or 'pairwise_l2_norm2'):
        size_x = tf.shape(x)[0]
        size_y = tf.shape(y)[0]

        # Replicate x along a new trailing axis: (m, d) -> (m, d, n).
        xx = tf.expand_dims(x, -1)
        xx = tf.tile(xx, tf.stack([1, 1, size_y]))

        # Replicate y to (n, d, m), then transpose so it lines up as (m, d, n).
        yy = tf.expand_dims(y, -1)
        yy = tf.tile(yy, tf.stack([1, 1, size_x]))
        yy = tf.transpose(yy, perm=[2, 1, 0])

        diff = tf.subtract(xx, yy)
        square_diff = tf.square(diff)

        # Sum over the feature axis: result is (m, n).
        square_dist = tf.reduce_sum(square_diff, 1)

        return square_dist

This function takes as input two matrices of size (m,d) and (n,d) and computes the squared distance between each pair of row vectors. The output is a matrix of size (m,n) with elements 'd_ij = dist(x_i, y_j)'.

The problem is that I have a large batch and high-dimensional features; with large m, n, and d, replicating the tensor consumes a lot of memory. I'm looking for another way to implement this that doesn't increase memory usage and only stores the final distance tensor, something like double-looping over the original tensor.

2 Comments
  • It's not clear that your code is doing 'pairwise distance of a batch of features'. Can you specify the function you want more formally? Also, have you considered tf.squared_difference? Commented May 3, 2016 at 22:58
  • I updated the question to explain this. If you put a batch of features into this function, it computes the distances between its rows. Commented May 4, 2016 at 1:20

4 Answers


You can use some linear algebra to turn it into matrix ops. Note that what you need is the matrix D where, with a[i] denoting the ith row of your original matrix,

D[i,j] = (a[i]-a[j])(a[i]-a[j])'

You can rewrite that as

D[i,j] = r[i] - 2 a[i]a[j]' + r[j]

where r[i] is the squared norm of the ith row of the original matrix.

In a system that supports standard broadcasting rules you can treat r as a column vector and write D as

D = r - 2 A A' + r'

In TensorFlow you could write this as

import tensorflow as tf

A = tf.constant([[1, 1], [2, 2], [3, 3]])
r = tf.reduce_sum(A*A, 1)

# turn r into a column vector so broadcasting gives r[i] + r[j]
r = tf.reshape(r, [-1, 1])
D = r - 2*tf.matmul(A, tf.transpose(A)) + tf.transpose(r)

# TF 1.x graph mode; in TF 2.x eager execution, just print(D)
sess = tf.Session()
sess.run(D)

Result:

array([[0, 2, 8],
       [2, 0, 2],
       [8, 2, 0]], dtype=int32)

7 Comments

Thank you. I better understand why broadcasting is interesting.
I don't know how much of a performance improvement this provides, but tf.matmul has arguments for transposing arrays on the fly (transpose_a and transpose_b).
^ I recently tested this (on GPU, with TF 2.0). tf.matmul(a, b, transpose_b=True) was consistently ~40% faster than tf.matmul(a, tf.transpose(b)).
Is there a simple way to extend this to work for pairwise distances between two sets of points? D_ij = |X[i] - Y[j]|. @Yamaneko's answer can be easily adapted but I don't know how this one can.
I think it should be noted that this produces the square of the distance and not the distance itself. A simple sqrt at the end gets you the actual distance. I only mention that because of the title of the original Q.

Using squared_difference:

import tensorflow as tf

def squared_dist(A):
    expanded_a = tf.expand_dims(A, 1)  # (m, 1, d)
    expanded_b = tf.expand_dims(A, 0)  # (1, m, d)
    distances = tf.reduce_sum(tf.math.squared_difference(expanded_a, expanded_b), 2)  # (m, m)
    return distances
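
For instance, feeding in the same 3-point matrix as in the accepted answer gives the same result (a quick consistency check, assuming TF 2.x eager execution):

A = tf.constant([[1., 1.], [2., 2.], [3., 3.]])
print(squared_dist(A))
# [[0. 2. 8.]
#  [2. 0. 2.]
#  [8. 2. 0.]]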

One thing I noticed is that this solution using tf.math.squared_difference gave me out-of-memory (OOM) errors for very large inputs, while the approach by @YaroslavBulatov didn't. Decomposing the operation into matrix products evidently yields a smaller memory footprint: the broadcast here materializes an (m, m, d) intermediate tensor, whereas the matmul formulation never grows past (m, m). I had expected squared_difference to handle this better under the hood.
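
If you do hit that OOM, a common workaround is to process rows in chunks so that only a (chunk, n, d) intermediate is ever materialized. A minimal sketch, assuming TF 2.x eager execution and statically known shapes; the name squared_dist_chunked and the default chunk size are our own choices:

import tensorflow as tf

def squared_dist_chunked(A, B, chunk_size=256):
    # Peak intermediate is (chunk_size, n, d) instead of (m, n, d).
    out = []
    for start in range(0, A.shape[0], chunk_size):
        a = A[start:start + chunk_size]                     # (c, d)
        diff = tf.expand_dims(a, 1) - tf.expand_dims(B, 0)  # (c, n, d)
        out.append(tf.reduce_sum(tf.square(diff), axis=2))  # (c, n)
    return tf.concat(out, axis=0)                           # (m, n)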

2 Comments

thanks for the information that the other solution is less memory intensive. good to know that. +1 for the great answer
This solution is also less compute-efficient, but it is very useful when matrix multiplication cannot be used (e.g., for absolute distance), as illustrated below.
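
To illustrate that last point, here is a minimal sketch of the same broadcasting pattern for the pairwise L1 (absolute) distance, which has no matmul decomposition; the function name is our own:

import tensorflow as tf

def abs_dist(A, B):
    # Broadcast (m, 1, d) against (1, n, d), then sum the absolute
    # differences over the feature axis to get an (m, n) matrix.
    diff = tf.expand_dims(A, 1) - tf.expand_dims(B, 0)
    return tf.reduce_sum(tf.abs(diff), axis=2)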

Here is a more general solution for two tensors of coordinates A and B:

import tensorflow as tf

def squared_dist(A, B):
  assert A.shape.as_list() == B.shape.as_list()

  row_norms_A = tf.reduce_sum(tf.square(A), axis=1)
  row_norms_A = tf.reshape(row_norms_A, [-1, 1])  # Column vector.

  row_norms_B = tf.reduce_sum(tf.square(B), axis=1)
  row_norms_B = tf.reshape(row_norms_B, [1, -1])  # Row vector.

  return row_norms_A - 2 * tf.matmul(A, tf.transpose(B)) + row_norms_B

Note that this is the squared distance. If you want the Euclidean distance, take a tf.sqrt of the result; in that case, don't forget to add a small constant to compensate for floating-point instabilities: dist = tf.sqrt(squared_dist(A, B) + 1e-6).
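
A quick check with made-up values (TF 2.x eager execution assumed):

import tensorflow as tf

A = tf.constant([[0., 0.], [1., 1.]])
B = tf.constant([[0., 1.], [2., 2.]])
print(squared_dist(A, B))
# [[1. 8.]
#  [1. 2.]]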



If you want to compute the distance using a different method, rearrange the tf operations accordingly:

import tensorflow as tf

def compute_euclidean_distance(x, y):
    size_x = x.shape[0]
    size_y = y.shape[0]
    for i in range(size_x):
        # Tile the ith row of x so it can be compared with every row of y.
        tile_one = tf.reshape(tf.tile(x[i], [size_y]), [size_y, -1])
        eu_one = tf.expand_dims(tf.sqrt(tf.reduce_sum(tf.pow(tf.subtract(tile_one, y), 2), axis=1)), axis=0)
        if i == 0:
            d = eu_one
        else:
            d = tf.concat([d, eu_one], axis=0)
    return d
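
A quick check with made-up values; note this returns the actual Euclidean distance, since tf.sqrt is applied inside the loop:

import tensorflow as tf

x = tf.constant([[0., 0.], [3., 4.]])
y = tf.constant([[0., 0.], [0., 4.]])
print(compute_euclidean_distance(x, y))
# [[0. 4.]
#  [5. 3.]]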

