0

Why is the explicit dimension expansion so space inefficient as compared to leveraging implicit numpy broadcasting, which technically does the same thing i.e. copies the matrix over on a given dimension.

I have two arrays: X(500,3072) and X_train(5000,3072). I want to calculate distance of all 500 points in X from the 5000 points in X_train. When I try to do this via explicit dimension expansion, it takes over 60GB of space to do this calculation.

dists = np.linalg.norm(np.expand_dims(X,axis=1)-(X_train), axis =2) 

Whereas if I leverage numpy's broadcasting, it gets done within MBs of space.

dists = np.square(X).sum(axis=1, keepdims=True)+np.square(X_train).sum(axis=1, keepdims=True).T-2*np.dot(X, X_train.T)

Why is the explicit dimension expansion so space inefficient despite of such small matrices being used.

3
  • This isn't strictly what you asked, but cdist is usually the fastest way of doing a pairwise distance calculation for all pairs of rows. Commented Nov 4, 2023 at 23:25
  • It took me a bit to understand how the second one worked - this is a helpful resource. Commented Nov 4, 2023 at 23:47
  • Technically, both of these are using broadcasting. When it does np.expand_dims(X,axis=1)-(X_train), the subtraction is a broacast. It's also the step where it runs out of memory. Commented Nov 4, 2023 at 23:53

1 Answer 1

1

Tracking dimensions in the 2 alternatives:

X(500,3072) and X_train(5000,3072).  
(500,1,3072) - (1,5000, 30720) => (500,5000,30720)

norm squares this, and sums on axis 2 (30720) and returns the sqrt. (500,5000)

That 3d array is quite large, 600GB

np.square(X).sum(axis=1, keepdims=True)+np.square(X_train).sum(axis=1, keepdims=True).T-2*np.dot(X, X_train.T)

(500,3072)=>(500,1)
(5000,3872)=>(5000,1)==>(1,5000)
==> (500,5000)
dot (500,3072), (3872,5000) ==> (500,5000)
net (500,5000)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.