Why is explicit dimension expansion so space-inefficient compared to implicit NumPy broadcasting, when both techniques supposedly do the same thing, i.e. repeat the matrix along a given dimension?
I have two arrays: X(500, 3072) and X_train(5000, 3072). I want to calculate the distance of all 500 points in X from the 5000 points in X_train. When I try to do this via explicit dimension expansion, it takes over 60 GB of memory:
dists = np.linalg.norm(np.expand_dims(X, axis=1) - X_train, axis=2)
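To see why, it helps to look at the shape of the intermediate that the subtraction creates. A minimal sketch with small stand-in arrays (the sizes here are placeholders, not the ones from the question):

```python
import numpy as np

# Small stand-ins for the arrays in the question (float64 by default).
X = np.random.rand(5, 4)        # plays the role of X (500, 3072)
X_train = np.random.rand(7, 4)  # plays the role of X_train (5000, 3072)

# The subtraction broadcasts (5, 1, 4) against (7, 4),
# materializing a full (5, 7, 4) array in memory.
diff = np.expand_dims(X, axis=1) - X_train
print(diff.shape)   # (5, 7, 4)
print(diff.nbytes)  # 5 * 7 * 4 * 8 = 1120 bytes

# The norm then reduces the last axis to give all pairwise distances.
dists = np.linalg.norm(diff, axis=2)
print(dists.shape)  # (5, 7)
```

At the question's real sizes, `diff` would have shape (500, 5000, 3072), which is where the tens of gigabytes go.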
Whereas if I compute it via the expanded quadratic form, it gets done within MBs of memory (note this gives squared distances; an extra np.sqrt is needed to match the first version):
dists = np.square(X).sum(axis=1, keepdims=True) + np.square(X_train).sum(axis=1, keepdims=True).T - 2 * np.dot(X, X_train.T)
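The two formulations agree because ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y. A quick equivalence check on small arrays (sizes here are illustrative, not the ones from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 4))
X_train = rng.random((7, 4))

# Memory-heavy version: materializes a (5, 7, 4) intermediate.
d1 = np.linalg.norm(np.expand_dims(X, axis=1) - X_train, axis=2)

# Memory-light version: ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y,
# so only 2-D arrays are ever created: (5, 1), (1, 7), and (5, 7).
sq = (np.square(X).sum(axis=1, keepdims=True)
      + np.square(X_train).sum(axis=1, keepdims=True).T
      - 2 * np.dot(X, X_train.T))
d2 = np.sqrt(np.maximum(sq, 0))  # clip tiny negatives from round-off

print(np.allclose(d1, d2))  # True
```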
Why is the explicit dimension expansion so space-inefficient when the input matrices themselves are so small?
In np.expand_dims(X, axis=1) - X_train, the subtraction itself is a broadcast: the (500, 1, 3072) and (5000, 3072) operands broadcast to a full (500, 5000, 3072) result, and that is the step where it runs out of memory. The second version never materializes that 3-D array; its largest intermediates are the 2-D terms and the (500, 5000) output itself.
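The arithmetic for the broadcast intermediate, assuming float64 (8 bytes per element), accounts for the figure in the question:

```python
# Shape of the broadcast subtraction result: (500, 5000, 3072).
n_elements = 500 * 5000 * 3072          # 7.68 billion elements
n_bytes = n_elements * 8                # 8 bytes per float64
print(n_bytes)          # 61_440_000_000
print(n_bytes / 1e9)    # 61.44 GB (~57.2 GiB)
```

So "small" 2-D inputs of ~12 MB and ~123 MB combine into a single ~61 GB 3-D intermediate, while the quadratic-form version only ever allocates 2-D arrays no larger than the final (500, 5000) distance matrix (~20 MB).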