One approach with broadcasting -
X.T[:,:,None]*X.T[:,None]
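To see the broadcasting at work, here is a minimal sketch assuming a small sample X of shape (M, N) = (2, 3) (the sizes are arbitrary, just for illustration) -

import numpy as np

X = np.arange(6.).reshape(2, 3)   # sample (M, N) = (2, 3) input

a = X.T[:,:,None]                 # shape (N, M, 1)
b = X.T[:,None]                   # shape (N, 1, M)
out = a * b                       # broadcasts to (N, M, M)
print(out.shape)                                       # (3, 2, 2)
print(np.allclose(out[0], np.outer(X[:,0], X[:,0])))   # True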
Another with broadcasting and swapping axes afterwards -
(X[:,None,:]*X).swapaxes(0,2)
Another with broadcasting and a multi-dimensional transpose afterwards -
(X[:,None,:]*X).T
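For a 3D array, .T simply reverses all the axes, so on the (M, M, N) product it does the same job as swapaxes(0,2); a quick check with an assumed random X -

import numpy as np

X = np.random.rand(2, 3)          # sample (M, N) input
p = X[:,None,:] * X               # shape (M, M, N)
print(p.shape)                                 # (2, 2, 3)
print(np.array_equal(p.T, p.swapaxes(0, 2)))   # True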
Another approach with np.einsum, which might be more intuitive if you think in terms of the iterators involved, e.g. when translating from loopy code -
np.einsum('ij,kj->jik',X,X)
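To make that iterator correspondence explicit, this is a sketch of the loopy code that the subscript string 'ij,kj->jik' encodes (sample sizes assumed) -

import numpy as np

X = np.random.rand(2, 3)          # (M, N)
M, N = X.shape
out = np.zeros((N, M, M))
for j in range(N):                # output axis 0
    for i in range(M):            # output axis 1
        for k in range(M):        # output axis 2
            out[j,i,k] = X[i,j] * X[k,j]
print(np.allclose(out, np.einsum('ij,kj->jik', X, X)))   # True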
The basic idea in all of these approaches is to spread out one axis of X against itself for elementwise multiplication while keeping the other axis aligned. We achieve this pairing by extending X into two 3D versions with new singleton axes inserted at different positions, letting broadcasting do the work. For X of shape (M, N), each one-liner produces an (N, M, M) array whose j-th slice is the outer product of column j of X with itself.
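As a sanity check, all four one-liners agree on a hypothetical random input -

import numpy as np

X = np.random.rand(4, 5)          # (M, N), arbitrary sizes
r1 = X.T[:,:,None] * X.T[:,None]
r2 = (X[:,None,:] * X).swapaxes(0, 2)
r3 = (X[:,None,:] * X).T
r4 = np.einsum('ij,kj->jik', X, X)
print(r1.shape)                                        # (5, 4, 4)
print(all(np.allclose(r1, r) for r in (r2, r3, r4)))   # True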
Mathematically, each slice is Ni * Ni.T, the outer product of column i with itself. Each column is of length Mrows = 2 here, so each product is a (2, 2), i.e. (Mrows, Mrows), matrix; doing that for every column and stacking the results gives the 3D output. A direct loop over the columns (with out preallocated to shape (N, M, M)) -

for i in range(N):
    out[i] = X[:,i,None].dot(X[None,:,i])

Or, stacking along the third dimension instead, which yields (M, M, N) (note the list: np.dstack needs a sequence, not a generator) -

Result = np.dstack([X[:,i].reshape((nrows, 1)) * X[:,i] for i in range(ncols)])

Equivalently, take xi = X[:,i,None] and then use the OP's code: xi.dot(xi.T).
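Putting the loop versions together as a self-contained sketch (the names nrows, ncols follow the snippet above; the sample X is assumed) -

import numpy as np

X = np.random.rand(2, 4)          # example (M, N) input
nrows, ncols = X.shape

# per-column outer products stacked along the third axis: (M, M, N)
Result = np.dstack([X[:,i].reshape((nrows, 1)) * X[:,i] for i in range(ncols)])
print(Result.shape)               # (2, 2, 4)

# cross-check against the OP's per-column dot product
for i in range(ncols):
    xi = X[:,i,None]
    assert np.allclose(Result[:,:,i], xi.dot(xi.T))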