I was about to ask a very similar question when I saw yours. I have approached this problem from several directions; for quite some time I have been trying to beat the numpy.dot function with my own code.
I have large complex matrices, and multiplying them is the primary bottleneck of my program. I have tested the following methods:
- Simple C code.
- Cython code with various optimizations, using cblas.
- 32-bit and 64-bit versions of Python; the 64-bit version is 1.5-2 times faster than the 32-bit one.
- Anaconda's MKL implementation, but no luck there either.
- einsum for the matrix multiplication.
- Python 3 and Python 2.7, which perform the same; the Python 3 @ operator is no faster either.
numpy.dot(a,b,c) is marginally faster than c=numpy.dot(a,b).
By far, numpy.dot is the best. It beat every other method, sometimes marginally (einsum) but mostly by a significant margin.
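The pure-NumPy variants from the list above can be compared with a small timing harness. This is only a minimal sketch, not the code I actually ran: the function name compare_matmul, the matrix size, and the repeat count are assumptions chosen for illustration, and the numbers will depend on the machine and the BLAS library NumPy was built against.

```python
import timeit

import numpy as np


def compare_matmul(n=200, repeats=3, seed=0):
    """Time several NumPy ways of multiplying two n x n complex matrices.

    Hypothetical helper for illustration; returns {name: best time in seconds}.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    # Preallocated, C-contiguous result buffer for the numpy.dot(a, b, out) form.
    out = np.empty((n, n), dtype=np.complex128)

    candidates = {
        "dot": lambda: np.dot(a, b),
        "dot_out": lambda: np.dot(a, b, out),
        "matmul": lambda: a @ b,
        "einsum": lambda: np.einsum("ij,jk->ik", a, b),
    }

    reference = np.dot(a, b)
    timings = {}
    for name, fn in candidates.items():
        # Check that every variant computes the same product before timing it.
        assert np.allclose(fn(), reference)
        # Keep the best of several runs to reduce noise from other processes.
        timings[name] = min(timeit.repeat(fn, number=1, repeat=repeats))
    return timings


if __name__ == "__main__":
    results = compare_matmul()
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name:8s} {seconds * 1e3:8.2f} ms")
```

Increase n towards the real problem size to get meaningful numbers; at small sizes call overhead dominates and the ordering can change from run to run.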
During my research I came across an article,
"Ultrafast matrix multiplication", which says that Apple's AltiVec implementation can multiply 2500x2500 matrices in less than a second. On my PC (4th-generation Intel Core i3, 2.3 GHz, 4 GB RAM) the same multiplication took 73 seconds using numpy.dot, so I am still searching for a faster implementation on the PC.
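For anyone who wants to repeat a measurement like this on their own machine, here is a rough sketch. The helper name time_complex_dot is hypothetical, and I am assuming complex128 entries filled with random values, which may not match the data used in the article. np.show_config() prints the build information for the installed NumPy, including the BLAS/LAPACK libraries it was linked against, which is worth recording next to any timing.

```python
import time

import numpy as np


def time_complex_dot(n=2500, seed=0):
    """Time one numpy.dot call on two random n x n complex128 matrices.

    Hypothetical helper for illustration; returns (elapsed seconds, result shape).
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

    start = time.perf_counter()
    c = np.dot(a, b)
    elapsed = time.perf_counter() - start
    return elapsed, c.shape


if __name__ == "__main__":
    # Shows which BLAS/LAPACK build this NumPy installation uses.
    np.show_config()
    seconds, shape = time_complex_dot()
    print(f"numpy.dot on {shape[0]}x{shape[1]} complex matrices: {seconds:.2f} s")
```

Two 2500x2500 complex128 matrices plus the result need roughly 300 MB, so this should still fit in 4 GB of RAM.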
Cython can actually be regarded as another language. That's not what I want.