I am working with a C++ MPI code that takes 1 min 12 s to run when compiled with Open MPI and 16 s when compiled with Intel MPI. (I have tested it on other inputs too; the difference is similar, and both builds give the correct answer.) I want to understand why there is such a big difference in run time, and what can be done to reduce the run time with Open MPI (GCC).
I am using CentOS 6 on an Intel Haswell processor, with the following compile flags.
Open MPI (GCC): mpiCC -Wall -O3
I have also tried -march=native and -funroll-loops; they do not make much difference. I have also tried the -lm option. I cannot compile for 32-bit.
Intel MPI: mpiicpc -Wall -O3 -xhost
-xhost saves about 3 seconds of run time.
mpicc so you can get performance numbers with Intel MPI + gcc, and then make a fair comparison between Intel MPI and Open MPI. FWIW, if you run an old version of Open MPI and use MPI_THREAD_MULTIPLE, you will use IPoIB instead of native InfiniBand; that issue is fixed in recent Open MPI.