I have an MPI application in which one process (call it A) is a serious scalability bottleneck: all the other processes sit in an MPI_Recv waiting for that one process to send them data.
Since I want to speed this up with as little effort as possible, I was thinking about using OpenMP to parallelize process A. Is this practical?
Since the other processes sharing a node with A are blocked in an MPI_Recv, can I use all of that node's cores to work on process A, or will the blocked MPI_Recv calls prevent that?
The other benefit of using OpenMP is that the memory can be shared, since process A uses a lot of it.
By the way, does it change anything if my processes are waiting in an MPI_Send instead of an MPI_Recv?