
I am a graduate student (master's) and run my simulations with an in-house code that uses MPI. I previously used Open MPI on a supercomputer we had access to; since it shut down, I have been trying to move to another supercomputer that has Intel MPI installed. The problem is that the same code which ran perfectly fine before now leaks memory and aborts after a set number of iterations (time steps). Since the code is relatively large and my knowledge of MPI is very basic, it is proving very difficult to debug. So I installed Open MPI on this new supercomputer, but it prints the following error message upon execution and then terminates:

Invalid number of PE Please check partitioning pattern or number of PE

NOTE: The error message is repeated once for each node I used to run the case (here, 8). The code was compiled with mpif90, with -fopenmp for thread parallelisation.
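For what it's worth, "Invalid number of PE" does not look like an Open MPI message; it reads like a check inside the in-house code itself, printed by every rank when MPI_Comm_size disagrees with the partitioning input. A minimal sketch of the kind of check that could produce it (all names here, such as npart, are hypothetical, not from the actual code):

```fortran
! Hypothetical sketch: each rank compares the communicator size it sees
! against the partition count from the input. "npart" is a made-up name
! standing in for whatever the in-house code reads from its files.
program check_pe
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, npart

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  npart = 8   ! assumed: partition count read from the partitioning input
  if (nprocs /= npart) then
     ! Every rank would print this, matching the repeated message seen.
     print *, 'Invalid number of PE: got ', nprocs, ' expected ', npart
  end if

  call MPI_Finalize(ierr)
end program check_pe
```

If the real code does something similar, the message means the job was launched with a different number of MPI ranks than the partitioning expects; the usual remedies are to match mpirun -np to the partitioning pattern, or to regenerate the partitioning for the new machine (with -fopenmp, a hybrid code may expect fewer MPI ranks per node than the old cluster used).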

There is of course no guarantee that running under Open MPI will avoid the memory leak, but I feel it is worth a shot, as the code ran perfectly fine under it earlier.

PS: On Intel MPI (compiled with mpiifort and -qopenmp), this is the error I got:

Abort(941211497) on node 16 (rank 16 in comm 0): Fatal error in PMPI_Isend: Unknown error class, error stack:
PMPI_Isend(152)...........: MPI_Isend(buf=0x2aba1cbc8060, count=4900, dtype=0x4c000829, dest=20, tag=0, MPI_COMM_WORLD, request=0x7ffec8586e5c) failed
MPID_Isend(662)...........:
MPID_isend_unsafe(282)....:
MPIDI_OFI_send_normal(305): failure occurred while allocating memory for a request object
Abort(203013993) on node 17 (rank 17 in comm 0): Fatal error in PMPI_Isend: Unknown error class, error stack:
PMPI_Isend(152)...........: MPI_Isend(buf=0x2b38c479c060, count=4900, dtype=0x4c000829, dest=21, tag=0, MPI_COMM_WORLD, request=0x7fffc20097dc) failed
MPID_Isend(662)...........:
MPID_isend_unsafe(282)....:
MPIDI_OFI_send_normal(305): failure occurred while allocating memory for a request object
[[email protected]] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:357): write error (Bad file descriptor)
[[email protected]] cmd_bcast_root (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:164): error sending cmd 15 to proxy
[[email protected]] send_abort_rank_downstream (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:557): unable to send response downstream
[[email protected]] control_cb (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1576): unable to send abort rank to downstreams
[[email protected]] HYDI_dmx_poll_wait_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:79): callback returned error status
[[email protected]] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1962): error waiting for event
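Reading the Intel MPI stack: "failure occurred while allocating memory for a request object" inside MPIDI_OFI_send_normal typically means the library has exhausted its pool of internal request objects. The classic way this happens is MPI_Isend (or MPI_Irecv) calls whose request handles are never completed with MPI_Wait/MPI_Test or released with MPI_Request_free, so a request leaks on every time step. A minimal sketch of the pattern, with made-up names rather than anything from the actual code:

```fortran
! Hypothetical halo-exchange loop illustrating how nonblocking sends can
! leak requests. Buffer size 4900 mirrors the count in the error stack;
! everything else here is illustrative.
program isend_sketch
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, dest, req, step
  real(8) :: buf(4900)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  dest = mod(rank + 1, nprocs)
  buf = real(rank, 8)

  do step = 1, 1000
     call MPI_Isend(buf, size(buf), MPI_DOUBLE_PRECISION, dest, 0, &
                    MPI_COMM_WORLD, req, ierr)
     ! Without this wait, 'req' leaks every iteration until the library
     ! can no longer allocate request objects and aborts.
     call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)
  end do

  call MPI_Finalize(ierr)
end program isend_sketch
```

Different MPI implementations tolerate leaked requests for different lengths of time before failing, which could explain a code that "worked" under one library and aborts after a set number of time steps under another. A useful first pass is to grep the code for every MPI_Isend/MPI_Irecv and confirm each returned request is eventually waited on or freed.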

I will be happy to provide the code in case somebody is willing to take a look at it. It is written in Fortran, with some of the functions written in C. My research progress has been completely halted by this problem, and nobody at my lab has enough experience with MPI to resolve it.

  • Can you please copy/paste the error messages? This will let search engines correctly index the question and hence make it (more) useful to other/future readers. Commented Mar 13, 2020 at 4:40
  • Please provide the version of Intel MPI, your command line, and which interconnect you have. Also, is this a public code? Is there a way you can provide the source? You would be able to provide the source privately if you post on the Intel forum: software.intel.com/en-us/forums/… Commented Mar 27, 2020 at 22:08
  • @ChileAddict-Intel Thank you for your reply. Unfortunately it is not a public code, but if you'd be willing to look into it, I will be happy to share it with you privately. Commented Mar 30, 2020 at 11:47
  • @ChileAddict-Intel as for Intel MPI version: 19.0.4.243 20190416 Commented Mar 30, 2020 at 11:47
  • @ChileAddict-Intel I'm sorry I'm not sure what you mean by interconnect, but command line : Red Hat Enterprise Linux Server release 7.6 Commented Mar 30, 2020 at 12:04
