I need to set up a cluster using MPICH. First I tried the examples from http://mpitutorial.com/beginner-mpi-tutorial/ on a single machine, and they worked as expected. Then I created a cluster following https://help.ubuntu.com/community/MpichCluster and ran the example below, which is given there, and it works.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int myrank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("Hello from processor %d of %d\n", myrank, nprocs);
    MPI_Finalize();
    return 0;
}
mpiexec -n 8 -f machinefile ./mpi_hello
Next I ran this example (http://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/), but this time I get the error below. Any idea what went wrong, and where?
Fatal error in MPI_Allreduce: A process has failed, error stack:
MPI_Allreduce(861)........: MPI_Allreduce(sbuf=0x7ffff0f55630, rbuf=0x7ffff0f55634, count=1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Allreduce_impl(719)..:
MPIR_Allreduce_intra(362).:
dequeue_and_set_error(888): Communication error with rank 1
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@ce-412] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:1@ce-412] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@ce-412] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@ce-411] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@ce-411] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@ce-411] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[mpiexec@ce-411] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
The "Communication error with rank 1" message means your master node (rank 0) cannot connect to the node with rank 1, so you should look in that direction. You can try a simple MPI_Send and MPI_Recv to ping node 1 from the root.