First of all, your cluster is definitely not managed by SGE, even if the latter is installed. SGE doesn't understand the #PBS sentinel in job files and it doesn't export the PBS_NODEFILE environment variable (most environment variables that SGE exports start with SGE_). It also won't accept the nodes=2:ppn=24 resource request, since the distribution of slots among the allocated nodes is controlled by the specified parallel environment. What you have is either PBS Pro or Torque. SGE does name its command-line utilities the same, though, and its qsub takes more or less the same arguments, which is probably why you think you have SGE.
The problem you describe usually occurs if Open MPI is not able to properly obtain the node list from the environment, e.g. if it wasn't compiled with support for PBS Pro/Torque. In that case, it will start all MPI processes on the node on which mpirun was executed. Check that the proper RAS module was compiled by running:
ompi_info | grep ras
It should list the various RAS modules and among them should be one called tm:
...
MCA ras: tm (MCA v2.0, API v2.0, Component v1.6.5)
...
If the tm module is not listed, then Open MPI will not automatically obtain the node list and the hostfile must be explicitly specified:
mpiexec ... -machinefile $PBS_NODEFILE ...
Under PBS Pro/Torque, Open MPI also needs the tm PLM module. Without that module, Open MPI cannot use the TM API to remotely launch processes on the second node and will fall back to SSH. In that case, you should make sure that passwordless SSH login, e.g. using public-key authentication, is possible from each cluster node into every other node.
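Both checks can be sketched as follows. This is only an illustration: node2 is a placeholder I'm using for the other node's hostname, and the key type and path are just common OpenSSH defaults, not anything required by Open MPI:

```shell
# Verify that the tm PLM module was built (analogous to the RAS check above);
# the output should include an "MCA plm: tm" line:
ompi_info | grep plm

# If it is missing, set up passwordless SSH instead (OpenSSH shown;
# "node2" is a placeholder for the other node's hostname):
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519   # generate a key pair once
ssh-copy-id node2                                  # install the public key on the other node
ssh node2 hostname                                 # should print the remote hostname without a password prompt
```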
Your first step in solving the issue is to check for the presence of the correct modules as shown above. If the modules are there, you should launch hostname under mpiexec and check if that works, e.g.:
#!/bin/bash
#PBS -l nodes=2:ppn=24
echo "Allocated nodes:"
cat $PBS_NODEFILE
echo "MPI nodes:"
mpiexec --mca ras_base_display_alloc 1 hostname
then compare the two lists and also examine the ALLOCATED NODES block. The lists should contain the same hostnames, repeated the same number of times, and both nodes should appear in the allocated-nodes table with 24 slots each (cf. Num slots). If the second list contains only one hostname, then Open MPI is not able to properly obtain the hostfile because something is preventing the tm modules (given that they do exist) from initialising or from being selected. That could be either the system-wide Open MPI configuration or some other RAS module having higher priority. Passing --mca ras_base_verbose 10 to mpiexec helps determine whether that is the case.
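One way to make the comparison concrete is to count how often each hostname occurs in both lists; the counts should agree exactly. A minimal sketch (the file names alloc.txt and actual.txt are placeholders of my choosing):

```shell
# Count hostname occurrences in the PBS allocation and in the actual
# MPI launch, then compare; identical counts mean Open MPI picked up
# the node list correctly.
sort "$PBS_NODEFILE" | uniq -c > alloc.txt
mpiexec hostname | sort | uniq -c > actual.txt
diff alloc.txt actual.txt && echo "node lists match"
```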
For completeness: although SGE names its utilities the same (qsub, qstat, etc.), it uses the #$ sentinel rather than #PBS, and the environment variables it exports start with SGE_. And if mpiexec hostname does not output the same hostnames, repeated the same number of times, as are contained in $PBS_NODEFILE, there is a problem with the integration between Open MPI and PBS/Torque.