I am trying to start an ipyparallel cluster using MPI.

The ipcluster_config.py has the following lines modified:

c.MPILauncher.mpi_cmd = ['mpiexec']
c.MPIControllerLauncher.controller_args = ['--ip=*']
c.MPILauncher.mpi_args = ["-machinefile", "~/mpi_hosts"]
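
One detail worth checking in the mpi_args line above: the "~" in "~/mpi_hosts" is normally expanded by a shell. If the launcher passes the argument list to mpiexec without going through a shell (an assumption here, not something confirmed by the error message), mpiexec may receive the literal string "~/mpi_hosts". A minimal sketch of expanding it in Python first; expand_mpi_args is a hypothetical helper name, not part of ipyparallel:

```python
import os


def expand_mpi_args(args):
    """Expand a leading "~" in each argument, as a shell would have done.

    Hypothetical helper for illustration; arguments without a leading "~"
    are returned unchanged.
    """
    return [os.path.expanduser(a) for a in args]


# The same arguments as in the config above, with the tilde expanded.
mpi_args = expand_mpi_args(["-machinefile", "~/mpi_hosts"])
print(mpi_args)
```

In ipcluster_config.py the result could then be assigned to c.MPILauncher.mpi_args. Whether this matters depends on the MPI implementation in use.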

The ipcontroller_config.py is configured as follows:

c.HubFactory.engine_ip = '*'
c.HubFactory.ip = '*'
c.HubFactory.client_ip = '*'

However, when I launch the cluster with the command ipcluster start --profile mpi -n 2, it fails with the following message:

Engines shutdown early, they probably failed to connect.
You can set this by adding "--ip='*'" to your ControllerLauncher.controller_args

I am not sure how to debug this further.

  • Try running ipcluster start --profile mpi -n 2 --debug and post the logs from that run. Commented Nov 14, 2017 at 14:50
  • Thanks Tarun. This helps. It seems ipcluster is not able to find mpiexec. I need to figure out how to configure ipcluster so it loads the modules. Commented Nov 16, 2017 at 17:16
  • Did you install the MPI package? Commented Nov 16, 2017 at 17:17
  • I am on a PBS cluster environment. I have to do module load to see mpiexec in the path. I guess when ipcluster is launching engines on remote nodes, it does not do "module load". I am looking into configs to see if there is any place to specify that. Commented Nov 16, 2017 at 17:20
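
The diagnosis in the comments (mpiexec is only on the PATH after a module load) can be checked from Python before launching anything. The following is a rough sketch, not an ipyparallel feature: resolve_launcher and its extra_dirs parameter are hypothetical names, and the directories passed in would be whatever the site's module load adds to PATH.

```python
import os
import shutil


def resolve_launcher(cmd="mpiexec", extra_dirs=()):
    """Return the absolute path of cmd, or None if it cannot be found.

    Searches the current PATH plus any extra_dirs (hypothetical: the
    directories a site-specific "module load" would normally add).
    """
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    return shutil.which(cmd, path=os.pathsep.join(dirs))


# None here means the launching environment cannot see mpiexec.
print(resolve_launcher() or "mpiexec not found on PATH")
```

If this prints a path, one option is to put that absolute path in c.MPILauncher.mpi_cmd instead of the bare 'mpiexec', so the launcher does not depend on PATH. That only covers locating the binary; libraries the module also sets up may still need to be loaded in the environment that runs ipcluster.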
