3

I'm having trouble running an OpenMPI program using only two nodes (one of the nodes is the same machine that is executing the mpiexec command and the other node is a separate machine).

I'll call the machine that is running mpiexec, master, and the other node slave.

On both master and slave, I've installed Open MPI in my home directory under ~/mpi.

I have a file called ~/machines.txt on master.

Ideally, ~/machines.txt should contain:

master
slave

However, when I run the following on master:

mpiexec -n 2 --hostfile ~/machines.txt hostname

I get the following error:

bash: orted: command not found

But if ~/machines.txt contains only the name of the node that the command is running on, it works. ~/machines.txt:

master

Command:

mpiexec -n 2 --hostfile ~/machines.txt hostname

OUTPUT:

master
master

I've tried running the same command on slave, and changed the machines.txt file to contain only slave, and it worked too. I've made sure that my .bashrc file contains the proper paths for OpenMPI.

What am I doing wrong? In short, there is only a problem when I try to execute a program on a remote machine, but I can run mpiexec perfectly fine on the machine that is executing the command. This makes me believe that it's not a path issue. Am I missing a step in connecting both machines? I have passwordless ssh login capability from master to slave.

2
  • If you installed MPI under ~/mpi, then I am guessing you have added ~/mpi to your PATH inside .bashrc or something. Do not assume that .bashrc is loaded on each machine that MPI is run. Commented Apr 8, 2014 at 0:55
  • Yes, I added bin to PATH and lib LD_LIBRARY_PATH for both machines. Commented Apr 8, 2014 at 20:30

4 Answers

4

This error message means that you either do not have Open MPI installed on the remote machine, or you do not have your PATH set properly on the remote machine for non-interactive logins (i.e., such that it can't find the installation of Open MPI on the remote machine). "orted" is one of the helper executables that Open MPI uses to launch processes on remote nodes -- so if "orted" was not found, then it didn't even get to the point of trying to launch "hostname" on the remote node.

Note that there might be a difference between interactive and non-interactive logins in your shell startup files (e.g., in your .bashrc).

Also note that it is considerably simpler to have Open MPI installed in the same path location on all nodes. That way, the prefix mechanism (the --prefix option to mpiexec, or invoking mpiexec by its absolute path) will automatically add the right PATH and LD_LIBRARY_PATH when executing on the remote nodes, and you don't have to muck with your shell startup files.

Note that there are a bunch of FAQ items about these kinds of topics on the main Open MPI web site.
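
One way to check what a non-interactive login on the remote machine actually sees is to run the check through ssh itself. The following is a minimal sketch; the function name check_remote_orted is hypothetical, and the host name in the usage comment is the one from the question.

```shell
# Hypothetical helper: print the PATH that a non-interactive ssh command
# sees on a host, and report whether orted can be found there.
check_remote_orted() {
    host=$1
    # Single quotes: $PATH and "command -v" are evaluated on the remote side.
    ssh "$host" 'echo "PATH=$PATH"; command -v orted || echo "orted: not found in PATH"'
}

# Usage (host name taken from the question):
#   check_remote_orted slave
```

If the printed PATH does not include the Open MPI bin directory (~/mpi/bin in the question), the shell startup files are not setting it for non-interactive logins.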


10 Comments

This is what my .bashrc file looks like on master: pastebin.com/JTCZzpWs This is what it looks like on slave: pastebin.com/TDSZiFUz I don't see anything wrong here. Do you? I'm using Ubuntu.
Are you 100% sure that your $HOME/.bashrc is being executed when you run non-interactive ssh commands? E.g., "ssh master uptime" and "ssh slave uptime"? You might want to put echo statements in your $HOME/.bashrc's to verify.
Yeah .bashrc seems to be executed because I echo "Welcome" at the top of the file now and it says "Welcome bash: orted: command not found" now though.
However, when I put echo "Welcome" at the bottom of the .bashrc file, it doesn't output "Welcome". hmmm...
If moving the export statements to the top of your .bashrc works, it means that there's more in your .bashrc than you put in the pastebin outputs. There may be a difference between interactive and non-interactive logins in your .bashrc -- that's the likely culprit (and why moving them up works). As for no output, yes, firewalls can be an issue, see: open-mpi.org/faq/?category=running#diagnose-multi-host-problems
1

Either explicitly set the absolute OpenMPI prefix with the --prefix option:

prompt> mpiexec --prefix=$HOME/mpi ...

or invoke mpiexec with the absolute path to it:

prompt> $HOME/mpi/bin/mpiexec ...

The latter option sets the prefix automatically. The prefix is then used to set PATH and LD_LIBRARY_PATH on the remote machines.
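
Applied to the command from the question, the absolute-path variant could be wrapped as below. This is only a sketch: run_with_prefix is a hypothetical helper name, and it assumes Open MPI is installed under the same prefix on every node.

```shell
# Hypothetical wrapper: invoke mpiexec by its absolute path so that the
# installation prefix is passed along to the remote nodes.
run_with_prefix() {
    prefix=$1
    shift
    "$prefix/bin/mpiexec" "$@"
}

# Usage (paths and arguments taken from the question):
#   run_with_prefix "$HOME/mpi" -n 2 --hostfile ~/machines.txt hostname
```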

1 Comment

I tried both methods. It works fine for executing on the same machine, but when I try executing on another machine, it waits about 5 seconds then it stops without any output. I'm running the "hostname" program so I'm expecting the hostname of the remote machine there, but it doesn't appear. It seems like it's logging into the other machine because it does wait 5 seconds before terminating.
0

This answer comes very late, but for Linux users: it is a bad habit to add environment variables at the end of the ~/.bashrc file. If you look carefully at the top of the file, you will notice an if statement that exits early when the shell is non-interactive, which is precisely the situation when commands are run on the host through ssh. So put your environment variables at the TOP of the file, before this early exit.
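
The effect can be reproduced without touching the real ~/.bashrc. The sketch below writes a throwaway startup file with a guard of the kind found in Ubuntu's default ~/.bashrc (an assumption, since the asker's full file is not shown) and sources it from a non-interactive bash. The ~/mpi location is the one from the question.

```shell
# Write a throwaway rc file: export first, then the interactive-shell guard.
rc=$(mktemp)
cat > "$rc" <<'EOF'
echo "top of rc file"
export PATH="$HOME/mpi/bin:$PATH"
# Stop here unless the shell is interactive.
case $- in
    *i*) ;;
      *) return ;;
esac
echo "bottom of rc file"
EOF

# Source the file from a non-interactive bash and show the resulting PATH.
output=$(bash -c '. "$1"; echo "PATH=$PATH"' _ "$rc")
printf '%s\n' "$output"
rm -f "$rc"
```

The line before the guard runs and the export takes effect, while nothing after the guard is reached. That matches the behaviour reported in the comments, where an echo at the top of .bashrc was printed and one at the bottom was not.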


-2

Try editing the file

/etc/environment

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/hadoop/openmpi_install/bin"
LD_LIBRARY_PATH=/home/hadoop/openmpi_install/lib
