I am trying to run the following example MPI code, which launches 20 threads and keeps them busy for a while. However, when I check the CPU utilization with a tool like nmon or top, I see that only a single hardware thread is being used.

#include <cstdlib>   // exit
#include <iostream>
#include <thread>
#include <mpi.h>
using namespace std;

int main(int argc, char *argv[]) {
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) // a higher level than requested is also acceptable
        exit(1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    auto f = [](float x) {
        float result = 0;
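        // integer counter: a float counter stops increasing at 2^24 and would never reach x
        for (long i = 0; i < static_cast<long>(x); i++) { result += 10 * i + x; }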
        cout << "Result: " << result << endl;
    };

    thread threads[20];
    for (int i = 0; i < 20; ++i)
        threads[i] = thread(f, 100000000.f); // do some work
    for (auto& th : threads) 
        th.join();

    MPI_Finalize();
    return 0;
}

I compile this code using mpicxx: mpicxx -std=c++11 -pthread example.cpp -o example and run it using mpirun: mpirun -np 1 example.

I am using Open MPI version 4.1.4 that is compiled with posix thread support (following the explanation from this question).

$ mpicxx --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
$ mpirun --version
mpirun (Open MPI) 4.1.4
$ ompi_info | grep -i thread
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
   FT Checkpoint support: no (checkpoint thread: no)
$ mpicxx -std=c++11 -pthread example.cpp -o example
$ ./example

My CPU has 10 cores and 20 threads and runs the example code above without MPI on all 20 threads. So, why does the code with MPI not run on all threads?

I suspect I might need to do something with MPI bindings, which I see being mentioned in some answers on the same topic (1, 2), but other answers entirely exclude these options, so I'm unsure whether this is the correct approach.

    I googled mpirun and it said this: "If your application uses threads, then you probably want to ensure that you are either not bound at all (by specifying --bind-to none), or bound to multiple cores using an appropriate binding level or specific number of processing elements per application process." Commented Jan 16, 2023 at 19:13

1 Answer

mpirun -np 1 ./example binds your program to a single core, so the 20 threads end up time-sharing it. This is the default behavior for Open MPI: one core per MPI process when running with -np 1 or -np 2.

./example (i.e. singleton mode) should use all the available cores, unless you are already running on a subset.

If you want to use all the available cores with mpirun, you can disable binding:

mpirun --bind-to none -np 1 ./example
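
To check which case you are in, you can print the CPU affinity mask the process actually received. The following is a minimal sketch, assuming Linux with glibc (sched_getaffinity is not portable); the helper names allowed_cpu_count and report_affinity are made up for illustration and are not part of MPI or Open MPI.

```cpp
#include <sched.h>    // sched_getaffinity, cpu_set_t, CPU_ZERO, CPU_COUNT (Linux/glibc)
#include <unistd.h>   // sysconf
#include <iostream>

// Hypothetical helper: number of logical CPUs this process is allowed to run on,
// or -1 if the affinity mask could not be queried.
int allowed_cpu_count() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)  // pid 0 = calling thread
        return -1;
    return CPU_COUNT(&mask);
}

// Hypothetical helper: print the allowed CPU count next to the number of online CPUs.
void report_affinity() {
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    std::cout << "Allowed CPUs: " << allowed_cpu_count()
              << " of " << online << " online" << std::endl;
}
```

Calling report_affinity() at the start of main, before the threads are created, and comparing mpirun -np 1 ./example with mpirun --bind-to none -np 1 ./example should show the difference: with the default binding the allowed count is expected to cover only one core's hardware threads, while without binding it should match the online count. Threads created afterwards inherit this mask. Open MPI's mpirun also accepts --report-bindings, which prints the binding it applied to each process.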