MPI does not run with requested number of threads

Question

I am trying to run the following example MPI code that launches 20 threads and keeps those threads busy for a while. However, when I check the CPU utilization using a tool like nmon or top I see that only a single thread is being used.

#include <cstdlib>   // exit
#include <iostream>
#include <thread>
#include <mpi.h>
using namespace std;

int main(int argc, char *argv[]) {
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided != MPI_THREAD_FUNNELED)
        exit(1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    auto f = [](float x) {
        float result = 0;
        for (float i = 0; i < x; i++) { result += 10 * i + x; }
        cout << "Result: " << result << endl;
    };

    thread threads[20];
    for (int i = 0; i < 20; ++i)
        threads[i] = thread(f, 100000000.f); // do some work
    for (auto& th : threads) 
        th.join();

    MPI_Finalize();
    return 0;
}

I compile this code using mpicxx: mpicxx -std=c++11 -pthread example.cpp -o example and run it using mpirun: mpirun -np 1 example.

I am using Open MPI version 4.1.4 that is compiled with posix thread support (following the explanation from this question).

$ mpicxx --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
$ mpirun --version
mpirun (Open MPI) 4.1.4
$ ompi_info | grep -i thread
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
   FT Checkpoint support: no (checkpoint thread: no)
$ mpicxx -std=c++11 -pthread example.cpp -o example
$ ./example

My CPU has 10 cores and 20 threads and runs the example code above without MPI on all 20 threads. So, why does the code with MPI not run on all threads?

I suspect I might need to do something with MPI bindings, which I see being mentioned in some answers on the same topic (1, 2), but other answers entirely exclude these options, so I'm unsure whether this is the correct approach.

I googled mpirun and it said this: "If your application uses threads, then you probably want to ensure that you are either not bound at all (by specifying --bind-to none), or bound to multiple cores using an appropriate binding level or specific number of processing elements per application process."
– Stack Exchange Broke The Law, Commented Jan 16, 2023 at 19:13

Gilles Gouaillardet · Accepted Answer · 2023-01-16 23:52:41Z


mpirun -np 1 ./example binds your program to a single core, so the 20 threads end up time-sharing it. This is the default behavior for Open MPI: each MPI process is bound to one core when running with -np 1 or -np 2.

./example (i.e. singleton mode) should use all the available cores, unless you are already running on a subset.

If you want to use all the available cores with mpirun, you can disable binding:

mpirun --bind-to none -np 1 ./example
