I am debugging a code base. Let me say up front that it is not my code and I do not have permission to share a single line of it. I know we all like to see code, but that's my situation.

It uses OpenMP. First I went through it just trying to get it to work in straight serial mode, i.e. not compiling for OpenMP. Then I moved to OpenMP and saw that the results changed with the thread count (OMP_NUM_THREADS). So it's obviously not thread safe somewhere.

But there's a much stranger problem. Through a couple of mishaps, I noticed that even when I went back to serial mode (i.e. not compiling for OpenMP), changing the value of OMP_NUM_THREADS still changed the code's results.

I finally went through the code and removed every single reference to OpenMP. Then I stripped the compile and link commands down to what was absolutely necessary, taking out all OpenMP options and all threading libraries. It didn't matter. This code is as serial as I can get it, and setting OMP_NUM_THREADS to one value or another before running still changes the output. How is this possible?

The one thing I can think of is that the code is linked against Intel MKL and uses some of its matrix routines. Is it possible that Intel MKL is itself built with OpenMP and will use it whether you want it to or not?

Any other ideas are most welcome.
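
The experiment described above can be scripted so that it is repeatable. The following is a minimal Python sketch, not part of the original code: the helper names (`run_with_env`, `sweep_thread_counts`, `output_depends_on_threads`) are hypothetical, and it assumes the program writes its results to stdout. It runs the same command under several OMP_NUM_THREADS values and reports whether the output differs.

```python
import os
import subprocess


def run_with_env(cmd, overrides):
    """Run cmd with the given environment overrides and return its stdout."""
    env = os.environ.copy()
    # Drop both variables first so every run starts from a known state.
    env.pop("OMP_NUM_THREADS", None)
    env.pop("MKL_NUM_THREADS", None)
    env.update(overrides)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout


def sweep_thread_counts(cmd, counts=(1, 2, 4), extra_env=None):
    """Map each OMP_NUM_THREADS value to the stdout the program produced."""
    outputs = {}
    for n in counts:
        overrides = {"OMP_NUM_THREADS": str(n)}
        overrides.update(extra_env or {})
        outputs[n] = run_with_env(cmd, overrides)
    return outputs


def output_depends_on_threads(outputs):
    """Return True if at least two runs produced different stdout."""
    return len(set(outputs.values())) > 1
```

For example, with a hypothetical binary `./solver`, comparing `sweep_thread_counts(["./solver"])` against `sweep_thread_counts(["./solver"], extra_env={"MKL_NUM_THREADS": "1"})` would show whether pinning MKL to one thread makes the difference go away.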

  • You can configure MKL to use a single thread (AFAIK there is a sequential version you can link against). You can also check whether an OpenMP runtime is linked (using tools like ldd on Linux). You can also set OMP_DISPLAY_ENV=TRUE, whose output AFAIK is only printed if an OpenMP runtime is initialized. Another option is to put a breakpoint on the entry points of the common OpenMP runtimes (generally GOMP and IOMP). OMPT and OMPD may also help track down issues like this. Commented Sep 22, 2023 at 0:48
  • Alternatively, you can set MKL_NUM_THREADS=1, which normally forces MKL to use a single thread, whatever the value of OMP_NUM_THREADS. Commented Sep 22, 2023 at 12:48
  • I think you should also consider that OpenMP might not be the (only) cause. Maybe there are other random effects that cause these differences. Commented Sep 22, 2023 at 19:20
  • @csoeger I agree, just don't know what they might be. Commented Sep 23, 2023 at 0:58
  • @paleonix Yes. It's deterministic. At least there's that! Commented Sep 28, 2023 at 0:31
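
The ldd check suggested in the comments can be sketched as follows. This is an illustrative Python helper, not something from the original thread: the function names are hypothetical, and the marker list is an assumption covering the commonly seen OpenMP runtimes (Intel's libiomp5, GNU's libgomp, LLVM's libomp) and MKL's two OpenMP threading layers. It is not exhaustive.

```python
import os
import subprocess

# Library-name stems that indicate an OpenMP runtime or an OpenMP-threaded
# MKL layer. This list is an assumption and is not exhaustive.
OPENMP_MARKERS = (
    "libiomp5",
    "libgomp",
    "libomp",
    "libmkl_intel_thread",
    "libmkl_gnu_thread",
)


def find_openmp_libs(ldd_output):
    """Return library names from ldd output that match a known marker."""
    hits = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first token is the library name (or a full path for the loader).
        name = os.path.basename(parts[0])
        if any(name.startswith(marker + ".") for marker in OPENMP_MARKERS):
            hits.append(name)
    return hits


def ldd(path):
    """Run ldd on a binary (Linux only) and return its text output."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return result.stdout
```

An empty result from `find_openmp_libs(ldd("./solver"))` (with `./solver` again a hypothetical binary) is not conclusive: a statically linked runtime does not appear in ldd output, and a binary linked against `libmkl_rt` selects its threading layer at run time.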
