I am debugging some code. Let me say up front that it is not my code and I do not have permission to share a single line of it. I know we all like to see code, but that's my situation.
It uses OpenMP. First I went through it, just trying to get it to work in straight serial mode, i.e. not compiling for OpenMP. Then I moved to OpenMP. I saw that the results changed with a change in thread count (OMP_NUM_THREADS). So, it's obviously not thread safe somewhere.
But there's a much stranger problem. Through a couple of mishaps, I noticed that even when I went back to serial mode (i.e. not compiling for OpenMP), setting the thread count to any value whatsoever actually changed the code's results.
I finally went through the code and removed absolutely every single reference to OpenMP whatsoever. Then I culled everything out of the compile and linking commands except what was absolutely necessary and I definitely pulled out all OpenMP options and threading libraries of all kinds. Didn't matter. This code is as serial as I can get it, and setting OMP_NUM_THREADS to some value or another before running still changes the output of the code! How is this possible???
The one thing I can think of is that the code is compiled to use the Intel MKL library, making use of some of the matrix routines. Is it possible that Intel MKL is compiled for OpenMP, and it is going to use it whether you want it or not?
Any other ideas are most welcome.
ldd on Linux). You can also use OMP_DISPLAY_ENV=TRUE, whose output AFAIK is only printed if an OpenMP runtime is initialized. Not to mention you can put a breakpoint on the entry points of the common OpenMP runtimes (generally GOMP and IOMP). OMPT and OMPD may also help to track down issues like this.

MKL_NUM_THREADS=1, which normally forces MKL to use a single thread, whatever the value of OMP_NUM_THREADS.
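
The ldd check and the environment-variable experiment mentioned above can be scripted. What follows is only a rough sketch in Python, not a definitive tool. The helper names (find_threading_libs, run_with_env) and the list of library-name hints are my own assumptions for illustration, and the list is not exhaustive. Note also that a statically linked runtime will not show up in ldd output at all.

```python
import os
import subprocess
import sys

# Shared-library name stems that suggest a threading runtime is linked in.
# Assumption for illustration: typical names for the GNU, Intel and LLVM
# OpenMP runtimes and for MKL's threading layers. Not exhaustive.
THREADING_HINTS = (
    "libgomp",              # GNU OpenMP runtime
    "libiomp5",             # Intel OpenMP runtime
    "libomp",               # LLVM OpenMP runtime
    "libmkl_intel_thread",  # MKL threading layer built on Intel OpenMP
    "libmkl_gnu_thread",    # MKL threading layer built on GNU OpenMP
    "libmkl_tbb_thread",    # MKL threading layer built on TBB
    "libmkl_rt",            # MKL single dynamic library (layer chosen at run time)
)


def find_threading_libs(ldd_output):
    """Return the library names in `ldd` output that match a known hint."""
    found = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first token of each ldd line is the library name (or its path).
        base = os.path.basename(parts[0])
        if any(base.startswith(hint + ".") for hint in THREADING_HINTS):
            found.append(base)
    return found


def run_with_env(cmd, overrides):
    """Run `cmd` with extra environment variables and return its stdout."""
    env = dict(os.environ)
    env.update(overrides)
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout
```

To use it, feed the stdout of ldd for the executable (say, a hypothetical ./my_binary) to find_threading_libs, then compare run_with_env(["./my_binary"], {"OMP_NUM_THREADS": "1"}) against the same call with "4". If the outputs differ, repeat both runs with "MKL_NUM_THREADS": "1" added to the overrides. If the difference then disappears, a threaded MKL is the likely culprit.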