Is it possible that the -O2 optimization flag rearranges code in a way that makes a multi-threaded application behave in unintended ways?
As an example of what I mean by unintended behavior when code is rearranged: a variable that the programmer declared so that each thread gets its own copy is moved outside the #pragma omp parallel region, so that only a single copy is created and shared by all threads.
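
To make the scenario concrete, here is a minimal sketch of the kind of code I have in mind. The function name `count_iterations` and the variables `local` and `total` are hypothetical, chosen only for illustration. `local` is declared inside the parallel region, so each thread should get its own copy; the worry is whether an optimizer could ever hoist it out and turn it into one shared copy.

```c
#ifdef _OPENMP
#include <omp.h>   /* only needed when compiled with OpenMP enabled, e.g. -fopenmp */
#endif

/* Hypothetical example: every thread counts its share of the loop
 * iterations in a variable of its own, then adds it to the shared total. */
long count_iterations(long n)
{
    long total = 0;              /* declared outside the region: shared */

    #pragma omp parallel
    {
        long local = 0;          /* declared inside the region: one copy per thread */

        #pragma omp for
        for (long i = 0; i < n; ++i)
            local += 1;

        #pragma omp atomic       /* serialize updates to the shared total */
        total += local;
    }

    return total;
}
```

If `local` stays per-thread, `count_iterations(n)` returns `n` for any non-negative `n`, whatever the thread count. If it were silently shared instead, the unsynchronized `local += 1` would be a data race and the result could differ from `n`.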