I have DFT(density functional theory) code written in Fortran. Inside this program I use zheev function from lapack to diagonalize overlap matrix (S). I compile the code in two different targets (a) standalone program fireball.x and (b) dynamic library (libFireCore.so), without any differences in the code.
When I run the program fireball.x everything works fine.
When I call libFireCore.so from inside other program (written in C++) it produces incorect output which is non-deterministic - i.e. every time I run it I got different results.
I tracked down the problem, and it turns out the problem is in some zheev call where I diagonalize overlap matrix.
When I print the overlap matrix (S) which goes into zheev it is exactly the same (in fireball.x and libFireCore.so). But the output matrix (eigenvectors, zzzz) differ at the end. (see attached images)
The S matrix which goes in is the same:

The eigenvectros (zzzz) which goes out differs at the end:

At the same time the eigenvaues are exactly the same (and the values are not crazy small or crazy large): S-eigenvalues: 0.20320429171825077 0.23871180664043531 0.24800955043297762 0.25981763454707818 0.26625761411873500 0.29656130959541332 0.30406932675113513 0.31931361483751519 0.34177021310844230 0.36133697610296117 0.38357717267479940 0.57800206110211672 0.73150940647530815 0.73986354541136745 0.87968589806449793 1.0629166543759208 1.1745454410259122 1.1914134775739986 1.2071693499296059 1.2928315521982148 1.3497098021371781 1.4298908216838513 1.5846592240498847 1.7381615303550286 1.9139531841288968 1.9292327889682286 2.1528696122119046 2.2187686011872194 2.6022345485931204
How is this possible? It seems like some memory corruption inside zheev or how else I can explain the randomness of output (non-deterministic)?
I checked using MKL_VERBOS=1 that the version of MKL library is the same as well as the dimensions of all arrays (including the temporary work-arrays) is the same. So what can be wrong?
integer lwork, lrwork, liwork
complex, allocatable, dimension (:) :: work
real, allocatable, dimension (:) :: rwork
real, allocatable, dimension (:) :: slam
complex, allocatable, dimension (:, :) :: xxxx, zzzz
allocate ( slam(norbitals) )
allocate ( xxxx(norbitals,norbitals) )
allocate ( zzzz(norbitals,norbitals) )
lwork = 1
lrwork = 3*norbitals - 2
allocate (work(lwork))
allocate (rwork(lrwork))
write (*,*) "!!!! DEBUG sqrtS debug_writeMatFile(Sk_sqrtS.log) norbitals=",norbitals, ' lwork = ',lwork, ' lrwork = ',lrwork, ' liwork = ',liwork, ' divide = ',divide
call debug_writeMatFile_cmp( "Sk_sqrtS_", zzzz, norbitals, norbitals, 0 )
slam(:) = 0.0d0
work (:) = 0.0d0
rwork(:) = 0.0d0
! first find optimal working space size (lwork)
call zheev ('V', 'U', norbitals, zzzz, norbitals, slam, work, -1, rwork, info)
! resize working space
lwork = work(1) ! workspace query see: https://www.netlib.org/lapack/explore-html/df/d9a/group__complex16_h_eeigen_gaf23fb5b3ae38072ef4890ba43d5cfea2.html
deallocate (work)
allocate (work(lwork))
! diagonalize the overlap matrix with the new working space
slam(:) = 0.0d0
work (:) = 0.0d0
rwork(:) = 0.0d0
call zheev ('V', 'U', norbitals, zzzz, norbitals, slam, work, lwork, rwork , info)
if (info .ne. 0) call diag_error (info, 0)
write (*,*) "!!!! DEBUG sqrtS() debug_writeMatFile(zzzz_pre.log) norbitals=",norbitals, " norbitals_new= ", norbitals_new, "info ", info
call debug_writeMatFile_cmp( "zzzz_pre1_", zzzz, norbitals, norbitals, 0 )
EDIT
I found the problem/solution. It is about OpenMP. When I switch OpenMP off in compiler, or if I set export OMP_NUM_THREADS=1 it works just fine.
It seems similar to this problem: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Incorrect-eigenvectors-from-ZHEEV-in-MKL-using-multi-threading/td-p/836860
Still it would be nice if somebody can give me some advice why OpenMP cause problem here, and how to solve it. I would like to use OpenMP in my application.
Implicit None, haven't you?). What we really need is a minimal, reproducible example. The above is not enough.export OMP_NUM_THREADS=10vsexport OMP_NUM_THREADS=1. But that is something I could not know beforehand.