0

I have DFT(density functional theory) code written in Fortran. Inside this program I use zheev function from lapack to diagonalize overlap matrix (S). I compile the code in two different targets (a) standalone program fireball.x and (b) dynamic library (libFireCore.so), without any differences in the code.

When I run the program fireball.x everything works fine. When I call libFireCore.so from inside other program (written in C++) it produces incorect output which is non-deterministic - i.e. every time I run it I got different results.

I tracked down the problem, and it turns out the problem is in some zheev call where I diagonalize overlap matrix.

When I print the overlap matrix (S) which goes into zheev it is exactly the same (in fireball.x and libFireCore.so). But the output matrix (eigenvectors, zzzz) differ at the end. (see attached images)

The S matrix which goes in is the same: enter image description here

The eigenvectros (zzzz) which goes out differs at the end: enter image description here

At the same time the eigenvaues are exactly the same (and the values are not crazy small or crazy large): S-eigenvalues: 0.20320429171825077 0.23871180664043531 0.24800955043297762 0.25981763454707818 0.26625761411873500 0.29656130959541332 0.30406932675113513 0.31931361483751519 0.34177021310844230 0.36133697610296117 0.38357717267479940 0.57800206110211672 0.73150940647530815 0.73986354541136745 0.87968589806449793 1.0629166543759208 1.1745454410259122 1.1914134775739986 1.2071693499296059 1.2928315521982148 1.3497098021371781 1.4298908216838513 1.5846592240498847 1.7381615303550286 1.9139531841288968 1.9292327889682286 2.1528696122119046 2.2187686011872194 2.6022345485931204

How is this possible? It seems like some memory corruption inside zheev or how else I can explain the randomness of output (non-deterministic)?

I checked using MKL_VERBOS=1 that the version of MKL library is the same as well as the dimensions of all arrays (including the temporary work-arrays) is the same. So what can be wrong?

    integer lwork, lrwork, liwork
    complex, allocatable, dimension (:) :: work
    real,    allocatable, dimension (:) :: rwork
    real,    allocatable, dimension (:) :: slam
    complex, allocatable, dimension (:, :)    :: xxxx, zzzz

    allocate ( slam(norbitals) )
    allocate ( xxxx(norbitals,norbitals) )
    allocate ( zzzz(norbitals,norbitals) )

    lwork  = 1
    lrwork = 3*norbitals - 2
    allocate (work(lwork))
    allocate (rwork(lrwork))

    write (*,*) "!!!! DEBUG sqrtS debug_writeMatFile(Sk_sqrtS.log) norbitals=",norbitals, ' lwork = ',lwork, ' lrwork = ',lrwork, ' liwork = ',liwork, ' divide = ',divide
    call debug_writeMatFile_cmp( "Sk_sqrtS_", zzzz, norbitals, norbitals, 0 )

        slam(:)  = 0.0d0
        work (:) = 0.0d0
        rwork(:) = 0.0d0

! first find optimal working space size (lwork)
        call zheev ('V', 'U', norbitals, zzzz, norbitals, slam, work,   -1, rwork, info)
! resize working space
        lwork = work(1)    !  workspace query see: https://www.netlib.org/lapack/explore-html/df/d9a/group__complex16_h_eeigen_gaf23fb5b3ae38072ef4890ba43d5cfea2.html

        deallocate (work)
        allocate (work(lwork))
! diagonalize the overlap matrix with the new working space

        slam(:)  = 0.0d0
        work (:) = 0.0d0
        rwork(:) = 0.0d0
        call zheev ('V', 'U', norbitals, zzzz, norbitals, slam, work,  lwork, rwork , info)

    if (info .ne. 0) call diag_error (info, 0)

    write (*,*) "!!!! DEBUG sqrtS() debug_writeMatFile(zzzz_pre.log) norbitals=",norbitals, " norbitals_new= ", norbitals_new, "info ", info
    call debug_writeMatFile_cmp( "zzzz_pre1_", zzzz, norbitals, norbitals, 0 )  

EDIT

I found the problem/solution. It is about OpenMP. When I switch OpenMP off in compiler, or if I set export OMP_NUM_THREADS=1 it works just fine.

It seems similar to this problem: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Incorrect-eigenvectors-from-ZHEEV-in-MKL-using-multi-threading/td-p/836860

Still it would be nice if somebody can give me some advice why OpenMP cause problem here, and how to solve it. I would like to use OpenMP in my application.

9
  • 3
    zheev is for (in old speak) double complex. You are using default kind complex. They are not the same thing. You should probably be calling cheev. This may be changed if you are using compiler flags to change the default real kind. If you are, don't unless you want to live a life full of pain. Commented Sep 25, 2023 at 13:55
  • 3
    To say anything more is a guess (e.g. You have got Implicit None, haven't you?). What we really need is a minimal, reproducible example. The above is not enough. Commented Sep 25, 2023 at 13:57
  • 2
    Note also complex eigenvectors have an arbitrary phase. You can get apparently total different evecs which are actually both totally valid solutions to the problem. And there is no guarantee that a eigenproblem solver will give the same variant, and note this is not just for degenerate cases. Simply comparing evecs is not enough, you have to check the Eval equation is satisfied if you are checking for correctness Commented Sep 25, 2023 at 17:33
  • @IanBush yes, I have set default real to double in compiler flags. Commented Sep 27, 2023 at 7:24
  • @IanBush clearly I cannot provide minimal, reproducible example because the problem is clearly context dependent - the same source code when run as standalone executable was working fine, while when called as a function from library was not working fine. Now I found that the difference in context was the export OMP_NUM_THREADS=10 vs export OMP_NUM_THREADS=1. But that is something I could not know beforehand. Commented Sep 27, 2023 at 7:27

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.