zheev outputs incorrect and random results when called from within dynamic library

Ask Question

Asked 2 years, 2 months ago

Modified 2 years, 2 months ago

Viewed 93 times

I have DFT(density functional theory) code written in Fortran. Inside this program I use zheev function from lapack to diagonalize overlap matrix (S). I compile the code in two different targets (a) standalone program fireball.x and (b) dynamic library (libFireCore.so), without any differences in the code.

When I run the program fireball.x everything works fine. When I call libFireCore.so from inside other program (written in C++) it produces incorect output which is non-deterministic - i.e. every time I run it I got different results.

I tracked down the problem, and it turns out the problem is in some zheev call where I diagonalize overlap matrix.

When I print the overlap matrix (S) which goes into zheev it is exactly the same (in fireball.x and libFireCore.so). But the output matrix (eigenvectors, zzzz) differ at the end. (see attached images)

The S matrix which goes in is the same:

The eigenvectros (zzzz) which goes out differs at the end:

At the same time the eigenvaues are exactly the same (and the values are not crazy small or crazy large): S-eigenvalues: 0.20320429171825077 0.23871180664043531 0.24800955043297762 0.25981763454707818 0.26625761411873500 0.29656130959541332 0.30406932675113513 0.31931361483751519 0.34177021310844230 0.36133697610296117 0.38357717267479940 0.57800206110211672 0.73150940647530815 0.73986354541136745 0.87968589806449793 1.0629166543759208 1.1745454410259122 1.1914134775739986 1.2071693499296059 1.2928315521982148 1.3497098021371781 1.4298908216838513 1.5846592240498847 1.7381615303550286 1.9139531841288968 1.9292327889682286 2.1528696122119046 2.2187686011872194 2.6022345485931204

How is this possible? It seems like some memory corruption inside zheev or how else I can explain the randomness of output (non-deterministic)?

I checked using MKL_VERBOS=1 that the version of MKL library is the same as well as the dimensions of all arrays (including the temporary work-arrays) is the same. So what can be wrong?

    integer lwork, lrwork, liwork
    complex, allocatable, dimension (:) :: work
    real,    allocatable, dimension (:) :: rwork
    real,    allocatable, dimension (:) :: slam
    complex, allocatable, dimension (:, :)    :: xxxx, zzzz

    allocate ( slam(norbitals) )
    allocate ( xxxx(norbitals,norbitals) )
    allocate ( zzzz(norbitals,norbitals) )

    lwork  = 1
    lrwork = 3*norbitals - 2
    allocate (work(lwork))
    allocate (rwork(lrwork))

    write (*,*) "!!!! DEBUG sqrtS debug_writeMatFile(Sk_sqrtS.log) norbitals=",norbitals, ' lwork = ',lwork, ' lrwork = ',lrwork, ' liwork = ',liwork, ' divide = ',divide
    call debug_writeMatFile_cmp( "Sk_sqrtS_", zzzz, norbitals, norbitals, 0 )

        slam(:)  = 0.0d0
        work (:) = 0.0d0
        rwork(:) = 0.0d0

! first find optimal working space size (lwork)
        call zheev ('V', 'U', norbitals, zzzz, norbitals, slam, work,   -1, rwork, info)
! resize working space
        lwork = work(1)    !  workspace query see: https://www.netlib.org/lapack/explore-html/df/d9a/group__complex16_h_eeigen_gaf23fb5b3ae38072ef4890ba43d5cfea2.html

        deallocate (work)
        allocate (work(lwork))
! diagonalize the overlap matrix with the new working space

        slam(:)  = 0.0d0
        work (:) = 0.0d0
        rwork(:) = 0.0d0
        call zheev ('V', 'U', norbitals, zzzz, norbitals, slam, work,  lwork, rwork , info)

    if (info .ne. 0) call diag_error (info, 0)

    write (*,*) "!!!! DEBUG sqrtS() debug_writeMatFile(zzzz_pre.log) norbitals=",norbitals, " norbitals_new= ", norbitals_new, "info ", info
    call debug_writeMatFile_cmp( "zzzz_pre1_", zzzz, norbitals, norbitals, 0 )

EDIT

I found the problem/solution. It is about OpenMP. When I switch OpenMP off in compiler, or if I set export OMP_NUM_THREADS=1 it works just fine.

Still it would be nice if somebody can give me some advice why OpenMP cause problem here, and how to solve it. I would like to use OpenMP in my application.

edited Sep 27, 2023 at 7:35

asked Sep 25, 2023 at 12:55

Prokop Hapala

2,4743 gold badges34 silver badges67 bronze badges

3

zheev is for (in old speak) double complex. You are using default kind complex. They are not the same thing. You should probably be calling cheev. This may be changed if you are using compiler flags to change the default real kind. If you are, don't unless you want to live a life full of pain.

Ian Bush
– Ian Bush

2023-09-25 13:55:30 +00:00
Commented Sep 25, 2023 at 13:55
3

To say anything more is a guess (e.g. You have got Implicit None, haven't you?). What we really need is a minimal, reproducible example. The above is not enough.

Ian Bush
– Ian Bush

2023-09-25 13:57:22 +00:00
Commented Sep 25, 2023 at 13:57
2

Note also complex eigenvectors have an arbitrary phase. You can get apparently total different evecs which are actually both totally valid solutions to the problem. And there is no guarantee that a eigenproblem solver will give the same variant, and note this is not just for degenerate cases. Simply comparing evecs is not enough, you have to check the Eval equation is satisfied if you are checking for correctness

Ian Bush
– Ian Bush

2023-09-25 17:33:30 +00:00
Commented Sep 25, 2023 at 17:33
@IanBush yes, I have set default real to double in compiler flags.

Prokop Hapala
– Prokop Hapala

2023-09-27 07:24:36 +00:00
Commented Sep 27, 2023 at 7:24
@IanBush clearly I cannot provide minimal, reproducible example because the problem is clearly context dependent - the same source code when run as standalone executable was working fine, while when called as a function from library was not working fine. Now I found that the difference in context was the export OMP_NUM_THREADS=10 vs export OMP_NUM_THREADS=1. But that is something I could not know beforehand.

Prokop Hapala
– Prokop Hapala

2023-09-27 07:27:08 +00:00
Commented Sep 27, 2023 at 7:27

| Show 4 more comments

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Collectives™ on Stack Overflow

zheev outputs incorrect and random results when called from within dynamic library

0

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest