
I am using the Intel MKL library to solve a system of linear equations (A*x = b) with multiple right-hand-side (rhs) vectors. The rhs vectors are generated asynchronously by a separate routine, so it is not possible to solve for all of them at once.

To speed up the program, I use multiple threads, each responsible for solving a single rhs vector. Since the matrix A is constant, the LU factorization should be performed once and the factors reused in all threads. So I factor A with the following command

dss_factor_real(handle, opt, data);

and pass the handle to the threads, which solve their problems with the following command:

dss_solve_real(handle, opt, rhs, nRhs, sol);

However, I found that it is not thread-safe to use the same handle in several concurrent calls to dss_solve_real. Apparently, MKL modifies the handle during each call, which creates a race condition. I read the MKL manual but could not find anything relevant. Since it makes no sense to factorize A separately for each thread, I am wondering whether there is a way to overcome this problem and use the same handle everywhere.

Thanks in advance for your help

1 Answer

As far as I understand the DSS interface, the handle contains not only the LU factorization but also other data structures that are used and modified in dss_solve_real. This is by design, so you should use a locking mechanism to prevent multiple threads from calling dss_solve_real concurrently on the same handle.

Moreover, your assumption that dss_solve_real is serial (otherwise I do not understand why you would call multiple instances of it concurrently) is probably wrong. DSS is an interface to the PARDISO solver, which should be parallel in all of its phases, not only the factorization.
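A minimal sketch of such a lock, using POSIX threads. MKL is not called here: shared_handle, unsafe_solve and locked_solve are hypothetical names, and unsafe_solve is only a stand-in for dss_solve_real that solves a diagonal system while overwriting scratch space stored in the handle, which is the kind of shared state that would cause the race.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N 3  /* system size used by this toy example */

/* Hypothetical stand-in for a DSS handle: the "factors" plus internal
   scratch space that every solve overwrites. Not part of MKL. */
typedef struct {
    double diag[N];        /* "factorization" of a diagonal matrix A */
    double scratch[N];     /* shared workspace mutated by each solve */
    pthread_mutex_t lock;  /* serializes solves on this handle */
} shared_handle;

static void handle_init(shared_handle *h, const double *diag) {
    for (size_t i = 0; i < N; ++i) {
        assert(diag[i] != 0.0);  /* A must be nonsingular */
        h->diag[i] = diag[i];
    }
    pthread_mutex_init(&h->lock, NULL);
}

/* Stand-in for dss_solve_real: it writes to h->scratch, so two
   concurrent calls on the same handle would corrupt each other. */
static void unsafe_solve(shared_handle *h, const double *rhs, double *sol) {
    for (size_t i = 0; i < N; ++i) h->scratch[i] = rhs[i] / h->diag[i];
    for (size_t i = 0; i < N; ++i) sol[i] = h->scratch[i];
}

/* Only one thread at a time may run a solve on a given handle. */
static void locked_solve(shared_handle *h, const double *rhs, double *sol) {
    pthread_mutex_lock(&h->lock);
    unsafe_solve(h, rhs, sol);
    pthread_mutex_unlock(&h->lock);
}
```

Each worker thread would call locked_solve instead of the raw solve. Note that this makes the solves on one handle strictly sequential, so the worker threads gain nothing for the solve step itself.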

Edit

If you abandon the DSS interface and call pardiso directly, it should be possible to have many threads each solving a single rhs serially. (Not easy, but with careful programming it should be possible...)

However, from the point of view of maximum throughput (rhs solved per unit of time) rather than minimum latency (time before the solution of a single rhs is started), I think the best approach is to have a single worker thread that solves all rhs waiting in the queue with a single call to the parallel solver. Of course, the queue should be organized so that the rhs vectors are stored in a contiguous memory area.
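A rough sketch of such a queue, again without MKL. The names rhs_batch, batch_push, batch_drain_and_solve and solve_many are hypothetical, and solve_many is a diagonal-system stand-in for a multi-rhs call such as dss_solve_real(handle, opt, rhs, nRhs, sol). The sketch assumes the solver expects the rhs vectors stored one after another, N entries each.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define N 2        /* system size used by this toy example */
#define MAX_RHS 8  /* capacity of the pending batch */

/* Pending rhs vectors, stored back to back in one contiguous array. */
typedef struct {
    double rhs[N * MAX_RHS];
    int count;
    pthread_mutex_t lock;
} rhs_batch;

static void batch_init(rhs_batch *b) {
    b->count = 0;
    pthread_mutex_init(&b->lock, NULL);
}

/* Producer side: append one rhs. Returns 0 on success, -1 if full. */
static int batch_push(rhs_batch *b, const double *rhs) {
    int rc = -1;
    pthread_mutex_lock(&b->lock);
    if (b->count < MAX_RHS) {
        memcpy(&b->rhs[(size_t)N * b->count], rhs, N * sizeof(double));
        b->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&b->lock);
    return rc;
}

/* Stand-in for one multi-rhs solver call (diagonal A, not MKL). */
static void solve_many(const double *diag, const double *rhs,
                       int nrhs, double *sol) {
    for (int k = 0; k < nrhs; ++k)
        for (int i = 0; i < N; ++i) {
            assert(diag[i] != 0.0);
            sol[k * N + i] = rhs[k * N + i] / diag[i];
        }
}

/* Single worker: take everything queued so far and solve it with one
   call. Returns the number of rhs vectors solved. */
static int batch_drain_and_solve(rhs_batch *b, const double *diag,
                                 double *sol) {
    double local[N * MAX_RHS];
    pthread_mutex_lock(&b->lock);
    int n = b->count;
    memcpy(local, b->rhs, (size_t)N * n * sizeof(double));
    b->count = 0;
    pthread_mutex_unlock(&b->lock);
    if (n > 0) solve_many(diag, local, n, sol);  /* solve outside the lock */
    return n;
}
```

The lock is held only while copying, so producers are not blocked for the duration of the solve, and only the single worker ever touches the solver.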


2 Comments

Thanks for your point. I do not make any assumption (serial or parallel). In fact, the rhs vectors arrive asynchronously in a queue. Whenever an rhs vector is ready, a free worker (thread) is assigned to solve the problem for that specific rhs vector. I am now confident that dss_solve_real does change the handle. If I have to lock the handle, using threads has no justification.
See my edit: I still think that having many threads solving rhs is not efficient, even if you can force a serial solving phase.
