Complex loop in a C++ program portable to OpenMP and MPI?

Question

I have a C++ number crunching program. The structure is:

a) data input, data preparation

b) "big" loop, uses global and local data (lots of different variables in both cases)

c) postprocess results and write data

The most intensive part is "b", which is basically a loop. I need to speed up the program on a cluster: 25 blades, 4 cores each. I wonder whether I could use OpenMP and MPI here, or whether you can point me to tutorials that cover complex, "big" for loops rather than the general case.

Thanks

We can't help optimize your code unless you post the relevant code sample.
– Ed Swangren, Commented Jan 27, 2011 at 19:46
This depends a lot on what your loop actually does. For instance, could each iteration be executed independently from the other ones (and still give the correct result)? What sort of data dependencies are there? More info would help us give a good answer.
– suszterpatt, Commented Jan 27, 2011 at 19:51
25 separate blades would require MPI. If I remember correctly, OpenMP is for shared-memory applications. As for your global data, would it change during the loop? If so, you would need to push the changes to all the nodes so that they stay current.
– Will, Commented Jan 27, 2011 at 20:00

Vitor Py · Accepted Answer · 2011-01-27 20:03:03Z

1

Actually, you should use both.

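Use MPI to distribute tasks between blades and OpenMP to fully utilize the cores of each blade. Take some time to understand how memory and data sharing work in each case: MPI processes have separate memory and exchange data through explicit messages, while OpenMP threads within one process share the same memory.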
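As a rough illustration of that two-level split, here is a minimal sketch, assuming the loop iterations are independent and the result is a simple sum. `heavy_work`, `process_slice` and `run_all_ranks` are hypothetical names, and the MPI layer is only simulated by a serial loop so the sketch compiles without an MPI installation. In a real hybrid program each rank would get its index from `MPI_Comm_rank` and `MPI_Comm_size`, and the partial results would be combined with a collective such as `MPI_Reduce`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-iteration work; stands in for the body of the "big" loop.
inline double heavy_work(double x) {
    return x * x + 1.0;
}

// Processes iterations [begin, end) on one blade. In a hybrid program each MPI
// rank (for example one per blade) would call this with its own slice.
// The pragma spreads the slice over the blade's cores when OpenMP is enabled
// (for example with -fopenmp); otherwise it is ignored and the loop runs
// serially, giving the same sum up to floating-point rounding.
double process_slice(const std::vector<double>& input,
                     std::size_t begin, std::size_t end) {
    double partial = 0.0;
    const long long first = static_cast<long long>(begin);
    const long long last = static_cast<long long>(end);
    #pragma omp parallel for reduction(+ : partial)
    for (long long i = first; i < last; ++i) {
        partial += heavy_work(input[static_cast<std::size_t>(i)]);
    }
    return partial;
}

// Serial stand-in for the MPI layer, for illustration only: visits the ranks
// one after another. In a real run there is no such loop; every rank executes
// process_slice on its own slice at the same time, and the partial sums are
// combined with a collective such as MPI_Reduce.
double run_all_ranks(const std::vector<double>& input, int ranks) {
    const std::size_t n = input.size();
    double total = 0.0;
    for (int r = 0; r < ranks; ++r) {
        const std::size_t begin = n * r / ranks;        // first index of rank r
        const std::size_t end = n * (r + 1) / ranks;    // one past its last index
        total += process_slice(input, begin, end);
    }
    return total;
}
```

Whether this pays off depends on the loop: the shared accumulator works here only because it is declared in a `reduction` clause, and any other global data that the loop body writes to would need similar care.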


Deck · Accepted Answer · 2011-01-27 20:20:01Z

0

You cannot divide your task between blades using OpenMP alone. Try to split your loop into several parts and distribute the work across them. For example, if you want to combine two vectors of size N, one half (N/2 elements) would be processed on one node and the other half on another.

But the communication cost between blades is significant. So if your task is not actually that large, it may be better to distribute it only across the 4 cores of a single blade.
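The vector split described above can be sketched as follows. This is only an illustration, assuming "combining" means element-wise addition; `chunk_bounds` and `add_chunk` are hypothetical helper names, and the node index is passed in as a plain parameter instead of coming from MPI.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns the half-open range [begin, end) owned by `node` when n elements are
// split over `nodes` nodes. The first n % nodes nodes get one extra element,
// so chunk sizes differ by at most one.
std::pair<std::size_t, std::size_t> chunk_bounds(std::size_t n,
                                                 std::size_t nodes,
                                                 std::size_t node) {
    const std::size_t base = n / nodes;
    const std::size_t extra = n % nodes;
    const std::size_t begin = node * base + (node < extra ? node : extra);
    const std::size_t size = base + (node < extra ? 1 : 0);
    return {begin, begin + size};
}

// Adds the chunk of a and b owned by `node` into out. Each node touches only
// its own indices, so the chunks can be computed independently.
void add_chunk(const std::vector<double>& a, const std::vector<double>& b,
               std::vector<double>& out, std::size_t nodes, std::size_t node) {
    const auto range = chunk_bounds(a.size(), nodes, node);
    for (std::size_t i = range.first; i < range.second; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

With N = 10 and 2 nodes this gives the ranges [0, 5) and [5, 10). On a real cluster the chunks would also have to be sent to the nodes and the results collected again (for instance with `MPI_Scatterv` and `MPI_Gatherv`), and that transfer is where the communication cost comes from.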
