Lock strategy of OpenMP programming in C++ for nested loops

Question

Yesterday I tested my program of landscape evolution with a big data set(20 Million nodes), without a doubt the running speed was unacceptable. During debugging I noticed that it was a certain function which slowed the whole system down. So I'd like to add multithreading process to it.

However, the function itself is a nested loop with pointer iterators and I believe some of the data has to be locked during the process.

Basically what I was trying to do is to calculate the contributing area of each node. Contributing area, as suggested by its name, is a product (sum) of of all area from upstream node.

Here is the code of that two functions,

for( curnode = nodIter.FirstP(); nodIter.IsActive();
    curnode = nodIter.NextP() )  //iterating thru a linked-list of pointers to active stream nodes (*IsActive* returns a  bool value)
{
  CalcDRArea( curnode, curnode->getVArea() ); //calls another function and pass a pointer to current node  as well as a value of its VArea
}

void CalcDRArea( NodeClass *curnode, double addedArea )
{
  // As long as the current node is neither a boundary nor non-valid, add
  // _addedArea_ to its total drainage area and advance to the next node downstream

  while( (curnode->ReachBoundary() == NonBoundary) &&
        (curnode->valid()!=Nonvalid) )
  {
       curnode->AddDrArea( addedArea );  //*AddDrArea* is a simple inline function *`"drarea +=value"`*

    curnode = curnode->getDownStreammNeighbor();      // *getDownstrmNbr()* reruns a pointer to the downstream node
  }

}

Here is a simple illustration of nodes and their flowing direction

A     B     C     D  
   \  |  /        |
E     F     G     H
      |           |
I     Q     K     L
      |        / 
M     N     O     P
//letters are stream nodes and slashes are their flowing directions

My plan is to use OpenMP to implement multithreading at the beginning of first function, the for loop. Ideally it will create several threads to calculate each node separately.

However, as shown in figure above, a sequent process can handle streams like

A-> F -> Q -> N 
B-> F -> Q -> N 
C-> F -> Q -> N

easily, but it will definitely cause problem in a multithreading conditions.

From what I've just read from OpenMP's document, flush and lock might be the right way to do this, but still I am quite clueless right now and there might still be other potential issues within this loops (like gcc ver. of OpenMP doesn't support "!=").

====Update ====== There are two kinds of areas: vArea, which is the area of each node; and drArea which is the sum of the area of current node and area from all its upstream nodes. I was wondering if I can change the current function to this :

for( active node iterator）  
{
    if（currentNode.hasDownStreamNode）
    {
       downStreamNode.drArea += currentNode.vArea + currentNode.DrArea;
    }
    CurrentNode.drArea += currentNode.varea;
}

This seems very easily applicable to dynamic programming - using a better algorithm before trying to multithread something is usually not a bad idea. You're basically doing this in O(n^2) instead of O(n). — Voo
– Voo, Commented Dec 7, 2013 at 13:41
could you give me some advices on a better algorithm? I did realize it is a O(n*n) and don't know how to make it simpler. Thanks. — KuN
– KuN, Commented Dec 7, 2013 at 21:08
It depends on how exactly your algorithm works, because it's not clear from the given code, but the basic idea is that F has the value of A+B+C+F, while Q has the value A+B+C+F+Q and so on. So to compute the value of one node you only have to look at the already computed values of all its predecessors once instead of iterating through all nodes from each starting node. — Voo
– Voo, Commented Dec 7, 2013 at 22:38

Voo · Accepted Answer · 2013-12-08 15:48:23Z

Before worrying about parallelism, you should first pick a better algorithm. Although in this case one actually goes with the other.

You want a dynamic programming solution in O(N) instead of your current O(n^2) approach. The intuition is simple, just call the following method on each of the leaf nodes in your tree:

def compute_value(node):
     if node.DrArea != 0: return node.DrArea
     total = node.vArea
     for n in node.upstream_nodes():
          total += compute_value(n)
     node.DrArea = total
     return node.DrArea

To see why this is more efficient, let's take a look at your example. At the moment you add the value of A to F, Q and N. Then you do the same for B and for C. So you have 12 add operations. The recursive method on the other hand computes the value for A, B and C first, then we compute F from the already known values of A, B and C. Q gets computed from F and so on. So that's only 8 adds. Basically every node only adds its total value to all of its children instead of going through the whole subtree and adding only its own value.

For a simple sequential algorithm you can then go ahead and just implement this iteratively using a list of nodes whose predecessors have all been evaluated (so instead of starting from leaves start from the root nodes). The easiest parallel implementation here is to just use a concurrent queue and atomic add operations, although using one normal queue per processor and some work-stealing would probably be a very good idea in practice.

Collectives™ on Stack Overflow

Lock strategy of OpenMP programming in C++ for nested loops

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related