1

I have a recursive algorithm that I use to iterate over a hierarchical data structure, but unfortunately with some data, the hierarchical structure is so deep that I'm getting a StackOverflowError. I've seen this happen with a depth of about 150ish nodes, while the data could potentially grow to much further than that. For context, this code will run in limited environments and changing the JVM stack size is not an option, and the data structure is a given and represents different file systems with directories and files.

To work around the stack overflow, I've tried to convert the algorithm into an iterative one. It's not something I've had to do before, so I started from some examples showing how to do this with a simple recursion, but I'm not sure how to apply this to recursion inside a loop. I've found a way to do it that seems to work, but the code is rather insane.

Here is a simplified version of my original recursive method:

private CacheEntry sumUpAndCacheChildren(Node node) {
    final CacheEntry entry = getCacheEntry(node);

    if (entryIsValid(entry))
        return entry;

    Node[] children = node.listChildren();

    long size = 0;  

    if (children != null) {         
        for (Node child : children) {
            if (child.hasChildren()) {  
                size += sumUpAndCacheChildren(child).size;                  
            } else {                    
                size += child.size();
            }
        }                   
    }

    return putInCache(node, size);      
}

Each leaf node has a size, while the size for any ancestor node is considered to be the size of all of its descendants. I want to know this size for each node, so the size is aggregated and cached for every node.

Here is the iterative version:

private CacheEntry sumUpAndCacheChildren(Node initialNode) {
    class StackFrame {
        final Node node;
        Node[] children;

        // Local vars
        long size;

        // Tracking stack frame state
        int stage;
        int loopIndex;

        StackFrame(Node node) {
            this.node = node;
            this.children = null;
            this.size = 0;
            this.stage = 0;
            this.loopIndex = 0;
        }
    }

    final Stack<StackFrame> stack = new Stack<StackFrame>();
    stack.push(new StackFrame(initialNode));
    CacheEntry retValue = getCacheEntry(initialNode);

    outer:
    while (!stack.isEmpty()) {
        final StackFrame frame = stack.peek();
        final Node node = frame.node;

        switch(frame.stage) {
            case 0: {
                final CacheEntry entry = getCacheEntry(node);

                if (entryIsValid(entry)) {
                    retValue = entry;
                    stack.pop();
                    continue;       
                }

                frame.children = node.asItem().listChildren();
                frame.stage = frame.children != null ? 1 : 3;
            } break;
            case 1: {
                for (int i = frame.loopIndex; i < frame.children.length; ++i) {
                    frame.loopIndex = i;
                    final Node child = frame.children[i];

                    if (child.hasChildren()) {
                        stack.push(new StackFrame(child));
                        frame.stage = 2;    // Accumulate results once all the child stacks have been calculated.
                        frame.loopIndex++;  // Make sure we restart the for loop at the next iteration the next time around.
                        continue outer;
                    } else {
                        frame.size += child.size();
                    }
                }

                frame.stage = 3;
            } break;
            case 2: {
                // Accumulate results
                frame.size += retValue.size;
                frame.stage = 1;            // Continue the for loop
            } break;
            case 3: {
                retValue = putInCache(node, frame.type);
                stack.pop();
                continue;
            }
        }
    }

    return retValue;
}

This just feels more insane than it needs to be, and it would be painful to have to do this in all the places in the code where I recurse into the children and do different ops on them. What techniques could I use to make it easier to do recursion when I'm aggregating at each level and doing that in a for-loop over the children?

EDIT:

I was able to greatly simplify things with the help of the answers below. The code is now nearly as concise as the original recursive version. Now, I just need to apply the same principles everywhere else where I'm recursing over the same data structure.

10
  • The order you visit the children in doesn't matter, right? Commented Jan 22, 2015 at 21:19
  • 1
    The thing you need to do is analyze the dynamic data structure. When you implement a recursive algorithm, the parameters passed create a data structure. In some cases this structure is trivial, in others not. You need to figure out how to maintain the logically equivalent data structure in your iterative algorithm. Commented Jan 22, 2015 at 21:20
  • The order doesn't matter as long as I'm able to get a sum of all descendants at each node. I have some other use-cases though where I'm doing other stuff to the nodes, e.g. duplicating them or moving them to another tree. Commented Jan 22, 2015 at 21:21
  • 1
    @specializt - That is quite wrong. There are situations where recursion is by far the better choice, both in simplicity and clarity of code and in performance. (And, of course, there are situations that go the other way.) Commented Jan 23, 2015 at 2:30
  • 1
    @specializt - There are many case where the call stack creates an implicit branched network structure, and, in an iterative scheme, that structure must be reproduced with a linked network of objects. Java call is very efficient, while creating an object is quite expensive. You can easily come out ahead performing a call vs creating an object. Commented Jan 23, 2015 at 12:52

4 Answers 4

1

Since you're dealing with a tree structure and wish to compute cumulative sizes, try a DFS while tracking the parent of each node. I assume here that you cannot change or subclass Node and I kept all the function signatures you used.

private class SizedNode {
    public long cumulativeSize;
    public Node node;
    public SizedNode parent;

    public SizedNode(SizedNode parent, Node node) {
        this.node = node;
        this.parent = parent;
    }

    public long getSize() {
        if (node.hasChildren()) {
            return cumulativeSize;
        }
        else {
            return node.size();
        }
    }
}

private void sumUpAndCacheChildren(Node start)
{
    Stack<SizedNode> nodeStack = new Stack<SizedNode>();

    // Let's start with the beginning node.
    nodeStack.push(new SizedNode(null, start));

    // Loop as long as we've got nodes to process
    while (!nodeStack.isEmpty()) {

        // Take a look at the top node
        SizedNode sizedNode = nodeStack.peek();            
        CacheEntry entry = getCacheEntry(sizedNode.node);

        if (entryIsValid(entry)) {
            // It's cached already, so we have computed its size
            nodeStack.pop();

            // Add the size to the parent, if applicable.
            if (sizedNode.parent != null) {
                sizedNode.parent.cumulativeSize += sizedNode.getSize();

                // If the parent's now the top guy, we're done with it so let's cache it
                if (sizedNode.parent == nodeStack.peek()) {
                    putInCache(sizedNode.parent.node, sizedNode.parent.getSize());
                }
            }
        }
        else {
            // Not cached.
            if (sizedNode.node.hasChildren()) {
                // It's got a bunch of children.
                // We can't compute the size yet, so just add the kids to the stack.
                Node[] children = sizedNode.node.listChildren();
                if (children != null) {
                    for (Node child : children) {
                        nodeStack.push(new SizedNode(sizedNode, child));
                    }    
                }                    
            }
            else {
                // It's a leaf node. Let's cache it.
                putInCache(sizedNode.node, sizedNode.node.size());
            }
        }
    }
}
Sign up to request clarification or add additional context in comments.

Comments

1

You're basically doing a post-order iterative traversal of an N-ary tree; you can try searching for that for more detailed examples.

In very rough pseudocode:

Node currentNode;
Stack<Node> pathToCurrent;
Stack<Integer> sizesInStack;
Stack<Integer> indexInNode;

pathToCurrent.push(rootNode);
sizesInStack.push(0);
indexInNode.push(0);

current = rootNode;
currentSize = 0;
currentIndex = 0;
while (current != null) {
  if (current.children != null && currentIndex < current.children.size) {
    //process the next node
    nextChild = current.children[currentIndex];
    pathToCurrent.push(current);
    sizesInStack.push(currentSize);
    indexInNode.push(currentIndex);
    current = nextChild;
    currentSize = 0;
    currentIndex = 0;
  } else {
    //this node is a leaf, or we've handled all its children 
    //put our size into the cache, then pop off the stack and set up for the next child of our parent
    currentSize += this.size();
    putInCache(this, currentSize);
    current = pathToCurrent.pop();  //If pop throws an exception on empty stack, handle it here and exit the loop
    currentSize = currentSize + sizesInStack.pop();
    currentIndex = 1 + indexInNode.pop();
  }
}

1 Comment

I only saw this after I came up with a different solution, but this will help me out with tackling the other places in the code where I need to change things to iterative. Thanks!
0

OK, im gonna explain it in human words since i dont want to code right now :

  1. Acquire topmost level of elements and write into a list
  2. LOOP BEGIN
  3. count elements on this level and add them to your counter
  4. acquire list of children from your current list, store seperately
  5. delete list of current elements
  6. write list of children to where the list of the current elements was
  7. LOOP END

you simply need to put a boolean into the loop-header and set it to false if the list of children has no elements anymore ... i hope i was able to express myself correctly, feel free to ask questions and/or inquire about clarification.

This algorithm will get exponentially slower ( --> O(n²) ) in each iteration if the data-structure keeps "folding out", its rather inefficient and im quite sure someone can come up with an optimization - but it will be faster than with recursion and it wont produce a stack overflow; yet it may produce an OutOfMemoryException for very large datasets - but since only one level is iterated at any time this is ... quite unrealistic i guess

3 Comments

The thing is, there is a counter for every node, not just one counter for the entire loop. Otherwise it would be much simpler. I mean I guess I could store the counters in a separate structure instead of as part of the stack frame, and then the iterative algorithm would become simpler and I could rerun it over and over with a different counter each time, but that doesn't seem much simpler than what I have right now.
well thats a bummer, then - but you would need only one single counter - declared outside of the loop .... i'd recommend an AtomicInteger
Ended up finding a simpler way but it took some rethinking of the problem. Definitely better that way than just ASMing the recursive approach.
0

After adapting @Marius's answer to my use case, I came up with this:

class SizedNode {
    final Node node;
    final SizedNode parent;

    long size;
    boolean needsCaching;

    SizedNode(Node node, SizedNode parent) {
        this.parent = parent;
        this.node = node;
    }
}

private CacheEntry sumUpAndCacheChildren(Node start) {      
    final Stack<SizedNode> stack = new Stack<SizedNode>();
    stack.push(new SizedNode(start, null));
    CacheEntry returnValue = getCacheEntry(start);

    while (!stack.isEmpty()) {
        final SizedNode sizedNode = stack.pop();           
        final CacheEntry entry = getCacheEntry(sizedNode.folder);

        if (sizedNode.needsCaching) {
            // We finished processing all children, and now we're done with this node.
            if (sizedNode.parent != null) {
                sizedNode.parent.size += sizedNode.size;                
            }
            returnValue = putInCache(sizedNode.folder, sizedNode.size);
        } else if (entryIsValid(entry)) {
            if (sizedNode.parent != null) {
                sizedNode.parent.size += entry.size;
            }
            returnValue = entry;
        } else {                    
            // The next time we see this node again, it will be after we process all of its children.
            sizedNode.needsCaching = true;
            stack.push(sizedNode);

            for (Node child : sizedNode.node.listChildren()) {
                if (child.hasChildren()) {
                    stack.push(new SizedNode(child, sizedNode));                        
                } else {
                    sizedNode.size += child.size();
                }
            }               
        }
    }

    return returnValue;
}

This is much better than the crazy mess I came up with on my first pass. Just goes to show that you really have to think about transforming the algorithm so that it also makes sense as an iterative approach. Thanks all for the help!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.