I have a recursive algorithm that I use to iterate over a hierarchical data structure, but with some data the hierarchy is so deep that I get a StackOverflowError. I've seen this happen at a depth of about 150 nodes, and the data could grow much deeper than that. For context, this code will run in constrained environments where changing the JVM stack size is not an option, and the data structure is a given: it represents different file systems with directories and files.
To work around the stack overflow, I've tried to convert the algorithm into an iterative one. It's not something I've had to do before, so I started from some examples showing how to do this with a simple recursion, but I'm not sure how to apply this to recursion inside a loop. I've found a way to do it that seems to work, but the code is rather insane.
Here is a simplified version of my original recursive method:
    private CacheEntry sumUpAndCacheChildren(Node node) {
        final CacheEntry entry = getCacheEntry(node);
        if (entryIsValid(entry))
            return entry;
        Node[] children = node.listChildren();
        long size = 0;
        if (children != null) {
            for (Node child : children) {
                if (child.hasChildren()) {
                    size += sumUpAndCacheChildren(child).size;
                } else {
                    size += child.size();
                }
            }
        }
        return putInCache(node, size);
    }
Each leaf node has a size, while the size for any ancestor node is considered to be the size of all of its descendants. I want to know this size for each node, so the size is aggregated and cached for every node.
Here is the iterative version:
    private CacheEntry sumUpAndCacheChildren(Node initialNode) {
        class StackFrame {
            final Node node;
            Node[] children;
            // Local vars
            long size;
            // Tracking stack frame state
            int stage;
            int loopIndex;

            StackFrame(Node node) {
                this.node = node;
                this.children = null;
                this.size = 0;
                this.stage = 0;
                this.loopIndex = 0;
            }
        }

        final Stack<StackFrame> stack = new Stack<StackFrame>();
        stack.push(new StackFrame(initialNode));
        CacheEntry retValue = getCacheEntry(initialNode);
        outer:
        while (!stack.isEmpty()) {
            final StackFrame frame = stack.peek();
            final Node node = frame.node;
            switch (frame.stage) {
                case 0: {
                    // Entry point: return early if the cache already holds a valid entry.
                    final CacheEntry entry = getCacheEntry(node);
                    if (entryIsValid(entry)) {
                        retValue = entry;
                        stack.pop();
                        continue;
                    }
                    frame.children = node.listChildren();
                    frame.stage = frame.children != null ? 1 : 3;
                } break;
                case 1: {
                    // The for loop over the children, resumable from loopIndex.
                    for (int i = frame.loopIndex; i < frame.children.length; ++i) {
                        frame.loopIndex = i;
                        final Node child = frame.children[i];
                        if (child.hasChildren()) {
                            stack.push(new StackFrame(child));
                            frame.stage = 2; // Accumulate the result once the child's frame has been calculated.
                            frame.loopIndex++; // Make sure we restart the for loop at the next child the next time around.
                            continue outer;
                        } else {
                            frame.size += child.size();
                        }
                    }
                    frame.stage = 3;
                } break;
                case 2: {
                    // Accumulate the result "returned" by the child frame
                    frame.size += retValue.size;
                    frame.stage = 1; // Continue the for loop
                } break;
                case 3: {
                    // All children processed: cache the aggregated size and "return".
                    retValue = putInCache(node, frame.size);
                    stack.pop();
                    continue;
                }
            }
        }
        return retValue;
    }
This just feels more insane than it needs to be, and it would be painful to have to do this in all the places in the code where I recurse into the children and do different ops on them. What techniques could I use to make it easier to replace this kind of recursion, where I'm aggregating at each level and recursing inside a for-loop over the children?
EDIT:
I was able to greatly simplify things with the help of the answers below. The code is now nearly as concise as the original recursive version. Now, I just need to apply the same principles everywhere else where I'm recursing over the same data structure.
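A minimal sketch of one such simplification, not necessarily the version arrived at above: since a parent's size only depends on its children's sizes, the traversal can be split into two plain loops. The first collects every uncached directory with an explicit stack, parents before children; the second walks that list backwards, so each directory is summed only after its subdirectories are cached. The `Node` class and the `HashMap` cache here are hypothetical stand-ins for the real `Node`, `getCacheEntry` and `putInCache`, and a tree (no cycles or shared children) is assumed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterativeSizeSketch {

    /** Hypothetical stand-in for the question's Node type. */
    static final class Node {
        final long ownSize;                          // only meaningful for leaves (files)
        final List<Node> children = new ArrayList<>();

        Node(long ownSize) { this.ownSize = ownSize; }

        boolean hasChildren() { return !children.isEmpty(); }
        // Unlike the real listChildren(), this stand-in never returns null.
        Node[] listChildren() { return children.toArray(new Node[0]); }
        long size() { return ownSize; }
    }

    /** Hypothetical stand-in for the cache: node -> aggregated size. */
    static final Map<Node, Long> cache = new HashMap<>();

    static long sumUpAndCacheChildren(Node root) {
        // Phase 1: collect uncached directories; a parent is always added before its children.
        List<Node> order = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (cache.containsKey(node)) {
                continue;                            // already cached: skip the whole subtree
            }
            order.add(node);
            for (Node child : node.listChildren()) {
                if (child.hasChildren()) {
                    pending.push(child);
                }
            }
        }

        // Phase 2: iterate backwards, so subdirectories are cached before their parent.
        for (int i = order.size() - 1; i >= 0; i--) {
            Node node = order.get(i);
            long size = 0;
            for (Node child : node.listChildren()) {
                size += child.hasChildren() ? cache.get(child) : child.size();
            }
            cache.put(node, size);
        }
        return cache.get(root);
    }

    public static void main(String[] args) {
        // 100,000 nested directories, each holding one file of size 1.
        Node root = new Node(0);
        Node current = root;
        for (int depth = 1; depth < 100_000; depth++) {
            current.children.add(new Node(1));       // file
            Node dir = new Node(0);                  // subdirectory
            current.children.add(dir);
            current = dir;
        }
        current.children.add(new Node(1));
        System.out.println(sumUpAndCacheChildren(root)); // prints 100000
    }
}
```

The trade-off in this sketch is that `listChildren()` is called twice per directory and the `order` list holds every uncached directory at once, so memory grows with the number of directories rather than with the depth. If listing is expensive, the children array could be stored alongside each node in phase 1.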