61

I'm aware that StackOverflowExceptions in .NET can't be caught, take down their process, and have no stack trace. This is officially documented on MSDN. However, I'm wondering what the technical (or other) reasons are behind the behavior. All MSDN says is:

In prior versions of the .NET Framework, your application could catch a StackOverflowException object (for example, to recover from unbounded recursion). However, that practice is currently discouraged because significant additional code is required to reliably catch a stack overflow exception and continue program execution.

What is this "significant additional code"? Are there other documented reasons for this behavior? Even if we can't catch SOE, why can't we at least get a stack trace? Several co-workers and I just sank several hours into debugging a production StackOverflowException that would have taken minutes with a stack trace, so I'm wondering if there is a good reason for my suffering.

5 Comments
  • "there's no more free space on the stack. Quick, put the necessary extra data on the stack to enable us to throw the exception, have it record relevant information, and to find and call the appropriate handler" Commented Mar 17, 2014 at 21:18
  • @jalf That's actually the easiest aspect of this problem to overcome (the RT could simply set a "soft limit" just shy of the actual stack size, so that it's guaranteed to have enough left over if a soft-overflow occurs). Commented Mar 17, 2014 at 21:21
  • @HansPassant You should seriously consider answering the question with that info. Awesome stuff. Commented Mar 17, 2014 at 23:03
  • Am I the only one worrying that asking about stack overflows on, well, stackoverflow, could cause the universe to collapse? Commented Mar 18, 2014 at 1:46
  • Related Java questions: "Why does this method print 4?" and "Understanding java stack" - Show what might happen when you do recover from a StackOverflow - additional method calls cause additional stack overflow errors, not a good state to be in. Of course, Java is not .Net, but I think it is interesting. Commented Mar 18, 2014 at 7:05

3 Answers

86

The stack of a thread is created by Windows. It uses so-called guard pages to detect a stack overflow, a feature that's generally available to user-mode code, as described in this MSDN Library article. The basic idea is that the last two pages of the stack (2 x 4096 = 8192 bytes) are reserved, and any processor access to them triggers a page fault that's turned into an SEH exception, STATUS_GUARD_PAGE_VIOLATION.

This is intercepted by the kernel in the case of those pages belonging to a thread stack. It changes the protection attributes of the first of those 2 pages, thus giving the thread some emergency stack space to deal with the mishap, then re-raises a STATUS_STACK_OVERFLOW exception.

This exception is in turn intercepted by the CLR. At that point there are about 3 kilobytes of stack space left. That is, for one, not enough to run the just-in-time compiler (JITter) to compile the code that could deal with the exception in your program; the JITter needs much more space than that. The CLR therefore cannot do anything but rudely abort the thread. And by .NET 2.0 policy, that also terminates the process.

Note how this is less of a problem in Java: it has a bytecode interpreter, so there's a guarantee that executable user code can run. The same goes for an unmanaged program written in a language like C, C++ or Delphi, where code is generated at build time. It is, however, still a very difficult mishap to deal with: the emergency space in the stack is blown, so there is no scenario where continuing to run code on the thread is safe. The likelihood that a program can continue operating correctly with a thread aborted at a completely random location and rather corrupted state is quite unlikely.
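The Java comparison can be made concrete. What follows is only a sketch, assuming a HotSpot-style JVM; the names `OverflowProbe` and `probeDepth` are made up for illustration. It recurses until the JVM throws `StackOverflowError`, catches the error near the bottom of the stack, and reports how many calls were entered. Being able to run the handler does not make the program state sound: whatever a method was in the middle of doing when the overflow hit stays half-done.

```java
// Illustrative sketch: names are hypothetical, behaviour assumes a HotSpot-style JVM.
final class OverflowProbe {
    private static int depth;

    // Unbounded recursion; each call consumes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the JVM reports the overflow, then returns the depth reached.
    static int probeDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time this handler runs, the stack has unwound back to probeDepth().
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Recovered after " + probeDepth() + " nested calls");
    }
}
```

The depth reported varies with the JVM, its stack-size setting, and whether the method has been JIT-compiled, so it should not be treated as a stable number.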

If any effort was made to consider raising an event on another thread, or to remove the restriction in the winapi (the number of guard pages is not configurable), then that's either a very well-kept secret or it just wasn't considered useful. I suspect the latter, but don't know it for a fact.


7 Comments

+1 for this "The likelihood that a program can continue operating correctly with a thread aborted at a completely random location and rather corrupted state is quite unlikely" alone. If people would only get this - regardless of the circumstances that lead to such a situation.
This is exactly the kind of information I was looking for. Any idea why we can't at least get a stack trace with the exception?
I don't buy it. The OP starts with a quote indicating that it used to be possible, so the real question is what changed.
Hmya, the CLR v1.x policy of just letting the thread die without an AppDomain.UnhandledException callback was widely despised. Got the Windows group at Microsoft to hate managed code so heavily when they tried to use it in Longhorn.
Also, it might be an interesting exercise to explore how Mono handles it.
16

The stack is where virtually everything about the state of a program is stored: the address of each return site when methods are called, local variables, method parameters, etc. If a method overflows the stack, its execution must, by necessity, stop immediately (since there is no more stack space left for it to continue running). Then, to gracefully recover, somebody needs to clean up whatever that method did to the stack before it died. This means knowing what the stack looked like before the method was called. This incurs some overhead.

And if you can't clean up the stack, then you can't get a stack trace either, because the information required to generate the trace comes from "unrolling" the stack to discover which methods were called.

11 Comments

@pasty The stack is not wrapped. Storage for managed stacks is allocated and committed when the thread is created. There's no option to extend this at run-time.
Why not just destroy the current thread that overflowed the stack? Why does the whole process need to be killed?
@BobAlbright Because doing so would mean that the process would be in an undefined state.
Java handles this... I don't see the reason why .NET can't do the same. I also don't see why it can't back off to the last non-overflow stack frame and proceed normally (for an exception) from there.
@dvnrrs: this is the essence of my question. WHY did .NET decide not to handle SO gracefully?
7

To handle stack overflow or out-of-memory conditions gracefully, it is necessary to trigger an exception somewhat before the stack has actually overflowed or heap memory is totally exhausted, at a time when the available stack and heap resources will be adequate to execute any cleanup code that will need to run before the exceptions are caught. In the case of stack-overflow exceptions, handling them cleanly would basically require checking the stack pointer on entry to each method (which shouldn't really be all that expensive). Normally, they're handled by setting an access-violation trap just beyond the end of the stack, but the problem with doing that is that the trap won't fire until it's already too late to handle things cleanly. One could set the trap to fire on the last memory block of the stack, rather than the one past, and have the system change the trap to the block past the stack once it fires and triggers a StackOverflowException, but the problem is there would be no nice way to ensure that the "almost out of stack" trap got re-enabled once the stack had unwound that far.
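The per-method check described above can be approximated in user code. The sketch below is a stand-in under stated assumptions, not how the CLR or any other runtime does it: it counts nested calls instead of comparing the stack pointer, and `DepthGuard`, `SoftOverflowException` and `sumTo` are hypothetical names. Because the counter is decremented in a `finally` block, the limit re-arms itself as the stack unwinds, which is the part that is awkward to achieve with a page trap.

```java
// Illustrative sketch: a call-depth counter standing in for a stack-pointer check.
final class DepthGuard {
    // Thrown while there is still plenty of real stack left for handlers to run.
    static final class SoftOverflowException extends RuntimeException {
        SoftOverflowException(int depth) {
            super("soft stack limit reached at depth " + depth);
        }
    }

    private final int limit;
    private int depth = 0;

    DepthGuard(int limit) {
        this.limit = limit;
    }

    int depth() {
        return depth;
    }

    // The "check on entry": refuse to go deeper once the soft limit is hit.
    <T> T enter(java.util.function.Supplier<T> body) {
        if (depth >= limit) {
            throw new SoftOverflowException(depth);
        }
        depth++;
        try {
            return body.get();
        } finally {
            depth--; // re-arms the guard as each call returns or unwinds
        }
    }

    // Example recursive method that routes every call through the guard.
    static long sumTo(DepthGuard guard, int n) {
        return guard.enter(() -> n == 0 ? 0L : n + sumTo(guard, n - 1));
    }

    public static void main(String[] args) {
        DepthGuard guard = new DepthGuard(500);
        System.out.println(sumTo(guard, 100));
        try {
            sumTo(guard, 1000);
        } catch (SoftOverflowException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

A frame count is only a proxy for bytes used, since frames differ in size, so the limit has to be chosen conservatively.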

That having been said, an alternative approach would be to allow threads to set a delegate for what should happen if the thread blows its stack, and then say that in case of StackOverflowException the thread's stack will be cleared and it will run the supplied delegate. The trap could be re-instated before running the delegate (the stack would be empty at that point), and code could maintain a thread-status object that the delegate could use to know whether any important finally blocks got skipped.
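A rough analogue of this proposal, again as a Java sketch with hypothetical names (`OverflowFallback`, `runWithFallback`): the work runs on its own thread, and if it overflows, a caller-supplied fallback runs from the thread's outermost frame, where almost the whole stack is free again. This relies on the JVM allowing `StackOverflowError` to be caught, which is exactly what the CLR does not offer, and unlike the proposal above the JVM still runs `finally` blocks while unwinding.

```java
// Illustrative sketch: names are hypothetical; relies on StackOverflowError being catchable.
final class OverflowFallback {
    // Runs work on a new thread; returns true if it overflowed and the fallback was invoked.
    static boolean runWithFallback(Runnable work, Runnable onOverflow) {
        boolean[] overflowed = {false};
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } catch (StackOverflowError e) {
                // Caught at the thread's outermost frame, so the stack is nearly empty here.
                overflowed[0] = true;
                onOverflow.run();
            }
        });
        worker.start();
        try {
            worker.join(); // join() also makes the worker's write to overflowed[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return overflowed[0];
    }

    // Deliberately unbounded recursion, used to provoke the overflow.
    static void recurseForever() {
        recurseForever();
    }

    public static void main(String[] args) {
        boolean overflowed = runWithFallback(
            OverflowFallback::recurseForever,
            () -> System.out.println("fallback running on a nearly empty stack"));
        System.out.println("overflowed = " + overflowed);
    }
}
```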
