Why does .NET behave so poorly when StackOverflowException is thrown?

Question

I'm aware that StackOverflowExceptions in .NET can't be caught, take down their process, and have no stack trace. This is officially documented on MSDN. However, I'm wondering what the technical (or other) reasons are behind the behavior. All MSDN says is:

In prior versions of the .NET Framework, your application could catch a StackOverflowException object (for example, to recover from unbounded recursion). However, that practice is currently discouraged because significant additional code is required to reliably catch a stack overflow exception and continue program execution.

What is this "significant additional code"? Are there other documented reasons for this behavior? Even if we can't catch SOE, why can't we at least get a stack trace? Several co-workers and I just sank several hours into debugging a production StackOverflowException that would have taken minutes with a stack trace, so I'm wondering if there is a good reason for my suffering.

"there's no more free space on the stack. Quick, put the necessary extra data on the stack to enable us to throw the exception, have it record relevant information, and to find and call the appropriate handler" — Stack Overflow is garbage
– Stack Overflow is garbage, Commented Mar 17, 2014 at 21:18
@jalf That's actually the easiest aspect of this problem to overcome (the RT could simply set a "soft limit" just shy of the actual stack size, so that it's guaranteed to have enough left over if a soft-overflow occurs). — TypeIA
– TypeIA, Commented Mar 17, 2014 at 21:21
@HansPassant You should seriously consider answering the question with that info. Awesome stuff. — julealgon
– julealgon, Commented Mar 17, 2014 at 23:03
Am I the only one worrying that asking about stack overflows on, well, stackoverflow, could cause the universe to collapse? — fgp
– fgp, Commented Mar 18, 2014 at 1:46
Related Java questions: Why does this method print 4? and Understanding java stack. They show what might happen when you do recover from a stack overflow: additional method calls cause additional stack overflow errors, which is not a good state to be in. Of course, Java is not .NET, but I think it is interesting. — Kobi
– Kobi, Commented Mar 18, 2014 at 7:05

Hans Passant · Accepted Answer · 2017-06-02 14:27:05Z

86

The stack of a thread is created by Windows, which uses so-called guard pages to detect a stack overflow. This feature is generally available to user-mode code, as described in this MSDN Library article. The basic idea is that the last two pages of the stack (2 x 4096 = 8192 bytes) are reserved, and any processor access to them triggers a page fault that's turned into an SEH exception, STATUS_GUARD_PAGE_VIOLATION.

When those pages belong to a thread stack, the kernel intercepts this exception. It changes the protection attributes of the first of those 2 pages, giving the thread some emergency stack space to deal with the mishap, then re-raises the fault as a STATUS_STACK_OVERFLOW exception.

This exception is in turn intercepted by the CLR. At that point there are about 3 kilobytes of stack space left. That is, for one, not enough to run the just-in-time compiler (JITter) to compile the code that could deal with the exception in your program; the JITter needs much more space than that. The CLR therefore can do nothing but rudely abort the thread, and by .NET 2.0 policy that also terminates the process.
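The behavior described above can be observed with a short C# sketch. The names `OverflowDemo`, `Recurse`, and `TryRun` are made up for illustration. On .NET Framework 2.0 and later, and on .NET Core, the `catch` block is not expected to run when the stack is actually exhausted; the process is terminated instead.

```csharp
using System;

static class OverflowDemo
{
    // Non-tail recursion: every call keeps its frame alive until the callee returns.
    public static int Recurse(int remaining) =>
        remaining == 0 ? 0 : Recurse(remaining - 1) + 1;

    // Returns the recursion result, or -1 if the handler below ever runs.
    public static int TryRun(int depth)
    {
        try
        {
            return Recurse(depth);
        }
        catch (StackOverflowException)
        {
            // For a runtime-raised stack overflow this handler is not expected
            // to execute: the CLR ends the process before any managed unwinding.
            Console.WriteLine("caught");
            return -1;
        }
    }
}
```

`OverflowDemo.TryRun(1000)` returns 1000 normally. `OverflowDemo.TryRun(int.MaxValue)` exhausts the stack, so it should only be tried in a throwaway process.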

Note how this is less of a problem in Java: it has a bytecode interpreter, so there's a guarantee that executable user code can run. The same goes for a non-managed program written in languages like C, C++ or Delphi, where code is generated at build time. It is, however, still a very difficult mishap to deal with: the emergency space in the stack is blown, so there is no scenario where continuing to run code on the thread is safe. The likelihood that a program can continue operating correctly with a thread aborted at a completely random location and rather corrupted state is quite unlikely.

If there was any effort at all in considering raising an event on another thread or in removing the restriction in the Windows API (the number of guard pages is not configurable), then that's either a very well-kept secret or it just wasn't considered useful. I suspect the latter but don't know it for a fact.

edited Jun 2, 2017 at 14:27

answered Mar 17, 2014 at 23:38

Hans Passant


7 Comments

Christian.K Over a year ago

+1 for this "The likelihood that a program can continue operating correctly with a thread aborted at a completely random location and rather corrupted state is quite unlikely" alone. If people would only get this - regardless of the circumstances that lead to such a situation.

ChaseMedallion Over a year ago

This is exactly the kind of information I was looking for. Any idea why we can't at least get a stack trace with the exception?

Gabe Over a year ago

I don't buy it. The OP starts with a quote indicating that it used to be possible, so the real question is what changed.

Hans Passant Over a year ago

Hmya, the CLR v1.x policy of just letting the thread die without an AppDomain.UnhandledException callback was widely despised. Got the Windows group at Microsoft to hate managed code so heavily when they tried to use it in Longhorn.

Eric Lloyd Over a year ago

Also, it might be an interesting exercise to explore how Mono handles it.


TypeIA · Answer · 2014-03-17 21:18:13Z

16

The stack is where virtually everything about the state of a program is stored: the address of each return site when methods are called, local variables, method parameters, and so on. If a method overflows the stack, its execution must, by necessity, stop immediately (since there is no more stack space left for it to continue running). Then, to gracefully recover, somebody needs to clean up whatever that method did to the stack before it died. This means knowing what the stack looked like before the method was called, which incurs some overhead.

And if you can't clean up the stack, then you can't get a stack trace either, because the information required to generate the trace comes from "unrolling" the stack to discover which methods were called.

answered Mar 17, 2014 at 21:18

TypeIA


11 Comments

Brian Rasmussen Over a year ago

@pasty The stack is not wrapped. Storage for managed stacks is allocated and committed when the thread is created. There's no option to extend this at run-time.

Bob Albright Over a year ago

Why not just destroy the current thread that overflowed the stack? Why does the whole process need to be killed?

Brian Rasmussen Over a year ago

@BobAlbright Because doing so would mean that the process would be in an undefined state.

ChaseMedallion Over a year ago

Java handles this... I don't see the reason why .NET can't do the same. I also don't see why it can't back off to the last non-overflow stack frame and proceed normally (for an exception) from there.

ChaseMedallion Over a year ago

@dvnrrs: this is the essence of my question. WHY did .NET decide not to handle SO gracefully?


supercat · Answer · 2014-03-17 22:36:56Z

To handle stack overflow or out-of-memory conditions gracefully, it is necessary to trigger an exception somewhat before the stack has actually overflowed or heap memory is totally exhausted, at a time when the available stack and heap resources will be adequate to execute any cleanup code that will need to run before the exceptions are caught. In the case of stack-overflow exceptions, handling them cleanly would basically require checking the stack pointer on entry to each method (which shouldn't really be all that expensive).

Normally, they're handled by setting an access-violation trap just beyond the end of the stack, but the problem with doing that is that the trap won't fire until it's already too late to handle things cleanly. One could set the trap to fire on the last memory block of the stack, rather than the one past it, and have the system move the trap to the block past the stack once it fires and triggers a StackOverflowException, but the problem is that there would be no nice way to ensure that the "almost out of stack" trap got re-enabled once the stack had unwound that far.
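.NET does expose an opt-in form of the "check on entry to each method" idea: `RuntimeHelpers.EnsureSufficientExecutionStack()` throws an `InsufficientExecutionStackException` while some stack is still available. The sketch below is only an illustration; `StackProbe`, `CountDown`, and `TryCountDown` are hypothetical names, and the probe protects only code that calls it.

```csharp
using System;
using System.Runtime.CompilerServices;

static class StackProbe
{
    // Same shape as an unbounded recursion, but each call first asks the
    // runtime whether enough stack remains for an average method.
    public static int CountDown(int remaining)
    {
        RuntimeHelpers.EnsureSufficientExecutionStack();
        return remaining == 0 ? 0 : CountDown(remaining - 1) + 1;
    }

    // Returns false instead of crashing when the probe reports low stack.
    public static bool TryCountDown(int start, out int result)
    {
        try
        {
            result = CountDown(start);
            return true;
        }
        catch (InsufficientExecutionStackException)
        {
            // An ordinary exception: it is thrown while stack space remains,
            // so handlers can run and a stack trace is available.
            result = -1;
            return false;
        }
    }
}
```

With a small argument, `StackProbe.TryCountDown(100, out var n)` returns true and sets `n` to 100. With `int.MaxValue` the probe is expected to fail long before the guard pages are reached, so the call returns false rather than taking down the process. The cost is one check per call, and any recursive path that omits the probe can still overflow for real.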

That having been said, an alternative approach would be to allow threads to set a delegate for what should happen if the thread blows its stack, and then say that in case of StackOverflowException the thread's stack will be cleared and it will run the supplied delegate. The trap could be re-instated before running the delegate (the stack would be empty at that point), and code could maintain a thread-status object that the delegate could use to know whether any important finally blocks got skipped.
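The runtime offers no such per-thread overflow delegate, but the idea can be approximated cooperatively. The sketch below is a stand-in rather than the mechanism proposed above: `GuardedThread`, `Run`, and `Recurse` are hypothetical names, the work must call `RuntimeHelpers.EnsureSufficientExecutionStack()` itself, and the stack is emptied by ordinary exception unwinding (so `finally` blocks do run) rather than by being cleared.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

static class GuardedThread
{
    // Runs `work` on a new thread with `stackBytes` of stack. If a stack probe
    // inside `work` fails, the exception unwinds to the thread's entry point
    // and `onOverflow` runs there on an almost empty stack.
    // Returns true only if `work` ran to completion.
    public static bool Run(Action work, Action onOverflow, int stackBytes = 1024 * 1024)
    {
        bool completed = false;
        var thread = new Thread(() =>
        {
            try
            {
                work();
                completed = true;
            }
            catch (InsufficientExecutionStackException)
            {
                onOverflow();
            }
        }, stackBytes);
        thread.Start();
        thread.Join();
        return completed;
    }

    // Probed, non-tail recursion used to exercise the guard.
    public static int Recurse(int remaining)
    {
        RuntimeHelpers.EnsureSufficientExecutionStack();
        return remaining == 0 ? 0 : Recurse(remaining - 1) + 1;
    }
}
```

For example, `GuardedThread.Run(() => GuardedThread.Recurse(int.MaxValue), () => Console.WriteLine("stack exhausted"))` is expected to invoke the fallback and return false. A real overflow in code that does not probe would still terminate the process.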
