Looking at your description of the problems you are encountering and at the architecture and design you are describing in general (i.e. without even considering the fact that it is a compiler), your problems seem to be pretty standard ones:
- shared (or, even worse, global) mutable state that makes it unclear what is actually going on
- large units of functionality
And the solutions are also pretty standard:
- don't share, don't mutate, or both; i.e. have pure functions that take an AST as an argument and return an updated AST as the result
- break them up into smaller ones
Of course, that is easier said than done … after all, you probably would have already done it if it was easy.
So, looking specifically at compilers, what can we do about shared mutable state? Like I said: instead of mutating a single AST, have the functions take an AST as an argument and return a new one as the result. This may sound pretty expensive, since it seems to require copying the entire AST, but unless you actually measure it, you cannot be sure. And once you do determine that it is slowing you down, the nice thing about immutable trees is that you can use structural sharing: you only have to copy the path from the root to the updated node, and the rest of the tree can be shared between the old and the new version.
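To make this concrete, here is a minimal sketch (in Scala here, but the idea is language-agnostic) of what such a pure, structurally sharing pass could look like; the Expr node types and the constant-folding pass are invented for illustration, not taken from any particular compiler:

```scala
// A tiny immutable AST; the node types are invented for illustration.
sealed trait Expr
case class Lit(value: Int)           extends Expr
case class Add(lhs: Expr, rhs: Expr) extends Expr
case class Mul(lhs: Expr, rhs: Expr) extends Expr

// A "pass": takes an AST, returns a new AST, mutates nothing.
// Here: constant folding, e.g. Add(Lit(1), Lit(2)) becomes Lit(3).
def foldConstants(e: Expr): Expr = e match {
  case Add(l, r) =>
    (foldConstants(l), foldConstants(r)) match {
      case (Lit(a), Lit(b))                   => Lit(a + b)
      case (l2, r2) if (l2 eq l) && (r2 eq r) => e           // nothing changed below: reuse this node
      case (l2, r2)                           => Add(l2, r2) // rebuild only the changed spine
    }
  case Mul(l, r) =>
    (foldConstants(l), foldConstants(r)) match {
      case (Lit(a), Lit(b))                   => Lit(a * b)
      case (l2, r2) if (l2 eq l) && (r2 eq r) => e
      case (l2, r2)                           => Mul(l2, r2)
    }
  case lit: Lit => lit                                       // leaves are shared as-is
}
```

Only the spine above a folded node is rebuilt; every untouched subtree is physically the same object in the old and the new tree, which is exactly the structural sharing mentioned above.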
Breaking up the large pieces of functionality can be done by making a multi-pass compiler. Originally, compilers used multiple passes because the whole compiler, or the whole data set (or both) simply didn't fit into the extremely restricted memory of the early computers. But, it turns out that you can use multiple passes to simplify each individual pass.
Basically, each pass is like its own mini-compiler: it reads input in some language, does "stuff" with it, and writes output in some other "language". For example, the lexer pass reads C♯ and writes a token stream, the parser pass reads a token stream and writes a Parse Tree, the semantic analyzer pass reads a Parse Tree and writes an Abstract Syntax Tree, the typer pass reads an Abstract Syntax Tree and writes a Type-Annotated AST, and so on.
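As a sketch of what this looks like in code, here is a hypothetical pipeline in which every pass is just a function from one representation to the next; all the type and pass names are placeholders, and the bodies are stubbed out:

```scala
// Hypothetical intermediate representations, one per pass boundary.
case class SourceFile(text: String)
case class Token(kind: String, lexeme: String)
case class ParseTree()   // structure elided
case class Ast()         // structure elided
case class TypedAst()    // structure elided

// Each pass is a small, independently testable function (stubbed here).
def lex(src: SourceFile): Vector[Token]     = Vector.empty
def parse(tokens: Vector[Token]): ParseTree = ParseTree()
def analyze(tree: ParseTree): Ast           = Ast()
def typeCheck(ast: Ast): TypedAst           = TypedAst()

// The front end is nothing more than the composition of its passes.
def frontEnd(src: SourceFile): TypedAst =
  typeCheck(analyze(parse(lex(src))))
```

Each stage can be swapped out, tested with hand-written inputs, and reasoned about on its own, which is the whole point of the exercise.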
You can do this recursively on multiple scales, i.e. break the compiler up into a front end, a middle end, and a back end, break each of those again into multiple passes, and so on.
Don't forget that modern compilers must do a lot of stuff that older compilers typically didn't: an IDE is kind of a compiler, too. In fact, it does pretty much everything a compiler does except the actual code generation: lexing, parsing, type checking, type inference, name resolution, etc., all in order to support syntax highlighting, warnings, errors, quick fixes, refactorings, auto completion, documentation popups, and all those little light bulbs, squiggly lines, hints, and helpers that you are used to. So, why write two compilers? Why not use the same compiler for both?
In order to do that, the compiler needs to be able to process incomplete and invalid input: you want all of this to happen while the programmer is typing, and while they are typing, the program doesn't compile 99.9% of the time; the compiler not only needs to deal with that, it needs to be helpful about it. Also, re-compiling the entire project on each keystroke would be insane, so the compiler needs to be able to compile small pieces of code individually as the programmer types them and integrate the results with its view of the rest of the code. The compiler also needs to be asynchronous, concurrent, and re-entrant, since it not only runs in parallel with the rest of the IDE, but often even in parallel with itself (e.g. building the project while the programmer is already writing new code, which needs to be highlighted, etc.).
These requirements are very different from those of a traditional batch compiler, and yet it makes sense to use the same compiler for both batch compilation and the IDE: that way, the two can never disagree and never get out of sync.
Here are a couple of pointers to compilers that are written in a somewhat non-traditional way, one you won't find in textbooks, but that IMO improves maintainability and evolvability:
Roslyn

The C♯ compiler Roslyn is actually very close to a traditional compiler design, but its ASTs are completely immutable (and actually persistent). This allows multiple compilation processes to operate on the same AST without stepping on each other's toes, which is important if you integrate the compiler into an IDE (e.g. the syntax highlighter and the code style checker may traverse the AST at the same time that the code generator is building a solution).
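This is not Roslyn's API, just a small sketch of why immutability buys you that: any number of analyses can walk the same tree concurrently, and "changing" the tree just means deriving a new one. The node and analysis names below are made up.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// A tiny immutable tree standing in for a syntax tree.
sealed trait Node
case class Leaf(name: String)       extends Node
case class Branch(kids: List[Node]) extends Node

def size(n: Node): Int = n match {
  case Leaf(_)      => 1
  case Branch(kids) => 1 + kids.map(size).sum
}

@main def concurrentReads(): Unit = {
  val tree: Node = Branch(List(Leaf("a"), Branch(List(Leaf("b"), Leaf("c")))))

  // Because the tree can never change underneath them, a "highlighter", a
  // "style checker", and a pass deriving a new tree can all run at once,
  // with no locks and no risk of observing a half-updated tree.
  val highlighter  = Future(size(tree))
  val styleChecker = Future(size(tree))
  val derived      = Future(Branch(List(tree, Leaf("extra")))) // derives, never mutates

  println(Await.result(Future.sequence(List(highlighter, styleChecker)), Duration.Inf))
  println(Await.result(derived, Duration.Inf))
}
```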
Dotty

Dotty is both a language intended to test out concepts for future versions of Scala and a compiler intended to test out concepts for future versions of the Scala compiler. We will ignore Dotty the language for this question; only the compiler is relevant.
The Dotty compiler takes things even further than Roslyn when it comes to immutability: it is built like a database, more precisely, like a temporal database.
See Martin Odersky's talk Compilers are Databases from the JVM Language Summit 2015 about the design of the Dotty compiler.
The basic idea of the Dotty compiler is that there is no mutable state: everything is fully immutable and purely functional. This is achieved by taking ideas from purely functional (aka "temporal") databases. Data that in a traditional compiler would change over time (such as a symbol table) is instead represented as (timestamp, value) pairs, i.e. as values indexed by time. (They don't use actual time, though, but rather a notion of time internal to the compiler, based on the run number and the compiler phase.)
In particular, this means that there is no symbol table. Instead, the role the symbol table plays in a traditional compiler is split across multiple data structures, all of which are immutable; some are time-invariant, some are time-varying. These are References, Denotations, and Symbols; in the talk, the discussion of References starts around 30:30, Denotations around 34:26, and Symbols around 37:30.
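The following is not Dotty's actual implementation, only a toy sketch of the "values indexed by time" idea; the names Period, Denotation, and TimeIndexed are made up for illustration:

```scala
// The compiler's internal notion of "time": run number plus phase.
case class Period(runId: Int, phaseId: Int) {
  def notAfter(other: Period): Boolean =
    runId < other.runId || (runId == other.runId && phaseId <= other.phaseId)
}

// What a name *means* at some point in time (hugely simplified).
case class Denotation(name: String, tpe: String)

// Instead of overwriting a symbol-table entry, each phase that changes what a
// name means records a new value for a later period; history is never lost.
case class TimeIndexed[A](history: List[(Period, A)]) {
  // The value visible at a period is the latest one recorded at or before it.
  def at(p: Period): Option[A] =
    history
      .filter { case (q, _) => q.notAfter(p) }
      .sortBy { case (q, _) => (q.runId, q.phaseId) }
      .lastOption
      .map(_._2)

  // "Updating" produces a new immutable history; old entries are shared.
  def updated(p: Period, value: A): TimeIndexed[A] =
    TimeIndexed((p, value) :: history)
}

@main def denotationDemo(): Unit = {
  val xs = TimeIndexed(List(Period(1, 2) -> Denotation("xs", "List[Int]")))
    .updated(Period(1, 7), Denotation("xs", "Array[Int]")) // e.g. after an erasure-like phase
  println(xs.at(Period(1, 3))) // Some(Denotation(xs,List[Int]))
  println(xs.at(Period(1, 9))) // Some(Denotation(xs,Array[Int]))
}
```

The same reference can thus mean different things at different phases, and earlier answers remain queryable instead of being destroyed by an in-place update.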
Dotty also uses a comparatively large number of passes compared to other industrial-strength production-quality compilers, with each individual pass being relatively simple. For performance reasons, there is a framework that can automatically fuse multiple passes back together into a single pass; the important thing is that the resulting large monster pass is generated automatically, not written by a human, and thus doesn't hurt maintainability.
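The following is not Dotty's actual fusion framework, just a sketch of the underlying idea: each small pass is a per-node rewrite, and "fusing" them means composing the rewrites and applying them in a single traversal, so the tree is walked once instead of once per pass. All names are invented.

```scala
sealed trait Tree
case class Num(n: Int)            extends Tree
case class Neg(t: Tree)           extends Tree
case class Plus(a: Tree, b: Tree) extends Tree

// Each "mini pass" rewrites only the single node handed to it.
type NodeRewrite = Tree => Tree

val simplifyDoubleNeg: NodeRewrite = {
  case Neg(Neg(t)) => t
  case other       => other
}

val foldPlusZero: NodeRewrite = {
  case Plus(t, Num(0)) => t
  case other           => other
}

// Fusing: compose all per-node rewrites into one and apply it in a single
// bottom-up traversal of the tree.
def fuse(passes: List[NodeRewrite]): Tree => Tree = {
  val combined = passes.foldLeft[NodeRewrite](identity)(_ andThen _)
  def walk(t: Tree): Tree = {
    val rebuilt = t match {
      case Neg(x)     => Neg(walk(x))
      case Plus(a, b) => Plus(walk(a), walk(b))
      case n: Num     => n
    }
    combined(rebuilt)
  }
  walk
}

// One traversal that applies both rewrites at every node.
val singlePass: Tree => Tree = fuse(List(simplifyDoubleNeg, foldPlusZero))
```

The individual rewrites stay tiny and readable; only the mechanically generated fused function is big.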
Compilers written in Haskell
The Haskell community has a lot of very interesting approaches to various problems, and compilers are no exception. One example is structuring compilers as Monad Transformer Stacks, where each individual language feature is represented as a Monad Transformer, so that the language can be composed out of lots of little independent features. Another is the Idris compiler, which is written in Haskell using an "Elaboration Monad" (you can roughly think of "Elaboration" as "Semantic Analysis"), with different language features written as "elaboration scripts" inside that monad.
The Nanopass Framework

The idea of nanopasses is basically: instead of breaking a compiler up into 2, 3, or 10 passes, why not break it up into 20, 30, or 100+ extremely simple, extremely small passes that each do only one thing, and do it well?
If you do that naively, you typically end up with a lot of code duplication: every pass takes in some input language, does a tiny thing, and produces some output language. The code for parsing all those input languages and generating all those output languages is very repetitive, especially since the individual passes are extremely small and the languages of consecutive passes therefore tend to be very similar.
That's what the nanopass framework is for: it contains two DSLs plus the associated machinery. One DSL is for defining languages, with support for defining them differentially (in other words, "inheriting" a language from another and only spelling out what changed). The other is for defining nanopasses, with support for writing code only for the parts of the language the pass actually manipulates; no-op "passthrough" code for everything else is generated automatically. As a result, typical language definitions and passes are only a couple of lines of code each.
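The real nanopass framework is a pair of Scheme DSLs built on macros; the hand-written Scala sketch below only shows the flavour of a single nanopass, and incidentally also shows the boilerplate (the near-duplicate language definition and the passthrough cases) that the framework generates for you. The languages and the lowering are made up.

```scala
// Source language L0.
sealed trait L0
case class Var0(name: String)            extends L0
case class Lam0(param: String, body: L0) extends L0
case class App0(fn: L0, arg: L0)         extends L0
case class If0(c: L0, thn: L0, els: L0)  extends L0

// Target language L1 is "L0 minus the `if` form". Note how much of L0 it has
// to repeat: exactly the duplication the framework's language DSL removes.
sealed trait L1
case class Var1(name: String)            extends L1
case class Lam1(param: String, body: L1) extends L1
case class App1(fn: L1, arg: L1)         extends L1   // no If1: the form is gone

// One nanopass: lower `if` into a call to a hypothetical "prim-if" primitive
// with thunked branches. Only the If0 case is interesting; the rest is the
// passthrough boilerplate the pass DSL would generate automatically.
def lowerIf(e: L0): L1 = e match {
  case If0(c, t, f) =>
    App1(App1(App1(Var1("prim-if"), lowerIf(c)),
              Lam1("_", lowerIf(t))),
         Lam1("_", lowerIf(f)))
  case Var0(n)    => Var1(n)
  case Lam0(p, b) => Lam1(p, lowerIf(b))
  case App0(f, a) => App1(lowerIf(f), lowerIf(a))
}
```

With the boilerplate generated for you, the hand-written part of such a pass shrinks to essentially the one interesting case.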
The nanopass framework was successfully used to re-architect the Chez Scheme compiler. This effort is described in Andy Keep's PhD thesis, A Nanopass Framework for Commercial Compiler Development, and summarized in a short paper of the same name as well as in a talk given at ClojureConj 2013.