5

I'm trying to parse through an AST of a program for a made up language, to be specific I'm trying to emulate the scope, so you enter a function for example and you push a new scope, and when the function is finished being visited by the visitor, it pops the scope. One important aspect is that when we push a new scope, there is a pointer currentScope that is set, which points to the scope we're currently looking at. When we pop the scope, this currentScope is set to be the "outer":

class Scope:
    outer : Scope
    inner : Scope

This is going to happen in multiple passes, but the first pass it's important that it constructs the general tree of scopes. The question I'm asking though is how can I traverse this tree in the same order it was created? For example:

{ // global scope
    { // a
        { // aa

        }
        { // ab
        }
    }
    { // b
    }
}

When I pass over the exact same set of nodes again, in theory they will give me the same tree of scope, but I want to preserve all of the data we collect and store each scope over each pass. In other words, when the second or third pass happens over the AST, when we visit a, currentScope = a, and when we visit aa, then currentScope = aa. Is this possible? I'm really confused with this idea, the whole recursive-y aspect is really messing with my head and I can't seem to figure out how to do this.

Here's what I've tried:

class Scope
    outer : Scope
    inner : Scope
    siblings : []Scope

    Scope(outer):
        this.outer = outer

push_idx = 0

push_scope()
    // set global scope
    if current is null
        global = new Scope(null)
        current = global
        return

    if current.inner is not null:
        // first pass over the AST
        if current_pass == 0:
            new_scope = new Scope(current)
            current.siblings.push(new_scope)
            current = new_scope
            return
        current = current.siblings[push_idx++]
    else:
        new_scope = new Scope(current)
        current.inner = new_scope
        current = current.inner

pop_scope()
    push_idx = 0
    current = current.outer

Though the order doesn't seem correct, and I'm fairly certain this is the wrong approach to this.

1

3 Answers 3

6

A data structure that's often used to track scope inside a compiler is a spaghetti stack, which is essentially a linked list data structure where each scope is a node storing a pointer to its parent scope. Whenever you enter a scope, you create a new node, point it to the enclosing scope, then store the node somewhere in the AST associated with that scope. As you walk the AST, your AST walker stores a pointer to the current scope node. When you enter a scope, you create a new scope node as described above. When you leave a scope, you change the pointer to point to the parent of the current scope. This ends up building a large inverted tree structure where each scope can trace its scope chain up to the root scope - the spaghetti stack.

Sign up to request clarification or add additional context in comments.

4 Comments

Oh wow, I figured there was some partially obscure data structure for this. So this will work once I've created a spaghetti stack for the AST, and I can reuse it and check all the existing data stored in each scope later on?
Yep, that's precisely what it's designed to do. Hope this helps!
That's wonderful, thank you so much! I've been scratching my head over this issue for a day or so now. Can I ask if you know this from personal experience or if there's some book or resource that talks more about these mystical data structures?
I first heard about spaghetti stacks when I taught a compilers course a few years ago. I have some slides on scope checking that might be a good starting point.
0

"Scope" is really a region of the program where all the identifiers in that region have constant meaning.

If your language has pure nested lexical scopes, you can model the set of scopes with a tree ("spaghetti" stack if you like), where each leaf contains a mapping from symbols introduced in that scope to their corresponding type information. This is what is classically taught in compiler classes.

But with more complex scoping rules (namespaces, using constructs, ...) in general you may need a graph whose leaves are the individual scopes with graph arcs representing relations between the scopes. Yes, one of those relations is usually "lexical parent". Other may include "inherits from", etc. You may also find that a name in a leaf mapping may by a type, it may in fact be an access path to an arbitrary other (leaf) scope in the graph.

(I build generic program analysis tool infrastructure [see bio]. We defined a graph-style symbol table API to support all the different scoping rules we have encountered. An interesting class of arc is "inherits from with priority N" for arbitary integer N; this lets us easily model ordered multiple inheritance offered by C++).

Comments

0

Perhaps you should give some thought to Segment tree:

  • each segment presents a scope (beginning of scope | ending of scope).
  • The tree structure would be according to the code hierarchy.
  • The tree's leafs would be the keywords in each scope.

Good luck!

1 Comment

Hmm, I don't know how useful would be representing the scope with a beginning and a end, since that the job the AST.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.