0

Alternative headline: traverse all nodes in a tree between two nodes.

I have a DOM tree (an ordered tree, where child elements at each level have a specific order). I am given start and stop nodes in the tree. The start is guaranteed to be before the stop in a depth-first traversal of the tree. I want to find/traverse all the nodes between these two efficiently. This means that if I can determine that a node between the two does NOT include the stop node, I do not want to recurse through all its descendants.

Is this a well-known (named?) problem, that has algorithms for the traversal, that perhaps computes shared ancestors or such to determine whether traverse a node's descendants?

Example 0 (same level):

<body>
  <p>ignore this</p>
  <div id="start">start here</div>
  <p id="A">visit A</p>
  <p id="B">visit B</p>
  <p id="C">visit C <span>but <b>not</b> <i>all this</i></span></p>
  <div id="stop">stop here</div>
  <p>ignore this</p>
</body>

In Example 0, I want to visit the following, in order:

  • #A
  • #B
  • #C (but not all its descendants)

Example 1 (stop level is deeper):

<body>
  <p>ignore this</p>
  <div><h2><span id="start">start here</span> A1</h2> A2</div>
  <section id="B"><many><more><levels><of><content>…</many></section>
  <section id="C">
     <div id="D"><many><more><levels><of><content>…</many></div>
     <div id="E">
       <div id="F"><many><more><levels><of><content>…</many></div>
       <div id="stop">stop here</div>
       <p>ignore this</p>
     </div>
  </section>
  <p>ignore this</p>
</body>

In Example 1, I want to visit the following, in order:

  • the TextNodes A1 and A2
  • #B (but not all its descendants)
  • #C (or maybe not?)
  • #D (but not all its descendants)
  • #E (or maybe not?)
  • #F (but not all its descendants)

Example 2 (stop is higher):

<body>
  <p>ignore this</p>
  <div>
    <h2>
      <span id="start">start here</span>
      <div id="A">visit A</div>
      <section id="B"><many><more><levels><of><content>…</many></section>
    </h2>
    <section id="C"><many><more><levels><of><content>…</many></section>
  </div>
  <section id="D"><many><more><levels><of><content>…</many></section>
  <div id="stop">stop here</div>
  <p>ignore this</p>
</body>

In Example 2, I want to visit the following, in order:

  • #A
  • #B (but not its descendants)
  • #C (but not its descendants)
  • #D (but not its descendants)

1 Answer 1

0

Here's an algorithm that seems to accomplish the goal, in pseudocode:

results   = []
ancestors = get_ancestors_of(stop_node)
current   = start_node

while current and current != stop_node:
    if current in ancestors:
        # if this node contains the stop node, dive in
        current = current.children[0]
    else:
        if current != start_node:
            results.add(current)

        if current.next_sibling:
            current = current.next_sibling

        else:
            # pop up until you find the next "aunt" node
            while current and !current.next_sibling:
                # assumes the root node returns null/false for its parent
                current = current.parent

            if current:
                current = current.next_sibling

return results

Here it is in Ruby, as a method that takes a block and yields nodes as it finds them:

# Traverse nodes between two in the tree, yielding each to the block
# but without traversing the descendants that are wholly between the two.
def traverse_between(start_node, end_node)
    # Get all ancestors of the end_node
    end_node_ancestors = end_node ? end_node.ancestors.to_set : Set.new

    current = start_node

    while current && current != end_node
        # Traverse children if the end node is in there somewhere.
        # We don't yield ancestors of the end_node, since
        # they implicitly contain the end_node and possibly more.
        if end_node_ancestors.include?(current)
            current = current.element_children.first
        else
            yield current unless current == start_node

            if current.next_sibling
                current = current.next_sibling
            else
                # Move up to the next ancestor with a sibling; stop if we reach the root
                while current && !current.next_sibling
                    current = current.parent
                    return if current.is_a?(Nokogiri::XML::Document)
                end
                current = current.next_sibling if current
            end
        end
    end
end
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.