4

This has bothered me for a long time. I want an iterator over a binary tree (or similar nested structure) that's efficient, simple, and pythonic. For example for usages like these:

for value in values(root):
    do_something_with_value(value)

print(sum(values(root)))

Here root is the tree's root node, and nodes have a .value, .left and .right. And values is supposed to give me the desired iterator/iterable over the tree's values.

Let n be the number of nodes in the tree and h be the height of the tree.

Attempt 1: Too slow

def values(root):
    if root:
        yield from values(root.left)
        yield root.value
        yield from values(root.right)

Simple, pythonic, lazy, and takes only O(h) space. This should be it. But... it's stacked generators and every single value gets yielded through the entire stack of generators. So the whole iteration takes O(n2) time in the worst case and O(n log n) time even in the best case.

Attempt 2: Too complicated

def values(root):
    stack = []
    while root or stack:
        while root:
            stack.append(root)
            root = root.left
        root = stack.pop()
        yield root.value
        root = root.right

Iterative with a stack of nodes. Only takes O(h) space and O(n) time, but it's much more complicated than the above totally natural recursion.

Attempt 3: Too big

def values(root):
    result = []
    def collect_values(root):
        if root:
            collect_values(root.left)
            result.append(root.value)
            collect_values(root.right)
    collect_values(root)
    return result

This collects all values in a semi-global list. Natural recursion and O(n) time, but sadly O(n) space and isn't lazy.

Attempt 4: Can't get it to work

Instead of a semi-global list I thought maybe I could abuse a semi-global generator. As a sort of pipe from inside the recursion directly to the outside. The recursion would send values into it, and the outside could fetch them. Something like this:

def values(root):
    pipe = magic_generator_pipe()
    def iterate(root):
        if root:
            iterate(root.left)
            pipe.send(root.value)
            iterate(root.right)
    iterate(root)
    yield from pipe

But I can't get it to work and I'm not sure it's possible.

Attempt 5: Maybe

Something with threading or asyncio? One more idea I have is that the values function starts a new thread. This thread recursively iterates the tree and communicates the values to the main thread in the values function, which yields them to the original outside caller. And they lock each other appropriately. But I don't have much experience with this.

The question

Is there a way to achieve all I want?

  1. Pythonic lazy iterator
  2. O(n) total time
  3. O(h) space
  4. Natural recursion

So essentially I want something like Attempt 1, but fast. Because I've used recursive yield from for several problems and always feel bad about the time complexity.

To clarify: With "too complicated" I really meant the iterative algorithm (it's not that complicated, but compared to the natural recursion it is). A solution that's "complicated" only in a "technical" way (like with an extra thread or @chepner's trampoline idea) would still be interesting. I just insist on the natural recursion (or something similarly algorithmically simple) and the other three goals.

3
  • This might sound even more complicated, but the complicated part is at least isolated in a single, reusable piece. Write values as a tail-recursive function, then use a trampoline (for example, github.com/0x65/trampoline) to implement tail-call optimization. (Although, you would want the accumulator to be an iterator, rather than a list. I think that's possible, though I haven't thought it through. The cost of chaining together a bunch of single-item (or at least small) iterators might defeat the purpose of this approach.) Commented Jan 18, 2020 at 13:56
  • @chepner Hmm, can we do tail-recursion with my two recursive calls? About such extra stuff defeating the purpose: With my "too complicated" I really meant the iterative algorithm (it's not that bad, but compared to the natural recursion it is). A solution that's "complicated" only in a "technical" way (like with an extra thread or a trampoline) would still be interesting. I just insist on the natural recursion (or something similarly algorithmically simple) and the other three stated goals. Commented Jan 18, 2020 at 14:10
  • Hm. There's also eli.thegreenplace.net/2017/…, which explains how to use CPS to convert a natural implementation of Fibonacci numbers into a tail-recursive function, but sadly doesn't provide an actual implementation of the algorithm like my previous link. Maybe if you're up to writing the complicated part yourself... :) Commented Jan 18, 2020 at 14:34

2 Answers 2

2

My approach may not satisfy you since it's an inversion of what you are doing, namely using a callback. You pass to a function that is able to iterate the structure a callback method to be invoked for each value encountered. In this example, function walk is called with callback function. In this case the structure is a tree that is walked and for each node value, the callback function is invoked.

def walk(root, callback):

    def tree_walk(node):
        if node['left']:
            tree_walk(node['left'])
        callback(node['value'])
        if node['right']:
            tree_walk(node['right'])

    tree_walk(root)


node4 = {'value': 4, 'left': None, 'right': None}
node3 = {'value': 3, 'left': node4, 'right': None}
node2 = {'value': 2, 'left': None, 'right': None}
node1 = {'value': 1, 'left': node2, 'right': node3}

def do_something_with_value(value):
    print('value:', value)

walk(node1, do_something_with_value)

Prints:

value: 2
value: 1
value: 4
value: 3
Sign up to request clarification or add additional context in comments.

2 Comments

Right. I've actually done this before. Though it's indeed not really what I'm looking for. As stated I'd like an an iterator/iterable. So that I can use it as ubiquitously as other iterators. In my toy example, your way perfectly replaces the for loop. But what if the loop body is not just a function call with the value as the only argument, or if I want to use it in other ways like sum(values(root))? I don't see it work then, at least in the sum case.
Note: I edited the question to clarify what I mean, adding the sum example but keeping the for-loop example as it was so that your answer remains a perfect replacement for that case.
2

Yes, it's possible. I managed to get a combination of attempts 4 and 5 working. I use a queue.Queue as a "pipe" between a thread traversing the tree and the main thread yielding the values to the caller:

def values(root):
    q = Queue(40)
    def dfs(root):
        if root:
            dfs(root.left)
            q.put(root.value)
            dfs(root.right)
    def go():
        dfs(root)
        q.put(stop)
    stop = object()
    Thread(target=go).start()
    while True:
        value = q.get()
        if value is stop:
            return
        yield value

Benchmark with a linear tree of 10000 nodes (every node only has a left child, so tree height is 10000):

4184 ms  attempt1_recursive_generator
   2 ms  attempt2_iterative
   9 ms  attempt3_build_list
  69 ms  attempt5_threading

So it's indeed much faster than the quadratic-time recursive generator, but also much slower than the iterative and list-builders solutions.

Note I allowed the queue to hold maxsize=40 values before switching the thread. With maxsize=1 it takes about 320 ms due to the overhead of constantly switching. At around maxsize=40, the overhead becomes insignificant.

So while it achieves what I wanted and shows that it's possible, I'm now interested in reducing the overhead further.

Benchmark code (Try it online!):

from threading import Thread
from queue import Queue
from timeit import default_timer as timer
import sys

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def attempt1_recursive_generator(root):
    def values(root):
        if root:
            yield from values(root.left)
            yield root.value
            yield from values(root.right)
    return values(root)

def attempt2_iterative(root):
    def values(root):
        stack = []
        while root or stack:
            while root:
                stack.append(root)
                root = root.left
            root = stack.pop()
            yield root.value
            root = root.right
    return values(root)

def attempt3_build_list(root):
    def values(root):
        result = []
        def collect_values(root):
            if root:
                collect_values(root.left)
                result.append(root.value)
                collect_values(root.right)
        collect_values(root)
        return result
    return values(root)

def attempt5_threading(root):
    def values(root):
        q = Queue(40)
        def dfs(root):
            if root:
                dfs(root.left)
                q.put(root.value)
                dfs(root.right)
        def go():
            dfs(root)
            q.put(stop)
        stop = object()
        Thread(target=go).start()
        while True:
            value = q.get()
            if value is stop:
                return
            yield value
    return values(root)

funcs = [
    attempt1_recursive_generator,
    attempt2_iterative,
    attempt3_build_list,
    attempt5_threading,
]

sys.setrecursionlimit(11000)

root = None
for value in range(10000):
    root = Node(value, root)

expect = None
for func in funcs:
    t0 = timer()
    result = list(func(root))
    t = timer() - t0
    if expect is None:
        expect = result
    else:
        assert result == expect
    print('%4d ms ' % (t * 1e3), func.__name__)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.