6

I am interested in methods for serializing and deserializing a binary tree in breadth-first order. The first part, serialization, is equivalent to performing a levelorder traversal where we preserve positioning information via the appropriate placing of null characters. Define a binary tree type:

data Tree a = Nil | Node a (Tree a) (Tree a)

Then an example of such a serialization is:

[2,1,3,#,0,#,#]

which serializes a binary tree of the form:

Node 2 (Node 1 Nil (Node 0 Nil Nil)) (Node 3 Nil Nil)

There is a gist on Github that details a clever way to do this in Haskell making use of laziness and infinite streams.

For serialization, there is a recursive version and a corecursive version. The corecursive version is as follows, adapted from that gist discussion:

bflist' :: Tree a -> [Maybe a]
bflist' t = map toMaybe q
  where
    q = t : go 1 q

    go :: Int -> [Tree a] -> [Tree a]
    go 0 _  = []
    go _ [] = []
    go i (Nil        : q') = Nil   : go (i-1) q'
    go i (Node _ l r : q') = l : r : go (i+1) q'

    toMaybe Nil          = Nothing
    toMaybe (Node x _ _) = Just x

There might be something a bit wonky about it because it creates more Nothings than I think it should. But I believe the general approach is sound.

The recursive version is the following:

-- | A 'fill-in-at-the-end' version of `zipWith`.
zipWith' :: (a -> a -> a) -> [a] -> [a] -> [a]
zipWith' _ xs [] = xs
zipWith' _ [] xs = xs
zipWith' f (x:xs) (y:ys) = f x y : zipWith' f xs ys

levelorder' :: Tree a -> [[Maybe a]]
levelorder' Nil = [[Nothing]]
levelorder' (Node a l r) = [Just a] : zipWith' (++) (levelorder' l) (levelorder' r)

serialize :: Tree a -> [Maybe a]
serialize = concat . levelorder'

To me, this seems very nice because writing a levelorder traversal of the form Tree a -> [[a]] is very basic, and this is a trivial modification of that. But that's just my opinion.

For deserialization, however, I don't know how to do it other than the way given in the gist, which is corecursive:

data SS a = a :< SS a
infixr 5 :<

-- | Build a complete binary tree from a list of its breadth-first
-- traversal.
bft :: [a] -> Tree a
bft xs = tree
  where
    -- `subtrees` is a stream of all proper subtrees of the result tree,
    -- in breadth-first order, followed by infinitely many empty trees.
    -- We form each tree in the result by combining a label from the input
    -- list with consecutive subtrees.
    tree :< subtrees = go xs subtrees

    go :: [a] -> SS (Tree a) -> SS (Tree a)
    go (a : as) ys = Node a b1 b2 :< go as bs
      where
        {-# NOINLINE b2bs #-}
        b1 :< b2bs = ys
        b2 :< bs = b2bs
    go [] _ = fix (Nil :<)

I would like to know whether there is a recursive analogue of this, or if this is the best we can do.

I believe there is some sort of pointer-based imperative analogue, but I have long since forgotten it. In this question, I am looking for a non-pointer based version if it's possible, because Haskell's lists don't play well with indexing and I'd like to know if it's feasible to stay away from Seq or Vector.

1 Answer 1

5

A recursive version is going to look roughly like this:

parse :: M (Tree a)
parse = do
  token <- next
  case token of
    Nothing -> pure Nil
    Just x -> Node x <$> parse <*> parse

where M is a monad and next :: M (Maybe a) gives you the next token of the input stream. Of course, as it is, that is only a parser for a DFS traversal. The challenge for a BFS traversal is that, in the Just branch, we want to start parsing the siblings of the current node (with label x) before making the recursive calls to parse.

A traditional parser monad just has one stream as an input and output. Here we want our computation to output the stream right after next, to parse the siblings of the current node by some external computation, and then come back with another stream as an input. At the next level, we will parse the next two nodes, and then output the stream again so their siblings can be parsed, come back with yet another stream, and so on. So we will define a parser monad that uses a stream of streams as input and output.

It is just a specialized state monad:

data Stream a = a :< Stream a

-- State (Stream [Maybe a]) b
newtype TreeParser a b = TreeParser { runTreeParser :: Stream [Maybe a] -> (Stream [Maybe a], b) }

instance Monad (TreeParser a) where ...
instance Applicative (TreeParser a) where ...
instance Functor (TreeParser a) where ...

There will be two primitives: next and later.

The next operation removes the first element from the top stream:

next :: TreeParser a (Maybe a)
next = TreeParser $ \ss -> case ss of
    (x : s) :< ss' -> (s :< ss', x)
    _ -> (ss, Nothing)

The later operator is where the magic happens. It transforms a computation so that it sees only on the tail of the stream of streams. We think of it as "postponing" the operation: the current (top) stream passes through unchanged, and it's only when we come back at the next level that the computation starts working.

This makes crucial use of laziness (in the let pattern): this function manifestly outputs s, ss', and x, but s will be used first to parse the current level of the tree, it will be fed back for the next level as part of the input ss, and only then the head of x will be produced along with the head of ss', and so on, with successive levels of x needing successive levels of ss.

later :: TreeParser a b -> TreeParser a b
later p = TreeParser $ \(s :< ss) -> let (ss', x) = runTreeParser p ss in (s :< ss', x)

With those primitives in place, we can define our parser. The core loop, parse, is almost identical to the draft version earlier; the key difference is the later around the recursive calls. To run the parser, its input state is defined recursively. The head is the input stream s :: [Maybe a]. The tail is the output state from running the parser: the streams are shifted down to the next level.

parseTree :: [Maybe a] -> Tree a
parseTree s = t
  where
    (ss, t) = runTreeParser parse (s :< ss)
    parse :: TreeParser a (Tree a)
    parse = do
        token <- next
        case token of
            Nothing -> pure Nil
            Just x -> later (Node x <$> parse <*> parse)

That's all.

main :: IO ()
main = print (parseTree [Just 2, Just 1, Just 3, Nothing, Just 0, Nothing, Nothing])

-- Output: Node 2 (Node 1 Nil (Node 0 Nil Nil)) (Node 3 Nil Nil)

Full code:

{-# LANGUAGE BangPatterns #-}
import Control.Monad (liftM, ap)

data Tree a = Nil | Node a (Tree a) (Tree a)
  deriving Show

data Stream a = a :< Stream a

-- State (Stream [Maybe a]) b
newtype TreeParser a b = TreeParser { runTreeParser :: Stream [Maybe a] -> (Stream [Maybe a], b) }

instance Monad (TreeParser a) where
    u >>= k = TreeParser $ \s ->
        let !(s', x) = runTreeParser u s in
        runTreeParser (k x) s'
instance Applicative (TreeParser a) where
    pure x = TreeParser $ \s -> (s, x)
    (<*>) = ap
instance Functor (TreeParser a) where
    fmap = liftM

next :: TreeParser a (Maybe a)
next = TreeParser $ \ss -> case ss of
    (x : s) :< ss' -> (s :< ss', x)
    _ -> (ss, Nothing)

later :: TreeParser a b -> TreeParser a b
later p = TreeParser $ \(s :< ss) -> let (ss', x) = runTreeParser p ss in (s :< ss', x)

parseTree :: [Maybe a] -> Tree a
parseTree s = t
  where
    (ss, t) = runTreeParser parse (s :< ss)
    parse :: TreeParser a (Tree a)
    parse = do
        token <- next
        case token of
            Nothing -> pure Nil
            Just x -> later (Node x <$> parse <*> parse)

main :: IO ()
main = print (parseTree [Just 2, Just 1, Just 3, Nothing, Just 0, Nothing, Nothing])

-- Output: Node 2 (Node 1 Nil (Node 0 Nil Nil)) (Node 3 Nil Nil)
Sign up to request clarification or add additional context in comments.

2 Comments

Upvoted because this is clearly a very high-effort answer. But isn't (ss, t) = runTreeParser parse (s :< ss) corecursion? You are building an infinite stream up from the base case s.
For me the point is that parse makes two recursive calls to parse the children of the current tree only. The rest is scaffolding to support that. If you prefer I also have a solution that doesn't build up an infinite stream of states, that uses the same code for parse; instead it makes the computation itself into an infinite tree.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.