If by "streams" you're talking about C++ iostreams, they're already buffered at a reasonable size and the cost of inserting into that buffer is very low. The standard library is mature; beating it at its own game is very hard. and you'll need exploitable specifics you can take advantage of to get anything worthwhile. That said:
How big your output buffer should be (with the degenerate case being a single-element buffer, i.e. no buffering) depends on the overhead of a buffer flush. That overhead has a fixed cost and a size-related cost -- and the size-related part isn't simply linear, given cache effects. The more expensive the fixed overhead, the more a bigger buffer helps amortize it. For instance, if a buffer flush can trigger zero-copy I/O, it can be dramatically cheaper to buffer an entire largish serialization; but if the output operation is going to copy from your buffer, buffer sizes around a quarter of your L1 cache size are a decent choice when the fixed cost of a flush is low.
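For concreteness, here's a minimal sketch of steering that buffer size on a plain ofstream via pubsetbuf. The 8 KiB figure is just the illustrative "quarter of a typical 32 KiB L1d" choice from above, and the file name and item count are made up -- measure before trusting either number. Note that setbuf on a filebuf is implementation-defined except in the "no buffering" case, so call it before opening the file (and before any I/O) for the common implementations to honor it.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

int main() {
    constexpr std::size_t kBufSize = 8 * 1024;  // ~1/4 of a typical 32 KiB L1d (illustrative)
    std::vector<char> buf(kBufSize);

    std::ofstream out;
    // Install our buffer before open(): pubsetbuf after I/O has started is
    // implementation-defined for filebufs.
    out.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
    out.open("items.bin", std::ios::binary);

    for (int i = 0; i < 1'000'000; ++i) {
        // Each insert is a cheap copy into buf; the flush happens once per 8 KiB.
        out.write(reinterpret_cast<const char*>(&i), sizeof(i));
    }
    return 0;  // the ofstream destructor flushes the tail
}
```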
None of this matters at all unless the time serialization takes puts it on a critical path, i.e. makes it something a user is waiting on -- and for something like this, that's hard to produce unless you're talking about millions of items and up. Even then, if you haven't already worked on it, it's almost certain there's more waste in how you produce an individual serialization than in the buffering scheme you choose -- and even then, never forget what you're racing. Is it I/O bandwidth? Sending your serialized stream through a low-grade compressor could easily save more time than anything you could do up front.
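If I/O bandwidth really is what you're racing, a cheap compression pass over the serialized bytes can pay for itself. A hedged sketch, assuming zlib is on hand (link with -lz) and that the serialization already sits in one contiguous buffer; the 8 MiB size and the names are illustrative only:

```cpp
#include <zlib.h>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // Stand-in for a largish serialization destined for a slow link.
    std::string serialized(8 * 1024 * 1024, 'x');

    uLongf destLen = compressBound(serialized.size());
    std::vector<Bytef> compressed(destLen);

    // Z_BEST_SPEED: a "low-grade" level-1 pass -- cheap on CPU, but often
    // enough to shrink the bytes you push through the I/O bottleneck.
    int rc = compress2(compressed.data(), &destLen,
                       reinterpret_cast<const Bytef*>(serialized.data()),
                       serialized.size(), Z_BEST_SPEED);
    if (rc != Z_OK) return 1;

    std::printf("%zu -> %lu bytes\n", serialized.size(), destLen);
    return 0;
}
```

Whether that wins depends entirely on the ratio of your CPU headroom to your link speed, so benchmark it against the uncompressed path before committing.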