I'm writing some serialization code, and I'm wondering how to deal with binary data. As I'm doing it in Python, my goal is to make it very simple, not require a lot of programmer overhead, etc.
Three options I am considering:
1. The fields that will be binary data are represented as hex-encoded strings. Thus you'd have something like:

    obj = {
        'foo': 100,
        'bar': [1, 2, 3, 4],
        'baz': "ab0123ffbbaa55",
    }
    ObjectSpec.loads(ObjectSpec(obj).dumps())

ObjectSpec is the class which determines how to serialize the object.

- The pros: it's easy to look at, easy to make object literals, easy to print out.
- The cons: you have to remember to hex-encode the fields. If you have bytes, you have to hex-encode them before the serialization code then hex-decodes them. If you want to store the objects, there's more overhead unless you hex-decode the strings first.
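To make the overhead in this option concrete, here is a minimal sketch of the hex round trip using the standard `binascii` module. It assumes Python 3, where bytes and text are distinct types; the variable names are only illustrative and not part of any ObjectSpec API.

```python
import binascii

# The raw bytes you already have in hand.
raw = b'\xab\x01#\xff\xbb\xaaU'

# Hex-encode so the value can be written as a readable literal.
hex_text = binascii.hexlify(raw).decode('ascii')  # 'ab0123ffbbaa55'

# The serializer would then have to undo that step before writing bytes out.
restored = binascii.unhexlify(hex_text)

# The hex form is always exactly twice as long as the data it represents.
overhead_ratio = len(hex_text) / len(raw)
```

So every binary field pays one encode and one decode per round trip, and any stored hex form takes twice the space of the raw bytes.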
2. The fields are byte strings instead, e.g.:

    obj = {
        'foo': 100,
        'bar': [1, 2, 3, 4],
        'baz': '\xab\x01#\xff\xbb\xaaU',
    }

- The pros: less overhead, both in space, and in not having to hex-encode if you already have bytes.
- The cons: harder to make literals, harder to print out. If you accidentally leave in a hex-encoded string then it will serialize the wrong thing (the hex representation instead of the thing itself).
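The "wrong thing" failure mode is easy to reproduce. This is a small sketch, assuming Python 3 byte strings, of why the serializer cannot catch a hex string that was left in by accident:

```python
import binascii

raw = b'\xab\x01#\xff\xbb\xaaU'        # the 7 bytes actually intended
forgotten = binascii.hexlify(raw)      # b'ab0123ffbbaa55', never decoded

# Both values are ordinary byte strings, so a type check cannot tell them apart.
same_type = type(raw) is type(forgotten)

# The forgotten value silently serializes 14 ASCII bytes instead of 7 raw ones.
sizes = (len(raw), len(forgotten))
```

Nothing fails at serialization time; the mistake only shows up when something downstream reads the field.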
3. The binary data fields use some custom type, e.g. bson.Binary:

    from bson import Binary

    obj = {
        'foo': 100,
        'bar': [1, 2, 3, 4],
        'baz': Binary('\xab\x01#\xff\xbb\xaaU'),
    }

- The pros: Same as #2, but also clearly delineates binary types.
- The cons: Same as #2, except harder to accidentally encode the wrong thing. Requires wrapping the data in a type just to get the serialization code to accept it, instead of leaving bytes in.
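For comparison, a wrapper type does not have to come from bson. The sketch below uses a hypothetical `BinaryField` marker class and a hypothetical `encode_field` helper (neither is part of bson or of ObjectSpec) to show how a serializer could dispatch on the wrapper and refuse unwrapped bytes. It assumes Python 3.

```python
class BinaryField(bytes):
    """Hypothetical marker type: bytes explicitly tagged as binary data."""


def encode_field(value):
    """Hypothetical helper: return a (tag, payload) pair for one field."""
    # Check the wrapper first, because BinaryField is itself a bytes subclass.
    if isinstance(value, BinaryField):
        return ('binary', bytes(value))
    if isinstance(value, str):
        return ('text', value.encode('utf-8'))
    if isinstance(value, bytes):
        # Unwrapped bytes are rejected so the caller must state the intent.
        raise TypeError('wrap raw bytes in BinaryField to serialize them')
    raise TypeError('unsupported field type: %s' % type(value).__name__)


wrapped = encode_field(BinaryField(b'\xab\x01#\xff\xbb\xaaU'))
as_text = encode_field('ab0123ffbbaa55')
```

Here a hex string left in by accident is tagged as text rather than being mistaken for binary data, at the cost of the extra wrapping step listed under the cons.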
What would the most sensible approach be? Is there another variant that is better?
bytes. You can create a byte array and wrap that up inside a protobuf message if you need to preserve the existing format. However, if you can modify the C++ code, then you could create new messages using the bindings (the bindings are auto-generated classes using getters/setters for int/bool/std::string/etc.).