Deserializing protobufs without source file

Question

Is it possible to deserialize a protobuf message without access to the source .proto file or generated classes? My source system generates messages using Ruby, and they are consumed in Java. The source system can create new message formats that the consumer has no easy way of knowing about. Alternatively, what's the best way for the consumer to get access to the proto classes? Is it possible to have some kind of proto repository?

What do you expect to do with the proto messages whose meaning you haven't been told? Are you just passing them along elsewhere? – Louis Wasserman, Feb 6, 2016 at 0:03

Pretty much: either pass them along or persist them somewhere. – mottosan, Feb 6, 2016 at 0:05

Then why do you have to deserialize them at all? You can always just store the raw bytes. (Deserializing them will be very hard or impossible, but it's not clear that you actually have to.) – Louis Wasserman, Feb 6, 2016 at 0:06

The idea is to store them in a Hive table, and all methods seem to require access to the source proto to deserialize. – mottosan, Feb 6, 2016 at 0:10

It's not clear what you'd expect to be able to do with data whose meaning you haven't been told. You could send around proto descriptors, which describe a protobuf message format as a proto itself, I suppose. – Louis Wasserman, Feb 6, 2016 at 0:19

Kenton Varda · Accepted Answer · 2016-02-06 21:03:14Z

Without the schema (.proto file or compiled Descriptor), you can only decode a Protobuf into a series of tag/value pairs, where the tags are numeric and the values have limited type information. This may be enough for a human to reverse-engineer the protocol but usually isn't useful for code.
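To make "tag/value pairs with limited type information" concrete, here is a minimal, stdlib-only Java sketch of a schema-less decoder (the class and record names are hypothetical, chosen for illustration). It reads only what the wire format itself carries: a field number, a wire type, and raw bits. A varint could be an int32, bool, enum, or zigzag-encoded sint; a length-delimited value could be a string, raw bytes, a nested message, or a packed repeated field, and nothing in the bytes says which. protoc's `--decode_raw` option produces a similar dump from the command line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: decode protobuf wire format without a schema (groups, wire types 3/4, unsupported). */
public final class WireDecoder {

    /** One field as seen on the wire; the value's meaning is unknown without a schema. */
    public record Field(int number, int wireType, Object value) {}

    private WireDecoder() {}

    public static List<Field> decode(byte[] buf) {
        List<Field> out = new ArrayList<>();
        int[] pos = {0}; // mutable cursor shared with the helpers
        while (pos[0] < buf.length) {
            long key = readVarint(buf, pos);
            int number = (int) (key >>> 3);    // field number: all that is left of the field name
            int wireType = (int) (key & 0x7);  // encoding only, not the declared type
            Object value;
            switch (wireType) {
                case 0 -> value = readVarint(buf, pos);   // int32/int64/uint/bool/enum/sint
                case 1 -> value = readFixed(buf, pos, 8); // fixed64/sfixed64/double
                case 2 -> {                               // string/bytes/message/packed
                    int len = (int) readVarint(buf, pos);
                    if (len < 0 || pos[0] + len > buf.length) {
                        throw new IllegalArgumentException("truncated length-delimited field");
                    }
                    value = Arrays.copyOfRange(buf, pos[0], pos[0] + len);
                    pos[0] += len;
                }
                case 5 -> value = readFixed(buf, pos, 4); // fixed32/sfixed32/float
                default -> throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
            out.add(new Field(number, wireType, value));
        }
        return out;
    }

    /** Base-128 varint, least significant group first, at most 10 bytes. */
    private static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos[0] >= buf.length) {
                throw new IllegalArgumentException("truncated varint");
            }
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    /** Little-endian fixed-width value of 4 or 8 bytes, returned as raw bits. */
    private static long readFixed(byte[] buf, int[] pos, int size) {
        if (pos[0] + size > buf.length) {
            throw new IllegalArgumentException("truncated fixed-width field");
        }
        long result = 0;
        for (int i = 0; i < size; i++) {
            result |= (long) (buf[pos[0] + i] & 0xFF) << (8 * i);
        }
        pos[0] += size;
        return result;
    }
}
```

For example, the bytes `08 96 01 12 02 68 69` decode to field 1 (varint 150) and field 2 (the two bytes `hi`), but whether field 2 is a string, raw bytes, or an embedded message is exactly the information that only the schema can supply.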

It is possible to send the schema along with the payload in the form of a FileDescriptorSet (basically, a compiled version of the relevant .proto files), as described here:

https://developers.google.com/protocol-buffers/docs/techniques#self-description
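On a Java consumer, a FileDescriptorSet can be turned back into runtime Descriptors and used with `DynamicMessage`, so no generated classes are needed. The sketch below assumes protobuf-java on the classpath; the class and method names are hypothetical. It also assumes the fully qualified message type name travels with the payload somehow, and that each file's imports appear earlier in the set than the file itself (protoc writes `--include_imports` output in dependency order).

```java
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.DescriptorValidationException;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.InvalidProtocolBufferException;

import java.util.HashMap;
import java.util.Map;

/** Sketch: decode payloads using a schema delivered as a serialized FileDescriptorSet. */
public final class DescriptorSetDecoder {

    private DescriptorSetDecoder() {}

    /** Builds every file in the set and indexes its top-level message types by full name. */
    public static Map<String, Descriptor> loadTypes(byte[] descriptorSetBytes)
            throws InvalidProtocolBufferException, DescriptorValidationException {
        FileDescriptorSet set = FileDescriptorSet.parseFrom(descriptorSetBytes);
        Map<String, FileDescriptor> builtFiles = new HashMap<>();
        Map<String, Descriptor> types = new HashMap<>();
        // Assumption: imports precede the files that use them within the set.
        for (FileDescriptorProto fileProto : set.getFileList()) {
            FileDescriptor[] deps = new FileDescriptor[fileProto.getDependencyCount()];
            for (int i = 0; i < deps.length; i++) {
                deps[i] = builtFiles.get(fileProto.getDependency(i));
                if (deps[i] == null) {
                    throw new IllegalArgumentException("Missing dependency: " + fileProto.getDependency(i));
                }
            }
            FileDescriptor file = FileDescriptor.buildFrom(fileProto, deps);
            builtFiles.put(file.getName(), file);
            // Nested message types are not indexed, to keep the sketch short.
            for (Descriptor type : file.getMessageTypes()) {
                types.put(type.getFullName(), type);
            }
        }
        return types;
    }

    /** Parses a payload against a runtime Descriptor instead of a generated class. */
    public static DynamicMessage decode(Map<String, Descriptor> types, String fullTypeName, byte[] payload)
            throws InvalidProtocolBufferException {
        Descriptor type = types.get(fullTypeName);
        if (type == null) {
            throw new IllegalArgumentException("Unknown message type: " + fullTypeName);
        }
        return DynamicMessage.parseFrom(type, payload);
    }
}
```

The schema owner can produce the set with something like `protoc --include_imports --descriptor_set_out=schema.desc your.proto` (file names hypothetical), regardless of whether the messages are then written from Ruby or any other language. The same `loadTypes` call works equally well once at startup against a stored `.desc` file, which avoids shipping the schema with every message.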

However, this is not as useful as it sounds! A FileDescriptorSet will allow you to determine the names and types of fields, but that doesn't mean your code will know what to do with them.

That said, there are some possible use cases:

- You could have a proxy that translates the message to JSON based on the schema.
- You could have a storage system which parses the message in order to store it in a different form. For example, it might transpose rows and columns for better compression, or it might do some sort of indexing on fields.
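The proxy use case can be sketched in a few lines of Java once a Descriptor for the message type is available, whether it came from a FileDescriptorSet sent with the message or from a pre-configured schema. This assumes protobuf-java plus protobuf-java-util (for `JsonFormat`) on the classpath; `JsonProxy` and `toJson` are hypothetical names.

```java
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.util.JsonFormat;

/** Sketch of a schema-driven proxy step: binary protobuf in, JSON out. */
public final class JsonProxy {

    // Reused across calls. Payloads containing google.protobuf.Any would also
    // need a TypeRegistry supplied via usingTypeRegistry(...).
    private static final JsonFormat.Printer PRINTER = JsonFormat.printer();

    private JsonProxy() {}

    public static String toJson(Descriptor type, byte[] payload) throws InvalidProtocolBufferException {
        // The Descriptor supplies the field names and types that the wire format lacks.
        DynamicMessage message = DynamicMessage.parseFrom(type, payload);
        return PRINTER.print(message);
    }
}
```

Note that the proxy never needs to understand what the fields mean; it only needs their names and types, which is why the schema alone is enough for this kind of mechanical translation.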

However, in these use cases I would generally recommend that the proxy or storage system be pre-configured with the necessary schemas rather than sending them with every message, as schemas tend to be pretty big.
