5

The Python version of Google protobuf gives us only:

SerializeAsString()

Whereas the C++ version gives us both:

SerializeToArray(...)
SerializeAsString()

Our C++ code writes the file in binary format, and we'd like to keep it that way. That said, is there a way to read the binary data into Python and parse it as if it were a string?

Is this the correct way of doing it?

binary = get_binary_data()
binary_size = get_binary_size()

string = None
for i in range(len(binary_size)):
   string += i

message = new MyMessage()
message.ParseFromString(string)

Update:

Here's a new example, and a problem:

message_length = 512

file = open('foobars.bin', 'rb')

eof = False
while not eof:

    data = file.read(message_length)
    eof = not data

    if not eof:
        foo_bar = FooBar()
        foo_bar.ParseFromString(data)

When we get to the foo_bar.ParseFromString(data) line, I get this error:

Exception Type: DecodeError
Exception Value: Too many bytes when decoding varint.

Update 2:

It turns out that the padding on the binary data was throwing protobuf off: too many bytes were being passed in, as the message suggests (in this case, the extra bytes were the padding).
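To see why padding produces this particular error, here is a simplified varint decoder. It is an illustrative sketch in Python 3, not protobuf's actual implementation, and `decode_varint` is a hypothetical helper. It assumes the padding byte is 0xCC, as in my file. Since 0xCC has its high (continuation) bit set, a run of padding looks like a varint that never ends.

```python
def decode_varint(buf, pos=0):
    """Decode one base-128 varint from buf starting at pos.

    Returns (value, next_pos). Hypothetical helper for illustration only.
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]  # indexing bytes yields an int in Python 3
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7
        if shift >= 64:  # a 64-bit value needs at most 10 bytes
            raise ValueError("Too many bytes when decoding varint.")


# A well-formed two-byte varint decodes normally.
value, next_pos = decode_varint(b"\xac\x02")

# A run of 0xCC padding never terminates, so decoding fails.
try:
    decode_varint(b"\xcc" * 16)
    padding_error = None
except ValueError as exc:
    padding_error = str(exc)
```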

This padding comes from calling the C++ protobuf function SerializeToArray on a fixed-length buffer. To eliminate it, I have used this temporary code:

message_length = 512

file = open('foobars.bin', 'rb')

eof = False
while not eof:

    data = file.read(message_length)
    eof = not data

    string = ''
    for i in range(0, len(data)):
        byte = data[i]
        if byte != '\xcc': # yuck!
            string += data[i]

    if not eof:
        foo_bar = FooBar()
        foo_bar.ParseFromString(string)

I think there is a design flaw here. I will re-implement my C++ code so that it writes variable-length arrays to the binary file. As advised by the protobuf documentation, I will prefix each message with its binary size so that I know how much to read when opening the file with Python.
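A minimal sketch of the reading side of that plan, assuming each message is written as a 4-byte little-endian unsigned length followed by exactly that many payload bytes. The prefix format is my own assumption (the C++ writer would have to use the same one), and `read_length_prefixed` is a hypothetical helper, not part of protobuf.

```python
import io
import struct

PREFIX = struct.Struct("<I")  # assumed framing: 4-byte little-endian length


def read_length_prefixed(stream):
    """Yield each message payload from a length-prefixed binary stream."""
    while True:
        header = stream.read(PREFIX.size)
        if not header:
            return  # clean end of file
        if len(header) < PREFIX.size:
            raise ValueError("truncated length prefix")
        (size,) = PREFIX.unpack(header)
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated message body")
        yield payload


# Demo with an in-memory stream standing in for open('foobars.bin', 'rb').
demo = io.BytesIO(PREFIX.pack(3) + b"abc" + PREFIX.pack(2) + b"xy")
payloads = list(read_length_prefixed(demo))
```

With the real file, each yielded payload would be handed to a fresh FooBar() via ParseFromString(payload), so no padding ever reaches the parser.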

1
  • I'm not exactly sure what you're trying to do with your loop, but it is going to raise a TypeError. You assign None to the name string, and then attempt to add a series of ints to it. In Python, a string is a sequence of bytes, so any binary data should be safe in a string. Can you explain more clearly what SerializeAsString is doing wrong with your data? Commented Dec 7, 2009 at 14:30

2 Answers

4

I'm not an expert with Python, but you can pass the result of a file.read() operation into message.ParseFromString(...) without having to build a new string type or anything.
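Something along these lines should work. This is only a sketch: `load_message` is a hypothetical helper, `message_cls` stands for any generated message class (such as FooBar from the question), and it assumes the file holds exactly one serialized message with no padding.

```python
def load_message(path, message_cls):
    """Read a whole binary file and parse it into a new message_cls instance."""
    with open(path, "rb") as f:  # 'rb' keeps the bytes untouched
        data = f.read()
    message = message_cls()
    message.ParseFromString(data)  # the raw bytes are passed in directly
    return message
```

For example, `load_message('foobars.bin', FooBar)` would return a populated FooBar, provided the file contents match the assumption above.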

4

Python strings can contain any character, i.e. they are capable of holding "binary" data directly. There should be no need to convert from string to "binary".

1 Comment

This is not true anymore for Python 3.
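To illustrate the difference: in Python 3, a file opened in 'rb' mode returns bytes rather than str, and indexing bytes yields integers. A small sketch using made-up sample data with 0xCC padding, as in the question:

```python
data = b"\x08\x01\xcc\xcc"  # made-up sample: two payload bytes, then padding

# Indexing bytes gives an int in Python 3, so comparing against the
# one-character str '\xcc' is never equal.
first_pad = data[2]
matches_str = first_pad == "\xcc"
matches_int = first_pad == 0xCC

# Strip padding by working with bytes throughout. Note that this only
# removes trailing 0xCC bytes; it would still corrupt a message whose
# real last byte is 0xCC.
trimmed = data.rstrip(b"\xcc")
```

In Python 3, ParseFromString expects a bytes object like `trimmed`, not a str.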
