How are protocol-buffers faster than XML and JSON?

Question

I recently started reading about and using gRPC in my work.
gRPC uses protocol-buffers internally as its interface definition language.
I keep reading that protocol-buffers perform much better and faster than JSON and XML.

What I fail to understand is how they do that.
What design in protocol-buffers actually makes them faster than XML and JSON?

Binary format. Less wasteful. At the cost of not being human-readable.
– Sergio Tulentsev, Commented Sep 3, 2018 at 9:21
Protocol buffers use an optimized binary format. Furthermore, the meta information defining what is in the message is not included in the message. E.g. if your message has a property named foo, then this name is not part of the message. In XML and JSON you include foo as a literal string for each occurrence of the property foo in the message. The result is that protocol buffer messages are very compact compared to the same messages in XML or JSON.
– Martin Liversage, Commented Sep 3, 2018 at 9:25
They have a really good explanation in their docs: developers.google.com/protocol-buffers/docs/overview, chapter 'Why not xml'
– ChristianMurschall, Commented Sep 4, 2018 at 9:17
lol I can't believe I found this question from two years ago and just noticed an edit was made an hour ago (a little while after I got here)
– Deryck, Commented Aug 9, 2020 at 9:32

Oreo · Accepted Answer · 2025-04-07 20:22:57Z

54

String representations of data:

require text encode/decode (which can be cheap, but is still an extra step)
require complex parse code, especially if there are human-friendly rules like "must allow whitespace"
usually involve more bandwidth - so more actual payload to churn - due to embedding of things like names, and (again) having to deal with human-friendly representations (how to tokenize the syntax, for example)
often require lots of intermediate string instances that are used for member-lookups etc

Both text-based and binary-based serializers can be fast and efficient (or slow and horrible); it's just that binary serializers have the scales tipped in their favor. This means that a "good" binary serializer will usually be faster than a "good" text-based serializer.

Let's compare a basic example of encoding the integer 42:

JSON: {"id":42} (9 bytes assuming ASCII/UTF-8 and no whitespace)
XML: <id>42</id> (11 bytes assuming ASCII/UTF-8, no whitespace and no namespace noise)
protobuf: 0x08 0x2a (2 bytes)
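
The byte counts above can be checked with a short Python sketch. The helpers `encode_varint` and `encode_int_field` are hypothetical names written for this illustration, not part of any protobuf library, and they only cover non-negative integers in a varint field (wire type 0):

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow, so set the MSB
        else:
            out.append(low7)         # last byte, MSB clear
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: header (field number << 3 | wire type 0), then the value."""
    header = (field_number << 3) | 0
    return encode_varint(header) + encode_varint(value)


# Compact JSON (no whitespace), the XML snippet from the text, and the protobuf field.
json_bytes = json.dumps({"id": 42}, separators=(",", ":")).encode("utf-8")
xml_bytes = "<id>42</id>".encode("utf-8")
proto_bytes = encode_int_field(1, 42)

print(len(json_bytes), json_bytes)         # 9 b'{"id":42}'
print(len(xml_bytes), xml_bytes)           # 11 b'<id>42</id>'
print(len(proto_bytes), proto_bytes.hex()) # 2 082a
```

Note that the field name "id" never appears in the protobuf bytes; only the field number 1 does, folded into the header byte.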

Now imagine writing a general-purpose XML or JSON parser, with all the ambiguities and scenarios you need to handle just at the text layer; then you need to map the text token "id" to a member, and then you need to do an integer parse on "42".

In protobuf, the payload is smaller, plus the math is simple, and the member-lookup is an integer (so: suitable for a very fast switch/jump).
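
To make the "simple math, integer lookup" point concrete, here is a minimal decoding sketch in Python. It assumes a hypothetical schema with a single varint field (field 1 = id); `read_varint` and `decode_message` are illustrative names, not a real library API, and a production decoder would also handle the other wire types and truncated input:

```python
from typing import Dict, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not (byte & 0x80):             # MSB clear means this was the last byte
            return result, pos
        shift += 7


def decode_message(buf: bytes) -> Dict[str, int]:
    """Decode a message under the assumed schema: field 1 = id (varint)."""
    message: Dict[str, int] = {}
    pos = 0
    while pos < len(buf):
        header, pos = read_varint(buf, pos)
        field_number = header >> 3
        wire_type = header & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only understands varint fields")
        value, pos = read_varint(buf, pos)
        # Member lookup is an integer comparison, not a string match on "id".
        if field_number == 1:
            message["id"] = value
        # Any other field number is simply ignored in this sketch.
    return message


print(decode_message(bytes([0x08, 0x2A])))  # {'id': 42}
```

There is no tokenizing, no whitespace handling, and no text-to-integer conversion: the whole decode is a few shifts, masks, and one integer comparison per field.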

edited Apr 7 at 20:22

Oreo


answered Sep 3, 2018 at 10:51

Marc Gravell



5 Comments

Marc Gravell Over a year ago

btw: in case anyone is unsure whether 0x08 0x2a represents the same scenario: protogen.marcgravell.com/decode?hex=082a

Rick Over a year ago

btw: encoding reference developers.google.com/protocol-buffers/docs/encoding

Bob Horn Over a year ago

My guess as to how 0x08 0x2a translates to id of 42: the 0x08 is the position of the property in the type. That means, instead of specifying the property name, like id, or name, etc., we specify the position. If id is the 4th property, then that's how we specify that property. And the 0x2a is the value, which is 42. (Although 0x08 maps to a backspace, I'm assuming that really wouldn't be valid here, but was just used as an example?)

Marc Gravell Over a year ago

@BobHorn the first byte (could be multiple bytes, if the MSB is set) is a protobuf field header, using varint packing; the low 3 bits are the wire-type - varint in this case; the remaining bits give the field number, 1 in this case. So this is "field 1, varint"

Bob Horn Over a year ago

Ah, so 0x08 converts to 00001000 in binary. If the low 3 bits (the 000 on the right) are the wire type, then what's left is 00001, which is 1. Got it. Thanks, @MarcGravell!
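
The header layout described in the comments above can be checked with a few lines of Python. This is only a sketch: `split_header` and `header_bytes` are hypothetical helper names, and the field-16 case is an added example of a header that needs more than one byte:

```python
def split_header(header: int) -> tuple:
    """Split a decoded header value into (field number, wire type)."""
    return header >> 3, header & 0x07


def header_bytes(field_number: int, wire_type: int) -> bytes:
    """Build the varint-packed header bytes for a field number and wire type."""
    value = (field_number << 3) | wire_type
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # MSB set: another header byte follows
        value >>= 7
    out.append(value)
    return bytes(out)


print(split_header(0x08))          # (1, 0): field 1, wire type 0 (varint)
print(header_bytes(1, 0).hex())    # 08
print(header_bytes(16, 0).hex())   # 8001: field 16 no longer fits in one byte
```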

Andriy Plokhotnyuk · Accepted Answer · 2018-09-04 06:46:32Z

4

While binary protocols have an advantage in theory, in practice they can lose in performance to JSON or other text-based protocols, depending on the implementation.

Efficient JSON parsers like RapidJSON or jsoniter-scala parse most JSON samples at 2-8 cycles per byte. They serialize even more efficiently, except for some edge cases like floating-point numbers, where serialization speed can drop to 16-32 cycles per byte.

But for most domains that don't have a lot of floats or doubles, their speed is quite competitive with the best binary serializers. Please see the results of benchmarks where jsoniter-scala parses and serializes on par with Java and Scala libraries for ProtoBuf:

https://github.com/dkomanov/scala-serialization/pull/8
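
To put the cycles-per-byte figures above in more familiar units, the following Python sketch converts them to throughput. The 3 GHz clock is an assumption chosen purely for illustration (it is not from the benchmarks), and `throughput_mb_per_s` is a hypothetical helper:

```python
CLOCK_HZ = 3_000_000_000  # assumed 3 GHz core, for illustration only


def throughput_mb_per_s(cycles_per_byte: float, clock_hz: int = CLOCK_HZ) -> float:
    """Bytes per second = cycles per second / cycles per byte; reported in MB/s."""
    return clock_hz / cycles_per_byte / 1_000_000


for cpb in (2, 8, 16, 32):
    print(f"{cpb} cycles/byte -> {throughput_mb_per_s(cpb)} MB/s")
# 2 cycles/byte -> 1500.0 MB/s
# 8 cycles/byte -> 375.0 MB/s
# 16 cycles/byte -> 187.5 MB/s
# 32 cycles/byte -> 93.75 MB/s
```

On a single core under that assumption, 2-8 cycles per byte corresponds to roughly 375-1500 MB/s of JSON.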

edited Sep 4, 2018 at 6:46

answered Sep 4, 2018 at 6:40

Andriy Plokhotnyuk



BlakeStone · Accepted Answer · 2018-12-20 00:51:20Z

1

I'd have to argue that binary protocols will almost always win in performance vs. text-based protocols. You won't find many (or any) video streaming applications using JSON to represent the frame data. However, any poorly designed data structure will struggle when being parsed. I've worked on many communications projects where the text-based protocols were replaced with "binary protocols".

answered Dec 20, 2018 at 0:51

BlakeStone


1 Comment

gravetii Over a year ago

Relevant, but doesn't answer the question.
