Protocol Buffers: A Modern Data Serialization Method

Published in codeburst

5 min read · Jul 14, 2020

In this blog post, I’m going to explore a modern method for data serialization: Protocol Buffers (usually referred to as Protobuf). Protocol Buffers is a binary communication format designed by Google that allows us to serialize and deserialize structured data.

But wait, the above tasks can also be done by other formats such as JSON or XML, so why did Google design a new communication format? Big tech companies care a great deal about high performance and speed. With the tremendous popularity of microservices architectures, it has become very difficult to manage communication between thousands of services using a text-based format: services send thousands of requests to each other, load the network, and require a lot of resources. This is why we need a fast way to serialize data into a compact form for transfer between services. In this scenario, Protocol Buffers can save us a lot of time, money, and resources.

It is important to note that, although JSON and Protobuf do the same job, these technologies were designed with different goals and approaches in mind.

Protocol buffers were designed to be faster than JSON and XML by dropping many of the responsibilities those formats take on and focusing solely on serializing and deserializing data as fast as possible. Another important optimization is reducing network bandwidth usage by making the transmitted data as small as possible.

How is Protobuf faster than other communication formats?

Data is transmitted in binary form, which is more compact than JSON’s text format and therefore takes less time to transmit.
Let’s take a look at the following example to get a clear understanding of this:

{
 "status":"success",
 "message":"found"
}

In the above JSON object, there are a total of 38 characters once whitespace is removed, including characters like { } , " : which don’t carry any information. We have 2 curly brackets, 8 quotation marks, 2 colons, and 1 comma, which add up to 13 characters. The keys of the JSON object occupy 6 + 7 = 13 characters, whereas the values, which are the actual information, occupy 7 + 5 = 12 characters.

After summing up the result we get the following information:

JSON object length: 38 bytes
Informational length: 12 bytes
Non-Informational length: 26 bytes [“WASTAGE”]
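
The tally above can be checked with a few lines of Go. This is only a minimal sketch: the response struct and the jsonSizes helper are names made up for this illustration, and "informational" is taken to mean the bytes of the two values, as in the count above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response is a hypothetical struct mirroring the JSON object above.
type response struct {
	Status  string `json:"status"`
	Message string `json:"message"`
}

// jsonSizes returns the size of the compact JSON encoding of r,
// the bytes taken by the two values alone, and the remaining overhead.
func jsonSizes(r response) (int, int, int, error) {
	b, err := json.Marshal(r) // compact encoding, no whitespace
	if err != nil {
		return 0, 0, 0, err
	}
	total := len(b)
	informational := len(r.Status) + len(r.Message)
	return total, informational, total - informational, nil
}

func main() {
	total, info, overhead, err := jsonSizes(response{Status: "success", Message: "found"})
	if err != nil {
		panic(err)
	}
	fmt.Println(total, info, overhead) // 38 12 26
}
```

The overhead here is everything that is not a value: the braces, quotes, colons, comma, and the key names themselves.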

PROTOBUF to the rescue!
According to Google:

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data.

Protocol buffers help us to define the structure of the models once and then use generated source code to easily write and read structured data to/from a variety of data streams using a variety of languages.

How can we define the structure of models?
The structure of the models is defined in a .proto file and compiled with the protoc command, which generates source code that can be used by both the sender and the recipient of these models. We’ll discuss protoc later, but first, let us dive into this through an exercise.

Let’s say we want to create a service that takes user_id as a request parameter and returns the user details.

user.proto

syntax = "proto3"; tells the compiler which version of the protocol buffer language the file uses.
In a proto file, we can also define the RPC service interface, and protoc (the compiler) will generate service interface code and stubs in your chosen language.
In the proto file, we can see that the structure of the model is declared with the message keyword followed by a user-defined message name. As an analogy, a service is equivalent to a class, an rpc is equivalent to a function, and a message is equivalent to the parameters/arguments in programming languages.
In the message body, we can see that the fields are defined with their respective types, and each field is assigned a unique integer number.
These field numbers are used to identify your fields in the message binary format, and should not be changed once your message type is in use. Note that field numbers in the range 1 through 15 take one byte to encode, including the field number and the field’s type (you can find out more about this in Protocol Buffer Encoding). Field numbers in the range 16 through 2047 take two bytes. So you should reserve the numbers 1 through 15 for very frequently occurring message elements.
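
The one-byte and two-byte ranges follow from how a field's tag is encoded: the tag is the varint encoding of (field number << 3) | wire type. The sketch below checks the boundaries using Go's encoding/binary package, whose unsigned varint encoding is the same one Protocol Buffers uses. The tagSize helper is a name introduced only for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tagSize returns how many bytes a field tag occupies on the wire.
// The tag is the varint encoding of (fieldNumber << 3) | wireType.
func tagSize(fieldNumber, wireType uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, (fieldNumber<<3)|wireType)
}

func main() {
	const wireTypeLen = 2 // length-delimited: strings, bytes, nested messages
	for _, n := range []uint64{1, 15, 16, 2047, 2048} {
		fmt.Printf("field %d: %d byte(s)\n", n, tagSize(n, wireTypeLen))
	}
}
```

Each varint byte carries 7 bits of payload and 3 of those bits go to the wire type, so a single byte leaves 4 bits for the field number (up to 15) and two bytes leave 11 bits (up to 2047).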

Now, we’re going to generate interface code using protoc for our user.proto file in our desired language. Here I’m using Golang for generating source code.

protoc --proto_path=src --go_out=. user.proto

The first argument, --proto_path, is the directory in which protoc looks for .proto files (here, src). --go_out tells protoc to generate Golang code and gives the directory where the output files are saved (here, the current directory). The last param is the path of the .proto file.
We can also generate code for other languages like Java, Python, etc.; we just have to replace --go_out with --java_out or --python_out.
The above command will generate a user.pb.go file that implements all messages as Golang structs and types.

It would be very difficult to explain the implementation of the generated source code in the same blog post; you can take a reference from here.

Serialization and deserialization are handled by the proto package, which provides Marshal and Unmarshal functions:

Let’s say our user doesn’t exist. The serialized data you get will then look like this:
JSON O/P: {"status":"failure", "message": "not found"}

SERIALIZED O/P (line 24):

The serialized data contains only 20 bytes.
After summing up the result we get the following information:

Serialized data length: 20 bytes
Informational length: 16 bytes
Non-Informational length: 4 bytes [“WASTAGE”]
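
To see where the 20 bytes come from, the message can be encoded by hand using only the standard library. This is a sketch under assumptions: the schema is not shown here, so the code assumes a hypothetical message with two string fields, status = 1 and message = 2. The helper names appendUvarint, appendString, and encodeResponse are invented for the illustration. In real code you would call proto.Marshal on the generated struct instead.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendUvarint appends v to b as a base-128 varint.
func appendUvarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// appendString appends one length-delimited string field in protobuf wire
// format: a varint tag, a varint length, then the raw bytes of the string.
func appendString(b []byte, fieldNumber uint64, s string) []byte {
	const wireTypeLen = 2 // length-delimited
	b = appendUvarint(b, (fieldNumber<<3)|wireTypeLen)
	b = appendUvarint(b, uint64(len(s)))
	return append(b, s...)
}

// encodeResponse hand-encodes an assumed message of the form
//   message Response { string status = 1; string message = 2; }
// The message name and field numbers are assumptions for this sketch.
func encodeResponse(status, message string) []byte {
	var b []byte
	b = appendString(b, 1, status)
	b = appendString(b, 2, message)
	return b
}

func main() {
	b := encodeResponse("failure", "not found")
	fmt.Printf("% x\n", b) // hex dump of the encoded message
	fmt.Println(len(b))    // 20
}
```

Each field costs one tag byte and one length byte, so the two fields add 4 bytes of overhead to the 7 + 9 = 16 bytes of actual data. Note that the key names "status" and "message" never appear on the wire; only the field numbers do.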

Conclusion

In large microservices projects, JSON is not the best method for data serialization. Instead, Protocol Buffers are a great option. I hope that you find this article useful and informative. Stay tuned for next time.

Written by Abhi Khandelwal
