Protocol Buffers Series (1) - What are Protocol Buffers?

What are Protocol Buffers?

Protocol Buffers is a language-neutral, platform-neutral, extensible mechanism for serializing structured data. Compared with other serialization formats such as XML, JSON, YAML, and CSV, Protocol Buffers is simpler, faster, and more lightweight.
We define the structure of our data (a message) once, and then use specially generated code (produced via the command line or a Maven plugin) to write and read that structured data to and from a variety of data streams, in a variety of languages.

What problems do Protocol Buffers solve?

Protocol Buffers provide a serialization format for packets of structured data up to a few megabytes in size.
By serializing structured data, they support both data storage and RPC data exchange.
Protocol Buffers definitions can be extended with new information without invalidating existing data or requiring code updates.

Serialization: the process of converting a data structure or object into a sequence of bytes.
Deserialization: the process of converting the bytes produced by serialization back into a data structure or object.
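
To make the two definitions concrete, here is a minimal sketch of a round trip in plain Java. It deliberately does not use the Protocol Buffers encoding; the byte layout, the class name RoundTripSketch, and its methods are assumptions made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: an ad hoc layout, NOT the Protocol Buffers wire format.
class RoundTripSketch {

    // Serialize: [4-byte id][4-byte name length][UTF-8 name bytes].
    static byte[] serialize(int id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + utf8.length)
                .putInt(id)
                .putInt(utf8.length)
                .put(utf8)
                .array();
    }

    // Deserialize the id from the first four bytes.
    static int readId(byte[] data) {
        return ByteBuffer.wrap(data).getInt();
    }

    // Deserialize the name: skip the id, read the length, then decode the bytes.
    static String readName(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        buf.getInt();                 // skip the id
        int length = buf.getInt();
        return new String(data, 8, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(1234, "John Doe");
        System.out.println(readId(bytes));    // 1234
        System.out.println(readName(bytes));  // John Doe
    }
}
```

Protocol Buffers play the same role, but with a compact, versionable encoding and generated code in place of hand-written methods like these.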

Protocol Buffers are the most commonly used data format at Google. Messages and services are defined in files with a .proto suffix.
An example message is shown below:

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;
}

The proto compiler (protoc) is invoked on .proto files at build time to generate code, in various programming languages, for working with the corresponding Protocol Buffers messages.
Each generated class contains simple accessors for each field, plus methods to serialize the whole structure to raw bytes and parse it back. An example using these generated methods is shown below:

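// Requires: import java.io.FileOutputStream;
// Build an immutable Person message with the generated builder.
Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build();
// Write the serialized bytes to the file named by the first argument;
// try-with-resources closes the stream even if writing fails.
try (FileOutputStream output = new FileOutputStream(args[0])) {
    john.writeTo(output);
}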

Advantages of Protocol Buffers

Protocol Buffers are great for any situation where you need to serialize structured data in a language-neutral, platform-neutral, and extensible way. They are often used to define communication protocols (along with gRPC) and data storage.

Compatibility across languages

Google currently provides official support for C++, C#, Java, Kotlin, PHP, Python, Ruby, and other languages. Support for further languages is available through third-party plugins hosted on GitHub.

Cross-project support

By defining message types in .proto files, we can use Protocol Buffers across projects. The files can also live outside a particular project's codebase, for example in a directory alongside the Java source folder.
If we define message types or enums that we expect to be widely used outside our immediate team, we can put them in their own file with no dependencies.

Support for updating Proto definitions without updating the code
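
As a rough illustration of why this can work: every field on the wire carries its field number and wire type, so a reader built from an older definition can skip fields it does not recognize. The following is a hand-rolled sketch, not the Protocol Buffers library. The class name UnknownFieldSketch, its helper methods, and the extra field 4 are hypothetical and exist only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled sketch of the wire format; real code would use protoc-generated classes.
class UnknownFieldSketch {

    // Append a base-128 varint: 7 payload bits per byte, high bit set when more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Append a length-delimited (wire type 2) string field.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, ((long) fieldNumber << 3) | 2);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Read one varint starting at pos[0] and advance pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // An "old" reader that only knows field 1 (name) and skips every other field.
    static String readName(byte[] data) {
        String name = null;
        int[] pos = {0};
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            switch (wireType) {
                case 0:                       // varint: read and discard
                    readVarint(data, pos);
                    break;
                case 1:                       // fixed 64-bit
                    pos[0] += 8;
                    break;
                case 5:                       // fixed 32-bit
                    pos[0] += 4;
                    break;
                case 2: {                     // length-delimited
                    int length = (int) readVarint(data, pos);
                    if (fieldNumber == 1) {
                        name = new String(data, pos[0], length, StandardCharsets.UTF_8);
                    }
                    pos[0] += length;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return name;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, "John Doe");   // name, known to the old reader
        writeVarint(out, (2 << 3) | 0);    // tag for field 2 (id), varint
        writeVarint(out, 1234);
        writeString(out, 4, "555-0100");   // hypothetical field added by a newer writer
        System.out.println(readName(out.toByteArray())); // John Doe
    }
}
```

The old reader never learns what field 4 means, yet it still parses the message, which is the property that lets definitions evolve without forcing every consumer to update at once.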

Scenarios where Protocol Buffers are not applicable

Protocol Buffers tend to assume that an entire message can be loaded into memory at once and is no larger than an object graph. Once our data reaches several megabytes, other solutions should be considered: when processing larger data, serialization creates additional copies, so we can end up with multiple copies of the data and a surge in memory usage. For this reason it is recommended that a message not exceed about 1 MB.
Object graph: this can be simply understood as the network of reference relationships between objects. In Java, the garbage collector essentially uses the object graph to determine which instances in memory are still reachable and may be needed by the program, and which are no longer accessible and can therefore be deleted.
When Protocol Buffers are serialized, the same data can have many different binary serializations. Two messages cannot be compared for equality without fully parsing them.
Messages are not compressed.
For many scientific and engineering applications involving large multidimensional arrays of floating point numbers, Protocol Buffers messages are not maximally efficient in size and speed. For these applications, FITS and similar formats have less overhead.
Protocol Buffers messages are not inherently self-describing, although they do have a fully reflective schema that can be used to implement self-description. That is, a message cannot be fully interpreted without access to its corresponding .proto file.
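
The point above about multiple binary serializations can be sketched with field ordering, since parsers accept fields in any order. The toy decoder below is an assumption-laden illustration rather than the real library: the class name EqualitySketch is hypothetical, and it only handles single-byte tags, single-byte lengths, and the two wire types used here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Toy decoder for illustration; real code would use protoc-generated classes.
class EqualitySketch {

    // Decode a message into field number -> value. Handles only varint (0)
    // and length-delimited (2) wire types, which is all this example needs.
    static Map<Integer, Object> parse(byte[] data) {
        Map<Integer, Object> fields = new TreeMap<>();
        int pos = 0;
        while (pos < data.length) {
            int tag = data[pos++];            // single-byte tag: field numbers 1..15
            int fieldNumber = tag >>> 3;
            int wireType = tag & 7;
            if (wireType == 0) {
                long value = 0;
                int shift = 0;
                byte b;
                do {
                    b = data[pos++];
                    value |= (long) (b & 0x7F) << shift;
                    shift += 7;
                } while ((b & 0x80) != 0);
                fields.put(fieldNumber, value);
            } else if (wireType == 2) {
                int length = data[pos++];     // single-byte length: payloads under 128 bytes
                fields.put(fieldNumber, new String(data, pos, length, StandardCharsets.UTF_8));
                pos += length;
            } else {
                throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // The same logical message (name = "Al", id = 7) with fields in different order.
        byte[] nameFirst = {0x0A, 0x02, 'A', 'l', 0x10, 0x07};
        byte[] idFirst   = {0x10, 0x07, 0x0A, 0x02, 'A', 'l'};
        System.out.println(Arrays.equals(nameFirst, idFirst));        // false
        System.out.println(parse(nameFirst).equals(parse(idFirst)));  // true
    }
}
```

A byte-for-byte comparison reports a difference even though both arrays decode to the same fields, which is why equality checks need parsed messages.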

Summarized in the picture:

How do Protocol Buffers work?

The following diagram shows how data is processed using protocol buffers.

[Figure: how Protocol Buffers process data]

Serialization is fast

Encoding and decoding are simple and are done with bit operations.
Both are carried out by Protocol Buffers' own runtime library together with the code generated by the compiler.

Serialization is small

Compact encodings are used, such as Varint and ZigZag.
Data is stored as T-L-V (Tag-Length-Value) records, which removes the need for delimiters and keeps the data compact.
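
The encodings named above can be sketched in a few lines of Java. This is a simplified illustration written for this article, not the Protocol Buffers library; the class name WireFormatSketch and its method names are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Simplified sketch of Varint, ZigZag, and a Tag-Length-Value record.
class WireFormatSketch {

    // ZigZag maps signed ints to unsigned ones so that small negative numbers
    // stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3.
    static int zigZag32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Varint: 7 payload bits per byte, least significant group first,
    // with the high bit set on every byte except the last.
    static byte[] varint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Tag-Length-Value record for a string field:
    // tag = (fieldNumber << 3) | wireType, where wire type 2 means length-delimited.
    static byte[] stringField(int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] tag = varint(((long) fieldNumber << 3) | 2);
        byte[] length = varint(utf8.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(tag, 0, tag.length);
        out.write(length, 0, length.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Render bytes as space-separated lowercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(varint(1)));               // 01
        System.out.println(hex(varint(300)));             // ac 02
        System.out.println(zigZag32(-1));                 // 1
        System.out.println(hex(stringField(1, "John")));  // 0a 04 4a 6f 68 6e
    }
}
```

Note how the value 300 takes two bytes instead of a fixed four, and how the string field needs no delimiter because the length precedes the payload.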

Reference:
Overview of Protocol Buffers
