Protocol Buffers Series (3) - proto2 .proto Syntax Guide

This article describes how to use the Protocol Buffers language to structure protocol buffer data, including the .proto file syntax and how to generate data access classes from .proto files. It covers the proto2 version of the Protocol Buffers language.

This article is only a reference guide; a Java tutorial will follow.
How to use this guide: when a question comes up at work, search for the relevant keyword to find the knowledge point you need.

Define a Message type

Let's start with a simple example. Suppose we want to define a search request message format.
Each search request has three parameters:

a query string
the page number of results we are interested in
the number of results per page

The following .proto file defines this message format:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data to include in this type of message. Each field has a name and a type.

Specifying field types

In the example above, all fields are scalar types: two integers (page_number and result_per_page) and a string (query).
Fields can also use composite types, including enumerations and other message types.

Assign field numbers

In the example above, each field has a unique number. The number identifies the field in the binary message format, much like an alias for the field, and should not be changed once the message type is in use.
Field numbers in the range 1 to 15 take one byte to encode, including the field number and the field's type.
Field numbers in the range 16 to 2047 take two bytes.
Therefore, to keep messages small, reserve the numbers 1 to 15 for fields that occur very frequently.
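
The byte counts above follow from how a field's tag is written: the field number is shifted left by three bits to make room for the wire type, and the result is stored as a base-128 varint. The following is a minimal Java sketch of that arithmetic, not the library's implementation; the class and method names are made up for illustration.

```java
// Illustrative sketch only: TagSizeSketch is a hypothetical name, not a protobuf class.
final class TagSizeSketch {
    /** Number of bytes needed to store an unsigned 32-bit value as a base-128 varint. */
    static int varintSize(int value) {
        int size = 1;
        // Each varint byte carries 7 bits of payload; keep going while higher bits remain.
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** The tag is (fieldNumber << 3) | wireType; the wire type only uses the low 3 bits. */
    static int tagSize(int fieldNumber) {
        return varintSize(fieldNumber << 3);
    }

    public static void main(String[] args) {
        for (int number : new int[] {1, 15, 16, 2047, 2048}) {
            System.out.println("field " + number + " -> " + tagSize(number) + " byte(s)");
        }
    }
}
```

With this sketch, field numbers 1 and 15 need one byte, 16 and 2047 need two, and 2048 is the first number that needs three.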

The smallest number is 1, and the largest number is 2^29 - 1 , which is 536,870,911.
Note that the numbers 19000 through 19999 cannot be used, because they are reserved for the Protocol Buffers implementation; if a .proto uses one of these reserved numbers, the protocol buffer compiler will complain.
Also note that a field number cannot be used more than once within the same message.
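
The numbering rules can be summarized as a small check. The sketch below is a hypothetical helper written for this article, not something protoc exposes; it only restates the range, the implementation-reserved block, and the uniqueness rule.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of protoc: restates the field numbering rules.
final class FieldNumberCheck {
    static final int MIN_NUMBER = 1;
    static final int MAX_NUMBER = (1 << 29) - 1; // 536,870,911
    static final int FIRST_RESERVED = 19000;     // reserved for the implementation
    static final int LAST_RESERVED = 19999;

    /** True if a single field number may be used in a message definition. */
    static boolean isUsable(int number) {
        boolean inRange = number >= MIN_NUMBER && number <= MAX_NUMBER;
        boolean implementationReserved = number >= FIRST_RESERVED && number <= LAST_RESERVED;
        return inRange && !implementationReserved;
    }

    /** True if every number is usable and no number appears twice in the message. */
    static boolean isValidMessage(int... numbers) {
        Set<Integer> seen = new HashSet<>();
        for (int number : numbers) {
            if (!isUsable(number) || !seen.add(number)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidMessage(1, 2, 3));   // all distinct and in range
        System.out.println(isValidMessage(1, 19500));  // 19500 is implementation-reserved
        System.out.println(isValidMessage(1, 2, 2));   // 2 is used twice
    }
}
```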

Specify Field Rules

required: a well-formed message must have exactly one of this field.
optional: a well-formed message can have zero or one of this field, but not more than one.
repeated: the field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values is preserved.

For historical reasons, repeated fields of scalar numeric types (e.g. int32, int64, enum) are not encoded as efficiently as they could be. New code should use the special option [packed = true] to get a more efficient encoding. For example:

repeated int32 samples = 4 [packed = true];
repeated ProtoEnum results = 5 [packed = true];

Packed encoding stores the values of a repeated field more compactly; it will be explained in subsequent articles.
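
To get a feel for the difference before the detailed explanation, here is a rough Java sketch that only counts bytes. It assumes the usual wire layout: an unpacked repeated field writes one tag per element, while a packed field writes one tag, one length, and then the values back to back. The class is hypothetical and handles int32 values only.

```java
// Illustrative size calculation only; PackedSizeSketch is a hypothetical name.
final class PackedSizeSketch {
    /** Bytes needed for a base-128 varint; the value is treated as unsigned 64-bit. */
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** Unpacked: every element is written as its own tag (wire type 0) plus a varint. */
    static int unpackedSize(int fieldNumber, int[] values) {
        int tagSize = varintSize((long) fieldNumber << 3);
        int total = 0;
        for (int v : values) {
            total += tagSize + varintSize(v); // int32 is sign-extended to 64 bits
        }
        return total;
    }

    /** Packed: one tag (wire type 2), one length varint, then all values back to back. */
    static int packedSize(int fieldNumber, int[] values) {
        if (values.length == 0) {
            return 0; // an empty packed field is not written at all
        }
        int payload = 0;
        for (int v : values) {
            payload += varintSize(v);
        }
        return varintSize(((long) fieldNumber << 3) | 2) + varintSize(payload) + payload;
    }

    public static void main(String[] args) {
        int[] samples = {1, 150, 300};
        System.out.println("unpacked: " + unpackedSize(4, samples) + " bytes");
        System.out.println("packed:   " + packedSize(4, samples) + " bytes");
    }
}
```

The saving grows with the number of elements, because the per-element tag disappears.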

Required is forever, so be very careful when marking a field as required. If a required field later has to be dropped or made optional, old readers will treat messages without it as incomplete and may reject them.

Add more message types

We can set multiple message types in a .proto file.

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

message SearchResponse {
 ...
}

While a single .proto file can hold multiple definitions (messages, enums, and services), defining a large number of messages with different dependencies in one file can lead to dependency bloat. The official recommendation is to include as few message types per .proto file as possible; no specific number is given, so this needs to be judged case by case.

Reserved fields

If we update a message type by deleting a field or commenting it out, future users can reuse that field's number for a new field. This becomes a problem once an old version of the same .proto is loaded later: the numbers conflict and data gets corrupted. To prevent this, declare the deleted field numbers (and names) as reserved; the protocol buffer compiler will complain if anyone tries to use them.

message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

This reserves the field numbers 2, 15, 9, 10, and 11 (ranges are inclusive). A range such as 9 to max can also be used to reserve all numbers from 9 upward.
Note that you cannot mix field names and field numbers in the same reserved statement.

What is automatically generated based on the .proto file?

For Java, the compiler generates a .java file with a class for each message type, as well as a special Builder class for creating instances of message classes.

Type comparison table

.proto type	Notes	Java type
double		double
float		float
int32	Use variable length encoding. Encoding negative numbers is inefficient - if your field may have negative values, use sint32 instead.	int
int64	Use variable length encoding. Encoding negative numbers is inefficient - if your field may have negative values, use sint64 instead.	long
uint32	Use variable length encoding.	int
uint64	Use variable length encoding.	long
sint32	Use variable length encoding. Signed int value. These encode negative numbers more efficiently than regular int32.	int
sint64	Use variable length encoding. Signed int value. These encode negative numbers more efficiently than regular int64.	long
fixed32	Always 4 bytes. More efficient than uint32 if the value is generally greater than 2^28.	int
fixed64	Always 8 bytes. More efficient than uint64 if the value is generally greater than 2^56.	long
sfixed32	Always 4 bytes.	int
sfixed64	Always 8 bytes.	long
bool		boolean
string	Strings must always contain UTF-8 encoded text.	String
bytes	Can contain arbitrary byte sequences.	ByteString
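
The advice to prefer sint32 for negative values comes from ZigZag encoding, which maps small negative numbers to small unsigned numbers before the varint is written. The Java sketch below illustrates the idea under that assumption; it is not the library's code, and the class name is made up.

```java
// Illustrative sketch of why sint32 is cheaper than int32 for negative values.
final class ZigZagSketch {
    /** ZigZag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... */
    static int encodeZigZag32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    /** Inverse of encodeZigZag32. */
    static int decodeZigZag32(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    /** Bytes needed for a base-128 varint; the value is treated as unsigned 64-bit. */
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** int32: a negative value is sign-extended to 64 bits before varint encoding. */
    static int int32Size(int value) {
        return varintSize((long) value);
    }

    /** sint32: the value is ZigZag-encoded first, then written as an unsigned varint. */
    static int sint32Size(int value) {
        return varintSize(encodeZigZag32(value) & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println("int32  -1 -> " + int32Size(-1) + " bytes");
        System.out.println("sint32 -1 -> " + sint32Size(-1) + " bytes");
    }
}
```

In this sketch -1 costs ten bytes as int32 but one byte as sint32. The trade-off is that positive values use one extra bit, so 64 already needs two bytes as sint32.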

Optional fields and default values

When a field is marked optional, a message may or may not contain it.
When a parsed message does not contain an optional field, reading the field returns its default value, which we can specify in the field declaration:

optional int32 result_per_page = 3 [default = 10];

If no default is specified for an optional field, a type-specific default is used instead.
For strings it is the empty string, for bools it is false, for numeric types it is 0, and for enums it is the first value listed in the enum definition.
So take care when adding a value at the beginning of an enum's value list.
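
From the Java side, an optional field with a default behaves roughly like the hand-written class below: a presence flag says whether the field was set, and the getter falls back to the default otherwise. This is only a stand-in to show the behaviour; the real generated class is much larger and is built through a Builder.

```java
// Hand-written stand-in for a generated message class, for illustration only.
final class SearchRequestSketch {
    private static final int RESULT_PER_PAGE_DEFAULT = 10; // mirrors [default = 10]

    private boolean hasResultPerPage; // presence flag: was the field set or parsed?
    private int resultPerPage;

    void setResultPerPage(int value) {
        hasResultPerPage = true;
        resultPerPage = value;
    }

    boolean hasResultPerPage() {
        return hasResultPerPage;
    }

    /** Returns the stored value, or the declared default when the field is absent. */
    int getResultPerPage() {
        return hasResultPerPage ? resultPerPage : RESULT_PER_PAGE_DEFAULT;
    }

    public static void main(String[] args) {
        SearchRequestSketch request = new SearchRequestSketch();
        System.out.println(request.hasResultPerPage() + " " + request.getResultPerPage());
        request.setResultPerPage(25);
        System.out.println(request.hasResultPerPage() + " " + request.getResultPerPage());
    }
}
```

Checking the presence flag is the only way to tell "not set" apart from "explicitly set to the default".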

Enumerations

We can define an enumeration inside a message and assign a number to each of its constants. For example, here we add a Corpus enumeration and a field of that type:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3 [default = 10];
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  optional Corpus corpus = 4 [default = UNIVERSAL];
}

By default, each number may appear only once in the same enum. To let multiple enum constants share one number, define aliases by setting the allow_alias option to true.
In the following example, EnumAllowingAlias is accepted, while uncommenting RUNNING in EnumNotAllowingAlias produces an error message.
Note: without allow_alias, constants with different names but the same value in one enum are reported as an error; the same value used in an enum that belongs to a different message is not.

enum EnumAllowingAlias {
  option allow_alias = true;
  UNKNOWN = 0;
  STARTED = 1;
  RUNNING = 1;
}
enum EnumNotAllowingAlias {
  UNKNOWN = 0;
  STARTED = 1;
  // RUNNING = 1;  // Uncommenting this line will cause a compile error inside Google and a warning message outside.
}

Use other message types

You can use other message types as field types.
For example, let's say you want to include a Result message in every SearchResponse message - you can do this by defining a Result message type in the same .proto and then specifying a field of type Result in the SearchResponse:

message SearchResponse {
  repeated Result result = 1;
}

message Result {
  required string url = 1;
  optional string title = 2;
  repeated string snippets = 3;
}

Importing definitions

When we need to use messages from another .proto file, we can use them by importing definitions from other .proto files. To import the definition of another .proto, add an import statement at the top of the file:

import "myproject/other_protos.proto";

By default, we can only use definitions from directly imported .proto files. However, sometimes we may need to move a .proto file to a new location.
Instead of moving the .proto file directly and updating all call sites in a single change, we can put a placeholder .proto file in the old location that forwards all imports to the new location using import public.

// new.proto
// All definitions are moved here

// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

Any code that imports a proto containing an import public statement can transitively rely on the import public dependencies.
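
The visibility rule can be modelled as a small graph walk: a file sees its direct imports, plus whatever those imports forward through import public. The Java sketch below models only this rule with file names as strings; it is a hypothetical helper, not how protoc is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of import visibility; file names stand in for .proto files.
final class ImportVisibilitySketch {
    private final Map<String, List<String>> plainImports = new HashMap<>();
    private final Map<String, List<String>> publicImports = new HashMap<>();

    void addImport(String file, String imported, boolean isPublic) {
        Map<String, List<String>> target = isPublic ? publicImports : plainImports;
        target.computeIfAbsent(file, k -> new ArrayList<>()).add(imported);
    }

    /** Files whose definitions are usable from the given file. */
    Set<String> visibleFrom(String file) {
        Set<String> visible = new LinkedHashSet<>();
        for (String direct : plainImports.getOrDefault(file, Collections.emptyList())) {
            addWithPublicImports(direct, visible);
        }
        for (String direct : publicImports.getOrDefault(file, Collections.emptyList())) {
            addWithPublicImports(direct, visible);
        }
        return visible;
    }

    /** Adds a file and, transitively, everything it forwards with "import public". */
    private void addWithPublicImports(String file, Set<String> visible) {
        if (!visible.add(file)) {
            return; // already visited; also guards against import cycles
        }
        for (String forwarded : publicImports.getOrDefault(file, Collections.emptyList())) {
            addWithPublicImports(forwarded, visible);
        }
    }

    /** Rebuilds the old.proto / new.proto / client.proto example from the text. */
    static Set<String> demo() {
        ImportVisibilitySketch graph = new ImportVisibilitySketch();
        graph.addImport("old.proto", "new.proto", true);
        graph.addImport("old.proto", "other.proto", false);
        graph.addImport("client.proto", "old.proto", false);
        return graph.visibleFrom("client.proto");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [old.proto, new.proto]
    }
}
```

other.proto stays invisible to client.proto because old.proto imports it without the public modifier.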

Nested types

We can define and use message types in other message types as shown in the following example - here the Result message is defined in the SearchResponse message:

message SearchResponse {
  message Result {
    required string url = 1;
    optional string title = 2;
    repeated string snippets = 3;
  }
  repeated Result result = 1;
}

To reuse the Result message type outside its parent message type, refer to it through the parent, as SearchResponse.Result:

message SomeOtherMessage {
  optional SearchResponse.Result result = 1;
}

You can nest messages however you want. In the example below, notice that the two nested types named Inner are completely independent because they are defined in different messages:

message Outer {       // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      optional int64 ival = 1;
      optional bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      optional string name = 1;
      optional bool   flag = 2;
    }
  }
}

Updating a message type

If an existing message type no longer meets all our needs (for example, we want the message format to have an extra field) but we still want to use code created with the old format, don't worry! Updating message types without breaking any existing code is simple. Just remember the following rules:

Do not change the field numbers of any existing fields.
Any new fields you add should be optional or repeated. This means that any messages serialized by code using the "old" message format can be parsed by newly generated code, since they are not missing any required elements. You should set sensible defaults for these elements so that new code can properly interact with messages generated by old code.
- Similarly, messages created by new code can be parsed by old code: old binaries ignore the new fields when parsing.
- However, unknown fields are not discarded, and if the message is serialized later, the unknown fields are serialized with it - so if the message is passed to new code, the new fields are still available.
Non-required fields can be removed, as long as their field numbers are not used again in the updated message type. You may want to rename the field instead, perhaps adding the prefix "OBSOLETE_", or make the field number reserved, so that future users of your .proto don't accidentally reuse the number.
As long as the type and number remain the same, non-required fields can be converted to extensions (extensions will be described later) and vice versa.
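int32, uint32, int64, uint64, and bool are all compatible - this means you can change a field from one of these types to another without breaking forward or backward compatibility. If a parsed number does not fit in the target type, it is truncated as if it had been cast to that type in C++ (for example, a 64-bit number read as an int32 is truncated to 32 bits).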
sint32 and sint64 are compatible with each other, but not with other integer types.
string and bytes are compatible as long as the bytes are valid UTF-8.
fixed32 is compatible with sfixed32 and fixed64 is compatible with sfixed64.
For string, bytes, and message fields, optional is compatible with repeated. Given serialized data of a repeated field as input, clients that expect this field to be optional will take the last input value if it is a primitive type field, or merge all input elements if it is a message type field.
- Note that this is generally not safe for numeric types (including bools and enums). Repeated fields of numeric types can be serialized in the packed format (more on that later), which will not be parsed correctly when an optional field is expected.
Changing the default is usually fine, but keep in mind that the default is never sent over the network. Therefore, if a program receives a message that a particular field is not set, the program will see the default value defined in that program's protocol version. It won't see the default value defined in the sender code.
Avoid changing the numeric values of existing enum constants: serialized data stores the number, so old and new code would read it as different constants.
Changing a field between a map<K, V> and the corresponding repeated message field is binary compatible (see Maps below for the message layout and other restrictions). However, the safety of the change depends on the application: when deserializing and reserializing a message, a client using the repeated field definition will produce a semantically identical result; a client using the map field definition, however, may reorder entries and drop entries with duplicate keys, because map keys cannot be duplicated.
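
The rule that old binaries keep unknown fields instead of discarding them can be pictured with a toy model. In the Java sketch below a message on the wire is simplified to a list of (field number, value) pairs, and the "old" reader only knows field 1. All names are hypothetical, and real parsing works on bytes rather than on objects like these.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of unknown-field preservation; not the protobuf runtime.
final class UnknownFieldSketch {
    /** One field on the wire, simplified to a number and a string payload. */
    static final class Field {
        final int number;
        final String value;

        Field(int number, String value) {
            this.number = number;
            this.value = value;
        }
    }

    /** "Old" code that only knows field 1 (query). */
    static final class OldSearchRequest {
        String query;
        final List<Field> unknownFields = new ArrayList<>();

        static OldSearchRequest parse(List<Field> wire) {
            OldSearchRequest message = new OldSearchRequest();
            for (Field field : wire) {
                if (field.number == 1) {
                    message.query = field.value;
                } else {
                    message.unknownFields.add(field); // kept, not dropped
                }
            }
            return message;
        }

        /** Writes the known field first, then the unknown fields untouched. */
        List<Field> serialize() {
            List<Field> out = new ArrayList<>();
            if (query != null) {
                out.add(new Field(1, query));
            }
            out.addAll(unknownFields);
            return out;
        }
    }

    /** Field numbers that survive a round trip through the old code. */
    static List<Integer> demo() {
        List<Field> fromNewCode = Arrays.asList(new Field(1, "protobuf"), new Field(4, "NEWS"));
        List<Integer> numbers = new ArrayList<>();
        for (Field field : OldSearchRequest.parse(fromNewCode).serialize()) {
            numbers.add(field.number);
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 4]
    }
}
```

Field 4 is unknown to the old reader, yet it is still present after reserialization, so newer code further down the line can read it.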

Extensions

Extensions allow us to declare a range of field numbers in the message for use by third-party extensions.
Extensions are placeholders for fields of undefined types in the original .proto file. This allows other .proto files to define the types of fields by using these field numbers, which are eventually added to our message definition. Let's see an example:

message Foo {
  // ...
  extensions 100 to 199;
}

This means that the field numbers in the range [100, 199] of Foo are reserved for extensions. Other users can now add new fields to Foo in their own .proto files that import our .proto, using field numbers within the specified range - for example:

extend Foo {
  optional int32 bar = 126;
}

This adds a field named bar with field number 126 to the original definition of Foo.
For specific operations, please refer to the follow-up Java Development Guide.
Note that extensions can be of any field type, including message types, but cannot be oneofs or maps.

Nested extensions

Extensions can also be declared in the scope of another message type:

message Baz {
  extend Foo {
    optional int32 bar = 126;
  }
  ...
}

The only difference is that bar is defined in the scope of Baz.
This is a common source of confusion: declaring an extend block nested inside a message type does not imply any relationship between the outer type and the extended type. In particular, the above example does not mean that Baz is any sort of subclass of Foo. All it means is that the symbol bar is declared inside the scope of Baz; it is simply a static member.

A common pattern is to define extensions within the scope of the extension's field type - for example, this is an extension to Foo of type Baz, where the extension is defined as part of Baz:

message Baz {
  extend Foo {
    optional Baz foo_ext = 127;
  }
  ...
}

However, there is no requirement that an extension with the message type must be defined within the type. You can also do this:

message Baz {
  ...
}

// This can even be in a different file.
extend Foo {
  optional Baz foo_baz_ext = 127;
}

In fact, to avoid confusion, it is better to use this syntax. As mentioned above, the nested syntax is often mistaken for subclassing by users unfamiliar with extensions.

Oneof

When a message contains multiple optional fields and at most one of them is set at any time, we can use oneof.

Oneof fields are like optional fields, except that all fields in a oneof share memory and at most one of them can be set at the same time. Setting any member of the oneof automatically clears all other members.
Depending on the language we choose, the value set in oneof (if any) can be checked using the special case() or WhichOneof() methods.
To define oneof in .proto, use the oneof keyword followed by the oneof name, in this case test_oneof:

message SampleMessage {
  oneof test_oneof {
     string name = 4;
     SubMessage sub_message = 9;
  }
}

Note that fields in a oneof cannot be marked required, optional, or repeated. If you need a repeated field in a oneof, wrap it in a message that contains the repeated field. In the generated code, oneof fields have the same getters and setters as regular optional fields.
We also get a special method for checking which value (if any) in the oneof is set.

Oneof features

Setting the oneof field will automatically clear all other members of oneof. So if you set more than one oneof field, only the last field you set will still have a value.
```
SampleMessage message;
message.set_name("name");
CHECK(message.has_name());
message.mutable_sub_message();   // Will clear name field.
CHECK(!message.has_name());
```
If the parser encounters multiple members of the same oneof on the wire, only the last member seen is used in the parsed message.
Extensions are not supported for oneof.
A oneof cannot be repeated.
The reflection APIs work for oneof fields.
If a oneof field is set to its default value (for example, an int32 oneof field set to 0), the case of that oneof is still set and the value is serialized on the wire.
Be careful when adding or removing oneof fields. If checking the oneof returns None or NOT_SET, it could mean that the oneof has not been set, or that it has been set to a field from a different version of the oneof. There is no way to tell the two apart.
oneof is convenient when exactly one of several alternatives carries a value. Think the scenarios through up front, avoid modifying the oneof afterwards, and keep versions under control.
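
For Java readers, the clearing behaviour of a oneof can be pictured with the hand-written class below. It is only a stand-in for SampleMessage: the shared storage slot and the case enum imitate what generated code offers, but the real class is produced by protoc and built through a Builder, and the exact names here should be read as assumptions.

```java
// Hand-written stand-in for SampleMessage; names imitate generated code but are assumptions.
final class SampleMessageSketch {
    enum TestOneofCase { NAME, SUB_MESSAGE, TESTONEOF_NOT_SET }

    /** Placeholder for the SubMessage type used in the .proto example. */
    static final class SubMessage {}

    private TestOneofCase testOneofCase = TestOneofCase.TESTONEOF_NOT_SET;
    private Object testOneof; // single slot shared by all members of the oneof

    void setName(String name) {
        testOneofCase = TestOneofCase.NAME;
        testOneof = name; // overwrites whatever member was stored before
    }

    void setSubMessage(SubMessage subMessage) {
        testOneofCase = TestOneofCase.SUB_MESSAGE;
        testOneof = subMessage;
    }

    boolean hasName() {
        return testOneofCase == TestOneofCase.NAME;
    }

    boolean hasSubMessage() {
        return testOneofCase == TestOneofCase.SUB_MESSAGE;
    }

    /** Returns the name, or the empty-string default if another member is set. */
    String getName() {
        return hasName() ? (String) testOneof : "";
    }

    TestOneofCase getTestOneofCase() {
        return testOneofCase;
    }

    public static void main(String[] args) {
        SampleMessageSketch message = new SampleMessageSketch();
        message.setName("name");
        System.out.println(message.getTestOneofCase());
        message.setSubMessage(new SubMessage()); // clears name
        System.out.println(message.getTestOneofCase() + " hasName=" + message.hasName());
    }
}
```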

Maps

Protocol Buffers provides a shortcut syntax for defining key-value mappings:

map<key_type, value_type> map_field = N;

where key_type can be any integral or string type (that is, any scalar type except floating point types and bytes). Note that enums are not valid key_types. value_type can be any type except another map.
So, for example, to create a map of projects where each Project message is associated with a string key, you could define it like this:

map<string, Project> projects = 3;

Map features

Extensions are not supported for maps.
Maps cannot be repeated, optional, or required.
The ordering of map entries on the wire and during iteration is undefined, so you cannot rely on map items being in a particular order.
When text format is generated for a .proto, maps are sorted by key.
When parsing or merging, if there are duplicate map keys, the last key seen is used.
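
Since a map field is binary compatible with a repeated field of key/value entry messages, the duplicate-key rule can be sketched as folding a list of entries into a map, with later entries overwriting earlier ones. The Java code below is a simplified model with string keys and values; the Entry class is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model: a map field seen as a list of key/value entries on the wire.
final class MapEntrySketch {
    /** Hypothetical stand-in for one entry of the map field. */
    static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Folds entries into a map; for duplicate keys the last entry seen wins. */
    static Map<String, String> toMap(List<Entry> entries) {
        Map<String, String> map = new LinkedHashMap<>();
        for (Entry entry : entries) {
            map.put(entry.key, entry.value); // a later entry overwrites an earlier one
        }
        return map;
    }

    public static void main(String[] args) {
        List<Entry> wire = Arrays.asList(
                new Entry("a", "first"),
                new Entry("b", "second"),
                new Entry("a", "third"));
        System.out.println(toMap(wire)); // prints {a=third, b=second}
    }
}
```

The LinkedHashMap only keeps the printed output stable for the example; as noted above, real map fields make no ordering promise.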

That covers most of the commonly used features. If you run into a usage that is not mentioned here, leave a comment and I will update the article~ Thank you for your support~

MurasakiSeiFu
