With a lightweight serialization/deserialization tool like Protocol Buffers available, why is JSON still so widely used?

What is protobuf?

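protobuf, short for Google Protocol Buffers, is a language-neutral, platform-neutral tool for serializing structured data. Java developers commonly reach for Java's built-in serialization, but that format only works between Java programs, whereas protobuf works across languages. JSON or XML are also options, but both produce comparatively large payloads; protobuf is smaller, faster, and simpler.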
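To make "smaller" concrete, the sketch below hand-encodes one integer field the way the protobuf wire format does and compares it with the equivalent JSON text. It is an illustration only, not the protobuf-java API: the class and method names are hypothetical, and the field is assumed to be field number 1, named id, holding 150.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: hand-encodes one varint field to compare sizes with JSON.
// This illustrates the wire format; it is not the protobuf-java API.
class SizeComparison {

    // Writes a value as a base-128 varint: 7 bits per byte,
    // high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes field `fieldNumber` as a varint field (wire type 0) holding `value`.
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | 0); // tag = field number << 3 | wire type
        writeVarint(out, value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] proto = encodeVarintField(1, 150);
        byte[] json = "{\"id\":150}".getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : proto) hex.append(String.format("%02x ", b & 0xFF));
        System.out.println("protobuf: " + proto.length + " bytes -> " + hex.toString().trim());
        System.out.println("JSON:     " + json.length + " bytes");
    }
}
```

Here the protobuf encoding is 08 96 01 (3 bytes) against 10 bytes of JSON, because the field name is replaced by its number and the value is stored in binary. The gap depends on the data, so treat this as one data point rather than a benchmark.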

How does it work?

protobuf defines its own language: you write a definition file ending in .proto according to its rules.

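Such a file follows the proto syntax and declares types and fields; what each part means is introduced and tried out one by one below. Once the file is written, you compile it with the compiler protobuf provides. Depending on the target language you choose, it generates different files: choosing Java, for example, produces Java source files that your code can then use.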

Installing the protobuf compiler

GitHub: https://github.com/protocolbuffers/protobuf#protocol-compiler-installation. I chose the macOS package on the releases page (it includes all supported languages): https://github.com/protocolbuffers/protobuf/releases


After the download finishes, unzip the package and look at its structure.

The protoc executable in the bin directory is the compiler.

Configure the environment variable:

Edit ~/.bash_profile and add the bin directory of the unzipped package to PATH.

After editing, run source ~/.bash_profile.

Verify

Run protoc --version; if it prints the installed version, everything is working.

Generating Java files

Change into the directory that holds the proto file defined earlier.

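Run protoc -I=. --java_out=. person.proto. Here -I sets the import search path, and the . after it means the current directory; --java_out selects Java output, with . again meaning the current directory; person.proto is the protobuf file to compile. When it finishes, a file named PersonOuterClass.java appears.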

This file contains the generated Java classes for the messages defined in person.proto.

Message structure

protobuf calls the data structures you define messages. The .proto syntax is strict, and it comes in two versions, proto2 and proto3; for now I am only learning proto3.

Defining a message type

Let's start with an official example:

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

Specifying field types

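The string and int32 in front of the three field names are the field types. protobuf defines a set of scalar types, listed in the type mapping table below.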

Assigning field numbers

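The 1, 2 and 3 after the fields are field numbers, which must be unique within a message. Field numbers 1 through 15 take one byte to encode and 16 through 2047 take two bytes. The smallest field number is 1 and the largest is 2^29 - 1 (536,870,911).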
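The one-byte/two-byte rule comes from how a field is tagged on the wire: each field starts with a varint holding (field_number << 3) | wire_type, and a varint carries 7 payload bits per byte. With 3 bits taken by the wire type, one byte leaves 4 bits for the field number, hence 1 to 15. The hand-written sketch below (hypothetical names, wire type 0 assumed) computes the tag size for a few field numbers.

```java
// Hypothetical helper: computes how many bytes a field's tag occupies on the wire.
// A tag is the varint (fieldNumber << 3) | wireType; the wire type fits in 3 bits.
class TagSize {

    static final long MAX_FIELD_NUMBER = (1L << 29) - 1; // 536,870,911

    // Number of bytes a non-negative value needs as a base-128 varint.
    static int varintSize(long value) {
        int size = 1;
        while (value >= 0x80) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    static int tagSize(long fieldNumber, int wireType) {
        if (fieldNumber < 1 || fieldNumber > MAX_FIELD_NUMBER) {
            throw new IllegalArgumentException("field number out of range: " + fieldNumber);
        }
        return varintSize((fieldNumber << 3) | wireType);
    }

    public static void main(String[] args) {
        long[] numbers = {1, 15, 16, 2047, 2048, MAX_FIELD_NUMBER};
        for (long n : numbers) {
            System.out.println("field " + n + " -> tag of " + tagSize(n, 0) + " byte(s)");
        }
    }
}
```

It reports 1 byte for fields 1 and 15, 2 bytes for 16 and 2047, 3 bytes for 2048, and 5 bytes for the maximum field number, which is why frequently used fields are usually given numbers 1 to 15.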

Specifying field rules

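A field rule can be written before the type. The rules are:
singular: the default rule (it is implied; proto3 has no singular keyword). A well-formed message contains such a field zero or one times. You cannot tell whether it was set, and it is not serialized when it holds its default value.
optional: like singular, but it also records whether the field was explicitly set. It has two states: if the field is set, it is serialized; if not, it is not serialized and reading it returns the default value.
repeated: the field can appear zero or more times, and the order of its values is preserved.
map: a key/value pair field type.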
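The practical difference between singular and optional in proto3 is field presence. Below is a minimal hand-written sketch (hypothetical class, not generated code) modeling one int32 field numbered 1: a singular field equal to its default (0) is skipped when serializing, while an optional field that was explicitly set is written even when its value is 0. The hasValue flag stands in, roughly, for the hasXxx() methods generated for optional fields.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical model of one int32 field numbered 1, contrasting proto3
// "singular" (no presence tracking) with "optional" (explicit presence).
class FieldPresence {

    // Base-128 varint; negative ints are sign-extended to 64 bits, as int32 is on the wire.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Singular: the default 0 is indistinguishable from "not set", so it is skipped.
    static byte[] encodeSingular(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value != 0) {
            writeVarint(out, 1 << 3); // tag: field 1, wire type 0
            writeVarint(out, value);
        }
        return out.toByteArray();
    }

    // Optional: "was it set?" is tracked separately, so an explicit 0 is written.
    static byte[] encodeOptional(boolean hasValue, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (hasValue) {
            writeVarint(out, 1 << 3);
            writeVarint(out, value);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("singular, value 0:   " + encodeSingular(0).length + " bytes");
        System.out.println("optional, set to 0:  " + encodeOptional(true, 0).length + " bytes");
        System.out.println("optional, never set: " + encodeOptional(false, 0).length + " bytes");
    }
}
```

A reader that receives no bytes for the singular field simply sees 0, which is why proto3 singular fields cannot distinguish "set to the default" from "never set".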

Adding more message types

A single .proto file can contain several messages, for example two messages named SearchRequest and SearchResponse.

Adding comments

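There are single-line and multi-line comments. Single-line comments are usually placed on fields and multi-line comments above message definitions. Single-line comments use //, and multi-line comments use /* ... */.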

Type mapping table

The following table shows how .proto types map to the types of each language.

.proto Type	Notes	C++ Type	Java/Kotlin Type[1]	Python Type[3]	Go Type	Ruby Type	C# Type	PHP Type	Dart Type
double		double	double	float	float64	Float	double	float	double
float		float	float	float	float32	Float	float	float	double
int32	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.	int32	int	int	int32	Fixnum or Bignum (as required)	int	integer	int
int64	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.	int64	long	int/long[4]	int64	Bignum	long	integer/string[6]	Int64
uint32	Uses variable-length encoding.	uint32	int[2]	int/long[4]	uint32	Fixnum or Bignum (as required)	uint	integer	int
uint64	Uses variable-length encoding.	uint64	long[2]	int/long[4]	uint64	Bignum	ulong	integer/string[6]	Int64
sint32	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.	int32	int	int	int32	Fixnum or Bignum (as required)	int	integer	int
sint64	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.	int64	long	int/long[4]	int64	Bignum	long	integer/string[6]	Int64
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int[2]	int/long[4]	uint32	Fixnum or Bignum (as required)	uint	integer	int
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	long[2]	int/long[4]	uint64	Bignum	ulong	integer/string[6]	Int64
sfixed32	Always four bytes.	int32	int	int	int32	Fixnum or Bignum (as required)	int	integer	int
sfixed64	Always eight bytes.	int64	long	int/long[4]	int64	Bignum	long	integer/string[6]	Int64
bool		bool	boolean	bool	bool	TrueClass/FalseClass	bool	boolean	bool
string	A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32.	string	String	str/unicode[5]	string	String (UTF-8)	string	string	String
bytes	May contain any arbitrary sequence of bytes no longer than 2^32.	string	ByteString	str (Python 2), bytes (Python 3)	[]byte	String (ASCII-8BIT)	ByteString	string	List<int>
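The notes on sint32 and fixed32 follow from how varints work. sint32 first applies ZigZag encoding, which maps signed values to unsigned ones (0→0, -1→1, 1→2, -2→3, ...), so small negative numbers stay short, while a negative int32 is sign-extended to 64 bits and always takes 10 bytes. A varint carries 7 bits per byte, so any value of 2^28 or more needs 5 bytes, which is why the fixed 4-byte fixed32 wins for large values. The sketch below is hand-written for illustration; the class and method names are hypothetical.

```java
// Hypothetical helpers illustrating why sint32 and fixed32 exist.
class VarintCosts {

    // Bytes needed to write a 64-bit value (treated as unsigned) as a varint.
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // int32 on the wire: int -> long widening keeps the sign, so negatives take 10 bytes.
    static int int32Size(int value) {
        return varintSize(value);
    }

    // ZigZag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... (used by sint32).
    static int zigZag32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    static int sint32Size(int value) {
        return varintSize(Integer.toUnsignedLong(zigZag32(value)));
    }

    public static void main(String[] args) {
        int[] samples = {1, -1, -64, 1 << 28};
        for (int v : samples) {
            System.out.printf("%d: int32=%d bytes, sint32=%d bytes, fixed32=4 bytes%n",
                    v, int32Size(v), sint32Size(v));
        }
    }
}
```

For -1 this gives 10 bytes as int32 but 1 byte as sint32, and for 2^28 both varint forms need 5 bytes, one more than fixed32.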

Default values

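string: the empty string
bytes: empty bytes
bool: false
numeric types: 0
enums: the first defined enum value, which must be 0
message fields: the field is not set; its exact value depends on the language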

Enumerations

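Enums are declared with enum. Every enum must define a constant whose value is 0, and that constant must be the first one in the definition (it is also the enum's default value).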

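If two or more constants in an enum share the same value, you must declare them as aliases by adding option allow_alias = true; to the enum.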

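In the official example, EAA_STARTED = 1 and EAA_RUNNING = 1 have the same value, so they are effectively the same constant; you can think of one as an alias for the other.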
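In Java terms, an alias is just a second name for the same constant. The hand-written enum below roughly mirrors how protobuf-java's generated code handles this (one constant per distinct number, plus a static field for the alias), but it is not generated output: EAA_UNSPECIFIED and EAA_FINISHED are assumed from the official version of this enum, forNumber is a simplified stand-in, and details such as the UNRECOGNIZED constant carried by generated proto3 enums are left out.

```java
// Hand-written sketch, not protoc output: each distinct number gets one
// constant, and the alias is a static field pointing at the constant that
// shares its number.
enum EnumAllowingAlias {
    EAA_UNSPECIFIED(0), // the required zero value, also the default
    EAA_STARTED(1),
    EAA_FINISHED(2);

    // Alias: same number as EAA_STARTED, so it refers to the same constant.
    static final EnumAllowingAlias EAA_RUNNING = EAA_STARTED;

    private final int number;

    EnumAllowingAlias(int number) {
        this.number = number;
    }

    int getNumber() {
        return number;
    }

    // Simplified lookup of a wire value; unknown numbers return null here.
    static EnumAllowingAlias forNumber(int number) {
        for (EnumAllowingAlias e : values()) {
            if (e.number == number) {
                return e;
            }
        }
        return null;
    }
}
```

Because both names resolve to one object, a value parsed from the number 1 compares equal to both EAA_STARTED and EAA_RUNNING.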

Using other message types

A message can use another message as the type of one of its fields.

For example, SearchResponse can use the Result message as the type of one of its fields.
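On the wire, a message-typed field such as the Result field in SearchResponse is written as a length-delimited field (wire type 2): the tag, the byte length of the nested message, then the nested message's own bytes. The sketch below hand-encodes that layout without protobuf-java. The field numbers (the Result field as field 1 of SearchResponse, and a string url as field 1 of Result) and the sample URL are assumptions following the shape of the official example, since the definitions are not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-encoder: a message used as a field type is serialized on
// its own, then embedded as a length-delimited (wire type 2) field.
class EmbeddedMessage {

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Writes the tag (wire type 2), the payload length, then the raw payload.
    static void writeLengthDelimited(ByteArrayOutputStream out, int fieldNumber, byte[] payload) {
        writeVarint(out, ((long) fieldNumber << 3) | 2);
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
    }

    // Assumed Result message: string url = 1.
    static byte[] encodeResult(String url) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLengthDelimited(out, 1, url.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Assumed SearchResponse message holding one Result in field 1.
    static byte[] encodeSearchResponse(String url) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLengthDelimited(out, 1, encodeResult(url));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeSearchResponse("a.com");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) hex.append(String.format("%02x ", b & 0xFF));
        System.out.println(hex.toString().trim());
    }
}
```

For the URL a.com this prints 0a 07 0a 05 61 2e 63 6f 6d: the outer tag and length (0a 07) wrap the complete encoding of the nested Result.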

Importing messages

If the two messages above are placed in separate .proto files, one has to import the other with the import keyword. Let's try it.

In this setup, Result.proto lives in a subdirectory named other, and SearchResponse.proto imports it.


SearchResponse.proto imports it with import "other/Result.proto";

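Sometimes SearchResponse.proto and Result.proto start out in the same directory, and Result.proto is later moved into other/. Following the approach above, every file that already imports it would have to be changed to import "other/Result.proto";, which is tedious. Instead, keep a placeholder Result.proto at the old location and change its content to import public "other/Result.proto"; so that it forwards the import to the new location (editing one file is much less work than editing many).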


SearchResponse.proto itself can then stay unchanged.


Using proto2 message types

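You can import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used directly in proto3 syntax (it is fine if an imported proto2 message uses them).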

Nested messages

Messages can be nested, for example a Result message defined inside SearchResponse.

To use Result from outside, refer to it as OuterMessage.InnerMessage, here SearchResponse.Result. This is similar to inner classes in Java.
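The Java inner-class comparison can be made literal. The plain-Java sketch below is an analogy, not protobuf-generated code: Result lives inside SearchResponse, so a class outside it must write SearchResponse.Result, just as another .proto message would. The url and title fields and the SomeOtherMessage class are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogy, not protoc output: Result is nested inside SearchResponse.
class SearchResponse {

    static class Result {
        final String url;
        final String title;

        Result(String url, String title) {
            this.url = url;
            this.title = title;
        }
    }

    // Inside SearchResponse, the short name Result is enough.
    final List<Result> results = new ArrayList<>();
}

class SomeOtherMessage {
    // Outside SearchResponse, the nested type needs its qualified name.
    SearchResponse.Result result;
}
```

In a .proto file the same qualified name applies, for example a field declared as SearchResponse.Result result = 1; in another message.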

Updating message types

To be continued...

Serialization and deserialization

Here I create a Maven project named protobuf-java, which needs the protobuf dependency.

Create the proto messages.

Compile them to Java with the compiler.

The compiler generates a number of Java files.

Add protobuf-java and the other dependencies to the Maven pom.xml.

Next, serialize and deserialize a message in the main method.

Running it shows that both the serialized and the deserialized results print correctly.
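The generated classes do the byte-level work for you: a message built with its builder can be turned into bytes with toByteArray() and read back with the generated static parseFrom(byte[]) method. To show what happens underneath, the sketch below hand-implements the same round trip for a hypothetical Person message with string name = 1 and int32 id = 2. These fields are assumptions, since the article's proto definition is not reproduced here, and the code does not use protobuf-java at all.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical Person message (string name = 1; int32 id = 2), serialized and
// parsed by hand to illustrate what generated toByteArray()/parseFrom() do.
class PersonCodec {

    static final class Person {
        final String name;
        final int id;

        Person(String name, int id) {
            this.name = name;
            this.id = id;
        }
    }

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Reads a varint starting at pos[0] and advances the cursor.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    static byte[] serialize(Person p) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!p.name.isEmpty()) { // proto3 skips fields holding their default value
            byte[] utf8 = p.name.getBytes(StandardCharsets.UTF_8);
            writeVarint(out, (1 << 3) | 2); // field 1, length-delimited
            writeVarint(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        if (p.id != 0) {
            writeVarint(out, (2 << 3) | 0); // field 2, varint
            writeVarint(out, p.id);         // negative ints are sign-extended
        }
        return out.toByteArray();
    }

    static Person parse(byte[] data) {
        String name = ""; // absent fields keep their defaults
        int id = 0;
        int[] pos = {0};
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (fieldNumber == 1 && wireType == 2) {
                int len = (int) readVarint(data, pos);
                name = new String(data, pos[0], len, StandardCharsets.UTF_8);
                pos[0] += len;
            } else if (fieldNumber == 2 && wireType == 0) {
                id = (int) readVarint(data, pos); // truncation restores negative int32 values
            } else {
                // Simplification: real parsers skip unknown fields instead of failing.
                throw new IllegalArgumentException("unsupported field " + fieldNumber);
            }
        }
        return new Person(name, id);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Person("Alice", 42));
        Person back = parse(bytes);
        System.out.println(bytes.length + " bytes -> name=" + back.name + ", id=" + back.id);
    }
}
```

Parsing an empty byte array yields a Person with the empty name and id 0, which is the proto3 default-value behavior described earlier.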
