Google Protocol Buffers Compare

What is the simplest way to compare two google::protobuf::Message objects with each other?

From https://groups.google.com/d/msg/protobuf/5sOExQkB2eQ/ZSBNZI0K54YJ:

In C++, you could serialize the two and compare the bytes.
Alternatively, you could write some code that iterates over the fields
via reflection and compares them.

How do I compare the contents of two Google Protocol Buffer messages for equality?

In Python, protocol buffer messages have a method SerializeToString(deterministic=True).

Use it to serialize both messages and compare the resulting bytes.
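Deterministic mode matters because map fields have no guaranteed iteration order, so two equal messages can serialize their map entries in different orders. The sketch below is a toy model, not the real protobuf wire format: a map field is a list of (key, value) byte pairs, and the hypothetical ToySerializeMap sorts entries by key only when asked to. It only illustrates why ordering breaks byte comparison. It does not describe exactly what protobuf's deterministic mode does internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model (NOT the protobuf wire format): a map field stored as
// (key, value) entries in whatever order they happened to be inserted.
using Entries = std::vector<std::pair<uint8_t, uint8_t>>;

// Hypothetical serializer: writes each entry as two raw bytes.
// With deterministic == true, entries are sorted by key first,
// so the output no longer depends on insertion order.
std::vector<uint8_t> ToySerializeMap(Entries entries, bool deterministic) {
  if (deterministic) {
    std::sort(entries.begin(), entries.end());
  }
  std::vector<uint8_t> out;
  for (const auto& entry : entries) {
    out.push_back(entry.first);
    out.push_back(entry.second);
  }
  return out;
}
```

With entries {1: 10, 2: 20} inserted in two different orders, the non-deterministic bytes differ even though the maps are equal, while the sorted output matches.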

Compare two Repeated Fields with the C++ API

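RepeatedField<T> has STL-like iterators, so you can use std::equal to compare them. Check the sizes first, because the three-argument form of std::equal only walks the first range:

#include <algorithm>
#include <...>

const google::protobuf::RepeatedField<int32_t> & myField1 = ...;
const google::protobuf::RepeatedField<int32_t> & myField2 = ...;
// std::equal(first1, last1, first2) assumes the second range is at least as
// long as the first, so compare the sizes before comparing the elements.
bool fieldsEqual = myField1.size() == myField2.size() &&
                   std::equal(myField1.begin(), myField1.end(), myField2.begin());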
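Here is a minimal sketch of the same comparison as a reusable helper. FieldsEqual and PrefixLooksEqual are hypothetical names. std::vector<int32_t> stands in for RepeatedField so the example compiles without the protobuf library. The template only relies on size(), begin() and end(), which RepeatedField also provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: works for any container exposing size(), begin()
// and end(), e.g. std::vector<T> or google::protobuf::RepeatedField<T>.
template <typename Container>
bool FieldsEqual(const Container& a, const Container& b) {
  // Compare lengths first; the three-iterator std::equal never looks
  // past the end of the first range.
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin());
}

// Hypothetical demo of the pitfall: without a size check, a shorter first
// range that is a prefix of the second one compares "equal".
bool PrefixLooksEqual(const std::vector<int32_t>& shorter,
                      const std::vector<int32_t>& longer) {
  return std::equal(shorter.begin(), shorter.end(), longer.begin());
}
```

For {1, 2} against {1, 2, 3}, PrefixLooksEqual returns true, but FieldsEqual returns false. In C++14 and later, the four-iterator overload std::equal(a.begin(), a.end(), b.begin(), b.end()) also takes the lengths into account.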

How to compare two proto buffer message in Java?

== compares object references: it checks whether the two operands point to the same object (not equivalent objects, the very same object). Since .build() creates a new object on each call, comparing two build() results with == is false even when their contents are identical.

To compare the contents of the messages, use equals instead:

System.out.println(aBuilder.build().equals(aBuilder.build()));

Comparison of protobuf and arrow

They are intended for two different problems. Protobuf is designed to create a common "on the wire" or "disk" format for data.

Arrow is designed to create a common "in memory" format for the data.

Of course, the next question is: what does this mean?

In Protobuf, an application that wants to work with the data must first deserialize it into some kind of "in memory" representation. This is necessary because the Protobuf wire format is not directly usable by CPU instructions. For example, Protobuf packs unsigned integers into varints, which occupy a variable number of bytes, and each field is preceded by a key whose 3 least significant bits hold the field's wire type. You cannot take two encoded unsigned integers and simply add them without first converting them into some kind of "in memory" representation.
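A minimal sketch of the base-128 varint scheme and the field key layout described above. EncodeVarint, DecodeVarint and MakeTag are hypothetical helper names, not protobuf library functions. The decoder omits the overflow and truncation checks a real parser needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoder: 7 payload bits per byte, least significant group
// first, with the continuation bit (0x80) set on every byte but the last.
std::vector<uint8_t> EncodeVarint(uint64_t value) {
  std::vector<uint8_t> out;
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
  return out;
}

// Hypothetical decoder: reads one varint starting at bytes[pos] and
// advances pos past it. Sketch only: no overflow or malformed-input checks.
uint64_t DecodeVarint(const std::vector<uint8_t>& bytes, size_t& pos) {
  uint64_t result = 0;
  int shift = 0;
  while (true) {
    uint8_t b = bytes.at(pos++);
    result |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) break;
    shift += 7;
  }
  return result;
}

// Field key: the field number shifted left by 3, with the wire type
// stored in the 3 least significant bits.
uint32_t MakeTag(uint32_t field_number, uint32_t wire_type) {
  return (field_number << 3) | wire_type;
}
```

For example, 300 encodes to the two bytes 0xAC 0x02, and field 1 with wire type 0 (varint) has the key 0x08. To add two encoded integers, you would first decode both with DecodeVarint, add the results, and re-encode. That conversion step is the one the paragraph above describes.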

Now, protoc can generate code for many languages that converts the data into an "in memory" representation for that language. However, this "in memory" representation is not shared across languages. You cannot take a Protobuf message, deserialize it in C# (using protoc-generated code), and then process those in-memory bytes in Java without doing some kind of C#->Java marshalling of the data.

Arrow, on the other hand, solves this problem. If you have an Arrow table in C#, you can map that memory into a different language and start processing it without any "language-to-language" marshalling of the data. This zero-copy approach allows efficient hand-off between languages. Python has been employing tricks like this (e.g. the array protocol) for a while now, and it works great for data analysis.

However, Arrow is not always the best format for over-the-wire transmission because it can be less compact. Those varints I mentioned before help Protobuf cut down on message size. Also, Protobuf tags each field, so optional fields that are not set take up no space at all, which saves a lot when a message has many optional fields. In fact, Arrow Flight (an RPC framework) uses Protobuf and gRPC for over-the-wire transmission of metadata.
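To show why per-field tags save space for sparse data, here is a sketch that writes only the fields that are set, each as a key followed by a varint value. EncodeSetFields, AppendVarint and FixedWidthSize are hypothetical names. FixedWidthSize models an assumed layout that reserves 8 bytes for every declared int64 field. It is an illustration for comparison, not a description of Arrow's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Append `value` as a base-128 varint (7 bits per byte, continuation bit 0x80).
void AppendVarint(std::vector<uint8_t>& out, uint64_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Hypothetical encoder: only fields present in the map are written, each as
// a key (field number << 3 | wire type 0) followed by its varint value.
// Fields that are not set produce no bytes at all.
std::vector<uint8_t> EncodeSetFields(const std::map<uint32_t, uint64_t>& set_fields) {
  const uint64_t kWireTypeVarint = 0;
  std::vector<uint8_t> out;
  for (const auto& field : set_fields) {
    AppendVarint(out, (static_cast<uint64_t>(field.first) << 3) | kWireTypeVarint);
    AppendVarint(out, field.second);
  }
  return out;
}

// Assumed fixed-width layout: 8 bytes per declared int64 field, set or not.
size_t FixedWidthSize(size_t declared_fields) { return declared_fields * 8; }
```

Suppose a message declares 10 int64 fields but only field 3 is set to 5. The tagged encoding is two bytes (0x18 0x05), while the assumed fixed-width layout would need 80 bytes.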