Can I rely on the serialization of two identical structures being identical?

Can I rely on the serialization of two identical structures being identical? - rust

Consider the following code:
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize)]
struct MyStruct {
a: u32,
b: u32
}
#[derive(Serialize, Deserialize)]
#[serde(rename = "MyStruct")]
struct AlsoMyStruct {
a: u32,
b: u32
}
I am wondering if I can safely do something like:
let ser = any_encoding::serialize(&MyStruct{a: 33, b: 44}).unwrap();
let deser: AlsoMyStruct = any_encoding::deserialize(&ser).unwrap();
where any_encoding is, e.g., bincode, json, or any other Serde-supported encoding. In my head, this should work nicely: the two structures have the same name (I'm explicitly renaming AlsoMyStruct into "MyStruct") and exactly the same fields: same field names, same field types, same order of fields.
However, I am wondering: is this is actually guaranteed to work? Or is there some other, corner-case, maybe platform-dependent, unforeseen piece of information that a Serde serializer/deserializer might include in the representation of MyStruct / AlsoMyStruct that could lead to the two representations being incompatible?

In general, no, you cannot expect this to work. The reason is that neither serde nor any de/serializers guarantee that you can round-trip your data (source). This means you cannot even expect this to work in all cases if you use the same struct in both places.
For example JSON cannot round-trip Option<()> and formats which are not self-describing like bincode, do not support untagged enums.
Nothing in the type signatures enforces round-tripping.
Here are some reasons why deserialization might fail:
Using skip_serializing_none with not self-describing formats (serde #1732).
Anything which calls deserialize_any, such as untagged, adjacently tagged, or internally tagged enums (serde #1762).
Borrowing during deserialization, e.g., for &'de str or &'de [u8]. serde_json only supports &'de str if there are no escape sequences and never supports &'de [u8].
Some formats cannot serialize some types, e.g., JSON does not supports structs as map keys and bincode only supports sequences of known lengths (bincode #167).
A type only implements one of the traits (Serializer/Deserializer) or the implementations do not match, e.g., serialize as number but deserialize as string.
That being said, this can work under some circumstances. The structs should have the same name and the fields in the same order. The types or rather the Serialize/Deserialize implementations also need to support round-tripping. With Option<()> from above it also depends on the Serializer/Deserializer implementations if you can round-trip them, even if Serialize/Deserialize implementations do support it.
Many types do try to support round-tripping, since that is what most commonly is expected.

Related

How can I sort fields in alphabetic order when serializing with serde?

I've an API that requires the object's fields to be sorted alphabetically because the struct has to be hashed.
In Java/Jackson, you can set a flag in the serializer: MapperFeature.SORT_PROPERTIES_ALPHABETICALLY. I can't find anything similar in Serde.
I'm using rmp-serde (MessagePack). It follows the annotations and serialization process used for JSON, so I thought that it would be fully compatible, but the sorting provided by #jonasbb doesn't work for it.
The struct has (a lot of) nested enums and structs which have to be flattened for the final representation. I'm using Serialize::serialize for that, but calling state.serialize_field at the right place (such that everything is alphabetical) is a pain because the enums need a match clause, so it has to be called multiple times for the same field in different places and the code is very difficult to follow.
As possible solutions, two ideas:
Create a new struct with the flat representation and sort the fields alphabetically manually.
This is a bit error prone, so a programmatic sorting solution for this flattened struct would be great.
Buffer the key values in Serialize::serialize (e.g. in a BTreeMap, which is sorted), and call state.serialize_field in a loop at the end.
The problem is that the values seem to have to be of type Serialize, which isn't object safe, so I wasn't able to figure out how to store them in the map.
How to sort HashMap keys when serializing with serde? is similar but not related because my question is about the sorting of the struct's fields/properties.

You are not writing which data format you are targetting. This makes it hard to find a solution, since some might not work in all cases.
This code works if you are using JSON (unless the preserve_order feature flag is used). The same would for for TOML by serializing into toml::Value as intermediate step.
The solution will also work for other data formats, but it might result in a different serialization, for example, emitting the data as a map instead of struct-like.
fn sort_alphabetically<T: Serialize, S: serde::Serializer>(value: &T, serializer: S) -> Result<S::Ok, S::Error> {
let value = serde_json::to_value(value).map_err(serde::ser::Error::custom)?;
value.serialize(serializer)
}
#[derive(Serialize)]
struct SortAlphabetically<T: Serialize>(
#[serde(serialize_with = "sort_alphabetically")]
T
);
#[derive(Serialize, Deserialize, Default, Debug)]
struct Foo {
z: (),
bar: (),
ZZZ: (),
aAa: (),
AaA: (),
}
println!("{}", serde_json::to_string_pretty(&SortAlphabetically(&Foo::default()))?);
because the struct has to be hashed
While field order is one source of indeterminism there are other factors too. Many formats allow different amounts of whitespace or different representations like Unicode escapes \u0066.

How do I save structured data to file?

My program contains a huge precomputation with constant output. I would like to avoid running this precomputation in the next times I run the program. Thus, I'd like to save its output to file in the first run of the program, and just load it the next time I run the program.
The output contains non-common data types, object and structs I defined myself.
How do I go about doing that?

A de-facto standard way for (de-)serializing rust objects is serde. Given a rust struct (or enum) it produces an intermediate representation, which then can be converted to a desired format (e.g. json). Given a struct:
use serde::{Serialize, Deserialize};
// here the "magic" conversion is generated
#[derive(Debug, Serialize, Deserialize)]
struct T {
i: i32,
f: f64,
}
you can get a json representation with simple as oneliner:
let t = T { i: 1, f: 321.456 };
println!("{}", serde_json::to_string(&t).unwrap());
// prints `{"i":1,"f":321.456}`
as well as converting back:
let t: T = serde_json::from_str(r#"{"i":1,"f":321.456}"#).unwrap();
println!("i: {}, f: {}", t.i, t.f);
// prints i: 1, f: 321.456
Here is a playground link. This is an example for serde_json usage, but you may find other more suitable libraries like serde_sbor, serde_yaml, bincode, serde_xml and many many others.

You would want to use something like serde to serialize the data, save it to disk, and then restore it from there on the next run. In particular, bincode, is useful to serialize the data in the binary format which saves much more space than JSON or other human readable format. But, you have to be careful not to use old serialized data if you have changed the layout of the structures in your program.
For more in depth, I would go to Serde's documentation, but the basic idea is to mark all your structures that need saving with #[derive(Serialize, Deserialize)], and use bincode to do the serialization/deserialization. Playground link for an example I wrote using serde_json (as bincode is not available in rust playground), but bincode is no different, other than using serialize and deserialize as opposed to to_vec and from_slice.

Avoiding pubs in Rust

I've just split my program into an executable and a large file full of struct definitions (structs.rs).
In order to use the structs and their fields for the main executable, I have to prepend each struct definition, and every field definition with pub.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Foo {
pub bar: u8,
pub baz: [u8; 4]
}
This reduces readability.
Is there a way of avoiding all of these pubs?
Or should I be using another way of decomposing my program into multiple files?

This is expected. Modules and crates are a privacy boundary in Rust. The default privacy level for structs and their fields is "same module and its submodules", so parent modules and sibling modules need to have a permission to touch struct's fields.
In Rust structs typically have only a few fields, but the fields may have important invariants to uphold. For example, if Vec.len was public, you could cause unsafe out of bounds access. As a precaution, Rust requires programmers to think through whether they can allow access to each field.
It's unusual to have thousands of structs or structs with thousands of fields, unless they're mirroring some external data definition. If that's the case, consider auto-generating struct definitions with a macro or build.rs.

Is it possible to implement ASN.1 DER in Rust using Serde?

AFAIK there's no (flexible and stable) ASN.1 {ser,deser}ialization library in Rust, so I'm looking into making one (while learning Rust at the same time). My goal is SNMP (v1-v3) client implementation in Rust.
Before starting from scratch, I'd like to ask Serde team or experienced Serde users if it's possible to implement ASN.1 codec with Serde. Problem is every object in ASN.1 has it's own header (TAG + LENGTH), where TAG is user defined for each type, so iXX or uXX or bytes or whatever can be any TAG.
An ASN.1 object is composed of tag, length and payload. ASN.1 has a set of universal (default) tags for integers, floats, bytestrings (as well as ASCII strings) etc. I could just stick to universal tags for primitive types, but for non-primitive types (tuples, newtypes, structs etc) the type should have an implementation of the Asn1Info trait, providing tag and custom serialize / deserialize functionality.
{ser,deser}ialization of primitive types is trivial, but how can I implement it for complex structures (or newtypes)? They must be Asn1Info.
I've looked into the asn1-cereal library. It looks like a decent ASN.1 implementation, providing useful macros and stuff. I might as well work on it instead of writing everything from scratch.
Let's assume tag is u8 and Asn1Info trait looks like this:
pub trait Asn1Info {
fn asn1_tag() -> u8;
}
Then I have a newtype like pub struct Counter(u32) with it's own application-specific tag. I'd then make an impl for Counter like this:
impl Asn1Info for Counter {
fn asn1_tag() -> u8 {
0x41
}
}
Now how do I serialize it with tag 0x41 without manually implementing Serialize trait? There's no way to inject additional information to Serializer, so I'm unable to reuse all non-primitive serialization methods in it (like serialize_newtype_variant).
If I can't use Serializer methods in Serialize trait impl for custom ASN.1 objects (application-specific, context-specific etc.), then there's no way (or no point) to implement a useful ASN.1 codec with Serde, isn't it?

Why does Rust have struct and enum?

Rust's enums are algebraic datatypes. As far as I can tell this seems to subsume what struct is. What is different about struct that necessitates keeping it?

First of all, you are correct that semantically enum is strictly superior to the struct as to what it can represent, and therefore struct is somewhat redundant.
However, there are other elements at play here.
ease of use: the values within an enum can only be accessed (directly) through matching; contrast with the ease of use of accessing a struct field. You could write accessors for each and every field, but that is really cumbersome.
distinction: an enum is a tagged union, a struct has a fixed-layout; we (programmers) generally like to put labels on things, and therefore giving different names to different functionality can be appreciated.
As I see it, struct is therefore syntactic sugar. I usually prefer lean and mean, but a bit of sugar can go a long way in increasing what can be represented tersely.

Firstly, Rust has a wide array of data types:
Structs with named fields (struct Foo {bar: uint})
Tuple structs (struct Foo(pub Bar, Baz))
Structs with no fields (struct Foo;)
Enums, with various types of variants:
Variants with no fields (eg None)
Tuple variants (eg Some(T))
Struct variants (eg Some { pub inner :T })
This gives the programmer some flexibility in defining datatypes. Often, you don't want named fields, especially if the struct/variant has only one field. Rust lets you use tuple structs/tuple variants in that case.
If structs were removed from Rust there would be no loss of functionality, enums with struct variants could be used again. But there would be an overwhelming number of single-variant enums which would be unnecessary and cumbersome to use.

Not 100% correct, but another nice way to think about it : enum isn't actually superior to struct, the syntax sugar just makes it look like it is.
An enum is a sum type meaning that it's value is one value of one of a set of other types. The Result<T, E> type is either of type T or E. So each enum variant has exactly one type associated with it. Everything else (no type, tuple variants and struct variants) could be syntax sugar.
enum Animal {
// without syntax sugar
Cat(i32),
// desugars to `Dog(())` (empty tuple/unit)
Dog,
// desugars to `Horse((i32, bool))` (tuple)
Horse(i32, bool),
// desugars to `Eagle(GeneratedEagleType)` and a struct definition outside
// of this enum `struct GeneratedEagleType { weight: i32, male: bool }`
Eagle { weight: i32, male: bool }
}
So it would be enough if each enum variant would be associated with exactly one type. And in that case enum is not superior to struct, because it cannot construct product types (like struct).
To be able write the "type definition" inside the enum variant definition is just for convenience.
Also: struct is superior to "tuple structs" and "tuples", too. If we ignore the names those three things are nearly equivalent. But Rust still has those three different kinds of types for convenience.
Please note that I don't know if those enum definitions are actually syntax sugar or not. But they could be and that might help think about. it

Visibility
Not to find reason in what may be a transient implementation detail (I'm not on the core team and have no insight), but
A public enum can not hold or contain a private struct.
A public struct can hold or contain a private enum.
See also
Why does "can't leak private type" only apply to structs and not enums?

You are right that enums and traits and their inheritance by structs both implement algebraic datatypes.
However traits is an extensible set of types, where any stuct can be attributed any trait from any piece of code. Using introspection, it is possible to expect a value with a given trait, and dynamically dig out the actual type of a struct. But that type is among a set of types that is unpredictable, because any struct could be given the said trait from anywhere.
Whereas enums define once and for all a limited type hierarchy, making it predictable and straightforward to go match subtypes as if they were simple values. The language features surrounding enums can therefore be optimized so that the type checks occur statically, providing much greater performance, and some sugar in the language. But also the hierarchy description is contained to the enum definition, and does not impact the trait type system.
TL;DR: enum narrows down a hierarchy of types in a contained manner, instead of relying on traits that are extensible by any piece of code and impacts everything.

Another important distinction along with the above answers is that enum at one point can ONLY be one of the values stated while struct represents values for all the parameters for that instance.
eg.
enum Things{
Box,
Page(i32),
}
struct Things{
box: Option<String>,
page: i32,
}
In the above case a single enum can either be a Box or a Page, while a single instance of struct will represent a box and a page.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Can I rely on the serialization of two identical structures being identical? - rust

Related

How can I sort fields in alphabetic order when serializing with serde?

How do I save structured data to file?

Avoiding pubs in Rust

Is it possible to implement ASN.1 DER in Rust using Serde?

Why does Rust have struct and enum?

Categories

Resources