AFAIK there's no (flexible and stable) ASN.1 {ser,deser}ialization library in Rust, so I'm looking into making one (while learning Rust at the same time). My goal is an SNMP (v1-v3) client implementation in Rust.
Before starting from scratch, I'd like to ask the Serde team or experienced Serde users whether it's possible to implement an ASN.1 codec with Serde. The problem is that every object in ASN.1 has its own header (TAG + LENGTH), where TAG is user-defined for each type, so iXX, uXX, bytes, or anything else can carry any TAG.
An ASN.1 object is composed of tag, length and payload. ASN.1 has a set of universal (default) tags for integers, floats, bytestrings (as well as ASCII strings) etc. I could just stick to universal tags for primitive types, but for non-primitive types (tuples, newtypes, structs etc) the type should have an implementation of the Asn1Info trait, providing tag and custom serialize / deserialize functionality.
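To make the tag/length/payload layout concrete, here is a minimal sketch of a BER-style encoder. It assumes single-byte tags and definite-form lengths only; `push_length` and `encode_tlv` are names made up for this illustration and do not come from any existing crate.

```rust
/// Appends a BER definite-form length: short form below 128,
/// otherwise 0x80 | n followed by n big-endian length bytes.
fn push_length(len: usize, out: &mut Vec<u8>) {
    if len < 0x80 {
        out.push(len as u8);
    } else {
        let bytes = len.to_be_bytes();
        // Safe to unwrap: len >= 0x80, so at least one byte is non-zero.
        let first = bytes.iter().position(|&b| b != 0).unwrap();
        let significant = &bytes[first..];
        out.push(0x80 | significant.len() as u8);
        out.extend_from_slice(significant);
    }
}

/// Builds tag + length + payload for a single-byte tag (assumption:
/// multi-byte tags are out of scope for this sketch).
pub fn encode_tlv(tag: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 2);
    out.push(tag);
    push_length(payload.len(), &mut out);
    out.extend_from_slice(payload);
    out
}
```

For instance, `encode_tlv(0x04, b"hi")` would yield the bytes 04 02 68 69, where 0x04 is the universal OCTET STRING tag.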
{ser,deser}ialization of primitive types is trivial, but how can I implement it for complex structures (or newtypes)? They must implement Asn1Info.
I've looked into the asn1-cereal library. It looks like a decent ASN.1 implementation, providing useful macros and stuff. I might as well work on it instead of writing everything from scratch.
Let's assume tag is u8 and Asn1Info trait looks like this:
pub trait Asn1Info {
    fn asn1_tag() -> u8;
}
Then I have a newtype like pub struct Counter(u32) with its own application-specific tag. I'd then write an impl for Counter like this:
impl Asn1Info for Counter {
    fn asn1_tag() -> u8 {
        0x41
    }
}
Now how do I serialize it with tag 0x41 without manually implementing the Serialize trait? There's no way to inject additional information into the Serializer, so I'm unable to reuse its non-primitive serialization methods (like serialize_newtype_variant).
If I can't use Serializer methods in the Serialize trait impl for custom ASN.1 objects (application-specific, context-specific, etc.), then there's no way (or no point) to implement a useful ASN.1 codec with Serde, is there?
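For comparison, here is a rough sketch of how the tag could travel with the type when Serde is left out of the picture entirely. Only `Asn1Info` and `Counter` come from the post; `Asn1Content` and `encode` are hypothetical names introduced for illustration, and the sketch handles short-form lengths only.

```rust
/// Trait from the question: a type reports its own tag.
pub trait Asn1Info {
    fn asn1_tag() -> u8;
}

/// Hypothetical companion trait: a type produces its content octets.
pub trait Asn1Content {
    fn content(&self) -> Vec<u8>;
}

pub struct Counter(pub u32);

impl Asn1Info for Counter {
    fn asn1_tag() -> u8 {
        0x41
    }
}

impl Asn1Content for Counter {
    /// Minimal big-endian bytes, with a leading 0x00 when the top bit is
    /// set so the value is not read back as negative.
    fn content(&self) -> Vec<u8> {
        let bytes = self.0.to_be_bytes();
        let first = bytes
            .iter()
            .position(|&b| b != 0)
            .unwrap_or(bytes.len() - 1);
        let mut out = Vec::new();
        if bytes[first] & 0x80 != 0 {
            out.push(0x00);
        }
        out.extend_from_slice(&bytes[first..]);
        out
    }
}

/// Generic encoder: tag from the type, then length, then content.
/// Only short-form lengths (< 128 bytes) are handled in this sketch.
pub fn encode<T: Asn1Info + Asn1Content>(value: &T) -> Vec<u8> {
    let content = value.content();
    assert!(content.len() < 0x80, "long-form lengths omitted in this sketch");
    let mut out = vec![T::asn1_tag(), content.len() as u8];
    out.extend_from_slice(&content);
    out
}
```

With this shape, `encode(&Counter(300))` would produce 41 02 01 2C. The trade-off is that none of Serde's derive machinery is reused, which is exactly the cost the question is trying to avoid.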
Consider the following code:
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct MyStruct {
    a: u32,
    b: u32,
}

#[derive(Serialize, Deserialize)]
#[serde(rename = "MyStruct")]
struct AlsoMyStruct {
    a: u32,
    b: u32,
}
I am wondering if I can safely do something like:
let ser = any_encoding::serialize(&MyStruct{a: 33, b: 44}).unwrap();
let deser: AlsoMyStruct = any_encoding::deserialize(&ser).unwrap();
where any_encoding is, e.g., bincode, json, or any other Serde-supported encoding. In my head, this should work nicely: the two structures have the same name (I'm explicitly renaming AlsoMyStruct into "MyStruct") and exactly the same fields: same field names, same field types, same order of fields.
However, I am wondering: is this actually guaranteed to work? Or is there some other corner-case, maybe platform-dependent, unforeseen piece of information that a Serde serializer/deserializer might include in the representation of MyStruct / AlsoMyStruct that could make the two representations incompatible?
In general, no, you cannot expect this to work. The reason is that neither serde nor any de/serializers guarantee that you can round-trip your data (source). This means you cannot even expect this to work in all cases if you use the same struct in both places.
For example, JSON cannot round-trip Option<()>, and formats that are not self-describing, like bincode, do not support untagged enums.
Nothing in the type signatures enforces round-tripping.
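The Option<()> case can be illustrated with a toy model. This is not serde_json itself, only a hand-written sketch of the same mapping (both unit and None become `null`); all function names here are made up for the example.

```rust
/// Toy JSON writer: the unit value maps to `null`.
fn unit_to_json(_: &()) -> String {
    "null".to_string()
}

/// Toy JSON writer: `None` maps to `null`, `Some(v)` to whatever `v` maps to.
fn option_to_json<T>(value: &Option<T>, inner: fn(&T) -> String) -> String {
    match value {
        None => "null".to_string(),
        Some(v) => inner(v),
    }
}

/// Toy reader for `Option<()>`: `null` is taken as `None` before the inner
/// type is ever consulted, so `Some(())` can never be produced.
fn option_unit_from_json(text: &str) -> Result<Option<()>, String> {
    match text {
        "null" => Ok(None),
        other => Err(format!("unexpected input: {other}")),
    }
}
```

Both `Some(())` and `None` end up as the text `null` here, so the reader has no way to tell them apart and the round trip loses information.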
Here are some reasons why deserialization might fail:
Using skip_serializing_none with formats that are not self-describing (serde #1732).
Anything which calls deserialize_any, such as untagged, adjacently tagged, or internally tagged enums (serde #1762).
Borrowing during deserialization, e.g., for &'de str or &'de [u8]. serde_json only supports &'de str if there are no escape sequences and never supports &'de [u8].
Some formats cannot serialize some types, e.g., JSON does not support structs as map keys and bincode only supports sequences of known lengths (bincode #167).
A type only implements one of the traits (Serialize/Deserialize) or the implementations do not match, e.g., serialize as number but deserialize as string.
That being said, this can work under some circumstances. The structs should have the same name and the fields in the same order. The types or rather the Serialize/Deserialize implementations also need to support round-tripping. With Option<()> from above it also depends on the Serializer/Deserializer implementations if you can round-trip them, even if Serialize/Deserialize implementations do support it.
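Why field order matters for formats that are not self-describing can be seen in a toy positional encoding. This is a hand-rolled sketch, not bincode's actual wire format, and `Reordered`, `encode`, and `decode_reordered` are hypothetical names.

```rust
pub struct MyStruct {
    pub a: u32,
    pub b: u32,
}

/// Same field names and types as `MyStruct`, but declared in the other order.
pub struct Reordered {
    pub b: u32,
    pub a: u32,
}

/// Toy positional format: field values in declaration order, no names.
pub fn encode(value: &MyStruct) -> Vec<u8> {
    let mut out = Vec::with_capacity(8);
    out.extend_from_slice(&value.a.to_le_bytes());
    out.extend_from_slice(&value.b.to_le_bytes());
    out
}

/// Reads a little-endian u32 at `offset`, or `None` if the input is too short.
fn read_u32(bytes: &[u8], offset: usize) -> Option<u32> {
    let chunk = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Decodes in `Reordered`'s declaration order: `b` first, then `a`.
pub fn decode_reordered(bytes: &[u8]) -> Option<Reordered> {
    Some(Reordered {
        b: read_u32(bytes, 0)?,
        a: read_u32(bytes, 4)?,
    })
}
```

Encoding `MyStruct { a: 33, b: 44 }` and decoding it as `Reordered` succeeds without any error, but the values come back swapped, since nothing in the bytes records which field is which.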
Many types do try to support round-tripping, since that is what most commonly is expected.
I am trying to implement a Serde Serializer/Deserializer for the Kafka wire protocol. In this protocol, there are 4 different string representations. This poses a problem when implementing the deserializer: when deserialize_str is called to deserialize a string for a given message, there’s no way to know whether the string starts with an i32 or a varint, since neither the deserialize_str method nor the provided visitor carries any kind of metadata or type information that could help make this decision.
My first thought was that I could implement a new type wrapper and use a custom deserialize implementation, but I now understand this doesn’t make sense because it needs to be generic over all deserializers, not just the deserializer I’m building. The wrapper still just asks the Deserializer to read a string.
I’m struggling to come up with a good solution here, and can’t find examples of other data formats that use multiple representations for a given data type.
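The ambiguity can be shown with two hand-written readers. The layouts below are assumptions made for illustration (a 16-bit big-endian length prefix for one form, an unsigned varint holding length + 1 for the compact form); the exact widths should be checked against the Kafka protocol guide, and all function names are hypothetical.

```rust
/// Reads an unsigned varint (7 bits per byte, low group first),
/// returning the value and the number of bytes consumed.
fn read_uvarint(buf: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &byte) in buf.iter().enumerate().take(5) {
        value |= u32::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Assumed layout: big-endian i16 length, then that many UTF-8 bytes.
pub fn read_fixed_string(buf: &[u8]) -> Option<&str> {
    let len = i16::from_be_bytes([*buf.first()?, *buf.get(1)?]);
    if len < 0 {
        return None;
    }
    let len = len as usize;
    std::str::from_utf8(buf.get(2..2 + len)?).ok()
}

/// Assumed layout: unsigned varint holding length + 1, then the UTF-8 bytes.
pub fn read_compact_string(buf: &[u8]) -> Option<&str> {
    let (len_plus_one, used) = read_uvarint(buf)?;
    let len = len_plus_one.checked_sub(1)? as usize;
    std::str::from_utf8(buf.get(used..used + len)?).ok()
}
```

The same string "hi" is 00 02 68 69 under the first layout and 03 68 69 under the second, and nothing in the bytes says which reader applies. That choice has to come from the schema, which a plain deserialize_str call does not convey.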
Here's the EXTREMELY HACKY way that I'm planning on solving this using a new type wrapper and custom Deserialize implementation. More specifically, my Deserialize impl will have its own Visitor, and we're going to abuse Visitor here.
First I was hoping that I could use TypeId on Visitor::Value, but Visitor doesn't have a static lifetime, so that doesn't work.
Instead, I'm using the "expected" error message as a type tag, using the serde::de::Expected trait:
fn deserialize_str<V>(self, visitor: V) -> Result<V::Value>
where
    V: Visitor<'de>,
{
    use serde::de::Expected;
    // Render the visitor's "expecting" message and treat it as a type tag.
    let ty = format!("{}", &visitor as &dyn Expected);
    if ty == "NullableString" {
        // ... do nullable string deserialization logic here
    }
    // ...
}
This... works! But it feels extremely gross. I'm not totally satisfied, but I'm going to proceed with this for now.
Is this requirement really necessary for object safety or is it just an arbitrary limitation, enacted to make the compiler implementation simpler?
A method with type arguments is just a template for constructing multiple distinct methods with concrete types. It is known at compile time, which variants of the method are used. Therefore, in the context of a program, a typed method has the semantics of a finite collection of non-typed methods.
I would like to see if there are any mistakes in this reasoning.
I will take this opportunity to present withoutboats' nomenclature of handshaking patterns, a set of ideas for reasoning about the decomposition of a piece of functionality into two interconnected traits:
you want any type which implements trait Alpha to be composable with any type which implements trait Omega…
The example given is for serialization (although other use cases apply): a trait Serialize for types the values of which can be serialized (e.g. a data record type); and Serializer for types implementing a serialization format (e.g. a JSON serializer).
When the types of both can be statically inferred, designing the traits with the static handshake is ideal. The compiler will create only the necessary functions monomorphized against the types S needed by the program, while also providing the most room for optimizations.
trait Serialize {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
    where
        S: Serializer;
}
trait Serializer {
    //...
    fn serialize_map_value<S>(&mut self, state: &mut Self::MapState, value: &S)
        -> Result<(), Self::Error>
    where
        S: Serialize;
    fn serialize_seq_elt<S>(&mut self, state: &mut Self::SeqState, elt: &S)
        -> Result<(), Self::Error>
    where
        S: Serialize;
    //...
}
However, it is established that these traits cannot do dynamic dispatching. This is because once the concrete type is erased from the receiving type, that trait object is bound to a fixed table of its trait implementation, one entry per method. With this design, the compiler is unable to reason with a method containing type parameters, because it cannot monomorphize over that implementation at compile time.
A method with type arguments is just a template for constructing multiple distinct methods with concrete types. It is known at compile time, which variants of the method are used. Therefore, in the context of a program, a typed method has the semantics of a finite collection of non-typed methods.
One may be led to think that all trait implementations available are known, and therefore one could revamp the concept of a trait object to create a virtual table with multiple "layers" for a generic method, thus being able to do a form of one-sided monomorphization of that trait object. However, this does not account for two things:
The number of implementations can be huge. Just look, for example, at how many types implement Read and Write in the standard library. The number of monomorphized implementations that would have to be present in the binary would be the product of all known implementations and the known parameter types of a given call. In the example above, it is particularly unwieldy: serializing dynamic data records to JSON and TOML would mean that there would have to be Serialize::serialize method implementations for both JSON and TOML, for each serializable type, regardless of how many of these types are effectively serialized in practice. And this is without accounting for the other side of the handshake.
This expansion can only be done when all possible implementations are known at compile time, which is not necessarily the case. While not entirely common, it is currently possible for a trait object to be created from a dynamically linked shared object. In this case, there is never a chance to expand the method calls of that trait object against the target compilation item. With this in mind, the virtual function table created by a trait implementation is expected to be independent from the existence of other types and from how it is used.
To conclude: This is a conceptual limitation that actually makes sense when digging deeper. It is definitely not arbitrary or applied lightly. Generic method calls on trait objects are unlikely to ever be supported, and so consumers should instead rely on employing the right interface design for the task. Thinking of handshake patterns is one possible way to mind-map these designs.
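One interface design that is often used in this situation is to keep the generic method for static callers and add a non-generic method for trait objects. The sketch below uses hypothetical `Sink` and `Record` traits in place of a real serializer pair; it is one possible arrangement, not the design of any particular library.

```rust
/// Hypothetical output sink, standing in for a serializer.
trait Sink {
    fn put(&mut self, text: &str);
}

/// Hypothetical record trait offering both sides of the handshake.
trait Record {
    /// Static handshake: generic, so it is kept out of the vtable
    /// with `Self: Sized` and is unavailable on `dyn Record`.
    fn write_to<S: Sink>(&self, sink: &mut S)
    where
        Self: Sized,
    {
        self.write_dyn(sink)
    }

    /// Dynamic handshake: no type parameters, one vtable entry.
    fn write_dyn(&self, sink: &mut dyn Sink);
}

struct Point {
    x: i32,
    y: i32,
}

impl Record for Point {
    fn write_dyn(&self, sink: &mut dyn Sink) {
        sink.put(&format!("({}, {})", self.x, self.y));
    }
}

struct Buffer(String);

impl Sink for Buffer {
    fn put(&mut self, text: &str) {
        self.0.push_str(text);
    }
}

/// Works on trait objects because only `write_dyn` is called.
fn write_all(records: &[Box<dyn Record>], sink: &mut dyn Sink) {
    for record in records {
        record.write_dyn(sink);
    }
}
```

The `where Self: Sized` bound is what lets `Record` be used as `dyn Record` at all: the generic method is simply excluded from the trait object, and the non-generic one takes its place in the vtable.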
See also:
What is the cited problem with using generic type parameters in trait objects?
The Rust Programming Language, section 17.2: Object Safety Is Required for Trait Objects
I read these docs about structs, but I don't understand about unit structs. It says:
Unit structs are most commonly used as marker. They have a size of zero bytes, but unlike empty enums they can be instantiated, making them isomorphic to the unit type (). Unit structs are useful when you need to implement a trait on something, but don’t need to store any data inside it.
They only give this piece of code as an example:
struct Unit;
What is a real world example of using a unit struct?
Standard library
Global
The global memory allocator, Global, is a unit struct:
pub struct Global;
It has no state of its own (because the state is global), but it implements traits like Allocator.
std::fmt::Error
The error for string formatting, std::fmt::Error, is a unit struct:
pub struct Error;
It has no state of its own, but it implements traits like Error.
RangeFull
The type for the .. operator, RangeFull, is a unit struct:
pub struct RangeFull;
It has no state of its own, but it implements traits like RangeBounds.
Crates
chrono::Utc
The Utc timezone is a unit struct:
pub struct Utc;
It has no state of its own, but it implements traits like TimeZone and is thus usable as a generic argument to Date and DateTime.
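The same idea can be sketched with made-up types. `Zone`, `UtcLike`, and `Stamp` below are hypothetical stand-ins, not chrono's API; they only show a unit struct carrying behavior through a trait while occupying no space.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a time-zone trait.
trait Zone {
    fn offset_seconds(&self) -> i32;
}

/// Unit struct: no fields, zero bytes.
struct UtcLike;

impl Zone for UtcLike {
    fn offset_seconds(&self) -> i32 {
        0
    }
}

/// A timestamp parameterized by its zone type.
struct Stamp<Z: Zone> {
    seconds: i64,
    zone: Z,
}

impl<Z: Zone> Stamp<Z> {
    fn local_seconds(&self) -> i64 {
        self.seconds + i64::from(self.zone.offset_seconds())
    }
}

/// Extra bytes a `Stamp<UtcLike>` needs beyond its `i64` payload.
fn zone_overhead() -> usize {
    size_of::<Stamp<UtcLike>>() - size_of::<i64>()
}
```

Because `UtcLike` is zero-sized, storing it inside `Stamp` adds no bytes, yet the trait implementation is still selected at compile time.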
In addition to providing a basis for stateless trait implementations, you may also see unit structs created simply to serve as a marker (as mentioned in the quote) to be used in some other structure. Some examples:
In std::marker, there are PhantomData and PhantomPinned which are used to augment other types in concert with the compiler for variance, auto-generated traits, or other behavior. These in particular are special cases and not really a usable pattern outside the standard library.
You can see marker types used in the typestate pattern, such as Rocket<P> in Rocket v0.5. It uses P simply as an indicator of what "phase" the application is in and provides certain methods only in specific phases (for example, you can't configure the application after it has started). Technically Build/Ignite/Orbit are empty structs and not unit structs, but that distinction isn't meaningful here.
You can also see marker types used similarly for defining Format options in the tracing-subscriber crate. The Compact type (a unit struct) is used in conjunction with it to provide different methods and behavior that are not tied to itself directly (Compact implements no traits, and has no methods; the specialization comes from Format<Compact, T> implementing FormatEvent).
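A stripped-down version of the typestate idea might look like the following. `App`, `Build`, and `Running` are invented for this sketch and are not Rocket's actual types.

```rust
use std::marker::PhantomData;

/// Marker types for the two phases; they carry no data.
pub struct Build;
pub struct Running;

/// An application whose available methods depend on `Phase`.
pub struct App<Phase> {
    port: u16,
    _phase: PhantomData<Phase>,
}

impl App<Build> {
    pub fn new() -> Self {
        App { port: 8000, _phase: PhantomData }
    }

    /// Configuration is only offered while building.
    pub fn port(mut self, port: u16) -> Self {
        self.port = port;
        self
    }

    /// Consumes the builder and moves to the running phase.
    pub fn launch(self) -> App<Running> {
        App { port: self.port, _phase: PhantomData }
    }
}

impl App<Running> {
    /// Only available once launched.
    pub fn address(&self) -> String {
        format!("127.0.0.1:{}", self.port)
    }
}
```

Calling `.port(...)` on an `App<Running>` would be rejected by the compiler, because that method is defined only in the `impl App<Build>` block.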
The rust book says:
We can also implement Summary on Vec<T> in our aggregator crate,
because the trait Summary is local to our aggregator crate.
https://doc.rust-lang.org/book/ch10-02-traits.html#implementing-a-trait-on-a-type
If my package uses another crate from crates.io, like rand, and rand implements a trait on a type in the standard library, like Vec<T>, will my code be able to see those methods?
I know there's a restriction where a trait has to be in scope for you to use its methods. If rand implemented a custom trait on Vec<T>, and I tried to use one of the methods in that trait within my crate, would the compiler tell me that I need to import that trait from rand before using those methods, or would it tell me that the methods don't exist? If it's the former, if I import the trait from rand, can I then use those methods on Vec<T>?
From my experimentation, if a crate implements a trait on a foreign type, that trait is accessible using the normal rules (that is, in order to call methods of that trait, you must bring it into scope, but otherwise, nothing special is required). You don't need to do anything else.
For example, consider the crate serde, which provides facilities to serialize and deserialize data. It provides the traits Serialize and Deserialize, which allow data structures to define how they are serialized and deserialized into various formats. Additionally, it provides implementations of those traits for many built-in and std types (see here and here). I made a quick experiment here, and my code can use those traits without me having to do anything extra (in fact, as you'll see, since I never directly use those traits, I don't even have to bring them into scope).
So, to the best of my knowledge, the answer to your question is yes, your code can use a trait implemented by rand for Vec<T>. All you need to do is import that trait from rand.
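The scoping rule can be tried out in a single file by letting a module play the role of the external crate. `other_crate` and `Describe` are hypothetical names for this sketch; rand does not provide them.

```rust
/// Stands in for an external crate that implements its own trait on Vec<T>.
mod other_crate {
    pub trait Describe {
        fn describe(&self) -> String;
    }

    impl<T> Describe for Vec<T> {
        fn describe(&self) -> String {
            format!("Vec with {} element(s)", self.len())
        }
    }
}

// Bringing the trait into scope is what makes `.describe()` callable.
use other_crate::Describe;

pub fn describe_numbers() -> String {
    vec![1, 2, 3].describe()
}
```

If the `use` line is removed, the call no longer compiles: the compiler reports that the method was not found and, in its help text, notes that the trait must be in scope and suggests the import.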