Serde Conditional Deserialization for Binary Formats (Versioning)


I'm trying to deserialize data in a simple, non-human-readable, non-self-describing format into Rust structs. I've implemented a custom Deserializer for this format, and it works great when I'm deserializing the data into a struct like this, for example:
#[derive(Serialize, Deserialize)]
pub struct Position {
x: f32,
z: f32,
y: f32,
}
However, let's say this Position struct gained a new field (or lost one) in a new version:
#[derive(Serialize, Deserialize)]
pub struct Position {
x: f32,
z: f32,
y: f32,
is_visible: bool, // This field was added in a new version
}
But I still need to support data from both versions of Position. The version of the data (known at runtime) can be given to the Deserializer, but how can the Deserializer know the version of a field (known at compile time)?
I've looked at #[serde(deserialize_with)] but it didn't work because I cannot get the needed version information.
I've also looked at implementing Deserialize manually for Position, and I can get the versions of the fields of Position by implementing something like Position::get_version(field_name: &str).
However, I cannot figure out how to get the version of the data currently being deserialized, because Deserialize::deserialize only has the trait bound Deserializer<'de>, and I cannot make that bound stricter by adding another bound (so it doesn't know about my custom Deserializer).
At this point, I'm thinking about giving the version data of each field when instantiating the Deserializer but I'm not sure if that will work or if there is a better way to go.

Multiple structs implementing a shared trait
If you have several different versions of several different structs and want a more robust way of handling the variants, it might be better to write a separate struct for each possible format. You can then define and implement a trait for the shared behavior.
trait Position {
fn x(&self) -> f32;
fn y(&self) -> f32;
fn z(&self) -> f32;
fn version_number(&self) -> usize;
}
struct PositionV0 {
x: f32,
y: f32,
z: f32
}
impl Position for PositionV0 {
fn x(&self) -> f32 {
self.x
}
fn y(&self) -> f32 {
self.y
}
fn z(&self) -> f32 {
self.z
}
fn version_number(&self) -> usize {
0
}
}
struct PositionV1 {
x: f32,
y: f32,
z: f32,
is_visible: bool,
}
impl Position for PositionV1 {
fn x(&self) -> f32 {
self.x
}
fn y(&self) -> f32 {
self.y
}
fn z(&self) -> f32 {
self.z
}
fn version_number(&self) -> usize {
1
}
}
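The answer doesn't show how the right struct gets chosen at runtime. A minimal std-only sketch of that step follows; it leaves serde out and assumes (none of this comes from the answer) that the data is three little-endian f32 values in x, y, z order, that is_visible is a single trailing byte, and that every version after 0 includes it. read_f32 and read_position are hypothetical helpers.

```rust
// Std-only sketch; the byte layout and version rules are assumptions, not part of the answer.
trait Position {
    fn x(&self) -> f32;
    fn y(&self) -> f32;
    fn z(&self) -> f32;
    fn version_number(&self) -> usize;
}

struct PositionV0 {
    x: f32,
    y: f32,
    z: f32,
}

#[allow(dead_code)] // is_visible is not exposed through the trait in this sketch
struct PositionV1 {
    x: f32,
    y: f32,
    z: f32,
    is_visible: bool,
}

impl Position for PositionV0 {
    fn x(&self) -> f32 { self.x }
    fn y(&self) -> f32 { self.y }
    fn z(&self) -> f32 { self.z }
    fn version_number(&self) -> usize { 0 }
}

impl Position for PositionV1 {
    fn x(&self) -> f32 { self.x }
    fn y(&self) -> f32 { self.y }
    fn z(&self) -> f32 { self.z }
    fn version_number(&self) -> usize { 1 }
}

// Hypothetical helper: reads a little-endian f32 at `offset`, or None if the input is too short.
fn read_f32(bytes: &[u8], offset: usize) -> Option<f32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes.get(offset..offset + 4)?);
    Some(f32::from_le_bytes(buf))
}

// Hypothetical helper: picks the concrete struct from the runtime version number.
fn read_position(bytes: &[u8], version: usize) -> Option<Box<dyn Position>> {
    let (x, y, z) = (read_f32(bytes, 0)?, read_f32(bytes, 4)?, read_f32(bytes, 8)?);
    let position: Box<dyn Position> = match version {
        0 => Box::new(PositionV0 { x, y, z }),
        // Assumed: every later version appends a one-byte `is_visible` flag.
        _ => Box::new(PositionV1 { x, y, z, is_visible: *bytes.get(12)? != 0 }),
    };
    Some(position)
}

fn main() {
    let mut bytes: Vec<u8> = Vec::new();
    for v in [1.0f32, 2.0, 3.0] {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    bytes.push(1); // is_visible = true, only present in version 1 data

    let old = read_position(&bytes[..12], 0).unwrap();
    let new = read_position(&bytes, 1).unwrap();
    println!("v{} y={}", old.version_number(), old.y());
    println!("v{} z={}", new.version_number(), new.z());
}
```

Returning Box<dyn Position> keeps callers version-agnostic; the trade-off is that version-specific fields such as is_visible are only reachable if the trait exposes them.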

Carson's answer is great when you don't have many versions, but I am working with data structures that span over 20 different versions.
I went with a solution that, while probably not the most idiomatic, can handle an arbitrary number of versions.
In short:
1. We implement a Version trait that gives the Deserializer the version information it needs.
2. The Deserializer uses a VersionedSeqAccess (which implements serde::de::SeqAccess) that sets a skip flag when a field is not present in the data's version.
3. When the flag is set, deserialize_option produces None for that field and immediately clears the flag.
The idea is to implement the following trait for the struct:
pub trait Version {
/// We must specify the name of the struct so that any of the fields that
/// are structs won't confuse the Deserializer
fn name() -> &'static str;
fn version() -> VersionInfo;
}
#[derive(Debug, Clone)]
pub enum VersionInfo {
/// Present in all versions
All,
/// Present in this version
Version([u16; 4]),
/// Represent Versions of structs
Struct(&'static [VersionInfo]),
// we can add other ways of expressing the version like a version range for ex.
}
Here is how it is implemented for the example struct Position. This kind of manual implementation is error-prone, so it can be replaced with a derive macro (see the end of this answer):
#[derive(Deserialize)]
struct Position {
x: f32,
z: f32,
y: f32,
is_visible: Option<bool>, // With this solution versioned field must be wrapped in Option
}
impl Version for Position {
fn version() -> VersionInfo {
VersionInfo::Struct(&[
VersionInfo::All,
VersionInfo::All,
VersionInfo::All,
VersionInfo::Version([1, 13, 0, 0]),
])
}
fn name() -> &'static str {
"Position"
}
}
Now the deserializer is instantiated with the version of the data format we are currently parsing:
pub struct Deserializer<'de> {
input: &'de [u8],
/// The version the `Deserializer` expects the data format to be
de_version: [u16; 4],
/// Versions of each field (only used when deserializing into a struct)
version_info: VersionInfo,
/// Whether to skip deserializing the current item. This flag is set by `VersionedSeqAccess`.
/// When set, the current item is deserialized to `None`
skip: bool,
/// Name of the struct we are deserializing into. We use this to make sure we call the correct
/// visitor for children of this struct that are also structs
name: &'static str,
}
pub fn from_slice<'a, T>(input: &'a [u8], de_version: [u16; 4]) -> Result<T, Error>
where
T: Deserialize<'a> + Version,
{
let mut deserializer = Deserializer::from_slice(input, de_version, T::version(), T::name());
let t = T::deserialize(&mut deserializer)?;
Ok(t)
}
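from_slice relies on a Deserializer::from_slice constructor that isn't shown. A minimal sketch, assuming it simply stores its arguments and starts with the skip flag cleared (the fields are the answer's; the constructor body is an assumption):

```rust
// VersionInfo as defined in the answer.
#[derive(Debug, Clone)]
pub enum VersionInfo {
    All,
    Version([u16; 4]),
    Struct(&'static [VersionInfo]),
}

pub struct Deserializer<'de> {
    input: &'de [u8],
    de_version: [u16; 4],
    version_info: VersionInfo,
    skip: bool,
    name: &'static str,
}

impl<'de> Deserializer<'de> {
    // Hypothetical constructor matching the call in `from_slice`.
    pub fn from_slice(
        input: &'de [u8],
        de_version: [u16; 4],
        version_info: VersionInfo,
        name: &'static str,
    ) -> Self {
        Deserializer {
            input,
            de_version,
            version_info,
            // Nothing is skipped until VersionedSeqAccess sets the flag.
            skip: false,
            name,
        }
    }
}

fn main() {
    let data = [0u8; 13];
    let de = Deserializer::from_slice(&data, [1, 12, 0, 0], VersionInfo::All, "Position");
    println!(
        "{}: {} bytes, version {:?}, info {:?}, skip = {}",
        de.name,
        de.input.len(),
        de.de_version,
        de.version_info,
        de.skip
    );
}
```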
Now that the deserializer has all the information it needs, this is how we define deserialize_struct:
fn deserialize_struct<V>(
self, name: &'static str, fields: &'static [&'static str], visitor: V,
) -> Result<V::Value, Self::Error>
where
V: Visitor<'de>,
{
if name == self.name {
if let VersionInfo::Struct(version_info) = self.version_info {
// Sanity check that Version was implemented consistently (a derive macro makes this reliable)
assert_eq!(version_info.len(), fields.len());
visitor.visit_seq(VersionedSeqAccess::new(self, fields.len(), version_info))
} else {
panic!("Struct must always have version info of `Struct` variant")
}
} else {
// This is for children structs of the main struct. We do not support versioning for those
visitor.visit_seq(SequenceAccess::new(self, fields.len()))
}
}
Here is how serde::de::SeqAccess will be implemented for VersionedSeqAccess:
struct VersionedSeqAccess<'a, 'de: 'a> {
de: &'a mut Deserializer<'de>,
version_info: &'static [VersionInfo],
len: usize,
curr: usize,
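}
impl<'a, 'de> VersionedSeqAccess<'a, 'de> {
fn new(de: &'a mut Deserializer<'de>, len: usize, version_info: &'static [VersionInfo]) -> Self {
VersionedSeqAccess { de, version_info, len, curr: 0 }
}
}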
impl<'de, 'a> SeqAccess<'de> for VersionedSeqAccess<'a, 'de> {
type Error = Error;
fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>, Error>
where
T: DeserializeSeed<'de>,
{
if self.curr == self.len {
// We iterated through all fields
Ok(None)
} else {
// Get version of the current field
let version = &self.version_info[self.curr];
self.de.version_info = version.clone();
// Set the skip flag if this field is not present in the data's version
if !is_correct_version(&self.de.de_version, version) {
self.de.skip = true;
}
self.curr += 1;
seed.deserialize(&mut *self.de).map(Some)
}
}
}
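is_correct_version is called above but never defined. One plausible implementation, assuming Version(v) means "this field was added in version v" (so it is present in data of version v or newer), could be:

```rust
// VersionInfo as defined in the answer.
#[derive(Debug, Clone)]
pub enum VersionInfo {
    All,
    Version([u16; 4]),
    Struct(&'static [VersionInfo]),
}

// Hypothetical implementation: assumes `Version(v)` means "field added in version v",
// so the field is present whenever the data version is v or newer.
fn is_correct_version(de_version: &[u16; 4], info: &VersionInfo) -> bool {
    match info {
        VersionInfo::All => true,
        // Arrays compare lexicographically, so [1, 12, 9, 9] < [1, 13, 0, 0].
        VersionInfo::Version(since) => de_version >= since,
        // Assumed: a nested struct field is always present; its fields are not versioned here.
        VersionInfo::Struct(_) => true,
    }
}

fn main() {
    let added_in_1_13 = VersionInfo::Version([1, 13, 0, 0]);
    for data_version in [[1, 12, 0, 0], [1, 13, 0, 0], [2, 0, 0, 0]] {
        let present = is_correct_version(&data_version, &added_in_1_13);
        println!("{:?}: present = {}", data_version, present);
    }
}
```

If your VersionInfo::Version really means "present only in exactly this version", the comparison becomes an equality check instead; a version range variant, as the answer suggests, would combine both bounds.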
The final piece of the puzzle is deserialize_option. If the current field is not present in the data format being parsed, VersionedSeqAccess will already have set the skip flag, so we clear it and produce None:
fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: Visitor<'de>,
{
if self.skip {
self.skip = false;
visitor.visit_none()
} else {
visitor.visit_some(self)
}
}
A lengthy solution, but it works great for my use case, which involves a lot of structs with lots of fields from different versions. Please do let me know how I can make this less verbose or otherwise better. I also implemented a derive macro (not shown here) for the Version trait so that I can write this:
#[derive(Debug, Clone, EventPrinter, Version)]
pub struct Position {
x: f32,
z: f32,
y: f32,
#[version([1, 13, 0, 0])]
is_visible: Option<bool>,
}
With this derive macro, I find that this solution scales well for my use case.

Related

Late type in Rust

I'm working with two crates: A and B. I control both. I'd like to create a struct in A that has a field whose type is known only to B (i.e., A is independent of B, but B is dependent on A).
crate_a:
#[derive(Clone)]
pub struct Thing {
pub foo: i32,
pub bar: *const i32,
}
impl Thing {
pub fn new(x: i32) -> Self {
Thing { foo: x, bar: &0 }
}
}
crate_b:
struct Value {}
fn func1() {
let mut x = A::Thing::new(1);
let y = Value {};
x.bar = &y as *const Value as *const i32;
...
}
fn func2() {
...
let y = unsafe { &*(x.bar as *const Value) };
...
}
This works, but it doesn't feel very "rusty". Is there a cleaner way to do this? I thought about using a trait object, but ran into issues with Clone.
Note: My reason for splitting these out is that the dependencies in B make compilation very slow. Value above is actually from llvm_sys. I'd rather not leak that into A, which has no other dependency on llvm.

The standard way to implement something like this is with generics, which are kind of like type variables: they can be "assigned" a particular type, possibly within some constraints. This is how the standard library can provide types like Vec that work with types that you declare in your crate.
Basically, generics allow Thing to be defined in terms of "some unknown type that will become known later when this type is actually used."
Given the example in your code, it looks like Thing's bar field may or may not be set, which suggests that the built-in Option enum should be used. All you have to do is put a type parameter on Thing and pass that through to Option, like so:
pub mod A {
#[derive(Clone)]
pub struct Thing<T> {
pub foo: i32,
pub bar: Option<T>,
}
impl<T> Thing<T> {
pub fn new(x: i32) -> Self {
Thing { foo: x, bar: None }
}
}
}
pub mod B {
use crate::A;
struct Value;
fn func1() {
let mut x = A::Thing::new(1);
let y = Value;
x.bar = Some(y);
// ...
}
fn func2(x: &A::Thing<Value>) {
// ...
let y: &Value = x.bar.as_ref().unwrap();
// ...
}
}
Here, the x in B::func1() has the type Thing<Value>. You can see with this syntax how Value is substituted for T, which makes the bar field Option<Value>.
If Thing's bar isn't actually supposed to be optional, just write pub bar: T instead, and accept a T in Thing::new() to initialize it:
pub mod A {
#[derive(Clone)]
pub struct Thing<T> {
pub foo: i32,
pub bar: T,
}
impl<T> Thing<T> {
pub fn new(x: i32, y: T) -> Self {
Thing { foo: x, bar: y }
}
}
}
pub mod B {
use crate::A;
struct Value;
fn func1() {
let x = A::Thing::new(1, Value);
// ...
}
fn func2(x: &A::Thing<Value>) {
// ...
let y: &Value = &x.bar;
// ...
}
}
Note that the definition of Thing in both of these cases doesn't actually require that T implement Clone; however, Thing<T> will only implement Clone if T also does. #[derive(Clone)] will generate an implementation like:
impl<T> Clone for Thing<T> where T: Clone { /* ... */ }
This can make your type more flexible: it can now be used in contexts that don't require T to implement Clone, while still being cloneable when T does implement Clone. You get the best of both worlds this way.
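A small std-only check of that behavior, using the second Thing<T> and a hypothetical NotClone type standing in for a type such as Value that may not implement Clone:

```rust
// Thing<T> from the second example above.
#[derive(Clone)]
pub struct Thing<T> {
    pub foo: i32,
    pub bar: T,
}

// Hypothetical type that deliberately does not implement Clone.
struct NotClone;

fn main() {
    // Allowed: building and using a Thing<T> does not require T: Clone.
    let a = Thing { foo: 1, bar: NotClone };
    // a.clone() would not compile, because NotClone does not implement Clone.

    // Thing<String> is Clone because String is Clone.
    let b = Thing { foo: 2, bar: String::from("llvm") };
    let c = b.clone();
    println!("{} {} {}", a.foo, c.foo, c.bar);
}
```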

Default constructor implementation in a trait

Point and Vec2 are defined with the same fields and exactly the same constructor function:
pub struct Point {
pub x: f32,
pub y: f32,
}
pub struct Vec2 {
pub x: f32,
pub y: f32,
}
impl Point {
pub fn new(x: f32, y: f32) -> Self {
Self { x, y }
}
}
impl Vec2 {
pub fn new(x: f32, y: f32) -> Self {
Self { x, y }
}
}
Is it possible to define a trait to implement the constructor function?
So far I have only found it possible to define the interface, since the internal fields are not known:
pub trait TwoDimensional {
fn new(x: f32, y: f32) -> Self;
}

You can certainly define such a trait, and implement it for your 2 structs, but you will have to write the implementation twice. Even though traits can provide default implementations for functions, the following won't work:
trait TwoDimensional {
fn new(x: f32, y: f32) -> Self {
Self {
x,
y,
}
}
}
The reason why is fairly simple. What happens if you implement this trait for i32 or () or an enum?
Traits fundamentally don't have information about the underlying data structure that implements them. Rust does not support classical OOP-style inheritance of fields, and trying to force it often leads to ugly, unidiomatic and less performant code.
If, however, you have a bunch of structs and want to essentially "write the same impl multiple times without copy/pasting", a macro might be useful. This pattern is common in the standard library, where certain functions are implemented for all integer types this way. For example:
macro_rules! impl_constructor {
($name:ty) => {
impl $name {
pub fn new(x: f32, y: f32) -> Self {
Self {
x, y
}
}
}
}
}
impl_constructor!(Point);
impl_constructor!(Vec2);
These macros expand at compile time, so if you do something invalid (e.g. impl_constructor!(i32)), you'll get a compilation error, since the macro expansion would contain i32 { x, y }.
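If the goal is specifically the asker's TwoDimensional trait, the same macro approach can implement the trait instead of an inherent method. A sketch; impl_two_dimensional and origin are illustrative names, not from the answer:

```rust
// The asker's trait, implemented by a macro instead of by hand.
pub trait TwoDimensional {
    fn new(x: f32, y: f32) -> Self;
}

pub struct Point {
    pub x: f32,
    pub y: f32,
}

pub struct Vec2 {
    pub x: f32,
    pub y: f32,
}

// Variation on impl_constructor!: accepts several types and implements the trait for each.
macro_rules! impl_two_dimensional {
    ($($name:ty),* $(,)?) => {
        $(
            impl TwoDimensional for $name {
                fn new(x: f32, y: f32) -> Self {
                    Self { x, y }
                }
            }
        )*
    };
}

impl_two_dimensional!(Point, Vec2);

// Generic code can now construct any type that implements the trait.
fn origin<T: TwoDimensional>() -> T {
    T::new(0.0, 0.0)
}

fn main() {
    let p = Point::new(1.0, 2.0);
    let v: Vec2 = origin();
    println!("p = ({}, {}), v = ({}, {})", p.x, p.y, v.x, v.y);
}
```

Because new is now a trait method, generic code bounded by T: TwoDimensional can call it, which an inherent method generated by impl_constructor! does not allow.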
Personally I only use a macro when there is really a large number of types that need an implementation. This is just personal preference however, there is no runtime difference between a hand-written and a macro-generated impl block.

Is there a memory efficient way to change the behavior of an inherent implementation?

Is there a memory-efficient way to change the behavior of an inherent implementation? At the moment, I can accomplish the change of behavior by storing a number of function pointers, which are then called by the inherent implementation. My difficulty is that there could potentially be a large number of such functions and a large number of objects that depend on these functions, so I'd like to reduce the amount of memory used. As an example, consider the code:
// Holds the data for some process
struct MyData {
x: f64,
y: f64,
fns: MyFns,
}
impl MyData {
// Create a new object
fn new(x: f64, y: f64) -> MyData {
MyData {
x,
y,
fns: CONFIG1,
}
}
// One of our functions
fn foo(&self) -> f64 {
(self.fns.f)(self.x, self.y)
}
// Other function
fn bar(&self) -> f64 {
(self.fns.g)(self.x, self.y)
}
}
// Holds the functions
struct MyFns {
f: fn(x: f64, y: f64) -> f64,
g: fn(x: f64, y: f64) -> f64,
}
// Some functions to use
fn add(x: f64, y: f64) -> f64 {
x + y
}
fn sub(x: f64, y: f64) -> f64 {
x - y
}
fn mul(x: f64, y: f64) -> f64 {
x * y
}
fn div(x: f64, y: f64) -> f64 {
x / y
}
// Create some configurations
const CONFIG1: MyFns = MyFns {
f: add,
g: mul,
};
const CONFIG2: MyFns = MyFns {
f: sub,
g: div,
};
fn main() {
// Create our structure
let mut data = MyData::new(1., 2.);
// Check our functions
println!(
"1: x={}, y={}, foo={}, bar={}",
data.x,
data.y,
data.foo(),
data.bar()
);
// Change the functions
data.fns = CONFIG2;
// Print the functions again
println!(
"2: x={}, y={}, foo={}, bar={}",
data.x,
data.y,
data.foo(),
data.bar()
);
// Change a single function
data.fns.f = add;
// Print the functions again
println!(
"3: x={}, y={}, foo={}, bar={}",
data.x,
data.y,
data.foo(),
data.bar()
);
}
This code allows the behavior of foo and bar to be changed by editing f and g. However, it is also not flexible. I'd rather use a boxed trait object Box<dyn Fn(f64, f64) -> f64>, but then I can't create default configurations like CONFIG1 and CONFIG2, because a Box cannot be used to create a constant. In addition, if we have a large number of functions and objects, I'd like to share the memory for their implementation. For function pointers this isn't a big deal, but for closures it is. Here, we can't create a constant Rc for the configuration to share the memory. Finally, we could have a static reference to a configuration, which would save memory, but then we could not change the individual functions. I'd rather have a situation where most of the time the memory for the functions is shared, but an object can hold its own memory and change the functions if desired.
I'm open to a better design if one is available. Ultimately, I'd like to change the behavior of foo and bar at runtime based on a function held, in some form or another, inside of MyData. Further, I'd like a way to do so where the memory is shared when possible and we have the ability to change an individual function and not just the entire configuration.

A plain dyn reference will work here: it allows references to values that implement a certain trait but whose concrete type is known only at runtime.
(This is exactly what you want for function pointers. Think of it as each function having its own special type, but all of them implementing a trait like Fn(f64, f64) -> f64.)
So your struct could be defined as:
struct MyData<'a> {
x: f64,
y: f64,
f: &'a dyn Fn(f64, f64) -> f64,
g: &'a dyn Fn(f64, f64) -> f64,
}
(Note that you need the lifetime specifier 'a to ensure that the lifetime of those references is not shorter than that of the struct itself.)
Then your impl could be like:
impl<'a> MyData<'a> {
// Create a new object
fn new(x: f64, y: f64) -> Self {
MyData {
x,
y,
f: &add, // f and g as in CONFIG1
g: &mul,
}
}
fn foo(&self) -> f64 {
(self.f)(self.x, self.y)
}
// etc...
}
Depending on how you want the default configurations to work, you could either make them additional inherent methods such as fn to_config2(&mut self), or you could make a separate struct holding just the function references and have a function that copies them into the MyData struct.
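A sketch of the second option, where Config is a hypothetical struct holding the shared function references. Each &dyn Fn field is a fat pointer (data pointer plus vtable pointer), so applying a configuration to MyData copies two such pointers and never the functions themselves, and a single field can still be pointed at a closure the caller owns:

```rust
struct MyData<'a> {
    x: f64,
    y: f64,
    f: &'a dyn Fn(f64, f64) -> f64,
    g: &'a dyn Fn(f64, f64) -> f64,
}

// Hypothetical config struct: only references, so copying it never duplicates the functions.
#[derive(Clone, Copy)]
struct Config<'a> {
    f: &'a dyn Fn(f64, f64) -> f64,
    g: &'a dyn Fn(f64, f64) -> f64,
}

fn add(x: f64, y: f64) -> f64 { x + y }
fn sub(x: f64, y: f64) -> f64 { x - y }
fn mul(x: f64, y: f64) -> f64 { x * y }
fn div(x: f64, y: f64) -> f64 { x / y }

// Unlike Box or Rc, references to functions can live in constants.
const CONFIG1: Config<'static> = Config { f: &add, g: &mul };
const CONFIG2: Config<'static> = Config { f: &sub, g: &div };

impl<'a> MyData<'a> {
    fn new(x: f64, y: f64) -> Self {
        MyData { x, y, f: CONFIG1.f, g: CONFIG1.g }
    }

    // Copy a whole configuration's references into this object.
    fn apply(&mut self, config: Config<'a>) {
        self.f = config.f;
        self.g = config.g;
    }

    fn foo(&self) -> f64 { (self.f)(self.x, self.y) }
    fn bar(&self) -> f64 { (self.g)(self.x, self.y) }
}

fn main() {
    // The closure is declared first so it outlives `data`, which borrows it.
    let scale = 10.0;
    let scaled_add = move |x: f64, y: f64| (x + y) * scale;

    let mut data = MyData::new(1.0, 2.0);
    println!("1: foo={}, bar={}", data.foo(), data.bar());
    data.apply(CONFIG2);
    println!("2: foo={}, bar={}", data.foo(), data.bar());
    // Override a single function with a closure, leaving `g` shared.
    data.f = &scaled_add;
    println!("3: foo={}, bar={}", data.foo(), data.bar());
}
```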

Can I implement a trait which adds information to an external type in Rust?

I just implemented a simple trait to keep the history of a struct property:
fn main() {
let mut weight = Weight::new(2);
weight.set(3);
weight.set(5);
println!("Current weight: {}. History: {:?}", weight.value, weight.history);
}
trait History<T: Copy> {
fn set(&mut self, value: T);
fn history(&self) -> &Vec<T>;
}
impl History<u32> for Weight {
fn set(&mut self, value: u32) {
self.history.push(self.value);
self.value = value;
}
fn history(&self) -> &Vec<u32> {
&self.history
}
}
pub struct Weight {
value: u32,
history: Vec<u32>,
}
impl Weight {
fn new(value: u32) -> Weight {
Weight {
value,
history: Vec::new(),
}
}
}
I don't expect this is possible, but could you add the History trait (or something equivalent) to something which doesn't already have a history property (like u32 or String), effectively tacking on some information about which values the variable has taken?

No. Traits cannot add data members to existing structures; only changing the structure's definition can do that. Wrapper structs or hash tables are the way to go.

No, traits can only contain behavior, not data. But you could make a struct.
If you could implement History for u32, you'd have to keep the entire history of every u32 object indefinitely, in case one day someone decided to call .history() on it. (Also, what would happen when you assign one u32 to another? Does its history come with it, or does the new value just get added to the list?)
Instead, you probably want to be able to mark specific u32 objects to keep a history. A wrapper struct, as red75prime's answer suggests, will work:
mod hist {
use std::mem;
pub struct History<T> {
value: T,
history: Vec<T>,
}
impl<T> History<T> {
pub fn new(value: T) -> Self {
History {
value,
history: Vec::new(),
}
}
pub fn set(&mut self, value: T) {
self.history.push(mem::replace(&mut self.value, value));
}
pub fn get(&self) -> T
where
T: Copy,
{
self.value
}
pub fn history(&self) -> &[T] {
&self.history
}
}
}
It's generic, so you can have a History<u32> or History<String> or whatever you want, but the get() method will only be implemented when the wrapped type is Copy.* Your Weight type could just be an alias for History<u32>.
Wrapping this code in a module is a necessary part of maintaining the abstraction. That means you can't write weight.value, you have to call weight.get(). If value were marked pub, you could assign directly to weight.value (bypassing set) and then history would be inaccurate.
As a side note, you almost never want &Vec<T> when you can use &[T], so I changed the signature of history(). Another thing you might consider is returning an iterator over the previous values (perhaps in reverse order) instead of a slice.
* A better way of getting the T out of a History<T> is to implement Deref and write *foo instead of foo.get().
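A sketch of that footnote's suggestion: the same History<T> with Deref in place of get(), plus the Weight alias mentioned above. Only Deref (not DerefMut) is implemented, so set() remains the only way to change the value and the history stays accurate:

```rust
mod hist {
    use std::mem;
    use std::ops::Deref;

    pub struct History<T> {
        value: T,
        history: Vec<T>,
    }

    impl<T> History<T> {
        pub fn new(value: T) -> Self {
            History { value, history: Vec::new() }
        }

        // Record the old value before replacing it.
        pub fn set(&mut self, value: T) {
            self.history.push(mem::replace(&mut self.value, value));
        }

        pub fn history(&self) -> &[T] {
            &self.history
        }
    }

    // Read-only access: `*h` yields the current value, and T's methods work via auto-deref.
    impl<T> Deref for History<T> {
        type Target = T;
        fn deref(&self) -> &T {
            &self.value
        }
    }
}

use hist::History;

type Weight = History<u32>;

fn main() {
    let mut weight = Weight::new(2);
    weight.set(3);
    weight.set(5);
    println!("Current weight: {}. History: {:?}", *weight, weight.history());
}
```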

How to do Type-Length-Value (TLV) serialization with Serde?

I need to serialize a class of structs according to the TLV format with Serde. TLV can be nested in a tree format.
The fields of these structs are serialized normally, much like bincode does, but before the field data I must include a tag (to be associated, ideally) and the length, in bytes, of the field data.
Ideally, Serde would recognize the structs that need this kind of serialization, probably by having them implement a TLV trait. This part is optional, as I can also explicitly annotate each of these structs.
So this question breaks down into 3 parts, in order of priority:
1) How do I get the length data (from Serde?) before the serialization of that data has been performed?
2) How do I associate tags with structs (though I guess I could also include tags inside the structs...)?
3) How do I make Serde recognize a class of structs and apply custom serialization?
Note that 1) is the (core) question here. I will post 2) and 3) as individual questions if 1) can be solved with Serde.

Brace yourself, long post. Also, by convention: I'm picking both the type and the length to be unsigned 4-byte big-endian integers. Let's start with the easy stuff:
How do I make Serde recognize a class of structs and apply custom serialization?
That's really a separate question, but you can either do that via the #[serde(serialize_with = …)] attributes, or in your serializer's fn serialize_struct(self, name: &'static str, _: usize) based on the name, depending on what exactly you have in mind.
How do I associate tags with structs (though I guess I could also include tags inside the structs..)?
This is a known limitation of serde, and the reason protobuf implementations typically aren't based on serde (take e.g. prost), but have their own derive proc macros that allow annotating structs and fields with the respective tags. You should probably do the same, as it's clean and fast. But since you asked about serde, I'll pick an alternative inspired by serde_protobuf: if you look at it from a weird angle, serde is just a visitor-based reflection framework. It will provide you with structure information about the type you're currently (de-)serializing, e.g. it'll tell you the type, name, and fields of the type you're visiting. All you need is a (user-supplied) function that maps from this type information to the tags. For example:
struct TLVSerializer<'a> {
ttf: &'a dyn Fn(TypeTagFor) -> u32,
…
}
impl<'a> Serializer for TLVSerializer<'a> {
fn serialize_bool(self, v: bool) -> Result<Self::Ok, Self::Error> {
let tag = &(self.ttf)(TypeTagFor::Bool).to_be_bytes();
let len = &1u32.to_be_bytes();
todo!("write");
}
fn serialize_i32(self, v: i32) -> Result<Self::Ok, Self::Error> {
let tag = &(self.ttf)(TypeTagFor::Int {
signed: true,
width: 4,
})
.to_be_bytes();
let len = &4u32.to_be_bytes();
todo!("write");
}
}
Then, you need to write a function that supplies the tags, e.g. something like:
enum TypeTagFor {
Bool,
Int { width: u8, signed: bool },
Struct { name: &'static str },
// ...
}
fn foobar_type_tag_for(ttf: TypeTagFor) -> u32 {
match ttf {
TypeTagFor::Int {
width: 4,
signed: true,
} => 0x69333200,
TypeTagFor::Bool => 0x626f6f6c,
_ => unreachable!(),
}
}
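The two constants in foobar_type_tag_for are four ASCII bytes when written big-endian ("i32\0" and "bool"), which makes them easy to recognize in a hex dump. A quick std-only check, reusing the answer's enum and function:

```rust
#[allow(dead_code)] // Struct is listed for completeness but not used in this check
enum TypeTagFor {
    Bool,
    Int { width: u8, signed: bool },
    Struct { name: &'static str },
}

fn foobar_type_tag_for(ttf: TypeTagFor) -> u32 {
    match ttf {
        TypeTagFor::Int { width: 4, signed: true } => 0x69333200,
        TypeTagFor::Bool => 0x626f6f6c,
        _ => unreachable!(),
    }
}

fn main() {
    // Big-endian bytes of each tag, as the serializer writes them.
    let int_tag = foobar_type_tag_for(TypeTagFor::Int { width: 4, signed: true }).to_be_bytes();
    let bool_tag = foobar_type_tag_for(TypeTagFor::Bool).to_be_bytes();
    assert_eq!(&int_tag, b"i32\0");
    assert_eq!(&bool_tag, b"bool");
    println!("{:?} {:?}", int_tag, bool_tag);
}
```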
If you only have one set of type → tag mappings, you could also put it into the serializer directly.
How do I get the length data (from Serde?) before the serialization of that data has been performed?
The short answer is: you can't. The length can't be known without inspecting the entire structure (there could be Vecs in it, e.g.). But that also tells you what you need to do: inspect the entire structure first, deduce the length, and then do the serialization. And you have precisely one method for inspecting the entire structure at hand: serde. So you'll write a serializer that doesn't actually serialize anything and only records the length:
const HEADER_LEN: usize = 8; // 4-byte type tag + 4-byte length
struct TLVLenVisitor;
impl Serializer for TLVLenVisitor {
type Ok = usize;
type SerializeSeq = TLVLenSumVisitor;
fn serialize_i32(self, _v: i32) -> Result<Self::Ok, Self::Error> {
Ok(4)
}
fn serialize_str(self, str: &str) -> Result<Self::Ok, Self::Error> {
Ok(str.len())
}
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
Ok(TLVLenSumVisitor { sum: 0 })
}
}
struct TLVLenSumVisitor {
sum: usize,
}
impl serde::ser::SerializeSeq for TLVLenSumVisitor {
type Ok = usize;
fn serialize_element<T: Serialize + ?Sized>(&mut self, value: &T) -> Result<(), Self::Error> {
// The length of a sequence is the length of all its parts, plus the bytes for type tag and length
self.sum += value.serialize(TLVLenVisitor)? + HEADER_LEN;
Ok(())
}
fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(self.sum)
}
}
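The length arithmetic itself can be checked without serde. In this std-only sketch, a hypothetical Node tree stands in for what serde would visit, and content_len mirrors TLVLenVisitor: 4 bytes for an i32, the byte length of a string, and for a sequence the sum of each element's length plus HEADER_LEN:

```rust
// Hypothetical value tree standing in for what serde would walk.
enum Node {
    Int(i32),
    Str(String),
    Seq(Vec<Node>),
}

// 4-byte type tag + 4-byte length, as in the answer's convention.
const HEADER_LEN: usize = 8;

// Byte length of a node's content, excluding its own header (mirrors TLVLenVisitor).
fn content_len(node: &Node) -> usize {
    match node {
        Node::Int(_) => 4,
        Node::Str(s) => s.len(),
        // Each element contributes its content plus its own header.
        Node::Seq(items) => items.iter().map(|item| content_len(item) + HEADER_LEN).sum(),
    }
}

fn main() {
    let tree = Node::Seq(vec![Node::Int(7), Node::Str("abc".to_string())]);
    // (4 + 8) + (3 + 8) = 23
    println!("{}", content_len(&tree));
}
```

When a serializer calls content_len again for every nested element, as TLVSeqSerializer does with TLVLenVisitor, inner nodes are measured once per enclosing level, which is the multiple-pass cost discussed below.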
Fortunately, serialization is non-destructive, so you can use this first serializer to get the length, and then do the actual serialization in a second pass:
let len = foobar.serialize(TLVLenVisitor).unwrap();
foobar.serialize(TLVSerializer {
target: &mut File::create("foobar").unwrap(), // No seeking performed on the file
len,
ttf: &foobar_type_tag_for,
})
.unwrap();
Since you already know the length of what you're serializing, the second serializer is relatively straightforward:
struct TLVSerializer<'a> {
target: &'a mut dyn Write, // Using dyn to reduce verbosity of the example
len: usize,
ttf: &'a dyn Fn(TypeTagFor) -> u32,
}
impl<'a> Serializer for TLVSerializer<'a> {
type Ok = ();
type SerializeSeq = TLVSeqSerializer<'a>;
// Glossing over error handling here.
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
self.target
.write_all(&(self.ttf)(TypeTagFor::Seq).to_be_bytes())
.unwrap();
// Normally, there'd be no way to find the length here.
// But since TLVSerializer has been told, there's no problem
self.target
.write_all(&u32::try_from(self.len).unwrap().to_be_bytes())
.unwrap();
Ok(TLVSeqSerializer {
target: self.target,
ttf: self.ttf,
})
}
}
The only snag you may hit is that the TLVLenVisitor only gave you one length. But you have many TLV-structures, recursively nested. When you want to write out one of the nested structures (e.g. a Vec), you just run the TLVLenVisitor again, for each element.
struct TLVSeqSerializer<'a> {
target: &'a mut dyn Write,
ttf: &'a dyn Fn(TypeTagFor) -> u32,
}
impl<'a> serde::ser::SerializeSeq for TLVSeqSerializer<'a> {
type Ok = ();
fn serialize_element<T: Serialize + ?Sized>(&mut self, value: &T) -> Result<(), Self::Error> {
value.serialize(TLVSerializer {
// Getting the length of a subfield here
len: value.serialize(TLVLenVisitor)?,
target: self.target,
ttf: self.ttf,
})
}
fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(())
}
}
This also means that you may have to do many passes over the structure you're serializing. This might be fine if speed is not of the essence and you're memory-constrained, but in general, I don't think it's a good idea. You may be tempted to try to get all the lengths in the entire structure in a single pass, which can be done, but it'll either be brittle (since you'd have to rely on visiting order) or difficult (because you'd have to build a shadow structure which contains all the lengths).
Also, note that this approach assumes that two serializer invocations on the same struct traverse the same structure. But an implementer of Serialize is perfectly capable of generating random data on the fly or mutating itself via interior mutability, which would make this serializer generate invalid data. You can ignore that problem since it's far-fetched, or add a check to the end call and make sure the written length matches the length of the data actually written.
Really, I think it'd be best if you didn't worry about finding the length before serialization and instead wrote the serialization result to memory first. To do so, you can first write every length field as a dummy value to a Vec<u8>:
struct TLVSerializer<'a> {
target: &'a mut Vec<u8>,
ttf: &'a dyn Fn(TypeTagFor) -> u32,
}
impl<'a> Serializer for TLVSerializer<'a> {
type Ok = ();
type SerializeSeq = TLVSeqSerializer<'a>;
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
let idx = self.target.len();
self.target
.extend((self.ttf)(TypeTagFor::Seq).to_be_bytes());
// Writing dummy length here
self.target.extend(u32::MAX.to_be_bytes());
Ok(TLVSeqSerializer {
target: self.target,
idx,
ttf: self.ttf,
})
}
}
Then after you serialize the content and know its length, you can overwrite the dummies:
struct TLVSeqSerializer<'a> {
target: &'a mut Vec<u8>,
idx: usize, // This is how it knows where it needs to write the length
ttf: &'a dyn Fn(TypeTagFor) -> u32,
}
impl<'a> serde::ser::SerializeSeq for TLVSeqSerializer<'a> {
type Ok = ();
fn serialize_element<T: Serialize + ?Sized>(&mut self, value: &T) -> Result<(), Self::Error> {
value.serialize(TLVSerializer {
target: self.target,
ttf: self.ttf,
})
}
fn end(self) -> Result<Self::Ok, Self::Error> {
end(self.target, self.idx)
}
}
fn end(target: &mut Vec<u8>, idx: usize) -> Result<(), std::fmt::Error> {
let len = u32::try_from(target.len() - idx - HEADER_LEN)
.unwrap()
.to_be_bytes();
target[idx + 4..][..4].copy_from_slice(&len);
Ok(())
}
And there you go: single-pass TLV serialization with serde.
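The dummy-length-then-patch idea can also be seen in isolation. This std-only sketch leaves serde out: a hypothetical Node tree stands in for the values serde would visit, and TAG_I32/TAG_SEQ are made-up tags (ASCII "i32\0" and "seq\0"), using the same 4-byte tag plus 4-byte big-endian length header as the answer:

```rust
use std::convert::TryFrom;

// Hypothetical value tree standing in for what serde would visit.
enum Node {
    Int(i32),
    Seq(Vec<Node>),
}

// 4-byte type tag + 4-byte length.
const HEADER_LEN: usize = 8;
// Made-up tags: ASCII "i32\0" and "seq\0".
const TAG_I32: u32 = 0x6933_3200;
const TAG_SEQ: u32 = 0x7365_7100;

fn write_tlv(node: &Node, out: &mut Vec<u8>) {
    match node {
        Node::Int(v) => {
            out.extend_from_slice(&TAG_I32.to_be_bytes());
            out.extend_from_slice(&4u32.to_be_bytes());
            out.extend_from_slice(&v.to_be_bytes());
        }
        Node::Seq(items) => {
            // Remember where this header starts so the length can be patched later.
            let idx = out.len();
            out.extend_from_slice(&TAG_SEQ.to_be_bytes());
            out.extend_from_slice(&u32::MAX.to_be_bytes()); // dummy length
            for item in items {
                write_tlv(item, out);
            }
            // Content length = everything written after this node's header.
            let len = u32::try_from(out.len() - idx - HEADER_LEN).expect("TLV content too long");
            out[idx + 4..idx + 8].copy_from_slice(&len.to_be_bytes());
        }
    }
}

fn main() {
    let tree = Node::Seq(vec![Node::Int(1), Node::Seq(vec![Node::Int(2)])]);
    let mut out = Vec::new();
    write_tlv(&tree, &mut out);
    // 8 (outer header) + 12 (Int) + 20 (inner Seq: 8 header + 12 Int) = 40 bytes
    println!("{} bytes", out.len());
}
```

Each sequence remembers where its header starts, writes u32::MAX as a placeholder, and overwrites it once its children are written, so the whole tree is produced in one pass without knowing any length up front.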
