Serialize and Deserialize from Vec<u32> - rust

I have a struct of 5 u32s that implements serialize/deserialize by simply serializing: (s.first, s.second, s.third, s.fourth, s.fifth).
However this needs to be packed and unpacked from a flat buffer of Vec<u32> or an Option<Vec<u32>> that represents the data: essentially every 5 u32s is a new struct. I keep struggling with the visitor implementation. Is there an easy way to do this while sharing code between the Option and non-Option cases?
I really want to do impl Serialize for Vec<MyType> (and Deserialize) but that doesn't work.

I ended up abandoning my Serialize and Deserialize impls and went with #[serde(with = "my_mod")] for the Vec<MyType> case.
For the Option<Vec<MyType>> case I ended up creating wrapper types that inverted the relationship, so that what I was really serializing/deserializing was Option<Wrapper>, where Wrapper holds the Vec<T>.
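The helpers inside such a `my_mod` ultimately boil down to chunking: every 5 u32s in the flat buffer is one struct. A sketch of that packing logic with the serde plumbing omitted (`MyType` and its field names are assumed from the question):

```rust
// Hedged sketch: MyType and its fields are placeholders for the asker's
// struct; pack/unpack show only the chunking, not the serde glue.
#[derive(Debug, PartialEq)]
struct MyType {
    first: u32,
    second: u32,
    third: u32,
    fourth: u32,
    fifth: u32,
}

// Flatten each struct into 5 consecutive u32s.
fn pack(items: &[MyType]) -> Vec<u32> {
    items
        .iter()
        .flat_map(|s| [s.first, s.second, s.third, s.fourth, s.fifth])
        .collect()
}

// Rebuild one struct from every exact chunk of 5 u32s.
fn unpack(buf: &[u32]) -> Vec<MyType> {
    buf.chunks_exact(5)
        .map(|c| MyType {
            first: c[0],
            second: c[1],
            third: c[2],
            fourth: c[3],
            fifth: c[4],
        })
        .collect()
}

fn main() {
    let items = vec![MyType { first: 1, second: 2, third: 3, fourth: 4, fifth: 5 }];
    let buf = pack(&items);
    assert_eq!(buf, vec![1, 2, 3, 4, 5]);
    assert_eq!(unpack(&buf), items);
}
```

Inside the `my_mod` serialize/deserialize functions you would then just serialize/deserialize the flat Vec<u32> produced by this chunking.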

Rust "random access" iterator

I would like to write an iterator for objects that implement a trait that allows accessing items by index. Something like this:
trait Collection {
fn get_item(&self, index: usize) -> &Item;
fn get_item_mut(&mut self, index: usize) -> &mut Item;
}
I know how to implement the next() function for both Iter and IterMut. My question is: which other Iterator methods do I need to implement to make the iterators as efficient as possible? What I mean is that, for example, nth() should go directly to the item instead of calling next() until it reaches it. I'd like to know the minimum number of functions I need to implement to make it work as fast as a slice iterator.
I've googled about this but can't find anything concrete on this topic.
On nightly, you can implement advance_by(). This will give you an efficient nth() for free, as well as other code that uses it in the standard library.
On stable, you need to implement nth(). And of course, ExactSizeIterator and size_hint(). For an extra bit of performance you can also implement TrustedLen on nightly, allowing e.g. collect::<Vec<_>>() to take advantage of the fact that the number of items yielded is guaranteed.
Unfortunately, you can never be as efficient as slice iterators, because they implement various std-private traits (e.g. TrustedRandomAccess) and libstd uses those traits with various specializations to make them more efficient in various situations.
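To make the stable advice concrete, here is a sketch of an index-based iterator that overrides nth() and size_hint() so skipping is O(1). The trait is simplified from the question (a len() method is assumed, and Item is fixed to u32 for the example):

```rust
// Simplified stand-in for the question's trait; len() is an assumed
// addition needed to know when iteration ends.
trait Collection {
    fn get_item(&self, index: usize) -> &u32;
    fn len(&self) -> usize;
}

struct Iter<'a, C: Collection> {
    collection: &'a C,
    front: usize,
}

impl<'a, C: Collection> Iterator for Iter<'a, C> {
    type Item = &'a u32;

    fn next(&mut self) -> Option<&'a u32> {
        self.nth(0)
    }

    // O(1) skip: jump straight to the requested index instead of
    // repeatedly calling next().
    fn nth(&mut self, n: usize) -> Option<&'a u32> {
        let i = self.front.checked_add(n)?;
        if i < self.collection.len() {
            self.front = i + 1;
            Some(self.collection.get_item(i))
        } else {
            self.front = self.collection.len();
            None
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let rem = self.collection.len() - self.front;
        (rem, Some(rem))
    }
}

// size_hint is exact, so ExactSizeIterator is a free marker impl.
impl<'a, C: Collection> ExactSizeIterator for Iter<'a, C> {}

impl Collection for Vec<u32> {
    fn get_item(&self, index: usize) -> &u32 { &self[index] }
    fn len(&self) -> usize { Vec::len(self) }
}

fn main() {
    let v = vec![10, 20, 30, 40];
    let mut it = Iter { collection: &v, front: 0 };
    assert_eq!(it.nth(2), Some(&30)); // jumps directly to index 2
    assert_eq!(it.next(), Some(&40));
    assert_eq!(it.next(), None);
}
```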

Is allowing library users to embed arbitrary data in your structures a correct usage of std::mem::transmute?

A library I'm working on stores various data structures in a graph-like manner.
I'd like to let users store metadata ("annotations") in nodes, so they can retrieve them later. Currently, they have to create their own data structure which mirrors the library's, which is very inconvenient.
I'm placing very little constraints on what an annotation can be, because I do not know what the users will want to store in the future.
The rest of this question is about my current attempt at solving this use case, but I'm open to completely different implementations as well.
User annotations are represented with a trait:
pub trait Annotation {
fn some_important_method(&self)
}
This trait contains a few methods (all on &self) which are important for the domain, but these are always trivial to implement for users. The real data of an annotation implementation cannot be retrieved this way.
I can store a list of annotations this way:
pub struct Node {
// ...
annotations: Vec<Box<dyn Annotation>>,
}
I'd like to let the user retrieve whatever implementation they previously added to a list, something like this:
impl Node {
fn annotations_with_type<T>(&self) -> Vec<&T>
where
T: Annotation,
{
// ??
}
}
I originally aimed to convert dyn Annotation to dyn Any, then use downcast_ref, however trait upcasting coercion is unstable.
Another solution would be to require each Annotation implementation to store its TypeId, compare it with annotations_with_type's type parameter's TypeId, and std::mem::transmute the resulting &dyn Annotation to &T… but the documentation of transmute is quite scary and I honestly don't know whether that's one of the allowed cases in which it is safe. I definitely would have done some kind of void * in C.
Of course it's also possible that there's a third (safe) way to go through this. I'm open to suggestions.
What you are describing is commonly solved by TypeMaps, allowing a type to be associated with some data.
If you are open to using a library, you might consider looking into using an existing implementation, such as https://crates.io/crates/typemap_rev, to store data. For example:
struct MyAnnotation;
impl TypeMapKey for MyAnnotation {
type Value = String;
}
let mut map = TypeMap::new();
map.insert::<MyAnnotation>("Some Annotation");
If you are curious: it internally uses a HashMap<TypeId, Box<dyn Any + Send + Sync>> to store the data. To retrieve data, it uses downcast_ref on the Any type, which is stable. This could also be a pattern to implement yourself if needed.
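A do-it-yourself version of that pattern fits in a few lines using only the standard library (this is a sketch of the idea, not the typemap_rev implementation):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Minimal type map: keyed by the TypeId of the stored value, with the
// value boxed as dyn Any and recovered via downcast_ref.
#[derive(Default)]
struct TypeMap {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl TypeMap {
    fn insert<T: 'static>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: 'static>(&self) -> Option<&T> {
        self.map.get(&TypeId::of::<T>())?.downcast_ref::<T>()
    }
}

fn main() {
    let mut map = TypeMap::default();
    map.insert(String::from("Some Annotation"));
    assert_eq!(map.get::<String>().map(String::as_str), Some("Some Annotation"));
    assert!(map.get::<u32>().is_none());
}
```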
You don't have to worry whether this is valid - because it doesn't compile (playground):
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:7:18
|
7 | _ = unsafe { std::mem::transmute::<&dyn Annotation, &i32>(&*v) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: source type: `&dyn Annotation` (128 bits)
= note: target type: `&i32` (64 bits)
The error message should be clear, I hope: &dyn Trait is a fat pointer, and has size 2*size_of::<usize>(). &T, on the other hand, is a thin pointer (as long as T: Sized), of size of only one usize, and you cannot transmute between types of different sizes.
You can work around that with transmute_copy(), but it will just make things worse: it will work, but it is unsound and is not guaranteed to work in any way. It may become UB in future Rust versions. This is because the only guaranteed thing (as of now) for &dyn Trait references is:
Pointers to unsized types are sized. The size and alignment is guaranteed to be at least equal to the size and alignment of a pointer.
Nothing guarantees the order of the fields. It can be (data_ptr, vtable_ptr) (as it is now, and thus transmute_copy() works) or (vtable_ptr, data_ptr). Nothing is even guaranteed about the contents: it may not contain a data pointer at all (though I doubt anybody will ever do something like that). transmute_copy() copies the data from the beginning, meaning that for the code to work the data pointer must be there and must come first (which it does). For the code to be sound this needs to be guaranteed (which it is not).
So what can we do? Let's check how Any does its magic:
// SAFETY: caller guarantees that T is the correct type
unsafe { &*(self as *const dyn Any as *const T) }
So it uses as for the conversion. Does it work? Certainly. And std can do that, because std can rely on things that are not guaranteed but hold in practice. We shouldn't. So, is it guaranteed?
I don't have a firm answer, but I'm pretty sure the answer is no. I have found no authoritative source that guarantees the behavior of casts from unsized to sized pointers.
Edit: @CAD97 pointed out on Zulip that the reference promises that *[const|mut] T as *[const|mut] V where V: Sized will be a pointer-to-pointer cast, which can be read as a guarantee that this will work.
But I still feel fine relying on that. Because, unlike transmute_copy(), people are doing it. In production. And there is no better way on stable. So the chance it will become undefined behavior is very low. It is much more likely to become defined.
Does a guaranteed way even exist? Well, yes and no. Yes, but only using the unstable pointer metadata API:
#![feature(ptr_metadata)]
let v: &dyn Annotation;
let v = v as *const dyn Annotation;
let v: *const T = v.to_raw_parts().0.cast::<T>();
let v: &T = unsafe { &*v };
In conclusion, if you can use nightly features, I would prefer the pointer metadata API just to be extra safe. But in case you can't, I think the cast approach is fine.
Last point, there may be a crate that already does that. Prefer that, if it exists.
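Putting the pieces together, here is a stable-Rust sketch of annotations_with_type using the TypeId check plus the as-cast discussed above. The type_id_of method is an assumed addition to the Annotation trait (each impl trivially returns its own TypeId), and Weight/Label are illustrative annotation types:

```rust
use std::any::TypeId;

// Assumed extra method: every implementor returns TypeId::of::<Self>().
pub trait Annotation {
    fn type_id_of(&self) -> TypeId;
}

struct Weight(u32);
impl Annotation for Weight {
    fn type_id_of(&self) -> TypeId { TypeId::of::<Weight>() }
}

struct Label(String);
impl Annotation for Label {
    fn type_id_of(&self) -> TypeId { TypeId::of::<Label>() }
}

pub struct Node {
    annotations: Vec<Box<dyn Annotation>>,
}

impl Node {
    fn annotations_with_type<T: Annotation + 'static>(&self) -> Vec<&T> {
        self.annotations
            .iter()
            .filter(|a| a.type_id_of() == TypeId::of::<T>())
            // SAFETY: the TypeId check above guarantees the concrete
            // type behind this trait object is T.
            .map(|a| unsafe { &*(a.as_ref() as *const dyn Annotation as *const T) })
            .collect()
    }
}

fn main() {
    let node = Node {
        annotations: vec![Box::new(Weight(3)), Box::new(Label("x".into()))],
    };
    let weights = node.annotations_with_type::<Weight>();
    assert_eq!(weights.len(), 1);
    assert_eq!(weights[0].0, 3);
}
```

The soundness of this rests on implementors returning the honest TypeId; sealing or providing the method via a blanket helper avoids lying impls.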

How to implement Serde Deserializer for wire protocol with multiple representations of strings?

I am trying to implement a Serde Serializer/Deserializer for the Kafka wire protocol. In this protocol, there are 4 different string representations. This poses a problem when implementing the deserializer: when deserialize_str is called to attempt to deserialize a string for a given message, there’s no way to know whether a string starts with i32 or a varint, since neither the deserialize_str method or the provided visitor provides any kind of metadata or type information that could be used to help make this decision.
My first thought was that I could implement a new type wrapper and use a custom deserialize implementation, but I now understand this doesn’t make sense because it needs to be generic over all deserializers, not just the deserializer I’m building. The wrapper still just asks the Deserializer to read a string.
I’m struggling to come up with a good solution here, and can’t find examples of other data formats that use multiple representations for a given data type.
Here's the EXTREMELY HACKY way that I'm planning on solving this, using a newtype wrapper and a custom Deserialize implementation. More specifically, my Deserialize impl will have its own Visitor, and we're going to abuse Visitor here.
First I was hoping that I could use TypeId on Visitor::Value, but Visitor doesn't have a static lifetime, so that doesn't work.
Instead, I'm using the "expected" error message as a type tag, using the serde::de::Expected trait:
fn deserialize_str<V>(self, visitor: V) -> Result<V::Value>
where
V: Visitor<'de>,
{
use serde::de::Expected;
let ty = format!("{}", &visitor as &dyn Expected);
if ty == "NullableString" {
// ... do nullable string deserialization logic here
}
// ...
}
This... works! But it feels extremely gross. Not totally satisfied, but going to proceed with this for now.
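The trick hinges on serde's Expected trait, whose Display impl forwards to the visitor's expecting() message. A std-only sketch with minimal stand-ins for Visitor/Expected (names mirror the real serde traits, but this is not serde itself):

```rust
use std::fmt;

// Stand-in for serde::de::Expected: Display delegates to expecting(),
// which is what makes the "error message as type tag" trick possible.
trait Expected {
    fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result;
}

impl fmt::Display for dyn Expected + '_ {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        self.expecting(f)
    }
}

struct NullableStringVisitor;

impl Expected for NullableStringVisitor {
    fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str("NullableString")
    }
}

fn main() {
    let visitor = NullableStringVisitor;
    // Inside deserialize_str, the Deserializer formats the visitor's
    // expecting() output and branches on the resulting string.
    let ty = format!("{}", &visitor as &dyn Expected);
    assert_eq!(ty, "NullableString");
}
```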

How can I sort fields in alphabetic order when serializing with serde?

I have an API that requires the object's fields to be sorted alphabetically because the struct has to be hashed.
In Java/Jackson, you can set a flag in the serializer: MapperFeature.SORT_PROPERTIES_ALPHABETICALLY. I can't find anything similar in Serde.
I'm using rmp-serde (MessagePack). It follows the annotations and serialization process used for JSON, so I thought that it would be fully compatible, but the sorting provided by @jonasbb doesn't work for it.
The struct has (a lot of) nested enums and structs which have to be flattened for the final representation. I'm using Serialize::serialize for that, but calling state.serialize_field at the right place (such that everything is alphabetical) is a pain because the enums need a match clause, so it has to be called multiple times for the same field in different places and the code is very difficult to follow.
As possible solutions, two ideas:
Create a new struct with the flat representation and sort the fields alphabetically manually.
This is a bit error prone, so a programmatic sorting solution for this flattened struct would be great.
Buffer the key values in Serialize::serialize (e.g. in a BTreeMap, which is sorted), and call state.serialize_field in a loop at the end.
The problem is that the values would have to be stored as some kind of Serialize trait object, but Serialize isn't object safe, so I wasn't able to figure out how to store them in the map.
How to sort HashMap keys when serializing with serde? is similar but not related because my question is about the sorting of the struct's fields/properties.
You are not saying which data format you are targeting. This makes it hard to find a solution, since some might not work in all cases.
This code works if you are using JSON (unless the preserve_order feature flag is used). The same would work for TOML by serializing into toml::Value as an intermediate step.
The solution will also work for other data formats, but it might result in a different serialization, for example, emitting the data as a map instead of struct-like.
fn sort_alphabetically<T: Serialize, S: serde::Serializer>(value: &T, serializer: S) -> Result<S::Ok, S::Error> {
let value = serde_json::to_value(value).map_err(serde::ser::Error::custom)?;
value.serialize(serializer)
}
#[derive(Serialize)]
struct SortAlphabetically<T: Serialize>(
#[serde(serialize_with = "sort_alphabetically")]
T
);
#[derive(Serialize, Deserialize, Default, Debug)]
struct Foo {
z: (),
bar: (),
ZZZ: (),
aAa: (),
AaA: (),
}
println!("{}", serde_json::to_string_pretty(&SortAlphabetically(&Foo::default()))?);
because the struct has to be hashed
While field order is one source of indeterminism, there are other factors too. Many formats allow different amounts of whitespace or different representations, like the Unicode escape \u0066.
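The alphabetical ordering here comes from serde_json's Map being backed by a BTreeMap when the preserve_order feature is off, and BTreeMap iterates keys in sorted byte order. A std-only snippet shows what that order looks like for the field names in Foo:

```rust
use std::collections::BTreeMap;

fn main() {
    // Same keys as the Foo struct above.
    let mut m = BTreeMap::new();
    for k in ["z", "bar", "ZZZ", "aAa", "AaA"] {
        m.insert(k, ());
    }
    let keys: Vec<&str> = m.keys().copied().collect();
    // Byte-wise ordering: uppercase ASCII letters sort before lowercase.
    assert_eq!(keys, ["AaA", "ZZZ", "aAa", "bar", "z"]);
}
```

Note the ordering is byte-wise, not case-insensitive alphabetical, which matters if the hashing API expects a specific collation.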

Indexing vector by a 32-bit integer

In Rust, vectors are indexed using usize, so when writing
let my_vec: Vec<String> = vec!["Hello".to_string(), "world".to_string()];
let index: u32 = 0;
println!("{}", my_vec[index]);
you get an error, as index is expected to be of type usize. I'm aware that this can be fixed by explicitly converting index to usize:
my_vec[index as usize]
but this is tedious to write. Ideally I'd simply overload the [] operator by implementing
impl<T> std::ops::Index<u32> for Vec<T> { ... }
but that's impossible, as Rust prohibits this (neither the trait nor the struct is local). The only alternative I can see is to create a wrapper type for Vec, but that would mean having to write lots of function wrappers as well. Is there any more elegant way to address this?
Without a clear use case it is difficult to recommend the best approach.
There are basically two questions here:
do you really need indexing?
do you really need to use u32 for indices?
When using functional programming style, indexing is generally unnecessary as you operate on iterators instead. In this case, the fact that Vec only implements Index for usize really does not matter.
If your algorithm really needs indexing, then why not use usize? There are many ways to convert from u32 to usize, converting at the last moment possible is one possibility, but there are other sites where you could do the conversion, and if you find a chokepoint (or create it) you can get away with only a handful of conversions.
At least, that's the YAGNI point of view.
Personally, as a type freak, I tend to wrap things around a lot. I just like to add semantic information, because let's face it Vec<i32> just doesn't mean anything.
Rust offers a simple way to create wrapper structures: struct MyType(WrappedType);. That's it.
Once you have your own type, adding indexing is easy. There are several ways to add other operations:
if only a few operations make sense, then adding explicitly is best.
if many operations are necessary, and you do not mind exposing the fact that underneath is a Vec<X>, then you can expose it:
by making it public: struct MyType(pub WrappedType);, users can then call .0 to access it.
by implementing AsRef and AsMut, or creating a getter.
by implementing Deref and DerefMut (which is implicit, make sure you really want to).
Of course, breaking encapsulation can be annoying later, as it also prevents the maintenance of invariants, so I would consider it a last ditch solution.
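If you do go the wrapper route, adding u32 indexing is only a few lines. A sketch (the U32Vec name is illustrative, not from the thread):

```rust
use std::ops::{Index, IndexMut};

// Thin newtype around Vec so we can implement Index<u32> ourselves,
// which the orphan rule forbids for Vec directly.
struct U32Vec<T>(Vec<T>);

impl<T> Index<u32> for U32Vec<T> {
    type Output = T;
    fn index(&self, index: u32) -> &T {
        &self.0[index as usize]
    }
}

impl<T> IndexMut<u32> for U32Vec<T> {
    fn index_mut(&mut self, index: u32) -> &mut T {
        &mut self.0[index as usize]
    }
}

fn main() {
    let v = U32Vec(vec!["Hello", "world"]);
    let index: u32 = 0;
    assert_eq!(v[index], "Hello");
}
```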
I prefer to store "references" to nodes as u32 rather than usize. So when traversing the graph I keep retrieving adjacent vertex "references", which I then use to look up the actual vertex object in the Vec object
So actually you don't want u32, because you will never do calculations on it, and u32 easily allows you to do math. You want an index-type that can just do indexing but whose values are immutable otherwise.
I suggest you implement something along the line of rustc_data_structures::indexed_vec::IndexVec.
This custom IndexVec type is not only generic over the element type, but also over the index type, and thus allows you to use a NodeId newtype wrapper around u32. You'll never accidentally use a non-id u32 to index, and you can use them just as easily as a u32. You don't even have to create any of these indices by calculating them from the vector length, instead the push method returns the index of the location where the element has just been inserted.
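A minimal sketch of that idea, with NodeId and IndexVec as simplified stand-ins for the rustc types (not the actual rustc implementation):

```rust
// Typed index: only ever produced by push(), so a stray u32 can never
// be used to index the vector by accident.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct NodeId(u32);

struct IndexVec<T> {
    raw: Vec<T>,
}

impl<T> IndexVec<T> {
    fn new() -> Self {
        IndexVec { raw: Vec::new() }
    }

    // push hands back the index of the freshly inserted element.
    fn push(&mut self, value: T) -> NodeId {
        let id = NodeId(self.raw.len() as u32);
        self.raw.push(value);
        id
    }
}

impl<T> std::ops::Index<NodeId> for IndexVec<T> {
    type Output = T;
    fn index(&self, id: NodeId) -> &T {
        &self.raw[id.0 as usize]
    }
}

fn main() {
    let mut nodes = IndexVec::new();
    let a = nodes.push("vertex a");
    let b = nodes.push("vertex b");
    assert_eq!(nodes[a], "vertex a");
    assert_eq!(nodes[b], "vertex b");
}
```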
