Length in serde serialize

Maybe I haven't read the documentation carefully enough, but I couldn't find an answer to the following question.
Given custom serialization logic defined like this:
pub fn serialize_foo<S>(t: &Foo, serializer: S) -> Result<S::Ok, S::Error>
where
    S: Serializer,
{
    let mut map = serializer.serialize_map(Some(len))?;
    /*
    ...
    */
    map.end()
}
What is len? Is it the size of the type in bytes, or is it something else? For example, what would it be for an i32?

From Serializer::serialize_map:
The argument is the number of elements in the map, which may or may not be computable before the map is iterated. Some serializers only support maps whose length is known up front.
So it is the number of times that you will call .serialize_entry() (or .serialize_key() and .serialize_value() pairs) on map.

Related

Serde serialize_seq with unknown len

The documentation for serialize_seq states
Begin to serialize a variably sized sequence. This call must be followed by zero or more calls to serialize_element, then a call to end.
The argument is the number of elements in the sequence, which may or may not be computable before the sequence is iterated. Some serializers only support sequences whose length is known up front.
I want to serialize a sequence whose length is unknown before iterating over the sequence.
The trivial example they give is:
use serde::ser::{Serialize, Serializer, SerializeSeq};

impl<T> Serialize for Vec<T>
where
    T: Serialize,
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for element in self {
            seq.serialize_element(element)?;
        }
        seq.end()
    }
}
How can I modify the code above to support a length which is not computable before the sequence is iterated?
Maybe I have misunderstood the documentation, but I would expect something similar to this to work:
use serde::ser::{Serialize, Serializer, SerializeSeq};

impl<T> Serialize for Vec<T>
where
    T: Serialize,
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let mut lazy_len = 0;
        let mut seq = serializer.serialize_seq(Some(lazy_len))?;
        for element in self {
            lazy_len += 2;
            seq.serialize_element(element)?;
        }
        seq.end()
    }
}
Which would give twice the length of the sequence.

The length parameter is an Option<usize> because maybe you have one and maybe you don't. You don't have one, so use None instead of Some.
let mut seq = serializer.serialize_seq(None)?;
As the note says, some serializers require a length and others don't.
Serializers will use the value that you pass to serialize_seq to (for example) allocate a buffer immediately. They get better performance by doing this once and then serializing each element into that memory. Extending the length with each item would not be at all beneficial since they will need to extend the buffer each time too, which is exactly what they'll do if you pass None for the length.
This should work fine with JSON, but some other more efficient formats like Bincode might refuse to serialize without a length. In particular, some encoding formats will serialize the length as part of the format before the values.
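To illustrate why a length-prefixed format cares about the argument while a delimiter-based one does not, here is a minimal sketch using only the standard library. Both encoders are hypothetical toys written for this explanation (they are not how JSON or Bincode serializers are actually implemented), and the 8-byte little-endian prefix is an assumption chosen for the example.

```rust
/// Toy delimiter-based encoder (hypothetical): elements are written as they
/// arrive and the closing bracket marks the end, so no length is needed.
fn encode_delimited(items: impl Iterator<Item = u8>) -> String {
    let mut out = String::from("[");
    for (i, b) in items.enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str(&b.to_string());
    }
    out.push(']');
    out
}

/// Toy length-prefixed encoder (hypothetical): the length is written before
/// the values, so it has to be known up front. For brevity it does not check
/// that the iterator really yields `len` items.
fn encode_length_prefixed(
    len: Option<usize>,
    items: impl Iterator<Item = u8>,
) -> Result<Vec<u8>, String> {
    let len = len.ok_or_else(|| String::from("sequence length required"))?;
    // Knowing the length also allows a single allocation.
    let mut out = Vec::with_capacity(8 + len);
    out.extend_from_slice(&(len as u64).to_le_bytes());
    out.extend(items);
    Ok(out)
}
```

Passing None to the first encoder is harmless, while the second one has nothing to write in the prefix and returns an error.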
Maybe I have misunderstood the documentation, but I would expect something similar to this to work:
...
let mut lazy_len = 0;
let mut seq = serializer.serialize_seq(Some(lazy_len))?;
for element in self {
    lazy_len += 2;
    seq.serialize_element(element)?;
}
...
Integers are Copy types, so serializer.serialize_seq(Some(lazy_len)) just copies the value of lazy_len and mutating lazy_len afterwards will have no effect.
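A minimal sketch of that point, using only the standard library. Here take_len is a hypothetical stand-in for serialize_seq (it only records the value it was given), and lazy_len_demo mirrors the loop from the question.

```rust
/// Hypothetical stand-in for `serialize_seq`: it receives the length by value
/// and simply reports what it saw.
fn take_len(len: Option<usize>) -> Option<usize> {
    len
}

/// Mirrors the loop from the question.
/// Returns (the length the callee saw, the final value of the counter).
fn lazy_len_demo(items: &[u8]) -> (Option<usize>, usize) {
    let mut lazy_len = 0;
    // `usize` is `Copy`, so `Some(lazy_len)` holds a copy of the current value (0).
    let seen = take_len(Some(lazy_len));
    for _ in items {
        // Only the local variable changes; the copy handed over earlier does not.
        lazy_len += 2;
    }
    (seen, lazy_len)
}
```

Calling lazy_len_demo(&[1, 2, 3]) returns (Some(0), 6): the callee only ever saw 0.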

What is the difference between TryFrom<&[T]> and TryFrom<Vec<T>>?

There seem to be two ways to try to turn a vector into an array, either via a slice (fn a) or directly (fn b):
use std::array::TryFromSliceError;
use std::convert::TryInto;

type Input = Vec<u8>;
type Output = [u8; 1000];

// Rust 1.47
pub fn a(vec: Input) -> Result<Output, TryFromSliceError> {
    vec.as_slice().try_into()
}

// Rust 1.48
pub fn b(vec: Input) -> Result<Output, Input> {
    vec.try_into()
}
Practically speaking, what's the difference between these? Is it just the error type? The fact that the latter was added makes me wonder whether there's more to it than that.

They have slightly different behavior.
The slice to array implementation will copy the elements from the slice. It has to copy instead of move because the slice doesn't own the elements, which is why that implementation requires T: Copy.
The Vec to array implementation will consume the Vec and move its contents to the new array. It can do this because it does own the elements.
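The difference becomes visible with an element type that is not Copy. The following is a small sketch with hypothetical helper names (via_slice, via_vec) and arbitrary array sizes chosen for the example.

```rust
use std::convert::TryFrom;

/// Slice -> array: the elements are copied out of the borrowed slice,
/// so the element type must be `Copy` and the source stays usable.
fn via_slice(slice: &[u8]) -> Option<[u8; 3]> {
    <[u8; 3]>::try_from(slice).ok()
}

/// Vec -> array: the vector is consumed and its elements are moved,
/// so this also works for non-`Copy` types such as `String`.
/// On a length mismatch the original vector is handed back as the error.
fn via_vec(vec: Vec<String>) -> Result<[String; 2], Vec<String>> {
    <[String; 2]>::try_from(vec)
}
```

Note the error types: the slice version reports a TryFromSliceError, while the Vec version returns the untouched vector so the caller can keep using it.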

How to use std::iter::Iterator::map for tree-like structures in Rust?

As I understand it, the idiomatic way to apply a function to each element of a structure in Rust is to implement IntoIterator and FromIterator and use map and collect. Like this:
enum F<A> {
    // fields omitted
}

impl<A> IntoIterator for F<A> {
    // implementation omitted
}

impl<A> FromIterator<A> for F<A> {
    // implementation omitted
}

// The closure needs its own type parameter (G here).
fn mapF<A, B, G>(x: F<A>, f: G) -> F<B>
where
    G: Fn(A) -> B,
{
    x.into_iter().map(f).collect()
}
However it doesn't seem possible to implement FromIterator for a tree, because there are multiple ways to organize a sequence of values into a tree. Is there some way around this?

the idiomatic way to apply a function to each element of a structure in Rust is to implement IntoIterator and FromIterator
This is not quite true. The idiomatic way is to provide an iterator, but you don't have to implement these traits.
Take for example &str: there isn't a canonical way to iterate on a string. You could iterate on its bytes or its characters, therefore it doesn't implement IntoIterator but has two methods bytes and chars returning a different type of iterator.
A tree would be similar: there isn't a single way to iterate a tree, so it could have a depth_first_search method returning a DepthFirstSearch iterator and a breadth_first_search method returning a BreadthFirstSearch iterator.
Similarly, a String can be constructed from an iterator of &str or an iterator of char, so String implements both FromIterator<&str> and FromIterator<char>, but it does not implement FromIterator<u8> because random bytes are unlikely to form a valid UTF-8 string.
That is, there isn't always a one-to-one relation between a collection, and its iterator.
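A short sketch of the string case described above, using only the standard library. The function names are hypothetical and exist only for this example.

```rust
/// Two different iterators over the same `&str`:
/// returns (number of bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.bytes().count(), s.chars().count())
}

/// `String` implements `FromIterator<char>`.
fn reversed(s: &str) -> String {
    s.chars().rev().collect()
}

/// `String` also implements `FromIterator<&str>`.
fn concatenated(parts: &[&str]) -> String {
    parts.iter().copied().collect()
}

/// There is no `FromIterator<u8>`: raw bytes go through an explicit,
/// fallible UTF-8 check instead.
fn from_bytes(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}
```

For a string such as "h\u{e9}llo" the two counts differ (6 bytes, 5 chars), which is exactly why &str offers two named iterator methods instead of a single IntoIterator implementation.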
and use […] collect
This is (mostly) incorrect. Collecting is not a good way to consume an iterator, unless you actually want to use the collected result afterwards. If you only want to execute the effect of an iterator, use for or the for_each method.
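For the tree case, here is a minimal sketch of what a named depth-first iterator could look like. The Tree type, the DepthFirst iterator and the map_values method are all hypothetical and chosen for illustration. map_values is one possible way to apply a function to every element while keeping the shape of the tree, which sidesteps FromIterator entirely.

```rust
/// A hypothetical binary tree, used only for illustration.
#[derive(Debug, PartialEq)]
enum Tree<A> {
    Leaf,
    Node(Box<Tree<A>>, A, Box<Tree<A>>),
}

/// Pre-order depth-first iterator over references, driven by an explicit stack.
struct DepthFirst<'a, A> {
    stack: Vec<&'a Tree<A>>,
}

impl<'a, A> Iterator for DepthFirst<'a, A> {
    type Item = &'a A;

    fn next(&mut self) -> Option<&'a A> {
        while let Some(tree) = self.stack.pop() {
            if let Tree::Node(left, value, right) = tree {
                // Push right first so that the left subtree is visited first.
                self.stack.push(&**right);
                self.stack.push(&**left);
                return Some(value);
            }
        }
        None
    }
}

impl<A> Tree<A> {
    /// One of several possible traversal orders, exposed as a named method.
    fn depth_first(&self) -> DepthFirst<'_, A> {
        DepthFirst { stack: vec![self] }
    }

    /// Structure-preserving map: rebuilds the same shape with new values.
    fn map_values<B, G: Fn(A) -> B>(self, f: &G) -> Tree<B> {
        match self {
            Tree::Leaf => Tree::Leaf,
            Tree::Node(l, v, r) => Tree::Node(
                Box::new((*l).map_values(f)),
                f(v),
                Box::new((*r).map_values(f)),
            ),
        }
    }
}
```

A breadth-first variant would look the same with a VecDeque used as a queue instead of the Vec used as a stack.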

You could include information about the tree structure in the iterator, something like:
impl<A> F<A> {
    pub fn path_iter(self) -> impl Iterator<Item = (TreePath, A)> { ... }
    // rest of impl
}

impl<A> FromIterator<(TreePath, A)> for F<A> {
    // implementation omitted
}

fn mapF<A, B, G>(x: F<A>, f: G) -> F<B>
where
    G: Fn(A) -> B,
{
    x.path_iter().map(|(path, value)| (path, f(value))).collect()
}
Here TreePath is a type specific to your tree. It is probably better for it to represent not the path itself but how to move to the next node.
I originally suggested implementing IntoIterator with Item = (TreePath, A) but on further thought the default iterator should still have Item = A.

Can I deserialize vectors with variable length prefix with Bincode?

I am having a problem with the Rust bincode library. When it serializes a vector, it always assumes the prefixed length is 8 bytes. This is a fine assumption when you always encode data using bincode, because bincode can read its own serialized data.
I am in a situation where I cannot influence the serializer, as I did not write it and it has to stay the same for legacy reasons. It encodes its vectors as a length-prefixed array where the prefix is always 2 bytes (in some cases it is 4 bytes, but I know these cases well; once I know how to do it with 2 bytes, 4 bytes should not be a problem).
How can I use bincode (and serde for that matter) to deserialize these fields? Can I work around the default 8 bytes of length hardcoded in bincode?

Bincode is not supposed to be compatible with any existing serializer or standard. Nor is, according to the comment, the format you are trying to read.
I suggest you get the bincode sources—they are MIT-licensed, so you are free to do basically whatever you please with them—and modify them to suit your format (and give it your name and include it in your project).
serde::Deserializer is quite well documented, as is the underlying data model, and the implementation in bincode is trivial to find (in de/mod.rs), so take it as your starting point and adjust as needed.

I have figured out a (possibly very ugly) way to do it without implementing my own deserializer — Bincode could do it after all. It looks something like this:
use serde::de::{Deserialize, Deserializer, SeqAccess, Visitor};
use std::fmt;

// Assumed definition (not shown in the original snippet): a newtype around the bytes.
pub struct VarLen16(pub Vec<u8>);

impl<'de> Deserialize<'de> for VarLen16 {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct VarLen16Visitor;

        impl<'de> Visitor<'de> for VarLen16Visitor {
            type Value = VarLen16;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("VarLen16")
            }

            fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
            where
                A: SeqAccess<'de>,
            {
                let mut res: Vec<u8> = vec![];
                // The first "tuple element" is the 2-byte length prefix.
                let length: u16 = seq
                    .next_element()?
                    .ok_or_else(|| serde::de::Error::invalid_length(1, &self))?;
                // Then read exactly `length` bytes.
                for _ in 0..length {
                    res.push(
                        seq.next_element()?
                            .ok_or_else(|| serde::de::Error::invalid_length(1, &self))?,
                    );
                }
                Ok(VarLen16(res))
            }
        }

        // Claim a tuple big enough for the length field plus the largest payload.
        deserializer.deserialize_tuple(1 << 16, VarLen16Visitor)
    }
}
In short, I make the system think I am deserializing a tuple whose length is the maximum I need. I have tested this, and it does not actually allocate that much memory. Then I act as if the length were part of this tuple: I read it first and then continue reading as far as this length tells me to. It's not pretty, but it certainly works.
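Stripped of serde, the visitor above performs a very simple read: a 2-byte length followed by that many bytes. Below is a standard-library sketch of that wire layout. The function name is hypothetical, and little-endian byte order is an assumption made for the example; the legacy format's byte order is not stated in the question.

```rust
/// Reads a 2-byte length prefix (little-endian is an assumption) followed by
/// that many payload bytes. Returns the payload and the unread remainder,
/// or `None` if the input is too short.
fn read_var_len16(input: &[u8]) -> Option<(Vec<u8>, &[u8])> {
    if input.len() < 2 {
        return None;
    }
    let (prefix, rest) = input.split_at(2);
    let len = u16::from_le_bytes([prefix[0], prefix[1]]) as usize;
    if rest.len() < len {
        return None;
    }
    let (payload, remaining) = rest.split_at(len);
    Some((payload.to_vec(), remaining))
}
```

The serde version does the same thing, except that the individual reads go through next_element so that the type can be embedded as a field in larger derived structures.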

How is from_raw_parts_mut able to transmute between types of different sizes?

I am looking at the code of from_raw_parts_mut:
pub unsafe fn from_raw_parts_mut<'a, T>(p: *mut T, len: usize) -> &'a mut [T] {
    mem::transmute(Repr { data: p, len: len })
}
It uses transmute to reinterpret a Repr to a &mut [T]. As far as I understand, Repr is a 128 bit struct. How does this transmute of differently sized types work?

mem::transmute() only works when transmuting to a type of the same size, which means that a &mut [T] slice is the same size as Repr.
Looking at Repr:
#[repr(C)]
struct Repr<T> {
    pub data: *const T,
    pub len: usize,
}
It has a pointer to some data and a length. This is exactly what a slice is - a pointer to an array of items (which might be an actual array, or owned by a Vec<T>, etc.) with a length to say how many items are valid.
The object which is passed around as a slice is (under the covers) exactly what the Repr looks like, even though the data it refers to can be anything from 0 to as many T as will fit into memory.
In Rust, some references are not implemented as just a pointer, as they are in some other languages. Some types are "fat pointers". This might not be obvious at first, especially if you are familiar with references/pointers in other languages! Some examples are:
Slices &[T] and &mut [T], which, as described above, are actually a pointer and a length. The length is needed for bounds checks. For example, you can pass a slice corresponding to part of an array or Vec to a function.
Trait objects like &Trait or Box<Trait> (written &dyn Trait and Box<dyn Trait> in current Rust), where Trait is a trait rather than a concrete type, are actually a pointer to the concrete value and a pointer to a vtable, which holds the information needed to call trait methods on the object, given that its concrete type is not known.
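The sizes can be checked directly. The sketch below uses only the standard library; the function names are hypothetical, and std::fmt::Debug merely serves as an arbitrary trait for the trait-object case.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Sizes in bytes of (a thin reference, a slice reference, a trait-object reference).
fn reference_sizes() -> (usize, usize, usize) {
    (
        size_of::<&u8>(),
        size_of::<&[u8]>(),
        size_of::<&dyn Debug>(),
    )
}

/// Takes a slice apart into its two components and puts it back together.
fn rebuild(slice: &[u32]) -> &[u32] {
    let (ptr, len) = (slice.as_ptr(), slice.len());
    // SAFETY: `ptr` and `len` come from a valid slice that outlives the result.
    unsafe { std::slice::from_raw_parts(ptr, len) }
}
```

On current compilers, reference_sizes() returns one machine word for &u8 and two words for both the slice and the trait-object reference, which matches the pointer-plus-length shape of Repr.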
