Serde serialize_seq with unknown len

Serde serialize_seq with unknown len - rust

The documentation for serialize_seq states
Begin to serialize a variably sized sequence. This call must be followed by zero or more calls to serialize_element, then a call to end.
The argument is the number of elements in the sequence, which may or may not be computable before the sequence is iterated. Some serializers only support sequences whose length is known up front.
I want to serialize a sequence whose length is unknown before iterating over the sequence.
The trivial example they give is:
use serde::ser::{Serialize, Serializer, SerializeSeq};
impl<T> Serialize for Vec<T>
where
T: Serialize,
{
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut seq = serializer.serialize_seq(Some(self.len()))?;
for element in self {
seq.serialize_element(element)?;
}
seq.end()
}
}
How can I modify the code above to support a length which is not computable before the sequence is iterated?
Maybe I have misunderstood the documentation, but I would expect something similar to this to work:
use serde::ser::{Serialize, Serializer, SerializeSeq};
impl<T> Serialize for Vec<T>
where
T: Serialize,
{
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut lazy_len = 0
let mut seq = serializer.serialize_seq(Some(lazy_len))?;
for element in self {
lazy_len += 2;
seq.serialize_element(element)?;
}
seq.end()
}
}
Which would give twice the length of the sequence.

The length paramater is an Option<usize> because maybe you have one and maybe you don't. You don't have one, so use None instead of Some.
let mut seq = serializer.serialize_seq(None)?;
As the note says, some serializers require a length and others don't.
Serializers will use the value that you pass to serialize_seq to (for example) allocate a buffer immediately. They get better performance by doing this once and then serializing each element into that memory. Extending the length with each item would not be at all beneficial since they will need to extend the buffer each time too, which is exactly what they'll do if you pass None for the length.
This should work fine with Json, but some other more efficient formats like Bincode might refuse to serialize without it. In particular, some encoding formats will serialize the length as part of the format before the values.
Maybe I have misunderstood the documentation, but I would expect something similar to this to work:
...
let mut lazy_len = 0
let mut seq = serializer.serialize_seq(Some(lazy_len))?;
for element in self {
lazy_len += 2;
seq.serialize_element(element)?;
}
...
Integers are Copy types, so serializer.serialize_seq(Some(lazy_len)) just copies the value of lazy_len and mutating lazy_len afterwards will have no effect.

Related

Length in serde serialize

Maybe I haven't read the documentation too carefully but, I didn't manage to get answer to the following question.
When having a custom serialization logic defined like:
pub fn serialize_foo<S>(t: &Foo, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut map = serializer.serialize_map(Some(len))?;
/*
...
*/
map.end()
}
What is len? Is it the size of a type defined in bytes, or is it something else? For example, what would be the type for i32 in this case?

From Serializer::serialize_map:
The argument is the number of elements in the map, which may or may not be computable before the map is iterated. Some serializers only support maps whose length is known up front.
So it is the number of times that you will call .serialize_entry() (or .serialize_key() and .serialize_value() pairs) on map.

Proper signature for a function accepting an iterator of strings

I'm confused about the proper type to use for an iterator yielding string slices.
fn print_strings<'a>(seq: impl IntoIterator<Item = &'a str>) {
for s in seq {
println!("- {}", s);
}
}
fn main() {
let arr: [&str; 3] = ["a", "b", "c"];
let vec: Vec<&str> = vec!["a", "b", "c"];
let it: std::str::Split<'_, char> = "a b c".split(' ');
print_strings(&arr);
print_strings(&vec);
print_strings(it);
}
Using <Item = &'a str>, the arr and vec calls don't compile. If, instead, I use <Item = &'a'a str>, they work, but the it call doesn't compile.
Of course, I can make the Item type generic too, and do
fn print_strings<'a, I: std::fmt::Display>(seq: impl IntoIterator<Item = I>)
but it's getting silly. Surely there must be a single canonical "iterator of string values" type?

The error you are seeing is expected because seq is &Vec<&str> and &Vec<T> implements IntoIterator with Item=&T, so with your code, you end up with Item=&&str where you are expecting it to be Item=&str in all cases.
The correct way to do this is to expand Item type so that is can handle both &str and &&str. You can do this by using more generics, e.g.
fn print_strings(seq: impl IntoIterator<Item = impl AsRef<str>>) {
for s in seq {
let s = s.as_ref();
println!("- {}", s);
}
}
This requires the Item to be something that you can retrieve a &str from, and then in your loop .as_ref() will return the &str you are looking for.
This also has the added bonus that your code will also work with Vec<String> and any other type that implements AsRef<str>.

TL;DR The signature you use is fine, it's the callers that are providing iterators with wrong Item - but can be easily fixed.
As explained in the other answer, print_string() doesn't accept &arr and &vec because IntoIterator for &[T; n] and &Vec<T> yield references to T. This is because &Vec, itself a reference, is not allowed to consume the Vec in order to move T values out of it. What it can do is hand out references to T items sitting inside the Vec, i.e. items of type &T. In the case of your callers that don't compile, the containers contain &str, so their iterators hand out &&str.
Other than making print_string() more generic, another way to fix the issue is to call it correctly to begin with. For example, these all compile:
print_strings(arr.iter().map(|sref| *sref));
print_strings(vec.iter().copied());
print_strings(it);
Playground
iter() is the method provided by slices (and therefore available on arrays and Vec) that iterates over references to elements, just like IntoIterator of &Vec. We call it explicitly to be able to call map() to convert &&str to &str the obvious way - by using the * operator to dereference the &&str. The copied() iterator adapter is another way of expressing the same, possibly a bit less cryptic than map(|x| *x). (There is also cloned(), equivalent to map(|x| x.clone()).)
It's also possible to call print_strings() if you have a container with String values:
let v = vec!["foo".to_owned(), "bar".to_owned()];
print_strings(v.iter().map(|s| s.as_str()));

Formatting a byte slice in Rust

Using Rust, I want to take a slice of bytes from a vec and display them as hex, on the console, I can make this work using the itertools format function, and println!, but I cannot figure out how it works, here is the code, simplified...
use itertools::Itertools;
// Create a vec of bytes
let mut buf = vec![0; 1024];
... fill the vec with some data, doesn't matter how, I'm reading from a socket ...
// Create a slice into the vec
let bytes = &buf[..5];
// Print the slice, using format from itertools, example output could be: 30 27 02 01 00
println!("{:02x}", bytes.iter().format(" "));
(as an aside, I realize I can use the much simpler itertools join function, but in this case I don't want the default 0x## style formatting, as it is somewhat bulky)
How on earth does this work under the covers? I know itertools format is creating a "Format" struct, and I can see the source code here, https://github.com/rust-itertools/itertools/blob/master/src/format.rs , but I am none the wiser. I suspect the answer has to do with "macro_rules! impl_format" but that is just about where my head explodes.
Can some Rust expert explain the magic? I hate to blindly copy paste code without a clue, am I abusing itertools, maybe there a better, simpler way to go about this.

I suspect the answer has to do with "macro_rules! impl_format" but that is just about where my head explodes.
The impl_format! macro is used to implement the various formatting traits.
impl_format!{Display Debug
UpperExp LowerExp UpperHex LowerHex Octal Binary Pointer}
The author has chosen to write a macro because the implementations all look the same. The way repetitions work in macros means that macros can be very helpful even when they are used only once (here, we could do the same by invoking the macro once for each trait, but that's not true in general).
Let's expand the implementation of LowerHex for Format and look at it:
impl<'a, I> fmt::LowerHex for Format<'a, I>
where I: Iterator,
I::Item: fmt::LowerHex,
{
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
self.format(f, fmt::LowerHex::fmt)
}
}
The fmt method calls another method, format, defined in the same module.
impl<'a, I> Format<'a, I>
where I: Iterator,
{
fn format<F>(&self, f: &mut fmt::Formatter, mut cb: F) -> fmt::Result
where F: FnMut(&I::Item, &mut fmt::Formatter) -> fmt::Result,
{
let mut iter = match self.inner.borrow_mut().take() {
Some(t) => t,
None => panic!("Format: was already formatted once"),
};
if let Some(fst) = iter.next() {
cb(&fst, f)?;
for elt in iter {
if self.sep.len() > 0 {
f.write_str(self.sep)?;
}
cb(&elt, f)?;
}
}
Ok(())
}
}
format takes two arguments: the formatter (f) and a formatting function (cb for callback). The formatting function here is fmt::LowerHex::fmt. This is the fmt method from the LowerHex trait; how does the compiler figure out which LowerHex implementation to use? It's inferred from format's type signature. The type of cb is F, and F must implement FnMut(&I::Item, &mut fmt::Formatter) -> fmt::Result. Notice the type of the first argument: &I::Item (I is the type of the iterator that was passed to format). LowerHex::fmt' signature is:
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result;
For any type Self that implements LowerHex, this function will implement FnMut(&Self, &mut fmt::Formatter) -> fmt::Result. Thus, the compiler infers that Self == I::Item.
One important thing to note here is that the formatting attributes (e.g. the 02 in your formatting string) are stored in the Formatter. Implementations of e.g. LowerHex will use methods such as Formatter::width to retrieve an attribute. The trick here is that the same formatter is used to format multiple values (with the same attributes).
In Rust, methods can be called in two ways: using method syntax and using function syntax. These two functions are equivalent:
use std::fmt;
pub fn method_syntax(f: &mut fmt::Formatter) -> fmt::Result {
use fmt::LowerHex;
let x = 42u8;
x.fmt(f)
}
pub fn function_syntax(f: &mut fmt::Formatter) -> fmt::Result {
let x = 42u8;
fmt::LowerHex::fmt(&x, f)
}
When format is called with fmt::LowerHex::fmt, this means that cb refers to fmt::LowerHex::fmt. format must use function call because there's no guarantee that the callback is even a method!
am I abusing itertools
Not at all; in fact, this is precisely how format is meant to be used.
maybe there a better, simpler way to go about this
There are simpler ways, sure, but using format is very efficient because it doesn't allocate dynamic memory.

Can I deserialize vectors with variable length prefix with Bincode?

I am having a problem with the Rust bincode library. When it serializes a vector, it always assumes the prefixed length is 8 bytes. This is a fine assumption when you always encode data using bincode because bincode can read it's own serialized data.
I am in the situation where I cannot influence the serializer as I did not write it and it has to stay the same for legacy reasons. It encodes its vectors as a length-prefixed array where the prefix is always 2 bytes (or in some cases it is 4 bytes but but I know these cases well. Once I know how to do it with 2 bytes 4 bytes should not be a problem).
How can I use bincode (and serde for that matter) to deserialize these fields? Can I work around the default 8 bytes of length hardcoded in bincode?

Bincode is not supposed to be compatible with any existing serializer or standard. Nor is, according to the comment, the format you are trying to read.
I suggest you get the bincode sources—they are MIT-licensed, so you are free to do basically whatever you please with them—and modify them to suit your format (and give it your name and include it in your project).
serde::Deserializer is quite well documented, as is the underlying data model, and the implementation in bincode is trivial to find (in de/mod.rs), so take it as your starting point and adjust as needed.

I have figured out a (possibly very ugly) way to do it without implementing my own deserializer — Bincode could do it after all. It looks something like this:
impl<'de> Deserialize<'de> for VarLen16 {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
struct VarLen16Visitor;
impl<'de> Visitor<'de> for VarLen16Visitor {
type Value = VarLen16;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("VarLen16")
}
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: SeqAccess<'de>,
{
let mut res: Vec<u8> = vec![];
let length: u16 = seq
.next_element()?
.ok_or_else(|| serde::de::Error::invalid_length(1, &self))?;
for i in 0..length {
res.push(
seq.next_element()?
.ok_or_else(|| serde::de::Error::invalid_length(1, &self))?,
);
}
return Ok(VarLen16(res));
}
}
return Ok(deserializer.deserialize_tuple(1 << 16, VarLen16Visitor)?);
}
}
In short, I make the system think I deserialize a tuple where I set the length to the maximum I need. I have tested this, it does not actually allocate that much memory. Then I act like the length is part of this tuple, read it first and then continue reading as far as this length tells me to. It's not pretty but it certainly works.

How do I compare a vector against a reversed version of itself?

Why won't this compile?
fn isPalindrome<T>(v: Vec<T>) -> bool {
return v.reverse() == v;
}
I get
error[E0308]: mismatched types
--> src/main.rs:2:25
|
2 | return v.reverse() == v;
| ^ expected (), found struct `std::vec::Vec`
|
= note: expected type `()`
found type `std::vec::Vec<T>`

Since you only need to look at the front half and back half, you can use the DoubleEndedIterator trait (methods .next() and .next_back()) to look at pairs of front and back elements this way:
/// Determine if an iterable equals itself reversed
fn is_palindrome<I>(iterable: I) -> bool
where
I: IntoIterator,
I::Item: PartialEq,
I::IntoIter: DoubleEndedIterator,
{
let mut iter = iterable.into_iter();
while let (Some(front), Some(back)) = (iter.next(), iter.next_back()) {
if front != back {
return false;
}
}
true
}
(run in playground)
This version is a bit more general, since it supports any iterable that is double ended, for example slice and chars iterators.
It only examines each element once, and it automatically skips the remaining middle element if the iterator was of odd length.

Read up on the documentation for the function you are using:
Reverse the order of elements in a slice, in place.
Or check the function signature:
fn reverse(&mut self)
The return value of the method is the unit type, an empty tuple (). You can't compare that against a vector.
Stylistically, Rust uses 4 space indents, snake_case identifiers for functions and variables, and has an implicit return at the end of blocks. You should adjust to these conventions in a new language.
Additionally, you should take a &[T] instead of a Vec<T> if you are not adding items to the vector.
To solve your problem, we will use iterators to compare the slice. You can get forward and backward iterators of a slice, which requires a very small amount of space compared to reversing the entire array. Iterator::eq allows you to do the comparison succinctly.
You also need to state that the T is comparable against itself, which requires Eq or PartialEq.
fn is_palindrome<T>(v: &[T]) -> bool
where
T: Eq,
{
v.iter().eq(v.iter().rev())
}
fn main() {
println!("{}", is_palindrome(&[1, 2, 3]));
println!("{}", is_palindrome(&[1, 2, 1]));
}
If you wanted to do the less-space efficient version, you have to allocate a new vector yourself:
fn is_palindrome<T>(v: &[T]) -> bool
where
T: Eq + Clone,
{
let mut reverse = v.to_vec();
reverse.reverse();
reverse == v
}
fn main() {
println!("{}", is_palindrome(&[1, 2, 3]));
println!("{}", is_palindrome(&[1, 2, 1]));
}
Note that we are now also required to Clone the items in the vector, so we add that trait bound to the method.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Serde serialize_seq with unknown len - rust

Related

Length in serde serialize

Proper signature for a function accepting an iterator of strings

Formatting a byte slice in Rust

Can I deserialize vectors with variable length prefix with Bincode?

How do I compare a vector against a reversed version of itself?

Categories

Resources