Custom data format - `Deserializer::deserialize_str` implementation - rust

I am trying to implement a custom data format with serde, and I've been struggling with the deserialize_str method:
pub struct Deserializer<R> {
rdr: R,
}
impl<'de, 'a, R: io::Read + 'de> de::Deserializer<'de> for &'a mut Deserializer<R> {
fn deserialize_str<V>(self, visitor: V) -> Result<V::Value>
where
V: Visitor<'de>,
{
let len = self.read_i16()?; // implementation below
if len == 0 || len == -1 {
return visitor.visit_borrowed_str("");
}
let len = len as usize;
let buf = self.read_exact(len)?; // implementation below
let out_str = std::str::from_utf8(&buf)?;
// visitor.visit_borrowed_str(out_str) doesn't compile
visitor.visit_str(out_str) // compiles but errors
}
}
impl<R: io::Read> Deserializer<R> {
fn read_exact(&mut self, len: usize) -> Result<Vec<u8>> {
let mut buf = vec![0; len];
self.rdr.read_exact(&mut buf)?;
Ok(buf)
}
fn read_i16(&mut self) -> io::Result<i16> {
self.rdr.read_i16::<byteorder::NetworkEndian>()
}
}
When using visitor.visit_borrowed_str(out_str), I get the error
|
94 | impl<'de, 'a, R: io::Read + 'de> de::Deserializer<'de> for &'a mut Deserializer<R> {
| --- lifetime `'de` defined here
...
149 | let out_str = std::str::from_utf8(&buf)?;
| ^^^^ borrowed value does not live long enough
150 |
151 | visitor.visit_borrowed_str(out_str)
| ----------------------------------- argument requires that `buf` is borrowed for `'de`
152 | }
| - `buf` dropped here while still borrowed
I understand that out_str needs to somehow live longer than its scope, but I can't find a way to go about it.

To use visit_borrowed_str, you need to hand it a reference to something that lives as long as 'de, that is, as long as the input data being deserialized. Creating a new temporary Vec with read_exact won't do. You need access to the underlying slice, e.g. std::str::from_utf8(&self.rdr.get_ref()[self.rdr.position() as usize..][..len]) or similar (those calls assume rdr is something like a std::io::Cursor over a byte slice). If you want to keep R a generic std::io::Read, I think you can't use visit_borrowed_str. serde_json, for example, handles this with a special Read trait that returns a reference to the underlying data when it can, and it only uses visit_borrowed_str when it does have such a reference.
Also, if you ask a deserializer to deserialize to a borrowed string when it can't provide one, it must necessarily error. That holds true for serde_json as well. So the error from visit_str is not an error in your deserializer implementation, but an error in how you use the deserializer. You should have asked to deserialize to a String or Cow<str> instead. Your deserializer could never give you a Cow::Borrowed, but asking for a &str isn't a good idea with any deserializer; asking for a Cow<str> is what is generally recommended.
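To make the "special Read" idea concrete, here is a minimal std-only sketch. It is not serde_json's actual code: the names Reference, SliceReader and IoReader are made up for illustration. A reader over a byte slice can hand out a &'de str that points into the input, while a reader over a generic io::Read can only hand out a string copied into its own scratch buffer.

```rust
use std::io::Read;

/// Hypothetical result type: either a string borrowed from the original
/// input (`'de`) or one copied into a buffer owned by the reader (`'s`).
#[derive(Debug, PartialEq)]
enum Reference<'de, 's> {
    Borrowed(&'de str),
    Copied(&'s str),
}

/// Reads from a slice that outlives the reader, so borrowing is possible.
struct SliceReader<'de> {
    input: &'de [u8],
    pos: usize,
}

impl<'de> SliceReader<'de> {
    fn read_str<'s>(&'s mut self, len: usize) -> Option<Reference<'de, 's>> {
        // Copy the `&'de [u8]` out of `self` so the result is tied to `'de`,
        // not to the shorter borrow of `self`.
        let input: &'de [u8] = self.input;
        let end = self.pos.checked_add(len)?;
        let bytes = input.get(self.pos..end)?;
        self.pos = end;
        std::str::from_utf8(bytes).ok().map(Reference::Borrowed)
    }
}

/// Reads from any `io::Read`; the bytes only exist in `scratch`.
struct IoReader<R> {
    rdr: R,
    scratch: Vec<u8>,
}

impl<R: Read> IoReader<R> {
    fn read_str<'de, 's>(&'s mut self, len: usize) -> Option<Reference<'de, 's>> {
        self.scratch.resize(len, 0);
        self.rdr.read_exact(&mut self.scratch).ok()?;
        // The string lives in `self.scratch`, so it can never be `'de`.
        std::str::from_utf8(&self.scratch).ok().map(Reference::Copied)
    }
}
```

A deserialize_str built on top of this would call visitor.visit_borrowed_str for Reference::Borrowed and visitor.visit_str for Reference::Copied.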

Related

Lifetime issues: return struct containing reference to local closure

I am attempting to model some program state as Mutables from the futures-signals library, whose value I want to set generically from a serde_json Value identified by some string key.
For example, given I received some payload instructing me to update "my_int" with a Value, I want to be able to set the value of the Mutable that is known as "my_int".
My idea was to have a map from identifiers like "my_int" to a non-templated wrapper around a mutable's setter. It is important that said wrapper is non-templated, because otherwise I couldn't hold a collection of them in one map:
let my_int = Mutable::new(123);
let my_str = Mutable::new("asdf");
// ...
let setters = HashMap::from([
("my_int", /* magic wrapper around setter here somehow */),
("my_str", /* magic wrapper around setter here somehow */),
// ...
]);
let property_name = "my_int"; // constant for demo purposes
let value = Value::from(234); // constant for demo purposes
let setter = setters.get(property_name).unwrap();
(setter.deser_and_set)(value);
Right now said magic wrapper looks like this:
struct SetterWrapper<'a> {
name: &'static str,
deser_and_set: &'a dyn Fn(Value) -> Result<(), Error>,
// + some other unrelated fields
}
And I can create those inline, and it works:
let my_int_setter = SetterWrapper {
name: "my_int",
deser_and_set: &(|v: Value| {
my_int.set(serde_json::from_value(v)?);
Ok(())
}),
// + some other unrelated fields
};
But I have many mutables and don't want to repeat the above code for every one of them, so I attempted to put it into a function:
fn wrap_setter<'a, T>(name: &'static str, mutable: &'a Mutable<T>) -> SetterWrapper<'a>
where T: for<'de> Deserialize<'de>
{
let deser_and_set = |v: Value| {
mutable.set(serde_json::from_value::<T>(v)?);
Ok(())
};
SetterWrapper {
name,
deser_and_set: &deser_and_set,
}
}
which I intend to use like let my_int_setter = wrap_setter("my_int", &my_int);, however I am encountering the following error:
error[E0515]: cannot return value referencing local variable `deser_and_set`
--> src\main.rs:66:5
|
66 | / SetterWrapper {
67 | | name,
68 | | deser_and_set: &deser_and_set,
| | -------------- `deser_and_set` is borrowed here
69 | | }
| |_____^ returns a value referencing data owned by the current function
The error itself makes sense to me: of course I can't return references to local variables, as those would dangle. But I believe conceptually I could solve the issue by somehow marking the closure in the function to have the same lifetime as the mutable, namely 'a, but you cannot give variables lifetime annotations.
How can I solve this issue? Or is my approach already clumsy?
To work around the issue, one option is to change the deser_and_set field from a reference to a Box. The Box owns the closure, so its ownership can be moved out of the function. Give it a try:
struct SetterWrapper<'a> {
    name: &'static str,
    // The `+ 'a` bound lets the boxed closure borrow the Mutable for 'a.
    deser_and_set: Box<dyn Fn(Value) -> Result<(), Error> + 'a>,
    // + some other unrelated fields
}
fn wrap_setter<'a, T>(name: &'static str, mutable: &'a Mutable<T>) -> SetterWrapper<'a>
where T: for<'de> Deserialize<'de>
{
    SetterWrapper {
        name,
        // `move` copies the reference into the closure instead of borrowing the local.
        deser_and_set: Box::new(move |v: Value| {
            mutable.set(serde_json::from_value::<T>(v)?);
            Ok(())
        }),
    }
}
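For readers without futures-signals and serde_json at hand, the same boxed-closure pattern can be tried with the standard library only. This is a sketch under assumptions: std::cell::Cell stands in for Mutable, FromStr parsing stands in for serde_json::from_value, and the names Setter, parse_and_set and build_setters are hypothetical.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::str::FromStr;

/// Non-generic wrapper: the concrete `T` is hidden inside the boxed closure.
struct Setter<'a> {
    name: &'static str,
    parse_and_set: Box<dyn Fn(&str) -> Result<(), String> + 'a>,
}

fn wrap_setter<'a, T>(name: &'static str, target: &'a Cell<T>) -> Setter<'a>
where
    T: FromStr,
{
    Setter {
        name,
        // `move` copies `name` and the `&'a Cell<T>` into the closure,
        // so nothing refers to this function's stack frame.
        parse_and_set: Box::new(move |raw: &str| {
            let value = raw
                .parse::<T>()
                .map_err(|_| format!("cannot parse {raw:?} for {name}"))?;
            target.set(value);
            Ok(())
        }),
    }
}

/// Collects setters for differently typed cells into one map.
fn build_setters<'a>(
    count: &'a Cell<i32>,
    flag: &'a Cell<bool>,
) -> HashMap<&'static str, Setter<'a>> {
    vec![wrap_setter("count", count), wrap_setter("flag", flag)]
        .into_iter()
        .map(|setter| (setter.name, setter))
        .collect()
}
```

Calling (setters["count"].parse_and_set)("42") then updates the i32 cell, even though the map itself knows nothing about i32.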
The answer from @Joe_Jingyu is probably cleaner, but I want to point out a second approach you could take: make SetterWrapper a trait and implement it for Mutable:
trait SetterWrapper {
fn deser_and_set(&self, v: Value) -> Result<(), Error>;
}
impl<T> SetterWrapper for Mutable<T>
where
T: for<'de> serde::de::Deserialize<'de>,
{
fn deser_and_set(&self, v: Value) -> Result<(), Error> {
self.set(serde_json::from_value::<T>(v)?);
Ok(())
}
}
Now you can create the HashMap with the trait objects and set the value:
let setters = HashMap::from([
("my_int", &my_int as &dyn SetterWrapper),
("my_str", &my_str),
]);
let property_name = "my_int"; // constant for demo purposes
let value = Value::from(234); // constant for demo purposes
let setter = setters.get(property_name).unwrap();
// now the call can be direct
setter.deser_and_set(value).unwrap();
Playground link (Note: I have built a simple Mutable myself, just to make the example work)

"cannot infer an appropriate lifetime" when attempting to return a chunked response with hyper

I would like to return binary data in chunks of specific size. Here is a minimal example.
I made a wrapper struct for hyper::Response to hold my data like status, status text, headers and the resource to return:
pub struct Response<'a> {
pub resource: Option<&'a Resource>
}
This struct has a build method that creates the hyper::Response:
impl<'a> Response<'a> {
pub fn build(&mut self) -> Result<hyper::Response<hyper::Body>, hyper::http::Error> {
let mut response = hyper::Response::builder();
match self.resource {
Some(r) => {
let chunks = r.data
.chunks(100)
.map(Result::<_, std::convert::Infallible>::Ok);
response.body(hyper::Body::wrap_stream(stream::iter(chunks)))
},
None => response.body(hyper::Body::from("")),
}
}
}
There is also another struct holding the database content:
pub struct Resource {
pub data: Vec<u8>
}
Everything works until I try to create a chunked response. The Rust compiler gives me the following error:
error[E0495]: cannot infer an appropriate lifetime due to conflicting requirements
--> src/main.rs:14:15
|
14 | match self.resource {
| ^^^^^^^^^^^^^
|
note: first, the lifetime cannot outlive the lifetime `'a` as defined on the impl at 11:6...
--> src/main.rs:11:6
|
11 | impl<'a> Response<'a> {
| ^^
note: ...so that the types are compatible
--> src/main.rs:14:15
|
14 | match self.resource {
| ^^^^^^^^^^^^^
= note: expected `Option<&Resource>`
found `Option<&'a Resource>`
= note: but, the lifetime must be valid for the static lifetime...
note: ...so that the types are compatible
--> src/main.rs:19:31
|
19 | response.body(hyper::Body::wrap_stream(stream::iter(chunks)))
| ^^^^^^^^^^^^^^^^^^^^^^^^
= note: expected `From<&[u8]>`
found `From<&'static [u8]>`
I don't know how to fulfill these lifetime requirements. How can I do this correctly?
The problem is not in the 'a itself, but in the fact that the slice method chunks() returns an iterator that borrows the original slice. You are trying to create a stream from this Chunks<'_, u8> value, but the stream is required to be 'static. Even without the 'a lifetime on the Resource reference, you would still have r.data borrowed, and it would still fail.
Remember that here 'static does not mean that the value lives forever, but that it can be made to live as long as necessary. That is, the future must not hold any (non-'static) borrows.
You could clone all the data, but if it is very big, that can be costly. If so, you could try using Bytes, which is just like Vec<u8> but reference counted.
It looks like there is no Bytes::chunks() function that returns an iterator of Bytes. Fortunately it is easy to do it by hand.
Lastly, remember that iterators in Rust are lazy, so they keep the original data borrowed, even if it is a Bytes. So we need to collect them into a Vec to actually own the data (playground):
pub struct Resource {
pub data: Bytes,
}
impl<'a> Response<'a> {
pub fn build(&mut self) -> Result<hyper::Response<hyper::Body>, hyper::http::Error> {
let mut response = hyper::Response::builder();
match self.resource {
Some(r) => {
let len = r.data.len();
let chunks = (0..len)
.step_by(100)
.map(|x| {
let range = x..len.min(x + 100);
Ok(r.data.slice(range))
})
.collect::<Vec<Result<Bytes, std::convert::Infallible>>>();
response.body(hyper::Body::wrap_stream(stream::iter(chunks)))
}
None => response.body(hyper::Body::from("")),
}
}
}
UPDATE: We can avoid the call to collect() if we notice that stream::iter() takes ownership of an IntoIterator that can be evaluated lazily, as long as we make it 'static. It can be done if we do a (cheap) clone of r.data and move it into the lambda (playground):
let data = r.data.clone();
let len = data.len();
let chunks = (0..len).step_by(100)
.map(move |x| {
let range = x .. len.min(x + 100);
Result::<_, std::convert::Infallible>::Ok(data.slice(range))
});
response.body(hyper::Body::wrap_stream(stream::iter(chunks)))
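The same ownership idea can be checked without hyper or the bytes crate. In the sketch below, Arc<[u8]> stands in for Bytes and require_static is a made-up stand-in for an API that demands a 'static iterator, as Body::wrap_stream does for its stream. Unlike Bytes::slice, this version copies each chunk with to_vec, so it only illustrates the lifetime aspect, not the zero-copy aspect.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an API that only accepts `'static` iterators.
fn require_static<I>(iter: I) -> Vec<Vec<u8>>
where
    I: Iterator<Item = Vec<u8>> + 'static,
{
    iter.collect()
}

/// Lazily yields chunks of at most `size` bytes (`size` must be non-zero).
/// The iterator owns a reference-counted handle to the data, so it holds
/// no borrow and satisfies `'static`.
fn owned_chunks(data: Arc<[u8]>, size: usize) -> impl Iterator<Item = Vec<u8>> + 'static {
    let len = data.len();
    (0..len).step_by(size).map(move |start| {
        let end = len.min(start + size);
        data[start..end].to_vec()
    })
}
```

Passing data.chunks(size) directly to require_static would be rejected for the reason given above: the Chunks iterator borrows data, so it is not 'static.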

How to correctly use trait objects, mutably iterating over a container of mutable trait objects while mutating the container itself?

I am trying to write a packet parser that builds up a packet by parsing each layer in turn. The packet then holds those layers in a vector.
The code below, with its compilation errors, shows the idea. I have added comments for each step. I have experimented with RefCell but could not get it working. The challenges are enumerated after the code.
The basic pattern is as follows: get an object of a Layer type. Every Layer type returns a default next object, as a boxed trait object, based on some field in the current layer.
Edit: I have replaced the pseudocode with more complete code and added the compilation errors. Figuring out how to fix these errors may solve the problem.
#[derive(Debug, Default)]
pub struct Packet<'a> {
data: Option<&'a [u8]>,
meta: PacketMetadata,
layers: Vec<Box<dyn Layer<'a>>>,
}
pub trait Layer<'a>: Debug {
fn from_u8<'b>(&mut self, bytes: &'b [u8]) -> Result<(Option<Box<dyn Layer>>, usize), Error>;
}
#[derive(Debug, Default)]
pub struct PacketMetadata {
timestamp: Timestamp,
inface: i8,
len: u16,
caplen: u16,
}
impl<'a> Packet<'a> {
fn from_u8(bytes: &'a [u8], _encap: EncapType) -> Result<Self, Error> {
let mut p = Packet::default();
let eth = ethernet::Ethernet::default();
let mut layer: RefCell<Box<dyn Layer>> = RefCell::new(Box::new(eth));
let mut res: (Option<Box<dyn Layer>>, usize);
let mut start = 0;
loop {
let mut decode_layer = layer.borrow_mut();
// process it
res = decode_layer.from_u8(&bytes[start..])?;
if res.0.is_none() {
break;
}
// if the layer exists, get it in a layer.
let boxed = layer.replace(res.0.unwrap());
start = res.1;
// append the layer to layers.
p.layers.push(boxed);
}
Ok(p)
}
}
Compilation Errors
error[E0515]: cannot return value referencing local variable `decode_layer`
--> src/lib.rs:81:9
|
68 | res = decode_layer.from_u8(&bytes[start..])?;
| ------------ `decode_layer` is borrowed here
...
81 | Ok(p)
| ^^^^^ returns a value referencing data owned by the current function
error[E0515]: cannot return value referencing local variable `layer`
--> src/lib.rs:81:9
|
65 | let mut decode_layer = layer.borrow_mut();
| ----- `layer` is borrowed here
...
81 | Ok(p)
| ^^^^^ returns a value referencing data owned by the current function
error: aborting due to 2 previous errors; 3 warnings emitted
It's not clear why the above errors occur. I am using the values returned by the calls. (The 3 warnings mentioned above can be ignored; they are unused-variable warnings.)
The challenges -
p.layers.last_mut and p.layers.push are simultaneous mutable borrows, which is not allowed. I could somehow put it behind a RefCell, but it's not clear how.
This code is similar in pattern to syn::token::Tokens, however one basic difference being, there an Enum is used(TokenTree). In the above example I cannot use Enum because the list of protocols to be supported is potentially unbounded.
I cannot use Layer trait without Trait Objects due to the loop construct.
The pattern can be thought of as - mutably iterating over a container of Trait objects while updating the container itself.
Perhaps I am missing something very basic.
The problem with the above code is the lifetime annotation on the Layer trait. If that lifetime annotation is removed, the code compiles with a few modifications, as posted below:
// Layer Trait definition
pub trait Layer: Debug {
fn from_u8(&mut self, bytes: &[u8]) -> Result<(Option<Box<dyn Layer>>, usize), Error>;
}
impl<'a> Packet<'a> {
fn from_u8(bytes: &'a [u8], _encap: EncapType) -> Result<Self, Error> {
let mut p = Packet::default();
let eth = ethernet::Ethernet::default();
let layer: RefCell<Box<dyn Layer>> = RefCell::new(Box::new(eth));
let mut res: (Option<Box<dyn Layer>>, usize);
let mut start = 0;
loop {
{
// Do a `borrow_mut` in its own scope, so the borrow gets dropped at the end.
let mut decode_layer = layer.borrow_mut();
res = decode_layer.from_u8(&bytes[start..])?;
}
if res.0.is_none() {
// This is just required to push something to the RefCell, that will get dropped anyways.
let fake_boxed = Box::new(FakeLayer {});
let boxed = layer.replace(fake_boxed);
p.layers.push(boxed);
break;
}
// if the layer exists, get it in a layer.
let boxed = layer.replace(res.0.unwrap());
start = res.1;
// append the layer to layers.
p.layers.push(boxed);
}
Ok(p)
}
}
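As a possible simplification, the same loop can be written without RefCell or a placeholder layer by moving the current Box into the vector and then rebinding the variable to the next layer. The sketch below is std-only and uses hypothetical Outer and Inner layers plus a String error type; it also assumes that the returned usize is the number of bytes consumed by the layer, which the original code does not state.

```rust
use std::fmt::Debug;

trait Layer: Debug {
    /// Parses this layer from `bytes` and returns the next layer (if any)
    /// together with the number of bytes consumed (an assumption here).
    fn parse(&mut self, bytes: &[u8]) -> Result<(Option<Box<dyn Layer>>, usize), String>;
}

/// Hypothetical one-byte header whose value selects the next layer.
#[derive(Debug, Default)]
struct Outer {
    next_kind: u8,
}

/// Hypothetical final layer that swallows the remaining bytes.
#[derive(Debug, Default)]
struct Inner {
    payload_len: usize,
}

impl Layer for Outer {
    fn parse(&mut self, bytes: &[u8]) -> Result<(Option<Box<dyn Layer>>, usize), String> {
        self.next_kind = *bytes.first().ok_or("outer header truncated")?;
        let next: Option<Box<dyn Layer>> = match self.next_kind {
            1 => Some(Box::new(Inner::default())),
            _ => None,
        };
        Ok((next, 1))
    }
}

impl Layer for Inner {
    fn parse(&mut self, bytes: &[u8]) -> Result<(Option<Box<dyn Layer>>, usize), String> {
        self.payload_len = bytes.len();
        Ok((None, bytes.len()))
    }
}

fn parse_layers(bytes: &[u8]) -> Result<Vec<Box<dyn Layer>>, String> {
    let mut layers: Vec<Box<dyn Layer>> = Vec::new();
    let mut current: Box<dyn Layer> = Box::new(Outer::default());
    let mut offset = 0;
    loop {
        // The mutable borrow of `current` ends when `parse` returns.
        let (next, consumed) = current.parse(&bytes[offset..])?;
        offset += consumed;
        // Move the finished layer into the vector, then rebind `current`.
        layers.push(current);
        match next {
            Some(layer) => current = layer,
            None => return Ok(layers),
        }
    }
}
```

Because the vector is only pushed to after parse has returned, there is never a mutable borrow of an element alive while the container changes.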

Cannot split a string into string slices with explicit lifetimes because the string does not live long enough

I'm writing a library that should read from something implementing the BufRead trait; a network data stream, standard input, etc. The first function is supposed to read a data unit from that reader and return a populated struct filled mostly with &'a str values parsed from a frame from the wire.
Here is a minimal version:
mod mymod {
use std::io::prelude::*;
use std::io;
pub fn parse_frame<'a, T>(mut reader: T)
where
T: BufRead,
{
for line in reader.by_ref().lines() {
let line = line.expect("reading header line");
if line.len() == 0 {
// got empty line; done with header
break;
}
// split line
let splitted = line.splitn(2, ':');
let line_parts: Vec<&'a str> = splitted.collect();
println!("{} has value {}", line_parts[0], line_parts[1]);
}
// more reads down here, therefore the reader.by_ref() above
// (otherwise: use of moved value).
}
}
use std::io;
fn main() {
let stdin = io::stdin();
let locked = stdin.lock();
mymod::parse_frame(locked);
}
An error shows up which I cannot fix after trying different solutions:
error: `line` does not live long enough
--> src/main.rs:16:28
|
16 | let splitted = line.splitn(2, ':');
| ^^^^ does not live long enough
...
20 | }
| - borrowed value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the body at 8:4...
--> src/main.rs:8:5
|
8 | / {
9 | | for line in reader.by_ref().lines() {
10 | | let line = line.expect("reading header line");
11 | | if line.len() == 0 {
... |
22 | | // (otherwise: use of moved value).
23 | | }
| |_____^
The lifetime 'a is defined on a struct and implementation of a data keeper structure because the &str requires an explicit lifetime. These code parts were removed as part of the minimal example.
BufRead has a lines() method whose iterator yields io::Result<String> items. I handle errors using expect or match and thus unpack the Result so that the program has the bare String. This is then done multiple times to populate a data structure.
Many answers say that the unwrap result needs to be bound to a variable otherwise it gets lost because it is a temporary value. But I already saved the unpacked Result value in the variable line and I still get the error.
How can I fix this error? I could not get it working after hours of trying.
Does it make sense to do all these lifetime declarations just for &str in a data keeper struct? This will be mostly a read-only data structure, at most replacing whole field values. String could also be used, but I have found articles saying that String has lower performance than &str, and this frame parser function will be called many times and is performance-critical.
Similar questions exist on Stack Overflow, but none quite answers the situation here.
For completeness and better understanding, following is an excerpt from complete source code as to why lifetime question came up:
Data structure declaration:
// tuple
pub struct Header<'a>(pub &'a str, pub &'a str);
pub struct Frame<'a> {
pub frameType: String,
pub bodyType: &'a str,
pub port: &'a str,
pub headers: Vec<Header<'a>>,
pub body: Vec<u8>,
}
impl<'a> Frame<'a> {
pub fn marshal(&'a self) {
//TODO
println!("marshal!");
}
}
Complete function definition:
pub fn parse_frame<'a, T>(mut reader: T) -> Result<Frame<'a>, io::Error> where T: BufRead {
Your problem can be reduced to this:
fn foo<'a>() {
let thing = String::from("a b");
let parts: Vec<&'a str> = thing.split(" ").collect();
}
You create a String inside your function, then declare that references to that string are guaranteed to live for the lifetime 'a. Unfortunately, the lifetime 'a isn't under your control — the caller of the function gets to pick what the lifetime is. That's how generic parameters work!
What would happen if the caller of the function specified the 'static lifetime? How would it be possible for your code, which allocates a value at runtime, to guarantee that the value lives longer than even the main function? It's not possible, which is why the compiler has reported an error.
Once you've gained a bit more experience, the function signature fn foo<'a>() will jump out at you like a red alert — there's a generic parameter that isn't used. That's most likely going to mean bad news.
"return a populated struct filled mostly with &'a str"
You cannot possibly do this with the current organization of your code. References have to point to something. You are not providing anywhere for the pointed-at values to live. You cannot return an allocated String as a string slice.
Before you jump to it, no you cannot store a value and a reference to that value in the same struct.
Instead, you need to split the code that creates the String and that which parses a &str and returns more &str references. That's how all the existing zero-copy parsers work. You could look at those for inspiration.
"String has lower performance than &str"
No, it really doesn't. Creating lots of extraneous Strings is a bad idea, sure, just like allocating too much is a bad idea in any language.
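A minimal sketch of that split, using only the standard library: the caller reads the frame into a String it owns, and a separate function parses &str slices out of it. The Header tuple struct mirrors the one in the question; parse_headers and the "name: value" line format are assumptions made for illustration.

```rust
/// Borrowed header, as in the question: both fields point into the input text.
#[derive(Debug, PartialEq)]
pub struct Header<'a>(pub &'a str, pub &'a str);

/// Parses "name: value" lines up to the first empty line.
/// The returned headers borrow from `text`, so the caller must keep the
/// owning String alive for as long as the headers are used.
pub fn parse_headers<'a>(text: &'a str) -> Vec<Header<'a>> {
    text.lines()
        .take_while(|line| !line.is_empty())
        .filter_map(|line| {
            // Lines without a ':' are skipped in this sketch.
            let (name, value) = line.split_once(':')?;
            Some(Header(name.trim(), value.trim()))
        })
        .collect()
}
```

Here the lifetime 'a is tied to an input parameter, so the caller decides how long the text lives and the compiler can check that the headers do not outlive it.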
Maybe the following program gives some clues to others who are also having their first problems with lifetimes:
fn main() {
// using String and &str slice
let my_str: String = "fire".to_owned();
let returned_str: MyStruct = my_func_str(&my_str);
println!("Received return value: {ret}", ret = returned_str.version);
// using Vec<u8> and &[u8] slice
let my_vec: Vec<u8> = "fire".to_owned().into_bytes();
let returned_u8: MyStruct2 = my_func_vec(&my_vec);
println!("Received return value: {ret:?}", ret = returned_u8.version);
}
// using String -> str
fn my_func_str<'a>(some_str: &'a str) -> MyStruct<'a> {
MyStruct {
version: &some_str[0..2],
}
}
struct MyStruct<'a> {
version: &'a str,
}
// using Vec<u8> -> & [u8]
fn my_func_vec<'a>(some_vec: &'a Vec<u8>) -> MyStruct2<'a> {
MyStruct2 {
version: &some_vec[0..2],
}
}
struct MyStruct2<'a> {
version: &'a [u8],
}

How do I pass a mutable vector as a function parameter in Rust?

I am implementing a small program that evaluates the Collatz conjecture. As part of this, I have a recursive function that should store the current number being evaluated, determine whether it is odd or even (or terminate if it is 1), perform that branch of the conjecture, and then call itself with the new number.
To do this, I wanted to pass a vector into this function and push the current number onto that vector, but I am having a tough time understanding how to pass a mutable vector reference.
Here is the code that I have:
fn evaluate_conjecture(number_to_be_evaluated: u64, mut intermediate_results: &Vec<u64>) -> u64 {
intermediate_results.push(number_to_be_evaluated);
if number_to_be_evaluated == 1 {
0
} else if number_to_be_evaluated % 2 == 1 {
let odd_step_result = perform_odd_conjecture_step(number_to_be_evaluated);
evaluate_conjecture(odd_step_result, intermediate_results) + 1
} else {
let even_step_result = perform_even_conjecture_step(number_to_be_evaluated);
evaluate_conjecture(even_step_result, intermediate_results) + 1
}
}
fn perform_odd_conjecture_step(_: u64) -> u64 {
unimplemented!()
}
fn perform_even_conjecture_step(_: u64) -> u64 {
unimplemented!()
}
and here is the relevant part of my main
fn main() {
let input_number = 42;
let mut _intermediate_results: Vec<u64>;
let number_of_steps = evaluate_conjecture(input_number, &_intermediate_results);
}
Here is the error I am getting
error[E0596]: cannot borrow `*intermediate_results` as mutable, as it is behind a `&` reference
--> src/main.rs:2:5
|
1 | fn evaluate_conjecture(number_to_be_evaluated: u64, mut intermediate_results: &Vec<u64>) -> u64 {
| --------- help: consider changing this to be a mutable reference: `&mut std::vec::Vec<u64>`
2 | intermediate_results.push(number_to_be_evaluated);
| ^^^^^^^^^^^^^^^^^^^^ `intermediate_results` is a `&` reference, so the data it refers to cannot be borrowed as mutable
How do I pass this vector into the function so I can modify it each time the function is called?
&T is an immutable reference.
&mut T is a mutable reference.
Change your &Vec<u64> to &mut Vec<u64> and your &_intermediate_results to &mut _intermediate_results.
This is a thing which is fairly well documented; I suggest you read the documentation if you haven’t — it explains quite a lot. There's a section specifically about mutable references.
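Applied to the code in the question, the change might look like the sketch below. The two step functions were left unimplemented in the question, so the usual Collatz rules (3n + 1 for odd numbers, n / 2 for even ones) are filled in here as an assumption, and the parameter name n is shortened for readability.

```rust
/// Returns the number of steps needed to reach 1, recording every visited
/// number in `intermediate_results` along the way.
fn evaluate_conjecture(n: u64, intermediate_results: &mut Vec<u64>) -> u64 {
    // `&mut Vec<u64>` allows pushing through the reference.
    intermediate_results.push(n);
    if n == 1 {
        0
    } else if n % 2 == 1 {
        // Passing the `&mut` on reborrows it for the recursive call.
        evaluate_conjecture(3 * n + 1, intermediate_results) + 1
    } else {
        evaluate_conjecture(n / 2, intermediate_results) + 1
    }
}
```

On the caller's side, the vector has to be initialized and declared mutable, for example let mut results = Vec::new(); followed by evaluate_conjecture(42, &mut results);.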
