How to implement Borrow<T> when T wraps borrowed data? - rust

Given the following definitions from How to borrow a field for serialization but create it during deserialization?:
#[derive(Serialize)]
struct SerializeThing<'a> {
    small_header: (u64, u64, u64),
    big_body: &'a str,
}
#[derive(Deserialize)]
struct DeserializeThing {
    small_header: (u64, u64, u64),
    big_body: String,
}
How do I implement the Borrow trait so that I can store the owned data naturally in (e.g.) HashMaps and query them either by the owned values or by their borrowed counterparts? The closest thing that appears possible is as follows:
impl DeserializeThing {
    fn as_serialize(&self) -> SerializeThing<'_> {
        let DeserializeThing { small_header, big_body } = self;
        let small_header = *small_header;
        let big_body = big_body.as_str();
        SerializeThing { small_header, big_body }
    }
}
which is not quite sufficient.

You can't.
Borrow::borrow must return a reference, and there is no way you can get a reference to a SerializeThing from a reference to a DeserializeThing as these types are simply not ABI compatible.
If performance is important and you can't afford to construct a DeserializeThing (and allocate its String) just to perform a lookup, then you could use hashbrown::HashMap instead of std::collections::HashMap.
hashbrown is the library that the standard library uses for its own HashMap implementation, but it has some more useful methods.
One that would be useful for you here is the raw entry API (and its mutable counterpart).
In particular, it allows you to get a map entry from its hash and a matching function:
pub fn from_hash<F>(self, hash: u64, is_match: F) -> Option<(&'a K, &'a V)>
where
    F: FnMut(&K) -> bool
Access an entry by hash.
Since you can implement Hash for both DeserializeThing and SerializeThing so that equal values produce the same hashes, this API would be simple to use in your case.
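A minimal sketch of the hashing half of this idea, using only the standard library (the serde derives are omitted so it compiles on its own). Hashing DeserializeThing through the question's as_serialize view guarantees that an owned value and its borrowed counterpart hash identically; with hashbrown, such a hash would then be passed to from_hash together with a closure that compares the fields. Using RandomState and BuildHasher::hash_one is just one way to compute the hash here, and hash_one needs Rust 1.71 or later.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Borrowed view; deriving Hash hashes the fields in declaration order.
#[derive(Hash)]
struct SerializeThing<'a> {
    small_header: (u64, u64, u64),
    big_body: &'a str,
}

// Owned counterpart (serde derives omitted in this sketch).
struct DeserializeThing {
    small_header: (u64, u64, u64),
    big_body: String,
}

impl DeserializeThing {
    fn as_serialize(&self) -> SerializeThing<'_> {
        SerializeThing { small_header: self.small_header, big_body: &self.big_body }
    }
}

// Hash the owned type through its borrowed view, so equal values are
// guaranteed to produce identical hashes for any given hasher.
impl Hash for DeserializeThing {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.as_serialize().hash(state);
    }
}

fn main() {
    let owned = DeserializeThing { small_header: (1, 2, 3), big_body: "body".to_string() };
    let borrowed = SerializeThing { small_header: (1, 2, 3), big_body: "body" };
    let state = RandomState::new();
    // Same hasher state, same logical value: same hash.
    assert_eq!(state.hash_one(&owned), state.hash_one(&borrowed));
}
```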

Related

How can I write a self-referential Rust struct with Arc and BufReader?

I'm trying to write this following code for a server:
use std::io::{BufReader, BufWriter};
use std::net::TcpStream;
struct User<'a> {
    stream: Arc<TcpStream>,
    reader: BufReader<&'a TcpStream>,
    writer: BufWriter<&'a TcpStream>,
}
fn accept_socket(users: &mut Vec<User>, stream: Arc<TcpStream>) {
    let stream_clone = stream.clone();
    let user = User {
        stream: stream_clone,
        reader: BufReader::new(stream_clone.as_ref()),
        writer: BufWriter::new(stream_clone.as_ref()),
    };
    users.push(user);
}
The stream is behind an Arc because it is shared across threads. The BufReader and BufWriter point to the User's own Arc, but the compiler complains that the reference stream_clone.as_ref() does not live long enough, even though it obviously does (it points to the Arc, which isn't dropped as long as the User is alive). How do I get the compiler to accept this code?
Self-referential structs are a no-go. Rust has no way of updating the address in the references if the struct is moved since moving is always a simple bit copy. Unlike C++ with its move constructors, there's no way to attach behavior to moves.
What you can do instead is store Arcs inside the reader and writer so they share ownership of the TcpStream.
struct User {
    stream: Arc<TcpStream>,
    reader: BufReader<IoArc<TcpStream>>,
    writer: BufWriter<IoArc<TcpStream>>,
}
The tricky part is that Arc doesn't implement Read and Write. You'll need a newtype that does (IoArc, above). Yoshua Wuyts wrote about this problem:
One of those patterns is perhaps lesser known but integral to std’s functioning: impl Read/Write for &Type. What this means is that if you have a reference to an IO type, such as File or TcpStream, you’re still able to call Read and Write methods thanks to some interior mutability tricks.
The implication of this is also that if you want to share a std::fs::File between multiple threads you don’t need to use an expensive Arc<Mutex<File>> because an Arc<File> suffices.
You might expect that if we wrap an IO type T in an Arc that it would implement Clone + Read + Write. But in reality it only implements Clone + Deref<T>... However, there's an escape hatch here: we can create a wrapper type around Arc<T> that implements Read + Write by dereferencing &T internally.
Here is his solution:
use std::io::{self, Read, Write};
use std::sync::Arc;
/// A variant of `Arc` that delegates IO traits if available on `&T`.
#[derive(Debug)]
pub struct IoArc<T>(Arc<T>);
impl<T> IoArc<T> {
    /// Create a new instance of IoArc.
    pub fn new(data: T) -> Self {
        Self(Arc::new(data))
    }
}
// Read/Write are forwarded whenever a shared reference `&T` implements them,
// as it does for `TcpStream` and `File`.
impl<T> Read for IoArc<T>
where
    for<'a> &'a T: Read,
{
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        (&mut &*self.0).read(buf)
    }
}
impl<T> Write for IoArc<T>
where
    for<'a> &'a T: Write,
{
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        (&mut &*self.0).write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        (&mut &*self.0).flush()
    }
}
MIT license
IoArc is available in the io_arc crate, though it is short enough to implement yourself if you don't want to pull in the dependency.
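The sketch below shows the resulting layout end to end without needing a network connection. Pipe is a hypothetical in-memory stand-in for TcpStream: like TcpStream, it implements Read and Write for &Pipe through interior mutability (here a Mutex), which is exactly what IoArc relies on. accept is a hypothetical counterpart of the question's accept_socket, and IoArc is repeated in condensed form so the block compiles on its own.

```rust
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `TcpStream`: an in-memory byte queue that,
/// like `TcpStream`, implements `Read` and `Write` for `&Self`.
#[derive(Default)]
struct Pipe {
    data: Mutex<Vec<u8>>,
}

impl Read for &Pipe {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let mut data = self.data.lock().unwrap();
        let n = buf.len().min(data.len());
        buf[..n].copy_from_slice(&data[..n]);
        data.drain(..n); // consume the bytes that were handed out
        Ok(n)
    }
}

impl Write for &Pipe {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.data.lock().unwrap().extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Condensed copy of `IoArc`: forwards IO to `&T` inside the `Arc`.
struct IoArc<T>(Arc<T>);

impl<T> Read for IoArc<T> where for<'a> &'a T: Read {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        (&mut &*self.0).read(buf)
    }
}

impl<T> Write for IoArc<T> where for<'a> &'a T: Write {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        (&mut &*self.0).write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        (&mut &*self.0).flush()
    }
}

// No lifetime parameter: every field owns a share of the stream.
struct User {
    stream: Arc<Pipe>,
    reader: BufReader<IoArc<Pipe>>,
    writer: BufWriter<IoArc<Pipe>>,
}

fn accept(users: &mut Vec<User>, stream: Arc<Pipe>) {
    let user = User {
        reader: BufReader::new(IoArc(Arc::clone(&stream))),
        writer: BufWriter::new(IoArc(Arc::clone(&stream))),
        stream, // moved last, after the clones above
    };
    users.push(user);
}

fn main() {
    let mut users = Vec::new();
    accept(&mut users, Arc::new(Pipe::default()));
    let user = &mut users[0];
    user.writer.write_all(b"hello\n").unwrap();
    user.writer.flush().unwrap();
    let mut line = String::new();
    user.reader.read_line(&mut line).unwrap();
    assert_eq!(line, "hello\n");
    // `stream`, `reader` and `writer` each hold one strong reference.
    assert_eq!(Arc::strong_count(&user.stream), 3);
}
```

Because the User no longer borrows from itself, it can be moved into the Vec (or across threads, with a thread-safe stream) freely.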
Simple answer: You can't.
In Rust, every type is implicitly movable by memcpy. So if your type stores references to itself, it would break as soon as the move happens; the references would be dangling.
More complex answer: You can't, unless you use Pin, unsafe and raw pointers.
But I'm pretty sure that using Arc for everything is the way to go instead.
Arc<TcpStream> does not implement Read or Write, but you could write a very thin wrapper struct around it that does. It should be fairly easy.
Edit: Take a look at John Kugelman's answer for such a wrapper.

How to share parts of a string with Rc?

I want to create some references to a str with Rc, without cloning str:
fn main() {
    let s = Rc::<str>::from("foo");
    let t = Rc::clone(&s); // Creating a new pointer to the same address is easy
    let u = Rc::clone(&s[1..2]); // But how can I create a new pointer to a part of `s`?
    let w = Rc::<str>::from(&s[0..2]); // This seems to clone str
    assert_ne!(&w as *const _, &s as *const _);
}
How can I do this?
While it's possible in principle, the standard library's Rc does not support the case you're trying to create: a counted reference to a part of reference-counted memory.
However, we can get the effect for strings using a fairly straightforward wrapper around Rc which remembers the substring range:
use std::ops::{Deref, Range};
use std::rc::Rc;
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct RcSubstr {
    string: Rc<str>,
    span: Range<usize>,
}
impl RcSubstr {
    fn new(string: Rc<str>) -> Self {
        let span = 0..string.len();
        Self { string, span }
    }
    fn substr(&self, span: Range<usize>) -> Self {
        // A full implementation would also have bounds checks to ensure
        // the requested range is not larger than the current substring
        Self {
            string: Rc::clone(&self.string),
            span: (self.span.start + span.start)..(self.span.start + span.end)
        }
    }
}
impl Deref for RcSubstr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.string[self.span.clone()]
    }
}
fn main() {
    let s = RcSubstr::new(Rc::<str>::from("foo"));
    let u = s.substr(1..2);
    // We need to deref to print the string rather than the wrapper struct.
    // A full implementation would `impl Debug` and `impl Display` to produce
    // the expected substring.
    println!("{}", &*u);
}
There are a lot of conveniences missing here, such as suitable implementations of Display, Debug, AsRef, Borrow, From, and Into — I've provided only enough code to illustrate how it can work. Once supplemented with the appropriate trait implementations, this should be just as usable as Rc<str> (with the one edge case that it can't be passed to a library type that wants to store Rc<str> in particular).
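As a sketch of a few of those conveniences (still not a complete implementation): the derived Eq and Hash above compare the whole backing string plus the span, whereas Borrow<str> requires equality and hashing to agree with str. So this version implements PartialEq, Eq and Hash by hand on the visible text before adding Borrow<str>, and gives substr the bounds check mentioned in its comment by way of str::get, which returns None for out-of-range spans or spans that do not fall on UTF-8 character boundaries.

```rust
use std::borrow::Borrow;
use std::collections::HashSet;
use std::fmt;
use std::hash::{Hash, Hasher};
use std::ops::{Deref, Range};
use std::rc::Rc;

#[derive(Clone)]
pub struct RcSubstr {
    string: Rc<str>,
    span: Range<usize>,
}

impl RcSubstr {
    pub fn new(string: Rc<str>) -> Self {
        let span = 0..string.len();
        Self { string, span }
    }

    /// Panics if `span` is out of bounds for the current substring
    /// or does not fall on UTF-8 character boundaries.
    pub fn substr(&self, span: Range<usize>) -> Self {
        assert!(self.get(span.clone()).is_some(), "invalid substring range");
        Self {
            string: Rc::clone(&self.string),
            span: (self.span.start + span.start)..(self.span.start + span.end),
        }
    }
}

impl Deref for RcSubstr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.string[self.span.clone()]
    }
}

impl fmt::Display for RcSubstr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&**self)
    }
}

impl fmt::Debug for RcSubstr {
    // Show only the visible text, formatted like a `str`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Debug::fmt(&**self, f)
    }
}

impl AsRef<str> for RcSubstr {
    fn as_ref(&self) -> &str { &**self }
}

impl Borrow<str> for RcSubstr {
    fn borrow(&self) -> &str { &**self }
}

// Equality and hashing use only the visible text, matching `str`,
// which is what `Borrow<str>` requires.
impl PartialEq for RcSubstr {
    fn eq(&self, other: &Self) -> bool { **self == **other }
}
impl Eq for RcSubstr {}

impl Hash for RcSubstr {
    fn hash<H: Hasher>(&self, state: &mut H) { (**self).hash(state) }
}

fn main() {
    let s = RcSubstr::new(Rc::from("foo"));
    let u = s.substr(1..3);
    assert_eq!(u.to_string(), "oo");
    assert_eq!(format!("{:?}", u), "\"oo\"");
    // `Borrow<str>` lets a set of substrings be queried with a plain `&str`.
    let mut set = HashSet::new();
    set.insert(u);
    assert!(set.contains("oo"));
    // Both values share one allocation of "foo".
    assert_eq!(Rc::strong_count(&s.string), 2);
}
```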
The crate arcstr claims to offer a finished version of this basic idea, but I haven't used or studied it and so can't guarantee its quality.
The crate owning_ref provides a way to hold references to parts of an Rc or other smart pointer, but there are concerns about its soundness and I don't fully understand which circumstances that applies to (issue search which currently has 3 open issues).

Deserialize a Vec<Foobar<T>> as Vec<T> directly when Foobar has exactly one field

I'm given a data-format that includes a sequence of objects with exactly one named field value each. Can I remove this layer of indirection while deserializing?
When deserializing, the natural representation would be
/// Each record has its own `{ value: ... }` object
#[derive(serde::Deserialize)]
struct Foobar<T> {
    value: T,
}
/// The naive representation, via `Foobar`...
#[derive(serde::Deserialize)]
struct FoobarContainer<T> {
    values: Vec<Foobar<T>>,
}
While Foobar adds no extra cost beyond T, I'd like to remove this layer of indirection at the type-level:
#[derive(serde::Deserialize)]
struct FoobarContainer<T> {
    values: Vec<T>,
}
Can Foobar be removed from FoobarContainer while still being used during deserialization?
In the general case, there's no trivial way to make this transformation. For that, review these existing answers:
How do I write a Serde Visitor to convert an array of arrays of strings to a Vec<Vec<f64>>?
How to transform fields during deserialization using Serde?
The first is my normal go-to solution and looks like this in this example.
However, in your specific case, you say:
objects with exactly one named field value
And you've identified a key requirement:
While Foobar adds no extra cost beyond T
This means that you can make Foobar have a transparent representation and use unsafe Rust to transmute between the types (although not actually with mem::transmute):
struct FoobarContainer<T> {
    values: Vec<T>,
}
#[derive(serde::Deserialize)]
#[repr(transparent)]
struct Foobar<T> {
    value: T,
}
impl<'de, T> serde::Deserialize<'de> for FoobarContainer<T>
where
    T: serde::Deserialize<'de>,
{
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        let mut v: Vec<Foobar<T>> = serde::Deserialize::deserialize(deserializer)?;
        // I copied this from Stack Overflow without reading the surrounding
        // text that describes why this is actually safe.
        let values = unsafe {
            let data = v.as_mut_ptr() as *mut T;
            let len = v.len();
            let cap = v.capacity();
            std::mem::forget(v);
            Vec::from_raw_parts(data, len, cap)
        };
        Ok(FoobarContainer { values })
    }
}
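For experimenting without serde, here is a sketch of only the conversion step. It swaps mem::forget for ManuallyDrop, the pattern used in the Vec::from_raw_parts documentation, and shows the safe alternative of moving each value out with into_iter().map(...).collect() for comparison. unwrap_all and unwrap_all_safe are hypothetical helper names.

```rust
use std::mem::ManuallyDrop;

#[repr(transparent)]
struct Foobar<T> {
    value: T,
}

/// Reinterprets a `Vec<Foobar<T>>` as a `Vec<T>` without copying.
/// Sound only because `Foobar<T>` is `#[repr(transparent)]` over `T`,
/// so both types have identical size and alignment.
fn unwrap_all<T>(v: Vec<Foobar<T>>) -> Vec<T> {
    // ManuallyDrop stops the original Vec from freeing the buffer.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: same allocation, same layout, same length and capacity.
    unsafe { Vec::from_raw_parts(ptr.cast::<T>(), len, cap) }
}

/// Safe alternative: moves each value out, no `unsafe` required.
fn unwrap_all_safe<T>(v: Vec<Foobar<T>>) -> Vec<T> {
    v.into_iter().map(|f| f.value).collect()
}

fn main() {
    let wrapped = vec![Foobar { value: String::from("a") }, Foobar { value: String::from("b") }];
    assert_eq!(unwrap_all(wrapped), vec!["a", "b"]);
    let wrapped = vec![Foobar { value: 1u8 }, Foobar { value: 2 }];
    assert_eq!(unwrap_all_safe(wrapped), vec![1, 2]);
}
```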
See also:
How do I convert a Vec<T> to a Vec<U> without copying the vector?

Shared ownership of an str between a HashMap and a Vec

I come from a Java/C#/JavaScript background and I am trying to implement a Dictionary that would assign each passed string an id that never changes. The dictionary should be able to return a string by the specified id. This allows storing data that contains many repeated strings far more efficiently in the file system, because only the string ids would be stored instead of the entire strings.
I thought that a struct with a HashMap and a Vec would do but it turned out to be more complicated than that.
I started with the usage of &str as a key for HashMap and an item of Vec like in the following sample. The value of HashMap serves as an index into Vec.
pub struct Dictionary<'a> {
    values_map: HashMap<&'a str, u32>,
    keys_map: Vec<&'a str>
}
impl<'a> Dictionary<'a> {
    pub fn put_and_get_key(&mut self, value: &'a str) -> u32 {
        match self.values_map.get_mut(value) {
            None => {
                let id_usize = self.keys_map.len();
                let id = id_usize as u32;
                self.keys_map.push(value);
                self.values_map.insert(value, id);
                id
            },
            Some(&mut id) => id
        }
    }
}
This works just fine until it turns out that the strs need to be stored somewhere, preferably in this same struct as well. I tried to store a Box<str> in the Vec and &'a str in the HashMap.
pub struct Dictionary<'a> {
    values_map: HashMap<&'a str, u32>,
    keys_map: Vec<Box<str>>
}
The borrow checker did not allow this, of course, because it would have allowed a dangling pointer in the HashMap when an item is removed from the Vec (or, in fact, sometimes when another item is added to the Vec, but that is off-topic here).
I understood that I either need to write unsafe code or use some form of shared ownership, the simplest kind of which seems to be Rc. Using Rc<Box<str>> introduces double indirection, but there seems to be no simple way to construct an Rc<str> at the moment.
pub struct Dictionary {
    values_map: HashMap<Rc<Box<str>>, u32>,
    keys_map: Vec<Rc<Box<str>>>
}
impl Dictionary {
    pub fn put_and_get_key(&mut self, value: &str) -> u32 {
        match self.values_map.get_mut(value) {
            None => {
                let id_usize = self.keys_map.len();
                let id = id_usize as u32;
                let value_to_store = Rc::new(value.to_owned().into_boxed_str());
                self.keys_map.push(value_to_store);
                self.values_map.insert(value_to_store, id);
                id
            },
            Some(&mut id) => id
        }
    }
}
Everything seems fine with regard to ownership semantics, but the code above does not compile because the HashMap now expects an Rc, not an &str:
error[E0277]: the trait bound `std::rc::Rc<Box<str>>: std::borrow::Borrow<str>` is not satisfied
--> src/file_structure/sample_dictionary.rs:14:31
|
14 | match self.values_map.get_mut(value) {
| ^^^^^^^ the trait `std::borrow::Borrow<str>` is not implemented for `std::rc::Rc<Box<str>>`
|
= help: the following implementations were found:
= help: <std::rc::Rc<T> as std::borrow::Borrow<T>>
Questions:
Is there a way to construct an Rc<str>?
Which other structures, methods or approaches could help to resolve this problem? Essentially, I need a way to efficiently store two maps, string-by-id and id-by-string, and be able to retrieve an id by &str, i.e. without any excessive allocations.
Is there a way to construct an Rc<str>?
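Annoyingly, not that I know of at the time of writing: Rc::new requires a Sized argument, and I am not sure whether that is an actual limitation or just something that was forgotten. Since Rust 1.21, however, Rc<str> implements From<&str> and From<String>, so Rc::<str>::from(value) builds one directly, and because Rc<T> implements Borrow<T>, an Rc<str> key can be looked up by &str.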
Which other structures, methods or approaches could help to resolve this problem?
If you look at the signature of get you'll notice:
fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where K: Borrow<Q>, Q: Hash + Eq
As a result, you could search by &str if K implements Borrow<str>.
String implements Borrow<str>, so the simplest solution is to use String as the key. Sure, it means you'll actually have two Strings instead of one... but it's simple. Certainly, a String is simpler to use than a Box<str> (although it uses 8 more bytes on a 64-bit target).
If you want to shave off this cost, you can use a custom structure instead:
#[derive(Clone, Debug)]
struct RcStr(Rc<String>);
And then implement Borrow<str> for it. You'll then have 2 allocations per key (1 for Rc and 1 for String). Depending on the size of your String, it might consume less or more memory.
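A minimal sketch of that RcStr wrapper. Deriving Hash, PartialEq and Eq keeps it consistent with str, as Borrow requires, because Rc<String> and String both delegate hashing and comparison to the string contents; the Borrow<str> impl then lets a HashMap<RcStr, u32> be queried with a plain &str. The values_map/keys_map names mirror the question.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::rc::Rc;

// Derived Hash/Eq delegate to the string contents, consistent with `str`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct RcStr(Rc<String>);

impl Borrow<str> for RcStr {
    fn borrow(&self) -> &str {
        self.0.as_str()
    }
}

fn main() {
    let key = RcStr(Rc::new(String::from("hello")));
    let mut values_map: HashMap<RcStr, u32> = HashMap::new();
    let mut keys_map: Vec<RcStr> = Vec::new();
    keys_map.push(key.clone()); // cheap: bumps the reference count
    values_map.insert(key, 0);
    // Lookup by `&str`, no allocation needed.
    assert_eq!(values_map.get("hello"), Some(&0));
    // The Vec and the HashMap share the same String.
    assert_eq!(Rc::strong_count(&keys_map[0].0), 2);
}
```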
If you wish to go further (why not?), here are some ideas:
implement your own reference-counted string, in a single heap-allocation,
use a single arena for the slice inserted in the Dictionary,
...
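On Rust 1.21 and later, Rc<str> can be created directly with Rc::from(&str), and Rc<T> implements Borrow<T>, so the question's HashMap + Vec design works without the inner Box. A sketch, where the field names and get_value are hypothetical: each string is allocated once and shared by both collections, and lookups by &str allocate nothing.

```rust
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Default)]
pub struct Dictionary {
    ids_by_string: HashMap<Rc<str>, u32>,
    strings_by_id: Vec<Rc<str>>,
}

impl Dictionary {
    pub fn put_and_get_key(&mut self, value: &str) -> u32 {
        // `Rc<str>: Borrow<str>`, so this lookup needs no allocation.
        if let Some(&id) = self.ids_by_string.get(value) {
            return id;
        }
        let id = self.strings_by_id.len() as u32;
        let stored: Rc<str> = Rc::from(value); // the only allocation
        self.strings_by_id.push(Rc::clone(&stored));
        self.ids_by_string.insert(stored, id);
        id
    }

    pub fn get_value(&self, id: u32) -> Option<&str> {
        self.strings_by_id.get(id as usize).map(|s| &**s)
    }
}

fn main() {
    let mut dict = Dictionary::default();
    let a = dict.put_and_get_key("alpha");
    let b = dict.put_and_get_key("beta");
    assert_eq!((a, b), (0, 1));
    assert_eq!(dict.put_and_get_key("alpha"), a); // existing id reused
    assert_eq!(dict.get_value(b), Some("beta"));
    assert_eq!(dict.get_value(7), None);
    // One allocation per string, shared by the Vec and the HashMap.
    assert_eq!(Rc::strong_count(&dict.strings_by_id[0]), 2);
}
```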

Vec<MyTrait> without N heap allocations?

I'm trying to port some C++ code to Rust. It composes a virtual (.mp4) file from a few kinds of slices (string reference, lazy-evaluated string reference, part of a physical file) and serves HTTP requests based on the result. (If you're curious, see Mp4File which takes advantage of the FileSlice interface and its concrete implementations in http.h.)
Here's the problem: I want to require as few heap allocations as possible. Let's say I have a few implementations of resource::Slice that I can hopefully figure out on my own. Then I want to make the one that composes them all:
pub trait Slice : Send + Sync {
    /// Returns the length of the slice in bytes.
    fn len(&self) -> u64;
    /// Writes bytes indicated by `range` to `out`.
    fn write_to(&self, range: &ByteRange,
                out: &mut io::Write) -> io::Result<()>;
}
// (used below)
struct SliceInfo<'a> {
    range: ByteRange,
    slice: &'a Slice,
}
/// A `Slice` composed of other `Slice`s.
pub struct Slices<'a> {
    len: u64,
    slices: Vec<SliceInfo<'a>>,
}
impl<'a> Slices<'a> {
    pub fn new() -> Slices<'a> { ... }
    pub fn append(&mut self, slice: &'a resource::Slice) { ... }
}
impl<'a> Slice for Slices<'a> { ... }
and use them to append lots and lots of slices with as few heap allocations as possible. Simplified, something like this:
struct ThingUsedWithinMp4Resource {
    slice_a: resource::LazySlice,
    slice_b: resource::LazySlice,
    slice_c: resource::LazySlice,
    slice_d: resource::FileSlice,
}
struct Mp4Resource {
    slice_a: resource::StringSlice,
    slice_b: resource::LazySlice,
    slice_c: resource::StringSlice,
    slice_d: resource::LazySlice,
    things: Vec<ThingUsedWithinMp4Resource>,
    slices: resource::Slices
}
impl Mp4Resource {
    fn new() {
        let mut f = Mp4Resource{slice_a: ...,
                                slice_b: ...,
                                slice_c: ...,
                                slices: resource::Slices::new()};
        // ...fill `things` with hundreds of things...
        slices.append(&f.slice_a);
        for thing in f.things { slices.append(&thing.slice_a); }
        slices.append(&f.slice_b);
        for thing in f.things { slices.append(&thing.slice_b); }
        slices.append(&f.slice_c);
        for thing in f.things { slices.append(&thing.slice_c); }
        slices.append(&f.slice_d);
        for thing in f.things { slices.append(&thing.slice_d); }
        f;
    }
}
but this isn't working. The append lines cause errors "f.slice_* does not live long enough", "reference must be valid for the lifetime 'a as defined on the block at ...", "...but borrowed value is only valid for the block suffix following statement". I think this is similar to this question about the self-referencing struct. That's basically what this is, with more indirection. And apparently it's impossible.
So what can I do instead?
I think I'd be happy to give ownership to the resource::Slices in append, but I can't put a resource::Slice in the SliceInfo used in Vec<SliceInfo> because resource::Slice is a trait, and traits are unsized. I could do a Box<resource::Slice> instead but that means a separate heap allocation for each slice. I'd like to avoid that. (There can be thousands of slices per Mp4Resource.)
I'm thinking of doing an enum, something like:
enum BasicSlice {
    String(StringSlice),
    Lazy(LazySlice),
    File(FileSlice)
}
and using that in the SliceInfo. I think I can make this work. But it definitely limits the utility of my resource::Slices class. I want to allow it to be used easily in situations I didn't anticipate, preferably without having to define a new enum each time.
Any other options?
You can add a User variant to your BasicSlice enum that holds a Box<dyn Slice>. This way only the specialized case of user-defined slices pays for the extra allocation, while the normal path stays optimized.
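A sketch of that approach with simplified, hypothetical types: ByteRange is assumed to be Range<u64>, StringSlice stands in for the question's concrete slices, and Zeros plays the role of a user-defined slice the library never anticipated. BasicSlice implements Slice by delegating to whichever variant it holds, so a Vec<BasicSlice> needs no per-element heap allocation except for User entries.

```rust
use std::io::{self, Write};
use std::ops::Range;

/// Simplified version of the question's trait (`ByteRange` assumed to be `Range<u64>`).
pub trait Slice: Send + Sync {
    fn len(&self) -> u64;
    fn write_to(&self, range: &Range<u64>, out: &mut dyn Write) -> io::Result<()>;
}

/// Hypothetical stand-in for the question's `StringSlice`.
pub struct StringSlice(pub String);

impl Slice for StringSlice {
    fn len(&self) -> u64 { self.0.len() as u64 }
    fn write_to(&self, range: &Range<u64>, out: &mut dyn Write) -> io::Result<()> {
        out.write_all(&self.0.as_bytes()[range.start as usize..range.end as usize])
    }
}

/// Common cases are stored inline; anything else pays for one Box.
pub enum BasicSlice {
    String(StringSlice),
    User(Box<dyn Slice>),
}

impl Slice for BasicSlice {
    fn len(&self) -> u64 {
        match self {
            BasicSlice::String(s) => s.len(),
            BasicSlice::User(s) => s.len(),
        }
    }
    fn write_to(&self, range: &Range<u64>, out: &mut dyn Write) -> io::Result<()> {
        match self {
            BasicSlice::String(s) => s.write_to(range, out),
            BasicSlice::User(s) => s.write_to(range, out),
        }
    }
}

/// Hypothetical user-defined slice: `n` zero bytes.
struct Zeros(u64);

impl Slice for Zeros {
    fn len(&self) -> u64 { self.0 }
    fn write_to(&self, range: &Range<u64>, out: &mut dyn Write) -> io::Result<()> {
        out.write_all(&vec![0u8; (range.end - range.start) as usize])
    }
}

fn main() {
    let slices = vec![
        BasicSlice::String(StringSlice("ab".into())), // no extra allocation for the enum
        BasicSlice::User(Box::new(Zeros(2))),          // one Box for the custom slice
    ];
    let mut out: Vec<u8> = Vec::new();
    for s in &slices {
        s.write_to(&(0..s.len()), &mut out).unwrap();
    }
    assert_eq!(out, b"ab\0\0");
    let total: u64 = slices.iter().map(|s| s.len()).sum();
    assert_eq!(total, 4);
}
```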

Resources