How can I specify an iterator over traits? - rust

I've been trying to make a websocket client, but one that has tons of options! I thought of using a builder style since the configuration can be stored in a nice way:
let client = Client::new()
.options(5)
.stuff(true)
// now users can store the config before calling build
.build();
I am having trouble creating a function that takes in a list of strings. Of course I have a few options:
fn strings(self, list: &[&str]) -> Self;
fn strings(self, list: Vec<String>) -> Self;
fn strings(self, list: &[&String]) -> Self;
// etc...
I would like to accept generously so I would like to accept &String, &str, and hopefully keys in a HashMap (since this might be used with a large routing table) so I thought I would accept an iterator over items that implement Borrow<str> like so:
fn strings<P, Sp>(self, P)
where P: Iterator<Item = &'p Sp>,
Sp: Borrow<str> + 'p;
A full example is available here.
This was great until I needed to add another optional list of strings (extensions) to the builder.
This meant that if I created a builder without specifying both lists of strings that the compiler would complain that it couldn't infer the type of the Builder, which makes sense. The only reason this is not OK is that both these fields are optional so the user might never know the type of a field it hasn't yet set.
Does anyone have any ideas on how to specify an iterator over traits? Then I wouldn't have to specify the type fully at compile time. Or maybe just a better way to do this entirely?

A pragmatic solution is to simply discard the concrete types of the types and introduce some indirection. We can Box the trait object and store that as a known type:
use std::borrow::Borrow;
struct Builder {
strings: Option<Box<Iterator<Item = Box<Borrow<str>>>>>,
}
impl Builder {
fn new() -> Self {
Builder { strings: None }
}
fn strings<I>(mut self, iter: I) -> Self
where I: IntoIterator + 'static,
I::Item: Borrow<str> + 'static,
{
let i = iter.into_iter().map(|x| Box::new(x) as Box<Borrow<str>>);
self.strings = Some(Box::new(i));
self
}
fn build(self) -> String {
match self.strings {
Some(iter) => {
let mut s = String::new();
for i in iter {
s.push_str((*i).borrow());
}
s
},
None => format!("No strings here!"),
}
}
}
fn main() {
let s =
Builder::new()
.strings(vec!["a", "b"])
.build();
println!("{}", s);
}
Here we convert the input iterator to a boxed iterator of boxed things that implement Borrow. We have to do some gyrations to convert the specific concrete type we have into a conceptually higher level type but that is still concrete.
This remainder doesn't directly answer your question about an iterator of traits, but it provides an alternate solution that I would use.
You have to pick between that might be a bit more optimal and have a worse user experience, or something that might be a bit suboptimal but a nicer user experience.
You are currently storing the iterator in the builder struct:
struct Builder
where I: Iterator
{
things: Option<I>,
}
This requires that the concrete type of I be known in order to instantiate a Builder. Specifically, the size of that type needs to be known in order to allocate enough space. There's nothing around this; if you want to store a generic type, you need to know what type it is.
For the same reasons, you cannot have this standalone statement:
let foo = None;
How much space needs to be allocated for foo? You cannot know until you know what type the Some might hold.
The way I would go would be to not add type parameters for the struct, but have them on the function. This means that the struct has to have a fixed type to store the values. In your example, a String is a good fit:
struct Builder {
strings: Vec<String>,
}
impl Builder {
fn strings<I>(mut self, iter: I) -> Self
where I: IntoIterator,
I::Item: Into<String>,
{
self.strings.extend(iter.into_iter().map(Into::into));
self
}
}
A Vec has very compact storage (it only takes 3 machine-sized values), and doesn't allocate any heap memory when it is empty. For that reason, I wouldn't wrap it in an Option unless you needed to tell 0 items from the absence of a provided value.
If you are just appending each value to one big string, you might as well do that in the strings method. That depends on your application.
You mention that you might be providing a large amount of data, but I'm not sure that holding the iterator until the build call will really help. You are going to pay the cost earlier or later.
If you are going to reuse the builder, then it depends on what is expensive. If iterating is expensive, then doing it once and reusing that for each build call will be more efficient. If holding onto the memory is expensive, then you don't want to have multiple builders or built items around concurrently. Since the builder will transfer ownership of the memory to the new item, there shouldn't be any waste here.

Related

Is there a way to reference an anonymous type as a struct member?

Consider the following Rust code:
use std::future::Future;
use std::pin::Pin;
fn main() {
let mut v: Vec<_> = Vec::new();
for _ in 1..10 {
v.push(wrap_future(Box::pin(async {})));
}
}
fn wrap_future<T>(a: Pin<Box<dyn Future<Output=T>>>) -> impl Future<Output=T> {
async {
println!("doing stuff before awaiting");
let result=a.await;
println!("doing stuff after awaiting");
result
}
}
As you can see, the futures I'm putting into the Vec don't need to be boxed, since they are all the same type and the compiler can infer what that type is.
I would like to create a struct that has this Vec<...> type as one of its members, so that I could add a line at the end of main():
let thing = MyStruct {myvec: v};
without any additional overhead (i.e. boxing).
Type inference and impl Trait syntax aren't allowed on struct members, and since the future type returned by an async block exists entirely within the compiler and is exclusive to that exact async block, there's no way to reference it by name. It seems to me that what I want to do is impossible. Is it? If so, will it become possible in a future version of Rust?
I am aware that it would be easy to sidestep this problem by simply boxing all the futures in the Vec as I did the argument to wrap_future() but I'd prefer not to do this if I can avoid it.
I am well aware that doing this would mean that there could be only one async block in my entire codebase whose result values could possibly be added to such a Vec, and thus that there could be only one function in my entire codebase that could create values that could possibly be pushed to it. I am okay with this limitation.
Nevermind, I'm stupid. I forgot that structs could have type parameters.
struct MyStruct<F> where F: Future<Output=()> {
myvec: Vec<F>,
}

Rust mutable container of immutable elements?

With Rust, is it in general possible to have a mutable container of immutable values?
Example:
struct TestStruct { value: i32 }
fn test_fn()
{
let immutable_instance = TestStruct{value: 123};
let immutable_box = Box::new(immutable_instance);
let mut mutable_vector = vec!(immutable_box);
mutable_vector[0].value = 456;
}
Here, my TestStruct instance is wrapped in two containers: a Box, then a Vec. From the perspective of a new Rust user it's surprising that moving the Box into the Vec makes both the Box and the TestStruct instance mutable.
Is there a similar construct whereby the boxed value is immutable, but the container of boxes is mutable? More generally, is it possible to have multiple "layers" of containers without the whole tree being either mutable or immutable?
Is there a similar construct whereby the boxed value is immutable, but the container of boxes is mutable? More generally, is it possible to have multiple "layers" of containers without the whole tree being either mutable or immutable?
Not really. You could easily create one (just create a wrapper object which implements Deref but not DerefMut), but the reality is that Rust doesn't really see (im)mutability that way, because its main concern is controlling sharing / visibility.
After all, for an external observer what difference is there between
mutable_vector[0].value = 456;
and
mutable_vector[0] = Box::new(TestStruct{value: 456});
?
None is the answer, because Rust's ownership system means it's not possible for an observer to have kept a handle on the original TestStruct, thus they can't know whether that structure was replaced or modified in place[1][2].
If you want to secure your internal state, use visibility instead: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8a9346072b32cedcf2fccc0eeb9f55c5
mod foo {
pub struct TestStruct { value: i32 }
impl TestStruct {
pub fn new(value: i32) -> Self { Self { value } }
}
}
fn test_fn() {
let immutable_instance = foo::TestStruct{value: 123};
let immutable_box = Box::new(immutable_instance);
let mut mutable_vector = vec!(immutable_box);
mutable_vector[0].value = 456;
}
does not compile because from the point of view of test_fn, TestStruct::value is not accessible. Therefore test_fn has no way to mutate a TestStruct unless you add an &mut method on it.
[1]: technically they could check the address in memory and that might tell them, but even then it's not a sure thing (in either direction) hence pinning being a thing.
[2]: this observability distinction is also embraced by other languages, for instance the Clojure language largely falls on the "immutable all the things" side, however it has a concept of transients which allow locally mutable objects

Adapt a lending iterator type to a serde-style map visitor

I'm trying to adapt the data model I designed to a serde MapAccess interface. I have my own data model, which has a List type containing key-value pairs, which are exposed like this:
impl List {
fn next_item(&mut self) -> Option<(String, Item<'_>)>
}
In other words, the Item type mutably borrows from a List object, and must be fully processed before the next Item can be extracted from the List.
I need to adapt this data model to a deserializer trait that looks like this:
trait MapAccess {
fn next_key<T: Deserialize>(&mut self) -> Option<T>;
fn next_value<T: Deserialize>(&mut self) -> T;
}
The two methods should be called in alternating order; they're allowed to panic or return garbage results if you don't call them in the correct order. This trait is provided by the deserializer library I'm using, and is the one only of this problem I can't control or modify.
I'm trying to adapt a List or an &mut List into a MapAccess and I'm running into challenges. I do have a function which converts an Item<'_> to a T: Deserialize (and specifically ends the borrow of List):
fn deserialize_item<T: Deserialize>(item: Item<'_>) -> T;
The basic problem I'm running into is that MapAccess::next_key calls List::next_item and then deserializes the String as a key, and then needs to store the Item<'_> so that it can handled by next_value. I keep running into cases with types that are almost self referential (two fields that both mutably borrow the same thing). It's nothing truely self-referential so I'm not (yet) convinced that this is impossible. The closest I've come is some kind of enum that that resembles enum State {List(&'a mut List), Item(Item<'_>)} (because I don't need to access a List while I'm working on the Item, but I haven't found a way to restore access to the List for the next next_key after the Item is consumed.
As the author of the data model, I do have some leeway to change how it's designed, but if possible I'd like to hold on to some kind of borrowing model, which I use to enforce parse consistency (that is, the borrow prevents List from parsing the next item until after the Item has fully parsed the content of the current item.). I can of course design any additional types to implement this correctly.
Note: This is all serde related, but I'm using original types to simplify the interfaces. In particular I've omitted all the of the Result types and 'de lifetimes as I'm confident they're not relevant to this problem.
EDIT: Here's my attempt to make something that works. This doesn't compile, but it hopefully gets across what I'd like to do here.
fn deserialize_string<T: Deserialize>(value: String) -> T { todo!() }
fn deserialize_item<T: Deserialize>(item: Item<'_>) -> T { todo!() }
struct ListMapAccess<'a> {
list: &'a mut List,
item: Option<Item<'a>>,
}
impl<'a> MapAccess for ListMapAccess<'a> {
fn next_key<T: Deserialize>(&mut self) -> Option<T> {
let (key, item) = self.list.next_item()?;
self.item = Some(item);
Some(deserialize_string(key))
}
fn next_value<T: Deserialize>(&mut self) -> T {
let item = self.item.expect("Called next_item out of order");
deserialize_item(item)
}
}
Now, I'm really not at all tied to a ListMapAccess struct; I'm happy with any solution that integrates my List / Item interface into MapAccess.

How to create a Box<UnsafeCell<[T]>>

The recommended way to create a regular boxed slice (i.e. Box<[T]>) seems to be to first create a std::Vec<T>, and use .into_boxed_slice(). However, nothing similar to this seems to work if I want the slice to be wrapped in UnsafeCell.
A solution with unsafe code is fine, but I'd really like to avoid having to manually manage the memory.
The only (not-unsafe) way to create a Box<[T]> is via Box::from, given a &[T] as the parameter. This is because [T] is ?Sized and can't be passed a parameter. This in turn effectively requires T: Copy, because T has to be copied from behind the reference into the new Box. But UnsafeCell is not Copy, regardless if T is. Discussion about making UnsafeCell Copy has been going on for years, yielding no final conclusion, due to safety concerns.
If you really, really want a Box<UnsafeCell<[T]>>, there are only two ways:
Because Box and UnsafeCell are both CoerceUnsize, and [T; N] is Unsize, you can create a Box<UnsafeCell<[T; N]>> and coerce it to a Box<UnsafeCell<[T]>. This limits you to initializing from fixed-sized arrays.
Unsize coercion:
fn main() {
use std::cell::UnsafeCell;
let x: [u8;3] = [1,2,3];
let c: Box<UnsafeCell<[_]>> = Box::new(UnsafeCell::new(x));
}
Because UnsafeCell is #[repr(transparent)], you can create a Box<[T]> and unsafely mutate it to a Box<UnsafeCell<[T]>, as the UnsafeCell<[T]> is guaranteed to have the same memory layout as a [T], given that [T] doesn't use niche-values (even if T does).
Transmute:
// enclose the transmute in a function accepting and returning proper type-pairs
fn into_boxed_unsafecell<T>(inp: Box<[T]>) -> Box<UnsafeCell<[T]>> {
unsafe {
mem::transmute(inp)
}
}
fn main() {
let x = vec![1,2,3];
let b = x.into_boxed_slice();
let c: Box<UnsafeCell<[_]>> = into_boxed_unsafecell(b);
}
Having said all this: I strongly suggest you are suffering from the xy-problem. A Box<UnsafeCell<[T]>> is a very strange type (especially compared to UnsafeCell<Box<[T]>>). You may want to give details on what you are trying to accomplish with such a type.
Just swap the pointer types to UnsafeCell<Box<[T]>>:
use std::cell::UnsafeCell;
fn main() {
let mut res: UnsafeCell<Box<[u32]>> = UnsafeCell::new(vec![1, 2, 3, 4, 5].into_boxed_slice());
unsafe {
println!("{}", (*res.get())[1]);
res.get_mut()[1] = 10;
println!("{}", (*res.get())[1]);
}
}
Playground

Shared ownership of an str between a HashMap and a Vec

I come from a Java/C#/JavaScript background and I am trying to implement a Dictionary that would assign each passed string an id that never changes. The dictionary should be able to return a string by the specified id. This allows to store some data that has a lot of repetitive strings far more efficiently in the file system because only the ids of strings would be stored instead of entire strings.
I thought that a struct with a HashMap and a Vec would do but it turned out to be more complicated than that.
I started with the usage of &str as a key for HashMap and an item of Vec like in the following sample. The value of HashMap serves as an index into Vec.
pub struct Dictionary<'a> {
values_map: HashMap<&'a str, u32>,
keys_map: Vec<&'a str>
}
impl<'a> Dictionary<'a> {
pub fn put_and_get_key(&mut self, value: &'a str) -> u32 {
match self.values_map.get_mut(value) {
None => {
let id_usize = self.keys_map.len();
let id = id_usize as u32;
self.keys_map.push(value);
self.values_map.insert(value, id);
id
},
Some(&mut id) => id
}
}
}
This works just fine until it turns out that the strs need to be stored somewhere, preferably in this same struct as well. I tried to store a Box<str> in the Vec and &'a str in the HashMap.
pub struct Dictionary<'a> {
values_map: HashMap<&'a str, u32>,
keys_map: Vec<Box<str>>
}
The borrow checker did not allow this of course because it would have allowed a dangling pointer in the HashMap when an item is removed from the Vec (or in fact sometimes when another item is added to the Vec but this is an off-topic here).
I understood that I either need to write unsafe code or use some form of shared ownership, the simplest kind of which seems to be an Rc. The usage of Rc<Box<str>> looks like introducing double indirection but there seems to be no simple way to construct an Rc<str> at the moment.
pub struct Dictionary {
values_map: HashMap<Rc<Box<str>>, u32>,
keys_map: Vec<Rc<Box<str>>>
}
impl Dictionary {
pub fn put_and_get_key(&mut self, value: &str) -> u32 {
match self.values_map.get_mut(value) {
None => {
let id_usize = self.keys_map.len();
let id = id_usize as u32;
let value_to_store = Rc::new(value.to_owned().into_boxed_str());
self.keys_map.push(value_to_store);
self.values_map.insert(value_to_store, id);
id
},
Some(&mut id) => id
}
}
}
Everything seems fine with regard to ownership semantics, but the code above does not compile because the HashMap now expects an Rc, not an &str:
error[E0277]: the trait bound `std::rc::Rc<Box<str>>: std::borrow::Borrow<str>` is not satisfied
--> src/file_structure/sample_dictionary.rs:14:31
|
14 | match self.values_map.get_mut(value) {
| ^^^^^^^ the trait `std::borrow::Borrow<str>` is not implemented for `std::rc::Rc<Box<str>>`
|
= help: the following implementations were found:
= help: <std::rc::Rc<T> as std::borrow::Borrow<T>>
Questions:
Is there a way to construct an Rc<str>?
Which other structures, methods or approaches could help to resolve this problem. Essentially, I need a way to efficiently store two maps string-by-id and id-by-string and be able to retrieve an id by &str, i.e. without any excessive allocations.
Is there a way to construct an Rc<str>?
Annoyingly, not that I know of. Rc::new requires a Sized argument, and I am not sure whether it is an actual limitation, or just something which was forgotten.
Which other structures, methods or approaches could help to resolve this problem?
If you look at the signature of get you'll notice:
fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where K: Borrow<Q>, Q: Hash + Eq
As a result, you could search by &str if K implements Borrow<str>.
String implements Borrow<str>, so the simplest solution is to simply use String as a key. Sure it means you'll actually have two String instead of one... but it's simple. Certainly, a String is simpler to use than a Box<str> (although it uses 8 more bytes).
If you want to shave off this cost, you can use a custom structure instead:
#[derive(Clone, Debug)]
struct RcStr(Rc<String>);
And then implement Borrow<str> for it. You'll then have 2 allocations per key (1 for Rc and 1 for String). Depending on the size of your String, it might consume less or more memory.
If you wish to got further (why not?), here are some ideas:
implement your own reference-counted string, in a single heap-allocation,
use a single arena for the slice inserted in the Dictionary,
...

Resources