Pass an iterable to a function and iterate twice in rust - rust

I have a function which looks like
fn do_stuff(values: HashSet<String>) {
// Count stuff
for s in values.iter() {
prepare(s);
}
// Process stuff
for s in values.iter() {
process(s);
}
}
This works fine. For a unit test, I want to pass a two value collection where the elements are passed in a known order. (Processing them in the other order won't test the case I am trying to test.) HashSet doesn't guarantee an order, so I would like to pass a Vec instead.
I would like to change the argument to Iterable, but it appears that only IntoIter exists. I tried
fn do_stuff<C>(values: C)
where C: IntoIterator<Item=String>
{
// Count stuff
for s in values {
prepare(s);
}
// Process stuff
for s in values {
process(s);
}
}
which fails because the first iteration consumes values. The compiler suggests borrowing values, but
fn do_stuff<C>(values: C)
where C: IntoIterator<Item=String>
{
// Count stuff
for s in &values {
prepare(s);
}
// Process stuff
for s in values {
process(s);
}
}
fails because
the trait Iterator is not implemented for &C
I could probably make something with clone work, but the actual set will be large and I would like to avoid copying it if possible.
Thinking about that, the signature probably should be do_stuff(values: &C), so if that makes the problem simpler, then that is an acceptable solution.
SO suggests Writing a generic function that takes an iterable container as parameter in Rust as a related question, but that is a lifetime problem. I am not having problems with lifetimes.
It looks like How to create an `Iterable` trait for references in Rust? may actually be the solution. But I'm having trouble getting it to compile.
My first attempt is
pub trait Iterable {
type Item;
type Iter: Iterator<Item = Self::Item>;
fn iterator(&self) -> Self::Iter;
}
impl Iterable for HashSet<String> {
type Item = String;
type Iter = HashSet<String>::Iterator;
fn iterator(&self) -> Self::Iter {
self.iter()
}
}
which fails with
error[E0223]: ambiguous associated type
--> src/file.rs:178:17
|
178 | type Iter = HashSet<String>::Iterator;
| ^^^^^^^^^^^^^^^^^^^^^^^^^ help: use fully-qualified syntax: `<HashSet<std::string::String> as Trait>::Iterator`
Following that suggestion:
impl Iterable for HashSet<String> {
type Item = String;
type Iter = <HashSet<std::string::String> as Trait>::Iterator;
fn iterator(&self) -> Self::Iter {
self.iter()
}
}
failed with
error[E0433]: failed to resolve: use of undeclared type `Trait`
--> src/file.rs:178:50
|
178 | type Iter = <HashSet<std::string::String> as Trait>::Iterator;
| ^^^^^ use of undeclared type `Trait`
The rust documents don't seem to include Trait as a known type. If I replace Trait with HashSet, it doesn't recognize Iterator or IntoIter as the final value in the expression.
Implementation of accepted answer
Attempting to implement #eggyal answer, I was able to get this to compile
use std::collections::HashSet;
fn do_stuff<I>(iterable: I)
where
I: IntoIterator + Copy,
I::Item: AsRef<str>,
{
// Count stuff
for s in iterable {
prepare(s.as_ref());
}
// Process stuff
for s in iterable {
process(s.as_ref());
}
}
fn prepare(s: &str) {
println!("prepare: {}", s)
}
fn process(s: &str) {
println!("process: {}", s)
}
#[cfg(test)]
mod test_cluster {
use super::*;
#[test]
fn doit() {
let vec: Vec<String> = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let set = vec.iter().cloned().collect::<HashSet<_>>();
do_stuff(&vec);
do_stuff(&set);
}
}
which had this output
---- simple::test_cluster::doit stdout ----
prepare: a
prepare: b
prepare: c
process: a
process: b
process: c
prepare: c
prepare: b
prepare: a
process: c
process: b
process: a

IntoIterator is not only implemented by the collection types themselves, but in most cases (including Vec and HashSet) it is also implemented by their borrows (yielding an iterator of borrowed items). Moreover, immutable borrows are always Copy. So you can do:
fn do_stuff<I>(iterable: I)
where
I: IntoIterator + Copy,
I::Item: AsRef<str>,
{
// Count stuff
for s in iterable {
prepare(s);
}
// Process stuff
for s in iterable {
process(s);
}
}
And this would then be invoked by passing in a borrow of the relevant collection:
let vec = vec!["a", "b", "c"];
let set = vec.iter().cloned().collect::<HashSet<_>>();
do_stuff(&vec);
do_stuff(&set);
Playground.
However, depending on your requirements (whether all items must first be prepared before any can be processed), it may be possible in this case to combine the preparation and processing into a single pass of the iterator.

Iterators over containers can be cloned if you want to iterate the container twice, so accepting an IntoIterator + Clone should work for you. Example code:
fn do_stuff<I>(values: I)
where
I: IntoIterator + Clone,
{
// Count stuff
for s in values.clone() {
prepare(s);
}
// Process stuff
for s in values {
process(s);
}
}
You can now pass in e.g. either a hash set or a vector, and both of them can be iterated twice:
let vec = vec!["a", "b", "c"];
let set: HashSet<_> = vec.iter().cloned().collect();
do_stuff(vec);
do_stuff(set);
(Playground)

Related

Returning iterator from weak references for mapping and modifying values

I'm trying quite complex stuff with Rust where I need the following attributes, and am fighting the compiler.
Object which itself lives from start to finish of application, however, where internal maps/vectors could be modified during application lifetime
Multiple references to object that can read internal maps/vectors of an object
All single threaded
Multiple nested iterators which are map/modified in lazy manner to perform fast and complex calculations (see example below)
A small example, which already causes problems:
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::Weak;
pub struct Holder {
array_ref: Weak<RefCell<Vec<isize>>>,
}
impl Holder {
pub fn new(array_ref: Weak<RefCell<Vec<isize>>>) -> Self {
Self { array_ref }
}
fn get_iterator(&self) -> impl Iterator<Item = f64> + '_ {
self.array_ref
.upgrade()
.unwrap()
.borrow()
.iter()
.map(|value| *value as f64 * 2.0)
}
}
get_iterator is just one of the implementations of a trait, but even this example already does not work.
The reason for Weak/Rc is to make sure that multiple places points to object (from point (1)) and other place can modify its internals (Vec<isize>).
What is the best way to approach this situation, given that end goal is performance critical?
EDIT:
Person suggested using https://doc.rust-lang.org/std/cell/struct.Ref.html#method.map
But unfortunately still can't get - if I should also change return type - or maybe the closure function is wrong here
fn get_iterator(&self) -> impl Iterator<Item=f64> + '_ {
let x = self.array_ref.upgrade().unwrap().borrow();
let map1 = Ref::map(x, |x| &x.iter());
let map2 = Ref::map(map1, |iter| &iter.map(|y| *y as f64 * 2.0));
map2
}
IDEA say it has wrong return type
the trait `Iterator` is not implemented for `Ref<'_, Map<std::slice::Iter<'_, isize>, [closure#src/bin/main.rs:30:46: 30:65]>>`
This won't work because self.array_ref.upgrade() creates a local temporary Arc value, but the Ref only borrows from it. Obviously, you can't return a value that borrows from a local.
To make this work you need a second structure to own the Arc, which can implement Iterator in this case since the produced items aren't references:
pub struct HolderIterator(Arc<RefCell<Vec<isize>>>, usize);
impl Iterator for HolderIterator {
type Item = f64;
fn next(&mut self) -> Option<f64> {
let r = self.0.borrow().get(self.1)
.map(|&y| y as f64 * 2.0);
if r.is_some() {
self.1 += 1;
}
r
}
}
// ...
impl Holder {
// ...
fn get_iterator<'a>(&'a self) -> Option<impl Iterator<Item=f64>> {
self.array_ref.upgrade().map(|rc| HolderIterator(rc, 0))
}
}
Alternatively, if you want the iterator to also weakly-reference the value contained within, you can have it hold a Weak instead and upgrade on each next() call. There are performance implications, but this also makes it easier to have get_iterator() be able to return an iterator directly instead of an Option, and the iterator written so that a failed upgrade means the sequence has ended:
pub struct HolderIterator(Weak<RefCell<Vec<isize>>>, usize);
impl Iterator for HolderIterator {
type Item = f64;
fn next(&mut self) -> Option<f64> {
let r = self.0.upgrade()?
.borrow()
.get(self.1)
.map(|&y| y as f64 * 2.0);
if r.is_some() {
self.1 += 1;
}
r
}
}
// ...
impl Holder {
// ...
fn get_iterator<'a>(&'a self) -> impl Iterator<Item=f64> {
HolderIterator(Weak::clone(&self.array_ref), 0)
}
}
This will make it so that you always get an iterator, but it's empty if the Weak is dead. The Weak can also die during iteration, at which point the sequence will abruptly end.

Multiple Mutable Borrows from Struct Hashmap

Running into an ownership issue when attempting to reference multiple values from a HashMap in a struct as parameters in a function call. Here is a PoC of the issue.
use std::collections::HashMap;
struct Resource {
map: HashMap<String, String>,
}
impl Resource {
pub fn new() -> Self {
Resource {
map: HashMap::new(),
}
}
pub fn load(&mut self, key: String) -> &mut String {
self.map.get_mut(&key).unwrap()
}
}
fn main() {
// Initialize struct containing a HashMap.
let mut res = Resource {
map: HashMap::new(),
};
res.map.insert("Item1".to_string(), "Value1".to_string());
res.map.insert("Item2".to_string(), "Value2".to_string());
// This compiles and runs.
let mut value1 = res.load("Item1".to_string());
single_parameter(value1);
let mut value2 = res.load("Item2".to_string());
single_parameter(value2);
// This has ownership issues.
// multi_parameter(value1, value2);
}
fn single_parameter(value: &String) {
println!("{}", *value);
}
fn multi_parameter(value1: &mut String, value2: &mut String) {
println!("{}", *value1);
println!("{}", *value2);
}
Uncommenting multi_parameter results in the following error:
28 | let mut value1 = res.load("Item1".to_string());
| --- first mutable borrow occurs here
29 | single_parameter(value1);
30 | let mut value2 = res.load("Item2".to_string());
| ^^^ second mutable borrow occurs here
...
34 | multi_parameter(value1, value2);
| ------ first borrow later used here
It would technically be possible for me to break up the function calls (using the single_parameter function approach), but it would be more convenient to pass the
variables to a single function call.
For additional context, the actual program where I'm encountering this issue is an SDL2 game where I'm attempting to pass multiple textures into a single function call to be drawn, where the texture data may be modified within the function.
This is currently not possible, without resorting to unsafe code or interior mutability at least. There is no way for the compiler to know if two calls to load will yield mutable references to different data as it cannot always infer the value of the key. In theory, mutably borrowing both res.map["Item1"] and res.map["Item2"] would be fine as they would refer to different values in the map, but there is no way for the compiler to know this at compile time.
The easiest way to do this, as already mentioned, is to use a structure that allows interior mutability, like RefCell, which typically enforces the memory safety rules at run-time before returning a borrow of the wrapped value. You can also work around the borrow checker in this case by dealing with mut pointers in unsafe code:
pub fn load_many<'a, const N: usize>(&'a mut self, keys: [&str; N]) -> [&'a mut String; N] {
// TODO: Assert that keys are distinct, so that we don't return
// multiple references to the same value
keys.map(|key| self.load(key) as *mut _)
.map(|ptr| unsafe { &mut *ptr })
}
Rust Playground
The TODO is important, as this assertion is the only way to ensure that the safety invariant of only having one mutable reference to any value at any time is upheld.
It is, however, almost always better (and easier) to use a known safe interior mutation abstraction like RefCell rather than writing your own unsafe code.

How can you use an immutable Option by reference that contains a mutable reference?

Here's a Thing:
struct Thing(i32);
impl Thing {
pub fn increment_self(&mut self) {
self.0 += 1;
println!("incremented: {}", self.0);
}
}
And here's a function that tries to mutate a Thing and returns either true or false, depending on if a Thing is available:
fn try_increment(handle: Option<&mut Thing>) -> bool {
if let Some(t) = handle {
t.increment_self();
true
} else {
println!("warning: increment failed");
false
}
}
Here's a sample of usage:
fn main() {
try_increment(None);
let mut thing = Thing(0);
try_increment(Some(&mut thing));
try_increment(Some(&mut thing));
try_increment(None);
}
As written, above, it works just fine (link to Rust playground). Output below:
warning: increment failed
incremented: 1
incremented: 2
warning: increment failed
The problem arises when I want to write a function that mutates the Thing twice. For example, the following does not work:
fn try_increment_twice(handle: Option<&mut Thing>) {
try_increment(handle);
try_increment(handle);
}
fn main() {
try_increment_twice(None);
let mut thing = Thing(0);
try_increment_twice(Some(&mut thing));
try_increment_twice(None);
}
The error makes perfect sense. The first call to try_increment(handle) gives ownership of handle away and so the second call is illegal. As is often the case, the Rust compiler yields a sensible error message:
|
24 | try_increment(handle);
| ------ value moved here
25 | try_increment(handle);
| ^^^^^^ value used here after move
|
In an attempt to solve this, I thought it would make sense to pass handle by reference. It should be an immutable reference, mind, because I don't want try_increment to be able to change handle itself (assigning None to it, for example) only to be able to call mutations on its value.
My problem is that I couldn't figure out how to do this.
Here is the closest working version that I could get:
struct Thing(i32);
impl Thing {
pub fn increment_self(&mut self) {
self.0 += 1;
println!("incremented: {}", self.0);
}
}
fn try_increment(handle: &mut Option<&mut Thing>) -> bool {
// PROBLEM: this line is allowed!
// (*handle) = None;
if let Some(ref mut t) = handle {
t.increment_self();
true
} else {
println!("warning: increment failed");
false
}
}
fn try_increment_twice(mut handle: Option<&mut Thing>) {
try_increment(&mut handle);
try_increment(&mut handle);
}
fn main() {
try_increment_twice(None);
let mut thing = Thing(0);
try_increment_twice(Some(&mut thing));
try_increment_twice(None);
}
This code runs, as expected, but the Option is now passed about by mutable reference and that is not what I want:
I'm allowed to mutate the Option by reassigning None to it, breaking all following mutations. (Uncomment line 12 ((*handle) = None;) for example.)
It's messy: There are a whole lot of extraneous &mut's lying about.
It's doubly messy: Heaven only knows why I must use ref mut in the if let statement while the convention is to use &mut everywhere else.
It defeats the purpose of having the complicated borrow-checking and mutability checking rules in the compiler.
Is there any way to actually achieve what I want: passing an immutable Option around, by reference, and actually being able to use its contents?
You can't extract a mutable reference from an immutable one, even a reference to its internals. That's kind of the point! Multiple aliases of immutable references are allowed so, if Rust allowed you to do that, you could have a situation where two pieces of code are able to mutate the same data at the same time.
Rust provides several escape hatches for interior mutability, for example the RefCell:
use std::cell::RefCell;
fn try_increment(handle: &Option<RefCell<Thing>>) -> bool {
if let Some(t) = handle {
t.borrow_mut().increment_self();
true
} else {
println!("warning: increment failed");
false
}
}
fn try_increment_twice(handle: Option<RefCell<Thing>>) {
try_increment(&handle);
try_increment(&handle);
}
fn main() {
let mut thing = RefCell::new(Thing(0));
try_increment_twice(Some(thing));
try_increment_twice(None);
}
TL;DR: The answer is No, I can't.
After the discussions with #Peter Hall and #Stargateur, I have come to understand why I need to use &mut Option<&mut Thing> everywhere. RefCell<> would also be a feasible work-around but it is no neater and does not really achieve the pattern I was originally seeking to implement.
The problem is this: if one were allowed to mutate the object for which one has only an immutable reference to an Option<&mut T> one could use this power to break the borrowing rules entirely. Concretely, you could, essentially, have many mutable references to the same object because you could have many such immutable references.
I knew there was only one mutable reference to the Thing (owned by the Option<>) but, as soon as I started taking references to the Option<>, the compiler no longer knew that there weren't many of those.
The best version of the pattern is as follows:
fn try_increment(handle: &mut Option<&mut Thing>) -> bool {
if let Some(ref mut t) = handle {
t.increment_self();
true
}
else {
println!("warning: increment failed");
false
}
}
fn try_increment_twice(mut handle: Option<&mut Thing>) {
try_increment(&mut handle);
try_increment(&mut handle);
}
fn main() {
try_increment_twice(None);
let mut thing = Thing(0);
try_increment_twice(Some(&mut thing));
try_increment_twice(None);
}
Notes:
The Option<> holds the only extant mutable reference to the Thing
try_increment_twice() takes ownership of the Option<>
try_increment() must take the Option<> as &mut so that the compiler knows that it has the only mutable reference to the Option<>, during the call
If the compiler knows that try_increment() has the only mutable reference to the Option<> which holds the unique mutable reference to the Thing, the compiler knows that the borrow rules have not been violated.
Another Experiment
The problem of the mutability of Option<> remains because one can call take() et al. on a mutable Option<>, breaking everything following.
To implement the pattern that I wanted, I need something that is like an Option<> but, even if it is mutable, it cannot be mutated. Something like this:
struct Handle<'a> {
value: Option<&'a mut Thing>,
}
impl<'a> Handle<'a> {
fn new(value: &'a mut Thing) -> Self {
Self {
value: Some(value),
}
}
fn empty() -> Self {
Self {
value: None,
}
}
fn try_mutate<T, F: Fn(&mut Thing) -> T>(&mut self, mutation: F) -> Option<T> {
if let Some(ref mut v) = self.value {
Some(mutation(v))
}
else {
None
}
}
}
Now, I thought, I can pass around &mut Handle's all day long and know that someone who has a Handle can only mutate its contents, not the handle itself. (See Playground)
Unfortunately, even this gains nothing because, if you have a mutable reference, you can always reassign it with the dereferencing operator:
fn try_increment(handle: &mut Handle) -> bool {
if let Some(_) = handle.try_mutate(|t| { t.increment_self() }) {
// This breaks future calls:
(*handle) = Handle::empty();
true
}
else {
println!("warning: increment failed");
false
}
}
Which is all fine and well.
Bottom line conclusion: just use &mut Option<&mut T>

Iterating over named regex groups in Rust

I wish to extract all named groups from a match into a HashMap and I'm running into a "does not live long enough" error while trying to compile this code:
extern crate regex;
use std::collections::HashMap;
use regex::Regex;
pub struct Route {
regex: Regex,
}
pub struct Router<'a> {
pub namespace_seperator: &'a str,
routes: Vec<Route>,
}
impl<'a> Router<'a> {
// ...
pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
for route in &self.routes {
if route.regex.is_match(path) {
let mut hash = HashMap::new();
for cap in route.regex.captures_iter(path) {
for (name, value) in cap.iter_named() {
hash.insert(name, value.unwrap());
}
}
return Some(hash);
}
}
None
}
}
fn main() {}
Here's the error output:
error: `cap` does not live long enough
--> src/main.rs:23:42
|>
23 |> for (name, value) in cap.iter_named() {
|> ^^^
note: reference must be valid for the anonymous lifetime #1 defined on the block at 18:79...
--> src/main.rs:18:80
|>
18 |> pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
|> ^
note: ...but borrowed value is only valid for the for at 22:16
--> src/main.rs:22:17
|>
22 |> for cap in route.regex.captures_iter(path) {
|> ^
Obviously I still have a thing or two to learn about Rust lifetimes.
Let's follow the lifetime lines:
route.regex.captures_iter(path) creates a FindCapture<'r, 't> where the lifetime 'r is that of route.regex and the lifetime 't is that of path
this iterator yields a Captures<'t>, only linked to the lifetime of path
whose method iter_named(&'t self) yields a SubCapture<'t> itself linked to the lifetime of path and the lifetime of the cap
this iterator yields a (&'t str, Option<&'t str>) so that both keys and values of the HashMap are linked to the lifetime of path and the lifetime of the cap
Therefore, it is unfortunately impossible to have the HashMap outlive the cap variable as this variable is used by the code as a "marker" to keep the buffers containing the groups alive.
I am afraid that the only solution without significant re-structuring is to return a HashMap<String, String>, as unsatisfying as it is. It also occurs to me that a single capture group may match multiple times, not sure if you want to bother with this.
Matthieu M. already explained the lifetime situation well. The good news is that the regex crate recognized the problem and there's a fix in the pipeline for 1.0.
As stated in the commit message:
It was always possible to work around this by using indices.
It is also possible to work around this by using Regex::capture_names, although it's a bit more nested this way:
pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
for route in &self.routes {
if let Some(captures) = route.regex.captures(path) {
let mut hash = HashMap::new();
for name in route.regex.capture_names() {
if let Some(name) = name {
if let Some(value) = captures.name(name) {
hash.insert(name, value);
}
}
}
return Some(hash);
}
}
None
}
Note that I also removed the outer is_match — it's inefficient to run the regex once and then again.

What's the fastest idiomatic way to mutate multiple struct fields at the same time?

Many libraries allow you to define a type which implements a given trait to be used as a callback handler. This requires you to lump all of the data you'll need to handle the event together in a single data type, which complicates borrows.
For instance, mio allows you to implement Handler and provide your struct when you run the EventLoop. Consider an example with these trivialized data types:
struct A {
pub b: Option<B>
};
struct B;
struct MyHandlerType {
pub map: BTreeMap<Token, A>,
pub pool: Pool<B>
}
Your handler has a map from Token to items of type A. Each item of type A may or may not already have an associated value of type B. In the handler, you want to look up the A value for a given Token and, if it doesn't already have a B value, get one out of the handler's Pool<B>.
impl Handler for MyHandlerType {
fn ready(&mut self, event_loop: &mut EventLoop<MyHandlerType>,
token: Token, events: EventSet) {
let a : &mut A = self.map.get_mut(token).unwrap();
let b : B = a.b.take().or_else(|| self.pool.new()).unwrap();
// Continue working with `a` and `b`
// ...
}
}
In this arrangement, even though it's intuitively possible to see that self.map and self.pool are distinct entities, the borrow checker complains that self is already borrowed (via self.map) when we go to access self.pool.
One possible approach to this would be to wrap each field in MyHandlerType in Option<>. Then, at the start of the method call, take() those values out of self and restore them at the end of the call:
struct MyHandlerType {
// Wrap these fields in `Option`
pub map: Option<BTreeMap<Token, A>>,
pub pool: Option<Pool<B>>
}
// ...
fn ready(&mut self, event_loop: &mut EventLoop<MyHandlerType>,
token: Token, events: EventSet) {
// Move these values out of `self`
let map = self.map.take().unwrap();
let pool = self.pool.take().unwrap();
let a : &mut A = self.map.get_mut(token).unwrap();
let b : B = a.b.take().or_else(|| self.pool.new()).unwrap();
// Continue working with `a` and `b`
// ...
// Restore these values to `self`
self.map = Some(map);
self.pool = Some(pool);
}
This works but feels a bit kluge-y. It also introduces the overhead of moving values in and out of self for each method call.
What's the best way to do this?
To get simultaneous mutable references to different parts of the struct, use destructuring. Example here.
struct Pair {
x: Vec<u32>,
y: Vec<u32>,
}
impl Pair {
fn test(&mut self) -> usize {
let Pair{ ref mut x, ref mut y } = *self;
// Both references coexist now
return x.len() + y.len();
}
}
fn main() {
let mut nums = Pair {
x: vec![1, 2, 3],
y: vec![4, 5, 6, 7],
};
println!("{}", nums.test());
}

Resources