How is it that I can circumvent "cannot borrow as mutable more than once at a time" with semantically equivalent code? - rust

I have the following for a merge sort problem with huge files:
struct MergeIterator<'a, T> where T: Copy {
one: &'a mut dyn Iterator<Item=T>,
two: &'a mut dyn Iterator<Item=T>,
a: Option<T>,
b: Option<T>
}
impl<'m, T> MergeIterator<'m, T> where T: Copy {
pub fn new(i1: &'m mut dyn Iterator<Item=T>,
i2: &'m mut dyn Iterator<Item=T>) -> MergeIterator<'m, T> {
let mut m = MergeIterator {one:i1, two:i2, a: None, b: None};
m.a = m.one.next();
m.b = m.two.next();
m
}
}
This seems to make rustc happy. However, I started with this (imho) less clumsy body of the new() function:
MergeIterator {one:i1, two:i2, a: i1.next(), b: i2.next()}
and got harsh feedback from the compiler saying
cannot borrow `*i1` as mutable more than once at a time
and likewise for i2.
I'd like to understand where the semantic difference is between initializing the data elements through the m.one reference vs the i1 argument? Why must I write clumsy imperative code here to achieve what I need?

This will probably be clearer if you write it in lines so that the sequence of operations is visible:
MergeIterator {
one: i1,
two: i2,
a: i1.next(),
b: i2.next(),
}
You're giving i1 to the new struct first, so you no longer have it by the time you try to call next on it.
The solution is to change the order of operations to call next first before giving away the mutable reference:
MergeIterator {
a: i1.next(),
b: i2.next(),
one: i1,
two: i2,
}
To make it clearer: i1.next() borrows i1 only for the duration of the function call, whereas one: i1 gives away the mutable reference for good. That is why reversing the order isn't equivalent.
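As a minimal sketch of the reordered initializer in context (the struct is the one from the question; the helper heads_and_next and its sample data are made up for illustration):

```rust
struct MergeIterator<'a, T: Copy> {
    one: &'a mut dyn Iterator<Item = T>,
    two: &'a mut dyn Iterator<Item = T>,
    a: Option<T>,
    b: Option<T>,
}

impl<'a, T: Copy> MergeIterator<'a, T> {
    fn new(
        i1: &'a mut dyn Iterator<Item = T>,
        i2: &'a mut dyn Iterator<Item = T>,
    ) -> MergeIterator<'a, T> {
        MergeIterator {
            // Short reborrows: each one ends as soon as `next` returns.
            a: i1.next(),
            b: i2.next(),
            // Only now are the mutable references moved into the struct.
            one: i1,
            two: i2,
        }
    }
}

// Hypothetical helper (not from the question): returns the two pre-fetched
// heads, followed by what each underlying iterator yields next.
fn heads_and_next() -> (Option<i32>, Option<i32>, Option<i32>, Option<i32>) {
    let mut left = vec![1, 4].into_iter();
    let mut right = vec![2, 3].into_iter();
    let mut m = MergeIterator::<i32>::new(&mut left, &mut right);
    (m.a, m.b, m.one.next(), m.two.next())
}
```

Field initializers run in the order they are written, so swapping the first two pairs of lines back brings the original error back.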

Related

Value of type `Vec<&str>` cannot be built from `std::iter::Iterator<Item=&&str>`

Why does this code work properly
fn main() {
let v1 = vec!["lemonade", "lemon", "type", "lid"];
println!("{:?}", v1.iter().filter(|item| item.starts_with("l")).collect::<Vec<_>>());
}
This code, however, gives me an error. I have a sense of why it isn't working and I know how to fix it, but I don't really understand what type is being returned, so I can't replace the "_" with something less generic:
fn main() {
let v1 = vec!["lemonade", "lemon", "type", "lid"];
println!("{:?}", v1.iter().filter(|item| item.starts_with("l")).collect::<Vec<&str>>());
}
The error
error[E0277]: a value of type `Vec<&str>` cannot be built from an iterator over elements of
type `&&str`
--> src\main.rs:3:69
|
3 | println!("{:?}", v1.iter().filter(|item| item.starts_with("l")).collect::
<Vec<&str>>());
| ^^^^^^^ value of type
`Vec<&str>` cannot be built from `std::iter::Iterator<Item=&&str>`
|
= help: the trait `FromIterator<&&str>` is not implemented for `Vec<&str>`
= help: the trait `FromIterator<T>` is implemented for `Vec<T>`
note: required by a bound in `collect`
--> C:\Users\Mululi\.rustup\toolchains\stable-x86_64-pc-windows-
msvc\lib/rustlib/src/rust\library\core\src\iter\traits\iterator.rs:1788:19
|
1788 | fn collect<B: FromIterator<Self::Item>>(self) -> B
| ^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `collect`
I couldn't fully understand what the compiler meant by that
Explaining the error
Let's look at the types in detail. It will take a couple of documentation derefs to get to the bottom so bear with me.
let v1 = vec!["lemonade", "lemon", "type", "lid"];
v1 is a Vec<&str>.
v1.iter()
Vec doesn't have an iter method, but it implements Deref<Target = [T]> which does:
pub fn iter(&self) -> Iter<'_, T>
Iter here is a struct named std::slice::Iter. What kind of iterator is it? We need to take a look at its Iterator implementation:
impl<'a, T> Iterator for Iter<'a, T> {
type Item = &'a T;
}
This tells us that the iterator yields items of type &T. Remember that we have a Vec<&str>, so T = &str. That means that what we really have is an Iterator<Item = &&str>. That's the source of the error you're getting.
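To see that item type spelled out, here is a small sketch that names the collected type in full instead of using "_" (the function name starts_with_l is an assumption for illustration):

```rust
// `iter()` on a slice of `&str` yields `&&str`, so that is what gets collected.
pub fn starts_with_l<'v>(words: &'v [&'static str]) -> Vec<&'v &'static str> {
    words
        .iter()
        .filter(|item| item.starts_with("l"))
        .collect::<Vec<&&str>>()
}
```

The outer reference borrows from the vector, while the inner one is the original &str element.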
Fixes
So, how do we get an Iterator<Item = &str> instead of an Iterator<Item = &&str>? There are several ways, including:
1. into_iter()
Use Vec's implementation of IntoIterator, which has Item = T. into_iter() yields Ts instead of &Ts.
v1.into_iter().filter(...)
This is efficient, but note that it consumes the vector. It doesn't return an iterator that references the items; it actually moves the items out of the vector and into the consuming code. The vector is unusable after calling into_iter.
2. iter().copied()
If T: Copy you can call copied to turn an Iterator<Item = &T> into an Iterator<Item = T>.
v1.iter().copied().filter(...)
This technique is great because it doesn't consume the vector, and it will work because &str is indeed Copy. All references are Copy thanks to this blanket implementation:
impl<T: ?Sized> Copy for &T {}
3. iter().cloned()
If T: Clone you can call cloned to clone all the items and turn an Iterator<Item = &T> into an Iterator<Item = T>.
v1.iter().cloned().filter(...)
Anything that's Copy is also Clone, so you could indeed clone the &str references.
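The three fixes can be compared side by side in a short sketch (the function names are assumptions; the filter is the one from the question):

```rust
// Fix 1: consume the vector, so the items are `&str` rather than `&&str`.
pub fn via_into_iter(words: Vec<&'static str>) -> Vec<&'static str> {
    words.into_iter().filter(|item| item.starts_with("l")).collect()
}

// Fix 2: copy each `&str` out of the `&&str` the slice iterator yields.
pub fn via_copied(words: &[&'static str]) -> Vec<&'static str> {
    words.iter().copied().filter(|item| item.starts_with("l")).collect()
}

// Fix 3: same effect through `Clone`; for references this is just a copy.
pub fn via_cloned(words: &[&'static str]) -> Vec<&'static str> {
    words.iter().cloned().filter(|item| item.starts_with("l")).collect()
}
```

All three return a plain Vec<&str>; only via_into_iter gives up the original vector.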
You need to collect into Vec<&&str>, not Vec<&str>. The original vector is Vec<&str>, and iter() produces an iterator over references to the elements, so the iterator yields &&strs.

What does this Rust Closure argument syntax mean?

I modified code found on the internet to create a function that obtains the statistical mode of any Hashable type that implements Eq, but I do not understand some of the syntax. Here is the function:
use std::hash::Hash;
use std::collections::HashMap;
pub fn mode<'a, I, T>(items: I) -> &'a T
where I: IntoIterator<Item = &'a T>,
T: Hash + Clone + Eq, {
let mut occurrences: HashMap<&T, usize> = HashMap::new();
for value in items.into_iter() {
*occurrences.entry(value).or_insert(0) += 1;
}
occurrences
.into_iter()
.max_by_key(|&(_, count)| count)
.map(|(val, _)| val)
.expect("Cannot compute the mode of zero items")
}
(I think requiring Clone may be overkill.)
The syntax I do not understand is in the closure passed to max_by_key:
|&(_, count)| count
What is the &(_, count) doing? I gather the underscore means I can ignore that parameter. Is this some sort of destructuring of a tuple in a parameter list? Does this make count take the reference of the tuple's second item?
.max_by_key(|&(_, count)| count) is equivalent to .max_by_key(f) where f is this:
fn f<T>(t: &(T, usize)) -> usize {
(*t).1
}
f() could also be written using pattern matching, like this:
fn f2<T>(&(_, count): &(T, usize)) -> usize {
count
}
And f2() is much closer to the first closure you're asking about.
The second closure is essentially the same, except there is no reference slightly complicating matters.
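A small sketch showing both spellings giving the same result, assuming a plain Vec of (label, count) pairs instead of the HashMap from the question (the names count_of, most_frequent and most_frequent_with_fn are made up):

```rust
// Named-function form of the closure: the `&(_, count)` pattern looks through
// the reference, ignores the first tuple field and copies out the `usize`.
fn count_of<T>(&(_, count): &(T, usize)) -> usize {
    count
}

// Closure form, as in the question.
pub fn most_frequent(pairs: Vec<(&'static str, usize)>) -> Option<&'static str> {
    pairs
        .into_iter()
        .max_by_key(|&(_, count)| count)
        .map(|(val, _)| val)
}

// Same thing with the named function passed in place of the closure.
pub fn most_frequent_with_fn(pairs: Vec<(&'static str, usize)>) -> Option<&'static str> {
    pairs.into_iter().max_by_key(count_of).map(|(val, _)| val)
}
```

max_by_key hands its closure a reference to each item, which is why the first pattern starts with & and the pattern in map does not.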

What is the proper way to coerce an iterator to return a value instead of a reference (or vice versa)?

The general setup is I have an array of values I'd like to map() and then chain() with 1 additional value. I've learned from this answer that the proper way to construct that final value is to use std::iter::once. This works and eliminated the below problem, but I would still like to understand it better.
In my broken, likely rust-anti-pattern-riddled example, I was using an array of a single element and then calling into_iter(). This produced a value / reference type-mismatch in the chain.
Question: What is the Rust-idiomatic mechanism for correcting this value / reference mismatch? Particularly if clone and copy are unavailable.
Background: Why is there a type mis-match to begin with?
This much I believe I understand. Based on the definition of std::iter::Map, the item type for the iterator is type Item = B where B is constrained by F: FnMut(<I as Iterator>::Item) -> B (i.e. the mapped type). However array defines the following 2 IntoIterator implementations, both of which appear to produce references.
impl<'a, const N: usize, T> IntoIterator for &'a [T; N] where
[T; N]: LengthAtMost32,
type Item = &'a T
impl<'a, const N: usize, T> IntoIterator for &'a mut [T; N] where
[T; N]: LengthAtMost32,
type Item = &'a mut T
Example demonstrating the issue:
#[derive(PartialEq, Eq, Clone, Copy)]
enum Enum1 {
A, B, C
}
#[derive(PartialEq, Eq, Clone, Copy)]
enum Enum2 {
X, Y, Z
}
struct Data {
// Other data omitted
e1: Enum1,
e2: Enum2
}
struct Consumer {
// Other data omitted
/** Predicate which evaluates if this consumer can consume given Data */
consumes: Box<dyn Fn(&Data) -> bool>
}
fn main() {
// Objective: 3 consumers which consume data with A, B, and X respectively
let v: Vec<Consumer> = [Enum1::A, Enum1::B].iter()
.map(|&e1| Consumer { consumes: Box::new(move |data| data.e1 == e1) })
// This chain results in an iterator type-mismatch:
// expected &Consumer, found Consumer
.chain([Consumer { consumes: Box::new(move |data| data.e2 == Enum2::X) }].into_iter())
.collect(); // Fails as well due to the chain failure
}
Error:
error[E0271]: type mismatch resolving `<std::slice::Iter<'_, Consumer> as std::iter::IntoIterator>::Item == Consumer`
--> src/main.rs:52:10
|
52 | .chain([Consumer { consumes: Box::new(move |data| data.e2 == Enum2::X) }].into_iter())
| ^^^^^ expected reference, found struct `Consumer`
|
= note: expected type `&Consumer`
found type `Consumer`
Rust playground example.
There is a long-standing issue regarding this. The technical details are a bit heavy, but essentially you cannot take ownership of a fixed-size array and hand out its elements as owned values without a lot of hocus pocus. This becomes clearer when you think about what a fixed-size array is, how it is stored in memory, and how you could get elements out of it without cloning them.
As a result, due to the implementations you found already, you can only get borrowed references. You can bypass this with arrayvec (as they have a sound implementation of IntoIterator for ArrayVec with owned types), or you can require that all your T: Clone and deal with it that way, at a cost of extra items in memory (temporarily; 90% of the time the compiler optimizes this away).
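For the concrete case in the question, the std::iter::once approach the asker mentions sidesteps the array entirely. A minimal sketch, with the types trimmed down from the question and a made-up function name build_consumers:

```rust
#[allow(dead_code)]
#[derive(PartialEq, Eq, Clone, Copy)]
enum Enum1 { A, B, C }

#[allow(dead_code)]
#[derive(PartialEq, Eq, Clone, Copy)]
enum Enum2 { X, Y, Z }

struct Data {
    e1: Enum1,
    e2: Enum2,
}

struct Consumer {
    /// Predicate which evaluates if this consumer can consume the given Data.
    consumes: Box<dyn Fn(&Data) -> bool>,
}

// Hypothetical wrapper around the code from the question's main().
fn build_consumers() -> Vec<Consumer> {
    [Enum1::A, Enum1::B]
        .iter()
        .map(|&e1| Consumer { consumes: Box::new(move |data: &Data| data.e1 == e1) })
        // `once` yields the Consumer by value, matching the mapped item type.
        .chain(std::iter::once(Consumer {
            consumes: Box::new(|data: &Data| data.e2 == Enum2::X),
        }))
        .collect()
}
```

Because both halves of the chain now yield Consumer by value, no Clone or Copy bound is needed.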

How do I build a Cacher in Rust without relying on the Copy trait?

I am trying to implement a Cacher as mentioned in Chapter 13 of the Rust book and running into trouble.
My Cacher code looks like:
use std::collections::HashMap;
use std::hash::Hash;
pub struct Cacher<T, K, V>
where
T: Fn(K) -> V,
{
calculation: T,
values: HashMap<K, V>,
}
impl<T, K: Eq + Hash, V> Cacher<T, K, V>
where
T: Fn(K) -> V,
{
pub fn new(calculation: T) -> Cacher<T, K, V> {
Cacher {
calculation,
values: HashMap::new(),
}
}
pub fn value(&mut self, k: K) -> &V {
let result = self.values.get(&k);
match result {
Some(v) => {
return v;
}
None => {
let v = (self.calculation)(k);
self.values.insert(k, v);
&v
}
}
}
}
and my test case for this lib looks like:
mod cacher;
#[cfg(test)]
mod tests {
use cacher::Cacher;
#[test]
fn repeated_runs_same() {
let mut cacher = Cacher::new(|x| x);
let run1 = cacher.value(5);
let run2 = cacher.value(7);
assert_ne!(run1, run2);
}
}
I ran into the following problems when running my test case:
error[E0499]: cannot borrow cacher as mutable more than once at a time
Each time I make a run1, run2 value it is trying to borrow cacher as a mutable borrow. I don't understand why it is borrowing at all - I thought cacher.value() should be returning a reference to the item that is stored in the cacher which is not a borrow.
error[E0597]: v does not live long enough pointing to the v I return in the None case of value(). How do I properly move the v into the HashMap and give it the same lifetime as the HashMap? Clearly the lifetime is expiring as it returns, but I want to just return a reference to it to use as the return from value().
error[E0502]: cannot borrow self.values as mutable because it is also borrowed as immutable in value(). self.values.get(&k) is an immutable borrow and self.values.insert(k,v) is a mutable borrow - though I thought .get() was an immutable borrow and .insert() was a transfer of ownership.
and a few other errors related to moving which I should be able to handle separately. These are much more fundamental errors that indicate I have misunderstood the idea of ownership in Rust, yet rereading the segment of the book doesn't make clear to me what I missed.
I think there are quite a few issues to look into here:
First, the definition of the function value(&mut self, k: K) -> &V: the compiler inserts the lifetimes for you, so it becomes value(&'a mut self, k: K) -> &'a V. This means the borrow of self cannot end when the function returns, because a reference with the same lifetime comes out of the function and keeps that borrow alive for as long as it is in use. Since it is a mutable borrow, you cannot borrow cacher again in the meantime. Hence error[E0499]: cannot borrow cacher as mutable more than once at a time.
Second, you call the calculation function, keep the returned value in a local variable inside value(), and then return a reference to that local, which is not possible. You expect the reference to live longer than the referent. Hence error[E0597]: v does not live long enough.
The third error is a bit involved. As mentioned in the first point, let result = self.values.get(&k); causes self.values to be borrowed immutably until the end of the function, because result may be returned and so has to live as long as the return value of value(). That means you cannot take a mutable borrow in the same scope, giving the error
error[E0502]: cannot borrow self.values as mutable because it is also borrowed as immutable in value() self.values.get(&k)
Your K also needs to be Clone, because k is moved into the calculation function, which makes it unusable for the insert.
So with K as a Clone, the Cacher implementation will be:
impl<T, K: Eq + Hash + Clone, V> Cacher<T, K, V>
where
T: Fn(K) -> V,
{
pub fn new(calculation: T) -> Cacher<T, K, V> {
Cacher {
calculation,
values: HashMap::new(),
}
}
pub fn value(&mut self, k: K) -> &V {
if self.values.contains_key(&k) {
return &self.values[&k];
}
self.values.insert(k.clone(), (self.calculation)(k.clone()));
self.values.get(&k).unwrap()
}
}
The lifetimes here follow the branching control flow. The if self.values.contains_key(...) block always returns, so the code after the if block can only execute when the condition is false. The borrow taken for the condition check lives only for that check, and the reference taken (and returned) inside the if block never reaches the code after it.
For more, please refer to the NLL RFC.
As mentioned by @jmb in his answer, for your test to work, V will need to be Clone (impl <... V:Clone> Cacher<T, K, V>) so that value() can return by value, or you can use shared ownership like Rc to avoid the cloning cost.
eg.
fn value(&mut self, k: K) -> V { ..
fn value(&mut self, k: K) -> Rc<V> { ..
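A minimal sketch of the by-value variant, assuming V: Clone and K: Clone are acceptable bounds (the struct is the one from the question; the body of value is one possible way to write it, not the only one):

```rust
use std::collections::HashMap;
use std::hash::Hash;

pub struct Cacher<T, K, V>
where
    T: Fn(K) -> V,
{
    calculation: T,
    values: HashMap<K, V>,
}

impl<T, K: Eq + Hash + Clone, V: Clone> Cacher<T, K, V>
where
    T: Fn(K) -> V,
{
    pub fn new(calculation: T) -> Cacher<T, K, V> {
        Cacher { calculation, values: HashMap::new() }
    }

    // Returns an owned clone, so the result does not keep `self` borrowed.
    pub fn value(&mut self, k: K) -> V {
        if let Some(v) = self.values.get(&k) {
            return v.clone();
        }
        let v = (self.calculation)(k.clone());
        self.values.insert(k, v.clone());
        v
    }
}
```

With this signature, holding run1 and run2 at the same time, as the original test does, no longer conflicts with the mutable borrow.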
Returning a reference to a value is the same thing as borrowing that value. Since that value is owned by the cacher, it implicitly borrows the cacher too. This makes sense: if you take a reference to a value inside the cacher then destroy the cacher, what happens to your reference? Note also that if you modify the cacher (e.g. by inserting a new element), this could reallocate the storage, which would invalidate any references to values stored inside.
You need your values to be at least Clone so that Cacher::value can return by value instead of by reference. You can use Rc if your values are too expensive to clone and you are ok with all callers getting the same instance.
The naive way to get the instance that was stored in the HashMap, as opposed to the temporary you allocated to build it, would be to call self.values.get(&k).unwrap() after inserting the value in the map. To avoid the cost of computing the location of the value in the map twice, you can use the Entry interface (this assumes values is a HashMap<K, Rc<V>> and K: Clone):
pub fn value(&mut self, k: K) -> Rc<V> {
let calc = &self.calculation; // borrow this field separately from the map
self.values.entry(k.clone()).or_insert_with(|| Rc::new(calc(k))).clone()
}
I believe my answer to point 2 also solves this point.
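Put together, the Rc-based variant might look like the sketch below. The name RcCacher and the choice to store Rc<V> directly in the map are assumptions made for illustration:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::rc::Rc;

pub struct RcCacher<T, K, V>
where
    T: Fn(K) -> V,
{
    calculation: T,
    // Assumption: values are stored behind Rc so callers can share them.
    values: HashMap<K, Rc<V>>,
}

impl<T, K: Eq + Hash + Clone, V> RcCacher<T, K, V>
where
    T: Fn(K) -> V,
{
    pub fn new(calculation: T) -> Self {
        RcCacher { calculation, values: HashMap::new() }
    }

    pub fn value(&mut self, k: K) -> Rc<V> {
        // Borrow the closure field on its own so it does not overlap
        // with the mutable borrow of `self.values` below.
        let calculation = &self.calculation;
        self.values
            .entry(k.clone())
            .or_insert_with(|| Rc::new(calculation(k)))
            .clone() // cheap: only bumps the reference count
    }
}
```

Every caller asking for the same key gets a handle to the same allocation, and V itself does not need to be Clone.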

How do you create a Box<dyn Trait>, or a boxed unsized value in general?

I have the following code
extern crate rand;
use rand::Rng;
pub struct Randomizer {
rand: Box<Rng>,
}
impl Randomizer {
fn new() -> Self {
let mut r = Box::new(rand::thread_rng()); // works
let mut cr = Randomizer { rand: r };
cr
}
fn with_rng(rng: &Rng) -> Self {
let mut r = Box::new(*rng); // doesn't work
let mut cr = Randomizer { rand: r };
cr
}
}
fn main() {}
It complains that
error[E0277]: the trait bound `rand::Rng: std::marker::Sized` is not satisfied
--> src/main.rs:16:21
|
16 | let mut r = Box::new(*rng);
| ^^^^^^^^ `rand::Rng` does not have a constant size known at compile-time
|
= help: the trait `std::marker::Sized` is not implemented for `rand::Rng`
= note: required by `<std::boxed::Box<T>>::new`
I don't understand why it requires Sized on Rng when Box<T> doesn't impose this on T.
More about the Sized trait and bound: it's a rather special trait, which is implicitly added as a bound to every generic type parameter, which is why you don't see it listed in the prototype for Box::new:
fn new(x: T) -> Box<T>
Notice that it takes x by value (or move), so you need to know how big it is to even call the function.
In contrast, the Box type itself does not require Sized; it uses the (again special) trait bound ?Sized, which means "opt out of the default Sized bound":
pub struct Box<T>(_) where T: ?Sized;
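To illustrate what the ?Sized opt-out changes in practice, here is a small sketch with two made-up functions, one with the default bound and one without it:

```rust
use std::fmt::Debug;

// Default: `T` carries an implicit `Sized` bound, so `str` or `dyn Debug`
// cannot be used for `T` here.
pub fn describe_sized<T: Debug>(value: &T) -> String {
    format!("{:?}", value)
}

// Opting out with `?Sized` lets `T` be an unsized type such as `str`,
// `[i32]` or `dyn Debug`, as long as it is only used behind a pointer.
pub fn describe<T: Debug + ?Sized>(value: &T) -> String {
    format!("{:?}", value)
}
```

Calling describe_sized with a &str argument where T would have to be str is rejected by the compiler, while describe accepts it.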
If you look through, there is one way to create a Box with an unsized type:
impl<T> Box<T> where T: ?Sized
....
unsafe fn from_raw(raw: *mut T) -> Box<T>
so from unsafe code, you can create one from a raw pointer. From then on, all the normal things work.
The problem is actually quite simple: you have a trait object, and the only two things you know about this trait object are:
its list of available methods
the pointer to its data
When you request to move this object to a different memory location (here on the heap), you are missing one crucial piece of information: its size.
How are you going to know how much memory should be reserved? How many bits to move?
When an object is Sized, this information is known at compile-time, so the compiler "injects" it for you. In the case of a trait-object, however, this information is unknown (unfortunately), and therefore this is not possible.
It would be quite useful to make this information available and to have a polymorphic move/clone available, but this does not exist yet and I do not remember any proposal for it so far and I have no idea what the cost would be (in terms of maintenance, runtime penalty, ...).
I also want to add that one way to deal with this situation is to make with_rng generic:
fn with_rng<TRand: Rng + 'static>(rng: TRand) -> Self {
// Take the generator by value: it cannot be moved out of a `&TRand` with `*rng`.
let r = Box::new(rng);
Randomizer { rand: r }
}
Rust's monomorphization will create the necessary implementation of with_rng, replacing TRand with a concrete sized type. Generic type parameters carry an implicit Sized bound, so adding one explicitly is optional.
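The same pattern can be sketched without the rand crate. Everything below is hypothetical stand-in code: NumberSource plays the role of Rng and Counter the role of a concrete generator:

```rust
// Hypothetical trait standing in for `rand::Rng`.
pub trait NumberSource {
    fn next_number(&mut self) -> u32;
}

// Hypothetical concrete (sized) implementation.
pub struct Counter(pub u32);

impl NumberSource for Counter {
    fn next_number(&mut self) -> u32 {
        self.0 += 1;
        self.0
    }
}

pub struct Randomizer {
    source: Box<dyn NumberSource>,
}

impl Randomizer {
    // The concrete type is sized, so `Box::new` works; the resulting
    // `Box<S>` is then coerced to `Box<dyn NumberSource>`.
    pub fn with_source<S: NumberSource + 'static>(source: S) -> Self {
        Randomizer { source: Box::new(source) }
    }

    // Alternative: let the caller do the boxing and pass the trait object in.
    pub fn from_boxed(source: Box<dyn NumberSource>) -> Self {
        Randomizer { source }
    }

    pub fn next_number(&mut self) -> u32 {
        self.source.next_number()
    }
}
```

In both constructors the value is boxed while its concrete type is still known, which is where the size information comes from.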
