Struggling with closures and lifetimes in Rust

I'm trying to port a little benchmark from F# to Rust. The F# code looks like this:
let inline iterNeighbors f (i, j) =
f (i-1, j)
f (i+1, j)
f (i, j-1)
f (i, j+1)
let rec nthLoop n (s1: HashSet<_>) (s2: HashSet<_>) =
match n with
| 0 -> s1
| n ->
let s0 = HashSet(HashIdentity.Structural)
let add p =
if not(s1.Contains p || s2.Contains p) then
ignore(s0.Add p)
Seq.iter (fun p -> iterNeighbors add p) s1
nthLoop (n-1) s0 s1
let nth n p =
nthLoop n (HashSet([p], HashIdentity.Structural)) (HashSet(HashIdentity.Structural))
(nth 2000 (0, 0)).Count
It computes the nth-nearest neighbor shells from an initial vertex in a potentially infinite graph. I used something similar during my PhD to study amorphous materials.
I've spent many hours trying and failing to port this to Rust. I have managed to get one version working but only by manually inlining the closure and converting the recursion into a loop with local mutables (yuk!).
I tried writing the iterNeighbors function like this:
use std::collections::HashSet;
fn iterNeighbors<F>(f: &F, (i, j): (i32, i32)) -> ()
where
F: Fn((i32, i32)) -> (),
{
f((i - 1, j));
f((i + 1, j));
f((i, j - 1));
f((i, j + 1));
}
I think that is a function that accepts a closure (that itself accepts a pair and returns unit) and a pair and returns unit. I seem to have to double bracket things: is that correct?
I tried writing a recursive version like this:
fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
if n == 0 {
return &s1;
} else {
let mut s0 = HashSet::new();
for &p in s1 {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
}
return &nthLoop(n - 1, s0, s1);
}
}
Note that I haven't even bothered with the call to iterNeighbors yet.
I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?
The caller would look something like this:
fn nth<'a>(n: i32, p: (i32, i32)) -> &'a HashSet<(i32, i32)> {
let s0 = HashSet::new();
let mut s1 = HashSet::new();
s1.insert(p);
return &nthLoop(n, &s1, s0);
}
I gave up on that and wrote it as a while loop with mutable locals instead:
fn nth<'a>(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let mut n = n;
let mut s0 = HashSet::new();
let mut s1 = HashSet::new();
let mut s2 = HashSet::new();
s1.insert(p);
while n > 0 {
for &p in &s1 {
let add = &|p| {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
};
iterNeighbors(&add, p);
}
std::mem::swap(&mut s0, &mut s1);
std::mem::swap(&mut s0, &mut s2);
s0.clear();
n -= 1;
}
return s1;
}
This works if I inline the closure by hand, but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.
The main function is then:
fn main() {
let s = nth(2000, (0, 0));
println!("{}", s.len());
}
So... what am I doing wrong? :-)
Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?

I think that is a function that accepts a closure (that itself accepts a pair and returns unit) and a pair and returns unit. I seem to have to double bracket things: is that correct?
You need the double brackets because you're passing a 2-tuple to the closure, which matches your original F# code.
I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?
The problem is that you're using references to HashSets when you should just use HashSets directly. Your signature for nthLoop is already correct; you just need to remove a few occurrences of &.
To deallocate s2, you can write drop(s2). Note that Rust doesn't have guaranteed tail calls, so each recursive call will still take a bit of stack space (you can see how much with the mem::size_of function), but the drop call will purge the data on the heap.
The caller would look something like this:
Again, you just need to remove the &'s here.
Note that I haven't even bothered with the call to iterNeighbors yet.
This works if I inline the closure by hand but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.
There are three closure traits in Rust: Fn, FnMut and FnOnce. They differ by the type of their self argument (&self, &mut self and self, respectively). The distinction is important because it puts restrictions on what the closure is allowed to do and on how the caller can use the closure. The Rust book has a chapter on closures that already explains this well.
Your closure needs to mutate s0. However, iterNeighbors is defined as expecting an Fn closure. Your closure cannot implement Fn because Fn receives &self, but to mutate s0, you need &mut self. iterNeighbors cannot use FnOnce, since it needs to call the closure more than once. Therefore, you need to use FnMut.
Also, it's not necessary to pass the closure by reference to iterNeighbors. You can just pass it by value; each call to the closure will only borrow the closure, not consume it.
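To make the difference between the three traits concrete, here is a minimal sketch. The helper names (call_shared, call_exclusive, call_consuming, closure_trait_demo) are hypothetical and exist only for illustration; they are not part of the code in the question.

```rust
/// `Fn`: invoked through `&self`, so the closure may only read its captures.
fn call_shared<F: Fn(i32) -> i32>(f: F) -> i32 {
    f(1) + f(2)
}

/// `FnMut`: invoked through `&mut self`, so the closure may mutate its captures.
/// The parameter has to be bound as `mut` in order to call it.
fn call_exclusive<F: FnMut(i32)>(mut f: F) {
    f(1);
    f(2);
}

/// `FnOnce`: invoked through `self`, so it can be called only once
/// and may move captured values out.
fn call_consuming<F: FnOnce() -> Vec<i32>>(f: F) -> Vec<i32> {
    f()
}

/// Exercises all three helpers and returns what each one produced.
fn closure_trait_demo() -> (i32, Vec<i32>, Vec<i32>) {
    let offset = 10;
    let shared = call_shared(|x| x + offset); // only reads `offset`

    let mut seen = Vec::new();
    call_exclusive(|x| seen.push(x)); // mutates `seen`, so it is not `Fn`

    let owned = vec![7, 8, 9];
    let moved = call_consuming(move || owned); // moves `owned` out of the closure

    (shared, seen, moved)
}
```

Passing the second closure to call_shared instead would be rejected by the compiler, which is the situation the question ran into with iterNeighbors.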
Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?
There's no purely functional set implementation in the standard library (maybe there's one on crates.io?). While Rust embraces functional programming, it also takes advantage of its ownership and borrowing system to make imperative programming safer. A functional set would probably impose using some form of reference counting or garbage collection in order to share items across sets.
However, HashSet does implement set-theoretic operations. There are two ways to use them: iterators (difference, symmetric_difference, intersection, union), which generate the sequence lazily, or operators (|, &, ^, -, as listed in the trait implementations for HashSet), which produce new sets containing clones of the values from the source sets.
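As a small illustration of the two styles, the sketch below uses one iterator method and two operators on a pair of example sets. The function name set_ops_demo and the sample values are made up for this example; the results are sorted into vectors only because HashSet iteration order is unspecified.

```rust
use std::collections::HashSet;

/// Returns (union, intersection, difference) of two small example sets,
/// each sorted so the result is deterministic.
fn set_ops_demo() -> (Vec<i32>, Vec<i32>, Vec<i32>) {
    let a: HashSet<i32> = HashSet::from([1, 2, 3]);
    let b: HashSet<i32> = HashSet::from([2, 3, 4]);

    // Iterator form: lazy, yields references into the source sets.
    let mut common: Vec<i32> = a.intersection(&b).copied().collect();

    // Operator form: takes the sets by reference and builds a new HashSet,
    // cloning the elements it keeps.
    let union: HashSet<i32> = &a | &b;
    let only_a: HashSet<i32> = &a - &b;

    let mut union: Vec<i32> = union.into_iter().collect();
    let mut only_a: Vec<i32> = only_a.into_iter().collect();
    union.sort();
    common.sort();
    only_a.sort();
    (union, common, only_a)
}
```

Neither form gives the structural sharing of a persistent functional set; both read the existing sets and either iterate lazily or allocate a fresh one.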
Here's the working code:
use std::collections::HashSet;
fn iterNeighbors<F>(mut f: F, (i, j): (i32, i32)) -> ()
where
F: FnMut((i32, i32)) -> (),
{
f((i - 1, j));
f((i + 1, j));
f((i, j - 1));
f((i, j + 1));
}
fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
if n == 0 {
return s1;
} else {
let mut s0 = HashSet::new();
for &p in &s1 {
let add = |p| {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
};
iterNeighbors(add, p);
}
drop(s2);
return nthLoop(n - 1, s0, s1);
}
}
fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let mut s1 = HashSet::new();
s1.insert(p);
let s2 = HashSet::new();
return nthLoop(n, s1, s2);
}
fn main() {
let s = nth(2000, (0, 0));
println!("{}", s.len());
}

I seem to have to double bracket things: is that correct?
No: the double brackets are there because you've chosen to use tuples, and calling a function that takes a tuple requires creating the tuple first. Closures can also take multiple arguments, like F: Fn(i32, i32). That is, one could write that function as:
fn iterNeighbors<F>(i: i32, j: i32, f: F)
where
F: Fn(i32, i32),
{
f(i - 1, j);
f(i + 1, j);
f(i, j - 1);
f(i, j + 1);
}
However, it seems that retaining the tuples makes sense for this case.
I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?
No need for references (and hence no need for lifetimes), just pass the data through directly:
fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
if n == 0 {
return s1;
} else {
let mut s0 = HashSet::new();
for &p in &s1 {
iterNeighbors(p, |p| {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
})
}
drop(s2); // guarantees timely deallocation
return nthLoop(n - 1, s0, s1);
}
}
The key here is you can do everything by value, and things passed around by value will of course keep their values around.
However, this fails to compile:
error[E0387]: cannot borrow data mutably in a captured outer variable in an `Fn` closure
--> src/main.rs:21:21
|
21 | s0.insert(p);
| ^^
|
help: consider changing this closure to take self by mutable reference
--> src/main.rs:19:30
|
19 | iterNeighbors(p, |p| {
| ______________________________^
20 | | if !(s1.contains(&p) || s2.contains(&p)) {
21 | | s0.insert(p);
22 | | }
23 | | })
| |_____________^
That is to say, the closure is trying to mutate values it captures (s0), but the Fn closure trait doesn't allow this. That trait can be called in a more flexible manner (when shared), but this imposes more restrictions on what the closure can do internally. (If you're interested, I've written more about this)
Fortunately there's an easy fix: using the FnMut trait, which requires that the closure can only be called when one has unique access to it, but allows the internals to mutate things.
fn iterNeighbors<F>((i, j): (i32, i32), mut f: F)
where
F: FnMut((i32, i32)),
{
f((i - 1, j));
f((i + 1, j));
f((i, j - 1));
f((i, j + 1));
}
The caller would look something like this:
Values work here too: returning a reference in that case would be returning a pointer to s0, which lives in the stack frame that is being destroyed as the function returns. That is, the reference would point to dead data.
The fix is just not using references:
fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let s0 = HashSet::new();
let mut s1 = HashSet::new();
s1.insert(p);
return nthLoop(n, s1, s0);
}
This works if I inline the closure by hand but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.
(I don't understand what this means; including the compiler error messages you're having trouble with helps us help you.)
Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?
Depending on exactly what you want, no, e.g. both HashSet and BTreeSet provide various set-theoretic operations as methods which return iterators.
Some small points:
explicit/named lifetimes allow the compiler to reason about the static validity of data, they don't control it (i.e. they allow the compiler to point out when you do something wrong, but the language still has the same sort of static resource usage/life-cycle guarantees as C++)
the version with a loop is likely to be more efficient as written, as it reuses memory directly (swapping the sets, plus the s0.clear()); however, the same benefit can be realised with a recursive version by passing s2 down for reuse instead of dropping it.
the while loop could be for _ in 0..n
there's no need to pass closures by reference, but with or without the reference, there's still static dispatch (the closure is a type parameter, not a trait object).
conventionally, closure arguments are last, and not taken by reference, because it makes defining & passing them inline easier to read (e.g. foo(x, |y| bar(y + 1)) instead of foo(&|y| bar(y + 1), x))
the return keyword isn't necessary for trailing returns (if the ; is omitted):
fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let s0 = HashSet::new();
let mut s1 = HashSet::new();
s1.insert(p);
nthLoop(n, s1, s0)
}
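The point above about reusing s2 in the recursive version could look roughly like the sketch below. It is an assumption about how one might do it, not code from the question: the fourth parameter spare, the Point alias and the snake_case names are all introduced here for illustration.

```rust
use std::collections::HashSet;

type Point = (i32, i32);

fn iter_neighbors<F: FnMut(Point)>((i, j): Point, mut f: F) {
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}

/// `spare` is a hypothetical extra parameter: an already-allocated set that is
/// cleared and refilled with the next shell instead of allocating a new one.
fn nth_loop(
    n: u32,
    s1: HashSet<Point>,
    s2: HashSet<Point>,
    mut spare: HashSet<Point>,
) -> HashSet<Point> {
    if n == 0 {
        return s1;
    }
    spare.clear(); // keeps the allocated capacity
    for &p in &s1 {
        iter_neighbors(p, |q| {
            if !(s1.contains(&q) || s2.contains(&q)) {
                spare.insert(q);
            }
        });
    }
    // Rotate the sets: the new shell becomes s1, the old s1 becomes s2,
    // and the old s2 is handed down as the next spare.
    nth_loop(n - 1, spare, s1, s2)
}

fn nth(n: u32, p: Point) -> HashSet<Point> {
    let mut s1 = HashSet::new();
    s1.insert(p);
    nth_loop(n, s1, HashSet::new(), HashSet::new())
}
```

On the square lattice used here, shell k around the origin contains 4k points for k >= 1, so nth(2, (0, 0)).len() is 8. As noted earlier, Rust does not guarantee tail calls, so the recursion still uses one stack frame per step.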

Related

Rust string comparison same speed as Python. Want to parallelize the program

I am new to rust. I want to write a function which later can be imported into Python as a module using the pyo3 crate.
Below is the Python implementation of the function I want to implement in Rust:
def pcompare(a, b):
letters = []
for i, letter in enumerate(a):
if letter != b[i]:
letters.append(f'{letter}{i + 1}{b[i]}')
return letters
The first Rust implementation I wrote looks like this:
use pyo3::prelude::*;
#[pyfunction]
fn compare_strings_to_vec(a: &str, b: &str) -> PyResult<Vec<String>> {
if a.len() != b.len() {
panic!(
"Reads are not the same length!
First string is length {} and second string is length {}.",
a.len(), b.len());
}
let a_vec: Vec<char> = a.chars().collect();
let b_vec: Vec<char> = b.chars().collect();
let mut mismatched_chars = Vec::new();
for (mut index,(i,j)) in a_vec.iter().zip(b_vec.iter()).enumerate() {
if i != j {
index += 1;
let mutation = format!("{i}{index}{j}");
mismatched_chars.push(mutation);
}
}
Ok(mismatched_chars)
}
#[pymodule]
fn compare_strings(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(compare_strings_to_vec, m)?)?;
Ok(())
}
I built this in --release mode. The module could be imported into Python, but its performance was quite similar to that of the Python implementation.
My first question is: Why is the Python and Rust function similar in speed?
Now I am working on a parallelization implementation in Rust. When just printing the result variable, the function works:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
println!("{mutation}");
//mismatched_chars.push(mutation);
}
});
}
However, when I try to push the mutation variable to the mismatched_chars vector:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
//println!("{mutation}");
mismatched_chars.push(mutation);
}
});
}
I get the following error:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
For more information about this error, try `rustc --explain E0596`.
error: could not compile `testing_compare_strings` due to previous error
I tried A LOT of different things. When I do:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<&str> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
mismatched_chars.push(&mutation);
}
});
}
The error becomes:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(&mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
error[E0597]: `mutation` does not live long enough
--> src/main.rs:16:35
|
10 | let mut mismatched_chars: Vec<&str> = Vec::new();
| -------------------- lifetime `'1` appears in the type of `mismatched_chars`
...
16 | mismatched_chars.push(&mutation);
| ----------------------^^^^^^^^^-
| | |
| | borrowed value does not live long enough
| argument requires that `mutation` is borrowed for `'1`
17 | }
| - `mutation` dropped here while still borrowed
I suspect that the solution is quite simple, but I cannot see it myself.

You have the right idea with what you are doing, but you will want to try to use an iterator chain with filter and map to remove or convert iterator items into different values. Rayon also provides a collect method similar to regular iterators to convert items into a type T: FromIterator (such as Vec<T>).
use rayon::prelude::*;
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
// Same as with the if statement, but just a little shorter to write
// Plus, it will print out the two values it is comparing if it errors.
assert_eq!(a.len(), b.len(), "Reads are not the same length!");
// Zip the character iterators from a and b together
a.chars().zip(b.chars())
// Iterate with the index of each item
.enumerate()
// Rayon function which turns a regular iterator into a parallel one.
// Note: par_bridge does not guarantee that items keep their original order.
.par_bridge()
// Filter out values where the characters are the same
.filter(|(_, (a, b))| a != b)
// Convert the remaining values into an error string
.map(|(index, (a, b))| {
format!("{}{}{}", a, index + 1, b)
})
// Turn the items of this iterator into a Vec (Or any other FromIterator type).
.collect()
}
Optimizing for speed
On the other hand, if you want speed we need to approach this problem from a different direction. You may have noticed, but the rayon version is quite slow since the cost of spawning a thread and using concurrency structures is orders of magnitude more than just simply comparing the bytes in the original thread. In my benchmarks, I found that even with better workload distribution, additional threads were only helpful on my machine (64GB RAM, 16 cores) when the strings were at least 1-2 million bytes long. Given that you have stated they are typically ~30,000 bytes long I think using rayon (or really any other threading for comparisons of this size) will only slow down your code.
Using criterion for benchmarking, I eventually came to this implementation. It generally gets about 2.8156 µs per run on strings of 30,000 characters with 10 different bytes. For comparison, the code posted in the original question usually gets around 61.156 µs on my system under the same conditions, so this should give a ~20x speedup. It can vary a bit, but it consistently got the best results in the benchmark. I'm guessing this should be fast enough that this step is no longer the bottleneck in your code.
The key focus of this implementation is to do the comparisons in batches. We can take advantage of the 128-bit registers on most CPUs to compare the input in 16-byte batches. When an inequality is found, the 16-byte section it covers is re-scanned for the exact position of the discrepancy. This gives a decent boost to performance. I initially thought that a usize would work better, but that was not the case. I also attempted to use the portable_simd nightly feature to write a SIMD version of this code, but I was unable to match the speed of this code. I suspect this was due either to missed optimizations or to my lack of experience in using SIMD effectively.
I was worried about drops in speed due to alignment of chunks not being enforced for u128 values, but it seems to mostly be a non-issue. First of all, it is generally quite difficult to find allocators which are willing to allocate to an address which is not a multiple of the system word size. Of course, this is due to practicality rather than any actual requirement. When I manually gave it unaligned slices (unaligned for u128s), the speed was not significantly affected. This is why I do not attempt to enforce that the start index of the slice be aligned to align_of::<u128>().
use std::mem::size_of;
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
let a_bytes = a.as_bytes();
let b_bytes = b.as_bytes();
let remainder = a_bytes.len() % size_of::<u128>();
// Strongly suggest to the compiler that we are iterating through u128-sized chunks
a_bytes
.chunks_exact(size_of::<u128>())
.zip(b_bytes.chunks_exact(size_of::<u128>()))
.enumerate()
.filter(|(_, (a, b))| {
let a_block: &[u8; 16] = (*a).try_into().unwrap();
let b_block: &[u8; 16] = (*b).try_into().unwrap();
u128::from_ne_bytes(*a_block) != u128::from_ne_bytes(*b_block)
})
.flat_map(|(word_index, (a, b))| {
fast_path(a, b).map(move |x| word_index * size_of::<u128>() + x)
})
.chain(
fast_path(
&a_bytes[a_bytes.len() - remainder..],
&b_bytes[b_bytes.len() - remainder..],
)
.map(|x| a_bytes.len() - remainder + x),
)
.map(|index| {
format!(
"{}{}{}",
char::from(a_bytes[index]),
index + 1,
char::from(b_bytes[index])
)
})
.collect()
}
/// Plain byte-by-byte comparison with nothing fancy: yields the indices at which `a` and `b` differ
#[inline(always)]
fn fast_path<'a>(a: &'a [u8], b: &'a [u8]) -> impl 'a + Iterator<Item = usize> {
a.iter()
.zip(b.iter())
.enumerate()
.filter_map(|(x, (a, b))| (a != b).then_some(x))
}

You cannot mutate the variable mismatched_chars directly from a closure that may run on several threads at once.
You can wrap it in a lock, for example Arc<RwLock<...>>, to get synchronized access to it from multiple threads.
use rayon::prelude::*;
use std::sync::{Arc, RwLock};
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mismatched_chars: Arc<RwLock<Vec<String>>> = Arc::new(RwLock::new(Vec::new()));
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y);
mismatched_chars
.write()
.expect("could not acquire write lock")
.push(mutation);
}
});
for mismatch in mismatched_chars
.read()
.expect("could not acquire read lock")
.iter()
{
eprintln!("{}", mismatch);
}
}
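The same idea of guarding a shared vector with a lock can be sketched using only the standard library. This is not the rayon API: it swaps in std::thread::scope and a Mutex, splits the work into two chunks by hand, and the function name mismatches_locked is hypothetical. Because threads may push in any order, the results are sorted by position before being returned.

```rust
use std::sync::Mutex;
use std::thread;

/// Collects mismatch labels such as "a4b" from several threads into one
/// Mutex-protected vector, then restores positional order.
fn mismatches_locked(a: &str, b: &str) -> Vec<String> {
    // (1-based position, char from a, char from b)
    let pairs: Vec<(usize, char, char)> = a
        .chars()
        .zip(b.chars())
        .enumerate()
        .map(|(i, (x, y))| (i + 1, x, y))
        .collect();

    let found = Mutex::new(Vec::new());
    // Roughly two chunks; `max(1)` avoids a zero chunk size for empty input.
    let chunk_size = ((pairs.len() + 1) / 2).max(1);

    thread::scope(|scope| {
        for chunk in pairs.chunks(chunk_size) {
            let sink = &found; // shared reference to the lock, copied into the thread
            scope.spawn(move || {
                for &(i, x, y) in chunk {
                    if x != y {
                        sink.lock()
                            .expect("lock poisoned")
                            .push((i, format!("{x}{i}{y}")));
                    }
                }
            });
        }
    });

    // All threads have finished here, so the lock can be taken apart.
    let mut found = found.into_inner().expect("lock poisoned");
    found.sort();
    found.into_iter().map(|(_, label)| label).collect()
}
```

For inputs as small as the ones in the question, the locking and thread start-up will likely cost far more than the comparison itself, which matches the benchmark remarks in the other answer.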

How do I test this swap number function?

I'm currently writing a simple function to swap numbers in Rust:
fn swapnumbers() {
let a = 1;
let b = 2;
let (a, b) = (b, a);
println!("{}, {}", a, b);
}
I am now trying to make a test for it, how do I do it? All my other attempts have failed.

I would suggest modifying the function to return something instead of printing it, and then using either the assert_eq! or assert! macros to test for proper function. (docs for assert_eq!, docs for assert!)
fn swapnumbers() -> (i32, i32) {
let a = 1;
let b = 2;
let (a, b) = (b, a);
return (a, b);
}
assert_eq!(swapnumbers(), (2, 1));
(-> (i32, i32) means that this function returns a tuple of two i32s)
And if you're unfamiliar with testing in Rust, the official Rust book tutorial can help you out with that!
If you want to actually swap numbers, you would need to do something like this:
fn swapnumbers(a: &mut i32, b: &mut i32) {
std::mem::swap(a, b);
}
Note the types specified after the parameter names. &mut i32 means the passed value must be a mutable reference to an i32. The reference must be mutable for the function to be able to assign through it and change the value, and it must be a reference so that the function does not take ownership of the data.
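Putting this together, a test module could look like the sketch below. The by-value function swap_numbers (taking its inputs as parameters rather than hard-coding 1 and 2) and the name swap_in_place are hypothetical variations on the functions above; the #[cfg(test)] module is the usual layout that cargo test picks up.

```rust
/// Hypothetical by-value variant: returns the two arguments in swapped order.
fn swap_numbers(a: i32, b: i32) -> (i32, i32) {
    (b, a)
}

/// In-place variant: swaps the values behind the two mutable references.
fn swap_in_place(a: &mut i32, b: &mut i32) {
    std::mem::swap(a, b);
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn returns_swapped_pair() {
        assert_eq!(swap_numbers(1, 2), (2, 1));
    }

    #[test]
    fn swaps_in_place() {
        let mut a = 1;
        let mut b = 2;
        swap_in_place(&mut a, &mut b);
        assert_eq!((a, b), (2, 1));
    }
}
```

Running cargo test in the crate compiles the tests module and runs each function marked with #[test].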

Swap two elements in a vector in rust

I want to swap two elements in a vector.
I wrote this function to swap elements, but it gives an error.
fn swap<T>(arr: &mut Vec<T>, i: usize, j: usize) {
let temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
error[E0507]: cannot move out of index of `Vec<T>`
--> src/quick_sort.rs:2:16
|
2 | let temp = arr[i];
| ^^^^^^
| |
| move occurs because value has type `T`, which does not implement the `Copy` trait
| help: consider borrowing here: `&arr[i]`

You are in luck, the implementers thought that people may want an easy way of swapping elements and added Vec::swap. This method is also implemented with slices. If you want to swap the values for two mutable references you can use std::mem::swap.
fn swap<T>(arr: &mut Vec<T>, i: usize, j: usize) {
arr.swap(i, j);
}
Alternatively, while it is a bit of a pain to do, you can split a slice or array into two or more non-overlapping mutable slices of the original. This allows you to hold multiple mutable references into a slice at once.
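use std::cmp::Ordering;
pub fn swap<T>(arr: &mut [T], i: usize, j: usize) {
    // Order the two indices so that `low < high`; nothing to do if they are equal.
    let (low, high) = match i.cmp(&j) {
        Ordering::Less => (i, j),
        Ordering::Greater => (j, i),
        Ordering::Equal => return,
    };
    // `a` covers indices 0..high and `b` starts at `high`, so the two halves never overlap.
    let (a, b) = arr.split_at_mut(high);
    std::mem::swap(&mut a[low], &mut b[0]);
}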

Because you haven't added any constraints to T, your generic swap<T>() function needs to be able to work for any type T. Importantly, it needs to be able to work for types even if they don't implement the Copy trait, therefore the assignment operator (=) performs a move. You can't move the value out of the vector like this, or this would invalidate the vector. Of course, you plan to fix up the vector so that it is valid again, but the compiler doesn't see the big picture here, it only sees the initial move as invalidating the vector, and therefore is illegal.
To implement swap yourself in this move-out-then-move-back style, you would need to use unsafe code. However, swap is a common problem, so the Rust standard library exposes functions to do this so you don't have to (std::mem::swap() or Vec::swap() as @Locke mentioned).
Alternatively, you could specify that your swap function only works for types which implement the Copy trait, like so:
fn swap<T: Copy>(arr: &mut Vec<T>, i: usize, j: usize) {
let temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
However, there is no advantage to writing your own swap over std::mem::swap().
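For element types that are not Copy, such as String, the standard library functions mentioned above work without any cloning. The sketch below is only an illustration; the function name swap_strings_demo and the sample values are made up.

```rust
/// Swaps non-`Copy` values in place, first inside a vector and then between
/// two separate variables, and returns the results.
fn swap_strings_demo() -> (Vec<String>, String, String) {
    let mut names = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    // `swap` is a slice method that Vec exposes through deref; no clone happens.
    names.swap(0, 2);

    let mut left = String::from("left");
    let mut right = String::from("right");
    // `std::mem::swap` exchanges the values behind two mutable references.
    std::mem::swap(&mut left, &mut right);

    (names, left, right)
}
```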

How can the state be shared between the returned result and the next iteration when using the scan iterator?

I would like to use the Scan iterator to construct a vector in a declarative fashion. It is clear how to achieve this by copying the intermediate state. The following expression compiles and produces the desired series:
let vec = (0..10).scan(0, |state, current| {
*state = *state + current;
Some(*state)
}).collect::<Vec<_>>();
However, if I try to achieve the same behavior by moving the state instead of copying it, I get in trouble with lifetimes. For example, when working with vectors instead of integers, one cannot move the state out of the closure and reuse it in the next iteration. The expression
let vec = (0..10).map(|x| vec![x]).scan(vec![0], |state, current| {
*state = vec![state[0] + current[0]];
Some(*state)
}).collect::<Vec<_>>();
fails to compile due to
error: cannot move out of borrowed content [E0507]
Some(*state)
^~~~~~
see for example this MVCE.
Borrowing the state instead of moving would also be an option:
let start = &vec![0];
let vec = (0..10).map(|x| vec![x]).scan(start, |state, current| {
*state = &vec![state[0] + current[0]];
Some(*state)
}).collect::<Vec<_>>();
but this fails because the new value falls out of scope when the state is returned.
error: borrowed value does not live long enough
*state = &vec![state[0] + current[0]]
What I ended up doing is using the for loop
let xs = &mut Vec::<Vec<i32>>::with_capacity(10);
xs.push(vec![0]);
for x in 1..10 {
let z = vec![xs.last().unwrap()[0] + x];
xs.push(z);
};
but I would prefer a chaining solution.

Let's check the definition of scan:
fn scan<St, B, F>(self, initial_state: St, f: F) -> Scan<Self, St, F>
where F: FnMut(&mut St, Self::Item) -> Option<B>
Note that B is distinct from St. The idea of scan is that:
you keep an accumulator of type St
at each iteration, you produce a value of type B
and indeed it is not quite suited to returning values of type St because you are only borrowing St and do not control its lifetime.
scan is made for you to return a brand new value each time:
let xs = (0..10).scan(0, |state, current| {
*state += current;
Some(NonCopy::new(*state))
}).collect::<Vec<_>>();
and that's it!
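Applied to the vectors from the question, the same pattern keeps a cheap integer as the state and builds a fresh Vec for every yielded item. This is a minimal sketch; the function name running_sums is made up for the example.

```rust
/// Yields one single-element vector per step, each holding the running sum
/// of the range so far.
fn running_sums(n: i32) -> Vec<Vec<i32>> {
    (0..n)
        .scan(0, |state, current| {
            *state += current; // update the accumulator in place
            Some(vec![*state]) // hand out a brand new value, not the state itself
        })
        .collect()
}
```

For example, running_sums(4) produces [[0], [1], [3], [6]].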
A note on efficiency.
The state of scan is a sunk cost so it is best to use a cheap state (a single integer here).
If you need a larger type X and wish to "get your memory back", then you can pass an &mut Option<X> and then use .take() after the scan:
let mut x = Some(NonCopy::new(0));
let xs = (0..10).scan(&mut x, |state, current| {
let i: &mut i32 = &mut state.as_mut().unwrap().value;
*i += current;
Some(NonCopy::new(*i))
}).collect::<Vec<_>>();
let _ = x.take();
It's not as elegant, of course.

I don't think it is possible to do it without cloning the value using the scan method.
When you return a non-Copy value from the closure, you lose ownership of that value. And it's not possible to keep any reference to it, because its new owner could move the value anywhere in memory (for example, during vector resizing), and Rust is designed to protect against this kind of error.

By chaining solution, do you mean this?
let vec = (0..10)
.fold((Vec::with_capacity(10), 0), |(mut vec, previous), x| {
vec.push(vec![previous + x]);
(vec, previous + x)
})
.0;

How to idiomatically copy a slice?

In Go, copying slices is standard-fare and looks like this:
// It will figure out the details to match slice sizes
copied := copy(dst[n:], src[:m])
In Rust, I couldn't find a similar method as replacement. Something I came up with looks like this:
fn copy_slice(dst: &mut [u8], src: &[u8]) -> usize {
let mut c = 0;
for (&mut d, &s) in dst.iter_mut().zip(src.iter()) {
d = s;
c += 1;
}
c
}
Unfortunately, I get this compile-error that I am unable to solve:
error[E0384]: re-assignment of immutable variable `d`
--> src/main.rs:4:9
|
3 | for (&mut d, &s) in dst.iter_mut().zip(src.iter()) {
| - first assignment to `d`
4 | d = s;
| ^^^^^ re-assignment of immutable variable
How can I set d? Is there a better way to copy a slice?

Yes, use the method clone_from_slice(), it is generic over any element type that implements Clone.
fn main() {
let mut x = vec![0; 8];
let y = [1, 2, 3];
x[..3].clone_from_slice(&y);
println!("{:?}", x);
// Output:
// [1, 2, 3, 0, 0, 0, 0, 0]
}
The destination x is either a &mut [T] slice, or anything that derefs to that, like a mutable Vec<T> vector. You need to slice the destination and source so that their lengths match.
As of Rust 1.9, you can also use copy_from_slice(). This works the same way but uses the Copy trait instead of Clone, and is a direct wrapper of memcpy. The compiler can optimize clone_from_slice to be equivalent to copy_from_slice when applicable, but calling copy_from_slice directly can still be useful because it makes the Copy requirement explicit.
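To get behaviour closer to Go's copy, which copies as many elements as fit and returns that count, one could wrap copy_from_slice as in the sketch below. The generic signature is an assumption made for this example; the question's version is specific to u8.

```rust
/// Copies as many elements as fit from `src` into `dst` and returns how many
/// were copied, similar in spirit to Go's built-in `copy`.
fn copy_slice<T: Copy>(dst: &mut [T], src: &[T]) -> usize {
    let n = dst.len().min(src.len());
    // Both sub-slices have length `n`, which `copy_from_slice` requires;
    // it panics if the lengths differ.
    dst[..n].copy_from_slice(&src[..n]);
    n
}
```

For example, copying [1, 2, 3] into a five-element zeroed buffer returns 3 and leaves the last two elements untouched.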

This code works, even though I am not sure if it is the best way to do it.
fn copy_slice(dst: &mut [u8], src: &[u8]) -> usize {
let mut c = 0;
for (d, s) in dst.iter_mut().zip(src.iter()) {
*d = *s;
c += 1;
}
c
}
Apparently not specifying access permissions explicitly did the trick. However, I am still confused about this and my mental model doesn't yet cover what's truly going on there.
My solutions are mostly trial and error when it comes to these things, and I'd rather like to truly understand instead.

Another variant would be
fn copy_slice(dst: &mut [u8], src: &[u8]) -> usize {
dst.iter_mut().zip(src).map(|(x, y)| *x = *y).count()
}
Note that you have to use count in this case, since len would use the ExactSizeIterator shortcut and thus never call next, resulting in a no-op.
