How to idiomatically copy a slice? - rust

In Go, copying slices is standard-fare and looks like this:
# It will figure out the details to match slice sizes
dst = copy(dst[n:], src[:m])
In Rust, I couldn't find a similar method as replacement. Something I came up with looks like this:
fn copy_slice(dst: &mut [u8], src: &[u8]) -> usize {
let mut c = 0;
for (&mut d, &s) in dst.iter_mut().zip(src.iter()) {
d = s;
c += 1;
}
c
}
Unfortunately, I get this compile-error that I am unable to solve:
error[E0384]: re-assignment of immutable variable `d`
--> src/main.rs:4:9
|
3 | for (&mut d, &s) in dst.iter_mut().zip(src.iter()) {
| - first assignment to `d`
4 | d = s;
| ^^^^^ re-assignment of immutable variable
How can I set d? Is there a better way to copy a slice?

Yes, use the method clone_from_slice(), it is generic over any element type that implements Clone.
fn main() {
let mut x = vec![0; 8];
let y = [1, 2, 3];
x[..3].clone_from_slice(&y);
println!("{:?}", x);
// Output:
// [1, 2, 3, 0, 0, 0, 0, 0]
}
The destination x is either a &mut [T] slice, or anything that derefs to that, like a mutable Vec<T> vector. You need to slice the destination and source so that their lengths match.
As of Rust 1.9, you can also use copy_from_slice(). This works the same way but uses the Copy trait instead of Clone, and is a direct wrapper of memcpy. The compiler can optimize clone_from_slice to be equivalent to copy_from_slice when applicable, but it can still be useful.

This code works, even though I am not sure if it the best way to do it.
fn copy_slice(dst: &mut [u8], src: &[u8]) -> usize {
let mut c = 0;
for (d, s) in dst.iter_mut().zip(src.iter()) {
*d = *s;
c += 1;
}
c
}
Apparently not specifying access permissions explicitly did the trick. However, I am still confused about this and my mental model doesn't yet cover what's truly going on there.
My solutions are mostly trial and error when it comes to these things, and I'd rather like to truly understand instead.

Another variant would be
fn copy_slice(dst: &mut [u8], src: &[u8]) -> usize {
dst.iter_mut().zip(src).map(|(x, y)| *x = *y).count()
}
Note that you have to use count in this case, since len would use the ExactSizeIterator shortcut and thus never call next, resulting in a no-op.

Related

Rust string comparison same speed as Python . Want to parallelize the program

I am new to rust. I want to write a function which later can be imported into Python as a module using the pyo3 crate.
Below is the Python implementation of the function I want to implement in Rust:
def pcompare(a, b):
letters = []
for i, letter in enumerate(a):
if letter != b[i]:
letters.append(f'{letter}{i + 1}{b[i]}')
return letters
The first Rust implemention I wrote looks like this:
use pyo3::prelude::*;
#[pyfunction]
fn compare_strings_to_vec(a: &str, b: &str) -> PyResult<Vec<String>> {
if a.len() != b.len() {
panic!(
"Reads are not the same length!
First string is length {} and second string is length {}.",
a.len(), b.len());
}
let a_vec: Vec<char> = a.chars().collect();
let b_vec: Vec<char> = b.chars().collect();
let mut mismatched_chars = Vec::new();
for (mut index,(i,j)) in a_vec.iter().zip(b_vec.iter()).enumerate() {
if i != j {
index += 1;
let mutation = format!("{i}{index}{j}");
mismatched_chars.push(mutation);
}
}
Ok(mismatched_chars)
}
#[pymodule]
fn compare_strings(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(compare_strings_to_vec, m)?)?;
Ok(())
}
Which I builded in --release mode. The module could be imported to Python, but the performance was quite similar to the performance of the Python implementation.
My first question is: Why is the Python and Rust function similar in speed?
Now I am working on a parallelization implementation in Rust. When just printing the result variable, the function works:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
println!("{mutation}");
//mismatched_chars.push(mutation);
}
});
}
However, when I try to push the mutation variable to the mismatched_charsvector:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
//println!("{mutation}");
mismatched_chars.push(mutation);
}
});
}
I get the following error:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
For more information about this error, try `rustc --explain E0596`.
error: could not compile `testing_compare_strings` due to previous error
I tried A LOT of different things. When I do:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<&str> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
mismatched_chars.push(&mutation);
}
});
}
The error becomes:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(&mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
error[E0597]: `mutation` does not live long enough
--> src/main.rs:16:35
|
10 | let mut mismatched_chars: Vec<&str> = Vec::new();
| -------------------- lifetime `'1` appears in the type of `mismatched_chars`
...
16 | mismatched_chars.push(&mutation);
| ----------------------^^^^^^^^^-
| | |
| | borrowed value does not live long enough
| argument requires that `mutation` is borrowed for `'1`
17 | }
| - `mutation` dropped here while still borrowed
I suspect that the solution is quite simple, but I cannot see it myself.
You have the right idea with what you are doing, but you will want to try to use an iterator chain with filter and map to remove or convert iterator items into different values. Rayon also provides a collect method similar to regular iterators to convert items into a type T: FromIterator (such as Vec<T>).
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
// Same as with the if statement, but just a little shorter to write
// Plus, it will print out the two values it is comparing if it errors.
assert_eq!(a.len(), b.len(), "Reads are not the same length!");
// Zip the character iterators from a and b together
a.chars().zip(b.chars())
// Iterate with the index of each item
.enumerate()
// Rayon function which turns a regular iterator into a parallel one
.par_bridge()
// Filter out values where the characters are the same
.filter(|(_, (a, b))| a != b)
// Convert the remaining values into an error string
.map(|(index, (a, b))| {
format!("{}{}{}", a, index + 1, b)
})
// Turn the items of this iterator into a Vec (Or any other FromIterator type).
.collect()
}
Rust Playground
Optimizing for speed
On the other hand, if you want speed we need to approach this problem from a different direction. You may have noticed, but the rayon version is quite slow since the cost of spawning a thread and using concurrency structures is orders of magnitude more than just simply comparing the bytes in the original thread. In my benchmarks, I found that even with better workload distribution, additional threads were only helpful on my machine (64GB RAM, 16 cores) when the strings were at least 1-2 million bytes long. Given that you have stated they are typically ~30,000 bytes long I think using rayon (or really any other threading for comparisons of this size) will only slow down your code.
Using criterion for benchmarking, I eventually came to this implementation. It generally gets about 2.8156 µs per run on strings of 30,000 characters with 10 different bytes. For comparison, the code posted in the original question usually gets around 61.156 µs on my system under the same conditions so this should give a ~20x speedup. It can vary a bit, but it consistently got the best results in the benchmark. I'm guessing this should be fast enough to have this step no-longer be the bottleneck in your code.
This key focus of this implementation is to do the comparisons in batches. We can take advantage of the 128bit registers on most CPUs to compare the input in 16 byte batches. Upon an inequality being found, the 16 byte section it covers is re-scanned for the exact position of the discrepancy. This gives a decent boost to performance. I initially thought that a usize would work better, but it seems that was not the case. I also attempted to use the portable_simd nightly feature to write a simd version of this code, but I was unable to match the speed of this code. I suspect this was either due to missed optimizations or a lack of experience to effectively use simd on my part.
I was worried about drops in speed due to alignment of chunks not being enforced for u128 values, but it seems to mostly be a non-issue. First of all, it is generally quite difficult to find allocators which are willing to allocate to an address which is not a multiple of the system word size. Of course, this is due to practicality rather than any actual requirement. When I manually gave it unaligned slices (unaligned for u128s), it is not significantly effected. This is why I do not attempt to enforce that the start index of the slice be aligned to align_of::<u128>().
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
let a_bytes = a.as_bytes();
let b_bytes = b.as_bytes();
let remainder = a_bytes.len() % size_of::<u128>();
// Strongly suggest to the compiler we are iterating though u128
a_bytes
.chunks_exact(size_of::<u128>())
.zip(b_bytes.chunks_exact(size_of::<u128>()))
.enumerate()
.filter(|(_, (a, b))| {
let a_block: &[u8; 16] = (*a).try_into().unwrap();
let b_block: &[u8; 16] = (*b).try_into().unwrap();
u128::from_ne_bytes(*a_block) != u128::from_ne_bytes(*b_block)
})
.flat_map(|(word_index, (a, b))| {
fast_path(a, b).map(move |x| word_index * size_of::<u128>() + x)
})
.chain(
fast_path(
&a_bytes[a_bytes.len() - remainder..],
&b_bytes[b_bytes.len() - remainder..],
)
.map(|x| a_bytes.len() - remainder + x),
)
.map(|index| {
format!(
"{}{}{}",
char::from(a_bytes[index]),
index + 1,
char::from(b_bytes[index])
)
})
.collect()
}
/// Very similar to regular route, but with nothing fancy, just get the indices of the overlays
#[inline(always)]
fn fast_path<'a>(a: &'a [u8], b: &'a [u8]) -> impl 'a + Iterator<Item = usize> {
a.iter()
.zip(b.iter())
.enumerate()
.filter_map(|(x, (a, b))| (a != b).then_some(x))
}
You cannot directly access the field mismatched_chars in a multithreading environment.
You can use Arc<RwLock> to access the field in multithreading.
use rayon::prelude::*;
use std::sync::{Arc, RwLock};
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mismatched_chars: Arc<RwLock<Vec<String>>> = Arc::new(RwLock::new(Vec::new()));
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y);
mismatched_chars
.write()
.expect("could not acquire write lock")
.push(mutation);
}
});
for mismatch in mismatched_chars
.read()
.expect("could not acquire read lock")
.iter()
{
eprintln!("{}", mismatch);
}
}

Swap two elements in a vector in rust

I want to swap two elements in a vector.
I wrote this function to swap elements, but it gives it gives an error.
fn swap<T>(arr: &mut Vec<T>, i: usize, j: usize) {
let temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
error[E0507]: cannot move out of index of `Vec<T>`
--> src/quick_sort.rs:2:16
|
2 | let temp = arr[i];
| ^^^^^^
| |
| move occurs because value has type `T`, which does not implement the `Copy` trait
| help: consider borrowing here: `&arr[i]`
You are in luck, the implementers thought that people may want an easy way of swapping elements and added Vec::swap. This method is also implemented with slices. If you want to swap the values for two mutable references you can use std::mem::swap.
fn swap<T>(arr: &mut Vec<T>, i: usize, j: usize) {
arr.swap(i, j);
}
Alternatively while it is a bit of a pain to do, you can split a slice or array into two or more non-overlapping mutable slices of the original. This allows you to take multiple multiple references into an slice at once.
pub fn swap(arr: &mut [Foo], i: usize, j: usize) {
let (low, high) = match i.cmp(&j) {
Ordering::Less => (i, j),
Ordering::Greater => (j, i),
Ordering::Equal => return,
};
let (a, b) = arr.split_at_mut(high);
std::mem::swap(&mut a[low], &mut b[0]);
}
Because you haven't added any constraints to T, your generic swap<T>() function needs to be able to work for any type T. Importantly, it needs to be able to work for types even if they don't implement the Copy trait, therefore the assignment operator (=) performs a move. You can't move the value out of the vector like this, or this would invalidate the vector. Of course, you plan to fix up the vector so that it is valid again, but the compiler doesn't see the big picture here, it only sees the initial move as invalidating the vector, and therefore is illegal.
To implement swap here, you would need to use unsafe code. However, swap is a common problem, so the Rust standard library exposes functions to do this so you don't have to (std::mem::swap() or Vec::swap() as #Locke mentioned).
Alternatively, you could specify that your swap function only works for types which implement the Copy trait, like so:
fn swap<T: Copy>(arr: &mut Vec<T>, i: usize, j: usize) {
let temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
However, there is no advantage to writing your own swap over std::mem::swap().

What do the ampersand '&' and star '*' symbols mean in Rust?

Despite thoroughly reading the documentation, I'm rather confused about the meaning of the & and * symbol in Rust, and more generally about what is a Rust reference exactly.
In this example, it seems to be similar to a C++ reference (that is, an address that is automatically dereferenced when used):
fn main() {
let c: i32 = 5;
let rc = &c;
let next = rc + 1;
println!("{}", next); // 6
}
However, the following code works exactly the same:
fn main() {
let c: i32 = 5;
let rc = &c;
let next = *rc + 1;
println!("{}", next); // 6
}
Using * to dereference a reference wouldn't be correct in C++. So I'd like to understand why this is correct in Rust.
My understanding so far, is that, inserting * in front of a Rust reference dereferences it, but the * is implicitly inserted anyway so you don't need to add it (while in C++, it's implicitly inserted and if you insert it you get a compilation error).
However, something like this doesn't compile:
fn main() {
let mut c: i32 = 5;
let mut next: i32 = 0;
{
let rc = &mut c;
next = rc + 1;
}
println!("{}", next);
}
error[E0369]: binary operation `+` cannot be applied to type `&mut i32`
--> src/main.rs:6:16
|
6 | next = rc + 1;
| ^^^^^^
|
= note: this is a reference to a type that `+` can be applied to; you need to dereference this variable once for this operation to work
= note: an implementation of `std::ops::Add` might be missing for `&mut i32`
But this works:
fn main() {
let mut c: i32 = 5;
let mut next: i32 = 0;
{
let rc = &mut c;
next = *rc + 1;
}
println!("{}", next); // 6
}
It seems that implicit dereferencing (a la C++) is correct for immutable references, but not for mutable references. Why is this?
Using * to dereference a reference wouldn't be correct in C++. So I'd like to understand why this is correct in Rust.
A reference in C++ is not the same as a reference in Rust. Rust's references are much closer (in usage, not in semantics) to C++'s pointers. With respect to memory representation, Rust's references often are just a single pointer, while C++'s references are supposed to be alternative names of the same object (and thus have no memory representation).
The difference between C++ pointers and Rust references is that Rust's references are never NULL, never uninitialized and never dangling.
The Add trait is implemented (see the bottom of the doc page) for the following pairs and all other numeric primitives:
&i32 + i32
i32 + &i32
&i32 + &i32
This is just a convenience thing the std-lib developers implemented. The compiler can figure out that a &mut i32 can be used wherever a &i32 can be used, but that doesn't work (yet?) for generics, so the std-lib developers would need to also implement the Add traits for the following combinations (and those for all primitives):
&mut i32 + i32
i32 + &mut i32
&mut i32 + &mut i32
&mut i32 + &i32
&i32 + &mut i32
As you can see that can get quite out of hand. I'm sure that will go away in the future. Until then, note that it's rather rare to end up with a &mut i32 and trying to use it in a mathematical expression.
This answer is for those looking for the basics (e.g. coming from Google).
From the Rust book's References and Borrowing:
fn main() {
let s1 = String::from("hello");
let len = calculate_length(&s1);
println!("The length of '{}' is {}.", s1, len);
}
fn calculate_length(s: &String) -> usize {
s.len()
}
These ampersands represent references, and they allow you to refer to some value without taking ownership of it [i.e. borrowing].
The opposite of referencing by using & is dereferencing, which is accomplished with the dereference operator, *.
And a basic example:
let x = 5;
let y = &x; //set y to a reference to x
assert_eq!(5, x);
assert_eq!(5, *y); // dereference y
If we tried to write assert_eq!(5, y); instead, we would get a compilation error can't compare `{integer}` with `&{integer}`.
(You can read more in the Smart Pointers chapter.)
And from Method Syntax:
Rust has a feature called automatic referencing and dereferencing. Calling methods is one of the few places in Rust that has this behavior.
Here’s how it works: when you call a method with object.something(), Rust automatically adds in &, &mut, or * so object matches the signature of the method. In other words, the following are the same:
p1.distance(&p2);
(&p1).distance(&p2);
From the docs for std::ops::Add:
impl<'a, 'b> Add<&'a i32> for &'b i32
impl<'a> Add<&'a i32> for i32
impl<'a> Add<i32> for &'a i32
impl Add<i32> for i32
It seems the binary + operator for numbers is implemented for combinations of shared (but not mutable) references of the operands and owned versions of the operands. It has nothing to do with automatic dereferencing.

Struggling with closures and lifetimes in Rust

I'm trying to port a little benchmark from F# to Rust. The F# code looks like this:
let inline iterNeighbors f (i, j) =
f (i-1, j)
f (i+1, j)
f (i, j-1)
f (i, j+1)
let rec nthLoop n (s1: HashSet<_>) (s2: HashSet<_>) =
match n with
| 0 -> s1
| n ->
let s0 = HashSet(HashIdentity.Structural)
let add p =
if not(s1.Contains p || s2.Contains p) then
ignore(s0.Add p)
Seq.iter (fun p -> iterNeighbors add p) s1
nthLoop (n-1) s0 s1
let nth n p =
nthLoop n (HashSet([p], HashIdentity.Structural)) (HashSet(HashIdentity.Structural))
(nth 2000 (0, 0)).Count
It computes the nth-nearest neighbor shells from an initial vertex in a potentially infinite graph. I used something similar during my PhD to study amorphous materials.
I've spent many hours trying and failing to port this to Rust. I have managed to get one version working but only by manually inlining the closure and converting the recursion into a loop with local mutables (yuk!).
I tried writing the iterNeighbors function like this:
use std::collections::HashSet;
fn iterNeighbors<F>(f: &F, (i, j): (i32, i32)) -> ()
where
F: Fn((i32, i32)) -> (),
{
f((i - 1, j));
f((i + 1, j));
f((i, j - 1));
f((i, j + 1));
}
I think that is a function that accepts a closure (that itself accepts a pair and returns unit) and a pair and returns unit. I seem to have to double bracket things: is that correct?
I tried writing a recursive version like this:
fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
if n == 0 {
return &s1;
} else {
let mut s0 = HashSet::new();
for &p in s1 {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
}
return &nthLoop(n - 1, s0, s1);
}
}
Note that I haven't even bothered with the call to iterNeighbors yet.
I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?
The caller would look something like this:
fn nth<'a>(n: i32, p: (i32, i32)) -> &'a HashSet<(i32, i32)> {
let s0 = HashSet::new();
let mut s1 = HashSet::new();
s1.insert(p);
return &nthLoop(n, &s1, s0);
}
I gave up on that and wrote it as a while loop with mutable locals instead:
fn nth<'a>(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let mut n = n;
let mut s0 = HashSet::new();
let mut s1 = HashSet::new();
let mut s2 = HashSet::new();
s1.insert(p);
while n > 0 {
for &p in &s1 {
let add = &|p| {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
};
iterNeighbors(&add, p);
}
std::mem::swap(&mut s0, &mut s1);
std::mem::swap(&mut s0, &mut s2);
s0.clear();
n -= 1;
}
return s1;
}
This works if I inline the closure by hand, but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.
The main function is then:
fn main() {
let s = nth(2000, (0, 0));
println!("{}", s.len());
}
So... what am I doing wrong? :-)
Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?
I think that is a function that accepts a closure (that itself accepts a pair and returns unit) and a pair and returns unit. I seem to have to double bracket things: is that correct?
You need the double brackets because you're passing a 2-tuple to the closure, which matches your original F# code.
I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?
The problem is that you're using references to HashSets when you should just use HashSets directly. Your signature for nthLoop is already correct; you just need to remove a few occurrences of &.
To deallocate s2, you can write drop(s2). Note that Rust doesn't have guaranteed tail calls, so each recursive call will still take a bit of stack space (you can see how much with the mem::size_of function), but the drop call will purge the data on the heap.
The caller would look something like this:
Again, you just need to remove the &'s here.
Note that I haven't even bothered with the call to iterNeighbors yet.
This works if I inline the closure by hand but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.
There are three types of closures in Rust: Fn, FnMut and FnOnce. They differ by the type of their self argument. The distinction is important because it puts restrictions on what the closure is allowed to do and on how the caller can use the closure. The Rust book has a chapter on closures that already explains this well.
Your closure needs to mutate s0. However, iterNeighbors is defined as expecting an Fn closure. Your closure cannot implement Fn because Fn receives &self, but to mutate s0, you need &mut self. iterNeighbors cannot use FnOnce, since it needs to call the closure more than once. Therefore, you need to use FnMut.
Also, it's not necessary to pass the closure by reference to iterNeighbors. You can just pass it by value; each call to the closure will only borrow the closure, not consume it.
Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?
There's no purely functional set implementation in the standard library (maybe there's one on crates.io?). While Rust embraces functional programming, it also takes advantage of its ownership and borrowing system to make imperative programming safer. A functional set would probably impose using some form of reference counting or garbage collection in order to share items across sets.
However, HashSet does implement set-theoretic operations. There are two ways to use them: iterators (difference, symmetric_difference, intersection, union), which generate the sequence lazily, or operators (|, &, ^, -, as listed in the trait implementations for HashSet), which produce new sets containing clones of the values from the source sets.
Here's the working code:
use std::collections::HashSet;
fn iterNeighbors<F>(mut f: F, (i, j): (i32, i32)) -> ()
where
F: FnMut((i32, i32)) -> (),
{
f((i - 1, j));
f((i + 1, j));
f((i, j - 1));
f((i, j + 1));
}
fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
if n == 0 {
return s1;
} else {
let mut s0 = HashSet::new();
for &p in &s1 {
let add = |p| {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
};
iterNeighbors(add, p);
}
drop(s2);
return nthLoop(n - 1, s0, s1);
}
}
fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let mut s1 = HashSet::new();
s1.insert(p);
let s2 = HashSet::new();
return nthLoop(n, s1, s2);
}
fn main() {
let s = nth(2000, (0, 0));
println!("{}", s.len());
}
I seem to have to double bracket things: is that correct?
No: the double bracketes are because you've chosen to use tuples and calling a function that takes a tuple requires creating the tuple first, but one can have closures that take multiple arguments, like F: Fn(i32, i32). That is, one could write that function as:
fn iterNeighbors<F>(i: i32, j: i32, f: F)
where
F: Fn(i32, i32),
{
f(i - 1, j);
f(i + 1, j);
f(i, j - 1);
f(i, j + 1);
}
However, it seems that retaining the tuples makes sense for this case.
I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?
No need for references (and hence no need for lifetimes), just pass the data through directly:
fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
if n == 0 {
return s1;
} else {
let mut s0 = HashSet::new();
for &p in &s1 {
iterNeighbors(p, |p| {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
})
}
drop(s2); // guarantees timely deallocation
return nthLoop(n - 1, s0, s1);
}
}
The key here is you can do everything by value, and things passed around by value will of course keep their values around.
However, this fails to compile:
error[E0387]: cannot borrow data mutably in a captured outer variable in an `Fn` closure
--> src/main.rs:21:21
|
21 | s0.insert(p);
| ^^
|
help: consider changing this closure to take self by mutable reference
--> src/main.rs:19:30
|
19 | iterNeighbors(p, |p| {
| ______________________________^
20 | | if !(s1.contains(&p) || s2.contains(&p)) {
21 | | s0.insert(p);
22 | | }
23 | | })
| |_____________^
That is to say, the closure is trying to mutate values it captures (s0), but the Fn closure trait doesn't allow this. That trait can be called in a more flexible manner (when shared), but this imposes more restrictions on what the closure can do internally. (If you're interested, I've written more about this)
Fortunately there's an easy fix: using the FnMut trait, which requires that the closure can only be called when one has unique access to it, but allows the internals to mutate things.
fn iterNeighbors<F>((i, j): (i32, i32), mut f: F)
where
F: FnMut((i32, i32)),
{
f((i - 1, j));
f((i + 1, j));
f((i, j - 1));
f((i, j + 1));
}
The caller would look something like this:
Values work here too: returning a reference in that case would be returning a pointer to s0, which lives the stack frame that is being destroyed as the function returns. That is, the reference is pointing to dead data.
The fix is just not using references:
fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let s0 = HashSet::new();
let mut s1 = HashSet::new();
s1.insert(p);
return nthLoop(n, s1, s0);
}
This works if I inline the closure by hand but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.
(I don't understand what this means, including the compiler error messages you're having trouble with helps us help you.)
Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?
Depending on exactly what you want, no, e.g. both HashSet and BTreeSet provide various set-theoretic operations as methods which return iterators.
Some small points:
explicit/named lifetimes allow the compiler to reason about the static validity of data, they don't control it (i.e. they allow the compiler to point out when you do something wrong, but language still has the same sort of static resource usage/life-cycle guarantees as C++)
the version with a loop is likely to be more efficient as written, as it reuses memory directly (swapping the sets, plus the s0.clear(), however, the same benefit can be realised with a recursive version by passing s2 down for reuse instead of dropping it.
the while loop could be for _ in 0..n
there's no need to pass closures by reference, but with or without the reference, there's still static dispatch (the closure is a type parameter, not a trait object).
conventionally, closure arguments are last, and not taken by reference, because it makes defining & passing them inline easier to read (e.g. foo(x, |y| bar(y + 1)) instead of foo(&|y| bar(y + 1), x))
the return keyword isn't necessary for trailing returns (if the ; is omitted):
fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let s0 = HashSet::new();
let mut s1 = HashSet::new();
s1.insert(p);
nthLoop(n, s1, s0)
}

How to use a map over vectors?

Although vectors are best suited for procedural programming, I would like to use a map function on them. The following snippet works:
fn map<A, B>(u: &Vec<A>, f: &Fn(&A) -> B) -> Vec<B> {
let mut res: Vec<B> = Vec::with_capacity(u.len());
for x in u.iter() {
res.push(f(x));
}
res
}
fn f(x: &i32) -> i32 {
*x + 1
}
fn main() {
let u = vec![1, 2, 3];
let v = map(&u, &f);
println!("{} {} {}", v[0], v[1], v[2]);
}
Why isn't there any such function in the standard library (and also in std::collections::LinkedList)? Is there another way to deal with it?
Rust likes to be more general than that; mapping is done over iterators, rather than over solely vectors or slices.
A couple of demonstrations:
let u = vec![1, 2, 3];
let v: Vec<_> = u.iter().map(f).collect();
let u = vec![1, 2, 3];
let v = u.iter().map(|&x| x + 1).collect::<Vec<_>>();
.collect() is probably the most magic part of it, and allows you to collect all the elements of the iterator into a large variety of different types, as shown by the implementors of FromIterator. For example, an iterator of Ts can be collected to Vec<T>, of chars can be collected to a String, of (K, V) pairs to a HashMap<K, V>, and so forth.
This way of working with iterators also means that you often won’t even need to create intermediate vectors where in other languages or with other techniques you would; this is more efficient and typically just as natural.
As pointed out by bluss, you can also use the mutable iterator to mutate the value in place, without changing the type:
let mut nums = nums;
for num in &mut nums { *num += 1 }
println!("{:p} - {:?}", &nums, nums);
The function Vec::map_in_place was deprecated in Rust 1.3 and is no longer present in Rust 1.4.
Chris Morgan's answer is the best solution 99% of the time. However, there is a specialized function called Vec::map_in_place. This has the benefit of not requiring any additional memory allocations, but it requires that the input and output type are the same size (thanks Levans) and is currently unstable:
fn map_in_place<U, F>(self, f: F) -> Vec<U>
where F: FnMut(T) -> U
An example:
#![feature(collections)]
fn main() {
let nums = vec![1,2,3];
println!("{:p} - {:?}", &nums, nums);
let nums = nums.map_in_place(|v| v + 1);
println!("{:p} - {:?}", &nums, nums);
}

Resources