In Rust, is there any way to refer to operator functions such as add or sub? I need a reference to those functions, but I can only find information about traits. Here is a comparison showing what I need (like the wrapper methods) in Python.
A = 1
B = 2
A.__add__(B)
#Or maybe do something more, like
C = int(1).__add__
C(2)
You can obtain a function pointer to a trait method of a specific type via the universal function call syntax:
let fptr = <i32 as std::ops::Add>::add; // type: `fn(i32, i32) -> i32`
fptr(1, 3); // returns 4
Bigger example (Playground):
use std::ops;

fn calc(a: i32, b: i32, op: fn(i32, i32) -> i32) -> i32 {
    op(a, b)
}

fn main() {
    println!("{}", calc(2, 5, <i32 as ops::Add>::add)); // prints 7
    println!("{}", calc(2, 5, <i32 as ops::Sub>::sub)); // prints -3
    println!("{}", calc(2, 5, <i32 as ops::Mul>::mul)); // prints 10
}
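Because each of these paths coerces to the same function pointer type, they can also be chosen at run time. The following is a minimal sketch of that idea; op_for is a hypothetical helper name, not something from the standard library.

```rust
use std::ops::{Add, Mul, Sub};

// Hypothetical helper: map an operator symbol to the matching trait method.
// Each arm is a distinct function item that coerces to the same
// `fn(i32, i32) -> i32` pointer type because of the annotation on `f`.
fn op_for(symbol: char) -> Option<fn(i32, i32) -> i32> {
    let f: fn(i32, i32) -> i32 = match symbol {
        '+' => <i32 as Add>::add,
        '-' => <i32 as Sub>::sub,
        '*' => <i32 as Mul>::mul,
        _ => return None, // unknown symbol
    };
    Some(f)
}

fn main() {
    for symbol in ['+', '-', '*'] {
        let op = op_for(symbol).expect("known operator");
        println!("2 {} 5 = {}", symbol, op(2, 5));
    }
}
```

The explicit type on f is what turns the three differently typed function items into one common pointer type.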
Your int(1).__add__ example is a bit more complicated because we have a partially applied function here. Rust does not have this built into the language, but you can easily use closures to achieve the same effect:
let op = |b| 1 + b;
op(4); // returns 5
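If you want something closer to Python's bound int(1).__add__, one option is a small function that captures the left operand and returns a closure. This is only a sketch; partial_add is a made-up name, and the Copy bound is an assumption that keeps the closure callable more than once.

```rust
use std::ops::Add;

// Hypothetical helper: fix the left operand, return a closure that
// waits for the right operand. `T: Copy` lets the closure implement `Fn`.
fn partial_add<T>(a: T) -> impl Fn(T) -> T
where
    T: Add<Output = T> + Copy,
{
    move |b| <T as Add>::add(a, b)
}

fn main() {
    let c = partial_add(1); // roughly `C = int(1).__add__`
    println!("{}", c(2));   // roughly `C(2)`
    println!("{}", c(40));  // the closure can be reused
}
```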
Today's Rust mystery is from section 4.9 of The Rust Programming Language, First Edition. The section on references and borrowing has this example:
fn main() {
    fn sum_vec(v: &Vec<i32>) -> i32 {
        return v.iter().fold(0, |a, &b| a + b);
    }
    fn foo(v1: &Vec<i32>) -> i32 {
        sum_vec(v1) // no trailing semicolon, so the sum is the return value
    }
    let v1 = vec![1, 2, 3];
    let answer = foo(&v1);
    println!("{}", answer);
}
That seems reasonable. It prints "6", which is what you'd expect if the
v of sum_vec is a C++ reference; it's just a name for a memory
location, the vector v1 we defined in main().
Then I replaced the body of sum_vec with this:
fn sum_vec(v: &Vec<i32>) -> i32 {
return (*v).iter().fold(0, |a, &b| a + b);
}
It compiled and worked as expected. Okay, that's not… entirely crazy. The compiler is trying to make my life easier, I get that. Confusing, something that I have to memorize as a specific tic of the language, but not entirely crazy. Then I tried:
fn sum_vec(v: &Vec<i32>) -> i32 {
return (**v).iter().fold(0, |a, &b| a + b);
}
It still worked! What the hell?
fn sum_vec(v: &Vec<i32>) -> i32 {
return (***v).iter().fold(0, |a, &b| a + b);
}
type [i32] cannot be dereferenced. Oh, thank god, something that makes sense. But I would have expected that almost two iterations earlier!
References in Rust aren't C++ "names for another place in memory," but what are they? They're not pointers either, and the rules about them seem to be either esoteric or highly ad-hoc. What is happening such that a reference, a pointer, and a pointer-to-a-pointer all work equally well here?
The rules are neither ad-hoc nor really esoteric. Inspect the type of v and its various dereferences:
fn sum_vec(v: &Vec<i32>) {
let () = v;
}
You'll get:
v -> &std::vec::Vec<i32>
*v -> std::vec::Vec<i32>
**v -> [i32]
The first dereference you already understand. The second dereference is thanks to the Deref trait. Vec<T> dereferences to [T].
When performing method lookup, there's a straight-forward set of rules:
1. If the type has the method, use it and exit the lookup.
2. If a reference to the type has the method, use it and exit the lookup.
3. If the type can be dereferenced, do so, then return to step 1.
4. Else the lookup fails.
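To see the lookup steps in action, here is a small sketch (sum_three_ways is a hypothetical name) in which the same iter method is reached with zero, one and two explicit dereferences. It also calls Deref::deref by hand to show the Vec<T> to [T] step.

```rust
use std::ops::Deref;

// Hypothetical helper: reach `[i32]::iter` through three spellings.
fn sum_three_ways(v: &Vec<i32>) -> (i32, i32, i32) {
    let auto: i32 = v.iter().sum();      // lookup derefs &Vec<i32> -> Vec<i32> -> [i32]
    let once: i32 = (*v).iter().sum();   // starts at Vec<i32>, derefs once more to [i32]
    let twice: i32 = (**v).iter().sum(); // already [i32], which has `iter` itself
    (auto, once, twice)
}

fn main() {
    let v = vec![1, 2, 3];
    // The explicit form of the second dereference: Vec<T> implements Deref<Target = [T]>.
    let slice: &[i32] = v.deref();
    println!("{:?} {:?}", slice, sum_three_ways(&v));
}
```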
References in Rust aren't C++ "names for another place in memory,"
They absolutely are names for a place in memory. In fact, they compile down to the same C / C++ pointer you know.
I'm trying to port a little benchmark from F# to Rust. The F# code looks like this:
let inline iterNeighbors f (i, j) =
f (i-1, j)
f (i+1, j)
f (i, j-1)
f (i, j+1)
let rec nthLoop n (s1: HashSet<_>) (s2: HashSet<_>) =
match n with
| 0 -> s1
| n ->
let s0 = HashSet(HashIdentity.Structural)
let add p =
if not(s1.Contains p || s2.Contains p) then
ignore(s0.Add p)
Seq.iter (fun p -> iterNeighbors add p) s1
nthLoop (n-1) s0 s1
let nth n p =
nthLoop n (HashSet([p], HashIdentity.Structural)) (HashSet(HashIdentity.Structural))
(nth 2000 (0, 0)).Count
It computes the nth-nearest neighbor shells from an initial vertex in a potentially infinite graph. I used something similar during my PhD to study amorphous materials.
I've spent many hours trying and failing to port this to Rust. I have managed to get one version working but only by manually inlining the closure and converting the recursion into a loop with local mutables (yuk!).
I tried writing the iterNeighbors function like this:
use std::collections::HashSet;
fn iterNeighbors<F>(f: &F, (i, j): (i32, i32)) -> ()
where
F: Fn((i32, i32)) -> (),
{
f((i - 1, j));
f((i + 1, j));
f((i, j - 1));
f((i, j + 1));
}
I think that is a function that accepts a closure (that itself accepts a pair and returns unit) and a pair and returns unit. I seem to have to double bracket things: is that correct?
I tried writing a recursive version like this:
fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
if n == 0 {
return &s1;
} else {
let mut s0 = HashSet::new();
for &p in s1 {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
}
return &nthLoop(n - 1, s0, s1);
}
}
Note that I haven't even bothered with the call to iterNeighbors yet.
I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?
The caller would look something like this:
fn nth<'a>(n: i32, p: (i32, i32)) -> &'a HashSet<(i32, i32)> {
let s0 = HashSet::new();
let mut s1 = HashSet::new();
s1.insert(p);
return &nthLoop(n, &s1, s0);
}
I gave up on that and wrote it as a while loop with mutable locals instead:
fn nth<'a>(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let mut n = n;
let mut s0 = HashSet::new();
let mut s1 = HashSet::new();
let mut s2 = HashSet::new();
s1.insert(p);
while n > 0 {
for &p in &s1 {
let add = &|p| {
if !(s1.contains(&p) || s2.contains(&p)) {
s0.insert(p);
}
};
iterNeighbors(&add, p);
}
std::mem::swap(&mut s0, &mut s1);
std::mem::swap(&mut s0, &mut s2);
s0.clear();
n -= 1;
}
return s1;
}
This works if I inline the closure by hand, but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.
The main function is then:
fn main() {
let s = nth(2000, (0, 0));
println!("{}", s.len());
}
So... what am I doing wrong? :-)
Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?
I think that is a function that accepts a closure (that itself accepts a pair and returns unit) and a pair and returns unit. I seem to have to double bracket things: is that correct?
You need the double brackets because you're passing a 2-tuple to the closure, which matches your original F# code.
I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?
The problem is that you're using references to HashSets when you should just use HashSets directly. Your signature for nthLoop is already correct; you just need to remove a few occurrences of &.
To deallocate s2, you can write drop(s2). Note that Rust doesn't have guaranteed tail calls, so each recursive call will still take a bit of stack space (you can see how much with the mem::size_of function), but the drop call will purge the data on the heap.
The caller would look something like this:
Again, you just need to remove the &'s here.
Note that I haven't even bothered with the call to iterNeighbors yet.
This works if I inline the closure by hand but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.
There are three types of closures in Rust: Fn, FnMut and FnOnce. They differ by the type of their self argument. The distinction is important because it puts restrictions on what the closure is allowed to do and on how the caller can use the closure. The Rust book has a chapter on closures that already explains this well.
Your closure needs to mutate s0. However, iterNeighbors is defined as expecting an Fn closure. Your closure cannot implement Fn because Fn receives &self, but to mutate s0, you need &mut self. iterNeighbors cannot use FnOnce, since it needs to call the closure more than once. Therefore, you need to use FnMut.
Also, it's not necessary to pass the closure by reference to iterNeighbors. You can just pass it by value; each call to the closure will only borrow the closure, not consume it.
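As a rough illustration of the three closure traits, the sketch below uses three hypothetical helpers (call_twice, call_twice_mut, call_once), each demanding a different trait, and passes every closure by value.

```rust
// Hypothetical helpers, one per closure trait.
fn call_twice<F: Fn() -> usize>(f: F) -> usize {
    f() + f() // `Fn` is called through &self, so repeated calls are fine
}

fn call_twice_mut<F: FnMut()>(mut f: F) {
    f(); // `FnMut` is called through &mut self, hence the `mut f` binding
    f();
}

fn call_once<F: FnOnce() -> Vec<i32>>(f: F) -> Vec<i32> {
    f() // `FnOnce` consumes the closure, so it can only be called once
}

fn demo() -> (usize, Vec<i32>) {
    let mut log = vec![1];
    let read_total = call_twice(|| log.len()); // only reads `log`: Fn
    call_twice_mut(|| log.push(0));            // mutates `log`: FnMut
    let moved = call_once(move || log);        // gives `log` away: FnOnce
    (read_total, moved)
}

fn main() {
    println!("{:?}", demo());
}
```

Swapping the bound on call_twice_mut from FnMut to Fn should reproduce the kind of error discussed here, because the closure that pushes to log needs mutable access to its capture.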
Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?
There's no purely functional set implementation in the standard library (maybe there's one on crates.io?). While Rust embraces functional programming, it also takes advantage of its ownership and borrowing system to make imperative programming safer. A functional set would probably impose using some form of reference counting or garbage collection in order to share items across sets.
However, HashSet does implement set-theoretic operations. There are two ways to use them: iterators (difference, symmetric_difference, intersection, union), which generate the sequence lazily, or operators (|, &, ^, -, as listed in the trait implementations for HashSet), which produce new sets containing clones of the values from the source sets.
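A short sketch of both styles, using made-up sets a and b (set_demo is a hypothetical name); the results are sorted only so that the printed output does not depend on hash order.

```rust
use std::collections::HashSet;

fn set_demo() -> (Vec<i32>, Vec<i32>) {
    let a: HashSet<i32> = [1, 2, 3].iter().copied().collect();
    let b: HashSet<i32> = [2, 3, 4].iter().copied().collect();

    // Iterator form: lazy, yields references into `a`.
    let mut common: Vec<i32> = a.intersection(&b).copied().collect();

    // Operator form: builds a brand new HashSet holding clones of the elements.
    let either: HashSet<i32> = &a | &b;
    let mut all: Vec<i32> = either.into_iter().collect();

    common.sort();
    all.sort();
    (common, all)
}

fn main() {
    let (common, all) = set_demo();
    println!("intersection: {:?}, union: {:?}", common, all);
}
```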
Here's the working code:
use std::collections::HashSet;

fn iterNeighbors<F>(mut f: F, (i, j): (i32, i32)) -> ()
where
    F: FnMut((i32, i32)) -> (),
{
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}

fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
    if n == 0 {
        return s1;
    } else {
        let mut s0 = HashSet::new();
        for &p in &s1 {
            let add = |p| {
                if !(s1.contains(&p) || s2.contains(&p)) {
                    s0.insert(p);
                }
            };
            iterNeighbors(add, p);
        }
        drop(s2);
        return nthLoop(n - 1, s0, s1);
    }
}

fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
    let mut s1 = HashSet::new();
    s1.insert(p);
    let s2 = HashSet::new();
    return nthLoop(n, s1, s2);
}

fn main() {
    let s = nth(2000, (0, 0));
    println!("{}", s.len());
}
I seem to have to double bracket things: is that correct?
No: the double brackets are there because you've chosen to use tuples, and calling a function that takes a tuple requires creating the tuple first. One can instead have closures that take multiple arguments, like F: Fn(i32, i32). That is, one could write that function as:
fn iterNeighbors<F>(i: i32, j: i32, f: F)
where
F: Fn(i32, i32),
{
f(i - 1, j);
f(i + 1, j);
f(i, j - 1);
f(i, j + 1);
}
However, it seems that retaining the tuples makes sense for this case.
I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?
No need for references (and hence no need for lifetimes), just pass the data through directly:
fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
    if n == 0 {
        return s1;
    } else {
        let mut s0 = HashSet::new();
        for &p in &s1 {
            iterNeighbors(p, |p| {
                if !(s1.contains(&p) || s2.contains(&p)) {
                    s0.insert(p);
                }
            })
        }
        drop(s2); // guarantees timely deallocation
        return nthLoop(n - 1, s0, s1);
    }
}
The key here is you can do everything by value, and things passed around by value will of course keep their values around.
However, this fails to compile:
error[E0387]: cannot borrow data mutably in a captured outer variable in an `Fn` closure
--> src/main.rs:21:21
|
21 | s0.insert(p);
| ^^
|
help: consider changing this closure to take self by mutable reference
--> src/main.rs:19:30
|
19 | iterNeighbors(p, |p| {
| ______________________________^
20 | | if !(s1.contains(&p) || s2.contains(&p)) {
21 | | s0.insert(p);
22 | | }
23 | | })
| |_____________^
That is to say, the closure is trying to mutate values it captures (s0), but the Fn closure trait doesn't allow this. That trait can be called in a more flexible manner (when shared), but this imposes more restrictions on what the closure can do internally. (If you're interested, I've written more about this)
Fortunately there's an easy fix: using the FnMut trait, which requires that the closure can only be called when one has unique access to it, but allows the internals to mutate things.
fn iterNeighbors<F>((i, j): (i32, i32), mut f: F)
where
    F: FnMut((i32, i32)),
{
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}
The caller would look something like this:
Values work here too: returning a reference in that case would be returning a pointer to s0, which lives in the stack frame that is being destroyed as the function returns. That is, the reference would be pointing to dead data.
The fix is just not using references:
fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let s0 = HashSet::new();
let mut s1 = HashSet::new();
s1.insert(p);
return nthLoop(n, s1, s0);
}
This works if I inline the closure by hand but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.
(I don't understand what this means. Including the compiler error messages you're having trouble with helps us help you.)
Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?
Depending on exactly what you want, no, e.g. both HashSet and BTreeSet provide various set-theoretic operations as methods which return iterators.
Some small points:
explicit/named lifetimes allow the compiler to reason about the static validity of data, they don't control it (i.e. they allow the compiler to point out when you do something wrong, but the language still has the same sort of static resource usage/life-cycle guarantees as C++)
the version with a loop is likely to be more efficient as written, as it reuses memory directly (swapping the sets, plus the s0.clear()); however, the same benefit can be realised with a recursive version by passing s2 down for reuse instead of dropping it.
the while loop could be for _ in 0..n
there's no need to pass closures by reference, but with or without the reference, there's still static dispatch (the closure is a type parameter, not a trait object).
conventionally, closure arguments are last, and not taken by reference, because it makes defining & passing them inline easier to read (e.g. foo(x, |y| bar(y + 1)) instead of foo(&|y| bar(y + 1), x))
the return keyword isn't necessary for trailing returns (if the ; is omitted):
fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
let s0 = HashSet::new();
let mut s1 = HashSet::new();
s1.insert(p);
nthLoop(n, s1, s0)
}
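Putting several of these points together (FnMut, closure passed last and by value, for _ in 0..n, no return keyword, reuse of the cleared set), the loop version might look like the sketch below. The snake_case names and the Point alias are my own choices, not part of the question, and this is one possible arrangement rather than the only one.

```rust
use std::collections::HashSet;

type Point = (i32, i32); // alias introduced here for brevity

fn iter_neighbors<F: FnMut(Point)>((i, j): Point, mut f: F) {
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}

fn nth(n: u32, p: Point) -> HashSet<Point> {
    let mut s0: HashSet<Point> = HashSet::new(); // shell being built
    let mut s1: HashSet<Point> = HashSet::new(); // current shell
    let mut s2: HashSet<Point> = HashSet::new(); // previous shell
    s1.insert(p);
    for _ in 0..n {
        for &p in &s1 {
            // The closure borrows s1 and s2 immutably and s0 mutably.
            iter_neighbors(p, |q| {
                if !(s1.contains(&q) || s2.contains(&q)) {
                    s0.insert(q);
                }
            });
        }
        std::mem::swap(&mut s0, &mut s1); // s1 = new shell, s0 = old current
        std::mem::swap(&mut s0, &mut s2); // s2 = old current, s0 = old previous
        s0.clear(); // keeps its allocation for the next round
    }
    s1
}

fn main() {
    println!("{}", nth(2000, (0, 0)).len());
}
```

The closure is still dispatched statically here, since F is a type parameter of iter_neighbors.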
This code fails as expected at let c = a; with compile error "use of moved value: a":
fn main() {
let a: &mut i32 = &mut 0;
let b = a;
let c = a;
}
a is moved into b and is no longer available for an assignment to c. So far, so good.
However, if I just annotate b's type and leave everything else alone:
fn main() {
let a: &mut i32 = &mut 0;
let b: &mut i32 = a;
let c = a;
}
the code fails again at let c = a;
But this time with a very different error message: "cannot move out of a because it is borrowed ... borrow of *a occurs here: let b: &mut i32 = a;"
So, if I just annotate b's type: no move of a into b, but instead a "re"-borrow of *a?
What am I missing?
Cheers.
So, if I just annotate b's type: no move of a into b, but instead a "re"-borrow of *a?
What am I missing?
Absolutely nothing, as in this case these two operations are semantically very similar (and equivalent if a and b belong to the same scope).
Either you move the reference a into b, making a a moved value that is no longer available,
or you reborrow *a in b, making a unusable as long as b is in scope.
The second case is less definitive than the first, you can show this by putting the line defining b into a sub-scope.
This example won't compile because a is moved:
fn main() {
let a: &mut i32 = &mut 0;
{ let b = a; }
let c = a;
}
But this one will, because once b goes out of scope a is unlocked:
fn main() {
let a: &mut i32 = &mut 0;
{ let b = &mut *a; }
let c = a;
}
Now, to the question "Why does annotating the type of b change the behavior ?", my guess would be:
When there is no type annotation, the operation is a simple and straightforward move. Nothing is needed to be checked.
When there is a type annotation, a conversion may be needed (casting a &mut _ into a &_, or transforming a simple reference into a reference to a trait object). So the compiler opts for a re-borrow of the value, rather than a move.
For example, this code is perfectly valid:
fn main() {
let a: &mut i32 = &mut 0;
let b: &i32 = a;
}
and here moving a into b would not make any sense, as they are of different type. Still this code compiles: b simply re-borrows *a, and the value won't be mutably available through a as long as b is in scope.
To complement @Levans's answer on the specific question "Why does annotating the type change the behaviour?":
When you don't write the type, the compiler performs a simple move. When you do put the type, the let statement becomes a coercion site as documented in "Coercion sites":
Possible coercion sites are:
let statements where an explicit type is given.
In the present case the compiler performs a reborrow coercion, which is a special case of coercion going from &mut to &mut, as explained in this issue comment on GitHub.
Note that reborrowing in general and reborrow coercion in particular are currently poorly documented. There is an open issue on the Rust Reference to improve that point.
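As far as I know, function arguments are coercion sites too, which is why a &mut reference can be handed to a function several times without being moved. The following sketch (bump and reborrow_demo are hypothetical names) shows that next to the annotated let from the question.

```rust
fn bump(x: &mut i32) {
    *x += 1;
}

fn reborrow_demo() -> i32 {
    let mut n = 0;
    let a: &mut i32 = &mut n;

    bump(a); // the parameter type is known, so `a` is reborrowed, not moved
    bump(a); // which is why `a` can be passed again

    {
        let b: &mut i32 = a; // annotated `let`: behaves like `&mut *a`
        *b += 10;
    }
    *a += 100; // `b` is no longer in use, so `a` is available again

    n
}

fn main() {
    println!("{}", reborrow_demo());
}
```

Dropping the type annotation on b turns that line into a move, and the later *a += 100 should then be rejected as a use of a moved value.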
Running example on play.rust-lang.org
fn main() {
show({
let number = b"123456";
for sequence in number.windows(6) {
let product = sequence.iter().fold(1, |a, &b| a * (b as u64));
println!("product of {:?} is {}", sequence, product);
}
});
}
Instead of having an output like "product of [49, 50, 51, 52, 53, 54] is 18595558800" I need the actual digits in the brackets and the product of those digits as the result.
Experimenting with - b'0' to subtract 48 and get the actual digits in line 5 doesn't work, i.e.
a * ((b as u64) -b'0')
or
(a - b'0') * (b as u64)
Seems I'm missing something here, for example I have no idea what exactly are the 'a' and 'b' values in the fold(). Can anyone enlighten me? :)
Looking at the signature of fold, we can see that it takes two arguments:
fn fold<B, F>(self, init: B, f: F) -> B
where F: FnMut(B, Self::Item) -> B
init, which is of some arbitrary type B, and f, which is a closure that takes a B value and an element from the iterator, in order to compute a new B value. The whole function returns a B. The types are strongly suggestive of what happens: the closure f is repeatedly called on successive elements of the iterator, passing the computed B value into the next f call. Checking the implementation confirms this suspicion:
let mut accum = init;
for x in self {
accum = f(accum, x);
}
accum
It runs through the iterator, passing the accumulated state into the closure in order to compute the next state.
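To make that loop concrete, here is a free-function version you can experiment with. my_fold is a hypothetical stand-in written for this answer, not the standard library's actual source.

```rust
// Hypothetical stand-in for Iterator::fold, written as a free function.
fn my_fold<I, B, F>(iter: I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(B, I::Item) -> B,
{
    let mut accum = init;
    for x in iter {
        accum = f(accum, x); // the previous state goes in, the next state comes out
    }
    accum
}

fn main() {
    let total = my_fold([1, 2, 3, 4].iter(), 0, |acc, &x| {
        println!("acc = {}, x = {}", acc, x);
        acc + x
    });
    println!("total = {}", total);
}
```

The first closure argument is always the accumulated state and the second is the current element, which is exactly the role of a and b in the question.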
First things first, let's put the types on the fold call:
let product = sequence.iter().fold(1, |a: u64, &b: &u8| a * (b as u64));
That is, the B type we want is u64 (that's what our final product will be), and the item type of the iterator is &u8, a reference to a byte.
Now, we can manually inline the definition of fold to compute product to try to clarify the desired behaviour (I'm ignoring the normalisation for now):
let mut accum = 1;
for x in sequence.iter() {
accum = { // the closure
let a: u64 = accum;
let &b: &u8 = x;
a * b as u64
}
}
let product = accum;
Simplifying:
let mut product = 1;
for &b in sequence.iter() {
product = product * (b as u64)
}
Hopefully this makes it clearer what needs to happen: b runs across each byte, and so it is the value that needs adjustment, to bring the ASCII encoded value down to the expected 0..10 range.
So, you were right with:
a * ((b as u64) -b'0')
However, the details mean that this fails to compile with a type error: b'0' has type u8, but b as u64 has type u64, and it's not legal to use - with a u64 and a u8. Moving the normalisation to happen before the u64 cast will ensure this works, since then you're subtracting a u8 (b'0') from b (which is also a u8):
product * (b - b'0') as u64
All in all, the fold might look clearer (and actually work) as:
let product = sequence.iter()
.fold(1, |prod, &byte| prod * (byte - b'0') as u64);
(I apologise for giving you such confusing code on IRC.)
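For completeness, a sketch of the whole program with both requests addressed: the digits in the brackets and the product of the digits. The helper names digits and digit_product are made up for this example, and the code assumes every byte is an ASCII digit (byte - b'0' would overflow for anything below b'0').

```rust
// Hypothetical helper: ASCII bytes -> digit values, for printing.
// Assumes every byte is in b'0'..=b'9'.
fn digits(window: &[u8]) -> Vec<u8> {
    window.iter().map(|&byte| byte - b'0').collect()
}

// Hypothetical helper: the fold from above, wrapped in a function.
fn digit_product(window: &[u8]) -> u64 {
    window
        .iter()
        .fold(1, |prod, &byte| prod * (byte - b'0') as u64)
}

fn main() {
    let number = b"123456";
    for sequence in number.windows(6) {
        // prints: product of [1, 2, 3, 4, 5, 6] is 720
        println!(
            "product of {:?} is {}",
            digits(sequence),
            digit_product(sequence)
        );
    }
}
```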
As an alternative to fold, you can use map and product. When this answer was first written that meant the feature-gated MultiplicativeIterator::product; on current stable Rust it is Iterator::product and needs neither a feature flag nor an import. I find that the two steps help make it clearer what is happening.
fn main() {
    let number = b"123456";
    for sequence in number.windows(6) {
        // product() is generic over its result, so the type is spelled out
        let product: u64 = sequence.iter().map(|v| (v - b'0') as u64).product();
        println!("product of {:?} is {}", sequence, product);
    }
}
You could even choose to split up the resizing from u8 to u64:
sequence.iter().map(|v| v - b'0').map(|v| v as u64).product::<u64>();
Nowadays, an alternative is product + to_digit: (itertools was used to print the contents of the iterator)
use {itertools::Itertools, std::char};
fn main() {
let number = b"123456";
let sequence = number
.iter()
.map(|&c| u64::from(char::from(c).to_digit(10).expect("not a digit")));
let product: u64 = sequence.clone().product();
println!("product of {:?} is {}", sequence.format(", "), product);
}