Do Rust closures capture context by reference or by value?


C++
const auto it = std::find_if(vector.begin(), vector.end(), [&activity](const std::string & r) {
return activity == r;
});
Rust
let index = match vector.iter().position(|r| r == activity) {
Some(x) => x,
None => {
vector.push(activity.to_string());
vector.len() - 1
},
};
I'm a Rust novice coming from C++. I was wondering whether the lambda in the C++ snippet, which captures the variable activity by reference and takes the vector element as a reference parameter, is equivalent to the closure in the Rust snippet above. In other words, does the closure in the Rust snippet also take both by reference?

In short, closures capture by reference when they can, and move when they must. Capture modes are tried in the following order, and the first one that works is used:
Immutable borrow (&T).
Unique immutable borrow (no syntax for that). This is used when the closure modifies the value behind a captured mutable reference (e.g. *x = 1 where x: &mut i32) without needing a mutable borrow of the reference itself. See the corresponding section of the Rust Reference for more.
Mutable borrows (&mut T).
Move or copy (T).
If the closure specifies the move keyword, however, all captured variables are always moved/copied. This is usually used when the closure needs to outlive the captured variable, e.g. when it is returned from the function.
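In your example, since r == activity desugars into PartialEq::eq(&r, &activity), activity is captured by immutable reference. The parameter r is a reference as well (a &String, assuming vector is a Vec<String>), because vector.iter() yields references to the elements.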
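A minimal sketch of these capture modes, assuming the vector in the question is a Vec<String> and activity is a &str (the question does not show the declarations). The helper names index_of_or_push, count_calls and moved_len are hypothetical, chosen for illustration:

```rust
/// Returns the index of `activity` in `list`, appending it first if absent.
fn index_of_or_push(list: &mut Vec<String>, activity: &str) -> usize {
    // The closure only reads `activity` (via `r == activity`), so it captures
    // `activity` by immutable borrow. `r` is a `&String`, because `iter()`
    // yields references to the elements.
    let found = list.iter().position(|r| r == activity);
    match found {
        Some(i) => i,
        None => {
            list.push(activity.to_string());
            list.len() - 1
        }
    }
}

fn count_calls() -> u32 {
    let mut count = 0;
    // Assigning to `count` makes the closure capture it by mutable borrow,
    // so the closure itself must be declared `mut`.
    let mut bump = || count += 1;
    bump();
    bump();
    // The mutable borrow ends after the last call, so `count` is usable again.
    count
}

fn moved_len() -> usize {
    let s = String::from("hello");
    // `move` forces capture by value even though the body only reads `s`.
    let f = move || s.len();
    // Using `s` here would be an error: it was moved into `f`.
    f()
}
```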

In Rust, lambdas (called closures, by the way) have other subtleties than capture by value or by reference. The real question to ask is: does the closure take ownership of the variable, or does it just borrow it (which is not quite the same)? If you looked only at the compiled result, taking ownership is like capturing by value, and borrowing is like capturing by reference.
By default, they only borrow the variable (as long as the body doesn't need to consume it), and if you want them to take ownership you have to use the move keyword. However, there are several kinds of borrows (mutable vs. read-only), unlike C++ capture by reference, of which there is only one kind.
See the appropriate section of the Rust book for a more detailed explanation.

Related

How does having multiple shared references which value can be accessed just works in Rust?

I'm new to Rust and already have read "the book" but I'm trying to understand the inners of references in the language.
As far as I know a reference is a type of pointer that takes you to some value in memory when dereferenced.
let x = 5;
let y = &x;
In this example y is a pointer to the memory address of x, and so the type and value of y are not equal to the type and value of x.
assert_eq!(y, x);
// fails to compile as assert_eq! has no implementation for `&{integer} == {integer}`
So a reference is not the same as the value it refers to.
But if I dereference y by using the * operator, I do get the value it refers to, and so the following code compiles.
assert_eq!(*y, x);
To access the value the reference points to, dereferencing is needed; but dereferencing implies moving the ownership of the referenced value to the new variable.
#[derive(Debug)]
struct Point {x: i32, y: i32}
let x = Point {x:1, y:2};
let y = &x;
let z = *y;
// fails to compile as move occurs because `*y` has type `Point`, which does not implement the `Copy` trait
By implementing the Copy trait for the type (Point in this case) the problem can be solved by letting Rust create a copy of the value and move the ownership of the copied value to the new variable.
The final question is: how can Rust access a value of a type that implements neither Copy nor Clone and sits behind a reference, without the dereference (*) operator taking ownership of the value and thus invalidating the other shared references to the original value?
E.g. (works just fine)
let x = Point {x:1, y:2};
let y = &x;
let z = &x;
fn print_point(a: &Point){
println!("{a:#?}")
}
println!("Printing y");
print_point(y);
println!("Printing x");
println!("{x:#?}");
println!("Printing z");
print_point(z);

The ownership semantics of Rust are more complex than simply giving ownership or denying access.
First, a little background about pointers in Rust. A pointer is just an integer that indicates a position in memory. It's usually beginner-friendly to think of it as an arrow pointing to a location in memory. An arrow is not what it points to, but given the arrow it's easy to get the value it points to (just look where it points).
All pointer-like types in Rust are either just that arrow, or that arrow plus a little more information about the value (so that you don't have to read the value to know that information). By pointer-like I mean anything that behaves like a pointer, that is, anything that can be dereferenced (to keep it simple).
For example, the borrow of x (&x) is not exactly a pointer ("actual" raw pointers in Rust are used less often than borrows and have types *const T or *mut T); it's a pointer-like. This pointer-like is just an integer: once your program is compiled, you couldn't tell the difference between a borrow and a pointer. However, unlike "just a number", a borrow carries some additional guarantees. For instance, it's never a null pointer (a pointer to the very first position in memory; by convention, and for technical reasons concerning the allocator, this area is never allocated, so such a pointer is always invalid). More generally, a borrow should never be invalid: it should always be safe to dereference it (NB. safe doesn't mean allowed: Rust may still reject a particular use of the dereference, as in the example you posted), in the sense that the value it points to always "makes sense". This is still not the strongest guarantee provided by a borrow, which is that, for as long as someone holds the borrow (that is, for as long as someone could use it), no one is going to modify the value it points to. This is not something you have to be careful about: Rust will prevent you from writing code that could break this.
Similarly, a mutable borrow (&mut x) is also a pointer-like that is "just an integer", but with different guarantees. A mutable borrow (also called an "exclusive borrow"), besides always being valid, ensures that, as long as someone is holding it, no one else can access the value it points to, either to modify it or just to read it. This prevents data races, because whoever holds the exclusive borrow can modify the value, even though they don't own it.
These are definitely the most common pointer-likes in Rust (they're everywhere), and they can be seen as, respectively, the read-only but multi-access version of a pointer and the read-and-write but single-access version of a pointer.
Having understood that, it's easier to understand Rust's logic. If I have y = &x, I am allowed to read the value x, but I can't "take" it (that is, take its ownership): I can't even write to it (which I could if I owned it)! Note that even with an exclusive borrow, I couldn't take ownership of x, but I could "swap" it: take its ownership in exchange for the ownership of a value I own (or one created for the occasion). For this reason, if you write let z = *y, you are trying to take ownership of x, so Rust complains. But note that this is due to the binding, not to the dereferencing of y. To prove it, compare it with the following (String is not Copy):
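A small sketch of the "swap" idea using std::mem::replace and std::mem::swap, which move values in and out through a &mut without ever leaving the borrowed place empty. The function names take_via_replace and swap_values are illustrative only:

```rust
use std::mem;

/// Takes ownership of the old value behind `slot` by putting a new one in.
fn take_via_replace(slot: &mut String) -> String {
    // `let owned = *slot;` would not compile: cannot move out of `*slot`.
    mem::replace(slot, String::from("replacement"))
}

/// Exchanges the two values; neither reference ever points at a moved-from place.
fn swap_values(a: &mut String, b: &mut String) {
    mem::swap(a, b);
}
```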
let a = String::new();
let b = &a;
let c = &a;
assert_eq!(*b, *c);
Here, I dereference the borrow to a non-Copy value, but it's clear that I am not violating the borrow contract (to compare equality, I just need to read the value).
Furthermore, in general, if I have a borrow of a struct, I can obtain borrows of its fields.
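For instance, assuming a hypothetical Book struct with a non-Copy title and a Copy page count, a shared borrow of the struct is enough to borrow one field and copy the other, and the struct remains fully usable afterwards:

```rust
// Hypothetical type used only for this illustration.
struct Book {
    title: String, // not Copy
    pages: u32,    // Copy
}

/// Borrows the non-Copy field through the shared borrow of the struct.
fn title_of(book: &Book) -> &str {
    &book.title
}

/// Copies the Copy field out; the `Book` itself is not moved.
fn pages_of(book: &Book) -> u32 {
    book.pages
}
```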
Incidentally, note that String is a pointer-like too! It's not "just a number" (meaning it carries more information than just being an arrow): it stores a pointer to its heap buffer together with a length and a capacity. Pointer-likes that carry extra data alongside the address are often called fat pointers (strictly, the term is usually reserved for references such as &str or &[T], which carry a length). String owns the str it points to, that is, whoever owns the String also owns the pointed-to value, which is why String is not Copy (otherwise, the pointed-to value could be owned by multiple parties).

To access the value the reference points to, dereferencing is needed; but dereferencing implies moving the ownership of the referenced value to the new variable.
Dereferencing does not imply moving. In the motivating example, let z = *y dereferences y and then moves (or copies) the value into z. If you do not assign the value to a new variable, there is no new owner and therefore no transfer of ownership.
This might help build an intuition:
You make the (correct) case that a reference is a different thing from the object it points to. However, you imply that you would need to assign the value to a new variable after dereferencing in order to do anything with it, which is wrong. Most of the time you are fine working directly on the members, which may be copyable. If not, maybe the members of the members are. In the end, it is likely to all boil down to bytes.
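A sketch of that point, assuming a Point type like the question's (with PartialEq and Debug derived so it can be compared and printed): * appears in several expressions below, and none of them moves the pointed-to value. The function name demo is hypothetical:

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn demo(p: &Point, q: &Point) -> (bool, i32, bool) {
    // Comparing through `*` only reads both values; nothing is moved.
    let equal = *p == *q;
    // Reading a Copy field through the reference copies just that field;
    // `p.y` auto-dereferences in the same way as `(*p).x`.
    let sum = (*p).x + p.y;
    // `&*p` re-borrows: a new shared reference to the same place, not a move.
    let again: &Point = &*p;
    // let moved = *p; // would not compile: cannot move out of `*p`
    (equal, sum, std::ptr::eq(again, p))
}
```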

Why calling iter on array of floats gives "temporary value is freed" when using f64::from_bits?

I encountered a lifetime error which I am not able to explain why it is emitted by the compiler. I need this (which works fine):
fn iter<'a>() -> impl Iterator<Item = &'a f64> {
[3.14].iter()
}
However, when I try to use a float value which is transmuted from a specific byte representation using from_bits, like this:
fn iter<'a>() -> impl Iterator<Item = &'a f64> {
[f64::from_bits(0x7fffffffffffffff)].iter()
}
it gives me "creates a temporary which is freed while still in use" (on stable 1.45.2).
My reasoning was that since f64 is a Copy type (and it indeed works as expected if I use a constant value), this should work because no freeing should be performed on this value.
So the question is why the compiler emits the error in the second case?
Thanks for any pointers and explanations!
P.S. I need the iterator over references because it fits nicely with my other API.

Your code has two related problems:
You return a reference to a temporary object that goes out of scope at the end of the function body.
Your return type contains a lifetime parameter that isn't bound to any function input.
References can only live as long as the data they point to. The result of the method call f64::from_bits(0x7fffffffffffffff) is a temporary object which goes out of scope at the end of the expression. Returning a reference to a temporary value or a local variable from a function is not possible, since the value referred to won't be alive anymore once the function returns.
The Copy trait, or whether the object is stored on the heap, is completely unrelated to the fact that values that go out of scope can no longer be referred to. Any value created inside a function will go out of scope at the end of the function body, unless you move it out of the function via the return value. However, you need to move ownership of the value for this to work – you can't simply return a reference.
Since you can't return a reference to any value created inside your function, any reference in the return value necessarily needs to refer to something that was passed in via the function parameters. This implies that the lifetime of any reference in the return value needs to match the lifetime of some reference that was passed to the function. This brings us to the second point: a lifetime parameter that only occurs in the return type is almost always a mistake. The lifetime parameter is chosen by the calling code, so you are essentially saying that your function returns a reference that lives for an arbitrary time chosen by the caller, which is only possible if the reference refers to static data that lives as long as the program.
This also gives an explanation why your first example works. The literal [3.14] defines a constant static array. This array will live as long as the program, so you can return references with arbitrary lifetime to it. However, you'd usually express this by explicitly specifying the static lifetime to make clear what is happening:
fn iter() -> impl Iterator<Item = &'static f64> {
[3.14].iter()
}
A lifetime parameter that only occurs in the return value is rarely useful; spelling out 'static makes the intent explicit.
So how do you fix your problem? You probably need to return an iterator over an owned type, since your iter() function doesn't accept any arguments.
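Two possible fixes, sketched under the assumption that either an owned item type or a caller-supplied slice fits the rest of your API (the names iter_owned and iter_borrowed are hypothetical): return the f64 values themselves, or keep Item = &f64 by tying the lifetime to an input parameter.

```rust
/// Yields the value itself instead of a reference to a temporary.
fn iter_owned() -> impl Iterator<Item = f64> {
    std::iter::once(f64::from_bits(0x7fffffffffffffff))
}

/// Keeps the `Item = &f64` API by borrowing from data the caller owns,
/// so the returned references cannot outlive that data.
fn iter_borrowed<'a>(data: &'a [f64]) -> impl Iterator<Item = &'a f64> {
    data.iter()
}
```

The second form is what the lifetime rules above point to: the returned references borrow from an input, so the caller's data keeps them valid.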

It is Copy, it is semantically copied, and that is indeed the problem: semantically, the value exists only on the stack until the function returns. After that, the reference would point to memory outside the live stack that is very likely to be overwritten soon, and if Rust allowed it, that would result in undefined behaviour.
On top of that, at the time of the question (Rust 1.45.2), from_bits was not a const fn, so the conversion could not happen at compile time; it was a runtime operation. Why convert each time when you already know the value?
Why is this the case anyway?
from_bits:
This is currently identical to transmute::<u64, f64>(v) on all platforms.
If you take a look at transmute, you will find:
transmute is semantically equivalent to a bitwise move of one type into another. It copies the bits from the source value into the destination value, then forgets the original. It's equivalent to C's memcpy under the hood, just like transmute_copy.
While the generated code may indeed just be a simple reinterpretation of a static value, Rust cannot semantically allow this value to be moved onto the stack and then dropped while a reference still points to it.
Solution.
Since you want to return a NaN, you can do the same thing you did in the first example (note that f64::NAN is a NaN, but it is not guaranteed to have the exact bit pattern 0x7fffffffffffffff):
fn iter<'a>() -> impl Iterator<Item = &'a f64> {
[f64::NAN].iter()
}
This will iterate over a static slice directly, and there will be no issues.

Why is the "move" keyword necessary when it comes to threads; why would I ever not want that behavior?

For example (taken from the Rust docs):
let v = vec![1, 2, 3];
let handle = thread::spawn(move || {
println!("Here's a vector: {:?}", v);
});
This is not a question about what move does, but about why it is necessary to specify.
In cases where you want the closure to take ownership of an outside value, would there ever be a reason not to use the move keyword? If move is always required in these cases, is there any reason why the presence of move couldn't just be implied/omitted? For example:
let v = vec![1, 2, 3];
let handle = thread::spawn(/* move is implied here */ || {
// Compiler recognizes that `v` exists outside of this closure's
// scope and does black magic to make sure the closure takes
// ownership of `v`.
println!("Here's a vector: {:?}", v);
});
The above example gives the following compile error:
closure may outlive the current function, but it borrows `v`, which is owned by the current function
When the error magically goes away simply by adding move, I can't help but wonder to myself: why would I ever not want that behavior?
I'm not suggesting anything is wrong with the required syntax. I'm just trying to gain a deeper understanding of move from people who understand Rust better than I do. :)

It's all about lifetime annotations, and a design decision Rust made long ago.
See, the reason your thread::spawn example fails to compile is that it expects a 'static closure. Since the new thread can run longer than the code that spawned it, we have to make sure that any captured data stays alive after the caller returns. The solution, as you pointed out, is to pass ownership of the data with move.
But the 'static constraint is a lifetime annotation, and a fundamental principle of Rust is that lifetime annotations never affect run-time behavior. In other words, lifetime annotations are only there to convince the compiler that the code is correct; they can't change what the code does.
If Rust inferred the move keyword based on whether the callee expects 'static, then changing the lifetimes in thread::spawn may change when the captured data is dropped. This means that a lifetime annotation is affecting runtime behavior, which is against this fundamental principle. We can't break this rule, so the move keyword stays.
Addendum: Why are lifetime annotations erased?
To give us the freedom to change how lifetime inference works, which allows for improvements like non-lexical lifetimes (NLL).
So that alternative Rust implementations like mrustc can save effort by ignoring lifetimes.
Much of the compiler assumes that lifetimes work this way, so to make it otherwise would take a huge effort with dubious gain. (See this article by Aaron Turon; it's about specialization, not closures, but its points apply just as well.)

There are actually a few things in play here. To help answer your question, we must first understand why move exists.
Rust has three closure traits:
FnOnce, a closure that consumes its captured variables (and hence can only be called once),
FnMut, a closure that mutably borrows its captured variables, and
Fn, a closure that immutably borrows its captured variables.
When you create a closure, Rust infers which traits it implements, and how it captures each variable, based on how the closure uses the values from its environment. A variable that the body consumes is captured by value (a move, or a copy if the type is Copy); one that is mutated is captured by mutable borrow; and one that is only read is captured by immutable borrow. However, if you use the move keyword when declaring a closure, it always captures by value, that is, it takes ownership of every captured variable. Thus, move makes no difference for variables the closure already consumes, but it changes how variables that would otherwise be borrowed are captured.
Coming to your example, Rust infers the closure to be a Fn, because println! only requires a reference to the values it prints (the Rust book page you linked talks about this when explaining the error without move). The closure thus attempts to borrow v, and the standard lifetime rules apply. Since thread::spawn requires the closure passed to it to be 'static, any borrowed captures must be 'static too, and a borrow of the local v is not, causing the error. You must therefore explicitly specify that you want the closure to take ownership of v.
This can be further illustrated by changing the closure to one the compiler would infer to be only FnOnce, such as || v. Since the body moves v out, the closure captures v by value by default, and the line let handle = thread::spawn(|| v); compiles without requiring move.
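The two cases can be sketched as follows (the function names are illustrative; each joins the thread and returns its result). The first closure only reads v, so it needs move to satisfy the 'static bound, while the second moves v out in its body and therefore captures it by value without move:

```rust
use std::thread;

/// The body only reads `v`, so without `move` the closure would borrow it
/// and fail the `'static` bound; `move` transfers ownership instead.
fn sum_on_thread(v: Vec<i32>) -> i32 {
    let handle = thread::spawn(move || v.iter().sum::<i32>());
    handle.join().unwrap()
}

/// The body returns `v` by value, so the closure captures it by value
/// even without `move` (and is therefore only `FnOnce`).
fn round_trip_through_thread(v: Vec<i32>) -> Vec<i32> {
    let handle = thread::spawn(|| v);
    handle.join().unwrap()
}
```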

The existing answers have great information, which led me to an understanding that is easier for me to think about, and hopefully easier for other Rust newcomers to get.
Consider this simple Rust program:
fn print_vec (v: &Vec<u32>) {
println!("Here's a vector: {:?}", v);
}
fn main() {
let mut v: Vec<u32> = vec![1, 2, 3];
print_vec(&v); // `print_vec()` borrows `v`
v.push(4);
}
Now, asking why the move keyword can't be implied is like asking why the "&" in print_vec(&v) can't also be implied.
Rust’s central feature is ownership. You can't just tell the compiler, "Hey, here's a bunch of code I wrote, now please discern perfectly everywhere I intend to reference, borrow, copy, move, etc. Kthnxsbye!" Symbols and keywords like & and move are a necessary and integral part of the language.
In hindsight, this seems really obvious, and makes my question seem a little silly!

Borrowing fields of a struct independently inside a closure [duplicate]

This question already has an answer here: Mutably borrow one struct field while borrowing another in a closure
What's the difference between
let book_scores = &system.book_scores;
library.books.sort_unstable_by_key(|b| book_scores[*b]);
and
library.books.sort_unstable_by_key(|b| &system.book_scores[*b]);
?
The first one is allowed by the compiler and the second one fails with
error[E0502]: cannot borrow system as immutable because it is also borrowed as mutable
libraries is a field of system, library is an element of libraries in a for loop like
for library in &mut system.libraries {

The difference is that a closure's variable binding is shallow. Consider the following closure:
let closure = || println!("{}", a.b.c.d);
Since the closure doesn't define an a, a is borrowed from its outside environment. When actually called, the closure accesses .b.c.d on the borrowed a.
The closure doesn't try to immediately calculate a.b.c.d and borrow that. If it did, it wouldn't be able to notice changes to a.b or a.b.c or even a.b.c.d. Similarly, if it captured a.b.c on creation and accessed .d when called, it wouldn't be able to respond to changes to a.b or a.b.c.
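If you need to pre-capture a part of the expression, you need to make it explicit, as your first snippet does. Note that this describes Rust 2018 and earlier: since the 2021 edition, closures capture disjoint fields (RFC 2229), so in the second snippet the closure captures only system.book_scores, and it should compile as well.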
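A runnable version of the first snippet, assuming minimal System and Library types reconstructed from the question (the real definitions are not shown, so these fields are hypothetical). Borrowing the field up front compiles in every edition:

```rust
// Hypothetical types modelled on the question; the real fields are unknown.
struct Library {
    books: Vec<usize>, // indices into `System::book_scores`
}

struct System {
    book_scores: Vec<u32>,
    libraries: Vec<Library>,
}

fn sort_books(system: &mut System) {
    // Borrow just this field up front; the shared borrow is disjoint from
    // the mutable borrow of `system.libraries` taken by the loop.
    let book_scores = &system.book_scores;
    for library in &mut system.libraries {
        // The closure captures `book_scores` (a `&Vec<u32>`), not `system`.
        library.books.sort_unstable_by_key(|b| book_scores[*b]);
    }
}
```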

Why does rust re-declare mutability when taking a reference to a mutable variable?

I'm reading chapter two of The Rust Programming Language and something I don't understand caught my eye in here:
use std::io;
fn main() {
println!("Guess the number!");
println!("Please input your guess.");
let mut guess = String::new();
io::stdin().read_line(&mut guess)
.expect("Failed to read line");
println!("You guessed: {}", guess);
}
On code line 5, it declares a mutable variable with let mut guess = String::new(), but on the next line the argument for read_line() also has a mut keyword.
If the variable was defined as mutable in the first place, then why do we use mut again instead of just using the reference like this:
io::stdin().read_line(&guess).expect("Failed to read line");
If the variable is declared mutable, then when we take a reference to it, shouldn't the reference be mutable (mut) by default?

TL;DR: This is a design decision. The Rust compiler could, reasonably, infer whether mutability is necessary or not; however to a human reader it may not be obvious.
Long Story
If you look at Rust's predecessors, you will find that the use of reference arguments in C++ is not universally appreciated. In C++:
foo.call(bar);
only the definition of call will let you know whether bar is passed by value, const reference or mutable reference. As a result, the Google Style Guide is infamous for mandating pass-by-pointer for any modifiable argument, so as to distinguish at the call site whether a variable may be modified by the call or not.
In designing Rust, there has been a large and deliberate emphasis on explicitness. The reasoning is that code is read more often than it is written, and therefore syntax and semantics should be optimized for reading and understanding.
There is a tension between explicitness and concision, so explicitness is not always preferred, but it often is.
In the case of mutable references, given the rules surrounding borrow-checking and the impact of mutable borrows on them, explicitness was preferred.

Because you can have either an immutable reference to a mutable variable or a mutable reference to a mutable variable. The keyword mut selects which type of reference you'd like to create.
let mut foo = 1;
example1(&foo); // May not modify `foo`
example2(&mut foo); // May modify `foo`
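A sketch with hypothetical bodies for example1 and example2 (the answer only names them), showing that the same mutable variable can be lent out either way and that only &mut permits modification:

```rust
// Hypothetical bodies for `example1` and `example2`.
fn example1(x: &i32) -> i32 {
    *x + 1 // can read through a shared reference, but not modify
}

fn example2(x: &mut i32) {
    *x += 1; // a mutable reference allows modification
}

fn run() -> i32 {
    let mut foo = 1;
    let read = example1(&foo); // shared reference to a mutable variable
    example2(&mut foo);        // `&mut` is required to allow the mutation
    read + foo                 // 2 + 2
}
```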
See also:
How do I pass a reference to mutable data in Rust?
What's the difference between placing "mut" before a variable name and after the ":"?

Remember that in Rust everything is immutable by default. When you create a reference with &, you get an immutable reference by default, even if the value itself is mutable: the mutability of the underlying variable doesn't change the kind of reference you get.
That is a little counterintuitive when you come from a language where everything is mutable. There, you don't need to state explicitly that something is mutable, because it's the default behavior, and you almost never need to state explicitly that a reference is immutable when you create it.
So to create a reference through which the value can be mutated, you must explicitly use &mut. This is a rule: the compiler knows that the value can be mutated and could infer it for you, but Rust asks you to write it explicitly; it's as simple as that.
