I have read RFC 2005 and understand that matching is a repeated (recursive) operation. When I say "encounters a reference pattern", I am not talking about encountering a reference pattern in the first iteration, as in the following:
let x = &String;
// The binding mode is `move`, so x is of type `&String`
I mean the cases where the binding mode has been shifted to ref/ref mut during an earlier iteration on a non-reference pattern, and a reference pattern is encountered afterwards.
I read the RFC very carefully and found this sentence: "The *default binding mode* only changes when matching a reference with a non-reference pattern." Does this mean a reference pattern does nothing to the matching process? Someone told me that a reference pattern resets the binding mode back to the default move (see case 2).
There are two cases where the "correct" inferences seem to conflict with each other:
Here "correct" means that the inferred type is the same as the one inferred by the compiler; I don't know whether the inference process itself is correct, which is why I am posting this question.
// case 1
let (a, b) = &(&String, &String); // a and b are of type &&String
// case 2
let (&a, &b) = &(&String, &String); // a and b are of type String
Inference for case 1:
First, we match &(&String, &String) against (a, b). (a, b) is a non-reference pattern, so we dereference &(&String, &String) and change the binding mode from move (the default) to ref. Then we match (&String, &String) against (a, b); with the ref binding mode, a and b are of type &&String.
Why did it stop at matching (&String, &String) against (a, b)? Shouldn't we continue and match &String against a and b respectively?
If we continue and match &String against a, a is a reference pattern, so what should we do now?
Inference for case 2:
First, we match &(&String, &String) against (&a, &b). (&a, &b) is a non-reference pattern, so we dereference &(&String, &String) and set the binding mode to ref. Then we match (&String, &String) against (&a, &b), which is basically matching &String against &a. &a is a reference pattern, so we reset the binding mode back to move, making a of type String.
Contradiction:
In case 2, when we encounter a reference pattern, we reset the binding mode to the default move. However, in case 1 (if we did not stop at that point and kept matching &String against a), a is also a reference pattern; if we reset the binding mode to move there too, a would have type &String instead of &&String.
There is another question: when should this inference algorithm stop? If the algorithm should stop at that point in case 1, then everything makes sense.
It seems that I have found the answer. There are several things to clarify:
Match ergonomics only covers the cases where we match a non-reference pattern against a reference, so the title of this question is wrong: match ergonomics does nothing when it encounters a reference pattern. The match ergonomics algorithm is just something like:
fn match_ergonomics() {
    if (the reference we are matching is `&xx`) {
        binding_mode = `ref`;
    } else if (the reference we are matching is `&mut xx`) {
        if (binding_mode == `ref`) {
            binding_mode = `ref`;
        } else {
            binding_mode = `ref mut`;
        }
    }
}
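The pseudocode above can be written as a small runnable Rust function. This is only a sketch: the `BindingMode` and `RefKind` enums and the `next_binding_mode` name are made up for illustration and are not compiler internals.

```rust
// Hypothetical types for illustration; not part of rustc or the RFC text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BindingMode {
    Move,
    Ref,
    RefMut,
}

#[derive(Debug, Clone, Copy)]
enum RefKind {
    Shared,  // the scrutinee is `&xx`
    Mutable, // the scrutinee is `&mut xx`
}

/// Binding mode after a non-reference pattern has auto-dereferenced
/// one reference of the given kind.
fn next_binding_mode(current: BindingMode, scrutinee: RefKind) -> BindingMode {
    match (scrutinee, current) {
        // Going through `&` always ends in `ref`.
        (RefKind::Shared, _) => BindingMode::Ref,
        // Going through `&mut` keeps `ref` if we were already in `ref`...
        (RefKind::Mutable, BindingMode::Ref) => BindingMode::Ref,
        // ...and otherwise switches to `ref mut`.
        (RefKind::Mutable, _) => BindingMode::RefMut,
    }
}

fn main() {
    let after_shared = next_binding_mode(BindingMode::Move, RefKind::Shared);
    let after_both = next_binding_mode(after_shared, RefKind::Mutable);
    println!("{:?} then {:?}", after_shared, after_both);
}
```

Once the mode has become `ref`, a later `&mut` layer cannot turn it back into `ref mut`, which is why a binding reached through a shared reference is never mutable.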
The default binding mode is a concept introduced for match ergonomics, but it is not exclusive to the cases covered by match ergonomics; it has been integrated into the whole Rust matching system.
The concept of a reference pattern can be confusing. In the RFC:
A reference pattern is any pattern which can match a reference without coercion. Reference patterns include bindings, wildcards (_), consts of reference types, and patterns beginning with & or &mut.
While in the Rust Reference, it is just a pattern beginning with &/&mut.
Matching is a recursive process. At every step of the recursion there are several cases, depending on which pattern we encounter; these include the match ergonomics cases.
There are actually two situations in which the DBM (default binding mode) changes: one is matching a non-reference pattern against a reference, and the other is matching a &[mut] pat against a reference.
| matching        | against   | change to DBM         |
|-----------------|-----------|-----------------------|
| non-ref pattern | reference | set to ref or ref mut |
| &[mut] pat      | reference | set to move           |
Matching algorithm
// The terminologies we are using in this algorithm come from rust reference
If no `default binding mode` is set, set the `default binding mode` to `move`.
If the pattern is an identifier pattern:
    If we have explicit `ref`:
        bind by reference;
    else if we have explicit `ref mut`:
        bind by mutable reference;
    else:
        // When explicit `ref`/`ref mut` are absent,
        // the default binding mode decides how to bind
        If the binding mode is `move`, bind directly.
        If the binding mode is `ref`, bind by immutable reference.
        If the binding mode is `ref mut`, bind by mutable reference.
    If the pattern has an inner pattern, repeat this process with that pattern.
else if the pattern is a `&[mut]` reference pattern:
    // matching `&[mut]` against a non-reference triggers
    // a compiler error
    Ensure that we are matching against a `&[mut]` reference.
    Dereference the scrutinee expression.
    Set the binding mode to `move`.
    Repeat this process with the inner pattern.
else if the pattern is any kind of destructuring pattern:
    Set `T` to the type implied by the pattern.
    Ensure that we are matching against a `T` value or `&[mut] T` reference.
    // cases covered by match ergonomics
    If we are matching against a `&T` reference:
        Dereference the scrutinee expression.
        Set the binding mode to `ref`.
    else if we are matching against a `&mut T` reference:
        Dereference the scrutinee expression.
        If the binding mode is `move`, set the binding mode to `ref mut`.
    else
        destructure the value;
    Repeat this process for all fields in the pattern.
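To make the steps above concrete, here is a toy model of the algorithm in Rust. It is a sketch under several assumptions: `Ty`, `Pat`, `Mode`, `infer`, `bindings` and `r` are hypothetical names, only identifier, `&pat` and tuple patterns are modelled, and it follows the behaviour described in this answer rather than the compiler's actual implementation.

```rust
// Toy model: all names here are illustrative, not rustc internals.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Named(&'static str),
    Ref(Box<Ty>),
    RefMut(Box<Ty>),
    Tuple(Vec<Ty>),
}

#[derive(Debug)]
enum Pat {
    Ident(&'static str), // `a`
    Ref(Box<Pat>),       // `&pat`
    Tuple(Vec<Pat>),     // `(p1, p2)`
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Move,
    Ref,
    RefMut,
}

/// Shorthand for building `&T`.
fn r(t: Ty) -> Ty {
    Ty::Ref(Box::new(t))
}

fn infer(pat: &Pat, ty: &Ty, mode: Mode, out: &mut Vec<(&'static str, Ty)>) -> Result<(), String> {
    match pat {
        // Identifier pattern: the default binding mode decides the bound type.
        Pat::Ident(name) => {
            let bound = match mode {
                Mode::Move => ty.clone(),
                Mode::Ref => Ty::Ref(Box::new(ty.clone())),
                Mode::RefMut => Ty::RefMut(Box::new(ty.clone())),
            };
            out.push((*name, bound));
            Ok(())
        }
        // `&pat`: requires a `&` reference, dereferences it, resets the mode to `move`.
        Pat::Ref(inner) => match ty {
            Ty::Ref(target) => infer(inner, target, Mode::Move, out),
            other => Err(format!("`&` pattern cannot match {:?}", other)),
        },
        // Destructuring pattern: peel references first (match ergonomics).
        Pat::Tuple(pats) => {
            let mut ty = ty;
            let mut mode = mode;
            loop {
                match ty {
                    Ty::Ref(t) => {
                        ty = &**t;
                        mode = Mode::Ref;
                    }
                    Ty::RefMut(t) => {
                        ty = &**t;
                        if mode != Mode::Ref {
                            mode = Mode::RefMut;
                        }
                    }
                    _ => break,
                }
            }
            match ty {
                Ty::Tuple(tys) if tys.len() == pats.len() => {
                    for (p, t) in pats.iter().zip(tys) {
                        infer(p, t, mode, out)?;
                    }
                    Ok(())
                }
                other => Err(format!("tuple pattern cannot match {:?}", other)),
            }
        }
    }
}

/// Runs the inference starting from the default `move` mode.
fn bindings(pat: &Pat, ty: &Ty) -> Result<Vec<(&'static str, Ty)>, String> {
    let mut out = Vec::new();
    infer(pat, ty, Mode::Move, &mut out)?;
    Ok(out)
}

fn main() {
    let string = || Ty::Named("String");
    let scrutinee = r(Ty::Tuple(vec![r(string()), r(string())]));
    let case1 = Pat::Tuple(vec![Pat::Ident("a"), Pat::Ident("b")]);
    let case2 = Pat::Tuple(vec![
        Pat::Ref(Box::new(Pat::Ident("a"))),
        Pat::Ref(Box::new(Pat::Ident("b"))),
    ]);
    println!("case 1: {:?}", bindings(&case1, &scrutinee));
    println!("case 2: {:?}", bindings(&case2, &scrutinee));
}
```

In this model the recursion ends at identifier patterns, which is why case 1 stops once `a` and `b` are bound: an identifier pattern has no inner pattern to continue with.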
Inference for case 1 and 2
// case 1
let (a, b) = &(&String, &String); // a and b are of type &&String
First step: we match (a, b) against &(&String, &String), which falls into case 3 (any kind of destructuring pattern). &(&String, &String) is a &T, so we dereference it and set the default binding mode to ref.
Second step: we match (a, b) against (&String, &String), which is basically matching a against &String. a is an identifier pattern, which falls into case 1. There is no explicit ref/ref mut and the default binding mode is ref, so we bind by immutable reference, making a of type &&String.
All the patterns are matched, recursion is over.
// case 2
let (&a, &b) = &(&String, &String); // a and b are of type String
First step: we match (&a, &b) against &(&String, &String), which falls into case 3. We dereference it and set the default binding mode to ref.
Second step: we match (&a, &b) against (&String, &String), which is basically matching &a against &String. &a is a reference pattern, so we go into case 2 and set the default binding mode to move. From &a = &String, a has type String.
All the patterns are matched, recursion is over.
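The inferred types can be checked against the compiler by letting function signatures state them. The sketch below uses hypothetical helper names (`case1`, `case1_explicit`, `case2_explicit`). Two caveats: case 2 uses `u32` instead of `String`, because binding a `String` by value out of a shared reference is a move the borrow checker rejects; and it is written in the fully explicit form `&(&a, &b)`, because, as far as I know, the mixed form `(&a, &b)` is accepted up to the 2021 edition but rejected in the 2024 edition, which only allows `&` patterns while the default binding mode is `move`.

```rust
// Case 1: a non-reference pattern against a reference. The return type
// only type-checks if `a` and `b` really are `&&String`.
fn case1<'a, 'b>(pair: &'a (&'b String, &'b String)) -> (&'a &'b String, &'a &'b String) {
    let (a, b) = pair; // default binding mode becomes `ref`
    (a, b)
}

// The same bindings written without match ergonomics.
fn case1_explicit<'a, 'b>(pair: &'a (&'b String, &'b String)) -> (&'a &'b String, &'a &'b String) {
    let &(ref a, ref b) = pair;
    (a, b)
}

// Case 2 in fully explicit form: every `&` in the type is matched by a `&`
// in the pattern, so the binding mode stays `move` and `a`, `b` are `u32`.
fn case2_explicit(pair: &(&u32, &u32)) -> (u32, u32) {
    let &(&a, &b) = pair;
    (a, b)
}

fn main() {
    let (s1, s2) = (String::from("one"), String::from("two"));
    let pair = (&s1, &s2);
    let (a, b) = case1(&pair);
    println!("{} {}", a, b);
    let (c, d) = case1_explicit(&pair);
    println!("{} {}", c, d);
    println!("{:?}", case2_explicit(&(&1, &2)));
}
```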
The algorithm may not be entirely accurate; anyone who knows this topic well is welcome to edit this answer.
In Rust, one often sees functions that take &str as a parameter.
fn foo(bar: &str) {
println!("{}", bar);
}
When calling functions like this, it is perfectly fine to pass in a String as an argument by referencing it.
let bar = String::from("bar");
foo(&bar);
Strictly speaking, the argument being passed is an &String and the function is expecting an &str, but Rust (as one would hope) just figures it out and everything works fine. However, this is not the case with match statements. If I try and use the same bar variable as before in a match statement, the naive usage will not compile:
match &bar {
"foo" => println!("foo"),
"bar" => println!("bar"),
_ => println!("Something else")
};
Rustc complains that it was expecting an &str but received an &String. The problem and solution are both very obvious: just borrow bar more explicitly with .as_str(). But this brings me to the real question: why is this the case?
If Rust can figure out that an &String trivially converts to an &str in the case of function arguments, why can't it do the same thing with match statements? Is this the result of a limitation of the type system, or is there hidden unsafety in fancier borrowing with match statements? Or is this simply a case of a quality of life improvement getting integrated into some places but not others? I'm sure someone knowledgeable about type systems has an answer, but there seems to be very little information about this little quirk of behavior on the internet.
The technical reason it doesn't work is because the match scrutinee is not a coercion site. Function arguments, as shown in your foo(&bar) example, are possible coercion sites; and it allows you to pass a &String as a &str because of Deref coercion.
A possible reason why it's not a coercion site is that there's no clear type that it should be coerced to. In your example you'd like it to be &str since that matches the string literals, but what about:
match &string_to_inspect {
"special" => println!("this is a special string"),
other => println!("this is a string with capacity: {}", other.capacity()),
};
One would like the match to act like &str to match the literal, but because the match is on a &String one would expect other to be a &String as well. How to satisfy both? The next logical step would be for each pattern to coerce as required, which has been much desired... but it opens a whole can of worms since Deref is user-definable. See deref patterns from the Rust lang-team for more info.
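Until something like deref patterns is available, the usual workaround is to convert the scrutinee explicitly and keep using the original variable where the `String` API is needed. A minimal sketch (the `shout` and `describe` functions are invented for this example):

```rust
// Function arguments are a coercion site: a `&String` argument is
// coerced to `&str` through deref coercion.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

// Takes `&String` on purpose to mirror the question. The match scrutinee is
// not a coercion site, so the conversion to `&str` is written out.
fn describe(s: &String) -> String {
    match s.as_str() {
        "special" => String::from("this is a special string"),
        // The original `&String` is still usable inside the arm.
        _ => format!("this is a string with capacity: {}", s.capacity()),
    }
}

fn main() {
    let bar = String::from("bar");
    println!("{}", shout(&bar));
    println!("{}", describe(&bar));
    println!("{}", describe(&String::from("special")));
}
```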
https://doc.rust-lang.org/reference/type-coercions.html says:
Coercion sites
A coercion can only occur at certain coercion sites in a program; these are typically places where the desired type is explicit or can be derived by propagation from explicit types (without type inference). Possible coercion sites are:
[...]
Arguments for function calls
The value being coerced is the actual parameter, and it is coerced to the type of the formal parameter.
but not a match scrutinee.
Coercion types
Coercion is allowed between the following types:
[...]
&T or &mut T to &U if T implements Deref<Target = U>.
I have a tiny playground example here
fn main() {
let l = Some(3);
match &l {
None => {}
Some(_x) => {} // x is of type &i32
}
}
I'm pattern matching on &Option and if I use Some(x) as a branch, why is x of type &i32?
The type of the expression &l you match against is &Option<i32>, so if we are strict the patterns should be &None and &Some(x), and if we use these patterns, the type of x indeed is i32. If we omit the ampersand in the patterns, as you did in your code, it first looks like the patterns should not be able to match at all, and the compiler should throw an error similar to "expected Option, found reference", and indeed this is what the compiler did before Rust version 1.26.
Current versions of Rust support "match ergonomics" introduced by RFC 2005, and matching a reference to an enum against a pattern without the ampersand is now allowed. In general, if your match expression is only a reference, you can't move any members out of the enum, so matching a reference against Some(x) is equivalent to matching against the pattern &Some(ref x), i.e. x becomes a reference to the inner value of the Option. In your particular case, the inner value is an i32, which is Copy, so you would be allowed to match against &Some(x) and get an i32, but this is not possible for general types.
The idea of the RFC is to make it easier to get the ampersands and refs in patterns right, but I'm not completely convinced whether the new rules actually simplified things, or whether they added to the confusion by making things magically work in some cases, thereby making it more difficult for people to get a true understanding of the underlying logic. (This opinion is controversial – see the comments.)
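The three ways of writing the pattern can be compared side by side. In this sketch the function names are made up, and each return type records what `x` is bound to:

```rust
// Match ergonomics: `Some(x)` against `&Option<i32>` binds `x: &i32`.
fn via_ergonomics(opt: &Option<i32>) -> Option<&i32> {
    match opt {
        Some(x) => Some(x),
        None => None,
    }
}

// The explicit form that the ergonomic version corresponds to.
fn via_explicit_ref(opt: &Option<i32>) -> Option<&i32> {
    match opt {
        &Some(ref x) => Some(x),
        &None => None,
    }
}

// Strict patterns without `ref`: `x: i32`. This only compiles because
// `i32` is `Copy`.
fn via_copy(opt: &Option<i32>) -> Option<i32> {
    match opt {
        &Some(x) => Some(x),
        &None => None,
    }
}

fn main() {
    let l = Some(3);
    println!("{:?}", via_ergonomics(&l));
    println!("{:?}", via_explicit_ref(&l));
    println!("{:?}", via_copy(&l));
}
```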
I'm trying to learn Rust's lifetime rules by comparing them to similar concepts in C++, which I'm more familiar with. Most of the time, my intuition works really well and I can make sense of the rules. However, in the following case, I'm not sure if my understanding is correct or not.
In Rust, a temporary value's lifetime is the end of its statement, except when the last temporary value is bound to a name using let.
struct A(u8);
struct B(u8);
impl A {
fn get_b(&mut self) -> Option<B> {
Some(B(self.0))
}
}
fn a(v: u8) -> A {
A(v)
}
// temporary A's lifetime is the end of the statement
// temporary B binds to a name so lives until the enclosing block
let b = a(1).get_b();
// temporary A's lifetime is the end of the statement
// temporary B's lifetime extends to the enclosing block,
// so that taking reference of temporary works similar as above
let b = &a(2).get_b();
If the temporary value is in an if condition, according to the reference, the lifetime is instead limited to the conditional expression.
// Both temporary A and temporary B drops before printing some
if a(3).get_b().unwrap().0 <= 3 {
    println!("some");
}
Now to the question:
If putting let in if condition, because of pattern matching, we are binding to the inner part of the temporary value. I'd expect the temporary value bound by let to be extended to the enclosing block, while other temporary values should still have a lifetime limited by the if condition.
(In this case actually everything is copied I would say even temporary B can be dropped, but that's a separate question.)
However, both temporaries' lifetimes are extended to the enclosing if block.
// Both temporary A and temporary B's lifetime are extended to the end of the enclosing block,
// which is the if statement
if let Some(B(v @ 0..=4)) = a(4).get_b() {
    println!("some {}", v);
}
Should this be considered an inconsistency in Rust? Or am I misunderstanding and there is a consistent rule that can explain this behavior?
Full code example:
playground
The same thing implemented in C++ that matches my expectation
Note the output from Rust is
some 4
Drop B 4
Drop A 4
while the output from C++ is
Drop A 4
some 4
Drop B 4
I have read this Reddit thread and Rust issue, which I think is quite relevant, but I still can't find a clear set of lifetime rule that works for all the cases in Rust.
Update:
What I'm unclear about is why the temporary lifetime rule for the if conditional expression does not apply to if let. I think let Some(B(v @ 0..=4)) = a(4).get_b() should be the conditional expression, and thus temporary A's lifetime should be limited by that, rather than by the entire if statement.
The behaviour of extending temporary B's lifetime to the entire if statement is expected, because that is borrowed by the pattern matching.
An if let construct is just syntactic sugar for a match construct. let Some(B(v @ 0..=4)) = a(4).get_b() is not a conditional of the kind used in a regular if expression, because it is not an expression that evaluates to bool. Given your example:
if let Some(B(v @ 0..=4)) = a(4).get_b() {
    println!("some {}", v);
}
It will behave exactly the same as the below example. No exceptions. if let is rewritten into match before the type or borrow checkers are even run.
match a(4).get_b() {
    Some(B(v @ 0..=4)) => {
        println!("some {}", v);
    }
    _ => {}
}
Temporaries live as long as they do in match blocks because they sometimes come in handy. Like if your last function was fn get_b(&mut self) -> Option<&B>, and if the temporary didn't live for the entire match block, then it wouldn't pass borrowck.
If conditionals don't follow the same rule because it's impossible for the last function call in an if conditional to hold a reference to anything. They have to evaluate to a plain bool.
See:
Rust issue 37612
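The drop order can be observed directly by recording events instead of printing them. This is a sketch built on the types from the question; the thread-local `LOG`, the `log` helper and the two `run_*` functions are additions made for this example.

```rust
use std::cell::RefCell;

thread_local! {
    // Event log, so that the order of drops can be inspected afterwards.
    static LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

fn log(event: String) {
    LOG.with(|l| l.borrow_mut().push(event));
}

struct A(u8);
struct B(u8);

impl Drop for A {
    fn drop(&mut self) {
        log(format!("Drop A {}", self.0));
    }
}

impl Drop for B {
    fn drop(&mut self) {
        log(format!("Drop B {}", self.0));
    }
}

impl A {
    fn get_b(&mut self) -> Option<B> {
        Some(B(self.0))
    }
}

fn a(v: u8) -> A {
    A(v)
}

// `if let`: both temporaries outlive the body of the `if let`.
fn run_if_let() -> Vec<String> {
    LOG.with(|l| l.borrow_mut().clear());
    if let Some(B(v @ 0..=4)) = a(4).get_b() {
        log(format!("some {}", v));
    }
    LOG.with(|l| l.borrow().clone())
}

// Plain `if`: both temporaries are dropped at the end of the condition.
fn run_plain_if() -> Vec<String> {
    LOG.with(|l| l.borrow_mut().clear());
    if a(3).get_b().unwrap().0 <= 3 {
        log(String::from("some"));
    }
    LOG.with(|l| l.borrow().clone())
}

fn main() {
    println!("{:?}", run_if_let());
    println!("{:?}", run_plain_if());
}
```

`run_if_let` reproduces the order reported in the question (`some 4`, `Drop B 4`, `Drop A 4`), while `run_plain_if` logs both drops before `some`.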
I don't understand "where" the MutexGuard in the inner block of code is. The mutex is locked and unwrapped, yielding a MutexGuard. Somehow this code manages to dereference that MutexGuard and then mutably borrow that object. Where did the MutexGuard go? Also, confusingly, this dereference cannot be replaced with deref_mut. Why?
use std::sync::Mutex;
fn main() {
let x = Mutex::new(Vec::new());
{
let y: &mut Vec<_> = &mut *x.lock().unwrap();
y.push(3);
println!("{:?}, {:?}", x, y);
}
let z = &mut *x.lock().unwrap();
println!("{:?}, {:?}", x, z);
}
Summary: because *x.lock().unwrap() performs an implicit borrow of the operand x.lock().unwrap(), the operand is treated as a place context. But since our actual operand is not a place expression, but a value expression, it gets assigned to an unnamed memory location (basically a hidden let binding)!
See below for a more detailed explanation.
Place expressions and value expressions
Before we dive in, first two important terms. Expressions in Rust are divided into two main categories: place expressions and value expressions.
Place expressions represent a value that has a home (a memory location). For example, if you have let x = 3; then x is a place expression. Historically this was called lvalue expression.
Value expressions represent a value that does not have a home (we can only use the value, there is no memory location associated with it). For example, if you have fn bar() -> i32 then bar() is a value expression. Literals like 3.14 or "hi" are value expressions too. Historically these were called rvalue expressions.
There is a good rule of thumb to check if something is a place or value expression: "does it make sense to write it on the left side of an assignment?". If it does (like my_variable = ...;) it is a place expression, if it doesn't (like 3 = ...;) it's a value expression.
There also exist place contexts and value contexts. These are basically the "slots" in which expressions can be placed. There are only a few place contexts, which (usually, see below) require a place expression:
Left side of a (compound) assignment expression (⟨place context⟩ = ...;, ⟨place context⟩ += ...;)
Operand of a borrow expression (&⟨place context⟩ and &mut ⟨place context⟩)
... plus a few more
Note that place expressions are strictly more "powerful". They can be used in a value context without a problem, because they also represent a value.
(relevant chapter in the reference)
Temporary lifetimes
Let's build a small dummy example to demonstrate a thing Rust does:
struct Foo(i32);
fn get_foo() -> Foo {
Foo(0)
}
let x: &Foo = &get_foo();
This works!
We know that the expression get_foo() is a value expression. And we know that the operand of a borrow expression is a place context. So why does this compile? Didn't place contexts need place expressions?
Rust creates temporary let bindings! From the reference:
When using a value expression in most place expression contexts, a temporary unnamed memory location is created initialized to that value and the expression evaluates to that location instead [...].
So the above code is equivalent to:
let _compiler_generated = get_foo();
let x: &Foo = &_compiler_generated;
This is what makes your Mutex example work: the MutexGuard is assigned to a temporary unnamed memory location! That's where it lives. Let's see:
&mut *x.lock().unwrap();
The x.lock().unwrap() part is a value expression: it has the type MutexGuard and is returned by a function (unwrap()) just like get_foo() above. Then there is only one last question left: is the operand of the deref * operator a place context? I didn't mention it in the list of place contexts above...
Implicit borrows
The last piece in the puzzle are implicit borrows. From the reference:
Certain expressions will treat an expression as a place expression by implicitly borrowing it.
These include "the operand of the dereference operator (*)"! And all operands of any implicit borrow are place contexts!
So because *x.lock().unwrap() performs an implicit borrow, the operand x.lock().unwrap() is a place context, but since our actual operand is not a place, but a value expression, it gets assigned to an unnamed memory location!
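One way to see that the guard really is parked in a hidden location is to probe the mutex with `try_lock` while the `&mut Vec` is still in scope. The sketch below uses two invented helper functions and relies on `try_lock` returning an error when the lock is already held, even by the current thread.

```rust
use std::sync::Mutex;

/// Returns (was the mutex still locked after the last use of `v`?, new length).
fn push_through_temporary(m: &Mutex<Vec<i32>>, value: i32) -> (bool, usize) {
    // The `MutexGuard` temporary lives until the end of this block.
    let v: &mut Vec<i32> = &mut *m.lock().unwrap();
    v.push(value);
    let len = v.len();
    // `v` is not used below, yet the hidden guard still holds the lock.
    let still_locked = m.try_lock().is_err();
    (still_locked, len)
}

/// Returns whether the mutex is free again after a one-statement push.
fn push_in_one_statement(m: &Mutex<Vec<i32>>, value: i32) -> bool {
    // Here the guard temporary is dropped at the end of the statement.
    m.lock().unwrap().push(value);
    let unlocked = m.try_lock().is_ok();
    unlocked
}

fn main() {
    let m = Mutex::new(Vec::new());
    println!("{:?}", push_through_temporary(&m, 3));
    println!("{}", push_in_one_statement(&m, 4));
    println!("{:?}", m.lock().unwrap());
}
```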
Why doesn't this work for deref_mut()
There is an important detail of "temporary lifetimes". Let's look at the quote again:
When using a value expression in most place expression contexts, a temporary unnamed memory location is created initialized to that value and the expression evaluates to that location instead [...].
Depending on the situation, Rust chooses memory locations with different lifetimes! In the &get_foo() example above, the temporary unnamed memory location had a lifetime of the enclosing block. This is equivalent to the hidden let binding I showed above.
However, this "temporary unnamed memory location" is not always equivalent to a let binding! Let's take a look at this case:
fn takes_foo_ref(_: &Foo) {}
takes_foo_ref(&get_foo());
Here, the Foo value only lives for the duration of the takes_foo_ref call and not longer!
In general, if the reference to the temporary is used as an argument for a function call, the temporary lives only for that function call. This also includes the &self (and &mut self) parameter. So in get_foo().deref_mut(), the Foo object would also only live for the duration of deref_mut(). But since deref_mut() returns a reference to the Foo object, we would get a "does not live long enough" error.
That's of course also the case for x.lock().unwrap().deref_mut() -- that's why we get the error.
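The two temporary lifetimes can be compared by giving `Foo` a `Drop` implementation that records when it runs. This is a sketch; the `LOG` thread-local, the `log` helper and the two `*_case` functions are additions for this example.

```rust
use std::cell::RefCell;

thread_local! {
    // Event log used to observe when the temporary `Foo` is dropped.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(event: &'static str) {
    LOG.with(|l| l.borrow_mut().push(event));
}

struct Foo(i32);

impl Drop for Foo {
    fn drop(&mut self) {
        log("drop Foo");
    }
}

fn get_foo() -> Foo {
    Foo(0)
}

fn takes_foo_ref(_: &Foo) {
    log("in call");
}

// `let x = &get_foo();` keeps the temporary alive until the end of the block.
fn let_binding_case() -> Vec<&'static str> {
    LOG.with(|l| l.borrow_mut().clear());
    {
        let x: &Foo = &get_foo();
        log("after let");
        let _value = x.0;
    }
    LOG.with(|l| l.borrow().clone())
}

// As a function argument, the temporary only lives until the end of the statement.
fn argument_case() -> Vec<&'static str> {
    LOG.with(|l| l.borrow_mut().clear());
    {
        takes_foo_ref(&get_foo());
        log("after call");
    }
    LOG.with(|l| l.borrow().clone())
}

fn main() {
    println!("{:?}", let_binding_case());
    println!("{:?}", argument_case());
}
```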
In the deref operator (*) case, the temporary lives for the enclosing block (equivalent to a let binding). I can only assume that this is a special case in the compiler: the compiler knows that a call to deref() or deref_mut() always returns a reference to the self receiver, so it wouldn't make sense to borrow the temporary for only the function call.
Here are my thoughts:
let y: &mut Vec<_> = &mut *x.lock().unwrap();
A couple of things going on under the surface for your current code:
Your .lock() yields a LockResult<MutexGuard<Vec<_>>>.
You called unwrap() on the LockResult and got a MutexGuard<Vec<_>>.
Because MutexGuard<T> implements the DerefMut trait, the * operator dereferences it to the underlying Vec, and &mut * then yields a &mut Vec.
In Rust, I believe you don't usually call deref_mut yourself; rather, the compiler inserts the call for you when you use the * operator.
If you want to get your MutexGuard, you should not dereference it:
let mut y = x.lock().unwrap();
(*y).push(3);
println!("{:?}, {:?}", x, y);
//Output: Mutex { data: <locked> }, MutexGuard { lock: Mutex { data: <locked> } }
From what I have seen online, people usually do make the MutexGuard explicit by saving it into a variable, and dereference it when it is being used, like my modified code above. I don't think there is an official pattern about this. Sometimes it will also save you from making a temporary variable.
In learning Rust, I encountered the following in the official Rust book:
There’s one pitfall with patterns: like anything that introduces a new
binding, they introduce shadowing. For example:
let x = 'x';
let c = 'c';
match c {
x => println!("x: {} c: {}", x, c),
}
println!("x: {}", x)
This prints:
x: c c: c
x: x
In other words, x => matches the pattern and introduces a new binding
named x that’s in scope for the match arm. Because we already have a
binding named x, this new x shadows it.
I don't understand two things:
Why does the match succeed?
Shouldn't the differing value of c and x cause this to fail?
How does the match arm x binding get set to 'c'?
Is that somehow the return of the println! expression?
There is a fundamental misconception of what match is about.
Pattern-matching is NOT about matching on values but about matching on patterns, as the name implies. For convenience and safety, it also allows binding names to the innards of the matched pattern:
match some_option {
Some(x) => println!("Some({})", x),
None => println!("None"),
}
For convenience, match is extended to match the values when matching specifically against literals (integrals or booleans), which I think is at the root of your confusion.
Why? Because a match must be exhaustive!
match expressions are there so the compiler can guarantee that you handle all possibilities; checking that you handle all patterns is easy because they are under the compiler's control, checking that you handle all values is hard in the presence of custom equality operators.
When using just a name in the match clause, you create an irrefutable pattern: a pattern that cannot fail, ever. In this case, the entire value being matched is bound to this name.
You can exhibit this by adding a second match clause afterward, the compiler will warn that the latter binding is unreachable:
fn main() {
let x = 42;
match x {
name => println!("{}", name),
_ => println!("Other"),
};
}
<anon>:6:5: 6:6 error: unreachable pattern [E0001]
<anon>:6 _ => println!("Other"),
^
Combined with the shadowing rules, which specifically allow hiding a binding in a scope by reusing its name to bind another value, you get the example:
within the match arm, x is bound to the value of 'c'
after the arm, the only x in scope is the original one bound to the value 'x'
Your two points are caused by the same root problem. Coincidentally, the reason that this section exists is to point out the problem you are asking about! I'm afraid that I'm basically going to regurgitate what the book says, with different words.
Check out this sample:
match some_variable {
a_name => {},
}
In this case, the match arm will always succeed. Regardless of the value in some_variable, it will always be bound to the name a_name inside that match arm. It's important to get this part first — the name of the variable that is bound has no relation to anything outside of the match.
Now we turn to your example:
match c {
x => println!("x: {} c: {}", x, c),
}
The exact same logic applies. The match arm will always match, and regardless of the value of c, it will always be bound to the name x inside the arm.
The value of x from the outer scope ('x' in this case) has no bearing whatsoever in a pattern match.
If you wanted to use the value of x to control the pattern match, you can use a match guard:
match c {
a if a == x => println!("yep"),
_ => println!("nope"),
}
Note that in the match guard (if a == x), the variable bindings a and x go back to acting like normal variables that you can test.
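Both behaviours fit in a short runnable sketch (the function names `shadowing_demo` and `guard_demo` are invented here):

```rust
// An identifier pattern always matches and shadows the outer `x` inside the arm.
fn shadowing_demo(c: char) -> (char, char) {
    let x = 'x';
    let inner = match c {
        x => x, // this `x` is a new binding holding the value of `c`
    };
    // After the match, `x` is the original binding again.
    (inner, x)
}

// A match guard compares against the outer value instead of shadowing it.
fn guard_demo(c: char, x: char) -> &'static str {
    match c {
        a if a == x => "yep",
        _ => "nope",
    }
}

fn main() {
    println!("{:?}", shadowing_demo('c'));
    println!("{} {}", guard_demo('x', 'x'), guard_demo('c', 'x'));
}
```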