Rust: Strange syntax with match and variable binding [duplicate] - rust

This question already has an answer here:
How does match compiles with `continue` in its arms?
(1 answer)
Closed last year.
I was looking at Rust's docs, where they introduce this piece of code:
let guess: u32 = match guess.trim().parse() {
    Ok(num) => num,
    Err(_) => continue,
};
where the value num is assigned to guess in case of a successful parse; otherwise the continue statement is executed, making the loop skip to its next iteration.
Is it just me, or is this syntax a bit odd? It feels like we are executing guess = continue. I found this way of writing it more natural:
match guess.trim().parse() {
    Ok(num) => let guess: u32 = num,
    Err(_) => continue,
};
Am I understanding something wrong? I don't know anything about Rust, so maybe there is an explanation for this.

match is (like amazingly many things in Rust) an expression. So at runtime it has a value that you can assign to a variable, for example.
continue is a control statement. It causes the flow of control to resume at the next iteration of a loop. If that branch is taken, there is no need for a value because the assignment, which would be done after the match, is never executed.
If you are used to other programming languages that have exceptions, the following idea may help in understanding the situation. In Python, for example, you can assign the return value of a function to a variable:
value = func()
However, if func raises an exception, there is no return value. No value is needed, because execution continues at the nearest exception handler.
Something similar is going on in Rust with control flow commands like continue, break, or return. They basically tell the computer to "forget what you are doing right now, continue with (or leave) the loop/function".
One could argue that the second way feels indeed more natural (it's how you would do it in Python, hehe). However, Rust has very strict scoping rules and it would be hard to define sensibly in which scope the guess variable should exist, or what would happen if another branch defined a different variable. The way it is done in Rust is very clear once you get used to it.
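For example, here is a minimal sketch of the pattern inside a loop (the input strings are made up for illustration):

// A minimal sketch: `match` is an expression, so its value can initialize a binding.
// If the Err arm runs, `continue` skips the assignment entirely and the loop
// moves on to the next input.
fn main() {
    let inputs = ["42", "not a number", "7"];
    for raw in inputs {
        let guess: u32 = match raw.trim().parse() {
            Ok(num) => num,     // the whole match evaluates to `num`
            Err(_) => continue, // no value needed; control leaves this iteration
        };
        println!("parsed: {}", guess);
    }
}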

What you have written is not only different from what was given in the example, it is also completely useless.
match can either be used as an expression, or as a statement right away. In the first syntax, you are using match as an expression. Is it odd? Not if you are used to functional programming (for instance, you would have the same pattern in OCaml), but if you come from an imperative language it can be a bit puzzling.
In the second case, this does not even mean anything, because match arms must be expressions, not statements. So, to make it work, you should at least do something like
match guess.trim().parse() {
    Ok(num) => {
        let a = num;
    }
    Err(_) => continue,
}
But then, it's easy to understand what's wrong: the variable a has a lifetime that is shorter than the match scope, i.e. the match scope outlives a, meaning that after the match statement, a is undefined.
See this playground example, where it just does what you expect, as opposed to this playground example, which does not even compile. If you look, besides the "this does not make sense" error, there is a "this variable is undefined" error.
Besides, the idea of this pattern is that when you read the code quickly, the first thing you see is the let binding (even if the whole statement is a bit cumbersome), because that is the main point of the statement, and not the control-flow instruction, because that is not the important part of the statement.

Related

What is the difference between Option::None in Rust and null in other languages?

I want to know what the main differences are between Rust's Option::None and "null" in other programming languages. And why is None considered to be better?
A brief history lesson in how to not have a value!
In the beginning, there was C. And in C, we had these things called pointers. Oftentimes, it was useful for a pointer to be initialized later, or to be optional, so the C community decided that there would be a special place in memory, usually memory address zero, which we would all just agree meant "there's no value here". Functions which returned T* could return NULL (written in all-caps in C, since it's a macro) to indicate failure, or lack of value. In a low-level language like C, in that day and age, this worked fairly well. We were only just rising out of the primordial ooze that is assembly language and into the realm of typed (and comparatively safe) languages.
C++ and later Java basically aped this approach. In C++, every pointer could be NULL (later, nullptr was added, which is not a macro and is a slightly more type-safe NULL). In Java, the problem becomes more clear: Every non-primitive value (which means, basically, every value that isn't a number or character) could be null. If my function takes a String as argument, then I always have to be ready to handle the case where some naive young programmer passes me a null. If I get a String as a result from a function, then I have to check the docs to see if it might not actually exist.
That gave us NullPointerException, which even today crowds this very website with tens of questions every day by young programmers falling into traps.
It is clear, in the modern day, that "every value might be null" is not a sustainable approach in a statically typed language. (Dynamically typed languages tend to be more prepared to deal with failure at every point anyway, by their very nature, so the existence of, say, nil in Ruby or None in Python is somewhat less egregious.)
Kotlin, which is often lauded as a "better Java", approaches this problem by introducing nullable types. Now, not every type is nullable. If I have a String, then I actually have a String. If I intend that my String be nullable, then I can opt-in to nullability with String?, which is a type that's either a string or null. Crucially, this is type-safe. If I have a value of type String?, then I can't call String methods on it until I do a null check or make an assertion. So if x has type String?, then I can't do x.toLowerCase() unless I first do one of the following.
1. Put it inside an if (x != null), to make sure that x is not null (or use some other form of control flow that proves my case).
2. Use the ?. null-safe call operator to do x?.toLowerCase(). This will compile to an if (x != null) check and will return a String?, which is null if the original string was null.
3. Use !! to assert that the value is not null. The assertion is checked and will throw an exception if I turn out to be wrong.
Note that (3) is what Java does by default at every turn. The difference is that, in Kotlin, the "I'm asserting that I know better than the type checker" case is opt-in, and you have to go out of your way to get into the situation where you can get a null pointer exception. (I'm glossing over platform types, which are a convenient hack in the type system to interoperate with Java. It's not really germane here)
Nullable types is how Kotlin solves the problem, and it's how Typescript (with --strict mode) and Scala 3 (with null checks turned on) handle the problem as well. However, it's not really viable in Rust without significant changes to the compiler. That's because nullable types require the language to support subtyping. In a language like Java or Kotlin, which is built using object-oriented principles first, it's easy to introduce new subtypes. String is a subtype of String?, and also of Any (essentially java.lang.Object), and also of Any? (any value and also null). So a string like "A" has a principal type of String but it also has all of those other types, by subtyping.
Rust doesn't really have this concept. Rust has a bit of type coercion available with trait objects, but that's not really subtyping. It's not correct to say that String is a subtype of dyn Display, only that a String (in an unsized context) can be coerced automatically into a dyn Display.
So we need to go back to the drawing board. Nullable types don't really work here. However, fortunately, there's another way to handle the problem of "this value may or may not exist", and it's called optional types. I wouldn't hazard a guess as to what language first tried this idea, but it was definitely popularized by Haskell and is common in more functional languages.
In functional languages, we often have a notion of principal types, similar to in OOP languages. That is, a value x has a "best" type T. In OOP languages with subtyping, x might have other types which are supertypes of T. However, in functional languages without subtyping, x truly only has one type. There are other types that can unify with T, such as (written in Haskell's notation) forall a. a. But it's not really correct to say that the type of x is forall a. a.
The whole nullable type trick in Kotlin relied on the fact that "abc" was both a String and a String?, while null was only a String?. Since we don't have subtyping, we'll need two separate values for the String and String? case.
If we have a structure like this in Rust
struct Foo(i32);
Then Foo(0) is a value of type Foo. Period. End of story. It's not a Foo?, or an optional Foo, or anything like that. It has one type.
However, there's this related value called Some(Foo(0)) which is an Option<Foo>. Note that Foo(0) and Some(Foo(0)) are not the same value; they just happen to be related in a fairly natural way. The difference is that while a Foo must exist, an Option<Foo> could be Some(Foo(0)) or it could be a None, which is kind of like our null in Kotlin. We still have to check whether or not the value is None before we can do anything. In Rust, we generally do that with pattern matching, or by using one of the several built-in functions that do the pattern matching for us. It's the same idea, just implemented using techniques from functional programming.
So if we want to get the value out of an Option, or a default if it doesn't exist, we can write
my_option.unwrap_or(0)
If we want to do something to the inside if it exists, or null out if it doesn't, then we write
my_option.and_then(|inner_value| ...)
This is basically what ?. does in Kotlin. If we want to assert that a value is present and panic otherwise, we can write
my_option.unwrap()
Finally, our general purpose Swiss army knife for dealing with this is pattern matching.
match my_option {
    None => {
        ...
    }
    Some(value) => {
        ...
    }
}
So we have two different approaches to dealing with this problem: nullable types based on subtyping, and optional types based on composition. "Which is better" is getting into opinion a bit, but I'll try to summarize the arguments I see on both sides.
Advocates of nullable types tend to focus on ergonomics. It's very convenient to be able to do a quick "null check" and then use the same value, rather than having to jump through hoops of unwrapping and wrapping values constantly. It's also nice being able to pass literal values to functions expecting String? or Int? without worrying about bundling them or constantly checking whether they're in a Some or not.
On the other hand, optional types have the benefit of being less "magic". If nullable types didn't exist in Kotlin, we'd be somewhat out of luck. But if Option didn't exist in Rust, someone could write it themselves in a few lines of code. It's not special syntax, it's just a type that exists and has some (ordinary) functions defined on it. It's also built up of composition, which means (naturally) that it composes better. That is, (T?)? is equivalent to T? (the former still only has one null value), while Option<Option<T>> is distinct from Option<T>. I don't recommend writing functions that return Option<Option<T>> directly, but you can end up getting bitten by this when writing generic functions (i.e. your function returns S? and the caller happens to instantiate S with Int?).
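To illustrate that composition point, here is a minimal sketch (with a made-up generic helper) showing that Option<Option<T>> keeps the two "missing" cases distinct, where a nested nullable type would collapse them:

// A minimal sketch: a generic lookup returning Option<V>. If V is itself an
// Option, the two layers stay distinct instead of collapsing into one null.
fn first_item<V: Clone>(items: &[V]) -> Option<V> {
    items.first().cloned()
}

fn main() {
    // The element type is Option<i32>, so the result is Option<Option<i32>>.
    let data: Vec<Option<i32>> = vec![None, Some(3)];
    match first_item(&data) {
        None => println!("the slice was empty"),
        Some(None) => println!("found an element, and that element is None"),
        Some(Some(n)) => println!("found {}", n),
    }
}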
I go a bit more into the differences in this post, where someone asked basically the same question but in Elm rather than Rust.
There are two implicit assumptions in the question: first, that other languages' "nulls" (and nils and undefs and nones) are all the same. They aren't.
The second assumption is that "null" and "none" provide similar functionality. There are many different uses of null: the value is unknown (SQL trinary logic), the value is a sentinel (C's null pointer and null byte), there was an error (Perl's undef), and to indicate no value (Rust's Option::None).
Null itself comes in at least three different flavors: an invalid value, a keyword, and special objects or types.
Keyword as null
Many languages opt for a specific keyword to indicate null. Perl has undef, Go and Ruby have nil, Python has None. These are useful in that they are distinct from false or 0.
Invalid value as null
Unlike having a keyword, these are things within the language which mean a specific value, but are used as null anyway. The best examples are C's null pointer and null bytes.
Special objects and types as null
Increasingly, languages will use special objects and types for null. These have all the advantages of a keyword, but rather than a generic "something went wrong" they can have very specific meaning per domain. They can also be user-defined offering flexibility beyond what the language designer intended. And, if you use objects, they can have additional information attached.
For example, std::ptr::null in Rust indicates a null raw pointer. Go has the error interface. Rust has Result::Err and Option::None.
Null as unknown
Most programming languages use two-value or binary logic: true or false. But some, particularly SQL, use three-value or trinary logic: true, false, and unknown. Here null means "we do not know what the value is". It's not true, it's not false, it's not an error, it's not empty, it's not zero: the value is unknown. This changes how the logic works.
If you compare unknown to anything, even itself, the result is unknown. This allows you to draw logical conclusions based on incomplete data. For example, Alice is 7'4" tall, we don't know how tall Bob is. Is Alice taller than Bob? Unknown.
Null as uninitialized
When you declare a variable, it has to contain some value. Even in C, which famously does not initialize variables for you, it contains whatever value happened to be sitting in memory. Modern languages will initialize values for you, but they have to initialize it to something. Often that something is null.
Null as sentinel
When you have a list of things and you need to know when the list is done, you can use a sentinel value. This is anything which is not a valid value for the list. For example, if you have a list of positive numbers you can use -1.
The classic example is C. In C, you don't know how long an array is. You need something to tell you to stop. You can pass around the size of the array, or you can use a sentinel value. You read the array until you see the sentinel. A string in C is just an array of characters which ends with a "null byte", that's just a 0 which is an invalid character. If you have an array of pointers, you can end it with a null pointer. The disadvantages are 1) there's not always a truly invalid value and 2) if you forget the sentinel you walk off the end of the array and bad stuff happens.
A more modern example is how to know to stop iterating. For example in Go, for thing := range things { ... }, range will return nil when there are no more items in the range causing the for loop to exit. While this is more flexible, it has the same problem as the classic sentinel: you need a value which will never appear in the list. What if null is a valid value?
Languages such as Python and Ruby choose to solve this problem by raising a special exception when the iteration is done. Both will raise StopIteration which their loops will catch and then exit the loop. This avoids the problem of choosing a sentinel value, Python and Ruby iterators can return anything.
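Rust's own iterators, for comparison, sidestep both the sentinel and the exception: next() returns Some(item) until the sequence is exhausted and then returns None. A minimal sketch:

// A minimal sketch: Rust's Iterator signals "no more items" with Option::None
// rather than a sentinel value or a StopIteration-style exception.
fn main() {
    let mut it = [10, 20].iter();
    assert_eq!(it.next(), Some(&10));
    assert_eq!(it.next(), Some(&20));
    assert_eq!(it.next(), None); // iteration is finished
}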
Null as error
While many languages use exceptions for error handling, some languages do not. Instead, they return a special value to indicate an error, often null. C, Go, Perl, and Rust are good examples.
This has the same problem as a sentinel: you need to use a value which is not a valid return value. Sometimes this is not possible. For example, functions in C can only return a single value of a given type. If a function returns a pointer, it can return a null pointer to indicate an error. But if it returns a number, you have to pick an otherwise valid number as the error value. This conflation of error and return values is a problem.
Go works around this by allowing functions to return multiple values, typically the return value and an error. Then the two values can be checked independently. Rust can only return a single value, so it works around this by returning the special type Result. This contains either an Ok with the return value or an Err with an error code.
In both Rust and Go, these are not just values: they can have methods called on them, expanding their functionality. Rust's Result::Err has the additional advantage of being a special type, so you can't accidentally use a Result::Err as anything else.
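For instance, a minimal sketch of calling methods on a Result (the values here are made up):

// A minimal sketch: a Result carries its own methods for inspecting and
// transforming the value it holds.
fn main() {
    let parsed: Result<i32, std::num::ParseIntError> = "42".parse();
    assert!(parsed.is_ok());

    let doubled = parsed.map(|n| n * 2);  // transform the Ok value, leave Err untouched
    assert_eq!(doubled.unwrap_or(0), 84); // fall back to a default on Err
}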
Null as no value
Finally, we have "none of the given options". Quite simply, there's a set of valid options and the result is none of them. This is distinct from "unknown" in that we know the value is not in the valid set of values. For example, if I asked "which fruit is this car" the result would be null meaning "the car is not any fruit".
When asking for the value of a key in a collection, and that key does not exist, you will get "no value". For example, Rust's HashMap get will return None if the key does not exist.
This is not an error, though the two often get confused. Ruby, for example, will raise ArgumentError if you pass nonsense into a function: array.first(-2) asks for the first -2 values, which is nonsense and will raise an ArgumentError.
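A minimal sketch of that HashMap behaviour (the keys and values are made up):

// A minimal sketch: HashMap::get returns Option<&V>; a missing key is simply
// None -- "no value", not an error.
use std::collections::HashMap;

fn main() {
    let mut ages: HashMap<&str, u32> = HashMap::new();
    ages.insert("alice", 30);

    assert_eq!(ages.get("alice"), Some(&30));
    assert_eq!(ages.get("bob"), None); // the key does not exist: no value, no error
}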
Option vs Result
Which finally brings us back to Option::None. It is the "special type" version of null which has many advantages.
Rust uses Option for many things: to indicate an uninitialized value, to indicate simple errors, as no value, and for Rust specific things. The documentation gives numerous examples.
Initial values
Return values for functions that are not defined over their entire input range (partial functions)
Return value for otherwise reporting simple errors, where None is returned on error
Optional struct fields
Struct fields that can be loaned or “taken”
Optional function arguments
Nullable pointers
Swapping things out of difficult situations
To use it in so many places dilutes its value as a special type. It also overlaps with Result which is what you're supposed to use to return results from functions.
The best advice I've seen is to use Result<Option<T>, E> to separate the three possibilities: a valid result, no result, and an error. Jeremy Fitzhardinge gives a good example of what to return from querying a key/value store.
...I'm a big advocate of returning Result<Option<T>, E> where Ok(Some(value)) means "here's the thing you asked for", Ok(None) means "the thing you asked for definitely doesn't exist", and Err(...) means "I can't say whether the thing you asked for exists or not because something went wrong".
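A minimal sketch of that convention for a key/value lookup (the store, key names, and error string are illustrative assumptions, not from the quoted post):

// A minimal sketch of Result<Option<T>, E>:
//   Ok(Some(v)) -> the key exists and here is its value
//   Ok(None)    -> the key definitely does not exist
//   Err(e)      -> couldn't tell, because something went wrong
use std::collections::HashMap;

fn lookup(store: &HashMap<String, String>, key: &str, connected: bool) -> Result<Option<String>, String> {
    if !connected {
        return Err("store unreachable".to_string());
    }
    Ok(store.get(key).cloned())
}

fn main() {
    let mut store = HashMap::new();
    store.insert("answer".to_string(), "42".to_string());

    assert_eq!(lookup(&store, "answer", true), Ok(Some("42".to_string())));
    assert_eq!(lookup(&store, "question", true), Ok(None));
    assert!(lookup(&store, "answer", false).is_err());
}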

Understanding try_for_each in Rust?

I am new to Rust and so far I have had a very hard time understanding the compiler errors. In this specific case I have a fallible function that I want to apply to every element of an iterator. I have tried using try_for_each, but the compiler keeps giving me the error `the trait bound (): Try is not satisfied`. I have no idea how to resolve this issue. I also wonder how people usually make sense of Rust errors, because I find them very hard to decode.
The structure of my code is as follows:
iter.try_for_each(|x| {
    let some_var = fallible_function(*x)?;
    do_something_with_var(some_var);
})
try_for_each takes a closure/function that returns a Try type (https://doc.rust-lang.org/std/ops/trait.Try.html), like Option or Result, on which you can apply the ? operator, just like you did with fallible_function(*x)?. So if your fallible function returns a Result, the closure must return a Result as well; but your closure ends with nothing (the unit type ()), so you must end it with a success value such as Ok(()). (You could also return a residual, https://doc.rust-lang.org/std/ops/trait.FromResidual.html, like Option::None, but then the iterator would stop right after the first iteration.)
I'm going a bit too far into the details of how the ? operator works; if you want to know more about its inner workings and how it can let you build your own types for advanced control flow, I would highly suggest looking into the Try trait. But other than that, yes, you just need to return Ok(()) or Some(()), depending on what the fallible function returns.
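A minimal sketch of the fix, with stand-in bodies for fallible_function and do_something_with_var (the question doesn't show them, so these are assumptions):

// A minimal sketch: the closure passed to try_for_each must itself return a
// Result (or Option), so it ends with Ok(()) instead of the unit type ().
fn fallible_function(x: i32) -> Result<i32, String> {
    if x >= 0 { Ok(x * 2) } else { Err(format!("negative input: {}", x)) }
}

fn do_something_with_var(v: i32) {
    println!("{}", v);
}

fn main() -> Result<(), String> {
    let items = vec![1, 2, 3];
    items.iter().try_for_each(|x| {
        let some_var = fallible_function(*x)?;
        do_something_with_var(some_var);
        Ok(()) // without this the closure returns (), and `(): Try` is not satisfied
    })
}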

Is this "possibly-uninitialized" compiler error a false alarm? [rustc 1.51.0]

I was hit with the possibly-uninitialized variable error, although I'm convinced that this should never be the case.
(The rustc --version is rustc 1.51.0 (2fd73fabe 2021-03-23))
fn example(discriminant: bool) {
    let value;
    if discriminant {
        value = 0;
    }
    // more code [...]
    if discriminant {
        println!("{}", value);
    }
}
Here's the error message:
error[E0381]: borrow of possibly-uninitialized variable: `value`
  --> src/raytracing/intersect.rs:8:24
   |
10 |         println!("{}", value);
   |                        ^^^^^ use of possibly-uninitialized `value`
Reasoning
I did expect the example to compile (even though separating the if-blocks as shown isn't "good style" IMHO):
As discriminant is immutable, I expect the second if block to never execute without the first.
Therefore, the variable should always be defined at the time of borrow in println!().
Possible mitigation
The error can be 'silenced' by initializing value with a temporary value.
I strongly disagree with this approach because
a temporary value could be wrong in a mathematical sense, because the real value may be impossible to compute in the other code paths.
Also, this could hide the fact that the value isn't properly defined during development, limiting the compiler's ability to signal errors.
Question
Can someone clarify if this case is intended behaviour or maybe a compiler bug?
You're not the first person to see this behavior, and generally I would not consider it a bug. In your case, the condition is very simple and obvious, and it's easy for you and me to reason about this condition. However, in general, the condition does not need to be obvious and, for example, the second case could have additional conditions that are hard to reason about, even though the code is correct.
Moreover, the kind of analysis you want the compiler to do (determine which code is reachable based on conditions) is usually only done when the compiler is optimizing. Therefore, even if the compiler had support for this kind of analysis, it probably wouldn't work in debug mode. It also doesn't work in all cases even in the best optimizing compilers and static analysis tools.
If you have a philosophical objection to initializing a dummy value in this case, you can use an Option:
fn example(discriminant: bool) {
    let value = if discriminant {
        Some(0)
    } else {
        None
    };
    // more code [...]
    if discriminant {
        println!("{}", value.unwrap());
    }
}
In this case, your value is always initialized, and in the second portion, you're asserting that it contains a suitable non-None value. If you're in release mode, the compiler may be able to determine that your code is correct and optimize it to omit the Option (or it may not).

Why can't Rust do more complex type inference? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
In Chapter 3 of The Rust Programming Language, the following code is used as an example for a kind of type inference that Rust cannot manage:
fn main() {
    let condition = true;
    let number = if condition { 5 } else { "six" };
    println!("The value of number is: {}", number);
}
with the explanation that:
Rust needs to know at compile time what type the number variable is, definitively, so it can verify at compile time that its type is valid everywhere we use number. Rust wouldn’t be able to do that if the type of number was only determined at runtime; the compiler would be more complex and would make fewer guarantees about the code if it had to keep track of multiple hypothetical types for any variable.
I'm not certain I understand the rationale, because the example does seem like something where a simple compiler could infer the type.
What exactly makes this kind of type inference so difficult? In this case, the value of condition can be clearly inferred at compile time (it's true), and so thus the type of number can be too (it's i32?).
I can see how things could become a lot more complicated, if you were trying to infer types across multiple compilation units for instance, but is there something about this specific example that would add a lot of complexity to the compiler?
There are three main reasons I can think of:
1. Action at a distance effects
Let's suppose the language worked that way. Since we're extending type inference, we might as well make the language even smarter and have it infer return types as well. This allows me to write something like:
pub fn get_flux_capacitor() {
    let is_prod = true;
    if is_prod { FluxCapacitor::new() } else { MovieProp::new() }
}
And elsewhere in my project, I can get a FluxCapacitor by calling that function. However, one day, I change is_prod to false. Now, instead of getting an error that my function is returning the wrong type, I will get errors at every callsite. A small change inside one function has led to errors in entirely unchanged files! That's pretty weird.
(If we don't want to add inferred return types, just imagine it's a very long function instead.)
2. Compiler internals exposed
What happens in the case where it's not so simple? Surely this should be the same as the above example:
pub fn get_flux_capacitor() {
    let is_prod = (1 + 1) == 2;
    ...
}
But how far does that extend? The compiler's constant propagation is mostly an implementation detail. You don't want the types in your program to depend on how smart this version of the compiler is.
3. What did you actually mean?
As a human looking at this code, it looks like something is missing. Why are you branching on true at all? Why not just write FluxCapacitor::new()? Perhaps there's missing logic to check whether an env=DEV environment variable is missing. Perhaps a trait object should actually be used so that you can take advantage of runtime polymorphism.
In this kind of situation where you're asking the computer to do something that doesn't seem quite right, Rust often chooses to throw its hands up and ask you to fix the code.
You're right, in this very specific case (where condition=true statically), the compiler could be made able to detect that the else branch is unreachable and therefore number must be 5.
This is just a contrived example, though... in the more general case, the value of condition would only be dynamically known at runtime.
It's in that case, as others have said, that inference becomes hard to implement.
On that topic, there are two things I haven't seen mentioned yet.
1. The Rust language design tends to err on the side of doing things as explicitly as possible
2. Rust type inference is only local
On point #1, the explicit way for Rust to deal with the "this type can be one of multiple types" use case is enums.
You can define something like this:
#[derive(Debug)]
enum Whatsit {
    Num(i32),
    Text(&'static str),
}
and then do let number = if condition { Whatsit::Num(5) } else { Whatsit::Text("six") };
On point #2, let's see why the enum (while wordier) is the preferred approach in the language. In the example from the book we just try printing the value of number.
In a more real-case scenario we would at one point use number for something other than printing.
This means passing it to another function or including it in another type. Or (to even enable the use of println!) implementing the Debug or Display traits on it. Local inference means that, if you can't name the type of number in Rust, you would not be able to do any of these things.
Suppose you want to create a function that does something with a number;
with the enum you would write:
fn do_something(number: Whatsit)
but without it...
fn do_something(number: /* what type is this? */)
In a nutshell, you're right that in principle it IS doable for the compiler to synthesize a type for number. For instance, the compiler might create an anonymous enum like Whatsit above when compiling that code.
But you - the programmer - would not know the name of that type, would not be able to refer to it, wouldn't even know what you can do with it (can I multiply two "numbers"?) and this would greatly limit its usefulness.
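Putting the explicit-enum pieces together, here is a minimal runnable sketch (the body of do_something is made up):

// A minimal sketch: the enum gives the "either a number or some text" value a
// nameable type, so it can be printed, stored, and passed to other functions.
#[derive(Debug)]
enum Whatsit {
    Num(i32),
    Text(&'static str),
}

fn do_something(number: Whatsit) {
    match number {
        Whatsit::Num(n) => println!("got the number {}", n),
        Whatsit::Text(s) => println!("got the text {:?}", s),
    }
}

fn main() {
    let condition = true;
    let number = if condition { Whatsit::Num(5) } else { Whatsit::Text("six") };
    println!("The value of number is: {:?}", number);
    do_something(number);
}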
A similar approach was followed, for instance, to add closures to the language. The compiler knows the specific type of each closure, but you, the programmer, do not. If you're interested, I can try to find the discussions on the difficulties that this approach introduced into the design of the language.

Why does Rust need the `if let` syntax?

Coming from other functional languages (and being a Rust newbie), I'm a bit surprised by the motivation of Rust's if let syntax. The RFC mentions that without if let, the "idiomatic solution today for testing and unwrapping an Option<T>" is either
match opt_val {
    Some(x) => {
        do_something_with(x);
    }
    None => {}
}
or
if opt_val.is_some() {
    let x = opt_val.unwrap();
    do_something_with(x);
}
In Scala, it would be possible to do exactly the same, but the idiomatic solution is rather to map over an Option (or to foreach if it is only for the side effect of doing_something_with(x)).
Why isn't it an idiomatic solution to do the same in Rust?
opt_val.map(|x| do_something_with(x));
map() is intended for transforming an optional value, while if let is mostly needed to perform side effects. Rust is not a pure language, so any of its code blocks can contain side effects, but the semantics of map() is still about transformation. Using map() to perform side effects, while certainly possible, will only confuse readers of your code. Note that it should not carry a performance penalty, at least in simple code - the LLVM optimizer is perfectly capable of inlining the closure directly into the calling function, so it turns out to be equivalent to a match statement.
Before if let, the only way to perform side effects on an Option was either a match or an if with an Option::is_some() check. The match approach is the safest one, but it is very verbose, especially when a lot of nested checks are needed:
match o1 {
    Some(v1) => match v1.f {
        Some(v2) => match some_function(v2) {
            Some(r) => ...
            None => {}
        }
        None => {}
    }
    None => {}
}
Note the prominent rightward drift and a lot of syntactical noise. And it only gets worse if branches are not simple matches but proper blocks with multiple statements.
The if option.is_some() approach, on the other hand, is slightly less verbose but still reads very badly. Also, its condition check and unwrap() are not statically tied, so it is possible to get it wrong without the compiler noticing.
if let solves the verbosity problem, based on the same pattern matching infrastructure as match (so it is harder to get wrong than if option.is_some()) and, as a side benefit, allows using arbitrary types in patterns, not only Option. For example, some types may not provide map()-like methods; if let will still work with them very nicely. So if let is a clear win, hence it is idiomatic.
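For reference, a minimal sketch of what the if let form looks like for the example above (do_something_with is the question's placeholder, given a trivial body here):

// A minimal sketch: `if let` tests and unwraps in one statically checked step.
fn do_something_with(x: i32) {
    println!("got {}", x);
}

fn main() {
    let opt_val = Some(5);
    if let Some(x) = opt_val {
        do_something_with(x);
    }

    // `while let` works the same way for repeated unwrapping, e.g. draining a Vec:
    let mut stack = vec![1, 2, 3];
    while let Some(top) = stack.pop() {
        do_something_with(top);
    }
}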
.map() is specific to the Option<T> type, but if let (and while let!) are features that work with all Rust types.
Because your solution creates a closure, which uses resources, whereas if let desugars exactly to your first example, which doesn't. I also find it more readable.
Rust is all about zero-cost abstractions that make programming nicer, and if let and while let are good examples of those (at least IMO -- I realize it's a matter of personal preference). They're not strictly necessary, but they sure feel good to use (also see: Clojure, where they were likely lifted from).
A quote from a related issue against The Rust Programming Language:
You'd better think about if let as a one-handed match with no exhaustive checking enforced. It often has nothing to do with if or let, that's why it is so confusing. :)
Maybe it would be better to rename the whole operator, call it match once or something like that.
(slightly modified to make sense without the context)

Resources