How does Rust solve mutability for Hindley-Milner?

I've read that Rust has very good type inference using Hindley-Milner. Rust also has mutable variables, and AFAIK there must be some constraints when an HM algorithm works with mutability, because it could over-generalize. The following code:
let mut a;
a = 3;
a = 2.5;
Does not compile, because on the second line an integer type was inferred, and a floating-point value cannot be assigned to an integer variable. So I'm guessing that for simple variables, as soon as a non-generic type is inferred, the variable becomes a mono-type and cannot be generalized anymore.
But what about a template, like Vec? For example this code:
let mut v;
v = Vec::new();
v.push(3);
v.push(2.3);
This fails again, but this time on the last line. That means the second line inferred the type partially (Vec), and the third line inferred the element type.
What's the rule? Is there something like value-restriction that I don't know about? Or am I over-complicating things and Rust has much tighter rules (like no generalization at all)?

It is considered an issue (as far as diagnostic quality goes) that rustc is slightly too eager in its type inference.
If we check your first example:
let mut a = 3;
a = 2.5;
Then the first line leads to inferring that a has the {integer} type (a generic integer type), and the second line leads to a diagnostic that 2.5 cannot be assigned to a because it is not an integer.
It is expected that a better algorithm would instead register the conflict, and then point at the lines from which each type came. Maybe we'll get that with Chalk.
Note: the generic integer type {integer} is a trick Rust uses to make integer literals "polymorphic"; if there is no other hint at which specific integer type it should be, it defaults to i32.
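A minimal illustration of that defaulting (the variable names are just for the example):
let a = 3;        // no other constraint: {integer} defaults to i32
let b: u64 = 3;   // the annotation pins the literal to u64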
The second example occurs in basically the same way.
let mut v = Vec::new();
v.push(3);
In detail:
v is assigned type $T
Vec::new() produces type Vec<$U>
3 produces type {integer}
So, on the first line, we get $T == Vec<$U> and on the second line we get $U == {integer}, so v is deduced to have type Vec<{integer}>.
If there is no other source to learn the exact integer type, it falls back to i32 by default.
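As a minimal sketch, a later use can still pin the element type before that fallback applies:
let mut v = Vec::new();   // v: Vec<$U>
v.push(3);                // $U == {integer}
let first: u8 = v[0];     // {integer} unifies with u8, so v: Vec<u8>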
I would like to note that mutability does not actually impact inference here; from the point of view of type inference, or type unification, the following code samples are equivalent:
// With mutability:
let mut a = 1;
a = 2.5;
// Without mutability:
let a = if <condition> { 1 } else { 2.5 };
There are much worse issues in Rust with regard to HM; Deref and subtyping are far more challenging.

If I'm not wrong, it does this:
let mut a;
a = 3; // here a is already inferred as mut int
a = 2.5; // fails because int != float
For the vec snippet:
let mut v;
v = Vec::new(); // now v's type is Vec<something>
v.push(3); // v's type is Vec<int>
v.push(2.3); // fails here because Vec<int> != Vec<float>
Note that I did not use real Rust types; this is just to give a general idea.

Related

How to properly cast to a negative number in Rust?

In Rust, this code is valid:
let signedInt: i32 = 23*-1;
However, this is not:
let unsignedInt: u16 = 2;
let signedInt: i32 = unsignedInt*-1;
This makes sense, as Rust tries to interpret -1 as if it were of the same type as unsignedInt.
So a cast is needed. However, said cast becomes quite ugly when more types are involved:
-((unsignedInt*320) as f32)
Doing this is needed, as -(unsignedInt*320) is an invalid expression. But the code above is basically unreadable, and I was wondering what was the best way of making it both valid Rust and human-readable.
Rust requires explicit casts because implicit conversion is a common source of bugs in other languages like C. Generally you should avoid as, and use from or into instead if possible, otherwise try_from/try_into. The main exception is int<->float casts, which are only possible with as at the moment.
Because all numbers in u16 can be represented in i32, your second example can be written as:
let unsignedInt: u16 = 2;
let signedInt = i32::from(unsignedInt) * -1;
Your third example must still be written with an as cast, but you can leave out the variable type:
let unsignedInt: u16 = 2;
let float = -((unsignedInt * 320) as f32);
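To illustrate the try_from direction mentioned above (a minimal sketch; the values are just for the example), a narrowing conversion returns a Result because it can fail:
use std::convert::TryFrom;

let big: i32 = 70_000;
// 70_000 does not fit in a u16 (max 65_535), so this is Err
assert!(u16::try_from(big).is_err());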

Find what std::convert::try_into converts to

I am learning Rust, and here is a sample from the book:
use std::convert::TryInto;
fn main() {
    let a: i32 = 10;
    let b: u16 = 100;
    let b_ = b.try_into().unwrap();
    if a < b_ {
        println!("Ten is less than one hundred.");
    }
}
The author says b.try_into() converts b to i32. But where do we specify this in the code? b_ is not given an explicit type, so why would a u16 get converted to an i32 and not to a u32 or something else?
Thanks.
Rust has quite a smart compiler: it can look at nearby code to determine which type a variable should get. This is called type inference.
If you want to explicitly set the type that .try_into() should convert to, you can put the annotation in the usual position:
let b_: i32 = b.try_into().unwrap();
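Alternatively (a sketch using the same variables), since u16 to i32 is lossless, you can name the target type on the left and use From directly:
let b_ = i32::from(b); // u16 -> i32 never fails, so no unwrap is needed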
Also keep in mind that you cannot convert to just any type: a conversion only works if the corresponding implementation exists (the numeric ones are provided by the Rust standard library).
My guess is that the compiler looks at the if statement below and infers that b_ should be an i32 (so that it can perform the comparison with a, which it already knows is an i32).
I also tested that reversing the condition, i.e. if b_ > a, causes a compile error. I guess that is because the compiler wants to know b_'s type before looking at a.

Why does Rust not infer the right type for a closure passed to filter?

If I have the following code:
let a = 1;
let f = |n| n == &a;
let _: Vec<_> = (1u64..10).filter(f).collect();
Rust complains at length that collect exists for the relevant Filter struct, but that the closure does not satisfy the FnMut trait bound.
However, if I inline the closure or annotate its argument type, the code works, such as:
let a = 1;
let _: Vec<_> = (1u64..10).filter(|n| n == &a).collect();
or:
let a = 1;
let f = |n: &u64| n == &a;
let _: Vec<_> = (1u64..10).filter(f).collect();
Why is this? The fact that inlining the closure without annotating the type works is truly bizarre. I would have thought it was because n was having its type inferred as u64 instead of &u64, since ranges have some kind of propensity towards getting consumed, but I don't know.
I don't know the exact rule, but from what I've observed working with Rust, creating a closure without requiring a specific type always causes the type to be inferred from the information available in the declaration of the closure only — not how the closure value is used later. That is, Rust's type inference stops being “bidirectional” in this case. For example:
let f = |x| x.clone();
f("hello world");
error[E0282]: type annotations needed
--> src/main.rs:11:14
|
11 | let f = |x| x.clone();
| ^ consider giving this closure parameter a type
|
= note: type must be known at this point
This example fails to compile because the compiler does not use the information from the call to f to decide that x should be &str.
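Annotating the parameter in the declaration makes the same closure compile, since all the information is then available up front (a minimal sketch):
let f = |x: &str| x.clone();
f("hello world"); // fine: the parameter type was fixed at the declaration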
In your case, I'm not sure exactly what the problem is.
I'd assume it was a lifetime problem (the parameter lifetime being inferred as the lifetime of the borrowed a rather than an arbitrary lifetime) but then I would think that |n: &u64| wouldn't help.
Another hypothesis is that the problem is that the == operator is a PartialEq::eq call in disguise, and the compiler isn't inferring what the Self type of that call is (since some type other than u64 could implement PartialEq<&u64>). But then I'd expect to see another "type annotations needed" error, requiring you to narrow down which trait implementation is to be used.
I don't have a precise explanation, but generally you should expect that when you separate a closure's definition from its usage, you are likely to need to add more type annotations.

Why does boxing an array of function pointers with `box` syntax only work with a temporary `let` binding?

I have two dummy functions:
fn foo() -> u32 { 3 }
fn bar() -> u32 { 7 }
And I want to create a boxed slice of function pointers: Box<[fn() -> u32]>. I want to do it with the box syntax (I know that it's not necessary for two elements, but my real use case is different).
I tried several things (Playground):
// Version A
let b = box [foo, bar] as Box<[_]>;
// Version B
let tmp = [foo, bar];
let b = box tmp as Box<[_]>;
// Version C
let b = Box::new([foo, bar]) as Box<[_]>;
Versions B and C work fine (C won't work for me though, as it uses Box::new), but version A errors:
error[E0605]: non-primitive cast: `std::boxed::Box<[fn() -> u32; 2]>` as `std::boxed::Box<[fn() -> u32 {foo}]>`
--> src/main.rs:8:13
|
8 | let b = box [foo, bar] as Box<[_]>;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: an `as` expression can only be used to convert between primitive types. Consider using the `From` trait
Apparently, for some reason, in version A the compiler isn't able to coerce the function items to function pointers. Why is that? And why does it work with the additional temporary let binding?
This question was inspired by this other question. I wondered why vec![foo, bar] errors, but [foo, bar] works fine. I looked at the definition of vec![] and found this part which confused me.
This looks like an idiosyncrasy of the type inference algorithm to me, and there probably is no deeper reason for it except that the current inference algorithm happens to behave as it does. There is no formal specification of when type inference works and when it doesn't. If you encounter a situation that the type inference engine cannot handle, you need to add type annotations or rewrite the code in a way that lets the compiler infer the types correctly, and that is exactly what you need to do here.
Each function in Rust has its own individual function item type, which cannot be directly named in syntax, but is displayed as e.g. fn() -> u32 {foo} in error messages. There is a special coercion that converts function item types with identical signatures to the corresponding function pointer type if they occur in different arms of a match, in different branches of an if, or in different elements of an array. This coercion differs from other coercions, since it does not only occur in explicitly typed contexts ("coercion sites"), and this special treatment is the likely cause of this idiosyncrasy.
The special coercion is triggered by the binding
let tmp = [foo, bar];
so the type of tmp is completely determined as [fn() -> u32; 2]. However, it appears the special coercion is not triggered early enough in the type inference algorithm when writing
let b = box [foo, bar] as Box<[_]>;
The compiler first assumes the element type of an array is the type of its first element, and apparently, when trying to determine what _ denotes here, it still hasn't updated that notion: according to the error message, _ is inferred to mean fn() -> u32 {foo}. Interestingly, the compiler has already correctly inferred the full type of box [foo, bar] when printing the error message, so the behaviour is indeed rather weird. A full explanation could only be given by looking at the compiler sources in detail.
Rust's type solver engine often can't handle situations it should theoretically be able to solve. Niko Matsakis' chalk engine is meant to provide a general solution for all these cases at some point in the future, but I don't know what the status and timeline of that project are.
[T; N] to [T] is an unsizing coercion.
CoerceUnsized<Pointer<U>> for Pointer<T> where T: Unsize<U> is implemented for all pointer types (including smart pointers like Box and Rc). Unsize is only implemented automatically, and enables the following transformations:
[T; n] => [T]
These coercions only happen at certain coercion sites:
Coercions occur at a coercion site. Any location that is explicitly typed will cause a coercion to its type. If inference is necessary, the coercion will not be performed. Exhaustively, the coercion sites for an expression e to type U are:
let statements, statics, and consts: let x: U = e
Arguments to functions: takes_a_U(e)
Any expression that will be returned: fn foo() -> U { e }
Struct literals: Foo { some_u: e }
Array literals: let x: [U; 10] = [e, ..]
Tuple literals: let x: (U, ..) = (e, ..)
The last expression in a block: let x: U = { ..; e }
Your case B is a let statement, your case C is a function argument. Your case A is not covered.
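For illustration, a sketch that sidesteps the asker's box requirement (reusing the question's foo and bar): the same two coercions succeed when the target type is spelled out at a coercion site, here a let statement:
// the annotated let is a coercion site: the function items coerce to
// fn pointers, then Box<[fn() -> u32; 2]> is unsized to Box<[fn() -> u32]>
let b: Box<[fn() -> u32]> = Box::new([foo, bar]);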
Going on pure instinct, I'd point out that box is an unstable magic keyword, so it's possible that it's just half-implemented. Maybe it should have coercions applied to the argument but no one has needed it and thus it was never supported.

How does Rust infer resultant types from From::<>::from()?

In this snippet from Hyper's example, there's a bit of code that I've annotated with types that compiles successfully:
.map_err(|x: std::io::Error| -> hyper::Error {
    ::std::convert::From::<std::io::Error>::from(x)
})
The type definition of From::from() seems to be fn from(T) -> Self;
How is it that what looks like a std::io::Error -> Self conversion returns a hyper::Error value, when none of the generics and arguments I give it are of type hyper::Error?
It seems that some sort of implicit type conversion is happening even when I specify all the types explicitly?
Type information in Rust can flow backwards.
The return type of the closure is specified to be hyper::Error. Therefore, the result of the block must be hyper::Error, therefore the result of From::from must be hyper::Error.
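A familiar everyday example of the same backward flow (a minimal sketch using only the standard library):
// the annotation on x flows backward and selects u8's FromStr impl
let x: u8 = "42".parse().unwrap();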
If you wanted to, you could use ...
<hyper::Error as ::std::convert::From<std::io::Error>>::from(x)
... which would be the even more fully qualified version. But with the closure return type there, it's unnecessary.
Type inference has varying degrees.
For example, in C++ each literal is typed, and only a fully formed type can be instantiated, so the type of any expression can be computed (and is). Before C++11, this led to the compiler giving error messages like "you are attempting to assign a value of type X to a variable of type Y". In C++11, auto was introduced to let the compiler figure out the type of a variable based on the value assigned to it.
In Java, this works slightly differently: the type of a variable has to be fully spelled out, but in exchange when constructing a type the generic bits can be left out since they are deduced from the variable the value is assigned to.
Those two examples are interesting because type information does not flow the same way in both of them, which hints that there is no reason for the flow to go one way or another; there are however technical constraints aplenty.
Rust, instead, uses a variation of the Hindley-Milner type unification algorithm.
I personally see Hindley-Milner as a system of equations:
Give each potential type a name: A, B, C, ...
Create equations tying together those types based on the structure of the program.
For example, imagine the following:
fn print_slice(s: &[u32]) {
    println!("{:?}", s);
}

fn main() {
    let mut v = Vec::new();
    v.push(1);
    print_slice(&v);
}
And start from main:
Assign names to types: v => A, 1 => B,
Put forth some equations: A = Vec<C> (from v = Vec::new()), C = B (from v.push(1)), A = &[u32] OR <A as Deref>::Output = &[u32] OR ... (from print_slice(&v)),
First round of solving: A = Vec<B>, &[B] = &[u32],
Second round of solving: B = u32, A = Vec<u32>.
There are some difficulties woven into the mix because of subtyping (which the original HM doesn't have); however, it's essentially just that.
In this process, there is no consideration of going backward or forward; it's just equation solving either way.
This process is known as type unification, and if it fails you get a (hopefully helpful) compiler error.
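As a minimal sketch of such a failure, echoing the original question: the equations $U == u32 and $U == f64 have no solution:
let mut v = Vec::new(); // v: Vec<$U>
v.push(1u32);           // $U == u32
v.push(1.5f64);         // error[E0308]: mismatched types, expected u32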
