Why does Rust put a :: before the parameters in generics sometimes?

When declaring a variable of type vector or a hash map in Rust, we do:
let v: Vec<int>
let m: HashMap<int, int>
To instantiate, we need to call new(). However, we do so thusly:
Vec::<int>::new()
   ^^
HashMap::<int, int>::new()
       ^^
Note the sudden appearance of ::. Coming from C++, these are odd. Why do these occur? Does having a leading :: make IDENTIFIER :: < IDENTIFIER … easier to parse than IDENTIFIER < IDENTIFIER, which might be construed as a less-than operation? (And thus, is this simply a thing to make the language easier to parse? But if so, why not also do it during type specifications, so as to have the two mirror each other?)
(As Shepmaster notes, often Vec::new() is enough; the type can often be inferred.)

When parsing an expression, it would be ambiguous whether a < was the start of a type parameter list or a less-than operator. Rust always assumes the latter and requires ::< for type parameter lists.
When parsing a type, it's always unambiguously a type parameter list, so ::< is never necessary.
In C++, this ambiguity is kept in the parser, which makes parsing C++ much more difficult than parsing Rust. See here for an explanation why this matters.
Anyway, most of the time in Rust, the types can be inferred and you can just write Vec::new(). Since ::< is usually not needed and is fairly ugly, it makes sense to keep only < in types, rather than making the two syntaxes match up.
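As a rough illustration of the two positions, here is a small sketch. The function name build and the variable names are made up for the example, and i32 stands in for the old int type:

```rust
use std::collections::HashMap;

// Hypothetical helper: builds the same kinds of values three ways and
// returns their lengths so the result can be checked.
fn build() -> (usize, usize, usize) {
    // Type position: a plain `<` is unambiguous, so no `::` is needed.
    let mut v: Vec<i32> = Vec::new();
    v.push(1);

    // Expression position: `::<` marks the start of a type parameter list.
    // Writing `Vec<i32>::new()` here does not compile.
    let mut w = Vec::<i32>::new();
    w.push(1);
    w.push(2);

    // Usually neither is needed: key and value types are inferred from use.
    let mut m = HashMap::new();
    m.insert(1, 10);

    (v.len(), w.len(), m.len())
}

fn main() {
    let (a, b, c) = build();
    println!("{} {} {}", a, b, c);
}
```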

The two different syntaxes don't even specify the same type parameters necessarily.
In this example:
let mut map: HashMap<K, V>;
K and V fill the type parameters of the struct HashMap declaration, the type itself.
In this expression:
HashMap::<K, V>::new()
K and V fill the type parameters of the impl block where the method new is defined! The impl block need not have the same, as many, or the same default, type parameters as the type itself.
In this particular case, the struct has the parameters HashMap<K, V, S = RandomState> (3 parameters, 1 defaulted). And the impl block containing ::new() has parameters impl<K, V> (2 parameters, not implemented for arbitrary states).
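A minimal sketch of that difference, assuming the current standard library layout where with_hasher is defined in an impl block that is also generic over S (the function name sizes is made up):

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;

// Hypothetical helper returning the sizes of two maps.
fn sizes() -> (usize, usize) {
    // In the type, the third parameter S falls back to its default, RandomState.
    // `new` is only available for that default state.
    let mut default_state: HashMap<String, i32> = HashMap::<String, i32>::new();
    default_state.insert("a".to_string(), 1);

    // `with_hasher` is available for any S, so the state is passed explicitly.
    let mut explicit_state: HashMap<String, i32, RandomState> =
        HashMap::with_hasher(RandomState::new());
    explicit_state.insert("b".to_string(), 2);
    explicit_state.insert("c".to_string(), 3);

    (default_state.len(), explicit_state.len())
}

fn main() {
    println!("{:?}", sizes());
}
```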

Related

What is "<[_]>" in Rust?

In the vec! macro implementation there is this rule:
($($x:expr),+ $(,)?) => (
$crate::__rust_force_expr!(<[_]>::into_vec(box [$($x),+]))
);
What exactly is that <[_]> in it?
Breaking down the specific parts of the syntax:
<T>::f is the syntax to explicitly call f associated with a type T. Usually just T::f is enough, but pedantically, :: requires a path, which is why it is used here since [_] is not. The <...> allows any type to be used as a path. See Why do I need angle brackets in <$a> when implementing macro based on type?
[T] is the type denoting a slice of type T.
_ used as a type is a placeholder or "wildcard". It is not a type itself, but serves to indicate that the type should be inferred. See What does it mean to instantiate a Rust generic with an underscore?
Let's go step by step to see how <[_]>::into_vec(box [$($x),+]) produces a Vec:
[$($x),+] expands to an array of input elements: [1, 2, 3]
box ... puts that into a Box. box expressions are nightly-only syntax sugar for Box::new: box 5 is syntax sugar for Box::new(5) (actually it's the other way around: internally Box::new uses box, which is implemented in the compiler)
<[_]>::into_vec(...) calls the into_vec function on a slice type whose element type is inferred ([_]). Wrapping the [_] in angle brackets is needed for syntactic reasons to call a method on a slice type. And into_vec is a function that takes a boxed slice and produces a Vec:
pub fn into_vec<A: Allocator>(self: Box<Self, A>) -> Vec<T, A> {
// ...
}
This could be done in many simpler ways, but this code was fine-tuned to improve the performance of vec!. For instance, since the size of the Vec is known in advance, into_vec doesn't cause the Vec to be reallocated during its construction.
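The same call can be tried on stable Rust by swapping the nightly box expression for Box::new. This is only a sketch of the idea, not the macro's actual expansion, and build_vec is a made-up name:

```rust
// Hypothetical helper mirroring what the macro rule does for `vec![1, 2, 3]`.
fn build_vec() -> Vec<i32> {
    // Stable stand-in for `box [1, 2, 3]`: box the array, then let the
    // annotated `let` unsize it from Box<[i32; 3]> to Box<[i32]>.
    let boxed: Box<[i32]> = Box::new([1, 2, 3]);

    // `<[_]>` names the slice type as a path; `_` is inferred as i32.
    <[_]>::into_vec(boxed)
}

fn main() {
    let v = build_vec();
    // Method-call form of the same function, for comparison.
    let w = vec![1, 2, 3].into_boxed_slice().into_vec();
    assert_eq!(v, w);
    println!("{:?}", v);
}
```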

Calculate object A, then return object B that references A in Rust

In my code I often want to calculate a new value A, and then return some view of that value B, because B is a type that's more convenient to work with. The simplest case is where A is a vector and B is a slice that I would like to return. Let's say I want to write a function that returns a set of indices. Ideally this would return a slice directly because then I can use it immediately to index a string.
If I return a vector instead of a slice, I have to use to_slice:
fn all_except(except: usize, max: usize) -> Vec<usize> {
(0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
string indices are ranges of `usize`
the type `str` cannot be indexed by `Vec<usize>`
help: the trait `SliceIndex<str>` is not implemented for `Vec<usize>`
I can't return a slice directly:
fn all_except(except: usize, max: usize) -> &[usize] {
(0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
^ expected named lifetime parameter
missing lifetime specifier
help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments
help: consider using the `'static` lifetime
I can't even return the underlying vector and a slice of it, for the same reason:
pub fn except(index: usize, max: usize) -> (&[usize], Vec<usize>) {
let v: Vec<usize> = (0..index).chain((index + 1)..max).collect();
(v.as_slice(), v)
}
"abcdefg"[except(1, 7).0]
Now it may be possible to hack this particular example using deref coercion (I'm not sure), but I have encountered this problem with more complex types. For example, I have a function that loads an ndarray::Array2<T> from a CSV file and then want to split it into two parts using array.split_at(), but this returns two ArrayView2<T> that reference the original Array2<T>, so I encounter the same issue. Is there a general solution to this problem? Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
Your code doesn't seem to make any sense: you can't index a string using a slice. If you could, the first snippet would have worked with just an as_slice in the caller or something, since vecs trivially coerce to slices. That's exactly what the compiler error is telling you: the compiler is looking for a SliceIndex and a Vec (or slice) is definitely not that.
That aside,
Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
There are packages like owning_ref which can bundle owner and reference to avoid extra allocations. It tends to be somewhat fiddly.
I don't think there's any other general solution: Rust reasons at the function level, and the type checker has no notion of "tell the compiler to move A into the parent scope". So you need a construct which works around the borrow checker.
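For simple cases, one pattern that stays within the borrow checker's rules is to return only the owner and derive the view from it at the call site. The sketch below is an illustration of that pattern, not the owning_ref approach; the Indices type and the pick helper are hypothetical names:

```rust
// Hypothetical wrapper: owns the computed values (A) and hands out views (B).
struct Indices {
    data: Vec<usize>,
}

impl Indices {
    // Compute and own every index in 0..max except `except`.
    fn all_except(except: usize, max: usize) -> Self {
        Indices {
            data: (0..except).chain((except + 1)..max).collect(),
        }
    }

    // The borrowed view is tied to `&self`, so no explicit lifetime is needed.
    fn as_slice(&self) -> &[usize] {
        &self.data
    }
}

// Strings cannot be indexed by a slice of positions, so this walks the
// indices and collects the matching characters instead.
fn pick(s: &str, indices: &[usize]) -> String {
    let chars: Vec<char> = s.chars().collect();
    indices.iter().filter_map(|&i| chars.get(i).copied()).collect()
}

fn main() {
    let owner = Indices::all_except(1, 7); // A lives in the caller's frame
    let view = owner.as_slice(); // B borrows from A
    println!("{}", pick("abcdefg", view));
}
```

The caller keeps the owner alive for as long as it needs the view, which is the guarantee the compiler cannot see when both are built inside one function.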

Why does boxing an array of function pointers with `box` syntax only work with a temporary `let` binding?

I have two dummy functions:
fn foo() -> u32 { 3 }
fn bar() -> u32 { 7 }
And I want to create a boxed slice of function pointers: Box<[fn() -> u32]>. I want to do it with the box syntax (I know that it's not necessary for two elements, but my real use case is different).
I tried several things (Playground):
// Version A
let b = box [foo, bar] as Box<[_]>;
// Version B
let tmp = [foo, bar];
let b = box tmp as Box<[_]>;
// Version C
let b = Box::new([foo, bar]) as Box<[_]>;
Version B and C work fine (C won't work for me though, as it uses Box::new), but Version A errors:
error[E0605]: non-primitive cast: `std::boxed::Box<[fn() -> u32; 2]>` as `std::boxed::Box<[fn() -> u32 {foo}]>`
--> src/main.rs:8:13
|
8 | let b = box [foo, bar] as Box<[_]>;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: an `as` expression can only be used to convert between primitive types. Consider using the `From` trait
Apparently, for some reason, in version A the compiler isn't able to coerce the function items to function pointers. Why is that? And why does it work with the additional temporary let binding?
This question was inspired by this other question. I wondered why vec![foo, bar] errors, but [foo, bar] works fine. I looked at the definition of vec![] and found this part which confused me.
This looks like an idiosyncrasy of the type inference algorithm to me, and there probably is no deeper reason for this except that the current inference algorithm happens to behave like it does. There is no formal specification of when type inference works and when it doesn't. If you encounter a situation that the type inference engine cannot handle, you need to add type annotations, or rewrite the code in a way that the compiler can infer the types correctly, and that is exactly what you need to do here.
Each function in Rust has its own individual function item type, which cannot be directly named by syntax, but is displayed as e.g. fn() -> u32 {foo} in error messages. There is a special coercion that converts function item types with identical signatures to the corresponding function pointer type if they occur in different arms of a match, in different branches of an if or in different elements of an array. This coercion differs from other coercions, since it does not only occur in explicitly typed contexts ("coercion sites"), and this special treatment is the likely cause of this idiosyncrasy.
The special coercion is triggered by the binding
let tmp = [foo, bar];
so the type of tmp is completely determined as [fn() -> u32; 2]. However, it appears the special coercion is not triggered early enough in the type inference algorithm when writing
let b = box [foo, bar] as Box<[_]>;
The compiler first assumes the item type of an array is the type of its first element, and apparently when trying to determine what _ denotes here, the compiler still hasn't updated this notion – according to the error message, _ is inferred to mean fn() -> u32 {foo} here. Interestingly, the compiler has already correctly inferred the full type of box [foo, bar] when printing the error message, so the behaviour is indeed rather weird. A full explanation can only be given when looking at the compiler sources in detail.
Rust's type solver engine often can't handle situations it should theoretically be able to solve. Niko Matsakis' chalk engine is meant to provide a general solution for all these cases at some point in the future, but I don't know what the status and the timeline of that project is.
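Here is a stable-Rust sketch of the working variant, with Box::new standing in for the nightly box syntax, so it shows the coercions rather than the failing case. The name call_all is made up:

```rust
fn foo() -> u32 { 3 }
fn bar() -> u32 { 7 }

// Hypothetical helper: boxes the two functions as a slice and calls each one.
fn call_all() -> Vec<u32> {
    // `foo` and `bar` each have their own function item type; putting them in
    // one array literal coerces both to the pointer type `fn() -> u32`.
    let tmp = [foo, bar];

    // The annotated `let` is a coercion site, so the boxed array unsizes
    // to a boxed slice without an `as` cast.
    let b: Box<[fn() -> u32]> = Box::new(tmp);

    b.iter().map(|f| f()).collect()
}

fn main() {
    println!("{:?}", call_all());
}
```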
[T; N] to [T] is an unsizing coercion.
CoerceUnsized<Pointer<U>> for Pointer<T> where T: Unsize<U> is implemented for all pointer types (including smart pointers like Box and Rc). Unsize is only implemented automatically, and enables the following transformations:
[T; n] => [T]
These coercions only happen at certain coercion sites:
Coercions occur at a coercion site. Any location that is explicitly typed will cause a coercion to its type. If inference is necessary, the coercion will not be performed. Exhaustively, the coercion sites for an expression e to type U are:
let statements, statics, and consts: let x: U = e
Arguments to functions: takes_a_U(e)
Any expression that will be returned: fn foo() -> U { e }
Struct literals: Foo { some_u: e }
Array literals: let x: [U; 10] = [e, ..]
Tuple literals: let x: (U, ..) = (e, ..)
The last expression in a block: let x: U = { ..; e }
Your case B is a let statement, your case C is a function argument. Your case A is not covered.
Going on pure instinct, I'd point out that box is an unstable magic keyword, so it's possible that it's just half-implemented. Maybe it should have coercions applied to the argument but no one has needed it and thus it was never supported.
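A few of the listed coercion sites can be seen with the [T; n] => [T] transformation. This is a small sketch with made-up function names, not an exhaustive test of the list:

```rust
fn takes_slice(s: &[u8]) -> usize {
    s.len()
}

fn returns_slice(a: &[u8; 3]) -> &[u8] {
    a // returned expression: &[u8; 3] coerces to &[u8]
}

// Hypothetical helper exercising several coercion sites in a row.
fn demo() -> (usize, usize, usize, usize) {
    let array = [1u8, 2, 3];

    let s: &[u8] = &array; // `let` with an explicit type
    let n = takes_slice(&array); // function argument
    let r = returns_slice(&array); // returned expression
    let boxed: Box<[u8]> = Box::new(array); // same unsizing through Box

    (s.len(), n, r.len(), boxed.len())
}

fn main() {
    println!("{:?}", demo());
}
```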

How does Rust infer resultant types from From::<>::from()?

In this snippet from Hyper's example, there's a bit of code that I've annotated with types that compiles successfully:
.map_err(|x: std::io::Error| -> hyper::Error {
::std::convert::From::<std::io::Error>::from(x)
})
The type definition of From::from() seems to be fn from(T) -> Self;
How is it that what seems to be a std::io::Error -> Self seems to return a hyper::Error value, when none of the generics and arguments I give it are of the type hyper::Error?
It seems that some sort of implicit type conversion is happening even when I specify all the types explicitly?
Type information in Rust can flow backwards.
The return type of the closure is specified to be hyper::Error. Therefore, the result of the block must be hyper::Error, therefore the result of From::from must be hyper::Error.
If you wanted to, you could use ...
<hyper::Error as ::std::convert::From<std::io::Error>>::from(x)
... which would be the even more fully qualified version. But with the closure return type there, it's unnecessary.
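To try this without pulling in hyper, here is a self-contained sketch. AppError is a hypothetical stand-in for hyper::Error, and the two function names are made up:

```rust
use std::io;

// Hypothetical error type standing in for hyper::Error.
#[derive(Debug)]
struct AppError(String);

impl From<io::Error> for AppError {
    fn from(e: io::Error) -> Self {
        AppError(e.to_string())
    }
}

fn convert(r: Result<u8, io::Error>) -> Result<u8, AppError> {
    // Self is not named in the call; it is inferred to be AppError
    // because the closure's declared return type requires it.
    r.map_err(|x: io::Error| -> AppError { From::<io::Error>::from(x) })
}

fn convert_explicit(r: Result<u8, io::Error>) -> Result<u8, AppError> {
    // Fully qualified form: both Self and the trait parameter are spelled out.
    r.map_err(|x| <AppError as From<io::Error>>::from(x))
}

fn main() {
    let err = io::Error::new(io::ErrorKind::Other, "boom");
    println!("{:?}", convert(Err(err)));
    println!("{:?}", convert_explicit(Ok(5)));
}
```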
Type inference has varying degrees.
For example, in C++ each literal is typed, and only a fully formed type can be instantiated, therefore the type of any expression can be computed (and is). Before C++11, this led to the compiler giving an error message: You are attempting to assign a value of type X to a variable of type Y. In C++11, auto was introduced to let the compiler figure out the type of the variable based on the value that was assigned to it.
In Java, this works slightly differently: the type of a variable has to be fully spelled out, but in exchange when constructing a type the generic bits can be left out since they are deduced from the variable the value is assigned to.
Those two examples are interesting because type information does not flow the same way in both of them, which hints that there is no reason for the flow to go one way or another; there are however technical constraints aplenty.
Rust, instead, uses a variation of the Hindley-Milner type unification algorithm.
I personally see Hindley-Milner as a system of equations:
Give each potential type a name: A, B, C, ...
Create equations tying together those types based on the structure of the program.
For example, imagine the following:
fn print_slice(s: &[u32]) {
println!("{:?}", s);
}
fn main() {
let mut v = Vec::new();
v.push(1);
print_slice(&v);
}
And start from main:
Assign names to types: v => A, 1 => B,
Put forth some equations: A = Vec<C> (from v = Vec::new()), C = B (from v.push(1)), A = &[u32] OR <A as Deref>::Target = &[u32] OR ... (from print_slice(&v)),
First round of solving: A = Vec<B>, &[B] = &[u32],
Second round of solving: B = u32, A = Vec<u32>.
There are some difficulties woven into the mix because of subtyping (which the original HM doesn't have), however it's essentially just that.
In this process, there is no consideration for going backward or forward; it's just equation solving either way.
This process is known as Type Unification and if it fails you get a hopefully helpful compiler error.
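One way to watch the solver at work is to push a literal that only fits the type demanded later on. This is a sketch built on the example above; the function name infer and the chosen literal are my own additions, and print_slice returns the length here only to make it checkable:

```rust
fn print_slice(s: &[u32]) -> usize {
    println!("{:?}", s);
    s.len()
}

// Hypothetical helper: no element type is written anywhere in its body.
fn infer() -> u64 {
    // The element type starts out as an unknown.
    let mut v = Vec::new();
    // 4_000_000_000 fits in u32 but not in i32, the default integer type,
    // so this line is only accepted if the later call decides the type.
    v.push(4_000_000_000);
    // This call adds the equation that pins the element type to u32.
    print_slice(&v);
    v[0] as u64
}

fn main() {
    println!("{}", infer());
}
```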

How to constrain the element type of an iterator?

I’m converting some older Rust code to work on 1.0.0. I need to convert a function that takes an iterator over characters, which used to be written like this:
fn f<I: Iterator<char>>(char_iter: I)
Now that Iterator doesn’t take a parameter, the constraint on I can only be I: Iterator. The element type is then I::Item. Is there a way to express the constraint that I::Item = char? (Or should I be doing this another way entirely?)
fn f<I: Iterator<Item = char>>(char_iter: I)
Associated types were recently added to the language, and many library types were updated to take advantage of them. For example, Iterator defines one associated type, named Item. You can add a constraint on the associated type by writing the name of the associated type, an equals sign, and the type you need.
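A short usage sketch of that bound, with a made-up function body since the original f has none:

```rust
// Accepts any iterator whose items are exactly `char`.
fn count_vowels<I: Iterator<Item = char>>(char_iter: I) -> usize {
    char_iter.filter(|c| "aeiou".contains(*c)).count()
}

fn main() {
    // Works with the iterator over a string's characters ...
    println!("{}", count_vowels("associated".chars()));
    // ... and with any other iterator that yields char.
    println!("{}", count_vowels(vec!['x', 'y', 'e'].into_iter()));
}
```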
Okay, I was able to figure this out from reading some RFC discussions, and the answer is that you can instantiate associated types in the trait (like signature fibration in ML):
fn f<I: Iterator<Item = char>>(char_iter: I)
Soon it should be possible to use equality constraints in where clauses, but this doesn’t work in 1.0.0-alpha:
fn f<I: Iterator>(char_iter: I) where I::Item == char
You can write I: Iterator<Item = char>. At some point in the future, a where clause like where I::Item == char may work too, but not now.
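As far as I know the == form is still not accepted, but the Item = char binding itself can be moved into a where clause, which reads better once there are several bounds. A minimal sketch with a made-up function:

```rust
// Same constraint as above, written in a where clause.
fn to_upper<I>(char_iter: I) -> String
where
    I: Iterator<Item = char>,
{
    char_iter.map(|c| c.to_ascii_uppercase()).collect()
}

fn main() {
    println!("{}", to_upper("abc".chars()));
}
```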
