What is "<[_]>" in Rust? - rust

In the vec! macro implementation there is this rule:
($($x:expr),+ $(,)?) => (
$crate::__rust_force_expr!(<[_]>::into_vec(box [$($x),+]))
);
What exactly is that <[_]> in it?

Breaking down the specific parts of the syntax:
<T>::f is the syntax to explicitly call f associated with a type T. Usually just T::f is enough, but pedantically, :: requires a path, which is why it is used here since [_] is not. The <...> allows any type to be used as a path. See Why do I need angle brackets in <$a> when implementing macro based on type?
[T] is the type denoting a slice of type T.
_ used as a type is a placeholder or "wildcard". It is not a type itself, but serves to indicate that the type should be inferred. See What does it mean to instantiate a Rust generic with an underscore?

Let's go step by step to see how <[_]>::into_vec(box [$($x),+]) produces a Vec:
[$($x),+] expands to an array of input elements: [1, 2, 3]
box ... puts that into a Box. box expressions are nightly-only syntax sugar for Box::new: box 5 is syntax sugar for Box::new(5) (actually it's the other way around: internally Box::new uses box, which is implemented in the compiler)
<[_]>::into_vec(...) calls the to_vec method on a slice containing elements that have an inferred type ([_]). Wrapping the [_] in angled brackets is needed for syntactic reasons to call an method on a slice type. And into_vec is a function that takes a boxed slice and produces a Vec:
pub fn into_vec<A: Allocator>(self: Box<Self, A>) -> Vec<T, A> {
// ...
}
This could be done in many simpler ways, but this code was fine-tuned to improve the performance of vec!. For instance, since the size of the Vec can be known in an advance, into_vec doesn't cause the Vec to be reallocated during its construction.

Related

Why does boxing an array of function pointers with `box` syntax only work with a temporary `let` binding?

I have two dummy functions:
fn foo() -> u32 { 3 }
fn bar() -> u32 { 7 }
And I want to create a boxed slice of function pointer: Box<[fn() -> u32]>. I want to do it with the box syntax (I know that it's not necessary for two elements, but my real use case is different).
I tried several things (Playground):
// Version A
let b = box [foo, bar] as Box<[_]>;
// Version B
let tmp = [foo, bar];
let b = box tmp as Box<[_]>;
// Version C
let b = Box::new([foo, bar]) as Box<[_]>;
Version B and C work fine (C won't work for me though, as it uses Box::new), but Version A errors:
error[E0605]: non-primitive cast: `std::boxed::Box<[fn() -> u32; 2]>` as `std::boxed::Box<[fn() -> u32 {foo}]>`
--> src/main.rs:8:13
|
8 | let b = box [foo, bar] as Box<[_]>;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: an `as` expression can only be used to convert between primitive types. Consider using the `From` trait
Apparently, for some reason, in the version A, the compiler isn't able to coerce the function items to function pointers. Why is that? And why does it work with the additional temporary let binding?
This question was inspired by this other question. I wondered why vec![foo, bar] errors, but [foo, bar] works fine. I looked at the definition of vec![] and found this part which confused me.
This looks like an idiosyncracy of the type inference algorithm to me, and there probably is no deeper reason for this except that the current inference algorithm happens to behave like it does. There is no formal specification of when type inference works and when it doesn't. If you encounter a situation that the type inference engine cannot handle, you need to add type annotations, or rewrite the code in a way that the compiler can infer the types correctly, and that is exactly what you need to do here.
Each function in Rust has its own individual function item type, which cannot be directly named by syntax, but is diplayed as e.g. fn() -> u32 {foo} in error messages. There is a special coercion that converts function item types with identical signatures to the corresponding function pointer type if they occur in different arms of a match, in different branches of an if or in different elements of an array. This coercion is different than other coercions, since it does not only occur in explicitly typed context ("coercion sites"), and this special treatment is the likely cause for this idiosyncracy.
The special coercion is triggered by the binding
let tmp = [foo, bar];
so the type of tmp is completely determined as [fn() -> u32; 2]. However, it appears the special coercion is not triggered early enough in the type inference algorithm when writing
let b = box [foo, bar] as Box<[_]>;
The compiler first assumes the item type of an array is the type of its first element, and apparently when trying to determine what _ denotes here, the compiler still hasn't updated this notion – according to the error message, _ is inferred to mean fn() -> u32 {foo} here. Interestingly, the compiler has already correctly inferred the full type of box [foo, bar] when printing the error message, so the behaviour is indeed rather weird. A full explanation can only be given when looking at the compiler sources in detail.
Rust's type solver engine often can't handle situations it should theoretically be able to solve. Niko Matsakis' chalk engine is meant to provide a general solution for all these cases at some point in the future, but I don't know what the status and the timeline of that project is.
[T; N] to [T] is an unsizing coercion.
CoerceUnsized<Pointer<U>> for Pointer<T> where T: Unsize<U> is
implemented for all pointer types (including smart pointers like Box
and Rc). Unsize is only implemented automatically, and enables the
following transformations:
[T; n] => [T]
These coercions only happen at certain coercion sites:
Coercions occur at a coercion site. Any location that is explicitly
typed will cause a coercion to its type. If inference is necessary,
the coercion will not be performed. Exhaustively, the coercion sites
for an expression e to type U are:
let statements, statics, and consts: let x: U = e
Arguments to functions: takes_a_U(e)
Any expression that will be returned: fn foo() -> U { e }
Struct literals: Foo { some_u: e }
Array literals: let x: [U; 10] = [e, ..]
Tuple literals: let x: (U, ..) = (e, ..)
The last expression in a block: let x: U = { ..; e }
Your case B is a let statement, your case C is a function argument. Your case A is not covered.
Going on pure instinct, I'd point out that box is an unstable magic keyword, so it's possible that it's just half-implemented. Maybe it should have coercions applied to the argument but no one has needed it and thus it was never supported.

How do I return an array of objects using `impl Trait` from a function without boxing?

I'm implementing a Sudoku crate, and I'd like to allow the possibility of higher-dimension Sudokus. In 2 dimensions, there are 3 groups that need to be checked for correctness (the prime rule contains only 3 subrules) and position (by definition) requires only 2 coordinates to fully specify, but in higher dimensions this obviously increases.
I wrote the following method signature, but it doesn't work, and I'd like to avoid boxing at all costs due to the environments for which I'm developing.
pub fn groups(position: [u8; DIMENSIONS]) -> [impl Group; DIMENSIONS + 1]
The actual type of each group will be known at compile time (the actual type will be something like [Box, Stack, Band]), but it appears that the compiler doesn't like this even though it'll know the size of each element.
I'd love to be able to use a tuple with size determined at compile time, but that doesn't seem to be easily supported, unless I'm missing something.
The idea is that there are three types of Group: a Stack (column), a Band (row), and a Box. The definition for Group is given below.
pub trait Group {
/// A group is considered valid if it has contains only unique elements.
fn is_valid(&self) -> bool { /* Default implementation elided. */ }
/// Returns an owned copy of the group's constituent elements.
fn elements(&self) -> Vec<Option<Element>>;
}
Let's start with the two-dimensional case. For a given element, we can write the element's index as a double (i.e. 2-tuple) (x, y). The groups associated with this element are then one Box (as always; this is where DIMENSION + 1 comes from), one Stack, and one Band.
In three dimensions, a given element is indexed (by definition) by a triple (i.e. 3-tuple) (x, y, z). The groups associated with this element are then one Box, one Stack, and two Bands.
This trend continues into higher dimensions, and I want to enable the user to use a different dimensionality — thus the compile-time determination of sizes.
To check the puzzle for correctness, the caller simply calls Group::is_valid() on each of puzzle.groups(); they can also iterate over each to render the puzzle as appropriate.
You can't because all impl Trait guarantees is that it's some type that implements Trait. Arrays in Rust are homogeneous, each element must be the same type, but something like [impl Group; DIMENSIONS + 1] would mean that each element could be any type as long as it implemented Group.
It's difficult to say without more code but this could probably be accomplished by parameterizing on the type of Group:
pub fn groups<T: Group>(position: [u8; DIMENSIONS]) -> [T; DIMENSIONS + 1]
Now there will be one instance of the groups() function monomorphized for each kind of T it's called with which implements Group. This has the consequence that the returned array is fixed to that type only, allowing it to be homogeneous. Since the function has no parameter of type T, the type of T probably can't be inferred unless it can be inferred from the call-site (e.g. let x: [SomeGroup; 3] = groups([...])), so you would need to explicitly specify the type with groups::<SomeGroup>([...]).
I don't know if this is exactly what you want, but it communicates the general idea that you need to set the array to some fixed type. That is, [T; N] is fixed to T whereas [impl Trait; N] would imply that each element can be any type that implements Trait, which itself would imply that it's a heterogeneous array, but arrays in Rust are homogeneous.
I can't tell if you also want to parameterize on the constant DIMENSIONS, which requires RFC 2000: const generics which AFAIK isn't implemented yet, even in nightly. You probably don't need that if you're able to infer it from the parameter, as you seem to be doing.
pub fn groups<T: Group, const DIMENSIONS: usize>(position: [u8; DIMENSIONS])
-> [T; DIMENSIONS + 1]

How does Rust infer resultant types from From::<>::from()?

In this snippet from Hyper's example, there's a bit of code that I've annotated with types that compiles successfully:
.map_err(|x: std::io::Error| -> hyper::Error {
::std::convert::From::<std::io::Error>::from(x)
})
The type definition of From::from() seems to be fn from(T) -> Self;
How is it that what seems to be a std::io::Error -> Self seems to return a hyper::Error value, when none of the generics and arguments I give it are of the type hyper::Error?
It seems that some sort of implicit type conversion is happening even when I specify all the types explicitly?
Type information in Rust can flow backwards.
The return type of the closure is specified to be hyper::Error. Therefore, the result of the block must be hyper::Error, therefore the result of From::from must be hyper::Error.
If you wanted to, you could use ...
<hyper::Error as ::std::convert::From>::<std::io::Error>::from(x)
... which would be the even more fully qualified version. But with the closure return type there, it's unnecessary.
Type inference has varying degrees.
For example, in C++ each literal is typed, and only a fully formed type can be instantiated, therefore the type of any expression can be computed (and is). Before C++11, this led to the compiler giving an error message: You are attempting to assign a value of type X to a variable of type Y. In C++11, auto was introduced to let the compiler figure out the type of the variable based on the value that was assigned to it.
In Java, this works slightly differently: the type of a variable has to be fully spelled out, but in exchange when constructing a type the generic bits can be left out since they are deduced from the variable the value is assigned to.
Those two examples are interesting because type information does not flow the same way in both of them, which hints that there is no reason for the flow to go one way or another; there are however technical constraints aplenty.
Rust, instead, uses a variation of the Hindley Milner type unification algorithm.
I personally see Hindley Milner as a system of equation:
Give each potential type a name: A, B, C, ...
Create equations tying together those types based on the structure of the program.
For example, imagine the following:
fn print_slice(s: &[u32]) {
println!("{:?}", s);
}
fn main() {
let mut v = Vec::new();
v.push(1);
print_slice(&v);
}
And start from main:
Assign names to types: v => A, 1 => B,
Put forth some equations: A = Vec<C> (from v = Vec::new()), C = B (from v.push(1)), A = &[u32] OR <A as Deref>::Output = &[u32] OR ... (from print_slice(&v),
First round of solving: A = Vec<B>, &[B] = &[u32],
Second round of solving: B = u32, A = Vec<u32>.
There are some difficulties woven into the mix because of subtyping (which the original HM doesn't have), however it's essentially just that.
In this process, there is no consideration for going backward or forwarded, it's just equation solving either way.
This process is known as Type Unification and if it fails you get a hopefully helpful compiler error.

Why does the argument for the find closure need two ampersands?

I have been playing with Rust by porting my Score4 AI engine to it - basing the work on my functional-style implementation in OCaml. I specifically wanted to see how Rust fares with functional-style code.
The end result: It works, and it's very fast - much faster than OCaml. It almost touches the speed of imperative-style C/C++ - which is really cool.
There's a thing that troubles me, though — why do I need two ampersands in the last line of this code?
let moves_and_scores: Vec<_> = moves_and_boards
.iter()
.map(|&(column,board)| (column, score_board(&board)))
.collect();
let target_score = if maximize_or_minimize {
ORANGE_WINS
} else {
YELLOW_WINS
};
if let Some(killer_move) = moves_and_scores.iter()
.find(|& &(_,score)| score==target_score) {
...
I added them is because the compiler errors "guided" me to it; but I am trying to understand why... I used the trick mentioned elsewhere in Stack Overflow to "ask" the compiler to tell me what type something is:
let moves_and_scores: Vec<_> = moves_and_boards
.iter()
.map(|&(column,board)| (column, score_board(&board)))
.collect();
let () = moves_and_scores;
...which caused this error:
src/main.rs:108:9: 108:11 error: mismatched types:
expected `collections::vec::Vec<(u32, i32)>`,
found `()`
(expected struct `collections::vec::Vec`,
found ()) [E0308]
src/main.rs:108 let () = moves_and_scores;
...as I expected, moves_and_scores is a vector of tuples: Vec<(u32, i32)>. But then, in the immediate next line, iter() and find() force me to use the hideous double ampersands in the closure parameter:
if let Some(killer_move) = moves_and_scores.iter()
.find(|& &(_,score)| score==target_score) {
Why does the find closure need two ampersands? I could see why it may need one (pass the tuple by reference to save time/space) but why two? Is it because of the iter? That is, is the iter creating references, and then find expects a reference on each input, so a reference on a reference?
If this is so, isn't this, arguably, a rather ugly design flaw in Rust?
In fact, I would expect find and map and all the rest of the functional primitives to be parts of the collections themselves. Forcing me to iter() to do any kind of functional-style work seems burdensome, and even more so if it forces this kind of "double ampersands" in every possible functional chain.
I am hoping I am missing something obvious - any help/clarification most welcome.
This here
moves_and_scores.iter()
gives you an iterator over borrowed vector elements. If you follow the API doc what type this is, you'll notice that it's just the iterator for a borrowed slice and this implements Iterator with Item=&T where T is (u32, i32) in your case.
Then, you use find which takes a predicate which takes a &Item as parameter. Sice Item already is a reference in your case, the predicate has to take a &&(u32, i32).
pub trait Iterator {
...
fn find<P>(&mut self, predicate: P) -> Option<Self::Item>
where P: FnMut(&Self::Item) -> bool {...}
... ^
It was probably defined like this because it's only supposed to inspect the item and return a bool. This does not require the item being passed by value.
If you want an iterator over (u32, i32) you could write
moves_and_scores.iter().cloned()
cloned() converts the iterator from one with an Item type &T to one with an Item type T if T is Clone. Another way to do it would be to use into_iter() instead of iter().
moves_and_scores.into_iter()
The difference between the two is that the first option clones the borrowed elements while the 2nd one consumes the vector and moves the elements out of it.
By writing the lambda like this
|&&(_, score)| score == target_score
you destructure the "double reference" and create a local copy of the i32. This is allowed since i32 is a simple type that is Copy.
Instead of destructuring the parameter of your predicate you could also write
|move_and_score| move_and_score.1 == target_score
because the dot operator automatically dereferences as many times as needed.

Why does Rust put a :: before the parameters in generics sometimes?

When declaring a variable of type vector or a hash map in Rust, we do:
let v: Vec<int>
let m: HashMap<int, int>
To instantiate, we need to call new(). However, we do so thusly:
Vec::<int>::new()
^^
HashMap::<int, int>::new()
^^
Note the sudden appearance of ::. Coming from C++, these are odd. Why do these occur? Does having a leading :: make IDENTIFIER :: < IDENTFIER … easier to parse than IDENTIFIER < IDENTIFIER, which might be construed as a less-than operation? (And thus, this is simply a thing to make the language easier to parse? But if so, why not also do it during type specifications, so as to have the two mirror each other?)
(As Shepmaster notes, often Vec::new() is enough; the type can often be inferred.)
When parsing an expression, it would be ambiguous whether a < was the start of a type parameter list or a less-than operator. Rust always assumes the latter and requires ::< for type parameter lists.
When parsing a type, it's always unambiguously a type parameter list, so ::< is never necessary.
In C++, this ambiguity is kept in the parser, which makes parsing C++ much more difficult than parsing Rust. See here for an explanation why this matters.
Anyway, most of the time in Rust, the types can be inferred and you can just write Vec::new(). Since ::< is usually not needed and is fairly ugly, it makes sense to keep only < in types, rather than making the two syntaxes match up.
The two different syntaxes don't even specify the same type parameters necessarily.
In this example:
let mut map: HashMap<K, V>;
K and V fill the type parameters of the struct HashMap declaration, the type itself.
In this expression:
HashMap::<K, V>::new()
K and V fill the type parameters of the impl block where the method new is defined! The impl block need not have the same, as many, or the same default, type parameters as the type itself.
In this particular case, the struct has the parameters HashMap<K, V, S = RandomState> (3 parameters, 1 defaulted). And the impl block containing ::new() has parameters impl<K, V> (2 parameters, not implemented for arbitrary states).

Resources