Does Rust support using an infix operator as a function? - rust

I am writing a function that does piecewise multiplication of two arrays.
xs.iter()
.zip(ys).map(|(x, y)| x * y)
.sum()
In some other languages, I can pass (*) as a function to map. Does Rust have this feature?

Nnnyes. Sorta kinda not really.
You can't write an operator as a name. But most operators are backed by traits, and you can write the name of those, so a * b is effectively Mul::mul(a, b), and you can pass Mul::mul as a function pointer.
But that doesn't help in this case because Iterator::map is expecting a FnMut((A, B)) -> C, and the binary operators all implement FnMut(A, B) -> C.
Now, you could write an adapter for this, but you'd need one for every combination of arity and mutability. And you'd have to eat a heap allocation and indirection or require a nightly compiler.
Or, you could write your own version of Iterator::map on an extension trait that accepts higher arity functions for iterators of tuples... again, one for each arity...
Honestly, it's simpler to just use a closure.
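For illustration, here is a minimal sketch of such an adapter for the two-argument case only. The name tupled is hypothetical (it is not part of std). The sketch assumes a compiler with return-position impl Trait (stable since Rust 1.26), which lets the adapter be returned without boxing:

```rust
use std::ops::Mul;

// Hypothetical helper, not part of std: wraps a two-argument function so that
// it accepts a single pair instead, which is the shape `map` needs after `zip`.
fn tupled<A, B, C>(mut f: impl FnMut(A, B) -> C) -> impl FnMut((A, B)) -> C {
    move |(a, b)| f(a, b)
}

fn dot(xs: &[i32], ys: &[i32]) -> i32 {
    xs.iter()
        .copied()
        .zip(ys.iter().copied())
        // Spelled out in full so that the impl being used is unambiguous.
        .map(tupled(<i32 as Mul<i32>>::mul))
        .sum()
}

fn main() {
    println!("{}", dot(&[1, 2, 3], &[4, 5, 6])); // prints 32
}
```

This covers a single arity only, so the point about needing one adapter per arity still stands, and the closure remains the shorter option.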

Rust does not have any syntax to pass infix operators, mostly because it is redundant anyway.
In Rust, each operator maps to a trait: * maps to the std::ops::Mul trait, for example.
Therefore, instead of passing * directly, you would want to pass std::ops::Mul::mul:
xs.iter().zip(ys).map(Mul::mul).sum();
However there are several difficulties:
Generally, iterators yield references while Mul is implemented for plain values,
Mul::mul expects two arguments, whereas xs.iter().zip(ys) yields a single element (a tuple of two elements).
So, you need to go from reference to value and then "unpack" the tuple and... it ends up being shorter to use a closure.
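As a rough sketch of where the trait method does and does not fit: Iterator::fold calls its function with two separate arguments, so Mul::mul can be passed as is, while the zipped case still needs a closure to unpack the tuple. The function names product and dot are made up for this example.

```rust
use std::ops::Mul;

fn product(xs: &[i32]) -> i32 {
    // `fold` passes (accumulator, item) as two arguments, matching `Mul::mul`.
    // `copied()` turns the `&i32` items into plain `i32` values first.
    xs.iter().copied().fold(1, Mul::mul)
}

fn dot(xs: &[i32], ys: &[i32]) -> i32 {
    // `zip` yields one `(&i32, &i32)` tuple per step, so a closure unpacks it.
    // For primitive integers, std also implements `Mul` on references,
    // which is why `x * y` works here without dereferencing.
    xs.iter().zip(ys).map(|(x, y)| x * y).sum()
}

fn main() {
    println!("{}", product(&[1, 2, 3, 4])); // prints 24
    println!("{}", dot(&[1, 2, 3], &[4, 5, 6])); // prints 32
}
```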

No. The * operator is implemented in std::ops::Mul, but its mul method can't be imported directly:
use std::ops::Mul::mul;
fn main() {
let v1 = vec![1, 2, 3];
let v2 = vec![1, 2, 3];
println!("{:?}", v1.iter().zip(v2).map(|(x, y)| mul(x, y)).collect::<Vec<_>>());
}
Will result in the following error:
error[E0253]: `mul` is not directly importable
--> <anon>:1:5
|
1 | use std::ops::Mul::mul;
| ^^^^^^^^^^^^^^^^^^ cannot be imported directly
You could introduce your own function using the * operator, but there wouldn't be much added value :).
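For comparison, a small sketch of the variant that does compile: import the trait rather than the method, then call the method through the trait path. The function name products is only for this example.

```rust
use std::ops::Mul;

fn products(v1: &[i32], v2: &[i32]) -> Vec<i32> {
    // Importing the trait is allowed; the method is then reachable as `Mul::mul`.
    // Both items are `&i32` here, and std implements `Mul` for that combination.
    v1.iter().zip(v2).map(|(x, y)| Mul::mul(x, y)).collect()
}

fn main() {
    let v1 = vec![1, 2, 3];
    let v2 = vec![1, 2, 3];
    println!("{:?}", products(&v1, &v2)); // prints [1, 4, 9]
}
```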

Related

What is "<[_]>" in Rust?

In the vec! macro implementation there is this rule:
($($x:expr),+ $(,)?) => (
$crate::__rust_force_expr!(<[_]>::into_vec(box [$($x),+]))
);
What exactly is that <[_]> in it?
Breaking down the specific parts of the syntax:
<T>::f is the syntax to explicitly call f associated with a type T. Usually just T::f is enough, but pedantically, :: requires a path, which is why it is used here since [_] is not. The <...> allows any type to be used as a path. See Why do I need angle brackets in <$a> when implementing macro based on type?
[T] is the type denoting a slice of type T.
_ used as a type is a placeholder or "wildcard". It is not a type itself, but serves to indicate that the type should be inferred. See What does it mean to instantiate a Rust generic with an underscore?
Let's go step by step to see how <[_]>::into_vec(box [$($x),+]) produces a Vec:
[$($x),+] expands to an array of input elements: [1, 2, 3]
box ... puts that into a Box. box expressions are nightly-only syntax sugar for Box::new: box 5 is syntax sugar for Box::new(5) (actually it's the other way around: internally Box::new uses box, which is implemented in the compiler)
<[_]>::into_vec(...) calls the into_vec method on a slice containing elements that have an inferred type ([_]). Wrapping the [_] in angle brackets is needed for syntactic reasons to call a method on a slice type. And into_vec is a function that takes a boxed slice and produces a Vec:
pub fn into_vec<A: Allocator>(self: Box<Self, A>) -> Vec<T, A> {
// ...
}
This could be done in many simpler ways, but this code was fine-tuned to improve the performance of vec!. For instance, since the size of the Vec is known in advance, into_vec doesn't cause the Vec to be reallocated during its construction.
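The same steps can be reproduced on stable Rust, as a sketch that uses Box::new in place of the nightly-only box expression. The function name build is made up for this example.

```rust
fn build() -> Vec<i32> {
    // An array literal on the heap; the annotation coerces
    // `Box<[i32; 3]>` to the boxed slice type `Box<[i32]>`.
    let boxed: Box<[i32]> = Box::new([1, 2, 3]);
    // `<[_]>` names the slice type with an inferred element type,
    // so this calls the slice's associated function `into_vec`.
    <[_]>::into_vec(boxed)
}

fn main() {
    println!("{:?}", build()); // prints [1, 2, 3]
}
```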

Why is "&&" being used in closure arguments?

I have two questions regarding this example:
let a = [1, 2, 3];
assert_eq!(a.iter().find(|&&x| x == 2), Some(&2));
assert_eq!(a.iter().find(|&&x| x == 5), None);
Why is &&x used in the closure arguments rather than just x? I understand that & is passing a reference to an object, but what does using it twice mean?
I don't understand what the documentation says:
Because find() takes a reference, and many iterators iterate over references, this leads to a possibly confusing situation where the argument is a double reference. You can see this effect in the examples below, with &&x.
Why is Some(&2) used rather than Some(2)?
a is of type [i32; 3]; an array of three i32s.
[i32; 3] does not implement an iter method itself, but during method lookup it is borrowed and coerced to the slice type &[i32].
&[i32] implements an iter method which produces an iterator.
This iterator implements Iterator<Item=&i32>.
It uses &i32 rather than i32 because the iterator has to work on arrays of any type, and not all types can be safely copied. So rather than restrict itself to copyable types, it iterates over the elements by reference rather than by value.
find is a method defined for all Iterators. It lets you look at each element and return the one that matches the predicate. Problem: if the iterator produces non-copyable values, then passing the value into the predicate would make it impossible to return it from find. The value cannot be re-generated, since iterators are not (in general) rewindable or restartable. Thus, find has to pass the element to the predicate by-reference rather than by-value.
So, if you have an iterator that implements Iterator<Item=T>, then Iterator::find requires a predicate that takes a &T and returns a bool. [i32]::iter produces an iterator that implements Iterator<Item=&i32>. Thus, Iterator::find called on an array iterator requires a predicate that takes a &&i32. That is, it passes the predicate a pointer to a pointer to the element in question.
So if you were to write a.iter().find(|x| ..), the type of x would be &&i32. This cannot be directly compared to the literal i32 value 2. There are several ways of fixing this. One is to explicitly dereference x: a.iter().find(|x| **x == 2). The other is to use pattern matching to destructure the double reference: a.iter().find(|&&x| x == 2). These two approaches are, in this case, doing exactly the same thing. [1]
As for why Some(&2) is used: because a.iter() is an iterator over &i32, not an iterator of i32. If you look at the documentation for Iterator::find, you'll see that for Iterator<Item=T>, it returns an Option<T>. Hence, in this case, it returns an Option<&i32>, so that's what you need to compare it against.
[1]: The differences only matter when you're talking about non-Copy types. For example, |&&x| .. wouldn't work on a &&String, because you'd have to be able to move the String out from behind the reference, and that's not allowed. However, |x| **x .. would work, because that is just reaching inside the reference without moving anything.
1) I thought the book explanation was good, maybe my example with .cloned() below will be useful. But since .iter() iterates over references, you have to specify reference additionally because find expects a reference.
2) .iter() is iterating over references; therefore, you find a reference.
You could use .cloned() to see what it would look like if you didn't have to deal with references:
assert_eq!(a.iter().cloned().find(|&x| x == 2), Some(2));
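The variants discussed in these answers can be put side by side in a short sketch. The function names are invented for this example; the String case shows the situation described in the footnote, where a && pattern would not work but explicit dereferencing does.

```rust
fn find_by_pattern(a: &[i32], target: i32) -> Option<&i32> {
    // The pattern `&&x` strips both references, so `x` is an `i32`.
    a.iter().find(|&&x| x == target)
}

fn find_by_deref(a: &[i32], target: i32) -> Option<&i32> {
    // Here `x` is `&&i32` and is dereferenced twice by hand.
    a.iter().find(|x| **x == target)
}

fn find_cloned(a: &[i32], target: i32) -> Option<i32> {
    // `cloned()` makes the iterator yield `i32`, so only one `&` remains.
    a.iter().cloned().find(|&x| x == target)
}

fn find_name<'a>(names: &'a [String], target: &str) -> Option<&'a String> {
    // `String` is not `Copy`, so `|&&s|` would try to move out of a reference.
    // Dereferencing for the comparison does not move anything.
    names.iter().find(|s| **s == target)
}

fn main() {
    let a = [1, 2, 3];
    println!("{:?}", find_by_pattern(&a, 2)); // prints Some(2)
    println!("{:?}", find_by_deref(&a, 5)); // prints None
    println!("{:?}", find_cloned(&a, 2)); // prints Some(2)
}
```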

How to use .collect() on each iterator returned by .unzip()?

I have the following code, in which fac returns (MyType, OtherType):
let l = (-1..13).map(|x| {
fac(x).0
}).collect::<Vec<MyType>>();
It works, but I'm throwing away the OtherType values. So I decided to use .unzip, like this:
let (v, r) = (-1..13).map(|x| {
fac(x)
}).unzip();
let l = v.collect::<Vec<MyType>>();
let q = r.collect::<Vec<OtherType>>();
But type inference fails with:
error: the type of this value must be known in this context
let l = v.collect::<Vec<Literal>>();
^~~~~~~~~~~~~~~~~~~~~~~~~~~
let q = r.collect::<Vec<OtherType>>();
^~~~~~~~~~~~~~~~~~~~~~~~~~~
The thing is: I don't know or care what is the concrete type of the iterators (and I would suppose the compiler could infer them, as shown in the first snippet). How to satisfy the compiler in this case?
Also, I would prefer to restructure the code - I don't like to separately call .collect() on both v and r. Ideally I would continue the method chain after .unzip(), returning two Vecs in that expression.
.unzip() doesn't return iterators — it acts like two parallel collect! You can in fact collect the two pieces to different kinds of collections, but let's use vectors for both in this example:
// Give a type hint to determine the collection type
let (v, r): (Vec<MyType>, Vec<OtherType>) = (-1..13).map(|x| {
fac(x)
}).unzip();
It is done this way to be as simple and transparent as possible. Returning two iterators instead would require them to share a common state, a complexity that Rust's iterator library prefers to avoid.
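A self-contained sketch of this, with stand-ins for the question's types: fac below is a hypothetical function, with i64 and String playing the roles of MyType and OtherType. The second function shows the two halves going into different kinds of collections.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the question's `fac`.
fn fac(x: i32) -> (i64, String) {
    (i64::from(x) * 10, format!("item{}", x))
}

fn split_into_vecs() -> (Vec<i64>, Vec<String>) {
    // The return type is the hint that picks both collection types.
    (-1..13).map(fac).unzip()
}

fn split_mixed() -> (Vec<i64>, HashSet<String>) {
    // Each half only needs to implement `Default` and `Extend`.
    (-1..13).map(fac).unzip()
}

fn main() {
    let (v, r) = split_into_vecs();
    println!("{} {}", v.len(), r.len()); // prints 14 14
}
```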

Value and references when providing a closure to Iterator::find

I have still quite a long way to go in learning Rust, but I find the way values and references are used to be inconsistent. This may be born from my own ignorance of the language.
For example, this works:
let x = (1..100).find(|a| a % 2 == 0);
But let x = (1..100).find(|a| a > 50); does not. I am not sure - why though?
Using let x = (1..100).find(|&a| a > 50); fixes the error, but then I thought using &a is like asking for reference of element from the range and hence following should work, but it does not:
let x = (1..100).find(|&a| *a > 50);
Again no idea why!
but then I thought using &a is like asking for reference of element from the range
This is the wrong part of your reasoning. Using & in pattern does exactly the opposite - it implicitly dereferences the matched value:
let &a = &10;
// a is 10, not &10 or &&10
As you probably already know, find() accepts a closure which satisfies FnMut(&T) -> bool, that is, this closure accepts a reference to each element of the iterator, so if you write (1..100).find(|a| ...), a will be of type &i32.
let x = (1..100).find(|a| a % 2 == 0) works because arithmetic operators are overloaded to work on references, so you can apply % to a reference and it still would be able to compile.
Comparison operators are not overloaded to compare a reference with a plain value (here, a &i32 with an i32), and so you need to get an i32 from &i32. This could be done in two ways, first, like you already did:
let x = (1..100).find(|&a| a > 50)
Here we use & patterns to implicitly dereference the function argument. It is equivalent to this one:
let x = (1..100).find(|a| { let a = *a; a > 50 })
Another way would be to dereference the argument explicitly:
let x = (1..100).find(|a| *a > 50)
I thought using &a is like asking for reference of element from the range
Sometimes & is used as an operator, and sometimes it is used as a pattern match. For the closure parameter (|&a|), it is being used as a pattern match. This means that a is bound to the value behind the reference rather than to the reference itself. It is equivalent to writing
let x = (1..100).find(|a| *a > 50);
Non-trivial patterns usually destructure something, i.e., break something into its components. This usually mirrors some construction syntax, so it looks very similar but is actually the inverse. This dualism applies to records, to tuples, to boxes (once those are properly implemented), and also to references:
The expression &x creates a reference to whatever x evaluates to. Here, the & turns a value of type T into one of type &T.
The pattern &a, on the other hand, eliminates the reference, so a is bound to what is behind the reference (note that a could also be another, more complicated pattern). Here, the & goes from a &T value to a T value.
The closures in your examples are all of type &i32 -> bool [1]. So they accept a reference to an integer, and you can either work with that reference (which you do in the first example, which works because arithmetic operators are overloaded for references too) or you can use the pattern &a. In the latter case, a is an i32 (compare the general explanation above, substitute i32 for T), so of course you can't dereference it further.
[1]: This is not actually a real type, but it's close enough for our purposes.
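The cases from these answers can be checked with a short sketch. The helper numbers and the function names are invented for this example; the helper only pins the range's element type to i32. The last function is an extra variation that compares a reference with a reference.

```rust
use std::ops::Range;

// Hypothetical helper that fixes the element type of the range to `i32`.
fn numbers() -> Range<i32> {
    1..100
}

fn first_even() -> Option<i32> {
    // `a` is `&i32`; `%` is implemented for `&i32 % i32`, so this compiles.
    numbers().find(|a| a % 2 == 0)
}

fn over_50_pattern() -> Option<i32> {
    // The `&a` pattern removes the reference, so `a` is an `i32`.
    numbers().find(|&a| a > 50)
}

fn over_50_deref() -> Option<i32> {
    // `a` is `&i32` and is dereferenced explicitly.
    numbers().find(|a| *a > 50)
}

fn over_50_ref_rhs() -> Option<i32> {
    // Comparing a reference with another reference also type-checks.
    numbers().find(|a| a > &50)
}

fn main() {
    println!("{:?}", first_even()); // prints Some(2)
    println!("{:?}", over_50_pattern()); // prints Some(51)
}
```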

Why does Rust put a :: before the parameters in generics sometimes?

When declaring a variable of type vector or a hash map in Rust, we do:
let v: Vec<i32>
let m: HashMap<i32, i32>
To instantiate, we need to call new(). However, we do so thusly:
Vec::<i32>::new()
   ^^
HashMap::<i32, i32>::new()
       ^^
Note the sudden appearance of ::. Coming from C++, these are odd. Why do these occur? Does having a leading :: make IDENTIFIER :: < IDENTIFIER … easier to parse than IDENTIFIER < IDENTIFIER, which might be construed as a less-than operation? (And thus, this is simply a thing to make the language easier to parse? But if so, why not also do it during type specifications, so as to have the two mirror each other?)
(As Shepmaster notes, often Vec::new() is enough; the type can often be inferred.)
When parsing an expression, it would be ambiguous whether a < was the start of a type parameter list or a less-than operator. Rust always assumes the latter and requires ::< for type parameter lists.
When parsing a type, it's always unambiguously a type parameter list, so ::< is never necessary.
In C++, this ambiguity is kept in the parser, which makes parsing C++ much more difficult than parsing Rust. See here for an explanation why this matters.
Anyway, most of the time in Rust, the types can be inferred and you can just write Vec::new(). Since ::< is usually not needed and is fairly ugly, it makes sense to keep only < in types, rather than making the two syntaxes match up.
The two different syntaxes don't even specify the same type parameters necessarily.
In this example:
let mut map: HashMap<K, V>;
K and V fill the type parameters of the struct HashMap declaration, the type itself.
In this expression:
HashMap::<K, V>::new()
K and V fill the type parameters of the impl block where the method new is defined! The impl block need not have the same, as many, or the same default, type parameters as the type itself.
In this particular case, the struct has the parameters HashMap<K, V, S = RandomState> (3 parameters, 1 defaulted). And the impl block containing ::new() has parameters impl<K, V> (2 parameters, not implemented for arbitrary states).
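A short sketch of the two positions side by side: type annotations use plain <...>, while expressions need ::<...>. The function names are invented for this example.

```rust
use std::collections::HashMap;

fn squares() -> Vec<i32> {
    // Expression position: `collect` needs `::<...>` to name its type parameter.
    (1..4).map(|x| x * x).collect::<Vec<i32>>()
}

fn parse_number(s: &str) -> Option<i32> {
    // The same syntax works on any generic method call.
    s.parse::<i32>().ok()
}

fn main() {
    // Type position: plain `<...>`, and `new()` needs no parameters at all.
    let v: Vec<i32> = Vec::new();
    // Expression position: `::<...>` after the type name.
    let m = HashMap::<i32, i32>::new();
    println!("{} {}", v.len(), m.len()); // prints 0 0
    println!("{:?}", squares()); // prints [1, 4, 9]
}
```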
