Array cannot be indexed by RangeFull?

Consider the following example:
use std::ops::Index;
use std::ops::RangeFull;

fn f<T: Index<RangeFull>>(x: T) {}

fn main() {
    let x: [i32; 4] = [0, 1, 2, 3];
    f(x);
}
Upon calling f(x), I get an error:
error[E0277]: the type `[i32; 4]` cannot be indexed by `std::ops::RangeFull`
--> src/main.rs:8:5
|
8 | f(x);
| ^ `[i32; 4]` cannot be indexed by `std::ops::RangeFull`
|
= help: the trait `std::ops::Index<std::ops::RangeFull>` is not implemented for `[i32; 4]`
note: required by `f`
--> src/main.rs:4:1
|
4 | fn f<T: Index<RangeFull>>(x: T) {}
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I am confused. I can obviously write, for example, let y = x[..];. Does this not mean indexing x with RangeFull? Are arrays somehow special in this regard?

As you can see in the documentation for the primitive array type, Index<…> is not directly implemented for arrays. This is partly because it would currently be impossible to provide blanket implementations for all array sizes, but mainly because it's not necessary; the implementation for slices is sufficient for most purposes.
The expression x[..] is translated to *std::ops::Index::index(&x, ..) by the compiler, which in turn is evaluated according to the usual method call semantics. Since there is no implementation of Index<RangeFull> for arrays, the compiler repeatedly dereferences &x and performs an unsized coercion at the end, eventually finding the implementation of Index<RangeFull> for [i32].
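To see that coercion spelled out, here is a minimal sketch; the explicit call below is just the hand-written equivalent of what the method lookup resolves to:
use std::ops::{Index, RangeFull};

fn main() {
    let x: [i32; 4] = [0, 1, 2, 3];
    // &x is a &[i32; 4]; at the call site it is unsize-coerced to &[i32],
    // where an Index<RangeFull> implementation does exist.
    let y: &[i32] = <[i32] as Index<RangeFull>>::index(&x, ..);
    assert_eq!(y, &[0, 1, 2, 3]);
}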
The process of calling a generic function, like f() in your example, is different from method call semantics. The compiler first infers what T is based on the argument you are passing; in this case T is inferred to be [i32; 4]. In the next step, the compiler verifies whether T satisfies the trait bounds, and since it doesn't, you get an error message.
If we want to make your code work, we need to make sure to pass a slice to f(). Since a slice is unsized, we need to pass it by reference, so we need to define f() like this:
fn f<T: ?Sized + Index<RangeFull>>(_: &T) {}
The ?Sized is necessary since type parameters receive an implicit Sized bound. When calling f(), we need to make sure T is actually inferred as [i32] rather than [i32; 4]. To this end, we can either explicitly specify T:
f::<[_]>(&x);
or explicitly perform the unsized conversion before passing the argument, so the compiler infers the desired type:
f(&x as &[_]);
f(&x[..]);
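Putting it all together, a version along these lines compiles (a complete sketch combining the pieces above):
use std::ops::{Index, RangeFull};

fn f<T: ?Sized + Index<RangeFull>>(_: &T) {}

fn main() {
    let x: [i32; 4] = [0, 1, 2, 3];
    f::<[_]>(&x);   // T explicitly specified as [i32]
    f(&x as &[_]);  // explicit unsized coercion
    f(&x[..]);      // indexing already yields a slice, so T is inferred as [i32]
}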

Related

Why does auto borrowing not occur in Rust if I implement `TryFrom` for a reference type?

Let's say I want to implement a conversion on a reference. In this case, it's a conversion from &f64 -> Foo.
use std::convert::{TryFrom, TryInto};

struct Foo {
    a: f64,
}

impl TryFrom<&f64> for Foo {
    type Error = String;

    fn try_from(value: &f64) -> Result<Foo, String> {
        Ok(Foo {
            a: *value,
        })
    }
}

fn main() {
    let foo: Foo = 5.0.try_into().unwrap();
    let bar: Foo = (&5.0).try_into().unwrap();
}
(Yes of course this is a pointless and stupid example, but it helps simplify the problem)
Now, the second line in the main method, with manual borrowing, succeeds.
However, the first line in the main method, without the manual borrowing, fails with this error:
error[E0277]: the trait bound `Foo: From<{float}>` is not satisfied
--> src/main.rs:18:24
|
18 | let foo: Foo = 5.0.try_into().unwrap();
| ^^^^^^^^ the trait `From<{float}>` is not implemented for `Foo`
|
= note: required because of the requirements on the impl of `Into<Foo>` for `{float}`
note: required because of the requirements on the impl of `TryFrom<{float}>` for `Foo`
--> src/main.rs:7:6
|
7 | impl TryFrom<&f64> for Foo {
| ^^^^^^^^^^^^^ ^^^
= note: required because of the requirements on the impl of `TryInto<Foo>` for `{float}`
For more information about this error, try `rustc --explain E0277`.
error: could not compile `playground` due to previous error
Why is automatic borrowing not working here?
Just as the error message suggests, the problem is that the trait bound Foo: From<{float}> is not satisfied. When matching traits, Rust does not perform any coercion; it only probes for a suitable method. This is documented in The Rustonomicon, which reads:
Note that we do not perform coercions when matching traits (except for receivers, see the next page). If there is an impl for some type U and T coerces to U, that does not constitute an implementation for T.
and the next page says
Suppose we have a function foo that has a receiver (a self, &self or &mut self parameter). If we call value.foo(), the compiler needs to determine what type Self is before it can call the correct implementation of the function. ... If it can't call this function (for example, if the function has the wrong type or a trait isn't implemented for Self), then the compiler tries to add in an automatic reference. This means that the compiler tries <&T>::foo(value) and <&mut T>::foo(value). This is called an "autoref" method call.
So when matching a trait bound, the compiler will only auto ref/deref the receiver type. In addition, the dot operator in Rust is just syntax sugar for a fully qualified function call, so 5.0.try_into().unwrap(); becomes f64::try_into(5.0).unwrap();. Since TryInto is not implemented for f64, the compiler tries to auto-reference the receiver, giving <&f64>::try_into(5.0).unwrap();. Now it can find a version of TryInto implemented for &f64, but the argument type still doesn't match: try_into for &f64 requires a &f64 parameter, while the call provides an f64, and the compiler cannot perform any coercion on parameters when checking trait bounds. So the trait bound still doesn't match (&f64 cannot take an f64 argument), the check fails, and you see the error message.
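For completeness: if you also want the un-borrowed call to compile, one option (a sketch, not the only possible design) is to add a second impl for the owned f64 to the example above, so the blanket TryInto impl also applies to the value type:
impl TryFrom<f64> for Foo {
    type Error = String;

    fn try_from(value: f64) -> Result<Foo, String> {
        Ok(Foo { a: value })
    }
}
With both impls in place, both lines in main() compile: 5.0.try_into() now goes through TryFrom<f64>, while (&5.0).try_into() keeps using TryFrom<&f64>.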

How to reduce std::io::Chain

Moving on from https://doc.rust-lang.org/rust-by-example/std_misc/file/read_lines.html, I would like to define a function that accepts an iterable of Paths and returns a Reader that wraps all the paths into a single stream. Here is my non-compiling attempt:
fn read_lines<P, I: IntoIterator<Item = P>>(files: I) -> Result<io::Lines<io::BufReader<File>>>
where
    P: AsRef<Path>,
{
    let handles = files.into_iter()
        .map(|path| File::open(path).unwrap());
    // I guess it is hard (impossible?) to define the type of this reduction,
    // Chain<File, Chain<File, ..., Chain<File, File>>>
    // and that is the reason the compiler is complaining.
    match handles.reduce(|a, b| a.chain(b)) {
        Some(combination) => Ok(BufReader::new(combination).lines()),
        None => {
            // Not nice, hard fail if the array len is 0
            Ok(BufReader::new(handles.next().unwrap()).lines())
        },
    }
}
This gives an expected error, which I am unsure how to address,
error[E0599]: the method `chain` exists for struct `File`, but its trait bounds were not satisfied
--> src/bin.rs:136:35
|
136 | match handles.reduce(|a, b| a.chain(b)) {
| ^^^^^ method cannot be called on `File` due to unsatisfied trait bounds
|
::: /home/test/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/fs.rs:91:1
|
91 | pub struct File {
| --------------- doesn't satisfy `File: Iterator`
|
::: /home/test/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/mod.rs:902:8
|
902 | fn chain<R: Read>(self, next: R) -> Chain<Self, R>
| ----- the method is available for `Box<File>` here
|
= note: the following trait bounds were not satisfied:
`File: Iterator`
which is required by `&mut File: Iterator`
= help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
|
1 | use std::io::Read;
|
error: aborting due to previous error
I've tried contorting the code with Boxes without success, but it seems the fundamental issue is that the type of this reduction is "undefined": Chain<File, Chain<File, ..., Chain<File, File>>> IIUC. How would a Rustacean define a method like this? Is it possible without using dynamic "boxes"?
I guess it is hard (impossible?) to define the type of this reduction, Chain<File, Chain<File, ..., Chain<File, File>>>. [...] How would a Rustacean define a method like this?
The combinator you are looking for is flat_map:
let handles = files.into_iter().map(|path| File::open(path).unwrap());
handles.flat_map(|handle| BufReader::new(handle).lines())
Also, your return type is unnecessarily specific, committing to a particular implementation of both the iterator over the handles and the iterator over the lines coming from a handle. Even if you get it to work, the signature of your function will be tightly coupled to its implementation, meaning you won't be able to e.g. switch to a more efficient approach without introducing a breaking change to the API.
To avoid such coupling, you can use an impl Trait return type. That way the signature of your function only promises that the type of the returned value will implement Iterator. The function could then look like this:
fn read_lines<P, I: IntoIterator<Item = P>>(files: I) -> impl Iterator<Item = io::Result<String>>
where
    P: AsRef<Path>,
{
    let handles = files.into_iter().map(|path| File::open(path).unwrap());
    handles.flat_map(|handle| BufReader::new(handle).lines())
}
Finally, if you really want to combine reduce and chain, you can do that too. Your intuition that you need to use a Box is correct, but it is much easier to use fold() than reduce():
handles.fold(
    Box::new(std::iter::empty()) as Box<dyn Iterator<Item = _>>,
    |iter, handle| Box::new(iter.chain(BufReader::new(handle).lines())),
)
Folding starts with an empty iterator, boxed and cast to a trait object, and proceeds to chain lines of each handle to the end of the previous iterator chain. Each result of the chain is boxed so that its type is erased to Box<dyn Iterator<Item = io::Result<String>>>, which eliminates the recursion on the type level. The return type of the function can be either impl Iterator or Box<dyn Iterator>, both will compile.
Note that this solution is inefficient, not just due to boxing, but also because the final iterator will wrap all the previous ones. Although the recursion is not visible from the erased types, it's there in the implementation, and the final next() will internally have to go through all the stacked iterators, possibly even blowing up the stack if there is a sufficient number of files. The solution based on flat_map() doesn't have that issue.
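For reference, here is a sketch of the fold-based variant packaged as a complete function, under the same assumptions as above (File::open failures are simply unwrapped, and the function name is made up):
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

fn read_lines_chained<P, I: IntoIterator<Item = P>>(files: I) -> impl Iterator<Item = io::Result<String>>
where
    P: AsRef<Path>,
{
    files
        .into_iter()
        .map(|path| File::open(path).unwrap())
        .fold(
            // Start from an empty boxed iterator; each step chains the next
            // file's lines and re-boxes the result to erase the nested Chain type.
            Box::new(std::iter::empty()) as Box<dyn Iterator<Item = io::Result<String>>>,
            |iter, handle| Box::new(iter.chain(BufReader::new(handle).lines())),
        )
}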

Why does the usage of by_ref().take() differ between the Iterator and Read traits?

Here are two functions:
fn foo<I>(iter: &mut I)
where
    I: std::iter::Iterator<Item = u8>,
{
    let x = iter.by_ref();
    let y = x.take(2);
}

fn bar<I>(iter: &mut I)
where
    I: std::io::Read,
{
    let x = iter.by_ref();
    let y = x.take(2);
}
While the first compiles fine, the second gives the compilation error:
error[E0507]: cannot move out of borrowed content
--> src/lib.rs:14:13
|
14 | let y = x.take(2);
| ^ cannot move out of borrowed content
The signatures of by_ref and take are almost identical in std::iter::Iterator and std::io::Read traits, so I supposed that if the first one compiles, the second will compile too. Where am I mistaken?
This is indeed a confusing error message, and the reason you get it is rather subtle. The answer by ozkriff correctly explains that this is because the Read trait is not in scope. I'd like to add a bit more context, and an explanation why you are getting the specific error you see, rather than an error that the method wasn't found.
The take() method on Read and Iterator takes self by value, or in other words, it consumes its receiver. This means you can only call it if you have ownership of the receiver. The functions in your question accept iter by mutable reference, so they don't own the underlying I object, so you can't call <Iterator>::take() or <Read>::take() for the underlying object.
However, as pointed out by ozkriff, the standard library provides "forwarding" implementations of Iterator and Read for mutable references to types that implement the respective traits. When you call iter.take(2) in your first function, you actually end up calling <&mut Iterator<Item = T>>::take(iter, 2), which only consumes your mutable reference to the iterator, not the iterator itself. This is perfectly valid; while the function can't consume the iterator itself since it does not own it, the function does own the reference. In the second function, however, you end up calling <Read>::take(*iter, 2), which tries to consume the underlying reader. Since you don't own that reader, you get an error message explaining that you can't move it out of the borrowed context.
So why does the second method call resolve to a different method? The answer by ozkriff already explains that this happens because the Iterator trait is in the standard prelude, while the Read trait isn't in scope by default. Let's look at the method lookup in more detail. It is documented in the section "Method call expressions" of the Rust language reference:
The first step is to build a list of candidate receiver types. Obtain these by repeatedly dereferencing the receiver expression's type, adding each type encountered to the list, then finally attempting an unsized coercion at the end, and adding the result type if that is successful. Then, for each candidate T, add &T and &mut T to the list immediately after T.
According to this rule, our list of candidate types is
&mut I, &&mut I, &mut &mut I, I, &I, &mut I
Then, for each candidate type T, search for a visible method with a receiver of that type in the following places:
T's inherent methods (methods implemented directly on T).
Any of the methods provided by a visible trait implemented by T. If T is a type parameter, methods provided by trait bounds on T are looked up first. Then all remaining methods in scope are looked up.
For the case I: Iterator, this process starts with looking up a take() method on &mut I. There are no inherent methods on &mut I, since I is a generic type, so we can skip step 1. In step 2, we first look up methods on trait bounds for &mut I, but there are only trait bounds for I, so we move on to looking up take() on all remaining methods in scope. Since Iterator is in scope, we indeed find the forwarding implementation from the standard library, and can stop processing our list of candidate types.
For the second case, I: Read, we also start with &mut I, but since Read is not in scope, we won't see the forwarding implementation. Once we get to I in our list of candidate types, though, the clause on methods provided by trait bounds kicks in: they are looked up first, regardless of whether the trait is in scope. I has a trait bound of Read, so <Read>::take() is found. As we have seen above, calling this method causes the error message.
In summary, traits must be in scope to use their methods, but methods on trait bounds can be used even if the trait isn't in scope.
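Here is a small, self-contained illustration of that rule; the trait and function names are made up for the example:
mod helpers {
    pub trait Consume {
        fn consume(self) -> u8;
    }

    impl Consume for u8 {
        fn consume(self) -> u8 {
            self
        }
    }
}

// helpers::Consume is never brought into scope with `use`, yet the call
// resolves, because the method comes from a trait bound on the type parameter.
fn via_bound<T: helpers::Consume>(x: T) -> u8 {
    x.consume()
}

fn main() {
    assert_eq!(via_bound(7u8), 7);
    // By contrast, calling 7u8.consume() directly here would not compile
    // unless we added `use helpers::Consume;`.
}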
impl<'a, I: Iterator + ?Sized> Iterator for &'a mut I is the reason why the first function compiles. It implements Iterator for all mutable references to iterators.
The Read trait has the equivalent, but, unlike Iterator, the Read trait isn't in the prelude, so you'll need to use std::io::Read to use this impl:
use std::io::Read; // remove this to get the "cannot move out of borrowed content" error

fn foo<I, T>(iter: &mut I)
where
    I: std::iter::Iterator<Item = T>,
{
    let _y = iter.take(2);
}

fn bar<I>(iter: &mut I)
where
    I: std::io::Read,
{
    let _y = iter.take(2);
}

Why are explicit lifetimes needed in Rust?

I was reading the lifetimes chapter of the Rust book, and I came across this example for a named/explicit lifetime:
struct Foo<'a> {
    x: &'a i32,
}

fn main() {
    let x;                    // -+ x goes into scope
                              //  |
    {                         //  |
        let y = &5;           // ---+ y goes into scope
        let f = Foo { x: y }; // ---+ f goes into scope
        x = &f.x;             //  | | error here
    }                         // ---+ f and y go out of scope
                              //  |
    println!("{}", x);        //  |
}                             // -+ x goes out of scope
It's quite clear to me that the error being prevented by the compiler is the use-after-free of the reference assigned to x: after the inner scope is done, f and therefore &f.x become invalid, and should not have been assigned to x.
My issue is that the problem could have easily been analyzed away without using the explicit 'a lifetime, for instance by inferring an illegal assignment of a reference to a wider scope (x = &f.x;).
In which cases are explicit lifetimes actually needed to prevent use-after-free (or some other class?) errors?
The other answers all have salient points (fjh's concrete example where an explicit lifetime is needed), but are missing one key thing: why are explicit lifetimes needed when the compiler will tell you you've got them wrong?
This is actually the same question as "why are explicit types needed when the compiler can infer them". A hypothetical example:
fn foo() -> _ {
    ""
}
Of course, the compiler can see that I'm returning a &'static str, so why does the programmer have to type it?
The main reason is that while the compiler can see what your code does, it doesn't know what your intent was.
Functions are a natural boundary to firewall the effects of changing code. If we were to allow lifetimes to be completely inspected from the code, then an innocent-looking change might affect the lifetimes, which could then cause errors in a function far away. This isn't a hypothetical example. As I understand it, Haskell has this problem when you rely on type inference for top-level functions. Rust nipped that particular problem in the bud.
There is also an efficiency benefit to the compiler — only function signatures need to be parsed in order to verify types and lifetimes. More importantly, it has an efficiency benefit for the programmer. If we didn't have explicit lifetimes, what does this function do:
fn foo(a: &u8, b: &u8) -> &u8
It's impossible to tell without inspecting the source, which would go against a huge number of coding best practices.
by inferring an illegal assignment of a reference to a wider scope
Scopes are lifetimes, essentially. A bit more clearly, a lifetime 'a is a generic lifetime parameter that can be specialized with a specific scope at compile time, based on the call site.
are explicit lifetimes actually needed to prevent [...] errors?
Not at all. Lifetimes are needed to prevent errors, but explicit lifetimes are needed to protect what little sanity programmers have.
Let's have a look at the following example.
fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'a u32 {
    x
}

fn main() {
    let x = 12;
    let z: &u32 = {
        let y = 42;
        foo(&x, &y)
    };
}
Here, the explicit lifetimes are important. This compiles because the result of foo has the same lifetime as its first argument ('a), so it may outlive its second argument. This is expressed by the lifetime names in the signature of foo. If you switched the arguments in the call to foo the compiler would complain that y does not live long enough:
error[E0597]: `y` does not live long enough
--> src/main.rs:10:5
|
9 | foo(&y, &x)
| - borrow occurs here
10 | };
| ^ `y` dropped here while still borrowed
11 | }
| - borrowed value needs to live until here
The lifetime annotation in the following structure:
struct Foo<'a> {
    x: &'a i32,
}
specifies that a Foo instance shouldn't outlive the reference it contains (x field).
The example you came across in the Rust book doesn't illustrate this because the f and y variables go out of scope at the same time.
A better example would be this:
fn main() {
    let f: Foo;
    {
        let n = 5; // variable that is invalid outside this block
        let y = &n;
        f = Foo { x: y };
    };
    println!("{}", f.x);
}
Now, f really outlives the variable pointed to by f.x.
Note that there are no explicit lifetimes in that piece of code, except the structure definition. The compiler is perfectly able to infer lifetimes in main().
In type definitions, however, explicit lifetimes are unavoidable. For example, there is an ambiguity here:
struct RefPair(&u32, &u32);
Should these be different lifetimes or should they be the same? It does matter from the usage perspective, struct RefPair<'a, 'b>(&'a u32, &'b u32) is very different from struct RefPair<'a>(&'a u32, &'a u32).
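A minimal sketch of that difference (the struct and function names here are made up for illustration):
struct RefPairSame<'a>(&'a u32, &'a u32);
struct RefPairDiff<'a, 'b>(&'a u32, &'b u32);

fn first_diff<'a, 'b>(p: &RefPairDiff<'a, 'b>) -> &'a u32 {
    p.0
}

fn first_same<'a>(p: &RefPairSame<'a>) -> &'a u32 {
    p.0
}

fn main() {
    let x = 1;
    let outer;
    {
        let y = 2;

        let diff = RefPairDiff(&x, &y);
        // Fine: the returned reference is tied only to the borrow of x ('a).
        outer = first_diff(&diff);

        let same = RefPairSame(&x, &y);
        // With a single lifetime both borrows are unified, so the result is
        // also limited by y's scope; writing `outer = first_same(&same);`
        // instead would fail with "`y` does not live long enough".
        let _inner_only = first_same(&same);
    }
    println!("{}", outer);
}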
Now, for simple cases, like the one you provided, the compiler could theoretically elide lifetimes like it does in other places, but such cases are very limited, they are not worth the extra complexity in the compiler, and the gain in clarity would be questionable at best.
If a function receives two references as arguments and returns a reference, then the implementation of the function might sometimes return the first reference and sometimes the second one. It is impossible to predict which reference will be returned for a given call. In this case, it is impossible to infer a lifetime for the returned reference, since each argument reference may refer to a different variable binding with a different lifetime. Explicit lifetimes help to avoid or clarify such a situation.
Likewise, if a structure holds two references (as two member fields) then a member function of the structure may sometimes return the first reference and sometimes the second one. Again explicit lifetimes prevent such ambiguities.
In a few simple situations, there is lifetime elision where the compiler can infer lifetimes.
I've found another great explanation here: http://doc.rust-lang.org/0.12.0/guide-lifetimes.html#returning-references.
In general, it is only possible to return references if they are derived from a parameter to the procedure. In that case, the pointer result will always have the same lifetime as one of the parameters; named lifetimes indicate which parameter that is.
The case from the book is very simple by design. The topic of lifetimes is deemed complex.
The compiler cannot easily infer the lifetime in a function with multiple arguments.
Also, my own optional crate has an OptionBool type with an as_slice method whose signature actually is:
fn as_slice(&self) -> &'static [bool] { ... }
There is absolutely no way the compiler could have figured that one out.
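Here is a self-contained imitation of that situation (this is not the actual optional crate code, just an illustration of why such a signature cannot be inferred):
static BOTH: [bool; 2] = [false, true];

struct OptionBoolLike(u8);

impl OptionBoolLike {
    // The returned slice borrows from a static, not from self; the elision
    // rules would tie the output lifetime to &self, so the stronger 'static
    // guarantee has to be written explicitly.
    fn as_slice(&self) -> &'static [bool] {
        &BOTH
    }
}

fn main() {
    let s;
    {
        let v = OptionBoolLike(1);
        s = v.as_slice(); // s outlives v, which is only possible thanks to 'static
    }
    println!("{:?}", s);
}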
As a newcomer to Rust, my understanding is that explicit lifetimes serve two purposes.
Putting an explicit lifetime annotation on a function restricts the type of code that may appear inside that function. Explicit lifetimes allow the compiler to ensure that your program is doing what you intended.
If you (the compiler) want(s) to check if a piece of code is valid, you (the compiler) will not have to iteratively look inside every function called. It suffices to have a look at the annotations of functions that are directly called by that piece of code. This makes your program much easier to reason about for you (the compiler), and makes compile times manageable.
On point 1, consider the following program written in Python:
import pandas as pd
import numpy as np

def second_row(ar):
    return ar[0]

def work(second):
    df = pd.DataFrame(data=second)
    df.loc[0, 0] = 1

def main():
    # .. load data ..
    ar = np.array([[0, 0], [0, 0]])
    # .. do some work on second row ..
    second = second_row(ar)
    work(second)
    # .. much later ..
    print(repr(ar))

if __name__ == "__main__":
    main()
which will print
array([[1, 0],
       [0, 0]])
This type of behaviour always surprises me. What is happening is that df is sharing memory with ar, so when some of the content of df changes in work, that change infects ar as well. However, in some cases this may be exactly what you want, for memory efficiency reasons (no copy). The real problem in this code is that the function second_row is returning the first row instead of the second; good luck debugging that.
Consider instead a similar program written in Rust:
#[derive(Debug)]
struct Array<'a, 'b>(&'a mut [i32], &'b mut [i32]);

impl<'a, 'b> Array<'a, 'b> {
    fn second_row(&mut self) -> &mut &'b mut [i32] {
        &mut self.0
    }
}

fn work(second: &mut [i32]) {
    second[0] = 1;
}

fn main() {
    // .. load data ..
    let ar1 = &mut [0, 0][..];
    let ar2 = &mut [0, 0][..];
    let mut ar = Array(ar1, ar2);
    // .. do some work on second row ..
    {
        let second = ar.second_row();
        work(second);
    }
    // .. much later ..
    println!("{:?}", ar);
}
Compiling this, you get
error[E0308]: mismatched types
--> src/main.rs:6:13
|
6 | &mut self.0
| ^^^^^^^^^^^ lifetime mismatch
|
= note: expected type `&mut &'b mut [i32]`
found type `&mut &'a mut [i32]`
note: the lifetime 'b as defined on the impl at 4:5...
--> src/main.rs:4:5
|
4 | impl<'a, 'b> Array<'a, 'b> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
note: ...does not necessarily outlive the lifetime 'a as defined on the impl at 4:5
--> src/main.rs:4:5
|
4 | impl<'a, 'b> Array<'a, 'b> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
In fact you get two errors; there is also one with the roles of 'a and 'b interchanged. Looking at the annotation of second_row, we find that the output should be &mut &'b mut [i32], i.e., the output is supposed to be a reference to a reference with lifetime 'b (the lifetime of the second row of Array). However, because we are returning the first row (which has lifetime 'a), the compiler complains about lifetime mismatch. At the right place. At the right time. Debugging is a breeze.
The reason your example does not work is simply that Rust only has local lifetime and type inference. What you are suggesting demands global inference. Whenever you have a reference whose lifetime cannot be elided, it must be annotated.
I think of a lifetime annotation as a contract stating that a given ref is valid in the receiving scope only while it remains valid in the source scope. Declaring more references with the same lifetime effectively merges those scopes, meaning that all the source refs have to satisfy this contract.
Such annotations allow the compiler to check that the contract is fulfilled.

How does Vec<T> implement iter()?

I am looking at the code of Vec<T> to see how it implements iter() as I want to implement iterators for my struct:
pub struct Column<T> {
    name: String,
    vec: Vec<T>,
    ...
}
My goal is to not expose the fields, and to provide iterators to do looping, max, min, sum, avg, etc. for a column.
fn test() {
    let col: Column<f32> = ...;
    let max = col.iter().max();
}
I thought I would see how Vec<T> does iteration. I can see iter() is defined in SliceExt but it's implemented for [T] and not Vec<T> so I am stumped how you can call iter() from Vec<T>?
Indeed, as fjh said, this happens due to how the dereference operator functions in Rust and how methods are resolved.
Rust has a special Deref trait which allows values of types implementing it to be "dereferenced" to obtain another type, usually one which is naturally connected to the source type. For example, an implementation like this one:
impl<T> Deref for Vec<T> {
    type Target = [T];

    fn deref<'a>(&'a self) -> &'a [T] { self.as_slice() }
}
means that applying the unary * operator to a Vec<T> yields [T], which you then need to borrow again:
let v: Vec<u32> = vec![0; 10];
let s: &[u32] = &*v;
(note that even though deref() returns a reference, the dereference operator * returns Target, not &Target - the compiler inserts automatic dereference if you do not borrow the dereferenced value immediately).
This is the first piece of puzzle. The second one is how methods are resolved. Basically, when you write something like
v.iter()
the compiler first tries to find iter() defined on the type of v (in this case Vec<u32>). If no such method can be found, the compiler tries to insert an appropriate number of *s and &s so the method invocation becomes valid. In this case it finds that the following is indeed a valid invocation:
(&*v).iter()
Remember, Deref on Vec<T> returns &[T], and slices do have an iter() method defined on them. This is also how you can invoke e.g. a method taking &self on a regular value - the compiler automatically inserts a reference operation for you.
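To bring this back to the original Column<T>: here is a sketch (assuming only the fields shown in the question) of two common ways to get col.iter(), either by delegating to the inner Vec or by implementing Deref to a slice:
use std::ops::Deref;

pub struct Column<T> {
    name: String,
    vec: Vec<T>,
}

impl<T> Column<T> {
    // Option 1: a plain delegating method.
    pub fn iter(&self) -> std::slice::Iter<'_, T> {
        self.vec.iter()
    }
}

// Option 2: deref to a slice, so all slice methods (iter, len, ...) become
// available on a &Column<T> through auto-deref, just like with Vec<T>.
impl<T> Deref for Column<T> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        &self.vec
    }
}

fn main() {
    let col = Column {
        name: "score".to_string(),
        vec: vec![1.0f32, 3.5, 2.0],
    };
    // f32 is not Ord, so Iterator::max() is unavailable; folding with f32::max
    // is one way to get the maximum of a float column.
    let max = col.iter().cloned().fold(f32::MIN, f32::max);
    println!("{} max = {}", col.name, max);
}
Note that implementing Deref for a type that is not a smart pointer is a design trade-off; a delegating iter() (or an IntoIterator impl for &Column<T>) is often the more conservative choice.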
