Rust "random access" iterator - rust

I would like to write an iterator for objects that implement a trait that allows accessing items by index. Something like this:
trait Collection {
    fn get_item(&self, index: usize) -> &Item;
    fn get_item_mut(&mut self, index: usize) -> &mut Item;
}
I know how to implement the next() function for both Iter and IterMut. My question is which other Iterator methods I need to implement to make the iterators as efficient as possible. For example, nth() should go directly to the item rather than calling next() until it reaches it. I'd like to know the minimum number of functions I need to implement to make it work as fast as a slice iterator.
I've googled about this but can't find anything concrete on this topic.

On nightly, you can implement advance_by(). This will give you an efficient nth() for free, as well as other code that uses it in the standard library.
On stable, you need to implement nth() yourself, along with size_hint() and ExactSizeIterator. For an extra bit of performance you can also implement TrustedLen on nightly, allowing e.g. collect::<Vec<_>>() to take advantage of the fact that the number of items yielded is guaranteed.
Unfortunately, you can never be as efficient as slice iterators, because they implement various std-private traits (e.g. TrustedRandomAccess) and libstd uses those traits with various specializations to make them more efficient in various situations.
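To make the stable part concrete, here is a minimal sketch of a shared-reference iterator that overrides nth() and size_hint() and implements ExactSizeIterator. It rests on a few assumptions that are not in the question: the trait is trimmed down and gains a len() method (the iterator needs some way to know where to stop), Item is a stand-in struct, and VecCollection is a hypothetical implementor used only for the demo. IterMut is left out because handing out &mut items by index generally needs unsafe code.

```rust
// Stand-in item type; the question does not define `Item`.
#[derive(Debug, PartialEq)]
pub struct Item(pub u32);

// Trimmed-down version of the trait from the question.
// `len` is an assumption: the iterator must know where to stop.
pub trait Collection {
    fn len(&self) -> usize;
    fn get_item(&self, index: usize) -> &Item;
}

pub struct Iter<'a, C: Collection> {
    collection: &'a C,
    pos: usize, // index of the next item to yield
    end: usize, // one past the last valid index
}

impl<'a, C: Collection> Iter<'a, C> {
    pub fn new(collection: &'a C) -> Self {
        Iter { collection, pos: 0, end: collection.len() }
    }
}

impl<'a, C: Collection> Iterator for Iter<'a, C> {
    type Item = &'a Item;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos >= self.end {
            return None;
        }
        let item = self.collection.get_item(self.pos);
        self.pos += 1;
        Some(item)
    }

    // Jump straight to the requested position instead of stepping with next().
    fn nth(&mut self, n: usize) -> Option<Self::Item> {
        self.pos = self.pos.saturating_add(n).min(self.end);
        self.next()
    }

    // Exact bounds let callers such as collect() preallocate.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.end - self.pos;
        (remaining, Some(remaining))
    }
}

// The default len() relies on the exact size_hint() above.
impl<'a, C: Collection> ExactSizeIterator for Iter<'a, C> {}

// Hypothetical implementor used only for the demo.
pub struct VecCollection(pub Vec<Item>);

impl Collection for VecCollection {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn get_item(&self, index: usize) -> &Item {
        &self.0[index]
    }
}

fn main() {
    let c = VecCollection(vec![Item(10), Item(20), Item(30), Item(40)]);
    let mut it = Iter::new(&c);
    assert_eq!(it.nth(2), Some(&Item(30))); // skips two items without visiting them
    assert_eq!(it.len(), 1);
}
```

Methods such as skip() and step_by() are built on nth() in the standard library, so they should benefit from the override as well.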

Related

Why can't RefCell be used as a self parameter

You can do this:
impl Foo {
    fn foo(self: &Rc<Self>) {}
}
But not this:
impl Foo {
    fn foo(self: &Rc<RefCell<Self>>) {}
}
The former is quite useful - e.g. I can have methods return objects containing weak references to self. But because I can't use RefCell I can't return anything that would mutate self.
There are ways around this (e.g. wrapping the whole struct in RefCell internally) but none as convenient for my current task as just allowing self: &Rc<RefCell<>>.
The grammar allowed is described here. It allows Box, Rc, Arc and Pin but not RefCell. Why?
As of the time of this writing, in August 2022, method receiver type support is still fairly limited to a handful of types, and compositions of those types. You've noticed that RefCell is not among them.
arbitrary_self_types, a feature for expanding the possible types of self, has a tracking issue for discussing its implementation. From the discussion, it seems that this feature is currently targeting types that implement Deref[Mut], which RefCell does not implement. A few comments point out that this is limiting though, so there's still a possibility.
Ultimately, I think the answer is that RefCell does not work because the full-fledged feature hasn't been designed thoroughly enough yet. The Rust team likely implemented the basic feature so that we could have Pin as a self type and thus make futures work, and added a few other "easy" types to get more bang for their buck, but a proper and generic implementation has been deferred.
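In the meantime, one workaround that works on stable is to take the Rc<RefCell<Self>> as an ordinary parameter of an associated function instead of as a self receiver. The sketch below assumes a hypothetical Foo with a count field and a hypothetical Handle type that holds a weak reference. The only cost is that you call Foo::handle(&foo) rather than foo.handle().

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical struct standing in for the asker's type.
struct Foo {
    count: u32,
}

// Hypothetical object that keeps a weak reference and can mutate the Foo.
struct Handle {
    target: Weak<RefCell<Foo>>,
}

impl Foo {
    // Associated function, not a method: `this` is a plain parameter,
    // so the receiver-type restrictions do not apply.
    fn handle(this: &Rc<RefCell<Self>>) -> Handle {
        Handle { target: Rc::downgrade(this) }
    }
}

impl Handle {
    // Returns false if the Foo has already been dropped.
    fn increment(&self) -> bool {
        match self.target.upgrade() {
            Some(foo) => {
                foo.borrow_mut().count += 1;
                true
            }
            None => false,
        }
    }
}

fn main() {
    let foo = Rc::new(RefCell::new(Foo { count: 0 }));
    let handle = Foo::handle(&foo);
    assert!(handle.increment());
    assert_eq!(foo.borrow().count, 1);
}
```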

What are the detriments of having an Iterator return itself?

Is a pattern that includes an iterator that returns itself an anti-pattern? Is this a good idea, or not?
impl Iterator for MySequence {
type Item = Self;
My sequence returns an element, and it's not immediately apparent why that element could not itself be a sequence. Is this good practice?
The only method that Iterator requires you to implement is:
fn next(&mut self) -> Option<Self::Item>
If Self::Item is Self then there would be no way to get anything out of the iterator other than more iterators.
You could of course implement other methods and/or traits that would allow you to obtain values from the iterator. However, this is not the behaviour expected by many of the use cases for iterators in the standard library (or, for that matter, third-party libraries).
Some examples:
Iterator methods such as sum, cmp, min etc expect to receive the values directly from next.
You could work around this by implementing underlying traits like Sum, Ord, etc for your iterator
Other methods like map could be used, but they would be less ergonomic.
Generic functions & structs may express requirements on type parameters in terms of the iterator's item. An example of this is the collect method, which collects the iterator's items into a container - but to do that it needs to know what type the items will have.
In short: because your iterator does not comply with the expected behaviour, you will struggle to use it in many of the places where you would expect to use an Iterator.
What is useful is an iterator that can be cloned, such that you can for instance return to a point in iteration if needed. You mentioned in the comments:
You can also save any value of seq.next() which is itself an iterator.
If the iterator implements Clone then you can save it with:
let saved = iter.clone();
That has the advantage that you are only cloning it when needed, rather than on each iteration.
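As a small illustration, here is a sketch with a hypothetical Countdown iterator (not the asker's MySequence) that yields plain u32 values and derives Clone. Cloning captures the current position, and the clone can be replayed independently of the original.

```rust
// Hypothetical iterator used only for illustration.
#[derive(Clone)]
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32; // yields values, not more iterators

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        let value = self.remaining;
        self.remaining -= 1;
        Some(value)
    }
}

fn main() {
    let mut iter = Countdown { remaining: 5 };
    iter.next(); // yields 5
    iter.next(); // yields 4

    // Save the current position only when it is actually needed.
    let saved = iter.clone();

    let rest: Vec<u32> = iter.collect();
    let replayed: Vec<u32> = saved.collect();
    assert_eq!(rest, replayed);
}
```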

How to idiomatically require a trait to retrieve a slice

I'm working on a library that operates on [T] slices. I would like the library user to specify a type that can retrieve a slice in a mutable or immutable way.
My current lib defines 2 traits
pub trait SliceWrapper<T> {
    fn slice(&self) -> &[T];
}
pub trait SliceWrapperMut<T> {
    fn slice_mut(&mut self) -> &mut [T];
}
But the names slice and slice_mut seem arbitrary and not somewhere in the core Rust libs.
Is there a trait I should be requiring instead, like Into::<&mut [T]> ?
I'm afraid that Into consumes the type rather than just referencing it, so I can't simply demand the caller type implements core::convert::Into, can I?
The simplest way to answer this is to look at the documentation for existing types that more or less do what you want. This sounds like something Vec would do, so why not look at the documentation for Vec?
If you search through it looking for -> &, you'll find numerous methods that return borrowed slices:
Vec::as_slice
Borrow::borrow
Index::<Range<usize>>::index (plus for RangeTo<usize>, RangeFrom<usize>, RangeFull, RangeInclusive<usize>, and RangeToInclusive<usize>)
Deref::deref
AsRef::<[T]>::as_ref
Which of these should you implement? I don't know; which ones make sense? Read the documentation on each method (and the associated trait) and see if what it describes is what you want to permit. You say "that can retrieve a slice", but you don't really explain what that means. A big part of traits is not just abstracting out common interfaces, but giving those interfaces meaning beyond what the code strictly allows.
If the methods listed aren't right; if none of them quite convey the correct semantics, then make a new trait. Don't feel compelled to implement a trait just because you technically can.
As for Into, again, read the documentation:
A conversion that consumes self, which may or may not be expensive.
Emphasis mine. Implementing Into in this context makes no sense: if you consume the value, you can't borrow from it. Oh, also:
Library authors should not directly implement this trait, but should prefer implementing the From trait, which offers greater flexibility and provides an equivalent Into implementation for free, thanks to a blanket implementation in the standard library.
So yeah, I wouldn't use Into for this. Or From.
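If the AsRef/AsMut semantics (a cheap reference-to-reference conversion) do match what the library needs, the bounds could look roughly like the sketch below. This is only one of the options listed above, not a recommendation over the others. The function names total and double_in_place and the Samples type are made up for illustration, and i32 stands in for the library's T.

```rust
// Read-only access: anything that can be viewed as a slice of i32.
fn total<S: AsRef<[i32]> + ?Sized>(data: &S) -> i32 {
    data.as_ref().iter().sum()
}

// Mutable access: anything that can be viewed as a mutable slice of i32.
fn double_in_place<S: AsMut<[i32]> + ?Sized>(data: &mut S) {
    for x in data.as_mut() {
        *x *= 2;
    }
}

// Hypothetical user-defined type that opts in by implementing both traits.
struct Samples {
    values: Vec<i32>,
}

impl AsRef<[i32]> for Samples {
    fn as_ref(&self) -> &[i32] {
        &self.values
    }
}

impl AsMut<[i32]> for Samples {
    fn as_mut(&mut self) -> &mut [i32] {
        &mut self.values
    }
}

fn main() {
    // Vec already implements both traits, so it works without extra code.
    let mut v: Vec<i32> = vec![1, 2, 3];
    double_in_place(&mut v);
    assert_eq!(total(&v), 12);

    let mut s = Samples { values: vec![4, 5] };
    double_in_place(&mut s);
    assert_eq!(total(&s), 18);
}
```

Neither function consumes its argument, which is the property Into cannot provide.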

Indexing vector by a 32-bit integer

In Rust, vectors are indexed using usize, so when writing
let my_vec: Vec<&str> = vec!["Hello", "world"];
let index: u32 = 0;
println!("{}", my_vec[index]);
you get an error, as index is expected to be of type usize. I'm aware that this can be fixed by explicitly converting index to usize:
my_vec[index as usize]
but this is tedious to write. Ideally I'd simply overload the [] operator by implementing
impl<T> std::ops::Index<u32> for Vec<T> { ... }
but that's impossible as Rust prohibits this (as neither the trait nor struct are local). The only alternative that I can see is to create a wrapper class for Vec, but that would mean having to write lots of function wrappers as well. Is there any more elegant way to address this?
Without a clear use case it is difficult to recommend the best approach.
There are basically two questions here:
do you really need indexing?
do you really need to use u32 for indices?
When using functional programming style, indexing is generally unnecessary as you operate on iterators instead. In this case, the fact that Vec only implements Index for usize really does not matter.
If your algorithm really needs indexing, then why not use usize? There are many places where you can convert from u32 to usize. Converting at the last possible moment is one option, but you can also do the conversion elsewhere, and if you find (or create) a chokepoint you can get away with only a handful of conversions.
At least, that's the YAGNI point of view.
Personally, as a type freak, I tend to wrap things around a lot. I just like to add semantic information, because let's face it Vec<i32> just doesn't mean anything.
Rust offers a simple way to create wrapper structures: struct MyType(WrappedType);. That's it.
Once you have your own type, adding indexing is easy. There are several ways to add other operations:
if only a few operations make sense, then adding explicitly is best.
if many operations are necessary, and you do not mind exposing the fact that underneath is a Vec<X>, then you can expose it:
by making it public: struct MyType(pub WrappedType);, users can then call .0 to access it.
by implementing AsRef and AsMut, or creating a getter.
by implementing Deref and DerefMut (which is implicit, make sure you really want to).
Of course, breaking encapsulation can be annoying later, as it also prevents the maintenance of invariants, so I would consider it a last ditch solution.
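A minimal sketch of such a wrapper, assuming a hypothetical Names type around Vec<String>: it implements Index<u32> explicitly and, as the last-ditch option from the list, exposes the inner Vec read-only through Deref.

```rust
use std::ops::{Deref, Index};

// Hypothetical newtype around the vector.
struct Names(Vec<String>);

// Indexing by u32 is allowed here because `Names` is a local type.
impl Index<u32> for Names {
    type Output = String;

    fn index(&self, index: u32) -> &String {
        &self.0[index as usize]
    }
}

// Optional: expose the read-only Vec API (len, iter, ...) implicitly.
impl Deref for Names {
    type Target = Vec<String>;

    fn deref(&self) -> &Vec<String> {
        &self.0
    }
}

fn main() {
    let names = Names(vec!["Hello".to_string(), "world".to_string()]);
    let index: u32 = 1;
    assert_eq!(names[index], "world");
    assert_eq!(names.len(), 2); // reaches Vec::len through Deref
}
```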
I prefer to store "references" to nodes as u32 rather than usize. So when traversing the graph I keep retrieving adjacent vertex "references", which I then use to look up the actual vertex object in the Vec object
So actually you don't want u32, because you will never do calculations on it, and u32 easily allows you to do math. You want an index-type that can just do indexing but whose values are immutable otherwise.
I suggest you implement something along the lines of rustc_data_structures::indexed_vec::IndexVec.
This custom IndexVec type is not only generic over the element type, but also over the index type, and thus allows you to use a NodeId newtype wrapper around u32. You'll never accidentally use a non-id u32 to index, and you can use them just as easily as a u32. You don't even have to create any of these indices by calculating them from the vector length, instead the push method returns the index of the location where the element has just been inserted.
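A much-simplified sketch of the idea follows. It is not rustc's actual IndexVec: the NodeVec and NodeId names are made up, and unlike IndexVec the container is hard-wired to one index type rather than being generic over it.

```rust
use std::ops::{Index, IndexMut};

// Opaque id: it can be copied and compared, but offers no arithmetic.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct NodeId(u32);

// Hypothetical container that can only be indexed by NodeId.
struct NodeVec<T> {
    items: Vec<T>,
}

impl<T> NodeVec<T> {
    fn new() -> Self {
        NodeVec { items: Vec::new() }
    }

    // Returns the id of the slot the element was inserted into,
    // so callers never compute indices themselves.
    fn push(&mut self, value: T) -> NodeId {
        assert!(self.items.len() < u32::MAX as usize, "too many nodes for a u32 id");
        let id = NodeId(self.items.len() as u32);
        self.items.push(value);
        id
    }
}

impl<T> Index<NodeId> for NodeVec<T> {
    type Output = T;

    fn index(&self, id: NodeId) -> &T {
        &self.items[id.0 as usize]
    }
}

impl<T> IndexMut<NodeId> for NodeVec<T> {
    fn index_mut(&mut self, id: NodeId) -> &mut T {
        &mut self.items[id.0 as usize]
    }
}

fn main() {
    let mut graph: NodeVec<&str> = NodeVec::new();
    let a = graph.push("a");
    let b = graph.push("b");
    assert_eq!(graph[a], "a");
    graph[b] = "c";
    assert_eq!(graph[b], "c");
    // graph[0u32] would not compile: only NodeId is accepted as an index.
}
```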

Is it recommended to use traits to implement utility functions for structs in external crate?

I want to implement a simple utility/helper function in Rust. The function just concatenates the path in a struct (from an external crate) and the argument passed. Is it more idiomatic to implement the helper-function as a normal function or as function of a custom trait?
The implementation of the trait-based approach:
use std::path::{Path, PathBuf};
pub trait RepositoryExt {
    fn get_full_path(&self, path_in_repository: &Path) -> PathBuf;
}
impl RepositoryExt for othercrate::Repository {
    // othercrate::Repository's workdir() returns its path
    fn get_full_path(&self, path_in_repository: &Path) -> PathBuf {
        self.workdir().join(path_in_repository)
    }
}
With just a function:
pub fn get_repository_full_path(repo: &othercrate::Repository,
                                path_in_repository: &Path) -> PathBuf {
    repo.workdir().join(path_in_repository)
}
The trait-based approach shortens the code when using the helper-function, but I'm worried that it may introduce difficulty to understand where it's defined.
Though both implementations should work, I want to know which is the recommended way in Rust.
(Disclaimer: I am not entirely sure about this. If this answer receives enough™ upvotes, I will delete this disclaimer)
Good question! I have already seen both solutions in the wild and would say that both are valid to use. In other words: neither of the two solutions is considered bad.
However, I'd say that using the Ext-trait approach is often a slightly better choice due to these advantages:
Many operations feel way more natural to call "on an object" (with dot-notation) than to call a function with both objects.
Chaining multiple calls looks nice in code because it fits our left-to-right way of reading, whereas with the function-approach the code is harder to read: f(f(a, f(d, e)), c).
If the user prefers the plain-function style, they can still use it that way with Trait::func(self_object, arg).
But of course there are some disadvantages (you already mentioned one):
It's harder for the user to understand where the helper-function is defined.
The user needs to have the trait in scope (read: use the trait).
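To show both call styles side by side, here is a self-contained sketch. The othercrate module below is a made-up stand-in for the external crate, with just enough of Repository to compile. In real code the trait would live in your crate and callers would need a `use` for it before the dot-notation call works.

```rust
use std::path::{Path, PathBuf};

// Stand-in for the external crate; only what the example needs.
mod othercrate {
    use std::path::{Path, PathBuf};

    pub struct Repository {
        workdir: PathBuf,
    }

    impl Repository {
        pub fn new(workdir: impl Into<PathBuf>) -> Self {
            Repository { workdir: workdir.into() }
        }

        pub fn workdir(&self) -> &Path {
            &self.workdir
        }
    }
}

// Extension trait adding the helper to the foreign type.
pub trait RepositoryExt {
    fn get_full_path(&self, path_in_repository: &Path) -> PathBuf;
}

impl RepositoryExt for othercrate::Repository {
    fn get_full_path(&self, path_in_repository: &Path) -> PathBuf {
        self.workdir().join(path_in_repository)
    }
}

fn main() {
    let repo = othercrate::Repository::new("/work/repo");
    let relative = Path::new("src/main.rs");

    // Dot notation: requires RepositoryExt to be in scope.
    let via_method = repo.get_full_path(relative);

    // Plain-function style through the trait name.
    let via_function = RepositoryExt::get_full_path(&repo, relative);

    assert_eq!(via_method, via_function);
}
```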
