Why does Vec<T> expect &T as the argument to binary_search? - rust

This question is predicated on the assumption that a friendly/ergonomic API in Rust should prefer a reference to a type Q where T: Borrow<Q> instead of expecting &T directly. In my experience working with the API of other collection types like HashMap, this definitely seems to be the case. That said...
How come the binary_search method on Vec<T> is not defined that way? Currently, on stable, the binary_search implementation is as follows:
pub fn binary_search(&self, x: &T) -> Result<usize, usize>
where
T: Ord,
{
self.binary_search_by(|p| p.cmp(x))
}
It seems like the following would be a better implementation:
pub fn binary_search_modified<Q>(&self, x: &Q) -> Result<usize, usize>
where
T: Borrow<Q>,
Q: Ord + ?Sized,
{
self.binary_search_by(|p| p.borrow().cmp(x))
}
A comparison of the two APIs above:
let mut v: Vec<String> = Vec::new();
v.push("A".into());
v.push("B".into());
v.push("D".into());
let _ = v.binary_search("C"); // Compilation error!
let _ = v.binary_search(&String::from("C")); // Fine: allocate and convert it to the exact type, I guess
let _ = v.binary_search_modified("C"); // Far nicer API, does the same thing
let _ = v.binary_search_modified(&String::from("C")); // Backwards compatible
As a more general question, what considerations go into deciding if a method should accept &T or &Q ... where T: Borrow<Q>?

You are right that binary_search() and a few other methods like contains() could be generalized to accept a reference to any type Q that T can be borrowed as (T: Borrow<Q>), but unfortunately Rust 1.0 was released with the less general signature. And while using Borrow looks strictly more general, attempts to implement that change broke type inference in too many cases.
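The Borrow-based lookup is still reachable on stable by going through binary_search_by, as in the question's proposed implementation. A minimal sketch, assuming a free function named binary_search_borrowed (a hypothetical helper, not a std API):

```rust
use std::borrow::Borrow;

// Hypothetical helper (not part of std): binary search that accepts any
// `&Q` the element type can be borrowed as.
fn binary_search_borrowed<T, Q>(slice: &[T], x: &Q) -> Result<usize, usize>
where
    T: Borrow<Q>,
    Q: Ord + ?Sized,
{
    slice.binary_search_by(|p| p.borrow().cmp(x))
}

fn main() {
    let v: Vec<String> = vec!["A".to_string(), "B".to_string(), "D".to_string()];

    // A `&str` key works without allocating a `String`.
    println!("{:?}", binary_search_borrowed(&v, "B"));
    // A missing key reports the insertion point.
    println!("{:?}", binary_search_borrowed(&v, "C"));
    // Passing `&String` still works, as with the current signature.
    println!("{:?}", binary_search_borrowed(&v, &String::from("D")));
}
```

Because the key type is an explicit parameter of a separate function here, existing callers of binary_search are unaffected, which sidesteps the inference breakage described above.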
There are countless GitHub issues, PRs and forum discussions on this topic. If you want to follow the history of the attempts to fix this, I suggest starting at the PR finally reverting the attempts to make binary_search() more general and working your way backwards.
Regarding your more general question, my advice would be the same as for any API design question: think about the use cases. Using an additional type parameter makes the code somewhat more complex, and the documentation and compiler errors become less obvious. For methods on a trait, the type parameter will render the trait unusable for trait objects. So if you can think of convincing use cases for the more general version using Borrow, go for it, but in the absence of a convincing use case it's better to avoid it.
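The trait-object cost can be seen in a small sketch. The traits ExactLookup and FlexibleLookup and their methods are hypothetical names chosen for illustration. A generic method has to be opted out of dynamic dispatch with `where Self: Sized` if the trait is to remain usable as a trait object, and it then cannot be called through `dyn`.

```rust
// Hypothetical traits, for illustration only.

// Non-generic method: callable through `dyn ExactLookup`.
trait ExactLookup {
    fn has_str(&self, key: &str) -> bool;
}

// Generic method: needs `where Self: Sized` for the trait to be usable as
// a trait object at all, and is then not callable on `dyn FlexibleLookup`.
trait FlexibleLookup {
    fn has_key<K: AsRef<str>>(&self, key: K) -> bool
    where
        Self: Sized;

    fn size(&self) -> usize;
}

impl ExactLookup for Vec<String> {
    fn has_str(&self, key: &str) -> bool {
        self.iter().any(|s| s.as_str() == key)
    }
}

impl FlexibleLookup for Vec<String> {
    fn has_key<K: AsRef<str>>(&self, key: K) -> bool {
        let key: &str = key.as_ref();
        self.iter().any(|s| s.as_str() == key)
    }

    fn size(&self) -> usize {
        self.len()
    }
}

fn main() {
    let v = vec!["a".to_string(), "b".to_string()];

    let exact: Box<dyn ExactLookup> = Box::new(v.clone());
    println!("{}", exact.has_str("a"));

    // Only the non-generic method is available through the trait object.
    let flexible: &dyn FlexibleLookup = &v;
    println!("{}", flexible.size());

    // The generic method needs the concrete type.
    println!("{}", v.has_key(String::from("b")));
}
```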

Related

Pass struct generic type to trait generic method [duplicate]

In this question, an issue arose that could be solved by changing an attempt at using a generic type parameter into an associated type. That prompted the question "Why is an associated type more appropriate here?", which made me want to know more.
The RFC that introduced associated types says:
This RFC clarifies trait matching by:
Treating all trait type parameters as input types, and
Providing associated types, which are output types.
The RFC uses a graph structure as a motivating example, and this is also used in the documentation, but I'll admit to not fully appreciating the benefits of the associated type version over the type-parameterized version. The primary thing is that the distance method doesn't need to care about the Edge type. This is nice but seems a bit shallow of a reason for having associated types at all.
I've found associated types to be pretty intuitive to use in practice, but I find myself struggling when deciding where and when I should use them in my own API.
When writing code, when should I choose an associated type over a generic type parameter, and when should I do the opposite?
This is now touched on in the second edition of The Rust Programming Language. However, let's dive in a bit in addition.
Let us start with a simpler example.
When is it appropriate to use a trait method?
There are multiple ways to provide late binding:
trait MyTrait {
fn hello_world(&self) -> String;
}
Or:
struct MyTrait<T> {
t: T,
hello_world: fn(&T) -> String,
}
impl<T> MyTrait<T> {
fn new(t: T, hello_world: fn(&T) -> String) -> MyTrait<T> {
MyTrait { t, hello_world }
}
fn hello_world(&self) -> String {
// The stored function takes a reference, so borrow the field.
(self.hello_world)(&self.t)
}
}
Disregarding any implementation/performance strategy, both excerpts above allow the user to specify in a dynamic manner how hello_world should behave.
The one difference (semantically) is that the trait implementation guarantees that for a given type T implementing the trait, hello_world will always have the same behavior whereas the struct implementation allows having a different behavior on a per instance basis.
Whether using a method is appropriate or not depends on the use case!
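The per-type versus per-instance distinction can be sketched as follows. The names Greet, English, Greeter, shout and whisper are made up for this illustration.

```rust
// Per-type behavior: every `English` value greets the same way.
trait Greet {
    fn hello_world(&self) -> String;
}

struct English;

impl Greet for English {
    fn hello_world(&self) -> String {
        "Hello, world!".to_string()
    }
}

// Per-instance behavior: each `Greeter` carries its own function pointer.
struct Greeter<T> {
    t: T,
    hello_world: fn(&T) -> String,
}

impl<T> Greeter<T> {
    fn new(t: T, hello_world: fn(&T) -> String) -> Self {
        Greeter { t, hello_world }
    }

    fn hello_world(&self) -> String {
        (self.hello_world)(&self.t)
    }
}

fn shout(name: &String) -> String {
    format!("HELLO, {}!", name.to_uppercase())
}

fn whisper(name: &String) -> String {
    format!("hello, {}", name.to_lowercase())
}

fn main() {
    println!("{}", English.hello_world());

    // Same type `Greeter<String>`, different behavior per instance.
    let loud = Greeter::new("Ann".to_string(), shout);
    let quiet = Greeter::new("Ann".to_string(), whisper);
    println!("{}", loud.hello_world());
    println!("{}", quiet.hello_world());
}
```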
When is it appropriate to use an associated type?
Similarly to the trait methods above, an associated type is a form of late binding (though it occurs at compilation), allowing the user of the trait to specify for a given instance which type to substitute. It is not the only way (thus the question):
trait MyTrait {
type Return;
fn hello_world(&self) -> Self::Return;
}
Or:
trait MyTrait<Return> {
fn hello_world(&self) -> Return;
}
These two forms mirror the late binding of methods above:
the first one enforces that for a given Self there is a single Return associated
the second one, instead, allows implementing MyTrait for Self for multiple Return
Which form is more appropriate depends on whether it makes sense to enforce unicity or not. For example:
Deref uses an associated type because without unicity the compiler would go mad during inference
Add uses an associated type because its author thought that given the two arguments there would be a logical return type
As you can see, while Deref is an obvious use case (technical constraint), the case of Add is less clear cut: maybe it would make sense for i32 + i32 to yield either i32 or Complex<i32> depending on the context? Nonetheless, the author exercised their judgment and decided that overloading the return type for additions was unnecessary.
My personal stance is that there is no right answer. Still, beyond the unicity argument, I would mention that associated types make using the trait easier as they decrease the number of parameters that have to be specified, so in case the benefits of the flexibility of using a regular trait parameter are not obvious, I suggest starting with an associated type.
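A small sketch of the difference at the call site, assuming two hypothetical traits (Describe with an associated type, ConvertTo with a type parameter) and a made-up Celsius type:

```rust
// Associated type: exactly one `Output` per implementing type.
trait Describe {
    type Output;
    fn describe(&self) -> Self::Output;
}

// Type parameter: the same type may implement the trait for several `T`.
trait ConvertTo<T> {
    fn convert(&self) -> T;
}

struct Celsius(f64);

impl Describe for Celsius {
    type Output = String;
    fn describe(&self) -> String {
        format!("{:.1} C", self.0)
    }
}

impl ConvertTo<f64> for Celsius {
    fn convert(&self) -> f64 {
        self.0
    }
}

impl ConvertTo<i64> for Celsius {
    fn convert(&self) -> i64 {
        self.0.round() as i64
    }
}

fn main() {
    let c = Celsius(21.6);

    // No annotation needed: the impl fixes the output type.
    let text = c.describe();

    // Annotation needed: the caller has to pick which impl to use.
    let exact: f64 = c.convert();
    let rounded: i64 = c.convert();

    println!("{} {} {}", text, exact, rounded);
}
```

The extra annotations on the convert calls are the "parameters that have to be specified" mentioned above; the associated-type version needs none.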
Associated types are a grouping mechanism, so they should be used when it makes sense to group types together.
The Graph trait introduced in the documentation is an example of this. You want a Graph to be generic, but once you have a specific kind of Graph, you don't want the Node or Edge types to vary anymore. A particular Graph isn't going to want to vary those types within a single implementation, and in fact, wants them to always be the same. They're grouped together, or one might even say associated.
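A minimal sketch of that grouping. The trait below is a simplified stand-in rather than the exact trait from the documentation, and AdjacencyList and out_degree are hypothetical names.

```rust
// Simplified graph trait: `Node` and `Edge` are fixed per implementation.
trait Graph {
    type Node;
    type Edge;
    fn edges(&self, node: &Self::Node) -> Vec<Self::Edge>;
}

// One concrete graph: nodes are indices, edges are (from, to) pairs.
struct AdjacencyList(Vec<Vec<usize>>);

impl Graph for AdjacencyList {
    type Node = usize;
    type Edge = (usize, usize);

    fn edges(&self, node: &usize) -> Vec<(usize, usize)> {
        self.0[*node].iter().map(|&to| (*node, to)).collect()
    }
}

// Generic code names only `G`; `Node` and `Edge` come along with it.
fn out_degree<G: Graph>(graph: &G, node: &G::Node) -> usize {
    graph.edges(node).len()
}

fn main() {
    let g = AdjacencyList(vec![vec![1, 2], vec![2], vec![]]);
    println!("{}", out_degree(&g, &0));
    println!("{:?}", g.edges(&1));
}
```

With type parameters instead (Graph<N, E>), out_degree would have to declare N and E as well, even though it never mentions the edge type.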
Associated types can be used to tell the compiler "these two types between these two implementations are the same". Here's a double dispatch example that compiles, and is similar to how the standard library relates the Iterator and Sum traits:
trait MySum {
type Item;
fn sum<I>(iter: I)
where
I: MyIter<Item = Self::Item>;
}
trait MyIter {
type Item;
fn next(&self) {}
fn sum<S>(self)
where
S: MySum<Item = Self::Item>;
}
struct MyU32;
impl MySum for MyU32 {
type Item = MyU32;
fn sum<I>(iter: I)
where
I: MyIter<Item = Self::Item>,
{
iter.next()
}
}
struct MyVec;
impl MyIter for MyVec {
type Item = MyU32;
fn sum<S>(self)
where
S: MySum<Item = Self::Item>,
{
S::sum::<Self>(self)
}
}
fn main() {}
Also, https://blog.thomasheartman.com/posts/on-generics-and-associated-types has some good information on this as well:
In short, use generics when you want to type A to be able to implement a trait any number of times for different type parameters, such as in the case of the From trait.
Use associated types if it makes sense for a type to only implement the trait once, such as with Iterator and Deref.

Why impl RangeBounds<T> for Range<&T> requires T to be sized?

https://doc.rust-lang.org/src/core/ops/range.rs.html#979-986
impl<T> RangeBounds<T> for Range<&T> {
fn start_bound(&self) -> Bound<&T> {
Included(self.start)
}
fn end_bound(&self) -> Bound<&T> {
Excluded(self.end)
}
}
As you can see, T is not marked with ?Sized, which is preventing me from passing a Range<&[u8]> into an argument that requires impl RangeBounds<[u8]>.
Are there some design considerations behind it? If this is intended, which is the proper way to pass a range of [u8]?
It's rather unfortunate, but adding the T: ?Sized bound breaks type inference in code like btreemap.range("from".."to") because the compiler can't choose between T = str and T = &str, both of which satisfy the bounds on range.
Some discussion of this has taken place on PR #64327. As far as I know there are no plans to relax the bounds on these impls in the future.
Instead of a Range<T>, you can use a tuple of Bound<T>s; for example, instead of takes_range("from".."to"), you can write the following instead:
use std::ops::Bound;
takes_range((Bound::Included("from"), Bound::Excluded("to")));
This will work because (Bound<&T>, Bound<&T>) does implement RangeBounds<T> even when T is !Sized.
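For the [u8] case from the question, the tuple-of-bounds form might look like the sketch below. It uses BTreeMap::range as the consumer of RangeBounds<[u8]>; keys_between is a hypothetical helper and the map contents are made up.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

// Hypothetical helper: collect the values whose keys lie in [from, to).
// The tuple of `Bound<&[u8]>` implements `RangeBounds<[u8]>` even though
// `[u8]` is unsized, which `Range<&[u8]>` does not.
fn keys_between(map: &BTreeMap<Vec<u8>, u32>, from: &[u8], to: &[u8]) -> Vec<u32> {
    map.range::<[u8], _>((Bound::Included(from), Bound::Excluded(to)))
        .map(|(_, value)| *value)
        .collect()
}

fn main() {
    let mut map: BTreeMap<Vec<u8>, u32> = BTreeMap::new();
    map.insert(b"a".to_vec(), 1);
    map.insert(b"b".to_vec(), 2);
    map.insert(b"c".to_vec(), 3);
    map.insert(b"d".to_vec(), 4);

    println!("{:?}", keys_between(&map, b"b", b"d"));
}
```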

How to properly pass Iterators to a function in Rust

I want to pass Iterators to a function, which then computes some value from these iterators.
I am not sure what a robust signature for such a function would look like.
Let's say I want to iterate over f64 values.
You can find the code in the playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c614429c541f337adb102c14518cf39e
My first attempt was
fn dot(a : impl std::iter::Iterator<Item = f64>,b : impl std::iter::Iterator<Item = f64>) -> f64 {
a.zip(b).map(|(x,y)| x*y).sum()
}
This fails to compile if we try to iterate over slices.
So you can do:
fn dot<'a>(a : impl std::iter::Iterator<Item = &'a f64>,b : impl std::iter::Iterator<Item = &'a f64>) -> f64 {
a.zip(b).map(|(x,y)| x*y).sum()
}
This fails to compile if I try to iterate over mapped ranges.
(Why does the compiler require the lifetime parameters here?)
So I tried to accept references and not references generically:
pub fn dot<T : Borrow<f64>, U : Borrow<f64>>(a : impl std::iter::Iterator::<Item = T>, b: impl std::iter::Iterator::<Item = U>) -> f64 {
a.zip(b).map(|(x,y)| x.borrow()*y.borrow()).sum()
}
This works with all combinations I tried, but it is quite verbose and I don't really understand every aspect of it.
Are there more cases?
What would be the best practice of solving this problem?
There is no right way to write a function that can accept Iterators, but there are some general principles that we can apply to make your function general and easy to use.
Write functions that accept impl IntoIterator<...>. Because all Iterators implement IntoIterator, this is strictly more general than a function that accepts only impl Iterator<...>.
Borrow<T> is the right way to abstract over T and &T.
When trait bounds get verbose, it's often easier to read if you write them in where clauses instead of in-line.
With those in mind, here's how I would probably write dot:
use std::borrow::Borrow;

fn dot<I, J>(a: I, b: J) -> f64
where
I: IntoIterator,
J: IntoIterator,
I::Item: Borrow<f64>,
J::Item: Borrow<f64>,
{
a.into_iter()
.zip(b)
.map(|(x, y)| x.borrow() * y.borrow())
.sum()
}
However, I also agree with TobiP64's answer in that this level of generality may not be necessary in every case. This dot is nice because it can accept a wide range of arguments, so you can call dot(&some_vec, some_iterator) and it just works. It's optimized for readability at the call site. On the other hand, if you find the Borrow trait complicates the definition too much, there's nothing wrong with optimizing for readability at the definition, and forcing the caller to add a .iter().copied() sometimes. The only thing I would definitely change about the first dot function is to replace Iterator with IntoIterator.
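A sketch of how this signature behaves at the call site. The dot function is repeated so the example is self-contained, and the sample values are made up.

```rust
use std::borrow::Borrow;

fn dot<I, J>(a: I, b: J) -> f64
where
    I: IntoIterator,
    J: IntoIterator,
    I::Item: Borrow<f64>,
    J::Item: Borrow<f64>,
{
    a.into_iter()
        .zip(b)
        .map(|(x, y)| x.borrow() * y.borrow())
        .sum()
}

fn main() {
    let v: Vec<f64> = vec![1.0, 2.0, 3.0];
    let w: Vec<f64> = vec![4.0, 5.0, 6.0];

    // Borrowed collections yield `&f64` items.
    println!("{}", dot(&v, &w));

    // A mapped range yields owned `f64` items; mixing the two is fine.
    println!("{}", dot(&v, (1..=3).map(|i| i as f64)));

    // Owned collections are consumed.
    println!("{}", dot(v, w));
}
```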
You can iterate over slices with the first dot implementation like this:
dot([0.0, 1.0, 2.0].iter().cloned(), [0.0, 1.0, 2.0].iter().cloned());
(https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.cloned)
or
dot([0.0, 1.0, 2.0].iter().copied(), [0.0, 1.0, 2.0].iter().copied());
(https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.copied)
Why does the compiler require the lifetime parameters here?
As far as I know, every reference in Rust has a lifetime, but the compiler can infer it in simple cases. In this case, however, the compiler is not yet smart enough, so you need to tell it how long the references yielded by the iterator live.
Are there more cases?
You can always use iterator methods, like the solution above, to get an iterator over f64, so you don't have to deal with lifetimes or generics.
What would be the best practice of solving this problem?
I would recommend the first version (and thus leaving it to the caller to transform the iterator into an Iterator<Item = f64>), simply because it's the most readable.

Should I implement AddAssign on a newtype?

I've got a newtype:
struct NanoSecond(u64);
I want to implement addition for this. (I'm actually using derive_more, but here's an MCVE.)
use std::ops::Add;

impl Add for NanoSecond {
type Output = Self;
fn add(self, other: Self) -> Self {
NanoSecond(self.0 + other.0)
}
}
But should I implement AddAssign? Is it required for this to work?
let mut x: NanoSecond = 0.to();
let y: NanoSecond = 5.to();
x += y;
Will implementing it cause unexpected effects?
Implementing AddAssign is indeed required for the += operator to work.
The decision of whether to implement this trait will depend greatly on the actual type and kind of semantics that you are aiming for. This applies to any type of your own making, including newtypes. The most important principle is predictability: an implementation should behave as expected from the same mathematical operation. In this case, considering that addition through Add is already well defined for that type, and nothing stops you from implementing the equivalent operation in-place, adding an impl of AddAssign like so is the most predictable thing to do.
impl AddAssign for NanoSecond {
fn add_assign(&mut self, other: Self) {
self.0 += other.0
}
}
One may also choose to provide additional implementations for reference types as the second operand (e.g. Add<&'a Self> and AddAssign<&'a Self>).
Note that Clippy has lints which check whether the implementation of the arithmetic operation is sound (suspicious_arithmetic_impl and suspicious_op_assign_impl). As part of being predictable, the trait should behave pretty much like the respective mathematical operation, regardless of whether + or += was used. To the best of my knowledge though, there is currently no lint or API guideline suggesting to implement -Assign traits alongside the respective operation.
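Putting the pieces together, a self-contained sketch might look like this. The derives and the by-reference AddAssign impl are additions for illustration and are not part of the question's code.

```rust
use std::ops::{Add, AddAssign};

// The derives are assumptions added so the values can be copied and compared.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NanoSecond(u64);

impl Add for NanoSecond {
    type Output = Self;

    fn add(self, other: Self) -> Self {
        NanoSecond(self.0 + other.0)
    }
}

impl AddAssign for NanoSecond {
    fn add_assign(&mut self, other: Self) {
        self.0 += other.0;
    }
}

// Optional: accept a borrowed right-hand operand as well.
impl<'a> AddAssign<&'a NanoSecond> for NanoSecond {
    fn add_assign(&mut self, other: &'a NanoSecond) {
        self.0 += other.0;
    }
}

fn main() {
    let mut x = NanoSecond(0);
    let y = NanoSecond(5);

    x += y; // uses AddAssign<NanoSecond>
    x += &y; // uses AddAssign<&NanoSecond>

    // `+=` and `+` agree, which is the predictability argued for above.
    assert_eq!(x, y + y);
    println!("{:?}", x);
}
```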

Unzip iterator of references to tuples into two collections of references

I have an Iterator<Item = &(T, U)> over a slice &[(T, U)]. I'd like to unzip this iterator into its components (i.e. obtain (Vec<&T>, Vec<&U>)).
Rust provides unzip functionality through the .unzip() method on Iterator:
points.iter().unzip()
Unfortunately, this doesn't work as-is because .unzip() expects the type of the iterator's item to be a tuple; mine is a reference to a tuple.
To fix this, I tried to write a function which converts between a reference to a tuple and a tuple of references:
fn distribute_ref<'a, T, U>(x: &'a (T, U)) -> (&'a T, &'a U) {
(&x.0, &x.1)
}
I can then map over the resulting iterator to get something .unzip() can handle:
points.iter().map(distribute_ref).unzip()
This works now, but this feels a bit strange. In particular, distribute_ref seems like a fairly simple operation that would be provided by the Rust standard library. I'm guessing it either is and I can't find it, or I'm not approaching this the right way.
Is there a better way to do this?
Is there a better way to do this?
"Better" is a bit subjective. You can make your function shorter:
fn distribute_ref<T, U>(x: &(T, U)) -> (&T, &U) {
(&x.0, &x.1)
}
Lifetime elision allows you to omit the lifetime annotations in this case.
You can use a closure to do the same thing:
points.iter().map(|&(ref a, ref b)| (a, b)).unzip()
Depending on the task, it can be sufficient to clone the data, especially in this case, as a reference to a u8 takes 4 or 8 times more space than the u8 itself.
points.iter().cloned().unzip()
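Both approaches can be wrapped up as a runnable sketch. The function names unzip_refs and unzip_owned and the sample data are hypothetical. The closure relies on default binding modes, so `|(a, b)|` on a `&(T, U)` binds `a` and `b` as references, equivalent to the `|&(ref a, ref b)|` form above.

```rust
// Unzip into two collections of references, without cloning.
fn unzip_refs<T, U>(points: &[(T, U)]) -> (Vec<&T>, Vec<&U>) {
    points.iter().map(|(a, b)| (a, b)).unzip()
}

// Unzip into two owned collections by cloning each tuple.
fn unzip_owned<T: Clone, U: Clone>(points: &[(T, U)]) -> (Vec<T>, Vec<U>) {
    points.iter().cloned().unzip()
}

fn main() {
    let points = vec![(1u8, 'a'), (2u8, 'b')];

    let (xs, ys) = unzip_refs(&points);
    println!("{:?} {:?}", xs, ys);

    let (xs, ys) = unzip_owned(&points);
    println!("{:?} {:?}", xs, ys);
}
```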
