Why are len() and is_empty() not defined in a trait? - rust

Most patterns in Rust are captured by traits (Iterator, From, Borrow, etc.).
How come a pattern as pervasive as len/is_empty has no associated trait in the standard library? Would that cause problems which I do not foresee? Was it deemed useless? Or is it only that nobody thought of it (which seems unlikely)?

Was it deemed useless?
I would guess that's the reason.
What could you do with the knowledge that something is empty or has length 15? Pretty much nothing, unless you also have a way to access the elements of the collection for example. The trait that unifies collections is Iterator. In particular an iterator can tell you how many elements its underlying collection has, but it also does a lot more.
Also note that should you need an Empty trait, you can create one and implement it for all standard collections, unlike interfaces in most languages. This is the power of traits. This also means that the standard library doesn't need to provide small utility traits for every single use case, they can be provided by libraries!

Just adding a late but perhaps useful answer here. Depending on what exactly you need, using the slice type might be a good option, rather than specifying a trait. Slices have len(), is_empty(), and other useful methods (full docs here). Consider the following:
use core::fmt::Display;
fn printme<T: Display>(x: &[T]) {
println!("length: {}, empty: ", x.len());
for c in x {
print!("{}, ", c);
}
println!("\nDone!");
}
fn main() {
let s = "This is some string";
// Vector
let vv: Vec<char> = s.chars().collect();
printme(&vv);
// Array
let x = [1, 2, 3, 4];
printme(&x);
// Empty
let y:Vec<u8> = Vec::new();
printme(&y);
}
printme can accept either a vector or an array. Most other things that it accepts will need some massaging.
I think maybe the reason for there being no Length trait is that most functions will either a) work through an iterator without needing to know its length (with Iterator), or b) require len because they do some sort of random element access, in which case a slice would be the best bet. In the first case, knowing length may be helpful to pre-allocate memory of some size, but size_hint takes care of this when used for anything like Vec::with_capacity, or ExactSizeIterator for anything that needs specific allocations. Most other cases would probably need to be collected to a vector at some point within the function, which has its len.
Playground link to my example here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9a034c2e8b75775449afa110c05858e7

Related

Is Vec<&&str> the same as Vec<&Str>?

I'm learning Rust and I'm trying to solve an advent of code challenge (day 9 2015).
I created a situation where I end up with a variable that has the type Vec<&&str> (note the double '&', it's not a typo). I'm now wondering if this type is different than Vec<&str>. I can't figure out if a reference to a reference to something would ever make sense. I know I can avoid this situation by using String for the from and to variables. I'm asking if Vec<&&str> == Vec<&str> and if I should try and avoid Vec<&&str>.
Here is the code that triggered this question:
use itertools::Itertools
use std::collections::{HashSet};
fn main() {
let contents = fs::read_to_string("input.txt").unwrap();
let mut vertices: HashSet<&str> = HashSet::new();
for line in contents.lines() {
let data: Vec<&str> = line.split(" ").collect();
let from = data[0];
let to = data[2];
vertices.insert(from);
vertices.insert(to);
}
// `Vec<&&str>` originates from here
let permutations_iter = vertices.iter().permutations(vertices.len());
for perm in permutations_iter {
let length_trip = compute_length_of_trip(&perm);
}
}
fn compute_length_of_trip(trip: &Vec<&&str>) -> u32 {
...
}
Are Vec<&str> and Vec<&&str> different types?
I'm now wondering if this type is different than Vec<&str>.
Yes, a Vec<&&str> is a type different from Vec<&str> - you can't pass a Vec<&&str> where a Vec<&str> is expected and vice versa. Vec<&str> stores string slice references, which you can think of as pointers to data inside some strings. Vec<&&str> stores references to such string slice references, i.e. pointers to pointers to data. With the latter, accessing the string data requires an additional indirection.
However, Rust's auto-dereferencing makes it possible to use a Vec<&&str> much like you'd use a Vec<&str> - for example, v[0].len() will work just fine on either, v[some_idx].chars() will iterate over chars with either, and so on. The only difference is that Vec<&&str> stores the data more indirectly and therefore requires a bit more work on each access, which can lead to slightly less efficient code.
Note that you can always convert a Vec<&&str> to Vec<&str> - but since doing so requires allocating a new vector, if you decide you don't want Vec<&&str>, it's better not to create it in the first place.
Can I avoid Vec<&&str> and how?
Since a &str is Copy, you can avoid the creation of Vec<&&str> by adding a .copied() when you iterate over vertices, i.e. change vertices.iter() to vertices.iter().copied(). If you don't need vertices sticking around, you can also use vertices.into_iter(), which will give out &str, as well as free vertices vector as soon as the iteration is done.
The reason why the additional reference arises and the ways to avoid it have been covered on StackOverflow before.
Should I avoid Vec<&&str>?
There is nothing inherently wrong with Vec<&&str> that would require one to avoid it. In most code you'll never notice the difference in efficiency between Vec<&&str> and Vec<&str>. Having said that, there are some reasons to avoid it beyond performance in microbenchmarks. The additional indirection in Vec<&&str> requires the exact &strs it was created from (and not just the strings that own the data) to stick around and outlive the new collection. This is not relevant in your case, but would become noticeable if you wanted to return the permutations to the caller that owns the strings. Also, there is value in the simpler type that doesn't accumulate a reference on each transformation. Just imagine needing to transform the Vec<&&str> further into a new vector - you wouldn't want to deal with Vec<&&&str>, and so on for every new transformation.
Regarding performance, less indirection is usually better since it avoids an extra memory access and increases data locality. However, one should also note that a Vec<&str> takes up 16 bytes per element (on 64-bit architectures) because a slice reference is represented by a "fat pointer", i.e. a pointer/length pair. A Vec<&&str> (as well as Vec<&&&str> etc.) on the other hand takes up only 8 bytes per element, because a reference to a fat reference is represented by a regular "thin" pointer. So if your vector measures millions of elements, a Vec<&&str> might be more efficient than Vec<&str> simply because it occupies less memory. As always, if in doubt, measure.
The reason you have &&str is that the data &str is owned by vertices and when you create an interator over that data you are simply getting a reference to that data, hence the &&str.
There's really nothing to avoid here. It simply shows your iterator references the data that is inside the HashSet.

Rust: can I have a fixed size slice by borrowing the whole fixed size array in a smaller scope in a simple way

I saw the workarounds and they where kinda long. Am I missing a feature of Rust or a simple solution (Important: not workaround). I feel like I should be able to do this with maybe a simple macro but arrayref crate implementations aren't what I am looking for. Is this a feature that needs to be added to Rust or creating fixed size slicing from fixed sized array in a smaller scope is something bad.
Basically what I want to do is this;
fn f(arr:[u8;4]){
arr[0];
}
fn basic(){
let mut arr:[u8;12] = [0;12];
// can't I borrow the whole array but but a fixed slice to it?
f(&mut arr[8..12]); // But this is know on compile time?
f(&mut arr[8..12] as &[u8;4]); // Why can't I do these things?
}
What I want can be achieved by below code(from other so threads)
use array_ref;
fn foo(){
let buf:[u8;12] = [0;12];
let (_, fixed_slice) = mut_array_refs![
&mut buf,
8,
4
];
write_u32_into(fixed_slice,0);
}
fn write_u32_into(fixed_slice:&mut [u8;12],num:u32){
// won't have to check if fixed_slice.len() == 12 and won't panic
}
But I looked into the crate and even though this never panics there are many unsafe blocks and many lines of code. It is a workaround for the Rust itself. In the first place I wanted something like this to get rid of the overhead of checking the size and the possible runtime panic.
Also this is a little overhead it doesn't matter isn't a valid answer because technically I should be able to guarantee this in compile time even if the overhead is small this doesn't mean rust doesn't need to have this type of feature or I should not be looking for an ideal way.
Note: Can this be solved with lifetimes?
Edit: If we where able to have a different syntax for fixed slices such as arr[12;;16] and when I borrowed them this way it would borrow it would borrow the whole arr. I think this way many functions for example (write_u32) would be implemented in a more "rusty" way.
Use let binding with slice_patterns feature. It was stabilized in Rust 1.42.
let v = [1, 2, 3]; // inferred [i32; 3]
let [_, ref subarray # ..] = v; // subarray is &[i32; 2]
let a = v[0]; // 1
let b = subarray[1]; // 3
Here is a section from the Rust reference about slice patterns.
Why it doesn't work
What you want is not available as a feature in rust stable or nightly because multiple things related to const are not stabilized yet, namely const generics and const traits. The reason traits are involved is because the arr[8..12] is a call to the core::ops::Index::<Range<usize>> trait that returns a reference to a slice, in your case [u8]. This type is unsized and not equal to [u8; 4] even if the compiler could figure out that it is, rust is inherently safe and can be overprotective sometimes to ensure safety.
What can you do then?
You have a few routes you can take to solve this issue, I'll stay in a no_std environment for all this as that seems to be where you're working and will avoid extra crates.
Change the function signature
The current function signature you have takes the four u8s as an owned value. If you only are asking for 4 values you can instead take those values as parameters to the function. This option breaks down when you need larger arrays but at that point, it would be better to take the array as a reference or using the method below.
The most common way, and the best way in my opinion, is to take the array in as a reference to a slice (&[u8] or &mut [u8]). This is not the same as taking a pointer to the value in C, slices in rust also carry the length of themselves so you can safely iterate through them without worrying about buffer overruns or if you read all the data. This does require changing the algorithms below to account for variable-sized input but most of the time there is a just as good option to use.
The safe way
Slice can be converted to arrays using TryInto, but this comes at the cost of runtime size checking which you seem to want to avoid. This is an option though and may result in a minimal performance impact.
Example:
fn f(arr: [u8;4]){
arr[0];
}
fn basic(){
let mut arr:[u8;12] = [0;12];
f(arr[8..12].try_into().unwrap());
}
The unsafe way
If you're willing to leave the land of safety there are quite a few things you can do to force the compiler to recognize the data as you want it to, but they can be abused. It's usually better to use rust idioms rather than force other methods in but this is a valid option.
fn basic(){
let mut arr:[u8;12] = [0;12];
f(unsafe {*(arr[8..12].as_ptr() as *const [u8; 4])});
}
TL;DR
I recommend changing your types to utilize slices rather than arrays but if that's not feasible I'd suggest avoiding unsafety, the performance won't be as bad as you think.

How do I collect the values of a HashMap into a vector?

I can not find a way to collect the values of a HashMap into a Vec in the documentation. I have score_table: HashMap<Id, Score> and I want to get all the Scores into all_scores: Vec<Score>.
I was tempted to use the values method (all_scores = score_table.values()), but it does not work since values is not a Vec.
I know that Values implements the ExactSizeIterator trait, but I do not know how to collect all values of an iterator into a vector without manually writing a for loop and pushing the values in the vector one after one.
I also tried to use std::iter::FromIterator; but ended with something like:
all_scores = Vec::from_iter(score_table.values());
expected type `std::vec::Vec<Score>`
found type `std::vec::Vec<&Score>`
Thanks to Hash map macro refuses to type-check, failing with a misleading (and seemingly buggy) error message?, I changed it to:
all_scores = Vec::from_iter(score_table.values().cloned());
and it does not produce errors to cargo check.
Is this a good way to do it?
The method Iterator.collect is designed for this specific task. You're right in that you need .cloned() if you want a vector of actual values instead of references (unless the stored type implements Copy, like primitives), so the code looks like this:
all_scores = score_table.values().cloned().collect();
Internally, collect() just uses FromIterator, but it also infers the type of the output. Sometimes there isn't enough information to infer the type, so you may need to explicitly specify the type you want, like so:
all_scores = score_table.values().cloned().collect::<Vec<Score>>();
If you don't need score_table anymore, you can transfer the ownership of Score values to all_scores by:
let all_scores: Vec<Score> = score_table.into_iter()
.map(|(_id, score)| score)
.collect();
This approach will be faster and consume less memory than the clone approach by #apetranzilla. It also supports any struct, not only structs that implement Clone.
There are three useful methods on HashMaps, which all return iterators:
values() borrows the collection and returns references (&T).
values_mut() gives mutable references &mut T which is useful to modify elements of the collection without destroying score_table.
into_values() gives you the elements directly: T! The iterator takes ownership of all the elements. This means that score_table no longer owns them, so you can't use score_table anymore!
In your example, you call values() to get &T references, then convert them to owned values T via a clone().
Instead, if we have an iterator of owned values, then we can convert it to a Vec using Iterator::collect():
let all_scores: Vec<Score> = score_table.into_values().collect();
Sometimes, you may need to specify the collecting type:
let all_scores = score_table.into_values().collect::<Vec<Score>>();

When should I use direct access into a Rust Vec instead of the get method?

Rust supports two methods for accessing the elements of a vector:
let mut v = vec![1, 2, 3];
let first_element = &v[0];
let second_element = v.get(1);
The get() method returns an Option type, which seems like a useful safety feature. The C-like syntax &v[0] seems shorter to type, but gives up the safety benefits, since invalid reads cause a run-time error rather than producing an indication that the read was out of bounds.
It's not clear to me when I would want to use the direct access approach, because it seems like the only advantage is that it's quicker to type (I save 3 characters). Is there some other advantage (perhaps a speedup?) that I'm not seeing? I guess I would save the conditional of a match expression, but that doesn't seem like it offers much benefit compared to the costs.
Neither of them is quicker because they both do bounds checks. In fact, your question is quite generic because there are other pairs of methods where one of them panics while the other returns an option, such as String::reserve vs String::try_reserve.
If you are sure that you are in bounds, use the brackets version. This is only a syntactic shortcut for get().unwrap().
If you are unsure of this, use the get() method and do your check.
If you critically need maximum speed and you cannot use an iterator and you have determined through benchmarks that the indexing is the bottleneck and you are sure to be in bounds, you can use the get_unchecked() method. Be careful about this because it is unsafe: it is always better to not have any unsafe block in your code.
Just a little bit of advice: if you are concerned by your program performance, avoid using those methods and prefer to use iterators as much as you can. For example, the second example is faster than the first one because in the first case there are one million bounds checks:
let v: Vec<_> = (0..1000_000).collect();
for idx in 0..1000_000 {
// do something with v[idx]
}
for num in &v {
// do something with num
}

Indexing vector by a 32-bit integer

In Rust, vectors are indexed using usize, so when writing
let my_vec: Vec<String> = vec!["Hello", "world"];
let index: u32 = 0;
println!("{}", my_vec[index]);
you get an error, as index is expected to be of type usize. I'm aware that this can be fixed by explicitly converting index to usize:
my_vec[index as usize]
but this is tedious to write. Ideally I'd simply overload the [] operator by implementing
impl<T> std::ops::Index<u32> for Vec<T> { ... }
but that's impossible as Rust prohibits this (as neither the trait nor struct are local). The only alternative that I can see is to create a wrapper class for Vec, but that would mean having to write lots of function wrappers as well. Is there any more elegant way to address this?
Without a clear use case it is difficult to recommend the best approach.
There are basically two questions here:
do you really need indexing?
do you really need to use u32 for indices?
When using functional programming style, indexing is generally unnecessary as you operate on iterators instead. In this case, the fact that Vec only implements Index for usize really does not matter.
If your algorithm really needs indexing, then why not use usize? There are many ways to convert from u32 to usize, converting at the last moment possible is one possibility, but there are other sites where you could do the conversion, and if you find a chokepoint (or create it) you can get away with only a handful of conversions.
At least, that's the YAGNI point of view.
Personally, as a type freak, I tend to wrap things around a lot. I just like to add semantic information, because let's face it Vec<i32> just doesn't mean anything.
Rust offers a simple way to create wrapper structures: struct MyType(WrappedType);. That's it.
Once you have your own type, adding indexing is easy. There are several ways to add other operations:
if only a few operations make sense, then adding explicitly is best.
if many operations are necessary, and you do not mind exposing the fact that underneath is a Vec<X>, then you can expose it:
by making it public: struct MyType(pub WrappedType);, users can then call .0 to access it.
by implementing AsRef and AsMut, or creating a getter.
by implementing Deref and DerefMut (which is implicit, make sure you really want to).
Of course, breaking encapsulation can be annoying later, as it also prevents the maintenance of invariants, so I would consider it a last ditch solution.
I prefer to store "references" to nodes as u32 rather than usize. So when traversing the graph I keep retrieving adjacent vertex "references", which I then use to look up the actual vertex object in the Vec object
So actually you don't want u32, because you will never do calculations on it, and u32 easily allows you to do math. You want an index-type that can just do indexing but whose values are immutable otherwise.
I suggest you implement something along the line of rustc_data_structures::indexed_vec::IndexVec.
This custom IndexVec type is not only generic over the element type, but also over the index type, and thus allows you to use a NodeId newtype wrapper around u32. You'll never accidentally use a non-id u32 to index, and you can use them just as easily as a u32. You don't even have to create any of these indices by calculating them from the vector length, instead the push method returns the index of the location where the element has just been inserted.

Resources