How does skip works in Rust? - rust

I've just seen this answer on how to search for a substring in a given Rust string. The code is something along these lines:
let s = "Hello, world!";
let ss: String = s.chars().skip(7).take(5).collect();
But I am now curious to know how that skip really works. Does it really iterate over the first 7 elements of the iterator before starts yielding? Or it does some trickery behind the scenes?

Taken from the Rust documentation:
Creates an iterator that skips the first n elements.
After they have been consumed, the rest of the elements are yielded.
Basic example:
let a = [1, 2, 3];
let mut iter = a.iter().skip(2);
assert_eq!(iter.next(), Some(&3));
assert_eq!(iter.next(), None);

If you look at Skip implementation, you can see it relies on Iterator::nth internally.
Chars implementation of iterator uses the default nth implementation, that call next in a loop.
slice::Iter implementation offers a specialized implementation of nth, basically just bumping the internal cursor.
As #tadman mentioned, you can't just jump to a char position in a UTF-8 string, so the default implementation is probably the right only.

Related

How do I make a HashSet from an iterator of chars?

I'm just starting to learn Rust and I'm still working on understanding its approach. The particular thing I'm working on is trying to find out if two strings have any characters in common. In another language I might do this by creating two sets of the characters in the strings and performing an intersection on the sets. So far I'm having no luck in creating a HashSet from the characters in a string in Rust. I'm trying variations on this:
let lines: Vec<&str> = text_from_file.lines().collect();
let set1 = HashSet::from(lines[0].chars());
With this variation I get the error "the trait bound std::collections::HashSet<_, _>: std::convert::From<&[u8]> is not satisfied". I don't understand Rust enough yet to know how to interpret this. How can I create a HashSet from the characters in a string?
HashSet::from() requires a slice as parameter, but lines[0].chars() is a Chars object, which is an iterator.
To create a HashSet from an iterator, you have two possibilities:
let set1: HashSet<char> = lines[0].chars().collect();
let set1: HashSet<char> = HashSet::from_iter(lines[0].chars());
I prefer the first one, as it's much easier to read for me.
You want to use HashSet::from_iter()
let lines: Vec<&str> = text_from_file.lines().collect();
let set1: HashSet<char> = HashSet::from_iter(lines[0].chars());

Most idiomatic way to double every character in a string in Rust

I have a String, and I want to make a new String, with every character in the first one doubled. So "abc" would become "aabbcc" and so on.
The best I've come up with is:
let mut result = String::new();
for c in original_string.chars() {
result.push(c);
result.push(c);
}
result
This works fine. but is there a more succinct (or more idiomatic) way to do this?
In JavaScript I would probably write something like:
original.split('').map(c => c+c).join('')
Or in Ruby:
(original.chars.map { |c| c+c }).join('')
Since Rust also has functional elements, I was wondering if there is a similarly succinct solution.
I would use std::iter::repeat to repeat every char value from the input. This creates an infinite iterator, but for your case we only need to iterate 2 times, so we can use take to limit our iterator, then flatten all the iterators that hold the doubled chars.
use std::iter;
fn main() {
let input = "abc"; //"abc".to_string();
let output = input
.chars()
.flat_map(|c| iter::repeat(c).take(2))
.collect::<String>();
println!("{:?}", output);
}
Playground
Note: To double we are using take(2) but you can use any usize to increase the repetition.
Personally, I would just do exactly what you're doing. Its intent is clear (more clear than the functional approaches you presented from JavaScript or Ruby, in my opinion) and it is efficient. The only thing I would change is perhaps reserve space for the characters, since you know exactly how much space you will need.
let mut result = String::with_capacity(original_string.len() * 2);
However, if you are really in love with this style, you could use flat_map
let result: String = original_string.chars()
.flat_map(|c| std::iter::repeat(c).take(2))
.collect();

How to prepend a slice to a Vec [duplicate]

This question already has answers here:
Efficiently insert or replace multiple elements in the middle or at the beginning of a Vec?
(3 answers)
Closed 5 years ago.
I was expecting a Vec::insert_slice(index, slice) method — a solution for strings (String::insert_str()) does exist.
I know about Vec::insert(), but that inserts only one element at a time, not a slice. Alternatively, when the prepended slice is a Vec one can append to it instead, but this does not generalize. The idiomatic solution probably uses Vec::splice(), but using iterators as in the example makes me scratch my head.
Secondly, the whole concept of prepending has seemingly been exorcised from the docs. There isn't a single mention. I would appreciate comments as to why. Note that relatively obscure methods like Vec::swap_remove() do exist.
My typical use case consists of indexed byte strings.
String::insert_str makes use of the fact that a string is essentially a Vec<u8>. It reallocates the underlying buffer, moves all the initial bytes to the end, then adds the new bytes to the beginning.
This is not generally safe and can not be directly added to Vec because during the copy the Vec is no longer in a valid state — there are "holes" in the data.
This doesn't matter for String because the data is u8 and u8 doesn't implement Drop. There's no such guarantee for an arbitrary T in a Vec, but if you are very careful to track your state and clean up properly, you can do the same thing — this is what splice does!
the whole concept of prepending has seemingly been exorcised
I'd suppose this is because prepending to a Vec is a poor idea from a performance standpoint. If you need to do it, the naïve case is straight-forward:
fn prepend<T>(v: Vec<T>, s: &[T]) -> Vec<T>
where
T: Clone,
{
let mut tmp: Vec<_> = s.to_owned();
tmp.extend(v);
tmp
}
This has a bit higher memory usage as we need to have enough space for two copies of v.
The splice method accepts an iterator of new values and a range of values to replace. In this case, we don't want to replace anything, so we give an empty range of the index we want to insert at. We also need to convert the slice into an iterator of the appropriate type:
let s = &[1, 2, 3];
let mut v = vec![4, 5];
v.splice(0..0, s.iter().cloned());
splice's implementation is non-trivial, but it efficiently does the tracking we need. After removing a chunk of values, it then reuses that chunk of memory for the new values. It also moves the tail of the vector around (maybe a few times, depending on the input iterator). The Drop implementation of Slice ensures that things will always be in a valid state.
I'm more surprised that VecDeque doesn't support it, as it's designed to be more efficient about modifying both the head and tail of the data.
Taking into consideration what Shepmaster said, you could implement a function prepending a slice with Copyable elements to a Vec just like String::insert_str() does in the following way:
use std::ptr;
unsafe fn prepend_slice<T: Copy>(vec: &mut Vec<T>, slice: &[T]) {
let len = vec.len();
let amt = slice.len();
vec.reserve(amt);
ptr::copy(vec.as_ptr(),
vec.as_mut_ptr().offset((amt) as isize),
len);
ptr::copy(slice.as_ptr(),
vec.as_mut_ptr(),
amt);
vec.set_len(len + amt);
}
fn main() {
let mut v = vec![4, 5, 6];
unsafe { prepend_slice(&mut v, &[1, 2, 3]) }
assert_eq!(&v, &[1, 2, 3, 4, 5, 6]);
}

How do I convert a Vec<String> to Vec<&str>?

I can convert Vec<String> to Vec<&str> this way:
let mut items = Vec::<&str>::new();
for item in &another_items {
items.push(item);
}
Are there better alternatives?
There are quite a few ways to do it, some have disadvantages, others simply are more readable to some people.
This dereferences s (which is of type &String) to a String "right hand side reference", which is then dereferenced through the Deref trait to a str "right hand side reference" and then turned back into a &str. This is something that is very commonly seen in the compiler, and I therefor consider it idiomatic.
let v2: Vec<&str> = v.iter().map(|s| &**s).collect();
Here the deref function of the Deref trait is passed to the map function. It's pretty neat but requires useing the trait or giving the full path.
let v3: Vec<&str> = v.iter().map(std::ops::Deref::deref).collect();
This uses coercion syntax.
let v4: Vec<&str> = v.iter().map(|s| s as &str).collect();
This takes a RangeFull slice of the String (just a slice into the entire String) and takes a reference to it. It's ugly in my opinion.
let v5: Vec<&str> = v.iter().map(|s| &s[..]).collect();
This is uses coercions to convert a &String into a &str. Can also be replaced by a s: &str expression in the future.
let v6: Vec<&str> = v.iter().map(|s| { let s: &str = s; s }).collect();
The following (thanks #huon-dbaupp) uses the AsRef trait, which solely exists to map from owned types to their respective borrowed type. There's two ways to use it, and again, prettiness of either version is entirely subjective.
let v7: Vec<&str> = v.iter().map(|s| s.as_ref()).collect();
and
let v8: Vec<&str> = v.iter().map(AsRef::as_ref).collect();
My bottom line is use the v8 solution since it most explicitly expresses what you want.
The other answers simply work. I just want to point out that if you are trying to convert the Vec<String> into a Vec<&str> only to pass it to a function taking Vec<&str> as argument, consider revising the function signature as:
fn my_func<T: AsRef<str>>(list: &[T]) { ... }
instead of:
fn my_func(list: &Vec<&str>) { ... }
As pointed out by this question: Function taking both owned and non-owned string collections. In this way both vectors simply work without the need of conversions.
All of the answers idiomatically use iterators and collecting instead of a loop, but do not explain why this is better.
In your loop, you first create an empty vector and then push into it. Rust makes no guarantees about the strategy it uses for growing factors, but I believe the current strategy is that whenever the capacity is exceeded, the vector capacity is doubled. If the original vector had a length of 20, that would be one allocation, and 5 reallocations.
Iterating from a vector produces an iterator that has a "size hint". In this case, the iterator implements ExactSizeIterator so it knows exactly how many elements it will return. map retains this and collect takes advantage of this by allocating enough space in one go for an ExactSizeIterator.
You can also manually do this with:
let mut items = Vec::<&str>::with_capacity(another_items.len());
for item in &another_items {
items.push(item);
}
Heap allocations and reallocations are probably the most expensive part of this entire thing by far; far more expensive than taking references or writing or pushing to a vector when no new heap allocation is involved. It wouldn't surprise me if pushing a thousand elements onto a vector allocated for that length in one go were faster than pushing 5 elements that required 2 reallocations and one allocation in the process.
Another unsung advantage is that using the methods with collect do not store in a mutable variable which one should not use if it's unneeded.
another_items.iter().map(|item| item.deref()).collect::<Vec<&str>>()
To use deref() you must add using use std::ops::Deref
This one uses collect:
let strs: Vec<&str> = another_items.iter().map(|s| s as &str).collect();
Here is another option:
use std::iter::FromIterator;
let v = Vec::from_iter(v.iter().map(String::as_str));
Note that String::as_str is stable since Rust 1.7.

How to get a slice from an Iterator?

I started to use clippy as a linter. Sometimes, it shows this warning:
writing `&Vec<_>` instead of `&[_]` involves one more reference and cannot be
used with non-Vec-based slices. Consider changing the type to `&[...]`,
#[warn(ptr_arg)] on by default
I changed the parameter to a slice but this adds boilerplate on the call side. For instance, the code was:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
but now it is:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect::<Vec<_>>();
function(&names);
otherwise, I get the following error:
error: the trait `core::marker::Sized` is not implemented for the type
`[collections::string::String]` [E0277]
So I wonder if there is a way to convert an Iterator to a slice or avoid having to specify the collected type in this specific case.
So I wonder if there is a way to convert an Iterator to a slice
There is not.
An iterator only provides one element at a time, whereas a slice is about getting several elements at a time. This is why you first need to collect all the elements yielded by the Iterator into a contiguous array (Vec) before being able to use a slice.
The first obvious answer is not to worry about the slight overhead, though personally I would prefer placing the type hint next to the variable (I find it more readable):
let names: Vec<_> = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
Another option would be for function to take an Iterator instead (and an iterator of references, at that):
let names = args.arguments.iter().map(|arg| &arg.name);
function(names);
After all, iterators are more general, and you can always "realize" the slice inside the function if you need to.
So I wonder if there is a way to convert an Iterator to a slice
There is. (in applicable cases)
Got here searching "rust iter to slice", for my use-case, there was a solution:
fn main() {
// example struct
#[derive(Debug)]
struct A(u8);
let list = vec![A(5), A(6), A(7)];
// list_ref passed into a function somewhere ...
let list_ref: &[A] = &list;
let mut iter = list_ref.iter();
// consume some ...
let _a5: Option<&A> = iter.next();
// now want to eg. return a slice of the rest
let slice: &[A] = iter.as_slice();
println!("{:?}", slice); // [A(6), A(7)]
}
That said, .as_slice is defined on an iter of an existing slice, so the previous answerer was correct in that if you've got, eg. a map iter, you would need to collect it first (so there is something to slice from).
docs: https://doc.rust-lang.org/std/slice/struct.Iter.html#method.as_slice

Resources