Why use .iter() in a for loop | Rust [duplicate] - rust

This question already has answers here:
What is the difference between iter and into_iter?
In these two examples is there any benefit in using .iter() in a for loop?
let chars = ['g', 'd', 'k', 'k', 'n'];
for i in chars {
println!("{}", i);
}
let chars = ['g', 'd', 'k', 'k', 'n'];
for i in chars.iter() {
println!("{}", i);
}

for i in array is desugared by the compiler into a call to IntoIterator::into_iter(array) (arrays implement this by value since Rust 1.53).
This means that you are iterating over elements of type char, and the array is copied (an array is Copy if its elements are Copy), so chars is still usable after the loop.
On the other hand, for i in array.iter() borrows the array and iterates over elements of type &char, avoiding the copy.
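The difference matters more once the elements are not Copy. The following is a minimal sketch; the helper names total_len_borrowed and total_len_owned are made up for this illustration and do not come from the question.

```rust
/// Hypothetical helper: borrows the array, so each item is a `&String`.
fn total_len_borrowed(words: &[String; 2]) -> usize {
    let mut total = 0;
    for w in words.iter() {
        total += w.len();
    }
    total
}

/// Hypothetical helper: consumes the array, so each item is a `String`.
fn total_len_owned(words: [String; 2]) -> usize {
    let mut total = 0;
    for w in words {
        total += w.len();
    }
    total
}

fn main() {
    let chars = ['g', 'd', 'k', 'k', 'n'];
    for c in chars {
        let _by_value: char = c; // items are `char`
    }
    for c in chars.iter() {
        let _by_ref: &char = c; // items are `&char`
    }
    // `[char; 5]` is Copy, so `chars` is still usable after both loops.
    println!("{:?}", chars);

    let words = [String::from("ab"), String::from("cde")];
    println!("{}", total_len_borrowed(&words)); // `words` is only borrowed
    println!("{}", total_len_owned(words)); // `words` is moved here
    // Using `words` after this point would not compile.
}
```

For an array of char either form works; with non-Copy elements, .iter() (or iterating over &array) is what keeps the array available afterwards.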

Related

Efficient ways to build new Strings in Rust

I have just recently started learning Rust and have been messing around with some code. I wanted to create a simple function that removes vowels from a String and returns a new String. The code below functions, but I was concerned if this truly was a valid, typical approach in this language or if I'm missing something...
// remove vowels by building a String using .contains() on a vowel array
fn remove_vowels(s: String) -> String {
let mut no_vowels: String = String::new();
for c in s.chars() {
if !['a', 'e', 'i', 'o', 'u'].contains(&c) {
no_vowels += &c.to_string();
}
}
return no_vowels;
}
First, using to_string() to construct a new String and then using & to borrow just seemed off. Is there a simpler way to append characters to a String, or is this the only way to go? Or should I rewrite this entirely and iterate through the inputted String using a loop by its length, not by a character array?
Also, I have been informed that it's quite popular in Rust to not use the return statement but to instead let the last expression return the value from the function. Is my return statement required here, or is there a cleaner way to return that value that follows convention?
If you consume the original String as your example does, you can remove the vowels in-place using retain(), which will avoid allocating a new string:
fn remove_vowels(mut s: String) -> String {
s.retain(|c| !['a', 'e', 'i', 'o', 'u'].contains(&c));
s
}
Side note: you may want to consider uppercase vowels as well.
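One way to cover uppercase vowels, sketched here under the assumption that only the ASCII vowels matter, is to lowercase each character before testing it:

```rust
/// Removes ASCII vowels in place, ignoring case.
fn remove_vowels(mut s: String) -> String {
    // `to_ascii_lowercase` leaves non-ASCII characters untouched.
    s.retain(|c| !matches!(c.to_ascii_lowercase(), 'a' | 'e' | 'i' | 'o' | 'u'));
    s
}

fn main() {
    println!("{}", remove_vowels(String::from("Hello Audio")));
}
```

Accented vowels such as 'é' are not handled by this sketch; that would need a different notion of "vowel" than the one in the question.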
You can use collect on an iterator of characters to create a String. You can filter out the characters you don't want using filter.
// remove vowels by building a String using .contains() on a vowel array
fn remove_vowels(s: &str) -> String {
s.chars()
.filter(|c| !['a', 'e', 'i', 'o', 'u'].contains(c))
.collect()
}
If this is in a performance-critical region, you can work on the bytes directly: the vowels being removed are single-byte ASCII characters, and in UTF-8 an ASCII byte never occurs inside a multi-byte sequence, so filtering them out always leaves valid UTF-8. That means you can write something like
fn remove_vowels(s: &str) -> String {
String::from_utf8(
s.bytes()
.filter(|c| ![b'a', b'e', b'i', b'o', b'u'].contains(c))
.collect()
).unwrap()
}
which may be more efficient.
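As for the two questions in the original post: String::push appends a single char without building a temporary String, and the final expression of a function body is its return value, so the explicit return can be dropped. A minimal sketch of the original loop with both changes (the with_capacity call is an optional addition, not something from the question):

```rust
fn remove_vowels(s: &str) -> String {
    // The result can never be longer than the input, so reserve that much.
    let mut no_vowels = String::with_capacity(s.len());
    for c in s.chars() {
        if !['a', 'e', 'i', 'o', 'u'].contains(&c) {
            no_vowels.push(c); // push a char directly, no to_string() needed
        }
    }
    no_vowels // trailing expression, no `return` keyword
}

fn main() {
    println!("{}", remove_vowels("education"));
}
```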

Convert String to Vec<char> at compile time for pattern matching

I'm writing a parser in Rust and I'm creating tokens from a Vec<char>. Currently, my code looks like
match &source[..] {
['l', 'e', 't', ..] => ...,
['t', 'r', 'u', 'e', ..] => ...,
_ => ...
}
Obviously this is a lot more verbose than I'd like, and not easy to read. Is there any way I can convert "let" to ['l', 'e', 't'] at compile time (with a macro or const function) in order to pattern match on it like this?
I don't think that you can do that with the macros from the Rust standard library, but you could write your own macro:
use proc_macro::{TokenStream, TokenTree, Group, Delimiter, Punct, Literal, Spacing};
use syn::{parse_macro_input, LitStr};
#[proc_macro]
pub fn charize(input: TokenStream) -> TokenStream {
// some stuff for later
let comma_token = TokenTree::Punct(Punct::new(',', Spacing::Alone));
let rest_token_iterator = std::iter::once(TokenTree::Punct(Punct::new('.', Spacing::Joint))).chain(std::iter::once(TokenTree::Punct(Punct::new('.', Spacing::Alone))));
let string_to_charize: String = parse_macro_input!(input as LitStr).value();
let char_tokens_iterator = string_to_charize.chars().map(|char| TokenTree::Literal(Literal::character(char)));
// if you are on nightly, Iterator::intersperse() is much cleaner than this (https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.intersperse)
let char_tokens_interspersed_iterator = char_tokens_iterator.map(|token| [comma_token.clone(), token]).flatten().skip(1);
let char_tokens_interspersed_with_rest_iterator = char_tokens_interspersed_iterator.chain(std::iter::once(comma_token.clone())).chain(rest_token_iterator);
std::iter::once(TokenTree::Group(Group::new(Delimiter::Bracket, char_tokens_interspersed_with_rest_iterator.collect()))).collect()
}
Macro in action:
match &['d', 'e', 'm', 'o', 'n', 's', 't', 'r', 'a', 't', 'i', 'o', 'n'][..] {
charize!("doesn't match") => println!("Does not match"),
charize!("demo") => println!("It works"),
charize!("also doesn't match") => println!("Does not match"),
_ => panic!("Does not match")
}
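Note that this is a procedural macro and as such must live in its own proc-macro crate (one with proc-macro = true under [lib] in its Cargo.toml). The example also depends on the syn crate.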
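If a separate proc-macro crate is more machinery than the parser warrants, a different technique is to skip slice patterns and compare the prefix at run time. This is only a sketch: starts_with_str and classify are hypothetical helpers, and the check is no longer expressed as a match pattern.

```rust
/// Hypothetical helper: true if `source` begins with the characters of `keyword`.
fn starts_with_str(source: &[char], keyword: &str) -> bool {
    let mut rest = source.iter();
    // Every keyword char must equal the next source char, in order.
    keyword.chars().all(|k| rest.next() == Some(&k))
}

/// Hypothetical stand-in for the tokenizer's match arms.
fn classify(source: &[char]) -> &'static str {
    if starts_with_str(source, "let") {
        "let"
    } else if starts_with_str(source, "true") {
        "true"
    } else {
        "other"
    }
}

fn main() {
    let source: Vec<char> = "let x = 1".chars().collect();
    println!("{}", classify(&source));
}
```

Like the ['l', 'e', 't', ..] pattern, this only checks the prefix, so an identifier such as "letter" would also be classified as "let".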

Move items from vec to other by indexes in Rust

I have a vector items of items and a vector idxs of indexes. How can I get a vector picked that is filled by moving, all at once, the values of items at the positions listed in idxs?
For example:
let mut items: Vec<char> = vec!['a', 'b', 'c', 'd', 'e', 'f'];
let idxs: Vec<usize> = vec![3, 4, 1];
let picked = pick(&mut items, &idxs);
// items should be: ['a', 'c', 'f']
// picked should be: ['d', 'e', 'b']
I can make it with:
let mut picked: Vec<char> = Vec::new();
let placeholder = 'z';
for idx in idxs {
items.insert(idx, placeholder); // insert any placeholder value of type T for keeping order
let item = items.remove(idx + 1);
picked.push(item);
}
items.retain(|item| *item != placeholder);
But I think this is overkill. Keeping a placeholder value for each different type is also complicated, and in my case I have to avoid it.
Is there a more idiomatic way to do that?
Here are two algorithms for the problem.
The following algorithm is O(n + m), where n is the length of items and m is the length of idxs. That is the best possible asymptotic run time assuming that items must stay in its original order, since in that case potentially all remaining elements have to be moved to compact them after the removals.
fn pick<T>(items: &mut Vec<T>, idxs: &[usize]) -> Vec<T> {
// Move the items into a vector of Option<T> we can remove items from
// without reordering.
let mut opt_items: Vec<Option<T>> = items.drain(..).map(Some).collect();
// Take the items.
let picked: Vec<T> = idxs
.iter()
.map(|&i| opt_items[i].take().expect("duplicate index"))
.collect();
// Put the unpicked items back.
items.extend(opt_items.into_iter().flatten());
picked
}
fn main() {
let mut items: Vec<char> = vec!['a', 'b', 'c', 'd', 'e', 'f'];
let idxs: Vec<usize> = vec![3, 4, 1];
let picked = pick(&mut items, &idxs);
dbg!(picked, items);
}
This algorithm is instead O(m log m) (where m is the length of idxs). The price for this is that it reorders the un-picked elements of items.
fn pick<T>(items: &mut Vec<T>, idxs: &[usize]) -> Vec<T> {
// Second element is the index into `idxs`.
let mut sorted_idxs: Vec<(usize, usize)> =
idxs.iter().copied().enumerate().map(|(ii, i)| (i, ii)).collect();
sorted_idxs.sort();
// Set up random-access output storage.
let mut output: Vec<Option<T>> = Vec::new();
output.resize_with(idxs.len(), || None);
// Take the items, in reverse sorted order.
// Reverse order ensures that `swap_remove` won't move any item we want.
for (i, ii) in sorted_idxs.into_iter().rev() {
output[ii] = Some(items.swap_remove(i));
}
// Unwrap the temporary `Option`s.
output.into_iter().map(Option::unwrap).collect()
}
Both of these algorithms could be optimized by using unsafe code to work with uninitialized/moved memory instead of using vectors of Option. The second algorithm would then need a check for duplicate indices to be safe.
If idxs is unsorted and order matters, and if you can't use a placeholder, then you can move the items like this (each remove shifts the tail of items, so this is quadratic in the worst case):
let mut picked: Vec<char> = Vec::new();
let mut idxs = idxs.clone(); // Not required if you are allowed to mutate the original idxs.
for i in 0..idxs.len() {
picked.push(items.remove(idxs[i]));
for j in i + 1..idxs.len() {
if idxs[j] > idxs[i] { idxs[j] -= 1; }
}
}
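The same approach can be wrapped in a generic function, so that it works for any T without a placeholder. This is a sketch only: the name pick_in_order is made up here, and it assumes idxs contains no duplicates and no out-of-range indices (remove panics on the latter).

```rust
/// Hypothetical wrapper around the remove-and-adjust loop.
/// Assumes `idxs` holds distinct, in-range indices.
fn pick_in_order<T>(items: &mut Vec<T>, idxs: &[usize]) -> Vec<T> {
    let mut idxs = idxs.to_vec(); // local copy we are free to adjust
    let mut picked = Vec::with_capacity(idxs.len());
    for i in 0..idxs.len() {
        let removed = idxs[i];
        picked.push(items.remove(removed));
        // Everything after `removed` shifted left by one position.
        for later in &mut idxs[i + 1..] {
            if *later > removed {
                *later -= 1;
            }
        }
    }
    picked
}

fn main() {
    let mut items = vec!['a', 'b', 'c', 'd', 'e', 'f'];
    let picked = pick_in_order(&mut items, &[3, 4, 1]);
    println!("{:?} {:?}", picked, items);
}
```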

Creating a sliding window iterator of slices of chars from a String

I am looking for the best way to go from String to Windows<T> using the windows function provided for slices.
I understand how to use windows this way:
fn main() {
let tst = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];
let mut windows = tst.windows(3);
// prints ['a', 'b', 'c']
println!("{:?}", windows.next().unwrap());
// prints ['b', 'c', 'd']
println!("{:?}", windows.next().unwrap());
// etc...
}
But I am a bit lost when working this problem:
fn main() {
let tst = String::from("abcdefg");
let inter = ? //somehow create slice of character from tst
let mut windows = inter.windows(3);
// prints ['a', 'b', 'c']
println!("{:?}", windows.next().unwrap());
// prints ['b', 'c', 'd']
println!("{:?}", windows.next().unwrap());
// etc...
}
Essentially, I am looking for how to convert a string into a char slice that I can use the window method with.
The problem that you are facing is that String is really represented as something like a Vec<u8> under the hood, with some APIs to let you access chars. In UTF-8 the representation of a code point can be anything from 1 to 4 bytes, and they are all compacted together for space-efficiency.
The only slice you could get directly of an entire String, without copying everything, would be a &[u8], but you wouldn't know if the bytes corresponded to whole or just parts of code points.
The char type corresponds exactly to a code point, and therefore has a size of 4 bytes, so that it can accommodate any possible value. So, if you build a slice of char by copying from a String, the result could be up to 4 times larger.
To avoid making a potentially large, temporary memory allocation, you should consider a more lazy approach – iterate through the String, making slices at exactly the char boundaries. Something like this:
fn char_windows<'a>(src: &'a str, win_size: usize) -> impl Iterator<Item = &'a str> {
src.char_indices()
.flat_map(move |(from, _)| {
src[from..].char_indices()
.nth(win_size - 1)
.map(|(to, c)| {
&src[from .. from + to + c.len_utf8()]
})
})
}
This will give you an iterator where the items are &str, each with 3 chars:
let windows = char_windows(&tst, 3);
for win in windows {
println!("{:?}", win);
}
The nice thing about this approach is that it hasn't done any copying at all - each &str produced by the iterator is still a slice into the original source String.
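To see that the windows respect char boundaries, here is a self-contained variant of char_windows (using nth in place of skip followed by next) run on a multi-byte string. It assumes win_size is at least 1; a value of 0 would underflow in win_size - 1.

```rust
/// Yields every substring of `src` that is exactly `win_size` chars long.
/// Assumes `win_size >= 1`.
fn char_windows<'a>(src: &'a str, win_size: usize) -> impl Iterator<Item = &'a str> {
    src.char_indices().flat_map(move |(from, _)| {
        src[from..]
            .char_indices()
            .nth(win_size - 1) // last char of the window, if there is one
            .map(|(to, c)| &src[from..from + to + c.len_utf8()])
    })
}

fn main() {
    // Each of these characters takes 3 bytes in UTF-8.
    for win in char_windows("日本語です", 3) {
        println!("{}", win);
    }
}
```

Starting positions too close to the end simply produce no window, so an input shorter than win_size yields an empty iterator.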
All of that complexity is because Rust strings are always UTF-8 encoded. If you absolutely know that your input string doesn't contain any multi-byte characters, you can treat it as ASCII bytes, and taking slices becomes easy:
let tst = String::from("abcdefg");
let inter = tst.as_bytes();
let mut windows = inter.windows(3);
However, you now have slices of bytes, and you'll need to turn them back into strings to do anything with them:
for win in windows {
println!("{:?}", String::from_utf8_lossy(win));
}
This solution will work for your purpose.
fn main() {
let tst = String::from("abcdefg");
let inter = tst.chars().collect::<Vec<char>>();
let mut windows = inter.windows(3);
// prints ['a', 'b', 'c']
println!("{:?}", windows.next().unwrap());
// prints ['b', 'c', 'd']
println!("{:?}", windows.next().unwrap());
// etc...
println!("{:?}", windows.next().unwrap());
}
String can iterate over its chars, but it's not a slice, so you have to collect it into a vec, which then coerces into a slice.
You can use itertools to walk over windows of any iterator, up to a width of 4 in the version used here (newer releases of itertools support wider tuples):
extern crate itertools; // 0.7.8
use itertools::Itertools;
fn main() {
let input = "日本語";
for (a, b) in input.chars().tuple_windows() {
println!("{}, {}", a, b);
}
}
See also:
Are there equivalents to slice::chunks/windows for iterators to loop over pairs, triplets etc?

How do I convert from a char array [char; N] to a string slice &str?

Given a fixed-length char array such as:
let s: [char; 5] = ['h', 'e', 'l', 'l', 'o'];
How do I obtain a &str?
You can't without some allocation, which means you will end up with a String.
let s2: String = s.iter().collect();
The problem is that strings in Rust are not collections of chars, they are UTF-8, which is an encoding without a fixed size per character.
For example, the array in this case takes 5 × 32 bits, for a total of 20 bytes. The data of the string takes 5 bytes in total (although a String also holds 3 pointer-sized values, so the overall String takes more memory in this case).
We start with the array and call iter (the slice method <[char]>::iter), which yields values of type &char. We then use Iterator::collect to convert the Iterator<Item = &char> into a String. This uses the iterator's size_hint to pre-allocate space in the String, reducing the need for extra allocations.
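The size difference and the final step to a &str can be checked directly. In this sketch, chars_to_string is a hypothetical helper name; the &str is simply a borrow of the collected String.

```rust
/// Hypothetical helper: collects a slice of chars into an owned String.
fn chars_to_string(chars: &[char]) -> String {
    chars.iter().collect()
}

fn main() {
    let s: [char; 5] = ['h', 'e', 'l', 'l', 'o'];
    let s2 = chars_to_string(&s);
    let view: &str = s2.as_str(); // the &str borrows from the String

    println!("array bytes:  {}", std::mem::size_of_val(&s)); // 5 chars * 4 bytes
    println!("string bytes: {}", s2.len()); // UTF-8 length of the text
    println!("{}", view);
}
```

The &str is only valid for as long as s2 is alive, which is why an owned String is needed somewhere.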
Another quick one-liner I didn't see above:
let whatever_char_array = ['h', 'e', 'l', 'l', 'o'];
let string_from_char_array = String::from_iter(whatever_char_array);
Note:
Iterating over an array by value, which this relies on, was stabilized in Rust 1.53. In editions before 2021, FromIterator is not in the prelude, so String::from_iter also needs use std::iter::FromIterator;.
Here is a very simple working solution. It's not the best one, but it shows some basics:
let s: [char; 5] = ['h', 'e', 'l', 'l', 'o'];
let mut str = String::new();
for x in &s {
str.push(*x);
}
println!("{}", str);
You can prefix a variable name with an underscore to silence unused-variable warnings, but in this simple example that is not necessary. The program starts by creating an empty mutable String so that chars can be appended to it. Then a for loop iterates over a reference to the s array and pushes each element onto the string. At the end you can return the string or just print it.
