Rust - Multiple Calls to Iterator Methods

I have this following rust code:
fn tokenize(line: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut chars = line.char_indices();
    for (i, c) in chars {
        match c {
            '"' => {
                if let Some(pos) = chars.position(|(_, x)| x == '"') {
                    tokens.push(&line[i..=i+pos]);
                } else {
                    // Not a complete string
                }
            }
            // Other options...
        }
    }
    tokens
}
I am trying to elegantly extract a string surrounded by double quotes from the line, but since chars.position takes a mutable reference and chars is moved into the for loop, I get a compilation error - "value borrowed after move". The compiler suggests borrowing chars in the for loop but this doesn't work because an immutable reference is not an iterator (and a mutable one would cause the original problem where I can't borrow mutably again for position).
I feel like there should be a simple solution to this.
Is there an idiomatic way to do this or do I need to regress to appending characters one by one?

A for loop takes ownership of chars (it calls .into_iter() on it), so chars can't be used inside the loop body. Instead, iterate manually with a while let loop:
fn tokenize(line: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut chars = line.char_indices();
    // `while let` only borrows `chars` for each `next()` call,
    // so the body is free to advance it further with `position`
    while let Some((i, c)) = chars.next() {
        match c {
            '"' => {
                if let Some(pos) = chars.position(|(_, x)| x == '"') {
                    tokens.push(&line[i..=i+pos]);
                } else {
                    // Not a complete string
                }
            }
            // Other options...
            _ => {}
        }
    }
    tokens
}

It works if you just desugar the for-loop:
fn tokenize(line: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut chars = line.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            '"' => {
                if let Some(pos) = chars.position(|(_, x)| x == '"') {
                    tokens.push(&line[i..=i+pos]);
                } else {
                    // Not a complete string
                }
            },
            _ => {},
        }
    }
    tokens
}
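Note that in both loops above, pos comes from position, which counts chars from wherever the iterator currently is (just past the opening quote), not bytes from the start of line. So for ASCII input i + pos lands one char before the closing quote, and when multi-byte chars are involved the mix of char counts and byte offsets can give a wrong slice or panic at a non-char boundary. A minimal sketch that avoids this, assuming you want each token to include both quotes, uses Iterator::find on the same CharIndices, which yields the closing quote's byte index directly:

```rust
// Sketch: collect "quoted" tokens (quotes included) from a line.
// find() on CharIndices returns the byte index of the closing quote,
// so the slice stays correct even with multi-byte chars inside.
fn tokenize(line: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut chars = line.char_indices();
    while let Some((start, c)) = chars.next() {
        match c {
            '"' => {
                // Advances `chars` past the closing quote, if there is one
                if let Some((end, _)) = chars.find(|&(_, x)| x == '"') {
                    tokens.push(&line[start..=end]);
                } else {
                    // Not a complete string
                }
            }
            _ => {}
        }
    }
    tokens
}

fn main() {
    let line = r#"say "héllo" and "bye" then "oops"#;
    println!("{:?}", tokenize(line));
}
```

The unterminated "oops at the end is dropped, matching the "Not a complete string" branch.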
The normal for-loop prevents additional modification of the iterator because this usually leads to surprising and hard-to-read code. Doing it as a while-loop has no such protection.
If all you want to do is find quoted strings, I would not, however, go with an iterator at all here.
fn tokenize(line: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut line = line;
    while let Some(pos) = line.find('"') {
        line = &line[(pos+1)..];
        if let Some(end) = line.find('"') {
            tokens.push(&line[..end]);
            line = &line[(end+1)..];
        } else {
            // Not a complete string
        }
    }
    tokens
}
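On Rust 1.52 or newer, the same str-based approach can also be written with str::split_once, which splits at the first occurrence of a pattern and hands back both halves. A sketch; like the str::find version, it returns the contents without the quotes and stops at an unterminated quote:

```rust
// Sketch: quoted contents (without quotes) via str::split_once (Rust 1.52+).
fn tokenize(line: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut rest = line;
    // Skip to just past the next opening quote...
    while let Some((_, after_open)) = rest.split_once('"') {
        // ...then split off everything up to the closing quote
        match after_open.split_once('"') {
            Some((quoted, after_close)) => {
                tokens.push(quoted);
                rest = after_close;
            }
            None => break, // Not a complete string
        }
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize(r#"say "hi" and "bye" then "oops"#));
}
```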

Related

Conditionally sort a Vec in Rust

Let's say I want to sort a Vec of non-Clone items - but only maybe (this is a boiled down example of an issue in my code).
My attempt would be something like:
fn maybe_sort<T>(x: Vec<T>) -> Vec<T>
where
    T: std::cmp::Ord,
{
    // First, I need a copy of the vector - but only the vector, not the items inside
    let mut copied = x.iter().collect::<Vec<_>>();
    copied.sort();
    // In my actual code the line below depends on the sorted vec
    if rand::random() {
        return copied.into_iter().map(|x| *x).collect::<Vec<_>>();
    } else {
        return x;
    }
}
Alas the borrow checker isn't happy. I have a shared reference to each item in the Vec, and although I am not ever returning 2 references to the same item, Rust can't tell.
Is there a way to do this without unsafe? (And if not, what's the cleanest way to do it with unsafe?)
You can .enumerate() the values to keep each one's original index, then sort the pairs by value T. Afterwards, either return them in that sorted order, or undo the sort by sorting again on the original index.
fn maybe_sort<T: Ord>(x: Vec<T>) -> Vec<T> {
    let mut items: Vec<_> = x.into_iter().enumerate().collect();
    items.sort_by(|(_, a), (_, b)| a.cmp(b));
    if rand::random() {
        // return items in current order
    } else {
        // undo the sort
        items.sort_by_key(|(index, _)| *index);
    }
    items.into_iter().map(|(_, value)| value).collect()
}
If T implements Default, you can do it with a single sort and without unsafe like this:
fn maybe_sort<T: Ord + Default>(mut x: Vec<T>) -> Vec<T> {
    // idx[k] is the position in x of the k-th smallest element
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        return x;
    } else {
        // Move the elements out of x in sorted order, leaving T::default()
        // behind, so that r[k] = x[idx[k]]
        let r: Vec<T> = idx.into_iter().map(|i| std::mem::take(&mut x[i])).collect();
        return r;
    }
}
If T does not implement Default, the same thing can be done with MaybeUninit:
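use std::mem::{ManuallyDrop, MaybeUninit};
fn maybe_sort<T: Ord>(mut x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        return x;
    } else {
        // rank[j] is the sorted position of the element currently at x[j]
        let mut rank = vec![0; x.len()];
        for (k, &j) in idx.iter().enumerate() {
            rank[j] = k;
        }
        let mut r: Vec<MaybeUninit<T>> = Vec::new();
        r.resize_with(x.len(), MaybeUninit::uninit);
        for (j, v) in x.drain(..).enumerate() {
            r[rank[j]] = MaybeUninit::new(v);
        }
        // Every slot is initialized now. Rebuild the Vec from its raw parts
        // instead of transmuting, since Vec's own layout is unspecified.
        let mut r = ManuallyDrop::new(r);
        return unsafe { Vec::from_raw_parts(r.as_mut_ptr() as *mut T, r.len(), r.capacity()) };
    }
}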
Finally, here's a safe solution which doesn't require T to implement Default, but allocates an extra buffer (there is theoretically a way to reorder the indices in place, but I'll leave it as an exercise to the reader ☺):
fn maybe_sort<T: Ord>(mut x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        let mut rev = vec![0; x.len()];
        for (i, &j) in idx.iter().enumerate() {
            rev[j] = i;
        }
        for i in 0..x.len() {
            while rev[i] != i {
                let j = rev[i];
                x.swap(j, i);
                rev.swap(j, i);
            }
        }
    }
    x
}
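Yet another safe option, needing neither Default nor unsafe, is to wrap every element in Option and take() them out in sorted order. It costs one extra Vec<Option<T>> allocation. In this sketch the rand::random() call is replaced by a plain keep_sorted: bool parameter (an assumption, so the example runs with only std), and NoClone is a hypothetical non-Clone type for the demo:

```rust
// Sketch: conditionally sort a Vec of non-Clone items without unsafe or Default.
fn maybe_sort<T: Ord>(x: Vec<T>, keep_sorted: bool) -> Vec<T> {
    // idx[k] is the position in x of the k-th smallest element
    let mut idx: Vec<usize> = (0..x.len()).collect();
    idx.sort_by_key(|&i| &x[i]);
    // In the real code this decision could inspect x in sorted order via idx
    if !keep_sorted {
        return x;
    }
    // Wrap each element so it can be moved out exactly once
    let mut slots: Vec<Option<T>> = x.into_iter().map(Some).collect();
    idx.into_iter()
        .map(|i| slots[i].take().expect("idx is a permutation, so each slot is taken once"))
        .collect()
}

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct NoClone(u32);

fn main() {
    let v = vec![NoClone(3), NoClone(1), NoClone(2)];
    println!("{:?}", maybe_sort(v, true));
}
```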

How can I duplicate the first and last elements of a vector?

I would like to take a vector of characters and duplicate the first letter and the last one.
The only way I managed to do that is with this ugly code:
fn repeat_ends(s: &Vec<char>) -> Vec<char> {
    let mut result: Vec<char> = Vec::new();
    let first = s.first().unwrap();
    let last = s.last().unwrap();
    result.push(*first);
    result.append(&mut s.clone());
    result.push(*last);
    result
}
fn main() {
    let test: Vec<char> = String::from("Hello world !").chars().collect();
    println!("{:?}", repeat_ends(&test)); // "HHello world !!"
}
What would be a better way to do it?
I am not sure if it is "better" but one way is using slice patterns:
fn repeat_ends(s: &Vec<char>) -> Vec<char> {
    match s[..] {
        [first, .., last] => {
            let mut out = Vec::with_capacity(s.len() + 2);
            out.push(first);
            out.extend(s);
            out.push(last);
            out
        },
        _ => panic!("whatever"), // or s.clone()
    }
}
If it can be mutable:
fn repeat_ends(s: &mut Vec<char>) {
    if let [first, .., last] = s[..] {
        s.insert(0, first);
        s.push(last);
    }
}
If it's ok to mutate the original vector, this does the job:
fn repeat_ends(s: &mut Vec<char>) {
    let first = *s.first().unwrap();
    s.insert(0, first);
    let last = *s.last().unwrap();
    s.push(last);
}
fn main() {
    let mut test: Vec<char> = String::from("Hello world !").chars().collect();
    repeat_ends(&mut test);
    println!("{}", test.into_iter().collect::<String>()); // "HHello world !!"
}
Vec::insert:
Inserts an element at position index within the vector, shifting all elements after it to the right.
This means the function repeat_ends would be O(n) with n being the number of characters in the vector. I'm not sure if there is a more efficient method if you need to use a vector, but I'd be curious to hear it if there is.
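For the non-mutating case, the result can also be built in a single expression with std::iter::once and chain. It is still O(n), which is unavoidable because every character is copied into the new Vec, but it never shifts elements the way Vec::insert does. This sketch takes &[char] instead of &Vec<char> and, as an assumption, returns an empty Vec for empty input instead of panicking:

```rust
use std::iter;

// Returns a new Vec with the first and last elements duplicated.
// Assumption: empty input yields an empty Vec rather than a panic.
fn repeat_ends(s: &[char]) -> Vec<char> {
    match (s.first(), s.last()) {
        (Some(&first), Some(&last)) => iter::once(first)
            .chain(s.iter().copied())
            .chain(iter::once(last))
            .collect(),
        _ => Vec::new(),
    }
}

fn main() {
    let test: Vec<char> = "Hello world !".chars().collect();
    let out: String = repeat_ends(&test).into_iter().collect();
    println!("{}", out);
}
```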

How to shuffle a vector except for the first and last elements without using third party libraries?

I have a task to shuffle words but the first and last letter of every word must be unchanged. When I try to use filter() it doesn't work properly.
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";
fn main() {
    print!("MAIN:{:?}", mix("Evening,morning"));
}
fn mix(s: &str) -> String {
    let mut a: Vec<char> = s.chars().collect();
    for group in a.split_mut(|num| SEPARATORS.contains(*num)) {
        if group.len() > 4 {
            let k = group.first().unwrap().clone();
            let c = group[group.len() - 1].clone();
            group
                .chunks_exact_mut(2)
                .filter(|x| x != &[k])
                .for_each(|x| x.swap(0, 1))
        }
    }
    let s: String = a.iter().collect();
    s
}
Is this what you are looking for? It swaps the first and last letters of every word longer than four characters:
fn mix(s: &str) -> String {
    let mut a: Vec<char> = s.chars().collect();
    for word in a.split_mut(|c| SEPARATORS.contains(*c)) {
        if word.len() > 4 {
            // Swap the first and last letters of the word
            let last = word.len() - 1;
            word.swap(0, last);
        }
    }
    a.iter().collect()
}
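The code above swaps the first and last letters rather than keeping them fixed. If the goal is the one stated in the question (shuffle each word but leave its first and last letters in place, without third-party crates), a minimal sketch is a Fisher-Yates shuffle over the interior of each group. Since std has no public random-number API, the seed here comes from std's randomly keyed RandomState hasher and drives a tiny xorshift64 generator; both are illustrative choices (not cryptographically strong, with slight modulo bias), and XorShift and shuffle_interior are hypothetical names:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";

// Tiny xorshift64 generator (illustrative only, not cryptographically secure).
struct XorShift(u64);

impl XorShift {
    fn seeded() -> Self {
        // RandomState is randomly keyed, so hashing nothing gives a std-only seed.
        // `| 1` keeps the state non-zero (xorshift gets stuck at 0).
        XorShift(RandomState::new().build_hasher().finish() | 1)
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Fisher-Yates shuffle of everything except the first and last element.
fn shuffle_interior<T>(group: &mut [T], rng: &mut XorShift) {
    if group.len() < 4 {
        return; // fewer than 2 interior elements: nothing to shuffle
    }
    let end = group.len() - 1;
    let interior = &mut group[1..end];
    for i in (1..interior.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        interior.swap(i, j);
    }
}

fn mix(s: &str) -> String {
    let mut rng = XorShift::seeded();
    let mut a: Vec<char> = s.chars().collect();
    for word in a.split_mut(|c| SEPARATORS.contains(*c)) {
        shuffle_interior(word, &mut rng);
    }
    a.iter().collect()
}

fn main() {
    println!("{}", mix("Evening,morning"));
}
```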

How to skip n items from inside of an iterator loop?

This code:
fn main() {
    let text = "abcd";
    for char in text.chars() {
        if char == 'b' {
            // skip 2 chars
        }
        print!("{}", char);
    }
    // prints `abcd`, but I want `ad`
}
prints abcd, but I want to skip 2 chars if b was found, so that it prints ad. How do I do that?
I tried to put the iterator into a variable outside the loop and manipulate that iterator within the loop, but the Borrow Checker doesn't allow that.
AFAIK you can't do that with a for loop. You will need to desugar it by hand:
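let text = "abcd";
let mut it = text.chars();
while let Some(char) = it.next() {
    if char == 'b' {
        // `continue` already skips 'b' itself; consume one more char ('c')
        // so 2 chars are skipped in total. To drop k extra chars, use
        // it.nth(k - 1), which consumes exactly k items.
        it.next();
        continue;
    }
    print!("{}", char);
}
// prints `ad`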
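The same idea generalizes to skipping n chars in total, starting at a trigger char. In this sketch, skip_after is a hypothetical helper; it uses by_ref().take(k) so that at most k further chars are consumed, even near the end of the input:

```rust
// Drops each `trigger` char together with the `n - 1` chars that follow it,
// so `n` chars are skipped in total (as with 'b' in "abcd" above).
fn skip_after(text: &str, trigger: char, n: usize) -> String {
    let mut out = String::new();
    let mut it = text.chars();
    while let Some(c) = it.next() {
        if c == trigger {
            // `c` is already consumed; discard up to n - 1 more chars.
            // by_ref() lets take() borrow `it` instead of consuming it.
            it.by_ref().take(n.saturating_sub(1)).for_each(drop);
            continue;
        }
        out.push(c);
    }
    out
}

fn main() {
    println!("{}", skip_after("abcd", 'b', 2));
}
```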
If you want to keep an iterator style, you can use std::iter::successors (I've replaced the special char with '!' for readability; here '!' drops itself and the char right after it):
fn my_iter<'a>(s: &'a str) -> impl Iterator<Item = char> + 'a {
    let mut it = s.chars();
    std::iter::successors(it.next(), move |c| {
        if *c == '!' {
            it.next().and_then(|_| it.next())
        } else {
            it.next()
        }
    })
    .filter(|c| *c != '!')
}
fn main() {
    assert!(my_iter("a!bc").eq("ac".chars()));
    assert!(my_iter("!abcd").eq("bcd".chars()));
    assert!(my_iter("abc!d").eq("abc".chars()));
    assert!(my_iter("abcd!").eq("abcd".chars()));
}

Using the same iterator multiple times in Rust

Editor's note: This code example is from a version of Rust prior to 1.0 when many iterators implemented Copy. Updated versions of this code produce different errors, but the answers still contain valuable information.
I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        match iter.peek() {
            None => return bits,
            Some(c) => if c.is_digit() {
                bits.push(iter.take_while(|c| c.is_digit()).collect());
            } else {
                bits.push(iter.take_while(|c| !c.is_digit()).collect());
            }
        }
    }
    return bits;
}
However, this doesn't work: it loops forever. It seems to use a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator across all the take_while calls. Is this possible?
As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 — editor)
You want to modify the iterator each time, that is, you want take_while to operate on an &mut to your iterator. That is exactly what the .by_ref adaptor is for:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        match iter.peek().map(|c| *c) {
            None => return bits,
            Some(c) => if c.is_digit(10) {
                bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
            } else {
                bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
            },
        }
    }
}
fn main() {
    println!("{:?}", split("123abc456def"))
}
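Prints
["123", "bc", "56", "ef"]
which is not correct: to decide where a run ends, take_while has to pull the first non-matching char out of the iterator, so the 'a' and the '4' are consumed and lost.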
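On Rust 1.51 or newer there is a way to keep the single Peekable iterator without losing characters: Peekable::next_if consumes the next item only if the predicate accepts it and otherwise leaves it in place. Wrapping it in std::iter::from_fn gives a take_while that stops before the first non-matching char. A sketch:

```rust
fn split(input: &str) -> Vec<String> {
    let mut bits = Vec::new();
    let mut iter = input.chars().peekable();
    // Each pass starts a new run whose kind is decided by the next char
    while let Some(&first) = iter.peek() {
        let want_digit = first.is_digit(10);
        // next_if returns None (consuming nothing) at the first char of the
        // other kind, which ends this from_fn iterator
        let run: String = std::iter::from_fn(|| iter.next_if(|c| c.is_digit(10) == want_digit))
            .collect();
        bits.push(run);
    }
    bits
}

fn main() {
    println!("{:?}", split("123abc456def"));
}
```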
I would actually recommend writing this as a normal for loop, using the char_indices iterator:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    if input.is_empty() {
        return bits;
    }
    let mut is_digit = input.chars().next().unwrap().is_digit(10);
    let mut start = 0;
    for (i, c) in input.char_indices() {
        let this_is_digit = c.is_digit(10);
        if is_digit != this_is_digit {
            bits.push(input[start..i].to_string());
            is_digit = this_is_digit;
            start = i;
        }
    }
    bits.push(input[start..].to_string());
    bits
}
This form also allows doing this with far fewer allocations (the Strings are no longer required), because each returned value is just a slice into the input, and we can use lifetimes to state this:
pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
    let mut bits = vec![];
    if input.is_empty() {
        return bits;
    }
    let mut is_digit = input.chars().next().unwrap().is_digit(10);
    let mut start = 0;
    for (i, c) in input.char_indices() {
        let this_is_digit = c.is_digit(10);
        if is_digit != this_is_digit {
            bits.push(&input[start..i]);
            is_digit = this_is_digit;
            start = i;
        }
    }
    bits.push(&input[start..]);
    bits
}
All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.
One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<Item = &'a str>.
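A minimal sketch of that Splits iterator, assuming the same digit/non-digit rule as the char_indices version; the field name rest is an illustrative choice:

```rust
// Lazy splitter over runs of digits / non-digits; yields slices of the input.
struct Splits<'a> {
    rest: &'a str, // the part of the input not yet yielded
}

fn split<'a>(input: &'a str) -> Splits<'a> {
    Splits { rest: input }
}

impl<'a> Iterator for Splits<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let first = self.rest.chars().next()?; // None once the input is used up
        let is_digit = first.is_digit(10);
        // Byte offset of the first char of the other kind, or the end of the input
        let end = self
            .rest
            .char_indices()
            .find(|&(_, c)| c.is_digit(10) != is_digit)
            .map(|(i, _)| i)
            .unwrap_or(self.rest.len());
        let (run, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(run)
    }
}

fn main() {
    let parts: Vec<&str> = split("test123test").collect();
    println!("{:?}", parts);
}
```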
take_while takes self by value: it consumes the iterator. Before Rust 1.0 the iterator could, unfortunately, also be implicitly copied, leading to the surprising behaviour you are observing.
For these reasons you cannot use take_while for what you want. You will need to unroll your take_while invocations by hand.
Here is one of many possible ways of dealing with this:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        let seeking_digits = match iter.peek() {
            None => return bits,
            Some(c) => c.is_digit(10),
        };
        if seeking_digits {
            bits.push(take_while(&mut iter, |c| c.is_digit(10)));
        } else {
            bits.push(take_while(&mut iter, |c| !c.is_digit(10)));
        }
    }
}
fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String
where
    I: Iterator<Item = char>,
    F: Fn(&char) -> bool,
{
    let mut out = String::new();
    loop {
        match iter.peek() {
            Some(c) if predicate(c) => out.push(*c),
            _ => return out,
        }
        let _ = iter.next();
    }
}
fn main() {
    println!("{:?}", split("test123test"));
}
This yields a solution with two levels of looping; another valid approach would be to model it as a state machine one level deep only. Ask if you aren’t sure what I mean and I’ll demonstrate.
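The one-level state machine mentioned above might look like the following sketch (an interpretation, not the answerer's code): the only state is whether the current run is digits, and a change of state closes the current run.

```rust
fn split(input: &str) -> Vec<String> {
    let mut bits = Vec::new();
    let mut current = String::new();
    // None = no run started yet; Some(true) / Some(false) = digit / non-digit run
    let mut state: Option<bool> = None;
    for c in input.chars() {
        let is_digit = c.is_digit(10);
        if state.is_some() && state != Some(is_digit) {
            // State transition: close the current run and start a new one
            bits.push(std::mem::take(&mut current));
        }
        state = Some(is_digit);
        current.push(c);
    }
    if !current.is_empty() {
        bits.push(current);
    }
    bits
}

fn main() {
    println!("{:?}", split("test123test"));
}
```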

Resources