Using the same iterator multiple times in Rust

Editor's note: This code example is from a version of Rust prior to 1.0 when many iterators implemented Copy. Updated versions of this code produce different errors, but the answers still contain valuable information.
I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        match iter.peek() {
            None => return bits,
            Some(c) => if c.is_digit() {
                bits.push(iter.take_while(|c| c.is_digit()).collect());
            } else {
                bits.push(iter.take_while(|c| !c.is_digit()).collect());
            }
        }
    }
    return bits;
}
However, this doesn't work: it loops forever. It seems that it is using a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator across all the take_while calls. Is this possible?

As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Editor's note: this was only true before Rust 1.0.)
You want to modify the iterator each time, that is, for take_while to operate on an &mut to your iterator, which is exactly what the .by_ref adaptor is for:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        match iter.peek().map(|c| *c) {
            None => return bits,
            Some(c) => if c.is_digit(10) {
                bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
            } else {
                bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
            },
        }
    }
}
fn main() {
    println!("{:?}", split("123abc456def"))
}
Prints
["123", "bc", "56", "ef"]
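However, this is not what you want. To find the end of a run, take_while has to pull the first non-matching character out of the iterator, and that character is then dropped. This is why a, 4 and d are missing.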
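On Rust 1.51 or later, Peekable::next_if avoids the lost-character problem: it consumes the next item only when the predicate accepts it, so the character that ends a run stays in the iterator. The following is a sketch of the same split written that way, assuming a 1.51+ compiler:

```rust
// Sketch: split into digit / non-digit runs with Peekable::next_if (Rust 1.51+).
// next_if consumes the next char only when the predicate accepts it, so the
// char that ends a run is left in the iterator for the next run.
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = Vec::new();
    let mut iter = input.chars().peekable();
    // The first char of each run decides whether it is a digit run.
    while let Some(&first) = iter.peek() {
        let digits = first.is_digit(10);
        let mut run = String::new();
        while let Some(c) = iter.next_if(|c| c.is_digit(10) == digits) {
            run.push(c);
        }
        bits.push(run);
    }
    bits
}

fn main() {
    println!("{:?}", split("123abc456def")); // ["123", "abc", "456", "def"]
}
```

This keeps the peek-then-consume structure of the question, and no character is lost at a run boundary.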
I would actually recommend writing this as a normal for loop, using the char_indices iterator:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    if input.is_empty() {
        return bits;
    }
    let mut is_digit = input.chars().next().unwrap().is_digit(10);
    let mut start = 0;
    for (i, c) in input.char_indices() {
        let this_is_digit = c.is_digit(10);
        if is_digit != this_is_digit {
            bits.push(input[start..i].to_string());
            is_digit = this_is_digit;
            start = i;
        }
    }
    bits.push(input[start..].to_string());
    bits
}
This form also allows you to do this with far fewer allocations (no Strings are required), because each returned value is just a slice into the input, and we can use lifetimes to state this:
pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
    let mut bits = vec![];
    if input.is_empty() {
        return bits;
    }
    let mut is_digit = input.chars().next().unwrap().is_digit(10);
    let mut start = 0;
    for (i, c) in input.char_indices() {
        let this_is_digit = c.is_digit(10);
        if is_digit != this_is_digit {
            bits.push(&input[start..i]);
            is_digit = this_is_digit;
            start = i;
        }
    }
    bits.push(&input[start..]);
    bits
}
All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.
One could even write an iterator like this, to avoid having to allocate the Vec: something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ }, where Splits is a struct that implements Iterator<Item = &'a str>.
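A minimal sketch of such an iterator, using the same digit / non-digit rule. The Splits name comes from the sentence above. Storing the not-yet-consumed rest of the input in a field is an illustrative design choice, not the only way to do it:

```rust
/// Hypothetical iterator: yields maximal runs of digit / non-digit chars
/// as slices of the original input, without allocating.
pub struct Splits<'a> {
    rest: &'a str, // the part of the input not yet returned
}

pub fn split<'a>(input: &'a str) -> Splits<'a> {
    Splits { rest: input }
}

impl<'a> Iterator for Splits<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // The kind of the run is decided by its first char; stop when empty.
        let first = self.rest.chars().next()?;
        let is_digit = first.is_digit(10);
        // Byte index of the first char of the other kind, or the end of the input.
        let end = self
            .rest
            .char_indices()
            .find(|&(_, c)| c.is_digit(10) != is_digit)
            .map(|(i, _)| i)
            .unwrap_or(self.rest.len());
        let (run, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(run)
    }
}

fn main() {
    let parts: Vec<&str> = split("test123test").collect();
    println!("{:?}", parts); // ["test", "123", "test"]
}
```

Callers that still want a Vec can just .collect() the iterator, as main does.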

take_while takes self by value: it consumes the iterator. Before Rust 1.0, the iterator could also, unfortunately, be copied implicitly, which led to the surprising behaviour you are observing.
For these reasons you cannot use take_while for what you want; you will need to unroll your take_while invocations manually.
Here is one of many possible ways of dealing with this:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        let seeking_digits = match iter.peek() {
            None => return bits,
            Some(c) => c.is_digit(10),
        };
        if seeking_digits {
            bits.push(take_while(&mut iter, |c| c.is_digit(10)));
        } else {
            bits.push(take_while(&mut iter, |c| !c.is_digit(10)));
        }
    }
}
fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String
where
    I: Iterator<Item = char>,
    F: Fn(&char) -> bool,
{
    let mut out = String::new();
    loop {
        match iter.peek() {
            Some(c) if predicate(c) => out.push(*c),
            _ => return out,
        }
        let _ = iter.next();
    }
}
fn main() {
    println!("{:?}", split("test123test"));
}
This yields a solution with two levels of looping; another valid approach would be to model it as a state machine only one level deep. Ask if you aren't sure what I mean and I'll demonstrate.
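One possible reading of the single-loop state machine idea, written as a sketch. The State enum and its variant names are illustrative assumptions, not the answerer's code. The state is the kind of clump being built, and each character either extends that clump or triggers a transition that starts a new one:

```rust
// Sketch of a one-level state machine for splitting digit / non-digit clumps.
#[derive(Clone, Copy, PartialEq)]
enum State {
    Start,  // nothing read yet
    Digits, // inside a run of digits
    Other,  // inside a run of non-digits
}

pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = Vec::new();
    let mut state = State::Start;
    for c in input.chars() {
        let next = if c.is_digit(10) { State::Digits } else { State::Other };
        if next != state {
            // Transition: start a new clump.
            bits.push(String::new());
            state = next;
        }
        // A clump always exists here, because leaving Start pushes one.
        bits.last_mut().unwrap().push(c);
    }
    bits
}

fn main() {
    println!("{:?}", split("test123test")); // ["test", "123", "test"]
}
```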

Related

Peek implementation for linked list in rust

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=693594655ea355b40e2175542c653879
I want peek() to remove the last element of the list, returning data. What am I missing?
type Link<T> = Option<Box<Node<T>>>;
struct Node<T> {
    pub data: T,
    pub next: Link<T>,
}
struct List<T> {
    pub head: Link<T>,
}
impl<T> List<T> {
    fn peek(&mut self) -> Option<T> {
        let mut node = &self.head;
        while let Some(cur_node) = &mut node {
            if cur_node.next.is_some() {
                node = &cur_node.next;
                continue;
            }
        }
        let last = node.unwrap();
        let last = last.data;
        return Some(last);
    }
}
#[test]
fn peek_test() {
    let mut q = List::new();
    q.push(1);
    q.push(2);
    q.push(3);
    assert_eq!(q.empty(), false);
    assert_eq!(q.peek().unwrap(), 1);
    assert_eq!(q.peek().unwrap(), 2);
    assert_eq!(q.peek().unwrap(), 3);
    assert_eq!(q.empty(), true);
}
To save the head, I need to access the elements by reference, but I can't fit the pieces of this puzzle together. I looked at "too-many-lists", but there the value is simply returned by reference, and I would like to remove the tail element.
To make this work you have to switch from taking a shared reference (&) to a mutable one (&mut).
This causes borrow checker errors with your code, which is why I had to change the while let loop into one that checks whether the next element is Some and only then borrows node's content mutably and advances it.
Finally, I use Option::take on that last element and return its data. I use Option::map to avoid having to unwrap, which would panic for empty lists anyway; if you want to keep your variant, replace unwrap with the try operator ?.
So in short you can implement a pop_back like this:
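pub fn pop_back(&mut self) -> Option<T> {
    let mut node = &mut self.head;
    // Advance while the current link is Some and has a successor.
    while node.as_ref().map(|n| n.next.is_some()).unwrap_or_default() {
        node = &mut node.as_mut().unwrap().next;
    }
    // `node` now points at the last link (or at `head` for an empty list).
    node.take().map(|last| last.data)
}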
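For reference, here is a self-contained sketch that puts pop_back into a complete List and runs the question's test against it. The new, push and empty methods are assumptions, since the question calls them without showing them. push is assumed to insert at the head, which is what makes the test expect 1, 2, 3 in that order:

```rust
type Link<T> = Option<Box<Node<T>>>;

struct Node<T> {
    data: T,
    next: Link<T>,
}

struct List<T> {
    head: Link<T>,
}

impl<T> List<T> {
    // Assumed constructor: an empty list.
    fn new() -> Self {
        List { head: None }
    }

    // Assumed semantics: push inserts at the head, so the oldest element is the tail.
    fn push(&mut self, data: T) {
        let old = self.head.take();
        self.head = Some(Box::new(Node { data, next: old }));
    }

    // Assumed helper: true when the list has no elements.
    fn empty(&self) -> bool {
        self.head.is_none()
    }

    // pop_back as in the answer above: walk to the last link, then take it.
    fn pop_back(&mut self) -> Option<T> {
        let mut node = &mut self.head;
        while node.as_ref().map(|n| n.next.is_some()).unwrap_or_default() {
            node = &mut node.as_mut().unwrap().next;
        }
        node.take().map(|last| last.data)
    }
}

fn main() {
    let mut q = List::new();
    q.push(1);
    q.push(2);
    q.push(3);
    println!("{:?}", q.pop_back()); // Some(1)
}
```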
I suggest something like the following, if only because I spent time on it :-)
fn peek(&mut self) -> Option<T> {
    match &self.head {
        None => return None,
        // Single-element list: take the head itself.
        Some(v) => if v.next.is_none() {
            let last = self.head.take();
            let last = last.unwrap().data;
            return Some(last);
        }
    }
    let mut current = &mut self.head;
    loop {
        match current {
            None => return None,
            // `node` is the second-to-last node: detach and return its successor.
            Some(node) if matches!(&node.next, Some(v) if v.next.is_none()) => {
                let last = node.next.take();
                let last = last.unwrap().data;
                return Some(last);
            },
            Some(node) => {
                current = &mut node.next;
            }
        }
    }
}

Is there a way in Rust to overload method for a specific type?

The following is only an example. If there's a native solution for this exact problem with reading bytes - cool, but my goal is to learn how to do it by myself, for any other purpose as well.
I'd like to do something like this: (pseudo-code below)
let mut reader = Reader::new(bytesArr);
let int32: i32 = reader.read(); // separate implementation to read 4 bytes and convert into i32
let int64: i64 = reader.read(); // separate implementation to read 8 bytes and convert into i64
I imagine it looking like this: (pseudo-code again)
impl Reader {
    fn read<T>(&mut self) -> T {
        // if T is i32 ... else if ...
    }
}
or like this:
impl Reader {
    fn read(&mut self) -> i32 {
        // ...
    }
    fn read(&mut self) -> i64 {
        // ...
    }
}
But I haven't found anything relevant yet.
(Actually, I have for the first case (if T is i32 ...), but it looked really unreadable and inconvenient.)
You could do this by having a Readable trait, implemented for i32 and i64, that performs the operation. Then Reader can have a generic function that accepts any type that is Readable and returns it, for example:
struct Reader {
    n: u8,
}
trait Readable {
    fn read_from_reader(reader: &mut Reader) -> Self;
}
impl Readable for i32 {
    fn read_from_reader(reader: &mut Reader) -> i32 {
        reader.n += 1;
        reader.n as i32
    }
}
impl Readable for i64 {
    fn read_from_reader(reader: &mut Reader) -> i64 {
        reader.n += 1;
        reader.n as i64
    }
}
impl Reader {
    fn read<T: Readable>(&mut self) -> T {
        T::read_from_reader(self)
    }
}
fn main() {
    let mut r = Reader { n: 0 };
    let int32: i32 = r.read();
    let int64: i64 = r.read();
    println!("{} {}", int32, int64);
}
You can try it on the playground
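To connect the trait pattern back to the byte-reading use case in the question, here is a sketch that reads little-endian integers from a byte slice. ByteReader, take_bytes and the little-endian byte order are illustrative assumptions. This version returns None instead of panicking when the input runs out:

```rust
use std::convert::TryInto;

// Illustrative byte cursor over a borrowed slice.
struct ByteReader<'a> {
    bytes: &'a [u8],
    pos: usize,
}

// One impl per type takes the place of the "overloads".
trait Readable: Sized {
    fn read_from(reader: &mut ByteReader<'_>) -> Option<Self>;
}

impl<'a> ByteReader<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        ByteReader { bytes, pos: 0 }
    }

    // Returns the next `n` bytes and advances, or None if fewer remain.
    fn take_bytes(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let slice = self.bytes.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }

    // The return type chosen by the caller selects the Readable impl.
    fn read<T: Readable>(&mut self) -> Option<T> {
        T::read_from(self)
    }
}

impl Readable for i32 {
    fn read_from(reader: &mut ByteReader<'_>) -> Option<i32> {
        // Convert the 4-byte slice into a [u8; 4] array.
        let arr: [u8; 4] = reader.take_bytes(4)?.try_into().ok()?;
        Some(i32::from_le_bytes(arr))
    }
}

impl Readable for i64 {
    fn read_from(reader: &mut ByteReader<'_>) -> Option<i64> {
        let arr: [u8; 8] = reader.take_bytes(8)?.try_into().ok()?;
        Some(i64::from_le_bytes(arr))
    }
}

fn main() {
    // An i32 (1) followed by an i64 (2), both little-endian.
    let data: [u8; 12] = [1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0];
    let mut reader = ByteReader::new(&data);
    let int32: Option<i32> = reader.read();
    let int64: Option<i64> = reader.read();
    println!("{:?} {:?}", int32, int64); // Some(1) Some(2)
}
```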
After some trials and searches, I found that implementing them in current Rust seems a bit difficult, but not impossible.
Here is the code, I'll explain it afterwards:
#![feature(generic_const_exprs)]
use std::{mem, ptr};
static DATA: [u8; 8] = [
    u8::MAX,
    u8::MAX,
    u8::MAX,
    u8::MAX,
    u8::MAX,
    u8::MAX,
    u8::MAX,
    u8::MAX,
];
struct Reader;
impl Reader {
    fn read<T: Copy + Sized>(&self) -> T
    where
        [(); mem::size_of::<T>()]: ,
    {
        // Never read past the end of DATA.
        assert!(mem::size_of::<T>() <= DATA.len());
        // The buffer is overwritten completely below.
        let mut buf = [0_u8; mem::size_of::<T>()];
        unsafe {
            ptr::copy_nonoverlapping(DATA.as_ptr(), buf.as_mut_ptr(), buf.len());
            // Only sound if every bit pattern is a valid T (integers, floats).
            mem::transmute_copy(&buf)
        }
    }
}
fn main() {
    let reader = Reader;
    let v_u8: u8 = reader.read();
    dbg!(v_u8);
    let v_u16: u16 = reader.read();
    dbg!(v_u16);
    let v_u32: u32 = reader.read();
    dbg!(v_u32);
    let v_u64: u64 = reader.read();
    dbg!(v_u64);
}
Suppose the global static variable DATA is the target data you want to read.
In current Rust, we cannot directly use the size of a generic parameter as the length of an array. This does not work:
fn example<T: Copy + Sized>() {
    let mut _buf = [0_u8; mem::size_of::<T>()];
}
The compiler gives a weird error:
error: unconstrained generic constant
--> src\main.rs:34:31
|
34 | let mut _buf = [0_u8; mem::size_of::<T>()];
| ^^^^^^^^^^^^^^^^^^^
|
= help: try adding a `where` bound using this expression: `where [(); mem::size_of::<T>()]:`
There is an issue tracking this; if you want to dig deeper into this error, you can take a look.
We just follow the compiler's suggestion and add a where bound. This requires the generic_const_exprs feature to be enabled, which is only possible on a nightly compiler.
Next, the array is initialized with 0_u8. Skipping that initialization with unsafe { MaybeUninit::uninit().assume_init() } might look like a free optimization, since the array is overwritten completely anyway, but it is undefined behavior: calling assume_init on uninitialized memory is UB even for u8. Zeroing a handful of bytes costs next to nothing.
Finally, copy the data you need into the array, transmute the array into your generic type, and return it. This is only sound for types for which every bit pattern is valid, such as the integer and floating-point types; reading a bool or a char this way can be undefined behavior.
I think you will see the output you expect:
[src\main.rs:38] v_u8 = 255
[src\main.rs:41] v_u16 = 65535
[src\main.rs:44] v_u32 = 4294967295
[src\main.rs:47] v_u64 = 18446744073709551615
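On stable Rust, the fixed-size buffer, and with it generic_const_exprs, can be avoided entirely by reading T straight from the bytes with ptr::read_unaligned. Here is a sketch under the same assumptions (a static DATA of eight 0xFF bytes). Like the array version, it is only sound for types where every bit pattern is valid, which is why read is marked unsafe here:

```rust
use std::{mem, ptr};

// Same assumed input as above: eight 0xFF bytes.
static DATA: [u8; 8] = [u8::MAX; 8];

struct Reader;

impl Reader {
    /// Reads a `T` from the start of DATA without an intermediate array.
    ///
    /// # Safety
    /// Every bit pattern must be a valid `T` (true for integers and floats,
    /// not for e.g. `bool`, `char` or references).
    unsafe fn read<T: Copy>(&self) -> T {
        // Never read past the end of DATA.
        assert!(mem::size_of::<T>() <= DATA.len());
        // DATA is only guaranteed 1-byte alignment, so use an unaligned read.
        unsafe { ptr::read_unaligned(DATA.as_ptr() as *const T) }
    }
}

fn main() {
    let reader = Reader;
    let v_u16: u16 = unsafe { reader.read() };
    let v_u64: u64 = unsafe { reader.read() };
    println!("{} {}", v_u16, v_u64); // 65535 18446744073709551615
}
```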

Conditionally sort a Vec in Rust

Let's say I want to sort a Vec of non-Clone items - but only maybe (this is a boiled down example of an issue in my code).
My attempt would be something like:
fn maybe_sort<T>(x: Vec<T>) -> Vec<T>
where
    T: std::cmp::Ord,
{
    // First, I need a copy of the vector - but only the vector, not the items inside
    let mut copied = x.iter().collect::<Vec<_>>();
    copied.sort();
    // In my actual code the line below depends on the sorted vec
    if rand::random() {
        return copied.into_iter().map(|x| *x).collect::<Vec<_>>();
    } else {
        return x;
    }
}
Alas the borrow checker isn't happy. I have a shared reference to each item in the Vec, and although I am not ever returning 2 references to the same item, Rust can't tell.
Is there a way to do this without unsafe? (And if not, what's the cleanest way to do it with unsafe?)
You can .enumerate() the values to keep their original index. You can then sort the pairs by value and either return the sorted version or undo the sort by sorting on the original index.
fn maybe_sort<T: Ord>(x: Vec<T>) -> Vec<T> {
    let mut items: Vec<_> = x.into_iter().enumerate().collect();
    items.sort_by(|(_, a), (_, b)| a.cmp(b));
    if rand::random() {
        // return items in current order
    } else {
        // undo the sort
        items.sort_by_key(|(index, _)| *index);
    }
    items.into_iter().map(|(_, value)| value).collect()
}
If T implements Default, you can do it with a single sort and without unsafe like this:
fn maybe_sort<T: Ord + Default>(mut x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    // idx[k] is the original index of the k-th smallest element
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        return x;
    } else {
        // Move x[idx[k]] into slot k, leaving a Default value behind in x.
        return idx.into_iter().map(|i| std::mem::take(&mut x[i])).collect();
    }
}
Playground
If T does not implement Default, the same thing can be done with MaybeUninit:
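use std::mem::{self, MaybeUninit};
fn maybe_sort<T: Ord>(mut x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        return x;
    } else {
        // pos[j] = position that x[j] takes in the sorted output
        let mut pos = vec![0; x.len()];
        for (k, &j) in idx.iter().enumerate() {
            pos[j] = k;
        }
        let mut r: Vec<MaybeUninit<T>> = Vec::new();
        r.resize_with(x.len(), MaybeUninit::uninit);
        for (j, v) in x.drain(..).enumerate() {
            r[pos[j]] = MaybeUninit::new(v);
        }
        // Every slot is now initialized. Rebuild a Vec<T> from the raw parts;
        // transmuting the Vec itself relies on Vec's unspecified layout.
        let mut r = mem::ManuallyDrop::new(r);
        return unsafe { Vec::from_raw_parts(r.as_mut_ptr() as *mut T, r.len(), r.capacity()) };
    }
}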
Playground
Finally, here's a safe solution which doesn't require T to implement Default, but allocates an extra buffer (there is theoretically a way to reorder the indices in place, but I'll leave it as an exercise for the reader ☺):
fn maybe_sort<T: Ord>(mut x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        let mut rev = vec![0; x.len()];
        for (i, &j) in idx.iter().enumerate() {
            rev[j] = i;
        }
        for i in 0..x.len() {
            while rev[i] != i {
                let j = rev[i];
                x.swap(j, i);
                rev.swap(j, i);
            }
        }
    }
    x
}
Playground
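Since the question says the decision depends on the sorted vector, here is a sketch that combines the index sort with the in-place cycle swap above, and passes the decision in as a closure that sees the elements in sorted order as borrowed references. maybe_sort_with and should_sort are hypothetical names standing in for rand::random(), which also keeps the example free of the external rand crate:

```rust
// Sketch: decide from the sorted view, then reorder in place only if needed.
fn maybe_sort_with<T: Ord>(mut x: Vec<T>, should_sort: impl FnOnce(&[&T]) -> bool) -> Vec<T> {
    let mut idx: Vec<usize> = (0..x.len()).collect();
    // idx[k] = original index of the k-th smallest element.
    idx.sort_by_key(|&i| &x[i]);
    let sort = {
        // Borrowed sorted view: references only, nothing is cloned.
        let view: Vec<&T> = idx.iter().map(|&i| &x[i]).collect();
        should_sort(view.as_slice())
    };
    if !sort {
        return x;
    }
    // rev[j] = final position of the element currently at j.
    let mut rev = vec![0; x.len()];
    for (k, &j) in idx.iter().enumerate() {
        rev[j] = k;
    }
    // Walk each permutation cycle, swapping elements into place.
    for i in 0..x.len() {
        while rev[i] != i {
            let j = rev[i];
            x.swap(i, j);
            rev.swap(i, j);
        }
    }
    x
}

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct NoClone(u32); // deliberately not Clone

fn main() {
    let v = vec![NoClone(3), NoClone(1), NoClone(2)];
    // Example condition: sort only if the smallest element is 1.
    let out = maybe_sort_with(v, |sorted| sorted[0].0 == 1);
    println!("{:?}", out); // [NoClone(1), NoClone(2), NoClone(3)]
}
```

The single index sort serves both purposes: it provides the sorted view for the decision, and it is the permutation that is applied if the answer is yes.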

Implement slice_shift_char using the std library

I'd like to use the &str method slice_shift_char, but it is marked as unstable in the documentation:
Unstable: awaiting conventions about shifting and slices and may not
be warranted with the existence of the chars and/or char_indices
iterators
What would be a good way to implement this method, with Rust's current std library? So far I have:
fn slice_shift_char(s: &str) -> Option<(char, &str)> {
    let mut ixs = s.char_indices();
    let next = ixs.next();
    match next {
        Some((next_pos, ch)) => {
            let rest = unsafe {
                s.slice_unchecked(next_pos, s.len())
            };
            Some((ch, rest))
        },
        None => None
    }
}
I'd like to avoid the call to slice_unchecked. I'm using Rust 1.1.
Well, you can look at the source code, where you'll find https://github.com/rust-lang/rust/blob/master/src/libcollections/str.rs#L776-L778 and https://github.com/rust-lang/rust/blob/master/src/libcore/str/mod.rs#L1531-L1539 The second one is:
fn slice_shift_char(&self) -> Option<(char, &str)> {
    if self.is_empty() {
        None
    } else {
        let ch = self.char_at(0);
        let next_s = unsafe { self.slice_unchecked(ch.len_utf8(), self.len()) };
        Some((ch, next_s))
    }
}
If you don't want the unsafe, you can just use a normal slice:
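fn slice_shift_char(s: &str) -> Option<(char, &str)> {
    if s.is_empty() {
        None
    } else {
        // str::char_at has since been removed; chars().next() yields the first char.
        let ch = s.chars().next().unwrap();
        let len = s.len();
        // ch.len_utf8() is always a char boundary, so this slice cannot panic.
        let next_s = &s[ch.len_utf8()..len];
        Some((ch, next_s))
    }
}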
The unstable slice_shift_char function was deprecated in Rust 1.9.0 and removed completely in Rust 1.11.0.
As of Rust 1.4.0, the recommended approach to implementing this is:
Use .chars() to get an iterator of the char content
Iterate on this iterator once to get the first character.
Call .as_str() on that iterator to recover the remaining uniterated string.
fn slice_shift_char(a: &str) -> Option<(char, &str)> {
    let mut chars = a.chars();
    chars.next().map(|c| (c, chars.as_str()))
}
fn main() {
    assert_eq!(slice_shift_char("hello"), Some(('h', "ello")));
    assert_eq!(slice_shift_char("ĺḿńóṕ"), Some(('ĺ', "ḿńóṕ")));
    assert_eq!(slice_shift_char(""), None);
}

How to call count on an iterator and still use the iterator's items?

parts.count() leads to ownership transfer, so parts can't be used any more.
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
    let mut parts = slice.split(|b| splitter.contains(b));
    let len = parts.count(); //ownership transfer
    if len >= 2 {
        Some(parts.nth(1).unwrap().to_vec())
    } else if len >= 1 {
        Some(parts.nth(0).unwrap().to_vec())
    } else {
        None
    }
}
fn main() {
    split(&[1u8, 2u8, 3u8], &[2u8]);
}
It is also possible to avoid unnecessary allocations of Vec if you only need to use the first or the second part:
fn split<'a>(slice: &'a [u8], splitter: &[u8]) -> Option<&'a [u8]> {
    let mut parts = slice.split(|b| splitter.contains(b)).fuse();
    let first = parts.next();
    let second = parts.next();
    second.or(first)
}
Then if you actually need a Vec you can map on the result:
split(&[1u8, 2u8, 3u8], &[2u8]).map(|s| s.to_vec())
Of course, if you want, you can move to_vec() conversion to the function:
second.or(first).map(|s| s.to_vec())
I'm calling fuse() on the iterator in order to guarantee that it will always return None after the first None is returned (which is not guaranteed by the general iterator protocol).
The other answers are good solutions to your problem, but I'd like to point out another general solution: create multiple iterators:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
    let mut parts = slice.split(|b| splitter.contains(b));
    let parts2 = slice.split(|b| splitter.contains(b));
    let len = parts2.count();
    if len >= 2 {
        Some(parts.nth(1).unwrap().to_vec())
    } else if len >= 1 {
        Some(parts.nth(0).unwrap().to_vec())
    } else {
        None
    }
}
fn main() {
    split(&[1u8, 2u8, 3u8], &[2u8]);
}
You can usually create multiple read-only iterators. Some iterators even implement Clone, so you could just say iter.clone().count(). Unfortunately, when this was written Split wasn't one of them, because it owns the passed-in closure.
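On current Rust the clone-based route does work here. Since Rust 1.26, a closure is Clone when everything it captures is Clone, and slice::Split is Clone when its predicate is. Here is a sketch that assumes a 1.26+ compiler:

```rust
// The closure only captures `splitter` by shared reference, so it is Clone,
// which makes the Split iterator Clone as well.
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
    let mut parts = slice.split(|b| splitter.contains(b));
    // Count a clone; `parts` itself has not been advanced.
    let len = parts.clone().count();
    if len >= 2 {
        parts.nth(1).map(|p| p.to_vec())
    } else {
        parts.next().map(|p| p.to_vec())
    }
}

fn main() {
    println!("{:?}", split(&[1u8, 2u8, 3u8], &[2u8])); // Some([3])
}
```

Counting a clone walks the input twice, but it avoids both the second hand-written iterator and the intermediate Vec.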
One thing you can do is collect the results of the split in a new owned Vec, like this:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
    let parts: Vec<&[u8]> = slice.split(|b| splitter.contains(b)).collect();
    let len = parts.len();
    if len >= 2 {
        Some(parts[1].to_vec())
    } else if len >= 1 {
        Some(parts[0].to_vec())
    } else {
        None
    }
}
