Implement slice_shift_char using the std library - rust

I'd like to use the &str method slice_shift_char, but it is marked as unstable in the documentation:
Unstable: awaiting conventions about shifting and slices and may not
be warranted with the existence of the chars and/or char_indices
iterators
What would be a good way to implement this method, with Rust's current std library? So far I have:
fn slice_shift_char(s: &str) -> Option<(char, &str)> {
let mut ixs = s.char_indices();
let next = ixs.next();
match next {
Some((next_pos, ch)) => {
let rest = unsafe {
s.slice_unchecked(next_pos, s.len())
};
Some((ch, rest))
},
None => None
}
}
I'd like to avoid the call to slice_unchecked. I'm using Rust 1.1.

Well, you can look at the source code: see https://github.com/rust-lang/rust/blob/master/src/libcollections/str.rs#L776-L778 and https://github.com/rust-lang/rust/blob/master/src/libcore/str/mod.rs#L1531-L1539. The second is:
fn slice_shift_char(&self) -> Option<(char, &str)> {
if self.is_empty() {
None
} else {
let ch = self.char_at(0);
let next_s = unsafe { self.slice_unchecked(ch.len_utf8(), self.len()) };
Some((ch, next_s))
}
}
If you don't want the unsafe, you can just use a normal slice:
fn slice_shift_char(&self) -> Option<(char, &str)> {
if self.is_empty() {
None
} else {
let ch = self.char_at(0);
let len = self.len();
let next_s = &self[ch.len_utf8().. len];
Some((ch, next_s))
}
}
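The two snippets above are methods and rely on char_at, which was itself unstable. As a minimal sketch (not the library's implementation), the same safe slicing can be written as a free function using only stable methods, assuming a compiler recent enough to support the ? operator on Option:

```rust
// Sketch: split off the first char using only stable, safe APIs.
fn slice_shift_char(s: &str) -> Option<(char, &str)> {
    // `chars().next()` is None for an empty string, so `?` returns None early.
    let ch = s.chars().next()?;
    // `len_utf8` is the byte width of `ch`, so this index is a char boundary.
    Some((ch, &s[ch.len_utf8()..]))
}
```

The slice is bounds-checked, but because the index comes from the first character's own encoded length, it cannot land in the middle of a code point.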

The unstable slice_shift_char function was deprecated in Rust 1.9.0 and removed completely in Rust 1.11.0.
As of Rust 1.4.0, the recommended way to implement this is:
1. Use .chars() to get an iterator over the chars of the string.
2. Advance this iterator once to get the first character.
3. Call .as_str() on the iterator to recover the remaining, not yet iterated part of the string.
fn slice_shift_char(a: &str) -> Option<(char, &str)> {
let mut chars = a.chars();
chars.next().map(|c| (c, chars.as_str()))
}
fn main() {
assert_eq!(slice_shift_char("hello"), Some(('h', "ello")));
assert_eq!(slice_shift_char("ĺḿńóṕ"), Some(('ĺ', "ḿńóṕ")));
assert_eq!(slice_shift_char(""), None);
}

Related

`while let Ok(t) ... = try_read!(...)` to make neater reading loop

Is it possible to write a short, neat loop that keeps calling try_read!() as long as the result is Ok(x) and acts on x?
E.g. something like:
use text_io::try_read; // Cargo.toml += text_io = "0.1"
fn main() {
while let Ok(t): Result<i64, _> = try_read!() {
println!("{}", t);
}
}
This fails to compile.
If I provide the type annotation in the pattern, compilation fails; if I don't provide it, then obviously it's ambiguous how to resolve try_read!.
Here is a working, but IMHO much longer, snippet:
use text_io::try_read; // Cargo.toml += text_io = "0.1"
fn main() {
loop {
let tok: Result<i64, _> = try_read!();
match tok {
Ok(t) => println!("{}", t),
Err(_) => break,
}
}
}
You can qualify Ok as Result::Ok and then use the "turbofish" operator to provide the concrete type:
fn main() {
while let Result::<i64, _>::Ok(t) = try_read!() {
println!("{}", t);
}
}
(while let Ok::<i64, _>(t) also works, but is perhaps a bit more cryptic.)
Another option is to request the type inside the loop - rustc is smart enough to infer the type for try_read!() from that:
fn main() {
while let Ok(t) = try_read!() {
let t: i64 = t;
println!("{}", t);
}
}
The latter variant is particularly useful in for loops where the pattern match is partly hidden, so there is no place to ascribe the type to.
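To illustrate the for-loop case, here is a minimal sketch that uses str::parse from the standard library in place of try_read! (an assumption made so the example needs no external crate). The function name sum_numbers is made up for illustration.

```rust
// Sketch: sum whitespace-separated integers, stopping at the first bad token.
fn sum_numbers(input: &str) -> i64 {
    let mut total = 0;
    for parsed in input.split_whitespace().map(|tok| tok.parse()) {
        // The for pattern has no place for a type, so ascribe it here;
        // rustc infers `parse::<i64>()` in the closure above from this line.
        let parsed: Result<i64, _> = parsed;
        match parsed {
            Ok(n) => total += n,
            Err(_) => break,
        }
    }
    total
}
```

Without the let line, the target type of parse would be ambiguous and the loop would not compile.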

How do I convert a Peekable iterator back to the original iterator?

I want to implement an algorithm that skips ! or !^num at the start of a string:
fn extract_common_part(a: &str) -> Option<&str> {
let mut it = a.chars();
if it.next() != Some('!') {
return None;
}
let mut jt = it.clone().peekable();
if jt.peek() == Some(&'^') {
it.next();
jt.next();
while jt.peek().map_or(false, |v| !v.is_whitespace()) {
it.next();
jt.next();
}
it.next();
}
Some(it.as_str())
}
fn main() {
assert_eq!(extract_common_part("!^4324 1234"), Some("1234"));
assert_eq!(extract_common_part("!1234"), Some("1234"));
}
This works, but I cannot find a way to get from the Peekable back to the Chars, so I have to advance both the it and jt iterators. This causes duplicate code.
How can I get from a Peekable iterator back to the corresponding Chars iterator, or is there maybe a simpler way to implement this algorithm?
In short, you cannot. The general answer is to use something like Iterator::by_ref to avoid consuming the Chars iterator:
fn extract_common_part(a: &str) -> Option<&str> {
let mut it = a.chars();
if it.next() != Some('!') {
return None;
}
{
let mut jt = it.by_ref().peekable();
if jt.peek() == Some(&'^') {
jt.next();
while jt.peek().map_or(false, |v| !v.is_whitespace()) {
jt.next();
}
}
}
Some(it.as_str())
}
The problem is that when you call peek and the test fails, the underlying iterator has already been advanced. Getting the rest of the string then loses the character that tested false: for the input !1234 the peeked 1 is dropped and the function returns 234.
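The effect can be seen in isolation with a small sketch. The helper name rest_after_peek is hypothetical; it only peeks once through by_ref and then asks the original Chars for the remaining string.

```rust
// Sketch: peeking through `by_ref().peekable()` pulls one item out of the
// underlying iterator, and that item is lost when the Peekable is dropped.
fn rest_after_peek(s: &str) -> &str {
    let mut it = s.chars();
    {
        let mut peeker = it.by_ref().peekable();
        // Looks at the next char, but has to advance `it` to do so.
        let _ = peeker.peek();
    } // `peeker` is dropped here together with the buffered char.
    it.as_str()
}
```

For the input "1234" this returns "234", because the peeked character lives in the Peekable's buffer rather than in the Chars iterator.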
However, Itertools has peeking_take_while and take_while_ref, both of which should solve the issue.
extern crate itertools;
use itertools::Itertools;
fn extract_common_part(a: &str) -> Option<&str> {
let mut it = a.chars();
if it.next() != Some('!') {
return None;
}
if it.peeking_take_while(|&c| c == '^').next() == Some('^') {
for _ in it.peeking_take_while(|v| !v.is_whitespace()) {}
for _ in it.peeking_take_while(|v| v.is_whitespace()) {}
}
Some(it.as_str())
}
Other options include:
Using a crate like strcursor, which is designed for this kind of incremental advance over a string.
Doing the parsing on regular strings directly, and hoping the optimizer eliminates redundant bounds checks.
Using a regex or other parsing library.
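As a rough sketch of the second option, the same parsing can be done with plain string methods. This assumes a toolchain where str::strip_prefix and trim_start are stable, and the name extract_common_part_std is made up for illustration. Like the itertools version, it skips all whitespace after the number.

```rust
// Sketch: skip "!" or "!^num " at the start of a string using only std.
fn extract_common_part_std(a: &str) -> Option<&str> {
    // The string must start with '!'.
    let rest = a.strip_prefix('!')?;
    match rest.strip_prefix('^') {
        Some(after_caret) => {
            // Skip the run of non-whitespace characters after '^'...
            let end = after_caret
                .find(char::is_whitespace)
                .unwrap_or(after_caret.len());
            // ...and then the whitespace that follows it.
            Some(after_caret[end..].trim_start())
        }
        None => Some(rest),
    }
}
```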
If you are only interested in the result, without validation:
fn extract_common_part(a: &str) -> Option<&str> {
// char_indices yields byte offsets, so the slice stays valid for non-ASCII input
a.char_indices().rev().find(|&(_, c)| c.is_whitespace() || c == '!')
.map(|(i, c)| &a[i + c.len_utf8()..])
}
fn main() {
assert_eq!(extract_common_part("!^4324 1234"), Some("1234"));
assert_eq!(extract_common_part("!1234"), Some("1234"));
}

Convert vectors to arrays and back [duplicate]

This question already has an answer here:
Is there a good way to convert a Vec<T> to an array?
I am trying to figure out the most Rust-like way of converting from a vector to an array and back. These macros work and can even be made generic with some unsafe blocks, but it all feels very un-Rust-like.
I would appreciate any input, so pull no punches; I think this code is far from nice or optimal. I have only played with Rust for a few weeks, chasing releases and docs, so I really appreciate the help.
macro_rules! convert_u8vec_to_array {
($container:ident, $size:expr) => {{
if $container.len() != $size {
None
} else {
use std::mem;
let mut arr : [_; $size] = unsafe { mem::uninitialized() };
for element in $container.into_iter().enumerate() {
let old_val = mem::replace(&mut arr[element.0],element.1);
unsafe { mem::forget(old_val) };
}
Some(arr)
}
}};
}
fn array_to_vec(arr: &[u8]) -> Vec<u8> {
let mut vector = Vec::new();
for i in arr.iter() {
vector.push(*i);
}
vector
}
fn vector_as_u8_4_array(vector: Vec<u8>) -> [u8;4] {
let mut arr = [0u8;4];
for i in (0..4) {
arr[i] = vector[i];
}
arr
}
The code seems fine to me, although there's a very important safety point to note: there can be no panics while arr isn't fully initialised. Running destructors on uninitialised memory could easily lead to undefined behaviour; in particular, this means that into_iter and the next method of the resulting iterator must never panic (I believe it is impossible for the enumerate and mem::* parts of the iterator to panic given the constraints of the code).
That said, one can express the replace/forget idiom with a single function: std::ptr::write.
for (idx, element) in $container.into_iter().enumerate() {
unsafe { ptr::write(&mut arr[idx], element) };
}
Although, I would write it as:
for (place, element) in arr.iter_mut().zip($container.into_iter()) {
unsafe { ptr::write(place, element) };
}
Similarly, one can apply some iterator goodness to the u8 specialised versions:
fn array_to_vec(arr: &[u8]) -> Vec<u8> {
arr.iter().cloned().collect()
}
fn vector_as_u8_4_array(vector: Vec<u8>) -> [u8;4] {
let mut arr = [0u8;4];
for (place, element) in arr.iter_mut().zip(vector.iter()) {
*place = *element;
}
arr
}
Although the first is probably better written as arr.to_vec(), and the second as
let mut arr = [0u8; 4];
std::slice::bytes::copy_memory(&vector, &mut arr);
arr
Note, however, that this function is currently unstable, and hence only usable on nightly.
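On later stable toolchains the same conversions can be written without unstable functions. The sketch below assumes a compiler where TryFrom from a slice to an array and copy_from_slice are stable; the helper names try_into_array4 and copy_into_array4 are made up for illustration.

```rust
use std::convert::TryFrom;

// Hypothetical helper: returns None unless the slice has exactly 4 elements.
fn try_into_array4(bytes: &[u8]) -> Option<[u8; 4]> {
    <[u8; 4]>::try_from(bytes).ok()
}

// Hypothetical helper: copies the first 4 bytes into a new array.
// Panics if `bytes` holds fewer than 4 elements.
fn copy_into_array4(bytes: &[u8]) -> [u8; 4] {
    let mut arr = [0u8; 4];
    arr.copy_from_slice(&bytes[..4]);
    arr
}

// Array (or any slice) back to a vector.
fn array_to_vec(arr: &[u8]) -> Vec<u8> {
    arr.to_vec()
}
```

The try_from variant makes the length check explicit in the return type, while copy_from_slice mirrors the copy_memory call and panics on a length mismatch.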

How to call count on an iterator and still use the iterator's items?

parts.count() leads to ownership transfer, so parts can't be used any more.
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let mut parts = slice.split(|b| splitter.contains(b));
let len = parts.count(); //ownership transfer
if len >= 2 {
Some(parts.nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.nth(0).unwrap().to_vec())
} else {
None
}
}
fn main() {
split(&[1u8, 2u8, 3u8], &[2u8]);
}
It is also possible to avoid unnecessary allocations of Vec if you only need to use the first or the second part:
fn split<'a>(slice: &'a [u8], splitter: &[u8]) -> Option<&'a [u8]> {
let mut parts = slice.split(|b| splitter.contains(b)).fuse();
let first = parts.next();
let second = parts.next();
second.or(first)
}
Then if you actually need a Vec you can map on the result:
split(&[1u8, 2u8, 3u8], &[2u8]).map(|s| s.to_vec())
Of course, if you want, you can move the to_vec() conversion into the function:
second.or(first).map(|s| s.to_vec())
I'm calling fuse() on the iterator in order to guarantee that it will always return None after the first None is returned (which is not guaranteed by the general iterator protocol).
The other answers are good suggestions to answer your problem, but I'd like to point out another general solution: create multiple iterators:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let mut parts = slice.split(|b| splitter.contains(b));
let parts2 = slice.split(|b| splitter.contains(b));
let len = parts2.count();
if len >= 2 {
Some(parts.nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.nth(0).unwrap().to_vec())
} else {
None
}
}
fn main() {
split(&[1u8, 2u8, 3u8], &[2u8]);
}
You can usually create multiple read-only iterators. Some iterators even implement Clone, so you could just say iter.clone().count(). Unfortunately, Split isn't one of them because it owns the passed-in closure.
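For an iterator that does implement Clone, the clone-then-count idea might look like the following sketch. It uses split_whitespace on a &str purely as an illustration, and the function name count_then_first is made up.

```rust
// Sketch: count the items on a clone, then keep using the original iterator.
fn count_then_first(text: &str) -> (usize, Option<&str>) {
    let mut words = text.split_whitespace();
    // `count` consumes the clone; `words` itself is left untouched.
    let len = words.clone().count();
    (len, words.next())
}
```

Cloning such an iterator only copies its cursor state, not the underlying string, so the extra pass costs time but no allocation.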
One thing you can do is collect the results of the split in a new owned Vec, like this:
fn split(slice: &[u8], splitter: &[u8]) -> Option<Vec<u8>> {
let parts: Vec<&[u8]> = slice.split(|b| splitter.contains(b)).collect();
let len = parts.len();
if len >= 2 {
Some(parts.iter().nth(1).unwrap().to_vec())
} else if len >= 1 {
Some(parts.iter().nth(0).unwrap().to_vec())
} else {
None
}
}

Using the same iterator multiple times in Rust

Editor's note: This code example is from a version of Rust prior to 1.0, when many iterators implemented Copy. Updated versions of this code produce different errors, but the answers still contain valuable information.
I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek() {
None => return bits,
Some(c) => if c.is_digit() {
bits.push(iter.take_while(|c| c.is_digit()).collect());
} else {
bits.push(iter.take_while(|c| !c.is_digit()).collect());
}
}
}
return bits;
}
However, this doesn't work, looping forever. It seems that it is using a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator across all the take_while calls. Is this possible?
As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 — editor)
You want to modify the iterator each time, that is, you want take_while to operate on an &mut to your iterator, which is exactly what the .by_ref adaptor is for:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek().map(|c| *c) {
None => return bits,
Some(c) => if c.is_digit(10) {
bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
} else {
bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
},
}
}
}
fn main() {
println!("{:?}", split("123abc456def"))
}
Prints
["123", "bc", "56", "ef"]
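However, I imagine this is not correct: take_while also consumes the first element that fails the predicate, so the first character of every later clump is lost.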
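The missing characters come from how take_while works: it has to pull an item out of the iterator before it can test it, and the first item that fails the test is discarded. A minimal sketch that isolates this, with the hypothetical name digits_then_rest:

```rust
// Sketch: collect the leading digits, then whatever the iterator has left.
fn digits_then_rest(input: &str) -> (String, String) {
    let mut iter = input.chars();
    // take_while stops at the first non-digit, but that char is already consumed.
    let digits: String = iter.by_ref().take_while(|c| c.is_digit(10)).collect();
    let rest: String = iter.collect();
    (digits, rest)
}
```

For "123abc" this gives ("123", "bc"): the 'a' that ended the digit run is gone.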
I would actually recommend writing this as a normal for loop, using the char_indices iterator:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(input[start..i].to_string());
is_digit = this_is_digit;
start = i;
}
}
bits.push(input[start..].to_string());
bits
}
This form also allows for far fewer allocations (that is, the Strings are not required), because each returned value is just a slice into the input, and we can use lifetimes to state this:
pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
let mut bits = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(&input[start..i]);
is_digit = this_is_digit;
start = i;
}
}
bits.push(&input[start..]);
bits
}
All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.
One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<Item = &'a str>.
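One possible shape for such an iterator is sketched below. The field name rest and the details of next are assumptions made for illustration, not something the answer specifies.

```rust
// Sketch of a lazy clump iterator; each item is a slice into the input.
struct Splits<'a> {
    // The part of the input that has not been yielded yet (assumed field name).
    rest: &'a str,
}

impl<'a> Iterator for Splits<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let s = self.rest;
        // An empty remainder means the iteration is finished.
        let first_is_digit = s.chars().next()?.is_digit(10);
        // Byte offset of the first char whose "digit-ness" differs,
        // or the end of the string if the whole remainder is one clump.
        let end = s
            .char_indices()
            .find(|&(_, c)| c.is_digit(10) != first_is_digit)
            .map_or(s.len(), |(i, _)| i);
        let (clump, rest) = s.split_at(end);
        self.rest = rest;
        Some(clump)
    }
}

fn split<'a>(input: &'a str) -> Splits<'a> {
    Splits { rest: input }
}
```

A caller who still wants a Vec can write split("test123test").collect::<Vec<_>>(), while a caller who only needs the first clump pays for nothing more.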
take_while takes self by value: it consumes the iterator. Before Rust 1.0 it also was unfortunately able to be implicitly copied, leading to the surprising behaviour that you are observing.
For these reasons you cannot use take_while for what you want; you will need to manually unroll your take_while invocations.
Here is one of many possible ways of dealing with this:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
let seeking_digits = match iter.peek() {
None => return bits,
Some(c) => c.is_digit(10),
};
if seeking_digits {
bits.push(take_while(&mut iter, |c| c.is_digit(10)));
} else {
bits.push(take_while(&mut iter, |c| !c.is_digit(10)));
}
}
}
fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String
where
I: Iterator<Item = char>,
F: Fn(&char) -> bool,
{
let mut out = String::new();
loop {
match iter.peek() {
Some(c) if predicate(c) => out.push(*c),
_ => return out,
}
let _ = iter.next();
}
}
fn main() {
println!("{:?}", split("test123test"));
}
This yields a solution with two levels of looping; another valid approach would be to model it as a state machine one level deep only. Ask if you aren’t sure what I mean and I’ll demonstrate.
