Array of `Option<T>` to `Option` array - rust

I'm trying to work with fixed-size arrays. I want to transform an array of Option values, [Option<T>; N], into an Option<[T; N]>, such that I get Some if all entries are Some, and None otherwise.
Rust typically uses iterators to transform collections, but these do not carry length guarantees.
This should be legal, as the result, if present, must be the same length as the argument, but is there a way to do this while preserving the compile-time length guarantee, without using unwrap or similar?

On nightly, using try_map() (maybe there is a crate that supports that on stable, I don't know):
#![feature(array_try_map)]
pub fn convert<T, const N: usize>(arr: [Option<T>; N]) -> Option<[T; N]> {
    arr.try_map(|v| v)
}
On stable, allocating:
pub fn convert<T, const N: usize>(arr: [Option<T>; N]) -> Option<[T; N]> {
    let arr = arr.into_iter().collect::<Option<Vec<T>>>()?;
    Some(
        arr.try_into()
            .unwrap_or_else(|_| panic!("the array is of size {N}")),
    )
}
On stable, non-allocating, but requires Default and may be a little inefficient:
pub fn convert<T: Default, const N: usize>(arr: [Option<T>; N]) -> Option<[T; N]> {
    let mut result = [(); N].map(|()| T::default());
    for (item, result_item) in std::iter::zip(arr, &mut result) {
        *result_item = item?;
    }
    Some(result)
}
On stable, non-allocating, using unsafe. This should be a last resort. The following code is subtly incorrect, as it does not drop elements if a None is encountered in the middle, which just demonstrates how hard it is to write correct unsafe code:
use std::mem::{ManuallyDrop, MaybeUninit};
pub fn convert<T, const N: usize>(arr: [Option<T>; N]) -> Option<[T; N]> {
    let arr = ManuallyDrop::new(arr);
    let mut result_arr = MaybeUninit::<[T; N]>::uninit();
    for (i, item) in arr.iter().enumerate() {
        // SAFETY: This is in bounds, and wrapped in `ManuallyDrop` so no double drop.
        unsafe {
            result_arr
                .as_mut_ptr()
                .cast::<T>()
                .add(i)
                .write(std::ptr::read(item)?);
        }
    }
    // SAFETY: We initialized it above.
    Some(unsafe { result_arr.assume_init() })
}

You can .collect() into an Option<Vec<T>> and .and_then(|v| v.try_into().ok()):
fn convert<T, const N: usize>(a: [Option<T>; N]) -> Option<[T; N]> {
    a.into_iter().collect::<Option<Vec<_>>>().and_then(|a| a.try_into().ok())
}
This is straightforward and safe, but unfortunately it creates an intermediate Vec; since T and Option<T> have different sizes, I don't think you can do any better.
Since an iterator over an array ([T; N]) always yields N items, a.try_into().ok() will always result in a Some.
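For a quick sanity check of the behaviour described above, any of the convert implementations shown can be exercised like this (a hypothetical main, not part of the answers):
fn main() {
    // All entries are Some, so the whole array comes back as Some.
    assert_eq!(convert([Some(1), Some(2), Some(3)]), Some([1, 2, 3]));
    // A single None makes the entire result None.
    assert_eq!(convert([Some(1), None, Some(3)]), None::<[i32; 3]>);
}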

Related

Initialize a Vec with not-None values only

If I have variables like this:
let a: u32 = ...;
let b: Option<u32> = ...;
let c: u32 = ...;
, what is the shortest way to make a vector of those values, so that b is only included if it's Some?
In other words, is there something simpler than this:
let v = match b {
    None => vec![a, c],
    Some(x) => vec![a, x, c],
};
P.S. I would prefer a solution where we don't need to use the variables more than once. Consider this example:
let some_person: String = ...;
let best_man: Option<String> = ...;
let a_third_person: &str = ...;
let another_opt: Option<String> = ...;
...
As can be seen, we might have to use longer variable names, more than one Option (None), expressions (like a_third_person.to_string()), etc.
Yours is fine, but here's a sophisticated one:
[Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>()
This works since Option impls IntoIterator.
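A minimal runnable check of that one-liner (the variable values here are made up purely for illustration):
fn main() {
    let (a, c) = (1u32, 3u32);
    let b: Option<u32> = None;
    // None simply contributes no element after flatten().
    let v = [Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>();
    assert_eq!(v, vec![1, 3]);
}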
If it depends on just one variable:
b.map(|b| vec![a, b, c]).unwrap_or_else(|| vec![a, c]);
Playground
After some thinking and investigating, I've come up with the following crazy thing.
The end goal is to have a macro, optional_vec![], to which you can pass either T or Option<T> values and which behaves as described in the question. However, I decided on a strong restriction: it should have the best performance possible. So, you write:
optional_vec![a, b, c]
And get at least the performance of a hand-written match, if not better. This forbids the use of the simple [Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>() suggested in my other answer (though even that solution needs some way to differentiate between Option<T> and plain T, which, as we'll see, is not an easy problem at all).
I will first warn that I've not found a way to make my macro work with Option. That is, if you want to build a vector of Option<T> from Option<T> and Option<Option<T>>, it will not work.
When I design a complex macro, I like to think first about how the expanded code will look. And in this macro, we have several hard problems to solve.
First, the macro takes plain expressions. But somehow, it needs to switch on their type being T or Option<T>. How should such a thing be done?
The feature we use to do such things is specialization.
#![feature(specialization)]

pub trait Optional {
    fn some_method(self);
}

impl<T> Optional for T {
    default fn some_method(self) {
        // Just T
    }
}

impl<T> Optional for Option<T> {
    fn some_method(self) {
        // Option<T>
    }
}
As you probably noticed, we now have two problems: first, specialization is unstable, and I'd like to stay on stable. Second, what should be inside the trait? The second problem is easier to solve, so let's begin with it.
It turns out that the most performant way to push into the vector is to pre-allocate capacity (Vec::with_capacity), write into the vector using raw pointers (don't use push(); it optimizes badly!), and then set the length (Vec::set_len()).
We can get a pointer to the internal buffer of the vector using Vec::as_mut_ptr(), and advance the pointer via <*mut T>::add(1).
So, we need two methods: one to report the required capacity (zero for None, one for Some(_) and for non-Option elements), and a write_and_advance() method:
pub trait Optional {
    type Item;
    fn len(&self) -> usize;
    unsafe fn write_and_advance(self, place: &mut *mut Self::Item);
}

impl<T> Optional for T {
    default type Item = Self;
    default fn len(&self) -> usize { 1 }
    default unsafe fn write_and_advance(self, place: &mut *mut Self) {
        place.write(self);
        *place = place.add(1);
    }
}

impl<T> Optional for Option<T> {
    type Item = T;
    fn len(&self) -> usize { self.is_some() as usize }
    unsafe fn write_and_advance(self, place: &mut *mut T) {
        if let Some(value) = self {
            place.write(value);
            *place = place.add(1);
        }
    }
}
It doesn't even compile! For the why, see Mismatch between associated type and type parameter only when impl is marked `default`. Luckily for us, the trick we'll use to work around specialization not being stable does work in this situation. But for now, let's assume it works. What will the code using this trait look like?
match (a, b, c) { // The match is here because it's the best binding for lifetimes: see https://stackoverflow.com/a/54855986/7884305
    (a, b, c) => {
        let len = Optional::len(&a) + Optional::len(&b) + Optional::len(&c);
        let mut result = ::std::vec::Vec::with_capacity(len);
        let mut next_element = result.as_mut_ptr();
        unsafe {
            Optional::write_and_advance(a, &mut next_element);
            Optional::write_and_advance(b, &mut next_element);
            Optional::write_and_advance(c, &mut next_element);
            result.set_len(len);
        }
        result
    }
}
And it works! Except that it does not, because, as I said, the specialization does not compile, and we also don't want to repeat all of this boilerplate but rather generate it from a macro.
So, how do we solve the problems with specialization, namely that it's unstable and doesn't work here?
dtolnay has a very cool trick he calls autoref specialization (BTW, the whole of that repo is highly recommended reading!). This is a trick that can be used to emulate specialization. It works only in macros, but we're in a macro so this is fine.
I will not elaborate on the trick here (I recommend reading his post; he also used this trick in the excellent and very widely used anyhow crate). In short, the idea is to steer method resolution by implementing one trait for T under certain conditions (the specialized impl) and another trait for &T for the general case (these could be inherent impls if not for coherence). Since Rust performs automatic referencing during method resolution, that is, it takes a reference to the receiver as needed, this works: resolution stops at the first applicable impl, i.e. the specialized impl if it matches, or the general impl otherwise.
Here's an example:
use std::fmt;

pub trait Display {
    fn foo(&self);
}

// Level 1
impl<T: fmt::Display> Display for T {
    fn foo(&self) { println!("Display({}), {}", std::any::type_name::<T>(), self); }
}

pub trait Debug {
    fn foo(&self);
}

// Level 2
impl<T: fmt::Debug> Debug for &T {
    fn foo(&self) { println!("Debug({}), {:?}", std::any::type_name::<T>(), self); }
}

macro_rules! foo {
    ($e:expr) => ((&$e).foo());
}
Playground.
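To make the dispatch concrete, here is a hypothetical usage of the foo! macro above (not part of the original example):
fn main() {
    foo!(42);            // i32 implements fmt::Display, so the Level 1 impl is chosen
    foo!(vec![1, 2, 3]); // Vec<i32> only implements fmt::Debug, so autoref falls through to Level 2
}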
We can use this trick in our case:
#[doc(hidden)]
pub mod autoref_specialization {
    #[derive(Copy, Clone)]
    pub struct OptionTag;

    pub trait OptionKind {
        fn optional_kind(&self) -> OptionTag;
    }

    impl<T> OptionKind for Option<T> {
        #[inline(always)]
        fn optional_kind(&self) -> OptionTag { OptionTag }
    }

    impl OptionTag {
        #[inline(always)]
        pub fn len<T>(self, this: &Option<T>) -> usize { this.is_some() as usize }

        #[inline(always)]
        pub unsafe fn write_and_advance<T>(self, this: Option<T>, place: &mut *mut T) {
            if let Some(value) = this {
                place.write(value);
                *place = place.add(1);
            }
        }
    }

    #[derive(Copy, Clone)]
    pub struct DefaultTag;

    pub trait DefaultKind {
        fn optional_kind(&self) -> DefaultTag;
    }

    impl<T> DefaultKind for &'_ T {
        #[inline(always)]
        fn optional_kind(&self) -> DefaultTag { DefaultTag }
    }

    impl DefaultTag {
        #[inline(always)]
        pub fn len<T>(self, _this: &T) -> usize { 1 }

        #[inline(always)]
        pub unsafe fn write_and_advance<T>(self, this: T, place: &mut *mut T) {
            place.write(this);
            *place = place.add(1);
        }
    }
}
And the expanded code will look like:
use autoref_specialization::{DefaultKind as _, OptionKind as _};

match (a, b, c) {
    (a, b, c) => {
        let (a_tag, b_tag, c_tag) = (
            (&a).optional_kind(),
            (&b).optional_kind(),
            (&c).optional_kind(),
        );
        let len = a_tag.len(&a) + b_tag.len(&b) + c_tag.len(&c);
        let mut result = ::std::vec::Vec::with_capacity(len);
        let mut next_element = result.as_mut_ptr();
        unsafe {
            a_tag.write_and_advance(a, &mut next_element);
            b_tag.write_and_advance(b, &mut next_element);
            c_tag.write_and_advance(c, &mut next_element);
            result.set_len(len);
        }
        result
    }
}
It may be tempting to try to convert this immediately into a macro, but we still have one unsolved problem: our macro needs to generate identifiers. This may not be obvious, but what if we pass optional_vec![1, Some(2), 3]? We need to generate the bindings for the match (in our case, (a, b, c) => ...) and the tag names ((a_tag, b_tag, c_tag)).
Unfortunately, generating names is not something macro_rules! can do in today's Rust. Fortunately, there is an excellent crate, paste (another one from dtolnay!), a small proc macro that allows you to do exactly that. It is even available on the playground!
However, we need a series of identifiers. That can be done with tt-munching, by repeatedly appending some letter (I used a), so you get a, aa, aaa, ... you get the idea.
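If you haven't seen paste before, here is a tiny standalone illustration (with hypothetical names, unrelated to the macro below) of how its [< ... >] syntax concatenates identifiers:
use paste::paste;

macro_rules! make_fns {
    ($name:ident) => {
        paste! {
            // `[< ... >]` pastes the pieces into a single new identifier.
            fn [< $name _a >]() -> u32 { 1 }
            fn [< $name _a_tag >]() -> u32 { 2 }
        }
    };
}

make_fns!(demo); // generates `demo_a` and `demo_a_tag`

fn main() {
    println!("{} {}", demo_a(), demo_a_tag());
}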
#[doc(hidden)]
pub mod reexports {
    pub use std::vec::Vec;
    pub use paste::paste;
}

#[macro_export]
macro_rules! optional_vec {
    // Empty case
    { #generate_idents
        exprs = []
        processed_exprs = [$($e:expr,)*]
        match_bindings = [$($binding:ident)*]
        tags = [$($tag:ident)*]
    } => {{
        use $crate::autoref_specialization::{DefaultKind as _, OptionKind as _};
        match ($($e,)*) {
            ($($binding,)*) => {
                let ($($tag,)*) = (
                    $((&$binding).optional_kind(),)*
                );
                let len = 0 $(+ $tag.len(&$binding))*;
                let mut result = $crate::reexports::Vec::with_capacity(len);
                let mut next_element = result.as_mut_ptr();
                unsafe {
                    $($tag.write_and_advance($binding, &mut next_element);)*
                    result.set_len(len);
                }
                result
            }
        }
    }};
    { #generate_idents
        exprs = [$e:expr, $($rest:expr,)*]
        processed_exprs = [$($processed_exprs:tt)*]
        match_bindings = [$first_binding:ident $($bindings:ident)*]
        tags = [$($tags:ident)*]
    } => {
        $crate::reexports::paste! {
            $crate::optional_vec! { #generate_idents
                exprs = [$($rest,)*]
                processed_exprs = [$($processed_exprs)* $e,]
                match_bindings = [
                    [< $first_binding a >]
                    $first_binding
                    $($bindings)*
                ]
                tags = [
                    [< $first_binding a_tag >]
                    $($tags)*
                ]
            }
        }
    };
    // Entry
    [$e:expr $(, $exprs:expr)* $(,)?] => {
        $crate::optional_vec! { #generate_idents
            exprs = [$($exprs,)+]
            processed_exprs = [$e,]
            match_bindings = [__optional_vec_a]
            tags = [__optional_vec_a_tag]
        }
    };
}
Playground.
I can also personally recommend
let mut v = vec![a, c];
v.extend(b);
Short and clear.
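A quick sketch of how that reads in context (illustrative values; note that with this approach b ends up at the end rather than between a and c):
let a: u32 = 1;
let b: Option<u32> = Some(2);
let c: u32 = 3;

let mut v = vec![a, c];
v.extend(b); // Option<T> implements IntoIterator, so extend accepts it directly
assert_eq!(v, vec![1, 3, 2]);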
Sometimes the straightforward solution is the best:
fn jim_power(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
    let mut acc = Vec::with_capacity(3);
    acc.push(a);
    if let Some(b) = b {
        acc.push(b);
    }
    acc.push(c);
    acc
}

fn ys_iii(
    some_person: String,
    best_man: Option<String>,
    a_third_person: String,
    another_opt: Option<String>,
) -> Vec<String> {
    let mut acc = Vec::with_capacity(4);
    acc.push(some_person);
    best_man.map(|x| acc.push(x));
    acc.push(a_third_person);
    another_opt.map(|x| acc.push(x));
    acc
}
If you don't care about the order of the values, another option is
Iterator::chain(
    [a, c].into_iter(),
    [b].into_iter().flatten()
).collect()
Playground

How can I create a fixed size array of Strings using constant generics?

I have a function using a constant generic:
fn foo<const S: usize>() -> Vec<[String; S]> {
    // Some code
    let mut row: [String; S] = Default::default(); // It sucks because Default for arrays is only implemented for lengths up to 32
    // Some code
}
How can I create a fixed-size array of Strings in my case? let mut row: [String; S] = ["".to_string(); S]; doesn't work because String doesn't implement the Copy trait.
You can do it with MaybeUninit and unsafe:
use std::mem::MaybeUninit;

fn foo<const S: usize>() -> Vec<[String; S]> {
    // Some code
    let mut row: [String; S] = unsafe {
        let mut result = MaybeUninit::uninit();
        let start = result.as_mut_ptr() as *mut String;
        for pos in 0 .. S {
            // SAFETY: safe because loop ensures `start.add(pos)`
            // is always on an array element, of type String
            start.add(pos).write(String::new());
        }
        // SAFETY: safe because loop ensures entire array
        // has been manually initialised
        result.assume_init()
    };
    // Some code
    todo!()
}
Of course, it might be easier to abstract such logic to your own trait:
use std::mem::MaybeUninit;

trait DefaultArray {
    fn default_array() -> Self;
}

impl<T: Default, const S: usize> DefaultArray for [T; S] {
    fn default_array() -> Self {
        let mut result = MaybeUninit::uninit();
        let start = result.as_mut_ptr() as *mut T;
        unsafe {
            for pos in 0 .. S {
                // SAFETY: safe because loop ensures `start.add(pos)`
                // is always on an array element, of type T
                start.add(pos).write(T::default());
            }
            // SAFETY: safe because loop ensures entire array
            // has been manually initialised
            result.assume_init()
        }
    }
}
(The only reason for using your own trait rather than Default is that implementations of the latter would conflict with those provided in the standard library for arrays of up to 32 elements; I wholly expect the standard library to replace its implementation of Default with something similar to the above once const generics have stabilised).
In which case you would now have:
fn foo<const S: usize>() -> Vec<[String; S]> {
    // Some code
    let mut row: [String; S] = DefaultArray::default_array();
    // Some code
    todo!()
}
See it on the Playground.
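As an aside that postdates this answer: on toolchains where std::array::from_fn is available (stable since Rust 1.63, to the best of my knowledge), the same array can be built safely without MaybeUninit:
fn foo<const S: usize>() -> Vec<[String; S]> {
    // Build the array element by element; no unsafe needed.
    let row: [String; S] = std::array::from_fn(|_| String::new());
    // Some code
    let _ = row;
    todo!()
}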
As of now, there is no way to compile const generics on stable Rust. As @AlexLarionov said, you can try to use procedural macros, but that approach still has its bugs and limitations.
If you need a generic that has to be a number, you can use the Num crate, or the more verbose std::num.

Rust: How to short-circuit exit from a chain of iterator methods on first Err or None?

Consider a chain of iterator methods:
.iter().a().b().c()
where a produces values of type Option (or Result). Is there a way to have the whole chain return None (or Err(_)) as soon as a yields a None (or Err(_))?
Detailed example
Given functions valid (identifying nonsensical input) and accept (an
arbitrary selection criterion):
type T = u8;
type ERR = u8;

fn valid(x: &T) -> Result<T, ERR> {
    if *x < 10 { Ok(*x) } else { Err(*x) }
}

fn accept(x: &T) -> bool {
    if *x > 9 { panic!("{} should have been rejected by validator", x) }
    *x % 2 == 0
}
I would like to write a function
fn count_accepted(data: &[T]) -> Result<usize, ERR>
which
- Returns Err(ERR) as soon as the first invalid element is encountered in the input data
- If all elements are valid, returns Ok(usize) containing the count of values that satisfied the accept criterion
Here is a solution that uses a loop:
fn count_loop(data: &[T]) -> Result<usize, ERR> {
    let mut count = 0;
    for item in data {
        valid(&item)?;
        if accept(&item) { count += 1 }
    }
    Ok(count)
}
which seems to work as required, as witnessed by these tests:
macro_rules! testem {
    ($count:path) => {
        #[test] fn empty() { assert_eq!($count(&[]) , Ok(0)) }
        #[test] fn all_ok_and_accepted() { assert_eq!($count(&[2,6]) , Ok(2)) }
        #[test] fn all_ok_some_rejected() { assert_eq!($count(&[2,3]) , Ok(1)) }
        #[test] fn one_invalid() { assert_eq!($count(&[12]) , Err(12)) }
        #[test] fn stop_on_first_invalid() { assert_eq!($count(&[2,13,6,12,5]), Err(13)) }
    }
}
mod test_loop {testem!{super::count_loop}}
I would like to understand whether/how one could implement this behaviour using
iterators rather than a loop.
Consider a related, but simpler problem: if any of the data are not valid, bail
out immediately, otherwise collect all the data into a vector. In other words,
remove the accept condition from the previous problem.
This problem has quite a satisfactory solution, because the FromIterator
implementation of Result takes care of early termination:
fn related(data: &[T]) -> Result<Vec<T>, ERR> {
    data.iter()
        .map(valid)
        .collect()
}
mod test_related {
    #[test]
    fn stop_on_first_invalid() { assert_eq!(super::related(&[2,13,6,12,5]), Err(13))}
}
Here is an extension of related which passes the same tests as count_loop:
fn count_via_vec(data: &[T]) -> Result<usize, ERR> {
    Ok(data
        .iter()
        .map(valid)
        .filter(|x| x.is_err() || accept(&x.unwrap()))
        .collect::<Result<Vec<T>, ERR>>()?
        .len())
}
mod test_vvec {testem!{super::count_via_vec}}
However: this solution has a number of drawbacks with respect to count_loop:
- The filter condition is very noisy.
- The filtering step still needs to be performed when the first invalid item has been identified (unlike in the original loop implementation): the ? appears 2 lines later than it should ... if that were meaningful.
- A vector is unnecessarily populated (unless Rust performs some cool optimization that I'm, as yet, unaware of), so the space complexity rises from O(1) to O(N).
The last point would normally be addressed by replacing .collect::<Result<Vec<T>, ERR>>()?.len()) with .count(), but this has the further detrimental effect of removing the recognition of invalid cases: they are simply counted as successes, as witnessed by the test failed by this implementation:
fn count_iterate(data: &[T]) -> Result<usize, ERR> {
    Ok(data
        .iter()
        .map(valid)
        .filter(|x| x.is_err() || accept(&x.unwrap()))
        .count())
}
mod test_iter {testem!{super::count_iterate}}
Can you suggest some mechanism for early returning in chains of iterator methods
that can be used in cases such as this?
You are probably looking for std::iter::Sum, which has an impl<T, U, E> Sum<Result<U, E>> for Result<T, E> where T: Sum<U>, and there are also Sum impls for all the basic integer types.
So the following will Just Work:
fn valid(x: &u32) -> Result<u32, u32> {
    if *x < 10 { Ok(1) } else { Err(*x) }
}

fn count(x: &[u32]) -> Result<u32, u32> {
    x.iter()
        .map(valid)
        .sum()
}

fn main() {
    println!("{:?}", count(&[13,1,2,3]));
}
As the docs on Sum say, this will short-circuit the iterator if an error is encountered. This will include short-circuiting chained iterators.
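Putting that together with the question's accept criterion, one possible count_accepted (a sketch building on the valid and accept functions defined above, not code from the original answer) could be:
fn count_accepted(data: &[T]) -> Result<usize, ERR> {
    data.iter()
        // Map each element to Ok(1) if accepted, Ok(0) if valid but rejected, Err(_) if invalid.
        .map(|x| valid(x).map(|v| usize::from(accept(&v))))
        .sum() // Sum for Result short-circuits on the first Err
}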

String join on strings in Vec in reverse order without a `collect`

I'm trying to join strings in a vector into a single string, in reverse from their order in the vector. The following works:
let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
v.iter().rev().map(|s| s.clone()).collect::<Vec<String>>().connect(".")
However, this ends up creating a temporary vector that I don't actually need. Is it possible to do this without a collect? I see that connect is a StrVector method. Is there nothing for raw iterators?
I believe this is the shortest you can get:
fn main() {
    let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    let mut r = v.iter()
        .rev()
        .fold(String::new(), |r, c| r + c.as_str() + ".");
    r.pop();
    println!("{}", r);
}
The addition operation on String takes its left operand by value and pushes the second operand onto it in place, which is very nice - it reuses the accumulator's buffer instead of creating a new string for every element. You don't even need to clone() the contained strings.
I think, however, that the lack of concat()/connect() methods on iterators is a serious drawback. It bit me a lot too.
I don't know if they've heard our Stack Overflow prayers or what, but the itertools crate happens to have just the method you need - join.
With it, your example might be laid out as follows:
use itertools::Itertools;
let v = ["a", "b", "c"];
let connected = v.iter().rev().join(".");
Here's an iterator extension trait that I whipped up, just for you!
pub trait InterleaveExt: Iterator + Sized {
    fn interleave(self, value: Self::Item) -> Interleave<Self> {
        Interleave {
            iter: self.peekable(),
            value: value,
            me_next: false,
        }
    }
}

impl<I: Iterator> InterleaveExt for I {}

pub struct Interleave<I>
where
    I: Iterator,
{
    iter: std::iter::Peekable<I>,
    value: I::Item,
    me_next: bool,
}

impl<I> Iterator for Interleave<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = I::Item;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        // Don't return a value if there's no next item
        if let None = self.iter.peek() {
            return None;
        }
        let next = if self.me_next {
            Some(self.value.clone())
        } else {
            self.iter.next()
        };
        self.me_next = !self.me_next;
        next
    }
}
It can be called like so:
fn main() {
    let a = &["a", "b", "c"];
    let s: String = a.iter().cloned().rev().interleave(".").collect();
    println!("{}", s);

    let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    let s: String = v.iter().map(|s| s.as_str()).rev().interleave(".").collect();
    println!("{}", s);
}
I've since learned that this iterator adapter already exists in itertools under the name intersperse — go use that instead!
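For reference, a rough equivalent of the question's example using that adapter (a sketch; the fully qualified call sidesteps any clash with the unstable std method of the same name):
use itertools::Itertools;

let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
// Reverse, interleave with ".", and collect straight into a String.
let s: String = Itertools::intersperse(v.iter().map(String::as_str).rev(), ".").collect();
assert_eq!(s, "c.b.a");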
Cheating answer
You never said you needed the original vector after this, so we can reverse it in place and just use join...
let mut v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
v.reverse();
println!("{}", v.join("."))

Using the same iterator multiple times in Rust

Editor's note: This code example is from a version of Rust prior to 1.0 when many iterators implemented Copy. Updated versions of this code produce different errors, but the answers still contain valuable information.
I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        match iter.peek() {
            None => return bits,
            Some(c) => if c.is_digit() {
                bits.push(iter.take_while(|c| c.is_digit()).collect());
            } else {
                bits.push(iter.take_while(|c| !c.is_digit()).collect());
            }
        }
    }
    return bits;
}
However, this doesn't work, looping forever. It seems that it is using a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator across all the take_while calls. Is this possible?
As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 — editor)
You want to be modifying the iterator each time, that is, for take_while to be operating on an &mut to your iterator. Which is exactly what the .by_ref adaptor is for:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        match iter.peek().map(|c| *c) {
            None => return bits,
            Some(c) => if c.is_digit(10) {
                bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
            } else {
                bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
            },
        }
    }
}

fn main() {
    println!("{:?}", split("123abc456def"))
}
Prints
["123", "bc", "56", "ef"]
However, I imagine this is not correct: take_while consumes the first element that fails its predicate, so the character at each boundary is silently dropped.
I would actually recommend writing this as a normal for loop, using the char_indices iterator:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    if input.is_empty() {
        return bits;
    }
    let mut is_digit = input.chars().next().unwrap().is_digit(10);
    let mut start = 0;
    for (i, c) in input.char_indices() {
        let this_is_digit = c.is_digit(10);
        if is_digit != this_is_digit {
            bits.push(input[start..i].to_string());
            is_digit = this_is_digit;
            start = i;
        }
    }
    bits.push(input[start..].to_string());
    bits
}
This form also allows doing this with far fewer allocations (that is, the Strings are not required), because each returned value is just a slice into the input, and we can use lifetimes to state this:
pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
    let mut bits = vec![];
    if input.is_empty() {
        return bits;
    }
    let mut is_digit = input.chars().next().unwrap().is_digit(10);
    let mut start = 0;
    for (i, c) in input.char_indices() {
        let this_is_digit = c.is_digit(10);
        if is_digit != this_is_digit {
            bits.push(&input[start..i]);
            is_digit = this_is_digit;
            start = i;
        }
    }
    bits.push(&input[start..]);
    bits
}
All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.
One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<&'a str>.
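For illustration, here is one possible shape such an iterator could take (my sketch along the lines suggested above; the Splits name and exact implementation are assumptions, not the answer's code):
pub struct Splits<'a> {
    rest: &'a str,
}

pub fn split<'a>(input: &'a str) -> Splits<'a> {
    Splits { rest: input }
}

impl<'a> Iterator for Splits<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let mut chars = self.rest.char_indices();
        // An empty remainder means we are done.
        let (_, first) = chars.next()?;
        let first_is_digit = first.is_digit(10);
        // Find the byte index where the current digit/non-digit run ends.
        let end = chars
            .find(|&(_, c)| c.is_digit(10) != first_is_digit)
            .map(|(i, _)| i)
            .unwrap_or(self.rest.len());
        let (head, tail) = self.rest.split_at(end);
        self.rest = tail;
        Some(head)
    }
}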
take_while takes self by value: it consumes the iterator. Before Rust 1.0 it was also, unfortunately, able to be implicitly copied, leading to the surprising behaviour that you are observing.
You cannot use take_while for what you want, for these reasons. You will need to manually unroll your take_while invocations.
Here is one of many possible ways of dealing with this:
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut iter = input.chars().peekable();
    loop {
        let seeking_digits = match iter.peek() {
            None => return bits,
            Some(c) => c.is_digit(10),
        };
        if seeking_digits {
            bits.push(take_while(&mut iter, |c| c.is_digit(10)));
        } else {
            bits.push(take_while(&mut iter, |c| !c.is_digit(10)));
        }
    }
}

fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String
where
    I: Iterator<Item = char>,
    F: Fn(&char) -> bool,
{
    let mut out = String::new();
    loop {
        match iter.peek() {
            Some(c) if predicate(c) => out.push(*c),
            _ => return out,
        }
        let _ = iter.next();
    }
}

fn main() {
    println!("{:?}", split("test123test"));
}
This yields a solution with two levels of looping; another valid approach would be to model it as a state machine one level deep only. Ask if you aren’t sure what I mean and I’ll demonstrate.
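For completeness, one possible reading of that single-level state-machine idea (my sketch, not necessarily what the author had in mind):
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = vec![];
    let mut current = String::new();
    // The state: whether the clump being built is made of digits (None before the first char).
    let mut in_digits: Option<bool> = None;
    for c in input.chars() {
        let is_digit = c.is_digit(10);
        if in_digits.is_some() && in_digits != Some(is_digit) {
            // The kind of character changed: close the current clump and start a new one.
            bits.push(std::mem::take(&mut current));
        }
        in_digits = Some(is_digit);
        current.push(c);
    }
    if !current.is_empty() {
        bits.push(current);
    }
    bits
}

fn main() {
    println!("{:?}", split("test123test")); // ["test", "123", "test"]
}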

Resources