Why does Iterator::take_while take ownership of the iterator? - rust

I find it odd that Iterator::take_while takes ownership of the iterator. It seems like a useful feature to be able to take the first x elements which satisfy some function but still leave the rest of the elements available in the original iterator.
I understand that this is incompatible with a lazy implementation of take_while, but still feels useful. Was this just judged not useful enough to include in the standard library, or is there some other problem I'm not seeing?

All the iterator adapters take the original iterator by value for efficiency's sake. Additionally, taking ownership of the original iterator avoids having to deal with lifetimes when it isn't necessary.
If you wish to retain access to the original iterator, you can use by_ref. This introduces one level of indirection, but the programmer chooses to opt into the extra work when the feature is needed:
fn main() {
    let v = [1, 2, 3, 4, 5, 6, 7, 8];
    let mut i1 = v.iter();
    for z in i1.by_ref().take_while(|&&v| v < 4) {
        //     ^^^^^^^^^
        println!("Take While: {}", z);
    }
    for z in i1 {
        println!("Rest: {}", z);
    }
}
Has the output
Take While: 1
Take While: 2
Take While: 3
Rest: 5
Rest: 6
Rest: 7
Rest: 8
Iterator::by_ref works because there's an implementation of Iterator for any mutable reference to an iterator:
impl<'_, I> Iterator for &'_ mut I
where
I: Iterator + ?Sized,
This means that you can also take a mutable reference. The parentheses are needed for precedence:
for z in (&mut i1).take_while(|&&v| v < 4)
Did you note that 4 was missing? That's because once take_while picks a value and decides to not use it, there's nowhere for it to "put it back". Putting it back would require opting into more storage and slowness than is always needed.
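To see the dropped element directly, here is a small sketch that records every value the underlying iterator hands out. The helper name take_small and the pulled log are illustrative assumptions, not part of the answer above; only standard library adapters are used.

```rust
/// Returns (values yielded by take_while, values pulled from the source,
/// values still left in the source afterwards).
fn take_small(v: &[i32]) -> (Vec<i32>, Vec<i32>, Vec<i32>) {
    let mut pulled = Vec::new();
    let mut it = v.iter().copied();
    let taken: Vec<i32> = it
        .by_ref()
        // Log each value as it leaves the underlying iterator.
        .inspect(|x| pulled.push(*x))
        .take_while(|&x| x < 4)
        .collect();
    // Whatever take_while did not pull is still available here.
    let rest: Vec<i32> = it.collect();
    (taken, pulled, rest)
}

fn main() {
    let (taken, pulled, rest) = take_small(&[1, 2, 3, 4, 5, 6]);
    println!("taken:  {:?}", taken);  // [1, 2, 3]
    println!("pulled: {:?}", pulled); // [1, 2, 3, 4]
    println!("rest:   {:?}", rest);   // [5, 6]
}
```

The value 4 shows up in pulled but in neither taken nor rest: take_while had to pull it to test the predicate and has nowhere to store it afterwards.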
I've used the itertools crate to handle cases like this, specifically take_while_ref:
use itertools::Itertools; // 0.9.0
fn main() {
    let v = [1, 2, 3, 4, 5, 6, 7, 8];
    let mut i1 = v.iter();
    for z in i1.take_while_ref(|&&v| v < 4) {
        //     ^^^^^^^^^^^^^^^
        println!("Take While: {}", z);
    }
    for z in i1 {
        println!("Rest: {}", z);
    }
}
Take While: 1
Take While: 2
Take While: 3
Rest: 4
Rest: 5
Rest: 6
Rest: 7
Rest: 8

If it's getting too complicated, we may be using the wrong tool.
Note that 4 is present here.
fn main() {
    let v = [1, 2, 3, 4, 5, 6, 7, 8];
    let mut i1 = v.iter().peekable();
    while let Some(z) = i1.next_if(|&n| n < &4) {
        println!("Take While: {z}");
    }
    for z in i1 {
        println!("Rest: {z}");
    }
}
Take While: 1
Take While: 2
Take While: 3
Rest: 4
Rest: 5
Rest: 6
Rest: 7
Rest: 8
Yes, the OP asked for take_while and Shepmaster's solution is superb.

Related

parallel sorting on separate sections of a single slice

I'm trying to implement a sort of parallel bubble sort: a number of threads each work on a distinct part of the same slice, and a final step then sorts those parts together, similar to a kind of merge sort.
I have this code so far
pub fn parallel_bubble_sort(to_sort: Arc<&[i32]>) {
    let midpoint = to_sort.len() / 2;
    let ranges = [0..midpoint, midpoint..to_sort.len()];
    let handles = (ranges).map(|range| {
        thread::spawn(|| {
            to_sort[range].sort();
        })
    });
}
But I get a series of errors, relating to 'to_sort's lifetime, etc
How would someone go about modifying distinct slices of a larger slice across thread bounds?
Disclaimer: I assume that you want to sort in place, as you call .sort().
There's a couple of problems with your code:
The to_sort isn't mutable, so you won't be able to modify it. Which is an essential part of sorting ;) So I think that Arc<&[i32]> should most certainly be &mut [i32].
You cannot split a mutable slice like this. Rust doesn't know if your ranges overlap, and therefore disallows this entirely. You can, however, use split_at to split it into two parts. This even works with mutable references, which is important in your case.
You cannot move mutable references to threads, because it's unknown how long the thread will exist. Overcoming this issue is the hardest part, I'm afraid; I don't know how easy it is in normal Rust without the use of unsafe. I think the easiest solution would be to use a library like rayon, which has already solved those problems for you.
EDIT: Rust 1.63 introduces scoped threads, which eliminates the need for rayon in this use case.
This should be a good start for you:
pub fn parallel_bubble_sort(to_sort: &mut [i32]) {
    let midpoint = to_sort.len() / 2;
    let (left, right) = to_sort.split_at_mut(midpoint);
    std::thread::scope(|s| {
        s.spawn(|| left.sort());
        s.spawn(|| right.sort());
    });
    // TODO: merge left and right
}
fn main() {
    let mut data = [1, 6, 3, 4, 9, 7, 4];
    parallel_bubble_sort(&mut data);
    println!("{:?}", data);
}
[1, 3, 6, 4, 4, 7, 9]
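The output above is sorted only within each half, because the merge step is left as a TODO. One possible way to finish it is sketched below. The names merge_sorted and parallel_sort_and_merge are hypothetical, and the merge goes through a temporary Vec for simplicity rather than working in place; it assumes Rust 1.63 or later for std::thread::scope.

```rust
use std::thread;

/// Merge two already-sorted slices into a new sorted Vec (two-pointer merge).
fn merge_sorted(left: &[i32], right: &[i32]) -> Vec<i32> {
    let mut merged = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            merged.push(left[i]);
            i += 1;
        } else {
            merged.push(right[j]);
            j += 1;
        }
    }
    // At most one of these two tails is non-empty.
    merged.extend_from_slice(&left[i..]);
    merged.extend_from_slice(&right[j..]);
    merged
}

pub fn parallel_sort_and_merge(to_sort: &mut [i32]) {
    let midpoint = to_sort.len() / 2;
    {
        // Two non-overlapping mutable halves, each sorted on its own thread.
        let (left, right) = to_sort.split_at_mut(midpoint);
        thread::scope(|s| {
            s.spawn(|| left.sort());
            s.spawn(|| right.sort());
        });
    }
    // Both threads have finished once the scope returns; merge on this thread.
    let merged = merge_sorted(&to_sort[..midpoint], &to_sort[midpoint..]);
    to_sort.copy_from_slice(&merged);
}

fn main() {
    let mut data = [1, 6, 3, 4, 9, 7, 4];
    parallel_sort_and_merge(&mut data);
    println!("{:?}", data); // [1, 3, 4, 4, 6, 7, 9]
}
```

The temporary allocation trades memory for simplicity; an in-place merge is possible but considerably more involved.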
Previous answer for Rust versions older than 1.63
pub fn parallel_bubble_sort(to_sort: &mut [i32]) {
    let midpoint = to_sort.len() / 2;
    let (left, right) = to_sort.split_at_mut(midpoint);
    rayon::scope(|s| {
        s.spawn(|_| left.sort());
        s.spawn(|_| right.sort());
    });
    // TODO: merge left and right
}
fn main() {
    let mut data = [1, 6, 3, 4, 9, 7, 4];
    parallel_bubble_sort(&mut data);
    println!("{:?}", data);
}
[1, 3, 6, 4, 4, 7, 9]

Doing more than 1 thing in a iter().map()

I would like to use a map to create a new vector, but at the same time, do other things inside that map. I'm working on Advent of Code 2021, day 6 part 1.
This code loops through a vector and decrements all the values by one. If the value is at 0, then it resets that position to 6 and adds an 8 to the end of the vector.
fn run_growth_simulation(mut state: Vec<u8>, days: i32) -> usize {
    for _day in 0..days {
        let mut new_fish = 0;
        state.iter_mut().map(|x| match x {
            num: u8 @ 1..=8 => {num - 1},
            0 => {new_fish += 1; 6},
            _ => unreachable!()
        })
        for _fish in 0..new_fish {
            state.push(8);
        }
    }
    state.iter().count() as usize
}
How do I return the right item from the closure?
I would mutate the values in place through the iterator rather than build a new vector. For that, use for_each instead of map (or, preferably, a plain for loop).
Then inside the match statement mutate the value:
state.iter_mut().for_each(|x| match x {
    // ": u8" removed because it gave me a syntax error
    // mutate the number directly (we have to use `num` because x was moved)
    num @ 1..=8 => {*num -= 1;},
    // mutate the number
    0 => {new_fish += 1; *x = 6;},
    _ => unreachable!()
});
A slightly different approach would be to count the 0s in the vector, remove them, subtract 1 from each remaining value, and then add the reset and newborn fish.
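A minimal sketch of that count-and-remove approach follows. It reuses the function name from the question, but the body is my own reading of the description, and the days parameter is changed to u32 here as an assumption. The order of the fish in the vector differs from the original, which does not matter since only the count is returned.

```rust
fn run_growth_simulation(mut state: Vec<u8>, days: u32) -> usize {
    for _day in 0..days {
        // Fish whose timer is 0 spawn today.
        let spawning = state.iter().filter(|&&timer| timer == 0).count();
        // Remove them so every remaining timer can be decremented safely.
        state.retain(|&timer| timer != 0);
        for timer in state.iter_mut() {
            *timer -= 1;
        }
        // Put the parents back with a timer of 6 and add newborns with 8.
        state.extend(std::iter::repeat(6u8).take(spawning));
        state.extend(std::iter::repeat(8u8).take(spawning));
    }
    state.len()
}

fn main() {
    // Sample input from the puzzle statement.
    let fish = vec![3, 4, 3, 1, 2];
    println!("{}", run_growth_simulation(fish.clone(), 18)); // 26
    println!("{}", run_growth_simulation(fish, 80)); // 5934
}
```

Because the iteration over state finishes before anything is pushed, no borrow of the vector is held while it grows.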
As a complement to the answer stating that for_each() is preferable to map() here (since we don't consume what map() emits), below is a simpler example trying to illustrate the problem (and why the borrow-checker is right when it forbids such attempts).
In both cases (test1() and test2()) we are iterating over a vector while we are extending it (this is what was intended in the question).
In test1() the iterator captures the storage for the values once and for all when it is created.
For all the subsequent iterations, it will refer to this initial storage, thus this storage must not move elsewhere in memory in the meantime.
That's why the iterator borrows the vector (mutably or not, this is not important here).
However, during these iterations we try to append new values to this vector: this may move the storage (for reallocation purpose) and fortunately this requires a mutable borrow of the vector (then it's rejected).
In test2() we avoid keeping a reference to the initial storage, and use a counter instead.
This works, but this is suboptimal since at each iteration this index operation ([]) needs to check the bounds.
The iterator in the previous function knows the bounds once and for all; that's why iterators give the compiler better optimisation opportunities.
Note that len() is evaluated once and for all at the beginning of the loop here; this is probably what we want, but if we wanted to reevaluate it at each iteration, then we would have to use a loop {} instruction.
What is discussed here is not specific to the language but to the problem itself.
With a more permissive programming language, the first attempt might have been allowed but would have led to memory errors; otherwise such a language would have to shift systematically towards the second attempt and pay the cost of bounds checking at each iteration.
In the end, your solution with a second loop is probably the best choice.
fn test1() {
    let mut v = vec![1, 2, 3, 4, 5, 6, 7, 8];
    v.iter_mut().for_each(|e| {
        if *e <= 3 {
            let n = *e + 100;
            // v.push(n) // !!! INCORRECT !!!
            // we are trying to reallocate the storage while iterating over it
        } else {
            *e += 10;
        }
    });
    println!("{:?}", v);
}
fn test2() {
    let mut v = vec![1, 2, 3, 4, 5, 6, 7, 8];
    for i in 0..v.len() {
        let e = &mut v[i];
        if *e <= 3 {
            let n = *e + 100;
            v.push(n);
        } else {
            *e += 10;
        }
    }
    println!("{:?}", v);
}
fn main() {
    test1(); // [1, 2, 3, 14, 15, 16, 17, 18]
    test2(); // [1, 2, 3, 14, 15, 16, 17, 18, 101, 102, 103]
}
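For comparison, here is a sketch of the variant mentioned above in which the length is reevaluated at each iteration. The name test3 is hypothetical, and a while loop stands in for the loop {} form; the condition is checked again on every pass, so the appended values are visited too.

```rust
fn test3() -> Vec<i32> {
    let mut v = vec![1, 2, 3, 4, 5, 6, 7, 8];
    let mut i = 0;
    // v.len() is read again on every pass, so pushed elements are processed.
    while i < v.len() {
        if v[i] <= 3 {
            let n = v[i] + 100;
            v.push(n);
        } else {
            v[i] += 10;
        }
        i += 1;
    }
    v
}

fn main() {
    println!("{:?}", test3()); // [1, 2, 3, 14, 15, 16, 17, 18, 111, 112, 113]
}
```

The loop terminates here only because the appended values are all greater than 3 and therefore never trigger another push; with other data this pattern can run forever.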

Why can I not use a slice pattern to filter a Window iterator?

I have a vector of numbers and use the windows(2) method to create an iterator that gives me neighbouring pairs. For example, the vector [1, 2, 3] is transformed into [1, 2], [2, 3]. I want to use the find method to find a slice that fulfills a specific condition:
fn step(g: u64) -> Option<(u64, u64)> {
    let prime_list: Vec<u64> = vec![2, 3, 5, 7]; // For example
    if prime_list.len() < 2 {
        return None;
    }
    let res = prime_list.windows(2).find(|&&[a, b]| b - a == g)?;
    //...
    None
}
I get an error:
error[E0005]: refutable pattern in function argument: `&&[]` not covered
--> src/lib.rs:6:43
|
6 | let res = prime_list.windows(2).find(|&&[a, b]| b - a == g)?;
| ^^^^^^^^ pattern `&&[]` not covered
I don't know what that error means: the list cannot have less than two elements, for example. Maybe the closure parameter is wrong? I tried to vary it but that didn't change anything. a and b are being properly detected as u64 in my IDE too. What is going on here?
You, the programmer, know that each iterated value will have a length of 2, but how do you know that? You can only tell that from the prose documentation of the function:
Returns an iterator over all contiguous windows of length size. The windows overlap. If the slice is shorter than size, the iterator returns no values.
Nowhere does the compiler know this information. The implementation of Windows only states that the iterated value will be a slice:
impl<'a, T> Iterator for Windows<'a, T> {
    type Item = &'a [T];
}
I'd convert the slice into an array reference, discarding any slices that were the wrong length (which you know cannot happen):
use std::convert::TryFrom;
fn step(g: u64) -> Option<(u64, u64)> {
    let prime_list: Vec<u64> = vec![2, 3, 5, 7]; // For example
    if prime_list.len() < 2 {
        return None;
    }
    let res = prime_list
        .windows(2)
        .flat_map(<&[u64; 2]>::try_from)
        .find(|&&[a, b]| b - a == g)?;
    //...
    None
}
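Another option, sketched here as an assumption rather than as part of the answer above, is to keep the slice type and move the pattern into the closure body, where a refutable pattern is allowed because the match has a fallback arm. The function name find_gap is hypothetical, and the input is assumed to be sorted in ascending order so that b - a cannot underflow.

```rust
/// Find the first pair of neighbours whose difference is `g`.
fn find_gap(primes: &[u64], g: u64) -> Option<(u64, u64)> {
    primes.windows(2).find_map(|w| match w {
        // The slice pattern only matches windows of exactly two elements.
        [a, b] if *b - *a == g => Some((*a, *b)),
        // Any other length (or a non-matching pair) is skipped.
        _ => None,
    })
}

fn main() {
    let prime_list = [2, 3, 5, 7];
    println!("{:?}", find_gap(&prime_list, 2)); // Some((3, 5))
    println!("{:?}", find_gap(&prime_list, 4)); // None
}
```

The catch-all arm is what satisfies the compiler: it covers the slice lengths that windows(2) never actually produces.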
See also:
How to convert a slice into an array reference?
How can I find a subsequence in a &[u8] slice?
How do I imply the type of the value when there are no type parameters or ascriptions?
Alternatively, you could use an iterator of integers and chunk it up.
See also:
Are there equivalents to slice::chunks/windows for iterators to loop over pairs, triplets etc?
Const generics have since been stabilized (in a minimal form, in Rust 1.51), which makes it possible to bake the array length into a function call and its return type; windows itself still returns slices, though.
See also:
Is it possible to control the size of an array using the type parameter of a generic?

How does one operate over a subset of a vector?

I understand how to operate on an entire vector, though I don't think this is idiomatic Rust:
fn median(v: &Vec<u32>) -> f32 {
    let count = v.len();
    if count % 2 == 1 {
        v[count / 2] as f32
    } else {
        (v[count / 2] as f32 + v[count / 2 - 1] as f32) / 2.0
    }
}
fn main() {
    let mut v1 = vec![3, 7, 8, 5, 12, 14, 21, 13, 18];
    v1.sort();
    println!("{:.*}", 1, median(&v1));
}
But what if I want to operate on only half of this vector? For example, the first quartile is the median of the lower half, and the third quartile is the median of the upper half. My first thought was to construct two new vectors, but that did not seem quite right.
How do I get "half" a vector?
As mentioned, you want to create a slice using the Index trait with a Range:
let slice = &v1[0..v1.len() / 2];
This is yet another reason why it is discouraged to accept a &Vec. The current code would require converting the slice into an allocated Vec. Instead, rewrite it to accept a slice:
fn median(v: &[u32]) -> f32 {
    // ...
}
Since you are likely interested in splitting a vector / slice in half and getting both parts, split_at may be relevant:
let (head, tail) = v1.split_at(v1.len() / 2);
println!("{:.*}", 1, median(head));
println!("{:.*}", 1, median(tail));
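Putting the pieces together for the quartile use case from the question, a sketch might look like the following. The quartiles helper is hypothetical, and it assumes one common convention: for an odd-length input the middle element belongs to neither half. The input must already be sorted and contain at least two elements, since median indexes into its argument.

```rust
fn median(v: &[u32]) -> f32 {
    let count = v.len();
    if count % 2 == 1 {
        v[count / 2] as f32
    } else {
        (v[count / 2] as f32 + v[count / 2 - 1] as f32) / 2.0
    }
}

/// First and third quartile of an already-sorted slice.
/// Assumption: the middle element is excluded from both halves when the
/// length is odd.
fn quartiles(sorted: &[u32]) -> (f32, f32) {
    let n = sorted.len();
    let lower = &sorted[..n / 2];
    let upper = &sorted[(n + 1) / 2..];
    (median(lower), median(upper))
}

fn main() {
    let mut v1 = vec![3, 7, 8, 5, 12, 14, 21, 13, 18];
    v1.sort();
    let (q1, q3) = quartiles(&v1);
    println!("{:.1} {:.1} {:.1}", q1, median(&v1), q3); // 6.0 12.0 16.0
}
```

No new vectors are allocated: both halves are just views into the original storage.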
How to find the median on vector:
fn median(numbers: &mut Vec<i32>) -> i32 {
    numbers.sort();
    let mid = numbers.len() / 2;
    if numbers.len() % 2 == 0 {
        // Even length: average the two middle values (integer division truncates).
        (numbers[mid - 1] + numbers[mid]) / 2
    } else {
        numbers[mid]
    }
}
How to get half a vector:
Use a slice:
let slice: &[i32] = &numbers[0..numbers.len() / 2];
Or create a draining iterator, which removes those elements from the original vector:
let half: Vec<i32> = numbers.drain(..numbers.len() / 2).collect();

Borrow a section of a borrowed array as a borrowed array

As the title reads, how would I go about doing this?
fn foo(array: &[u32; 10]) -> &[u32; 5] {
    &array[0..5]
}
Compiler error
error[E0308]: mismatched types
--> src/main.rs:2:5
|
2 | &array[0..5]
| ^^^^^^^^^^^^ expected array of 5 elements, found slice
|
= note: expected type `&[u32; 5]`
= note: found type `&[u32]`
arrayref implements a safe interface for doing this operation, using macros (and compile-time constant slicing bounds, of course).
Their readme explains
The goal of arrayref is to enable the effective use of APIs that involve array references rather than slices, for situations where parameters must have a given size.
and
let addr: &[u8; 16] = ...;
let mut segments = [0u16; 8];
// array-based API with arrayref
for i in 0 .. 8 {
    segments[i] = read_u16_array(array_ref![addr,2*i,2]);
}
Here the array_ref![addr,2*i,2] macro allows us to take an array reference to a slice consisting of two bytes starting at 2*i. Apart from the syntax (less nice than slicing), it is essentially the same as the slice approach. However, this code makes explicit the need for precisely two bytes both in the caller, and in the function signature.
Stable Rust
It's not possible to do this using only safe Rust. To understand why, it's important to understand how these types are implemented. An array is guaranteed to have N initialized elements. It cannot get smaller or larger. At compile time, those guarantees allow the size aspect of the array to be removed, and the array only takes up N * sizeof(element) space.
That means that [T; N] and [T; M] are different types (when N != M) and you cannot convert a reference of one to the other.
The idiomatic solution is to use a slice instead:
fn foo(array: &[u32; 10]) -> &[u32] {
    &array[0..5]
}
A slice contains a pointer to the data and the length of the data, thus moving that logic from compile time to run time.
Nightly Rust
You can perform a runtime check that the slice is the correct length and convert it to an array in one step:
#![feature(try_from)]
use std::convert::TryInto;
fn foo(array: &[u32; 10]) -> &[u32; 5] {
    array[0..5].try_into().unwrap()
}
fn main() {}
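TryFrom and TryInto were stabilized in Rust 1.34, so on current toolchains the same conversion should work without the feature attribute. A sketch under that assumption, with the hypothetical name first_five:

```rust
use std::convert::TryInto;

fn first_five(array: &[u32; 10]) -> &[u32; 5] {
    // The slice-to-array-reference conversion checks the length at run time;
    // it cannot fail here because the range 0..5 always yields 5 elements.
    array[0..5]
        .try_into()
        .expect("a 0..5 range always has length 5")
}

fn main() {
    let input = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    let output = first_five(&input);
    println!("{:?}", output); // [0, 1, 2, 3, 4]
}
```

The explicit import is only needed on editions before 2021, where TryInto is not yet in the prelude; it is harmless on later editions.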
Unsafe Rust
Because someone might want to do this the unsafe way in an earlier version of Rust, I'll present code based on the standard library implementation:
fn foo(array: &[u32; 10]) -> &[u32; 5] {
    let slice = &array[0..5];
    if slice.len() == 5 {
        let ptr = slice.as_ptr() as *const [u32; 5];
        unsafe { &*ptr }
    } else {
        panic!("Needs to be length 5")
    }
}
fn main() {
    let input = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    let output = foo(&input);
    println!("{:?}", output);
}
