How can I drain a vector in chunks? - rust

I would like to have something like:
fn drain_in_chunks<T>(mut v: Vec<T>) {
    // pseudocode for the API I am looking for
    for chunk in v.drain(..).chunks(2) {
        do_something(chunk)
    }
}
where I remove chunks of size two from v in each iteration. I want to do this because I want to move the chunks into a function, and I can't move elements out of a vector without removing them.
I could do this, but it feels too verbose.
for (i, chunk) in v.chunks(2).enumerate().zip(0..) {
v.drain(i*2..(i+1)*2);
do_something(chunk)
}
Any more elegant solutions?

You can use itertools's tuples(). Note that if the vector has an odd length, the trailing element is not yielded as a tuple:
use itertools::Itertools;
fn drain_in_chunks<T>(mut v: Vec<T>) {
    for (a, b) in v.drain(..).tuples() {
        do_something([a, b]);
    }
}
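If pulling in itertools is not an option, or the chunk size is only known at run time, roughly the same effect can be had with the standard library alone. The following is a minimal sketch rather than a drop-in replacement: the size parameter and the handle callback are names assumed here for illustration, and each chunk is handed over as an owned Vec<T>. Unlike tuples(), a short final chunk is still passed along.

```rust
/// Moves the elements out of `v` in owned chunks of at most `size` elements.
/// `size` and `handle` are illustrative names, not part of the original question.
fn drain_in_chunks<T>(mut v: Vec<T>, size: usize, mut handle: impl FnMut(Vec<T>)) {
    // A single `Drain` over the whole vector, so every element is moved out once.
    let mut drain = v.drain(..);
    loop {
        // `by_ref` lets `take` borrow the iterator instead of consuming it.
        let chunk: Vec<T> = drain.by_ref().take(size).collect();
        if chunk.is_empty() {
            break; // nothing left (or `size` was 0)
        }
        handle(chunk);
    }
}
```

Because a single Drain is reused for the whole loop, the elements are moved out in one pass instead of shifting the remaining elements after every chunk.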

Related

What is the difference between map and for_each when collecting into a BTreeMap?

I have a struct with a field that is a BTreeMap whose values are another struct that implements From<&[u8]>:
struct MyStruct {
...
btree: BTreeMap<String, MyOtherStruct>
...
}
MyOtherStruct implements From<&[u8]> because I'm recovering it from a file.
impl From<&[u8]> for MyOtherStruct {
fn from(stream: &[u8]) -> Self {
...
}
}
I read the file, which contains a list of MyOtherStruct values, and I have a function that parses the byte stream and returns an array of sub-streams, one for each MyOtherStruct:
fn read_file(path: &PathBuf) -> Vec<u8> {
....
}
fn find_streams(stream: &[u8]) -> Vec<&[u8]> {
....
}
Then to build MyStruct, I take the array of streams and for each stream I create MyOtherStruct from the stream
fn main() {
let file_content = read_file(&PathBuf::from("path"));
let streams = find_streams(&file_content);
let mut my_other_structs = BTreeMap::<String, MyOtherStruct>::new();
// here is where i collect my items
streams.iter().for_each(|s| {
let item = MyOtherStruct::from(*s);
my_other_structs.insert(String::from("some key"), item);
});
....
....
}
The question is in the part where I collect my items. Before using a for_each I used a map but the compiler gave me an error that said the trait 'FromIterator<IndexEntry>' is not implemented for 'BTreeMap<std::string::String, IndexEntry>'.
Of course I understand what the compiler error refers to, so I copied the signature of the trait I needed, pasted it into the editor and implemented it.
impl FromIterator<MyOtherStruct> for BTreeMap<String, MyOtherStruct> {
fn from_iter<T: IntoIterator<Item = MyOtherStruct>>(iter: T) -> Self {
let mut btree = BTreeMap::new();
iter.into_iter().for_each(|e| {
btree.insert(String::from("some key"), e);
});
btree
}
}
So then, instead of doing it this way:
let mut my_other_structs = BTreeMap::<String, MyOtherStruct>::new();
streams.iter().for_each(|s| {
let item = MyOtherStruct::from(*s);
my_other_structs.insert(String::from("some key"), item);
});
it looked something like this:
let my_other_structs: BTreeMap<String, MyOtherStruct> = streams.iter()
    .map(|s| MyOtherStruct::from(*s))
    .collect();
My question is: beyond cosmetics, is there any significant difference in what happens behind the scenes when assembling my BTreeMap one way or the other?
I mean, I love how it looks when I do it with FromIterator and just use a .map where I need it, but internally I do a for_each, which is the same thing I'm doing the other way, only with a .map added on top of it.
So is there any relevant difference in this case?
map().collect() is more idiomatic, for a couple of reasons. For one, the documentation of for_each itself recommends a plain for loop unless for_each makes the code possible or more readable.
The second and more important reason is that .collect() hands the whole iterator to the collection's FromIterator implementation, which can use the iterator's size hint to preallocate storage or otherwise build the collection in bulk, so it should perform as well as or better than for_each with insert.
Your FromIterator<MyOtherStruct> implementation could also be streamlined using the existing impl<K, V> FromIterator<(K, V)> for BTreeMap<K, V> like this:
impl FromIterator<MyOtherStruct> for BTreeMap<String, MyOtherStruct> {
    fn from_iter<T: IntoIterator<Item = MyOtherStruct>>(iter: T) -> Self {
        iter.into_iter()
            .map(|e| (String::from("some key"), e))
            .collect()
    }
}
Or depending on your actual uses just do that directly instead of implementing FromIterator in the first place.
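As a rough illustration of doing it directly, the sketch below collects (key, value) pairs into a BTreeMap through the standard library's FromIterator<(K, V)> impl, with no custom trait impl at all. Record is a hypothetical stand-in for MyOtherStruct, and the item-<index> key scheme is an assumption made for this example, since the question only shows a placeholder key.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for MyOtherStruct: it simply stores the bytes.
#[derive(Debug, PartialEq)]
struct Record {
    bytes: Vec<u8>,
}

impl From<&[u8]> for Record {
    fn from(stream: &[u8]) -> Self {
        Record { bytes: stream.to_vec() }
    }
}

// Collects (key, value) pairs via the standard FromIterator<(K, V)> impl.
// The "item-<index>" keys are an assumption made for illustration only.
fn build_map(streams: &[&[u8]]) -> BTreeMap<String, Record> {
    streams
        .iter()
        .enumerate()
        .map(|(i, s)| (format!("item-{}", i), Record::from(*s)))
        .collect()
}
```

With distinct keys every record survives; with a constant key such as "some key", later entries would overwrite earlier ones regardless of which approach is used.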

Mutably iterate through an iterator using Itertools' tuple_windows

I'm attempting to store a series of entries inside a Vec. Later I need to reprocess through the Vec to fill in some information in each entry about the next entry. The minimal example would be something like this:
struct Entry {
curr: i32,
next: Option<i32>
}
struct History {
entries: Vec<Entry>
}
where I would like to set each entry's next field to the curr value of the entry that follows it. To achieve this, I want to use the tuple_windows function from Itertools on the mutable iterator. I expected to be able to write a function like this:
impl History {
    fn fill_next_with_itertools(&mut self) {
        for (a, b) in self.entries.iter_mut().tuple_windows() {
            a.next = Some(b.curr);
        }
    }
}
However, it refuses to compile because the iterator's Item type, &mut Entry, is not Clone, which tuple_windows requires. I understand there is a way to iterate through the list using indices, like this:
fn fill_next_with_index(&mut self) {
    // saturating_sub avoids an underflow when entries is empty
    for i in 0..self.entries.len().saturating_sub(1) {
        self.entries[i].next = Some(self.entries[i + 1].curr);
    }
}
But I find the itertools approach more natural and elegant. What is the best way to achieve the same effect?
From the documentation:
tuple_windows clones the iterator elements so that they can be part of successive windows, this makes it most suited for iterators of references and other values that are cheap to copy.
This means that if you were to implement it with &mut items, you'd have multiple mutable references to the same thing, which is undefined behaviour.
If you still need shared, mutable access, you'd have to wrap it in Rc<RefCell<T>>, Arc<Mutex<T>> or something similar:
// needs: use std::{cell::RefCell, rc::Rc}; use itertools::Itertools;
fn fill_next_with_itertools(&mut self) {
    for (a, b) in self.entries.iter_mut().map(RefCell::new).map(Rc::new).tuple_windows() {
        a.borrow_mut().next = Some(b.borrow().curr);
    }
}
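For this particular task there is also a way to avoid overlapping mutable borrows altogether, using only the standard library. The sketch below is one possible alternative, not the itertools approach: it walks the entries back to front and carries the previously visited curr value along. The method name fill_next_reverse is made up for this example. One behavioural difference from the index loop is that the last entry's next is explicitly set to None rather than left untouched.

```rust
struct Entry {
    curr: i32,
    next: Option<i32>,
}

struct History {
    entries: Vec<Entry>,
}

impl History {
    /// Walks the entries back to front, remembering the `curr` of the entry
    /// visited just before, which is the following entry in forward order.
    fn fill_next_reverse(&mut self) {
        let mut following: Option<i32> = None;
        for entry in self.entries.iter_mut().rev() {
            entry.next = following;
            following = Some(entry.curr);
        }
    }
}
```

Only one &mut Entry is alive at any time, so no Rc, RefCell, or cloning is needed, and an empty Vec is handled without special cases.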

How to iterate through the keys of a HashMap in order

I'd like to iterate through the keys of a HashMap in order. Is there an elegant way to do this? The best I can think of is this:
use std::collections::HashMap;
fn main() {
let mut m = HashMap::<String, String>::new();
m.insert("a".to_string(), "1".to_string());
m.insert("b".to_string(), "2".to_string());
m.insert("c".to_string(), "3".to_string());
m.insert("d".to_string(), "4".to_string());
let mut its = m.iter().collect::<Vec<_>>();
its.sort();
for (k, v) in &its {
println!("{}: {}", k, v);
}
}
I'd like to be able to do something like this:
for (k, v) in m.iter_sorted() {
}
for (k, v) in m.iter_sorted_by(...) {
}
Obviously I can write a trait to do that, but my question is does something like this already exist?
Edit: Also, since people are pointing out that BTreeMap is already sorted, I should probably note that while this is true, it isn't actually as fast as a HashMap followed by sort() (as long as you only sort it once, of course), at least in my benchmarks with random u32->u32 maps.
Additionally, a BTreeMap only allows a single sort order.
HashMap doesn't guarantee a particular order of iteration. The simplest way to get a consistent order is to use a BTreeMap, which is based on a B-tree and keeps its data sorted.
Be aware that any implementation of sorted iteration over a HashMap needs O(n) extra memory, to store references to all the items, and at least O(n * log(n)) time to sort them.
If that cost is acceptable, you can use Itertools::sorted from the itertools crate.
use itertools::Itertools; // 0.8.2
use std::collections::HashMap;
fn main() {
let mut m = HashMap::<String, String>::new();
m.insert("a".to_string(), "1".to_string());
m.insert("b".to_string(), "2".to_string());
m.insert("c".to_string(), "3".to_string());
m.insert("d".to_string(), "4".to_string());
println!("{:#?}", m.iter().sorted())
}
Based on what @Inline wrote, here is a more generic solution using HashMap that allows sorting by value and modifying values. (Note that the content of the HashMap was adjusted to make the difference between sorting by key and sorting by value visible.)
use itertools::Itertools; // itertools = "0.10"
use std::collections::HashMap;
fn main() {
let mut m = HashMap::<String, String>::new();
m.insert("a".to_string(), "4".to_string());
m.insert("b".to_string(), "3".to_string());
m.insert("c".to_string(), "2".to_string());
m.insert("d".to_string(), "1".to_string());
// iterate (sorted by keys)
for (k, v) in m.iter().sorted_by_key(|x| x.0) {
println!("k={}, v={}", k, v);
}
println!();
// iterate (sorted by values)
for (k, v) in m.iter().sorted_by_key(|x| x.1) {
println!("k={}, v={}", k, v);
}
println!();
// iterate (sorted by keys), write to values
for (k, v) in m.iter_mut().sorted_by_key(|x| x.0) {
*v += "v"; // append 'v' to value
println!("k={}, v={}", k, v);
}
}
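For completeness, the trait the question alludes to can be sketched with the standard library alone. IterSorted and iter_sorted are hypothetical names taken from the question's wish list; they are not provided by std or itertools. The sketch sorts by key only, and pays the O(n) memory and O(n * log(n)) time discussed above on every call.

```rust
use std::collections::HashMap;

/// Hypothetical extension trait; `iter_sorted` is not a std or itertools method.
trait IterSorted<K, V> {
    fn iter_sorted(&self) -> std::vec::IntoIter<(&K, &V)>;
}

impl<K: Ord, V, S> IterSorted<K, V> for HashMap<K, V, S> {
    fn iter_sorted(&self) -> std::vec::IntoIter<(&K, &V)> {
        // Collect references to every entry, then sort them by key.
        let mut entries: Vec<(&K, &V)> = self.iter().collect();
        // Keys in a map are unique, so an unstable sort loses nothing.
        entries.sort_unstable_by(|a, b| a.0.cmp(b.0));
        entries.into_iter()
    }
}
```

With the trait in scope, a loop such as for (k, v) in m.iter_sorted() { ... } then visits the entries in ascending key order.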

How do I shuffle a VecDeque?

I can shuffle a regular vector quite simply like this:
extern crate rand;
use rand::Rng;
fn shuffle(coll: &mut Vec<i32>) {
rand::thread_rng().shuffle(coll);
}
The problem is, my code now requires the use of a std::collections::VecDeque instead, which causes this code to not compile.
What's the simplest way of getting around this?
As of Rust 1.48, VecDeque supports the make_contiguous() method. That method doesn't allocate and has complexity of O(n), like shuffling itself. Therefore you can shuffle a VecDeque by calling make_contiguous() and then shuffling the returned slice:
use rand::prelude::*;
use std::collections::VecDeque;
pub fn shuffle<T>(v: &mut VecDeque<T>, rng: &mut impl Rng) {
    v.make_contiguous().shuffle(rng);
}
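The same approach should work for any algorithm that is only defined on slices, not just shuffling. As a dependency-free sketch (sort_deque is a name made up for this example), sorting a deque in place might look like this:

```rust
use std::collections::VecDeque;

/// Sorts a deque in place by first making its storage one contiguous slice.
fn sort_deque<T: Ord>(deque: &mut VecDeque<T>) {
    // make_contiguous returns `&mut [T]` covering every element of the deque.
    deque.make_contiguous().sort();
}
```

After the call the deque still owns all its elements; only their internal layout and order have changed.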
Historical answer follows below.
Unfortunately, the rand::Rng::shuffle method is defined to shuffle slices. Due to its own complexity constraints a VecDeque cannot store its elements in a slice, so shuffle can never be directly invoked on a VecDeque.
What the shuffle algorithm really requires of its values argument is a finite sequence length, O(1) element access, and the ability to swap elements, all of which VecDeque fulfills. It would be nice if there were a trait that captured these, so that values could be generic over it, but there isn't one.
With the current library, you have two options:
1. Use Vec::from(deque) to copy the VecDeque into a temporary Vec, shuffle the vector, and return the contents back to the VecDeque. The complexity of the operation will remain O(n), but it will require a potentially large and costly heap allocation of the temporary vector.
2. Implement the shuffle on VecDeque yourself. The Fisher-Yates shuffle used by rand::Rng is well understood and easy to implement. While in theory the standard library could switch to a different shuffle algorithm, that is not likely to happen in practice.
A generic form of the second option, using a trait to express the len-and-swap requirement, and taking the code of rand::Rng::shuffle, could look like this:
use std::collections::VecDeque;
// Real requirement for shuffle
trait LenAndSwap {
fn len(&self) -> usize;
fn swap(&mut self, i: usize, j: usize);
}
// A copy of an earlier version of rand::Rng::shuffle, with the signature
// modified to accept any type that implements LenAndSwap
fn shuffle(values: &mut impl LenAndSwap, rng: &mut impl rand::Rng) {
let mut i = values.len();
while i >= 2 {
// invariant: elements with index >= i have been locked in place.
i -= 1;
// lock element i in place.
values.swap(i, rng.gen_range(0..=i));
}
}
// VecDeque trivially fulfills the LenAndSwap requirement, but
// we have to spell it out.
impl<T> LenAndSwap for VecDeque<T> {
fn len(&self) -> usize {
self.len()
}
fn swap(&mut self, i: usize, j: usize) {
self.swap(i, j)
}
}
fn main() {
let mut v: VecDeque<u64> = [1, 2, 3, 4].into_iter().collect();
shuffle(&mut v, &mut rand::thread_rng());
println!("{:?}", v);
}
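A variation on the same idea, sketched without the helper trait: the function below works on VecDeque directly and takes the source of randomness as a closure, which also makes it easy to test deterministically. shuffle_with and pick are names assumed for this sketch; with the rand crate, pick would typically be something like |i| rng.gen_range(0..=i), as in the code above.

```rust
use std::collections::VecDeque;

/// Fisher-Yates over a VecDeque. `pick(i)` must return an index in `0..=i`;
/// how that index is chosen (e.g. by an RNG) is left to the caller.
fn shuffle_with<T>(deque: &mut VecDeque<T>, mut pick: impl FnMut(usize) -> usize) {
    let mut i = deque.len();
    while i >= 2 {
        // invariant: elements with index >= i have been locked in place.
        i -= 1;
        let j = pick(i);
        assert!(j <= i, "pick must return an index in 0..=i");
        // lock element i in place; VecDeque::swap is O(1).
        deque.swap(i, j);
    }
}
```

The result is only as random as pick: a closure that always returns i leaves the deque unchanged, so a real caller needs a uniform choice in 0..=i for an unbiased shuffle.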
You can use make_contiguous to create a mutable slice that you can then shuffle:
use rand::prelude::*;
use std::collections::VecDeque;
fn main() {
let mut deque = VecDeque::new();
for p in 0..10 {
deque.push_back(p);
}
deque.make_contiguous().shuffle(&mut rand::thread_rng());
println!("Random deque: {:?}", deque)
}
Shuffle the components of the VecDeque separately, starting with VecDeque::as_mut_slices:
use rand::seq::SliceRandom; // 0.6.5;
use std::collections::VecDeque;
fn shuffle(coll: &mut VecDeque<i32>) {
let mut rng = rand::thread_rng();
let (a, b) = coll.as_mut_slices();
a.shuffle(&mut rng);
b.shuffle(&mut rng);
}
As Lukas Kalbertodt points out, this solution never swaps elements between the two slices so a certain amount of randomization will not happen. Depending on your needs of randomization, this may be unnoticeable or a deal breaker.

How to achieve equivalent of take_while on a slice?

Rust slices do not currently support some iterator methods, e.g. take_while. What is the best way to implement take_while for slices?
const STRHELLO:&'static[u8] = b"HHHello";
fn main() {
let subslice:&[u8] = STRHELLO.iter().take_while(|c|(**c=='H' as u8)).collect();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3),subslice);
assert!(subslice==STRHELLO.slice_to(3));
}
results in the error:
<anon>:6:74: 6:83 error: the trait `core::iter::FromIterator<&u8>` is not implemented for the type `&[u8]`
This code in the playpen:
http://is.gd/1xkcUa
First of all, the issue you have is that collect is about creating a new collection, while a slice is about referencing a contiguous range of items in an existing array (be it dynamically allocated or not).
I am afraid that, due to the nature of traits, the fact that the original container (STRHELLO) was a contiguous range has been lost and cannot be reconstructed after the fact. I am also afraid that any use of "generic" iterators simply cannot lead to the desired output; the type system would have to somehow carry the facts that:
- the original container was a contiguous range
- the chain of operations performed so far preserves this property
This may be doable or not, but I do not see it done now, and I am unsure in what way it could be elegantly implemented.
On the other hand, you can go about it in the do-it-yourself way:
fn take_while<'a>(initial: &'a [u8], predicate: |&u8| -> bool) -> &'a [u8] {
let mut i = 0u;
for c in initial.iter() {
if predicate(c) { i += 1; } else { break; }
}
initial.slice_to(i)
}
And then:
fn main() {
let subslice: &[u8] = take_while(STRHELLO, |c|(*c==b'H'));
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
}
Note: 'H' as u8 can be rewritten as b'H', as shown here, which is symmetric with byte strings.
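The snippets in this thread use pre-1.0 syntax (closure types written as |&u8| -> bool, the 0u literal, slice_to), so they no longer compile as written. A sketch of the same do-it-yourself idea in current Rust, generalized over the element type, might look like the following. The function keeps the name take_while from the answer above; everything else is an adaptation rather than the original author's code.

```rust
/// Returns the longest prefix of `initial` whose elements all satisfy `predicate`.
fn take_while<'a, T>(initial: &'a [T], predicate: impl Fn(&T) -> bool) -> &'a [T] {
    // Index of the first element that fails the predicate, or the full length
    // if every element passes.
    let end = initial
        .iter()
        .position(|item| !predicate(item))
        .unwrap_or(initial.len());
    &initial[..end]
}
```

Because the result is a subslice of the input, nothing is copied or allocated, which is exactly what collect() cannot offer here.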
It is possible via some heavy gymnastics to implement this functionality using the stock iterators:
use std::raw::Slice;
use std::mem::transmute;
/// Splice together two slices of the same type that are contiguous in memory.
/// Panics if the slices aren't contiguous with "a" coming first.
/// i.e. slice b must follow slice a immediately in memory.
fn splice<'a>(a:&'a[u8], b:&'a[u8]) -> &'a[u8] {
unsafe {
let aa:Slice<u8> = transmute(a);
let bb:Slice<u8> = transmute(b);
let pa = aa.data as *const u8;
let pb = bb.data as *const u8;
let off = aa.len as int; // Risks overflow into negative!!!
assert!(pa.offset(off) == pb, "Slices were not contiguous!");
let cc = Slice{data:aa.data,len:aa.len+bb.len};
transmute(cc)
}
}
/// Wrapper around splice that lets you use None as a base case for fold
/// Will panic if the slices cannot be spliced! See splice.
fn splice_for_fold<'a>(oa:Option<&'a[u8]>, b:&'a[u8]) -> Option<&'a[u8]> {
match oa {
Some(a) => Some(splice(a,b)),
None => Some(b),
}
}
/// Implementation using pure iterators
fn take_while<'a>(initial: &'a [u8],
predicate: |&u8| -> bool) -> Option<&'a [u8]> {
initial
.chunks(1)
.take_while(|x|(predicate(&x[0])))
.fold(None, splice_for_fold)
}
usage:
const STRHELLO:&'static[u8] = b"HHHello";
let subslice: &[u8] = super::take_while(STRHELLO, |c|(*c==b'H')).unwrap();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
Matthieu's implementation is way cleaner if you just need take_while. I am posting this anyway since it may be a path towards solving the more general problem of using iterator functions on slices cleanly.
