How to iterate through the keys of a HashMap in order - rust

I'd like to iterate through the keys of a HashMap in order. Is there an elegant way to do this? The best I can think of is this:
use std::collections::HashMap;
fn main() {
let mut m = HashMap::<String, String>::new();
m.insert("a".to_string(), "1".to_string());
m.insert("b".to_string(), "2".to_string());
m.insert("c".to_string(), "3".to_string());
m.insert("d".to_string(), "4".to_string());
let mut its = m.iter().collect::<Vec<_>>();
its.sort();
for (k, v) in &its {
println!("{}: {}", k, v);
}
}
I'd like to be able to do something like this:
for (k, v) in m.iter_sorted() {
}
for (k, v) in m.iter_sorted_by(...) {
}
Obviously I can write a trait to do that, but my question is does something like this already exist?
Edit: Since people are pointing out that BTreeMap is already sorted, I should note that, while this is true, it isn't actually as fast as a HashMap followed by sort() (as long as you only sort once, of course). Here are some benchmark results for random u32->u32 maps:
Additionally, a BTreeMap only allows a single sort order.

HashMap doesn't guarantee any particular iteration order. The simplest way to get a consistent order is to use BTreeMap, which is based on a B-tree and keeps its entries sorted by key.
If you stay with a HashMap, keep in mind that any sorted iteration needs O(n) extra memory, to hold references to all the items, and at least O(n * log(n)) time to sort them.
If that cost is acceptable, you can use Itertools::sorted from the itertools crate.
use itertools::Itertools; // 0.8.2
use std::collections::HashMap;
fn main() {
let mut m = HashMap::<String, String>::new();
m.insert("a".to_string(), "1".to_string());
m.insert("b".to_string(), "2".to_string());
m.insert("c".to_string(), "3".to_string());
m.insert("d".to_string(), "4".to_string());
println!("{:#?}", m.iter().sorted())
}
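For comparison, the BTreeMap option mentioned at the start of this answer needs no extra crate. This is a minimal sketch using the same four entries; `sorted_pairs` is a helper name made up for this illustration, not part of any library.

```rust
use std::collections::BTreeMap;

// Hypothetical helper: collects the entries of a BTreeMap in iteration order.
fn sorted_pairs(m: &BTreeMap<String, String>) -> Vec<(&str, &str)> {
    // BTreeMap::iter yields entries in ascending key order.
    m.iter().map(|(k, v)| (k.as_str(), v.as_str())).collect()
}

fn main() {
    let mut m = BTreeMap::new();
    // Insert out of order on purpose.
    for (k, v) in [("c", "3"), ("a", "1"), ("d", "4"), ("b", "2")] {
        m.insert(k.to_string(), v.to_string());
    }
    for (k, v) in sorted_pairs(&m) {
        println!("{}: {}", k, v);
    }
}
```

The trade-off raised in the question still applies: a BTreeMap fixes a single sort order (the key's Ord), whereas sorting a collected Vec lets you pick the order per iteration.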

Based on what @Inline wrote, here is a more generic solution using HashMap that also allows sorting by value and modifying values. (Note that the content of the HashMap was adjusted to make the difference between sorting by key and sorting by value visible.)
use itertools::Itertools; // itertools = "0.10"
use std::collections::HashMap;
fn main() {
let mut m = HashMap::<String, String>::new();
m.insert("a".to_string(), "4".to_string());
m.insert("b".to_string(), "3".to_string());
m.insert("c".to_string(), "2".to_string());
m.insert("d".to_string(), "1".to_string());
// iterate (sorted by keys)
for (k, v) in m.iter().sorted_by_key(|x| x.0) {
println!("k={}, v={}", k, v);
}
println!();
// iterate (sorted by values)
for (k, v) in m.iter().sorted_by_key(|x| x.1) {
println!("k={}, v={}", k, v);
}
println!();
// iterate (sorted by keys), write to values
for (k, v) in m.iter_mut().sorted_by_key(|x| x.0) {
*v += "v"; // append 'v' to value
println!("k={}, v={}", k, v);
}
}
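If you would rather not depend on itertools, the `m.iter_sorted()` call the question asks for can be approximated with a small extension trait. This is only a sketch: the trait `IterSorted` and its method are hypothetical names invented here, not something std or itertools provides, and it pays the same O(n) memory and O(n log n) time as the other approaches.

```rust
use std::collections::HashMap;

// Hypothetical extension trait; `iter_sorted` is not part of std or itertools.
trait IterSorted<K, V> {
    fn iter_sorted(&self) -> std::vec::IntoIter<(&K, &V)>;
}

impl<K: Ord, V, S> IterSorted<K, V> for HashMap<K, V, S> {
    fn iter_sorted(&self) -> std::vec::IntoIter<(&K, &V)> {
        // Collect references to all entries, then sort them by key.
        let mut entries: Vec<(&K, &V)> = self.iter().collect();
        // Keys are unique, so an unstable sort still gives a deterministic order.
        entries.sort_unstable_by(|a, b| a.0.cmp(b.0));
        entries.into_iter()
    }
}

fn main() {
    let mut m = HashMap::new();
    m.insert("b", 2);
    m.insert("c", 3);
    m.insert("a", 1);
    for (k, v) in m.iter_sorted() {
        println!("{}: {}", k, v);
    }
}
```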

Related

How can I drain a vector in chunks?

I would like to have something like:
fn drain_in_chunks<T>(mut v: Vec<T>) {
// pseudocode: the API I wish existed
for chunk in v.drain(..).chunks(2) {
do_something(chunk)
}
}
where I remove chunks of size two from v in each iteration. I want to do this because I want to move the chunks into a function, and I can't move elements out of a vector without removing them.
I could do this, but it feels too verbose.
for (i, chunk) in v.chunks(2).enumerate().zip(0..) {
v.drain(i*2..(i+1)*2);
do_something(chunk)
}
Any more elegant solutions?
You can use itertools's tuples():
use itertools::Itertools;
fn drain_in_chunks<T>(mut v: Vec<T>) {
for (a, b) in v.drain(..).tuples() {
do_something([a, b]);
}
}
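Note that tuples() fixes the chunk size at compile time and leaves a trailing incomplete group out of the loop. If the size is only known at runtime, or the last short chunk matters, a std-only sketch along these lines may help. The function signature and the `do_something` callback parameter are assumptions made for this illustration.

```rust
// Hypothetical helper: moves the elements of `v` out in owned chunks of at
// most `size` elements and hands each chunk to `do_something`.
fn drain_in_chunks<T>(v: Vec<T>, size: usize, mut do_something: impl FnMut(Vec<T>)) {
    assert!(size > 0, "chunk size must be non-zero");
    let mut items = v.into_iter();
    loop {
        // `by_ref` lets `take` pull `size` items without consuming the iterator.
        let chunk: Vec<T> = items.by_ref().take(size).collect();
        if chunk.is_empty() {
            break;
        }
        do_something(chunk);
    }
}

fn main() {
    drain_in_chunks(vec!["a", "b", "c", "d", "e"], 2, |chunk| {
        println!("{:?}", chunk);
    });
}
```

Each chunk is a freshly allocated Vec, which keeps the code simple at the cost of one allocation per chunk.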

Mutably iterate through an iterator using Itertools' tuple_windows

I'm attempting to store a series of entries inside a Vec. Later I need to reprocess through the Vec to fill in some information in each entry about the next entry. The minimal example would be something like this:
struct Entry {
curr: i32,
next: Option<i32>
}
struct History {
entries: Vec<Entry>
}
where I would like to set each entry's next field to the following entry's curr value. To achieve this, I want to use the tuple_windows function from Itertools on the mutable iterator. I expected to be able to write a function like this:
impl History {
fn fill_next_with_itertools(&mut self) {
for (a, b) in self.entries.iter_mut().tuple_windows() {
a.next = Some(b.curr);
}
}
}
However, it refuses to compile because the iterator's Item type, &mut Entry, is not Clone, which tuple_windows requires. I understand there is a way to iterate through the list using indices, like this:
fn fill_next_with_index(&mut self) {
// saturating_sub avoids an underflow when entries is empty
for i in 0..self.entries.len().saturating_sub(1) {
self.entries[i].next = Some(self.entries[i+1].curr);
}
}
But the itertools approach feels more natural and elegant to me. What's the best way to achieve the same effect?
From the documentation:
tuple_windows clones the iterator elements so that they can be part of successive windows, this makes it most suited for iterators of references and other values that are cheap to copy.
This means that if it were implemented for &mut items, you'd have multiple mutable references to the same thing, which is undefined behaviour.
If you still need shared, mutable access you'd have to wrap it in Rc<RefCell<T>>, Arc<Mutex<T>> or something similar:
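// Needs `use std::{cell::RefCell, rc::Rc};` and `use itertools::Itertools;` at the top of the file.
fn fill_next_with_itertools(&mut self) {
// Rc<RefCell<&mut Entry>> is Clone, so tuple_windows accepts it.
for (a, b) in self.entries.iter_mut().map(RefCell::new).map(Rc::new).tuple_windows() {
a.borrow_mut().next = Some(b.borrow().curr);
}
}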
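If shared mutable access is not actually needed, one way to sidestep the Clone requirement is to walk the entries back to front and carry the previously seen curr value along. This is a sketch of a different technique from tuple_windows, and the method name `fill_next_reversed` is made up for this illustration. Like the index-based loop, it leaves the last entry's next field untouched.

```rust
struct Entry {
    curr: i32,
    next: Option<i32>,
}

struct History {
    entries: Vec<Entry>,
}

impl History {
    // Hypothetical method: fills `next` by iterating in reverse.
    fn fill_next_reversed(&mut self) {
        // `curr` of the entry that follows the one being visited, if any.
        let mut following: Option<i32> = None;
        for entry in self.entries.iter_mut().rev() {
            if let Some(curr) = following {
                entry.next = Some(curr);
            }
            following = Some(entry.curr);
        }
    }
}

fn main() {
    let mut history = History {
        entries: (1..=3).map(|curr| Entry { curr, next: None }).collect(),
    };
    history.fill_next_reversed();
    for entry in &history.entries {
        println!("curr={}, next={:?}", entry.curr, entry.next);
    }
}
```

Only one mutable reference is alive at a time, so no Rc or RefCell is required, and an empty Vec is handled without special cases.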

Is there a nicer way to create a vector from indexed data in rust?

My problem is that I get some data from an iterator (specifically a bevy Query), and part of that data is an index. I now want to create a vector from this data in which every element is placed at the index it is intended for.
To illustrate the problem, this is how I might solve this in python:
def iterator():
# The iterator doesn't necessarily give back the data in the correct order
# However, the indices are always dense from 0...n
yield ("A", 0)
yield ("C", 2)
yield ("B", 1)
def some_computation(x):
return x + " + some other data"
data_list = [0] * 3 # I know the length
for data, index in iterator():
data_list[index] = some_computation(data)
print(data_list)
Below is how I currently implemented this in Rust, but I'm not super happy about it. Firstly, it takes O(n log n) time.
Secondly, it feels difficult to read, especially the conversion to get rid of the index in the data.
Thirdly, I haven't yet figured out how to extract this into its own function without violating lifetimes.
// I have so far avoided deriving Clone on this struct.
#[derive(Debug)]
struct OwnedDataWithReference<'a> {
data: &'a str,
}
fn do_something_with_data_list_and_drop_it(x: Vec<OwnedDataWithReference>) {
println!("{:?}", x);
}
fn main() {
// In my application this is a bevy query, which can't be indexed.
let iterator = vec![("A", 0), ("C", 2), ("B", 1)];
// First the data is created and stored in a vector with a index associated
// and only subsequently sorted, which costs runtime, but has nice ownership.
let mut data_list_with_index = Vec::new();
for (data_ref, index) in iterator.iter() {
// In my application I first have to fetch data using data_ref
let new_data = OwnedDataWithReference {
data: data_ref,
};
data_list_with_index.push((new_data, index));
}
// This reverse sort and stack based map is needed because I don't want to
// make `OwnedDataWithReference` Clone. This way, I can transfer ownership.
data_list_with_index.sort_by_key(|(_, i)| -*i);
let mut data_list = Vec::new();
while !data_list_with_index.is_empty() {
data_list.push(data_list_with_index.pop().unwrap().0);
}
// In my application I then serialize this with serde.
do_something_with_data_list_and_drop_it(data_list);
}
I have tried an approach with using Vec<Option<OwnedDataWithReference>>, but this complained that OwnedDataWithReference is not Clone, which I've tried to avoid so far.
Please let me know if you can find a nicer solution to this problem, or if you think I should derive Clone. (The problem is still not solved then, but I think it gets easier)
"I have tried an approach with using Vec<Option<OwnedDataWithReference>>, but this complained that OwnedDataWithReference is not Clone, which I've tried to avoid so far."
You probably tried to create a Vec of None values with the vec! macro, whose vec![value; n] form requires T: Clone. You can build it another way instead, for example (0..n).map(|_| None).collect(). With that, it can be done like this:
#[derive(Debug)]
struct OwnedDataWithReference<'a> {
data: &'a str,
other_stuff: u32,
}
fn do_something_with_data_list_and_drop_it(x: Vec<OwnedDataWithReference>) {
println!("{:?}", x);
}
fn main() {
let iterator = vec![("A", 0), ("C", 2), ("B", 1)];
let mut data_list = (0..iterator.len()).map(|_| None).collect::<Vec<_>>();
for (data_ref, index) in iterator.iter() {
let new_data = OwnedDataWithReference {
data: data_ref,
other_stuff: 42,
};
data_list[*index] = Some(new_data);
}
// This will panic if any element is still `None` at this point.
let data_list = data_list.into_iter().map(Option::unwrap).collect();
do_something_with_data_list_and_drop_it(data_list);
}
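The same idea can be pulled out into its own function, which the question mentions as a sticking point. The sketch below is one possible shape under stated assumptions: `Item` is a stand-in for OwnedDataWithReference, `place_by_index` is a made-up name, and the input is taken as a slice of (data, index) pairs rather than a bevy query. It returns None instead of panicking when an index is out of range or a slot is never filled.

```rust
// Stand-in for OwnedDataWithReference; deliberately not Clone.
#[derive(Debug, PartialEq)]
struct Item<'a> {
    data: &'a str,
}

// Hypothetical helper: places each item at its index in O(n) time.
fn place_by_index<'a>(pairs: &[(&'a str, usize)]) -> Option<Vec<Item<'a>>> {
    let mut slots: Vec<Option<Item<'a>>> = Vec::new();
    // `resize_with` calls the closure per slot, so no Clone bound is needed.
    slots.resize_with(pairs.len(), || None);
    for &(data, index) in pairs {
        // `get_mut` returns None for an out-of-range index instead of panicking.
        *slots.get_mut(index)? = Some(Item { data });
    }
    // Collecting into Option<Vec<_>> yields None if any slot was never filled.
    slots.into_iter().collect()
}

fn main() {
    let input = [("A", 0), ("C", 2), ("B", 1)];
    println!("{:?}", place_by_index(&input));
}
```

Tying the explicit lifetime `'a` to the borrowed strings rather than to the slice is what lets the returned Vec outlive the `pairs` borrow.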

How to sort a vector containing structs?

Let's say I have some code like:
struct GenericStruct {
a: u8,
b: String,
}
fn sort_array(generic_vector: Vec<GenericStruct>) -> Vec<GenericStruct> {
// Some code here to sort a vector.
todo!();
}
fn main() {
let some_words = String::from("Hello Word");
let x = GenericStruct { a: 25, b: some_words };
let some_vector: Vec<GenericStruct> = vec![x];
}
How could I sort vectors based on one part of it, such as sorting by "a", or sorting by the length of "b"?
Two possibilities.
Either implement the Ord trait for your struct, or use the sort_unstable_by_key method.
You'd use the former if there is a single, obvious way to order GenericStruct values that makes sense generally, not just in your current sorting use case.
You'd use the latter if this sorting scheme is more of a one-off.
// some_vector must be declared `mut` for this to compile
some_vector.sort_unstable_by_key(|element| element.a);
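Both options can be sketched side by side. The ordering chosen for the Ord implementation (by `a`, then by `b`) is an assumption for illustration, and `sort_by_b_len` is a made-up helper name covering the "length of b" case from the question.

```rust
use std::cmp::Ordering;

#[derive(Debug, PartialEq, Eq)]
struct GenericStruct {
    a: u8,
    b: String,
}

// Option 1: a natural ordering, assumed here to be by `a`, then by `b`.
impl Ord for GenericStruct {
    fn cmp(&self, other: &Self) -> Ordering {
        self.a.cmp(&other.a).then_with(|| self.b.cmp(&other.b))
    }
}

impl PartialOrd for GenericStruct {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Option 2: a one-off ordering by the length of `b`.
fn sort_by_b_len(v: &mut [GenericStruct]) {
    v.sort_by_key(|element| element.b.len());
}

fn main() {
    let mut v = vec![
        GenericStruct { a: 25, b: String::from("Hello") },
        GenericStruct { a: 3, b: String::from("Hi") },
    ];
    v.sort(); // uses the Ord implementation
    println!("{:?}", v);
    sort_by_b_len(&mut v);
    println!("{:?}", v);
}
```

sort_by_key is stable, so elements with equal key keep their relative order; sort_unstable_by_key drops that guarantee and is typically a little faster.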

How to partition vector of results in Rust?

Basically, I'm looking for the partitionEithers equivalent in Rust, i.e. to convert Vec<Result<A, B>> into Result<Vec<A>, Vec<B>>.
I know I can transform a Vec<Result<A, B>> into Result<Vec<A>, B> by using collect::<Result<Vec<A>, B>>, but when I try collect::<Result<Vec<A>, Vec<B>>>, I get an error saying that no such implementation exists.
Also I know this can be done using mutations, but I'm wondering if there are any immutable alternatives that I can look into?
In general you can split an iterator with the partition() method; in your particular case, use partition_map() from itertools:
use itertools::{Either, Itertools};
fn main() {
let successes_and_failures = vec![Ok(1), Err(false), Err(true), Ok(2)];
let (successes, failures): (Vec<_>, Vec<_>) =
successes_and_failures
.into_iter()
.partition_map(|r| match r {
Ok(v) => Either::Left(v),
Err(v) => Either::Right(v),
});
assert_eq!(successes, [1, 2]);
assert_eq!(failures, [false, true]);
}
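The example above produces a pair of vectors. To get the Result<Vec<A>, Vec<B>> shape from the question, and without itertools, something like the following std-only sketch should work; `partition_results` is a name invented for this illustration.

```rust
// Hypothetical helper: Ok with all values if nothing failed,
// otherwise Err with every error, in their original order.
fn partition_results<A, B>(results: Vec<Result<A, B>>) -> Result<Vec<A>, Vec<B>> {
    // Iterator::partition splits by a predicate but keeps the Result wrappers.
    let (oks, errs): (Vec<_>, Vec<_>) = results.into_iter().partition(Result::is_ok);
    if errs.is_empty() {
        // Unwrap the Ok values without requiring B: Debug.
        Ok(oks.into_iter().filter_map(Result::ok).collect())
    } else {
        Err(errs.into_iter().filter_map(Result::err).collect())
    }
}

fn main() {
    let mixed = vec![Ok(1), Err(false), Err(true), Ok(2)];
    println!("{:?}", partition_results(mixed));
}
```

Note that when any error is present, the successful values are discarded, which is what the Result return type implies.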
