How to resize and rehash for SeparateChainingHashST? - rust

I am implementing the SeparateChainingHashST in Rust. The complete code can be found here.
pub struct SeparateChainingHashST<K, V> {
n: usize, // number of key-value pairs
m: usize, // hash table size
st: Vec<SequentialSearchST<K, V>>,
}
Here, SequentialSearchST is a linked list.
What bothers me now is how to implement the resize method. Its main idea is: create a new SeparateChainingHashST with the new size, and then put every key/value into this new symbol table.
What I have done:
fn resize(self, chains: usize) -> Self {
let mut tmp = SeparateChainingHashST::new(chains);
for table in self.st.into_iter() {
for (k, v) in table.into_items() {
tmp.put(k, v);
}
}
tmp
}
Since the put() method accepts K and V by value, I added a consuming iterator, into_items, that yields (K, V) pairs.
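(For reference, a consuming into_items could look roughly like this, assuming SequentialSearchST is a plain singly linked list; the node layout here is hypothetical, not taken from the linked code.)
pub struct SequentialSearchST<K, V> {
    head: Option<Box<Node<K, V>>>,
}
struct Node<K, V> {
    key: K,
    val: V,
    next: Option<Box<Node<K, V>>>,
}
impl<K, V> SequentialSearchST<K, V> {
    // Consumes the list and yields owned (K, V) pairs, so no Clone bound is needed.
    pub fn into_items(self) -> impl Iterator<Item = (K, V)> {
        let mut next = self.head;
        std::iter::from_fn(move || {
            let node = *next.take()?; // move the node out of its Box
            next = node.next;
            Some((node.key, node.val))
        })
    }
}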
This version does not really make sense; what I want is fn resize(&mut self, chains: usize). Note that I don't want to add a Copy or Clone bound to K or V.

The key point is to move the values out of self.st (a vector). Since the order is not important here, one solution is to use swap_remove:
fn resize(&mut self, chains: usize) {
let mut tmp = SeparateChainingHashST::new(chains);
for _ in 0..self.m {
for (k, v) in self.st.swap_remove(0).into_items() {
tmp.put(k, v);
}
}
*self = tmp;
}
Alternatively, use pop:
fn resize(&mut self, chains: usize) {
let mut tmp = SeparateChainingHashST::new(chains);
while let Some(table) = self.st.pop() {
for (k, v) in table.into_items() {
tmp.put(k, v);
}
}
*self = tmp;
}
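To tie this together, resize is typically driven by a load-factor check in put. Here is a hedged sketch; the K: Hash bound, the DefaultHasher-based hash helper, the thresholds, and the assumption that the inner SequentialSearchST::put returns the previous value (like HashMap::insert) are illustrative rather than taken from the linked code:
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

impl<K: Hash, V> SeparateChainingHashST<K, V> {
    fn hash(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        hasher.finish() as usize % self.m
    }

    pub fn put(&mut self, key: K, val: V) {
        // Double the table when the average chain length reaches 10
        // (the 10 follows the Sedgewick convention; pick what suits you).
        if self.n >= 10 * self.m {
            self.resize(2 * self.m);
        }
        let i = self.hash(&key);
        // Assumption: the inner put returns the previous value, if any.
        if self.st[i].put(key, val).is_none() {
            self.n += 1;
        }
    }
}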

Related

Is there something similar to try_retain()?

Is it possible to apply an FnMut(&mut V) -> Result<bool> to collection elements such that, if the result is:
Ok(false) -- remove element and continue
Ok(true) -- leave element in collection and continue
Err(_) -- stop and return the error
Basically, how to code a try_retain(), an equivalent of this C++ code:
struct S {
// ...
};
// Updates `s` (and possibly some external state), returns true if `s` is no longer needed
// throws on error
bool bar(S& s);
void foo(std::map<int, S>& m) {
try
{
for(auto it = m.begin(), it_end = m.end(); it != it_end; ) {
if (bar(it->second))
m.erase(it++);
else
++it;
}
}
catch(...)
{
printf("bailed early\n");
throw;
}
}
For some reason I keep running into this pattern and I can't figure out how to do it in Rust without traversing the collection twice or using additional memory...
This won't short-circuit, but you can do it by storing the global result in a mutable variable that's updated by the closure you pass to retain. Something like this:
use std::collections::HashMap;
fn try_retain<K, V, E, F>(m: &mut HashMap<K, V>, mut f: F) -> Result<(), E>
where
F: FnMut(&K, &mut V) -> Result<bool, E>,
{
let mut result = Ok(());
m.retain(|k, v| match f(k, v) {
Ok(b) => b,
Err(e) => {
result = Err(e);
true
}
});
return result;
}
Playground
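A quick usage sketch (the map contents and the error type are made up for illustration):
fn main() {
    let mut m: HashMap<i32, i32> = (0..5).map(|i| (i, i)).collect();
    // Keep only even values; a negative value (none here) would abort with an error.
    let res: Result<(), String> = try_retain(&mut m, |_k, v| {
        if *v < 0 {
            Err("negative value".to_string())
        } else {
            Ok(*v % 2 == 0)
        }
    });
    assert!(res.is_ok());
    assert!(m.values().all(|v| v % 2 == 0));
}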
If you're fine with switching to hashbrown::HashMap (which is what std::collections::HashMap uses as its internal implementation), you could copy the implementation of fn retain and slightly change it to use the ? operator to return the first error:
fn try_retain<K, V, E, F>(m: &mut hashbrown::HashMap<K, V>, mut f: F) -> Result<(), E>
where
F: FnMut(&K, &mut V) -> Result<bool, E>,
{
let mut raw_table = m.raw_table();
// Here we only use `iter` as a temporary, preventing use-after-free
unsafe {
for item in raw_table.iter() {
let &mut (ref key, ref mut value) = item.as_mut();
if !f(key, value)? {
raw_table.erase(item);
}
}
}
Ok(())
}
This requires hashbrown's raw feature to be enabled, so you can access the underlying raw table.
This stops at the first error, and the elements are visited in a random order, so it's non-deterministic which elements are removed if an error occurs.
We could use try_for_each and, because of the borrow checker, remember the keys to remove (not to retain) until the end of the iteration:
fn try_retain(
map: &mut HashMap<i32, i32>,
to_retain: fn(k: i32, v: i32) -> bool,
) -> Result<bool, String> {
let mut to_remove: Vec<i32> = vec![];
map.iter().try_for_each(|(k, v)| {
if has_error() {
return Err("bailed early".to_string());
}
if !to_retain(*k, *v) {
to_remove.push(*k);
}
Ok::<_, String>(())
})?; // short-circuiting
if to_remove.is_empty() {
Ok(true)
} else {
(*map).retain(|&k, _| !to_remove.contains(&k));
Ok(false)
}
}
The returned boolean reflects whether any keys were removed.
This version can short-circuit.
Playground
If the extra space for the keys to remove is still too much, here is an unstable (nightly-only) solution with drain_filter:
fn try_retain(
map: &mut HashMap<i32, i32>,
to_retain: fn(k: i32, v: i32) -> bool,
) -> bool {
let mut rslt = true;
map.drain_filter(|k, v| {
if !to_retain(*k, *v) {
rslt = false;
true
} else {
false
}
});
rslt
}
Short-circuiting is only possible through panic.
Playground

Caching/memoization vs object lifetime

My program is structured as a series of function calls building up the resulting value - each function returns (moves) the resulting value to its caller. This is a simplified version:
struct Value {}
struct ValueBuilder {}
impl ValueBuilder {
pub fn do_things_with_value(&mut self, v : &Value) {
// expensive computations
}
pub fn make_value(&self) -> Value {
Value {}
}
pub fn f(&mut self) -> Value {
let v = self.make_value();
self.do_things_with_value(&v);
v
}
pub fn g(&mut self) -> Value {
let v = self.f();
self.do_things_with_value(&v);
v
}
}
play.rust-lang version
Imagine that there are many more functions similar to f and g, both between them and above. You can see that do_things_with_value is called twice with the same value. I would like to cache/memoize this call so that in the example below "expensive computations" are performed only once. This is my (obviously incorrect) attempt:
#[derive(PartialEq)]
struct Value {}
struct ValueBuilder<'a> {
seen_values: Vec<&'a Value>,
}
impl<'a> ValueBuilder<'a> {
pub fn do_things_with_value(&mut self, v: &'a Value) {
if self.seen_values.iter().any(|x| **x == *v) {
return;
}
self.seen_values.push(v)
// expensive computations
}
pub fn make_value(&self) -> Value {
Value {}
}
pub fn f(&mut self) -> Value {
let v = self.make_value();
self.do_things_with_value(&v); // error: `v` does not live long enough
v
}
pub fn g(&mut self) -> Value {
let v = self.f();
self.do_things_with_value(&v);
v
}
}
play.rust-lang version
I understand why the compiler rejects this - while in this particular case v happens not to be dropped between the two calls to do_things_with_value, there is no guarantee of that in general, and using a dangling reference would be undefined behavior.
What is a better way to structure this program? Let's assume that:
cloning and storing Values is expensive, and we can't afford seen_values keeping a copy of everything we've ever seen
we also can't refactor the code / Value object to carry additional data (e.g. a bool indicating whether we did the expensive computations with this value). It needs to rely on comparing the values using PartialEq
If you need to keep the same value at different points in the program it's easiest to copy or clone it.
However, if cloning is not an option because it is too expensive, wrap the values in an Rc. That is a reference-counted smart pointer which allows shared ownership of its contents. It is relatively cheap to clone because cloning does not duplicate the contained value.
Note that simply storing Rc<Value> in seen_values will keep all values alive at least as long as the value builder lives. You can avoid that by storing Weak references.
use std::rc::{Rc, Weak};
#[derive(PartialEq)]
struct Value {}
struct ValueBuilder {
seen_values: Vec<Weak<Value>>,
}
impl ValueBuilder {
pub fn do_things_with_value(&mut self, v: &Rc<Value>) {
if self
.seen_values
.iter()
.any(|x| x.upgrade().as_ref() == Some(v))
{
return;
}
self.seen_values.push(Rc::downgrade(v))
// expensive computations
}
pub fn make_value(&self) -> Rc<Value> {
Rc::new(Value {})
}
pub fn f(&mut self) -> Rc<Value> {
let v = self.make_value();
self.do_things_with_value(&v);
v
}
pub fn g(&mut self) -> Rc<Value> {
let v = self.f();
self.do_things_with_value(&v);
v
}
}
While an Rc<Value> is in use by the chain of functions, do_things_with_value() will remember it and skip the expensive computations. If a value becomes unused (all references dropped) and an equal value is later created again, do_things_with_value() will repeat the computations.
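A small usage sketch (the assertions just spell out the behavior described above):
fn main() {
    let mut builder = ValueBuilder { seen_values: Vec::new() };
    // f() performs the expensive computation once; g() then sees the cached entry.
    let v = builder.g();
    assert_eq!(builder.seen_values.len(), 1);
    drop(v);
    // All strong references are gone, so the Weak entry is now dead and a new
    // (equal) value would trigger the computation again.
    assert!(builder.seen_values[0].upgrade().is_none());
}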

Rust function that accepts either HashMap or BTreeMap

fn edit_map_values(
map1: &mut HashMap<String, i128> || &mut BTreeMap<String, i128>){
for tuple in map1.iter_mut() {
if !map1.contains_key(&"key1") {
*tuple.1 += 1;
}
}
map1.insert(&"key2", 10);
}
How do I write one function that accepts either a HashMap or a BTreeMap, like in the example above?
It is possible to abstract over types by using traits; for your specific use case, you can take a look at this more constrained example:
use core::{borrow::Borrow, hash::Hash};
use std::collections::{BTreeMap, HashMap};
trait GenericMap<K, V> {
fn contains_key<Q>(&self, k: &Q) -> bool
where
K: Borrow<Q>,
Q: Hash + Eq + Ord;
fn each_mut<F>(&mut self, cb: F)
where
F: FnMut((&K, &mut V));
fn insert(&mut self, key: K, value: V) -> Option<V>;
}
impl<K, V> GenericMap<K, V> for HashMap<K, V>
where
K: Eq + Hash,
{
fn contains_key<Q>(&self, k: &Q) -> bool
where
K: Borrow<Q>,
Q: Hash + Eq + Ord,
{
self.contains_key(k)
}
fn each_mut<F>(&mut self, mut cb: F)
where
F: FnMut((&K, &mut V)),
{
self.iter_mut().for_each(|x| cb(x))
}
fn insert(&mut self, key: K, value: V) -> Option<V> {
self.insert(key, value)
}
}
impl<K, V> GenericMap<K, V> for BTreeMap<K, V>
where
K: Ord,
{
fn contains_key<Q>(&self, k: &Q) -> bool
where
K: Borrow<Q>,
Q: Hash + Eq + Ord,
{
self.contains_key(k)
}
fn each_mut<F>(&mut self, mut cb: F)
where
F: FnMut((&K, &mut V)),
{
self.iter_mut().for_each(|x| cb(x))
}
fn insert(&mut self, key: K, value: V) -> Option<V> {
self.insert(key, value)
}
}
fn edit_map_values<T: GenericMap<String, i128>>(map: &mut T) {
map.each_mut(|(k, v)| {
if k != "key1" {
*v += 1;
}
});
map.insert("key2".into(), 10);
}
fn main() {
let mut hm: HashMap<String, i128> = [("One".into(), 1), ("Two".into(), 2)]
.iter()
.cloned()
.collect();
let mut btm: BTreeMap<String, i128> = [("Five".into(), 5), ("Six".into(), 6)]
.iter()
.cloned()
.collect();
dbg!(&hm);
dbg!(&btm);
edit_map_values(&mut hm);
edit_map_values(&mut btm);
dbg!(&hm);
dbg!(&btm);
}
Way back before the 1.0 release, there used to be Map and MutableMap traits, but they were removed before stabilization. The Rust type system is currently unable to express these traits in a nice way due to the lack of higher kinded types.
The eclectic crate provides experimental collection traits, but they haven't been updated for a year, so I'm not sure they are still useful for recent versions of Rust.
Further information:
Does Rust have Collection traits?
No common trait for Map types? (Rust language forum)
Associated type constructors, part 1: basic concepts and introduction (blog post by Niko Matsakis)
Generic associated type RFC
While there is no common Map trait, you could use a combination of other traits to operate on an Iterator and achieve similar functionality. This might not be very memory efficient due to cloning, and it is also a bit involved, depending on the kind of operation you are trying to perform. The operation you tried to do may be implemented like this:
fn edit_map_values<I>(map: &mut I)
where
I: Clone + IntoIterator<Item = (String, i128)> + std::iter::FromIterator<(String, i128)>,
{
// Since into_iter consumes self, we have to clone here.
let (keys, _values): (Vec<String>, Vec<_>) = map.clone().into_iter().unzip();
*map = map
.clone()
.into_iter()
// iterating while mutating entries can be done with map
.map(|mut tuple| {
if !keys.contains(&"key1".to_string()) {
tuple.1 += 1;
}
tuple
})
// inserting an element can be done with chain and once
.chain(std::iter::once(("key2".into(), 10)))
.collect();
// removing an element could be done with filter
// removing and altering elements could be done with filter_map
// etc.
}
fn main() {
use std::collections::{BTreeMap, HashMap};
{
let mut m = HashMap::new();
m.insert("a".to_string(), 0);
m.insert("key3".to_string(), 1);
edit_map_values(&mut m);
println!("{:#?}", m);
}
{
let mut m = BTreeMap::new();
m.insert("a".to_string(), 0);
m.insert("key3".to_string(), 1);
edit_map_values(&mut m);
println!("{:#?}", m);
}
}
Both times the output is the same, except for the order of the HashMap of course:
{
"a": 1,
"key2": 10,
"key3": 2,
}

How do I remove excessive `clone` calls from a struct that caches arbitrary results?

I am reading the section on closures in the second edition of the Rust book. At the end of this section, there is an exercise to extend the Cacher implementation given before. I gave it a try:
use std::clone::Clone;
use std::cmp::Eq;
use std::collections::HashMap;
use std::hash::Hash;
struct Cacher<T, K, V>
where
T: Fn(K) -> V,
K: Eq + Hash + Clone,
V: Clone,
{
calculation: T,
values: HashMap<K, V>,
}
impl<T, K, V> Cacher<T, K, V>
where
T: Fn(K) -> V,
K: Eq + Hash + Clone,
V: Clone,
{
fn new(calculation: T) -> Cacher<T, K, V> {
Cacher {
calculation,
values: HashMap::new(),
}
}
fn value(&mut self, arg: K) -> V {
match self.values.clone().get(&arg) {
Some(v) => v.clone(),
None => {
self.values
.insert(arg.clone(), (self.calculation)(arg.clone()));
self.values.get(&arg).unwrap().clone()
}
}
}
}
After creating a version that finally works, I am really unhappy with it. What really bugs me is that cacher.value(...) has 5(!) calls to clone() in it. Is there a way to avoid this?
Your suspicion is correct, the code contains too many calls to clone(), defeating the very optimizations Cacher is designed to achieve.
Cloning the entire cache
The one to start with is the call to self.values.clone() - it creates a copy of the entire cache on every single access.
After non-lexical lifetimes
Remove this clone.
Before non-lexical lifetimes
As you likely discovered yourself, simply removing .clone() doesn't compile. This is because the borrow checker considers the map referenced for the entire duration of match. The shared reference returned by HashMap::get points to the item inside the map, which means that while it exists, it is forbidden to create another mutable reference to the same map, which is required by HashMap::insert. For the code to compile, you need to split up the match in order to force the shared reference to go out of scope before insert is invoked:
// avoids unnecessary clone of the whole map
fn value(&mut self, arg: K) -> V {
if let Some(v) = self.values.get(&arg).map(V::clone) {
return v;
} else {
let v = (self.calculation)(arg.clone());
self.values.insert(arg, v.clone());
v
}
}
This is much better and probably "good enough" for most practical purposes. The hot path, where the value is already cached, now consists of only a single clone, and that one is actually necessary because the original value must remain in the hash map. (Also, note that cloning doesn't need to be expensive or imply deep copying - the stored value can be an Rc<RealValue>, which buys object sharing for free. In that case, clone() will simply increment the reference count on the object.)
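As a concrete illustration of the Rc remark, the value type can simply be instantiated as an Rc; nothing in Cacher itself has to change (RealValue is a made-up placeholder type):
use std::rc::Rc;

struct RealValue {
    data: Vec<u8>, // imagine something expensive to deep-copy
}

fn demo() {
    // With V = Rc<RealValue>, the clone() inside value() is just a
    // reference-count increment, however large RealValue is.
    let mut cacher = Cacher::new(|n: u32| Rc::new(RealValue { data: vec![0; n as usize] }));
    let a = cacher.value(1024);
    let b = cacher.value(1024); // cache hit: shares the same allocation
    assert!(Rc::ptr_eq(&a, &b));
}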
Clone on cache miss
In case of cache miss, the key must be cloned, because calculation is declared to consume it. A single cloning will be sufficient, though, so we can pass the original arg to insert without cloning it again. The key clone still feels unnecessary, though - a calculation function shouldn't require ownership of the key it is transforming. Removing this clone boils down to modifying the signature of the calculation function to take the key by reference. Changing the trait bounds of T to T: Fn(&K) -> V allows the following formulation of value():
// avoids unnecessary clone of the key
fn value(&mut self, arg: K) -> V {
if let Some(v) = self.values.get(&arg).map(V::clone) {
return v;
} else {
let v = (self.calculation)(&arg);
self.values.insert(arg, v.clone());
v
}
}
Avoiding double lookups
Now we are left with exactly two calls to clone(), one in each code path. This is optimal as far as value cloning is concerned, but the careful reader will still be nagged by one detail: in case of a cache miss, the hash table lookup will effectively happen twice for the same key: once in the call to HashMap::get, and then once more in HashMap::insert. It would be nice if we could instead reuse the work done the first time and perform only one hash map lookup. This can be achieved by replacing get() and insert() with entry():
// avoids the second lookup on cache miss
fn value(&mut self, arg: K) -> V {
match self.values.entry(arg) {
Entry::Occupied(entry) => entry.into_mut(),
Entry::Vacant(entry) => {
let v = (self.calculation)(entry.key());
entry.insert(v)
}
}.clone()
}
We've also taken the opportunity to move the .clone() call after the match.
Runnable example in the playground.
I was solving the same exercise and ended up with the following code:
use std::thread;
use std::time::Duration;
use std::collections::HashMap;
use std::hash::Hash;
use std::fmt::Display;
struct Cacher<P, R, T>
where
T: Fn(&P) -> R,
P: Eq + Hash + Clone,
{
calculation: T,
values: HashMap<P, R>,
}
impl<P, R, T> Cacher<P, R, T>
where
T: Fn(&P) -> R,
P: Eq + Hash + Clone,
{
fn new(calculation: T) -> Cacher<P, R, T> {
Cacher {
calculation,
values: HashMap::new(),
}
}
fn value<'a>(&'a mut self, key: P) -> &'a R {
let calculation = &self.calculation;
let key_copy = key.clone();
self.values
.entry(key_copy)
.or_insert_with(|| (calculation)(&key))
}
}
It only makes a single copy of the key in the value() method. It does not copy the resulting value; instead it returns a reference whose lifetime is tied to the borrow of the enclosing Cacher instance (which is logical, I think, because the values in the map continue to exist until the Cacher itself is dropped).
Here's a test program:
fn main() {
let mut cacher1 = Cacher::new(|num: &u32| -> u32 {
println!("calculating slowly...");
thread::sleep(Duration::from_secs(2));
*num
});
calculate_and_print(10, &mut cacher1);
calculate_and_print(20, &mut cacher1);
calculate_and_print(10, &mut cacher1);
let mut cacher2 = Cacher::new(|str: &&str| -> usize {
println!("calculating slowly...");
thread::sleep(Duration::from_secs(2));
str.len()
});
calculate_and_print("abc", &mut cacher2);
calculate_and_print("defghi", &mut cacher2);
calculate_and_print("abc", &mut cacher2);
}
fn calculate_and_print<P, R, T>(intensity: P, cacher: &mut Cacher<P, R, T>)
where
T: Fn(&P) -> R,
P: Eq + Hash + Clone,
R: Display,
{
println!("{}", cacher.value(intensity));
}
And its output:
calculating slowly...
10
calculating slowly...
20
10
calculating slowly...
3
calculating slowly...
6
3
If you remove the requirement of returning owned values, you don't need to perform any clones at all by making use of the Entry API:
use std::{
collections::{hash_map::Entry, HashMap},
fmt::Display,
hash::Hash,
thread,
time::Duration,
};
struct Cacher<P, R, T>
where
T: Fn(&P) -> R,
P: Eq + Hash,
{
calculation: T,
values: HashMap<P, R>,
}
impl<P, R, T> Cacher<P, R, T>
where
T: Fn(&P) -> R,
P: Eq + Hash,
{
fn new(calculation: T) -> Cacher<P, R, T> {
Cacher {
calculation,
values: HashMap::new(),
}
}
fn value<'a>(&'a mut self, key: P) -> &'a R {
let calculation = &self.calculation;
match self.values.entry(key) {
Entry::Occupied(e) => e.into_mut(),
Entry::Vacant(e) => {
let result = (calculation)(e.key());
e.insert(result)
}
}
}
}
fn main() {
let mut cacher1 = Cacher::new(|num: &u32| -> u32 {
println!("calculating slowly...");
thread::sleep(Duration::from_secs(1));
*num
});
calculate_and_print(10, &mut cacher1);
calculate_and_print(20, &mut cacher1);
calculate_and_print(10, &mut cacher1);
let mut cacher2 = Cacher::new(|str: &&str| -> usize {
println!("calculating slowly...");
thread::sleep(Duration::from_secs(2));
str.len()
});
calculate_and_print("abc", &mut cacher2);
calculate_and_print("defghi", &mut cacher2);
calculate_and_print("abc", &mut cacher2);
}
fn calculate_and_print<P, R, T>(intensity: P, cacher: &mut Cacher<P, R, T>)
where
T: Fn(&P) -> R,
P: Eq + Hash,
R: Display,
{
println!("{}", cacher.value(intensity));
}
You could then choose to wrap this in another struct that performs the clone:
struct ValueCacher<P, R, T>
where
T: Fn(&P) -> R,
P: Eq + Hash,
R: Clone,
{
cacher: Cacher<P, R, T>,
}
impl<P, R, T> ValueCacher<P, R, T>
where
T: Fn(&P) -> R,
P: Eq + Hash,
R: Clone,
{
fn new(calculation: T) -> Self {
Self {
cacher: Cacher::new(calculation),
}
}
fn value(&mut self, key: P) -> R {
self.cacher.value(key).clone()
}
}
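A brief usage sketch (the closure and values are illustrative):
fn main() {
    let mut cacher = ValueCacher::new(|n: &u32| n * 2);
    assert_eq!(cacher.value(21), 42); // computed
    assert_eq!(cacher.value(21), 42); // served from the cache, then cloned
}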

How do I efficiently build a vector and an index of that vector while processing a data stream?

I have a struct Foo:
struct Foo {
v: String,
// Other data not important for the question
}
I want to handle a data stream and save the result into Vec<Foo> and also create an index for this Vec<Foo> on the field Foo::v.
I want to use a HashMap<&str, usize> for the index, where the keys will be &Foo::v and the value is the position in the Vec<Foo>, but I'm open to other suggestions.
I want to do the data stream handling as fast as possible, which requires not doing obvious things twice.
For example, I want to:
allocate a String only once per one data stream reading
not search the index twice, once to check that the key does not exist, once to insert the new key.
not increase the run time by using Rc or RefCell.
The borrow checker does not allow this code:
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
//here is loop in real code, like:
//let mut s: String;
//while get_s(&mut s) {
let s = "aaa".to_string();
let idx: usize = match hash.entry(&s) { //a
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: s }); //b
ent.insert(l.len() - 1);
l.len() - 1
}
};
// do something with idx
}
There are multiple problems:
hash.entry borrows the key so s must have a "bigger" lifetime than hash
I want to move s at line (b), while I have a read-only reference at line (a)
So how should I implement this simple algorithm without an extra call to String::clone or calling HashMap::get after calling HashMap::insert?
In general, what you are trying to accomplish is unsafe and Rust is correctly preventing you from doing something you shouldn't. For a simple example why, consider a Vec<u8>. If the vector has one item and a capacity of one, adding another value to the vector will cause a re-allocation and copying of all the values in the vector, invalidating any references into the vector. This would cause all of your keys in your index to point to arbitrary memory addresses, thus leading to unsafe behavior. The compiler prevents that.
In this case, there are two extra pieces of information that the compiler is unaware of but the programmer isn't:
There's an extra indirection — String is heap-allocated, so moving the pointer to that heap allocation isn't really a problem.
The String will never be changed. If it were, it might reallocate, invalidating the referred-to address. Using a Box<str> instead of a String would be a way to enforce this via the type system (see the sketch below).
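For instance, a Box<str> variant of the Player struct used in the example below might look like this (just a sketch of the idea):
struct Player {
    // A Box<str> has no spare capacity and no way to grow, so the heap
    // allocation behind it is never reallocated; only dropping it frees the data.
    name: Box<str>,
}

fn demo() {
    let p = Player { name: String::from("alice").into_boxed_str() };
    println!("{}", p.name);
}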
In cases like this, it is OK to use unsafe code, so long as you properly document why it's not unsafe.
use std::collections::HashMap;
#[derive(Debug)]
struct Player {
name: String,
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let mut players = Vec::new();
let mut index = HashMap::new();
for &name in &names {
let player = Player { name: name.into() };
let idx = players.len();
// I copied this code from Stack Overflow without reading the prose
// that describes why this unsafe block is actually safe
let stable_name: &str = unsafe { &*(player.name.as_str() as *const str) };
players.push(player);
index.insert(stable_name, idx);
}
for (k, v) in &index {
println!("{:?} -> {:?}", k, v);
}
for v in &players {
println!("{:?}", v);
}
}
However, my guess is that you don't want this code in your main method but want to return it from some function. That will be a problem, as you will quickly run into Why can't I store a value and a reference to that value in the same struct?.
Honestly, there's styles of code that don't fit well within Rust's limitations. If you run into these, you could:
decide that Rust isn't a good fit for you or your problem.
use unsafe code, preferably thoroughly tested and only exposing a safe API.
investigate alternate representations.
For example, I'd probably rewrite the code to have the index be the primary owner of the key:
use std::collections::BTreeMap;
#[derive(Debug)]
struct Player<'a> {
name: &'a str,
data: &'a PlayerData,
}
#[derive(Debug)]
struct PlayerData {
hit_points: u8,
}
#[derive(Debug)]
struct Players(BTreeMap<String, PlayerData>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| (name.into(), PlayerData { hit_points: 100 }))
.collect();
Players(players)
}
fn get<'a>(&'a self, name: &'a str) -> Option<Player<'a>> {
self.0.get(name).map(|data| Player { name, data })
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for (k, v) in &players.0 {
println!("{:?} -> {:?}", k, v);
}
println!("{:?}", players.get("eustice"));
}
Alternatively, as shown in What's the idiomatic way to make a lookup table which uses field of the item as the key?, you could wrap your type and store it in a set container instead:
use std::collections::BTreeSet;
#[derive(Debug, PartialEq, Eq)]
struct Player {
name: String,
hit_points: u8,
}
#[derive(Debug, Eq)]
struct PlayerByName(Player);
impl PlayerByName {
fn key(&self) -> &str {
&self.0.name
}
}
impl PartialOrd for PlayerByName {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for PlayerByName {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.key().cmp(&other.key())
}
}
impl PartialEq for PlayerByName {
fn eq(&self, other: &Self) -> bool {
self.key() == other.key()
}
}
impl std::borrow::Borrow<str> for PlayerByName {
fn borrow(&self) -> &str {
self.key()
}
}
#[derive(Debug)]
struct Players(BTreeSet<PlayerByName>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| {
PlayerByName(Player {
name: name.into(),
hit_points: 100,
})
})
.collect();
Players(players)
}
fn get(&self, name: &str) -> Option<&Player> {
self.0.get(name).map(|pbn| &pbn.0)
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for player in &players.0 {
println!("{:?}", player.0);
}
println!("{:?}", players.get("eustice"));
}
not increase the run time by using Rc or RefCell
Guessing about performance characteristics without performing profiling is never a good idea. I honestly don't believe that there'd be a noticeable performance loss from incrementing an integer when a value is cloned or dropped. If the problem required both an index and a vector, then I would reach for some kind of shared ownership.
not increase the run time by using Rc or RefCell.
@Shepmaster already demonstrated accomplishing this using unsafe; once you have that, I would encourage you to check how much Rc would actually cost you. Here is a full version with Rc:
use std::{
collections::{hash_map::Entry, HashMap},
rc::Rc,
};
#[derive(Debug)]
struct Foo {
v: Rc<str>,
}
#[derive(Debug)]
struct Collection {
vec: Vec<Foo>,
index: HashMap<Rc<str>, usize>,
}
impl Foo {
fn new(s: &str) -> Foo {
Foo {
v: s.into(),
}
}
}
impl Collection {
fn new() -> Collection {
Collection {
vec: Vec::new(),
index: HashMap::new(),
}
}
fn insert(&mut self, foo: Foo) {
match self.index.entry(foo.v.clone()) {
Entry::Occupied(o) => panic!(
"Duplicate entry for: {}, {:?} inserted before {:?}",
foo.v,
o.get(),
foo
),
Entry::Vacant(v) => v.insert(self.vec.len()),
};
self.vec.push(foo)
}
}
fn main() {
let mut collection = Collection::new();
for foo in vec![Foo::new("Hello"), Foo::new("World"), Foo::new("Go!")] {
collection.insert(foo)
}
println!("{:?}", collection);
}
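Because Rc<str> implements Borrow<str>, the index can afterwards be queried with plain &str keys; a small lookup sketch:
// Rc<str> borrows as str, so lookups can use ordinary string slices.
if let Some(&i) = collection.index.get("World") {
    println!("{:?}", collection.vec[i]);
}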
The error is:
error: `s` does not live long enough
--> <anon>:27:5
|
16 | let idx: usize = match hash.entry(&s) { //a
| - borrow occurs here
...
27 | }
| ^ `s` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
The note at the end is where the answer is.
s must outlive hash because you are using &s as a key in the HashMap. This reference will become invalid when s is dropped. But, as the note says, hash will be dropped after s. A quick fix is to swap the order of their declarations:
let s = "aaa".to_string();
let mut hash = HashMap::<&str, usize>::new();
But now you have another problem:
error[E0505]: cannot move out of `s` because it is borrowed
--> <anon>:22:33
|
17 | let idx: usize = match hash.entry(&s) { //a
| - borrow of `s` occurs here
...
22 | l.push(Foo { v: s }); //b
| ^ move out of `s` occurs here
This one is more obvious. s is borrowed by the Entry, which will live to the end of the block. Cloning s will fix that:
l.push(Foo { v: s.clone() }); //b
I want to allocate s only once, not clone it
But the type of Foo::v is String, so it will own its own copy of the string data anyway. That type alone means you have to copy s.
You can replace it with a &str instead, which allows it to remain a reference into s:
struct Foo<'a> {
v: &'a str,
}
pub fn main() {
// s now lives longer than l
let s = "aaa".to_string();
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
let idx: usize = match hash.entry(&s) {
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: &s });
ent.insert(l.len() - 1);
l.len() - 1
}
};
}
}
Note that previously I had to move the declaration of s to before hash, so that it would outlive it. But now l holds a reference to s, so s has to be declared even earlier, so that it outlives l.
