Prehash a struct - rust

I have a struct with many fields that I want to use as a key in a HashMap.
I often need to use the struct multiple times to access different HashMaps, and I don't want to compute the hash (and possibly clone it) each time. The program needs to be as performant as possible, and it will access the HashMaps a lot (billions of times, so the time really adds up).
Here is a simplified example:
use std::collections::HashMap;
#[derive(Hash, Eq, PartialEq, Clone)]
struct KeyStruct {
field1: usize,
field2: bool,
}
fn main() {
// This is what I'm doing now
let key = KeyStruct { field1: 1, field2: true };
// This is what I'd like to do
// let key = key.get_hash()
let mut map1 = HashMap::new();
let mut map2 = HashMap::new();
let mut map3 = HashMap::new();
let mut map4 = HashMap::new();
if !map1.contains_key(&key) {
map1.insert(key.clone(), 1);
}
if !map2.contains_key(&key) {
map2.insert(key.clone(), 2);
}
if !map3.contains_key(&key) {
map3.insert(key.clone(), 3);
}
if !map4.contains_key(&key) {
map4.insert(key.clone(), 4);
}
}
I never actually use the values in the KeyStruct; I just want to use it as a key to the HashMaps. I would like to avoid hashing and cloning it multiple times, as is done in that example.
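One possible approach, sketched here under assumptions rather than as a standard API: compute the hash once, cache it in a wrapper, and give the maps a pass-through hasher so they reuse the cached u64 instead of rehashing the fields. Prehashed, IdentityHasher and IdentityBuild are hypothetical names, and Rc stands in for the clone so that "cloning" the key only bumps a reference count. Equality still compares the full key, so hash collisions stay correct. The entry API also replaces the contains_key + insert pair, which otherwise looks the key up twice.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};
use std::rc::Rc;

#[derive(Hash, Eq, PartialEq)]
struct KeyStruct {
    field1: usize,
    field2: bool,
}

/// Hypothetical pass-through hasher: returns the single u64 it was given.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn write(&mut self, _bytes: &[u8]) {
        unimplemented!("IdentityHasher only accepts write_u64");
    }
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

type IdentityBuild = BuildHasherDefault<IdentityHasher>;

/// Hypothetical wrapper: the hash is computed once; cloning only bumps a refcount.
struct Prehashed<K> {
    hash: u64,
    key: Rc<K>,
}

impl<K: Hash> Prehashed<K> {
    fn new<S: BuildHasher>(key: K, build: &S) -> Self {
        let mut hasher = build.build_hasher();
        key.hash(&mut hasher);
        Prehashed { hash: hasher.finish(), key: Rc::new(key) }
    }
}

impl<K> Clone for Prehashed<K> {
    fn clone(&self) -> Self {
        Prehashed { hash: self.hash, key: Rc::clone(&self.key) }
    }
}

// Equality still checks the full key, so colliding hashes are handled correctly.
impl<K: PartialEq> PartialEq for Prehashed<K> {
    fn eq(&self, other: &Self) -> bool {
        self.hash == other.hash && self.key == other.key
    }
}
impl<K: Eq> Eq for Prehashed<K> {}

// Feed only the cached u64 to the map's (identity) hasher.
impl<K> Hash for Prehashed<K> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash);
    }
}

fn main() {
    let state = RandomState::new();
    let key = Prehashed::new(KeyStruct { field1: 1, field2: true }, &state);

    let mut map1: HashMap<Prehashed<KeyStruct>, i32, IdentityBuild> = HashMap::default();
    let mut map2: HashMap<Prehashed<KeyStruct>, i32, IdentityBuild> = HashMap::default();

    // entry() performs lookup and insert with one cheap pass-through hash.
    map1.entry(key.clone()).or_insert(1);
    map2.entry(key.clone()).or_insert(2);

    println!("{} {}", map1[&key], map2[&key]);
}
```

All keys used with these maps must be hashed through the same RandomState (state above): two equal keys hashed with different states would cache different hashes and never match. If the keys are shared across threads, Arc would take the place of Rc.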

Related

How can I remove duplicates in HashMap values and replace them with references

My hashmap has a lot of duplicate values in it. Instead of leaving them as separate copies, I want to build a collection of every possible unique value and have the original hashmap simply reference one of the values in that unique collection.
let mut styles: HashMap<String, Character> = HashMap::new();
Then my function returns the optimized data in a struct:
pub fn loadtheme(filepath: &str) -> Theme {
[...]
let mut styles: HashMap<String, Character> = HashMap::new();
[...]
Theme {
version,
name,
styledata,
}
}
The Theme struct:
#[derive(Debug)]
pub struct Theme {
version: u16,
name: String,
styledata: HashMap<(String, u32), HashMap<String, Character>>,
}
Here is my current function that removes duplicates and replaces them with references
// Get every possible style with no duplicates
let mut nodupstyles: Vec<&HashMap<String, Character>> = Vec::new();
for (_, style) in elementsetupcharacters.iter() {
if !nodupstyles.contains(&style) {
nodupstyles.push(style);
}
}
// Compile Style Data
let mut styledata: HashMap<(String, u32), HashMap<String, Character>> = HashMap::new();
for (props, style) in elementsetupcharacters.clone().into_iter() {
// Loop through possible styles and get one that matches the style
let stylereference: HashMap<String, Character> = HashMap::new();
for nodupstyle in nodupstyles.iter() {
if nodupstyle.keys().cloned().collect::<Vec<String>>()
== style.keys().cloned().collect::<Vec<String>>()
&& nodupstyle.values().cloned().collect::<Vec<Character>>()
== style.values().cloned().collect::<Vec<Character>>()
{
let stylereference = &nodupstyle;
break;
}
}
if props.1.len() > 0 {
styledata.insert((String::from(props.0), bitstou32(props.1)), stylereference);
} else {
styledata.insert((String::from(props.0), 0), stylereference);
}
}
What is the best way I could achieve this functionality?
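A common way to share deduplicated values is reference counting: store each unique style once in an Rc and give every entry an Rc handle to it. This is a sketch under assumptions: Character is replaced by a hypothetical stand-in struct, dedup_styles is a hypothetical name, and Theme::styledata would become HashMap<(String, u32), Rc<HashMap<String, Character>>>. Comparing the maps directly with == also avoids the ordering problem of collecting keys() and values() into Vecs, because two equal HashMaps can iterate in different orders.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for the question's `Character` type.
#[derive(Debug, Clone, PartialEq)]
struct Character {
    symbol: char,
}

type Style = HashMap<String, Character>;

/// Returns a map whose values are `Rc` handles; equal styles share one allocation.
fn dedup_styles(input: HashMap<(String, u32), Style>) -> HashMap<(String, u32), Rc<Style>> {
    let mut unique: Vec<Rc<Style>> = Vec::new();
    let mut out = HashMap::new();
    for (props, style) in input {
        // HashMap's `==` compares contents, independent of iteration order.
        let found = unique.iter().position(|u| **u == style);
        let shared = match found {
            Some(i) => Rc::clone(&unique[i]),
            None => {
                let rc = Rc::new(style);
                unique.push(Rc::clone(&rc));
                rc
            }
        };
        out.insert(props, shared);
    }
    out
}

fn main() {
    let mut red = Style::new();
    red.insert("fg".to_string(), Character { symbol: 'r' });

    let mut input = HashMap::new();
    input.insert(("title".to_string(), 0u32), red.clone());
    input.insert(("body".to_string(), 1u32), red);

    let styledata = dedup_styles(input);
    let a = &styledata[&("title".to_string(), 0u32)];
    let b = &styledata[&("body".to_string(), 1u32)];
    // Both entries point at the same allocation.
    println!("shared: {}", Rc::ptr_eq(a, b));
}
```

The linear search over unique costs O(number of unique styles) per entry, which matches the question's loop; for many distinct styles, a map keyed by a hashable representation of the style would scale better.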

Storing an iterator for a HashMap in a struct

Edit
As it seems from the suggested solution, what I'm trying to achieve is impossible or not the right approach, so I'll explain the end goal here:
I am parsing the values for Foo from a YAML file using serde, and I would like to let the user get one of those stored values at a time; this is why I wanted to store an iterator in my struct.
I have two structs similar to the following:
struct Bar {
name: String,
id: u32
}
struct Foo {
my_map: HashMap<String, Bar>
}
In my Foo struct, I wish to store an iterator to my HashMap, so a user can borrow values from my map on demand.
Theoretically, the full Foo struct would look something like this:
struct Foo {
my_map: HashMap<String, Bar>,
my_map_iter: HashMap<String, Bar>::iterator
}
impl Foo {
fn get_pair(&self) -> Option<(String, Bar)> {
// impl...
}
}
But I can't seem to pull it off and create such a variable, no matter what I try (I get various compilation errors, which suggests I'm going about it the wrong way).
I would be glad if someone could point me to the correct way to achieve that and, if there is a better way to achieve what I'm trying to do, I would like to know that too.
Thank you!
I am parsing the values for Foo from a YAML file using serde
When you parse them you should put the values in a Vec instead of a HashMap.
I imagine the values you have also have names which is why you thought a HashMap would be good. You could instead store them like so:
let mut parsed: Vec<(String, String)> = Vec::new();
for _ in 0..n_to_parse {
// first item of the tuple is the name, second is the value
let key_value = ("Get from".to_string(), "serde".to_string());
parsed.push(key_value);
}
Once you have stored them this way, it is easy to get the pairs by keeping track of the current index:
struct ParsedHolder {
parsed: Vec<(String, String)>,
current_idx: usize,
}
impl ParsedHolder {
fn new(parsed: Vec<(String, String)>) -> Self {
ParsedHolder {
parsed,
current_idx: 0,
}
}
fn get_pair(&mut self) -> Option<&(String, String)> {
if let Some(pair) = self.parsed.get(self.current_idx) {
self.current_idx += 1;
Some(pair)
} else {
self.current_idx = 0;
None
}
}
}
Now this could be further improved upon by using a VecDeque, which lets you efficiently take out the first element of parsed, so you can hand out owned pairs without cloning. The trade-off is that you can only go through the parsed values once, which I think is actually what you want in your use case.
But I'll let you implement VecDeque 😃
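For reference, one possible sketch of that VecDeque variant: pop_front moves each pair out, so no clone is needed, and the queue is empty after one pass. ParsedQueue and take_pair are hypothetical names.

```rust
use std::collections::VecDeque;

struct ParsedQueue {
    parsed: VecDeque<(String, String)>,
}

impl ParsedQueue {
    fn new(parsed: Vec<(String, String)>) -> Self {
        // Converting a Vec into a VecDeque reuses its buffer.
        ParsedQueue { parsed: VecDeque::from(parsed) }
    }

    // Hands out owned pairs without cloning; each pair can be taken only once.
    fn take_pair(&mut self) -> Option<(String, String)> {
        self.parsed.pop_front()
    }
}

fn main() {
    let parsed = vec![
        ("first".to_string(), "1".to_string()),
        ("second".to_string(), "2".to_string()),
    ];
    let mut holder = ParsedQueue::new(parsed);
    while let Some((name, value)) = holder.take_pair() {
        println!("{name} = {value}");
    }
}
```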
The reason this is hard is that, unless we make sure the HashMap isn't mutated while we iterate over it, we could get into trouble. To make sure the HashMap stays immutable for as long as the iterator lives, borrow it:
use std::collections::HashMap;
use std::collections::hash_map::Iter;
struct Foo<'a> {
my_map: &'a HashMap<u8, u8>,
iterator: Iter<'a, u8, u8>,
}
fn main() {
let my_map = HashMap::new();
let iterator = my_map.iter();
let f = Foo {
my_map: &my_map,
iterator: iterator,
};
}
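Building on the borrowed-map version, here is a minimal sketch of how get_pair could hand out pairs one at a time from a stored Iter. It assumes u32 values in place of the question's Bar for brevity. Because Foo carries the map's lifetime 'a, the map must outlive Foo and cannot be mutated while Foo exists.

```rust
use std::collections::hash_map::Iter;
use std::collections::HashMap;

struct Foo<'a> {
    iterator: Iter<'a, String, u32>,
}

impl<'a> Foo<'a> {
    fn new(map: &'a HashMap<String, u32>) -> Self {
        Foo { iterator: map.iter() }
    }

    // Returns borrowed pairs one at a time; None once the map is exhausted.
    fn get_pair(&mut self) -> Option<(&'a String, &'a u32)> {
        self.iterator.next()
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert("a".to_string(), 1u32);
    map.insert("b".to_string(), 2u32);

    let mut foo = Foo::new(&map);
    // Yields each pair once, in the map's (arbitrary) iteration order.
    while let Some((name, id)) = foo.get_pair() {
        println!("{name}: {id}");
    }
}
```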
If you can guarantee that no keys will be added to or removed from the HashMap (editing the values of existing keys is fine), then you can do this instead:
struct Foo {
my_map: HashMap<String, String>,
current_idx: usize,
}
impl Foo {
fn new(my_map: HashMap<String, String>) -> Self {
Foo {
my_map,
current_idx: 0,
}
}
fn get_pair(&mut self) -> Option<(&String, &String)> {
if let Some(pair) = self.my_map.iter().skip(self.current_idx).next() {
self.current_idx += 1;
Some(pair)
} else {
self.current_idx = 0;
None
}
}
fn get_pair_cloned(&mut self) -> Option<(String, String)> {
if let Some(pair) = self.my_map.iter().skip(self.current_idx).next() {
self.current_idx += 1;
Some((pair.0.clone(), pair.1.clone()))
} else {
self.current_idx = 0;
None
}
}
}
This is fairly inefficient, though, because each call iterates through the keys from the start to find the next one, so walking the whole map takes O(n^2) steps.

Group vector of structs by field

I want to create a vector with all of the structs that share the same id field, process that new vector, and then repeat the process; basically, grouping together the structs with matching ids.
Is there a way to do this by not using the unstable feature drain_filter?
#![feature(drain_filter)]
#[derive(Debug)]
struct Person {
id: u32,
}
fn main() {
let mut people = vec![];
for p in 0..10 {
people.push(Person { id: p });
}
while !people.is_empty() {
let first_person_id = people.first().unwrap().id;
let drained: Vec<Person> = people.drain_filter(|p| p.id == first_person_id).collect();
println!("{:#?}", drained);
}
}
If you are looking to group your vector by the person id, it's likely more efficient to use a HashMap from id to Vec<Person>, where each id holds a vector of persons. You can then loop through the HashMap and process each vector (group). This is potentially more efficient than draining people in each iteration, which in the worst case (all ids distinct) has O(N^2) time complexity, while with a HashMap the expected time complexity is O(N). The code below also runs on stable, since it no longer needs drain_filter:
use std::collections::HashMap;
#[derive(Debug)]
struct Person {
id: u32,
}
fn main() {
let mut people = vec![];
let mut groups: HashMap<u32, Vec<Person>> = HashMap::new();
for p in 0..10 {
people.push(Person { id: p });
}
people.into_iter().for_each(|person| {
let group = groups.entry(person.id).or_insert(vec![]);
group.push(person);
});
for (_id, group) in groups {
println!("{:#?}", group);
}
}
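For the literal question (replacing the unstable drain_filter loop on stable Rust), a sketch can use Iterator::partition to split off one group per iteration. groups_by_partition is a hypothetical name. Like the drain_filter loop, this is O(N^2) in the worst case, so the HashMap version is preferable for large inputs; unlike the HashMap version, it yields the groups in order of first appearance.

```rust
#[derive(Debug, PartialEq)]
struct Person {
    id: u32,
}

// Stable replacement for the drain_filter loop: split off one group at a time.
fn groups_by_partition(mut people: Vec<Person>) -> Vec<Vec<Person>> {
    let mut groups = Vec::new();
    while let Some(first_id) = people.first().map(|p| p.id) {
        // Everyone with the first id goes into `group`, the rest stay for later.
        let (group, rest): (Vec<Person>, Vec<Person>) =
            people.into_iter().partition(|p| p.id == first_id);
        groups.push(group);
        people = rest;
    }
    groups
}

fn main() {
    let mut people = vec![];
    for p in 0..10 {
        // ids repeat so that groups have more than one member
        people.push(Person { id: p % 3 });
    }
    for group in groups_by_partition(people) {
        println!("{:#?}", group);
    }
}
```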

Adding a new key to each element of vector in Rust

I have a vector of structures. I want to add one additional field to each element. What's the best way to do that?
Something like this:
// Pseudo code
let items = vec![elem1, elem2, elem3, elem4];
for x in items {
// Something like this
x["some_additional_key"] = get_data(x);
}
//
// Now I have items[i].some_additional_key in each element
Rust is a statically-typed language; you may be familiar with other similar languages like C++, Java or Swift. In these languages, the members, types, and layout of a struct are fixed when the program is compiled.
Because of this, there's no way to add a new struct field at runtime — no "ifs", "ands", or "buts" — you can't do it.
Instead, you have to model that dynamic nature some other way:
Use a type that allows for arbitrary expansion. HashMap and BTreeMap (and many other similar types) allow you to have an arbitrary number of key-value pairs. Under the hood, this is basically how many dynamic languages work - a mapping of strings to arbitrary values:
use std::collections::HashMap;
#[derive(Debug, Default)]
struct Element(HashMap<String, u8>);
fn get_data(_: &Element) -> u8 {
42
}
fn main() {
let mut items = vec![
Element::default(),
Element::default(),
Element::default(),
Element::default(),
];
for x in &mut items {
let value = get_data(x);
x.0
.entry("some_additional_key".to_string())
.or_insert(value);
}
}
Use a type that allows for specific expansion. Option allows for a value to be present or not:
#[derive(Debug, Default)]
struct Element {
some_additional_key: Option<u8>,
}
fn get_data(_: &Element) -> u8 {
42
}
fn main() {
let mut items = vec![
Element::default(),
Element::default(),
Element::default(),
Element::default(),
];
for x in &mut items {
let value = get_data(x);
x.some_additional_key = Some(value);
}
}
Use composition. Create a new type that wraps your existing type:
#[derive(Debug, Default)]
struct Element;
#[derive(Debug)]
struct EnhancedElement {
element: Element,
some_additional_key: u8,
}
fn get_data(_: &Element) -> u8 {
42
}
fn main() {
let items = vec![
Element::default(),
Element::default(),
Element::default(),
Element::default(),
];
let enhanced: Vec<_> = items
.into_iter()
.map(|element| {
let some_additional_key = get_data(&element);
EnhancedElement {
element,
some_additional_key,
}
})
.collect();
}
See also:
How to lookup from and insert into a HashMap efficiently?
Update value in mutable HashMap

How can you allocate a raw mutable pointer in stable Rust?

I was trying to build a naive implementation of a custom String-like struct with small string optimization. Now that unions are allowed in stable Rust, I came up with the following code:
struct Large {
capacity: usize,
buffer: *mut u8,
}
struct Small([u8; 16]);
union Container {
large: Large,
small: Small,
}
struct MyString {
len: usize,
container: Container,
}
I can't find a way to allocate that *mut u8. Is it possible to do this in stable Rust? It looks like using alloc::heap would work, but it is only available in nightly.
As of Rust 1.28, std::alloc::alloc is stable.
Here is an example which shows in general how it can be used.
use std::{
alloc::{self, Layout},
cmp, mem, ptr, slice, str,
};
// This really should **not** be copied
#[derive(Copy, Clone)]
struct Large {
capacity: usize,
buffer: *mut u8,
}
// This really should **not** be copied
#[derive(Copy, Clone, Default)]
struct Small([u8; 16]);
union Container {
large: Large,
small: Small,
}
struct MyString {
len: usize,
container: Container,
}
impl MyString {
fn new() -> Self {
MyString {
len: 0,
container: Container {
small: Small::default(),
},
}
}
fn as_buf(&self) -> &[u8] {
unsafe {
if self.len <= 16 {
&self.container.small.0[..self.len]
} else {
slice::from_raw_parts(self.container.large.buffer, self.len)
}
}
}
pub fn as_str(&self) -> &str {
unsafe { str::from_utf8_unchecked(self.as_buf()) }
}
// Not actually UTF-8 safe!
fn push(&mut self, c: u8) {
unsafe {
use cmp::Ordering::*;
match self.len.cmp(&16) {
Less => {
self.container.small.0[self.len] = c;
}
Equal => {
let capacity = 17;
let layout = Layout::from_size_align(capacity, mem::align_of::<u8>())
.expect("Bad layout");
let buffer = alloc::alloc(layout);
if buffer.is_null() {
alloc::handle_alloc_error(layout);
}
{
let buf = self.as_buf();
ptr::copy_nonoverlapping(buf.as_ptr(), buffer, buf.len());
}
self.container.large = Large { capacity, buffer };
*self.container.large.buffer.add(self.len) = c;
}
Greater => {
let Large {
capacity: old_capacity,
buffer,
} = self.container.large;
// realloc takes the layout of the *existing* block plus the new size
let old_layout = Layout::from_size_align(old_capacity, mem::align_of::<u8>())
.expect("Bad layout");
let capacity = old_capacity + 1;
let buffer = alloc::realloc(buffer, old_layout, capacity);
if buffer.is_null() {
alloc::handle_alloc_error(Layout::from_size_align(capacity, 1).expect("Bad layout"));
}
self.container.large = Large { capacity, buffer };
*self.container.large.buffer.add(self.len) = c;
}
}
self.len += 1;
}
}
}
impl Drop for MyString {
fn drop(&mut self) {
unsafe {
if self.len > 16 {
let Large { capacity, buffer } = self.container.large;
let layout =
Layout::from_size_align(capacity, mem::align_of::<u8>()).expect("Bad layout");
alloc::dealloc(buffer, layout);
}
}
}
}
fn main() {
let mut s = MyString::new();
for _ in 0..32 {
s.push(b'a');
println!("{}", s.as_str());
}
}
I believe this code to be correct with respect to allocations, but not for anything else. Like all unsafe code, verify it yourself. It's also completely inefficient as it reallocates for every additional character.
If you'd like to allocate a collection of u8 instead of a single u8, you can create a Vec and then convert it into the constituent pieces, such as by calling as_mut_ptr:
use std::mem;
fn main() {
let mut foo = vec![0u8; 1024]; // or Vec::<u8>::with_capacity(1024);
let ptr = foo.as_mut_ptr();
let cap = foo.capacity();
let len = foo.len();
mem::forget(foo); // Avoid calling the destructor!
let foo_again = unsafe { Vec::from_raw_parts(ptr, len, cap) }; // Rebuild it to drop it
// Do *NOT* use `ptr` / `cap` / `len` anymore
}
Reallocating is a bit of a pain, though; you'd have to convert back to a Vec and do the whole dance forwards and backwards.
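That forwards-and-backwards dance can be sketched as follows: rebuild the Vec from its raw parts, let it reallocate, and leak it again. grow is a hypothetical helper, and ManuallyDrop is used in place of mem::forget; both suppress the destructor.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper: grow a buffer previously leaked from a Vec<u8>.
/// Safety: ptr/len/cap must come from a Vec<u8> that has not been freed.
unsafe fn grow(ptr: *mut u8, len: usize, cap: usize, additional: usize) -> (*mut u8, usize, usize) {
    // Forwards: turn the raw parts back into a Vec...
    let mut v = unsafe { Vec::from_raw_parts(ptr, len, cap) };
    // ...let Vec do the reallocation...
    v.reserve(additional);
    // ...and backwards: leak it again without running its destructor.
    let mut v = ManuallyDrop::new(v);
    (v.as_mut_ptr(), v.len(), v.capacity())
}

fn main() {
    let mut v = ManuallyDrop::new(vec![1u8, 2, 3]);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    let (ptr, len, cap) = unsafe { grow(ptr, len, cap, 100) };
    println!("len = {len}, cap = {cap}");
    // Rebuild the Vec one last time so its destructor frees the memory.
    let v = unsafe { Vec::from_raw_parts(ptr, len, cap) };
    println!("{:?}", v);
}
```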
That being said, at first glance your Large struct seems to be missing a length distinct from its capacity, which would suggest just using a Vec instead of writing it out; on a second look, though, the length is stored one level up, in MyString.
I wonder if having a full String wouldn't be a lot easier, even if it were a bit less efficient in that the length is double-counted...
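use std::mem::ManuallyDrop;
// Stable unions only allow `Copy` or `ManuallyDrop` fields, so the String
// must be wrapped and dropped manually when it is the active variant.
union Container {
large: ManuallyDrop<String>,
small: Small,
}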
See also:
What is the right way to allocate data to pass to an FFI call?
How do I use the Rust memory allocator for a C library that can be provided an allocator?
What about Box::into_raw()?
struct TypeMatches(*mut u8);
TypeMatches(Box::into_raw(Box::new(0u8)));
But it's difficult to tell from your code snippet if this is what you really need. You probably want a real allocator, and you could use libc::malloc with an as cast, as in this example.
There's a memalloc crate which provides a stable allocation API. It's implemented by allocating memory with Vec::with_capacity, then extracting the pointer:
let mut vec: Vec<u8> = Vec::with_capacity(cap);
let ptr = vec.as_mut_ptr();
mem::forget(vec);
To free the memory, use Vec::from_raw_parts.
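A minimal std-only sketch of that allocate/free round trip, assuming u8 buffers; allocate and deallocate are hypothetical names, not the memalloc crate's API. Returning the actual capacity avoids relying on Vec::with_capacity allocating exactly the requested amount.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper: allocate an uninitialized buffer of at least `cap` bytes.
fn allocate(cap: usize) -> (*mut u8, usize) {
    let mut vec = ManuallyDrop::new(Vec::<u8>::with_capacity(cap));
    (vec.as_mut_ptr(), vec.capacity())
}

/// Safety: `ptr` and `cap` must be exactly what `allocate` returned.
unsafe fn deallocate(ptr: *mut u8, cap: usize) {
    // Length 0: there are no elements to drop, only the buffer to free.
    drop(unsafe { Vec::from_raw_parts(ptr, 0, cap) });
}

fn main() {
    let (ptr, cap) = allocate(16);
    unsafe {
        // Initialize the bytes before reading them.
        for i in 0..cap {
            ptr.add(i).write(b'x');
        }
        println!("first byte: {}", *ptr as char);
        deallocate(ptr, cap);
    }
}
```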
