"Converting" unordered data structure to 2D HashMap

I have this datastructure in Rust:
objects: [Object { ob: 1, id: 0 },
Object { ob: 1, id: 1 },
Object { ob: 0, id: 3 },
Object { ob: 0, id: 2 },
...
]
I'm using a HashMap where I can use the two keys "ob" and "id" to reference data that I'm going to store in this 2D HashMap. To create a 2D HashMap, I'm using the code found in this answer: How to implement HashMap with two keys?
In the code below, I'm trying to first create all the structs (aCustomStructOfMine) and store them in the table under their two keys, "ob" and "id". Later in the code (not written yet), I will access the structs iteratively, edit them, and use the data stored in them at that moment.
fn analyse(objects: Vec<Object>) {
let mut table = Table::new();
for object in objects {
let ob = &object.ob;
let id = &object.id;
table.set(A(ob), B(id), aCustomStructOfMine::new());
}
}
I'm getting the following error:
borrowed value does not live long enough
assignment requires that `object.id` is borrowed for `'static`
How can I build the desired 2D HashMap from the given objects variable?
I'm new to Rust, and although I understand the basic principles of ownership, borrowing, lifetimes and more, it's really hard to actually write code with these concepts in mind.

I personally think you are getting confused by the other post and are making this way too complicated.
If two types KeyA and KeyB implement Hash, then the tuple (KeyA, KeyB) automatically does so as well. There is no need to create a fancy new Table type that can take two keys. Just use a tuple.
There are some other remarks concerning ownership I'd like to make, but I think they are best annotated in the source code.
Here is a working version of what you intended, I think:
use std::collections::HashMap;
// This is just for demonstration. Use your own types.
// I used `String` because it already implements `Eq` and `Hash`.
type KeyA = String;
type KeyB = String;
// This is an example type that you should replace with your own
// data type in the HashMap
pub struct MyData {}
// Some generic input type, to stick as close to your example as possible
pub struct Object {
a: KeyA,
b: KeyB,
}
// The analyze function, as in your example.
// Note that we take `objects` by value, not by reference.
// That means we own it and can do whatever we want with it.
// This is important because you never mentioned that the Keys are supposed to be
// `Clone` or `Copy`.
pub fn analyse(objects: Vec<Object>) -> HashMap<(KeyA, KeyB), MyData> {
// No need to use fancy new types for your table.
// Use a tuple (KeyA, KeyB) as the key.
// If both KeyA and KeyB implement `Hash` and `Eq`, then the
// tuple automatically implements them as well.
let mut table = HashMap::new();
// Note that I wrote `in objects` and not `in &objects`, meaning this loop actually
// consumes and destroys `objects`, extracts `object` out of it and gives you a fully owned
// `object`.
for object in objects {
// This again is not a borrow, but a move.
// The `object.a` gets moved out of `object` and into `key_a`.
// No need to take a reference here: `object` isn't used afterwards,
// and inserting into a `HashMap` requires owned keys anyway.
//
// I think this was the main point you were struggling with in the original code,
// that you used references here.
let key_a = object.a;
let key_b = object.b;
// Just create the key tuple and insert
table.insert((key_a, key_b), MyData {});
}
// Return the table
table
}
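To connect this back to the question's own `Object { ob, id }` input, here is a minimal sketch that assumes `ob` and `id` are `u32` (the question doesn't show their types) and uses a hypothetical `MyData` with a `visits` counter in place of `aCustomStructOfMine`. Because integer keys are `Copy`, no moving or cloning is needed, and `get_mut`/`values_mut` cover the "edit them later" part of the question.

```rust
use std::collections::HashMap;

// Assumption: `ob` and `id` are plain integers; the question does not show their types.
struct Object {
    ob: u32,
    id: u32,
}

// Hypothetical stand-in for `aCustomStructOfMine`.
#[derive(Debug, Default)]
struct MyData {
    visits: u32,
}

fn analyse(objects: Vec<Object>) -> HashMap<(u32, u32), MyData> {
    let mut table = HashMap::new();
    for object in objects {
        // `u32` is `Copy`, so building the tuple key just copies the two numbers.
        table.insert((object.ob, object.id), MyData::default());
    }
    table
}

fn main() {
    let objects = vec![
        Object { ob: 1, id: 0 },
        Object { ob: 1, id: 1 },
        Object { ob: 0, id: 3 },
        Object { ob: 0, id: 2 },
    ];
    let mut table = analyse(objects);

    // Edit a single entry, addressed by both keys.
    if let Some(data) = table.get_mut(&(1, 0)) {
        data.visits += 1;
    }
    // Edit every entry in place.
    for data in table.values_mut() {
        data.visits += 1;
    }
    println!("{:?}", table.get(&(1, 0))); // Some(MyData { visits: 2 })
}
```

If the real key types are not `Copy` (e.g. `String`), the move-based loop from the answer above is the way to go.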

Related

Why are the values in a given HashMap mutable when I don't believe I have explicitly declared them as such?

I wrote a working solution to the third proposed exercise in section 8.3 of The Book, but some of the behavior defies my intuition. Specifically, it appears that I'm able to mutate a vector that appears to be instantiated as immutable.
I've included the portions of the code that I believe are relevant, eliding the portions of code that do not interact with the Vecs stored in the HashMap.
I've engaged in some speculation after the code block, but I could really use a more sure-footed explanation of what's actually happening.
// Start by declaring a new HashMap. This is where the mystery begins for me.
let mut departments: HashMap<String, Vec<String>> = HashMap::new();
// This code deals with getting user input from stdin and has been elided
// ...
// Match the first string slice
match args[0] {
// Add an employee to a department in the HashMap
"add" => {
// POINT OF INTEREST 1
// Adding a department that isn't already in the HashMap will cause a new empty
// vector to be inserted at the newly created key
let list = departments.entry(args[2].to_string()).or_insert(Vec::new());
// In any case, we insert the employee's name into the vector associated with
// whatever department we got from stdin
list.push(args[1].to_string());
}
// List employees
"who" => match args[1] {
// List all employees in every department, employees shown in alphabetical order
"all" => {
for (department, employees) in &mut departments {
// POINT OF INTEREST 2
// Why am I allowed to sort this vector? The artifact underlying the reference
// was never explicitly made mutable.
employees.sort();
for ee in employees {
println!("{}: {}", department, ee);
}
}
}
// List all employees in a given department, employees shown in alphabetical order
dept => {
let employees = departments.get_mut(dept);
match employees {
Some(entries) => {
// POINT OF INTEREST 3
// This one seems the least mysterious to me, since I get the value
// of the HashMap at `dept` through `.get_mut()`.
println!("{}:", dept);
entries.sort();
for ee in entries {
println!("\t{}", ee);
}
}
_ => (),
}
}
}
}
Hypothesis 1: At POINT OF INTEREST 1, my call to .or_insert() returns a mutable reference to a new vector, and this is why later calls to .sort() on values in the HashMap work.
This does not seem like the likely answer! In the beginning, I declared departments to be of the type HashMap<String, Vec<String>>, not HashMap<String, &mut Vec<String>>.
Hypothesis 2: When I declare departments as mutable, its keys and values inherit that mutability. This also seems unlikely, as nothing in my (very limited) experience has suggested such a thing is a feature of Rust. I also like to think that if that were explicitly stated in the first 8 chapters of The Book it would have caught my attention, but I've been known to skim over important details before.

for (department, employees) in &mut departments {
The for loop relies on this IntoIterator implementation:
impl<'a, K, V, S> IntoIterator for &'a mut HashMap<K, V, S> {
type Item = (&'a K, &'a mut V);
type IntoIter = IterMut<'a, K, V>;
fn into_iter(self) -> IterMut<'a, K, V> { self.iter_mut() }
}
Because of this implementation, when you iterate over a &mut HashMap<K, V> you get back tuples of (&K, &mut V). Notice that the keys are borrowed immutably while the values are mutable. This lets you modify the values, because employees is of type &mut Vec<String>.
Why is the map able to return mutable references? The map owns both the keys and values, so it can return mutable references to either of them if it wants to. That's what being an owner means: you can let others mutably borrow your objects if you so wish.
HashMap is happy to let you mutate values because that won't affect the data structure. It doesn't let you modify the keys because that would change their hashes and invalidate where they're stored in the hash table. HashMap could return &mut K. The borrow checker wouldn't stop it. It doesn't, though, because then callers could corrupt the hash map.
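A small self-contained sketch of the points above (the department and employee names are made up, and `sort_all` is a hypothetical helper): the only `mut` is on the binding that owns the map, and both `for ... in <&mut HashMap>` and `values_mut()` hand out `&mut Vec<String>` for the values while the keys stay behind `&String`.

```rust
use std::collections::HashMap;

// Taking `&mut` to the whole map is what permits `&mut` access to its values.
fn sort_all(departments: &mut HashMap<String, Vec<String>>) {
    // Uses `IntoIterator for &mut HashMap`: each item is `(&String, &mut Vec<String>)`.
    for (_department, employees) in departments {
        employees.sort();
    }
}

fn main() {
    // `mut` on the binding is the only place mutability is declared.
    // Without it, `entry(...)` and `sort_all(&mut departments)` would not compile.
    let mut departments: HashMap<String, Vec<String>> = HashMap::new();
    departments.entry("eng".to_string()).or_default().push("Sally".to_string());
    departments.entry("eng".to_string()).or_default().push("Amir".to_string());

    sort_all(&mut departments);

    // `values_mut` gives the same `&mut Vec<String>` access, just without the keys.
    for employees in departments.values_mut() {
        employees.push("Zoe".to_string());
    }
    println!("{:?}", departments["eng"]); // ["Amir", "Sally", "Zoe"]
}
```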

struct with reference to element of a vector in another field

I have the example below, where I want a struct that holds a vector of data in one field and has another field containing the currently selected element. My understanding is that this is not possible in Rust, because I could remove the element in tables that selected points to, thereby creating a dangling pointer (can't borrow mut when an immutable borrow exists). The obvious workaround is to store a usize index for the element instead of a &'a String. But this means I need to update the index if I remove an element from tables. Is there any way to avoid this using smart pointers, or is there a better solution in general? I've looked at other questions, but they are not quite the same as the one below and contain extra information that makes them harder to follow for a beginner like myself, whereas below is a very minimal example.
struct Data<'a> {
selected: &'a String,
tables: Vec<String>,
}
fn main() {
let tables = vec!["table1".to_string(), "table2".to_string()];
let my_stuff = Data {
selected: &tables[0],
tables: tables,
};
}

You quite rightfully assessed that the way you wrote it is not possible, because Rust guarantees memory safety and storing it as a reference would give the possibility to create a dangling pointer.
There are several solutions that I could see here.
Static Strings
This of course only works if you store compile-time static strings.
struct Data {
selected: &'static str,
tables: Vec<&'static str>,
}
fn main() {
let tables = vec!["table1", "table2"];
let my_stuff = Data {
selected: tables[0], // `&'static str` is `Copy`, so this just copies the reference
tables,
};
}
This works because static strings are immutable and guaranteed never to be deallocated. In case this is confusing, I recommend reading up on the differences between String and str slices.
You can even go one step further and relax the lifetime from 'static to 'a. But then you have to store them as &'a str in the vector, to ensure they cannot be edited.
That in turn lets the vector refer to String data, as long as those Strings can be borrowed for the entire lifetime of the Data object.
struct Data<'a> {
selected: &'a str,
tables: Vec<&'a str>,
}
fn main() {
let str1 = "table1".to_string();
let str2 = "table2".to_string();
let tables = vec![str1.as_str(), str2.as_str()];
let my_stuff = Data {
selected: tables[0], // `&'a str` is `Copy`, so this just copies the reference
tables,
};
}
Reference counting smart pointers
Depending on your situation, there are several types that are recommended:
Rc<...> - if your data is immutable. Otherwise, you need to create interior mutability with:
Rc<Cell<...>> - safest and best solution IF your problem is single-threaded and deals with simple data types
Rc<RefCell<...>> - for more complex data types that have to be updated in-place and can't just be moved in and out
Arc<Mutex<...>> - as soon as your problem stretches over multiple threads
In your case, the data is in fact simple and your program is single-threaded, so I'd go with:
use std::{cell::Cell, rc::Rc};
struct Data {
selected: Rc<Cell<String>>,
tables: Vec<Rc<Cell<String>>>,
}
fn main() {
let tables = vec![
Rc::new(Cell::new("table1".to_string())),
Rc::new(Cell::new("table2".to_string())),
];
let my_stuff = Data {
selected: tables[0].clone(),
tables,
};
}
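One practical wrinkle with `Rc<Cell<String>>`: `Cell::get` requires `T: Copy`, which `String` is not, so reads and writes go through `Cell::set`, `Cell::replace` or `Cell::take` instead. A minimal sketch of that, where `selected_name` is a hypothetical helper added for illustration. It also shows that removing the element from `tables` cannot leave `selected` dangling, because the remaining `Rc` keeps the string alive.

```rust
use std::{cell::Cell, rc::Rc};

struct Data {
    selected: Rc<Cell<String>>,
    tables: Vec<Rc<Cell<String>>>,
}

impl Data {
    // Hypothetical helper: `Cell::get` needs `T: Copy`, so take the String out
    // (leaving an empty one behind), put a clone back, and return the original.
    fn selected_name(&self) -> String {
        let name = self.selected.take();
        self.selected.set(name.clone());
        name
    }
}

fn main() {
    let tables = vec![
        Rc::new(Cell::new("table1".to_string())),
        Rc::new(Cell::new("table2".to_string())),
    ];
    let mut my_stuff = Data {
        selected: tables[0].clone(),
        tables,
    };

    // Rename through the vector; `selected` points at the same allocation.
    let old = my_stuff.tables[0].replace("renamed".to_string());
    assert_eq!(old, "table1");
    println!("{}", my_stuff.selected_name()); // renamed

    // Removing it from the vector drops one `Rc`; `selected` still owns the other.
    my_stuff.tables.remove(0);
    assert_eq!(Rc::strong_count(&my_stuff.selected), 1);
    println!("{}", my_stuff.selected_name()); // renamed
}
```

The take-and-put-back dance is the price of `Cell` with non-`Copy` data; `Rc<RefCell<String>>` lets you call `.borrow()` instead.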
Of course, if you don't want to modify your strings after creation, you could go with:
use std::rc::Rc;
struct Data {
selected: Rc<String>,
tables: Vec<Rc<String>>,
}
fn main() {
let tables = vec![Rc::new("table1".to_string()), Rc::new("table2".to_string())];
let my_stuff = Data {
selected: tables[0].clone(),
tables,
};
}
Hiding the data structure and using an index
As you already mentioned, you could use an index instead. Then you would have to hide the vector and provide getters/setters/modifiers to make sure the index is kept in sync when the vector changes.
I'll leave a full implementation to the reader :)
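A rough sketch of what that bookkeeping might look like; the `Tables` type and its `select`/`selected`/`remove` methods are hypothetical names chosen for illustration. In a real program the type would live in its own module so the `Vec` stays private and every removal has to go through `remove`, which shifts or clears the stored index.

```rust
// Hypothetical wrapper: with the fields private (own module), callers cannot
// remove elements without going through `remove`, which keeps `selected` in sync.
struct Tables {
    tables: Vec<String>,
    selected: Option<usize>,
}

impl Tables {
    fn new(tables: Vec<String>) -> Self {
        Tables { tables, selected: None }
    }

    // Returns false (and changes nothing) if the index is out of range.
    fn select(&mut self, index: usize) -> bool {
        if index < self.tables.len() {
            self.selected = Some(index);
            true
        } else {
            false
        }
    }

    fn selected(&self) -> Option<&str> {
        self.selected.map(|i| self.tables[i].as_str())
    }

    // Panics if `index` is out of bounds, like `Vec::remove`.
    fn remove(&mut self, index: usize) -> String {
        let removed = self.tables.remove(index);
        self.selected = match self.selected {
            Some(s) if s == index => None,       // the selected element itself is gone
            Some(s) if s > index => Some(s - 1), // later elements shift left by one
            other => other,                      // earlier elements (or no selection) are unaffected
        };
        removed
    }
}

fn main() {
    let mut t = Tables::new(vec!["table1".into(), "table2".into(), "table3".into()]);
    t.select(2);
    t.remove(0);
    println!("{:?}", t.selected()); // Some("table3")
}
```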
I hope this helped already, or at least gave you a couple of new ideas. I'm happy to see new people come to the community, so feel free to ask further questions if you have any :)

Initializing a struct field-by-field. Is it possible to know if all the fields were initialized?

I'm following the example from the official documentation. I'll copy the code here for simplicity:
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;
#[derive(Debug, PartialEq)]
pub struct Foo {
name: String,
list: Vec<u8>,
}
fn main() {
let foo = {
let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
let ptr = uninit.as_mut_ptr();
// Initializing the `name` field
// Using `write` instead of assignment via `=` to not call `drop` on the
// old, uninitialized value.
unsafe { addr_of_mut!((*ptr).name).write("Bob".to_string()); }
// Initializing the `list` field
// If there is a panic here, then the `String` in the `name` field leaks.
unsafe { addr_of_mut!((*ptr).list).write(vec![0, 1, 2]); }
// All the fields are initialized, so we call `assume_init` to get an initialized Foo.
unsafe { uninit.assume_init() }
};
assert_eq!(foo, Foo { name: "Bob".to_string(), list: vec![0, 1, 2] });
}
What bothers me is the second unsafe comment: If there is a panic here, then the String in the name field leaks. This is exactly what I want to avoid. I modified the example so now it reflects my concerns:
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;
#[derive(Debug, PartialEq)]
pub struct Foo {
name: String,
list: Vec<u8>,
}
#[allow(dead_code)]
fn main() {
let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
let ptr = uninit.as_mut_ptr();
init_foo(ptr);
// this is wrong because it tries to read the uninitialized field
// I could avoid this call if the function `init_foo` returns a `Result`
// but I'd like to know which fields are initialized so I can cleanup
let _foo = unsafe { uninit.assume_init() };
}
fn init_foo(foo_ptr: *mut Foo) {
unsafe { addr_of_mut!((*foo_ptr).name).write("Bob".to_string()); }
// something happened and `list` field left uninitialized
return;
}
The code builds and runs. But using MIRI I see the error:
Undefined Behavior: type validation failed at .value.list.buf.ptr.pointer: encountered uninitialized raw pointer
The question is how I can figure out which fields are initialized and which are not? Sure, I could return a result with the list of field names or similar, for example. But I don't want to do it - my struct can have dozens of fields, it changes over time and I'm too lazy to maintain an enum that should reflect the fields. Ideally I'd like to have something like this:
if addr_initialized!((*ptr).name) {
clean(addr_of_mut!((*ptr).name));
}
Update: Here's an example of what I want to achieve. I'm doing some Vulkan programming (with ash crate, but that's not important). I want to create a struct that holds all the necessary objects, like Device, Instance, Surface, etc.:
struct VulkanData {
pub instance: Instance,
pub device: Device,
pub surface: Surface,
// 100500 other fields
}
fn init() -> Result<VulkanData, Error> {
// let vulkan_data = VulkanData{}; // can't do that because some fields are not default constructible.
let instance = create_instance(); // can fail
let device = create_device(instance); // can fail, in this case instance have to be destroyed
let surface = create_surface(device); // can fail, in this case instance and device have to be destroyed
//other initialization routines
VulkanData{instance, device, surface, ...}
}
As you can see, for every such object there's a corresponding create_x function, which can fail. Obviously, if something fails in the middle of the process, I don't want to proceed, but I do want to clean up the objects already created. As you mentioned, I could create a wrapper. But it's very tedious to create wrappers for hundreds of types, and I absolutely want to avoid this (btw, ash is already a wrapper over C types). Moreover, because of the asynchronous nature of CPU-GPU communication, sometimes it makes no sense to drop an object; doing so can lead to errors. Instead, some form of signal should come from the GPU indicating that an object is safe to destroy. That's the main reason why I can't implement Drop for the wrappers.
But as soon as the struct is successfully initialized, I know that it's safe to read any of its fields. That's why I don't want to use an Option: it adds some overhead and makes no sense in my particular example.
All that is trivially achievable in C++ - create an uninitialized struct (well, by default all Vulkan objects are initialized with VK_NULL_HANDLE), start to fill it field-by-field, if something went wrong just destroy the objects that are not null.

There is no general purpose way to tell if something is initialized or not. Miri can detect this because it adds a lot of instrumentation and overhead to track memory operations.
All that is trivially achievable in C++ - create an uninitialized struct (well, by default all Vulkan objects are initialized with VK_NULL_HANDLE), start to fill it field-by-field, if something went wrong just destroy the objects that are not null.
You could theoretically do the same in Rust, however this is quite unsafe and makes a lot of assumptions about the construction of the ash types.
If the functions didn't depend on each other, I might suggest something like this:
let instance = create_instance();
let device = create_device();
let surface = create_surface();
match (instance, device, surface) {
(Ok(instance), Ok(device), Ok(surface)) => {
Ok(VulkanData{
instance,
device,
surface,
})
}
(instance, device, surface) => {
// clean up the `Ok` ones and return some error
}
}
However, your functions are dependent on others succeeding (e.g. need the Instance to create a Device) and this also has the disadvantage that it would keep creating values when one already failed.
Creating wrappers with custom drop behavior is the most robust way to accomplish this. There is the vulkano crate that is built on top of ash that does this among other things. But if that's not to your liking you can use something like scopeguard to encapsulate drop logic on the fly.
use scopeguard::{guard, ScopeGuard}; // 1.1.0
fn init() -> Result<VulkanData, Error> {
let instance = guard(create_instance()?, destroy_instance);
let device = guard(create_device(&instance)?, destroy_device);
let surface = guard(create_surface(&device)?, destroy_surface);
Ok(VulkanData {
// use `into_inner` to escape the drop behavior
instance: ScopeGuard::into_inner(instance),
device: ScopeGuard::into_inner(device),
surface: ScopeGuard::into_inner(surface),
})
}
See a full example on the playground. No unsafe required.
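As a dependency-free illustration of the same pattern, here is a sketch that replaces `scopeguard` with a tiny hand-rolled `Guard` type (hypothetical, not the crate's actual implementation) and uses made-up `Instance`/`Device`/`Surface` types and `create_*` functions instead of the real ash ones. A failure in `create_surface` runs the cleanup for the device and then the instance (reverse creation order), while a successful run disarms every guard.

```rust
use std::cell::RefCell;
use std::ops::Deref;

// Hand-rolled stand-in for `scopeguard::guard` (illustration only, not the crate's code).
struct Guard<T, F: FnOnce(T)> {
    value: Option<T>,
    on_drop: Option<F>,
}

impl<T, F: FnOnce(T)> Guard<T, F> {
    fn new(value: T, on_drop: F) -> Self {
        Guard { value: Some(value), on_drop: Some(on_drop) }
    }
    // Disarms the guard and returns the value, mirroring `ScopeGuard::into_inner`.
    fn into_inner(mut guard: Self) -> T {
        guard.on_drop = None;
        guard.value.take().expect("value is present until drop")
    }
}

impl<T, F: FnOnce(T)> Deref for Guard<T, F> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value.as_ref().expect("value is present until drop")
    }
}

impl<T, F: FnOnce(T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        // Runs the cleanup only if `into_inner` was never called.
        if let (Some(value), Some(on_drop)) = (self.value.take(), self.on_drop.take()) {
            on_drop(value);
        }
    }
}

// Made-up resource types standing in for the ash handles.
#[derive(Debug)]
struct Instance(u32);
#[derive(Debug)]
struct Device(u32);
#[derive(Debug)]
struct Surface(u32);

#[derive(Debug)]
struct VulkanData {
    instance: Instance,
    device: Device,
    surface: Surface,
}

fn create_instance() -> Result<Instance, String> { Ok(Instance(1)) }
fn create_device(instance: &Instance) -> Result<Device, String> { Ok(Device(instance.0 + 1)) }
// `fail` simulates a creation error.
fn create_surface(device: &Device, fail: bool) -> Result<Surface, String> {
    if fail { Err("surface creation failed".to_string()) } else { Ok(Surface(device.0 + 1)) }
}

// `log` records which cleanup closures ran, so the behaviour is observable.
fn init(log: &RefCell<Vec<&'static str>>, fail: bool) -> Result<VulkanData, String> {
    let instance = Guard::new(create_instance()?, |_: Instance| log.borrow_mut().push("destroy instance"));
    let device = Guard::new(create_device(&instance)?, |_: Device| log.borrow_mut().push("destroy device"));
    // If this `?` returns early, `device` and then `instance` are dropped,
    // running their cleanup in reverse order of creation.
    let surface = Guard::new(create_surface(&device, fail)?, |_: Surface| log.borrow_mut().push("destroy surface"));
    Ok(VulkanData {
        instance: Guard::into_inner(instance),
        device: Guard::into_inner(device),
        surface: Guard::into_inner(surface),
    })
}

fn main() {
    let log = RefCell::new(Vec::new());
    let result = init(&log, true);
    println!("{:?} {:?}", result.map(|_| ()), log.borrow());
    // Err("surface creation failed") ["destroy device", "destroy instance"]

    let log = RefCell::new(Vec::new());
    let data = init(&log, false).expect("every step succeeds");
    println!("{:?} {:?}", data, log.borrow()); // no cleanup ran, so the log is empty
}
```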

I believe MaybeUninit is designed for cases where you have all the information about its contents and can make the code safe "by hand".
If you need to figure out at runtime whether a field has a value, then use Option<T>.
Per the documentation:
You can think of MaybeUninit<T> as being a bit like Option<T> but without any of the run-time tracking and without any of the safety checks.
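A minimal sketch of that `Option`-based approach, with made-up `Instance`/`Device` types and a hypothetical `PartialVulkanData` builder: every field starts as `None`, so "which fields are initialized?" becomes an ordinary runtime check, and the final struct without `Option`s is only produced once every field is present.

```rust
// Made-up resource types standing in for the real Vulkan handles.
#[derive(Debug, PartialEq)]
struct Instance(u32);
#[derive(Debug, PartialEq)]
struct Device(u32);

#[derive(Debug, PartialEq)]
struct VulkanData {
    instance: Instance,
    device: Device,
}

// Hypothetical builder: `None` means "not created yet".
#[derive(Default)]
struct PartialVulkanData {
    instance: Option<Instance>,
    device: Option<Device>,
}

impl PartialVulkanData {
    // Destroys whatever exists so far, newest first. `log` stands in for real destroy calls.
    fn cleanup(&mut self, log: &mut Vec<String>) {
        if let Some(device) = self.device.take() {
            log.push(format!("destroy device {}", device.0));
        }
        if let Some(instance) = self.instance.take() {
            log.push(format!("destroy instance {}", instance.0));
        }
    }

    // Produces the final struct, or hands the incomplete builder back for cleanup.
    fn finish(self) -> Result<VulkanData, Self> {
        match self {
            PartialVulkanData { instance: Some(instance), device: Some(device) } => {
                Ok(VulkanData { instance, device })
            }
            incomplete => Err(incomplete),
        }
    }
}

// `fail_device` simulates the second creation step failing.
fn init(fail_device: bool, log: &mut Vec<String>) -> Result<VulkanData, String> {
    let mut partial = PartialVulkanData::default();
    partial.instance = Some(Instance(1));
    if fail_device {
        partial.cleanup(log);
        return Err("device creation failed".to_string());
    }
    partial.device = Some(Device(2));
    match partial.finish() {
        Ok(data) => Ok(data),
        Err(mut incomplete) => {
            incomplete.cleanup(log);
            Err("missing field".to_string())
        }
    }
}

fn main() {
    let mut log = Vec::new();
    let result = init(true, &mut log);
    println!("{:?} {:?}", result, log);
    // Err("device creation failed") ["destroy instance 1"]
}
```

The trade-off is exactly the one the asker mentions: one discriminant check per field, in exchange for not needing `unsafe`.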

How to deal with Result<T,E> inside the closure function in Rust?

I have a nested object in mongodb that I want to deserialize/decode back into a struct.
Here's the structure:
pub struct ItemUnit {
/// unit's name like PCS, DOZEN, PACK, etc
pub name: String,
/// denote this unit value
pub multiplier: f64,
/// item's price in this unit
pub price: Option<f64>,
}
pub struct Item {
/// item's name
pub name: String,
/// available unit types available for this item, in an array format
pub units: Vec<ItemUnit>,
}
As you can see, the struct is nested and units is an array (Vec<ItemUnit>).
As Rust has a culture of "not letting you get away with possible errors", this quickly becomes tricky. In the Rust mongodb driver, you have to deserialize/decode values from a 'bson' document back into "native types" such as String, f64, and so on. Each deserialize operation returns a Result<> since it can fail.
I have a problem with nested object with an array of another object in it:
// doc is the data from a single `Item`
// since `get_array` might throw a ValueAccess error, I map it into my own custom Error type
// so `units_bson` is now a `Vec<mongodb::bson::Bson>` type
let units_bson = doc.get_array("units").map_err(Self::map_mongodb_err)?;
// in the next step, I need to decode the mongodb::bson::Bson for each element in this array
// here I use map() with a closure
let units: Vec<ItemUnit> = units_bson.iter().map(|u| {
// get_str() returns Result<T,E> but since ItemUnit only accepts a string not Result<> type
// I had to handle the error possibilities here in the closure/map function
// but the ? operator only available on a function that returns `Result` or `Option`
// (or another type that implements `Try`)
let name = u.as_document().unwrap().get_str("name").map_err(Self::map_mongodb_err);
return ItemUnit { name, multiplier, p_buy };
}).collect();
So my question is: how do you catch an error inside a closure?
Or is there any other workaround, like a try-catch block, that can catch any error inside a closure?

The first thing you need to do is propagate the error into the Vec.
units_bson.iter().map(|u| {
// This question-mark returns from the closure, not the whole function.
let name = code_and_stuff.map_err(...)?;
// We're returning Result values now, so wrap the value in Ok.
Ok(ItemUnit { name, ... })
});
At this point, we have an iterator over values of type Result<ItemUnit, SomeError>. If we were to call collect now, naively we might expect to get a Vec<Result<ItemUnit, SomeError>>, and indeed that is one correct result. But we can also request a Result<Vec<ItemUnit>, SomeError>, because Result<V, E> implements FromIterator<Result<A, E>> (for any V that implements FromIterator<A>) for exactly this use case.
units_bson.iter().map(|u| {
// This question-mark returns from the closure, not the whole function.
let name = code_and_stuff.map_err(...)?;
// We're returning Result values now, so wrap the value in Ok.
Ok(ItemUnit { name, ... })
}).collect::<Result<Vec<_>, _>>()?;
Then, with one more question mark after our collect call, we propagate the error to our outer function's result.
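Here is a self-contained version of the same pattern. Since the mongodb/bson types aren't available in a standalone snippet, the sketch substitutes plain `(name, multiplier)` string pairs for the BSON documents and `str::parse::<f64>` for the fallible decode step; `parse_units` and `total_multiplier` are hypothetical names.

```rust
use std::num::ParseFloatError;

#[derive(Debug, PartialEq)]
struct ItemUnit {
    name: String,
    multiplier: f64,
}

// Stand-in for decoding the BSON array: each unit is (name, multiplier as text).
fn parse_units(raw: &[(&str, &str)]) -> Result<Vec<ItemUnit>, ParseFloatError> {
    raw.iter()
        .map(|&(name, multiplier)| -> Result<ItemUnit, ParseFloatError> {
            // This `?` returns from the closure, producing an `Err` item.
            let multiplier = multiplier.parse::<f64>()?;
            Ok(ItemUnit { name: name.to_string(), multiplier })
        })
        // Collecting into `Result<Vec<_>, _>` stops at the first `Err` and returns it.
        .collect()
}

// The outer `?` propagates that error to this function's caller.
fn total_multiplier(raw: &[(&str, &str)]) -> Result<f64, ParseFloatError> {
    let units = parse_units(raw)?;
    Ok(units.iter().map(|u| u.multiplier).sum::<f64>())
}

fn main() {
    println!("{:?}", total_multiplier(&[("PCS", "1"), ("DOZEN", "12")])); // Ok(13.0)
    println!("{}", total_multiplier(&[("PCS", "1"), ("PACK", "six")]).is_err()); // true
}
```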
Welcome to the wonderful world of Rust error values. I promise this sort of thing becomes second nature the more you do it; you won't even notice yourself adding the little question marks in a handful of places.

How can I simultaneously iterate over a Rust HashMap and modify some of its values?

I'm trying Advent of Code in Rust this year, as a way of learning the language. I've parsed the input (from day 7) into the following structure:
struct Process {
name: String,
weight: u32,
children: Vec<String>,
parent: Option<String>
}
These are stored in a HashMap<String, Process>. Now I want to iterate over the values in the map and update the parent values, based on what I find in the parent's "children" vector.
What doesn't work is
for p in self.processes.values() {
for child_name in p.children {
let mut child = self.processes.get_mut(child_name).expect("Child not found.");
child.parent = p.name;
}
}
I can't have both a mutable reference to the HashMap (self.processes) and a non-mutable reference, or two mutable references.
So, what is the most idiomatic way to accomplish this in Rust? The two options I can see are:
Copy the parent/child relationships into a new temporary data structure in one pass, and then update the Process structs in a second pass, after the immutable reference is out of scope.
Change my data structure to put "parent" in its own HashMap.
Is there a third option?
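For comparison, the first option from the list can be sketched like this: a standalone function over a plain `HashMap<String, Process>`, with the `weight` field omitted for brevity and `update_parents`/`new_process` as hypothetical names. The first pass only reads the map, and its borrow ends before the second pass calls `get_mut`.

```rust
use std::collections::HashMap;

struct Process {
    name: String,
    children: Vec<String>,
    parent: Option<String>,
}

fn new_process(name: &str, children: &[&str]) -> Process {
    Process {
        name: name.to_string(),
        children: children.iter().map(|c| c.to_string()).collect(),
        parent: None,
    }
}

fn update_parents(processes: &mut HashMap<String, Process>) {
    // Pass 1: shared borrow only; copy out (child, parent) name pairs.
    let links: Vec<(String, String)> = processes
        .values()
        .flat_map(|p| p.children.iter().map(move |c| (c.clone(), p.name.clone())))
        .collect();

    // Pass 2: the shared borrow has ended, so mutable lookups are allowed.
    for (child_name, parent_name) in links {
        let child = processes.get_mut(&child_name).expect("Child not found.");
        child.parent = Some(parent_name);
    }
}

fn main() {
    let mut processes = HashMap::new();
    processes.insert("a".to_string(), new_process("a", &["b", "c"]));
    processes.insert("b".to_string(), new_process("b", &[]));
    processes.insert("c".to_string(), new_process("c", &[]));
    update_parents(&mut processes);
    println!("{:?}", processes["b"].parent); // Some("a")
}
```

The cost is the temporary `Vec` of cloned names; the benefit is that no runtime borrow checks are involved.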

Yes, you can grant interior mutability to the HashMap's values using RefCell:
use std::cell::RefCell;
use std::collections::HashMap;
struct ProcessTree {
processes: HashMap<String, RefCell<Process>>, // change #1
}
impl ProcessTree {
fn update_parents(&self) {
for p in self.processes.values() {
let p = p.borrow(); // change #2
for child_name in &p.children {
let mut child = self.processes
.get(child_name) // change #3
.expect("Child not found.")
.borrow_mut(); // change #4
child.parent = Some(p.name.clone());
}
}
}
}
borrow_mut will panic at runtime if the child is already borrowed with borrow. This happens if a process is its own parent (which should presumably never happen, but in a more robust program you'd want to give a meaningful error message instead of just panicking).
I invented some names and made a few small changes (besides the ones specifically indicated) to make this code compile. Notably, p.name.clone() makes a full copy of p.name. This is necessary because both name and parent are owned Strings.
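A sketch of the more robust variant mentioned above, as a complete program: `RefCell::try_borrow_mut` returns an `Err` instead of panicking when the value is already borrowed, so a process that lists itself as a child becomes an error message rather than a panic. `try_update_parents` and the `process` helper are hypothetical names, and `weight` is omitted for brevity.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

struct Process {
    name: String,
    children: Vec<String>,
    parent: Option<String>,
}

struct ProcessTree {
    processes: HashMap<String, RefCell<Process>>,
}

impl ProcessTree {
    fn try_update_parents(&self) -> Result<(), String> {
        for cell in self.processes.values() {
            let p = cell.borrow();
            for child_name in &p.children {
                let child_cell = self
                    .processes
                    .get(child_name)
                    .ok_or_else(|| format!("child {child_name} not found"))?;
                // Fails instead of panicking if `child_cell` is `p`'s own cell.
                let mut child = child_cell
                    .try_borrow_mut()
                    .map_err(|_| format!("{} lists itself as a child", p.name))?;
                child.parent = Some(p.name.clone());
            }
        }
        Ok(())
    }
}

// Helper that builds one map entry.
fn process(name: &str, children: &[&str]) -> (String, RefCell<Process>) {
    let p = Process {
        name: name.to_string(),
        children: children.iter().map(|c| c.to_string()).collect(),
        parent: None,
    };
    (name.to_string(), RefCell::new(p))
}

fn main() {
    let tree = ProcessTree {
        processes: HashMap::from([process("a", &["b", "c"]), process("b", &[]), process("c", &[])]),
    };
    println!("{:?}", tree.try_update_parents()); // Ok(())
    println!("{:?}", tree.processes["b"].borrow().parent); // Some("a")

    let looped = ProcessTree { processes: HashMap::from([process("x", &["x"])]) };
    println!("{:?}", looped.try_update_parents()); // Err("x lists itself as a child")
}
```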
