Adding a new key to each element of vector in Rust

I have a vector of structures. I want to add one additional field to each element. What's the best way to do that?
Something like this:
// Pseudo code
let items = vec![elem1, elem2, elem3, elem4];
for x in items {
// Something like this
x["some_additional_key"] = get_data(x);
}
//
// Now I have items[i].some_additional_key in each element

Rust is a statically-typed language; you may be familiar with other similar languages like C++, Java or Swift. In these languages, the members, types, and layout of a struct are fixed when the program is compiled.
Because of this, there's no way to add a new struct field at runtime — no "ifs", "ands", or "buts" — you can't do it.
Instead, you have to model that dynamic nature some other way:
Use a type that allows for arbitrary expansion. HashMap and BTreeMap (and many other similar types) allow you to have an arbitrary number of key-value pairs. Under the hood, this is basically how many dynamic languages work - a mapping of strings to arbitrary values:
use std::collections::HashMap;
#[derive(Debug, Default)]
struct Element(HashMap<String, u8>);
fn get_data(_: &Element) -> u8 {
42
}
fn main() {
let mut items = vec![
Element::default(),
Element::default(),
Element::default(),
Element::default(),
];
for x in &mut items {
let value = get_data(x);
x.0
.entry("some_additional_key".to_string())
.or_insert(value);
}
}
Use a type that allows for specific expansion. Option allows for a value to be present or not:
#[derive(Debug, Default)]
struct Element {
some_additional_key: Option<u8>,
}
fn get_data(_: &Element) -> u8 {
42
}
fn main() {
let mut items = vec![
Element::default(),
Element::default(),
Element::default(),
Element::default(),
];
for x in &mut items {
let value = get_data(x);
x.some_additional_key = Some(value);
}
}
Use composition. Create a new type that wraps your existing type:
#[derive(Debug, Default)]
struct Element;
#[derive(Debug)]
struct EnhancedElement {
element: Element,
some_additional_key: u8,
}
fn get_data(_: &Element) -> u8 {
42
}
fn main() {
let items = vec![
Element::default(),
Element::default(),
Element::default(),
Element::default(),
];
let enhanced: Vec<_> = items
.into_iter()
.map(|element| {
let some_additional_key = get_data(&element);
EnhancedElement {
element,
some_additional_key,
}
})
.collect();
}
See also:
How to lookup from and insert into a HashMap efficiently?
Update value in mutable HashMap

Related

Prehash a struct

I have struct with many fields that I want to use as a key in a HashMap.
I often need to use the struct multiple times to access different HashMaps, and I don't want to compute the hash (and possibly clone the key) each time. The program needs to be as performant as possible, and it will access the HashMaps billions of times, so the cost really adds up.
Here is a simplified example:
use std::collections::HashMap;
#[derive(Hash, Eq, PartialEq, Clone)]
struct KeyStruct {
field1: usize,
field2: bool,
}
fn main() {
// This is what I'm doing now
let key = KeyStruct { field1: 1, field2: true };
// This is what I'd like to do
// let key = key.get_hash()
let mut map1 = HashMap::new();
let mut map2 = HashMap::new();
let mut map3 = HashMap::new();
let mut map4 = HashMap::new();
if !map1.contains_key(&key) {
map1.insert(key.clone(), 1);
}
if !map2.contains_key(&key) {
map2.insert(key.clone(), 2);
}
if !map3.contains_key(&key) {
map3.insert(key.clone(), 3);
}
if !map4.contains_key(&key) {
map4.insert(key.clone(), 4);
}
}
I never actually use the values in the KeyStruct, I just want to use it as a key to the HashMaps. I would like to avoid hashing it multiple times and cloning it like is done in that example.
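One way to approach this, sketched under stated assumptions: hash the key once, keep the result next to the key, and let the wrapper's Hash impl feed only that cached u64 to each map. `Prehashed`, its fields, and `insert_if_absent` are hypothetical names that do not appear in the question, and sharing the key through `Rc` is an extra assumption that turns `clone()` into a reference-count bump instead of a deep copy. Each map still runs its own hasher over the cached `u64`, which is cheaper than hashing every field but not free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

#[derive(Hash, Eq, PartialEq, Clone, Debug)]
struct KeyStruct {
    field1: usize,
    field2: bool,
}

/// Hypothetical wrapper: stores the hash computed once at construction
/// and shares the underlying key through an Rc so clones are cheap.
#[derive(Clone, Debug)]
struct Prehashed {
    hash: u64,
    key: Rc<KeyStruct>,
}

impl Prehashed {
    fn new(key: KeyStruct) -> Self {
        // All hashers created by DefaultHasher::new() behave the same,
        // so equal keys always get equal cached hashes.
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        Prehashed { hash: hasher.finish(), key: Rc::new(key) }
    }
}

impl PartialEq for Prehashed {
    fn eq(&self, other: &Self) -> bool {
        // Compare the cheap hash first, then the full key to rule out collisions.
        self.hash == other.hash && self.key == other.key
    }
}
impl Eq for Prehashed {}

impl Hash for Prehashed {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed only the cached value instead of re-hashing every field.
        state.write_u64(self.hash);
    }
}

/// Inserts `value` only if `key` is absent, using a single lookup via the entry API.
fn insert_if_absent(map: &mut HashMap<Prehashed, i32>, key: &Prehashed, value: i32) {
    map.entry(key.clone()).or_insert(value);
}
```

The entry API replaces the `contains_key` plus `insert` pair from the question, so each map is probed once per call rather than twice.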

Initialize a Vec with not-None values only

If I have variables like this:
let a: u32 = ...;
let b: Option<u32> = ...;
let c: u32 = ...;
what is the shortest way to make a vector of those values, so that b is only included if it's Some?
In other words, is there something simpler than this:
let v = match b {
None => vec![a, c],
Some(x) => vec![a, x, c],
};
P.S. I would prefer a solution where we don't need to use the variables more than once. Consider this example:
let some_person: String = ...;
let best_man: Option<String> = ...;
let a_third_person: &str = ...;
let another_opt: Option<String> = ...;
...
As can be seen, we might have to use longer variable names, more than one Option (None), expressions (like a_third_person.to_string()), etc.
Yours is fine, but here's a sophisticated one:
[Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>()
This works since Option impls IntoIterator.
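To make the flatten approach concrete for the longer example from the question, here is a small sketch. The function name `guest_list` is hypothetical, and `IntoIterator::into_iter(array)` is spelled out so the array is consumed by value regardless of the edition in use.

```rust
/// Builds the vector in order, skipping every `None`.
/// Each variable is mentioned exactly once.
fn guest_list(
    some_person: String,
    best_man: Option<String>,
    a_third_person: &str,
    another_opt: Option<String>,
) -> Vec<String> {
    // Wrap the mandatory values in Some so every slot has the same type.
    let slots = [
        Some(some_person),
        best_man,
        Some(a_third_person.to_string()),
        another_opt,
    ];
    // Option implements IntoIterator (zero or one item), so flatten() drops the Nones.
    IntoIterator::into_iter(slots).flatten().collect()
}
```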
If it depends on just one variable:
b.map(|b| vec![a, b, c]).unwrap_or_else(|| vec![a, c]);
Playground
After some thinking and investigating, I've come up with the following crazy thing.
The end goal is a macro, optional_vec![], that accepts either T or Option<T> for each element and behaves as described in the question. However, I decided on a strong restriction: it should have the best performance possible. So, you write:
optional_vec![a, b, c]
And get at least the performance of a hand-written match, if not better. This rules out the simple [Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>() suggested in my other answer (though even that solution needs some way to differentiate between Option<T> and plain T, which, as we'll see, is not an easy problem at all).
I will first warn that I've not found a way to make my macro work when the element type is itself an Option. That is, if you want to build a vector of Option<T> from Option<T> and Option<Option<T>>, it will not work.
When I design a complex macro, I like to think first about what the expanded code will look like. And in this macro, we have several hard problems to solve.
First, the macro takes plain expressions. But somehow, it needs to switch on their type being T or Option<T>. How can such a thing be done?
The feature we use to do such things is specialization.
#![feature(specialization)]
pub trait Optional {
fn some_method(self);
}
impl<T> Optional for T {
default fn some_method(self) {
// Just T
}
}
impl<T> Optional for Option<T> {
fn some_method(self) {
// Option<T>
}
}
As you probably noticed, now we have two problems: first, specialization is unstable, and I'd like to stay on stable. Second, what should be inside the trait? The second problem is easier to solve, so let's begin with it.
Turns out that the most performant way to do the pushing to the vector is to pre-allocate capacity (Vec::with_capacity), write to the vector by using pointers (don't push(), it optimizes badly!) then set the length (Vec::set_len()).
We can get a pointer to the internal buffer of the vector using Vec::as_mut_ptr(), and advance the pointer via <*mut T>::add(1).
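Written out by hand for one fixed case, the technique from the last two paragraphs might look like the sketch below. The function name `build_vec` and the choice of `u32` are assumptions made for illustration; the macro developed in this answer generalizes the same steps.

```rust
/// Builds [a, b?, c] by writing through a raw pointer into pre-allocated storage.
fn build_vec(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
    // Two mandatory elements plus one more if `b` is Some.
    let len = 2 + b.is_some() as usize;
    let mut result: Vec<u32> = Vec::with_capacity(len);
    let mut next = result.as_mut_ptr();
    unsafe {
        // SAFETY: the buffer holds at least `len` elements, and at most `len`
        // writes happen below, each to a distinct, in-bounds slot.
        next.write(a);
        next = next.add(1);
        if let Some(b) = b {
            next.write(b);
            next = next.add(1);
        }
        next.write(c);
        // SAFETY: exactly `len` elements have been initialized.
        result.set_len(len);
    }
    result
}
```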
So, we need two methods: one to hint us about the capacity (can be zero for None or one for Some() and non-Option elements), and a write_and_advance() method:
pub trait Optional {
type Item;
fn len(&self) -> usize;
unsafe fn write_and_advance(self, place: &mut *mut Self::Item);
}
impl<T> Optional for T {
default type Item = Self;
default fn len(&self) -> usize { 1 }
default unsafe fn write_and_advance(self, place: &mut *mut Self) {
place.write(self);
*place = place.add(1);
}
}
impl<T> Optional for Option<T> {
type Item = T;
fn len(&self) -> usize { self.is_some() as usize }
unsafe fn write_and_advance(self, place: &mut *mut T) {
if let Some(value) = self {
place.write(value);
*place = place.add(1);
}
}
}
It doesn't even compile! For the reason why, see Mismatch between associated type and type parameter only when impl is marked `default`. Luckily for us, the trick we'll use to work around specialization not being stable does work in this situation. But for now, let's assume it works. What will the code using this trait look like?
match (a, b, c) { // The match is here because it's the best binding for lifetimes: see https://stackoverflow.com/a/54855986/7884305
(a, b, c) => {
let len = Optional::len(&a) + Optional::len(&b) + Optional::len(&c);
let mut result = ::std::vec::Vec::with_capacity(len);
let mut next_element = result.as_mut_ptr();
unsafe {
Optional::write_and_advance(a, &mut next_element);
Optional::write_and_advance(b, &mut next_element);
Optional::write_and_advance(c, &mut next_element);
result.set_len(len);
}
result
}
}
And it works! Except that it does not, because the specialization does not compile as I said, and we also want to not repeat all of this boilerplate but insert it into a macro.
So, how do we solve the problems with specialization: being unstable and not working?
dtolnay has a very cool trick he calls autoref specialization (BTW, the whole repo is highly recommended reading!). This is a trick that can be used to emulate specialization. It works only in macros, but we're in a macro so this is fine.
I will not elaborate on the trick here (I recommend reading his post; he also used this trick in the excellent and very widely used anyhow crate). In short, the idea is to trick the typechecker by implementing a trait for T under certain conditions (the specialized impl) and another trait for &T for the general case (this could be an inherent impl if not for coherence). Since Rust performs automatic referencing during method resolution, that is, it takes a reference to the receiver as needed, this works: the typechecker will autoref if needed and will stop at the first applicable impl, i.e. the specialized impl if it matches, or the general impl otherwise.
Here's an example:
use std::fmt;
pub trait Display {
fn foo(&self);
}
// Level 1
impl<T: fmt::Display> Display for T {
fn foo(&self) { println!("Display({}), {}", std::any::type_name::<T>(), self); }
}
pub trait Debug {
fn foo(&self);
}
// Level 2
impl<T: fmt::Debug> Debug for &T {
fn foo(&self) { println!("Debug({}), {:?}", std::any::type_name::<T>(), self); }
}
macro_rules! foo {
($e:expr) => ((&$e).foo());
}
Playground.
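To see which impl method resolution picks, here is a variation of the example that returns a String instead of printing, so the dispatch can be checked. The names `ViaDisplay`, `ViaDebug`, and `describe!` are hypothetical and chosen only for this sketch.

```rust
use std::fmt;

// Level 1 (preferred): implemented for T itself, found without an extra autoref.
trait ViaDisplay {
    fn describe(&self) -> String;
}
impl<T: fmt::Display> ViaDisplay for T {
    fn describe(&self) -> String {
        format!("Display: {}", self)
    }
}

// Level 2 (fallback): implemented for &T, found only after one more autoref.
trait ViaDebug {
    fn describe(&self) -> String;
}
impl<T: fmt::Debug> ViaDebug for &T {
    fn describe(&self) -> String {
        format!("Debug: {:?}", self)
    }
}

// The explicit `&` gives method resolution one reference level to work with.
macro_rules! describe {
    ($e:expr) => {
        (&$e).describe()
    };
}
```

A `u8` implements both Display and Debug, and the Display impl wins because it matches first; a `Vec<u8>` implements only Debug, so resolution falls through to the `&T` impl.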
We can use this trick in our case:
#[doc(hidden)]
pub mod autoref_specialization {
#[derive(Copy, Clone)]
pub struct OptionTag;
pub trait OptionKind {
fn optional_kind(&self) -> OptionTag;
}
impl<T> OptionKind for Option<T> {
#[inline(always)]
fn optional_kind(&self) -> OptionTag { OptionTag }
}
impl OptionTag {
#[inline(always)]
pub fn len<T>(self, this: &Option<T>) -> usize { this.is_some() as usize }
#[inline(always)]
pub unsafe fn write_and_advance<T>(self, this: Option<T>, place: &mut *mut T) {
if let Some(value) = this {
place.write(value);
*place = place.add(1);
}
}
}
#[derive(Copy, Clone)]
pub struct DefaultTag;
pub trait DefaultKind {
fn optional_kind(&self) -> DefaultTag;
}
impl<T> DefaultKind for &'_ T {
#[inline(always)]
fn optional_kind(&self) -> DefaultTag { DefaultTag }
}
impl DefaultTag {
#[inline(always)]
pub fn len<T>(self, _this: &T) -> usize { 1 }
#[inline(always)]
pub unsafe fn write_and_advance<T>(self, this: T, place: &mut *mut T) {
place.write(this);
*place = place.add(1);
}
}
}
And the expanded code will look like:
use autoref_specialization::{DefaultKind as _, OptionKind as _};
match (a, b, c) {
(a, b, c) => {
let (a_tag, b_tag, c_tag) = (
(&a).optional_kind(),
(&b).optional_kind(),
(&c).optional_kind(),
);
let len = a_tag.len(&a) + b_tag.len(&b) + c_tag.len(&c);
let mut result = ::std::vec::Vec::with_capacity(len);
let mut next_element = result.as_mut_ptr();
unsafe {
a_tag.write_and_advance(a, &mut next_element);
b_tag.write_and_advance(b, &mut next_element);
c_tag.write_and_advance(c, &mut next_element);
result.set_len(len);
}
result
}
}
It may be tempting to try to convert this immediately into a macro, but we still have one unsolved problem: our macro needs to generate identifiers. This may not be obvious, but what if we pass optional_vec![1, Some(2), 3]? We need to generate the bindings for the match (in our case, (a, b, c) => ...) and the tag names ((a_tag, b_tag, c_tag)).
Unfortunately, generating names is not something macro_rules! can do in today's Rust. Fortunately, there is an excellent crate, paste (another one from dtolnay!), a small proc-macro that allows you to do that. It is even available on the playground!
However, we need a series of identifiers. That can be done with tt-munching, by repeatedly adding some letter (I used a), so you get a, aa, aaa, ... you get the idea.
#[doc(hidden)]
pub mod reexports {
pub use std::vec::Vec;
pub use paste::paste;
}
#[macro_export]
macro_rules! optional_vec {
// Empty case
{ #generate_idents
exprs = []
processed_exprs = [$($e:expr,)*]
match_bindings = [$($binding:ident)*]
tags = [$($tag:ident)*]
} => {{
use $crate::autoref_specialization::{DefaultKind as _, OptionKind as _};
match ($($e,)*) {
($($binding,)*) => {
let ($($tag,)*) = (
$((&$binding).optional_kind(),)*
);
let len = 0 $(+ $tag.len(&$binding))*;
let mut result = $crate::reexports::Vec::with_capacity(len);
let mut next_element = result.as_mut_ptr();
unsafe {
$($tag.write_and_advance($binding, &mut next_element);)*
result.set_len(len);
}
result
}
}
}};
{ #generate_idents
exprs = [$e:expr, $($rest:expr,)*]
processed_exprs = [$($processed_exprs:tt)*]
match_bindings = [$first_binding:ident $($bindings:ident)*]
tags = [$($tags:ident)*]
} => {
$crate::reexports::paste! {
$crate::optional_vec! { #generate_idents
exprs = [$($rest,)*]
processed_exprs = [$($processed_exprs)* $e,]
match_bindings = [
[< $first_binding a >]
$first_binding
$($bindings)*
]
tags = [
[< $first_binding a_tag >]
$($tags)*
]
}
}
};
// Entry
[$e:expr $(, $exprs:expr)* $(,)?] => {
$crate::optional_vec! { #generate_idents
exprs = [$($exprs,)+]
processed_exprs = [$e,]
match_bindings = [__optional_vec_a]
tags = [__optional_vec_a_tag]
}
};
}
Playground.
I can also personally recommend
let mut v = vec![a, c];
v.extend(b);
Short and clear.
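Note that `vec![a, c]` followed by `extend(b)` places b's value last. If the original order matters, the same idea still works by extending in the middle, as in this small sketch (the function name `ordered` is hypothetical):

```rust
/// Builds [a, b?, c], keeping b between a and c when it is present.
fn ordered(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
    let mut v = Vec::with_capacity(3);
    v.push(a);
    // Option implements IntoIterator, so extend() adds b's value if it is Some
    // and adds nothing if it is None.
    v.extend(b);
    v.push(c);
    v
}
```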
Sometimes the straightforward solution is the best:
fn jim_power(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
let mut acc = Vec::with_capacity(3);
acc.push(a);
if let Some(b) = b {
acc.push(b);
}
acc.push(c);
acc
}
fn ys_iii(
some_person: String,
best_man: Option<String>,
a_third_person: String,
another_opt: Option<String>,
) -> Vec<String> {
let mut acc = Vec::with_capacity(4);
acc.push(some_person);
// Option implements IntoIterator, so extend() adds zero or one element
acc.extend(best_man);
acc.push(a_third_person);
acc.extend(another_opt);
acc
}
If you don't care about the order of the values, another option is
Iterator::chain(
[a, c].into_iter(),
[b].into_iter().flatten()
).collect()
Playground

Storing an iterator for a HashMap in a struct

Edit
As the suggested solution indicates, what I'm trying to achieve seems to be impossible or not the correct approach, so I'll explain the end goal here:
I am parsing the values for Foo from a YAML file using serde, and I would like to let the user get those stored values from the YAML one at a time, which is why I wanted to store an iterator in my struct.
I have two structs similar to the following:
struct Bar {
name: String,
id: u32
}
struct Foo {
my_map: HashMap<String, Bar>
}
In my Foo struct, I wish to store an iterator to my HashMap, so a user can borrow values from my map on demand.
Theoretically, the full Foo struct would look something like:
struct Foo {
my_map: HashMap<String, Bar>,
my_map_iter: HashMap<String, Bar>::iterator
}
impl Foo {
fn get_pair(&self) -> Option<(String, Bar)> {
// impl...
}
}
But I can't seem to pull it off and create such a variable, no matter what I try (I get various compilation errors, which suggests I'm going about it the wrong way).
I would be glad if someone could point me to the correct way to achieve that, and if there is a better way to achieve what I'm trying to do, I would like to know about it.
Thank you!
> I am parsing the values for Foo from a YAML file using serde
When you parse them you should put the values in a Vec instead of a HashMap.
I imagine the values you have also have names which is why you thought a HashMap would be good. You could instead store them like so:
let mut parsed: Vec<(String, String)> = Vec::new();
for _ in 0..n_to_parse {
// The first item of the tuple is the name, the second is the value
let key_value = ("Get from".to_string(), "serde".to_string());
parsed.push(key_value);
}
Once the values are stored this way, it is easy to get the pairs by keeping track of the current index:
struct ParsedHolder {
parsed: Vec<(String, String)>,
current_idx: usize,
}
impl ParsedHolder {
fn new(parsed: Vec<(String, String)>) -> Self {
ParsedHolder {
parsed,
current_idx: 0,
}
}
fn get_pair(&mut self) -> Option<&(String, String)> {
if let Some(pair) = self.parsed.get(self.current_idx) {
self.current_idx += 1;
Some(pair)
} else {
self.current_idx = 0;
None
}
}
}
This could be improved further by using a VecDeque, which lets you efficiently take the first element out of parsed. That makes it easy to hand out owned values without cloning. The trade-off is that you can only go through the parsed values once, which I think is actually what you want in your use case.
But I'll let you implement VecDeque 😃
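For readers who want a starting point, a minimal sketch of that VecDeque variant might look like this. `ParsedQueue` and `take_pair` are hypothetical names, and the `(String, String)` pair type is carried over from the example above.

```rust
use std::collections::VecDeque;

/// Hands out each parsed pair exactly once, by value.
struct ParsedQueue {
    parsed: VecDeque<(String, String)>,
}

impl ParsedQueue {
    fn new(parsed: Vec<(String, String)>) -> Self {
        // VecDeque implements From<Vec<T>>, so the conversion keeps the order.
        ParsedQueue { parsed: VecDeque::from(parsed) }
    }

    /// Removes and returns the next pair, or None once everything was handed out.
    fn take_pair(&mut self) -> Option<(String, String)> {
        self.parsed.pop_front()
    }
}
```

Because `pop_front` moves the pair out of the queue, no index bookkeeping and no clone are needed.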
The reason this is hard is that unless we make sure the HashMap isn't mutated while we iterate, we could get into trouble. To make sure the HashMap stays immutable for as long as the iterator lives:
use std::collections::HashMap;
use std::collections::hash_map::Iter;
struct Foo<'a> {
my_map: &'a HashMap<u8, u8>,
iterator: Iter<'a, u8, u8>,
}
fn main() {
let my_map = HashMap::new();
let iterator = my_map.iter();
let f = Foo {
my_map: &my_map,
iterator: iterator,
};
}
If you can make sure or know that the HashMap won't have new keys or keys removed from it (editing values with existing keys is fine) then you can do this:
struct Foo {
my_map: HashMap<String, String>,
current_idx: usize,
}
impl Foo {
fn new(my_map: HashMap<String, String>) -> Self {
Foo {
my_map,
current_idx: 0,
}
}
fn get_pair(&mut self) -> Option<(&String, &String)> {
if let Some(pair) = self.my_map.iter().nth(self.current_idx) {
self.current_idx += 1;
Some(pair)
} else {
self.current_idx = 0;
None
}
}
fn get_pair_cloned(&mut self) -> Option<(String, String)> {
if let Some(pair) = self.my_map.iter().nth(self.current_idx) {
self.current_idx += 1;
Some((pair.0.clone(), pair.1.clone()))
} else {
self.current_idx = 0;
None
}
}
}
This is fairly inefficient though, because each call has to iterate through the map from the start to find the next key.
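If the map is no longer needed for lookups once iteration starts, a different technique avoids both the lifetime parameter and the repeated scanning: consume the map and store its owning iterator. This is a swapped-in approach rather than anything from the answers above; the field name `remaining` is hypothetical, while `Foo`, `Bar`, and `get_pair` follow the question.

```rust
use std::collections::hash_map::IntoIter;
use std::collections::HashMap;

struct Bar {
    name: String,
    id: u32,
}

/// Owns the entries through the map's consuming iterator, so there is
/// no borrow of a sibling field and no lifetime parameter.
struct Foo {
    remaining: IntoIter<String, Bar>,
}

impl Foo {
    fn new(my_map: HashMap<String, Bar>) -> Self {
        Foo { remaining: my_map.into_iter() }
    }

    /// Hands out the next (key, value) pair by value; None when exhausted.
    fn get_pair(&mut self) -> Option<(String, Bar)> {
        self.remaining.next()
    }
}
```

Each call is constant time, but the pairs come out in the map's arbitrary order and can only be visited once.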

How can I create a fixed size array of Strings using constant generics?

I have a function using a constant generic:
fn foo<const S: usize>() -> Vec<[String; S]> {
// Some code
let mut row: [String; S] = Default::default(); // Doesn't compile: Default is only implemented for arrays of up to 32 elements
// Some code
}
How can I create a fixed size array of Strings in my case? let mut row: [String; S] = ["".to_string(); S]; doesn't work because String doesn't implement the Copy trait.
You can do it with MaybeUninit and unsafe:
use std::mem::MaybeUninit;
fn foo<const S: usize>() -> Vec<[String; S]> {
// Some code
let mut row: [String; S] = unsafe {
let mut result = MaybeUninit::uninit();
let start = result.as_mut_ptr() as *mut String;
for pos in 0 .. S {
// SAFETY: safe because loop ensures `start.add(pos)`
// is always on an array element, of type String
start.add(pos).write(String::new());
}
// SAFETY: safe because loop ensures entire array
// has been manually initialised
result.assume_init()
};
// Some code
todo!()
}
Of course, it might be easier to abstract such logic to your own trait:
use std::mem::MaybeUninit;
trait DefaultArray {
fn default_array() -> Self;
}
impl<T: Default, const S: usize> DefaultArray for [T; S] {
fn default_array() -> Self {
let mut result = MaybeUninit::uninit();
let start = result.as_mut_ptr() as *mut T;
unsafe {
for pos in 0 .. S {
// SAFETY: safe because loop ensures `start.add(pos)`
// is always on an array element, of type T
start.add(pos).write(T::default());
}
// SAFETY: safe because loop ensures entire array
// has been manually initialised
result.assume_init()
}
}
}
(The only reason for using your own trait rather than Default is that implementations of the latter would conflict with those provided in the standard library for arrays of up to 32 elements; I wholly expect the standard library to replace its implementation of Default with something similar to the above once const generics have stabilised).
In which case you would now have:
fn foo<const S: usize>() -> Vec<[String; S]> {
// Some code
let mut row: [String; S] = DefaultArray::default_array();
// Some code
todo!()
}
See it on the Playground.
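On newer toolchains there is also a safe alternative that needs no unsafe code: `std::array::from_fn`, stable since Rust 1.63, builds an array by calling a closure once per index. The helper name `default_row` below is hypothetical.

```rust
/// Returns an array of S empty Strings without any unsafe code.
fn default_row<const S: usize>() -> [String; S] {
    // The closure receives the index of each element; it is ignored here.
    std::array::from_fn(|_index| String::new())
}
```

Inside `foo`, the row could then be created with `let mut row: [String; S] = default_row();`.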
At the time of writing, constant generics could not be compiled on stable Rust (a first version of the feature was stabilized later, in Rust 1.51). As @AlexLarionov said, you can try to use procedural macros, but that approach still has its bugs and limitations.
If you need a generic that has to be a number, you can use the num crate, or the more verbose std::num.

How can I return the combination of two borrowed RefCells?

I have a struct with two Vecs wrapped in RefCells. I want to have a method on that struct that combines the two vectors and returns them as a new RefCell or RefMut:
use std::cell::{RefCell, RefMut};
struct World {
positions: RefCell<Vec<Option<Position>>>,
velocities: RefCell<Vec<Option<Velocity>>>,
}
type Position = i32;
type Velocity = i32;
impl World {
pub fn new() -> World {
World {
positions: RefCell::new(vec![Some(1), None, Some(2)]),
velocities: RefCell::new(vec![None, None, Some(1)]),
}
}
pub fn get_pos_vel(&self) -> RefMut<Vec<(Position, Velocity)>> {
let mut poses = self.positions.borrow_mut();
let mut vels = self.velocities.borrow_mut();
poses
.iter_mut()
.zip(vels.iter_mut())
.filter(|(e1, e2)| e1.is_some() && e2.is_some())
.map(|(e1, e2)| (e1.unwrap(), e2.unwrap()))
.for_each(|elem| println!("{:?}", elem));
}
}
fn main() {
let world = World::new();
world.get_pos_vel();
}
How would I return the zipped contents of the vectors as a new RefCell? Is that possible?
I know there is RefMut::map() and I tried to nest two calls to map, but didn't succeed with that.
You want to be able to modify the positions and velocities. If these have to be stored in two separate RefCells, what about side-stepping the problem and using a callback to do the modification?
use std::cell::RefCell;
struct World {
positions: RefCell<Vec<Option<Position>>>,
velocities: RefCell<Vec<Option<Velocity>>>,
}
type Position = i32;
type Velocity = i32;
impl World {
pub fn new() -> World {
World {
positions: RefCell::new(vec![Some(1), None, Some(2)]),
velocities: RefCell::new(vec![None, None, Some(1)]),
}
}
pub fn modify_pos_vel<F: FnMut(&mut Position, &mut Velocity)>(&self, mut f: F) {
let mut poses = self.positions.borrow_mut();
let mut vels = self.velocities.borrow_mut();
poses
.iter_mut()
.zip(vels.iter_mut())
.filter_map(|pair| match pair {
(Some(e1), Some(e2)) => Some((e1, e2)),
_ => None,
})
.for_each(|pair| f(pair.0, pair.1))
}
}
fn main() {
let world = World::new();
world.modify_pos_vel(|position, velocity| {
// Some modification goes here, for example:
*position += *velocity;
});
}
If you want to return a new Vec, then you don't need to wrap it in RefMut or RefCell:
Based on your code with filter and map:
pub fn get_pos_vel(&self) -> Vec<(Position, Velocity)> {
let mut poses = self.positions.borrow_mut();
let mut vels = self.velocities.borrow_mut();
poses.iter_mut()
.zip(vels.iter_mut())
.filter(|(e1, e2)| e1.is_some() && e2.is_some())
.map(|(e1, e2)| (e1.unwrap(), e2.unwrap()))
.collect()
}
Alternative with filter_map:
poses.iter_mut()
.zip(vels.iter_mut())
.filter_map(|pair| match pair {
(Some(e1), Some(e2)) => Some((*e1, *e2)),
_ => None,
})
.collect()
You can wrap it in RefCell with RefCell::new, if you really want to, but I would leave it up to the user of the function to wrap it in whatever they need.
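Since the returned Vec holds copies, shared borrows are enough to build it. The following self-contained sketch assumes the `World` layout from the question; the method name `pos_vel_snapshot` is hypothetical, and `Option::zip` is used as a compact way to keep only the slots where both values are present.

```rust
use std::cell::RefCell;

type Position = i32;
type Velocity = i32;

struct World {
    positions: RefCell<Vec<Option<Position>>>,
    velocities: RefCell<Vec<Option<Velocity>>>,
}

impl World {
    /// Copies out every (position, velocity) pair where both are Some.
    fn pos_vel_snapshot(&self) -> Vec<(Position, Velocity)> {
        // Shared borrows suffice because nothing is modified in place.
        let poses = self.positions.borrow();
        let vels = self.velocities.borrow();
        poses
            .iter()
            .zip(vels.iter())
            // Option::zip yields Some((p, v)) only if both sides are Some.
            .filter_map(|(p, v)| (*p).zip(*v))
            .collect()
    }
}
```

The caller can then wrap the result with `RefCell::new(world.pos_vel_snapshot())` if interior mutability is needed; changes to that copy do not flow back into the world.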
