Efficiently get all items of a vector with a given id - rust

Say I have a vector of items where each item has an id, like in the example below. I can of course get all the items in the vector with a given id using something like large_vector.iter().filter(|item| item.id == given_id). However, for improved performance I can do some preprocessing: sort the vector by item id and store the bounds for each id, like in the example below. This way I can quickly access a slice of the vector for any given id. I end up doing this a lot, but I feel like I am reinventing the wheel and needlessly opening myself up to bugs. Is there a better way to do this directly, preferably using the standard library, or otherwise some other library?
use std::{collections::HashMap, ops::Range};
#[derive(Debug)]
struct Item {
id: String,
val: f64,
}
impl Item {
fn new(id: &str, val: f64) -> Item {
Item { id: id.into(), val }
}
}
fn main() {
let mut large_vector = vec![
Item::new("C", 2.21),
Item::new("A", 34.2),
Item::new("B", 23.54),
Item::new("C", 34.34),
Item::new("C", 45.21),
Item::new("B", 21.34),
];
// first sort by id
large_vector.sort_by(|item1, item2| item1.id.cmp(&item2.id));
dbg!(&large_vector);
// now create a HashMap storing bounds for each id
let mut lookup = HashMap::new();
let mut start: usize = 0;
let mut end: usize = 0;
if let Some(first_item) = large_vector.get(0) {
let mut current_id = first_item.id.clone();
// insert bound if entered new id section or is last item
for item in &large_vector {
if current_id != item.id {
lookup.insert(current_id.clone(), Range { start, end });
current_id = item.id.clone();
start = end;
}
end += 1;
}
lookup.insert(current_id.clone(), Range { start, end });
}
// test by getting the items for a given id
dbg!(&lookup);
let range = lookup.get("C").unwrap();
dbg!(range);
let items = large_vector[range.start..range.end]
.iter()
.collect::<Vec<_>>();
dbg!(items);
}
[src/main.rs:26] &large_vector = [
Item {
id: "A",
val: 34.2,
},
Item {
id: "B",
val: 23.54,
},
Item {
id: "B",
val: 21.34,
},
Item {
id: "C",
val: 2.21,
},
Item {
id: "C",
val: 34.34,
},
Item {
id: "C",
val: 45.21,
},
]
[src/main.rs:47] &lookup = {
"A": 0..1,
"B": 1..3,
"C": 3..6,
}
[src/main.rs:49] range = 3..6
[src/main.rs:53] items = [
Item {
id: "C",
val: 2.21,
},
Item {
id: "C",
val: 34.34,
},
Item {
id: "C",
val: 45.21,
},
]

Assuming that your items have to be in a vector, and you can only sort them, I can think of two possibilities:
1. The solution you proposed. It should be the fastest one for lookup, but has the drawback that the lookup table is completely invalidated every time you insert or remove an item.
2. Keep the vector sorted and perform an O(log n) binary search to find the range for a given id. If you are interested in what I mean by that, I can provide you with some code.
But in general, I think a vector is simply the wrong data structure. I'd try to change that first.
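The binary-search option can be sketched with slice::partition_point from the standard library. This is only an illustration under the assumption that the vector is kept sorted by id; the helper name items_with_id is made up for this example and is not part of the original code.

```rust
#[derive(Debug)]
struct Item {
    id: String,
    val: f64,
}

/// Returns the contiguous run of items whose id equals `id`.
/// Assumption: `sorted` is already sorted by id.
fn items_with_id<'a>(sorted: &'a [Item], id: &str) -> &'a [Item] {
    // Index of the first item whose id is >= `id`.
    let start = sorted.partition_point(|item| item.id.as_str() < id);
    // Index of the first item whose id is > `id`.
    let end = sorted.partition_point(|item| item.id.as_str() <= id);
    &sorted[start..end]
}

fn main() {
    let mut large_vector: Vec<Item> = [
        ("C", 2.21),
        ("A", 34.2),
        ("B", 23.54),
        ("C", 34.34),
        ("C", 45.21),
        ("B", 21.34),
    ]
    .iter()
    .map(|(id, val)| Item { id: id.to_string(), val: *val })
    .collect();

    // Sort once; every lookup afterwards is two binary searches.
    large_vector.sort_by(|a, b| a.id.cmp(&b.id));

    let items = items_with_id(&large_vector, "C");
    println!("{:?}", items);
}
```

Compared with the HashMap of ranges, there is no lookup table to rebuild after an insertion, as long as new items are inserted at their sorted position.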

Related

How to use map to filter value of field into an iterable of struct

I would like to filter an iterable of Person structs using map(), but I don't know whether that is possible this way.
I would like to know the best way to do this properly in Rust.
// This is my Person struct
#[derive(Debug, Clone)]
struct Person {
product_id: i64,
nom: String,
prenom: String,
email: Option<String>,
actif: bool,
}
// Main function, which creates a person and adds it to a HashMap<i64, Person>
fn main() {
// create list of person
let mut personnes = Personnes::new();
// Create one person
let person = Person {
product_id: 1,
nom: String::from("TestNom"),
prenom: String::from("TestPrenom"),
email: Some("test#mail.com".to_string()),
actif: true,
};
// Add person into my iterable
personnes.add(person);
// Add a few more persons...
// Imagine several Person values; find the ones where actif == true
persons_actives(&personnes);
}
// Something like that :
fn persons_actives(personnes: &Personnes) {
let persons_actives = personnes
.inner
.iter()
.map(|person.actif| person.actif == true)
.collect();
}
// But it's maybe impossible in this way ?
I tried :
fn persons_actives(personnes: &Personnes) {
let persons_actives = personnes.inner.iter().filter(|person| person.actif == true).collect();
}
and :
fn persons_actives(personnes: &Personnes) {
let persons_actives = personnes.inner.iter().find(|person| person.actif == true).collect();
}
but I get the same error in both cases:
"no field actif on type &(&i64, &Person)"
You can iterate through your HashMap values, then filter() and collect() the result into a Vector.
use std::collections::HashMap;
#[derive(Debug)]
struct Person {
is_active: bool
//other params
}
fn main() {
// define your map
let mut map :HashMap<i32,Person> = HashMap::new();
// init a Person
let person: Person = Person{
is_active: true
// set other params
};
// insert it to map
map.insert(1,person);
// filter person.is_active == true
let active_persons : Vec<_> = map.values().filter(|person| person.is_active==true).collect();
// result
println!("{:?}",active_persons);
}
Now you have your desired result in active_persons.
Thanks for your reply,
I have tested this one :
let active_persons : Vec<_> = personnes.inner.values().filter(|&person| if_else!(person.actif, true, false)).collect();
with ternary-rs library for the if_else syntax.
You're right, I should have mentioned the structure of Personnes, which is:
struct Personnes {
inner: HashMap<i64, Person>,
}
But I kept your Vec<_> type for the active persons.
I created two persons, one active and the other not, and it worked!
Thanks a lot !
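For reference, the error "no field actif on type &(&i64, &Person)" comes from the fact that iterating a HashMap with iter() yields (key, value) tuples rather than bare values. The following is a minimal sketch of filtering with a tuple pattern instead of values(); the Person struct is trimmed down, and the function signature (returning a Vec<&Person>) is an assumption made for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
struct Person {
    product_id: i64,
    actif: bool,
}

struct Personnes {
    inner: HashMap<i64, Person>,
}

// `iter()` on a HashMap yields `(&i64, &Person)` pairs, so the closure
// has to destructure the tuple before it can reach `actif`.
fn persons_actives(personnes: &Personnes) -> Vec<&Person> {
    personnes
        .inner
        .iter()
        .filter(|(_id, person)| person.actif)
        .map(|(_id, person)| person)
        .collect()
}

fn main() {
    let mut inner = HashMap::new();
    inner.insert(1, Person { product_id: 1, actif: true });
    inner.insert(2, Person { product_id: 2, actif: false });
    let personnes = Personnes { inner };

    println!("{:?}", persons_actives(&personnes));
}
```

When the keys are not needed, values() as in the answer above is the shorter form.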

Group vector of structs by field

I want to collect all structs with a matching id field into a new vector, process that vector, and then repeat the process. Basically, I want to group together the structs with matching ids.
Is there a way to do this without using the unstable feature drain_filter?
#![feature(drain_filter)]
#[derive(Debug)]
struct Person {
id: u32,
}
fn main() {
let mut people = vec![];
for p in 0..10 {
people.push(Person { id: p });
}
while !people.is_empty() {
let first_person_id = people.first().unwrap().id;
let drained: Vec<Person> = people.drain_filter(|p| p.id == first_person_id).collect();
println!("{:#?}", drained);
}
}
If you are looking to group your vector by the person id, it's likely to be more efficient to use a HashMap from id to Vec<Person>, where each id holds a vector of persons. You can then loop through the HashMap and process each vector (group). This is potentially more efficient than draining people in each iteration, which in the worst case has O(N^2) time complexity, whereas with a HashMap the time complexity is O(N).
use std::collections::HashMap;
#[derive(Debug)]
struct Person {
id: u32,
}
fn main() {
let mut people = vec![];
let mut groups: HashMap<u32, Vec<Person>> = HashMap::new();
for p in 0..10 {
people.push(Person { id: p });
}
people.into_iter().for_each(|person| {
let group = groups.entry(person.id).or_insert(vec![]);
group.push(person);
});
for (_id, group) in groups {
println!("{:#?}", group);
}
}
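If the original loop shape (take the first id, pull out everything that matches, repeat) has to be kept, Iterator::partition offers a stable way to split the vector on each pass. This is a sketch only; the function name group_by_first_id is made up for the example, and the approach keeps the O(N^2) worst case mentioned above.

```rust
#[derive(Debug, PartialEq)]
struct Person {
    id: u32,
}

// Repeatedly splits `people` into "same id as the first element" and "rest".
fn group_by_first_id(mut people: Vec<Person>) -> Vec<Vec<Person>> {
    let mut groups = Vec::new();
    while let Some(first_id) = people.first().map(|p| p.id) {
        // `partition` consumes the vector and returns both halves.
        let (matching, rest): (Vec<Person>, Vec<Person>) =
            people.into_iter().partition(|p| p.id == first_id);
        groups.push(matching);
        people = rest;
    }
    groups
}

fn main() {
    let people: Vec<Person> = [1, 2, 1, 3, 2].iter().map(|&id| Person { id }).collect();
    for group in group_by_first_id(people) {
        println!("{:#?}", group);
    }
}
```

Unlike the HashMap version, the groups come out in order of first appearance.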

How to move values out of a vector when the vector is immediately discarded?

I am receiving data in the form of a string vector, and need to populate a struct using a subset of the values, like this:
const json: &str = r#"["a", "b", "c", "d", "e", "f", "g"]"#;
struct A {
third: String,
first: String,
fifth: String,
}
fn main() {
let data: Vec<String> = serde_json::from_str(json).unwrap();
let a = A {
third: data[2],
first: data[0],
fifth: data[4],
};
}
This doesn't work because I'm moving values out of the vector. The compiler believes that this leaves data in an uninitialized state that can cause problems, but because I never use data again, it shouldn't matter.
The conventional solution is swap_remove, but it is problematic because the elements are not accessed in reverse order (assuming the structure is populated top to bottom).
I solve this now by doing a mem::replace and having data as mut, which clutters this otherwise clean code:
fn main() {
let mut data: Vec<String> = serde_json::from_str(json).unwrap();
let a = A {
third: std::mem::replace(&mut data[2], "".to_string()),
first: std::mem::replace(&mut data[0], "".to_string()),
fifth: std::mem::replace(&mut data[4], "".to_string())
};
}
Is there an alternative to this solution that doesn't require me to have all these replace calls and data unnecessarily mut?
I've been in this situation, and the cleanest solution I've found was to create an extension:
trait Extract: Default {
/// Replace self with default and returns the initial value.
fn extract(&mut self) -> Self;
}
impl<T: Default> Extract for T {
fn extract(&mut self) -> Self {
std::mem::replace(self, T::default())
}
}
And in your solution, you can replace the std::mem::replace with it:
const JSON: &str = r#"["a", "b", "c", "d", "e", "f", "g"]"#;
struct A {
third: String,
first: String,
fifth: String,
}
fn main() {
let mut data: Vec<String> = serde_json::from_str(JSON).unwrap();
let _a = A {
third: data[2].extract(),
first: data[0].extract(),
fifth: data[4].extract(),
};
}
That's basically the same code, but it is much more readable.
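As a side note, the standard library provides std::mem::take, which replaces a value with its Default and returns the original, so the same effect is available without a custom trait. A minimal sketch, with the JSON parsing left out so the example has no external dependencies (the build function is a made-up helper):

```rust
#[derive(Debug)]
struct A {
    third: String,
    first: String,
    fifth: String,
}

// Moves the selected strings out of the vector, leaving empty strings behind.
// Panics if `data` has fewer than five elements, like the indexing above.
fn build(mut data: Vec<String>) -> A {
    A {
        third: std::mem::take(&mut data[2]),
        first: std::mem::take(&mut data[0]),
        fifth: std::mem::take(&mut data[4]),
    }
}

fn main() {
    let data: Vec<String> = ["a", "b", "c", "d", "e", "f", "g"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let a = build(data);
    println!("{:?}", a);
}
```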
If you like funny things, you can even write a macro:
macro_rules! vec_destruc {
{ $v:expr => $( $n:ident : $i:expr; )+ } => {
let ( $( $n ),+ ) = {
let mut v = $v;
(
$( std::mem::replace(&mut v[$i], Default::default()) ),+
)
};
}
}
const JSON: &str = r#"["a", "b", "c", "d", "e", "f", "g"]"#;
#[derive(Debug)]
struct A {
third: String,
first: String,
fifth: String,
}
fn main() {
let data: Vec<String> = serde_json::from_str(JSON).unwrap();
vec_destruc! { data =>
first: 0;
third: 2;
fifth: 4;
};
let a = A { first, third, fifth };
println!("{:?}", a);
}
In small cases like this (also seen in naïve command line argument processing), I transfer ownership of the vector into an iterator and pop all the values off, keeping those I'm interested in:
fn main() {
let data: Vec<String> = serde_json::from_str(json).unwrap();
let mut data = data.into_iter().fuse();
let first = data.next().expect("Needed five elements, missing the first");
let _ = data.next();
let third = data.next().expect("Needed five elements, missing the third");
let _ = data.next();
let fifth = data.next().expect("Needed five elements, missing the fifth");
let a = A {
third,
first,
fifth,
};
}
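The skipped elements can also be stepped over with Iterator::nth, which consumes n elements and returns the one after them. The sketch below assumes that returning an Option is acceptable instead of panicking; the pick function is a hypothetical helper, and the JSON parsing is omitted to keep the example dependency-free.

```rust
#[derive(Debug)]
struct A {
    third: String,
    first: String,
    fifth: String,
}

// Returns None if the vector has fewer than five elements.
fn pick(data: Vec<String>) -> Option<A> {
    let mut it = data.into_iter();
    let first = it.next()?; // index 0
    let third = it.nth(1)?; // skips index 1, yields index 2
    let fifth = it.nth(1)?; // skips index 3, yields index 4
    Some(A { third, first, fifth })
}

fn main() {
    let data: Vec<String> = ["a", "b", "c", "d", "e", "f", "g"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{:?}", pick(data));
}
```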
I'd challenge the requirement to have a vector, however. Using a tuple is simpler and avoids much of the error handling needed, if you have exactly 5 elements:
fn main() {
let data: (String, String, String, String, String) = serde_json::from_str(json).unwrap();
let a = A {
third: data.2,
first: data.0,
fifth: data.4,
};
}
See also:
How can I ignore extra tuple items when deserializing with Serde? ("trailing characters" error)
Another option is to use a vector of Option<String>. This allows us to move the values out, while keeping track of what values have been moved, so they are not dropped with the vector.
let mut data: Vec<Option<String>> = serde_json::from_str(json).unwrap();
let a = A {
third: data[2].take().unwrap(),
first: data[0].take().unwrap(),
fifth: data[4].take().unwrap(),
};

Functionally creating a nested object from a flat structure

I am attempting to turn a flat structure like the following:
let flat = vec![
Foo {
a: "abc1".to_owned(),
b: "efg1".to_owned(),
c: "yyyy".to_owned(),
d: "aaaa".to_owned(),
},
Foo {
a: "abc1".to_owned(),
b: "efg2".to_owned(),
c: "zzzz".to_owned(),
d: "bbbb".to_owned(),
}];
into a nested JSON object through serde_json that looks something like:
{
"abc1": {
"efg1": {
"c": "yyyy",
"d": "aaaa"
},
"efg2": {
"c": "zzzz",
"d": "bbbb"
}
}
}
(The values b are guaranteed to be unique within the array)
If I had needed only one layer, I would do something like this:
let map = flat.into_iter().map(|input| (input.a, NewType {
b: input.b,
c: input.c,
d: input.d,
})).collect::<HashMap<String, NewType>>();
let out = serde_json::to_string(&map).unwrap();
However, this doesn't seem to scale to multiple layers (i.e. (String, (String, NewType)) can't collect into HashMap<String, HashMap<String, NewType>>)
Is there a better way than manually looping and inserting entries into the hashmaps, before turning them into json?
A map will preserve the shape of the data. That is not what you want; the cardinality of the data has been changed after the transformation. So a mere map won't be sufficient.
Instead, a fold will do: you start with an empty HashMap, and populate it as you iterate through the collection. But it is hardly any more readable than a loop in this case. I find a multimap is quite useful here:
use multimap::MultiMap;
use std::collections::HashMap;
struct Foo {
a: String,
b: String,
c: String,
d: String,
}
#[derive(Debug)]
struct NewFoo {
c: String,
d: String,
}
fn main() {
let flat = vec![
Foo {
a: "abc1".to_owned(),
b: "efg1".to_owned(),
c: "yyyy".to_owned(),
d: "aaaa".to_owned(),
},
Foo {
a: "abc1".to_owned(),
b: "efg2".to_owned(),
c: "zzzz".to_owned(),
d: "bbbb".to_owned(),
},
];
let map = flat
.into_iter()
.map(|e| (e.a, (e.b, NewFoo { c: e.c, d: e.d })))
.collect::<MultiMap<_, _>>()
.into_iter()
.map(|e| (e.0, e.1.into_iter().collect::<HashMap<_, _>>()))
.collect::<HashMap<_, _>>();
println!("{:#?}", map);
}
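For comparison, the fold mentioned above can be sketched with only the standard library, using the HashMap entry API to create the inner map on first use. The Nested type alias and the nest function are names introduced for this example, and serialization to JSON is left out.

```rust
use std::collections::HashMap;

struct Foo {
    a: String,
    b: String,
    c: String,
    d: String,
}

#[derive(Debug, PartialEq)]
struct NewFoo {
    c: String,
    d: String,
}

type Nested = HashMap<String, HashMap<String, NewFoo>>;

// Starts from an empty map and inserts each element under its `a` and `b` keys.
fn nest(flat: Vec<Foo>) -> Nested {
    flat.into_iter().fold(Nested::new(), |mut acc, e| {
        acc.entry(e.a)
            .or_default()
            .insert(e.b, NewFoo { c: e.c, d: e.d });
        acc
    })
}

fn main() {
    let flat = vec![
        Foo { a: "abc1".to_owned(), b: "efg1".to_owned(), c: "yyyy".to_owned(), d: "aaaa".to_owned() },
        Foo { a: "abc1".to_owned(), b: "efg2".to_owned(), c: "zzzz".to_owned(), d: "bbbb".to_owned() },
    ];
    println!("{:#?}", nest(flat));
}
```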
If you need to do something custom to flatten or merge your Foo structure, you could turn it into JSON Values in your Rust code using something like this:
use serde_json::{json, Map, Value};
let mut root: Map<String, Value> = Map::new();
for foo in flat.into_iter() {
let b = json!({ "c": foo.c, "d": foo.d });
if let Some(a) = root.get_mut(&foo.a) {
if let Value::Object(map) = a {
map.insert(foo.b, b);
}
} else {
root.insert(foo.a, json!({foo.b: b}));
}
};

Borrowing within a vector of structures

I have a vector of structures and I would like to update one structure with values from another. For my use case, I prefer to do it in a loop. I'm hitting the borrow checker, but it seems like there must be a simple solution to this type of problem.
#[derive(Debug)]
struct Column {
header: String,
amount: i32,
}
fn main() {
let mut spreadsheet: Vec<Column> = Vec::new();
spreadsheet.push(Column {
header: "Car".to_string(),
amount: 30300,
});
spreadsheet.push(Column {
header: "House".to_string(),
amount: 210800,
});
spreadsheet.push(Column {
header: "Total".to_string(),
amount: 0,
});
for column in &mut spreadsheet {
//mutable borrow here
if column.header == "Total" {
column.amount = spreadsheet[0].amount //immutable borrow here
+ spreadsheet[1].amount;
} else {
column.amount -= 300;
}
}
for column in spreadsheet {
println!("{:?}", column);
}
}
You are trying to modify an element of the spreadsheet vector while iterating over it mutably, so you cannot also borrow the vector immutably inside the loop. Since you always want to use spreadsheet[0].amount and spreadsheet[1].amount, you can copy these values into separate variables before the loop and work with those instead of reading them from spreadsheet inside it.
Here is the working code:
#[derive(Debug)]
struct Column {
header: String,
amount: i32,
}
fn main() {
let mut spreadsheet: Vec<Column> = Vec::new();
spreadsheet.push(Column {
header: "Car".to_string(),
amount: 30300,
});
spreadsheet.push(Column {
header: "House".to_string(),
amount: 210800,
});
spreadsheet.push(Column {
header: "Total".to_string(),
amount: 0,
});
let car_amount = spreadsheet[0].amount;
let header_amount = spreadsheet[1].amount;
spreadsheet.iter_mut().for_each(|column| {
if column.header == "Total" {
column.amount = car_amount + header_amount;
} else {
column.amount -= 300;
}
});
for column in spreadsheet {
println!("{:?}", column);
}
}
Since you want to do these operations in a for loop rather than with an iterator adapter, you can change the spreadsheet.iter_mut()... code block to the following:
for column in &mut spreadsheet {
if column.header == "Total" {
column.amount = car_amount + header_amount;
} else {
column.amount -= 300;
}
}
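Another option, if the loop needs to read other elements as they currently are, is to iterate over indices instead of holding a mutable borrow of the whole vector. This is a sketch rather than a drop-in replacement: the update function is a made-up name, and because "Total" comes last, it sums the amounts after 300 has already been subtracted from them, which differs from the copied-values version above.

```rust
#[derive(Debug)]
struct Column {
    header: String,
    amount: i32,
}

// Index-based loop: each access borrows the slice only for one expression.
fn update(spreadsheet: &mut [Column]) {
    for i in 0..spreadsheet.len() {
        if spreadsheet[i].header == "Total" {
            // Read the other columns first, then write the result.
            let total = spreadsheet[0].amount + spreadsheet[1].amount;
            spreadsheet[i].amount = total;
        } else {
            spreadsheet[i].amount -= 300;
        }
    }
}

fn main() {
    let mut spreadsheet = vec![
        Column { header: "Car".to_string(), amount: 30300 },
        Column { header: "House".to_string(), amount: 210800 },
        Column { header: "Total".to_string(), amount: 0 },
    ];
    update(&mut spreadsheet);
    for column in &spreadsheet {
        println!("{:?}", column);
    }
}
```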
