How to convert Vec of values into slices of references - rust

I have
struct Data {
// details omitted
}
fn parse_data(raw: &str) -> Option<Data> {
// details omitted
}
fn step1(datas: &[&Data]) {
// details omitted
}
fn step2(datas: &[&Data]) {
// details omitted
}
I want a function that takes a collection of the raw data type (in my example &str), converts them and the passes borrows of the data into step1 and step2.
fn do_things(raw_strs: &[&str]) {
// notice this Vec owns all the Data items
let datas: Vec<Data> = raw_strs.iter()
.flat_map(|r| parse_data(r))
.collect();
// this eludes me on how to call these
step1(datas.as_slice());
step2(datas.as_slice());
}
I get "expected &Data, found struct Data".
How can I create a slice of references so I can pass the data to multiple functions, without creating copies of the Data?

You can always create another Vec<&Data> and pass a slice to the methods:
fn do_things() {
// notice this Vec owns all the Data items
let datas: Vec<Data> = (0..10)
.map(|r| Data {})
.collect();
let data_refs = datas.iter().collect::<Vec<_>>();
// this eludes me on how to call these
step1(&data_refs);
step2(&data_refs);
}
Playground
But, unless you already have working structures with [&Data], your methods should take just &[Data], there is no moving involving, you are just referencing your Vec as an slice.
How can I create a slice of references so I can pass the data to multiple functions, without creating copies of the Data?
As said, when using &[Data] you are not copying them, you just pass a reference to a slice of Data. Not moving or copying involved in the Data themselves.

Related

Rust: how to assign `iter().map()` or `iter().enumarate()` to same variable

struct A {...whatever...};
const MY_CONST_USIZE:usize = 127;
// somewhere in function
// vec1_of_A:Vec<A> vec2_of_A_refs:Vec<&A> have values from different data sources and have different inside_item types
let my_iterator;
if my_rand_condition() { // my_rand_condition is random and compiles for sake of simplicity
my_iterator = vec1_of_A.iter().map(|x| (MY_CONST_USIZE, &x)); // Map<Iter<Vec<A>>>
} else {
my_iterator = vec2_of_A_refs.iter().enumerate(); // Enumerate<Iter<Vec<&A>>>
}
how to make this code compile?
at the end (based on condition) I would like to have iterator able build from both inputs and I don't know how to integrate these Map and Enumerate types into single variable without calling collect() to materialize iterator as Vec
reading material will be welcomed
In the vec_of_A case, first you need to replace &x with x in your map function. The code you have will never compile because the mapping closure tries to return a reference to one of its parameters, which is never allowed in Rust. To make the types match up, you need to dereference the &&A in the vec2_of_A_refs case to &A instead of trying to add a reference to the other.
Also, -127 is an invalid value for usize, so you need to pick a valid value, or use a different type than usize.
Having fixed those, now you need some type of dynamic dispatch. The simplest approach would be boxing into a Box<dyn Iterator>.
Here is a complete example:
#![allow(unused)]
#![allow(non_snake_case)]
struct A;
// Fixed to be a valid usize.
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator: Box<dyn Iterator<Item=(usize, &A)>>;
if my_rand_condition() {
// Fixed to return x instead of &x
my_iterator = Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
// Added map to deref &&A to &A to make the types match
my_iterator = Box::new(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
(Playground)
Instead of a boxed trait object, you could also use the Either type from the either crate. This is an enum with Left and Right variants, but the Either type itself implements Iterator if both the left and right types also do, with the same type for the Item associated type. For example:
#![allow(unused)]
#![allow(non_snake_case)]
use either::Either;
struct A;
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator;
if my_rand_condition() {
my_iterator = Either::Left(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
my_iterator = Either::Right(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
(Playground)
Why would you choose one approach over the other?
Pros of the Either approach:
It does not require a heap allocation to store the iterator.
It implements dynamic dispatch via match which is likely (but not guaranteed) to be faster than dynamic dispatch via vtable lookup.
Pros of the boxed trait object approach:
It does not depend on any external crates.
It scales easily to many different types of iterators; the Either approach quickly becomes unwieldy with more than two types.
You can do this using a Boxed trait object like so:
let my_iterator: Box<dyn Iterator<Item = _>> = if my_rand_condition() {
Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)))
} else {
Box::new(vec2_of_A_refs.iter().enumerate().map(|(i, x)| (i, *x)))
};
I don't think this is a good idea generally though. A few things to note:
The use of trait objects means the types here must be resolved dynamically. This adds a lot of overhead.
The closure in vec1's iterator's map method cannot reference its arguments. Instead the second map must be added to vec2s iterator. The effect of this is that all the items are being copied regardless. If you are doing this, why not collect()? The overhead for creating the Vec or whatever you choose should be less than that of the dynamic resolution.
Bit pedantic, but remember if statements are expressions in Rust, and so the assignment can be expressed a little more cleanly as I have done above.

struct with reference to element of a vector in another field

I have the below example where I want a struct which holds a vector of data in one field, and has another field which contains the currently selected field. My understanding is that this is not possible in rust because I could remove the element in tables which selected points to, thereby creating a dangling pointer (can't borrow mut when an immutable borrow exists). The obvious workaround is to instead store a usize index for the element, rather than a &'a String. But this means I need to update the index if I remove an element from tables. Is there any way to avoid this using smart pointers, or just any better solutions in general? I've looked at other questions but they are not quite the same as below, and have extra information which makes them harder to follow for a beginner like myself, whereas below is a very minimal example.
struct Data<'a> {
selected: &'a String,
tables: Vec<String>,
}
fn main() {
let tables = vec!["table1".to_string(), "table2".to_string()];
let my_stuff = Data {
selected: &tables[0],
tables: tables,
};
}
You quite rightfully assessed that the way you wrote it is not possible, because Rust guarantees memory safety and storing it as a reference would give the possibility to create a dangling pointer.
There are several solutions that I could see here.
Static Strings
This of course only works if you store compile-time static strings.
struct Data {
selected: &'static str,
tables: Vec<&'static str>,
}
fn main() {
let tables = vec!["table1", "table2"];
let my_stuff = Data {
selected: &tables[0],
tables,
};
}
The reason this works is because static strings are non-mutable and guaranteed to never be deallocated. Also, in case this is confusing, I recommend reading up on the differences between Strings and str slices.
You can even go one further and reduce the lifetime down to 'a. But then, you have to store them as &'a str in the vector, to ensure they cannot be edited.
But that then allows you to store Strings in them, as long as the strings can be borrowed for the entire lifetime of the Data object.
struct Data<'a> {
selected: &'a str,
tables: Vec<&'a str>,
}
fn main() {
let str1 = "table1".to_string();
let str2 = "table2".to_string();
let tables = vec![str1.as_str(), str2.as_str()];
let my_stuff = Data {
selected: &tables[0],
tables,
};
}
Reference counting smart pointers
Depending your situation, there are several types that are recommended:
Rc<...> - if your data is immutable. Otherwise, you need to create interior mutability with:
Rc<Cell<...>> - safest and best solution IF your problem is single-threaded and deals with simple data types
Rc<RefCell<...>> - for more complex data types that have to be updated in-place and can't just be moved in and out
Arc<Mutex<...>> - as soon as your problem stretches over multiple threads
In your case, the data is in fact simple and your program is single-threaded, so I'd go with:
use std::{cell::Cell, rc::Rc};
struct Data {
selected: Rc<Cell<String>>,
tables: Vec<Rc<Cell<String>>>,
}
fn main() {
let tables = vec![
Rc::new(Cell::new("table1".to_string())),
Rc::new(Cell::new("table2".to_string())),
];
let my_stuff = Data {
selected: tables[0].clone(),
tables,
};
}
Of course, if you don't want to modify your strings after creation, you could go with:
use std::rc::Rc;
struct Data {
selected: Rc<String>,
tables: Vec<Rc<String>>,
}
fn main() {
let tables = vec![Rc::new("table1".to_string()), Rc::new("table2".to_string())];
let my_stuff = Data {
selected: tables[0].clone(),
tables,
};
}
Hiding the data structure and using an index
As you already mentioned, you could use an index instead. Then you would have to hide the vector and provide getters/setters/modifiers to make sure the index is kept in sync when the vector changes.
I'll keep the implementation up to the reader and won't provide an example here :)
I hope this helped already, or at least gave you a couple of new ideas. I'm happy to see new people come to the community, so feel free to ask further questions if you have any :)

Transfer ownership of Vec to VecDeque

In the following code,
I want to transfer the staged_bytes
vector into the buffer. Specifically,
I want 'buffer' to take ownership
of staged_bytes so that I can
reuse the staged_bytes field for
a brand new vec of u8.
I show a solution to my problem in the code.
Unfortunately according
to rust documentation, it implies a copy
of the vector elements. Since that vector
can be big, I don't want the copy, hence
my desire for an ownership transfer.
I think I can't do it the way I want because the ownership is
at the same time in staged_bytes and buffer
during the transfer().
So what are the solutions ? I thought of shared
pointers (Rc, etc.) but it seems overkill since
it's not actually shared (maybe the optimizer
figures it out ?)...
use std::collections::VecDeque;
struct M {
buffer : VecDeque<Vec<u8>>,
staged_bytes: Vec<u8>
}
impl M {
fn transfer(&mut self) {
// This is what I want to do (the idea)
// (it doesn't compile, E0507)
// self.buffer.push_front(self.staged_bytes);
// This is what I don't want to do
// (it compiles)
self.buffer.push_front(self.staged_bytes.clone());
// After the transfer, I can start with a new vec.
self.staged_bytes = Vec::new();
}
}
fn main() {
let mut s = M {
buffer : VecDeque::new(),
staged_bytes: Vec::new()
};
s.staged_bytes.push(112);
s.transfer();
}
You want to use std::mem::take() here, which returns the value moved from a mutable reference and replaces it with the default value of the type:
self.buffer.push_front(std::mem::take(&mut self.staged_bytes));
Since the default value of a Vec is an empty vector, you can remove the following assignment (self.staged_bytes = Vec::new();) as it is redundant.

Rust error :Cannot return value referencing temporary value

I'm trying to make a code that returns the mode of a list of given numbers.
Here's the code :
use std::collections::HashMap;
fn mode (vector: &Vec<i32>) -> Vec<&&i32> {
let mut occurrences = HashMap::new();
let mut n= Vec::new();
let mut mode = Vec::new();
for i in vector {
let j= occurrences.entry(i).or_insert(0);
*j+=1;
}
for (num, occ) in occurrences.clone().iter() {
if occ> n[0] {
n.clear();
mode.clear();
n.push(occ);
mode.push(num);
} else if occ== n[0] {
mode.push(num);
}
}
mode
}
fn main () {
let mut numbers: Vec<i32>= vec![1,5,2,2,5,3]; // 2 and 5 are the mode
numbers.sort();
println!("the mode is {:?}:", mode(&numbers));
}
I used a vector for the mode since a dataset could be multimodal.
Anyway, I'm getting the following error:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:26:5
|
13 | for (num, occ) in occurrences.clone().iter() {
| ------------------- temporary value created here
...
26 | mode
| ^^^^ returns a value referencing data owned by the current function
When you return from the current function, any owned values are destroyed (other than the ones being returned from the function), and any data referencing that destroyed data therefore cannot be returned, e.g.:
fn example() -> &str {
let s = String::from("hello"); // owned data
&s // error: returns a value referencing data owned by the current function
// you can imagine this is added by the compiler
drop(s);
}
The issue you have comes from iter(). iter() returns an iterator of shared references:
let values: Vec<i32> = vec![1, 2, 3];
for i in values.iter() {
// i is a &i32
}
for i in values {
// i is an i32
}
So when you call occurrences.clone().iter() you're creating a temporary value (via clone()) which is owned by the current function, then iterating over that data via shared reference. When you destructure the tuple in (num, occ), these are also shared references.
Because you later call mode.push(num), Rust realizes that mode has the type Vec<&i32>. However, there is an implicit lifetime here. The lifetime of num is essentially the lifetime of the current function (let's call that 'a), so the full type of mode is Vec<&'a i32>.
Because of that, you can't return it from the current function.
To fix
Removing iter() should work, since then you will be iterating over owned values. You might also find that you can remove .clone() too, I haven't looked too closely but it seems like it's redundant.
A couple of other points while you're here:
It's rare to interact with &Vec<Foo>, instead it's much more usual to use slices: &[Foo]. They're more general, and in almost all cases more performant (you can still pass your data in like: &numbers)
Check out clippy, it has a bunch of linter rules that can catch a bunch of errors much earlier, and usually does a good job explaining them: https://github.com/rust-lang/rust-clippy

Replace iterator in struct [duplicate]

This question already has answers here:
Why can't I store a value and a reference to that value in the same struct?
(4 answers)
Is there an owned version of String::chars?
(6 answers)
Closed 2 years ago.
Newbie question: how do I store an iterator in a struct, and half-way the struct's lifetime, replace that iterator with a new one?
My test program, takes a vec of strings, and wraps that in a CharStream (or CharIterator or CharGenerator - whatever you want to call it).
Goal is to do this without transforming or copying the original data, so no concat or map() or collect() into an intermediate list of some sorts.
use core::str::Chars;
struct CharStream<'a> {
stream: std::vec::Vec<String>, //test data
line: usize, //test data
char_it: Chars<'a>
}
impl<'a> CharStream<'a> {
fn new(stream: std::vec::Vec<String>) -> CharStream<'a> {
CharStream { stream: stream, line: 0, char_it: stream[0].chars()}
}
/// next() should give the next char in the 'stream'.
/// to keep the function simple, it just returns the first char of each line...
/// and tries to replace the chars iterator with the next one.
fn next(&mut self) -> Option<char> {
self.line += 1;
self.char_it = self.stream[self.line].chars(); //<< this fails!
self.char_it.next()
}
}
fn main() {
//just test data, imagine some stream, read line by line:
let mut stream = Vec::new();
stream.push("initial".to_string());
stream.push("second".to_string());
stream.push("third".to_string());
let mut cont = CharStream::new(stream);
for i in 0..1 {
let c = cont.next();
println!("{:?}", c);
}
}
If possible, I'm looking for the smallest fix to this code, but if in Rust, another pattern is better to implement this...
The goal remains to be able to generate one stream from another without intermediate storage or full consumption of the incoming stream.

Resources