Not live long enough with CSV and dataflow - rust

fn main() {
timely::execute_from_args(std::env::args().skip(0), move |worker| {
let (mut input, probe) = worker.dataflow::<_, _, _>(|scope| {
let (input, data) = scope.new_collection();
let probe = data.inspect(|x| println!("observed data: {:?}", x)).probe();
(input, probe)
});
let mut rdr = csv::ReaderBuilder::new()
.has_headers(false)
.flexible(true)
.delimiter(b'\t')
.from_reader(io::stdin());
for result in rdr.deserialize() {
let record = result.expect("a CSV record");
let mut vec = Vec::new();
for i in 0..13 {
vec.push(&record[i]);
}
input.insert(vec);
}
});
}
The error is that record does not live long enough. I am trying to read each CSV record into a vector and then insert those vectors into the dataflow. Each part works on its own: I can read the CSV into a vector, and I can use the dataflow elsewhere.

The problem is that you are pushing to the Vec a borrowed value: &record[i]. The & means borrow, and as a consequence the original value record must outlive the borrower vec.
That might seem fine: both are declared in the for body, so they have the same lifetime and neither outlives the other. But the line input.insert(vec) moves vec. vec is now owned by input and therefore (as far as I understand) lives as long as input. Because input is declared outside the for body, the moved vec outlives record, and so do the references to record[i] stored in it.
There are a few solutions, and all of them remove the dependency between record and input:
1. If the record is an array of primitive values, or of anything that implements the Copy trait, you can simply omit the borrow and the value will be copied into the vector: vec.push(record[i]).
2. Clone the record value into the vector: vec.push(record[i].clone()). This creates a clone which, as above, is owned by vec, so no borrow is involved.
3. If the elements in the record array implement neither Copy nor Clone, you have to move them. Because the values are in an array, you have to move the array as a whole (it can't be left with some elements moved out). One solution is to turn it into an iterator that moves the values out one by one, and then push them into the vector:
for element in record.into_iter().take(13) {
vec.push(element)
}
4. Replace the record value with a different value. To move only some of the elements out of the array, swap each one for something else. Although you take an element out of the array, you put another value in its place, so the array stays valid. This requires record to be declared as mut, and the element must be borrowed mutably:
for i in 0..13 {
vec.push(std::mem::replace(&mut record[i], Default::default()));
}
You can replace Default::default() with another value if you want to. When the default value is what you want, std::mem::take(&mut record[i]) does the same thing.
I hope this helps. I'm still a noob in Rust, so improvements and critique on the answer are accepted :)
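To make options 2 and 4 concrete without the csv and timely crates, here is a minimal std-only sketch. The function names rows_by_clone and rows_by_take are hypothetical, and the returned Vec merely stands in for the dataflow input (anything that outlives the loop body); it is not the real API.

```rust
// Hypothetical stand-in for the dataflow input: a Vec that outlives the loop body.
// Option 2: clone each field, so nothing borrowed from `record` escapes the loop.
fn rows_by_clone(text: &str, columns: usize) -> Vec<Vec<String>> {
    let mut sink = Vec::new();
    for line in text.lines() {
        // `record` is owned by this iteration and dropped at the end of it.
        let record: Vec<String> = line.split('\t').map(String::from).collect();
        let mut row = Vec::new();
        for field in record.iter().take(columns) {
            row.push(field.clone()); // owned copy
        }
        sink.push(row);
    }
    sink
}

// Option 4: move each field out, leaving an empty String in its place.
fn rows_by_take(text: &str, columns: usize) -> Vec<Vec<String>> {
    let mut sink = Vec::new();
    for line in text.lines() {
        // `mut` is required because the elements are replaced in place.
        let mut record: Vec<String> = line.split('\t').map(String::from).collect();
        let mut row = Vec::new();
        for i in 0..columns.min(record.len()) {
            row.push(std::mem::take(&mut record[i]));
        }
        sink.push(row);
    }
    sink
}

fn main() {
    let text = "a\tb\tc\nd\te\tf";
    println!("{:?}", rows_by_clone(text, 2));
    println!("{:?}", rows_by_take(text, 2));
}
```

Both variants hand owned Strings to the outer collection, which is why the borrow checker accepts them.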

Related

Immutable access in rust

I am new to Rust, coming from Python, where I have used the functional style extensively.
What I am trying to do is to take in a string (slice) (or any iterable) and iterate with a reference to the current index and the next index. Here is my attempt:
fn main() {
// intentionally immutable, this should not change
let x = "this is a
multiline string
with more
then 3 lines.";
// initialize multiple (mutable) iterators over the slice
let mut lineiter = x.chars();
let mut afteriter = x.chars();
// to have some reason to do this
afteriter.skip(1);
// zip them together, comparing the current line with the next line
let mut zipped = lineiter.zip(afteriter);
for (char1, char2) in zipped {
println!("{:?} {:?}", char1, char2);
}
}
I think it should be possible to get different iterators that have different positions in the slice but are referring to the same parts of memory without having to copy the string, but the error I get is as follows:
error[E0382]: use of moved value: `afteriter`
--> /home/alex/Documents/projects/simple-game-solver/src/src.rs:15:35
|
10 | let afteriter = x.chars();
| --------- move occurs because `afteriter` has type `std::str::Chars<'_>`, which does not implement the `Copy` trait
11 | // to have some reason to do this
12 | afteriter.skip(1);
| --------- value moved here
...
15 | let mut zipped = lineiter.zip(afteriter);
| ^^^^^^^^^ value used here after move
I also get a warning telling me that zipped does not need to be mutable.
Is it possible to instantiate multiple iterators over a single variable and if so how can it be done?
"Is it possible to instantiate multiple iterators over a single variable and if so how can it be done?"
If you check the signature and documentation for Iterator::skip:
fn skip(self, n: usize) -> Skip<Self>
Creates an iterator that skips the first n elements.
After they have been consumed, the rest of the elements are yielded. Rather than overriding this method directly, instead override the nth method.
You can see that it takes self by value (consumes the input iterator) and returns a new iterator. This is not a method which consumes the first n elements of the iterator in-place, it's one which converts the existing iterator into one which skips the first n elements.
So instead of:
let mut afteriter = x.chars();
afteriter.skip(1);
you just write:
let mut afteriter = x.chars().skip(1);
"I also get a warning telling me that zipped does not need to be mutable."
That's because Rust's for loop uses the IntoIterator trait, which moves the iterable into the loop. It's not creating a mutable reference, it's just consuming whatever the RHS is.
Therefore it doesn't care what the mutability of the variable is. You do need mut if you iterate explicitly, or if you call some other "terminal" method (e.g. nth or try_fold or all), or if you want to iterate on the mutable reference (that's mostly useful for collections though), but not to hand off iterators to some other combinator method, or to a for loop.
A for loop takes self, if you will. Just as for_each does in fact.
Thanks to @Stargateur for giving me the solution. The .skip(1) takes ownership of afteriter and returns a new iterator that skips the first element. What was happening before was that ownership was lost at the .skip call, so the variable could not be used anymore (I am pretty sure).
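Putting the answers together, a small sketch of the working version might look like the following. The helper name adjacent_pairs is made up for illustration; it creates two independent iterators over the same string slice without copying the string.

```rust
// Pair every character with the one that follows it.
// Both iterators borrow the same &str; no string data is copied.
fn adjacent_pairs(s: &str) -> Vec<(char, char)> {
    let current = s.chars();
    // `skip` consumes the iterator and returns a new one, so bind its result.
    let next = s.chars().skip(1);
    // Neither binding needs `mut`: `zip` and `collect` take them by value.
    current.zip(next).collect()
}

fn main() {
    for (char1, char2) in adjacent_pairs("abc") {
        println!("{:?} {:?}", char1, char2);
    }
}
```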

Rust - create hashmap which uses part of the data it's storing from an iterator

I'm new to Rust and am trying to figure out how to create a HashMap of borrowed values from a Vec of data, but when I try to turn the Vec into a HashMap the ownership model fights me. I don't know how to accomplish this; maybe I'm just trying something that is against the Rust mentality.
For Example:
struct Data{
id: String,
other_value: String,
}
//inside a method somewhere
let data_array = load_data(); // returns a Vec<Data>
let mut hash = HashMap::new(); // HashMap<&String, &Data>
for item in data_array {
hash.insert(&item.id, &item);
}
As far as I know, there should be a way to populate the map like this, as the HashMap would be storing references to the original data. Or maybe I've just flat out misunderstood the docs... ¯_(ツ)_/¯
The key issue here is that you are consuming the Vec. for loops in Rust work over things that implement IntoIterator. Calling into_iter on a Vec moves the Vec into an iterator; the Vec itself no longer exists once this is done.
Therefore the items that you are looping through disappear at the end of each iteration, so those references end up referencing nonexistent data (dangling references). If you tried to use them, Bad Things Would Happen. Rust prevents you from shooting yourself in the foot like that, so you get an error telling you that the reference does not live long enough. The solution to make your code compile is very easy. Just add .iter() to the collection in the loop header, which will iterate through references rather than consume the Vec.
for item in data_array.iter() {
hash.insert(&item.id, item); //Note we don't need an `&` in front of item
}
I'm still relatively new to Rust, so this might not be right, but I think the following does what you want in a reusable way: a function that makes it easy to turn a collection into a map, using a closure to generate the keys:
fn map_by<I,K,V>(iterable: I, f: impl Fn(&V) -> K) -> HashMap<K,V>
where I: IntoIterator<Item = V>,
K: Eq + Hash
{
iterable.into_iter().map(|v| (f(&v), v)).collect()
}
Allowing you to say
map_by(data_array.iter(), |item| &item.id)
Here it is in the playground:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=87c0e4d1e68ccb6dd3f2c43ac9f318c7
Please nudge me in the right direction if I have this wrong.
Is there a function like this lying around in std?
So turns out you can borrow the iterator value by borrowing the collection (Vec). So the example above turns into:
for item in &data_array {
hash.insert(&item.id, item);
}
Notice the &data_array which turns item from Data type to &Data and allows you to use the borrowed value.
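For completeness, here is a self-contained sketch of the borrowing version. The function name index_by_id and the sample values are invented for illustration; the Data struct follows the one in the question.

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct Data {
    id: String,
    other_value: String,
}

// Build a map of references into `items`. The slice is only borrowed,
// so the caller keeps ownership of the Vec and the references stay valid.
fn index_by_id(items: &[Data]) -> HashMap<&String, &Data> {
    let mut hash = HashMap::new();
    for item in items {
        // `item` is already a `&Data` here because we iterate over a reference.
        hash.insert(&item.id, item);
    }
    hash
}

fn main() {
    let data_array = vec![
        Data { id: "a".to_string(), other_value: "first".to_string() },
        Data { id: "b".to_string(), other_value: "second".to_string() },
    ];
    let hash = index_by_id(&data_array);
    let key = String::from("b");
    println!("{:?}", hash.get(&key).map(|d| &d.other_value));
    // `data_array` is still usable because it was never moved.
    println!("{}", data_array.len());
}
```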

How can I iterate over a delimited string, accumulating state from previous iterations without explicitly tracking the state?

I want to produce an iterator over a delimited string such that each iteration returns the substring from the previous iteration extended by the delimiter and the next delimited part.
For example, given the string "ab:cde:fg", the iterator should return the following:
"ab"
"ab:cde"
"ab:cde:fg"
Simple Solution
A simple solution is to just iterate over the parts returned from splitting on the delimiter, keeping track of the previous path:
let mut state = String::new();
for part in "ab:cde:fg".split(':') {
if !state.is_empty() {
state.push_str(":");
}
state.push_str(part);
dbg!(&state);
}
The downside here is the need to explicitly keep track of the state with an extra mutable variable.
Using scan
I thought scan could be used to hide the state:
"ab:cde:fg"
.split(":")
.scan(String::new(), |state, x| {
if !state.is_empty() {
state.push_str(":");
}
state.push_str(x);
Some(&state)
})
.for_each(|x| { dbg!(x); });
However, this fails with the error:
cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
What is the problem with the scan version and how can it be fixed?
Why even build a new string?
You can get the indices of the : and use slices to the original string.
fn main() {
let test = "ab:cde:fg";
let strings = test
.match_indices(":") // get the positions of the `:`
.map(|(i, _)| &test[0..i]) // get the string to that position
.chain(std::iter::once(test)); // let's not forget about the entire string
for substring in strings {
println!("{:?}", substring);
}
}
(Permalink to the playground)
First of all, let us cheat and get your code to compile, so that we can inspect the issue at hand. We can do so by cloning the state. Also, let's add some debug message:
fn main() -> () {
"ab:cde:fg"
.split(":")
.scan(String::new(), |state, x| { // (1)
if !state.is_empty() {
state.push_str(":");
}
state.push_str(x);
eprintln!(">>> scan with {} {}", state, x);
Some(state.clone())
})
.for_each(|x| { // (2)
dbg!(x);
});
}
This results in the following output:
>>> scan with ab ab
[src/main.rs:13] x = "ab"
>>> scan with ab:cde cde
[src/main.rs:13] x = "ab:cde"
>>> scan with ab:cde:fg fg
[src/main.rs:13] x = "ab:cde:fg"
Note how the eprintln! and dbg! outputs are interleaved? That's the result of Iterator's laziness: each element passes through both closures before the next one is produced. In practice, this means that without the clone our intermediate String would be borrowed in two places:
in the anonymous function |state, x| in state (1)
in the anonymous function |x| in, well, x (2)
Both borrows would have to be alive at the same time, and the one in (1) is mutable. The mutable borrow ties any reference we hand out to a single call of the first closure, whereas the second closure still needs a live String after that call has returned. Even if we somehow managed to annotate lifetimes, we would just end up with an invalid borrow in (2), as the value is still borrowed as mutable.
The easy way out is a clone. The smarter way out uses match_indices and string slices.
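As a rough sketch of the match_indices approach packaged as a reusable iterator (the function name prefixes is hypothetical, and a single char delimiter is assumed), every item is a slice of the original string, so no state and no allocation are needed:

```rust
// Yield every prefix of `s` that ends just before a delimiter,
// followed by the whole string. All items borrow from `s`.
fn prefixes<'a>(s: &'a str, delim: char) -> impl Iterator<Item = &'a str> + 'a {
    s.match_indices(delim) // byte positions of each delimiter
        .map(move |(i, _)| &s[..i]) // prefix up to, not including, the delimiter
        .chain(std::iter::once(s)) // the full string comes last
}

fn main() {
    for part in prefixes("ab:cde:fg", ':') {
        println!("{:?}", part);
    }
}
```

Because the items borrow from the input rather than from a mutable accumulator, the lifetime conflict seen in the scan version does not arise.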

Is there any way to borrow a RefCell immutably and mutably at the same time?

I have a piece of code which needs to operate on a list. This list contains items which come from another source and need to be processed and eventually removed. The list is also passed along to multiple functions which decide whether to add or remove an item. I created an example code which reflects my issue:
use std::{cell::RefCell, rc::Rc};
pub fn foo() {
let list: Rc<RefCell<Vec<Rc<RefCell<String>>>>> = Rc::new(RefCell::new(Vec::new()));
list.borrow_mut()
.push(Rc::new(RefCell::new(String::from("ABC"))));
while list.borrow().len() > 0 {
let list_ref = list.borrow();
let first_item = list_ref[0].borrow_mut();
//item processing, needed as mutable
list.borrow_mut().remove(0);
}
}
This panics at runtime:
thread 'main' panicked at 'already borrowed: BorrowMutError', src/libcore/result.rs:997:5
I think I understand the problem: I have two immutable borrows and then a third which is mutable. According to the Rust docs, this is not allowed: either many immutable borrows or a single mutable one. Is there any way to get around this issue?
I have no idea what you are actually trying to achieve as you have failed to provide a minimal reproducible example, but I think you just mixed up the borrows of the list and the item in your data structure and that confused you in the first place.
Nonetheless the following code (which you can run in the playground) does what you have described above.
use std::{cell::RefCell, rc::Rc};
pub fn foo() {
let list = Rc::new(RefCell::new(Vec::new()));
let mut list = list.borrow_mut();
let item = Rc::new(RefCell::new(String::from("ABC")));
list.push(item);
println!("list: {:?}", list);
while let Some(item) = list.pop() {
println!("item: {:?}", item);
item.borrow_mut().push_str("DEF");
println!("item: {:?}", item);
}
println!("list: {:?}", list);
}
fn main() {
foo();
}
There are two tricks which I used here.
I borrowed the list only once and that borrow was a mutable one, which allowed me to add and remove items from it.
Because your description said you want to remove the items from the list anyway, I was able to iterate over the Vec with the pop or the remove methods (depending on the order in which you wish to get the items from the list). This means I didn't have to borrow the Vec for the scope of the loop (which I would otherwise have to do if I iterated over it).
There are other ways to remove an element based on some predicate. For example: Removing elements from a Vec based on some condition.
To actually answer your original question: there is no way to have an immutable and a mutable borrow at the same time safely. That is one of the core principles of Rust which makes it memory safe. Think about it, what kind of guarantee would immutability be if at the same time, under the hood, the data could actually change?
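If the list really has to stay behind a shared Rc<RefCell<...>> (for example because other functions hold a handle to it), one possible workaround is to keep every borrow as short as a single statement. The sketch below assumes the same nested types as the question; the names SharedList and drain_and_process are invented for illustration.

```rust
use std::{cell::RefCell, rc::Rc};

type SharedList = Rc<RefCell<Vec<Rc<RefCell<String>>>>>;

// Process and remove items one by one without ever holding an immutable
// and a mutable borrow of the list at the same time.
fn drain_and_process(list: &SharedList) -> Vec<String> {
    let mut processed = Vec::new();
    loop {
        // Immutable borrow of the list lasts only for this statement:
        // we clone the Rc handle of the first item, then the guard is dropped.
        let first = list.borrow().first().cloned();
        let first = match first {
            Some(item) => item,
            None => break,
        };
        // Mutably borrow the item, not the list.
        first.borrow_mut().push_str("DEF");
        processed.push(first.borrow().clone());
        // No other borrow of the list is alive here, so this cannot panic.
        list.borrow_mut().remove(0);
    }
    processed
}

fn main() {
    let list: SharedList = Rc::new(RefCell::new(Vec::new()));
    list.borrow_mut()
        .push(Rc::new(RefCell::new(String::from("ABC"))));
    println!("{:?}", drain_and_process(&list));
}
```

The borrows still never overlap; they are just scoped tightly enough that the runtime check in RefCell always passes.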

How to get a slice from an Iterator?

I started to use clippy as a linter. Sometimes, it shows this warning:
writing `&Vec<_>` instead of `&[_]` involves one more reference and cannot be
used with non-Vec-based slices. Consider changing the type to `&[...]`,
#[warn(ptr_arg)] on by default
I changed the parameter to a slice but this adds boilerplate on the call side. For instance, the code was:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
but now it is:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect::<Vec<_>>();
function(&names);
otherwise, I get the following error:
error: the trait `core::marker::Sized` is not implemented for the type
`[collections::string::String]` [E0277]
So I wonder if there is a way to convert an Iterator to a slice or avoid having to specify the collected type in this specific case.
"So I wonder if there is a way to convert an Iterator to a slice"
There is not.
An iterator only provides one element at a time, whereas a slice is about getting several elements at a time. This is why you first need to collect all the elements yielded by the Iterator into a contiguous array (Vec) before being able to use a slice.
The first obvious answer is not to worry about the slight overhead, though personally I would prefer placing the type hint next to the variable (I find it more readable):
let names: Vec<_> = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
Another option would be for function to take an Iterator instead (and an iterator of references, at that):
let names = args.arguments.iter().map(|arg| &arg.name);
function(names);
After all, iterators are more general, and you can always "realize" the slice inside the function if you need to.
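To illustrate the iterator-taking variant, here is a minimal sketch. The Argument struct and the join_names function are hypothetical stand-ins for the question's args.arguments and function; the point is only that one signature accepts both a mapped iterator and a borrowed Vec.

```rust
// Hypothetical stand-in for the elements of `args.arguments`.
struct Argument {
    name: String,
}

// Accepts anything that can be iterated as `&String`: a mapped iterator,
// a `&Vec<String>`, or a `&[String]`. No intermediate Vec is required.
fn join_names<'a, I>(names: I) -> String
where
    I: IntoIterator<Item = &'a String>,
{
    names
        .into_iter()
        .map(String::as_str)
        .collect::<Vec<_>>() // "realize" the items only where a slice is needed
        .join(", ")
}

fn main() {
    let arguments = vec![
        Argument { name: "x".to_string() },
        Argument { name: "y".to_string() },
    ];
    // No clone and no collect on the call side.
    println!("{}", join_names(arguments.iter().map(|arg| &arg.name)));
}
```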
"So I wonder if there is a way to convert an Iterator to a slice"
There is. (in applicable cases)
Got here searching "rust iter to slice", for my use-case, there was a solution:
fn main() {
// example struct
#[derive(Debug)]
struct A(u8);
let list = vec![A(5), A(6), A(7)];
// list_ref passed into a function somewhere ...
let list_ref: &[A] = &list;
let mut iter = list_ref.iter();
// consume some ...
let _a5: Option<&A> = iter.next();
// now want to eg. return a slice of the rest
let slice: &[A] = iter.as_slice();
println!("{:?}", slice); // [A(6), A(7)]
}
That said, .as_slice is defined on an iter of an existing slice, so the previous answerer was correct in that if you've got, eg. a map iter, you would need to collect it first (so there is something to slice from).
docs: https://doc.rust-lang.org/std/slice/struct.Iter.html#method.as_slice

Resources