Dealing with clone - rust

I'm trying to iterate over a VCF file and create vectors with the data to build a DataFrame.
However, the Rust compiler raises an error saying that the borrowed value does not live long enough.
I'm cloning the value because otherwise I would be borrowing the VCF record as immutable and as mutable at the same time.
I don't know how to proceed. Below are a snippet of my code and the error.
use flate2::read::MultiGzDecoder;
use std::fs::File;
use std::io::BufReader;
mod vcf_reader;
use vcf::{VCFReader, U8Vec, VCFHeaderFilterAlt, VCFError, VCFRecord};
use vcf_reader::reader::VCFSamples;
use polars_core::prelude::*;
use std::time::Instant;
fn main() -> Result<()> {
let now = Instant::now();
let mut reader = VCFReader::new(BufReader::new(File::open(
"C:\\Users\\Desktop\\rust_lectures\\vcf_reader\\vcf_reader\\*.vcf"
)?)).unwrap();
// prepare VCFRecord object
let mut vcf_record = VCFRecord::new(reader.header());
// read one record
let mut chromosome = Vec::new();
let mut position = Vec::new();
let mut id = Vec::new();
let mut reference = Vec::new();
let mut alternative = Vec::new();
let result: bool = reader.next_record(&mut vcf_record).unwrap();
loop {
let row = vcf_record.clone();
if result == false {
let df = df!(
"Chromosome" => chromosome,
"Position" => position,
"Id" => id,
"Reference" => reference,
"Alternative" => alternative,
);
println!("{:?}", df);
let elapsed = now.elapsed();
println!("Elapsed: {:.2?}", elapsed);
return Ok(())
} else {
chromosome.push(String::from_utf8_lossy(&row.chromosome));
position.push(&row.position);
id.push(String::from_utf8_lossy(&row.id[0]));
reference.push(String::from_utf8_lossy(&row.reference));
alternative.push(String::from_utf8_lossy(&row.alternative[0]));
}
reader.next_record(&mut vcf_record).unwrap();
}
}
error[E0597]: `row.chromosome` does not live long enough
--> src\main.rs:44:53
|
44 | chromosome.push(String::from_utf8_lossy(&row.chromosome));
| ----------------------------------------^^^^^^^^^^^^^^^^^^^^--
| | |
| | borrowed value does not live long enough
| borrow later used here
...
50 | }
| - `row.chromosome` dropped here while still borrowed

I'm cloning the value because
The problem is that you clone the record but then store references into that clone in your vectors. Since those references point at the cloned record, they are only valid until the end of the loop body, where row is dropped.
So either:
move the attributes out of the clone (essentially explode it)
or rather than working with a copy of the record, copy individual fields out of the base vcf_record
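Either way, you're also misusing from_utf8_lossy: it returns a Cow<str> rather than a String, because it avoids allocating if the input is valid UTF-8 (in that case it essentially just returns a reference to the original data). To keep the result beyond the life of the record, convert it to an owned String, for example with into_owned().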
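As a rough sketch of the second option (copying individual fields out of the record), the snippet below uses a hypothetical Record struct with Vec<u8> fields as a stand-in for VCFRecord. The field types are an assumption made for illustration and are not taken from the vcf crate. The point is only that into_owned() turns the Cow<str> into a String that the vector owns:

```rust
// Hypothetical stand-in for a VCF record. The field names mirror the
// question, but the types are an assumption for illustration only.
struct Record {
    chromosome: Vec<u8>,
    position: u64,
    reference: Vec<u8>,
}

/// Copies the fields out of each record into owned column vectors.
fn collect_columns(records: &[Record]) -> (Vec<String>, Vec<u64>, Vec<String>) {
    let mut chromosome = Vec::new();
    let mut position = Vec::new();
    let mut reference = Vec::new();
    for row in records {
        // `from_utf8_lossy` returns a `Cow<str>` that may borrow from `row`;
        // `into_owned` turns it into a `String` owned by the vector.
        chromosome.push(String::from_utf8_lossy(&row.chromosome).into_owned());
        // `u64` is `Copy`, so push the value rather than a reference to it.
        position.push(row.position);
        reference.push(String::from_utf8_lossy(&row.reference).into_owned());
    }
    (chromosome, position, reference)
}

fn main() {
    let records = vec![
        Record { chromosome: b"chr1".to_vec(), position: 100, reference: b"A".to_vec() },
        Record { chromosome: b"chr2".to_vec(), position: 250, reference: b"GT".to_vec() },
    ];
    let (chromosome, position, reference) = collect_columns(&records);
    println!("{:?} {:?} {:?}", chromosome, position, reference);
}
```

Because every column now holds owned values, nothing in the vectors borrows from the record, so the record can be reused or dropped on each iteration.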

Related

Read file with BufReader line by line and put in HashMap error borrowed value does not live long enough

I would like to read a file line by line and then process the words. I use HashMap and the entry API for that. However I get a 'borrowed value does not live long enough' error and am puzzled how to fix this.
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::collections::HashMap;

fn main() {

    let mut wmap: HashMap<_, i32> = HashMap::new();
    let file = File::open("book1.txt").unwrap();
    let reader = BufReader::new(file);
    for (_index, line) in reader.lines().enumerate() {
        let line = line.unwrap(); // Ignore errors.
        let words = line.split_whitespace();
        for word in words {
            println!("{}.:.{}", _index, word);
            *wmap.entry(word).or_insert(0) += 1;
        }
    }

}
The error I get is
error[E0597]: `line` does not live long enough
--> example-words.rs:12:17
|
12 | let words = line.split_whitespace();
| ^^^^^^^^^^^^^^^^^^^^^^^ borrowed value does not live long enough
...
15 | *wmap.entry(word).or_insert(0) += 1;
| ---------------- borrow later used here
16 | }
17 | }
| - `line` dropped here while still borrowed
error: aborting due to previous error
For more information about this error, try `rustc --explain E0597`.
I am aware that this is very similar to Borrowed Value Using BufReader and Lines in Extra Function. However I tried to do it all in one main function whereas the other example uses the extra function
read_lines(filename: &str) -> Result<Lines<BufReader<File>>, Error>
Thanks for any help
You are passing a borrowed string slice (&str) to a HashMap that "lives longer" than the borrowed value word. For this to work the borrowed value would need to have the same lifetime as your HashMap OR the HashMap needs to have ownership of the value inside of word. Here's an example:
use std::io;
use std::collections::HashMap;
fn main() {
let mut db = HashMap::new(); //initialize mutable hashmap outside of the loop
loop{
//I start a loop to take in multiple key val arguments from the
//command line but this means each iteration of the loop will
//clean up heap variables and any &str borrowing from these
//variables will be invalid after each iteration and the rust
// borrow checker will let us know if we are trying to access
// these invalid references, hence the compiler error
let mut string = String::new();
io::stdin().read_line(&mut string).unwrap();
let command: Vec<&str> = string.trim().split(" ").collect();
db.insert(command[0], command[1]);
}
}
I end up with the same compiler error:
error[E0597]: `string` does not live long enough
--> main.rs:9:30
|
9 | let command: Vec<&str> = string.trim().split(" ").coll...
| ^^^^^^^^^^^^^ borrowed value does not live long enough
10 | db.insert(command[0], command[1]);
| --------------------------------- borrow later used here
11 | }
| - `string` dropped here while still borrowed
This is because on every iteration of the loop, the String that the slices in my HashMap borrow from gets dropped (goes out of scope and is no longer valid), and Rust keeps us from having dangling references. Instead, change db.insert(command[0], command[1]) to db.insert(command[0].to_string(), command[1].to_string()). This converts each &str into a String, which is then owned by the HashMap and lives as long as the map does.
In your case:
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::collections::HashMap;
fn main() {
let mut wmap: HashMap<_, i32> = HashMap::new();
let file = File::open("book1.txt").unwrap();
let reader = BufReader::new(file);
for (_index, line) in reader.lines().enumerate() {
let line = line.unwrap(); // Ignore errors.
let words = line.split_whitespace();
for word in words {
println!("{}.:.{}", _index, word);
*wmap.entry(word.to_string()).or_insert(0) += 1;
}
}
}
this will compile and run :)
Hope that helps!
As commented by @cdhowie, you need to own the string using word.to_owned().
While it is not an error, Rust naming conventions say that an underscore in front of a variable implies that it is not used, so I renamed _index to index as well.
use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
let mut wmap: HashMap<_, i32> = HashMap::new();
let file = File::open("book1.txt").unwrap();
let reader = BufReader::new(file);
for (index, line) in reader.lines().enumerate() {
let line = line.unwrap(); // Ignore errors.
let words = line.split_whitespace();
for word in words {
println!("{}.:.{}", index, word);
*wmap.entry(word.to_owned()).or_insert(0) += 1;
}
}
}

Consume Vector inside closure without cloning

I have this data structure:
let bucket: HashMap<&str, Vec<&str>>;
Given
let cluster: Vec<&str>;
I want to extend cluster with the Vecs stored in bucket. I can guarantee that I will access each key-value pair only once and that every &str in cluster is a key in bucket.
use std::collections::HashMap;
fn main() {
let mut bucket: HashMap<&str, Vec<&str>> = HashMap::new();
bucket.insert("a", vec!["hello", "good morning"]);
bucket.insert("b", vec!["bye", "ciao"]);
bucket.insert("c", vec!["good"]);
let cluster = vec!["a", "b"];
let cluster2 = vec!["c"];
let mut clusters = [cluster, cluster2];
clusters.iter_mut().for_each(|cluster| {
// I don't like this clone
let tmp = cluster.clone();
let tmp = tmp.iter().flat_map(|seq| bucket[seq].
clone() // I really don't like this other clone
);
cluster.extend(tmp);
});
println!("{:?}", clusters);
}
This compiles but what I really want to do is drain the vector on bucket since I know I won't access it again.
let tmp = tmp.iter().flat_map(|seq| bucket.get_mut(seq).
unwrap().drain(..)
);
That gives me a compiler error:
error: captured variable cannot escape `FnMut` closure body
--> src/main.rs:13:45
|
4 | let mut bucket: HashMap<&str, Vec<&str>> = HashMap::new();
| ---------- variable defined here
...
13 | let tmp = tmp.iter().flat_map(|seq| bucket.get_mut(seq).
| - ^-----
| | |
| ___________________________________________|_variable captured here
| | |
| | inferred to be a `FnMut` closure
14 | | unwrap().drain(..)
| |______________________________^ returns a reference to a captured variable which escapes the closure body
|
= note: `FnMut` closures only have access to their captured variables while they are executing...
= note: ...therefore, they cannot allow references to captured variables to escape
Do I need to go unsafe? How? And more importantly, is it reasonable to want to remove that clone?
You can eliminate bucket[seq].clone() using std::mem::take():
let tmp = tmp.iter().flat_map(
|seq| std::mem::take(bucket.get_mut(seq).unwrap()),
);
That will transfer ownership of the existing Vec and leave an empty one in the hash map. Since the map remains in a well-defined state, this is 100% safe. Since the empty vector doesn't allocate, it is also efficient. And finally, since you can guarantee that you no longer access that key, it is correct. (Playground.)
As pointed out in the comments, an alternative is to remove the vector from the hash map, which also transfers ownership of the vector:
let tmp = tmp.iter().flat_map(|seq| bucket.remove(seq).unwrap());
The outer cluster.clone() cannot be replaced with take() because you need the old contents. The issue here is that you cannot extend the vector you are iterating over, which Rust doesn't allow in order to implement efficient pointer-based iteration. A simple and effective solution here would be to use indices instead of iteration (playground):
clusters.iter_mut().for_each(|cluster| {
let initial_len = cluster.len();
for ind in 0..initial_len {
let seq = cluster[ind];
cluster.extend(std::mem::take(bucket.get_mut(seq).unwrap()));
}
});
Of course, with indexing you pay the price of indirection and bound checks, but rustc/llvm is pretty good at removing both when it is safe to do so, and even if it doesn't, indexed access might still be more efficient than cloning. The only way to be sure whether this improves on your original code is to benchmark both versions on production data.

error[E0597]: borrowed value does not live long enough in While loop

I am really new to Rust and I am having trouble solving this error. It only happens when the while statement is present. Basically, I am asking for values from the console and storing them in a HashMap:
use std::collections::HashMap;
use std::io;
fn main() {
let mut customers = HashMap::new();
let mut next_customer = true;
while next_customer {
let mut input_string = String::new();
let mut temp_vec = Vec::with_capacity(3);
let mut vec = Vec::with_capacity(2);
println!("Insert new customer f.e = customer id,name,address:");
io::stdin().read_line(&mut input_string);
input_string = input_string.trim().to_string();
for s in input_string.split(",") {
temp_vec.push(s);
}
vec.push(temp_vec[1]);
vec.push(temp_vec[2]);
let mut key_value = temp_vec[0].parse::<i32>().unwrap();
customers.insert(key_value, vec);
next_customer = false;
}
println!("DONE");
}
The code results in the error
error[E0597]: `input_string` does not live long enough
--> src/main.rs:14:18
|
14 | for s in input_string.split(",") {
| ^^^^^^^^^^^^ borrowed value does not live long enough
...
20 | customers.insert(key_value, vec);
| --------- borrow later used here
21 | next_customer = false;
22 | }
| - `input_string` dropped here while still borrowed
As others have said the problem lies with the lifetime and/or type of the values getting put into the customers map.
customers.insert(key_value, vec);
| --------- borrow later used here
Often this happens when the compiler has decided to give an object a type that you didn't expect. To find out what it's doing you can force the type, and see how it complains. Changing the code to:
let mut customers: HashMap<(),()> = HashMap::new();
Gives us two relevant errors:
20 | customers.insert(key_value, vec);
| ^^^^^^^^^ expected `()`, found `i32`
...
20 | customers.insert(key_value, vec);
| ^^^ expected `()`, found struct `std::vec::Vec`
|
= note: expected unit type `()`
found struct `std::vec::Vec<&str>`
So the type that the compiler wants to give our customers object is HashMap<i32, Vec<&str>>
The problem is that the &str values borrow from input_string, which only lives inside the loop body since we don't store the Strings anywhere else, and they can't have a 'static lifetime because they come from user input.
This means we probably want a HashMap<i32,Vec<String>>.
Changing the code to use one of those gives us an error about vec not having the right type: It's getting deduced as a Vec<&str>, but we want a Vec<String>.
We have two options.
1. Convert the vec to the right type just before we insert it into the map, using customers.insert(key_value, vec.iter().map(|s| s.to_string()).collect()). (You may want to extract it to a variable for clarity.)
2. Explicitly change the type of vec to Vec<String>.
Option 1 "just works", while option 2 leads us down a path of making similar changes closer and closer to the read_line call.
Once you've decided on the fix in option 1, you can remove the manual type annotations that were added to work out the fix, if you find them overly noisy.
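To make the idea concrete, here is a minimal sketch that stores owned Strings in a HashMap<i32, Vec<String>>. The helper parse_customer and the fixed sample inputs are hypothetical and only stand in for the read_line call in the question. Unlike the original code, it returns None on malformed input instead of panicking:

```rust
use std::collections::HashMap;

/// Parses one "id,name,address" line into an id and owned fields.
/// Returns `None` if the line is malformed.
fn parse_customer(line: &str) -> Option<(i32, Vec<String>)> {
    let fields: Vec<&str> = line.trim().split(',').collect();
    if fields.len() != 3 {
        return None;
    }
    let id = fields[0].trim().parse::<i32>().ok()?;
    // Convert the borrowed slices into owned Strings so they outlive `line`.
    let rest: Vec<String> = fields[1..].iter().map(|s| s.to_string()).collect();
    Some((id, rest))
}

fn main() {
    let mut customers: HashMap<i32, Vec<String>> = HashMap::new();
    // Fixed inputs stand in for the `read_line` calls in the question.
    for input in ["1,Alice,Main Street 1\n", "2,Bob,Side Road 2\n"] {
        // `line` is dropped at the end of each iteration, like `input_string`.
        let line = input.to_string();
        if let Some((id, fields)) = parse_customer(&line) {
            customers.insert(id, fields);
        }
    }
    println!("{:?}", customers.get(&1));
}
```

The map only ever receives owned Strings, so it no longer matters that line is dropped at the end of each loop iteration.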
The issue is that you are passing around &str references into input_string, which gets dropped at the end of each iteration. One way to fix it is to trim and split the input string, convert each piece to an owned String, and then clone the pieces going into the other vector.
let temp_vec: Vec<String> = input_string.trim().split(",").map(|t| t.to_string()).collect();
vec.push(temp_vec[1].clone());
vec.push(temp_vec[2].clone());

swapping two entries of a HashMap

I have a simple HashMap; say HashMap<char, char>.
Is there a way to swap two elements in this hashmap using std::mem::swap (or any other method)?
Of course there is the simple way of getting the values with get and then replacing them with insert, but that would trigger the hasher twice (once for getting, then again for inserting), and I was looking for a way to side-step the second hasher invocation (more out of curiosity than for performance).
What I tried is this (in several versions, none of which worked; and as remarked in the comments, entry would not do what I expect even if I got this past the compiler):
use std::collections::HashMap;
use std::mem::swap;
let mut hash_map: HashMap<char, char> = HashMap::default();
hash_map.insert('A', 'Z');
hash_map.insert('B', 'Y');
swap(&mut hash_map.entry('A'), &mut hash_map.entry('B'));
Now the compiler complains (and I understand why it should):
error[E0499]: cannot borrow `hash_map` as mutable more than once at a time
--> tests.rs:103:42
|
103 | swap(&mut hash_map.entry('A'), &mut hash_map.entry('B'));
| ---- -------- ^^^^^^^^ second mutable borrow occurs here
| | |
| | first mutable borrow occurs here
| first borrow later used by call
Also, just getting the two values this way fails in more or less the same way:
let mut a_val = hash_map.get_mut(&'A').expect("failed to get A value");
let mut b_val = hash_map.get_mut(&'B').expect("failed to get B value");
swap(&mut a_val, &mut b_val);
Is there a way to simply swap two entries of a HashMap?
I can't see any safe way to do it:
use std::collections::HashMap;
fn main() {
let mut map = HashMap::new();
map.insert('A', 'Z');
map.insert('B', 'Y');
let a = map.get_mut(&'A').unwrap() as *mut char;
let b = map.get_mut(&'B').unwrap() as *mut char;
unsafe {
std::ptr::swap(a, b);
}
assert_eq!(map.get(&'A'), Some(&'Y'));
assert_eq!(map.get(&'B'), Some(&'Z'));
}
There is one completely safe way I can think of to do this, but it's super inefficient. What you want is to get two &mut values, which means borrowck needs to know they're non-overlapping. Missing a builtin along the lines of split_mut (or the collection being handled specially), the only way I see is to mutably iterate the entire collection, keep refs to the items you're interested in, and swap those:
let mut h = HashMap::new();
h.insert("a", "a");
h.insert("b", "b");
let mut it = h.iter_mut();
let e0 = it.next().unwrap();
let e1 = it.next().unwrap();
std::mem::swap(e0.1, e1.1);
println!("{:?}", h);
It requires a linear traversal of the map until you've found the entries whose values you want to swap, though. So even though this has the advantage of not hashing at all, edwardw's answer is probably more practical.
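The snippet above swaps whichever two entries the iterator yields first. A sketch of the same idea that picks the entries by key might look like the following. The helper name swap_values is hypothetical, and it still walks the whole map in the worst case:

```rust
use std::collections::HashMap;

/// Swaps the values stored under `k1` and `k2` using only safe code.
/// Returns `false` (and changes nothing) if either key is missing
/// or if both keys are the same.
fn swap_values<K: PartialEq, V>(map: &mut HashMap<K, V>, k1: &K, k2: &K) -> bool {
    let mut first: Option<&mut V> = None;
    let mut second: Option<&mut V> = None;
    // One linear pass; `iter_mut` hands out non-overlapping `&mut V`s,
    // so the borrow checker accepts holding two of them at once.
    for (k, v) in map.iter_mut() {
        if k == k1 {
            first = Some(v);
        } else if k == k2 {
            second = Some(v);
        }
    }
    match (first, second) {
        (Some(a), Some(b)) => {
            std::mem::swap(a, b);
            true
        }
        _ => false,
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert('A', 'Z');
    map.insert('B', 'Y');
    let swapped = swap_values(&mut map, &'A', &'B');
    println!("{} {:?} {:?}", swapped, map.get(&'A'), map.get(&'B'));
}
```

This trades the two hash lookups for a traversal, so it is mostly useful as a demonstration that the swap can be expressed without unsafe.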

Why does a variable holding the result of Vec::get_mut not need to be mutable?

I have the following code:
fn main() {
let mut vec = Vec::new();
vec.push(String::from("Foo"));
let mut row = vec.get_mut(0).unwrap();
row.push('!');
println!("{}", vec[0])
}
It prints out "Foo!", but the compiler tells me:
warning: variable does not need to be mutable
--> src/main.rs:4:9
|
4 | let mut row = vec.get_mut(0).unwrap();
| ----^^^
| |
| help: remove this `mut`
Surprisingly, removing the mut works. This raises a few questions:
Why does this work?
Why doesn't this work when I use vec.get instead of vec.get_mut, regardless of whether I use let or let mut?
Why doesn't vec work in the same way, i.e. when I use let vec = Vec::new(), why can't I call vec.push()?
vec.get_mut(0) returns an Option<&mut String>, so when you unwrap that value you have a mutable borrow of a String. Remember that the left side of a let statement is a pattern, so when your pattern is just a variable name you are essentially saying: match whatever is on the right and call it by that name. Thus row is bound to a &mut String, and you can already mutate the String through it. The mut in let mut row would only be needed to reassign row itself.
Here's a much simpler and more straightforward example to illustrate the case (which you can try in the playground):
fn main() {
let mut x = 55i32;
dbg!(&x);
let y = &mut x; // <-- y's type is `&mut i32`
*y = 12;
dbg!(&x);
}
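The other two questions follow from the same distinction between a mutable binding and a mutable reference. vec.get returns an Option<&String>, a shared reference, so nothing can be mutated through it no matter how the binding is declared. With let vec = Vec::new() the binding owns the Vec, and push takes &mut self, which can only be borrowed from a mutable binding. The sketch below (the function names are made up for illustration) shows when the binding itself does need mut:

```rust
/// Appends '!' through a mutable reference held in an immutable binding.
/// Panics if `words` is empty.
fn shout(words: &mut Vec<String>) {
    // `row` is not declared `mut`: we never reassign `row` itself,
    // we only mutate the String it points to.
    let row = words.get_mut(0).unwrap();
    row.push('!');
}

/// Here the binding does need `mut`, because `row` is re-pointed
/// at a different element. Panics if `words` has fewer than two elements.
fn mark_first_two(words: &mut Vec<String>) {
    let (head, tail) = words.split_at_mut(1);
    let mut row = &mut head[0];
    row.push('!');
    row = &mut tail[0]; // reassigning the binding requires `let mut`
    row.push('?');
}

fn main() {
    let mut words = vec![String::from("Foo"), String::from("Bar")];
    shout(&mut words);
    mark_first_two(&mut words);
    println!("{:?}", words);
}
```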

Resources