Cursor of HashMap records with RwLockGuard in Rust - rust

I am new to Rust, and I am trying to implement a simple, thread-safe memory key-value store, using a HashMap protected within a RwLock. My code looks like this:
use std::sync::{ Arc, RwLock, RwLockReadGuard };
use std::collections::HashMap;
use std::collections::hash_map::Iter;
type SimpleCollection = HashMap<String, String>;
struct Store(Arc<RwLock<SimpleCollection>>);
impl Store {
fn new() -> Store { return Store(Arc::new(RwLock::new(SimpleCollection::new()))) }
fn get(&self, key: &str) -> Option<String> {
let map = self.0.read().unwrap();
return map.get(&key.to_string()).map(|s| s.clone());
}
fn set(&self, key: &str, value: &str) {
let mut map = self.0.write().unwrap();
map.insert(key.to_string(), value.to_string());
}
}
So far, this code works OK. The problem is that I am trying to implement a scan() function, which returns a Cursor object that can be used to iterate over all the records. I want the Cursor object to hold a RwLockGuard, which is not released until the cursor itself is released (basically I don't want to allow modifications while a Cursor is alive).
I tried this:
use ...
type SimpleCollection = HashMap<String, String>;
struct Store(Arc<RwLock<SimpleCollection>>);
impl Store {
...
fn scan(&self) -> Cursor {
let guard = self.0.read().unwrap();
let iter = guard.iter();
return Cursor { guard, iter };
}
}
struct Cursor<'l> {
guard: RwLockReadGuard<'l, SimpleCollection>,
iter: Iter<'l, String, String>
}
impl<'l> Cursor<'l> {
fn next(&mut self) -> Option<(String, String)> {
return self.iter.next().map(|r| (r.0.clone(), r.1.clone()));
}
}
But that did not work, as I got this compilation error:
error[E0597]: `guard` does not live long enough
--> src/main.rs:24:20
|
24 | let iter = guard.iter();
| ^^^^^ borrowed value does not live long enough
25 | return Cursor { guard, iter };
26 | }
| - borrowed value only lives until here
|
note: borrowed value must be valid for the anonymous lifetime #1 defined on the method body at 22:5...
--> src/main.rs:22:5
|
22 | / fn scan(&self) -> Cursor {
23 | | let guard = self.0.read().unwrap();
24 | | let iter = guard.iter();
25 | | return Cursor { guard, iter };
26 | | }
| |_____^
Any ideas?

As mentioned in the comments, the problem is that structs generally can't be self-referential in Rust. The Cursor struct you are trying to construct contains both the MutexGuard and the iterator borrowing the MutexGuard, which is not possible (for good reasons – see the linked question).
The easiest fix in this case is to introduce a separate struct storing the MutexGuard, e.g.
struct StoreLock<'a> {
guard: RwLockReadGuard<'a, SimpleCollection>,
}
On the Store, we can then introduce a method returning a StoreLock
fn lock(&self) -> StoreLock {
StoreLock { guard: self.0.read().unwrap() }
}
and the StoreLock can expose the actual scan() method (and possibly others requiring a persistent lock):
impl<'a> StoreLock<'a> {
fn scan(&self) -> Cursor {
Cursor { iter: self.guard.iter() }
}
}
The Cursor struct itself only contains the iterator:
struct Cursor<'a> {
iter: Iter<'a, String, String>,
}
Client code first needs to obtain the lock, then get the cursor:
let lock = s.lock();
let cursor = lock.scan();
This ensures that the lock lives long enough to finish scanning.
Full code on the playground

Related

Hashmap multiple mutable borrow issue after reference drop

I am trying to pass around a HashMap which stores values through a set of nested enums/structs. The problem of multiple mutability happens during iteration, even all references should be dropped.
The general idea is to have a vector of values, iterate through them and simplify them, keeping track of them within the HashMap. There are two stages of simplification.
The general flow looks something like
run(Vec<ComplexVal>)
-for each val->
val.fix_complex(holder)
-for each `smp` SimpleVal in val->
basicval = Simplifier::step(smp, holder)
holder.insert("name", basicval)
But the problem is that the holder is borrowed mutably in each stage, and there isn't supposed to be any reference from the ComplexVal to the holder and since the borrowchecker doesn't like multiple borrows, it fails.
Full playground snippet: here
It happens in this snippet:
pub fn run(&mut self, mut vals: Vec<ComplexVal>) {
let mut holder = Holder{hold:HashMap::new()};
// .. setup holder code omitted
let len = vals.len();
for _ in 0..len {
let mut val = vals.remove(0); // remove from vec, should drop after running
println!("Running {:?}", val);
match val {
ComplexVal::Cmplx1(mut c) => {
c.fix_complex(&mut holder)
},
//... more cases of different types of values omitted for simplicity
}
// val *should* be dropped here, and therefore the mutable borrow of holder?
}
println!("Holder: {:?}", holder);
}
}
The only thing I can think of is that it somehow is related to the BasicVal::Ref(&BasicVal) value when created.
I need to return a reference of type &BasicVal so I can't use a regular fn() -> &BasicVal as the reference would be dangling, so I pass a ret value which is to be modified and used as the storage for the return value.
I have also tried just returning the enum BasicVal::Ref(&BasicVal), but run into the same mutability issues.
The example below is a much more simple version which (sort of) demonstrates the same error, just thought I'd include this context in case someone has another idea on how to implement this which wouldn't have these issues
Code (edited)
Updated playground link
Edit: I made a mistake in not needing the lifetimes of both holder and ret to explicitly be the same, so I have made an updated example for it
use std::borrow::BorrowMut;
///////////////////////////////
use std::cell::{RefCell, RefMut};
use std::collections::HashMap;
#[derive(Debug)]
enum BasicVal<'a> {
Ref(&'a BasicVal<'a>),
Val1(BasicStruct),
}
#[derive(Debug)]
struct Holder<'b> {
hold: HashMap<String, RefCell<BasicVal<'b>>>,
}
#[derive(Debug)]
struct BasicStruct {
val: i32,
}
impl<'a> BasicVal<'a> {
pub fn empty() -> Self { BasicVal::Val1(BasicStruct { val: 0 }) }
}
// must match sig of modify_val_ref
fn modify_val<'f>(holder: &'f mut Holder<'f>, mut ret: RefMut<BasicVal<'f>>) {
*ret = BasicVal::Val1(BasicStruct { val: 5 });
}
// must match sig of modify_val
fn modify_val_ref<'f>(holder: &'f mut Holder<'f>, mut ret: RefMut<BasicVal<'f>>) {
ret = holder.hold.get("reference_val").unwrap().borrow_mut();
}
fn do_modify<'f>(holder: &'f mut Holder<'f>) {
let mut v = RefCell::new(BasicVal::empty());
println!("Original {:?}", v);
modify_val(holder, v.borrow_mut());
holder.hold.insert("Data".to_string(), v);
println!("Modified {:?}", holder.hold.get("Data"));
}
pub fn test_dropborrow() {
let mut holder = Holder { hold: HashMap::new() };
holder.hold.insert(
"reference_val".to_string(),
RefCell::new(BasicVal::Val1(BasicStruct { val: 8 })),
);
do_modify(&mut holder);
}
pub fn main() {
test_dropborrow();
}
Edit: Using just the holder for a temp return value gives me a multiple mutable borrow issue, so that workaround doesn't work. I have also tried it with a RefCell with the same issue.
fn modify_val<'f>(holder: &'f mut Holder<'f>) {
holder.hold.insert("$return".to_string(), BasicVal::Val1(BasicStruct{val: 5}));
}
fn do_modify<'f>(holder: &'f mut Holder<'f>) {
modify_val(holder);
let mut v = holder.hold.remove("$return").unwrap();
holder.hold.insert("Data".to_string(), v);
println!("Modified {:?}", v);
}
Error:
935 | fn do_modify<'f>(holder: &'f mut Holder<'f>) {
| -- lifetime `'f` defined here
936 |
937 | modify_val(holder);
| ------------------
| | |
| | first mutable borrow occurs here
| argument requires that `*holder` is borrowed for `'f`
938 | let mut v = holder.hold.remove("$return").unwrap();
| ^^^^^^^^^^^ second mutable borrow occurs here
Any help is greatly appreciated!!!
Figured it out, essentially the BasicVal<'a> was causing Holder to mutably borrow itself in successive iterations of the loop, so removing the lifetime was pretty much the only solution

Borrowing errors whilst mutating in for loop

I am having issues with the borrow checker and temporary values in rust.
I'm hoping to find a solution to this specific problem as well as better learn how to handle this kind of situation in the future. I originally started off with a for_each but ran into issues terminating early.
I considered moving the logic of check_foo into update_foo, however this wouldn't work well for my real world solution; the MRE focuses on the compilation issues vs what I'm trying to achieve in the whole program.
Edit: Is there a way for me to achieve this in a purely functional approach?
I want to iterate over a range of numbers, updating a Vec<Foo> and potentially returning early with a value. Below is a minimal reproducible example of my code with the same errors:
I tried tried implementing as:
fn run<'a>(mut foos: Vec<Foo>) -> Vec<&'a u32> {
let mut bar: Vec<&u32> = vec![];
for num in 0..10 {
for foo in &mut foos {
update_foo(foo);
let checked_foo = check_foo(&foo);
if checked_foo.is_empty() {
bar = checked_foo;
break;
}
}
}
bar
}
/* `fn update_foo` and `fn check_foo` definitions same as below */
but this resulted in:
21 | for foo in &mut foos {
| ^^^^^^^^^ `foos` was mutably borrowed here in the previous iteration of the loop
To overcome this I added the use of Rc and RefCell to allow me to iterate over a reference whilst still being able to mutate:
#[derive(Clone, Debug, PartialEq)]
pub struct Foo {
updated: bool,
}
fn run<'a>(foos: Vec<Rc<RefCell<Foo>>>) -> Vec<&'a u32> {
let mut bar: Vec<&u32> = vec![];
for num in 0..10 {
for foo in &foos {
update_foo(&mut foo.borrow_mut());
let checked_foo = check_foo(&foo.borrow());
if checked_foo.is_empty() {
bar = checked_foo;
break;
}
}
}
bar
}
fn update_foo(foo: &mut Foo) {
foo.updated = true
}
fn check_foo(foo: &Foo) -> Vec<&u32> {
if foo.updated {
vec![&0, &1, &2]
} else {
vec![]
}
}
which results in:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:33:5
|
26 | let checked_foo = check_foo(&foo.borrow());
| ------------ temporary value created here
...
33 | bar
| ^^^ returns a value referencing data owned by the current function
error[E0515]: cannot return value referencing function parameter `foos`
--> src/main.rs:33:5
|
23 | for foo in &foos {
| ----- `foos` is borrowed here
...
33 | bar
| ^^^ returns a value referencing data owned by the current function
For more information about this error, try `rustc --explain E0515`.
I'm not entirely sure what you plan to do with this, but it seems to me like a few of the references you're using should be owned. Here's what I came up with.
#[derive(Clone, Debug, PartialEq)]
pub struct Foo {
updated: bool,
}
fn run(foos: &mut Vec<Foo>) -> Vec<u32> {
let mut bar: Vec<u32> = vec![];
for num in 0..10 {
for foo in foos.iter_mut() {
update_foo(foo);
let checked_foo = check_foo(&foo);
if checked_foo.is_empty() {
bar = checked_foo;
break;
}
}
}
bar
}
fn update_foo(foo: &mut Foo) {
foo.updated = true
}
fn check_foo(foo: &Foo) -> Vec<u32> {
if foo.updated {
vec![0, 1, 2]
} else {
vec![]
}
}
References should be used when you expect some other struct to own the objects you are referring to, but here you're constructing new vectors with new data, so you should keep the elements owned.

rust E0597: borrowed value does not live lnog enough

I am trying to rewrite an algorithm from javascript to rust. In the following code, I get borrowed value does not live long enough error at line number 17.
[dependencies]
scraper = "0.11.0"
use std::fs;
fn get_html(fname: &str) -> String {
fs::read_to_string(fname).expect("Something went wrong reading the file")
}
pub mod diff_html {
use scraper::{element_ref::ElementRef, Html};
pub struct DiffNode<'a> {
node_ref: ElementRef<'a>,
}
impl<'a> DiffNode<'a> {
fn from_html(html: &str) -> Self {
let doc = Self::get_doc(&html);
let root_element = doc.root_element().to_owned();
let diffn = Self {
node_ref: root_element,
};
diffn
}
fn get_doc(html: &str) -> Html {
Html::parse_document(html).to_owned()
}
}
pub fn diff<'a>(html1: &str, _html2: &str) -> DiffNode<'a> {
let diff1 = DiffNode::from_html(&html1);
diff1
}
}
fn main() {
//read strins
let filename1: &str = "test/test1.html";
let filename2: &str = "test/test2.html";
let html1: &str = &get_html(filename1);
let html2: &str = &get_html(filename2);
let diff1 = diff_html::diff(html1, html2);
//write html
//fs::write("test_outs/testx.html", html1).expect("unable to write file");
//written output file.
}
warning: unused variable: `diff1`
--> src\main.rs:43:9
|
43 | let diff1 = diff_html::diff(html1, html2);
| ^^^^^ help: if this is intentional, prefix it with an underscore: `_diff1`
|
= note: `#[warn(unused_variables)]` on by default
error[E0597]: `doc` does not live long enough
--> src\main.rs:17:32
|
14 | impl<'a> DiffNode<'a> {
| -- lifetime `'a` defined here
...
17 | let root_element = doc.root_element().to_owned();
| ^^^--------------------------
| |
| borrowed value does not live long enough
| assignment requires that `doc` is borrowed for `'a`
...
22 | }
| - `doc` dropped here while still borrowed
I want a detailed explanation/solution if possible.
root_element which is actually an ElementRef has reference to objects inside doc, not the actual owned object. The object doc here is created in from_html function and therefore owned by the function. Because doc is not returned, it is dropped / deleted from memory at the end of from_html function block.
ElementRef needs doc, the thing it is referencing to, to be alive when it is returned from the memory.
pub mod diff_html {
use scraper::{element_ref::ElementRef, Html};
pub struct DiffNode<'a> {
node_ref: ElementRef<'a>,
}
impl<'a> DiffNode<'a> {
fn from_html(html: &'a scraper::html::Html) -> Self {
Self {
node_ref: html.root_element(),
}
}
}
pub fn diff<'a>(html1_string: &str, _html2_string: &str) {
let html1 = Html::parse_document(&html1_string);
let diff1 = DiffNode::from_html(&html1);
// do things here
// at the end of the function, diff1 and html1 is dropped together
// this way the compiler doesn't yell at you
}
}
More or less you need to do something like this with diff function to let the HTML and ElementRef's lifetime to be the same.
This behavior is actually Rust's feature to guard values in memory so that it doesn't leak or reference not referencing the wrong memory address.
Also if you want to feel like operating detachable objects and play with reference (like java, javascript, golang) I suggest reading this https://doc.rust-lang.org/book/ch15-05-interior-mutability.html

Iterate through a whole file one character at a time

I'm new to Rust and I'm struggle with the concept of lifetimes. I want to make a struct that iterates through a file a character at a time, but I'm running into issues where I need lifetimes. I've tried to add them where I thought they should be but the compiler isn't happy. Here's my code:
struct Advancer<'a> {
line_iter: Lines<BufReader<File>>,
char_iter: Chars<'a>,
current: Option<char>,
peek: Option<char>,
}
impl<'a> Advancer<'a> {
pub fn new(file: BufReader<File>) -> Result<Self, Error> {
let mut line_iter = file.lines();
if let Some(Ok(line)) = line_iter.next() {
let char_iter = line.chars();
let mut advancer = Advancer {
line_iter,
char_iter,
current: None,
peek: None,
};
// Prime the pump. Populate peek so the next call to advance returns the first char
let _ = advancer.next();
Ok(advancer)
} else {
Err(anyhow!("Failed reading an empty file."))
}
}
pub fn next(&mut self) -> Option<char> {
self.current = self.peek;
if let Some(char) = self.char_iter.next() {
self.peek = Some(char);
} else {
if let Some(Ok(line)) = self.line_iter.next() {
self.char_iter = line.chars();
self.peek = Some('\n');
} else {
self.peek = None;
}
}
self.current
}
pub fn current(&self) -> Option<char> {
self.current
}
pub fn peek(&self) -> Option<char> {
self.peek
}
}
fn main() -> Result<(), Error> {
let file = File::open("input_file.txt")?;
let file_buf = BufReader::new(file);
let mut advancer = Advancer::new(file_buf)?;
while let Some(char) = advancer.next() {
print!("{}", char);
}
Ok(())
}
And here's what the compiler is telling me:
error[E0515]: cannot return value referencing local variable `line`
--> src/main.rs:37:13
|
25 | let char_iter = line.chars();
| ---- `line` is borrowed here
...
37 | Ok(advancer)
| ^^^^^^^^^^^^ returns a value referencing data owned by the current function
error[E0597]: `line` does not live long enough
--> src/main.rs:49:34
|
21 | impl<'a> Advancer<'a> {
| -- lifetime `'a` defined here
...
49 | self.char_iter = line.chars();
| -----------------^^^^--------
| | |
| | borrowed value does not live long enough
| assignment requires that `line` is borrowed for `'a`
50 | self.peek = Some('\n');
51 | } else {
| - `line` dropped here while still borrowed
error: aborting due to 2 previous errors
Some errors have detailed explanations: E0515, E0597.
For more information about an error, try `rustc --explain E0515`.
error: could not compile `advancer`.
Some notes:
The Chars iterator borrows from the String it was created from. So you can't drop the String while the iterator is alive. But that's what happens in your new() method, the line variable owning the String disappears while the iterator referencing it is stored in the struct.
You could also try storing the current line in the struct, then it would live long enough, but that's not an option – a struct cannot hold a reference to itself.
Can you make a char iterator on a String that doesn't store a reference into the String? Yes, probably, for instance by storing the current position in the string as an integer – it shouldn't be the index of the char, because chars can be more than one byte long, so you'd need to deal with the underlying bytes yourself (using e.g. is_char_boundary() to take the next bunch of bytes starting from your current index that form a char).
Is there an easier way? Yes, if performance is not of highest importance, one solution is to make use of Vec's IntoIterator instance (which uses unsafe magic to create an object that hands out parts of itself) :
let char_iter = file_buf.lines().flat_map(|line_res| {
let line = line_res.unwrap_or(String::new());
line.chars().collect::<Vec<_>>()
});
Note that just returning line.chars() would have the same problem as the first point.
You might think that String should have a similar IntoIterator instance, and I wouldn't disagree.

Create an iterator and put it into a new struct without bothering the borrow-checker [duplicate]

This question already has answers here:
Is there any way to return a reference to a variable created in a function?
(5 answers)
Closed 3 years ago.
I am trying to create a lexical analyzer which uses itertools::PutBack to make an iterator over the characters in a String. I intend to store the pushback iterator in a struct and delegate methods to it so that I can categorize the characters by an enum, which will then be passed to a state machine at the core of the lexical analyzer (not yet written).
The borrow-checker is not happy with me. Method ParserEventIterator::new near the bottom of the listing causes the error. How do I define the lifetimes or borrowing so that I can get this to compile? Or what Rustic data structure design should I use in its stead?
Ultimately, I would like this to implement the appropriate traits to become a proper iterator. (Newbie to Rust. Prior to this, I have programmed in 28 languages, but this one has me stumped.)
Here is a code sample:
extern crate itertools;
use itertools::put_back;
use std::fmt::Display;
use std::fmt::Formatter;
use std::fmt::Result;
pub enum ParserEvent {
Letter(char),
Digit(char),
Other(char),
}
impl ParserEvent {
fn new(c: char) -> ParserEvent {
match c {
'a'...'z' | 'A'...'Z' => ParserEvent::Letter(c),
'0'...'9' => ParserEvent::Digit(c),
_ => ParserEvent::Other(c),
}
}
}
impl Display for ParserEvent {
fn fmt(&self, f: &mut Formatter) -> Result {
let mut _ctos = |c: char| write!(f, "{}", c.to_string());
match self {
ParserEvent::Letter(letter) => _ctos(*letter),
ParserEvent::Digit(digit) => _ctos(*digit),
ParserEvent::Other(o) => _ctos(*o),
}
}
}
// ParserEventIterator
// Elements ('e) must have lifetime longer than the iterator ('i).
pub struct ParserEventIterator<'i, 'e: 'i> {
char_iter: &'i mut itertools::PutBack<std::str::Chars<'e>>,
}
impl<'i, 'e: 'i> ParserEventIterator<'i, 'e> {
fn new(s: &'e std::string::String) -> ParserEventIterator<'i, 'e> {
// THIS NEXT LINE IS THE LINE WITH THE PROBLEM!!!
ParserEventIterator {
char_iter: &mut put_back(s.chars()),
}
}
fn put_back(&mut self, e: ParserEvent) -> () {
if let Some(c) = e.to_string().chars().next() {
self.char_iter.put_back(c);
}
}
}
impl<'i, 'e: 'i> Iterator for ParserEventIterator<'i, 'e> {
type Item = ParserEvent;
fn next(&mut self) -> Option<ParserEvent> {
match self.char_iter.next() {
Some(c) => Some(ParserEvent::new(c)),
None => None,
}
}
}
fn main() {
let mut _i = ParserEventIterator::new(&String::from("Hello World"));
}
On the Rust Playground
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:43:9
|
43 | / ParserEventIterator {
44 | | char_iter: &mut put_back(s.chars()),
| | ------------------- temporary value created here
45 | | }
| |_________^ returns a value referencing data owned by the current function
Well, the compiler is almost telling you the solution by reflecting to the obvious problem: you can't have a borrow which doesn't live long enough, i.e. the borrow would point to a nonexistent location after the stack memory of the function has been destroyed.
This would happen because the borrow is referencing an object (in this case an itertools::struct::PutBack instance) that has been newly created within the function body. This instance gets destroyed at the end of the function along with all the references to it. So the compiler is preventing you to have a so called dangling pointer.
Thus, instead of borrowing you should move the PutBack instance into your struct:
// ...
pub struct ParserEventIterator<'e> {
char_iter: itertools::PutBack<std::str::Chars<'e>>
}
impl<'e> ParserEventIterator<'e> {
fn new(s: &'e std::string::String) -> ParserEventIterator<'e> {
ParserEventIterator { char_iter: put_back(s.chars()) }
}
// ...
}

Resources