Several managed closures with shared state? - rust

I am trying to implement a Rust wrapper for Expat XML parser. I wrapped the start_element, end_element callbacks and they work fine in simple cases (e.g. just counting XML elements) as follows:
struct Expat {
parser: expat::XML_Parser
}
type StartHandler = #fn(tag: &str, attrs: &[~str]);
type EndHandler = #fn(tag: &str);
type TextHandler = #fn(text: &str);
struct Handlers {
start_handler: StartHandler,
end_handler: EndHandler,
text_handler: TextHandler
}
impl Expat {
pub fn handlers(&self, start_handler: StartHandler, end_handler: EndHandler, text_handler: TextHandler) {
let handlers = #Handlers {
start_handler: start_handler,
end_handler: end_handler,
text_handler: text_handler
};
// How to do this properly?
unsafe { cast::bump_box_refcount(handlers) };
expat::XML_SetUserData(self.parser, unsafe { cast::transmute(&*handlers) });
}
I can pass simple managed closures to handlers() and have them update #mut uint values.
Now I want to maintain the current XPath across callbacks and am having problems:
let mut xpath: ~[~str] = ~[];
let xpath_start_handler: #fn(&str, &[~str]) = |tag: &str, _attrs: &[~str]| {
vec::push(&xpath, tag.to_owned());
println(fmt!(" start: %?", xpath));
};
let xpath_end_handler: #fn(&str) = |tag: &str| {
println(fmt!(" end: %?", xpath));
let top = vec::pop(&xpath);
if top != tag.to_owned() {
fail!(fmt!("expected end tag: %s, received end tag: %s", top, tag));
}
};
let xpath_text_handler: #fn(&str) = |_text: &str| {
};
expat.handlers(xpath_start_handler, xpath_end_handler, xpath_text_handler);
The compiler says that the unique vector xpath was moved into the xpath_start_handler closure and cannot be accessed in xpath_end_closure.
So my question is what is the best way to maintain mutable state across many managed closures?

Shared boxes should be managed, not unique:
let state: #mut ~[~str] = #mut ~[];
let push: #fn(~str) = |x| {
vec::push(state, x);
};
let pop: #fn() -> ~str = || {
vec::pop(state)
};
let count: #fn() -> uint = || {
(&*state).len()
};
push(~"ho");
push(~"hey");
println(fmt!("%?", count()));
println(pop());
println(pop());
println(fmt!("%?", count()));
Also, mutable unique boxes work a bit differently than mutable managed boxes.

Related

Iterating over a vector gives me a different value than what is inside the vector Rust

I have been using Petgraph recently to make simple graphs with Structs for nodes and custom edges, but I have come across a problem which I am unsure if it comes from the library or Rust.
I have a graph, in which I have multiple nodes, each nodes have a name. I then put all of the index of the node (with type NodeIndex) in a vector, since Petgraph doesn't have a function to give all the nodes from a graph. I want to then create a function that given a string, it returns the index of the node that matches the name.
My problem is that somehow the type in the vector containing the nodes seems to change. I store it as NodeIndex yet the types somehow change by themselves to u32 without me changing anything. Since it changes automatically, I can't pass the values inside Petgraph functions since they require NodeIndex as inputs and not u32.
The code following is what I have so far and the problem arises in the function find_node_index_with_name where the types seem to change even though I pass a vector of NodeIndex as input so when I iterate over it, I should also get NodeIndex back.
use petgraph::adj::NodeIndex;
use petgraph::stable_graph::StableGraph;
use petgraph::dot::Dot;
#[derive(Clone,Debug,Default)]
struct ControlBloc
{
name:String,
value:u32,
}
fn create_bloc(name:String,value:u32) -> ControlBloc
{
ControlBloc
{
name,
value,
}
}
fn find_node_index_with_name(gr:StableGraph<ControlBloc,u32> , nodes:Vec<NodeIndex> , name_search:String) -> Option<NodeIndex>
{
for i in 0..nodes.len()
{
if gr.node_weight(nodes[i]).unwrap().name == name_search
{
return nodes[i];
}
}
return None;
}
fn main() {
let mut graph = StableGraph::<ControlBloc,u32>::new();
let m = create_bloc(String::from("Main"),10);
let b1 = create_bloc(String::from("sub1"),20);
let b2 = create_bloc(String::from("sub2"),30);
let main = graph.add_node(m);
let sub1 = graph.add_node(b1);
let sub2 = graph.add_node(b2);
let all_nodes = vec![main,sub1,sub2];
println!("{:?}",find_node_index_with_name(graph, all_nodes, String::from("Main")));
}
I am a bit stumped as to why the types change.
Thank you for any inputs!
graph.add_node() returns a petgraph::graph::NodeIndex.
But you used petgraph::adj::NodeIndex which appears to be a different type (don't ask me why), thus the type mismatch.
I took the liberty to change a bit your code in order to use references where you used owned values.
use petgraph::graph::NodeIndex; // graph not adj
use petgraph::stable_graph::StableGraph;
#[derive(Clone, Debug, Default)]
struct ControlBloc {
name: String,
value: u32,
}
fn create_bloc(
name: String,
value: u32,
) -> ControlBloc {
ControlBloc { name, value }
}
fn find_node_index_with_name(
gr: &StableGraph<ControlBloc, u32>,
nodes: &[NodeIndex],
name_search: &str,
) -> Option<NodeIndex> {
nodes
.iter()
.map(|n| *n)
.find(|n| gr.node_weight(*n).unwrap().name == name_search)
/*
for i in 0..nodes.len() {
if gr.node_weight(nodes[i]).unwrap().name == name_search {
return Some(nodes[i]);
}
}
None
*/
}
fn main() {
let mut graph = StableGraph::<ControlBloc, u32>::new();
let m = create_bloc(String::from("Main"), 10);
let b1 = create_bloc(String::from("sub1"), 20);
let b2 = create_bloc(String::from("sub2"), 30);
let main = graph.add_node(m);
let sub1 = graph.add_node(b1);
let sub2 = graph.add_node(b2);
let all_nodes = vec![main, sub1, sub2];
for n in ["Main", "sub1", "sub2"] {
println!("{:?}", find_node_index_with_name(&graph, &all_nodes, n));
}
}
/*
Some(NodeIndex(0))
Some(NodeIndex(1))
Some(NodeIndex(2))
*/

Maintaining a mutable reference to struct in HashMap

Is it possible to borrow a mutable reference to the contents of a HashMap and use it for an extended period of time without impeding read-only access?
This is for trying to maintain a window into the state of various components in a system that are running independently (via Tokio) and need to be monitored.
As an example:
use std::sync::Arc;
use std::collections::HashMap;
struct Container {
running : bool,
count : u8
}
impl Container {
fn run(&mut self) {
for i in 1..100 {
self.count = i;
}
self.running = false;
}
}
fn main() {
let mut map = HashMap::new();
let mut container = Arc::new(
Box::new(
Container {
running: true,
count: 0
}
)
);
map.insert(0, container.clone());
container.run();
map.remove(&0);
}
This is for a Tokio-driven program where multiple operations will be happening asynchronously and visibility into the overall state of them is required.
There's this question where a temporary mutable reference can be borrowed, but that won't work as the run() function needs time to complete.
Based on suggestions from Jmb and Stargateur reworked this to use a RwLock internally. These internals could be reworked by having methods that manipulate them, but the basics are here:
use std::sync::Arc;
use std::sync::RwLock;
use std::collections::HashMap;
#[derive(Debug)]
struct ContainerState {
running : bool,
count : u8
}
struct Container {
state : Arc<RwLock<ContainerState>>
}
impl Container {
fn run(&self) {
for i in 1..100 {
let mut state = self.state.write().unwrap();
state.count = i;
}
{
let mut state = self.state.write().unwrap();
state.running = false;
}
}
}
fn main() {
let mut map = HashMap::new();
let state = Arc::new(
RwLock::new(
ContainerState {
running: true,
count: 0
}
)
);
map.insert(0, state);
let container = Container {
state: map[&0].clone()
};
container.run();
println!("Final state: {:?}", map[&0]);
map.remove(&0);
}
Where the key thing I was missing is you can have a mutable reference or multiple immutable references, and they're mutually exclusive. My initial understanding was that these two limits were independent.

Rust: Implement AVL Tree and error: thread 'main' panicked at 'already borrowed: BorrowMutError'

I have the following tree structure:
use std::cell::RefCell;
use std::rc::Rc;
use std::cmp;
use std::cmp::Ordering;
type AVLTree<T> = Option<Rc<RefCell<TreeNode<T>>>>;
#[derive(Debug, PartialEq, Clone)]
struct TreeSet<T: Ord> {
root: AVLTree<T>,
}
impl<T: Ord> TreeSet<T> {
fn new() -> Self {
Self {
root: None
}
}
fn insert(&mut self, value: T) -> bool {
let current_tree = &mut self.root;
while let Some(current_node) = current_tree {
let node_key = &current_node.borrow().key;
match node_key.cmp(&value) {
Ordering::Less => { let current_tree = &mut current_node.borrow_mut().right; },
Ordering::Equal => {
return false;
}
Ordering::Greater => { let current_tree = &mut current_node.borrow_mut().left; },
}
}
*current_tree = Some(Rc::new(RefCell::new(TreeNode {
key: value,
left: None,
right: None,
parent: None
})));
true
}
}
#[derive(Clone, Debug, PartialEq)]
struct TreeNode<T: Ord> {
pub key: T,
pub parent: AVLTree<T>,
left: AVLTree<T>,
right: AVLTree<T>,
}
fn main() {
let mut new_avl_tree: TreeSet<u32> = TreeSet::new();
new_avl_tree.insert(3);
new_avl_tree.insert(5);
println!("Tree: {:#?}", &new_avl_tree);
}
Building with cargo build is fine, but when I run cargo run, I got the below error:
thread 'main' panicked at 'already borrowed: BorrowMutError', src\libcore\result.rs:1165:5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace. error: process didn't
exit successfully: target\debug\avl-tree.exe (exit code: 101)
If i just call insert(3), it will be fine and my tree gets printed correctly. However, if I insert(5) after insert(3), I will get that error.
How do I fix that?
Manually implementing data structures such as linked list, tree, graph are not task for novices, because of memory safety rules in language. I suggest you to read Too Many Linked Lists tutorial, which discusses how to implement safe and unsafe linked lists in Rust right way.
Also read about name shadowing.
Your error is that inside a cycle you try to borrow mutable something which is already borrowed as immutable.
let node_key = &current_node.borrow().key; // Borrow as immutable
match node_key.cmp(&value) {
Ordering::Less => { let current_tree = &mut current_node.borrow_mut().right; }, // Create a binding which will be immediately deleted and borrow as mutable.
And I recommend you to read Rust book to learn rust.
First let us correct your algorithm. The following lines are incorrect:
let current_tree = &mut current_node.borrow_mut().right;
...
let current_tree = &mut current_node.borrow_mut().left;
Both do not reassign a value to current_tree but create a new (unused) one (#Inline refers to it as Name shadowing). Remove the let and make current_tree mut.
Now we get a compiler error temporary value dropped while borrowed. Probably the compiler error message did mislead you. It tells you to use let to increase the lifetime, and this would be right if you used the result in the same scope, but no let can increase the lifetime beyond the scope.
The problem is that you cannot pass out a reference to a value owned by a loop (as current_node.borrow_mut.right). So it would be better to use current_tree as owned variable. Sadly this means that many clever tricks in your code will not work any more.
Another problem in the code is the multiple borrow problem (your original runtime warning is about this). You cannot call borrow() and borrow_mut() on the same RefCell without panic(that is the purpose of RefCell).
So after finding the problems in your code, I got interested in how I would write the code. And now that it is written, I thought it would be fair to share it:
fn insert(&mut self, value: T) -> bool {
if let None = self.root {
self.root = TreeSet::root(value);
return true;
}
let mut current_tree = self.root.clone();
while let Some(current_node) = current_tree {
let mut borrowed_node = current_node.borrow_mut();
match borrowed_node.key.cmp(&value) {
Ordering::Less => {
if let Some(next_node) = &borrowed_node.right {
current_tree = Some(next_node.clone());
} else {
borrowed_node.right = current_node.child(value);
return true;
}
}
Ordering::Equal => {
return false;
}
Ordering::Greater => {
if let Some(next_node) = &borrowed_node.left {
current_tree = Some(next_node.clone());
} else {
borrowed_node.left = current_node.child(value);
return true;
}
}
};
}
true
}
//...
trait NewChild<T: Ord> {
fn child(&self, value: T) -> AVLTree<T>;
}
impl<T: Ord> NewChild<T> for Rc<RefCell<TreeNode<T>>> {
fn child(&self, value: T) -> AVLTree<T> {
Some(Rc::new(RefCell::new(TreeNode {
key: value,
left: None,
right: None,
parent: Some(self.clone()),
})))
}
}
One will have to write the two methods child(value:T) and root(value:T) to make this compile.

Struct property accessable from method but not from outside

I'm trying to build a basic web crawler in Rust, which I'm trying to port to html5ever. As of right now, I have a function with a struct inside that is supposed to return a Vec<String>. It gets this Vec from the struct in the return statement. Why does it always return an empty vector? (Does it have anything to do with the lifetime parameters?)
fn find_urls_in_html<'a>(
original_url: &Url,
raw_html: String,
fetched_cache: &Vec<String>,
) -> Vec<String> {
#[derive(Clone)]
struct Sink<'a> {
original_url: &'a Url,
returned_vec: Vec<String>,
fetched_cache: &'a Vec<String>,
}
impl<'a> TokenSink for Sink<'a> {
type Handle = ();
fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> {
trace!("token {:?}", token);
match token {
TagToken(tag) => {
if tag.kind == StartTag && tag.attrs.len() != 0 {
let _attribute_name = get_attribute_for_elem(&tag.name);
if _attribute_name == None {
return TokenSinkResult::Continue;
}
let attribute_name = _attribute_name.unwrap();
for attribute in &tag.attrs {
if &attribute.name.local != attribute_name {
continue;
}
trace!("element {:?} found", tag);
add_urls_to_vec(
repair_suggested_url(
self.original_url,
(&attribute.name.local, &attribute.value),
),
&mut self.returned_vec,
&self.fetched_cache,
);
}
}
}
ParseError(error) => {
warn!("error parsing html for {}: {:?}", self.original_url, error);
}
_ => {}
}
return TokenSinkResult::Continue;
}
}
let html = Sink {
original_url: original_url,
returned_vec: Vec::new(),
fetched_cache: fetched_cache,
};
let mut byte_tendril = ByteTendril::new();
{
let tendril_push_result = byte_tendril.try_push_bytes(&raw_html.into_bytes());
if tendril_push_result.is_err() {
warn!("error pushing bytes to tendril: {:?}", tendril_push_result);
return Vec::new();
}
}
let mut queue = BufferQueue::new();
queue.push_back(byte_tendril.try_reinterpret().unwrap());
let mut tok = Tokenizer::new(html.clone(), std::default::Default::default()); // default default! default?
let feed = tok.feed(&mut queue);
return html.returned_vec;
}
The output ends with no warning (and a panic, caused by another function due to this being empty). Can anyone help me figure out what's going on?
Thanks in advance.
When I initialize the Tokenizer, I use:
let mut tok = Tokenizer::new(html.clone(), std::default::Default::default());
The problem is that I'm telling the Tokenizer to use html.clone() instead of html. As such, it is writing returned_vec to the cloned object, not html. Changing a few things, such as using a variable with mutable references, fixes this problem.

What's an idiomatic way to delete a value from HashMap if it is empty?

The following code works, but it doesn't look nice as the definition of is_empty is too far away from the usage.
fn remove(&mut self, index: I, primary_key: &Rc<K>) {
let is_empty;
{
let ks = self.data.get_mut(&index).unwrap();
ks.remove(primary_key);
is_empty = ks.is_empty();
}
// I have to wrap `ks` in an inner scope so that we can borrow `data` mutably.
if is_empty {
self.data.remove(&index);
}
}
Do we have some ways to drop the variables in condition before entering the if branches, e.g.
if {ks.is_empty()} {
self.data.remove(&index);
}
Whenever you have a double look-up of a key, you need to think Entry API.
With the entry API, you get a handle to a key-value pair and can:
read the key,
read/modify the value,
remove the entry entirely (getting the key and value back).
It's extremely powerful.
In this case:
use std::collections::HashMap;
use std::collections::hash_map::Entry;
fn remove(hm: &mut HashMap<i32, String>, index: i32) {
if let Entry::Occupied(o) = hm.entry(index) {
if o.get().is_empty() {
o.remove_entry();
}
}
}
fn main() {
let mut hm = HashMap::new();
hm.insert(1, String::from(""));
remove(&mut hm, 1);
println!("{:?}", hm);
}
I did this in the end:
match self.data.entry(index) {
Occupied(mut occupied) => {
let is_empty = {
let ks = occupied.get_mut();
ks.remove(primary_key);
ks.is_empty()
};
if is_empty {
occupied.remove();
}
},
Vacant(_) => unreachable!()
}

Resources