Dealing with a mutable counter variable in closure - rust

I have written a program to visit directories which is based
on the example on this page.
When I compile it, the compile displays the following 'note':
previous borrow of file_counter occurs here due to use in closure;
How can I display file_counter's value?
Is there a better (ie, more functional-like) way to count displayed files, in this program,
perhaps a non-mutable variable and/or recursion?
Many thanks.
fn main() {
let mut file_counter = 0i;
let display_path_closure = |path: &Path| {
file_counter += 1;
println!("{}) path = {}", file_counter, path.display());
};
let path = Path::new("z:/abc");
let _ = match visit_dirs(&path, display_path_closure) {
Err(e) => println!("error: {}", e),
Ok(_) => println!("Counter: {}", file_counter)
};
}
fn visit_dirs(dir: &Path, cb: |&Path|) -> io::IoResult<()> {
if dir.is_dir() {
let contents = try!(fs::readdir(dir));
for entry in contents.iter() {
if entry.is_dir() {
try!(visit_dirs(entry, |p| cb(p)));
} else {
cb(entry);
}
}
Ok(())
} else {
Err(io::standard_error(io::InvalidInput))
}
}

You can get around this by slight restructuring, putting the closure into an inner block:
fn main() {
let path = Path::new("z:/abc");
let mut file_counter = 0i;
let result = {
let display_path_closure = |path: &Path| {
file_counter += 1;
println!("{}) path = {}", file_counter, path.display());
};
visit_dirs(&path, display_path_closure)
};
let _ = match result {
Err(e) => println!("error: {}", e),
Ok(_) => println!("Counter: {}", file_counter)
};
}
As for why it happens, it is because closure captures all its environment by unique reference (mutable in your case), as if it is declared like this (pretending for a moment that closures capture their environment by value; in fact that's how unboxed closures work):
let mut file_counter = 0i;
let file_counter_ref = &mut file_counter;
// file_counter_ref is a plain pointer so it is copied into the closure,
// not taken by reference itself
let display_path_closure = |path: &Path| {
*file_counter_ref += 1;
println!("{}) path = {}", *file_counter_ref, path.display());
};
So file_counter_ref reference lasts to the end of the block in which closure is defined. In your case it is the whole main function starting from the closure declaration. I agree, this may be surprising and I certainly would also think that closure environment borrows die with the closure (e.g. when the closure is moved into the function and this function returns), but that's how things are now.
The situation with closures in Rust is currently unstable: unboxed closures have just been added to the language, so old boxed closures (like the one in this example) will soon go away; moreover, borrow checker is also being improved. These features may interact in complex ways, so maybe your original example will become possible soon :) (just a speculation, of course)

Related

Sharing state between threads with notify-rs

i'm new to rust.
I'm trying to write a file_sensor that will start a counter after a file is created. The plan is that after an amount of time, if a second file is not received the sensor will exit with a zero exit code.
I could write the code to continue that work but i feel the code below illustrates the problem (i have also missed for example the post function referred to)
I have been struggling with this problem for several hours, i've tried Arc and mutex's and even global variables.
The Timer implementation is Ticktock-rs
I need to be able to either get heartbeat in the match body for EventKind::Create(CreateKind::Folder) or file_count in the loop
The code i've attached here runs but file_count is always zero in the loop.
use std::env;
use std::path::Path;
use std::{thread, time};
use std::process::ExitCode;
use ticktock::Timer;
use notify::{
Watcher,
RecommendedWatcher,
RecursiveMode,
Result,
event::{EventKind, CreateKind, ModifyKind, Event}
};
fn main() -> Result<()> {
let now = time::Instant::now();
let mut heartbeat = Timer::apply(
|_, count| {
*count += 1;
*count
},
0,
)
.every(time::Duration::from_millis(500))
.start(now);
let mut file_count = 0;
let args = Args::parse();
let REQUEST_SENSOR_PATH = env::var("REQUEST_SENSOR_PATH").expect("$REQUEST_SENSOR_PATH} is not set");
let mut watcher = notify::recommended_watcher(move|res: Result<Event>| {
match res {
Ok(event) => {
match event.kind {
EventKind::Create(CreateKind::File) => {
file_count += 1;
// do something with file
}
_ => { /* something else changed */ }
}
println!("{:?}", event);
},
Err(e) => {
println!("watch error: {:?}", e);
ExitCode::from(101);
},
}
})?;
watcher.watch(Path::new(&REQUEST_SENSOR_PATH), RecursiveMode::Recursive)?;
loop {
let now = time::Instant::now();
if let Some(n) = heartbeat.update(now){
println!("Heartbeat: {}, fileCount: {}", n, file_count);
if n > 10 {
heartbeat.set_value(0);
// This function will reset timer when a file arrives
}
}
}
Ok(())
}
Your compiler warnings show you the problem:
warning: unused variable: `file_count`
--> src/main.rs:31:25
|
31 | file_count += 1;
| ^^^^^^^^^^
|
= note: `#[warn(unused_variables)]` on by default
= help: did you mean to capture by reference instead?
The problem here is that you use file_count inside of a move || closure. file_count is an i32, which is Copy. Using it in a move || closure actually creates a copy of it, which does no longer update the original variable if you assign to it.
Either way, it's impossible to modify a variable in main() from an event handler. Event handlers require 'static lifetime if they reference things, because Rust cannot guarantee that the event handler lives shorter than main.
One solution for this problem is to use reference counters and interior mutability. In this case, I will use Arc for reference counters and AtomicI32 for interior mutability. Note that notify::recommended_watcher requires thread safety, otherwise instead of an Arc<AtomicI32> we could have used an Rc<Cell<i32>>, which is the same thing but only for single-threaded environments, with a little less overhead.
use notify::{
event::{CreateKind, Event, EventKind},
RecursiveMode, Result, Watcher,
};
use std::time;
use std::{env, sync::atomic::Ordering};
use std::{path::Path, sync::Arc};
use std::{process::ExitCode, sync::atomic::AtomicI32};
use ticktock::Timer;
fn main() -> Result<()> {
let now = time::Instant::now();
let mut heartbeat = Timer::apply(
|_, count| {
*count += 1;
*count
},
0,
)
.every(time::Duration::from_millis(500))
.start(now);
let file_count = Arc::new(AtomicI32::new(0));
let REQUEST_SENSOR_PATH =
env::var("REQUEST_SENSOR_PATH").expect("$REQUEST_SENSOR_PATH} is not set");
let mut watcher = notify::recommended_watcher({
let file_count = Arc::clone(&file_count);
move |res: Result<Event>| {
match res {
Ok(event) => {
match event.kind {
EventKind::Create(CreateKind::File) => {
file_count.fetch_add(1, Ordering::AcqRel);
// do something with file
}
_ => { /* something else changed */ }
}
println!("{:?}", event);
}
Err(e) => {
println!("watch error: {:?}", e);
ExitCode::from(101);
}
}
}
})?;
watcher.watch(Path::new(&REQUEST_SENSOR_PATH), RecursiveMode::Recursive)?;
loop {
let now = time::Instant::now();
if let Some(n) = heartbeat.update(now) {
println!(
"Heartbeat: {}, fileCount: {}",
n,
file_count.load(Ordering::Acquire)
);
if n > 10 {
heartbeat.set_value(0);
// This function will reset timer when a file arrives
}
}
}
}
Also, note that the ExitCode::from(101); gives you a warning. It does not actually exit the program, it only creates an exit code variable and then discards it again. You probably intended to write std::process::exit(101);. Although I would discourage it, because it does not properly clean up (does not call any Drop implementations). I'd use panic here, instead. This is the exact usecase panic is meant for.

How do I tackle lifetimes in Rust?

I am having issues with the concept of lifetimes in rust. I am trying to use the crate bgpkit_parser to read in a bz2 file via url link and then create a radix trie.
One field extracted from the file is the AS Path which I have named path in my code within the build_routetable function. I am having trouble as to why rust does not like let origin = clean_path.last() which takes the last element in the vector.
fn as_parser(element: &BgpElem) -> Vec<u32> {
let x = &element.as_path.as_ref().unwrap().segments[0];
let mut as_vec = &Vec::new();
let mut as_path: Vec<u32> = Vec::new();
if let AsPathSegment::AsSequence(value) = x {
as_vec = value;
}
for i in as_vec {
as_path.push(i.asn);
}
return as_path;
}
fn prefix_parser(element: &BgpElem) -> String {
let subnet_id = element.prefix.prefix.ip().to_string().to_owned();
let prefix_id = element.prefix.prefix.prefix().to_string().to_owned();
let prefix = format!("{}/{}", subnet_id, prefix_id);//.as_str();
return prefix;
}
fn get_aspath(raw_aspath: Vec<u32>) -> Vec<u32> {
let mut as_path = Vec::new();
for i in raw_aspath {
if i < 64511 {
if as_path.contains(&i) {
continue;
}
else {
as_path.push(i);
}
}
else if 65535 < i && i < 4000000000 {
if as_path.contains(&i) {
continue;
}
else {
as_path.push(i);
}
}
}
return as_path;
}
fn build_routetable(mut trie4: Trie<String, Option<&u32>>, mut trie6: Trie<String, Option<&u32>>) {
let url: &str = "http://archive.routeviews.org/route-views.chile/\
bgpdata/2022.06/RIBS/rib.20220601.0000.bz2";
let parser = BgpkitParser::new(url).unwrap();
let mut count = 0;
for elem in parser {
if elem.elem_type == bgpkit_parser::ElemType::ANNOUNCE {
let record_timestamp = &elem.timestamp;
let record_type = "A";
let peer = &elem.peer_ip;
let prefix = prefix_parser(&elem);
let path = as_parser(&elem);
let clean_path = get_aspath(path);
// Issue is on the below line
// `clean_path` does not live long enough
// borrowed value does not live long
// enough rustc E0597
// main.rs(103, 9): `clean_path` dropped
// here while still borrowed
// main.rs(77, 91): let's call the
// lifetime of this reference `'1`
// main.rs(92, 17): argument requires
// that `clean_path` is borrowed for `'1`
let origin = clean_path.last(); //issue line
if prefix.contains(":") {
trie6.insert(prefix, origin);
}
else {
trie4.insert(prefix, origin);
}
count+=1;
if count >= 10000 {
println!("{:?} | {:?} | {:?} | {:?} | {:?}",
record_type, record_timestamp, peer, prefix, path);
count=0
}
};
}
println!("Trie4 size: {:?} prefixes", trie4.len());
println!("Trie6 size: {:?} prefixes", trie6.len());
}
Short answer: you're "inserting" a reference. But what's being referenced doesn't outlive what it's being inserted into.
Longer: The hint is your trie4 argument, the signature of which is this:
mut trie4: Trie<String, Option<&u32>>
So that lives beyond the length of the loop where things are declared. This is all in the loop:
let origin = clean_path.last(); //issue line
if prefix.contains(":") {
trie6.insert(prefix, origin);
}
While origin is a Vec<u32> and that's fine, the insert method is no doubt taking a String and either an Option<&u32> or a &u32. Obviously a key/value pair. But here's your problem: the value has to live as long as the collection, but your value is the last element contained in the Vec<u32>, which goes away! So you can't put something into it that will not live as long as the "container" object! Rust has just saved you from dangling references (just like it's supposed to).
Basically, your containers should be Trie<String, Option<u32>> without the reference, and then this'll all just work fine. Your problem is that the elements are references, and not just contained regular values, and given the size of what you're containing, it's actually smaller to contain a u32 than a reference (pointer size (though actually, it'll likely be the same either way, because alignment issues)).
Also of note: trie4 and trie6 will both be gone at the end of this function call, because they were moved into this function (not references or mutable references). I hope that's what you want.

How to pass &mut str and change the original mut str without a return?

I'm learning Rust from the Book and I was tackling the exercises at the end of chapter 8, but I'm hitting a wall with the one about converting words into Pig Latin. I wanted to see specifically if I could pass a &mut String to a function that takes a &mut str (to also accept slices) and modify the referenced string inside it so the changes are reflected back outside without the need of a return, like in C with a char **.
I'm not quite sure if I'm just messing up the syntax or if it's more complicated than it sounds due to Rust's strict rules, which I have yet to fully grasp. For the lifetime errors inside to_pig_latin() I remember reading something that explained how to properly handle the situation but right now I can't find it, so if you could also point it out for me it would be very appreciated.
Also what do you think of the way I handled the chars and indexing inside strings?
use std::io::{self, Write};
fn main() {
let v = vec![
String::from("kaka"),
String::from("Apple"),
String::from("everett"),
String::from("Robin"),
];
for s in &v {
// cannot borrow `s` as mutable, as it is not declared as mutable
// cannot borrow data in a `&` reference as mutable
to_pig_latin(&mut s);
}
for (i, s) in v.iter().enumerate() {
print!("{}", s);
if i < v.len() - 1 {
print!(", ");
}
}
io::stdout().flush().unwrap();
}
fn to_pig_latin(mut s: &mut str) {
let first = s.chars().nth(0).unwrap();
let mut pig;
if "aeiouAEIOU".contains(first) {
pig = format!("{}-{}", s, "hay");
s = &mut pig[..]; // `pig` does not live long enough
} else {
let mut word = String::new();
for (i, c) in s.char_indices() {
if i != 0 {
word.push(c);
}
}
pig = format!("{}-{}{}", word, first.to_lowercase(), "ay");
s = &mut pig[..]; // `pig` does not live long enough
}
}
Edit: here's the fixed code with the suggestions from below.
fn main() {
// added mut
let mut v = vec![
String::from("kaka"),
String::from("Apple"),
String::from("everett"),
String::from("Robin"),
];
// added mut
for mut s in &mut v {
to_pig_latin(&mut s);
}
for (i, s) in v.iter().enumerate() {
print!("{}", s);
if i < v.len() - 1 {
print!(", ");
}
}
println!();
}
// converted into &mut String
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay");
} else {
// added code to make the new first letter uppercase
let second = s.chars().nth(1).unwrap();
*s = format!(
"{}{}-{}ay",
second.to_uppercase(),
// the slice starts at the third char of the string, as if &s[2..]
&s[first.len_utf8() * 2..],
first.to_lowercase()
);
}
}
I'm not quite sure if I'm just messing up the syntax or if it's more complicated than it sounds due to Rust's strict rules, which I have yet to fully grasp. For the lifetime errors inside to_pig_latin() I remember reading something that explained how to properly handle the situation but right now I can't find it, so if you could also point it out for me it would be very appreciated.
What you're trying to do can't work: with a mutable reference you can update the referee in-place, but this is extremely limited here:
a &mut str can't change length or anything of that matter
a &mut str is still just a reference, the memory has to live somewhere, here you're creating new Strings inside your function then trying to use these as the new backing buffers for the reference, which as the compiler tells you doesn't work: the String will be deallocated at the end of the function
What you could do is take an &mut String, that lets you modify the owned string itself in-place, which is much more flexible. And, in fact, corresponds exactly to your request: an &mut str corresponds to a char*, it's a pointer to a place in memory.
A String is also a pointer, so an &mut String is a double-pointer to a zone in memory.
So something like this:
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
*s = format!("{}-{}", s, "hay");
} else {
let mut word = String::new();
for (i, c) in s.char_indices() {
if i != 0 {
word.push(c);
}
}
*s = format!("{}-{}{}", word, first.to_lowercase(), "ay");
}
}
You can also likely avoid some of the complete string allocations by using somewhat finer methods e.g.
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay")
} else {
s.replace_range(first.len_utf8().., "");
write!(s, "-{}ay", first.to_lowercase()).unwrap();
}
}
although the replace_range + write! is not very readable and not super likely to be much of a gain, so that might as well be a format!, something along the lines of:
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay")
} else {
*s = format!("{}-{}ay", &s[first.len_utf8()..], first.to_lowercase());
}
}

Why does the Rust compiler complain that I use a moved value when I've replaced it with a new value?

I am working on two singly linked lists, named longer and shorter. The length of the longer one is guaranteed to be no less than the shorter one.
I pair the lists element-wise and do something to each pair. If the longer list has more unpaired elements, process the rest of them:
struct List {
next: Option<Box<List>>,
}
fn drain_lists(mut shorter: Option<Box<List>>, mut longer: Option<Box<List>>) {
// Pair the elements in the two lists.
while let (Some(node1), Some(node2)) = (shorter, longer) {
// Actual work elided.
shorter = node1.next;
longer = node2.next;
}
// Process the rest in the longer list.
while let Some(node) = longer {
// Actual work elided.
longer = node.next;
}
}
However, the compiler complains on the second while loop that
error[E0382]: use of moved value
--> src/lib.rs:13:20
|
5 | fn drain_lists(mut shorter: Option<Box<List>>, mut longer: Option<Box<List>>) {
| ---------- move occurs because `longer` has type `std::option::Option<std::boxed::Box<List>>`, which does not implement the `Copy` trait
6 | // Pair the elements in the two lists.
7 | while let (Some(node1), Some(node2)) = (shorter, longer) {
| ------ value moved here
...
13 | while let Some(node) = longer {
| ^^^^ value used here after move
However, I do set a new value for shorter and longer at the end of the loop, so that I will never use a moved value of them.
How should I cater to the compiler?
I think that the problem is caused by the tuple temporary in the first loop. Creating a tuple moves its components into the new tuple, and that happens even when the subsequent pattern matching fails.
First, let me write a simpler version of your code. This compiles fine:
struct Foo(i32);
fn main() {
let mut longer = Foo(0);
while let Foo(x) = longer {
longer = Foo(x + 1);
}
println!("{:?}", longer.0);
}
But if I add a temporary to the while let then I'll trigger a compiler error similar to yours:
fn fwd<T>(t: T) -> T { t }
struct Foo(i32);
fn main() {
let mut longer = Foo(0);
while let Foo(x) = fwd(longer) {
longer = Foo(x + 1);
}
println!("{:?}", longer.0);
// Error: ^ borrow of moved value: `longer`
}
The solution is to add a local variable with the value to be destructured, instead of relying on a temporary. In your code:
struct List {
next: Option<Box<List>>
}
fn drain_lists(shorter: Option<Box<List>>,
longer: Option<Box<List>>) {
// Pair the elements in the two lists.
let mut twolists = (shorter, longer);
while let (Some(node1), Some(node2)) = twolists {
// Actual work elided.
twolists = (node1.next, node2.next);
}
// Process the rest in the longer list.
let (_, mut longer) = twolists;
while let Some(node) = longer {
// Actual work elided.
longer = node.next;
}
}
Other than getting rid of the tuple (shown by others), you can capture a mutable reference to the nodes:
while let (&mut Some(ref mut node1), &mut Some(ref mut node2)) = (&mut shorter, &mut longer) {
shorter = node1.next.take();
longer = node2.next.take();
}
The use of take() enables this to work: shorter = node1.next would complain of moving a field out of a reference, which is not allowed (it would leave the node in an undefined state). But takeing it is ok because it leaves None in the next field.
Looks like the destructuring on line 7 moves the value even when the block afterwards is not evaluated. (Edit: as #Sven Marnach pointed out in the comments, a temporary tuple gets created here which causes the move)
I've uglyfied your code to prove that point :)
struct List {
next: Option<Box<List>>
}
fn drain_lists(mut shorter: Option<Box<List>>,
mut longer: Option<Box<List>>) {
// Pair the elements in the two lists.
match(shorter, longer) {
(Some(node1), Some(node2)) => {
shorter = node1.next;
longer = node2.next;
},
(_, _) => return // without this you get the error
}
// Process the rest in the longer list.
while let Some(node) = longer {
// Actual work elided.
longer = node.next;
}
}
When I added the return for the default case, the code compiled.
One solution is to avoid the tuple and consequently the move of longer into the tuple.
fn actual_work(node1: &Box<List>, node2: &Box<List>) {
// Actual work elided
}
fn drain_lists(mut shorter: Option<Box<List>>, mut longer: Option<Box<List>>) {
while let Some(node1) = shorter {
if let Some(node2) = longer.as_ref() {
actual_work(&node1, node2);
}
shorter = node1.next;
longer = longer.map_or(None, move |l| {
l.next
});
}
// Process the rest in the longer list.
while let Some(node) = longer {
// Actual work elided.
longer = node.next;
}
}

Borrowing the mutable member used inside the loop

The problem I want to solve is:
Given the recursively nested data structure, eg. a JSON tree, and a path pointing to (possibly non-existent) element inside it, return the mutable reference of the element, that's the closest to given path.
Example: if we have JSON document in form { a: { b: { c: "foo" } } } and a path a.b.d, we want to have a mutable pointer to value stored under key "b".
This is a code snippet, what I've got so far:
use std::collections::HashMap;
enum Json {
Number(i64),
Bool(bool),
String(String),
Array(Vec<Json>),
Object(HashMap<String, Json>)
}
struct Pointer<'a, 'b> {
value: &'a mut Json,
path: Vec<&'b str>,
position: usize
}
/// Return a mutable pointer to JSON element having shared
/// the nearest common path with provided JSON.
fn nearest_mut<'a,'b>(obj: &'a mut Json, path: Vec<&'b str>) -> Pointer<'a,'b> {
let mut i = 0;
let mut current = obj;
for &key in path.iter() {
match current {
Json::Array(array) => {
match key.parse::<usize>() {
Ok(index) => {
match array.get_mut(index) {
Some(inner) => current = inner,
None => break,
}
},
_ => break,
}
} ,
Json::Object(map) => {
match map.get_mut(key) {
Some(inner) => current = inner,
None => break
}
},
_ => break,
};
i += 1;
}
Pointer { path, position: i, value: current }
}
The problem is that this doesn't pass through Rust's borrow checker, as current is borrowed as mutable reference twice, once inside match statement and once at the end of the function, when constructing the pointer method.
I've tried a different approaches, but not figured out how to achieve the goal (maybe going the unsafe path).
I completely misread your question and I owe you an apology.
You cannot do it in one pass - you're going to need to do a read-only pass to find the nearest path (or exact path), and then a read-write pass to actually extract the reference, or pass a mutator function in the form of a closure.
I've implemented the two-pass method for you. Do note that it is still pretty performant:
fn nearest_mut<'a, 'b>(obj: &'a mut Json, path: Vec<&'b str>) -> Pointer<'a, 'b> {
let valid_path = nearest_path(obj, path);
exact_mut(obj, valid_path).unwrap()
}
fn exact_mut<'a, 'b>(obj: &'a mut Json, path: Vec<&'b str>) -> Option<Pointer<'a, 'b>> {
let mut i = 0;
let mut target = obj;
for token in path.iter() {
i += 1;
// borrow checker gets confused about `target` being mutably borrowed too many times because of the loop
// this once-per-loop binding makes the scope clearer and circumvents the error
let target_once = target;
let target_opt = match *target_once {
Json::Object(ref mut map) => map.get_mut(*token),
Json::Array(ref mut list) => match token.parse::<usize>() {
Ok(t) => list.get_mut(t),
Err(_) => None,
},
_ => None,
};
if let Some(t) = target_opt {
target = t;
} else {
return None;
}
}
Some(Pointer {
path,
position: i,
value: target,
})
}
/// Return a mutable pointer to JSON element having shared
/// the nearest common path with provided JSON.
fn nearest_path<'a, 'b>(obj: &'a Json, path: Vec<&'b str>) -> Vec<&'b str> {
let mut i = 0;
let mut target = obj;
let mut valid_paths = vec![];
for token in path.iter() {
// borrow checker gets confused about `target` being mutably borrowed too many times because of the loop
// this once-per-loop binding makes the scope clearer and circumvents the error
let target_opt = match *target {
Json::Object(ref map) => map.get(*token),
Json::Array(ref list) => match token.parse::<usize>() {
Ok(t) => list.get(t),
Err(_) => None,
},
_ => None,
};
if let Some(t) = target_opt {
target = t;
valid_paths.push(*token)
} else {
return valid_paths;
}
}
return valid_paths
}
The principle is simple - I reused the method I wrote in my initial question in order to get the nearest valid path (or exact path).
From there, I feed that straight into the function that I had in my original answer, and since I am certain the path is valid (from the prior function call) I can safely unwrap() :-)

Resources