Directory traversal in vanilla Rust

I'm new to Rust and trying to understand basic directory traversal. Nearly all the examples I have found use the walkdir or glob crates, which I've had good success with. However, I'm now trying to do this with just the standard library.
There is a primitive example in the standard lib docs listing the following function:
fn visit(path: &Path, cb: &dyn Fn(&PathBuf)) -> io::Result<()> {
    for e in read_dir(path)? {
        let e = e?;
        let path = e.path();
        if path.is_dir() {
            visit(&path, cb)?;
        } else if path.is_file() {
            cb(&path);
        }
    }
    Ok(())
}
The part I'm confused about is how to access the cb function in the context of a closure. I'm having a hard time finding an example.
For instance, I want to do something basic like collect the resulting paths into a Vec. Obviously, this does not work:
fn main() {
    // create a new path
    let path = Path::new(PATH);
    let mut files = Vec::new();
    visit(path, |e| {
        files.push(e);
    });
}
The error I'm receiving is:
expected reference `&dyn for<'r> std::ops::Fn(&'r std::path::PathBuf)`
found closure `[closure#src/main.rs:24:17: 26:6 files:_]
So my question is, how can I return a Fn and process the result in a closure context?

There are multiple issues with your code, but the first one, which produces the error message you're seeing, is that the parameter type &dyn Fn(&PathBuf) is a reference to a closure (a trait object), while you passed the closure itself by value. You can resolve that error by following the suggestion from the error message: help: consider borrowing here: `&|e| files.push(e)`
That turns your call into:
visit(path, &|e| files.push(e));
However, this code is still incorrect and results in yet another error:
error[E0596]: cannot borrow `files` as mutable, as it is a captured variable in a `Fn` closure
--> playground\src\main.rs:48:22
|
48 | visit(path, &|e| files.push(e));
| ^^^^^ cannot borrow as mutable
This time, it's because you're mutating files inside an Fn closure, which may only access its captured variables immutably. To fix that, you need to change the parameter type to FnMut (see Closures As Input Parameters for more information):
fn visit(path: &Path, cb: &dyn FnMut(&PathBuf))
But you're still not done. There is now another error, but like the first, it comes with a suggestion for what needs to be changed:
error[E0596]: cannot borrow `*cb` as mutable, as it is behind a `&` reference
--> playground\src\main.rs:39:13
|
32 | fn visit(path: &Path, cb: &dyn FnMut(&PathBuf)) -> io::Result<()> {
| -------------------- help: consider changing this to be a mutable reference: `&mut dyn for<'r> std::ops::FnMut(&'r std::path::PathBuf)`
...
39 | cb(&path);
| ^^ `cb` is a `&` reference, so the data it refers to cannot be borrowed as mutable
In order for your closure to mutably borrow the data it uses, you also have to take a mutable reference to the closure itself, and you'll need to update your visit() call to match:
fn visit(path: &Path, cb: &mut dyn FnMut(&PathBuf))
...
visit(path, &mut |e| files.push(e));
Almost there, but there is one final error to resolve:
error[E0521]: borrowed data escapes outside of closure
--> playground\src\main.rs:48:26
|
47 | let mut files = Vec::new();
| --------- `files` declared here, outside of the closure body
48 | visit(path, &mut |e| files.push(e));
| - ^^^^^^^^^^^^^ `e` escapes the closure body here
| |
| `e` is a reference that is only valid in the closure body
You've defined your closure to take a reference to a PathBuf (&PathBuf), but you're trying to push those references into a Vec that lives outside the closure. That won't work: each reference is only guaranteed to be valid for the duration of a single call, because the PathBuf it points to is a local variable in visit that is dropped at the end of each loop iteration. Instead, you should pass an owned value, simply PathBuf. You'll also need to update the call inside visit to pass the PathBuf itself instead of a reference:
fn visit(path: &Path, cb: &mut dyn FnMut(PathBuf))
...
cb(path);
It finally works! Here is what the full program looks like now. Note that you should also handle the Result that visit() returns; here it is simply unwrap()ed, which panics on an I/O error. I've also added a simple for loop to print out the file names.
use std::fs::read_dir;
use std::io;
use std::path::{Path, PathBuf};
fn visit(path: &Path, cb: &mut dyn FnMut(PathBuf)) -> io::Result<()> {
    for e in read_dir(path)? {
        let e = e?;
        let path = e.path();
        if path.is_dir() {
            visit(&path, cb)?;
        } else if path.is_file() {
            cb(path);
        }
    }
    Ok(())
}
fn main() {
    let path = Path::new("./your/path/here");
    let mut files = Vec::new();
    visit(path, &mut |e| files.push(e)).unwrap();
    for file in files {
        println!("{:?}", file);
    }
}
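If you don't need a trait object, the callback can also be a generic type parameter. The sketch below is the same traversal with F: FnMut(PathBuf) in place of &mut dyn FnMut(PathBuf); the call site looks the same, but calls to the callback are statically dispatched. This is an alternative design, not the version from the std docs, and it uses "." as the starting directory only so that it runs as-is.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Same traversal as above, but generic over the callback type instead of
// taking a `&mut dyn FnMut(PathBuf)` trait object.
fn visit<F: FnMut(PathBuf)>(path: &Path, cb: &mut F) -> io::Result<()> {
    for entry in fs::read_dir(path)? {
        let path = entry?.path();
        if path.is_dir() {
            // `cb` is implicitly reborrowed, so it is still usable afterwards.
            visit(&path, cb)?;
        } else if path.is_file() {
            // Ownership of the PathBuf moves into the callback.
            cb(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut files = Vec::new();
    visit(Path::new("."), &mut |e| files.push(e))?;
    for file in &files {
        println!("{}", file.display());
    }
    Ok(())
}
```

The trade-off is the usual one between generics and trait objects: the generic version is monomorphized per closure type, while the dyn version compiles to one function.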

Related

Rust cloning HashMap<String, Object> without moving into closure

I am trying to make my own programming language in Rust, and most features are done, so I thought I could add to the UnknownIdentifier error the ability to find the closest match to whatever the user wanted.
However, before I even got to finding the closest match, I found out that cloning HashMap<String, Object> moves it into the closure.
ErrorGenerator::error function:
#[allow(non_snake_case)]
mod ErrorGenerator {
    use std::process::exit;
    pub fn error(name: &str, explanation: &str, line: usize, col: usize, file: String, after_f: Box<dyn Fn() -> ()>) -> ! {
        eprintln!("\n[ERROR] {}, Line {:?}, Column {:?}", file, line, col);
        eprintln!(" {}: {}", name, explanation);
        after_f();
        exit(1);
    }
}
ErrorGenerator::error(
    "UnknownIdentifier",
    &format!("unknown identifier: `{}`, this identifier could not be found", tokenc.repr()),
    tokenc.line,
    tokenc.col,
    tokenc.file,
    Box::new(|| {
        let mut hashc: Vec<String> = hashs.clone().into_keys().collect();
        hashc.sort();
    }),
);
This is the error it gives:
error[E0597]: `hashs` does not live long enough
--> src/runtime/runtime.rs:960:70
|
959 | Box::new(||{
| - -- value captured here
| _____________________________________|
| |
960 | | let mut hashc: Vec<String> = hashs.clone().into_keys().collect();
| | ^^^^^ borrowed value does not live long enough
961 | | hashc.sort();
962 | | }),
| |______________________________________- cast requires that `hashs` is borrowed for `'static`
...
1203 | }
| - `hashs` dropped here while still borrowed
The problem's solution is probably either:
A way to borrow in 'static lifetime a mutable variable created in a method into a closure
or
A way to clone HashMap<String, Object> without moving it into the closure
You can find the full code in https://github.com/kaiserthe13th/tr-lang/tree/unknown-id-err-impl
What happens is that the closure doesn't capture a clone of hashs made up front; instead, it captures a reference to hashs and only clones the map when the callback runs.
However, the callback is required to be 'static (Box<dyn Fn()> is shorthand for Box<dyn Fn() + 'static>), and a closure holding a reference to a local variable of the enclosing function is not! So the compiler complains.
What you want is to clone the hash map first, then move the clone into the callback, like this:
ErrorGenerator::error(
    "UnknownIdentifier",
    &format!("unknown identifier: `{}`, this identifier could not be found", tokenc.repr()),
    tokenc.line,
    tokenc.col,
    tokenc.file,
    {
        let hashc = hashs.clone();
        Box::new(move || {
            let mut hashc: Vec<String> = hashc.into_keys().collect();
            hashc.sort();
        })
    },
);
Once you do that, you'll also find that the closure has to be FnOnce(), since calling .into_keys() moves out of hashc. So change the parameter to after_f: Box<dyn FnOnce()>.
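Here is a self-contained sketch of that fix. report is a hypothetical stand-in for ErrorGenerator::error that returns the callback's result instead of exiting, and i32 stands in for the question's Object type. The move keyword only makes the by-value capture explicit.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for ErrorGenerator::error: it runs the callback and
// returns its result instead of printing and calling exit(1).
fn report(after_f: Box<dyn FnOnce() -> Vec<String>>) -> Vec<String> {
    after_f()
}

// `i32` is a placeholder for the question's `Object` type.
fn sorted_identifiers(hashs: &HashMap<String, i32>) -> Vec<String> {
    // Clone first, then move the clone into the closure. The boxed closure
    // owns everything it captures, so it meets the implicit 'static bound.
    let hashc = hashs.clone();
    report(Box::new(move || {
        // `into_keys` consumes `hashc`, which is why the box is `FnOnce`.
        let mut names: Vec<String> = hashc.into_keys().collect();
        names.sort();
        names
    }))
}

fn main() {
    let mut hashs = HashMap::new();
    hashs.insert("print".to_string(), 1);
    hashs.insert("input".to_string(), 2);
    let names = sorted_identifiers(&hashs);
    assert_eq!(names, vec!["input".to_string(), "print".to_string()]);
    // Only the clone was moved; the original map is still usable.
    assert_eq!(hashs.len(), 2);
}
```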

Why do some while-let assignments inside a loop fail to compile with "cannot borrow as mutable more than once at a time"? [duplicate]

Suppose I have several structures like in the following example, and in the next() method I need to pull the next event using a user-provided buffer, but if this event is a comment and the ignore_comments flag is set to true, I need to pull the next event again:
struct Parser {
    ignore_comments: bool,
}
enum XmlEvent<'buf> {
    Comment(&'buf str),
    Other(&'buf str),
}
impl Parser {
    fn next<'buf>(&mut self, buffer: &'buf mut String) -> XmlEvent<'buf> {
        let result = loop {
            buffer.clear();
            let temp_event = self.parse_outside_tag(buffer);
            match temp_event {
                XmlEvent::Comment(_) if self.ignore_comments => {}
                _ => break temp_event,
            }
        };
        result
    }
    fn parse_outside_tag<'buf>(&mut self, _buffer: &'buf mut String) -> XmlEvent<'buf> {
        unimplemented!()
    }
}
This code, however, gives a double borrow error, even when I have #![feature(nll)] enabled:
error[E0499]: cannot borrow `*buffer` as mutable more than once at a time
--> src/main.rs:14:13
|
14 | buffer.clear();
| ^^^^^^ second mutable borrow occurs here
15 |
16 | let temp_event = self.parse_outside_tag(buffer);
| ------ first mutable borrow occurs here
|
note: borrowed value must be valid for the lifetime 'buf as defined on the method body at 12:5...
--> src/main.rs:12:5
|
12 | fn next<'buf>(&mut self, buffer: &'buf mut String) -> XmlEvent<'buf> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error[E0499]: cannot borrow `*buffer` as mutable more than once at a time
--> src/main.rs:16:53
|
16 | let temp_event = self.parse_outside_tag(buffer);
| ^^^^^^ mutable borrow starts here in previous iteration of loop
|
note: borrowed value must be valid for the lifetime 'buf as defined on the method body at 12:5...
--> src/main.rs:12:5
|
12 | fn next<'buf>(&mut self, buffer: &'buf mut String) -> XmlEvent<'buf> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error: aborting due to 2 previous errors
I can (approximately at least) understand why an error could happen here with the NLL feature turned off, but I don't understand why it happens with NLL.
Anyway, my end goal is to implement this without feature flags, so I also tried doing this (it is recursive, which is really unfortunate, but all non-recursive versions I came up with could not possibly work without NLL):
fn next<'buf>(&mut self, buffer: &'buf mut String) -> XmlEvent<'buf> {
    buffer.clear();
    {
        let temp_event = self.parse_outside_tag(buffer);
        match temp_event {
            XmlEvent::Comment(_) if self.ignore_comments => {}
            _ => return temp_event,
        }
    }
    self.next(buffer)
}
Here I tried to confine the borrow inside a lexical block, and nothing from this block leaks to the outside. However, I'm still getting an error:
error[E0499]: cannot borrow `*buffer` as mutable more than once at a time
--> src/main.rs:23:19
|
15 | let temp_event = self.parse_outside_tag(buffer);
| ------ first mutable borrow occurs here
...
23 | self.next(buffer)
| ^^^^^^ second mutable borrow occurs here
24 | }
| - first borrow ends here
error: aborting due to previous error
And again, NLL does not fix it.
It has been a long time since I encountered a borrow checking error which I don't understand, so I'm hoping it is actually something simple which I'm overlooking for some reason :)
I really suspect that the root cause is somehow connected with the explicit 'buf lifetime (in particular, errors with the NLL flag turned on have these notes about it), but I can't understand what exactly is wrong here.
This is a limitation of the current implementation of non-lexical lifetimes. It can be shown with this reduced case:
fn next<'buf>(buffer: &'buf mut String) -> &'buf str {
    loop {
        let event = parse(buffer);
        if true {
            return event;
        }
    }
}
fn parse<'buf>(_buffer: &'buf mut String) -> &'buf str {
    unimplemented!()
}
fn main() {}
Because of this limitation, NLL does not yet handle problem case #3 from the NLL RFC: conditional control flow across functions.
In compiler developer terms, the current implementation of non-lexical lifetimes is "location insensitive". Location sensitivity was originally available but it was disabled in the name of performance.
I asked Niko Matsakis about this code:
In the context of your example: the value event only has to have the lifetime 'buf conditionally — at the return point which may or may not execute. But when we are "location insensitive", we just track the lifetime that event must have anywhere, without considering where that lifetime must hold. In this case, that means we make it hold everywhere, which is why you get a compilation failure.
One subtle thing is that the current analysis is location sensitive in one respect — where the borrow takes place. The length of the borrow is not.
The good news is that adding this concept of location sensitivity back is seen as an enhancement to the implementation of non-lexical lifetimes. The bad news:
That may or may not be before the [Rust 2018] edition.
(Note: it did not make it into the initial release of Rust 2018)
This hinges on an (even newer!) underlying implementation of non-lexical lifetimes that improves performance. You can opt in to this half-implemented version using -Z polonius:
rustc +nightly -Zpolonius --edition=2018 example.rs
RUSTFLAGS="-Zpolonius" cargo +nightly build
Because this is across functions, you can sometimes work around this by inlining the function.
I posted a question (A borrow checker problem with a loop and non-lexical lifetimes) that was answered by the answer of this question.
I'll document here a workaround that also answers the question. Let's say you have code like this, that only compiles with Polonius:
struct Inner;
enum State<'a> {
    One,
    Two(&'a ()),
}
fn get<'s>(_inner: &'s mut Inner) -> State<'s> {
    unimplemented!()
}
struct Outer {
    inner: Inner,
}
impl Outer {
    pub fn read<'s>(&'s mut self) -> &'s () {
        loop {
            match get(&mut self.inner) {
                State::One => (), // In this case nothing happens, the borrow should end and the loop should continue
                State::Two(a) => return a, // self.inner ought to be borrowed for 's, that's just to be expected
            }
        }
    }
}
As was said in the other answer:
One subtle thing is that the current analysis is location sensitive in one respect — where the borrow takes place. The length of the borrow is not.
Indeed, borrowing the needed reference again inside the conditional branch makes it compile! Of course, this makes the assumption that get is referentially transparent, so your mileage may vary, but borrowing again seems like an easy enough workaround.
struct Inner;
enum State<'a> {
    One,
    Two(&'a ()),
}
fn get<'s>(_inner: &'s mut Inner) -> State<'s> {
    unimplemented!()
}
struct Outer {
    inner: Inner,
}
impl Outer {
    pub fn read<'s>(&'s mut self) -> &'s () {
        loop {
            match get(&mut self.inner) {
                State::One => (), // In this case nothing happens, the borrow should end and the loop should continue
                State::Two(a) => {
                    return match get(&mut self.inner) { // Borrowing again compiles!
                        State::Two(a) => a,
                        _ => unreachable!(),
                    }
                }, // self.inner ought to be borrowed for 's, that's just to be expected
            }
        }
    }
}
fn main() {
    println!("Hello, world!");
}
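Applied to the Parser from the question, a related workaround that compiles on stable Rust without Polonius is sketched below, assuming you are free to change parse_outside_tag. The parsing function returns a hypothetical non-borrowing Kind tag, so its mutable borrow of the buffer ends right away, and the 'buf borrow is taken only on the path that returns. Like the workaround above, it relies on the borrow checker tracking precisely where a borrow starts. The input field is made up purely so the example runs.

```rust
// Hypothetical non-borrowing tag describing what was parsed.
#[derive(Clone, Copy)]
enum Kind {
    Comment,
    Other,
}

#[derive(Debug, PartialEq)]
enum XmlEvent<'buf> {
    Comment(&'buf str),
    Other(&'buf str),
}

struct Parser {
    ignore_comments: bool,
    // Hypothetical input source so the sketch can run: (is_comment, text).
    input: Vec<(bool, &'static str)>,
}

impl Parser {
    fn next<'buf>(&mut self, buffer: &'buf mut String) -> XmlEvent<'buf> {
        loop {
            buffer.clear();
            // This mutable borrow of `buffer` ends when the call returns,
            // because `Kind` holds no reference into the buffer.
            let kind = self.parse_outside_tag(buffer);
            if let Kind::Comment = kind {
                if self.ignore_comments {
                    continue;
                }
            }
            // Borrow `buffer` for 'buf only on the path that returns.
            let text: &'buf str = buffer.as_str();
            return match kind {
                Kind::Comment => XmlEvent::Comment(text),
                Kind::Other => XmlEvent::Other(text),
            };
        }
    }

    // Hypothetical parser: fills the buffer and reports what it found.
    fn parse_outside_tag(&mut self, buffer: &mut String) -> Kind {
        let (is_comment, text) = self.input.remove(0);
        buffer.push_str(text);
        if is_comment { Kind::Comment } else { Kind::Other }
    }
}

fn main() {
    let mut parser = Parser {
        ignore_comments: true,
        input: vec![(true, "a comment"), (false, "<tag>")],
    };
    let mut buffer = String::new();
    assert_eq!(parser.next(&mut buffer), XmlEvent::Other("<tag>"));
}
```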

Why can I not borrow a variable as mutable more than once at a time with a &mut Box<T> while &mut T works?

I'm trying to implement a linked list in Rust and I'm having some trouble understanding the difference between these two functions:
enum List<T> {
    Nil,
    Cons(T, Box<List<T>>)
}
fn foo<T>(list: &mut Box<List<T>>) {
    match **list {
        List::Nil => return,
        List::Cons(ref mut head, ref mut tail) => {
            // ...
        }
    }
}
fn bar<T>(list: &mut List<T>) {
    match *list {
        List::Nil => return,
        List::Cons(ref mut head, ref mut tail) => {
            // ...
        }
    }
}
foo fails to compile, with the following error:
error[E0499]: cannot borrow `list` (via `list.1`) as mutable more than once at a time
--> src/main.rs:66:34
|
66 | List::Cons(ref mut head, ref mut rest) => {
| ------------ ^^^^^^^^^^^^ second mutable borrow occurs here (via `list.1`)
| |
| first mutable borrow occurs here (via `list.0`)
...
69 | }
| - first borrow ends here
However, bar compiles and runs perfectly. Why does bar work, but not foo? I am using Rust version 1.25.
This can be simplified to
fn foo(v: &mut Box<(i32, i32)>) {
    match **v {
        (ref mut head, ref mut tail) => {}
    }
}
or
fn foo(v: &mut Box<(i32, i32)>) {
    let (ref mut head, ref mut tail) = **v;
}
The problem is that Box is a strange, in-between type.
Way back in Rust's history, Box was special-cased by the compiler; it knew a lot of the details of Box, but this meant that it was "magic" and no one else could implement something that worked like Box.
RFC 130 proposed changing that, making Box "just another type". Unfortunately, this transition still hasn't been completed.
The details are nuanced, but basically the current borrow checker handles pattern matching syntactically, not semantically. It needs to do this to prevent some unsoundness issues.
In the future, non-lexical lifetimes (NLL) will just magically fix this; you won't have to do anything (hooray!).
Until then, you can explicitly get back to a &mut T with this ugly blob:
match *&mut **list {
Or call DerefMut explicitly:
match *std::ops::DerefMut::deref_mut(list) {
However, there's very little reason to accept a &mut Box<T>.
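To illustrate that last point: a function that takes &mut List<T> can still be called with a boxed list, because &mut Box<List<T>> coerces to &mut List<T> through DerefMut. A minimal sketch using the List type from the question; increment_all and to_vec are hypothetical helpers.

```rust
enum List<T> {
    Nil,
    Cons(T, Box<List<T>>),
}

// Takes `&mut List<i32>` rather than `&mut Box<List<i32>>`, like `bar`.
fn increment_all(list: &mut List<i32>) {
    match *list {
        List::Nil => {}
        List::Cons(ref mut head, ref mut tail) => {
            *head += 1;
            // `tail` is `&mut Box<List<i32>>`; deref coercion turns it
            // into `&mut List<i32>` at this call site.
            increment_all(tail);
        }
    }
}

// Collects the elements so the result is easy to check.
fn to_vec(list: &List<i32>) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur = list;
    while let List::Cons(head, tail) = cur {
        out.push(*head);
        cur = &**tail;
    }
    out
}

fn main() {
    let mut boxed: Box<List<i32>> =
        Box::new(List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil)))));
    // A caller holding a Box can pass `&mut boxed` directly.
    increment_all(&mut boxed);
    assert_eq!(to_vec(&boxed), vec![2, 3]);
}
```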
See also:
Destructuring boxes into multiple mutable references seems broken #30104
Bad / misleading error message with auto deref and mutable borrows of multiple fields #32930
Why can I not borrow a boxed vector content as mutable?
Confused by move semantics of struct fields inside a Box
Moving out of boxed tuple

Why can't I reuse a &mut reference after passing it to a function that accepts a generic type?

Why doesn't this code compile:
use std::io;
fn use_cursor(cursor: &mut io::Cursor<&mut Vec<u8>>) {
    // do some work
}
fn take_reference(data: &mut Vec<u8>) {
    {
        let mut buf = io::Cursor::new(data);
        use_cursor(&mut buf);
    }
    data.len();
}
fn produce_data() {
    let mut data = Vec::new();
    take_reference(&mut data);
    data.len();
}
The error in this case is:
error[E0382]: use of moved value: `*data`
--> src/main.rs:14:5
|
9 | let mut buf = io::Cursor::new(data);
| ---- value moved here
...
14 | data.len();
| ^^^^ value used here after move
|
= note: move occurs because `data` has type `&mut std::vec::Vec<u8>`, which does not implement the `Copy` trait
The signature of io::Cursor::new is such that it takes ownership of its argument. In this case, the argument is a mutable reference to a Vec.
pub fn new(inner: T) -> Cursor<T>
It sort of makes sense to me; because Cursor::new takes ownership of its argument (and not a reference) we can't use that value later on. At the same time it doesn't make sense: we essentially only pass a mutable reference and the cursor goes out of scope afterwards anyway.
In the produce_data function we also pass a mutable reference to take_reference, and it doesn't produce an error when trying to use data again, unlike inside take_reference.
I found it possible to 'reclaim' the reference by using Cursor::into_inner(), but it feels a bit weird to do it manually, since in normal use cases the borrow checker is perfectly capable of doing it itself.
Is there a nicer solution to this problem than using .into_inner()? Maybe there's something else I don't understand about the borrow-checker?
Normally, when you pass a mutable reference to a function, the compiler implicitly performs a reborrow. This produces a new borrow with a shorter lifetime.
When the parameter is generic (and is not of the form &mut T), the compiler doesn't do this reborrowing automatically¹. However, you can do it manually by dereferencing your existing mutable reference and then referencing it again:
fn take_reference(data: &mut Vec<u8>) {
    {
        let mut buf = io::Cursor::new(&mut *data);
        use_cursor(&mut buf);
    }
    data.len();
}
¹ This is because the current compiler architecture only gets a chance to apply a coercion if both the source and target types are known at the coercion site.
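The same explicit reborrow works for any generic parameter, not only Cursor::new. A minimal sketch with a hypothetical generic function write_greeting that accepts any W: Write (the standard library implements Write for Vec<u8> and for &mut W whenever W: Write):

```rust
use std::io::{self, Write};

// Generic over W, so the compiler does not insert an implicit reborrow
// when a `&mut Vec<u8>` is passed in.
fn write_greeting<W: Write>(mut out: W) -> io::Result<()> {
    out.write_all(b"hi")
}

fn greet_twice(data: &mut Vec<u8>) -> io::Result<usize> {
    // Writing `write_greeting(data)` here would move `data` (W = &mut Vec<u8>),
    // so the second call and `data.len()` below would not compile.
    write_greeting(&mut *data)?; // explicit reborrow: `data` stays usable
    write_greeting(&mut *data)?;
    Ok(data.len())
}

fn main() -> io::Result<()> {
    let mut data = Vec::new();
    let len = greet_twice(&mut data)?;
    assert_eq!(len, 4);
    assert_eq!(data, b"hihi".to_vec());
    Ok(())
}
```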

Why does calling a method on a mutable reference involve "borrowing"?

I'm learning Rust and I'm trying to cargo-cult this code into compiling:
use std::vec::Vec;
use std::collections::BTreeMap;
struct Occ {
    docnum: u64,
    weight: f32,
}
struct PostWriter<'a> {
    bytes: Vec<u8>,
    occurrences: BTreeMap<&'a [u8], Vec<Occ>>,
}
impl<'a> PostWriter<'a> {
    fn new() -> PostWriter<'a> {
        PostWriter {
            bytes: Vec::new(),
            occurrences: BTreeMap::new(),
        }
    }
    fn add_occurrence(&'a mut self, term: &[u8], occ: Occ) {
        let occurrences = &mut self.occurrences;
        match occurrences.get_mut(term) {
            Some(x) => x.push(occ),
            None => {
                // Add the term bytes to the big vector of all terms
                let termstart = self.bytes.len();
                self.bytes.extend(term);
                // Create a new occurrences vector
                let occs = vec![occ];
                // Take the appended term as a slice to use as a key
                // ERROR: cannot borrow `*occurrences` as mutable more than once at a time
                occurrences.insert(&self.bytes[termstart..], occs);
            }
        }
    }
}
fn main() {}
I get an error:
error[E0499]: cannot borrow `*occurrences` as mutable more than once at a time
--> src/main.rs:34:17
|
24 | match occurrences.get_mut(term) {
| ----------- first mutable borrow occurs here
...
34 | occurrences.insert(&self.bytes[termstart..], occs);
| ^^^^^^^^^^^ second mutable borrow occurs here
35 | }
36 | }
| - first borrow ends here
I don't understand... I'm just calling a method on a mutable reference, why would that line involve borrowing?
I'm just calling a method on a mutable reference, why would that line involve borrowing?
When you call a method on an object that's going to mutate the object, you can't have any other references to that object outstanding. If you did, your mutation could invalidate those references and leave your program in an inconsistent state. For example, say that you had gotten a reference to a value in your map and then inserted a new entry. If that insertion forces the map to reallocate or move its storage, your reference now points to memory that no longer holds the value! When you use that value... bang goes the program!
In this case, it looks like you want to do the relatively common "append or insert if missing" operation. You will want to use the entry API for that:
use std::collections::BTreeMap;
fn main() {
    let mut map = BTreeMap::new();
    {
        let nicknames = map.entry("joe").or_insert(Vec::new());
        nicknames.push("shmoe");
        // Using scoping to indicate that we are done with borrowing `nicknames`
        // If we didn't, then we couldn't borrow map as
        // immutable because we could still change it via `nicknames`
    }
    println!("{:?}", map);
}
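Applied to the question's add_occurrence, a sketch might look like the following. It assumes owned Vec<u8> keys are acceptable: in the original design, occurrences holds slices into the struct's own bytes field, which makes PostWriter self-referential and hard to use in safe Rust. The owned-key PostWriter here is a hypothetical variant, and term.to_vec() allocates on every call, which is the price of that simplification.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

#[allow(dead_code)] // fields mirror the question; not all are read here
struct Occ {
    docnum: u64,
    weight: f32,
}

// Hypothetical variant of PostWriter: keys are owned byte vectors instead of
// slices into `bytes`, so the struct does not borrow from itself.
#[derive(Default)]
struct PostWriter {
    bytes: Vec<u8>,
    occurrences: BTreeMap<Vec<u8>, Vec<Occ>>,
}

impl PostWriter {
    fn add_occurrence(&mut self, term: &[u8], occ: Occ) {
        // A single lookup through the entry API; no second mutable borrow
        // of the map is needed to insert.
        match self.occurrences.entry(term.to_vec()) {
            Entry::Occupied(mut slot) => slot.get_mut().push(occ),
            Entry::Vacant(slot) => {
                // `bytes` is a different field, so borrowing it while the
                // entry borrows `occurrences` is allowed.
                self.bytes.extend_from_slice(term);
                slot.insert(vec![occ]);
            }
        }
    }
}

fn main() {
    let mut w = PostWriter::default();
    w.add_occurrence(b"rust", Occ { docnum: 1, weight: 1.0 });
    w.add_occurrence(b"rust", Occ { docnum: 2, weight: 0.5 });
    w.add_occurrence(b"box", Occ { docnum: 2, weight: 0.5 });
    assert_eq!(w.occurrences[&b"rust"[..]].len(), 2);
    assert_eq!(w.bytes, b"rustbox".to_vec());
}
```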
Because you're calling a method that borrows as mutable
I had a similar question yesterday about Hash, until I noticed something in the docs. The docs for BTreeMap show a method signature for insert starting with fn insert(&mut self..
So when you call .insert, you're implicitly asking that function to borrow the BTreeMap as mutable.
