In my application (a compiler), I'd like to create cyclic data structures of various kinds throughout my program's execution that all have the same lifetime (in my case, lasting until the end of compilation). In addition:
I don't need to worry about multi-threading
I only need to append information - no need to delete or garbage collect
I only need immutable references to my data
This seemed like a good use case for an Arena, but I saw that this would require passing the arena around to every function in my program, which seemed like a large overhead.
So instead I found a macro called thread_local! that I can use to define global data. Using this, I thought I might be able to define a custom type that wraps an index into the array, and implement Deref on that type:
use std::cell::RefCell;
enum Floop {
CaseA,
CaseB,
CaseC(FloopRef),
CaseD(FloopRef),
CaseE(Vec<FloopRef>),
}
thread_local! {
static FLOOP_ARRAY: RefCell<Vec<Box<Floop>>> = RefCell::new(Vec::new());
}
pub struct FloopRef(usize);
impl std::ops::Deref for FloopRef {
type Target = Floop;
fn deref(&self) -> &Self::Target {
return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
}
}
pub fn main() {
// initialize some data
FLOOP_ARRAY.with(|floops| {
floops.borrow_mut().push(Box::new(Floop::CaseA));
let idx = floops.borrow_mut().len();
floops.borrow_mut().push(Box::new(Floop::CaseC(FloopRef(idx))));
});
}
Unfortunately I run into lifetime errors:
error: lifetime may not live long enough
--> src/main.rs:20:36
|
20 | return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
| ------- ^^^^^^^^^^^^^^^^^^^^^^^^ returning this value requires that `'1` must outlive `'2`
| | |
| | return type of closure is &'2 Box<Floop>
| has type `&'1 RefCell<Vec<Box<Floop>>>`
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:20:36
|
20 | return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
| ^---------------^^^^^^^^
| ||
| |temporary value created here
| returns a value referencing data owned by the current function
What I'd like to tell the compiler is that I promise I'm never going to remove entries from the Array and that I'm not going to share values across threads and that the array will last until the end of the program so that I can in essence just return a &'static reference to a Floop object. But Rust doesn't seem to be convinced this is safe.
Is there any kind of Rust helper library that would let me do something like this? Or are there safety holes even when I guarantee I only append / only use data with a single thread?
If you had such a reference, you could send the data to another thread and then observe it after it has been dropped because the creating thread has finished.
Even if you solved this problem, it would still require unsafe code, as the compiler can't be convinced that growing the Vec won't invalidate existing references. Growing is in fact harmless in this case since you're using Box, but the compiler cannot know that.
If you pinky promise to never touch the data after the creating thread has finished, you can use the following code. Note that this code is technically UB: when the Vec grows, all the Boxes are moved, and, at least currently, moving a Box invalidates all references derived from it:
use std::cell::RefCell;
enum Floop {
CaseA,
CaseB,
CaseC(&'static Floop),
CaseD(&'static Floop),
CaseE(Vec<&'static Floop>),
}
thread_local! {
static FLOOP_ARRAY: RefCell<Vec<Box<Floop>>> = RefCell::new(Vec::new());
}
fn alloc_floop(floop: Floop) -> &'static mut Floop {
FLOOP_ARRAY.with(|floops| {
let mut floops = floops.borrow_mut();
floops.push(Box::new(floop));
let floop = &mut **floops.last_mut().unwrap() as *mut Floop;
// SAFETY: We never access the data after it has been dropped, and we are
// the only ones who access this `Box`, as we access a `Box` only immediately
// after pushing it.
unsafe { &mut *floop }
})
}
fn main() {
let floop_a = alloc_floop(Floop::CaseA);
let floop_b = alloc_floop(Floop::CaseC(floop_a));
}
A better solution would be something like a thread-safe arena that you can use in a static, but sadly, I found no crate that implements that.
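One way to sidestep the moved-Box concern, sketched here as an assumption rather than as part of the answer above, is to leak each allocation with Box::leak, which returns a reference that stays valid for the rest of the program. The depth helper and the reduced set of variants are hypothetical and only there for illustration. Memory is never freed, and building true cycles would still need some form of interior mutability, which this sketch leaves out.

```rust
// Reduced, hypothetical version of the enum from the answer above.
enum Floop {
    CaseA,
    CaseC(&'static Floop),
    CaseE(Vec<&'static Floop>),
}

// Leak the allocation: the Box is never dropped or moved again, so the
// returned reference really is valid for the 'static lifetime.
fn alloc_floop(floop: Floop) -> &'static Floop {
    Box::leak(Box::new(floop))
}

// Hypothetical helper: length of the longest reference chain below a node.
fn depth(floop: &Floop) -> usize {
    match floop {
        Floop::CaseA => 0,
        Floop::CaseC(inner) => 1 + depth(inner),
        Floop::CaseE(items) => 1 + items.iter().map(|f| depth(f)).max().unwrap_or(0),
    }
}
```

Because no unsafe code is involved, the compiler checks the whole thing; the trade-off is that every node lives until the process exits, which matches the "lasts until the end of compilation" requirement in the question.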
This code compiles:
use std::thread;
struct Foo;
fn use_foo(foo: &Foo) {}
fn main() {
let foo = Foo {};
thread::spawn(move || {
use_foo(&foo);
});
}
but this code does not:
use std::thread;
struct Foo;
struct Bar<'a> {
foo: &'a Foo,
}
fn use_bar(bar: Bar) {}
fn main() {
let foo = Foo {};
let bar = Bar { foo: &foo };
thread::spawn(move || {
use_bar(bar);
});
}
Instead, it fails with:
error[E0597]: `foo` does not live long enough
--> src/main.rs:15:26
|
15 | let bar = Bar { foo: &foo };
| ^^^^ borrowed value does not live long enough
16 | / thread::spawn(move || {
17 | | use_bar(bar);
18 | | });
| |______- argument requires that `foo` is borrowed for `'static`
19 | }
| - `foo` dropped here while still borrowed
This bears some similarity to this issue; however, here there is only a single reference to foo. Is there any way to get around this indirection introduced by bar and move foo into the closure?
There are a few things that prevent the second example from working:
The compiler will not move variables into the closure if they aren't used by the closure. The fact that foo is used through bar is not a factor.
You cannot move a value that you were only given a shared reference to. In the general sense, other references may exist and would be very put off by even mutating the value.
Even if you were to move the value foo, any references derived from it would be invalidated. The compiler prevents this from happening. This means even if you did use both foo and bar in the closure it still wouldn't work. You'd need to recreate bar with a fresh reference to the moved foo.
So in summary, no, you cannot in general take a structure that contains references and convert it or wrap it up into an owned version so that you can send it to another thread.
You can perhaps avoid references by using indexes (if appropriate for your real use-case) or perhaps you could use shared ownership (by using Arc instead of references). Or of course simply delay creating your referencing structure until after it has been moved to the thread.
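Two of the workarounds just mentioned can be sketched as follows. This is a minimal illustration, not the only way to do it: the value field on Foo, the SharedBar type, and the function names are assumptions added so the result can be observed.

```rust
use std::sync::Arc;
use std::thread;

// `value` is a hypothetical field added so there is something to read.
struct Foo {
    value: u32,
}

struct Bar<'a> {
    foo: &'a Foo,
}

// Hypothetical owned counterpart of `Bar` that shares `Foo` through an Arc.
struct SharedBar {
    foo: Arc<Foo>,
}

fn use_bar(bar: Bar) -> u32 {
    bar.foo.value
}

fn use_shared_bar(bar: SharedBar) -> u32 {
    bar.foo.value
}

// Option 1: move `foo` into the thread first, then create the borrowing
// struct inside the closure, so the reference never crosses threads.
fn borrow_inside_thread() -> u32 {
    let foo = Foo { value: 7 };
    thread::spawn(move || {
        let bar = Bar { foo: &foo };
        use_bar(bar)
    })
    .join()
    .unwrap()
}

// Option 2: shared ownership. The spawning thread keeps its own handle.
fn share_with_arc() -> u32 {
    let foo = Arc::new(Foo { value: 7 });
    let bar = SharedBar { foo: Arc::clone(&foo) };
    let from_thread = thread::spawn(move || use_shared_bar(bar)).join().unwrap();
    from_thread + foo.value
}
```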
I am trying to store into a HashMap the result of a parsing operation on a text file (parsed with nom). The result is comprised of a Vec buffer and some slices over that buffer. The goal is to store those together in a tuple or struct as a value in the hash map (with String key). But I can't work around the lifetime issues.
Context
The parsing itself takes an &[u8] and returns some data structure containing slices over that same input, e.g.:
struct Cmd<'a> {
pub name: &'a str
}
fn parse<'a>(input: &'a [u8]) -> Vec<Cmd<'a>> {
[...]
}
Now, because the parsing operates on slices without storage, I need to first store the input text in a Vec so that the output slices remain valid, so something like:
struct Entry<'a> {
pub input_data: Vec<u8>,
pub parsed_result: Vec<Cmd<'a>>
}
Then I would ideally store this Entry into a HashMap. This is where troubles arise. I tried two different approaches:
Attempt A: Store then parse
Create the HashMap entry first with the input, parse referencing the HashMap entry directly, and then update it.
pub fn store_and_parse(filename: &str, map: &mut HashMap<String, Entry>) {
let buffer: Vec<u8> = load_from_file(filename);
let mut entry = Entry{ input_data: buffer, parsed_result: vec![] };
let cmds = parse(&entry.input_data[..]);
entry.parsed_result = cmds;
map.insert(filename.to_string(), entry);
}
This doesn't work because the borrow checker complains that &entry.input_data[..] borrows with the same lifetime as entry, and therefore cannot be moved into map as there's an active borrow.
error[E0597]: `entry.input_data` does not live long enough
--> src\main.rs:26:23
|
23 | pub fn store_and_parse(filename: &str, map: &mut HashMap<String, Entry>) {
| --- has type `&mut std::collections::HashMap<std::string::String, Entry<'1>>`
...
26 | let cmds = parse(&entry.input_data[..]);
| ^^^^^^^^^^^^^^^^ borrowed value does not live long enough
27 | entry.parsed_result = cmds;
28 | map.insert(filename.to_string(), entry);
| --------------------------------------- argument requires that `entry.input_data` is borrowed for `'1`
29 | }
| - `entry.input_data` dropped here while still borrowed
error[E0505]: cannot move out of `entry` because it is borrowed
--> src\main.rs:28:38
|
26 | let cmds = parse(&entry.input_data[..]);
| ---------------- borrow of `entry.input_data` occurs here
27 | entry.parsed_result = cmds;
28 | map.insert(filename.to_string(), entry);
| ------ ^^^^^ move out of `entry` occurs here
| |
| borrow later used by call
Attempt B: Parse then store
Parse first, then try to store both the Vec buffer and the data slices together in the HashMap.
pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
let buffer: Vec<u8> = load_from_file(filename);
let cmds = parse(&buffer[..]);
let entry = Entry{ input_data: buffer, parsed_result: cmds };
map.insert(filename.to_string(), entry);
}
This doesn't work because the borrow checker complains that cmds has the same lifetime as &buffer[..] but buffer will be dropped by the end of the function. It ignores the fact that cmds and buffer have the same lifetime, and are both (I wish) moved into entry, which is itself moved into map, so there should be no lifetime issue here.
error[E0597]: `buffer` does not live long enough
--> src\main.rs:33:21
|
31 | pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
| --- has type `&mut std::collections::HashMap<std::string::String, Entry<'1>>`
32 | let buffer: Vec<u8> = load_from_file(filename);
33 | let cmds = parse(&buffer[..]);
| ^^^^^^ borrowed value does not live long enough
34 | let entry = Entry{ input_data: buffer, parsed_result: cmds };
35 | map.insert(filename.to_string(), entry);
| --------------------------------------- argument requires that `buffer` is borrowed for `'1`
36 | }
| - `buffer` dropped here while still borrowed
error[E0505]: cannot move out of `buffer` because it is borrowed
--> src\main.rs:34:34
|
31 | pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
| --- has type `&mut std::collections::HashMap<std::string::String, Entry<'1>>`
32 | let buffer: Vec<u8> = load_from_file(filename);
33 | let cmds = parse(&buffer[..]);
| ------ borrow of `buffer` occurs here
34 | let entry = Entry{ input_data: buffer, parsed_result: cmds };
| ^^^^^^ move out of `buffer` occurs here
35 | map.insert(filename.to_string(), entry);
| --------------------------------------- argument requires that `buffer` is borrowed for `'1`
Minimal (non-)working example
use std::collections::HashMap;
#[derive(Debug, PartialEq)]
struct Cmd<'a> {
name: &'a str
}
fn parse<'a>(input: &'a [u8]) -> Vec<Cmd<'a>> {
Vec::new()
}
fn load_from_file(filename: &str) -> Vec<u8> {
Vec::new()
}
#[derive(Debug, PartialEq)]
struct Entry<'a> {
pub input_data: Vec<u8>,
pub parsed_result: Vec<Cmd<'a>>
}
// pub fn store_and_parse(filename: &str, map: &mut HashMap<String, Entry>) {
// let buffer: Vec<u8> = load_from_file(filename);
// let mut entry = Entry{ input_data: buffer, parsed_result: vec![] };
// let cmds = parse(&entry.input_data[..]);
// entry.parsed_result = cmds;
// map.insert(filename.to_string(), entry);
// }
pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
let buffer: Vec<u8> = load_from_file(filename);
let cmds = parse(&buffer[..]);
let entry = Entry{ input_data: buffer, parsed_result: cmds };
map.insert(filename.to_string(), entry);
}
fn main() {
println!("Hello, world!");
}
Edit: Attempt with 2 maps
As Kevin pointed out, and this is what threw me off the first time (above attempts), the borrow checker doesn't understand that moving a Vec doesn't invalidate the slices because the heap buffer of the Vec is not touched. Fair enough.
Side note: I am ignoring the parts of Kevin's answer related to using indexes (the Rust documentation explicitly states slices are a better replacement for indices, so I feel this is working against the language) and the use of external crates (which also are explicitly working against the language). I am trying to learn and understand how to do this "the Rust way", not at all costs.
So my immediate reaction to that was to change the data structure: first insert the storage Vec into a first HashMap, and once it's there call the parse() function to create the slices directly pointing into the HashMap value. Then store those into a second HashMap, which would naturally dissociate the two. However that also doesn't work as soon as I put all of that in a loop, which is the broader goal of this code:
fn two_maps<'a>(
filename: &str,
input_map: &'a mut HashMap<String, Vec<u8>>,
cmds_map: &mut HashMap<String, Vec<Cmd<'a>>>,
queue: &mut Vec<String>) {
{
let buffer: Vec<u8> = load_from_file(filename);
input_map.insert(filename.to_string(), buffer);
}
{
let buffer = input_map.get(filename).unwrap();
let cmds = parse(&buffer[..]);
for cmd in &cmds {
// [...] Find further dependencies to load and parse
queue.push("...".to_string());
}
cmds_map.insert(filename.to_string(), cmds);
}
}
fn main() {
let mut input_map = HashMap::new();
let mut cmds_map = HashMap::new();
let mut queue = Vec::new();
queue.push("file1.txt".to_string());
while let Some(path) = queue.pop() {
println!("Loading file: {}", path);
two_maps(&path[..], &mut input_map, &mut cmds_map, &mut queue);
}
}
The problem here is that once the input buffer is in the first map input_map, referencing it binds the lifetime of each new parsed result to the entry of that HashMap, and therefore the &'a mut reference (the 'a lifetime added). Without this, the compiler complains that data flows from input_map into cmds_map with unrelated lifetimes, which is fair enough. But with this, the &'a mut reference to input_map becomes locked on the first loop iteration and never released, and the borrow checker chokes on the second iteration, quite rightfully so.
So I am stuck again. Is what I am trying to do completely unreasonable and impossible in Rust? How can I approach the problem (algorithms, data structures) to make things work lifetime-wise? I really don't see what's the "Rust way" here to store a collection of buffers and slices over those buffers. Is the only solution (that I want to avoid) to first load all files, and then parse them? This is very impractical in my case because most files contain references to other files, and I want to load the minimum chain of dependencies (likely < 10 files), not the entire collection (which is something like 3000+ files), and I can only access dependencies by parsing each file.
It seems the core of the issue is that storing the input buffers into any kind of data structure requires a mutable reference to said data structure for the duration of the insert operation, which is incompatible with having long-lived immutable references to each single buffer (for the slices) because those references need to have the same lifetime as per the HashMap definition. Is there any other data structure (maybe immutable ones) that lifts this? Or am I completely on the wrong track?
Now, because the parsing operates on slices without storage, I need to first store the input text in a Vec so that the output slices remain valid, so something like:
struct Entry<'a> {
pub input_data: Vec<u8>,
pub parsed_result: Vec<Cmd<'a>>
}
What you are attempting here is a “self-referential structure”, where parsed_result refers to input_data. There is an incidental and a fundamental reason why this cannot work as written.
The incidental reason is that this struct declaration contains the lifetime parameter 'a, but actually the lifetime you're attempting to give parsed_result is the lifetime of the struct itself, and there is no Rust syntax to specify that lifetime.
The fundamental reason is that Rust allows structs (and other values) to be moved to other locations in memory, and references are just statically checked pointers. So, when you write
map.insert(filename.to_string(), entry);
you're causing the value of entry to be moved from the stack frame to the HashMap's storage. That move invalidates any references into entry, whether or not entry contains those references itself. That's what the error "cannot move out of entry because it is borrowed" means; the borrow checker is not allowing the move to happen.
In your Attempt B,
let buffer: Vec<u8> = load_from_file(filename);
let cmds = parse(&buffer[..]);
let entry = Entry{ input_data: buffer, parsed_result: cmds };
the problem is that you're moving buffer (into the Entry) while cmds borrows it. Again, that means the references (just fancy pointers!) into buffer would become invalid, so it's not allowed.
(Now, since Vec stores its actual data in a heap-allocated vector that will stay put while the Vec is moved, this might actually be safe, but the Rust borrow checker doesn't care about that.)
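The parenthetical remark can be checked with a small experiment. The sketch below (the function name is hypothetical) records the address of a Vec's heap buffer, moves the Vec into a Box, and compares the address afterwards. Only raw pointers are compared, so no borrow is held across the move.

```rust
// Returns true if the heap buffer stays at the same address while the
// `Vec` value itself (pointer, length, capacity) is moved elsewhere.
fn heap_buffer_survives_move() -> bool {
    let buffer: Vec<u8> = vec![1, 2, 3];
    let before = buffer.as_ptr();

    // Moving the Vec into a Box relocates the Vec header, not its contents.
    let boxed: Box<Vec<u8>> = Box::new(buffer);
    let after = boxed.as_ptr();

    before == after
}
```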
Solutions
The simplest solution (from a language perspective) is to have each Cmd store indices into input_data instead of references. Indices don't become invalid when the object is moved since they're relative. The disadvantage of this is of course that you have to slice the input data every time — code has to carry around the Entry as well as the Cmd.
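A minimal sketch of the index-based option, under some assumptions: Cmd stores a byte range instead of a &str, the name_of accessor is a hypothetical helper, parse is a stand-in that treats each non-empty line as one command (the real parser uses nom), and parse_and_store takes the bytes directly instead of loading a file.

```rust
use std::collections::HashMap;
use std::ops::Range;

// Stores where the name lives inside `Entry::input_data`, not a reference.
struct Cmd {
    name: Range<usize>,
}

struct Entry {
    input_data: Vec<u8>,
    parsed_result: Vec<Cmd>,
}

impl Entry {
    // Hypothetical accessor: re-slices the input each time a name is needed.
    fn name_of(&self, cmd: &Cmd) -> &str {
        std::str::from_utf8(&self.input_data[cmd.name.clone()]).unwrap_or("")
    }
}

// Stand-in parser (assumption): one command per non-empty line.
fn parse(input: &[u8]) -> Vec<Cmd> {
    let mut cmds = Vec::new();
    let mut start = 0;
    for (i, &byte) in input.iter().enumerate() {
        if byte == b'\n' {
            if i > start {
                cmds.push(Cmd { name: start..i });
            }
            start = i + 1;
        }
    }
    if start < input.len() {
        cmds.push(Cmd { name: start..input.len() });
    }
    cmds
}

fn parse_and_store(key: &str, input_data: Vec<u8>, map: &mut HashMap<String, Entry>) {
    // `parsed_result` holds no borrow of `input_data`, so the move below is fine.
    let parsed_result = parse(&input_data);
    map.insert(key.to_string(), Entry { input_data, parsed_result });
}
```

Since an Entry no longer has a lifetime parameter, it can be inserted into the map inside a loop that keeps discovering further files to load.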
However, there are tools available to make self-referential structures, without even needing to write any unsafe code. The crates ouroboros and rental both allow you to define self-referential structs, at the price of having to use special functions to access the struct fields.
For example, your code might look something like this using ouroboros (I haven't tested this):
use ouroboros::self_referencing;
#[self_referencing]
struct Entry {
input_data: Vec<u8>,
#[borrows(input_data)]
parsed_result: Vec<Cmd<'this>> // 'this is a special lifetime name provided by ouroboros
}
fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
let entry = EntryBuilder { // EntryBuilder is defined by ouroboros to help construct Entry
input_data: load_from_file(filename),
// Note that instead of giving a value for parsed_result, we give
// a function to compute it.
parsed_result_builder: |input_data: &[u8]| parse(input_data),
}.build();
map.insert(filename.to_string(), entry);
}
fn do_something_with_entry(entry: &Entry) {
entry.with_parsed_result(|cmds| {
// cmds is a reference to `self.parsed_result` which only lives as
// long as this lambda and therefore can't be invalidated by a move.
});
}
ouroboros (and rental) provide a fairly odd interface for accessing fields. If, like me, you don't want to expose that interface to your users (or the rest of your code), you can write a wrapper struct around the self-referential struct whose impl contains methods designed for how you want the structure to be used, so all of the odd field access methods can remain private.
fn main() {
let mut a = String::from("dd");
let mut x = move || {
a.push_str("string: &str");
};
x();
x();
}
I have added move here to capture a but I am still able to call the x closure twice. Is a still borrowed as a mutable reference here? Why doesn't move force a move?
The variable a has indeed been moved into the closure:
fn main() {
let mut a = String::from("dd");
let mut x = move || {
a.push_str("string: &str");
};
x();
x();
a.len();
}
error[E0382]: borrow of moved value: `a`
--> src/main.rs:9:5
|
2 | let mut a = String::from("dd");
| ----- move occurs because `a` has type `std::string::String`, which does not implement the `Copy` trait
3 | let mut x = move || {
| ------- value moved into closure here
4 | a.push_str("string: &str");
| - variable moved due to use in closure
...
9 | a.len();
| ^ value borrowed here after move
It's unclear why you think that the closure x would become invalid after calling it, but it doesn't, any more than the equivalent struct would:
struct ClosureLike {
a: String,
}
impl ClosureLike {
fn call(&mut self) {
self.a.push_str("string: &str");
}
}
fn main() {
let a = String::from("dd");
let mut x = ClosureLike { a };
x.call();
x.call();
}
The question came from my wrong understanding of closures. The way it is documented in the Rust book also contributed to the confusion (I am not saying the book is bad). If anyone else had this same confusion, here is what I found.
Closures do not just store the scope and run it when they are called. A closure captures its environment in the preferred way, and the captured environment, which here contains a, is stored in the closure. How the closure uses the captured values decides which traits it implements.
The value of a persists as long as the closure exists, unless some operation moves it out, such as the closure returning a or a method consuming a. Here, nothing moves a out of the closure, so the closure can be called as many times as I want.
A better understanding can be obtained from the FnOnce, FnMut, and Fn traits. These traits are decided by how the closure uses the variables it captures, not by whether the variables are moved into the closure. FnMut can be implemented on a closure into which a value has been moved.
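To make the distinction concrete, here is a small sketch with hypothetical helper names. Both closures use move, yet one is FnMut (its body only mutates the captured String) and the other is FnOnce only (its body gives the String away).

```rust
// Accepts any closure that can be called repeatedly with mutable state.
fn call_twice<F: FnMut() -> usize>(mut f: F) -> usize {
    f();
    f()
}

// Accepts a closure that may consume its captured state when called.
fn call_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

fn fn_mut_demo() -> usize {
    let mut a = String::from("dd");
    // `a` is moved into the closure, but the body only needs `&mut a`,
    // so the closure implements FnMut and can be called more than once.
    let append = move || {
        a.push_str("!");
        a.len()
    };
    call_twice(append)
}

fn fn_once_demo() -> String {
    let a = String::from("dd");
    // Returning `a` moves it out of the closure, so it is FnOnce only.
    let consume = move || a;
    call_once(consume)
}
```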
Why doesn't this code compile:
use std::io;
fn use_cursor(cursor: &mut io::Cursor<&mut Vec<u8>>) {
// do some work
}
fn take_reference(data: &mut Vec<u8>) {
{
let mut buf = io::Cursor::new(data);
use_cursor(&mut buf);
}
data.len();
}
fn produce_data() {
let mut data = Vec::new();
take_reference(&mut data);
data.len();
}
The error in this case is:
error[E0382]: use of moved value: `*data`
--> src/main.rs:14:5
|
9 | let mut buf = io::Cursor::new(data);
| ---- value moved here
...
14 | data.len();
| ^^^^ value used here after move
|
= note: move occurs because `data` has type `&mut std::vec::Vec<u8>`, which does not implement the `Copy` trait
The signature of io::Cursor::new is such that it takes ownership of its argument. In this case, the argument is a mutable reference to a Vec.
pub fn new(inner: T) -> Cursor<T>
It sort of makes sense to me; because Cursor::new takes ownership of its argument (and not a reference) we can't use that value later on. At the same time it doesn't make sense: we essentially only pass a mutable reference and the cursor goes out of scope afterwards anyway.
In the produce_data function we also pass a mutable reference to take_reference, and it doesn't produce an error when trying to use data again, unlike inside take_reference.
I found it possible to 'reclaim' the reference by using Cursor.into_inner(), but it feels a bit weird to do it manually, since in normal use-cases the borrow-checker is perfectly capable of doing it itself.
Is there a nicer solution to this problem than using .into_inner()? Maybe there's something else I don't understand about the borrow-checker?
Normally, when you pass a mutable reference to a function, the compiler implicitly performs a reborrow. This produces a new borrow with a shorter lifetime.
When the parameter is generic (and is not of the form &mut T), the compiler doesn't do this reborrowing automatically [1]. However, you can do it manually by dereferencing your existing mutable reference and then referencing it again:
fn take_reference(data: &mut Vec<u8>) {
{
let mut buf = io::Cursor::new(&mut *data);
use_cursor(&mut buf);
}
data.len();
}
[1] This is because the current compiler architecture only allows a chance to do a coercion if both the source and target types are known at the coercion site.
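For comparison, here is a runnable sketch of both approaches side by side: the explicit reborrow and the into_inner variant mentioned in the question. The body of use_cursor (writing three bytes) and the usize return values are assumptions added so the effect is observable.

```rust
use std::io::{self, Write};

// Assumed body: write three bytes through the cursor.
fn use_cursor(cursor: &mut io::Cursor<&mut Vec<u8>>) {
    cursor.write_all(b"abc").unwrap();
}

// Explicit reborrow: `&mut *data` creates a shorter borrow, so `data`
// itself is not moved and is usable again once `buf` goes out of scope.
fn take_reference(data: &mut Vec<u8>) -> usize {
    {
        let mut buf = io::Cursor::new(&mut *data);
        use_cursor(&mut buf);
    }
    data.len()
}

// Alternative from the question: let the cursor take the reference and
// get it back with `into_inner` when done.
fn take_reference_into_inner(data: &mut Vec<u8>) -> usize {
    let mut buf = io::Cursor::new(data);
    use_cursor(&mut buf);
    let data = buf.into_inner();
    data.len()
}
```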