How to produce static references from append-only arena?


In my application (a compiler), I'd like to create cyclic data structures of various kinds throughout my program's execution that all have the same lifetime (in my case, lasting until the end of compilation). In addition:
I don't need to worry about multi-threading
I only need to append information - no need to delete or garbage collect
I only need immutable references to my data
This seemed like a good use case for an Arena, but I saw that this would require passing the arena around to every function in my program, which seemed like a large overhead.
So instead I found a macro called thread_local! that I can use to define global data. Using this, I thought I might be able to define a custom type that wraps an index into the array, and implement Deref on that type:
use std::cell::RefCell;
enum Floop {
CaseA,
CaseB,
CaseC(FloopRef),
CaseD(FloopRef),
CaseE(Vec<FloopRef>),
}
thread_local! {
static FLOOP_ARRAY: RefCell<Vec<Box<Floop>>> = RefCell::new(Vec::new());
}
pub struct FloopRef(usize);
impl std::ops::Deref for FloopRef {
type Target = Floop;
fn deref(&self) -> &Self::Target {
return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
}
}
pub fn main() {
// initialize some data
FLOOP_ARRAY.with(|floops| {
floops.borrow_mut().push(Box::new(Floop::CaseA));
let idx = floops.borrow_mut().len();
floops.borrow_mut().push(Box::new(Floop::CaseC(FloopRef(idx))));
});
}
Unfortunately I run into lifetime errors:
error: lifetime may not live long enough
--> src/main.rs:20:36
|
20 | return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
| ------- ^^^^^^^^^^^^^^^^^^^^^^^^ returning this value requires that `'1` must outlive `'2`
| | |
| | return type of closure is &'2 Box<Floop>
| has type `&'1 RefCell<Vec<Box<Floop>>>`
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:20:36
|
20 | return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
| ^---------------^^^^^^^^
| ||
| |temporary value created here
| returns a value referencing data owned by the current function
What I'd like to tell the compiler is that I promise I'm never going to remove entries from the Array and that I'm not going to share values across threads and that the array will last until the end of the program so that I can in essence just return a &'static reference to a Floop object. But Rust doesn't seem to be convinced this is safe.
Is there any kind of Rust helper library that would let me do something like this? Or are there safety holes even when I guarantee I only append / only use data with a single thread?

If you had such a reference, you could send the data to another thread and then access it after it has been dropped, because the thread-local data is destroyed when the creating thread finishes.
Even if you solved this problem, it would still require unsafe code, as the compiler can't be convinced that growing the Vec won't invalidate existing references. Here the Floop values themselves do not move, since each one lives in its own Box, but the compiler cannot know that.
If you pinky promise to never touch the data after the creating thread has finished, you can use the following code. Note that this code is technically UB: when the Vec grows, all the Boxes are moved, and, at least currently, moving a Box invalidates all references derived from it:
use std::cell::RefCell;
enum Floop {
CaseA,
CaseB,
CaseC(&'static Floop),
CaseD(&'static Floop),
CaseE(Vec<&'static Floop>),
}
thread_local! {
static FLOOP_ARRAY: RefCell<Vec<Box<Floop>>> = RefCell::new(Vec::new());
}
fn alloc_floop(floop: Floop) -> &'static mut Floop {
FLOOP_ARRAY.with(|floops| {
let mut floops = floops.borrow_mut();
floops.push(Box::new(floop));
let floop = &mut **floops.last_mut().unwrap() as *mut Floop;
// SAFETY: We never access the data after it has been dropped, and we are
// the only ones who access this `Box`, as we access a `Box` only immediately
// after pushing it.
unsafe { &mut *floop }
})
}
fn main() {
let floop_a = alloc_floop(Floop::CaseA);
let floop_b = alloc_floop(Floop::CaseC(floop_a));
}
A better solution would be something like a thread-safe arena that you can use in a static, but sadly, I found no crate that implements that.
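If leaking memory is acceptable (the question says the data should live until the end of the program anyway), one safe alternative is Box::leak from the standard library, which turns a Box into a reference with a 'static lifetime. The following is only a minimal sketch under that assumption; the helper name alloc_floop is reused from the answer above for illustration, and the leaked values are never freed. Note that it does not by itself create cycles, which would additionally need interior mutability (for example a Cell holding the reference).

```rust
#[derive(Debug)]
enum Floop {
    CaseA,
    CaseC(&'static Floop),
    CaseE(Vec<&'static Floop>),
}

// Illustrative helper: leaks the allocation on purpose, so the returned
// reference stays valid for the rest of the program. Nothing is ever freed.
fn alloc_floop(floop: Floop) -> &'static Floop {
    Box::leak(Box::new(floop))
}

fn main() {
    let a = alloc_floop(Floop::CaseA);
    let c = alloc_floop(Floop::CaseC(a));
    // Shared references are Copy, so `a` and `c` can be reused freely.
    let e = alloc_floop(Floop::CaseE(vec![a, c]));
    println!("{:?}", e);
}
```

No unsafe code, thread-local storage, or index wrapper is needed here, at the cost of never being able to reclaim the memory.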

Related

Having problem with mutability of Rc pointers

I'm trying to implement a simple tree structure with Rc pointers:
use std::rc::Rc;
fn main() {
println!("Hello, world!");
}
enum Expr {
B(i128),
A(Rc<Expr>, Rc<Expr>),
O(Rc<Expr>, Rc<Expr>),
}
struct Node {
data: Rc<Expr>,
parent: Option<Rc<Node>>,
children: Vec<Rc<Node>>,
}
impl Node {
fn add_to_children(mut self, node: &Rc<Node>) {
self.children.push(Rc::clone(node))
}
fn set_as_parent(mut self, node: &Rc<Node>) {
self.parent = Some(Rc::clone(node))
}
fn link_parent_child(parent: &mut Rc<Node>, child: &mut Rc<Node>) {
println!("eheeh");
parent.add_to_children(&child);
child.set_as_parent(&parent);
}
}
This won't compile however:
error[E0507]: cannot move out of an `Rc`
--> src/main.rs:32:9
|
32 | parent.add_to_children(&child);
| ^^^^^^^-----------------------
| | |
| | value moved due to this method call
| move occurs because value has type `Node`, which does not implement the `Copy` trait
|
note: this function takes ownership of the receiver `self`, which moves value
--> src/main.rs:21:28
|
21 | fn add_to_children(mut self, node: &Rc<Node>) {
What's a better way of implementing this type of tree? Is it my signature that's wrong?

Your add_to_children and set_as_parent methods take mut self, which means they consume self and try to move out of the Rc. That's not allowed, as there may be other references to the object.
The methods should take &mut self... but you'll run into another issue: Rc only exposes an immutable reference. Because, again, there may be multiple references.
The way to solve that issue is interior mutability. In your case, RefCell is the easiest - it's essentially a single-threaded lock, allowing only one place mutable access at a time. It is not a pointer and does not allocate on the heap by itself - it simply wraps the underlying value.
There's also another issue: Because both your parent and children refer to each other via Rc, you end up with a circular reference, meaning your nodes won't free memory when you drop them. Using std::rc::Weak for the parent references will fix that.
As Chaymin Friedman points out, Rust's rules about mutability can make implementing a tree structure somewhat difficult, especially when it contains parent references. There are many crates out on crates.io that have implemented such tree structures, using a variety of techniques.
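As a rough illustration of the advice above, here is a minimal sketch that combines RefCell for interior mutability with Weak for the parent link. The field layout and the constructor Node::new are assumptions made for this example (the Expr payload is replaced by a plain i128 to keep it short); it is not the only way to structure such a tree.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    data: i128,
    // Weak parent link: does not keep the parent alive, so no reference cycle.
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

impl Node {
    // Hypothetical constructor, not part of the original question.
    fn new(data: i128) -> Rc<Node> {
        Rc::new(Node {
            data,
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(Vec::new()),
        })
    }

    // Takes shared references; mutation goes through the RefCells.
    fn link_parent_child(parent: &Rc<Node>, child: &Rc<Node>) {
        parent.children.borrow_mut().push(Rc::clone(child));
        *child.parent.borrow_mut() = Rc::downgrade(parent);
    }
}

fn main() {
    let root = Node::new(1);
    let leaf = Node::new(2);
    Node::link_parent_child(&root, &leaf);

    let parent = leaf.parent.borrow().upgrade().expect("parent is still alive");
    println!("leaf {} has parent {}", leaf.data, parent.data);
    println!("root has {} child(ren)", root.children.borrow().len());
}
```

Because the child only holds a Weak pointer to its parent, dropping the last Rc to the root frees the whole tree.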

Rust borrow checker in a battle simulation engine

Okay, I have Combatants which battle on a Battlefield. My intuition for which things go where is a little off. It's pretty close to being a game, but I am now stuck.
I want the Battlefield to have a tick() function, which allows all Combatants to make a decision, like attacking a member of the opposing team or moving closer to one if no one is in range. I'm having issues making the borrow checker happy in doing this.
Here's a minimal version which has all the problems of my code.
struct Combatant{
current_health: i16,
max_health: i16
}
struct Battlefield{
combatants: Vec<Combatant>
}
impl Combatant {
fn attack(&self, other: &mut Combatant) {
other.current_health -= 3;
}
}
impl Battlefield {
fn tick(&mut self) {
let target = &mut self.combatants[0];
for combatant in &self.combatants {
combatant.attack(target);
}
}
}
cargo check returns
error[E0502]: cannot borrow `self.combatants` as immutable because it is also borrowed as mutable
--> src/main.rs:20:26
|
19 | let target = &mut self.combatants[0];
| --------------- mutable borrow occurs here
20 | for combatant in &self.combatants {
| ^^^^^^^^^^^^^^^^ immutable borrow occurs here
21 | combatant.attack(target);
| ------ mutable borrow later used here
How can I design this function (or more like, this whole scenario, heh) to make it work in Rust?

Since in your scenario you need to simultaneously have a mutable reference and an immutable reference on two elements in the same container, I think you need the help of interior mutability.
This will check at run-time that the same element is not accessed simultaneously through a mutable (.borrow_mut()) and immutable (.borrow()) reference (otherwise it panics).
Obviously, you have to ensure that by yourself (which is quite ugly since we have to compare pointers!).
Apparently, it is necessary to resort to pointers because comparing references compares the values they point to (the self argument of std::cmp::PartialEq::eq() will be dereferenced). The documentation of std::ptr::eq() (which should probably be used here) shows the difference between comparing references and comparing pointers.
struct Combatant {
current_health: i16,
max_health: i16,
}
struct Battlefield {
combatants: Vec<std::cell::RefCell<Combatant>>,
}
impl Combatant {
fn attack(
&self,
other: &mut Combatant,
) {
other.current_health -= 3;
}
}
impl Battlefield {
fn tick(&mut self) {
let target_cell = &self.combatants[0];
for combatant_cell in &self.combatants {
// Compare the cells themselves, so that no shared borrow of the
// target is still alive when it is mutably borrowed below
// (otherwise `borrow_mut()` would panic).
// if combatant_cell as *const _ != target_cell as *const _ {
if !std::ptr::eq(combatant_cell, target_cell) {
let combatant = combatant_cell.borrow();
let mut target = target_cell.borrow_mut();
combatant.attack(&mut target);
}
}
}
}
Note that this interior mutability was bothering me at first, and seemed like « bending the rules » because I was reasoning essentially in terms of « immutable becoming suddenly mutable » (like const-casting in C++), and the shared/exclusive aspect was only the consequence of that.
But the answer to the linked question explains that we should think at first in terms of shared/exclusive access to our data, and then the immutable/mutable aspect is just the consequence.
Back to your example, the shared access to the Combatants in the vector seems essential because, at any time, any of them could access any other.
Because the consequence of this choice (shared aspect) is that mutable accesses become almost impossible, we need the help of interior mutability.
This is not « bending the rules » because strict checking is done on .borrow()/.borrow_mut() (small overhead at this moment) and the obtained Ref/RefMut have lifetimes allowing usual (static) borrow-checking in the portion of code where they appear.
It is much safer than free immutable/mutable accesses we could perform with other programming languages.
For example, even in C++ where we could consider target as const (reference/pointer to const) while iterating on the non-const combatants vector, one iteration can accidentally mutate the target that we consider as const (reference/pointer to const means « I won't mutate it », not « it cannot be mutated by anyone »), which could be misleading. And with other languages where const/mut do not even exist, anything can be mutated at any time (except for objects which are strictly immutable, like str in Python, but it becomes difficult to manage objects with states that could change over time, like current_health in your example).
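To make the run-time checking described above concrete, here is a small self-contained sketch. It uses try_borrow_mut(), the non-panicking counterpart of borrow_mut() in the standard library, to show that a mutable borrow is refused while a shared borrow is alive; the variable names are made up for the example.

```rust
use std::cell::RefCell;

fn main() {
    let health = RefCell::new(10_i16);

    // While a shared borrow is alive, an exclusive borrow is refused.
    let shared = health.borrow();
    assert!(health.try_borrow_mut().is_err());
    drop(shared);

    // Once the shared borrow is gone, mutation through the RefCell works.
    *health.borrow_mut() -= 3;
    assert_eq!(*health.borrow(), 7);
}
```

Calling borrow_mut() instead of try_borrow_mut() in the first case would panic rather than return an error.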

The problem is this: When you iterate over the combatants, that requires an immutable borrow of all the combatants in that Vec. However, one of them, combatants[0] is already borrowed, and it's a mutable borrow.
You cannot, at the same time, have a mutable and an immutable borrow to the same thing.
This prevents a lot of logic errors. For example, in your code, if the borrow were allowed, you'd have combatants[0] attack itself!
So what to do? In the specific example above, one thing you could do is use the split_first_mut method of slices (available on Vec through deref), https://doc.rust-lang.org/std/vec/struct.Vec.html#method.split_first_mut. It returns an Option holding the first element and the rest of the slice, or None if the Vec is empty:
if let Some((first, rest)) = self.combatants.split_first_mut() {
for combatant in rest {
combatant.attack(first);
}
}

You can also use split_at_mut to only iterate on the other elements:
fn tick(&mut self) {
let idx = 0;
let (before, after) = self.combatants.split_at_mut(idx);
let (target, after) = after.split_at_mut(1);
let target = &mut target[0];
for combatant in before {
combatant.attack(target);
}
for combatant in after {
combatant.attack(target);
}
}
Note that this will panic if idx >= self.combatants.len().
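If the panic is a concern, the same splitting idea can be guarded with a bounds check. The sketch below is one possible variant, not the answerer's code: the method name tick_target and its bool return value are assumptions introduced for illustration, and the Combatant struct is trimmed to the one field that is used.

```rust
struct Combatant {
    current_health: i16,
}

impl Combatant {
    fn attack(&self, other: &mut Combatant) {
        other.current_health -= 3;
    }
}

struct Battlefield {
    combatants: Vec<Combatant>,
}

impl Battlefield {
    // Hypothetical variant: every other combatant attacks the one at `idx`.
    // Returns false instead of panicking when `idx` is out of range.
    fn tick_target(&mut self, idx: usize) -> bool {
        if idx >= self.combatants.len() {
            return false;
        }
        let (before, rest) = self.combatants.split_at_mut(idx);
        // `rest` is non-empty because idx < len was checked above.
        let (target, after) = rest.split_first_mut().expect("idx < len");
        for combatant in before.iter().chain(after.iter()) {
            combatant.attack(target);
        }
        true
    }
}

fn main() {
    let mut field = Battlefield {
        combatants: (0..3).map(|_| Combatant { current_health: 10 }).collect(),
    };
    assert!(field.tick_target(1));
    // Two attackers, 3 damage each.
    assert_eq!(field.combatants[1].current_health, 4);
    assert!(!field.tick_target(5));
}
```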

Use regular reference instead of `Box` in recursive data structures

I am new to Rust. When I read chapter 15 of The Rust Programming Language, I could not see why one should use Boxes in recursive data structures instead of regular references. Section 15.1 of the book explains that indirection is required to avoid infinite-sized structures, but it does not explain why to use Box.
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn main() {
let list = Cons(1, &Cons(2, &Cons(3, &Nil)));
println!("{:?}", list);
}
The code above compiles and produces the desired output. It seems that using FunctionalList to store a small amount of data on the stack works perfectly well. Does this code cause any problems?

It is true that the FunctionalList works in this simple case. However, we will run into some difficulties if we try to use this structure in other ways. For instance, suppose we tried to construct a FunctionalList and then return it from a function:
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list(x: u32) -> FunctionalList {
return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
}
fn main() {
let list = make_list(1);
println!("{:?}", list);
}
This results in the following compile error:
error[E0106]: missing lifetime specifier
--> src/main.rs:9:25
|
9 | fn make_list(x: u32) -> FunctionalList {
| ^^^^^^^^^^^^^^ help: consider giving it an explicit bounded or 'static lifetime: `FunctionalList + 'static`
If we follow the hint and add a 'static lifetime, then we instead get this error:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:10:12
|
10 | return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
| ^^^^^^^^^^^^^^^^^^^^^^-----------------^^
| | |
| | temporary value created here
| returns a value referencing data owned by the current function
The issue is that the inner FunctionalList values here are owned by implicit temporary variables whose scope ends at the end of the make_list function. These values would thus be dropped at the end of the function, leaving dangling references to them, which Rust disallows, hence the borrow checker rejects this code.
In contrast, if FunctionalList had been defined to Box its FunctionalList component, then ownership would have been moved from the temporary value into the containing FunctionalList, and we would have been able to return it without any problem.
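For comparison, here is a minimal sketch of the Box-based definition mentioned above. The type is renamed BoxedList purely to distinguish it from the reference-based version in the question; each Cons cell owns its tail, so the whole list can be returned by value.

```rust
#[derive(Debug)]
enum BoxedList {
    // Each cell owns the rest of the list through a heap allocation.
    Cons(u32, Box<BoxedList>),
    Nil,
}

use BoxedList::{Cons, Nil};

fn make_list(x: u32) -> BoxedList {
    // Ownership of every inner cell moves into the cell that contains it,
    // so nothing refers to a temporary when the function returns.
    Cons(x, Box::new(Cons(x + 1, Box::new(Cons(x + 2, Box::new(Nil))))))
}

fn main() {
    let list = make_list(1);
    println!("{:?}", list); // prints: Cons(1, Cons(2, Cons(3, Nil)))
}
```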
With your original FunctionalList, the thing we have to think about is that every value in Rust has to have an owner somewhere; and so if, as in this case, the FunctionalList is not the owner of its inner FunctionalLists, then that ownership has to reside somewhere else. In your example, that owner was an implicit temporary variable, but in more complex situations we could use a different kind of external owner. Here's an example of using a TypedArena (from the typed-arena crate) to own the data, so that we can still implement a variation of the make_list function:
use typed_arena::Arena;
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list<'a>(x: u32, arena: &'a Arena<FunctionalList<'a>>) -> &mut FunctionalList<'a> {
let l0 = arena.alloc(Nil);
let l1 = arena.alloc(Cons(x + 2, l0));
let l2 = arena.alloc(Cons(x + 1, l1));
let l3 = arena.alloc(Cons(x, l2));
return l3;
}
fn main() {
let arena = Arena::new();
let list = make_list(1, &arena);
println!("{:?}", list);
}
In this case, we adapted the return type of make_list to return only a mutable reference to a FunctionalList, instead of returning an owned FunctionalList, since now the ownership resides in the arena.

Rust, how to return reference to something in a struct that lasts as long as the struct?

I am porting a compiler I wrote to Rust. In it, I have an enum Entity which represents things like functions and variables:
pub enum Entity<'a> {
Variable(VariableEntity),
Function(FunctionEntity<'a>)
// Room for more later.
}
I then have a struct Scope which is responsible for holding on to these entities in a hash map, where the key is the name given by the programmer to the entity. (For example, declaring a function named sin would put an Entity into the hash map at the key sin.)
pub struct Scope<'a> {
symbols: HashMap<String, Entity<'a>>,
parent: Option<&'a Scope<'a>>
}
I would like to be able to get read-only references to the objects in the HashMap so that I can refer to it from other data structures. For example, when I parse a function call, I want to be able to store a reference to the function that is being called instead of just storing the name of the function and having to look up the reference every time I need the actual Entity object corresponding to the name. To do so, I have made this method:
impl<'a> Scope<'a> {
pub fn lookup(&self, symbol: &str) -> Option<&'a Entity<'a>> {
let result = self.symbols.get(symbol);
match result {
Option::None => match self.parent {
Option::None => Option::None,
Option::Some(parent) => parent.lookup(symbol),
},
Option::Some(_value) => result
}
}
}
However, this results in a compilation error:
error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
--> src/vague/scope.rs:29:31
|
29 | let result = self.symbols.get(symbol);
| ^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the method body at 28:3...
--> src/vague/scope.rs:28:3
|
28 | / pub fn lookup(&self, symbol: &str) -> Option<&'a Entity<'a>> {
29 | | let result = self.symbols.get(symbol);
30 | | match result {
31 | | Option::None => match self.parent {
... |
36 | | }
37 | | }
| |___^
note: ...so that reference does not outlive borrowed content
--> src/vague/scope.rs:29:18
|
29 | let result = self.symbols.get(symbol);
| ^^^^^^^^^^^^
note: but, the lifetime must be valid for the lifetime 'a as defined on the impl at 9:6...
--> src/vague/scope.rs:9:6
|
9 | impl<'a> Scope<'a> {
| ^^
= note: ...so that the expression is assignable:
expected std::option::Option<&'a vague::entity::Entity<'a>>
found std::option::Option<&vague::entity::Entity<'_>>
Things I Tried
There are several ways to make the compilation error go away, but none of them give the behavior I want. First, I can do this:
pub fn lookup(&self, symbol: &str) -> Option<&Entity<'a>> {
But this means the reference will not live long enough, so I can't put it into a struct or any other kind of storage that will outlive the scope that lookup is called from. Another solution was this:
pub fn lookup(&self, symbol: &str) -> Option<&'a Entity> {
I do not understand why this compiles. As part of the struct definition, things inside Entity objects in the hash map must live at least as long as the scope, so how can the compiler allow the return type to omit that? Additionally, why would the addition of <'a> result in the previous compiler error, since the only place the function is getting Entitys from is the hash map, which is defined as having a value type of Entity<'a>? Another bad fix I found was:
pub fn lookup(&'a self, symbol: &str) -> Option<&'a Entity<'a>> {
Which would mean that lookup can only be called once, which is obviously a problem. My previous understanding was incorrect, but the problem still remains that requiring the reference to self to have the same lifetime as the whole object severely restricts the code in that I can't call this method from a reference with any shorter lifetime, e.g. one passed in as a function argument or one created in a loop.
How can I go about fixing this? Is there some way I can fix the function as I have it now, or do I need to implement the behavior I'm looking for in an entirely different way?

Here's the signature you want:
pub fn lookup(&self, symbol: &str) -> Option<&'a Entity<'a>>
Here's why it can't work: it returns a reference that borrows an Entity for longer than lookup initially borrowed the Scope. This isn't illegal, but it means that the reference lookup returns can't be derived from the self reference. Why? Because given the above signature, this is valid code:
let sc = Scope { ... };
let foo = sc.lookup("foo");
drop(sc);
do_something_with(foo);
This code compiles because it has to: there is no lifetime constraint that the compiler could use to prove it wrong, because the lifetime of foo isn't coupled to the borrow of sc. But clearly, if lookup were implemented the way you first tried, foo would contain a dangling pointer after drop(sc), which is why the compiler rejected it.
You must redesign your data structures to make the given signature for lookup work. It's not clear how best to do this given the code in the question, but here are some ideas:
Decouple the lifetimes in Scope so that the parent is borrowed for a different lifetime than the symbols. Then have lookup take &'parent self. This probably will not work by itself, depending on what you need to do with the Entitys, but you may need to do it anyway if you need to distinguish between the lifetimes of different data.
pub struct Scope<'parent, 'sym> {
symbols: HashMap<String, Entity<'sym>>,
parent: Option<&'parent Scope<'parent, 'sym>>,
}
impl<'parent, 'sym> Scope<'parent, 'sym> {
pub fn lookup(&'parent self, symbol: &str) -> Option<&'parent Entity<'sym>> {
/* ... */
}
}
Store your Scopes and/or your Entitys in an arena. An arena can give out references that outlive the self-borrow, as long as they don't outlive the arena data structure itself. The tradeoff is that nothing in the arena will be deallocated until the whole arena is destroyed. It's not a substitute for garbage collection.
Use Rc or Arc to store your Scopes and/or your Entitys and/or whatever data Entity stores that contains references. This is one way to get rid of the lifetime parameter completely, but it comes with a small runtime cost.
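As an illustration of the last idea, here is a minimal single-threaded sketch using Rc. The simplified Entity payloads and the helpers Scope::new and define are assumptions made for this example; the point is only that lookup can hand out an Rc<Entity> that stays valid after the Scope it came from has been dropped.

```rust
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Debug)]
enum Entity {
    // Simplified payloads, standing in for VariableEntity / FunctionEntity.
    Variable(String),
    Function(String),
}

struct Scope {
    symbols: HashMap<String, Rc<Entity>>,
    parent: Option<Rc<Scope>>,
}

impl Scope {
    fn new(parent: Option<Rc<Scope>>) -> Scope {
        Scope { symbols: HashMap::new(), parent }
    }

    fn define(&mut self, name: &str, entity: Entity) {
        self.symbols.insert(name.to_owned(), Rc::new(entity));
    }

    // Returns a new shared handle; no lifetime ties it to `&self`.
    fn lookup(&self, symbol: &str) -> Option<Rc<Entity>> {
        match self.symbols.get(symbol) {
            Some(entity) => Some(Rc::clone(entity)),
            None => self.parent.as_ref().and_then(|p| p.lookup(symbol)),
        }
    }
}

fn main() {
    let mut global = Scope::new(None);
    global.define("sin", Entity::Function("sin".to_owned()));
    let mut local = Scope::new(Some(Rc::new(global)));
    local.define("x", Entity::Variable("x".to_owned()));

    let sin = local.lookup("sin");
    drop(local); // the handle remains usable after the scopes are gone
    println!("{:?}", sin);
}
```

The cost is a reference-count update per lookup, which is the small runtime cost mentioned above.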

How to avoid mutex borrowing problems when using its guard

I want a method of my struct to run in a synchronized way. I wanted to do this by using Mutex (Playground):
use std::sync::Mutex;
use std::collections::BTreeMap;
pub struct A {
map: BTreeMap<String, String>,
mutex: Mutex<()>,
}
impl A {
pub fn new() -> A {
A {
map: BTreeMap::new(),
mutex: Mutex::new(()),
}
}
}
impl A {
fn synchronized_call(&mut self) {
let mutex_guard_res = self.mutex.try_lock();
if mutex_guard_res.is_err() {
return
}
let mut _mutex_guard = mutex_guard_res.unwrap(); // safe because of check above
let mut lambda = |text: String| {
let _ = self.map.insert("hello".to_owned(),
"d".to_owned());
};
lambda("dd".to_owned());
}
}
Error message:
error[E0500]: closure requires unique access to `self` but `self.mutex` is already borrowed
--> <anon>:23:26
|
18 | let mutex_guard_res = self.mutex.try_lock();
| ---------- borrow occurs here
...
23 | let mut lambda = |text: String| {
| ^^^^^^^^^^^^^^ closure construction occurs here
24 | if let Some(m) = self.map.get(&text) {
| ---- borrow occurs due to use of `self` in closure
...
31 | }
| - borrow ends here
As I understand when we borrow anything from the struct we are unable to use other struct's fields till our borrow is finished. But how can I do method synchronization then?

The closure needs a mutable reference to self.map in order to insert something into it. But closure capturing works with whole bindings only (this is the rule in editions before Rust 2021; since the 2021 edition, closures capture disjoint fields). This means that if you say self.map, the closure attempts to capture self, not self.map. And self can't be mutably borrowed/captured, because parts of self are already immutably borrowed.
We can solve this closure-capturing problem by introducing a new binding for the map alone such that the closure is able to capture it (Playground):
let mm = &mut self.map;
let mut lambda = |text: String| {
let _ = mm.insert("hello".to_owned(), text);
};
lambda("dd".to_owned());
However, there is something you overlooked: since synchronized_call() accepts &mut self, you don't need the mutex! Why? Mutable references are also called exclusive references, because the compiler can assure at compile time that there is only one such mutable reference at any given time.
Therefore you statically know that there is at most one instance of synchronized_call() running on one specific object at any given time, if the function is not recursive (does not call itself).
If you have mutable access to a mutex, you know that the mutex is unlocked. See the Mutex::get_mut() method for more explanation. Isn't that amazing?
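A small sketch of that last point: with exclusive access to the struct, Mutex::get_mut() hands out a mutable reference to the protected data without locking. The struct layout (a map wrapped in a Mutex) and the method name exclusive_call are assumptions for this example.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

pub struct A {
    synced_map: Mutex<BTreeMap<String, String>>,
}

impl A {
    // `&mut self` proves nobody else can hold the lock, so no locking is
    // needed. `get_mut` only fails if the mutex was poisoned by a panic.
    fn exclusive_call(&mut self) {
        let map = self.synced_map.get_mut().unwrap();
        map.insert("hello".to_owned(), "d".to_owned());
    }
}

fn main() {
    let mut a = A { synced_map: Mutex::new(BTreeMap::new()) };
    a.exclusive_call();
    println!("{:?}", a.synced_map.lock().unwrap().get("hello"));
}
```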

Rust mutexes do not work the way you are trying to use them. In Rust, a mutex protects specific data, relying on the borrow-checking mechanism used elsewhere in the language. As a consequence, declaring a field Mutex<()> doesn't make sense, because it protects read-write access to the unit value (), which has no state to mutate.
As Lukas explained, your synchronized_call as declared doesn't need to do synchronization because its signature already requests an exclusive (mutable) reference to self, which prevents it from being invoked from multiple threads on the same object. In other words, you need to change the signature of synchronized_call because the current one does not match the functionality it is intended to provide.
synchronized_call needs to accept a shared reference to self, which will signal to Rust that it can be called from multiple threads in the first place. Inside synchronized_call, a call to Mutex::lock will simultaneously lock the mutex and provide a mutable reference to the underlying data, carefully scoped so that the lock is held for the duration of the reference:
use std::sync::Mutex;
use std::collections::BTreeMap;
pub struct A {
synced_map: Mutex<BTreeMap<String, String>>,
}
impl A {
pub fn new() -> A {
A {
synced_map: Mutex::new(BTreeMap::new()),
}
}
}
impl A {
fn synchronized_call(&self) {
let mut map = self.synced_map.lock().unwrap();
// omitting the lambda for brevity, but it would also work
// (as long as it refers to map rather than self.map)
map.insert("hello".to_owned(), "d".to_owned());
}
}
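To show what the shared-reference signature buys, here is a hedged usage sketch that calls the method from several threads at once using std::thread::scope (available since Rust 1.63). The extra key parameter is an assumption added so that each thread inserts a distinct entry; it is not part of the answer's code.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;
use std::thread;

pub struct A {
    synced_map: Mutex<BTreeMap<String, String>>,
}

impl A {
    pub fn new() -> A {
        A { synced_map: Mutex::new(BTreeMap::new()) }
    }

    // Variation on the method above: takes the key to insert (assumption).
    fn synchronized_call(&self, key: String) {
        // The guard holds the lock until it goes out of scope.
        let mut map = self.synced_map.lock().unwrap();
        map.insert(key, "d".to_owned());
    }
}

fn main() {
    let a = A::new();
    // Scoped threads may borrow `a` because they are joined before
    // `scope` returns.
    thread::scope(|s| {
        for i in 0..4 {
            let a = &a;
            s.spawn(move || a.synchronized_call(format!("key{i}")));
        }
    });
    println!("{} entries", a.synced_map.lock().unwrap().len());
}
```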
