This question already has answers here:
Why can't I store a value and a reference to that value in the same struct?
(4 answers)
Implement graph-like data structure in Rust
(3 answers)
What is the right smart pointer to have multiple strong references and allow mutability?
(1 answer)
Mutating one field while iterating over another immutable field
(3 answers)
How can I borrow from a HashMap to read and write at the same time?
(2 answers)
Closed 3 years ago.
I am trying to implement the JVM in Rust as a fun project but I am struggling with an issue involving references. When classes are loaded from class files, their references to other classes are represented as Strings. After the loading phase, a JVM should link classes with actual references to each other. I cannot figure out how to do this with Rust's references. Here is a stripped down version of what I am trying to achieve. (an MCV example)
use ClassRef::{Symbolic, Static};
use std::collections::HashMap;
fn main() {
let mut loader = ClassLoader { classes: HashMap::new() };
loader.load_class("java/lang/String", "java/lang/Object");
// java.lang.Object is the only class without a direct superclass
loader.load_class("java/lang/Object", "");
loader.link_classes();
}
struct ClassLoader<'a> {
classes: HashMap<String, Class<'a>>
}
impl<'a> ClassLoader<'a> {
pub fn load_class(&mut self, name: &str, super_name: &str) {
self.classes.insert(name.to_owned(), Class::new(name, super_name));
}
pub fn link_classes(&mut self) {
// Issue 1: Editing the stored class requires a mutable borrow
for (name, class) in self.classes.iter_mut() {
// Issue 2: I am not allowed to move this value to itself
class.super_class = match class.super_class {
// Issue 3: Storing a reference to another value in the map
Symbolic(ref super_name) => Static(self.classes.get(super_name)),
a => a
}
}
}
}
struct Class<'a> {
super_class: ClassRef<'a>,
name: String
}
impl<'a> Class<'a> {
pub fn new(name: &str, super_name: &str) -> Self {
Class {
name: name.to_owned(),
super_class: Symbolic(super_name.to_owned())
}
}
}
enum ClassRef<'a> {
Symbolic(String),
Static(Option<&'a Class<'a>>)
}
Currently, there are three issues with the process that I am unable to solve.
First, when I iterate through the list of loaded classes, I need to be able to change the field Class#super_class, so I need a mutable iterator. This means that I have a mutable borrow on self.classes. I also need an immutable borrow on self.classes during the iteration to look up a reference to the superclass. The way I feel like this should be solved is having a way of proving to the compiler that I am not mutating the map itself, but only a key. I feel like it has to do with std::cell::Cell as shown in Choosing your Guarantees, but the examples seem so simple I don't know how to correlate it to what I am trying to do here.
Second, I need the previous value of Class#super_class in order to calculate the new one. Essentially, I am moving the previous value out, changing it, and then moving it back in. I feel like this could be solved with some sort of higher order Map function to work on the value in place, but once again I'm not sure.
Third, when I give one class a reference to another class, there is no way of proving to the compiler that the other class won't be removed from the hashmap, causing the reference to point to deallocated memory. When classes are loaded, one class name will always point to the same class and classes are never unloaded, so I know that the value will always be there. The way I think this should be solved is with a Hashmap implementation which doesn't allow values to be overwritten or removed. I tried to poke around for such a thing but I couldn't find it.
EDIT: Solution
The duplicates linked were certainly helpful, but I ended up not using reference counting pointers (Rc<>) because the relationships between classes can sometimes be cyclic, so trying to drop the ClassLoader would end up leaking memory. The solution I settled on is using an immutable reference to a
Typed-Arena allocator. This allocator is an insertion only allocator which ensures the references it distributes are valid for the entire life of the reference to the allocator, as long as said reference is immutable. I then used the hashmap to store the references. That way everything is dropped when I drop the ClassLoader. Hopefully anyone that trying to implement a relationship similar to this finds the following code helpful.
extern crate typed_arena;
use ClassRef::{Symbolic, Static};
use std::collections::HashMap;
use std::cell::RefCell;
use core::borrow::BorrowMut;
use typed_arena::Arena;
fn main() {
let mut loader = ClassLoader { class_map: HashMap::new(), classes: &Arena::new() };
loader.load_class("java/lang/String", "java/lang/Object");
// java.lang.Object is the only class without a direct superclass
loader.load_class("java/lang/Object", "");
loader.link_classes();
}
struct ClassLoader<'a> {
classes: &'a Arena<RefCell<Class<'a>>>,
class_map: HashMap<String, &'a RefCell<Class<'a>>>
}
impl<'a> ClassLoader<'a> {
pub fn load_class(&mut self, name: &str, super_name: &str) {
let super_opt = if super_name.len() > 0 {
Some(super_name)
} else {
None
};
let class = Class::new(name, super_opt);
let class_ref = self.classes.alloc(RefCell::new(class));
self.class_map.insert(name.to_owned(), class_ref);
}
pub fn link_classes(&mut self) {
for (_name, class_ref) in &self.class_map {
let mut class = (*class_ref).borrow_mut();
if let Some(Symbolic(super_name)) = &class.super_class {
let super_class = self.class_map.get(super_name);
let super_class = super_class.map(|c| Static(c.clone()));
*class.super_class.borrow_mut() = super_class;
}
}
}
}
struct Class<'a> {
super_class: Option<ClassRef<'a>>,
name: String
}
impl<'a> Class<'a> {
pub fn new(name: &str, super_name: Option<&str>) -> Self {
Class {
name: name.to_owned(),
super_class: super_name.map(|name| Symbolic(name.to_owned()))
}
}
}
enum ClassRef<'a> {
Symbolic(String),
Static(&'a RefCell<Class<'a>>)
}
Related
I'm quite new to Rust programming, and I'm trying to convert a code that I had in js to Rust.
A plain concept of it is as below:
fn main() {
let mut ds=DataSource::new();
let mut pp =Processor::new(&mut ds);
}
struct DataSource {
st2r: Option<&Processor>,
}
struct Processor {
st1r: &DataSource,
}
impl DataSource {
pub fn new() -> Self {
DataSource {
st2r: None,
}
}
}
impl Processor {
pub fn new(ds: &mut DataSource) -> Self {
let pp = Processor {
st1r: ds,
};
ds.st2r = Some(&pp);
pp
}
}
As you can see I have two main modules in my system that are inter-connected to each other and I need a reference of each in another.
Well, this code would complain about lifetimes and such stuff, of course 😑. So I started throwing lifetime specifiers around like a madman and even after all that, it still complains that in "Processor::new" I can't return something that has been borrowed. Legit. But I can't find any solution around it! No matter how I try to handle the referencing of each other, it ends with this borrowing error.
So, can anyone point out a solution for this situation? Is my app's structure not valid in Rust and I should do it in another way? or there's a trick to this that my inexperienced mind can't find?
Thanks.
What you're trying to do can't be expressed with references and lifetimes because:
The DataSource must live longer than the Processor so that pp.st1r is guaranteed to be valid,
and the Processor must live longer than the DataSource so that ds.st2r is guaranteed to be valid. You might think that since ds.st2r is an Option and since the None variant doesn't contain a reference this allows a DataSource with a None value in st2r to outlive any Processors, but unfortunately the compiler can't know at compile-time whether st2r contains Some value, and therefore must assume it does.
Your problem is compounded by the fact that you need a mutable reference to the DataSource so that you can set its st2r field at a time when you also have an immutable outstanding reference inside the Processor, which Rust won't allow.
You can make your code work by switching to dynamic lifetime and mutability tracking using Rc (for dynamic lifetime tracking) and RefCell (for dynamic mutability tracking):
use std::cell::RefCell;
use std::rc::{ Rc, Weak };
fn main() {
let ds = Rc::new (RefCell::new (DataSource::new()));
let pp = Processor::new (Rc::clone (&ds));
}
struct DataSource {
st2r: Weak<Processor>,
}
struct Processor {
st1r: Rc<RefCell<DataSource>>,
}
impl DataSource {
pub fn new() -> Self {
DataSource {
st2r: Weak::new(),
}
}
}
impl Processor {
pub fn new(ds: Rc::<RefCell::<DataSource>>) -> Rc<Self> {
let pp = Rc::new (Processor {
st1r: ds,
});
pp.st1r.borrow_mut().st2r = Rc::downgrade (&pp);
pp
}
}
Playground
Note that I've replaced your Option<&Processor> with a Weak<Processor>. It would be possible to use an Option<Rc<Processor>> but this would risk leaking memory if you dropped all references to DataSource without setting st2r to None first. The Weak<Processor> behaves more or less like an Option<Rc<Processor>> that is set to None automatically when all other references are dropped, ensuring that memory will be freed properly.
I'm writing a game engine. In the engine, I've got a game state which contains the list of entities in the game.
I want to provide a function on my gamestate update which will in turn tell each entity to update. Each entity needs to be able to refer to the gamestate in order to correctly update itself.
Here's a simplified version of what I have so far.
pub struct GameState {
pub entities: Vec<Entity>,
}
impl GameState {
pub fn update(&mut self) {
for mut t in self.entities.iter_mut() {
t.update(self);
}
}
}
pub struct Entity {
pub value: i64,
}
impl Entity {
pub fn update(&mut self, container: &GameState) {
self.value += container.entities.len() as i64;
}
}
fn main() {
let mut c = GameState { entities: vec![] };
c.entities.push(Entity { value: 1 });
c.entities.push(Entity { value: 2 });
c.entities.push(Entity { value: 3 });
c.update();
}
The problem is the borrow checker doesn't like me passing the gamestate to the entity:
error[E0502]: cannot borrow `*self` as immutable because `self.entities` is also borrowed as mutable
--> example.rs:8:22
|
7 | for mut t in self.entities.iter_mut() {
| ------------- mutable borrow occurs here
8 | t.update(self);
| ^^^^ immutable borrow occurs here
9 | }
| - mutable borrow ends here
error: aborting due to previous error
Can anyone give me some suggestions on better ways to design this that fits with Rust better?
Thanks!
First, let's answer the question you didn't ask: Why is this not allowed?
The answer lies around the guarantees that Rust makes about & and &mut pointers. A & pointer is guaranteed to point to an immutable object, i.e. it's impossible for the objects behind the pointer to mutate while you can use that pointer. A &mut pointer is guaranteed to be the only active pointer to an object, i.e. you can be sure that nobody is going to observe or mutate the object while you're mutating it.
Now, let's look at the signature of Entity::update:
impl Entity {
pub fn update(&mut self, container: &GameState) {
// ...
}
}
This method takes two parameters: a &mut Entity and a &GameState. But hold on, we can get another reference to self through the &GameState! For example, suppose that self is the first entity. If we do this:
impl Entity {
pub fn update(&mut self, container: &GameState) {
let self_again = &container.entities[0];
// ...
}
}
then self and self_again alias each other (i.e. they refer to the same thing), which is not allowed as per the rules I mentioned above because one of the pointers is a mutable pointer.
What can you do about this?
One option is to remove an entity from the entities vector before calling update on it, then inserting it back after the call. This solves the aliasing problem because we can't get another alias to the entity from the game state. However, removing the entity from the vector and reinserting it are operations with linear complexity (the vector needs to shift all the following items), and if you do it for each entity, then the main update loop runs in quadratic complexity. You can work around that by using a different data structure; this can be as simple as a Vec<Option<Entity>>, where you simply take the Entity from each Option, though you might want to wrap this into a type that hides all None values to external code. A nice consequence is that when an entity has to interact with other entities, it will automatically skip itself when iterating on the entities vector, since it's no longer there!
A variation on the above is to simply take ownership of the whole vector of entities and temporarily replace the game state's vector of entities with an empty one.
impl GameState {
pub fn update(&mut self) {
let mut entities = std::mem::replace(&mut self.entities, vec![]);
for mut t in entities.iter_mut() {
t.update(self);
}
self.entities = entities;
}
}
This has one major downside: Entity::update will not be able to interact with the other entities.
Another option is to wrap each entity in a RefCell.
use std::cell::RefCell;
pub struct GameState {
pub entities: Vec<RefCell<Entity>>,
}
impl GameState {
pub fn update(&mut self) {
for t in self.entities.iter() {
t.borrow_mut().update(self);
}
}
}
By using RefCell, we can avoid retaining a mutable borrow on self. Here, we can use iter instead of iter_mut to iterate on entities. In return, we now need to call borrow_mut to obtain a mutable pointer to the value wrapped in the RefCell.
RefCell essentially performs borrow checking at runtime. This means that you can end up writing code that compiles fine but panics at runtime. For example, if we write Entity::update like this:
impl Entity {
pub fn update(&mut self, container: &GameState) {
for entity in container.entities.iter() {
self.value += entity.borrow().value;
}
}
}
the program will panic:
thread 'main' panicked at 'already mutably borrowed: BorrowError', ../src/libcore/result.rs:788
That's because we end up calling borrow on the entity that we're currently updating, which is still borrowed by the borrow_mut call done in GameState::update. Entity::update doesn't have enough information to know which entity is self, so you would have to use try_borrow or borrow_state (which are both unstable as of Rust 1.12.1) or pass additional data to Entity::update to avoid panics with this approach.
When writing callbacks for generic interfaces, it can be useful for them to define their own local data which they are responsible for creating and accessing.
In C I would just use a void pointer, C-like example:
struct SomeTool {
int type;
void *custom_data;
};
void invoke(SomeTool *tool) {
StructOnlyForThisTool *data = malloc(sizeof(*data));
/* ... fill in the data ... */
tool.custom_data = custom_data;
}
void execute(SomeTool *tool) {
StructOnlyForThisTool *data = tool.custom_data;
if (data.foo_bar) { /* do something */ }
}
When writing something similar in Rust, replacing void * with Option<Box<Any>>, however I'm finding that accessing the data is unreasonably verbose, eg:
struct SomeTool {
type: i32,
custom_data: Option<Box<Any>>,
};
fn invoke(tool: &mut SomeTool) {
let data = StructOnlyForThisTool { /* my custom data */ }
/* ... fill in the data ... */
tool.custom_data = Some(Box::new(custom_data));
}
fn execute(tool: &mut SomeTool) {
let data = tool.custom_data.as_ref().unwrap().downcast_ref::<StructOnlyForThisTool>().unwrap();
if data.foo_bar { /* do something */ }
}
There is one line here which I'd like to be able to write in a more compact way:
tool.custom_data.as_ref().unwrap().downcast_ref::<StructOnlyForThisTool>().unwrap()
tool.custom_data.as_ref().unwrap().downcast_mut::<StructOnlyForThisTool>().unwrap()
While each method makes sense on its own, in practice it's not something I'd want to write throughout a code-base, and not something I'm going to want to type out often or remember easily.
By convention, the uses of unwrap here aren't dangerous because:
While only some tools define custom data, the ones that do always define it.
When the data is set, by convention the tool only ever sets its own data. So there is no chance of having the wrong data.
Any time these conventions aren't followed, its a bug and should panic.
Given these conventions, and assuming accessing custom-data from a tool is something that's done often - what would be a good way to simplify this expression?
Some possible options:
Remove the Option, just use Box<Any> with Box::new(()) representing None so access can be simplified a little.
Use a macro or function to hide verbosity - passing in the Option<Box<Any>>: will work of course, but prefer not - would use as a last resort.
Add a trait to Option<Box<Any>> which exposes a method such as tool.custom_data.unwrap_box::<StructOnlyForThisTool>() with matching unwrap_box_mut.
Update 1): since asking this question a point I didn't include seems relevant.
There may be multiple callback functions like execute which must all be able to access the custom_data. At the time I didn't think this was important to point out.
Update 2): Wrapping this in a function which takes tool isn't practical, since the borrow checker then prevents further access to members of tool until the cast variable goes out of scope, I found the only reliable way to do this was to write a macro.
If the implementation really only has a single method with a name like execute, that is a strong indication to consider using a closure to capture the implementation data. SomeTool can incorporate an arbitrary callable in a type-erased manner using a boxed FnMut, as shown in this answer. execute() then boils down to invoking the closure stored in the struct field implementation closure using (self.impl_)(). For a more general approach, that will also work when you have more methods on the implementation, read on.
An idiomatic and type-safe equivalent of the type+dataptr C pattern is to store the implementation type and pointer to data together as a trait object. The SomeTool struct can contain a single field, a boxed SomeToolImpl trait object, where the trait specifies tool-specific methods such as execute. This has the following characteristics:
You no longer need an explicit type field because the run-time type information is incorporated in the trait object.
Each tool's implementation of the trait methods can access its own data in a type-safe manner without casts or unwraps. This is because the trait object's vtable automatically invokes the correct function for the correct trait implementation, and it is a compile-time error to try to invoke a different one.
The "fat pointer" representation of the trait object has the same performance characteristics as the type+dataptr pair - for example, the size of SomeTool will be two pointers, and accessing the implementation data will still involve a single pointer dereference.
Here is an example implementation:
struct SomeTool {
impl_: Box<SomeToolImpl>,
}
impl SomeTool {
fn execute(&mut self) {
self.impl_.execute();
}
}
trait SomeToolImpl {
fn execute(&mut self);
}
struct SpecificTool1 {
foo_bar: bool
}
impl SpecificTool1 {
pub fn new(foo_bar: bool) -> SomeTool {
let my_data = SpecificTool1 { foo_bar: foo_bar };
SomeTool { impl_: Box::new(my_data) }
}
}
impl SomeToolImpl for SpecificTool1 {
fn execute(&mut self) {
println!("I am {}", self.foo_bar);
}
}
struct SpecificTool2 {
num: u64
}
impl SpecificTool2 {
pub fn new(num: u64) -> SomeTool {
let my_data = SpecificTool2 { num: num };
SomeTool { impl_: Box::new(my_data) }
}
}
impl SomeToolImpl for SpecificTool2 {
fn execute(&mut self) {
println!("I am {}", self.num);
}
}
pub fn main() {
let mut tool1: SomeTool = SpecificTool1::new(true);
let mut tool2: SomeTool = SpecificTool2::new(42);
tool1.execute();
tool2.execute();
}
Note that, in this design, it doesn't make sense to make implementation an Option because we always associate the tool type with the implementation. While it is perfectly valid to have an implementation without data, it must always have a type associated with it.
Most of this is boilerplate, provided as a compilable example. Scroll down.
use std::rc::{Rc, Weak};
use std::cell::RefCell;
use std::any::{Any, AnyRefExt};
struct Shared {
example: int,
}
struct Widget {
parent: Option<Weak<Box<Widget>>>,
specific: RefCell<Box<Any>>,
shared: RefCell<Shared>,
}
impl Widget {
fn new(specific: Box<Any>,
parent: Option<Rc<Box<Widget>>>) -> Rc<Box<Widget>> {
let parent_option = match parent {
Some(parent) => Some(parent.downgrade()),
None => None,
};
let shared = Shared{pos: 10};
Rc::new(box Widget{
parent: parent_option,
specific: RefCell::new(specific),
shared: RefCell::new(shared)})
}
}
struct Woo {
foo: int,
}
impl Woo {
fn new() -> Box<Any> {
box Woo{foo: 10} as Box<Any>
}
}
fn main() {
let widget = Widget::new(Woo::new(), None);
{
// This is a lot of work...
let cell_borrow = widget.specific.borrow();
let woo = cell_borrow.downcast_ref::<Woo>().unwrap();
println!("{}", woo.foo);
}
// Can the above be made into a function?
// let woo = widget.get_specific::<Woo>();
}
I'm learning Rust and trying to figure out some workable way of doing a widget hierarchy. The above basically does what I need, but it is a bit cumbersome. Especially vexing is the fact that I have to use two statements to convert the inner widget (specific member of Widget). I tried several ways of writing a function that does it all, but the amount of reference and lifetime wizardry is just beyond me.
Can it be done? Can the commented out method at the bottom of my example code be made into reality?
Comments regarding better ways of doing this entire thing are appreciated, but put it in the comments section (or create a new question and link it)
I'll just present a working simplified and more idiomatic version of your code and then explain all the changed I made there:
use std::rc::{Rc, Weak};
use std::any::{Any, AnyRefExt};
struct Shared {
example: int,
}
struct Widget {
parent: Option<Weak<Widget>>,
specific: Box<Any>,
shared: Shared,
}
impl Widget {
fn new(specific: Box<Any>, parent: Option<Rc<Widget>>) -> Widget {
let parent_option = match parent {
Some(parent) => Some(parent.downgrade()),
None => None,
};
let shared = Shared { example: 10 };
Widget{
parent: parent_option,
specific: specific,
shared: shared
}
}
fn get_specific<T: 'static>(&self) -> Option<&T> {
self.specific.downcast_ref::<T>()
}
}
struct Woo {
foo: int,
}
impl Woo {
fn new() -> Woo {
Woo { foo: 10 }
}
}
fn main() {
let widget = Widget::new(box Woo::new() as Box<Any>, None);
let specific = widget.get_specific::<Woo>().unwrap();
println!("{}", specific.foo);
}
First of all, there are needless RefCells inside your structure. RefCells are needed very rarely - only when you need to mutate internal state of an object using only & reference (instead of &mut). This is useful tool for implementing abstractions, but it is almost never needed in application code. Because it is not clear from your code that you really need it, I assume that it was used mistakingly and can be removed.
Next, Rc<Box<Something>> when Something is a struct (like in your case where Something = Widget) is redundant. Putting an owned box into a reference-counted box is just an unnecessary double indirection and allocation. Plain Rc<Widget> is the correct way to express this thing. When dynamically sized types land, it will be also true for trait objects.
Finally, you should try to always return unboxed values. Returning Rc<Box<Widget>> (or even Rc<Widget>) is unnecessary limiting for the callers of your code. You can go from Widget to Rc<Widget> easily, but not the other way around. Rust optimizes by-value returns automatically; if your callers need Rc<Widget> they can box the return value themselves:
let rc_w = box(RC) Widget::new(...);
Same thing is also true for Box<Any> returned by Woo::new().
You can see that in the absence of RefCells your get_specific() method can be implemented very easily. However, you really can't do it with RefCell because RefCell uses RAII for dynamic borrow checks, so you can't return a reference to its internals from a method. You'll have to return core::cell::Refs, and your callers will need to downcast_ref() themselves. This is just another reason to use RefCells sparingly.
I have the following code (as a cut-down example):
class Item {
attributes: ~mut [~str];
}
class ItemList {
items: ~mut [ ~Item ];
}
fn read_item(rdr : Reader) -> ~mut Item {
}
fn load_item_list(rdr : Reader) -> ~mut ItemList {
}
When trying to implement these functions, I keep running into errors like "unresolved name ItemList" and conflicts between pointer/mutability types (&~mut vs ~mut, etc.)
Could someone give me a cut-down example which just allocates and returns the empty objects? From there I should be able to fill in the data.
I'm not sure what problem you are hitting but here are a few pointers. First of all, I recommend moving to Rust trunk - the class syntax you are using indicates a fairly old version of Rust. On trunk, mut is no longer a valid modifier for the interior of owned types, meaning ~mut T cannot be written, nor struct { mut field: T }. Instead mutability of owned types is inherited through its root in the stack. So if you have type Foo = ~[~str], that type is deeply immutable when declared in let foo = ~[~"bar"] and deeply mutable when let mut foo = ~[~"bar"].
Here's an example based on your types.
struct Item {
attributes: ~[~str]
}
struct ItemList {
items: ~[ ~Item ]
}
fn read_item() -> ~Item {
~Item {
attributes: ~[~"puppies"]
}
}
fn load_item_list() -> ~ItemList {
~ItemList {
items: ~[read_item()]
}
}
fn main() {
let mut my_items = load_item_list();
my_items.items[0].attributes[0] = ~"kitties";
}
You can see that the types don't mention mutability at all, but because we load them into a mutable slot (let mut my_items), the values in the inner vector can be mutated.
This can take some time to get used to, but Rust deals with interior mutability this way to avoid some potentially confounding errors related to borrowed pointers.
The exception to this inherited mutability is with managed types, which can also be the root of mutability, as in #mut T.