This question is about a specific pattern of ownership that may arise when implementing a state machine for a video game in Rust, where states can hold a reference to "global" borrowed context and where state machines own their states. I've tried to cut out as many details as I can while still motivating the problem, but it's a fairly large and tangled issue.
Here is the state trait:
pub trait AppState<'a> {
fn update(&mut self, Duration) -> Option<Box<AppState<'a> + 'a>>;
fn enter(&mut self, Box<AppState<'a> + 'a>);
//a number of other methods
}
I'm implementing states with a boxed trait object instead of an enum because I expect to have quite a lot of them. States return a Some(State) in their update method in order to cause their owning state machine to switch to a new state. I added a lifetime parameter because without it, the compiler was generating boxes with type: Box<AppState + 'static>, making the boxes useless because states contain mutable state.
Speaking of state machines, here it is:
pub struct StateMachine<'s> {
current_state: Box<AppState<'s> + 's>,
}
impl<'s> StateMachine<'s> {
pub fn switch_state(&'s mut self, new_state: Box<AppState<'s> + 's>) -> Box<AppState<'s> + 's> {
mem::replace(&mut self.current_state, new_state);
}
}
A state machine always has a valid state. By default, it starts with a Box<NullState>, which is a state that does nothing. I have omitted NullState for brevity. By itself, this seems to compile fine.
The InGame state is designed to implement a basic gameplay scenario:
type TexCreator = TextureCreator<WindowContext>;
pub struct InGame<'tc> {
app: AppControl,
tex_creator: &'tc TexCreator,
tileset: Tileset<'tc>,
}
impl<'tc> InGame<'tc> {
pub fn new(app: AppControl, tex_creator: &'tc TexCreator) -> InGame<'tc> {
// ... load tileset ...
InGame {
app,
tex_creator,
tileset,
}
}
}
This game depends on Rust SDL2. This particular set of bindings requires that textures be created by a TextureCreator, and that the textures not outlive their creator. Texture requires a lifetime parameter to ensure this. Tileset holds a texture and therefore exports this requirement. This means that I cannot store a TextureCreator within the state itself (though I'd like to), since a mutably-borrowed InGame could have texture creator moved out. Therefore, the texture creator is owned in main, where a reference to it is passed to when we create our main state:
fn main() {
let app_control = // ...
let tex_creator = // ...
let in_game = Box::new(states::InGame::new(app_control, &tex_creator));
let state_machine = states::StateMachine::new();
state_machine.switch_state(in_game);
}
I feel this program should be valid, because I have ensured that tex_creator outlives any possible state, and that state machine is the least long-lived variable. However, I get the following error:
error[E0597]: `state_machine` does not live long enough
--> src\main.rs:46:1
|
39 | state_machine.switch_state( in_game );
| ------------- borrow occurs here
...
46 | }
| ^ `state_machine` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
This doesn't make sense to me, because state_machine is only borrowed by the method invocation, but the compiler is saying that it's still borrowed when the method is over. I wish it let me trace who the borrower in the error message--I don't understand why the borrow isn't returned when the method returns.
Essentially, I want the following:
That states be implemented by trait.
That states be owned by the state machine.
That states be able to contain references to arbitrary non-static data with lifetime greater than that of the state machine.
That when a state is swapped out, the old box be still valid so that it can be moved into the constructor of the new state. This will allow the new state to switch back to the preceding state without requiring it to be re-constructed.
That a state can signal a state change by returning a new state from 'update'. The old state must be able to construct this new state within itself.
Are these constraints possible to satisfy, and if so, how?
I apologize for the long-winded question and the likelihood that I've missed something obvious, as there are a number of decisions made in the implementation above where I'm not confident I understand the semantics of the lifetimes. I've tried to search for examples of this pattern online, but it seems a lot more complicated and constrained than the toy examples I've seen.
In StateMachine::switch_state, you don't want to use the 's lifetime on &mut self; 's represents the lifetime of resources borrowed by a state, not the lifetime of the state machine. Notice that by doing that, the type of self ends up with 's twice: the full type is &'s mut StateMachine<'s>; you only need to use 's on StateMachine, not on the reference.
In a mutable reference (&'a mut T), T is invariant, hence 's is invariant too. This means that the compiler considers that the state machine has the same lifetime as whatever it borrows. Therefore, after calling switch_state, the compiler considers that the state machine ends up borrowing itself.
In short, change &'s mut self to &mut self:
impl<'s> StateMachine<'s> {
pub fn switch_state(&mut self, new_state: Box<AppState<'s> + 's>) -> Box<AppState<'s> + 's> {
mem::replace(&mut self.current_state, new_state)
}
}
You also need to declare state_machine in main as mutable:
let mut state_machine = states::StateMachine::new();
Related
I am trying to write a simple ECS:
struct Ecs {
component_sets: HashMap<TypeId, Box<dyn Any>>,
}
impl Ecs {
pub fn read_all<Component>(&self) -> &SparseSet<Component> {
self.component_sets
.get(&TypeId::of::<Component>())
.unwrap()
.downcast_ref::<SparseSet<Component>>()
.unwrap()
}
pub fn write_all<Component>(&mut self) -> &mut SparseSet<Component> {
self.component_sets
.get_mut(&TypeId::of::<Component>())
.unwrap()
.downcast_mut::<SparseSet<Component>>()
.unwrap()
}
}
I am trying to get mutable access to a certain component while another is immutable. This testing code triggers the error:
fn testing() {
let all_pos = { ecs.write_all::<Pos>() };
let all_vel = { ecs.read_all::<Vel>() };
for (p, v) in all_pos.iter_mut().zip(all_vel.iter()) {
p.x += v.x;
p.y += v.y;
}
}
And the error
error[E0502]: cannot borrow `ecs` as immutable because it is also borrowed as mutable
--> src\ecs.rs:191:25
|
190 | let all_pos = { ecs.write_all::<Pos>() };
| --- mutable borrow occurs here
191 | let all_vel = { ecs.read_all::<Vel>() };
| ^^^ immutable borrow occurs here
My understanding of the borrow checker rules tells me that it's totally fine to get references to different component sets mutably or immutably (that is, &mut SparseSet<Pos> and &SparseSet<Vel>) since they are two different types. In order to get these references though, I need to go through the main ECS struct which owns the sets, which is where the compiler complains (i.e. first I use &mut Ecs when I call ecs.write_all and then &Ecs on ecs.read_all).
My first instinct was to enclose the statements in a scope, thinking it could just drop the &mut Ecs after I get the reference to the inner component set so as not to have both mutable and immutable Ecs references alive at the same time. This is probably very stupid, yet I don't fully understand how, so I wouldn't mind some more explaining there.
I suspect one additional level of indirection is needed (similar to RefCell's borrow and borrow_mut) but I am not sure what exactly I should wrap and how I should go about it.
Update
Solution 1: make the method signature of write_all take a &self despite returning a RefMut<'_, SparseSet<Component>> by wrapping the SparseSet in a RefCell (as illustrated in the answer below by Kevin Reid).
Solution 2: similar as above (method signature takes &self) but uses this piece of unsafe code:
fn write_all<Component>(&self) -> &mut SparseSet<Component> {
let set = self.component_sets
.get(&TypeId::of::<Component>())
.unwrap()
.downcast_ref::<SparseSet<Component>>()
.unwrap();
unsafe {
let set_ptr = set as *const SparseSet<Component>;
let set_ptr = set_ptr as *mut SparseSet<Component>;
&mut *set_ptr
}
}
What are benefits of using solution 1, is the implied runtime borrow-checking provided by RefCell an hindrance in this case or would it actually prove useful?
Would the use of unsafe be tolerable in this case? Are there benefits? (e.g. performance)
it's totally fine to get references to different component sets mutably or immutably
This is true: we can safely have multiple mutable, or mutable and immutable references, as long as no mutable reference points to the same data as any other reference.
However, not every means of obtaining those references will be accepted by the compiler's borrow checker. This doesn't mean they're unsound; just that we haven't convinced the compiler that they're safe. In particular, the only way the compiler understands to have simultaneous references is a struct's fields, because the compiler can know those are disjoint using a purely local analysis (looking only at the code of a single function):
struct Ecs {
pub pos: SparseSet<Pos>,
pub vel: SparseSet<Vel>,
}
for (p, v) in ecs.pos.iter_mut().zip(ecs.vel.iter()) {
p.x += v.x;
p.y += v.y;
}
This would compile, because the compiler can see that the references refer to different subsets of memory. It will not compile if you replace ecs.pos with a method ecs.pos() — let alone a HashMap. As soon as you get a function involved, information about field borrowing is hidden. Your function
pub fn write_all<Component>(&mut self) -> &mut SparseSet<Component>
has the elided lifetimes (lifetimes the compiler picks for you because every & must have a lifetime)
pub fn write_all<'a, Component>(&'a mut self) -> &'a mut SparseSet<Component>
which are the only information the compiler will use about what is borrowed. Hence, the 'a mutable reference to the SparseSet is borrowing all of the Ecs (as &'a mut self) and you can't have any other access to it.
The ways to arrange to be able to have multiple mutable references in a mostly-statically-checked way are discussed in the documentation page on Borrow Splitting. However, all of those are based on having some statically known property, which you don't. There's no way to express “this is okay as long as the Component type is not equal to another call's”. Therefore, to do this you do need RefCell, our general-purpose helper for runtime borrow checking.
Given what you've got already, the simplest thing to do is to replace SparseSet<Component> with RefCell<SparseSet<Component>>:
// no mut; changed return type
pub fn write_all<Component>(&self) -> RefMut<'_, SparseSet<Component>> {
self.component_sets
.get(&TypeId::of::<Component>())
.unwrap()
.downcast::<RefCell<SparseSet<Component>>>() // changed type
.unwrap()
.borrow_mut() // added this line
}
Note the changed return type, because borrowing a RefCell must return an explicit handle in order to track the duration of the borrow. However, a Ref or RefMut acts mostly like an & or &mut thanks to deref coercion. (Your code that inserts items in the map, which you didn't show in the question, will also need a RefCell::new.)
Another option is to put the interior mutability — likely via RefCell, but not necessarily — inside the SparseSet type, or create a wrapper type that does that. This might or might not help the code be cleaner.
I'm learning Rust's async/await feature, and stuck with the following task. I would like to:
Create an async closure (or better to say async block) at runtime;
Pass created closure to constructor of some struct and store it;
Execute created closure later.
Looking through similar questions I wrote the following code:
use tokio;
use std::pin::Pin;
use std::future::Future;
struct Services {
s1: Box<dyn FnOnce(&mut Vec<usize>) -> Pin<Box<dyn Future<Output = ()>>>>,
}
impl Services {
fn new(f: Box<dyn FnOnce(&mut Vec<usize>) -> Pin<Box<dyn Future<Output = ()>>>>) -> Self {
Services { s1: f }
}
}
enum NumberOperation {
AddOne,
MinusOne
}
#[tokio::main]
async fn main() {
let mut input = vec![1,2,3];
let op = NumberOperation::AddOne;
let s = Services::new(Box::new(|numbers: &mut Vec<usize>| Box::pin(async move {
for n in numbers {
match op {
NumberOperation::AddOne => *n = *n + 1,
NumberOperation::MinusOne => *n = *n - 1,
};
}
})));
(s.s1)(&mut input).await;
assert_eq!(input, vec![2,3,4]);
}
But above code won't compile, because of invalid lifetimes.
How to specify lifetimes to make above example compile (so Rust will know that async closure should live as long as input). As I understand in provided example Rust requires closure to have static lifetime?
Also it's not clear why do we have to use Pin<Box> as return type?
Is it possible somehow to refactor code and eliminate: Box::new(|arg: T| Box::pin(async move {}))? Maybe there is some crate?
Thanks
Update
There is similar question How can I store an async function in a struct and call it from a struct instance?
. Although that's a similar question and actually my example is based on one of the answers from that question. Second answer contains information about closures created at runtime, but seems it works only when I pass an owned variable, but in my example I would like to pass to closure created at runtime mutable reference, not owned variable.
How to specify lifetimes to make above example compile (so Rust will know that async closure should live as long as input). As I understand in provided example Rust requires closure to have static lifetime?
Let's take a closer look at what happens when you invoke the closure:
(s.s1)(&mut input).await;
// ^^^^^^^^^^^^^^^^^^
// closure invocation
The closure immediately returns a future. You could assign that future to a variable and hold on to it until later:
let future = (s.s1)(&mut input);
// do some other stuff
future.await;
The problem is, because the future is boxed, it could be held around for the rest of the program's life without ever being driven to completion; that is, it could have 'static lifetime. And input must obviously remain borrowed until the future resolves: else imagine, for example, what would happen if "some other stuff" above involved modifying, moving or even dropping input—consider what would then happen when the future is run?
One solution would be to pass ownership of the Vec into the closure and then return it again from the future:
let s = Services::new(Box::new(move |mut numbers| Box::pin(async move {
for n in &mut numbers {
match op {
NumberOperation::AddOne => *n = *n + 1,
NumberOperation::MinusOne => *n = *n - 1,
};
}
numbers
})));
let output = (s.s1)(input).await;
assert_eq!(output, vec![2,3,4]);
See it on the playground.
#kmdreko's answer shows how you can instead actually tie the lifetime of the borrow to that of the returned future.
Also it's not clear why do we have to use Pin as return type?
Let's look at a stupidly simple async block:
async {
let mut x = 123;
let r = &mut x;
some_async_fn().await;
*r += 1;
x
}
Notice that execution may pause at the await. When that happens, the incumbent values of x and r must be stored temporarily (in the Future object: it's just a struct, in this case with fields for x and r). But r is a reference to another field in the same struct! If the future were then moved from its current location to somewhere else in memory, r would still refer to the old location of x and not the new one. Undefined Behaviour. Bad bad bad.
You may have observed that the future can also hold references to things that are stored elsewhere, such as the &mut input in #kmdreko's answer; because they are borrowed, those also cannot be moved for the duration of the borrow. So why can't the immovability of the future similarly be enforced by r's borrowing of x, without pinning? Well, the future's lifetime would then depend on its content—and such circularities are impossible in Rust.
This, generally, is the problem with self-referential data structures. Rust's solution is to prevent them from being moved: that is, to "pin" them.
Is it possible somehow to refactor code and eliminate: Box::new(|arg: T| Box::pin(async move {}))? Maybe there is some crate?
In your specific example, the closure and future can reside on the stack and you can simply get rid of all the boxing and pinning (the borrow-checker can ensure stack items don’t move without explicit pinning). However, if you want to return the Services from a function, you'll run into difficulties stating its type parameters: impl Trait would normally be your go-to solution for this type of problem, but it's limited and does not (currently) extend to associated types, such as that of the returned future.
There are work-arounds, but using boxed trait objects is often the most practical solution—albeit it introduces heap allocations and an additional layer of indirection with commensurate runtime cost. Such trait objects are however unavoidable where a single instance of your Services structure may hold different closures in s1 over the course of its life, where you're returning them from trait methods (which currently can’t use impl Trait), or where you're interfacing with a library that does not provide any alternative.
If you want your example to work as is, the missing component is communicating to the compiler what lifetime associations are allowed. Trait objects like dyn Future<...> are constrained to be 'static by default, which means it cannot have references to non-static objects. This is a problem because your closure returns a Future that needs to keep a reference to numbers in order to work.
The direct fix is to annotate that the dyn FnOnce can return a Future that can be bound to the life of the first parameter. This requires a higher-ranked trait bound and the syntax looks like for<'a>:
struct Services {
s1: Box<dyn for<'a> FnOnce(&'a mut Vec<usize>) -> Pin<Box<dyn Future<Output = ()> + 'a>>>,
}
impl Services {
fn new(f: Box<dyn for<'a> FnOnce(&'a mut Vec<usize>) -> Pin<Box<dyn Future<Output = ()> + 'a>>>) -> Self {
Services { s1: f }
}
}
The rest of your code now compiles without modification, check it out on the playground.
As Rust reference documention said
Breaking the pointer aliasing rules. &mut T and &T follow LLVM’s scoped noalias model, except if the &T contains an UnsafeCell.
It's really ambiguous.
I want to know that what's the exactly moment an undefined behavior of &mut noalias occurred in Rust.
Is it any of below, or something else?
When defining two &mut that point to the same address?
When two &mut that point to the same address exposed to rust?
When perfrom any operation on a &mut that point to the same address of any other &mut?
For example, this code is observously UB:
unsafe {
let mut x = 123usize;
let a = (&mut x as *mut usize).as_mut().unwrap(); // created, but not accessed
let b = (&mut x as *mut usize).as_mut().unwrap(); // created, accessed
*b = 666;
drop(a);
}
But what if I modify the code like this:
struct S<'a> {
ref_x: &'a mut usize
}
fn main() {
let mut x = 123;
let s = S { ref_x: &mut x }; // like the `T` in `ManuallyDrop<T>`
let taken = unsafe { std::ptr::read(&s as *const S) }; // like `ManuallyDrop<T>::take`
// at thist ime, we have two `&mut x`
*(taken.ref_x) = 666;
drop(s);
// UB or not?
}
Is the second version also UB?
The second version is totally the same implemention to std::mem::ManuallyDrop. If the second version is UB, is it a security bug of std::mem::ManuallyDrop<T>?
What the aliasing restriction is not
It is actually common to have multiple existing &mut T aliasing the same item.
The simplest example is:
fn main() {
let mut i = 32;
let j = &mut i;
let k = &mut *j;
*k = 3;
println!("{}", i);
}
Note, though, that due to borrowing rules you cannot access the other aliases simultaneously.
If you look at the implementation of ManuallyDrop::take:
pub unsafe fn take(slot: &mut ManuallyDrop<T>) -> T {
ptr::read(&slot.value)
}
You will note that there are no simultaneously accessible &mut T: calling the function re-borrows ManuallyDrop making slot the only accessible mutable reference.
Why is aliasing in Rust so ill-defined
It's really ambiguous. I want to know that what's the exactly moment an undefined behavior of &mut noalias occurred in Rust.
Tough luck, because as specified in the Nomicon:
Unfortunately, Rust hasn't actually defined its aliasing model.
The reason is that the language team wants to make sure that the definition they reach is both safe (demonstrably so), practical, and yet does not close the door to possible refinements. It's a tall order.
The Rust Unsafe Code Guidelines Working Group is still working on establishing the exact boundaries, and in particular Ralf Jung is working on an operational model for aliasing called Stacked Borrows.
Note: the Stacked Borrows model is implemented in MIRI, and therefore you can validate your code against the Stacked Borrows model simply by executing your code in MIRI. Of course Stacked Borrows is still experimental, so this doesn't guarantee anything.
What caution recommends
I personally subscribe to caution. Seeing as the exact model is unspecified, the rules are ever changing and therefore I recommend taking the stricter interpretation possible.
Thus, I interpret the no-aliasing rule of &mut T as:
At any point in the code, there shall not be two accessible references in scope which alias the same memory if one of them is &mut T.
That is, I consider that forming a &mut T to an instance T for which another &T or &mut T is in scope without invalidating the aliases (via borrowing) is ill-formed.
It may very well be overly cautious, but at least if the aliasing model ends up being more conservative than planned, my code will still be valid.
I'm fairly new to Rust and I've been trying to get a simple game working with ggez.
Now, normally in C++ or C# I would structure my game around having an instance of a "master" class that takes care of settings, frame limiting, window creation, event handling, etc., and "state" classes that are held by the master class. I would pass a pointer to the master class when creating the state class (something like game_master*) so that the state class can access the resources of the master class.
In Rust I can't pass a &mut self because the state could potentially outlive the reference.
pub fn new(_ctx: &mut Context) -> GameMaster {
let game = GameMaster {
state: None,
pending_state: None
};
game.state = Some(Box::new(TestState::new_with_master(&mut game))) <----- NOPE
game
}
I think this could be solved with lifetimes but I haven't found a way to get lifetimes working with traits. Passing the master in as a reference on every function call doesn't work either because only one mutable reference can be held at a time.
fn update(&mut self, ctx: &mut Context) -> GameResult<()> {
if self.state.is_some() {
(*self.state.as_mut().unwrap()).update(ctx, &mut self) <----- NOPE
}
else {
Ok(())
}
}
Is there a good way to do this in Rust?
I have a similar problem a while ago but I think the only choices are:
use Rc<RefCell<>> everywhere. Remember to also use Weak or add a special destroy method if you ever need to garbage collect these objects since they hold reference to each other.
use raw pointers and unsafe code. The problem is that with unsafe deference, you also lose the mutability check (RefCell ensures only one mutable reference exists at a time) in addition to the lifetime check.
I ran into a problem that simplifies into the following:
struct MyIter {
vec: Vec<i8>,
}
fn fill_with_useful_data(v: &mut Vec<i8>) {
/* ... */
}
impl<'a> Iterator for MyIter {
type Item = &'a [i8];
fn next(&mut self) -> Option<&'a [i8]> {
fill_with_useful_data(&mut self.vec);
Some(&self.vec)
}
}
fn main() {
for slice in (MyIter { vec: Vec::new() }) {
println!("{}", slice);
}
}
This generates the error:
error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
--> src/main.rs:9:6
|
9 | impl<'a> Iterator for MyIter {
| ^^ unconstrained lifetime parameter
The idea is that the iterator does a bunch of work that reflects in its fields and at each step, it yields a reference into itself to the calling code. In this case I could model it as yielding a copy of the state instead of the reference, but let's pretend that's not possible or just inconveniently expensive.
Intuitively this shouldn't be a problem because the borrow checker can ensure that .next() isn't called again while the yielded reference can still be used to inspect the iterator's state, but the Iterator trait doesn't seem to provide for that sort of thing directly. Even with some permutations like only holding on to a reference to the vector in the iterator itself or making the iterator a reference or something to get the lifetimes baked into the type earlier on, I can't get anything past the borrow checker.
I read the "Iterators yielding mutable references" blogpost but I'm not sure if/how it applies to my problem that doesn't involve mutable references.
This is not possible. If it were allowed one could call next again and thus modify data that is also visible via & or even invalidate the reference entirely. This is because there is no connection between the self object itself and the returned reference: there is no explicit lifetime linking them.
For the compiler to reason about this and allow returning a reference into self next needs a signature like
fn next(&'a mut self) -> Option<&'a [i8]>
However, this differs from the signature of the trait which is not allowed as generic code that just takes an T: Iterator<...> cannot tell that there are different requirements on the use of the return value for some T; all have to be handled identically.
The Iterator trait is designed for return values that are independent of the iterator object, which is necessary for iterator adaptors like .collect to be correct and safe. This is more restrictive than necessary for many uses (e.g. a transient use inside a for loop) but it is just how it is at the moment. I don't think we have the tools for generalising this trait/the for loop properly now (specifically, I think we need associated types with higher rank lifetimes), but maybe in the future.