I ran into a problem that simplifies into the following:
struct MyIter {
vec: Vec<i8>,
}
fn fill_with_useful_data(v: &mut Vec<i8>) {
/* ... */
}
impl<'a> Iterator for MyIter {
type Item = &'a [i8];
fn next(&mut self) -> Option<&'a [i8]> {
fill_with_useful_data(&mut self.vec);
Some(&self.vec)
}
}
fn main() {
for slice in (MyIter { vec: Vec::new() }) {
println!("{}", slice);
}
}
This generates the error:
error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
--> src/main.rs:9:6
|
9 | impl<'a> Iterator for MyIter {
| ^^ unconstrained lifetime parameter
The idea is that the iterator does a bunch of work that reflects in its fields and at each step, it yields a reference into itself to the calling code. In this case I could model it as yielding a copy of the state instead of the reference, but let's pretend that's not possible or just inconveniently expensive.
Intuitively this shouldn't be a problem because the borrow checker can ensure that .next() isn't called again while the yielded reference can still be used to inspect the iterator's state, but the Iterator trait doesn't seem to provide for that sort of thing directly. Even with some permutations like only holding on to a reference to the vector in the iterator itself or making the iterator a reference or something to get the lifetimes baked into the type earlier on, I can't get anything past the borrow checker.
I read the "Iterators yielding mutable references" blogpost but I'm not sure if/how it applies to my problem that doesn't involve mutable references.
This is not possible. If it were allowed one could call next again and thus modify data that is also visible via & or even invalidate the reference entirely. This is because there is no connection between the self object itself and the returned reference: there is no explicit lifetime linking them.
For the compiler to reason about this and allow returning a reference into self next needs a signature like
fn next(&'a mut self) -> Option<&'a [i8]>
However, this differs from the signature of the trait which is not allowed as generic code that just takes an T: Iterator<...> cannot tell that there are different requirements on the use of the return value for some T; all have to be handled identically.
The Iterator trait is designed for return values that are independent of the iterator object, which is necessary for iterator adaptors like .collect to be correct and safe. This is more restrictive than necessary for many uses (e.g. a transient use inside a for loop) but it is just how it is at the moment. I don't think we have the tools for generalising this trait/the for loop properly now (specifically, I think we need associated types with higher rank lifetimes), but maybe in the future.
Related
I am trying to write a simple ECS:
struct Ecs {
component_sets: HashMap<TypeId, Box<dyn Any>>,
}
impl Ecs {
pub fn read_all<Component>(&self) -> &SparseSet<Component> {
self.component_sets
.get(&TypeId::of::<Component>())
.unwrap()
.downcast_ref::<SparseSet<Component>>()
.unwrap()
}
pub fn write_all<Component>(&mut self) -> &mut SparseSet<Component> {
self.component_sets
.get_mut(&TypeId::of::<Component>())
.unwrap()
.downcast_mut::<SparseSet<Component>>()
.unwrap()
}
}
I am trying to get mutable access to a certain component while another is immutable. This testing code triggers the error:
fn testing() {
let all_pos = { ecs.write_all::<Pos>() };
let all_vel = { ecs.read_all::<Vel>() };
for (p, v) in all_pos.iter_mut().zip(all_vel.iter()) {
p.x += v.x;
p.y += v.y;
}
}
And the error
error[E0502]: cannot borrow `ecs` as immutable because it is also borrowed as mutable
--> src\ecs.rs:191:25
|
190 | let all_pos = { ecs.write_all::<Pos>() };
| --- mutable borrow occurs here
191 | let all_vel = { ecs.read_all::<Vel>() };
| ^^^ immutable borrow occurs here
My understanding of the borrow checker rules tells me that it's totally fine to get references to different component sets mutably or immutably (that is, &mut SparseSet<Pos> and &SparseSet<Vel>) since they are two different types. In order to get these references though, I need to go through the main ECS struct which owns the sets, which is where the compiler complains (i.e. first I use &mut Ecs when I call ecs.write_all and then &Ecs on ecs.read_all).
My first instinct was to enclose the statements in a scope, thinking it could just drop the &mut Ecs after I get the reference to the inner component set so as not to have both mutable and immutable Ecs references alive at the same time. This is probably very stupid, yet I don't fully understand how, so I wouldn't mind some more explaining there.
I suspect one additional level of indirection is needed (similar to RefCell's borrow and borrow_mut) but I am not sure what exactly I should wrap and how I should go about it.
Update
Solution 1: make the method signature of write_all take a &self despite returning a RefMut<'_, SparseSet<Component>> by wrapping the SparseSet in a RefCell (as illustrated in the answer below by Kevin Reid).
Solution 2: similar as above (method signature takes &self) but uses this piece of unsafe code:
fn write_all<Component>(&self) -> &mut SparseSet<Component> {
let set = self.component_sets
.get(&TypeId::of::<Component>())
.unwrap()
.downcast_ref::<SparseSet<Component>>()
.unwrap();
unsafe {
let set_ptr = set as *const SparseSet<Component>;
let set_ptr = set_ptr as *mut SparseSet<Component>;
&mut *set_ptr
}
}
What are benefits of using solution 1, is the implied runtime borrow-checking provided by RefCell an hindrance in this case or would it actually prove useful?
Would the use of unsafe be tolerable in this case? Are there benefits? (e.g. performance)
it's totally fine to get references to different component sets mutably or immutably
This is true: we can safely have multiple mutable, or mutable and immutable references, as long as no mutable reference points to the same data as any other reference.
However, not every means of obtaining those references will be accepted by the compiler's borrow checker. This doesn't mean they're unsound; just that we haven't convinced the compiler that they're safe. In particular, the only way the compiler understands to have simultaneous references is a struct's fields, because the compiler can know those are disjoint using a purely local analysis (looking only at the code of a single function):
struct Ecs {
pub pos: SparseSet<Pos>,
pub vel: SparseSet<Vel>,
}
for (p, v) in ecs.pos.iter_mut().zip(ecs.vel.iter()) {
p.x += v.x;
p.y += v.y;
}
This would compile, because the compiler can see that the references refer to different subsets of memory. It will not compile if you replace ecs.pos with a method ecs.pos() — let alone a HashMap. As soon as you get a function involved, information about field borrowing is hidden. Your function
pub fn write_all<Component>(&mut self) -> &mut SparseSet<Component>
has the elided lifetimes (lifetimes the compiler picks for you because every & must have a lifetime)
pub fn write_all<'a, Component>(&'a mut self) -> &'a mut SparseSet<Component>
which are the only information the compiler will use about what is borrowed. Hence, the 'a mutable reference to the SparseSet is borrowing all of the Ecs (as &'a mut self) and you can't have any other access to it.
The ways to arrange to be able to have multiple mutable references in a mostly-statically-checked way are discussed in the documentation page on Borrow Splitting. However, all of those are based on having some statically known property, which you don't. There's no way to express “this is okay as long as the Component type is not equal to another call's”. Therefore, to do this you do need RefCell, our general-purpose helper for runtime borrow checking.
Given what you've got already, the simplest thing to do is to replace SparseSet<Component> with RefCell<SparseSet<Component>>:
// no mut; changed return type
pub fn write_all<Component>(&self) -> RefMut<'_, SparseSet<Component>> {
self.component_sets
.get(&TypeId::of::<Component>())
.unwrap()
.downcast::<RefCell<SparseSet<Component>>>() // changed type
.unwrap()
.borrow_mut() // added this line
}
Note the changed return type, because borrowing a RefCell must return an explicit handle in order to track the duration of the borrow. However, a Ref or RefMut acts mostly like an & or &mut thanks to deref coercion. (Your code that inserts items in the map, which you didn't show in the question, will also need a RefCell::new.)
Another option is to put the interior mutability — likely via RefCell, but not necessarily — inside the SparseSet type, or create a wrapper type that does that. This might or might not help the code be cleaner.
I've worked down a real-life example in a web app, which I've solved using unnecessary heap allocation, to the following example:
// Try replacing with (_: &String)
fn make_debug<T>(_: T) -> impl std::fmt::Debug {
42u8
}
fn test() -> impl std::fmt::Debug {
let value = "value".to_string();
// try removing the ampersand to get this to compile
make_debug(&value)
}
pub fn main() {
println!("{:?}", test());
}
As is, compiling this code gives me:
error[E0597]: `value` does not live long enough
--> src/main.rs:9:16
|
5 | fn test() -> impl std::fmt::Debug {
| -------------------- opaque type requires that `value` is borrowed for `'static`
...
9 | make_debug(&value)
| ^^^^^^ borrowed value does not live long enough
10 | }
| - `value` dropped here while still borrowed
I can fix this error in at least two ways:
Instead of passing in a reference to value in test(), pass in value itself
Instead of the parameter T, explicitly state the type of the argument for make_debug as &String or &str
My understanding of what's happening is that, when there is a parameter, the borrow checker is assuming that any lifetime on that parameter affects the output impl Debug value.
Is there a way to keep the code parameterized, continue passing in a reference, and get the borrow checker to accept it?
I think this is due to the rules around how impl trait opaque types capture lifetimes.
If there are lifetimes inside an argument T, then an impl trait has to incorporate them. Additional lifetimes in the type signature follow the normal rules.
For more information please see:
https://github.com/rust-lang/rust/issues/43396#issuecomment-349716967
https://github.com/rust-lang/rfcs/blob/master/text/1951-expand-impl-trait.md#lifetime-parameters
https://github.com/rust-lang/rfcs/blob/master/text/1951-expand-impl-trait.md#assumption-3-there-should-be-an-explicit-marker-when-a-lifetime-could-be-embedded-in-a-return-type
https://github.com/rust-lang/rfcs/blob/master/text/1951-expand-impl-trait.md#scoping-for-type-and-lifetime-parameters
A more complete answer
Original goal: the send_form function takes an input parameter of type &T which is rendered to a binary representation. That binary representation is owned by the resulting impl Future, and no remnant of the original &T remains. Therefore, the lifetime of &T need not outlive the impl Trait. All good.
The problem arises when T itself, additionally, contains references with lifetimes. If we were not using impl Trait, our signature would look something like this:
fn send_form<T>(self, data: &T) -> SendFormFuture;
And by looking at SendFormFuture, we can readily observe that there is no remnant of T in there at all. Therefore, even if T has lifetimes of its own to deal with, we know that all references are used within the body of send_form, and never used again afterward by SendFormFuture.
However, with impl Future as the output, we get no such guarantees. There's no way to know if the concrete implementation of Future in fact holds onto the T.
In the case where T has no references, this still isn't a problem. Either the impl Future references the T, and fully takes ownership of it, or it doesn't reference it, and no lifetime issues arise.
However, if T does have references, you could end up in a situation where the concrete impl Future is holding onto a reference stored in the T. Even though the impl Future has ownership of the T itself, it doesn't have ownership of the values referenced by the T.
This is why the borrow check must be conservative, and insist that any references inside T must have a 'static lifetime.
The only workaround I can see is to bypass impl Future and be explicit in the return type. Then, you can demonstrate to the borrow checker quite easily that the output type does not reference the input T type at all, and any references in it are irrelevant.
The original code in the actix web client for send_form looks like:
https://docs.rs/awc/0.2.1/src/awc/request.rs.html#503-522
pub fn send_form<T: Serialize>(
self,
value: &T,
) -> impl Future<
Item = ClientResponse<impl Stream<Item = Bytes, Error = PayloadError>>,
Error = SendRequestError,
> {
let body = match serde_urlencoded::to_string(value) {
Ok(body) => body,
Err(e) => return Either::A(err(Error::from(e).into())),
};
// set content-type
let slf = self.set_header_if_none(
header::CONTENT_TYPE,
"application/x-www-form-urlencoded",
);
Either::B(slf.send_body(Body::Bytes(Bytes::from(body))))
}
You may need to patch the library or write your own function that does the same thing but with a concrete type. If anyone else knows how to deal with this apparent limitation of impl trait I'd love to hear it.
Here's how far I've gotten on a rewrite of send_form in awc (the actix-web client library):
pub fn send_form_alt<T: Serialize>(
self,
value: &T,
// ) -> impl Future<
// Item = ClientResponse<impl Stream<Item = Bytes, Error = PayloadError>>,
// Error = SendRequestError,
) -> Either<
FutureResult<String, actix_http::error::Error>,
impl Future<
Item = crate::response::ClientResponse<impl futures::stream::Stream>,
Error = SendRequestError,
>,
> {
Some caveats so far:
Either::B is necessarily an opaque impl trait of Future.
The first param of FutureResult might actually be Void or whatever the Void equivalent in Rust is called.
I ran into a problem that simplifies into the following:
struct MyIter {
vec: Vec<i8>,
}
fn fill_with_useful_data(v: &mut Vec<i8>) {
/* ... */
}
impl<'a> Iterator for MyIter {
type Item = &'a [i8];
fn next(&mut self) -> Option<&'a [i8]> {
fill_with_useful_data(&mut self.vec);
Some(&self.vec)
}
}
fn main() {
for slice in (MyIter { vec: Vec::new() }) {
println!("{}", slice);
}
}
This generates the error:
error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
--> src/main.rs:9:6
|
9 | impl<'a> Iterator for MyIter {
| ^^ unconstrained lifetime parameter
The idea is that the iterator does a bunch of work that reflects in its fields and at each step, it yields a reference into itself to the calling code. In this case I could model it as yielding a copy of the state instead of the reference, but let's pretend that's not possible or just inconveniently expensive.
Intuitively this shouldn't be a problem because the borrow checker can ensure that .next() isn't called again while the yielded reference can still be used to inspect the iterator's state, but the Iterator trait doesn't seem to provide for that sort of thing directly. Even with some permutations like only holding on to a reference to the vector in the iterator itself or making the iterator a reference or something to get the lifetimes baked into the type earlier on, I can't get anything past the borrow checker.
I read the "Iterators yielding mutable references" blogpost but I'm not sure if/how it applies to my problem that doesn't involve mutable references.
This is not possible. If it were allowed one could call next again and thus modify data that is also visible via & or even invalidate the reference entirely. This is because there is no connection between the self object itself and the returned reference: there is no explicit lifetime linking them.
For the compiler to reason about this and allow returning a reference into self next needs a signature like
fn next(&'a mut self) -> Option<&'a [i8]>
However, this differs from the signature of the trait which is not allowed as generic code that just takes an T: Iterator<...> cannot tell that there are different requirements on the use of the return value for some T; all have to be handled identically.
The Iterator trait is designed for return values that are independent of the iterator object, which is necessary for iterator adaptors like .collect to be correct and safe. This is more restrictive than necessary for many uses (e.g. a transient use inside a for loop) but it is just how it is at the moment. I don't think we have the tools for generalising this trait/the for loop properly now (specifically, I think we need associated types with higher rank lifetimes), but maybe in the future.
I am trying to implement a system that would use borrow checking/lifetimes in order to provide safe custom indices on a collection. Consider the following code:
struct Graph(i32);
struct Edge<'a>(&'a Graph, i32);
impl Graph {
pub fn get_edge(&self) -> Edge {
Edge(&self, 0)
}
pub fn split(&mut self, Edge(_, edge_id): Edge) {
self.0 = self.0 + edge_id;
}
pub fn join(&mut self, Edge(_, edge0_id): Edge, Edge(_, edge1_id): Edge) {
self.0 = self.0 + edge0_id + edge1_id;
}
}
fn main() {
let mut graph = Graph(0);
let edge = graph.get_edge();
graph.split(edge)
}
References to the graph borrowed by the Edge struct should be dropped when methods such as split or join are called. This would fulfill the API invariant that all edge indices must be destroyed when the graph is mutated. However, the compiler doesn't get it. It fails with messages like
error[E0502]: cannot borrow `graph` as mutable because it is also borrowed as immutable
--> src/main.rs:23:5
|
22 | let edge = graph.get_edge();
| ----- immutable borrow occurs here
23 | graph.split(edge)
| ^^^^^ mutable borrow occurs here
24 | }
| - immutable borrow ends here
If I understand this correctly, the compiler fails to realise that the borrowing of the graph that happened in the edge struct is actually being released when the function is called. Is there a way to teach the compiler what I am trying to do here?
Bonus question: is there a way to do exactly the same but without actually borrowing the graph in the Edge struct? The edge struct is only used as a temporary for the purpose of traversal and will never be part of an external object state (I have 'weak' versions of the edge for that).
Addendum: After some digging around, it seems to be really far from trivial. First of all, Edge(_, edge_id) does not actually destructure the Edge, because _ does not get bound at all (yes, i32 is Copy which makes things even more complicated, but this is easily remedied by wrapping it into a non-Copy struct). Second, even if I completely destructure Edge (i.e. by doing it in a separate scope), the reference to the graph is still there, even though it should have been moved (this must be a bug). It only works if I perform the destructuring in a separate function. Now, I have an idea how to circumvent it (by having a separate object that describes a state change and destructures the indices as they are supplied), but this becomes very awkward very quickly.
You have a second problem that you didn’t mention: how does split know that the user didn’t pass an Edge from a different Graph? Fortunately, it’s possible to solve both problems with higher-rank trait bounds!
First, let’s have Edge carry a PhantomData marker instead of a real reference to the graph:
pub struct Edge<'a>(PhantomData<&'a mut &'a ()>, i32);
Second, let’s move all the Graph operations into a new GraphView object that gets consumed by operations that should invalidate the identifiers:
pub struct GraphView<'a> {
graph: &'a mut Graph,
marker: PhantomData<&'a mut &'a ()>,
}
impl<'a> GraphView<'a> {
pub fn get_edge(&self) -> Edge<'a> {
Edge(PhantomData, 0)
}
pub fn split(self, Edge(_, edge_id): Edge) {
self.graph.0 = self.graph.0 + edge_id;
}
pub fn join(self, Edge(_, edge0_id): Edge, Edge(_, edge1_id): Edge) {
self.graph.0 = self.graph.0 + edge0_id + edge1_id;
}
}
Now all we have to do is guard the construction of GraphView objects such that there’s never more than one with a given lifetime parameter 'a.
We can do this by (1) forcing GraphView<'a> to be invariant over 'a with a PhantomData member as above, and (2) only ever providing a constructed GraphView to a closure with a higher-rank trait bound that creates a fresh lifetime each time:
impl Graph {
pub fn with_view<Ret>(&mut self, f: impl for<'a> FnOnce(GraphView<'a>) -> Ret) -> Ret {
f(GraphView {
graph: self,
marker: PhantomData,
})
}
}
fn main() {
let mut graph = Graph(0);
graph.with_view(|view| {
let edge = view.get_edge();
view.split(edge);
});
}
Full demo on Rust Playground.
This isn’t totally ideal, since the caller may have to go through contortions to put all its operations inside the closure. But I think it’s the best we can do in the current Rust language, and it does allow us to enforce a huge class of compile-time guarantees that almost no other language can express at all. I’d love to see more ergonomic support for this pattern added to the language somehow—perhaps a way to create a fresh lifetime via a return value rather than a closure parameter (pub fn view(&mut self) -> exists<'a> GraphView<'a>)?
I ran into a problem that simplifies into the following:
struct MyIter {
vec: Vec<i8>,
}
fn fill_with_useful_data(v: &mut Vec<i8>) {
/* ... */
}
impl<'a> Iterator for MyIter {
type Item = &'a [i8];
fn next(&mut self) -> Option<&'a [i8]> {
fill_with_useful_data(&mut self.vec);
Some(&self.vec)
}
}
fn main() {
for slice in (MyIter { vec: Vec::new() }) {
println!("{}", slice);
}
}
This generates the error:
error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
--> src/main.rs:9:6
|
9 | impl<'a> Iterator for MyIter {
| ^^ unconstrained lifetime parameter
The idea is that the iterator does a bunch of work that reflects in its fields and at each step, it yields a reference into itself to the calling code. In this case I could model it as yielding a copy of the state instead of the reference, but let's pretend that's not possible or just inconveniently expensive.
Intuitively this shouldn't be a problem because the borrow checker can ensure that .next() isn't called again while the yielded reference can still be used to inspect the iterator's state, but the Iterator trait doesn't seem to provide for that sort of thing directly. Even with some permutations like only holding on to a reference to the vector in the iterator itself or making the iterator a reference or something to get the lifetimes baked into the type earlier on, I can't get anything past the borrow checker.
I read the "Iterators yielding mutable references" blogpost but I'm not sure if/how it applies to my problem that doesn't involve mutable references.
This is not possible. If it were allowed one could call next again and thus modify data that is also visible via & or even invalidate the reference entirely. This is because there is no connection between the self object itself and the returned reference: there is no explicit lifetime linking them.
For the compiler to reason about this and allow returning a reference into self next needs a signature like
fn next(&'a mut self) -> Option<&'a [i8]>
However, this differs from the signature of the trait which is not allowed as generic code that just takes an T: Iterator<...> cannot tell that there are different requirements on the use of the return value for some T; all have to be handled identically.
The Iterator trait is designed for return values that are independent of the iterator object, which is necessary for iterator adaptors like .collect to be correct and safe. This is more restrictive than necessary for many uses (e.g. a transient use inside a for loop) but it is just how it is at the moment. I don't think we have the tools for generalising this trait/the for loop properly now (specifically, I think we need associated types with higher rank lifetimes), but maybe in the future.