Extend lifetime of struct to exceed that of another struct - rust

I'm trying to make a lexer for HTML, which requires swapping between states. Because states have to store data, I decided to make each state a struct. The problem is dealing with the lifetime of said state, because it requires storing references to previous states.
Currently, I have the following system:
pub struct Lexer<'a> {
input: Peekable<Chars<'a>>,
state: Box<dyn LexerState>,
}
trait LexerState {
fn next_token(&mut self, lexer: &mut Lexer) -> Token;
}
pub struct DataState {}
impl LexerState for DataState {
fn next_token(&mut self, lexer: &mut Lexer) -> Token {
lexer.state = Box::new(CharacterReferenceState { return_state: &lexer.state });
lexer.state.next_token(lexer)
}
}
pub struct CharacterReferenceState<'a> {
return_state: &'a Box<dyn LexerState>,
}
impl<'a> LexerState for CharacterReferenceState<'a> {
fn next_token(&mut self, lexer: &mut Lexer) -> Token {}
}
(for the sake of brevity, I've excluded irrelevant code)
This gives the error: "lexer has an anonymous lifetime '_ but it needs to satisfy a 'static lifetime requirement"
From what I understand, this means I need some guarantee that lexer is valid for the whole time LexerState is valid, because if lexer is invalid, then the state is invalid. From my research, it seems like 'static lifetime would fix this problem, but that raises a few concerns:
Can I fix this issue without static/is there a better design pattern for this?
Will the static lifetime variable be properly invalidated and freed after the parsing is complete?

As pointed out in the comments, the compiler error was a red herring and the actual problem was the self-referential reference. To fix this, I just transferred ownership of the Box instead.
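A minimal sketch of what "transferring ownership of the Box" can look like. The original fix was not posted, so everything here is an assumption for illustration: the `self: Box<Self>` receiver, the `(Token, Box<dyn LexerState>)` return type, the `Token` variants, and the placeholder logic in `CharacterReferenceState`. Each state consumes itself and returns the state to use next, so `CharacterReferenceState` can own its return state instead of borrowing it from the lexer.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical token type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum Token {
    Character(char),
    Eof,
}

pub trait LexerState {
    // Consumes the boxed state; returns a token and the state to use next.
    fn next_token(self: Box<Self>, input: &mut Peekable<Chars<'_>>) -> (Token, Box<dyn LexerState>);
}

pub struct DataState;

pub struct CharacterReferenceState {
    // Owns the state to return to, so no lifetime parameter is needed.
    return_state: Box<dyn LexerState>,
}

impl LexerState for DataState {
    fn next_token(self: Box<Self>, input: &mut Peekable<Chars<'_>>) -> (Token, Box<dyn LexerState>) {
        match input.next() {
            Some('&') => {
                // Move `self` into the new state instead of referencing it.
                let next = Box::new(CharacterReferenceState { return_state: self });
                next.next_token(input)
            }
            Some(c) => {
                let same: Box<dyn LexerState> = self;
                (Token::Character(c), same)
            }
            None => {
                let same: Box<dyn LexerState> = self;
                (Token::Eof, same)
            }
        }
    }
}

impl LexerState for CharacterReferenceState {
    fn next_token(self: Box<Self>, _input: &mut Peekable<Chars<'_>>) -> (Token, Box<dyn LexerState>) {
        // Placeholder: a real lexer would decode the reference here.
        // Emit '&' unchanged and hand ownership of the return state back.
        let this = *self;
        (Token::Character('&'), this.return_state)
    }
}

pub struct Lexer<'a> {
    input: Peekable<Chars<'a>>,
    // `Option` lets the current state be moved out while it runs.
    state: Option<Box<dyn LexerState>>,
}

impl<'a> Lexer<'a> {
    pub fn new(src: &'a str) -> Self {
        Lexer { input: src.chars().peekable(), state: Some(Box::new(DataState)) }
    }

    pub fn next_token(&mut self) -> Token {
        let state = self.state.take().expect("state is restored after every call");
        let (token, next) = state.next_token(&mut self.input);
        self.state = Some(next);
        token
    }
}
```

Because the state only receives the input iterator rather than the whole `Lexer`, the lexer is never borrowed mutably twice, and no `'static` references are involved: each boxed state is freed as soon as it is replaced or the lexer is dropped.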

Related

Proper way in Rust to store a reference in a struct

What is the proper way to store a reference in a struct and operate on it given this example:
// Trait that cannot be changed
pub trait FooTrait {
// Note: trait items take no visibility qualifier; they are as public as the trait
fn open(&self, client: &SomeType);
fn close(&self);
}
pub struct Foo {
// HOW TO STORE IT HERE???
// client: &SomeType,
}
impl FooTrait for Foo {
fn open(&self, client: &SomeType) {
// HOW TO SAVE IT HERE?
// NOTE that &self cannot be changed into &mut self because the trait cannot be modified
// smth like self.client = client;
}
fn close(&self) {
// HOW TO DELETE IT HERE?
// NOTE that &self cannot be changed into &mut self because the trait cannot be modified
}
}
Is there a design pattern that could fit to my snippet?
This is horribly complicated on its surface because of lifetime issues. Rust is designed to guarantee memory safety, but this pattern creates an untenable situation where the caller of FooTrait::open() needs some way to tell Rust that the client borrow will outlive *self. If it can't do that, Rust will disallow the method call. Actually making this work with references is probably not feasible, as Foo needs a lifetime parameter, but the code that creates the Foo may not know the appropriate lifetime parameter.
You can make this pattern work by combining a few things, but only if you can modify the trait. If you can't change the definition of the trait, then what you are asking is impossible.
You need an Option so that close can clear the value.
You need interior mutability (a Cell) to allow mutating self.client even if self is a shared reference.
You need something other than a bare reference: an owned value, or a shared-ownership type like Rc or Arc. These types sidestep the lifetime issue entirely. You can make the code generic over Borrow<T> to support them all at once.
use std::cell::Cell;
use std::borrow::Borrow;
pub trait FooTrait {
fn open(&self, client: impl Borrow<SomeType> + 'static);
fn close(&self);
}
pub struct SomeType;
pub struct Foo {
client: Cell<Option<Box<dyn Borrow<SomeType>>>>,
}
impl FooTrait for Foo {
fn open(&self, client: impl Borrow<SomeType> + 'static) {
self.client.set(Some(Box::new(client)));
}
fn close(&self) {
self.client.set(None);
}
}
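A short usage sketch of the pattern above, assuming the caller shares the client through an `Rc`. The `demo` function is hypothetical and only there to make the ownership visible: the reference count goes up while `Foo` holds the client and drops back once `close` clears it.

```rust
use std::borrow::Borrow;
use std::cell::Cell;
use std::rc::Rc;

pub struct SomeType;

pub trait FooTrait {
    fn open(&self, client: impl Borrow<SomeType> + 'static);
    fn close(&self);
}

pub struct Foo {
    client: Cell<Option<Box<dyn Borrow<SomeType>>>>,
}

impl FooTrait for Foo {
    fn open(&self, client: impl Borrow<SomeType> + 'static) {
        self.client.set(Some(Box::new(client)));
    }
    fn close(&self) {
        self.client.set(None);
    }
}

// Hypothetical helper: returns the Rc strong count while open and after close.
pub fn demo() -> (usize, usize) {
    let shared = Rc::new(SomeType);
    let foo = Foo { client: Cell::new(None) };

    foo.open(Rc::clone(&shared)); // Foo now co-owns the client
    let while_open = Rc::strong_count(&shared);

    foo.close(); // drops Foo's handle
    let after_close = Rc::strong_count(&shared);

    (while_open, after_close)
}
```

Note that `Cell` only allows reading a non-`Copy` value by moving it out (for example with `take`), so if `Foo` needs to use the client between `open` and `close`, a `RefCell` may be more convenient.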

Understanding "the trait X cannot be made into an object" for `&mut Box<Self>` parameter

I've got this code snippet (playground):
struct TeddyBear {
fluffiness: u8,
}
trait Scruffy {
fn scruff_up(self: &mut Box<Self>) -> Box<dyn Scruffy>;
}
impl Scruffy for TeddyBear {
fn scruff_up(self: &mut Box<Self>) -> Box<dyn Scruffy> {
// do something about the TeddyBear's fluffiness
}
}
It doesn't compile. The error is:
the trait Scruffy cannot be made into an object
, along with the hint:
because method scruff_up's self parameter cannot be dispatched on.
I checked the "E0038" error description, but I haven't been able to figure out which category my error falls into.
I also read the "object-safety" entry in "The Rust Reference", and I believe this matches the "All associated functions must either be dispatchable from a trait object", but I'm not sure, partly because I'm not sure what "receiver" means in that context.
Can you please clarify for me what's the problem with this code and why it doesn't work?
The problem arises when you pass it in by reference: because the inner type may not be sized (e.g. a trait object, as if you passed in a Box<dyn Scruffy>), the compiler doesn't have enough information to figure out how to call methods on it. If you restrict the method to sized types (like your TeddyBear), it should compile:
trait Scruffy {
fn scruff_up(self: &mut Box<Self>) -> Box<dyn Scruffy> where Self: Sized;
}
A receiver is the self (&self, &mut self, self: &mut Box<Self> and so on).
Note that the list you cited from the reference includes both Box<Self> and &mut Self, but it does not include &mut Box<Self>, nor does it say that combinations of these types are allowed.
This is, indeed, forbidden. As for the why, it is a little more complex.
In order for a type to be a valid receiver, it needs to hold the following condition:
Given any type Self that implements Trait and the receiver type Receiver, the receiver type must implement DispatchFromDyn<R>, where R is the receiver type itself with all occurrences of Self replaced with dyn Trait.
For instance:
&self (has the type &Self) has to implement DispatchFromDyn<&dyn Trait>, which it does.
Box<Self> has to implement DispatchFromDyn<Box<dyn Trait>>, which it does.
But in order for &mut Box<Self> to be an object-safe receiver, it would need to implement DispatchFromDyn<&mut Box<dyn Trait>>. What you want is a kind of blanket implementation: DispatchFromDyn<&mut T> for &mut U where U: DispatchFromDyn<T>.
This impl will never exist, because it is unsound (even ignoring coherence problems).
As explained in the code in rustc that calculates this:
The only case where the receiver is not dispatchable, but is still a valid receiver type (just not object-safe), is when there is more than one level of pointer indirection. E.g., self: &&Self, self: &Rc<Self>, self: Box<Box<Self>>. In these cases, there is no way, or at least no inexpensive way, to coerce the receiver from the version where Self = dyn Trait to the version where Self = T, where T is the unknown erased type contained by the trait object, because the object that needs to be coerced is behind a pointer.
The problem is inherent to how Rust handles dyn Trait.
A pointer to dyn Trait (such as &dyn Trait or Box<dyn Trait>) is a fat pointer: it is two words in size. One word is a pointer to the data, and the other is a pointer to the vtable.
When you call a method on dyn Trait, the compiler looks it up in the vtable, finds the method for the concrete type (which is unknown at compilation time, but known at runtime), and calls it.
This all may be very abstract without an example:
trait Trait {
fn foo(&self);
}
impl Trait for () {
fn foo(&self) {}
}
fn call_foo(v: &dyn Trait) {
v.foo();
}
fn create_dyn_trait(v: &impl Trait) {
let v: &dyn Trait = v;
call_foo(v);
}
Conceptually, the compiler generates code like the following (pseudocode, not valid Rust):
trait Trait {
fn foo(&self);
}
impl Trait for () {
fn foo(&self) {}
}
struct TraitVTable {
foo: fn(*const ()),
}
static TRAIT_FOR_UNIT_VTABLE: TraitVTable = TraitVTable {
foo: unsafe { std::mem::transmute(<() as Trait>::foo) },
};
type DynTraitRef = (*const (), &'static TraitVTable);
impl Trait for dyn Trait {
fn foo(self: DynTraitRef) {
(self.1.foo)(self.0)
}
}
fn call_foo(v: DynTraitRef) {
v.foo();
}
fn create_dyn_trait(v: &impl Trait) {
let v: DynTraitRef = (v as *const (), &TRAIT_FOR_UNIT_VTABLE);
call_foo(v);
}
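The two-word fat pointer described above can be observed directly. This is a small sketch, not part of the original answer; the helper function names are made up, and the sizes reflect how current compilers lay out references rather than a written guarantee.

```rust
use std::mem::size_of;

trait Trait {
    fn foo(&self) -> u32;
}

impl Trait for () {
    fn foo(&self) -> u32 {
        7
    }
}

// Dynamic dispatch: the call goes through the vtable half of the fat pointer.
fn call_foo(v: &dyn Trait) -> u32 {
    v.foo()
}

// Size of a reference to a trait object, measured in machine words.
fn fat_pointer_words() -> usize {
    size_of::<&dyn Trait>() / size_of::<usize>()
}

// Size of a reference to a sized type, measured in machine words.
fn thin_pointer_words() -> usize {
    size_of::<&()>() / size_of::<usize>()
}
```

Coercing `&()` to `&dyn Trait` (as in `call_foo(&())`) only attaches the vtable pointer; the data pointer is unchanged.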
Now suppose that the pointer to the value is behind an indirection. I'll use Box<&Self> because it's simple and demonstrates the concept well, but the concept applies to &mut Box<Self> too: they have the same layout. How will we write foo() for impl Trait for dyn Trait?
trait Trait {
fn foo(self: Box<&Self>);
}
impl Trait for () {
fn foo(self: Box<&Self>) {}
}
struct TraitVTable {
foo: fn(Box<*const ()>),
}
static TRAIT_FOR_UNIT_VTABLE: TraitVTable = TraitVTable {
foo: unsafe { std::mem::transmute(<() as Trait>::foo) },
};
type DynTraitRef = (*const (), &'static TraitVTable);
impl Trait for dyn Trait {
fn foo(self: Box<DynTraitRef>) {
let concrete_foo: fn(Box<*const ()>) = self.1.foo;
let data: *const () = self.0;
concrete_foo(data) // We need to wrap `data` in `Box`! Error.
}
}
You may think "then the compiler should just insert a call to Box::new()!" But Box is not the only type involved (what about Rc, for example?), so some trait would be needed to abstract over this behavior, and besides, Rust never performs expensive work implicitly. This is a deliberate and important design choice (as opposed to e.g. C++, where an innocent-looking statement like auto v1 = v; can allocate and copy 10GB via a copy constructor). Converting a type to dyn Trait and back is done implicitly: the first by a coercion, the second when you call a method of the trait. Thus, the only thing Rust does is attach a vtable pointer in the first case and discard it in the second. Even allowing only references (&&&Self, where no method call is needed, just taking the address of a temporary) exceeds that, and it can have severe implications in unexpected places, e.g. register allocation.
So, what to do? You can take &mut self or self: Box<Self>. Which one to choose depends on whether you need ownership (use Box) or not (use a reference). In any case, &mut Box<Self> is not very useful: its only advantage over &mut Self is that you can replace the box itself and not just its contents, and doing that is usually a mistake.
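A sketch of both alternatives applied to the question's types. The method names `fluffiness` and `into_scruffier` and the "reduce fluffiness by one" logic are invented for illustration; the point is only that both receiver kinds can be called on a Box<dyn Scruffy>.

```rust
pub struct TeddyBear {
    pub fluffiness: u8,
}

pub trait Scruffy {
    fn fluffiness(&self) -> u8;

    // Borrowing receiver: mutate in place, no ownership needed.
    fn scruff_up(&mut self);

    // Owning receiver: consumes the box and may return a different object.
    fn into_scruffier(self: Box<Self>) -> Box<dyn Scruffy>;
}

impl Scruffy for TeddyBear {
    fn fluffiness(&self) -> u8 {
        self.fluffiness
    }

    fn scruff_up(&mut self) {
        // Placeholder logic for "do something about the fluffiness".
        self.fluffiness = self.fluffiness.saturating_sub(1);
    }

    fn into_scruffier(mut self: Box<Self>) -> Box<dyn Scruffy> {
        self.scruff_up();
        self // Box<TeddyBear> coerces to Box<dyn Scruffy>
    }
}

// Hypothetical driver showing both calls through a trait object.
pub fn demo() -> u8 {
    let mut bear: Box<dyn Scruffy> = Box::new(TeddyBear { fluffiness: 10 });
    bear.scruff_up();
    let bear = bear.into_scruffier();
    bear.fluffiness()
}
```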

How can I pass something string-like with the `Read` trait to an implementation of `From`?

I'm writing a tokenizer, and for convenience I wrote a Reader object, that returns words one at a time. When words is exhausted, it reads from the BufReader to populate the words. Accordingly, I figured that file and words should both live in the struct.
The problem I'm having is that I want to test it by passing in strings to be tokenized, rather than having to rely on files. That's why I tried to implement From on both a File and then &str and String. The latter two don't work (as highlighted below).
I tried to annotate Reader with a lifetime, that I then used in the implementation of From<&'a str>, but that didn't work. I ended up with a Reader<'a, T: Read>, but the compiler complained that nothing used the lifetime parameter.
An alternative implementation of From<&'static str> works fine, but that means any strings passed in have to exist for the static lifetime.
I also saw this question/answer, but it seems to be different since their Enum has a lifetime parameter.
I have two supplementary questions along with my overall question in the title:
I also saw FromStr, but haven't tried to use that yet - is it appropriate for this?
Are my code comments re variable ownership/lifetimes below correct?
My minimal example is here (with imports elided):
#[derive(Debug)]
struct Reader<T: Read> {
file: BufReader<T>,
words: Vec<String>,
}
impl From<File> for Reader<File> {
fn from(value: File) -> Self { // value moves into from
Reader::new(BufReader::new(value)) // value moves into BufReader(?)
}
}
// THE NEXT TWO DON'T WORK
impl From<&str> for Reader<&[u8]> {
fn from(value: &str) -> Self { // Compiler can't know how long the underlying data lives
Reader::new(BufReader::new(value.as_bytes())) // The data may not live as long as BufReader
}
}
impl From<String> for Reader<&[u8]> {
fn from(value: String) -> Self { // value moves into from
Reader::new(BufReader::new(value.as_bytes())) // value doesn't move into BufReader or Reader
} // value gets dropped
}
impl<T: Read> Reader<T> {
fn new(input: BufReader<T>) -> Self {
Self {
file: input,
words: vec![],
}
}
}
The &str one compiles with lifetime annotations (playground):
impl<'a> From<&'a str> for Reader<&'a [u8]> {
fn from(value: &'a str) -> Self {
Reader::new(BufReader::new(value.as_bytes()))
}
}
As discussed in the comments, you need to only annotate the reference, not try to incorporate lifetime annotations into the Reader itself.
Note that the same approach doesn't work for String because the signature of from moves it into the function, and the function cannot return bytes that belong to a local variable. You could implement it for &String, but then you can as well use &str.
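If an owning conversion from String is still wanted, one possible approach (not from the original answer) is to move the String into a std::io::Cursor, which implements Read for any inner type that is AsRef<[u8]>. The reader then owns its data, so no lifetime is involved. The `all_words` helper below is hypothetical and only exists to exercise the reader.

```rust
use std::io::{BufReader, Cursor, Read};

struct Reader<T: Read> {
    file: BufReader<T>,
    words: Vec<String>,
}

impl<T: Read> Reader<T> {
    fn new(input: BufReader<T>) -> Self {
        Self { file: input, words: vec![] }
    }

    // Hypothetical helper: read everything and split it into words.
    fn all_words(mut self) -> Vec<String> {
        let mut text = String::new();
        self.file
            .read_to_string(&mut text)
            .expect("in-memory input should be readable UTF-8");
        self.words = text.split_whitespace().map(String::from).collect();
        self.words
    }
}

// Borrowing conversion: the reader is tied to the lifetime of the &str.
impl<'a> From<&'a str> for Reader<&'a [u8]> {
    fn from(value: &'a str) -> Self {
        Reader::new(BufReader::new(value.as_bytes()))
    }
}

// Owning conversion: the String moves into the Cursor, and so into the Reader.
impl From<String> for Reader<Cursor<String>> {
    fn from(value: String) -> Self {
        Reader::new(BufReader::new(Cursor::new(value)))
    }
}
```

The trade-off is that the two conversions produce different Reader types (Reader<&[u8]> and Reader<Cursor<String>>), which matters only if code needs to name the concrete type.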

Making a struct outlive a parameter given to a method of that struct

I am looking for a way to ensure a struct outlives the parameter given to a method of that struct.
Even if the struct doesn't hold a reference to that data after leaving the method.
This is for wrapped raw pointers fed to an FFI. I want to guarantee that the struct implementing the FFI outlives the Option<&'a dyn Any> I use to feed the Rust object to the pointer wrapper.
Context is the FFI wrapper.
Data holds different types that map to FFI types. The FFI functions copy all these types immediately before returning.
Except raw pointers.
So I add a lifetime specifier to Context just for those and use that in send_data().
But somehow this is not enough. I expected below code to not compile.
Edit: someone on the Rust Discord suggested making &self mutable in send_data(). This has the desired effect, but my FFI is thread safe (and stateless) and send_data() is time critical, so I would very much like to avoid this.
use std::any::Any;
use std::marker::PhantomData;
struct IntegerArray<'a> {
data: &'a [i32],
}
struct WrappedRawPointer<'a> {
ptr: *const std::ffi::c_void,
_marker: PhantomData<&'a ()>,
}
impl<'a> WrappedRawPointer<'a> {
fn new(data: Option<&'a dyn Any>) -> Self {
Self {
ptr: data
.map(|p| p as *const _ as *const std::ffi::c_void)
.unwrap_or(std::ptr::null()),
_marker: PhantomData,
}
}
}
enum Data<'a, 'b> {
IntegerArray(IntegerArray<'a>),
WrappedRawPointer(WrappedRawPointer<'b>),
}
struct Context<'a> {
ctx: u32,
_marker: PhantomData<&'a ()>,
}
impl<'a> Context<'a> {
fn new() -> Self {
Self {
ctx: 0, // Call FFI to initialize context
_marker: PhantomData,
}
}
fn send_data(&self, data: Data<'_, 'a>) {
match data {
Data::IntegerArray(_i) => (), // Call FFI function
Data::WrappedRawPointer(_p) => (), // Call FFI function
}
}
}
fn main() {
let ctx = Context::new();
{
let some_float: f32 = 42.0;
ctx.send_data(
Data::WrappedRawPointer(
WrappedRawPointer::new(
Some(&some_float)
)
)
);
// I would like rustc to complain
// here that some_float does not
// outlive ctx
}
// Explicitly drop outside
// the previous block to
// prevent rustc from being
// clever
drop(ctx);
}
Making send_data take &mut self instead of &self works because it makes the type of the self parameter invariant with respect to the type Self. Subtyping and Variance is described in the Rustonomicon, as well as other questions here on Stack Overflow (see below).
Since you want invariance even when self is an immutable reference, that suggests that the variance of Context<'a> itself is wrong: it is covariant in 'a, but it should be invariant. You can fix this by changing the type argument to PhantomData to something that is also invariant in 'a:
struct Context<'a> {
ctx: u32,
_marker: PhantomData<*mut &'a ()>, // or Cell<&'a ()>, or fn(&'a ()) -> &'a (), etc.
}
PhantomData is not just something you add mechanically to make the compiler not yell at you. The specific form of the type argument to PhantomData tells the compiler how your struct is related to its type and lifetime parameters (when the compiler can't figure it out by itself). In this case you want to tell the compiler that a Context<'some_long_lifetime> can't be substituted for a Context<'a_much_shorter_lifetime> even though its fields would all allow that substitution.
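The difference between the two markers can be sketched in isolation. The struct and function names below are made up for illustration: a covariant marker lets a value with a long lifetime be used where a shorter one is expected, while the same function for the invariant marker is rejected by the compiler (it is left commented out for that reason).

```rust
use std::marker::PhantomData;

// Covariant in 'a: behaves like a plain &'a ().
struct Covariant<'a> {
    _marker: PhantomData<&'a ()>,
}

// Invariant in 'a: *mut T is invariant in T, and here T = &'a ().
struct Invariant<'a> {
    _marker: PhantomData<*mut &'a ()>,
}

// Compiles: covariance allows shrinking 'long down to 'short.
fn shorten<'long: 'short, 'short>(c: Covariant<'long>) -> Covariant<'short> {
    c
}

// Does not compile if uncommented: the lifetimes must match exactly.
// fn shorten_invariant<'long: 'short, 'short>(c: Invariant<'long>) -> Invariant<'short> {
//     c
// }

// Hypothetical check: shortening works, and the markers cost no space.
fn demo() -> bool {
    let long: Covariant<'static> = Covariant { _marker: PhantomData };
    let _short: Covariant<'_> = shorten(long);
    std::mem::size_of::<Invariant<'static>>() == 0
        && std::mem::size_of::<Covariant<'static>>() == 0
}
```

With the invariant marker on Context, the compiler can no longer quietly pick a shorter 'a for the send_data call, which is what allowed the question's main to compile.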
Some more questions on variance
How can this instance seemingly outlive its own parameter lifetime?
Why does linking lifetimes matter only with mutable references?
How do I share a struct containing a phantom pointer among threads? (may be relevant if Context should be Send or Sync)

Why "explicit lifetime bound required" for Box<T> in struct?

Editor's note: This code no longer produces the same error after RFC 599 was implemented, but the concepts discussed in the answers are still valid.
I'm trying to compile this code:
trait A {
fn f(&self);
}
struct S {
a: Box<A>,
}
and I'm getting this error:
a.rs:6:13: 6:14 error: explicit lifetime bound required
a.rs:6 a: Box<A>,
I want S.a to own an instance of A, and don't see how that lifetime is appropriate here. What do I need to do to make the compiler happy?
My Rust version:
rustc --version
rustc 0.12.0-pre-nightly (79a5448f4 2014-09-13 20:36:02 +0000)
(Slightly pedantic point: A is a trait, so S is not owning an instance of A; it is owning a boxed instance of some type that implements A.)
A trait object represents data with some unknown type, that is, the only thing known about the data is that it implements the trait A. Because the type is not known, the compiler cannot directly reason about the lifetime of the contained data, and so requires that this information is explicitly stated in the trait object type.
This is done via Trait+'lifetime. The easiest route is to just use 'static, that is, completely disallow storing data that can become invalid due to scopes:
a: Box<A + 'static>
Previously, (before the possibility of lifetime-bounded trait objects and this explicit lifetime bound required error message was introduced) all boxed trait objects were implicitly 'static, that is, this restricted form was the only choice.
The most flexible form is exposing the lifetime externally:
struct S<'x> {
a: Box<A + 'x>
}
This allows S to store a trait object of any type that implements A, possibly with some restrictions on the scopes in which the S is valid (i.e. for types for which 'x is less than 'static the S object will be trapped within some stack frame).
The problem here is that a trait can be implemented for references too, so if you don't specify the required lifetime for Box anything could be stored in there.
You can see about lifetime requirements in this rfc.
So one possible solution is to bound the trait object by Send, which implies a 'static lifetime here (we put an I in S):
trait A {
fn f(&self);
}
struct I;
impl A for I {
fn f(&self) {
println!("A for I")
}
}
struct S {
a: Box<A + Send>
}
fn main() {
let s = S {
a: box I
};
s.a.f();
}
The other is setting the lifetime to 'a (we can put either a reference &I or an I in S):
trait A {
fn f(&self);
}
struct I;
impl A for I {
fn f(&self) {
println!("A for I")
}
}
impl <'a> A for &'a I {
fn f(&self) {
println!("A for &I")
}
}
struct S<'a> {
a: Box<A + 'a>
}
fn main() {
let s = S {
a: box &I
};
s.a.f();
}
Note that this is more general: we can store both references and owned data (owned data of the Send kind has a lifetime of 'static), but you need a lifetime parameter everywhere the type is used.
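For readers on current Rust, here is a sketch of the same two options in today's syntax: trait objects are written with dyn, box expressions are replaced by Box::new, and a bare Box<dyn A> in a struct field defaults to Box<dyn A + 'static>. The struct names `Owned` and `Borrowing`, the `demo` function, and returning a String instead of printing are changes made here for illustration.

```rust
trait A {
    fn f(&self) -> String;
}

struct I;

impl A for I {
    fn f(&self) -> String {
        "A for I".to_string()
    }
}

impl<'a> A for &'a I {
    fn f(&self) -> String {
        "A for &I".to_string()
    }
}

// Equivalent to Box<dyn A + 'static>: only data without short-lived borrows fits.
struct Owned {
    a: Box<dyn A>,
}

// Explicit bound: may hold types that borrow for 'a, such as &'a I.
struct Borrowing<'a> {
    a: Box<dyn A + 'a>,
}

// Hypothetical driver: one owned object and one that borrows a local.
fn demo() -> (String, String) {
    let owned = Owned { a: Box::new(I) };

    let i = I;
    let borrowing = Borrowing { a: Box::new(&i) };

    (owned.a.f(), borrowing.a.f())
}
```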
