Vec of struct with lifetimes - rust

I'm trying to use RustyBuzz to do some text display. Shaping is done by a struct called Face<'a> which contains a reference to bytes from a font file. I would like to allow for loading fonts "on the fly", e.g. a user inputs a path to a font file, the file is loaded, a new Face<'a> is created and added to a container for fonts. How can I create such a font container, given the lifetime parameter in Face<'a>?
Naïvely, one would do something like this:
#[derive(Default)]
struct FontAtlas<'a> {
fonts : Vec<Face<'a>>
}
impl<'a> FontAtlas<'a> {
fn add_new_font(&mut self, bytes : &'a [u8]) {
self.fonts.push(Face::from_slice(bytes, 0).unwrap());
}
}
Then schematically, in a main function:
fn main() {
let mut atlas = FontAtlas::default();
loop {
let font_path = get_user_input();
let font_bytes = std::fs::read(font_path).unwrap();
atlas.add_new_font(&font_bytes);
}
}
This does not work because atlas outlives font_bytes. One could make font_bytes longer-lived like so:
fn main() {
let mut font_data : Vec<Vec<u8>> = Vec::new(); // holds all bytes from font files
let mut atlas = FontAtlas::default();
loop {
let font_path = get_user_input();
let font_bytes = std::fs::read(font_path).unwrap();
font_data.push(font_bytes); // mutable borrow invalidates all refs in atlas
atlas.add_new_font(font_data.last().unwrap()); // 'font_data.last()' will live longer than 'atlas', so we're good on this side
}
}
But with the lifetime restrictions, this violates borrowing rules: font_data must be borrowed immutably for the whole loop, which prevents the mutable borrow needed to push to font_data.
Is there any way I can achieve font loading "on the fly"? Or is this intrinsically "unsafe"?

This is a hack, but you can leak a Box to get a reference with 'static lifetime:
fn main() {
let mut atlas = FontAtlas::default();
loop {
let font_path = get_user_input();
let font_bytes = Box::leak(std::fs::read(font_path).unwrap().into_boxed_slice());
atlas.add_new_font(font_bytes);
}
}
This assumes that the byte data must exist for the entire duration of the program.
You can wrap up this basic idea to create a safe abstraction of a "registry of owned byte slices" that also cleans up its own allocations:
use std::cell::RefCell;
#[derive(Default)]
struct ByteRegistry(RefCell<Vec<*mut [u8]>>);
impl ByteRegistry {
pub fn new() -> Self {
Self::default()
}
// Note this can take self by shared reference due to the use of RefCell.
pub fn add(&self, bytes: impl Into<Box<[u8]>>) -> &[u8] {
let data = Box::into_raw(bytes.into());
self.0.borrow_mut().push(data);
// SAFETY: We own the data, and the reference is tied to our lifetime,
// so it will be released before we are dropped.
unsafe { &*data }
}
}
impl Drop for ByteRegistry {
fn drop(&mut self) {
for data in self.0.take().into_iter() {
// SAFETY: We obtained the pointers from Box::into_raw() and all
// borrows from add() have ended by now.
unsafe { drop(Box::from_raw(data)) }
}
}
}

Related

Writing to a file or String in Rust

TL;DR: I want to implement trait std::io::Write that outputs to a memory buffer, ideally String, for unit-testing purposes.
I must be missing something simple.
Similar to another question, Writing to a file or stdout in Rust, I am working on a code that can work with any std::io::Write implementation.
It operates on structure defined like this:
pub struct MyStructure {
writer: Box<dyn Write>,
}
Now, it's easy to create instance writing to either a file or stdout:
impl MyStructure {
pub fn use_stdout() -> Self {
let writer = Box::new(std::io::stdout());
MyStructure { writer }
}
pub fn use_file<P: AsRef<Path>>(path: P) -> Result<Self> {
let writer = Box::new(File::create(path)?);
Ok(MyStructure { writer })
}
pub fn printit(&mut self) -> Result<()> {
self.writer.write(b"hello")?;
Ok(())
}
}
But for unit testing, I also need to have a way to run the business logic (here represented by method printit()) and trap its output, so that its content can be checked in the test.
I cannot figure out how to implement this. This playground code shows how I would like to use it, but it does not compile because it breaks borrowing rules.
// invalid code - does not compile!
fn main() {
let mut buf = Vec::new(); // This buffer should receive output
let mut x2 = MyStructure { writer: Box::new(buf) };
x2.printit().unwrap();
// now, get the collected output
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
// here I want to analyze the output, for instance in unit-test asserts
println!("Output to string was {}", output);
}
Any idea how to write the code correctly? I.e., how to implement a writer on top of a memory structure (String, Vec, ...) that can be accessed afterwards?
Something like this does work:
let mut buf = Vec::new();
{
// Use the buffer by a mutable reference
//
// Also, we're doing it inside another scope
// to help the borrow checker
let mut x2 = MyStructure { writer: Box::new(&mut buf) };
x2.printit().unwrap();
}
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
println!("Output to string was {}", output);
However, in order for this to work, you need to modify your type and add a lifetime parameter:
pub struct MyStructure<'a> {
writer: Box<dyn Write + 'a>,
}
Note that in your case (where you omit the + 'a part) the compiler assumes that you use 'static as the lifetime of the trait object:
// Same as your original variant
pub struct MyStructure {
writer: Box<dyn Write + 'static>
}
This limits the set of types which could be used here, in particular, you cannot use any kinds of borrowed references. Therefore, for maximum genericity we have to be explicit here and define a lifetime parameter.
Also note that depending on your use case, you can use generics instead of trait objects:
pub struct MyStructure<W: Write> {
writer: W
}
In this case the types are fully visible at any point of your program, and therefore no additional lifetime annotation is needed.

Is it undefined behavior to do runtime borrow management with the help of raw pointers in Rust?

As part of binding a C API to Rust, I have a mutable reference ph: &mut Ph, a struct struct EnsureValidContext<'a> { ph: &'a mut Ph }, and some methods:
impl Ph {
pub fn print(&mut self, s: &str) {
/*...*/
}
pub fn with_context<F, R>(&mut self, ctx: &Context, f: F) -> Result<R, InvalidContextError>
where
F: Fn(EnsureValidContext) -> R,
{
/*...*/
}
/* some others */
}
impl<'a> EnsureValidContext<'a> {
pub fn print(&mut self, s: &str) {
self.ph.print(s)
}
pub fn close(self) {}
/* some others */
}
I don't control these. I can only use these.
Now, the closure API is nice if you want the compiler to force you to think about performance (and the tradeoffs you have to make between performance and the behaviour you want. Context validation is expensive). However, let's say you just don't care about that and want it to just work.
I was thinking of making a wrapper that handles it for you:
enum ValidPh<'a> {
Ph(&'a mut Ph),
Valid(*mut Ph, EnsureValidContext<'a>),
Poisoned,
}
impl<'a> ValidPh<'a> {
pub fn print(&mut self) {
/* whatever the case, just call .print() on the inner object */
}
pub fn set_context(&mut self, ctx: &Context) {
/*...*/
}
pub fn close(&mut self) {
/*...*/
}
/* some others */
}
This would work by, whenever necessary, checking if we're a Ph or a Valid, and if we're a Ph we'd upgrade to a Valid by going:
fn upgrade(&mut self) {
if let Ph(_) = self { // don't call mem::replace unless we need to
if let Ph(ph) = mem::replace(self, Poisoned) {
let ptr = ph as *mut _;
let evc = ph.with_context(ph.get_context(), |evc| evc);
*self = Valid(ptr, evc);
}
}
}
Downgrading is different for each method, as it has to call the target method, but here's an example close:
pub fn close(&mut self) {
if let Valid(_, _) = self {
/* ok */
} else {
self.upgrade()
}
if let Valid(ptr, evc) = mem::replace(self, Invalid) {
evc.close(); // consume the evc, dropping the borrow.
// we can now use our original borrow, but since we don't have it anymore, bring it back using our trusty ptr
*self = unsafe { Ph(&mut *ptr) };
} else {
// this can only happen due to a bug in our code
unreachable!();
}
}
You get to use a ValidPh like:
/* given a &mut vph */
vph.print("hello world!");
if vph.set_context(ctx) {
vph.print("closing existing context");
vph.close();
}
vph.print("opening new context");
vph.open("context_name");
vph.print("printing in new context");
Without vph, you'd have to juggle &mut Ph and EnsureValidContext around on your own. While the Rust compiler makes this trivial (just follow the errors), you may want to let the library handle it automatically for you. Otherwise you might end up just calling the very expensive with_context for every operation, regardless of whether the operation can invalidate the context or not.
Note that this code is rough pseudocode. I haven't compiled or tested it yet.
One might argue I need an UnsafeCell or a RefCell or some other Cell. However, from reading this it appears UnsafeCell is only a lang item because of interior mutability — it's only necessary if you're mutating state through an &T, while in this case I have &mut T all the way.
However, my reading may be flawed. Does this code invoke UB?
(Full code of Ph and EnsureValidContext, including FFI bits, available here.)
Taking a step back, the guarantees upheld by Rust are:
&T is a reference to T which is potentially aliased,
&mut T is a reference to T which is guaranteed not to be aliased.
The crux of the question therefore is: what does guaranteed not to be aliased means?
Let's consider a safe Rust sample:
struct Foo(u32);
impl Foo {
fn foo(&mut self) { self.bar(); }
fn bar(&mut self) { *self.0 += 1; }
}
fn main() { Foo(0).foo(); }
If we take a peek at the stack when Foo::bar is being executed, we'll see at least two pointers to Foo: one in bar and one in foo, and there may be further copies on the stack or in other registers.
So, clearly, there are aliases in existence. How come! It's guaranteed NOT to be aliased!
Take a deep breath: how many of those aliases can you access at the time?
Only 1. The guarantee of no aliasing is not spatial but temporal.
I would think, therefore, that at any point in time, if a &mut T is accessible, then no other reference to this instance must be accessible.
Having a raw pointer (*mut T) is perfectly fine, it requires unsafe to access; however forming a second reference may or may not be safe, even without using it, so I would avoid it.
Rust's memory model is not rigorously defined yet, so it's hard to say for sure, but I believe it's not undefined behavior to:
carry a *mut Ph around while a &'a mut Ph is also reachable from another path, so long as you don't dereference the *mut Ph, even just for reading, and don't convert it to a &Ph or &mut Ph, because mutable references grant exclusive access to the pointee.
cast the *mut Ph back to a &'a mut Ph once the other &'a mut Ph falls out of scope.

Why does this variable definition imply static lifetime?

I'm trying to execute a function on chunks of a vector and then send the result back using the message passing library.
However, I get a strange error about the lifetime of the vector that isn't even participating in the thread operations:
src/lib.rs:153:27: 154:25 error: borrowed value does not live long enough
src/lib.rs:153 let extended_segments = (segment_size..max_val)
error: src/lib.rs:154 .collect::<Vec<_>>()borrowed value does not live long enough
note: reference must be valid for the static lifetime...:153
let extended_segments = (segment_size..max_val)
src/lib.rs:153:3: 155:27: 154 .collect::<Vec<_>>()
note: but borrowed value is only valid for the statement at 153:2:
reference must be valid for the static lifetime...
src/lib.rs:
let extended_segments = (segment_size..max_val)
consider using a `let` binding to increase its lifetime
I tried moving around the iterator and adding lifetimes to different places, but I couldn't get the checker to pass and still stay on type.
The offending code is below, based on the concurrency chapter in the Rust book. (Complete code is at github.)
use std::sync::mpsc;
use std::thread;
fn sieve_segment(a: &[usize], b: &[usize]) -> Vec<usize> {
vec![]
}
fn eratosthenes_sieve(val: usize) -> Vec<usize> {
vec![]
}
pub fn segmented_sieve_parallel(max_val: usize, mut segment_size: usize) -> Vec<usize> {
if max_val <= ((2 as i64).pow(16) as usize) {
// early return if the highest value is small enough (empirical)
return eratosthenes_sieve(max_val);
}
if segment_size > ((max_val as f64).sqrt() as usize) {
segment_size = (max_val as f64).sqrt() as usize;
println!("Segment size is larger than √{}. Reducing to {} to keep resource use down.",
max_val,
segment_size);
}
let small_primes = eratosthenes_sieve((max_val as f64).sqrt() as usize);
let mut big_primes = small_primes.clone();
let (tx, rx): (mpsc::Sender<Vec<usize>>, mpsc::Receiver<Vec<usize>>) = mpsc::channel();
let extended_segments = (segment_size..max_val)
.collect::<Vec<_>>()
.chunks(segment_size);
for this_segment in extended_segments.clone() {
let small_primes = small_primes.clone();
let tx = tx.clone();
thread::spawn(move || {
let sieved_segment = sieve_segment(&small_primes, this_segment);
tx.send(sieved_segment).unwrap();
});
}
for _ in 1..extended_segments.count() {
big_primes.extend(&rx.recv().unwrap());
}
big_primes
}
fn main() {}
How do I understand and avoid this error? I'm not sure how to make the lifetime of the thread closure static as in this question and still have the function be reusable (i.e., not main()). I'm not sure how to "consume all things that come into [the closure]" as mentioned in this question. And I'm not sure where to insert .map(|s| s.into()) to ensure that all references become moves, nor am I sure I want to.
When trying to reproduce a problem, I'd encourage you to create a MCVE by removing all irrelevant code. In this case, something like this seems to produce the same error:
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
let foo = (segment_size..max_val)
.collect::<Vec<_>>()
.chunks(segment_size);
}
fn main() {}
Let's break that down:
Create an iterator between numbers.
Collect all of them into a Vec<usize>.
Return an iterator that contains references to the vector.
Since the vector isn't bound to any variable, it's dropped at the end of the statement. This would leave the iterator pointing to an invalid region of memory, so that's disallowed.
Check out the definition of slice::chunks:
fn chunks(&self, size: usize) -> Chunks<T>
pub struct Chunks<'a, T> where T: 'a {
// some fields omitted
}
The lifetime marker 'a lets you know that the iterator contains a reference to something. Lifetime elision has removed the 'a from the function, which looks like this, expanded:
fn chunks<'a>(&'a self, size: usize) -> Chunks<'a, T>
Check out this line of the error message:
help: consider using a let binding to increase its lifetime
You can follow that as such:
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
let foo = (segment_size..max_val)
.collect::<Vec<_>>();
let bar = foo.chunks(segment_size);
}
fn main() {}
Although I'd write it as
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
let foo: Vec<_> = (segment_size..max_val).collect();
let bar = foo.chunks(segment_size);
}
fn main() {}
Re-inserting this code back into your original problem won't solve the problem, but it will be much easier to understand. That's because you are attempting to pass a reference to thread::spawn, which may outlive the current thread. Thus, everything passed to thread::spawn must have the 'static lifetime. There are tons of questions that detail why that must be prevented and a litany of solutions, including scoped threads and cloning the vector.
Cloning the vector is the easiest, but potentially inefficient:
for this_segment in extended_segments.clone() {
let this_segment = this_segment.to_vec();
// ...
}

Why does the Rust borrow checker ignore drop()? [duplicate]

I am writing a program that writes to a file and rotates the file it's writing to every now and then. When I check to rotate the file, I can't seem to change the file since it is borrowed by my struct. Even if I drop the instance of the struct, I can't seem to regain ownership of the file to rename it.
Here is my example:
use std::fs::File;
use std::io::{Write};
use std::mem::{drop};
pub struct FileStruct<W: Write> {
pub writer: Option<W>,
}
impl <W: Write> FileStruct<W> {
pub fn new(writer: W) -> FileStruct<W> {
FileStruct {
writer: Some(writer),
}
}
}
fn main() {
let mut file = File::create("tmp.txt").unwrap();
let mut tmp = FileStruct::new(&mut file);
loop {
if true { //will be time based if check
drop(tmp);
drop(file);
file = File::create("tmp2.txt").unwrap();
tmp = FileStruct::new(&mut file);
}
// write to file
}
}
I know I can get this to work by moving the file creation into the new function call of FileStruct instead of having an intermediate variable, file, but I would like to know why this method where I forcibly drop all the variables where all the variables references should be returned doesn't work.
As the std::mem::drop documentation says,
While this does call the argument's implementation of Drop, it will not release any borrows, as borrows are based on lexical scope.
So even if you call drop, file will remain borrowed nonetheless.
Dropping tmp does not "release the borrow" of file because borrowing is lexically scoped. It's "active" as long as the program execution is within the lexical scope that contains tmp even if you drop it. What you intended to do might be possible in the future if/once "non-lexical scopes" are supported. Until then, you can make it work with RefCell:
use std::cell::RefCell;
use std::io::{ self, Write };
/// wraps a reference to a RefCell<W>
struct RefCellWriteRef<'a, W: 'a>(&'a RefCell<W>);
/// implement Write for when W is Write
impl<'a, W: Write + 'a> Write for RefCellWriteRef<'a, W> {
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
let mut w = self.0.borrow_mut();
w.write(buf)
}
fn flush(&mut self) -> io::Result<()> {
let mut w = self.0.borrow_mut();
w.flush()
}
}
fn main() {
let file: RefCell<Vec<u8>> = RefCell::new(Vec::new());
// use RefCellWriteRef(&file) instead of &mut file
let mut tmp = RefCellWriteRef(&file);
for iter in 0..10 {
if iter == 5 {
drop(tmp);
file.borrow_mut().clear(); // like opening a new file
tmp = RefCellWriteRef(&file);
}
tmp.write(b"foo").unwrap();
}
drop(tmp);
println!("{}", file.borrow().len()); // should print 15
}
The trick here is that given a shared reference to a RefCell<T> you can (eventually) get a &mut T via borrow_mut(). The compile-time borrow checker is pleased because we only use a shared reference on the surface and it's OK to share file like that. Mutable aliasing is avoided by checking at runtime whether the internal T has already been mutably borrowed.

"cannot move out of variable because it is borrowed" when rotating variables

I am writing a program that writes to a file and rotates the file it's writing to every now and then. When I check to rotate the file, I can't seem to change the file since it is borrowed by my struct. Even if I drop the instance of the struct, I can't seem to regain ownership of the file to rename it.
Here is my example:
use std::fs::File;
use std::io::{Write};
use std::mem::{drop};
pub struct FileStruct<W: Write> {
pub writer: Option<W>,
}
impl <W: Write> FileStruct<W> {
pub fn new(writer: W) -> FileStruct<W> {
FileStruct {
writer: Some(writer),
}
}
}
fn main() {
let mut file = File::create("tmp.txt").unwrap();
let mut tmp = FileStruct::new(&mut file);
loop {
if true { //will be time based if check
drop(tmp);
drop(file);
file = File::create("tmp2.txt").unwrap();
tmp = FileStruct::new(&mut file);
}
// write to file
}
}
I know I can get this to work by moving the file creation into the new function call of FileStruct instead of having an intermediate variable, file, but I would like to know why this method where I forcibly drop all the variables where all the variables references should be returned doesn't work.
As the std::mem::drop documentation says,
While this does call the argument's implementation of Drop, it will not release any borrows, as borrows are based on lexical scope.
So even if you call drop, file will remain borrowed nonetheless.
Dropping tmp does not "release the borrow" of file because borrowing is lexically scoped. It's "active" as long as the program execution is within the lexical scope that contains tmp even if you drop it. What you intended to do might be possible in the future if/once "non-lexical scopes" are supported. Until then, you can make it work with RefCell:
use std::cell::RefCell;
use std::io::{ self, Write };
/// wraps a reference to a RefCell<W>
struct RefCellWriteRef<'a, W: 'a>(&'a RefCell<W>);
/// implement Write for when W is Write
impl<'a, W: Write + 'a> Write for RefCellWriteRef<'a, W> {
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
let mut w = self.0.borrow_mut();
w.write(buf)
}
fn flush(&mut self) -> io::Result<()> {
let mut w = self.0.borrow_mut();
w.flush()
}
}
fn main() {
let file: RefCell<Vec<u8>> = RefCell::new(Vec::new());
// use RefCellWriteRef(&file) instead of &mut file
let mut tmp = RefCellWriteRef(&file);
for iter in 0..10 {
if iter == 5 {
drop(tmp);
file.borrow_mut().clear(); // like opening a new file
tmp = RefCellWriteRef(&file);
}
tmp.write(b"foo").unwrap();
}
drop(tmp);
println!("{}", file.borrow().len()); // should print 15
}
The trick here is that given a shared reference to a RefCell<T> you can (eventually) get a &mut T via borrow_mut(). The compile-time borrow checker is pleased because we only use a shared reference on the surface and it's OK to share file like that. Mutable aliasing is avoided by checking at runtime whether the internal T has already been mutably borrowed.

Resources