Why does the Rust borrow checker ignore drop()? [duplicate]

I am writing a program that writes to a file and rotates the file it's writing to every now and then. When I check to rotate the file, I can't seem to change the file since it is borrowed by my struct. Even if I drop the instance of the struct, I can't seem to regain ownership of the file to rename it.
Here is my example:
use std::fs::File;
use std::io::{Write};
use std::mem::{drop};
pub struct FileStruct<W: Write> {
    pub writer: Option<W>,
}
impl<W: Write> FileStruct<W> {
    pub fn new(writer: W) -> FileStruct<W> {
        FileStruct {
            writer: Some(writer),
        }
    }
}
fn main() {
    let mut file = File::create("tmp.txt").unwrap();
    let mut tmp = FileStruct::new(&mut file);
    loop {
        if true { // will be a time-based check
            drop(tmp);
            drop(file);
            file = File::create("tmp2.txt").unwrap();
            tmp = FileStruct::new(&mut file);
        }
        // write to file
    }
}
I know I can get this to work by moving the file creation into the call to FileStruct::new instead of having an intermediate variable, file, but I would like to know why this approach, where I explicitly drop every variable so that all borrows should be released, doesn't work.

As the std::mem::drop documentation says,
While this does call the argument's implementation of Drop, it will not release any borrows, as borrows are based on lexical scope.
So even if you call drop, file will remain borrowed nonetheless.
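For reference, the workaround mentioned in the question (letting the struct own its writer instead of borrowing it) can be sketched roughly as follows. This is only an illustration, not the asker's code: `rotate` and `write_line` are hypothetical helpers, and a `Vec<u8>` stands in for `File` so the example runs without touching the disk.

```rust
use std::io::{self, Write};
use std::mem;

/// Owns its writer, so no borrow of an outer variable has to stay alive.
pub struct FileStruct<W: Write> {
    writer: W,
}

impl<W: Write> FileStruct<W> {
    pub fn new(writer: W) -> Self {
        FileStruct { writer }
    }

    /// Hypothetical helper: flush the current writer, install `next`,
    /// and hand the old writer back to the caller.
    pub fn rotate(&mut self, next: W) -> io::Result<W> {
        self.writer.flush()?;
        Ok(mem::replace(&mut self.writer, next))
    }

    /// Hypothetical helper: write one line to the current writer.
    pub fn write_line(&mut self, line: &str) -> io::Result<()> {
        writeln!(self.writer, "{}", line)
    }
}

/// Writes two lines, rotates, writes one more line.
/// Returns the contents of the old and the current writer.
fn demo() -> (Vec<u8>, Vec<u8>) {
    let mut log = FileStruct::new(Vec::new());
    log.write_line("a").unwrap();
    log.write_line("b").unwrap();
    let old = log.rotate(Vec::new()).unwrap();
    log.write_line("c").unwrap();
    (old, log.writer)
}

fn main() {
    let (old, current) = demo();
    assert_eq!(old, b"a\nb\n".to_vec());
    assert_eq!(current, b"c\n".to_vec());
}
```

With a real `File`, the old handle returned by `rotate` could be dropped or renamed freely, since nothing else borrows it.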

Dropping tmp does not "release the borrow" of file because, when this was written, borrowing was lexically scoped: the borrow stays "active" as long as program execution is within the lexical scope that contains tmp, even after you drop it. Non-lexical lifetimes (NLL), which were stabilized with the Rust 2018 edition, relax this rule, so what you intended should be accepted by current compilers. On a compiler without NLL, you can make it work with RefCell:
use std::cell::RefCell;
use std::io::{self, Write};
/// wraps a reference to a RefCell<W>
struct RefCellWriteRef<'a, W: 'a>(&'a RefCell<W>);
/// implement Write for when W is Write
impl<'a, W: Write + 'a> Write for RefCellWriteRef<'a, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let mut w = self.0.borrow_mut();
        w.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        let mut w = self.0.borrow_mut();
        w.flush()
    }
}
fn main() {
    let file: RefCell<Vec<u8>> = RefCell::new(Vec::new());
    // use RefCellWriteRef(&file) instead of &mut file
    let mut tmp = RefCellWriteRef(&file);
    for iter in 0..10 {
        if iter == 5 {
            drop(tmp);
            file.borrow_mut().clear(); // like opening a new file
            tmp = RefCellWriteRef(&file);
        }
        tmp.write(b"foo").unwrap();
    }
    drop(tmp);
    println!("{}", file.borrow().len()); // should print 15
}
The trick here is that given a shared reference to a RefCell<T> you can (eventually) get a &mut T via borrow_mut(). The compile-time borrow checker is pleased because we only use a shared reference on the surface and it's OK to share file like that. Mutable aliasing is avoided by checking at runtime whether the internal T has already been mutably borrowed.
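The runtime check described above can be observed directly. A minimal sketch, using `try_borrow_mut` (which returns an error instead of panicking like `borrow_mut` would); the function name `runtime_check` is made up for this example:

```rust
use std::cell::RefCell;

/// Returns two flags: whether a second mutable borrow was refused while the
/// first one was still alive, and whether it was allowed after the first
/// borrow had been dropped.
fn runtime_check() -> (bool, bool) {
    let cell = RefCell::new(Vec::<u8>::new());

    let first = cell.borrow_mut();
    // The RefCell tracks that `first` is still alive and refuses this one.
    let refused = cell.try_borrow_mut().is_err();

    drop(first);
    // Dropping the guard releases the borrow at runtime.
    let allowed = cell.try_borrow_mut().is_ok();

    (refused, allowed)
}

fn main() {
    assert_eq!(runtime_check(), (true, true));
}
```

Unlike a plain `&mut` reference, the borrow is tied to the guard value returned by `borrow_mut`, so dropping that guard really does end it.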

Related

Vec of struct with lifetimes

I'm trying to use RustyBuzz to do some text display. Shaping is done by a struct called Face<'a> which contains a reference to bytes from a font file. I would like to allow for loading fonts "on the fly", e.g. a user inputs a path to a font file, the file is loaded, a new Face<'a> is created and added to a container for fonts. How can I create such a font container, given the lifetime parameter in Face<'a>?
Naïvely, one would do something like this:
#[derive(Default)]
struct FontAtlas<'a> {
    fonts: Vec<Face<'a>>,
}
impl<'a> FontAtlas<'a> {
    fn add_new_font(&mut self, bytes: &'a [u8]) {
        self.fonts.push(Face::from_slice(bytes, 0).unwrap());
    }
}
Then schematically, in a main function:
fn main() {
    let mut atlas = FontAtlas::default();
    loop {
        let font_path = get_user_input();
        let font_bytes = std::fs::read(font_path).unwrap();
        atlas.add_new_font(&font_bytes);
    }
}
This does not work because atlas outlives font_bytes. One could make font_bytes longer-lived like so:
fn main() {
    let mut font_data: Vec<Vec<u8>> = Vec::new(); // holds all bytes from font files
    let mut atlas = FontAtlas::default();
    loop {
        let font_path = get_user_input();
        let font_bytes = std::fs::read(font_path).unwrap();
        font_data.push(font_bytes); // mutable borrow invalidates all refs in atlas
        atlas.add_new_font(font_data.last().unwrap()); // 'font_data.last()' will live longer than 'atlas', so we're good on this side
    }
}
But with the lifetime restrictions, this violates borrowing rules: font_data must be borrowed immutably for the whole loop, which prevents the mutable borrow needed to push to font_data.
Is there any way I can achieve font loading "on the fly"? Or is this intrinsically "unsafe"?
This is a hack, but you can leak a Box to get a reference with 'static lifetime:
fn main() {
    let mut atlas = FontAtlas::default();
    loop {
        let font_path = get_user_input();
        let font_bytes = Box::leak(std::fs::read(font_path).unwrap().into_boxed_slice());
        atlas.add_new_font(font_bytes);
    }
}
This assumes that the byte data must exist for the entire duration of the program.
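To see the leak trick in isolation, here is a small sketch without any font library. The helper names `leak_bytes` and `collect_leaked` are invented for this illustration; the only real API involved is `Box::leak`.

```rust
/// Turns an owned buffer into a slice that lives for the rest of the program.
/// The allocation is never freed, which is exactly the trade-off of this hack.
fn leak_bytes(bytes: Vec<u8>) -> &'static [u8] {
    Box::leak(bytes.into_boxed_slice())
}

/// Leaks every buffer and collects the resulting `'static` slices.
fn collect_leaked(inputs: Vec<Vec<u8>>) -> Vec<&'static [u8]> {
    inputs.into_iter().map(leak_bytes).collect()
}

fn main() {
    let stored = collect_leaked(vec![vec![1, 2, 3], vec![4, 5]]);
    // The slices stay valid even though the original Vecs are gone.
    assert_eq!(stored[0], &[1u8, 2, 3][..]);
    assert_eq!(stored[1], &[4u8, 5][..]);
}
```

Because the slices are `'static`, they can be stored in any container without a lifetime parameter getting in the way.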
You can wrap up this basic idea to create a safe abstraction of a "registry of owned byte slices" that also cleans up its own allocations:
use std::cell::RefCell;
#[derive(Default)]
struct ByteRegistry(RefCell<Vec<*mut [u8]>>);
impl ByteRegistry {
    pub fn new() -> Self {
        Self::default()
    }
    // Note this can take self by shared reference due to the use of RefCell.
    pub fn add(&self, bytes: impl Into<Box<[u8]>>) -> &[u8] {
        let data = Box::into_raw(bytes.into());
        self.0.borrow_mut().push(data);
        // SAFETY: We own the data, and the reference is tied to our lifetime,
        // so it will be released before we are dropped.
        unsafe { &*data }
    }
}
impl Drop for ByteRegistry {
    fn drop(&mut self) {
        for data in self.0.take().into_iter() {
            // SAFETY: We obtained the pointers from Box::into_raw() and all
            // borrows from add() have ended by now.
            unsafe { drop(Box::from_raw(data)) }
        }
    }
}
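A possible way to use the registry together with the atlas from the question is sketched below. To keep it runnable without rustybuzz, `FakeFace` is a hypothetical stand-in for `Face<'a>` that merely borrows the bytes, and `load_fonts` is an invented driver function; the registry itself is copied from above.

```rust
use std::cell::RefCell;

#[derive(Default)]
struct ByteRegistry(RefCell<Vec<*mut [u8]>>);

impl ByteRegistry {
    fn add(&self, bytes: impl Into<Box<[u8]>>) -> &[u8] {
        let data = Box::into_raw(bytes.into());
        self.0.borrow_mut().push(data);
        // SAFETY: the allocation is only freed in `drop`, which cannot run
        // while a reference borrowed from `&self` is still alive.
        unsafe { &*data }
    }
}

impl Drop for ByteRegistry {
    fn drop(&mut self) {
        for data in self.0.take() {
            // SAFETY: every pointer came from `Box::into_raw` in `add`.
            unsafe { drop(Box::from_raw(data)) }
        }
    }
}

/// Hypothetical stand-in for `Face<'a>`: it only borrows the font bytes.
struct FakeFace<'a>(&'a [u8]);

#[derive(Default)]
struct FontAtlas<'a> {
    fonts: Vec<FakeFace<'a>>,
}

impl<'a> FontAtlas<'a> {
    fn add_new_font(&mut self, bytes: &'a [u8]) {
        self.fonts.push(FakeFace(bytes));
    }
}

/// Loads every buffer "on the fly" and returns the size of each stored font.
fn load_fonts(files: Vec<Vec<u8>>) -> Vec<usize> {
    // The registry is declared first, so it is dropped after the atlas.
    let registry = ByteRegistry::default();
    let mut atlas = FontAtlas::default();
    for bytes in files {
        // `add` takes `&self`, so earlier borrows held by the atlas stay valid.
        atlas.add_new_font(registry.add(bytes));
    }
    let sizes = atlas.fonts.iter().map(|face| face.0.len()).collect();
    sizes
}

fn main() {
    assert_eq!(load_fonts(vec![vec![0; 3], vec![0; 7]]), vec![3, 7]);
}
```

The key point is that `add` needs only a shared reference, so pushing new data does not conflict with the references the atlas already holds.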

Open a single file from a ZIP archive and pass on as Read instance

I am using the zip crate to read data from ZIP archives:
impl<R: Read + Seek> ZipArchive<R> {
    pub fn new(reader: R) -> ZipResult<ZipArchive<R>> {...}
    pub fn by_name<'a>(&'a mut self, name: &str) -> ZipResult<ZipFile<'a>> {...}
    ...
}
I need to implement a function that given the name of a ZIP archive and the name of a contained file returns an instance of std::io::Read. Is this possible?
ZipFile does implement Read, but unfortunately it retains a reference to the ZipArchive and I can't find a way to build a struct that takes ownership of both the ZipArchive and ZipFile.
Unfortunately the zip crate requires a self-referential struct for such usage. Self-referential structs are not allowed by the borrow checker, but you can avoid the underlying problem by heap-allocating ZipArchive to prevent it from moving.
Even with the use of Box for heap allocation, the borrow checker still won't accept the resulting code because it doesn't special-case Box, and because it can't prove that some code won't move the object out of the box. To make it compile you'll need to use unsafe transmute to decouple the borrow of ZipFile from the archive. It will be up to you to maintain the invariants: that ZipArchive doesn't move and that ZipFile gets destroyed before ZipArchive. Fortunately the code is short, so it should be easy to review for correctness.
Here is a possible implementation:
// Import paths as exposed by zip 0.5/0.6 (an assumption; adjust for your version).
use std::fs::File;
use std::io::{self, BufReader, Read};
use zip::read::ZipFile;
use zip::result::ZipResult;
use zip::ZipArchive;
pub fn read_zip(file_name: &str, member_name: &str) -> ZipResult<impl std::io::Read> {
    struct ZipReader {
        // actually has lifetime of `archive`
        // declared first so it's dropped before `archive`
        reader: ZipFile<'static>,
        #[allow(dead_code)]
        // safety: we must never move out of this box as long as reader is alive
        archive: Box<ZipArchive<BufReader<File>>>,
    }
    impl Read for ZipReader {
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
            self.reader.read(buf)
        }
    }
    let file = BufReader::new(File::open(file_name)?);
    // Safety: We must never move `archive` out of its Box, and we must destroy
    // `reader` before `archive`. The first is ensured by never giving access to
    // the box, and the second by the drop order guarantees documented by Rust.
    let mut archive = Box::new(ZipArchive::new(file)?);
    let reader = archive.by_name(member_name)?;
    let reader = unsafe { std::mem::transmute(reader) };
    Ok(ZipReader { archive, reader })
}
The above code should be sound even though we lie to the borrow checker about the lifetime of the reference. First, the lie is consistent with the premise of the 'static bound: it is indeed possible to indefinitely extend the lifetime of ZipReader without invalidating the reference. (This is what the borrow checker cannot yet prove on its own.) Second, Rust's lifetime analysis never affects code generation; it only validates the code, so our "lie" cannot cause the code to miscompile.
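The drop-order guarantee this relies on (struct fields are dropped in declaration order) can be checked with a small self-contained sketch. `Noisy`, `Pair`, and `drop_order` are names invented for this illustration; they only mimic the field layout of `ZipReader`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Records its name in a shared log when it is dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Same field order as `ZipReader`: `reader` is declared before `archive`.
#[allow(dead_code)]
struct Pair {
    reader: Noisy,
    archive: Noisy,
}

/// Builds a `Pair`, drops it, and returns the order in which fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let pair = Pair {
        // The order of initialization does not matter; declaration order does.
        archive: Noisy { name: "archive", log: Rc::clone(&log) },
        reader: Noisy { name: "reader", log: Rc::clone(&log) },
    };
    drop(pair);
    let order = log.borrow().clone();
    order
}

fn main() {
    assert_eq!(drop_order(), vec!["reader", "archive"]);
}
```

Swapping the two field declarations in `Pair` would reverse the result, which is why the field order in `ZipReader` is part of its safety argument.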
If you're ok with an external dependency, you can use ouroboros to avoid unsafe (or rather confine it to the code generated by its proc macro). That way the code you write should be sound, providing there are no issues in ouroboros. This is what it would look like:
pub fn read_zip(file_name: &str, member_name: &str) -> ZipResult<impl std::io::Read> {
    #[ouroboros::self_referencing]
    struct ZipReader {
        archive: ZipArchive<BufReader<File>>,
        #[borrows(mut archive)]
        #[not_covariant]
        reader: ZipFile<'this>,
    }
    impl Read for ZipReader {
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
            self.with_reader_mut(|reader| reader.read(buf))
        }
    }
    let file = BufReader::new(File::open(file_name)?);
    let archive = ZipArchive::new(file)?;
    // ZipReaderBuilder and ZipReaderTryBuilder are generated by ouroboros.
    ZipReaderTryBuilder {
        archive,
        reader_builder: |archive| archive.by_name(member_name),
    }
    .try_build()
}

Writing to a file or String in Rust

TL;DR: I want to implement trait std::io::Write that outputs to a memory buffer, ideally String, for unit-testing purposes.
I must be missing something simple.
Similar to another question, Writing to a file or stdout in Rust, I am working on code that can work with any std::io::Write implementation.
It operates on a structure defined like this:
pub struct MyStructure {
    writer: Box<dyn Write>,
}
Now, it's easy to create an instance that writes to either a file or stdout:
impl MyStructure {
    pub fn use_stdout() -> Self {
        let writer = Box::new(std::io::stdout());
        MyStructure { writer }
    }
    pub fn use_file<P: AsRef<Path>>(path: P) -> Result<Self> {
        let writer = Box::new(File::create(path)?);
        Ok(MyStructure { writer })
    }
    pub fn printit(&mut self) -> Result<()> {
        self.writer.write(b"hello")?;
        Ok(())
    }
}
But for unit testing, I also need to have a way to run the business logic (here represented by method printit()) and trap its output, so that its content can be checked in the test.
I cannot figure out how to implement this. This playground code shows how I would like to use it, but it does not compile because it breaks borrowing rules.
// invalid code - does not compile!
fn main() {
    let mut buf = Vec::new(); // This buffer should receive output
    let mut x2 = MyStructure { writer: Box::new(buf) };
    x2.printit().unwrap();
    // now, get the collected output
    let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
    // here I want to analyze the output, for instance in unit-test asserts
    println!("Output to string was {}", output);
}
Any idea how to write the code correctly? I.e., how to implement a writer on top of a memory structure (String, Vec, ...) that can be accessed afterwards?
Something like this does work:
let mut buf = Vec::new();
{
    // Use the buffer by a mutable reference
    //
    // Also, we're doing it inside another scope
    // to help the borrow checker
    let mut x2 = MyStructure { writer: Box::new(&mut buf) };
    x2.printit().unwrap();
}
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
println!("Output to string was {}", output);
However, in order for this to work, you need to modify your type and add a lifetime parameter:
pub struct MyStructure<'a> {
    writer: Box<dyn Write + 'a>,
}
Note that in your case (where you omit the + 'a part) the compiler assumes that you use 'static as the lifetime of the trait object:
// Same as your original variant
pub struct MyStructure {
    writer: Box<dyn Write + 'static>,
}
This limits the set of types that can be used here; in particular, you cannot use any kind of borrowed reference. Therefore, for maximum genericity, we have to be explicit and declare a lifetime parameter.
Also note that depending on your use case, you can use generics instead of trait objects:
pub struct MyStructure<W: Write> {
    writer: W,
}
In this case the types are fully visible at any point of your program, and therefore no additional lifetime annotation is needed.
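A minimal sketch of the generic variant used for output capture in a test. The `new` and `into_inner` helpers and the `captured_output` function are hypothetical additions for this illustration, and `write_all` is used instead of `write` so that a short write cannot silently truncate the output.

```rust
use std::io::{self, Write};

pub struct MyStructure<W: Write> {
    writer: W,
}

impl<W: Write> MyStructure<W> {
    pub fn new(writer: W) -> Self {
        MyStructure { writer }
    }

    pub fn printit(&mut self) -> io::Result<()> {
        self.writer.write_all(b"hello")
    }

    /// Hypothetical helper: give the writer back so a test can inspect it.
    pub fn into_inner(self) -> W {
        self.writer
    }
}

/// Runs the business logic against an in-memory buffer and returns the output.
fn captured_output() -> String {
    // `Vec<u8>` implements `Write`, so it can be owned by the structure.
    let mut x = MyStructure::new(Vec::new());
    x.printit().unwrap();
    String::from_utf8(x.into_inner()).unwrap()
}

fn main() {
    assert_eq!(captured_output(), "hello");
}
```

Since the structure owns the buffer here, no borrow is involved at all; the same type also accepts `&mut Vec<u8>`, `File`, or `Stdout` as `W`.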

Why does this variable definition imply static lifetime?

I'm trying to execute a function on chunks of a vector and then send the result back using the message passing library.
However, I get a strange error about the lifetime of the vector that isn't even participating in the thread operations:
src/lib.rs:153:27: 154:25 error: borrowed value does not live long enough
src/lib.rs:153     let extended_segments = (segment_size..max_val)
src/lib.rs:154         .collect::<Vec<_>>()
note: reference must be valid for the static lifetime...
src/lib.rs:153:3: 155:27 note: but borrowed value is only valid for the statement at 153:2
help: consider using a `let` binding to increase its lifetime
I tried moving around the iterator and adding lifetimes to different places, but I couldn't get the checker to pass and still stay on type.
The offending code is below, based on the concurrency chapter in the Rust book. (Complete code is at github.)
use std::sync::mpsc;
use std::thread;
fn sieve_segment(a: &[usize], b: &[usize]) -> Vec<usize> {
    vec![]
}
fn eratosthenes_sieve(val: usize) -> Vec<usize> {
    vec![]
}
pub fn segmented_sieve_parallel(max_val: usize, mut segment_size: usize) -> Vec<usize> {
    if max_val <= ((2 as i64).pow(16) as usize) {
        // early return if the highest value is small enough (empirical)
        return eratosthenes_sieve(max_val);
    }
    if segment_size > ((max_val as f64).sqrt() as usize) {
        segment_size = (max_val as f64).sqrt() as usize;
        println!("Segment size is larger than √{}. Reducing to {} to keep resource use down.",
                 max_val,
                 segment_size);
    }
    let small_primes = eratosthenes_sieve((max_val as f64).sqrt() as usize);
    let mut big_primes = small_primes.clone();
    let (tx, rx): (mpsc::Sender<Vec<usize>>, mpsc::Receiver<Vec<usize>>) = mpsc::channel();
    let extended_segments = (segment_size..max_val)
        .collect::<Vec<_>>()
        .chunks(segment_size);
    for this_segment in extended_segments.clone() {
        let small_primes = small_primes.clone();
        let tx = tx.clone();
        thread::spawn(move || {
            let sieved_segment = sieve_segment(&small_primes, this_segment);
            tx.send(sieved_segment).unwrap();
        });
    }
    for _ in 1..extended_segments.count() {
        big_primes.extend(&rx.recv().unwrap());
    }
    big_primes
}
fn main() {}
How do I understand and avoid this error? I'm not sure how to make the lifetime of the thread closure static as in this question and still have the function be reusable (i.e., not main()). I'm not sure how to "consume all things that come into [the closure]" as mentioned in this question. And I'm not sure where to insert .map(|s| s.into()) to ensure that all references become moves, nor am I sure I want to.
When trying to reproduce a problem, I'd encourage you to create an MCVE by removing all irrelevant code. In this case, something like this seems to produce the same error:
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
    let foo = (segment_size..max_val)
        .collect::<Vec<_>>()
        .chunks(segment_size);
}
fn main() {}
Let's break that down:
1. Create an iterator between numbers.
2. Collect all of them into a Vec<usize>.
3. Return an iterator that contains references to the vector.
Since the vector isn't bound to any variable, it's dropped at the end of the statement. This would leave the iterator pointing to an invalid region of memory, so that's disallowed.
Check out the definition of slice::chunks:
fn chunks(&self, size: usize) -> Chunks<T>
pub struct Chunks<'a, T> where T: 'a {
    // some fields omitted
}
The lifetime marker 'a lets you know that the iterator contains a reference to something. Lifetime elision has removed the 'a from the function, which looks like this, expanded:
fn chunks<'a>(&'a self, size: usize) -> Chunks<'a, T>
Check out this line of the error message:
help: consider using a let binding to increase its lifetime
You can follow that as such:
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
    let foo = (segment_size..max_val)
        .collect::<Vec<_>>();
    let bar = foo.chunks(segment_size);
}
fn main() {}
Although I'd write it as
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
    let foo: Vec<_> = (segment_size..max_val).collect();
    let bar = foo.chunks(segment_size);
}
fn main() {}
Re-inserting this code into your original program won't solve the problem by itself, but the remaining error will be much easier to understand. That's because you are attempting to pass a reference into thread::spawn, and the spawned thread may outlive the function that created it. Thus, everything passed to thread::spawn must satisfy the 'static bound. There are tons of questions that detail why that must be prevented and a litany of solutions, including scoped threads and cloning the vector.
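As an illustration of the scoped-threads option, here is a rough sketch using `std::thread::scope` from the standard library (available since Rust 1.63, so newer than this question). It is not the sieve itself: `sum_in_chunks` is a made-up function that just sums each chunk, to show that borrowed chunks can be handed to threads without cloning.

```rust
use std::sync::mpsc;
use std::thread;

/// Sums each chunk of `data` on its own scoped thread and adds up the results.
/// `chunk_size` must be non-zero, because `chunks` panics on 0.
fn sum_in_chunks(data: &[usize], chunk_size: usize) -> usize {
    let (tx, rx) = mpsc::channel::<usize>();
    thread::scope(|scope| {
        for chunk in data.chunks(chunk_size) {
            let tx = tx.clone();
            // `chunk` borrows from `data`; that is fine because the scope
            // joins every thread before `thread::scope` returns.
            scope.spawn(move || {
                let partial: usize = chunk.iter().sum();
                tx.send(partial).unwrap();
            });
        }
    });
    // Drop the original sender so that `rx.iter()` ends once the queue drains.
    drop(tx);
    let total: usize = rx.iter().sum();
    total
}

fn main() {
    assert_eq!(sum_in_chunks(&[1, 2, 3, 4, 5], 2), 15);
}
```

Because the threads cannot outlive the scope, the closures do not need to be 'static, and neither the input vector nor its chunks have to be copied.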
Cloning the vector is the easiest, but potentially inefficient:
for this_segment in extended_segments.clone() {
let this_segment = this_segment.to_vec();
// ...
}