Is there a way to have the Rust compiler error if a struct is to be automatically dropped?
Example: I'm implementing a memory pool and want the allocations to be manually returned to the memory pool to prevent leakage. Is there something like RequireManualDrop from the below example?
impl MemoryPool {
pub fn allocate(&mut self) -> Option<Allocation> { /* ... */ }
pub fn free(&mut self, alloc: Allocation) { let (ptr, size) = alloc.inner.into_inner(); /* ... */ }
}
pub struct Allocation {
inner: RequireManualDrop<(*mut u8, usize)>,
}
fn valid_usage(mem: &mut MemoryPool) {
let chunk = mem.allocate();
/* ... */
mem.free(chunk);
}
/* compile error: Allocation.inner needs to be manully dropped */
fn will_have_compile_error(mem: &mut MemoryPool) {
let chunk = mem.allocate();
/* ... */
}
Rust doesn't have a direct solution for this. However, you can use a nit trick for that (this trick is taken from https://github.com/Kixunil/dont_panic).
If you call a declared but not defined function, the linker will report an error. We can use that to error on drop. By calling an undefined function in the destructor, the code will not compile unless the destructor is not called.
#[derive(Debug)]
pub struct Undroppable(pub i32);
impl Drop for Undroppable {
fn drop(&mut self) {
extern "C" {
// This will show (somewhat) useful error message instead of complete gibberish
#[link_name = "\n\nERROR: `Undroppable` implicitly dropped\n\n"]
fn trigger() -> !;
}
unsafe { trigger() }
}
}
pub fn manually_drop(v: Undroppable) {
let v = std::mem::ManuallyDrop::new(v);
println!("Manually dropping {v:?}...");
}
Playground.
However, beware that while in this case it worked even in debug builds, it may require optimizations in other cases to eliminate the call. And unwinding may complicate the process even more, because it can lead to unexpected implicit drops. For example, if I change manually_drop() to the following seemingly identical version:
pub fn manually_drop(v: Undroppable) {
println!("Manually dropping {v:?}...");
std::mem::forget(v);
}
It doesn't work, because println!() may unwind, and then std::mem::forget(v) won't be reached. If a function that takes Undroppable by value must unwind, you probably have no options but to wrap it with ManuallyDrop and manually ensure it is correctly dropped (though you can just drop it at the end of the function, and let the unwind case leak it, as leaking in panicking is mostly fine).
Related
The question is not new and there are two approaches, as far as I can tell:
Use Vec<T>, as suggested here
Manage the heap memory yourself, using std::alloc::alloc, as shown here
My question is whether these are indeed the two (good) alternatives.
Just to make this perfectly clear: Both approaches work. The question is whether there is another, maybe preferred way. The example below is introduced to identify where use of Vec is not good, and where other approaches therefore may be better.
Let's state the problem: Suppose there's a C library that requires some buffer to write into. This could be a compression library, for example. It is easiest to have Rust allocate the heap memory and manage it instead of allocating in C/C++ with malloc/new and then somehow passing ownership to Rust.
Let's go with the compression example. If the library allows incremental (streaming) compression, then I would need a buffer that keeps track of some offset.
Following approach 1 (that is: "abuse" Vec<T>) I would wrap Vec and use len and capacity for my purposes:
/// `Buffer` is basically a Vec
pub struct Buffer<T>(Vec<T>);
impl<T> Buffer<T> {
/// Create new buffer of length `len`
pub fn new(len: usize) -> Self {
Buffer(Vec::with_capacity(len))
}
/// Return length of `Buffer`
pub fn len(&self) -> usize {
return self.0.len()
}
/// Return total allocated size of `Buffer`
pub fn capacity(&self) -> usize {
return self.0.capacity()
}
/// Return remaining length of `Buffer`
pub fn remaining(&self) -> usize {
return self.0.capacity() - self.len()
}
/// Increment the offset
pub fn increment(&mut self, by:usize) {
unsafe { self.0.set_len(self.0.len()+by); }
}
/// Returns an unsafe mutable pointer to the buffer
pub fn as_mut_ptr(&mut self) -> *mut T {
unsafe { self.0.as_mut_ptr().add(self.0.len()) }
}
/// Returns ref to `Vec<T>` inside `Buffer`
pub fn as_vec(&self) -> &Vec<T> {
&self.0
}
}
The only interesting functions are increment and as_mut_ptr.
Buffer would be used like this
fn main() {
// allocate buffer for compressed data
let mut buf: Buffer<u8> = Buffer::new(1024);
loop {
// perform C function call
let compressed_len: usize = compress(some_input, buf.as_mut_ptr(), buf.remaining());
// increment
buf.increment(compressed_len);
}
// get Vec inside buf
let compressed_data = buf.as_vec();
}
Buffer<T> as shown here is clearly dangerous, for example if any reference type is used. Even T=bool may result in undefined behaviour. But the problems with uninitialised instance of T can be avoided by introducing a trait that limits the possible types T.
Also, if alignment matters, then Buffer<T> is not a good idea.
But otherwise, is such a Buffer<T> really the best way to do this?
There doesn't seem to be an out-of-the box solution. The bytes crate comes close, it offers a "container for storing and operating on contiguous slices of memory", but the interface is not flexible enough.
You absolutely can use a Vec's spare capacity as to write into manually. That is why .set_len() is available. However, compress() must know that the given pointer is pointing to uninitialized memory and thus is not allowed to read from it (unless written to first) and you must guarantee that the returned length is the number of bytes initialized. I think these rules are roughly the same between Rust and C or C++ in this regard.
Writing this in Rust would look like this:
pub struct Buffer<T>(Vec<T>);
impl<T> Buffer<T> {
pub fn new(len: usize) -> Self {
Buffer(Vec::with_capacity(len))
}
/// SAFETY: `by` must be less than or equal to `space_len()` and the bytes at
/// `space_ptr_mut()` to `space_ptr_mut() + by` must be initialized
pub unsafe fn increment(&mut self, by: usize) {
self.0.set_len(self.0.len() + by);
}
pub fn space_len(&self) -> usize {
self.0.capacity() - self.0.len()
}
pub fn space_ptr_mut(&mut self) -> *mut T {
unsafe { self.0.as_mut_ptr().add(self.0.len()) }
}
pub fn as_vec(&self) -> &Vec<T> {
&self.0
}
}
unsafe fn compress(_input: i32, ptr: *mut u8, len: usize) -> usize {
// right now just writes 5 bytes if there's space for them
let written = usize::min(5, len);
for i in 0..written {
ptr.add(i).write(0);
}
written
}
fn main() {
let mut buf: Buffer<u8> = Buffer::new(1024);
let some_input = 5i32;
unsafe {
let compressed_len: usize = compress(some_input, buf.space_ptr_mut(), buf.space_len());
buf.increment(compressed_len);
}
let compressed_data = buf.as_vec();
println!("{:?}", compressed_data);
}
You can see it on the playground. If you run it through Miri, you'll see it picks up no undefined behavior, but if you over-advertise how much you've written (say return written + 10) then it does produce an error that reading uninitialized memory was detected.
One of the reasons there isn't an out-of-the-box type for this is because Vec is that type:
fn main() {
let mut buf: Vec<u8> = Vec::with_capacity(1024);
let some_input = 5i32;
let spare_capacity = buf.spare_capacity_mut();
unsafe {
let compressed_len: usize = compress(
some_input,
spare_capacity.as_mut_ptr().cast(),
spare_capacity.len(),
);
buf.set_len(buf.len() + compressed_len);
}
println!("{:?}", buf);
}
Your Buffer type doesn't really add any convenience or safety and a third-party crate can't do so because it relies on the correctness of compress().
Is such a Buffer really the best way to do this?
Yes, this is pretty much the lowest cost ways to provide a buffer for writing. Looking at the generated release assembly, it is just one call to allocate and that's it. You can get tricky by using a special allocator or simply pre-allocate and reuse allocations if you're doing this many times (but be sure to measure since the built-in allocator will do this anyway, just more generally).
For the library I need to write I have a struct representing a service written in C:
struct RustService {
id: u8, /* it's me setting the id in the first place */
service: *mut c_void,
status: *mut c_void,
value: Option<u32>,
/* various fields here */
}
impl RustService {
fn new() -> /* can be wrapped*/RustService {
/* ... */
}
}
I have several callbacks passed to C library like this:
#[no_mangle]
extern "C" fn callback_handle(service: *mut c_void, size: *mut u32, buffer: *mut u8)
{
let id: u8 = c_library_get_id(service);
let instance: &mut RustService = /* get that from static container based on id */
if size = mem::size_of::<u32>() as u32 {
let value = buffer.cast::<u32>().read();
instance.value = Some(value);
} else {
panic!("Invalid length passed.");
}
}
What am I looking for is a proven/good practice that would apply here that works without causing deadlock, where my callback could refer back to the right instance using the library in C. That library in C can run multiple instances, however it's not that important. Alternatively I can accept a single instance pattern, that would work in this case.
Other methods of RustService would need to call for example a function with the following signature (from bindgen):
extern "C" pub fn clibCoreWork(status: *mut clibStatus, service: *mut clibService) -> u32;
and guts of that function may decide to call these callbacks mentioned above.
The approach I've taken shoots me in the foot as the inner Mutex deadlocks even in the single thread:
lazy_static! {
static ref MY_INSTANCES: Mutex<HashMap<u8, Weak<Mutex<RustService>>>> = Mutex::new(HashMap::new());
}
I dug the internet but fellow Rustaceans didn't seem to post anything about such problem.
Note: I cannot change underlying types in the C library I'm using. In the ideal world it would let me store pointer to user data and I'd brutally convert it to a mutable reference in unsafe code.
As part of binding a C API to Rust, I have a mutable reference ph: &mut Ph, a struct struct EnsureValidContext<'a> { ph: &'a mut Ph }, and some methods:
impl Ph {
pub fn print(&mut self, s: &str) {
/*...*/
}
pub fn with_context<F, R>(&mut self, ctx: &Context, f: F) -> Result<R, InvalidContextError>
where
F: Fn(EnsureValidContext) -> R,
{
/*...*/
}
/* some others */
}
impl<'a> EnsureValidContext<'a> {
pub fn print(&mut self, s: &str) {
self.ph.print(s)
}
pub fn close(self) {}
/* some others */
}
I don't control these. I can only use these.
Now, the closure API is nice if you want the compiler to force you to think about performance (and the tradeoffs you have to make between performance and the behaviour you want. Context validation is expensive). However, let's say you just don't care about that and want it to just work.
I was thinking of making a wrapper that handles it for you:
enum ValidPh<'a> {
Ph(&'a mut Ph),
Valid(*mut Ph, EnsureValidContext<'a>),
Poisoned,
}
impl<'a> ValidPh<'a> {
pub fn print(&mut self) {
/* whatever the case, just call .print() on the inner object */
}
pub fn set_context(&mut self, ctx: &Context) {
/*...*/
}
pub fn close(&mut self) {
/*...*/
}
/* some others */
}
This would work by, whenever necessary, checking if we're a Ph or a Valid, and if we're a Ph we'd upgrade to a Valid by going:
fn upgrade(&mut self) {
if let Ph(_) = self { // don't call mem::replace unless we need to
if let Ph(ph) = mem::replace(self, Poisoned) {
let ptr = ph as *mut _;
let evc = ph.with_context(ph.get_context(), |evc| evc);
*self = Valid(ptr, evc);
}
}
}
Downgrading is different for each method, as it has to call the target method, but here's an example close:
pub fn close(&mut self) {
if let Valid(_, _) = self {
/* ok */
} else {
self.upgrade()
}
if let Valid(ptr, evc) = mem::replace(self, Invalid) {
evc.close(); // consume the evc, dropping the borrow.
// we can now use our original borrow, but since we don't have it anymore, bring it back using our trusty ptr
*self = unsafe { Ph(&mut *ptr) };
} else {
// this can only happen due to a bug in our code
unreachable!();
}
}
You get to use a ValidPh like:
/* given a &mut vph */
vph.print("hello world!");
if vph.set_context(ctx) {
vph.print("closing existing context");
vph.close();
}
vph.print("opening new context");
vph.open("context_name");
vph.print("printing in new context");
Without vph, you'd have to juggle &mut Ph and EnsureValidContext around on your own. While the Rust compiler makes this trivial (just follow the errors), you may want to let the library handle it automatically for you. Otherwise you might end up just calling the very expensive with_context for every operation, regardless of whether the operation can invalidate the context or not.
Note that this code is rough pseudocode. I haven't compiled or tested it yet.
One might argue I need an UnsafeCell or a RefCell or some other Cell. However, from reading this it appears UnsafeCell is only a lang item because of interior mutability — it's only necessary if you're mutating state through an &T, while in this case I have &mut T all the way.
However, my reading may be flawed. Does this code invoke UB?
(Full code of Ph and EnsureValidContext, including FFI bits, available here.)
Taking a step back, the guarantees upheld by Rust are:
&T is a reference to T which is potentially aliased,
&mut T is a reference to T which is guaranteed not to be aliased.
The crux of the question therefore is: what does guaranteed not to be aliased means?
Let's consider a safe Rust sample:
struct Foo(u32);
impl Foo {
fn foo(&mut self) { self.bar(); }
fn bar(&mut self) { *self.0 += 1; }
}
fn main() { Foo(0).foo(); }
If we take a peek at the stack when Foo::bar is being executed, we'll see at least two pointers to Foo: one in bar and one in foo, and there may be further copies on the stack or in other registers.
So, clearly, there are aliases in existence. How come! It's guaranteed NOT to be aliased!
Take a deep breath: how many of those aliases can you access at the time?
Only 1. The guarantee of no aliasing is not spatial but temporal.
I would think, therefore, that at any point in time, if a &mut T is accessible, then no other reference to this instance must be accessible.
Having a raw pointer (*mut T) is perfectly fine, it requires unsafe to access; however forming a second reference may or may not be safe, even without using it, so I would avoid it.
Rust's memory model is not rigorously defined yet, so it's hard to say for sure, but I believe it's not undefined behavior to:
carry a *mut Ph around while a &'a mut Ph is also reachable from another path, so long as you don't dereference the *mut Ph, even just for reading, and don't convert it to a &Ph or &mut Ph, because mutable references grant exclusive access to the pointee.
cast the *mut Ph back to a &'a mut Ph once the other &'a mut Ph falls out of scope.
I have a function like this:
extern {
fn foo(layout: *const RawLayout) -> libc::uint8_t;
}
fn bar(layout: Layout) -> bool {
unsafe {
foo(&layout.into() as *const _) != 0
}
}
Where Layout is a copyable type that can be converted .into() a RawLayout.
I want to make sure I understand what is happening as it is unsafe. As I understand it, layout.into() creates a temporary RawLayout, then & takes a reference to it, and as *const _ converts it to a raw pointer (*const RawLayout). Then the foo() function is called and returns, and finally the temporary RawLayout is dropped.
Is that correct? Or is there some tricky reason why I shouldn't do this?
You are right. In this case, foo is called first and RawLayout is dropped afterwards. This is explained in The Rust Reference (follow the link to see concrete examples of how this works out in practice):
The lifetime of temporary values is typically the innermost enclosing
statement
However, I would rather follow Shepmaster's advice. Explicitly introducing a local variable would help the reader of the code concentrate in more important things, like ensuring that the unsafe code is correct (instead of having to figure out the exact semantics of temporary variables).
How to check this
You can use the code below to check this behavior:
struct Layout;
struct RawLayout;
impl Into<RawLayout> for Layout {
fn into(self) -> RawLayout {
RawLayout
}
}
impl Drop for RawLayout {
fn drop(&mut self) {
println!("Dropping RawLayout");
}
}
unsafe fn foo(layout: *const RawLayout) -> u8 {
println!("foo called");
1
}
fn bar(layout: Layout) -> bool {
unsafe {
foo(&layout.into() as *const _) != 0
}
}
fn main() {
bar(Layout);
}
The output is:
foo called
Dropping RawLayout
I am trying to figure out how to apply Rust lifetimes to add some compile-time enforcement to Erlang NIF modules. NIF modules are shared libraries normally written in C that provide extensions.
A simplified prototype of the callback you would write in C looks like this:
Handle my_nif_function(Heap *heap, Handle handle);
You are provided a handle and a pointer to the heap that owns it. In your callback you may inspect the input handle, create more handles on the heap, and return one of them as the function return. The heap and all its handles become invalid after your callback returns, so you must not store copies of the heap or its handles during the callback. Unfortunately I’ve seen people do exactly this and it eventually results in a mysterious emulator crash. Can Rust enforce these lifetime constraints?
I think the heap can be easily managed by turning it into a reference.
fn my_nif_function(heap: &Heap, handle: Handle) -> Handle
But how can I link the lifetime of the input and output handles to the heap?
Another wrinkle to this is that you can also create your own heaps and handles which are allowed to live outside the scope of a callback invocation. In C++ I would use std::unique_ptr with a custom destructor. What is the Rust equivalent? The [simplified] C API for managing heaps looks like this:
Heap *create_heap();
void destroy_heap(Heap *);
Reference: NIFs are described here: http://www.erlang.org/doc/man/erl_nif.html . The Erlang names for "heaps" and "handles" are "environments" and "terms". I used the names "heaps" and "handles" so that the question would be more broadly understood.
Rust 1.0
The various marker types have been unified into one: PhantomData
use std::ptr;
use std::marker::PhantomData;
struct Heap {
ptr: *const u8,
}
impl Heap {
fn new(c_ptr: *const u8) -> Heap {
Heap {
ptr: c_ptr
}
}
fn wrap_handle<'a>(&'a self, c_handle: *const u8) -> Handle<'a> {
Handle {
ptr: c_handle,
marker: PhantomData,
}
}
}
struct Handle<'a> {
ptr: *const u8,
marker: PhantomData<&'a ()>,
}
fn main() {
let longer_heap = Heap::new(ptr::null());
let handle = {
let shorter_heap = Heap::new(ptr::null());
let longer_handle = longer_heap.wrap_handle(ptr::null());
let shorter_handle = shorter_heap.wrap_handle(ptr::null());
// longer_handle // ok to return
// shorter_handle // error: `shorter_heap` does not live long enough
};
}
Original Answer
Here's an example of using ContravariantLifetime. We wrap the raw heap pointer into a struct and then wrap raw handle pointers in another struct, reusing the lifetime of the heap.
use std::ptr;
use std::marker::ContravariantLifetime;
struct Heap {
ptr: *const u8,
}
impl Heap {
fn new(c_ptr: *const u8) -> Heap {
Heap {
ptr: c_ptr
}
}
fn wrap_handle<'a>(&'a self, c_handle: *const u8) -> Handle<'a> {
Handle {
ptr: c_handle,
marker: ContravariantLifetime,
}
}
}
struct Handle<'a> {
ptr: *const u8,
marker: ContravariantLifetime<'a>,
}
fn main() {
let longer_heap = Heap::new(ptr::null());
let handle = {
let shorter_heap = Heap::new(ptr::null());
let longer_handle = longer_heap.wrap_handle(ptr::null());
let shorter_handle = shorter_heap.wrap_handle(ptr::null());
// longer_handle // ok to return
// shorter_handle // error: `shorter_heap` does not live long enough
};
}
Lifetime markers
There are 3 lifetime markers. I won't attempt to replicate the reasonably good but dense documentation here, but can also point out the dense Wikipedia page, which might be some small assistance. I've listed them in the order that you are most likely to use them:
ContravariantLifetime
InvariantLifetime
CovariantLifetime