Wouldn't ManuallyDrop without a drop call cause a memory leak? - rust

I am going through the wasm-bindgen guide and I came across the glue code it generates for interacting between JS and Rust. A reference to a value is passed from JS to Rust. Rust has to wrap it in ManuallyDrop so that it won't call the Drop implemented on JsValue.
pub fn foo(a: &JsValue) {
    // ...
}

#[export_name = "foo"]
pub extern "C" fn __wasm_bindgen_generated_foo(arg0: u32) {
    let arg0 = unsafe {
        ManuallyDrop::new(JsValue::__from_idx(arg0))
    };
    let arg0 = &*arg0;
    foo(arg0);
}
But I do not see ManuallyDrop::drop being called on arg0. So would the JsValue wrapped in ManuallyDrop ever be dropped unless ManuallyDrop::drop(arg0) is called? Wouldn't it cause a memory leak?

ManuallyDrop does not stop the inner value from being destroyed. It only stops drop from being called. Consider a Vec:
pub struct Vec<T> {
    ptr: *mut T,
    cap: usize,
    len: usize,
}
The fields ptr, cap, and len will still be destroyed even when wrapped in ManuallyDrop. However, any dynamic resources they manage (in this case the data referenced by ptr) will not be released, since drop is not called.
Since JsValue simply holds a u32, no leak will occur on the Rust side. And since the glue code ensures proper cleanup for &JsValue arguments, no memory is leaked on the JavaScript side.
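To make the distinction concrete, here is a small sketch of my own (not from the wasm-bindgen glue): a ManuallyDrop<Vec<u8>> going out of scope leaks the heap buffer the Vec manages, while a ManuallyDrop around a plain index type has nothing to leak, which is exactly the JsValue case, since it only stores a u32 index into a table owned by the JavaScript side.
use std::mem::ManuallyDrop;

// Stands in for JsValue: just an index, no owned resources.
struct Index(u32);

fn main() {
    {
        // The Vec's (ptr, cap, len) fields are discarded at the end of this scope,
        // but Vec's drop never runs, so its heap buffer is leaked.
        let _leaky = ManuallyDrop::new(vec![1u8, 2, 3]);
    }
    {
        // Nothing to leak: dropping an Index would not free anything anyway.
        let _fine = ManuallyDrop::new(Index(42));
    }
}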

Related

Rust type that requires manual drop

Is there a way to have the Rust compiler error if a struct is to be automatically dropped?
Example: I'm implementing a memory pool and want allocations to be manually returned to the pool to prevent leaks. Is there something like RequireManualDrop from the example below?
impl MemoryPool {
    pub fn allocate(&mut self) -> Option<Allocation> { /* ... */ }
    pub fn free(&mut self, alloc: Allocation) {
        let (ptr, size) = alloc.inner.into_inner();
        /* ... */
    }
}

pub struct Allocation {
    inner: RequireManualDrop<(*mut u8, usize)>,
}

fn valid_usage(mem: &mut MemoryPool) {
    let chunk = mem.allocate();
    /* ... */
    mem.free(chunk);
}

/* compile error: Allocation.inner needs to be manually dropped */
fn will_have_compile_error(mem: &mut MemoryPool) {
    let chunk = mem.allocate();
    /* ... */
}
Rust doesn't have a direct solution for this. However, you can use a neat trick (taken from https://github.com/Kixunil/dont_panic).
If you call a function that is declared but never defined, the linker will report an error. We can use that to turn an implicit drop into a build failure: by calling such an undefined function in the destructor, the program will only link if the compiler can prove the destructor is never called and optimizes the call away.
#[derive(Debug)]
pub struct Undroppable(pub i32);

impl Drop for Undroppable {
    fn drop(&mut self) {
        extern "C" {
            // This will show (somewhat) useful error message instead of complete gibberish
            #[link_name = "\n\nERROR: `Undroppable` implicitly dropped\n\n"]
            fn trigger() -> !;
        }
        unsafe { trigger() }
    }
}

pub fn manually_drop(v: Undroppable) {
    let v = std::mem::ManuallyDrop::new(v);
    println!("Manually dropping {v:?}...");
}
Playground.
However, beware that while in this case it works even in debug builds, other cases may require optimizations to eliminate the call. Unwinding can complicate things even further, because it can lead to unexpected implicit drops. For example, if I change manually_drop() to the following seemingly identical version:
pub fn manually_drop(v: Undroppable) {
    println!("Manually dropping {v:?}...");
    std::mem::forget(v);
}
It doesn't work, because println!() may unwind, in which case std::mem::forget(v) is never reached and v is dropped during unwinding. If a function that takes Undroppable by value must be able to unwind, you probably have no option but to wrap the value in ManuallyDrop and ensure yourself that it is disposed of correctly (though you can just do that at the end of the function and let the unwind path leak it, since leaking while panicking is mostly fine).
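A sketch of that last suggestion (the shape is mine, not taken from the linked playground): wrap the value in ManuallyDrop right away, do the work that may unwind, and only take the value back out on the success path, so a panic leaks the value instead of running its Drop and tripping the linker error.
pub fn manually_drop_unwind_safe(v: Undroppable) {
    let mut v = std::mem::ManuallyDrop::new(v);

    // Anything here may panic; if it does, `v` is leaked rather than
    // implicitly dropped (which would otherwise trigger the linker error).
    println!("Manually dropping {:?}...", *v);

    // Success path: move the value back out and dispose of it deliberately.
    let v = unsafe { std::mem::ManuallyDrop::take(&mut v) };
    std::mem::forget(v); // stand-in for whatever the real cleanup is
}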

Ensuring value lives for its entire scope

I have a type (specifically CFData from core-foundation) whose memory is managed by C APIs and that I need to pass to and receive from C functions as a raw c_void pointer. For simplicity, consider the following struct:
struct Data {
    ptr: *mut ffi::c_void,
}

impl Data {
    pub fn new() -> Self {
        // allocate memory via an unsafe C-API
        Self {
            ptr: std::ptr::null_mut(), // Just so this compiles.
        }
    }

    pub fn to_raw(&self) -> *const ffi::c_void {
        self.ptr
    }
}

impl Drop for Data {
    fn drop(&mut self) {
        unsafe {
            // Free memory via a C-API
        }
    }
}
Its interface is safe, including to_raw(), since it only returns a raw pointer; it doesn't dereference it, and neither does the caller. The pointer is just handed to a callback.
pub extern "C" fn called_from_C_ok(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// Do things to compute Data.
// This is actually async code, which is why there's a completion handler.
on_complete(data.to_raw()); // data survives through the function call
}
This is fine. Data is safe to manipulate, and (I strongly believe) Rust promises that data will live until the end of the on_complete function call.
On the other hand, this is not ok:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
In my code I made this mistake and it started crashing. It's easy to see why, and it's easy to fix. But it's subtle, and it would be easy for a future developer (me) to turn the ok version back into the broken version without realizing the problem (and it may not always crash).
What I'd like to do is to ensure data lives as long as ptr. In Swift, I'd do this with:
withExtendedLifetime(&data) { data in
    // ...data cannot be dropped until this closure ends...
}
Is there a similar construct in Rust that explicitly extends a variable's lifetime to the end of a scope (and that the optimizer may not reorder), even if the variable isn't accessed afterwards? (I'm sure it's trivial to build a custom with_extended_lifetime in Rust, but I'm looking for a more standard solution so that it will be obvious to other developers what's going on.)
Playground
I do believe the following "works" but I'm not sure how flexible it is, or if it's just replacing a more standard solution:
fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
    F: Fn(&T) -> U,
{
    f(value)
}

with_extended_lifetime(&data, |data| {
    let ptr = data.to_raw();
    on_complete(ptr)
});
The optimizer is not allowed to change when a value is dropped. If you assign a value to a variable (and that value is not then moved elsewhere or overwritten by assignment), it will always be dropped at the end of the block, not earlier.
You say that this code is incorrect:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
but in fact data will not be dropped at that point, and this code is sound. What you may be confusing this with is the mistake of not assigning the value to a variable:
let ptr = Data::new().to_raw();
on_complete(ptr);
In this case the pointer is dangling, because the result of Data::new() is stored in a temporary within the statement, which is dropped at the end of that statement, rather than in a local variable, which would be dropped at the end of the block.
If you want to adopt a programming style which makes explicit when values are dropped, the usual pattern is to use the standard drop() function to mark the exact time of drop:
let data = Data::new();
// ...
on_complete(data.to_raw());
drop(data); // We have stopped using the raw pointer now
(Note that drop() is not magic: it is simply a function which takes one argument and does nothing with it other than dropping. Its presence in the standard library is to give a name to this pattern, not to provide special functionality.)
However, there isn't anything wrong with using your with_extended_lifetime (other than nonstandard style and an arguably misleading name) if you want the code structure to indicate the scope of the value even more strongly. One nitpick: the function parameter should be FnOnce, not Fn, for maximum generality (this allows it to accept functions that can't be called more than once).
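For reference, here is the same helper with that change applied:
fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
    F: FnOnce(&T) -> U,
{
    f(value)
}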
Other than explicitly dropping as the other answer mentions, there is another way to help prevent these types of accidental drops: use a wrapper around a raw pointer that has lifetime information.
use std::marker::PhantomData;

#[repr(transparent)]
struct PtrWithLifetime<'a> {
    ptr: *mut ffi::c_void,
    _data: PhantomData<&'a ffi::c_void>,
}

impl Data {
    fn to_raw(&self) -> PtrWithLifetime<'_> {
        PtrWithLifetime {
            ptr: self.ptr,
            _data: PhantomData,
        }
    }
}
The #[repr(transparent)] attribute guarantees that PtrWithLifetime has the same memory layout as *const ffi::c_void, so you can adjust the declaration of on_complete to
fn called_from_c(on_complete: extern "C" fn(PtrWithLifetime<'_>)) {
    // stuff
}
without causing any major inconvenience to any downstream users, especially since the ffi bindings can be adjusted in a similar fashion.
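To sketch what this buys you (reusing the Data type from the question; the function name here is hypothetical): the returned wrapper borrows data, so the borrow checker now rejects dropping or moving data while the pointer is still alive.
fn called_from_c_checked(on_complete: extern "C" fn(PtrWithLifetime<'_>)) {
    let data = Data::new();
    let ptr = data.to_raw(); // `ptr` borrows `data`
    // drop(data);           // error[E0505]: cannot move out of `data` because it is borrowed
    on_complete(ptr);        // fine: `data` is guaranteed to still be alive here
}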

How do I force Rust to allocate a big object on the heap without using a Vec? [duplicate]

In this code...
struct Test { a: i32, b: i64 }

fn foo() -> Box<Test> {              // Stack frame:
    let v = Test { a: 123, b: 456 }; // 16 bytes (12 bytes of fields plus padding)
    Box::new(v)                      // `usize` bytes (`*const T` in `Box`)
}
... as far as I understand (ignoring possible optimizations), v gets allocated on the stack and then copied to the heap, before being returned in a Box.
And this code...
fn foo() -> Box<Test> {
    Box::new(Test { a: 123, b: 456 })
}
...shouldn't be any different, presumably, since there should still be a temporary for the struct (assuming the compiler doesn't give the expression inside Box::new() any special treatment).
I've found Do values in return position always get allocated in the parents stack frame or receiving Box?. Regarding my specific question, it only proposes the experimental box syntax, but mostly talks about compiler optimizations (copy elision).
So my question remains: using stable Rust, how does one allocate structs directly on the heap, without relying on compiler optimizations?
As of Rust 1.39, there seems to be only one way in stable Rust to allocate memory on the heap directly: std::alloc::alloc (note that the docs state it is expected to be deprecated). It is, of course, quite unsafe.
Example:
#[derive(Debug)]
struct Test {
    a: i64,
    b: &'static str,
}

fn main() {
    use std::alloc::{alloc, Layout};

    unsafe {
        let layout = Layout::new::<Test>();
        let ptr = alloc(layout) as *mut Test;
        (*ptr).a = 42;
        (*ptr).b = "testing";
        let bx = Box::from_raw(ptr);
        println!("{:?}", bx);
    }
}
This approach is used in the unstable method Box::new_uninit.
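For completeness: Box::new_uninit (together with assume_init on Box<MaybeUninit<T>>) has since been stabilized on recent compilers, so on a newer toolchain a sketch of the same idea, without calling std::alloc directly, could look like this (reusing the Test struct from above):
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[derive(Debug)]
struct Test {
    a: i64,
    b: &'static str,
}

fn main() {
    // Heap memory is allocated but not yet initialized.
    let mut uninit: Box<MaybeUninit<Test>> = Box::new_uninit();

    let bx: Box<Test> = unsafe {
        // Write the fields in place through a raw pointer instead of
        // constructing a complete `Test` on the stack first.
        let ptr = uninit.as_mut_ptr();
        addr_of_mut!((*ptr).a).write(42);
        addr_of_mut!((*ptr).b).write("testing");
        uninit.assume_init()
    };

    println!("{:?}", bx);
}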
It turns out there's even a crate for avoiding memcpy calls (among other things): copyless. That crate uses a similar approach internally.
You seem to be looking for the box_syntax feature; however, as of Rust 1.39.0 it is not stable and is only available with a nightly compiler. It also seems this feature will not be stabilized any time soon, and it might have a different design if it ever is stabilized.
On a nightly compiler, you can write:
#![feature(box_syntax)]

struct Test { a: i32, b: i64 }

fn foo() -> Box<Test> {
    box Test { a: 123, b: 456 }
}
Is there a way to allocate directly to the heap without box?
No. If there was, it wouldn't need a language change.
People tend to avoid this by using the unstable syntax indirectly, such as by using one of the standard containers which, in turn, uses it internally.
See also:
How to allocate arrays on the heap in Rust 1.0?
Is there any way to allocate a standard Rust array directly on the heap, skipping the stack entirely?
What does the box keyword do?
What is the <- symbol in Rust?
I recently had the same problem. Based on the answers here and other places, I wrote a simple function for heap allocation:
pub fn unsafe_allocate<T>() -> Box<T> {
    use std::alloc::{alloc, Layout};

    unsafe {
        let layout = Layout::new::<T>();
        let ptr = alloc(layout) as *mut T;
        Box::from_raw(ptr)
    }
}
This creates a region of memory automatically sized for T and unsafely convinces Rust that the region holds an actual T value. The memory may contain arbitrary data; you should not assume anything is zeroed.
Example use:
let mut triangles: Box<[Triangle; 100000]> = unsafe_allocate::<[Triangle; 100000]>();
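If you would rather have the type system carry that "may contain arbitrary data" caveat instead of relying on the caller to remember it, a variant of the sketch above (my own adjustment, not part of the original answer) can return Box<MaybeUninit<T>>:
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::mem::MaybeUninit;

pub fn allocate_uninit<T>() -> Box<MaybeUninit<T>> {
    // Note: calling `alloc` with a zero-sized layout is not allowed,
    // so this sketch assumes `T` is not zero-sized.
    let layout = Layout::new::<MaybeUninit<T>>();
    unsafe {
        let ptr = alloc(layout) as *mut MaybeUninit<T>;
        if ptr.is_null() {
            // `alloc` returns null on failure; abort with the standard handler.
            handle_alloc_error(layout);
        }
        Box::from_raw(ptr)
    }
}
The caller then has to initialize the contents before it can treat the memory as a real T.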

Adding lifetime constraints to non-reference types

I am trying to figure out how to apply Rust lifetimes to add some compile-time enforcement to Erlang NIF modules. NIF modules are shared libraries, normally written in C, that provide extensions to Erlang.
A simplified prototype of the callback you would write in C looks like this:
Handle my_nif_function(Heap *heap, Handle handle);
You are provided a handle and a pointer to the heap that owns it. In your callback you may inspect the input handle, create more handles on the heap, and return one of them as the function return. The heap and all its handles become invalid after your callback returns, so you must not store copies of the heap or its handles during the callback. Unfortunately I’ve seen people do exactly this and it eventually results in a mysterious emulator crash. Can Rust enforce these lifetime constraints?
I think the heap can be easily managed by turning it into a reference.
fn my_nif_function(heap: &Heap, handle: Handle) -> Handle
But how can I link the lifetime of the input and output handles to the heap?
Another wrinkle to this is that you can also create your own heaps and handles which are allowed to live outside the scope of a callback invocation. In C++ I would use std::unique_ptr with a custom destructor. What is the Rust equivalent? The [simplified] C API for managing heaps looks like this:
Heap *create_heap();
void destroy_heap(Heap *);
Reference: NIFs are described here: http://www.erlang.org/doc/man/erl_nif.html . The Erlang names for "heaps" and "handles" are "environments" and "terms". I used the names "heaps" and "handles" so that the question would be more broadly understood.
Rust 1.0
The various marker types have been unified into one: PhantomData
use std::ptr;
use std::marker::PhantomData;

struct Heap {
    ptr: *const u8,
}

impl Heap {
    fn new(c_ptr: *const u8) -> Heap {
        Heap { ptr: c_ptr }
    }

    fn wrap_handle<'a>(&'a self, c_handle: *const u8) -> Handle<'a> {
        Handle {
            ptr: c_handle,
            marker: PhantomData,
        }
    }
}

struct Handle<'a> {
    ptr: *const u8,
    marker: PhantomData<&'a ()>,
}

fn main() {
    let longer_heap = Heap::new(ptr::null());

    let handle = {
        let shorter_heap = Heap::new(ptr::null());

        let longer_handle = longer_heap.wrap_handle(ptr::null());
        let shorter_handle = shorter_heap.wrap_handle(ptr::null());

        // longer_handle // ok to return
        // shorter_handle // error: `shorter_heap` does not live long enough
    };
}
Original Answer
Here's an example of using ContravariantLifetime. We wrap the raw heap pointer into a struct and then wrap raw handle pointers in another struct, reusing the lifetime of the heap.
use std::ptr;
use std::marker::ContravariantLifetime;

struct Heap {
    ptr: *const u8,
}

impl Heap {
    fn new(c_ptr: *const u8) -> Heap {
        Heap { ptr: c_ptr }
    }

    fn wrap_handle<'a>(&'a self, c_handle: *const u8) -> Handle<'a> {
        Handle {
            ptr: c_handle,
            marker: ContravariantLifetime,
        }
    }
}

struct Handle<'a> {
    ptr: *const u8,
    marker: ContravariantLifetime<'a>,
}

fn main() {
    let longer_heap = Heap::new(ptr::null());

    let handle = {
        let shorter_heap = Heap::new(ptr::null());

        let longer_handle = longer_heap.wrap_handle(ptr::null());
        let shorter_handle = shorter_heap.wrap_handle(ptr::null());

        // longer_handle // ok to return
        // shorter_handle // error: `shorter_heap` does not live long enough
    };
}
Lifetime markers
There are 3 lifetime markers. I won't attempt to replicate the reasonably good but dense documentation here, but I can also point to the dense Wikipedia page, which might be of some small assistance. I've listed them in the order you are most likely to use them:
ContravariantLifetime
InvariantLifetime
CovariantLifetime
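One part of the question the answers above don't address directly: heaps created outside a callback, where C++ would use std::unique_ptr with a custom deleter. The usual Rust equivalent is an owning wrapper whose Drop impl calls the C destructor. Below is a minimal sketch against the simplified create_heap/destroy_heap API from the question; the extern declarations and the RawHeap/OwnedHeap names are assumptions made for illustration.
// Opaque C heap type; a zero-variant enum is a common way to model it for FFI.
pub enum RawHeap {}

extern "C" {
    fn create_heap() -> *mut RawHeap;
    fn destroy_heap(heap: *mut RawHeap);
}

pub struct OwnedHeap {
    raw: *mut RawHeap,
}

impl OwnedHeap {
    pub fn new() -> OwnedHeap {
        OwnedHeap { raw: unsafe { create_heap() } }
    }
}

impl Drop for OwnedHeap {
    fn drop(&mut self) {
        // Runs exactly once when the owner goes out of scope,
        // like a std::unique_ptr with a custom deleter.
        unsafe { destroy_heap(self.raw) }
    }
}
Handles can then be tied to a borrow of the owning wrapper with the same PhantomData technique shown above, so they cannot outlive the heap that owns them.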

Resources