When should I call the free() methods generated by wasm-pack?

I wrote some Rust code and compiled it with wasm-pack. I notice these free() methods in the generated .d.ts files:
export class PdfDoc {
free(): void;
...
}
PdfDoc owns a massive amount of memory, up to 1GB, so it's important that all that memory be properly released for reuse when the javascript code is done with it.
Questions:
When should I call these free() methods?
Do I need to call them explicitly or will they be called automatically?
What happens if I never call them?
I searched for "wasm-pack free method" but this combination of search terms didn't find anything useful.

I was wondering the same thing: do I need to carefully pair each new MyStruct() with a call to free() when using wasm-bindgen?
When should I call these free() methods?
Call free() before losing the last reference to the JS object wrapper instance, or earlier if you are done using the object.
Do I need to call them explicitly or will they be called automatically?
Currently, WASM-allocated memory is not freed when the JS object wrapper goes out of scope (but see also weak references below).
What happens if I never call them?
The WASM memory is leaked: without the pointer you can no longer recover it. This might not be a problem for a fixed or limited number of small structs, since the whole WASM memory is released when the page is unloaded.
In more detail:
Looking at the created bindings we see that the memory allocated in the constructors is not tracked elsewhere and effectively lost if we just forget the returned instance (a JS wrapper object that stores the raw pointer as ptr).
The wasm-bindgen Guide also hints at this in Support for Weak References,
mentioning that TC39 weak references are not supported/implemented right now (late 2022):
Without weak references your JS integration may be susceptible to memory leaks in Rust, for example:
You could forget to call .free() on a JS object, leaving the Rust memory allocated.
The wasm-bindgen Guide example WebAudio shows the usage of free() to prevent
leaking memory when repeatedly creating objects that go out of scope. At most one (active) object remains at any time, which closely mirrors your use case:
Cleaning up objects by calling free() when they are not needed anymore and before they go out of scope.
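To make the ownership hand-off concrete, here is a simplified Rust-only model of what happens on the Rust side. This is a sketch under stated assumptions, not the actual glue that wasm-bindgen generates: pdfdoc_new, pdfdoc_free, live_docs_after and the LIVE_DOCS counter are hypothetical names used only for illustration. The idea is that the constructor boxes the value and hands out a raw pointer, and nothing reclaims that allocation until something equivalent to free() rebuilds the Box and drops it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts PdfDoc values that are currently alive (illustration only).
static LIVE_DOCS: AtomicUsize = AtomicUsize::new(0);

pub struct PdfDoc {
    _bytes: Vec<u8>,
}

impl PdfDoc {
    pub fn new(size: usize) -> Self {
        LIVE_DOCS.fetch_add(1, Ordering::SeqCst);
        PdfDoc { _bytes: vec![0; size] }
    }
}

impl Drop for PdfDoc {
    fn drop(&mut self) {
        LIVE_DOCS.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for an exported constructor: the value is boxed and
/// only a raw pointer crosses the boundary, so Rust no longer tracks it.
pub fn pdfdoc_new(size: usize) -> *mut PdfDoc {
    Box::into_raw(Box::new(PdfDoc::new(size)))
}

/// Hypothetical stand-in for free(): rebuilds the Box and drops it.
///
/// # Safety
/// `ptr` must come from `pdfdoc_new` and must not be used or freed again.
pub unsafe fn pdfdoc_free(ptr: *mut PdfDoc) {
    drop(unsafe { Box::from_raw(ptr) });
}

/// Returns how many documents created by this call are still alive afterwards.
pub fn live_docs_after(call_free: bool) -> usize {
    let before = LIVE_DOCS.load(Ordering::SeqCst);
    let ptr = pdfdoc_new(1024);
    if call_free {
        // SAFETY: `ptr` was just returned by `pdfdoc_new` and is freed once.
        unsafe { pdfdoc_free(ptr) };
    }
    LIVE_DOCS.load(Ordering::SeqCst) - before
}
```

In this model, forgetting the pointer without calling pdfdoc_free leaves the document alive for the rest of the program, which is the same situation as dropping the JS wrapper without calling free().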
As an additional aside on careful memory management:
There might be a design catch to watch out for when using Copy types. Consider:
#[wasm_bindgen]
#[derive(Clone, Copy)]
pub struct Bounds {
width: usize,
height: usize,
}
#[wasm_bindgen]
impl Bounds {
// ...
#[wasm_bindgen(getter)]
pub fn width(&self) -> usize {
self.width
}
}
#[wasm_bindgen]
pub struct MyThing {
bounds: Bounds,
// ...
}
#[wasm_bindgen]
impl MyThing {
// ...
#[wasm_bindgen(getter)]
pub fn bounds(&self) -> Bounds {
self.bounds
}
}
This is ordinary, safe code in Rust, but it will leak memory when used naively from JS, because every access to myThing.bounds returns a fresh wrapper object whose free() is never called:
console.log(`Current width is ${myThing.bounds.width} px`);
You might want to watch WASM memory while developing with e.g.
console.log(`WASM memory usage is ${wasm.memory.buffer.byteLength} bytes`);

Related

What does "uninitialized" mean in the context of FFI?

I'm writing some GPU code for macOS using the metal crate. In doing so, I allocate a Buffer object by calling:
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared)
This FFIs to Apple's Metal API, which allocates a region of memory that both the CPU and GPU can access and the Rust wrapper returns a Buffer object. I can then get a pointer to this region of memory by doing:
let data = buffer.contents() as *mut u32
In the colloquial sense, this region of memory is uninitialized. However, is this region of memory "uninitialized" in the Rust sense?
Is this sound?
let num_bytes = num_u32 * std::mem::size_of::<u32>();
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);
let data = buffer.contents() as *mut u32;
let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };
for i in as_slice {
*i = 42u32;
}
Here I'm writing u32s to a region of memory returned to me by FFI. From the nomicon:
...The subtle aspect of this is that usually, when we use = to assign to a value that the Rust type checker considers to already be initialized (like x[i]), the old value stored on the left-hand side gets dropped. This would be a disaster. However, in this case, the type of the left-hand side is MaybeUninit<Box<u32>>, and dropping that does not do anything! See below for some more discussion of this drop issue.
None of the from_raw_parts rules are violated and u32 doesn't have a drop method.
Nonetheless, is this sound?
Would reading from the region (as u32s) before writing to it be sound (nonsense values aside)? The region of memory is valid and u32 is defined for all bit patterns.
Best practices
Now consider a type T that does have a drop method (and you've done all the bindgen and #[repr(C)] nonsense so that it can go across FFI boundaries).
In this situation, should one:
Initialize the buffer in Rust by scanning the region with pointers and calling .write()?
Do:
let as_slice = unsafe { slice::from_raw_parts_mut(data as *mut MaybeUninit<T>, num_t) };
for i in as_slice {
*i = unsafe { MaybeUninit::new(T::new()).assume_init() };
}
Furthermore, after initializing the region, how does the Rust compiler remember this region is initialized on subsequent calls to .contents() later in the program?
Thought experiment
In some cases, the buffer is the output of a GPU kernel and I want to read the results. All the writes occurred in code outside of Rust's control and when I call .contents(), the pointer at the region of memory contains the correct uint32_t values. This thought experiment should relay my concern with this.
Suppose I call C's malloc, which returns an allocated buffer of uninitialized data. Reading u32 values from this buffer (pointers are properly aligned and in bounds), or values of any other type, should fall squarely into undefined behavior.
However, suppose I instead call calloc, which zeros the buffer before returning it. If you don't like calloc, then suppose I have an FFI function that calls malloc, explicitly writes 0 uint32_t types in C, then returns this buffer to Rust. This buffer is initialized with valid u32 bit patterns.
From Rust's perspective, does malloc return "uninitialized" data while calloc returns initialized data?
If the cases are different, how would the Rust compiler know the difference between the two with respect to soundness?
There are multiple parameters to consider when you have an area of memory:
The size of it is the most obvious.
Its alignment is still somewhat obvious.
Whether or not it's initialized -- and notably, for types like bool whether it's initialized with valid values as not all bit-patterns are valid.
Whether it's concurrently read/written.
Focusing on the trickier aspects, the recommendation is:
If the memory is potentially uninitialized, use MaybeUninit.
If the memory is potentially concurrently read/written, use a synchronization method -- be it a Mutex or AtomicXXX or ....
And that's it. Doing so will always be sound, no need to look for "excuses" or "exceptions".
Hence, in your case:
let num_bytes = num_u32 * std::mem::size_of::<u32>();
assert!(num_bytes <= isize::MAX as usize);
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);
let data = buffer.contents() as *mut MaybeUninit<u32>;
// Safety:
// - `data` is valid for reads and writes.
// - `data` points to `num_u32` elements.
// - Access to `data` is exclusive for the duration.
// - `num_u32 * size_of::<u32>() <= isize::MAX`.
let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };
for i in as_slice {
i.write(42); // Yes you can write `*i = MaybeUninit::new(42);` too,
// but why would you?
}
// OR with nightly:
as_slice.write_slice(some_slice_of_u32s);
This is very similar to the post on the users forum mentioned in the comment on your question.
The answers there aren't the most organized, but it seems like there are four main issues with uninitialized memory:
Rust assumes it is initialized
Rust assumes the memory is a valid bit pattern for the type
The OS may overwrite it
Security vulnerabilities from reading freed memory
For #1, this seems to me to not be an issue, since if there were another version of the FFI function that returned initialized memory instead of uninitialized memory, it would look identical to Rust.
I think most people understand #2, and that's not an issue for u32.
#3 could be a problem, but since this is for a specific OS you may be able to ignore this if macOS guarantees it does not do this.
#4 may or may not be undefined behavior, but it is highly undesirable. This is why you should treat the memory as uninitialized even if Rust thinks it's a list of valid u32s. You don't want Rust to think it's valid. Therefore, you should use MaybeUninit even for u32.
MaybeUninit
It's correct to cast the pointer to a slice of MaybeUninit. Your example isn't written correctly, though. assume_init returns T, and you can't assign that to an element from [MaybeUninit<T>]. Fixed:
let as_slice = unsafe { slice::from_raw_parts_mut(data as *mut MaybeUninit<T>, num_t) };
for i in as_slice {
i.write(T::new());
}
Then, turning that slice of MaybeUninit into a slice of T:
let init_slice = unsafe { &mut *(as_slice as *mut [MaybeUninit<T>] as *mut [T]) };
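Putting the two steps together, here is a minimal self-contained sketch. It assumes a Vec<MaybeUninit<u32>> as a stand-in for the FFI-allocated buffer (the real Metal buffer is not used here), and fill and demo_fill are hypothetical helper names. It shows writing every element through MaybeUninit::write and only then reinterpreting the slice as initialized.

```rust
use std::mem::MaybeUninit;

/// Writes `value` into every slot, then returns the same memory as an
/// initialized slice.
pub fn fill(uninit: &mut [MaybeUninit<u32>], value: u32) -> &mut [u32] {
    for slot in uninit.iter_mut() {
        // `write` never reads or drops the old (possibly uninitialized) contents.
        slot.write(value);
    }
    // SAFETY: every element was written above, and MaybeUninit<u32> has the
    // same size and alignment as u32.
    unsafe { &mut *(uninit as *mut [MaybeUninit<u32>] as *mut [u32]) }
}

/// Hypothetical stand-in for the FFI buffer: memory that is allocated but
/// not initialized.
pub fn demo_fill(len: usize) -> Vec<u32> {
    let mut backing: Vec<MaybeUninit<u32>> = Vec::with_capacity(len);
    // SAFETY: the capacity is at least `len`, and MaybeUninit<u32> elements
    // do not need to be initialized.
    unsafe { backing.set_len(len) };
    fill(&mut backing, 42).to_vec()
}
```

Taking the slice as &mut [MaybeUninit<u32>] keeps fill a safe function: the only unsafe step is the final reinterpretation, which is justified by the loop directly above it.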
Another issue is that &mut may not be correct to have at all here, since you say the memory is shared between GPU and CPU. Rust depends on your Rust code being the only thing that can access &mut data, so you need to ensure any &mut are gone while the GPU accesses the memory. If you want to interleave Rust access and GPU access, you need to synchronize them somehow, and only store *mut while the GPU has access (or reacquire it from FFI).
Notes
The code is mainly taken from Initializing an array element-by-element in the MaybeUninit doc, plus the very useful Alternatives section from transmute. The conversion from &mut [MaybeUninit<T>] to &mut [T] is how slice_assume_init_mut is written as well. You don't need to transmute like in the other examples since it is behind a pointer. Another similar example is in the nomicon: Unchecked Uninitialized Memory. That one accesses the elements by index, but it seems like doing that, using * on each &mut MaybeUninit<T>, and calling write are all valid. I used write since it's shortest and is easy to understand. The nomicon also says that using ptr methods like write is also valid, which should be equivalent to using MaybeUninit::write.
There are some nightly [MaybeUninit] methods that will be helpful in the future, like slice_assume_init_mut.

Are read_volatile and write_volatile atomic for usize?

I want to use read_volatile and write_volatile for IPC using shared memory. Is it guaranteed that writing of an unsigned integer of usize type will be atomic?
At the time of this writing, Rust does not have a proper memory model; instead it uses the one imposed by LLVM, which is basically that of C++, which in turn is inherited from C. So the best reference you have for what is guaranteed when doing memory stuff is that of C.
In C, volatile should not be used for synchronization; its intended use is for memory-mapped I/O and maybe for single-threaded signal handlers. See for example this Linux-kernel-specific guideline. Or this other description of volatile:
This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution.
If you want concurrent access to a value, you should use atomic operations. They have the volatile guarantee plus additional ones. They are guaranteed to be atomic even in the presence of concurrent access. Moreover, they allow you to set the ordering mode.
For your particular case you should use AtomicUsize. Note that the availability of that type is conditioned on your architecture having the necessary support, but that is exactly what you want.
Note that an AtomicUsize has the same memory layout as a plain usize, so if you have a usize embedded in a shared struct you can access it atomically with a pointer cast. I think this code is sound:
use std::sync::atomic::{AtomicUsize, Ordering};
struct SharedData {
// ...
x: usize,
}
fn test(data: *mut SharedData) {
// Reinterpret the address of the plain usize field as an AtomicUsize.
let x = unsafe { &*(&(*data).x as *const usize as *const AtomicUsize) };
let _ = x.load(Ordering::Relaxed);
}
Although you would be better just declaring that x as AtomicUsize directly.
Also note that reading or writing that value using any non-atomic operation (even just reading it out of curiosity, even using volatile access) invokes Undefined Behavior.
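As a minimal sketch of the recommended variant, the field can be declared as AtomicUsize from the start. The example below uses a second thread in the same process as a stand-in for the other side of the shared memory; publish, observe and demo_handoff are hypothetical names, and std::thread::scope is assumed to be available (stable since Rust 1.63). A real IPC setup would obtain the SharedData reference from the mapped region instead.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

#[repr(C)]
pub struct SharedData {
    // Declared atomic up front, so no pointer cast is needed.
    pub x: AtomicUsize,
}

/// Writer side: a single atomic store with Release ordering.
pub fn publish(data: &SharedData, value: usize) {
    data.x.store(value, Ordering::Release);
}

/// Reader side: a single atomic load with Acquire ordering.
pub fn observe(data: &SharedData) -> usize {
    data.x.load(Ordering::Acquire)
}

/// A scoped thread plays the role of the other process.
pub fn demo_handoff() -> usize {
    let shared = SharedData { x: AtomicUsize::new(0) };
    thread::scope(|s| {
        s.spawn(|| publish(&shared, 7));
    });
    // The scope has joined the writer, so its store is visible here.
    observe(&shared)
}
```

Release on the store and Acquire on the load are one reasonable choice when the value signals that other data is ready; Relaxed, as in the answer's snippet, is enough if only the value itself matters.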

Does moving data to Rc/Arc always copy it from the stack to the heap?

Take a look at the following simple example:
use std::rc::Rc;
struct MyStruct {
a: i8,
}
fn main() {
let mut my_struct = MyStruct { a: 0 };
my_struct.a = 5;
let my_struct_rc = Rc::new(my_struct);
println!("my_struct_rc.a = {}", my_struct_rc.a);
}
The official documentation of Rc says:
The type Rc<T> provides shared ownership of a value of type T,
allocated in the heap.
Theoretically it is clear. But, firstly my_struct is not immediately wrapped into Rc, and secondly MyStruct is a very simple type. I can see 2 scenarios here.
When my_struct is moved into the Rc the memory content is literally copied from the stack to the heap.
The compiler is able to resolve that my_struct will be moved into the Rc, so it puts it on the heap from the beginning.
If number 1 is true, then there might be a hidden performance bottleneck as when reading the code one does not explicitly see memory being copied (I am assuming MyStruct being much more complex).
If number 2 is true, I wonder whether the compiler is always able to resolve such things. The provided example is very simple, but I can imagine that my_struct is much more complex and is mutated several times by different functions before being moved to the Rc.
Tl;dr It could be either scenario, but for the most part, you should just write code in the most obvious way and let the compiler worry about it.
According to the semantics of the abstract machine, that is, the theoretical model of computation that defines Rust's behavior, there is always a copy. In fact, there are at least two: my_struct is first created in the stack frame of main, but then has to be moved into the stack frame of Rc::new. Then Rc::new has to create an allocation and move my_struct a second time, from its own stack frame into the newly allocated memory*. Each of these moves is conceptually a copy.
However, this analysis isn't particularly useful for predicting the performance of code in practice, for three reasons:
Copies are actually pretty darn cheap. Moving my_struct from one place to another may actually be much cheaper, in the long run, than referencing it with a pointer. Copying a chunk of bytes is easy to optimize on modern processors; following a pointer to some arbitrary location is not. (Bear in mind also that the complexity of the structure is irrelevant because all moves are bytewise copies; for instance, moving any Vec is just copying three usizes regardless of the contents.)
If you haven't measured the performance and shown that excessive copying is a problem, you must not assume that it is without evidence: you may accidentally pessimize instead of optimizing your code. Measure first.
The semantics of the abstract machine is not the semantics of your real machine. The whole point of an optimizing compiler is to figure out the best way to transform one to the other. Under reasonable assumptions, it's very unlikely that the code here would result in 2 copies with optimizations turned on. But how the compiler eliminates one or both copies may be dependent on the rest of the code: not just on the snippet that contains them but on how the data is initialized and so forth. Real machine performance is complicated and generally requires analysis of more than just a few lines at a time. Again, this is the whole point of an optimizing compiler: it can do a much more comprehensive analysis, much faster than you or I can.
Even if the compiler leaves a copy "on the table", you shouldn't assume without evidence that removing the copy would make things better simply because it is a copy. Measure first.
It probably doesn't matter anyway, in this case. Requesting a new allocation from the heap is likely† more expensive than copying a bunch of bytes from one place to another, so fiddling around with 1 fast copy vs. no copies while ignoring a (plausible) big bottleneck is probably a waste of time. Don't try to optimize things before you've profiled your application or library to see where the most performance is being lost. Measure first.
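The point about moves being shallow bytewise copies can be checked with a small sketch. The function name buffer_survives_move is hypothetical. It moves a Vec into an Rc and compares the address of the Vec's heap buffer before and after: only the Vec's pointer, length and capacity are copied into the Rc allocation, while the elements stay where they were.

```rust
use std::rc::Rc;

/// Returns true if the Vec's heap buffer is at the same address after the
/// Vec itself has been moved into an Rc.
pub fn buffer_survives_move() -> bool {
    let v = vec![1u8; 1024];
    let before = v.as_ptr();
    // Moves the Vec handle into the Rc's heap allocation; the 1024 elements
    // are not touched.
    let rc = Rc::new(v);
    let after = rc.as_slice().as_ptr();
    before == after
}
```

So even for a large collection, the cost of the move into Rc::new is at most a few machine words, on top of the allocation that Rc makes for itself.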
See also
Questions about overflowing the stack by accidentally putting large data on it (to which the solution is usually to use Vec instead of an array):
How to allocate arrays on the heap in Rust 1.0?
Thread '<main>' has overflowed its stack when allocating a large array using Box
* Rc, although part of the standard library, is written in plain Rust code, which is how I analyze it here. Rc could theoretically be subject to guaranteed optimizations that aren't available to ordinary code, but that doesn't happen to be relevant to this case.
† Depending at least on the allocator and on whether new memory must be acquired from the OS or if a recently freed allocation can be re-used.
You can just test what happens:
Try to use my_struct after creating an Rc out of it. The value has been moved, so you can't use it.
use std::rc::Rc;
struct MyStruct {
a: i8,
}
fn main() {
let mut my_struct = MyStruct { a: 0 };
my_struct.a = 5;
let my_struct_rc = Rc::new(my_struct);
println!("my_struct_rc.a = {}", my_struct_rc.a);
// Add this line. Compilation error "borrow of moved value"
println!("my_struct.a = {}", my_struct.a);
}
Make your struct implement the Copy trait, and it will be automatically copied into the Rc::new function. Now the code above works, because the my_struct variable is not moved anywhere, just copied.
#[derive(Clone, Copy)]
struct MyStruct {
a: i8,
}
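A minimal sketch of what that means in practice (copy_then_share is a hypothetical name): with Copy, the Rc holds its own copy, so changing the original afterwards does not affect the value inside the Rc.

```rust
use std::rc::Rc;

#[derive(Clone, Copy)]
pub struct MyStruct {
    pub a: i8,
}

/// Returns (value of the original, value inside the Rc).
pub fn copy_then_share() -> (i8, i8) {
    let mut my_struct = MyStruct { a: 0 };
    my_struct.a = 5;
    // MyStruct is Copy, so Rc::new receives a copy and my_struct stays usable.
    let my_struct_rc = Rc::new(my_struct);
    // Changing the original does not change the copy stored in the Rc.
    my_struct.a = 9;
    (my_struct.a, my_struct_rc.a)
}
```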
The compiler is able to resolve that my_struct will be moved into the Rc, so it puts it on the heap from the beginning.
Take a look at Rc::new source code (removed the comment which is irrelevant).
struct RcBox<T: ?Sized> {
strong: Cell<usize>,
weak: Cell<usize>,
value: T,
}
// ...
pub fn new(value: T) -> Rc<T> {
Self::from_inner(Box::into_raw_non_null(box RcBox {
strong: Cell::new(1),
weak: Cell::new(1),
value,
}))
}
It takes the value you pass to it, and creates a Box, so it's always put on the heap. This is plain Rust and I don't think it performs too many sophisticated optimizations, but that may change.
Note that "move" in Rust may also copy data implicitly, and this may depend on the current compiler's behavior. In that case, if you are concerned about performance you can try to make the struct as small as possible, and store some information on the heap. For example when a Vec<T> is moved, as far as I know it only copies the capacity, length and pointer to the heap, but the actual array which is on the heap is not copied element by element, so only a few bytes are copied when moving a vector (assuming the data is copied, because that's also subject to compiler optimizations in case copying is not actually needed).

Correctly storing a Rust Rc<T> in C-managed memory

I'm wrapping a Rust object to be used from Lua. I need the object to be destroyed when neither Rust code nor Lua still has a reference to it, so the obvious (to me) solution is to use Rc<T>, stored in Lua-managed memory.
The Lua API (I'm using rust-lua53 for now) lets you allocate a chunk of memory and attach methods and a finalizer to it, so I want to store an Rc<T> into that chunk of memory.
My current attempt looks like this. First, creating an object:
/* Allocate a block of uninitialized memory to use */
let p = state.new_userdata(mem::size_of::<Rc<T>>() as size_t) as *mut Rc<T>;
/* Make a ref-counted pointer to a Rust object */
let rc = Rc::<T>::new(...);
/* Store the Rc */
unsafe { ptr::write(p, rc) };
And in the finaliser:
let p: *mut Rc<T> = ...; /* Get a pointer to the item to finalize */
unsafe { ptr::drop_in_place(p) }; /* Release the object */
Now this seems to work (as briefly tested by adding a println!() to the drop method). But is it correct and safe (as long as I make sure it's not accessed after finalization)? I don't feel confident enough in unsafe Rust to be sure that it's ok to ptr::write an Rc<T>.
I'm also wondering about, rather than storing an Rc<T> directly, storing an Option<Rc<T>>; then instead of drop_in_place() I would ptr::swap() it with None. This would make it easy to handle any use after finalization.
Now this seems to work (as briefly tested by adding a println!() to the drop method). But is it correct and safe (as long as I make sure it's not accessed after finalisation)? I don't feel confident enough in unsafe Rust to be sure that it's ok to ptr::write an Rc<T>.
Yes, you may ptr::write any Rust type to any memory location. This "leaks" the Rc<T> object, but writes a bit-equivalent to the target location.
When using it, you need to guarantee that no one modified it outside of Rust code and that you are still in the same thread as the one where it was created. If you want to be able to move across threads, you need to use Arc.
Rust's thread safety cannot protect you here, because you are using raw pointers.
I'm also wondering about, rather than storing an Rc<T> directly, storing an Option<Rc<T>>; then instead of drop_in_place() I would ptr::swap() it with None. This would make it easy to handle any use after finalisation.
The counterpart to ptr::write is ptr::read. So if you can guarantee that no one else ever tries to ptr::read or drop_in_place() the object, then you can just call ptr::read (which returns the object) and use that object as you would use any other Rc<T> object. You don't need to care about dropping or anything, because now it's back in Rust's control.
You should also be using new_userdata_typed instead of new_userdata, since that takes the memory handling off your hands. There are other convenience wrapper functions ending with the postfix _typed for most userdata needs.
Your code will work; of course, note that the drop_in_place(p) will just decrease the counter of the Rc and only drop the contained T if and only if it was the last reference, which is the correct action.
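The reference-count behaviour can be observed in a self-contained sketch. It assumes a raw allocation from std::alloc as a stand-in for the block that Lua's new_userdata would return (no Lua binding is involved), and store_and_finalize is a hypothetical name. It writes an Rc into foreign-style memory with ptr::write and releases it with ptr::drop_in_place, as in the question.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr;
use std::rc::Rc;

/// Returns the strong count (while stored, after finalization) as seen from
/// a second handle that stays on the Rust side.
pub fn store_and_finalize() -> (usize, usize) {
    let rc = Rc::new(String::from("payload"));
    let keep = Rc::clone(&rc);

    // Stand-in for memory owned by the host (e.g. a userdata block).
    let layout = Layout::new::<Rc<String>>();
    // SAFETY: the layout of Rc<String> has a non-zero size.
    let p = unsafe { alloc(layout) } as *mut Rc<String>;
    assert!(!p.is_null());

    // Move the Rc into the foreign memory without dropping anything there.
    // SAFETY: `p` is non-null, properly aligned and valid for writes.
    unsafe { ptr::write(p, rc) };
    let while_stored = Rc::strong_count(&keep);

    // "Finalizer": drop the stored Rc in place, decrementing the count.
    // SAFETY: `p` holds a valid Rc that is never used again afterwards.
    unsafe { ptr::drop_in_place(p) };
    let after_finalize = Rc::strong_count(&keep);

    // SAFETY: `p` came from `alloc` with this same layout.
    unsafe { dealloc(p as *mut u8, layout) };
    (while_stored, after_finalize)
}
```

The String itself is only dropped once keep goes away as well, which matches the "last reference" behaviour described above.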

Is it possible to have safe mutable aliasing to non-overlapping memory?

I'm looking for a way to take a large object and break it into smaller mutable child objects, which can be processed in parallel.
Something like:
struct PixelBuffer { data:Vec<u32>, width:u32, height:u32 }
struct PixelBlock { data:Vec<u32> }
impl PixelBuffer {
fn decompose<'a>(&'a mut self) -> Vec<Guard<'a, PixelBlock>> {
...
}
}
Where the resulting PixelBlocks can be processed in parallel, and the parent PixelBuffer will remain locked until all Guard<PixelBlock> are dropped.
This is effectively mutable pointer aliasing; the large data block in PixelBuffer will be directly modified via each PixelBlock.
However, each PixelBlock is a non-overlapping segment of the internal data in PixelBuffer.
You can certainly do this in unsafe code (internal buffer is a raw pointer; generate a new external pointer for each PixelBlock); but is it possible to achieve the same result using safe code?
(NB. I'm open to using a data block allocated from libc::malloc if that'll help?)
This works fine and is a natural consequence of how, e.g., iterators work: the next method hands out a sequence of values that are not lifetime-connected to the reference they come from, i.e. fn next(&mut self) -> Option<Self::Item>. This automatically means that any iterator that yields &mut pointers (like, slice.iter_mut()) is yielding pointers to non-overlapping memory, because anything else would be incorrect.
One way to use this in parallel is something like my simple_parallel library, e.g. Pool::for_.
(You'll need to give more details about the internals of PixelBuffer to be more specific about how to do it in this case.)
There is no way to completely avoid unsafe Rust, because the compiler cannot currently evaluate the safety of sub-slices. However, the standard library contains code that provides a safe wrapper that you can use.
Read up on std::slice::Chunks and std::slice::ChunksMut.
Sample code: https://play.rust-lang.org/?gist=ceec5be3e1530c0a6d3b&version=stable
However, your next problem is sending the slices to separate threads, because the best way to do that would be thread::scoped, which is currently deprecated due to some safety problems that were discovered this year...
Also, keep in mind that Vec<_> owns its contents, whereas slices are just a view. Generally, you want to write most functions in terms of slices, and keep only one "Vec" to hold the data.
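As a rough sketch of how the pieces fit together on a newer toolchain, chunks_mut can be combined with std::thread::scope (stable since Rust 1.63), which lets threads borrow the non-overlapping slices directly. The method fill_rows_parallel and the helper demo_blocks are hypothetical, and the per-block work (writing the block index into each pixel) is only a placeholder for real processing.

```rust
use std::thread;

pub struct PixelBuffer {
    pub data: Vec<u32>,
    pub width: usize,
    pub height: usize,
}

impl PixelBuffer {
    /// Splits the buffer into blocks of `rows_per_block` rows and processes
    /// each block on its own thread, using only safe code.
    pub fn fill_rows_parallel(&mut self, rows_per_block: usize) {
        let block_len = self.width * rows_per_block;
        assert!(block_len > 0, "block size must be non-zero");
        thread::scope(|s| {
            // chunks_mut yields non-overlapping &mut [u32] slices.
            for (i, block) in self.data.chunks_mut(block_len).enumerate() {
                s.spawn(move || {
                    for px in block.iter_mut() {
                        *px = i as u32;
                    }
                });
            }
        });
        // All threads have been joined here, so `self` is usable again.
    }
}

pub fn demo_blocks() -> Vec<u32> {
    let mut buf = PixelBuffer { data: vec![0; 8], width: 2, height: 4 };
    buf.fill_rows_parallel(2);
    buf.data
}
```

The borrow checker plays the role of the Guard in the question: the PixelBuffer stays mutably borrowed until the scope ends and every block has been released.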
