I'm writing some GPU code for macOS using the metal crate. In doing so, I allocate a Buffer object by calling:
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared)
This FFIs to Apple's Metal API, which allocates a region of memory that both the CPU and GPU can access and the Rust wrapper returns a Buffer object. I can then get a pointer to this region of memory by doing:
let data = buffer.contents() as *mut u32
In the colloquial sense, this region of memory is uninitialized. However, is this region of memory "uninitialized" in the Rust sense?
Is this sound?
let num_bytes = num_u32 * std::mem::size_of::<u32>();
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);
let data = buffer.contents() as *mut u32;
let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };
for i in as_slice {
*i = 42u32;
}
Here I'm writing u32s to a region of memory returned to me by FFI. From the nomicon:
...The subtle aspect of this is that usually, when we use = to assign to a value that the Rust type checker considers to already be initialized (like x[i]), the old value stored on the left-hand side gets dropped. This would be a disaster. However, in this case, the type of the left-hand side is MaybeUninit<Box>, and dropping that does not do anything! See below for some more discussion of this drop issue.
None of the from_raw_parts rules are violated and u32 doesn't have a drop method.
Nonetheless, is this sound?
Would reading from the region (as u32s) before writing to it be sound (nonsense values aside)? The region of memory is valid and u32 is defined for all bit patterns.
Best practices
Now consider a type T that does have a drop method (and you've done all the bindgen and #[repr(C)] nonsense so that it can go across FFI boundaries).
In this situation, should one:
Initialize the buffer in Rust by scanning the region with pointers and calling .write()?
Do:
let as_slice = unsafe { slice::from_raw_parts_mut(data as *mut MaybeUninit<T>, num_t) };
for i in as_slice {
*i = unsafe { MaybeUninit::new(T::new()).assume_init() };
}
Furthermore, after initializing the region, how does the Rust compiler remember this region is initialized on subsequent calls to .contents() later in the program?
Thought experiment
In some cases, the buffer is the output of a GPU kernel and I want to read the results. All the writes occurred in code outside of Rust's control and when I call .contents(), the pointer at the region of memory contains the correct uint32_t values. This thought experiment should relay my concern with this.
Suppose I call C's malloc, which returns an allocated buffer of uninitialized data. Does reading u32 values from this buffer (pointers are properly aligned and in bounds) as any type should fall squarely into undefined behavior.
However, suppose I instead call calloc, which zeros the buffer before returning it. If you don't like calloc, then suppose I have an FFI function that calls malloc, explicitly writes 0 uint32_t types in C, then returns this buffer to Rust. This buffer is initialized with valid u32 bit patterns.
From Rust's perspective, does malloc return "uninitialized" data while calloc returns initialized data?
If the cases are different, how would the Rust compiler know the difference between the two with respect to soundness?
There are multiple parameters to consider when you have an area of memory:
The size of it is the most obvious.
Its alignment is still somewhat obvious.
Whether or not it's initialized -- and notably, for types like bool whether it's initialized with valid values as not all bit-patterns are valid.
Whether it's concurrently read/written.
Focusing on the trickier aspects, the recommendation is:
If the memory is potentially uninitialized, use MaybeUninit.
If the memory is potentially concurrently read/written, use a synchronization method -- be it a Mutex or AtomicXXX or ....
And that's it. Doing so will always be sound, no need to look for "excuses" or "exceptions".
Hence, in your case:
let num_bytes = num_u32 * std::mem::size_of::<u32>();
assert!(num_bytes <= isize::MAX as usize);
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);
let data = buffer.contents() as *mut MaybeUninit<u32>;
// Safety:
// - `data` is valid for reads and writes.
// - `data` points to `num_u32` elements.
// - Access to `data` is exclusive for the duration.
// - `num_u32 * size_of::<u32>() <= isize::MAX`.
let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };
for i in as_slice {
i.write(42); // Yes you can write `*i = MaybeUninit::new(42);` too,
// but why would you?
}
// OR with nightly:
as_slice.write_slice(some_slice_of_u32s);
This is very similar to this post on the users forum mentioned in the comment on your question. (here's some links from that post: 2 3)
The answers there aren't the most organized, but it seems like there's four main issues with uninitialized memory:
Rust assumes it is initialized
Rust assumes the memory is a valid bit pattern for the type
The OS may overwrite it
Security vulnerabilities from reading freed memory
For #1, this seems to me to not be an issue, since if there was another version of the FFI function that returned initialized memory instead of uninitialized memory, it would look identical to rust.
I think most people understand #2, and that's not an issue for u32.
#3 could be a problem, but since this is for a specific OS you may be able to ignore this if MacOS guarantees it does not do this.
#4 may or may not be undefined behavior, but it is highly undesirable. This is why you should treat it as uninitialized even if rust thinks it's a list of valid u32s. You don't want rust to think it's valid. Therefore, you should use MaybeUninit even for u32.
MaybeUninit
It's correct to cast the pointer to a slice of MaybeUninit. Your example isn't written correctly, though. assume_init returns T, and you can't assign that to an element from [MaybeUninit<T>]. Fixed:
let as_slice = unsafe { slice::from_raw_parts_mut(data as *mut MaybeUninit<T>, num_t) };
for i in as_slice {
i.write(T::new());
}
Then, turning that slice of MaybeUninit into a slice of T:
let init_slice = unsafe { &mut *(as_slice as *mut [MaybeUninit<T>] as *mut [T]) };
Another issue is that &mut may not be correct to have at all here since you say it's shared between GPU and CPU. Rust depends on your rust code being the only thing that can access &mut data, so you need to ensure any &mut are gone while the GPU accesses the memory. If you want to interlace rust access and GPU access, you need to synchronize them somehow, and only store *mut while the GPU has access (or reacquire it from FFI).
Notes
The code is mainly taken from Initializing an array element-by-element in the MaybeUninit doc, plus the very useful Alternatives section from transmute. The conversion from &mut [MaybeUninit<T>] to &mut [T] is how slice_assume_init_mut is written as well. You don't need to transmute like in the other examples since it is behind a pointer. Another similar example is in the nomicon: Unchecked Uninitialized Memory. That one accesses the elements by index, but it seems like doing that, using * on each &mut MaybeUninit<T>, and calling write are all valid. I used write since it's shortest and is easy to understand. The nomicon also says that using ptr methods like write is also valid, which should be equivalent to using MaybeUninit::write.
There's some nightly [MaybeUninit] methods that will be helpful in the future, like slice_assume_init_mut
Related
In vulkano, to create a CPUAccessibleBuffer you need give it some data and the CPUAccessibleBuffer::from_data function requires the data to have the 'static lifetime.
I have some data in &[u8] array (created at runtime) that I would like to pass to that function.
However, it errors with this message
argument requires that `data` is borrowed for `'static`
So how can I make the lifetime of the data 'static ?
You should use CpuAccessibleBuffer::from_iter instead, it does the same thing but does not require the collection to be Copy or 'static:
let data: &[u8] = todo!();
let _ = CpuAccessibleBuffer::from_iter(
device,
usage,
host_cached,
data.iter().copied(), // <--- pass like so
);
Or if you actually have a Vec<u8>, you can pass it directly:
let data: Vec<u8> = todo!();
let _ = CpuAccessibleBuffer::from_iter(
device,
usage,
host_cached,
data, // <--- pass like so
);
If you really must create the data at runtime, and you really need to last for 'static, then you can use one of the memory leaking methods such as Box::leak or Vec::leak to deliberately leak a heap allocation and ensure it is never freed.
While leaking memory is normally something one avoids, in this case it's actually a sensible thing to do. If the data must live forever then leaking it is actually the correct thing to do, semantically speaking. You don't want the memory to be freed, not ever, which is exactly what happens when memory is leaked.
Example:
fn require_static_data(data: &'static [u8]) {
unimplemented!()
}
fn main() {
let data = vec![1, 2, 3, 4];
require_static_data(data.leak());
}
Playground
That said, really think over the reallys I led with. Make sure you understand why the code you're calling wants 'static data and ask yourself why your data isn't already 'static.
Is it possible to create the data at compile time? Rust has a powerful build time macro system. It's possible, for example, to use include_bytes! to read in a file and do some processing on it before it's embedded into your executable.
Is there another API you can use, another function call you're not seeing that doesn't require 'static?
(These questions aren't for you specifically, but for anyone who comes across this Q&A in the future.)
If the data is created at runtime, it can't have a static lifetime. Static means that data is present for the whole lifetime of the program, which is necessary in some contexts, especially when threading is involved. One way for data to be static is, as Paul already answered, explicitly declaring it as such, i.e.:
static constant_value: i32 = 0;
However, there's no universally applicable way to make arbitrary data static. This type of inference is made at compile-time by the borrow checker, not by the programmer.
Usually if a function requires 'static (type) arguments (as in this case) it means that anything less could potentially be unsafe, and you need to reorganize the way data flows in and out of your program to provide this type of data safely. Unfortunately, that's not something SO can provide within the scope of this question.
Make a constant with static lifetime:
static NUM: i32 = 18;
I'm playing with unsafe rust and trying to implement and I've found something I don't understand. I thought for sure I'd have a dangling pointer and that I'd get some kind of runtime error when trying to run this, but I don't.
fn main() {
let s1 = String::from("s1");
let s1_raw_ptr: *const String = &s1;
drop(s1);
unsafe {
let s = &*s1_raw_ptr;
println!("s recovered from raw pointer: {:?}", s);
}
}
This outputs:
s recovered from raw pointer: "s1"
I thought that when a value goes out of scope in Rust that it is immediately cleaned up. How is it that dereferencing a raw pointer to a now-dropped value is working?
When a String is dropped in Rust, ultimately what ends up getting called is Allocator::deallocate on the system allocator. After this, using the data is undefined behaviour, so anything could happen! But in practice what tends to happen if there aren't any funky compiler optimizations is that you just get whatever data is stored in memory there. If there aren't any new allocations at that place, then you just get whatever data was there before.
When a memory allocation is freed, nothing happens to the allocation. Clearing the data by setting it to all zero (or some other value) would be pointless, since any newly allocated memory always needs to be initialized by the user of that memory.
Take a look at the following simple example:
use std::rc::Rc;
struct MyStruct {
a: i8,
}
fn main() {
let mut my_struct = MyStruct { a: 0 };
my_struct.a = 5;
let my_struct_rc = Rc::new(my_struct);
println!("my_struct_rc.a = {}", my_struct_rc.a);
}
The official documentation of Rc says:
The type Rc<T> provides shared ownership of a value of type T,
allocated in the heap.
Theoretically it is clear. But, firstly my_struct is not immediately wrapped into Rc, and secondly MyStruct is a very simple type. I can see 2 scenarios here.
When my_struct is moved into the Rc the memory content is literally copied from the stack to the heap.
The compiler is able to resolve that my_struct will be moved into the Rc, so it puts it on the heap from the beginning.
If number 1 is true, then there might be a hidden performance bottleneck as when reading the code one does not explicitly see memory being copied (I am assuming MyStruct being much more complex).
If number 2 is true, I wonder whether the compiler is always able to resolve such things. The provided example is very simple, but I can imagine that my_struct is much more complex and is mutated several times by different functions before being moved to the Rc.
Tl;dr It could be either scenario, but for the most part, you should just write code in the most obvious way and let the compiler worry about it.
According to the semantics of the abstract machine, that is, the theoretical model of computation that defines Rust's behavior, there is always a copy. In fact, there are at least two: my_struct is first created in the stack frame of main, but then has to be moved into the stack frame of Rc::new. Then Rc::new has to create an allocation and move my_struct a second time, from its own stack frame into the newly allocated memory*. Each of these moves is conceptually a copy.
However, this analysis isn't particularly useful for predicting the performance of code in practice, for three reasons:
Copies are actually pretty darn cheap. Moving my_struct from one place to another may actually be much cheaper, in the long run, than referencing it with a pointer. Copying a chunk of bytes is easy to optimize on modern processors; following a pointer to some arbitrary location is not. (Bear in mind also that the complexity of the structure is irrelevant because all moves are bytewise copies; for instance, moving any Vec is just copying three usizes regardless of the contents.)
If you haven't measured the performance and shown that excessive copying is a problem, you must not assume that it is without evidence: you may accidentally pessimize instead of optimizing your code. Measure first.
The semantics of the abstract machine is not the semantics of your real machine. The whole point of an optimizing compiler is to figure out the best way to transform one to the other. Under reasonable assumptions, it's very unlikely that the code here would result in 2 copies with optimizations turned on. But how the compiler eliminates one or both copies may be dependent on the rest of the code: not just on the snippet that contains them but on how the data is initialized and so forth. Real machine performance is complicated and generally requires analysis of more than just a few lines at a time. Again, this is the whole point of an optimizing compiler: it can do a much more comprehensive analysis, much faster than you or I can.
Even if the compiler leaves a copy "on the table", you shouldn't assume without evidence that removing the copy would make things better simply because it is a copy. Measure first.
It probably doesn't matter anyway, in this case. Requesting a new allocation from the heap is likely† more expensive than copying a bunch of bytes from one place to another, so fiddling around with 1 fast copy vs. no copies while ignoring a (plausible) big bottleneck is probably a waste of time. Don't try to optimize things before you've profiled your application or library to see where the most performance is being lost. Measure first.
See also
Questions about overflowing the stack by accidentally putting large data on it (to which the solution is usually to use Vec instead of an array):
How to allocate arrays on the heap in Rust 1.0?
Thread '<main>' has overflowed its stack when allocating a large array using Box
* Rc, although part of the standard library, is written in plain Rust code, which is how I analyze it here. Rc could theoretically be subject to guaranteed optimizations that aren't available to ordinary code, but that doesn't happen to be relevant to this case.
† Depending at least on the allocator and on whether new memory must be acquired from the OS or if a recently freed allocation can be re-used.
You can just test what happens:
Try to use my_struct after creating an Rc out of it. The value has been moved, so you can't use it.
use std::rc::Rc;
struct MyStruct {
a: i8,
}
fn main() {
let mut my_struct = MyStruct { a: 0 };
my_struct.a = 5;
let my_struct_rc = Rc::new(my_struct);
println!("my_struct_rc.a = {}", my_struct_rc.a);
// Add this line. Compilation error "borrow of moved value"
println!("my_struct.a = {}", my_struct.a);
}
Make your struct implement the Copy trait, and it will be automatically copied into the Rc::new function. Now the code above works, because the my_struct variable is not moved anywhere, just copied.
#[derive(Clone, Copy)]
struct MyStruct {
a: i8,
}
The compiler is able to resolve that my_struct will be moved into the Rc, so it puts it on the heap from the beginning.
Take a look at Rc::new source code (removed the comment which is irrelevant).
struct RcBox<T: ?Sized> {
strong: Cell<usize>,
weak: Cell<usize>,
value: T,
}
// ...
pub fn new(value: T) -> Rc<T> {
Self::from_inner(Box::into_raw_non_null(box RcBox {
strong: Cell::new(1),
weak: Cell::new(1),
value,
}))
}
It takes the value you pass to it, and creates a Box, so it's always put on the heap. This is plain Rust and I don't think it performs too many sophisticated optimizations, but that may change.
Note that "move" in Rust may also copy data implicitly, and this may depend on the current compiler's behavior. In that case, if you are concerned about performance you can try to make the struct as small as possible, and store some information on the heap. For example when a Vec<T> is moved, as far as I know it only copies the capacity, length and pointer to the heap, but the actual array which is on the heap is not copied element by element, so only a few bytes are copied when moving a vector (assuming the data is copied, because that's also subject to compiler optimizations in case copying is not actually needed).
I'm trying to understand what exactly the Rust aliasing/memory model allows. In particular I'm interested in when accessing memory outside the range you have a reference to (which might be aliased by other code on the same or different threads) becomes undefined behaviour.
The following examples all access memory outside what is ordinarily allowed, but in ways that would be safe if the compiler produced the obvious assembly code. In addition, I see little conflict potential with compiler optimization, but they might still violate strict aliasing rules of Rust or LLVM thus constituting undefined behavior.
The operations are all properly aligned and thus cannot cross a cache-line or page boundary.
Read the aligned 32-bit word surrounding the data we want to access and discard the parts outside of what we're allowed to read.
Variants of this could be useful in SIMD code.
pub fn read(x: &u8) -> u8 {
let pb = x as *const u8;
let pw = ((pb as usize) & !3) as *const u32;
let w = unsafe { *pw }.to_le();
(w >> ((pb as usize) & 3) * 8) as u8
}
Same as 1, but reads the 32-bit word using an atomic_load intrinsic.
pub fn read_vol(x: &u8) -> u8 {
let pb = x as *const u8;
let pw = ((pb as usize) & !3) as *const AtomicU32;
let w = unsafe { (&*pw).load(Ordering::Relaxed) }.to_le();
(w >> ((pb as usize) & 3) * 8) as u8
}
Replace the aligned 32-bit word containing the value we care about using CAS. It overwrites the parts outside what we're allowed to access with what's already in there, so it only affects the parts we're allowed to access.
This could be useful to emulate small atomic types using bigger ones. I used AtomicU32 for simplicity, in practice AtomicUsize is the interesting one.
pub fn write(x: &mut u8, value:u8) {
let pb = x as *const u8;
let atom_w = unsafe { &*(((pb as usize) & !3) as *const AtomicU32) };
let mut old = atom_w.load(Ordering::Relaxed);
loop {
let shift = ((pb as usize) & 3) * 8;
let new = u32::from_le((old.to_le() & 0xFF_u32 <<shift)|((value as u32) << shift));
match atom_w.compare_exchange_weak(old, new, Ordering::SeqCst, Ordering::Relaxed) {
Ok(_) => break,
Err(x) => old = x,
}
}
}
This is a very interesting question.
There are actually several issues with these functions, making them unsound (i.e., not safe to expose) for various formal reasons.
At the same time, I am unable to actually construct a problematic interaction between these functions and compiler optimizations.
Out-of-bounds accesses
I'd say all of these functions are unsound because they can access unallocated memory. Each of them I can call with a &*Box::new(0u8) or &mut *Box::new(0u8), resulting in out-of-bounds accesses, i.e. accesses beyond what was allocated using malloc (or whatever allocator). Neither C nor LLVM permit such accesses. (I'm using the heap because I find it easier to think about allocations there, but the same applies to the stack where every stack variable is really its own independent allocation.)
Granted, the LLVM language reference doesn't actually define when a load has undefined behavior due to the access not being inside the object. However, we can get a hint in the documentation of getlementptr inbounds, which says
The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end.
I am fairly certain that being in bounds is a necessary but not sufficient requirement for actually using an address with load/store.
Note that this is independent of what happens on the assembly level; LLVM will do optimizations based on a much higher-level memory model that argues in terms of allocated blocks (or "objects" as C calls them) and staying within the bounds of these blocks.
C (and Rust) are not assembly, and it is not possible to use assembly-based reasoning on them.
Most of the time it is possible to derive contradictions from assembly-based reasoning (see e.g. this bug in LLVM for a very subtle example: casting a pointer to an integer and back is not a NOP).
This time, however, the only examples I can come up with are fairly far-fetched: For example, with memory-mapped IO, even reads from a location could "mean" something to the underlying hardware, and there could be such a read-sensitive location sitting right next to the one that's passed into read.
But really I don't know much about this kind of embedded/driver development, so this may be entirely unrealistic.
(EDIT: I should add that I am not an LLVM expert. Probably the llvm-dev mailing list is a better place to determine if they are willing to commit to permitting such out-of-bounds accesses.)
Data races
There is another reason at least some of these functions are not sound: Concurrency. You clearly already saw this coming, judging from the use of concurrent accesses.
Both read and read_vol are definitely unsound under the concurrency semantics of C11. Imagine x is the first element of a [u8], and another thread is writing to the second element at the same time as we execute read/read_vol. Our read of the whole 32bit word overlaps with the other thread's write. This is a classical "data race": Two threads accessing the same location at the same time, one access being a write, and one access not being atomic. Under C11, any data race is UB so we are out. LLVM is slightly more permissive so both read and read_val are probably allowed, but right now Rust declares that it uses the C11 model.
Also note that "vol" is a bad name (assuming you meant this as short-hand for "volatile") -- in C, atomicity has nothing to do with volatile! It is literally impossible to write correct concurrent code when using volatile and not atomics. Unfortunately, Java's volatile is about atomicity, but that's a very different volatile than the one in C.
And finally, write also introduces a data race between an atomic read-modify-update and a non-atomic write in the other thread, so it is UB in C11 as well. And this time it is also UB in LLVM: Another thread could be reading from one of the extra locations that write affects, so calling write would introduce a data race between our writing and the other thread's reading. LLVM specifies that in this case, the read returns undef. So, calling write can make safe accesses to the same location in other threads return undef, and subsequently trigger UB.
Do we have any examples of issues caused by these functions?
The frustrating part is, while I found multiple reasons to rule out your functions following the spec(s), there seems to be no good reason that these functions are ruled out! The read and read_vol concurrency issues are fixed by LLVM's model (which however has other problems, compared to C11), but write is illegal in LLVM just because read-write data races make the read return undef -- and in this case we know we are writing the same value that was already stored in these other bytes! Couldn't LLVM just say that in this special case (writing the value that's already there), the read must return that value? Probably yes, but this stuff is subtle enough that I would also not be surprised if that invalidates some obscure optimization.
Moreover, at least on non-embedded platforms the out-of-bounds accesses done by read are unlikely to cause actual trouble. I guess one could imagine a semantics which returns undef when reading an out-of-bounds byte that is guaranteed to sit on the same page as an in-bounds byte. But that would still leave write illegal, and that is a really tough one: write can only be allowed if the memory on these other locations is left absolutely unchanged. There could be arbitrary data sitting there from other allocations, parts of the stack frame, whatever. So somehow the formal model would have to let you read those other bytes, not allow you to gain anything by inspecting them, but also verify that you are not changing the bytes before writing them back with a CAS. I'm not aware of any model that would let you do that. But I thank you for bringing these nasty cases to my attention, it's always good to know that there is still plenty of stuff left to research in terms of memory models :)
Rust's aliasing rules
Finally, what you were probably wondering about is whether these functions violate any of the additional aliasing rules that Rust adds. The trouble is, we don't know -- these rules are still under development. However, all the proposals I have seen so far would indeed rule out your functions: When you hold an &mut u8 (say, one that points right next to the one that's passed to read/read_vol/write), the aliasing rules provide a guarantee that no access whatsoever will happen to that byte by anyone but you. So, your functions reading from memory that others could hold a &mut u8 to already makes them violate the aliasing rules.
However, the motivation for these rules is to conform with the C11 concurrency model and LLVM's rules for memory access. If LLVM declares something UB, we have to make it UB in Rust as well unless we are willing to change our codegen in a way that avoids the UB (and typically sacrifices performance). Moreover, given that Rust adopted the C11 concurrency model, the same holds true for that. So for these cases, the aliasing rules really don't have any choice but make these accesses illegal. We could revisit this once we have a more permissive memory model, but right now our hands are bound.
So I understand the simple answer to how this works is that local stuff happens in the stack and box stuff happens on the heap.
However, what happens when you have more complex behavior?
Specifically, lets talk about data that gets held in FFI for an indeterminate amount of time and then has to be resurrected later from a *mut c_void.
If you 'forget' a pointer, using std::mem::forget, or std::mem::transmute() a pointer to a *const pointer how durable is the result?
If (for example) this is done inside a function and then the function returns, does the stack get cleared and the memory become invalid?
Are 'Box' pointers which are heap allocated generally speaking valid until they get destroyed (eg. using read())?
I've been told on IRC that this is generally speaking the correct approach to take:
unsafe fn fp(v:Box<Foo>) -> *const c_void {
return transmute(foo);
}
However, looking at libcore::raw::Box, Box isn't remotely the same as a *const T; is that really ok?
If you 'forget' a pointer, using std::mem::forget, or std::mem::transmute() a pointer to a *const pointer how durable is the result?
If you cast a Box with transmute via the fp function, the pointer will remain valid as long as you like, since transmute is consuming the value and thus the destructor, which frees the memory, does not run. (At least, it is valid until you transmute it back to Box<...> to let the destructor run and free the memory.)
forget has no return value, it just discards the value without running the destructors.
Note, however, transmuting to *const c_void requires extra care, e.g. the Foo inside the Box<Foo> may contain thread-local data, or references, and thus may not be valid to pass between threads, or live forever. (Meaning the pointer itself lives forever/is usable however you like, but the data it points to may not.)
If you start casting & pointers, you need to be very very careful about lifetimes and not letting them escape from the scope of the data to which they point (e.g. you can't return a pointer to a local variable from a function).
If (for example) this is done inside a function and then the function returns, does the stack get cleared and the memory become invalid?
The stack doesn't get 'cleared' (i.e. it's not explicitly zeroed), but it is invalid to use any pointers into a stack frame that doesn't exist any more.
Are 'Box' pointers which are heap allocated generally speaking valid until they get destroyed (eg. using read())?
You'll need to be more specific, ptr::read cannot be called on a Box directly, and calling ptr::read on a *const c_void certainly won't do anything useful.
However, looking at libcore::raw::Box, Box isn't remotely the same as a *const T; is that really ok?
raw::Box is not at all the representation of the normal Box. raw::Box is the representation of the old # (now Gc) type. Box<T> is literally a wrapper around a *mut T.