I've been experimenting with some unsafe code, and just recently hit an interesting observation. The following code creates some values and stores them in a vector. It also creates raw pointers which point to the values. Later on, it attempts to read the values again by using the pointers. This results in a segmentation fault in the second loop:
fn main() {
    let mut values = vec![];
    let mut pointers = vec![];
    for i in 0..1_000_000 {
        values.push(i);
        pointers.push(values.last_mut().unwrap() as *mut usize);
        let value = unsafe { &mut **pointers.get_mut(i).unwrap() };
        if value == &0 {
            // This line just exists to make sure the compiler does not optimize the check away
            println!("0");
        }
    }
    println!("first loop finished");
    // Same logic as in the lower part of the first loop, just in another loop
    for i in 0..1_000_000 {
        let value = unsafe { &mut **pointers.get_mut(i).unwrap() };
        if value == &0 {
            println!("0");
        }
    }
}
Output
0
first loop finished
Segmentation fault (core dumped)
What I find interesting about this is that
it results in a segmentation fault at all
the segmentation fault only happens in the second loop, the first one runs without a problem
the segmentation fault seems to occur at a random iteration in the second loop, not at the same one every time
I'd be interested in what Rust is doing under the hood for this situation to happen. I know that multiple mutable references are undefined behaviour, but what exactly happens here for a segmentation fault to occur?
Your code exhibits undefined behavior since it accesses memory after it has been freed.
A vector does not immediately have space for all 1,000,000 elements; it grows exponentially when you .push() a value beyond its capacity. When this happens, it allocates a new region of memory, moves the existing elements over, and deallocates the previous region. Any pointers to the old elements are now invalid, since the data no longer exists at that address.
Your first loop does not encounter this error because values is not modified between taking a pointer to values[i] and reading it back through pointers[i]. So you know pointers[i] is still valid, but the earlier pointers 0..i may not be.
When or where you get a segmentation fault is not defined: you may trigger one on the first invalid access, or you may not trigger one at all! This is pretty much what "undefined behavior" means. You can fix your problem by creating the values vector with enough pre-allocated capacity: Vec::with_capacity().
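The growth behaviour described above can be observed without any unsafe code. The sketch below is only an illustration: the helper name count_capacity_changes is made up for this example, and it counts capacity changes as a stand-in for reallocations.

```rust
/// Pushes `n` values and counts how often the capacity changed,
/// i.e. how often the vector had to reallocate its buffer.
/// (Hypothetical helper, not part of the original question.)
fn count_capacity_changes(mut values: Vec<usize>, n: usize) -> usize {
    let mut changes = 0;
    let mut last_capacity = values.capacity();
    for i in 0..n {
        values.push(i);
        if values.capacity() != last_capacity {
            // A new, larger buffer was allocated; pointers into the
            // old buffer would be dangling from here on.
            changes += 1;
            last_capacity = values.capacity();
        }
    }
    changes
}

fn main() {
    let grown = count_capacity_changes(Vec::new(), 1_000_000);
    let reserved = count_capacity_changes(Vec::with_capacity(1_000_000), 1_000_000);
    println!("Vec::new(): {} capacity changes", grown);
    println!("Vec::with_capacity(): {} capacity changes", reserved);
}
```

With the capacity reserved up front, no push within that capacity reallocates, which is why Vec::with_capacity() avoids the dangling pointers.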
That illustrates perfectly the kind of unsafety Rust protects us from.
In your first loop you memorise the address of each integer but at each iteration the vector grows and may decide to reallocate its storage somewhere else.
(you changed your code, now this is a hashmap, but the problem is the same, this container reallocates its storage when it needs to)
When that happens, the previous pointers are dangling!
This is not harmful until you actually use them to access memory (in the second loop).
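If the goal is only to find the elements again later, one safe alternative (an assumption about the intent, not something the question states) is to remember indices instead of addresses. An index stays meaningful when the container reallocates. The function name count_zeros_by_index is hypothetical.

```rust
/// Stores indices rather than raw pointers, so a reallocation of
/// `values` cannot leave anything dangling.
/// (Hypothetical helper mirroring the two loops of the question.)
fn count_zeros_by_index(n: usize) -> usize {
    let mut values: Vec<usize> = Vec::new();
    let mut indices: Vec<usize> = Vec::new();
    for i in 0..n {
        values.push(i);
        // Remember *where* the element lives logically, not its address.
        indices.push(values.len() - 1);
    }
    // Second pass: look every element up again through its index.
    indices.iter().filter(|&&idx| values[idx] == 0).count()
}

fn main() {
    println!("zeros found: {}", count_zeros_by_index(1_000_000));
}
```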
Related
I'm writing some GPU code for macOS using the metal crate. In doing so, I allocate a Buffer object by calling:
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared)
This FFIs to Apple's Metal API, which allocates a region of memory that both the CPU and GPU can access and the Rust wrapper returns a Buffer object. I can then get a pointer to this region of memory by doing:
let data = buffer.contents() as *mut u32
In the colloquial sense, this region of memory is uninitialized. However, is this region of memory "uninitialized" in the Rust sense?
Is this sound?
let num_bytes = num_u32 * std::mem::size_of::<u32>();
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);
let data = buffer.contents() as *mut u32;
let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };
for i in as_slice {
    *i = 42u32;
}
Here I'm writing u32s to a region of memory returned to me by FFI. From the nomicon:
...The subtle aspect of this is that usually, when we use = to assign to a value that the Rust type checker considers to already be initialized (like x[i]), the old value stored on the left-hand side gets dropped. This would be a disaster. However, in this case, the type of the left-hand side is MaybeUninit<Box<u32>>, and dropping that does not do anything! See below for some more discussion of this drop issue.
None of the from_raw_parts rules are violated and u32 doesn't have a drop method.
Nonetheless, is this sound?
Would reading from the region (as u32s) before writing to it be sound (nonsense values aside)? The region of memory is valid and u32 is defined for all bit patterns.
Best practices
Now consider a type T that does have a drop method (and you've done all the bindgen and #[repr(C)] nonsense so that it can go across FFI boundaries).
In this situation, should one:
Initialize the buffer in Rust by scanning the region with pointers and calling .write()?
Do:
let as_slice = unsafe { slice::from_raw_parts_mut(data as *mut MaybeUninit<T>, num_t) };
for i in as_slice {
    *i = unsafe { MaybeUninit::new(T::new()).assume_init() };
}
Furthermore, after initializing the region, how does the Rust compiler remember this region is initialized on subsequent calls to .contents() later in the program?
Thought experiment
In some cases, the buffer is the output of a GPU kernel and I want to read the results. All the writes occurred in code outside of Rust's control and when I call .contents(), the pointer at the region of memory contains the correct uint32_t values. This thought experiment should relay my concern with this.
Suppose I call C's malloc, which returns an allocated buffer of uninitialized data. Reading u32 values from this buffer (the pointers are properly aligned and in bounds), or values of any other type, should fall squarely into undefined behavior.
However, suppose I instead call calloc, which zeros the buffer before returning it. If you don't like calloc, then suppose I have an FFI function that calls malloc, explicitly writes 0 uint32_t types in C, then returns this buffer to Rust. This buffer is initialized with valid u32 bit patterns.
From Rust's perspective, does malloc return "uninitialized" data while calloc returns initialized data?
If the cases are different, how would the Rust compiler know the difference between the two with respect to soundness?
There are multiple parameters to consider when you have an area of memory:
The size of it is the most obvious.
Its alignment is still somewhat obvious.
Whether or not it's initialized -- and notably, for types like bool whether it's initialized with valid values as not all bit-patterns are valid.
Whether it's concurrently read/written.
Focusing on the trickier aspects, the recommendation is:
If the memory is potentially uninitialized, use MaybeUninit.
If the memory is potentially concurrently read/written, use a synchronization method -- be it a Mutex or AtomicXXX or ....
And that's it. Doing so will always be sound, no need to look for "excuses" or "exceptions".
Hence, in your case:
let num_bytes = num_u32 * std::mem::size_of::<u32>();
assert!(num_bytes <= isize::MAX as usize);
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);
let data = buffer.contents() as *mut MaybeUninit<u32>;
// Safety:
// - `data` is valid for reads and writes.
// - `data` points to `num_u32` elements.
// - Access to `data` is exclusive for the duration.
// - `num_u32 * size_of::<u32>() <= isize::MAX`.
let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };
for i in as_slice {
    i.write(42); // Yes you can write `*i = MaybeUninit::new(42);` too,
                 // but why would you?
}
// OR with nightly:
as_slice.write_slice(some_slice_of_u32s);
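The same write-then-assume pattern can be tried without a GPU. The following is a minimal sketch that uses an ordinary Vec of MaybeUninit<u32> as a stand-in for the Metal buffer; the stand-in and the function name fill_buffer are assumptions made for this example.

```rust
use std::mem::MaybeUninit;

/// Treats a buffer as uninitialized, writes every slot, and only then
/// reads the values back as plain `u32`s.
/// (A `Vec` stands in for the FFI-provided memory here.)
fn fill_buffer(len: usize, value: u32) -> Vec<u32> {
    // `MaybeUninit<u32>` is `Clone` because `u32` is `Copy`.
    let mut buffer = vec![MaybeUninit::<u32>::uninit(); len];
    for slot in buffer.iter_mut() {
        // `write` never reads or drops the old contents.
        slot.write(value);
    }
    buffer
        .into_iter()
        // SAFETY: every slot was written in the loop above.
        .map(|slot| unsafe { slot.assume_init() })
        .collect()
}

fn main() {
    let data = fill_buffer(4, 42);
    println!("{:?}", data); // prints [42, 42, 42, 42]
}
```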
This is very similar to this post on the users forum mentioned in the comment on your question. (Here are some links from that post: 2 3)
The answers there aren't the most organized, but it seems like there are four main issues with uninitialized memory:
Rust assumes it is initialized
Rust assumes the memory is a valid bit pattern for the type
The OS may overwrite it
Security vulnerabilities from reading freed memory
For #1, this seems to me to not be an issue, since if there were another version of the FFI function that returned initialized memory instead of uninitialized memory, it would look identical to Rust.
I think most people understand #2, and that's not an issue for u32.
#3 could be a problem, but since this is for a specific OS, you may be able to ignore it if macOS guarantees it does not do this.
#4 may or may not be undefined behavior, but it is highly undesirable. This is why you should treat it as uninitialized even if Rust thinks it's a list of valid u32s. You don't want Rust to think it's valid. Therefore, you should use MaybeUninit even for u32.
MaybeUninit
It's correct to cast the pointer to a slice of MaybeUninit. Your example isn't written correctly, though. assume_init returns T, and you can't assign that to an element from [MaybeUninit<T>]. Fixed:
let as_slice = unsafe { slice::from_raw_parts_mut(data as *mut MaybeUninit<T>, num_t) };
for i in as_slice {
    i.write(T::new());
}
Then, turning that slice of MaybeUninit into a slice of T:
let init_slice = unsafe { &mut *(as_slice as *mut [MaybeUninit<T>] as *mut [T]) };
Another issue is that &mut may not be correct to have at all here since you say it's shared between GPU and CPU. Rust depends on your Rust code being the only thing that can access &mut data, so you need to ensure any &mut are gone while the GPU accesses the memory. If you want to interlace Rust access and GPU access, you need to synchronize them somehow, and only store *mut while the GPU has access (or reacquire it from FFI).
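For a type with a destructor, the initialize-then-cast pattern can be sketched end to end as below. String plays the role of T, and a Vec of MaybeUninit<String> plays the role of the FFI buffer; both are assumptions for illustration, as are the names init_all and demo. Because MaybeUninit never drops its contents, the elements have to be dropped explicitly once they are no longer needed.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Writes every slot, then reinterprets the slice as initialized.
fn init_all(storage: &mut [MaybeUninit<String>]) -> &mut [String] {
    for (i, slot) in storage.iter_mut().enumerate() {
        // `write` does not drop the (uninitialized) previous contents.
        slot.write(format!("item {}", i));
    }
    // SAFETY: every element was written above, and `MaybeUninit<String>`
    // has the same layout as `String`.
    unsafe { &mut *(storage as *mut [MaybeUninit<String>] as *mut [String]) }
}

fn demo() -> Vec<String> {
    let mut storage: Vec<MaybeUninit<String>> =
        (0..3).map(|_| MaybeUninit::uninit()).collect();
    let items = init_all(&mut storage);
    let copy = items.to_vec();
    // SAFETY: all elements are initialized and are not used afterwards.
    // Without this call the three strings would leak.
    unsafe { ptr::drop_in_place(items as *mut [String]) };
    copy
}

fn main() {
    println!("{:?}", demo()); // prints ["item 0", "item 1", "item 2"]
}
```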
Notes
The code is mainly taken from Initializing an array element-by-element in the MaybeUninit doc, plus the very useful Alternatives section from transmute. The conversion from &mut [MaybeUninit<T>] to &mut [T] is how slice_assume_init_mut is written as well. You don't need to transmute like in the other examples since it is behind a pointer. Another similar example is in the nomicon: Unchecked Uninitialized Memory. That one accesses the elements by index, but it seems like doing that, using * on each &mut MaybeUninit<T>, and calling write are all valid. I used write since it's shortest and is easy to understand. The nomicon also says that using ptr methods like write is also valid, which should be equivalent to using MaybeUninit::write.
There are some nightly [MaybeUninit] methods that will be helpful in the future, like slice_assume_init_mut.
I'm playing with unsafe rust and trying to implement and I've found something I don't understand. I thought for sure I'd have a dangling pointer and that I'd get some kind of runtime error when trying to run this, but I don't.
fn main() {
    let s1 = String::from("s1");
    let s1_raw_ptr: *const String = &s1;
    drop(s1);
    unsafe {
        let s = &*s1_raw_ptr;
        println!("s recovered from raw pointer: {:?}", s);
    }
}
This outputs:
s recovered from raw pointer: "s1"
I thought that when a value goes out of scope in Rust that it is immediately cleaned up. How is it that dereferencing a raw pointer to a now-dropped value is working?
When a String is dropped in Rust, ultimately what ends up getting called is Allocator::deallocate on the system allocator. After this, using the data is undefined behaviour, so anything could happen! But in practice what tends to happen if there aren't any funky compiler optimizations is that you just get whatever data is stored in memory there. If there aren't any new allocations at that place, then you just get whatever data was there before.
When a memory allocation is freed, nothing in particular happens to its contents; the allocator typically just marks the region as reusable. Clearing the data by setting it to all zeros (or some other value) would be pointless, since any newly allocated memory always needs to be initialized by the user of that memory.
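What does happen immediately is that the destructor runs. This can be checked safely, without touching freed memory. The sketch below uses a made-up type Noisy whose Drop implementation sets a flag; it says nothing about what the freed bytes contain afterwards.

```rust
use std::cell::Cell;

/// Hypothetical type that records when its destructor runs.
struct Noisy<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Returns the flag's state before and after the explicit `drop` call.
fn drop_runs_immediately() -> (bool, bool) {
    let flag = Cell::new(false);
    let value = Noisy { dropped: &flag };
    let before = flag.get();
    drop(value); // the destructor runs right here
    let after = flag.get();
    (before, after)
}

fn main() {
    println!("{:?}", drop_runs_immediately()); // prints (false, true)
}
```

So "cleaned up" means the destructor has run and the memory has been handed back, not that the bytes have been erased.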
Take a look at the following simple example:
use std::rc::Rc;
struct MyStruct {
    a: i8,
}
fn main() {
    let mut my_struct = MyStruct { a: 0 };
    my_struct.a = 5;
    let my_struct_rc = Rc::new(my_struct);
    println!("my_struct_rc.a = {}", my_struct_rc.a);
}
The official documentation of Rc says:
The type Rc<T> provides shared ownership of a value of type T,
allocated in the heap.
Theoretically it is clear. But, firstly my_struct is not immediately wrapped into Rc, and secondly MyStruct is a very simple type. I can see 2 scenarios here.
When my_struct is moved into the Rc the memory content is literally copied from the stack to the heap.
The compiler is able to resolve that my_struct will be moved into the Rc, so it puts it on the heap from the beginning.
If number 1 is true, then there might be a hidden performance bottleneck as when reading the code one does not explicitly see memory being copied (I am assuming MyStruct being much more complex).
If number 2 is true, I wonder whether the compiler is always able to resolve such things. The provided example is very simple, but I can imagine that my_struct is much more complex and is mutated several times by different functions before being moved to the Rc.
Tl;dr It could be either scenario, but for the most part, you should just write code in the most obvious way and let the compiler worry about it.
According to the semantics of the abstract machine, that is, the theoretical model of computation that defines Rust's behavior, there is always a copy. In fact, there are at least two: my_struct is first created in the stack frame of main, but then has to be moved into the stack frame of Rc::new. Then Rc::new has to create an allocation and move my_struct a second time, from its own stack frame into the newly allocated memory*. Each of these moves is conceptually a copy.
However, this analysis isn't particularly useful for predicting the performance of code in practice, for three reasons:
Copies are actually pretty darn cheap. Moving my_struct from one place to another may actually be much cheaper, in the long run, than referencing it with a pointer. Copying a chunk of bytes is easy to optimize on modern processors; following a pointer to some arbitrary location is not. (Bear in mind also that the complexity of the structure is irrelevant because all moves are bytewise copies; for instance, moving any Vec is just copying three usizes regardless of the contents.)
If you haven't measured the performance and shown that excessive copying is a problem, you must not assume that it is without evidence: you may accidentally pessimize instead of optimizing your code. Measure first.
The semantics of the abstract machine is not the semantics of your real machine. The whole point of an optimizing compiler is to figure out the best way to transform one to the other. Under reasonable assumptions, it's very unlikely that the code here would result in 2 copies with optimizations turned on. But how the compiler eliminates one or both copies may be dependent on the rest of the code: not just on the snippet that contains them but on how the data is initialized and so forth. Real machine performance is complicated and generally requires analysis of more than just a few lines at a time. Again, this is the whole point of an optimizing compiler: it can do a much more comprehensive analysis, much faster than you or I can.
Even if the compiler leaves a copy "on the table", you shouldn't assume without evidence that removing the copy would make things better simply because it is a copy. Measure first.
It probably doesn't matter anyway, in this case. Requesting a new allocation from the heap is likely† more expensive than copying a bunch of bytes from one place to another, so fiddling around with 1 fast copy vs. no copies while ignoring a (plausible) big bottleneck is probably a waste of time. Don't try to optimize things before you've profiled your application or library to see where the most performance is being lost. Measure first.
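The remark that moving a Vec only copies its small handle can be checked directly. This is a sketch under one assumption: that the handle is three machine words, which holds for current compilers but is not a documented layout guarantee. The function name is hypothetical.

```rust
use std::mem::size_of;

/// Moves a vector and checks that its heap buffer stayed where it was.
fn heap_buffer_survives_move() -> bool {
    let original = vec![0u8; 1024];
    let before = original.as_ptr();
    // The move copies only the handle (pointer, capacity, length);
    // the 1024 bytes on the heap are not touched.
    let moved = original;
    before == moved.as_ptr()
}

fn main() {
    println!("size of a Vec<u8> handle: {} bytes", size_of::<Vec<u8>>());
    println!("one machine word:         {} bytes", size_of::<usize>());
    println!("buffer address unchanged: {}", heap_buffer_survives_move());
}
```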
See also
Questions about overflowing the stack by accidentally putting large data on it (to which the solution is usually to use Vec instead of an array):
How to allocate arrays on the heap in Rust 1.0?
Thread '<main>' has overflowed its stack when allocating a large array using Box
* Rc, although part of the standard library, is written in plain Rust code, which is how I analyze it here. Rc could theoretically be subject to guaranteed optimizations that aren't available to ordinary code, but that doesn't happen to be relevant to this case.
† Depending at least on the allocator and on whether new memory must be acquired from the OS or if a recently freed allocation can be re-used.
You can just test what happens:
Try to use my_struct after creating an Rc out of it. The value has been moved, so you can't use it.
use std::rc::Rc;
struct MyStruct {
    a: i8,
}
fn main() {
    let mut my_struct = MyStruct { a: 0 };
    my_struct.a = 5;
    let my_struct_rc = Rc::new(my_struct);
    println!("my_struct_rc.a = {}", my_struct_rc.a);
    // Add this line. Compilation error "borrow of moved value"
    println!("my_struct.a = {}", my_struct.a);
}
Make your struct implement the Copy trait, and it will be automatically copied into the Rc::new function. Now the code above works, because the my_struct variable is not moved anywhere, just copied.
#[derive(Clone, Copy)]
struct MyStruct {
    a: i8,
}
The compiler is able to resolve that my_struct will be moved into the Rc, so it puts it on the heap from the beginning.
Take a look at Rc::new source code (removed the comment which is irrelevant).
struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}
// ...
pub fn new(value: T) -> Rc<T> {
    Self::from_inner(Box::into_raw_non_null(box RcBox {
        strong: Cell::new(1),
        weak: Cell::new(1),
        value,
    }))
}
It takes the value you pass to it, and creates a Box, so it's always put on the heap. This is plain Rust and I don't think it performs too many sophisticated optimizations, but that may change.
Note that "move" in Rust may also copy data implicitly, and this may depend on the current compiler's behavior. In that case, if you are concerned about performance you can try to make the struct as small as possible, and store some information on the heap. For example when a Vec<T> is moved, as far as I know it only copies the capacity, length and pointer to the heap, but the actual array which is on the heap is not copied element by element, so only a few bytes are copied when moving a vector (assuming the data is copied, because that's also subject to compiler optimizations in case copying is not actually needed).
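To see that the value inside the Rc really lives in its own allocation, the Copy variant of the struct can be used to compare addresses, since the original variable stays alive. This is a small sketch; the function name is made up, and Rc::as_ptr is used to get the address of the value inside the Rc.

```rust
use std::rc::Rc;

#[derive(Clone, Copy)]
struct MyStruct {
    a: i8,
}

/// Compares the address of the local variable with the address of the
/// value stored inside the `Rc`.
fn stack_and_rc_are_distinct() -> bool {
    let my_struct = MyStruct { a: 5 };
    // `MyStruct` is `Copy`, so this copies and `my_struct` stays usable.
    let my_struct_rc = Rc::new(my_struct);
    let local_addr: *const MyStruct = &my_struct;
    let rc_addr: *const MyStruct = Rc::as_ptr(&my_struct_rc);
    // Two live objects cannot share an address; the values are equal.
    local_addr != rc_addr && my_struct.a == my_struct_rc.a
}

fn main() {
    println!("distinct storage: {}", stack_and_rc_are_distinct());
}
```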
The phrase "when scope exits the values get automatically popped from stack" is repeated many times, but the example I provide here disproves the statement:
fn main() {
    let foo = foobar();
    println!("The address in main {:p}", &foo);
}
fn foobar() -> Employee {
    let emp = Employee {
        company: String::from("xyz"),
        name: String::from("somename"),
        age: 50,
    };
    println!("The address inside func {:p}", &emp);
    emp
}
#[derive(Debug)]
struct Employee {
    name: String,
    company: String,
    age: u32,
}
The output is:
The address inside func 0x7fffc34011e8
The address in main 0x7fffc34011e8
This makes sense. When I use Box to create the struct the address differs as I expected.
If the function returns ownership (a move) of the return value to the caller, then the memory corresponding to that function gets popped after it executes, which is not safe. So how is the struct created inside the function still accessible after the function exits?
The same thing happens when returning an array. Where are these elements stored in memory: on the stack or on the heap?
Will the compiler do escape analysis at compile time and move the values to the heap like Go does?
I'm sure that Employee doesn't implement the Copy trait.
In many languages, variables are just a convenient means for
humans to name some values.
Even if on a logical point of view we can assume that there is one
specific storage for each specific variable, and we can reason about
this in terms of copy, move... it does not imply that these copies
and moves physically happen (and notably because of the optimizer).
Moreover, when reading various documents about Rust, we often find
the term binding instead of variable; this reinforces the idea
that we just refer to a value that exists somewhere.
It is exactly the same as writing let a=something(); then let b=a;,
then again let c=b;... we simply change our mind about the name
but no data is actually moved.
When it comes to debugging, the generated code is generally
sub-optimal by giving each variable its own storage
in order to inspect these variables in memory.
This can be misleading about the true nature of the optimised code.
Back to your example, you detected that Rust decided to perform
a kind of return-value-optimization (common C++ term nowadays)
because it knows that a temporary value must appear in the calling
context to provide the result, and this result comes from a local
variable inside the function.
So, instead of creating two different storages and copying or moving from
one to another, it is better to use the same storage: the local
variable is stored outside the function (where the result is
expected).
On the logical point of view it does not change anything but it
is much more efficient.
And when code inlining comes into play, no one can predict where
our variables/values/bindings are actually stored.
Some comments below state that this return-value-optimisation
can be counted on since it takes place in the Rust ABI.
(I was not aware of that, still a beginner ;^)
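Whether a stack value keeps its address across a return depends on how the compiler lowers the call, so it is best not to build logic on it. When a stable address is actually needed, a Box provides one, because moving the Box moves only the pointer. A minimal sketch with a trimmed-down Employee; the function names are hypothetical.

```rust
struct Employee {
    name: String,
    age: u32,
}

/// Returns the boxed value together with the address observed inside.
fn boxed_employee() -> (Box<Employee>, *const Employee) {
    let emp = Box::new(Employee {
        name: String::from("somename"),
        age: 50,
    });
    let addr_inside: *const Employee = &*emp;
    (emp, addr_inside)
}

/// Checks that the heap address is the same before and after the return.
fn address_is_stable() -> bool {
    let (emp, addr_inside) = boxed_employee();
    let addr_outside: *const Employee = &*emp;
    println!("{} ({}) lives at {:p}", emp.name, emp.age, addr_outside);
    addr_inside == addr_outside
}

fn main() {
    println!("address unchanged: {}", address_is_stable());
}
```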
Does the variable s in print_struct refer to data on the heap or on the stack?
struct Structure {
    x: f64,
    y: u32,
    /* Use a box, so that Structure isn't copy */
    z: Box<char>,
}
fn main() {
    let my_struct_boxed = Box::new(Structure {
        x: 2.0,
        y: 325,
        z: Box::new('b'),
    });
    let my_struct_unboxed = *my_struct_boxed;
    print_struct(my_struct_unboxed);
}
fn print_struct(s: Structure) {
    println!("{} {} {}", s.x, s.y, s.z);
}
As I understand it, let my_struct_unboxed = *my_struct_boxed; transfers the ownership away from the box, to my_struct_unboxed, and then to s in the function print_struct.
What happens with the actual data? Initially it is copied from the stack onto the heap by calling Box::new(...), but is the data somehow moved or copied back to the stack at some point? If so, how? And when is drop called? When s goes out of scope?
The Structure data in my_struct_boxed exists on the heap and the Structure data in my_struct_unboxed exists on the stack.
Therefore naïvely speaking (no compiler optimizations), a move or copy operation when dereferencing (*) your Box will always involve copying of the data. On the borrow-checker/static-analysis side, since the Copy trait is not implemented for Structure, this represents a transfer of ownership of the data to the my_struct_unboxed variable.
When you call print_struct, another copy would take place that would copy the bits in memory representing your Structure from the local variable to the function's arguments call-stack. Semantically, this again represents a transfer of ownership into the print_struct function.
Finally, when print_struct returns, s goes out of scope and the Structure it owns is dropped.
Reference: std::marker::Copy
Excerpt
It's important to note that in these two examples, the only difference
is whether you are allowed to access [your variable] after the assignment. Under the
hood, both a copy and a move can result in bits being copied in
memory, although this is sometimes optimized away.
Note the last part "this is sometimes optimized away". This is why the earlier descriptions were simplified to assume no compiler optimizations i.e. naïve. In a lot of cases, the compiler will aggressively optimize and inline the code especially with higher values for the opt-level flag.
If so, how?
Both "copy" and "move" are semantically memcpy (though that may be optimised to something else, or even nothing whatsoever).
And when is drop called? When s goes out of scope?
Yes. When print_struct ends it cleans up its local scope, and drops s.
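The drop timing can be confirmed with a destructor that records when it runs. The sketch below replaces the fields of Structure with a shared log, which is an assumption made purely so the order of events can be observed; trace is a hypothetical helper.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in for `Structure` that logs its own destruction.
struct Structure {
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Structure {
    fn drop(&mut self) {
        self.log.borrow_mut().push("drop");
    }
}

fn print_struct(s: Structure) {
    s.log.borrow_mut().push("inside print_struct");
} // `s` goes out of scope here, so `drop` runs here

fn trace() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let boxed = Box::new(Structure { log: Rc::clone(&log) });
    // Moving out of the box frees the box's allocation but does not
    // run `Structure`'s destructor: the value is still alive.
    let unboxed = *boxed;
    log.borrow_mut().push("after unboxing");
    print_struct(unboxed);
    log.borrow_mut().push("after print_struct");
    let events = log.borrow().clone();
    events
}

fn main() {
    // prints ["after unboxing", "inside print_struct", "drop", "after print_struct"]
    println!("{:?}", trace());
}
```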