Rust - unsafely share mutable data between threads without a mutex - multithreading

How can old-school multi-threading (no wrapping mutex) be achieved in Rust? And why is it undefined behavior?
I have to build a highly concurrent physics simulation. I am supposed to do it in C, but I chose Rust (I really needed higher-level features).
Using Rust, I should opt for safe communication between threads; however, I must use a mutable buffer shared between the threads. (Actually, I have to implement different techniques and benchmark them.)
First approach
Use Arc<Data> to share non-mutable state.
Use transmute to promote & to &mut when needed.
It was straightforward, but the compiler prevents this from compiling even within an unsafe block. This is because the compiler may apply optimizations that assume the data is immutable (perhaps caching it and never re-reading it; I'm not an expert on the details).
These kinds of optimizations can be prevented by the Cell wrapper and its relatives.
Second approach
Use Arc<UnsafeCell<Data>>.
Then data.get() to access data.
This does not compile either. The reason is that UnsafeCell is not Sync. The solution is to use SyncUnsafeCell, but it is unstable for the moment (1.66), and the program will be compiled and put into production on a machine with only the stable toolchain.
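(A common workaround on stable, sketched here as my own addition rather than something from the original question, is to hand-roll the newtype that SyncUnsafeCell provides: a wrapper whose unsafe impl Sync asserts that the programmer takes responsibility for avoiding data races. The name RacyCell is mine.)

use std::cell::UnsafeCell;

// Hand-rolled stand-in for the unstable SyncUnsafeCell.
// The `unsafe impl Sync` asserts that *we* promise to avoid data races.
pub struct RacyCell<T>(UnsafeCell<T>);

unsafe impl<T: Send> Sync for RacyCell<T> {}

impl<T> RacyCell<T> {
    pub const fn new(v: T) -> Self {
        Self(UnsafeCell::new(v))
    }
    // Same contract as UnsafeCell::get: callers must synchronize themselves.
    pub fn get(&self) -> *mut T {
        self.0.get()
    }
}

With this, Arc<RacyCell<Data>> is Send + Sync, and each thread can call .get() under its own synchronization discipline.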
Third approach
Use Arc<Mutex<Data>>.
At the beginning of each thread:
Lock the mutex.
Keep a *mut by coercing a &mut.
Release the mutex.
Use the *mut when needed.
I haven't tried this one yet, but even if it compiles, is it as safe (not talking about data races) as it would be with SyncUnsafeCell?
PS: The values concurrently mutated are just f32; there is absolutely no memory allocation or other complex operation happening concurrently. Worst case, I end up with some scrambled f32s.

Disclaimer: There are probably many ways to solve this; this is just one of them, based on the idea of #Caesar.
Two main points of this post:
You can use AtomicU32 to share f32 between threads without any performance penalty (given an architecture where u32 is already atomic)
You can use std::thread::scope to avoid the overhead of Arc.
use std::{
    fmt::Debug,
    ops::Range,
    sync::atomic::{AtomicU32, Ordering},
};

struct AtomicF32(AtomicU32);

impl AtomicF32 {
    pub fn new(val: f32) -> Self {
        Self(AtomicU32::new(val.to_bits()))
    }
    pub fn load(&self, order: Ordering) -> f32 {
        f32::from_bits(self.0.load(order))
    }
    pub fn store(&self, val: f32, order: Ordering) {
        self.0.store(val.to_bits(), order)
    }
}

impl Debug for AtomicF32 {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        self.load(Ordering::Relaxed).fmt(f)
    }
}

fn perform_action(data: &Vec<AtomicF32>, range: Range<usize>) {
    for value_raw in &data[range] {
        let mut value = value_raw.load(Ordering::Relaxed);
        value *= 2.5;
        value_raw.store(value, Ordering::Relaxed);
    }
}

fn main() {
    let data = (1..=10)
        .map(|v| AtomicF32::new(v as f32))
        .collect::<Vec<_>>();

    println!("Before: {:?}", data);

    std::thread::scope(|s| {
        s.spawn(|| perform_action(&data, 0..5));
        s.spawn(|| perform_action(&data, 5..10));
    });

    println!("After: {:?}", data);
}
Before: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
After: [2.5, 5.0, 7.5, 10.0, 12.5, 15.0, 17.5, 20.0, 22.5, 25.0]
To demonstrate how lightweight this is, here is what this compiles to:
use std::sync::atomic::{AtomicU32, Ordering};

pub struct AtomicF32(AtomicU32);

impl AtomicF32 {
    fn load(&self, order: Ordering) -> f32 {
        f32::from_bits(self.0.load(order))
    }
    fn store(&self, val: f32, order: Ordering) {
        self.0.store(val.to_bits(), order)
    }
}

pub fn perform_action(value_raw: &AtomicF32) {
    let mut value = value_raw.load(Ordering::Relaxed);
    value *= 2.5;
    value_raw.store(value, Ordering::Relaxed);
}

.LCPI0_0:
        .long   0x40200000
example::perform_action:
        movss   xmm0, dword ptr [rdi]
        mulss   xmm0, dword ptr [rip + .LCPI0_0]
        movss   dword ptr [rdi], xmm0
        ret
Note that while this contains zero undefined behaviour, it still is the programmer's responsibility to avoid read-modify-write race conditions.
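If the whole read-modify-write must be atomic, the usual fix is a compare-exchange loop, which AtomicU32 supports on stable. A minimal sketch of mine, extending the AtomicF32 above (the method name is my own invention):

impl AtomicF32 {
    // Atomically multiply the stored value by `factor`; retry if another
    // thread wrote between our load and our store.
    pub fn mul_relaxed(&self, factor: f32) -> f32 {
        let mut cur = self.0.load(Ordering::Relaxed);
        loop {
            let new = (f32::from_bits(cur) * factor).to_bits();
            match self
                .0
                .compare_exchange_weak(cur, new, Ordering::Relaxed, Ordering::Relaxed)
            {
                Ok(_) => return f32::from_bits(new),
                Err(actual) => cur = actual,
            }
        }
    }
}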


How to allocate buffer for C library call

The question is not new and there are two approaches, as far as I can tell:
Use Vec<T>, as suggested here
Manage the heap memory yourself, using std::alloc::alloc, as shown here
My question is whether these are indeed the two (good) alternatives.
Just to make this perfectly clear: Both approaches work. The question is whether there is another, maybe preferred way. The example below is introduced to identify where use of Vec is not good, and where other approaches therefore may be better.
Let's state the problem: Suppose there's a C library that requires some buffer to write into. This could be a compression library, for example. It is easiest to have Rust allocate the heap memory and manage it instead of allocating in C/C++ with malloc/new and then somehow passing ownership to Rust.
Let's go with the compression example. If the library allows incremental (streaming) compression, then I would need a buffer that keeps track of some offset.
Following approach 1 (that is: "abuse" Vec<T>) I would wrap Vec and use len and capacity for my purposes:
/// `Buffer` is basically a Vec
pub struct Buffer<T>(Vec<T>);

impl<T> Buffer<T> {
    /// Create new buffer of length `len`
    pub fn new(len: usize) -> Self {
        Buffer(Vec::with_capacity(len))
    }

    /// Return length of `Buffer`
    pub fn len(&self) -> usize {
        self.0.len()
    }

    /// Return total allocated size of `Buffer`
    pub fn capacity(&self) -> usize {
        self.0.capacity()
    }

    /// Return remaining length of `Buffer`
    pub fn remaining(&self) -> usize {
        self.0.capacity() - self.len()
    }

    /// Increment the offset
    pub fn increment(&mut self, by: usize) {
        unsafe { self.0.set_len(self.0.len() + by); }
    }

    /// Returns an unsafe mutable pointer to the unused part of the buffer
    pub fn as_mut_ptr(&mut self) -> *mut T {
        unsafe { self.0.as_mut_ptr().add(self.0.len()) }
    }

    /// Returns ref to `Vec<T>` inside `Buffer`
    pub fn as_vec(&self) -> &Vec<T> {
        &self.0
    }
}
The only interesting functions are increment and as_mut_ptr.
Buffer would be used like this:
fn main() {
    // allocate buffer for compressed data
    let mut buf: Buffer<u8> = Buffer::new(1024);

    loop {
        // perform C function call
        let compressed_len: usize = compress(some_input, buf.as_mut_ptr(), buf.remaining());
        // increment
        buf.increment(compressed_len);
        // ... break once the input is exhausted
    }

    // get Vec inside buf
    let compressed_data = buf.as_vec();
}
Buffer<T> as shown here is clearly dangerous, for example if any reference type is used. Even T = bool may result in undefined behaviour. But the problems with uninitialised instances of T can be avoided by introducing a trait that limits the possible types T.
Also, if alignment matters, then Buffer<T> is not a good idea.
But otherwise, is such a Buffer<T> really the best way to do this?
There doesn't seem to be an out-of-the-box solution. The bytes crate comes close: it offers a "container for storing and operating on contiguous slices of memory", but the interface is not flexible enough.
You absolutely can use a Vec's spare capacity as a place to write into manually. That is why .set_len() exists. However, compress() must know that the given pointer points to uninitialized memory and thus is not allowed to read from it (unless it has written to it first), and you must guarantee that the returned length is the number of bytes initialized. I think these rules are roughly the same between Rust and C or C++ in this regard.
Writing this in Rust would look like this:
pub struct Buffer<T>(Vec<T>);

impl<T> Buffer<T> {
    pub fn new(len: usize) -> Self {
        Buffer(Vec::with_capacity(len))
    }

    /// SAFETY: `by` must be less than or equal to `space_len()` and the bytes at
    /// `space_ptr_mut()` to `space_ptr_mut() + by` must be initialized
    pub unsafe fn increment(&mut self, by: usize) {
        self.0.set_len(self.0.len() + by);
    }

    pub fn space_len(&self) -> usize {
        self.0.capacity() - self.0.len()
    }

    pub fn space_ptr_mut(&mut self) -> *mut T {
        unsafe { self.0.as_mut_ptr().add(self.0.len()) }
    }

    pub fn as_vec(&self) -> &Vec<T> {
        &self.0
    }
}
unsafe fn compress(_input: i32, ptr: *mut u8, len: usize) -> usize {
    // right now just writes 5 bytes if there's space for them
    let written = usize::min(5, len);
    for i in 0..written {
        ptr.add(i).write(0);
    }
    written
}

fn main() {
    let mut buf: Buffer<u8> = Buffer::new(1024);
    let some_input = 5i32;
    unsafe {
        let compressed_len: usize = compress(some_input, buf.space_ptr_mut(), buf.space_len());
        buf.increment(compressed_len);
    }
    let compressed_data = buf.as_vec();
    println!("{:?}", compressed_data);
}
You can see it on the playground. If you run it through Miri, you'll see it picks up no undefined behavior, but if you over-advertise how much you've written (say return written + 10) then it does produce an error that reading uninitialized memory was detected.
One of the reasons there isn't an out-of-the-box type for this is because Vec is that type:
fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(1024);
    let some_input = 5i32;
    let spare_capacity = buf.spare_capacity_mut();
    unsafe {
        let compressed_len: usize = compress(
            some_input,
            spare_capacity.as_mut_ptr().cast(),
            spare_capacity.len(),
        );
        buf.set_len(buf.len() + compressed_len);
    }
    println!("{:?}", buf);
}
Your Buffer type doesn't really add any convenience or safety, and a third-party crate can't do so either, because it all relies on the correctness of compress().
Is such a Buffer really the best way to do this?
Yes, this is pretty much the lowest-cost way to provide a buffer for writing. Looking at the generated release assembly, it is just one call to allocate and that's it. You can get tricky by using a special allocator, or simply pre-allocate and reuse allocations if you're doing this many times (but be sure to measure, since the built-in allocator will do this anyway, just more generally).
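For example, a reuse sketch (my illustration, reusing the unsafe compress stub from above): clearing the Vec keeps its allocation, so every iteration writes into the same buffer.

fn compress_many(inputs: &[i32]) -> Vec<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::with_capacity(1024);
    let mut outputs = Vec::new();
    for &input in inputs {
        buf.clear(); // resets len to 0 but keeps the 1024-byte allocation
        let spare = buf.spare_capacity_mut();
        unsafe {
            let n = compress(input, spare.as_mut_ptr().cast(), spare.len());
            buf.set_len(n);
        }
        outputs.push(buf.clone()); // copy the result out; `buf` is reused
    }
    outputs
}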

Are mutable static primitives actually `unsafe` if single-threaded?

I'm developing for a single-core embedded chip. In C and C++ it's common to statically define mutable values that can be used globally. The Rust equivalent is roughly this:
static mut MY_VALUE: usize = 0;

pub fn set_value(val: usize) {
    unsafe { MY_VALUE = val }
}

pub fn get_value() -> usize {
    unsafe { MY_VALUE }
}
Now anywhere can call the free functions get_value and set_value.
I think that this should be entirely safe in single-threaded embedded Rust, but I've not been able to find a definitive answer. I'm only interested in types that don't require allocation or destruction (like the primitive in the example here).
The only gotcha I can see is the compiler or processor reordering accesses in unexpected ways (which could be solved using the volatile access methods), but is that unsafe per se?
Edit:
The book suggests that this is safe so long as we can guarantee no multi-threaded data races (obviously the case here):
With mutable data that is globally accessible, it’s difficult to ensure there are no data races, which is why Rust considers mutable static variables to be unsafe.
The docs are phrased less definitively, suggesting that data races are only one way this can be unsafe, but not expanding on other examples:
accessing mutable statics can cause undefined behavior in a number of ways, for example due to data races in a multithreaded context
The nomicon suggests that this should be safe so long as you don't somehow dereference a bad pointer.
Be aware that there is no such thing as single-threaded code as long as interrupts are enabled. So even on microcontrollers, mutable statics are unsafe.
If you really can guarantee single-threaded access, your assumption is correct that accessing primitive types should be safe. That's why the Cell type exists: it allows mutation of primitive types, but it is not Sync (meaning it explicitly prevents threaded access).
That said, to create a safe static variable, it needs to implement Sync for exactly the reason mentioned above, which Cell doesn't do, for obvious reasons.
To actually have a mutable global variable with a primitive type without using an unsafe block, I personally would use an Atomic. Atomics do not allocate and are available in the core library, meaning they work on microcontrollers.
use core::sync::atomic::{AtomicUsize, Ordering};

static MY_VALUE: AtomicUsize = AtomicUsize::new(0);

pub fn set_value(val: usize) {
    MY_VALUE.store(val, Ordering::Relaxed)
}

pub fn get_value() -> usize {
    MY_VALUE.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", get_value());
    set_value(42);
    println!("{}", get_value());
}
Atomics with Relaxed are zero-overhead on almost all architectures.
In this case it's not unsound, but you should still avoid it because it is too easy to misuse it in a way that is UB.
Instead, use a wrapper around UnsafeCell that is Sync:
use core::cell::UnsafeCell;

pub struct SyncCell<T>(UnsafeCell<T>);

unsafe impl<T> Sync for SyncCell<T> {}

impl<T> SyncCell<T> {
    pub const fn new(v: T) -> Self {
        Self(UnsafeCell::new(v))
    }

    pub unsafe fn set(&self, v: T) {
        *self.0.get() = v;
    }
}

impl<T: Copy> SyncCell<T> {
    pub unsafe fn get(&self) -> T {
        *self.0.get()
    }
}
If you use nightly, you can use SyncUnsafeCell.
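For completeness, a nightly sketch of that route (my addition; the feature gate name is as of recent nightlies and may change):

#![feature(sync_unsafe_cell)]
use core::cell::SyncUnsafeCell;

static MY_VALUE: SyncUnsafeCell<usize> = SyncUnsafeCell::new(0);

pub fn set_value(val: usize) {
    // SAFETY: sound only while all access is single-threaded (no interrupts).
    unsafe { *MY_VALUE.get() = val }
}

pub fn get_value() -> usize {
    // SAFETY: as above.
    unsafe { *MY_VALUE.get() }
}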
Mutable statics are unsafe in general because they circumvent the normal borrow checker rules, which enforce that either exactly one mutable borrow exists or any number of immutable borrows exist (including zero). This lets you write code that causes undefined behavior. For instance, the following compiles and prints 2 2:
static mut COUNTER: i32 = 0;

fn main() {
    unsafe {
        let mut_ref1 = &mut COUNTER;
        let mut_ref2 = &mut COUNTER;
        *mut_ref1 += 1;
        *mut_ref2 += 1;
        println!("{mut_ref1} {mut_ref2}");
    }
}
However, we now have two mutable references to the same location in memory existing concurrently, which is UB.
I believe the code that you posted is safe, but I generally would not recommend using static mut. Use an atomic, SyncUnsafeCell/UnsafeCell, a wrapper around a Cell that implements Sync (which is safe since your environment is single-threaded; see the sketch below), or honestly just about anything else. static mut is wildly unsafe and its use is highly discouraged.
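Here is a minimal sketch of that Cell wrapper (the name SingleThreadCell is mine; the unsafe impl Sync is sound only under the question's single-core, no-interrupt guarantee):

use core::cell::Cell;

// SAFETY precondition: only valid on a single-threaded target where no
// interrupt handler touches the value, as the question stipulates.
pub struct SingleThreadCell<T>(Cell<T>);

unsafe impl<T> Sync for SingleThreadCell<T> {}

impl<T: Copy> SingleThreadCell<T> {
    pub const fn new(v: T) -> Self {
        Self(Cell::new(v))
    }
    pub fn set(&self, v: T) {
        self.0.set(v);
    }
    pub fn get(&self) -> T {
        self.0.get()
    }
}

static MY_VALUE: SingleThreadCell<usize> = SingleThreadCell::new(0);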
In order to sidestep the issue of exactly how mutable statics can be used safely in single-threaded code, another option is to use thread-local storage:
use std::cell::Cell;

thread_local!(static MY_VALUE: Cell<usize> = Cell::new(0));

pub fn set_value(val: usize) {
    MY_VALUE.with(|cell| cell.set(val))
}

pub fn get_value() -> usize {
    MY_VALUE.with(|cell| cell.get())
}

Is it undefined behavior to do runtime borrow management with the help of raw pointers in Rust?

As part of binding a C API to Rust, I have a mutable reference ph: &mut Ph, a struct EnsureValidContext<'a> { ph: &'a mut Ph }, and some methods:
impl Ph {
    pub fn print(&mut self, s: &str) {
        /* ... */
    }

    pub fn with_context<F, R>(&mut self, ctx: &Context, f: F) -> Result<R, InvalidContextError>
    where
        F: Fn(EnsureValidContext) -> R,
    {
        /* ... */
    }

    /* some others */
}

impl<'a> EnsureValidContext<'a> {
    pub fn print(&mut self, s: &str) {
        self.ph.print(s)
    }

    pub fn close(self) {}

    /* some others */
}
I don't control these; I can only use them.
Now, the closure API is nice if you want the compiler to force you to think about performance (and the tradeoffs you have to make between performance and the behaviour you want. Context validation is expensive). However, let's say you just don't care about that and want it to just work.
I was thinking of making a wrapper that handles it for you:
enum ValidPh<'a> {
    Ph(&'a mut Ph),
    Valid(*mut Ph, EnsureValidContext<'a>),
    Poisoned,
}

impl<'a> ValidPh<'a> {
    pub fn print(&mut self) {
        /* whatever the case, just call .print() on the inner object */
    }

    pub fn set_context(&mut self, ctx: &Context) {
        /* ... */
    }

    pub fn close(&mut self) {
        /* ... */
    }

    /* some others */
}
This would work by checking, whenever necessary, whether we're a Ph or a Valid; if we're a Ph, we upgrade to a Valid:
fn upgrade(&mut self) {
    if let Ph(_) = self { // don't call mem::replace unless we need to
        if let Ph(ph) = mem::replace(self, Poisoned) {
            let ptr = ph as *mut _;
            let evc = ph.with_context(ph.get_context(), |evc| evc);
            *self = Valid(ptr, evc);
        }
    }
}
Downgrading is different for each method, as it has to call the target method, but here's close as an example:
pub fn close(&mut self) {
    if let Valid(_, _) = self {
        /* ok */
    } else {
        self.upgrade()
    }
    if let Valid(ptr, evc) = mem::replace(self, Poisoned) {
        evc.close(); // consume the evc, dropping the borrow.
        // We can now use our original borrow, but since we don't have it
        // anymore, bring it back using our trusty ptr.
        *self = unsafe { Ph(&mut *ptr) };
    } else {
        // this can only happen due to a bug in our code
        unreachable!();
    }
}
You get to use a ValidPh like:
/* given a &mut vph */
vph.print("hello world!");
if vph.set_context(ctx) {
    vph.print("closing existing context");
    vph.close();
}
vph.print("opening new context");
vph.open("context_name");
vph.print("printing in new context");
Without vph, you'd have to juggle &mut Ph and EnsureValidContext around on your own. While the Rust compiler makes this trivial (just follow the errors), you may want to let the library handle it automatically for you. Otherwise you might end up just calling the very expensive with_context for every operation, regardless of whether the operation can invalidate the context or not.
Note that this code is rough pseudocode. I haven't compiled or tested it yet.
One might argue I need an UnsafeCell or a RefCell or some other Cell. However, from reading this it appears UnsafeCell is only a lang item because of interior mutability — it's only necessary if you're mutating state through an &T, while in this case I have &mut T all the way.
However, my reading may be flawed. Does this code invoke UB?
(Full code of Ph and EnsureValidContext, including FFI bits, available here.)
Taking a step back, the guarantees upheld by Rust are:
&T is a reference to T which is potentially aliased,
&mut T is a reference to T which is guaranteed not to be aliased.
The crux of the question therefore is: what does guaranteed not to be aliased mean?
Let's consider a safe Rust sample:
struct Foo(u32);

impl Foo {
    fn foo(&mut self) { self.bar(); }
    fn bar(&mut self) { self.0 += 1; }
}

fn main() { Foo(0).foo(); }
If we take a peek at the stack when Foo::bar is being executed, we'll see at least two pointers to Foo: one in bar and one in foo, and there may be further copies on the stack or in other registers.
So, clearly, there are aliases in existence. How come? It's guaranteed NOT to be aliased!
Take a deep breath: how many of those aliases can you access at the same time?
Only one. The guarantee of no aliasing is not spatial but temporal.
I would think, therefore, that at any point in time, if a &mut T is accessible, then no other reference to this instance must be accessible.
Having a raw pointer (*mut T) is perfectly fine; it requires unsafe to access. However, forming a second reference may or may not be safe, even without using it, so I would avoid it.
Rust's memory model is not rigorously defined yet, so it's hard to say for sure, but I believe it's not undefined behavior to:
carry a *mut Ph around while a &'a mut Ph is also reachable from another path, so long as you don't dereference the *mut Ph, even just for reading, and don't convert it to a &Ph or &mut Ph, because mutable references grant exclusive access to the pointee.
cast the *mut Ph back to a &'a mut Ph once the other &'a mut Ph falls out of scope.
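To illustrate the temporal rule, here is a minimal sketch of my own (not from the answer above); Miri accepts it under Stacked Borrows, assuming the raw pointer is never used again once the reference it was derived from is touched:

fn main() {
    let mut x = 42u32;

    let r: &mut u32 = &mut x;
    let ptr: *mut u32 = r; // raw pointer derived from the &mut

    // The raw pointer may be used while `r` is set aside...
    unsafe { *ptr += 1; }

    // ...but once `r` is used again, `ptr` must be considered dead:
    *r += 1;
    // unsafe { *ptr += 1; } // would be flagged by Miri (Stacked Borrows)

    println!("{x}"); // prints 44
}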

How do I shuffle a VecDeque?

I can shuffle a regular vector quite simply like this:
extern crate rand;
use rand::Rng;
fn shuffle(coll: &mut Vec<i32>) {
rand::thread_rng().shuffle(coll);
}
The problem is, my code now requires the use of a std::collections::VecDeque instead, which causes this code to not compile.
What's the simplest way of getting around this?
As of Rust 1.48, VecDeque supports the make_contiguous() method. That method doesn't allocate and has O(n) complexity, like shuffling itself. You can therefore shuffle a VecDeque by calling make_contiguous() and then shuffling the returned slice:
use rand::prelude::*;
use std::collections::VecDeque;

pub fn shuffle<T>(v: &mut VecDeque<T>, rng: &mut impl Rng) {
    v.make_contiguous().shuffle(rng);
}
Playground
Historical answer follows below.
Unfortunately, the rand::Rng::shuffle method is defined to shuffle slices. Due to its own complexity constraints, a VecDeque cannot store its elements in a single contiguous slice, so shuffle can never be directly invoked on a VecDeque.
The real requirements of the values argument to the shuffle algorithm are a finite sequence length, O(1) element access, and the ability to swap elements, all of which VecDeque fulfills. It would be nice if there were a trait that incorporated these, so that values could be generic over it, but there isn't one.
With the current library, you have two options:
Use Vec::from(deque) to move the contents of the VecDeque into a temporary Vec, shuffle the vector, and return the contents back to the VecDeque (a sketch of this follows the list). The complexity of the operation will remain O(n), but it may require a potentially large and costly heap allocation for the temporary vector.
Implement the shuffle on VecDeque yourself. The Fisher-Yates shuffle used by rand::Rng is well understood and easy to implement. While in theory the standard library could switch to a different shuffle algorithm, that is unlikely to happen in practice.
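A sketch of the first option (my code, assuming rand 0.8's SliceRandom and the std From conversions between Vec and VecDeque):

use rand::seq::SliceRandom;
use std::collections::VecDeque;

fn shuffle_via_vec<T>(deque: &mut VecDeque<T>, rng: &mut impl rand::Rng) {
    // Move the elements into a Vec, shuffle the slice, and move them back.
    let mut v: Vec<T> = std::mem::take(deque).into();
    v.shuffle(rng);
    *deque = v.into();
}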
A generic form of the second option, using a trait to express the len-and-swap requirement, and taking the code of rand::Rng::shuffle, could look like this:
use std::collections::VecDeque;

// Real requirement for shuffle
trait LenAndSwap {
    fn len(&self) -> usize;
    fn swap(&mut self, i: usize, j: usize);
}

// A copy of an earlier version of rand::Rng::shuffle, with the signature
// modified to accept any type that implements LenAndSwap
fn shuffle(values: &mut impl LenAndSwap, rng: &mut impl rand::Rng) {
    let mut i = values.len();
    while i >= 2 {
        // invariant: elements with index >= i have been locked in place.
        i -= 1;
        // lock element i in place.
        values.swap(i, rng.gen_range(0..=i));
    }
}

// VecDeque trivially fulfills the LenAndSwap requirement, but
// we have to spell it out.
impl<T> LenAndSwap for VecDeque<T> {
    fn len(&self) -> usize {
        self.len()
    }
    fn swap(&mut self, i: usize, j: usize) {
        self.swap(i, j)
    }
}

fn main() {
    let mut v: VecDeque<u64> = [1, 2, 3, 4].into_iter().collect();
    shuffle(&mut v, &mut rand::thread_rng());
    println!("{:?}", v);
}
You can use make_contiguous (documentation) to create a mutable slice that you can then shuffle:
use rand::prelude::*;
use std::collections::VecDeque;

fn main() {
    let mut deque = VecDeque::new();
    for p in 0..10 {
        deque.push_back(p);
    }

    deque.make_contiguous().shuffle(&mut rand::thread_rng());

    println!("Random deque: {:?}", deque)
}
Playground Link if you want to try it out online.
Shuffle the components of the VecDeque separately, starting with VecDeque::as_mut_slices:
use rand::seq::SliceRandom; // 0.6.5
use std::collections::VecDeque;

fn shuffle(coll: &mut VecDeque<i32>) {
    let mut rng = rand::thread_rng();
    let (a, b) = coll.as_mut_slices();
    a.shuffle(&mut rng);
    b.shuffle(&mut rng);
}
As Lukas Kalbertodt points out, this solution never swaps elements between the two slices, so a certain amount of randomization will not happen. Depending on your randomization needs, this may be unnoticeable or a deal breaker.

Is there any safe way to ensure an arbitrary drop happens before some expensive computation?

I've found that mem::drop does not necessarily run near where it gets called, which likely results in Mutex or RwLock guards being held during expensive computations. How can I control when drop gets called?
As a simple example, I've made the following test for a zeroing drop of cryptographic material work by using unsafe { ::std::intrinsics::drop_in_place(&mut s); } instead of simply ::std::mem::drop(s).
#[derive(Debug, Default)]
pub struct Secret<T>(pub T);

impl<T> Drop for Secret<T> {
    fn drop(&mut self) {
        unsafe { ::std::intrinsics::volatile_set_memory::<Secret<T>>(self, 0, 1); }
    }
}

#[derive(Debug, Default)]
pub struct AnotherSecret(pub [u8; 32]);

impl Drop for AnotherSecret {
    fn drop(&mut self) {
        unsafe { ::std::ptr::write_volatile::<AnotherSecret>(self, AnotherSecret([0u8; 32])); }
        assert_eq!(self.0, [0u8; 32]);
    }
}
#[cfg(test)]
mod tests {
    macro_rules! zeroing_drop_test {
        ($n:path) => {
            let p: *const $n;
            {
                let mut s = $n([3u8; 32]);
                p = &s;
                unsafe { ::std::intrinsics::drop_in_place(&mut s); }
            }
            unsafe { assert_eq!((*p).0, [0u8; 32]); }
        }
    }

    #[test]
    fn zeroing_drops() {
        zeroing_drop_test!(super::Secret<[u8; 32]>);
        zeroing_drop_test!(super::AnotherSecret);
    }
}
This test fails if I use ::std::mem::drop(s) or even:
#[inline(never)]
pub fn drop_now<T>(_x: T) { }
It's obviously fine to use drop_in_place for a test that a buffer gets zeroed, but I'd worry that calling drop_in_place on a Mutex or RwLock guard might result in use after free.
These two guards could maybe be handled with this approach:
#[inline(never)]
pub fn drop_now<T>(mut t: T) {
    unsafe { ::std::intrinsics::drop_in_place(&mut t); }
    unsafe { ::std::intrinsics::volatile_set_memory::<T>(&mut t, 0, 1); }
    ::std::mem::forget(t); // prevent the implicit second drop at end of scope
}
Answer from https://github.com/rust-lang/rfcs/issues/1850:
In debug mode, any call to ::std::mem::drop(s) physically moves s on the stack, so p points to an old copy that does not get erased, while unsafe { ::std::intrinsics::drop_in_place(&mut s); } works because it does not move s.
In general, there is no good way to prevent LLVM from moving values around on the stack, or to zero them after moving them, so you must never put cryptographically sensitive data on the stack. Instead you must Box any sensitive data, like, say:
#[derive(Debug, Default)]
pub struct AnotherSecret(Box<[u8; 32]>);

impl Drop for AnotherSecret {
    fn drop(&mut self) {
        *self.0 = [0u8; 32];
    }
}
There should be no problem with Mutex or RwLock because they can safely leave residue on the stack when they are dropped.
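As for the original worry about guards being held during expensive computations: an explicit drop(guard) is guaranteed to run the destructor, and thus release the lock, at exactly that point. A minimal sketch of mine, not from the answer:

use std::sync::Mutex;

fn process(shared: &Mutex<Vec<f32>>) {
    let guard = shared.lock().unwrap();
    let snapshot = guard.clone(); // copy the data out while holding the lock
    drop(guard); // the mutex is unlocked right here, by language guarantee

    // the expensive part runs without holding the lock
    let _sum: f32 = snapshot.iter().map(|x| x * x).sum();
}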
Yes: side effects.
Optimizers in general, and LLVM in particular, operate under the as-if rule: you build a program which has a specific observable behavior, and the optimizer is given free rein to produce whatever binary it wants as long as it has the very same observable behavior.
Note that the burden of proof is on the compiler. That is, when calling an opaque function (defined in another library, for example) the compiler has to assume it may have side effects. Furthermore, side effects cannot be re-ordered, as this could change the observable behavior.
In the case of Mutex, for example, acquiring and releasing the Mutex is generally opaque for the compiler (it requires an OS call), so it is seen as a side effect. I would expect compilers not to fiddle with those.
On the other hand, your Secret is a tricky case: most of the time there is no side effect in dropping the secret (zeroing out to-be-released memory is a dead write, to be optimized out), which is why you need to go out of your way to ensure it occurs... by convincing the compiler that there are side effects, using a volatile write.
