Ensuring value lives for its entire scope - rust

I have a type (specifically CFData from core-foundation), whose memory is managed by C APIs and that I need to pass and receive from C functions as a *c_void. For simplicity, consider the following struct:
struct Data {
ptr: *mut ffi::c_void,
}
impl Data {
pub fn new() -> Self {
// allocate memory via an unsafe C-API
Self {
ptr: std::ptr::null(), // Just so this compiles.
}
}
pub fn to_raw(&self) -> *const ffi::c_void {
self.ptr
}
}
impl Drop for Data {
fn drop(&mut self) {
unsafe {
// Free memory via a C-API
}
}
}
Its interface is safe, including to_raw(), since it only returns a raw pointer. It doesn't dereference it. And the caller doesn't dereference it. It's just used in a callback.
pub extern "C" fn called_from_C_ok(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// Do things to compute Data.
// This is actually async code, which is why there's a completion handler.
on_complete(data.to_raw()); // data survives through the function call
}
This is fine. Data is safe to manipulate, and (I strongly believe) Rust promises that data will live until the end of the on_complete function call.
On the other hand, this is not ok:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
In my code, I made this mistake and it started crashing. It's easy to see why and it's easy to fix. But it's subtle, and it's easy for a future developer (me) to modify the ok version into the broken version without realizing the problem (and it may not always crash).
What I'd like to do is to ensure data lives as long as ptr. In Swift, I'd do this with:
withExtendedLifetime(&data) { data in
// ...data cannot be dropped until this closure ends...
}
Is there a similar construct in Rust that explicitly marks the minimum lifetime for a variable to a scope (that the optimizer may not reorder), even if it's not directly accessed? (I'm sure it's trivial to build a custom with_extended_lifetime in Rust, but I'm looking for a more standard solution so that it will be obvious to other developers what's going on).
Playground
I do believe the following "works" but I'm not sure how flexible it is, or if it's just replacing a more standard solution:
fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
F: Fn(&T) -> U,
{
f(value)
}
with_extended_lifetime(&data, |data| {
let ptr = data.to_raw();
on_complete(ptr)
});

The optimizer is not allowed to change when a value is dropped. If you assign a value to a variable (and that value is not then moved elsewhere or overwritten by assignment), it will always be dropped at the end of the block, not earlier.
You say that this code is incorrect:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
but in fact data may not be dropped at that point, and this code is sound. What you may be confusing this with is the mistake of not assigning the value to a variable:
let ptr = Data::new().to_raw();
on_complete(ptr);
In this case, the pointer is dangling, because the result of Data::new() is stored in a temporary variable within the statement, which is dropped at the end of the statement, not a local variable, which is dropped at the end of the block.
If you want to adopt a programming style which makes explicit when values are dropped, the usual pattern is to use the standard drop() function to mark the exact time of drop:
let data = Data::new();
...
on_complete(data.to_raw());
drop(data); // We have stopped using the raw pointer now
(Note that drop() is not magic: it is simply a function which takes one argument and does nothing with it other than dropping. Its presence in the standard library is to give a name to this pattern, not to provide special functionality.)
However, if you want to, there isn't anything wrong with using your with_extended_lifetime (other than nonstandard style and arguably a misleading name) if you want to make the code structure even more strongly indicate the scope of the value. One nitpick: the function parameter should be FnOnce, not Fn, for maximum generality (this allows it to be passed functions that can't be called more than once).

Other than explicitly dropping as the other answer mentions, there is another way to help prevent these types of accidental drops: use a wrapper around a raw pointer that has lifetime information.
use std::marker::PhantomData;
#[repr(transparent)]
struct PtrWithLifetime<'a>{
ptr: *mut ffi::c_void,
_data: PhantomData<&'a ffi::c_void>,
}
impl Data {
fn to_raw(&self) -> PtrWithLife<'_>{
PtrWithLifetime{
ptr: self.ptr,
_data: PhantomData,
}
}
}
The #[repr(transparent)] guarantees that PtrWithLife is stored in memory the same as *const ffi::c_void is, so you can adjust the declaration of on_complete to
fn called_from_c(on_complete: extern "C" fn(PtrWithLifetime<'_>)){
//stuff
}
without causing any major inconvenience to any downstream users, especially since the ffi bindings can be adjusted in a similar fashion.

Related

Transferring ownership between enum variants [duplicate]

I'm tring to replace a value in a mutable borrow; moving part of it into the new value:
enum Foo<T> {
Bar(T),
Baz(T),
}
impl<T> Foo<T> {
fn switch(&mut self) {
*self = match self {
&mut Foo::Bar(val) => Foo::Baz(val),
&mut Foo::Baz(val) => Foo::Bar(val),
}
}
}
The code above doesn't work, and understandibly so, moving the value out of self breaks the integrity of it. But since that value is dropped immediately afterwards, I (if not the compiler) could guarantee it's safety.
Is there some way to achieve this? I feel like this is a job for unsafe code, but I'm not sure how that would work.
mem:uninitialized has been deprecated since Rust 1.39, replaced by MaybeUninit.
However, uninitialized data is not required here. Instead, you can use ptr::read to get the data referred to by self.
At this point, tmp has ownership of the data in the enum, but if we were to drop self, that data would attempt to be read by the destructor, causing memory unsafety.
We then perform our transformation and put the value back, restoring the safety of the type.
use std::ptr;
enum Foo<T> {
Bar(T),
Baz(T),
}
impl<T> Foo<T> {
fn switch(&mut self) {
// I copied this code from Stack Overflow without reading
// the surrounding text that explains why this is safe.
unsafe {
let tmp = ptr::read(self);
// Must not panic before we get to `ptr::write`
let new = match tmp {
Foo::Bar(val) => Foo::Baz(val),
Foo::Baz(val) => Foo::Bar(val),
};
ptr::write(self, new);
}
}
}
More advanced versions of this code would prevent a panic from bubbling out of this code and instead cause the program to abort.
See also:
replace_with, a crate that wraps this logic up.
take_mut, a crate that wraps this logic up.
Change enum variant while moving the field to the new variant
How can I swap in a new value for a field in a mutable reference to a structure?
The code above doesn't work, and understandibly so, moving the value
out of self breaks the integrity of it.
This is not exactly what happens here. For example, same thing with self would work nicely:
impl<T> Foo<T> {
fn switch(self) {
self = match self {
Foo::Bar(val) => Foo::Baz(val),
Foo::Baz(val) => Foo::Bar(val),
}
}
}
Rust is absolutely fine with partial and total moves. The problem here is that you do not own the value you're trying to move - you only have a mutable borrowed reference. You cannot move out of any reference, including mutable ones.
This is in fact one of the frequently requested features - a special kind of reference which would allow moving out of it. It would allow several kinds of useful patterns. You can find more here and here.
In the meantime for some cases you can use std::mem::replace and std::mem::swap. These functions allow you to "take" a value out of mutable reference, provided you give something in exchange.
Okay, I figured out how to do it with a bit of unsafeness and std::mem.
I replace self with an uninitialized temporary value. Since I now "own" what used to be self, I can safely move the value out of it and replace it:
use std::mem;
enum Foo<T> {
Bar(T),
Baz(T),
}
impl<T> Foo<T> {
fn switch(&mut self) {
// This is safe since we will overwrite it without ever reading it.
let tmp = mem::replace(self, unsafe { mem::uninitialized() });
// We absolutely must **never** panic while the uninitialized value is around!
let new = match tmp {
Foo::Bar(val) => Foo::Baz(val),
Foo::Baz(val) => Foo::Bar(val),
};
let uninitialized = mem::replace(self, new);
mem::forget(uninitialized);
}
}
fn main() {}

Is it safe to `Send` struct containing `Rc` if strong_count is 1 and weak_count is 0?

I have a struct that is not Send because it contains Rc. Lets say that Arc has too big overhead, so I want to keep using Rc. I would still like to occasionally Send this struct between threads, but only when I can verify that the Rc has strong_count 1 and weak_count 0.
Here is (hopefully safe) abstraction that I have in mind:
mod my_struct {
use std::rc::Rc;
#[derive(Debug)]
pub struct MyStruct {
reference_counted: Rc<String>,
// more fields...
}
impl MyStruct {
pub fn new() -> Self {
MyStruct {
reference_counted: Rc::new("test".to_string())
}
}
pub fn pack_for_sending(self) -> Result<Sendable, Self> {
if Rc::strong_count(&self.reference_counted) == 1 &&
Rc::weak_count(&self.reference_counted) == 0
{
Ok(Sendable(self))
} else {
Err(self)
}
}
// There are more methods, some may clone the `Rc`!
}
/// `Send`able wrapper for `MyStruct` that does not allow you to access it,
/// only unpack it.
pub struct Sendable(MyStruct);
// Safety: `MyStruct` is not `Send` because of `Rc`. `Sendable` can be
// only created when the `Rc` has strong count 1 and weak count 0.
unsafe impl Send for Sendable {}
impl Sendable {
/// Retrieve the inner `MyStruct`, making it not-sendable again.
pub fn unpack(self) -> MyStruct {
self.0
}
}
}
use crate::my_struct::MyStruct;
fn main() {
let handle = std::thread::spawn(|| {
let my_struct = MyStruct::new();
dbg!(&my_struct);
// Do something with `my_struct`, but at the end the inner `Rc` should
// not be shared with anybody.
my_struct.pack_for_sending().expect("Some Rc was still shared!")
});
let my_struct = handle.join().unwrap().unpack();
dbg!(&my_struct);
}
I did a demo on the Rust playground.
It works. My question is, is it actually safe?
I know that the Rc is owned only by a single onwer and nobody can change that under my hands, because it can't be accessed by other threads and we wrap it into Sendable which does not allow access to the contained value.
But in some crazy world Rc could for example internally use thread local storage and this would not be safe... So is there some guarantee that I can do this?
I know that I must be extremely careful to not introduce some additional reason for the MyStruct to not be Send.
No.
There are multiple points that need to be verified to be able to send Rc across threads:
There can be no other handle (Rc or Weak) sharing ownership.
The content of Rc must be Send.
The implementation of Rc must use a thread-safe strategy.
Let's review them in order!
Guaranteeing the absence of aliasing
While your algorithm -- checking the counts yourself -- works for now, it would be better to simply ask Rc whether it is aliased or not.
fn is_aliased<T>(t: &mut Rc<T>) -> bool { Rc::get_mut(t).is_some() }
The implementation of get_mut will be adjusted should the implementation of Rc change in ways you have not foreseen.
Sendable content
While your implementation of MyStruct currently puts String (which is Send) into Rc, it could tomorrow change to Rc<str>, and then all bets are off.
Therefore, the sendable check needs to be implemented at the Rc level itself, otherwise you need to audit any change to whatever Rc holds.
fn sendable<T: Send>(mut t: Rc<T>) -> Result<Rc<T>, ...> {
if !is_aliased(&mut t) {
Ok(t)
} else {
...
}
}
Thread-safe Rc internals
And that... cannot be guaranteed.
Since Rc is not Send, its implementation can be optimized in a variety of ways:
The entire memory could be allocated using a thread-local arena.
The counters could be allocated using a thread-local arena, separately, so as to seamlessly convert to/from Box.
...
This is not the case at the moment, AFAIK, however the API allows it, so the next release could definitely take advantage of this.
What should you do?
You could make pack_for_sending unsafe, and dutifully document all assumptions that are counted on -- I suggest using get_mut to remove one of them. Then, on each new release of Rust, you'd have to double-check each assumption to ensure that your usage if still safe.
Or, if you do not mind making an allocation, you could write a conversion to Arc<T> yourself (see Playground):
fn into_arc<T>(this: Rc<T>) -> Result<Arc<T>, Rc<T>> {
Rc::try_unwrap(this).map(|t| Arc::new(t))
}
Or, you could write a RFC proposing a Rc <-> Arc conversion!
The API would be:
fn Rc<T: Send>::into_arc(this: Self) -> Result<Arc<T>, Rc<T>>
fn Arc<T>::into_rc(this: Self) -> Result<Rc<T>, Arc<T>>
This could be made very efficiently inside std, and could be of use to others.
Then, you'd convert from MyStruct to MySendableStruct, just moving the fields and converting Rc to Arc as you go, send to another thread, then convert back to MyStruct.
And you would not need any unsafe...
The only difference between Arc and Rc is that Arc uses atomic counters. The counters are only accessed when the pointer is cloned or dropped, so the difference between the two is negligible in applications which just share pointers between long-lived threads.
If you have never cloned the Rc, it is safe to send between threads. However, if you can guarantee that the pointer is unique then you can make the same guarantee about a raw value, without using a smart pointer at all!
This all seems quite fragile, for little benefit; future changes to the code might not meet your assumptions, and you will end up with Undefined Behaviour. I suggest that you at least try making some benchmarks with Arc. Only consider approaches like this when you measure a performance problem.
You might also consider using the archery crate, which provides a reference-counted pointer that abstracts over atomicity.

Why is the destructor not called for Box::from_raw()?

I am passing a raw pointer to two different closures and converting the raw pointer to a reference using Box::from_raw() and the program is working fine.
However, after converting the raw pointer to reference, the destructor should be called automatically as the documentation says:
This function is unsafe because improper use may lead to memory problems. For example, a double-free may occur if the function is called twice on the same raw pointer.
However, I am able to access the reference to ABC even after calling Box::from_raw() on raw pointer twice and it's working fine.
struct ABC {}
impl ABC {
pub fn new() -> ABC {
ABC {}
}
pub fn print(&self, x: u32) {
println!("Inside handle {}", x);
}
}
fn main() {
let obj = ABC::new();
let const_obj: *const ABC = &obj;
let handle = |x| {
let abc = unsafe { Box::from_raw(const_obj as *mut ABC) };
abc.print(x);
};
handle(1);
let handle1 = |x| {
let abc = unsafe { Box::from_raw(const_obj as *mut ABC) };
abc.print(x);
};
handle1(2);
}
Rust Playground
Why is the destructor is not called for ABC after handle and before handle1 as the description for Box::from_raw() function specifies:
Specifically, the Box destructor will call the destructor of T and free the allocated memory.
Why is Box::from_raw() working multiple times on a raw pointer?
TL;DR you are doing it wrong.
converting the raw pointer to a reference
No, you are converting it into a Box, not a reference.
the program is working fine
It is not. You are merely being "lucky" that memory unsafety and undefined behavior isn't triggering a crash. This is likely because your type has no actual data.
to reference, the destructor should be called automatically
No, when references go out of scope, the destructor is not executed.
Why is the destructor is not called
It is, which is one of multiple reasons your code is completely and totally broken and unsafe.
Add code to be run during destruction:
impl Drop for ABC {
fn drop(&mut self) {
println!("drop")
}
}
And you will see it is called 3 times:
Inside handle 1
drop
Inside handle 2
drop
drop
I am able to access the reference to ABC
Yes, which is unsafe. You are breaking the rules that you are supposed to be upholding when writing unsafe code. You've taken a raw pointer, done something to make it invalid, and are then accessing the original, now invalid variable.
The documentation also states:
the only valid pointer to pass to this function is the one taken from another Box via the Box::into_raw function.
You are ignoring this aspect as well.

What are the use cases of the newly proposed Pin type?

There is a new Pin type in unstable Rust and the RFC is already merged. It is said to be kind of a game changer when it comes to passing references, but I am not sure how and when one should use it.
Can anyone explain it in layman's terms?
What is pinning ?
In programming, pinning X means instructing X not to move.
For example:
Pinning a thread to a CPU core, to ensure it always executes on the same CPU,
Pinning an object in memory, to prevent a Garbage Collector to move it (in C# for example).
What is the Pin type about?
The Pin type's purpose is to pin an object in memory.
It enables taking the address of an object and having a guarantee that this address will remain valid for as long as the instance of Pin is alive.
What are the usecases?
The primary usecase, for which it was developed, is supporting Generators.
The idea of generators is to write a simple function, with yield, and have the compiler automatically translate this function into a state machine. The state that the generator carries around is the "stack" variables that need to be preserved from one invocation to another.
The key difficulty of Generators that Pin is designed to fix is that Generators may end up storing a reference to one of their own data members (after all, you can create references to stack values) or a reference to an object ultimately owned by their own data members (for example, a &T obtained from a Box<T>).
This is a subcase of self-referential structs, which until now required custom libraries (and lots of unsafe). The problem of self-referential structs is that if the struct move, the reference it contains still points to the old memory.
Pin apparently solves this years-old issue of Rust. As a library type. It creates the extra guarantee that as long as Pin exist the pinned value cannot be moved.
The usage, therefore, is to first create the struct you need, return it/move it at will, and then when you are satisfied with its place in memory initialize the pinned references.
One of the possible uses of the Pin type is self-referencing objects; an article by ralfj provides an example of a SelfReferential struct which would be very complicated without it:
use std::ptr;
use std::pin::Pin;
use std::marker::PhantomPinned;
struct SelfReferential {
data: i32,
self_ref: *const i32,
_pin: PhantomPinned,
}
impl SelfReferential {
fn new() -> SelfReferential {
SelfReferential { data: 42, self_ref: ptr::null(), _pin: PhantomPinned }
}
fn init(self: Pin<&mut Self>) {
let this : &mut Self = unsafe { self.get_unchecked_mut() };
// Set up self_ref to point to this.data.
this.self_ref = &mut this.data as *const i32;
}
fn read_ref(self: Pin<&Self>) -> Option<i32> {
let this : &Self= self.get_ref();
// Dereference self_ref if it is non-NULL.
if this.self_ref == ptr::null() {
None
} else {
Some(unsafe { *this.self_ref })
}
}
}
fn main() {
let mut data: Pin<Box<SelfReferential>> = Box::new(SelfReferential::new()).into();
data.as_mut().init();
println!("{:?}", data.as_ref().read_ref()); // prints Some(42)
}

How to make a struct where one of the fields refers to another field

I have the following problem: I have a have a data structure that is parsed from a buffer and contains some references into this buffer, so the parsing function looks something like
fn parse_bar<'a>(buf: &'a [u8]) -> Bar<'a>
So far, so good. However, to avoid certain lifetime issues I'd like to put the data structure and the underlying buffer into a struct as follows:
struct BarWithBuf<'a> {bar: Bar<'a>, buf: Box<[u8]>}
// not even sure if these lifetime annotations here make sense,
// but it won't compile unless I add some lifetime to Bar
However, now I don't know how to actually construct a BarWithBuf value.
fn make_bar_with_buf<'a>(buf: Box<[u8]>) -> BarWithBuf<'a> {
let my_bar = parse_bar(&*buf);
BarWithBuf {buf: buf, bar: my_bar}
}
doesn't work, since buf is moved in the construction of the BarWithBuf value, but we borrowed it for parsing.
I feel like it should be possible to do something along the lines of
fn make_bar_with_buf<'a>(buf: Box<[u8]>) -> BarWithBuf<'a> {
let mut bwb = BarWithBuf {buf: buf};
bwb.bar = parse_bar(&*bwb.buf);
bwb
}
to avoid moving the buffer after parsing the Bar, but I can't do that because the whole BarWithBuf struct has to be initalised in one go.
Now I suspect that I could use unsafe code to partially construct the struct, but I'd rather not do that.
What would be the best way to solve this problem? Do I need unsafe code? If I do, would it be safe do to this here? Or am I completely on the wrong track here and there is a better way to tie a data structure and its underlying buffer together?
I think you're right in that it's not possible to do this without unsafe code. I would consider the following two options:
Change the reference in Bar to an index. The contents of the box won't be protected by a borrow, so the index might become invalid if you're not careful. However, an index might convey the meaning of the reference in a clearer way.
Move Box<[u8]> into Bar, and add a function buf() -> &[u8] to the implementation of Bar; instead of references, store indices in Bar. Now Bar is the owner of the buffer, so it can control its modification and keep the indices valid (thereby avoiding the problem of option #1).
As per DK's suggestion below, store indices in BarWithBuf (or in a helper struct BarInternal) and add a function fn bar(&self) -> Bar to the implementation of BarWithBuf, which constructs a Bar on-the-fly.
Which of these options is the most appropriate one depends on the actual problem context. I agree that some form of "member-by-member construction" of structs would be immensely helpful in Rust.
Here's an approach that will work through a little bit of unsafe code. This approach requires that you are okay with putting the referred-to thing (here, your [u8]) on the heap, so it won't work for direct reference of a sibling field.
Let's start with a toy Bar<'a> implementation:
struct Bar<'a> {
refs: Vec<&'a [u8]>,
}
impl<'a> Bar<'a> {
pub fn parse(src: &'a [u8]) -> Self {
// placeholder for actually parsing goes here
Self { refs: vec![src] }
}
}
We'll make BarWithBuf that uses a Bar<'static>, as 'static is the only lifetime with an an accessible name. The buffer we store things in can be anything that doesn't move the target data around on us. I'm going to go with a Vec<u8>, but Box, Pin, whatever will work fine.
struct BarWithBuf {
buf: Vec<u8>,
bar: Bar<'static>,
}
The implementation requires a tiny bit of unsafe code.
impl BarWithBuf {
pub fn new(buf: Vec<u8>) -> Self {
// The `&'static [u8]` is inferred, but writing it here for demo
let buf_slice: &'static [u8] = unsafe {
// Going through a pointer is a "good" way to get around lifetime checks
std::slice::from_raw_parts(&buf[0], buf.len())
};
let bar = Bar::parse(buf_slice);
Self { buf, bar }
}
/// Access to Bar should always come through this function.
pub fn bar(&self) -> &Bar {
&self.bar
}
}
The BarWithBuf::bar is an important function to re-associate the proper lifetimes to the references. Rust's lifetime elision rules make the function equivalent to pub fn bar<'a>(&'a self) -> &'a Bar<'a>, which turns out to be exactly what we want. The lifetime of the slices in BarWithBuf::bar::refs are tied to the lifetime of BarWithBuf.
WARNING: You have to be very careful with your implementation here. You cannot make #[derive(Clone)] for BarWithBuf, since the default clone implementation will clone buf, but the elements of bar.refs will still point to the original. It is only one line of unsafe code, but the safety is still off in the "safe" bits.
For larger bits of self-referencing structures, there's the ouroboros crate, which wraps up a lot of unsafe bits for you. The techniques are similar to the one I described above, but they live behind macros, which is a more pleasant experience if you find yourself making a number of self-references.

Resources