Forcing the order in which struct fields are dropped - rust

I'm implementing an object that owns several resources created from C libraries through FFI. In order to clean up what's already been done if the constructor panics, I'm wrapping each resource in its own struct and implementing Drop for them. However, when it comes to dropping the object itself, I cannot guarantee that resources will be dropped in a safe order because Rust doesn't define the order that a struct's fields are dropped.
Normally, you would solve this by making it so the object doesn't own the resources but rather borrows them (so that the resources may borrow each other). In effect, this pushes the problem up to the calling code, where the drop order is well defined and enforced with the semantics of borrowing. But this is inappropriate for my use case and in general a bit of a cop-out.
What's infuriating is that this would be incredibly easy if drop took self by value instead of &mut self: I could simply call std::mem::drop on each field in my desired order.
Is there any way to do this? If not, is there any way to clean up in the event of a constructor panic without manually catching and repanicking?

You can specify drop order of your struct fields in two ways:
Implicitly
I wrote RFC 1857 specifying drop order and it was merged 2017/07/03! According to the RFC, struct fields are dropped in the same order as they are declared.
You can check this by running the example below:

    struct PrintDrop(&'static str);

    impl Drop for PrintDrop {
        fn drop(&mut self) {
            println!("Dropping {}", self.0)
        }
    }

    struct Foo {
        x: PrintDrop,
        y: PrintDrop,
        z: PrintDrop,
    }

    fn main() {
        let foo = Foo {
            x: PrintDrop("x"),
            y: PrintDrop("y"),
            z: PrintDrop("z"),
        };
    }
The output should be:
Dropping x
Dropping y
Dropping z
Explicitly
RFC 1860 introduces the ManuallyDrop type, which wraps another type and disables its destructor. The idea is that you drop the object manually by calling a special function (ManuallyDrop::drop). This function is unsafe: after the call, the wrapped value is in a dropped state, and using it or dropping it again is undefined behavior.
You can use ManuallyDrop to explicitly specify the drop order of your fields in the destructor of your type:
    // ManuallyDrop is stable since Rust 1.20; on older nightlies this
    // required #![feature(manually_drop)].
    use std::mem::ManuallyDrop;

    struct Foo {
        x: ManuallyDrop<String>,
        y: ManuallyDrop<String>,
    }

    impl Drop for Foo {
        fn drop(&mut self) {
            // Drop in reverse declaration order!
            unsafe {
                ManuallyDrop::drop(&mut self.y);
                ManuallyDrop::drop(&mut self.x);
            }
        }
    }

    fn main() {
        Foo {
            x: ManuallyDrop::new("x".into()),
            y: ManuallyDrop::new("y".into()),
        };
    }
If you need this behavior without being able to use either of the newer methods, keep on reading...
The issue with drop
The drop method cannot take its parameter by value: the parameter would be dropped again at the end of drop's own scope, resulting in infinite recursion for every destructor in the language.
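This is also why std::mem::drop can be a safe, trivial function: it takes its argument by value and lets the normal end-of-scope rule do the work. A local stand-in (drop_it, named differently here only to avoid shadowing the prelude's drop) shows the idea:

```rust
// Same shape as std::mem::drop: taking `_x` by value moves it into
// the function, and it is dropped when the function body ends.
fn drop_it<T>(_x: T) {}

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("Noisy dropped");
    }
}

fn main() {
    let n = Noisy;
    drop_it(n); // destructor runs here, not at the end of main
    println!("after drop_it");
}
```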
A possible solution/workaround
A pattern that I have seen in some codebases is to wrap the values that are being dropped in an Option<T>. Then, in the destructor, you can replace each option with None and drop the resulting value in the right order.
For instance, in the scoped-threadpool crate, the Pool object contains threads and a sender that will schedule new work. In order to join the threads correctly upon dropping, the sender should be dropped first and the threads second.
    pub struct Pool {
        threads: Vec<ThreadData>,
        job_sender: Option<Sender<Message>>,
    }

    impl Drop for Pool {
        fn drop(&mut self) {
            // Setting job_sender to `None` drops the sender first; the
            // threads are dropped afterwards, with the remaining fields.
            self.job_sender = None;
        }
    }
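In isolation, the pattern looks like the following sketch. LoggedDrop is a hypothetical type that records destruction order; Option::take is equivalent to replacing the field with None, but also hands back the old value so it can be dropped on the spot:

```rust
use std::sync::Mutex;

// Records the order in which values are destroyed.
static LOG: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct LoggedDrop(&'static str);

impl Drop for LoggedDrop {
    fn drop(&mut self) {
        LOG.lock().unwrap().push(self.0);
    }
}

struct Ordered {
    first: Option<LoggedDrop>,
    second: Option<LoggedDrop>,
}

impl Drop for Ordered {
    fn drop(&mut self) {
        // `Option::take` swaps the field with `None` and returns the
        // old value, which is dropped at the end of each statement.
        drop(self.first.take());
        drop(self.second.take());
        // The now-empty Options are dropped afterwards; that is a no-op.
    }
}

fn main() {
    {
        let _o = Ordered {
            first: Some(LoggedDrop("first")),
            second: Some(LoggedDrop("second")),
        };
    }
    assert_eq!(*LOG.lock().unwrap(), ["first", "second"]);
}
```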
A note on ergonomics
Of course, doing things this way is more of a workaround than a proper solution. Also, if the optimizer cannot prove that the option will always be Some, you now have an extra branch for each access to your struct field.
Fortunately, nothing prevents a future version of Rust from implementing a feature that allows specifying drop order. It would probably require an RFC, but it certainly seems doable. There is an ongoing discussion on the issue tracker about specifying drop order for the language, though it has been inactive in recent months.
A note on safety
If destroying your structs in the wrong order is unsafe, you should probably consider making their constructors unsafe and documenting this fact (in case you haven't done so already). Otherwise it would be possible to trigger undefined behavior just by creating the structs and letting them fall out of scope.

How Can I Hash By A Raw Pointer?

I want to create a function that provides a two step write and commit, like so:
    // Omitting locking for brevity
    struct States {
        committed_state: u64,
        // By reference is just a placeholder - I don't know how to do this
        pending_states: HashSet<i64>,
    }

    impl States {
        fn read_dirty(&self) -> u64 {
            // Sum committed state and all non-committed states
            self.committed_state + self.pending_states.iter().sum::<i64>() as u64
        }

        fn read_committed(&self) -> u64 {
            self.committed_state
        }
    }

    let state_container = States::default();

    // This is just pseudo code missing locking and such
    fn update_state(state_container: &mut States, new_state: i64) -> impl Future {
        // I'd like to add a reference to new_state
        state_container.pending_states.insert(new_state);
        async move {
            // I would like to defer the commit:
            // add the state to the committed state...
            state_container.committed_state += new_state;
            // ...then remove it *by reference* from the pending states
            state_container.pending_states.remove(&new_state)
        }
    }
I'd like to be in a situation where I can call it like so
    let commit_handler = update_state(state_container, 3);
    // Do some external transactional stuff
    third_party_transactional_service(...)?;
    // Commit if the above line does not error
    commit_handler.await;
The problem I have is that HashMap and HashSet hash entries based on their value, not their address - so I can't remove them by reference.
I appreciate this is a bit of a long question, but I'm trying to give some more context about what I'm doing. I know that in a typical database you'd generally have an atomic counter to generate the transaction ID, but that feels like overkill when the pointer would be enough.
However, I don't want to get the pointer value using unsafe; reaching for unsafe for something this simple seems off.
Values in Rust don't have an identity the way they do in some other languages, so you need to ascribe one to them somehow. You've hit on the two usual ways to do this in your question: an ID stored within the value, or the address of the value as a pointer.
Option 1: An ID contained in the value
It's trivial to have a usize ID with a static AtomicUsize (atomics have interior mutability).
    use std::sync::atomic::{AtomicUsize, Ordering};

    // No impl of Clone/Copy, as we want these IDs to be unique.
    #[derive(Debug, Hash, PartialEq, Eq)]
    #[repr(transparent)]
    pub struct OpaqueIdentifier(usize);

    impl OpaqueIdentifier {
        pub fn new() -> Self {
            static COUNTER: AtomicUsize = AtomicUsize::new(0);
            Self(COUNTER.fetch_add(1, Ordering::Relaxed))
        }

        pub fn id(&self) -> usize {
            self.0
        }
    }
Now your map key becomes usize, and you're done.
Having this be a separate type that doesn't implement Copy or Clone allows you to have a concept of an "owned unique ID" and then every type with one of these IDs is forced not to be Copy, and a Clone impl would require obtaining a new ID.
(You can use a different integer type than usize. I chose it semi-arbitrarily.)
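Applied to the pending-set from the question, a sketch might look like this (only the IDs are stored; the actual payload and locking are omitted):

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Debug, Hash, PartialEq, Eq)]
#[repr(transparent)]
pub struct OpaqueIdentifier(usize);

impl OpaqueIdentifier {
    pub fn new() -> Self {
        static COUNTER: AtomicUsize = AtomicUsize::new(0);
        Self(COUNTER.fetch_add(1, Ordering::Relaxed))
    }

    pub fn id(&self) -> usize {
        self.0
    }
}

fn main() {
    // Two distinct values get two distinct identities.
    let tx_a = OpaqueIdentifier::new();
    let tx_b = OpaqueIdentifier::new();
    assert_ne!(tx_a.id(), tx_b.id());

    // The usize ID is the set key, sidestepping hashing by value.
    let mut pending: HashSet<usize> = HashSet::new();
    pending.insert(tx_a.id());
    pending.insert(tx_b.id());

    assert!(pending.remove(&tx_a.id())); // removes exactly this entry
    assert!(pending.contains(&tx_b.id()));
}
```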
Option 2: A pointer to the value
This is more challenging in Rust since values in Rust are movable by default. In order for this approach to be viable, you have to remove this capability by pinning.
To make this work, both of the following must be true:
You pin the value you're using to provide identity, and
The pinned value is !Unpin (otherwise pinning still allows moves!), which can be forced by adding a PhantomPinned member to the value's type.
Note that the pin contract is only upheld if the object remains pinned for its entire lifetime. To enforce this, your factory for such objects should only dispense pinned boxes.
This could complicate your API as you cannot obtain a mutable reference to a pinned value without unsafe. The pin documentation has examples of how to do this properly.
Assuming that you have done all of this, you can then use *const T as the key in your map (where T is the pinned type). Note that conversion to a pointer is safe -- it's conversion back to a reference that isn't. So you can use some_pin_box.as_ref().get_ref() as *const _ to obtain the pointer you'll use for lookup.
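A minimal sketch of the pointer-keyed approach, assuming the identity-providing type (Identity, hypothetical here) stays heap-pinned for its whole life:

```rust
use std::collections::HashSet;
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Identity {
    _pin: PhantomPinned, // makes the type !Unpin
}

fn key_of(v: &Pin<Box<Identity>>) -> *const Identity {
    // Converting a reference to a raw pointer is safe; only going
    // back from pointer to reference would require unsafe.
    v.as_ref().get_ref() as *const Identity
}

fn main() {
    let a: Pin<Box<Identity>> = Box::pin(Identity { _pin: PhantomPinned });
    let b: Pin<Box<Identity>> = Box::pin(Identity { _pin: PhantomPinned });

    let mut set: HashSet<*const Identity> = HashSet::new();
    set.insert(key_of(&a));
    set.insert(key_of(&b));

    assert!(set.remove(&key_of(&a)));
    assert!(!set.contains(&key_of(&a)));
    assert!(set.contains(&key_of(&b)));
}
```

Note that the set holds no ownership: a pointer key dangles once its box is dropped, so entries must be removed before the identity-providing value goes away.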
The pinned box approach comes with pretty significant drawbacks:
All values being used to provide identity have to be allocated on the heap (unless using local pinning, which is unlikely to be ergonomic -- the pin! macro making this simpler is experimental).
The implementation of the type providing identity has to accept self as Pin<&Self> or Pin<&mut Self>, requiring unsafe code to mutate the contents.
In my opinion, it's not even a good semantic fit for the problem. "Location in memory" and "identity" are different things, and it's only kind of by accident that the former can sometimes be used to implement the latter. It's a bit silly that moving a value in memory would change its identity, no?
I'd just go with adding an ID to the value. This is a substantially more obvious pattern, and it has no serious drawbacks.

how can I create a throwaway mutable reference?

I'm trying to wrap a vector to change its index behaviour, so that when an out of bounds access happens, instead of panicking it returns a reference to a dummy value, like so:
    use std::ops::Index;

    struct VecWrapper(Vec<()>);

    impl Index<usize> for VecWrapper {
        type Output = ();

        fn index(&self, idx: usize) -> &() {
            if idx < self.0.len() {
                &self.0[idx]
            } else {
                &()
            }
        }
    }
which works just fine for the implementation of Index. Trying to implement IndexMut the same way fails for obvious reasons, though. The type in my collection has no Drop implementation, so no destructor needs to be called (other than freeing the memory).
The only solution I can think of is to have a static mutable array containing thousands of dummies and to distribute references to its elements. That is a horrible solution, though, and it still leads to UB if more dummies are ever borrowed than the static array holds.
Give the wrapper an additional field that serves as the dummy. It has the same lifetime as the rest of the struct, so the borrow checker prevents it from being aliased.
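A sketch of that suggestion, using i32 elements instead of () so the mutable access is visible (this VecWrapper is a variant of the one in the question):

```rust
use std::ops::{Index, IndexMut};

struct VecWrapper {
    data: Vec<i32>,
    dummy: i32,
}

impl Index<usize> for VecWrapper {
    type Output = i32;

    fn index(&self, idx: usize) -> &i32 {
        if idx < self.data.len() { &self.data[idx] } else { &self.dummy }
    }
}

impl IndexMut<usize> for VecWrapper {
    fn index_mut(&mut self, idx: usize) -> &mut i32 {
        // Out-of-bounds writes land in the dummy field, whose lifetime
        // is tied to the wrapper itself, so no unsafe is needed.
        if idx < self.data.len() { &mut self.data[idx] } else { &mut self.dummy }
    }
}

fn main() {
    let mut w = VecWrapper { data: vec![10, 20], dummy: 0 };
    w[0] = 11;  // in bounds: writes into the vector
    w[99] = 42; // out of bounds: absorbed by the dummy
    assert_eq!(w[0], 11);
    assert_eq!(w[99], 42); // later out-of-bounds reads see the dummy's value
}
```

One caveat of this design: all out-of-bounds indices share the single dummy, so a write through one bad index is observable through every other bad index.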

Refactoring out `clone` when Copy trait is not implemented?

Is there a way to get rid of clone(), given the restrictions I've noted in the comments? I would really like to know if it's possible to use borrowing in this case, where modifying the third-party function signature is not possible.
    // We should keep the "data" hidden from the consumer
    mod le_library {
        pub struct Foobar {
            data: Vec<i32>, // Something that doesn't implement Copy
        }

        impl Foobar {
            pub fn new() -> Foobar {
                Foobar {
                    data: vec![1, 2, 3],
                }
            }

            pub fn foo(&self) -> String {
                let i = third_party(self.data.clone()); // Refactor out clone?
                format!("{}{}", "foo!", i)
            }
        }

        // Can't change the signature, suppose this comes from a crate
        pub fn third_party(data: Vec<i32>) -> i32 {
            data[0]
        }
    }

    use le_library::Foobar;

    fn main() {
        let foobar = Foobar::new();
        let foo = foobar.foo();
        let foo2 = foobar.foo();
        println!("{}", foo);
        println!("{}", foo2);
    }
As long as your foo() method accepts &self, it is not possible, because the

    pub fn third_party(data: Vec<i32>) -> i32

signature is quite unambiguous: regardless of what this third_party function does, its API states that it needs its own instance of the Vec, by value. This precludes borrowing of any form, and because foo() accepts self by reference, you can't really do anything except clone.
Also, supposing this third_party is written without any weird unsafe hacks, it is safe to assume that the Vec passed into it is eventually dropped and deallocated. Therefore, unsafely creating a copy of the original Vec without cloning it (by copying its internal pointers) is out of the question - you'd definitely get a use-after-free.
While your question does not state it, the fact that you want to preserve the original value of data is kind of a natural assumption. If this assumption can be relaxed, and you're actually okay with giving the data instance out and e.g. replacing it with an empty vector internally, then there are several things you can potentially do:
Switch foo(&self) to foo(&mut self), then you can quite easily extract data and replace it with an empty vector.
Use Cell or RefCell to store the data. This way, you can continue to use foo(&self), at the cost of some runtime checks when you extract the value out of a cell and replace it with some default value.
Both these approaches, however, will result in you losing the original Vec. With the given third-party API there is no way around that.
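A sketch of the first option, assuming it's acceptable to lose the original data (this third_party is a stand-in for the real, unchangeable function; std::mem::take swaps the field with an empty Vec and returns the old one):

```rust
// Hypothetical stand-in for the unchangeable third-party function.
fn third_party(data: Vec<i32>) -> i32 {
    data[0]
}

struct Foobar {
    data: Vec<i32>,
}

impl Foobar {
    // Takes &mut self so the vector can be moved out without cloning;
    // `data` is left empty afterwards.
    fn foo(&mut self) -> String {
        let i = third_party(std::mem::take(&mut self.data));
        format!("foo!{}", i)
    }
}

fn main() {
    let mut foobar = Foobar { data: vec![1, 2, 3] };
    assert_eq!(foobar.foo(), "foo!1");
    assert!(foobar.data.is_empty()); // the original vector is gone
}
```

The Cell/RefCell variant is analogous: RefCell::take extracts the vector through a shared reference at the cost of a runtime borrow check, again leaving an empty vector behind.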
If you still can somehow influence this external API, then the best solution would be to change it to accept &[i32], which can easily be obtained from Vec<i32> with borrowing.
No, you can't get rid of the call to clone here.
The problem here is with the third-party library. As the function third_party is written now, it's true that it could be using an &Vec<i32>; it doesn't require ownership, since it's just moving out a value that's Copy. However, since the implementation is outside of your control, there's nothing preventing the person maintaining the function from changing it to take advantage of owning the Vec. It's possible that whatever it is doing would be easier or require less memory if it were allowed to overwrite the provided memory, and the function writer is leaving the door open to do so in the future. If that's not the case, it might be worth suggesting a change to the third-party function's signature and relying on clone in the meantime.

How does Rust know whether to run the destructor during stack unwind?

The documentation for mem::uninitialized points out why it is dangerous/unsafe to use that function: calling drop on uninitialized memory is undefined behavior.
So this code should be, I believe, undefined:
    let a: TypeWithDrop = unsafe { mem::uninitialized() };
    panic!("=== Testing ==="); // Destructor of `a` will be run (U.B.)
However, I wrote this piece of code which works in safe Rust and does not seem to suffer from undefined behavior:
    #![feature(conservative_impl_trait)]

    trait T {
        fn disp(&mut self);
    }

    struct A;
    impl T for A {
        fn disp(&mut self) { println!("=== A ==="); }
    }
    impl Drop for A {
        fn drop(&mut self) { println!("Dropping A"); }
    }

    struct B;
    impl T for B {
        fn disp(&mut self) { println!("=== B ==="); }
    }
    impl Drop for B {
        fn drop(&mut self) { println!("Dropping B"); }
    }

    fn foo() -> impl T { return A; }
    fn bar() -> impl T { return B; }

    fn main() {
        let mut a;
        let mut b;

        let i = 10;
        let t: &mut T = if i % 2 == 0 {
            a = foo();
            &mut a
        } else {
            b = bar();
            &mut b
        };

        t.disp();
        panic!("=== Test ===");
    }
It always seems to execute the right destructor, while ignoring the other one. If I tried using a or b (like a.disp() instead of t.disp()) it correctly errors out saying I might be possibly using uninitialized memory. What surprised me is while panicking, it always runs the right destructor (printing the expected string) no matter what the value of i is.
How does this happen? If the runtime can determine which destructor to run, should the part about memory mandatorily needing to be initialized for types with Drop implemented be removed from documentation of mem::uninitialized() as linked above?
Using drop flags.
Rust (up to and including version 1.12) stores a boolean flag in every value whose type implements Drop (and thus increases that type's size by one byte). That flag decides whether to run the destructor. So when you do b = bar() it sets the flag for the b variable, and thus only runs b's destructor. Vice versa with a.
Note that starting from Rust version 1.13 (the beta compiler at the time of this writing), that flag is not stored in the type, but on the stack for every variable or temporary. This was made possible by the advent of MIR in the Rust compiler: MIR significantly simplifies the translation of Rust code to machine code, and thus enabled moving drop flags to the stack. Optimizations will usually eliminate the flag entirely when they can figure out at compile time when each object will be dropped.
You can "observe" this flag in a Rust compiler up to version 1.12 by looking at the size of the type:
    struct A;
    struct B;

    impl Drop for B {
        fn drop(&mut self) {}
    }

    fn main() {
        println!("{}", std::mem::size_of::<A>());
        println!("{}", std::mem::size_of::<B>());
    }
prints 0 and 1 respectively before stack flags, and 0 and 0 with stack flags.
Using mem::uninitialized is still unsafe, however, because the compiler sees the assignment to the a variable and sets the drop flag, so the destructor will be called on uninitialized memory. Note that in your example the Drop impl does not access any memory of your type (except the drop flag, which is invisible to you). Therefore you are not actually reading the uninitialized memory (which is zero bytes in size anyway, since your type is a zero-sized struct). To the best of my knowledge, this means your unsafe { std::mem::uninitialized() } code happens to be safe, because no memory unsafety can occur afterwards.
There are two questions hidden here:
How does the compiler track which variable is initialized or not?
Why may initializing with mem::uninitialized() lead to Undefined Behavior?
Let's tackle them in order.
How does the compiler track which variable is initialized or not?
The compiler injects so-called "drop flags": for each variable for which Drop must run at the end of the scope, a boolean flag is injected on the stack, stating whether this variable needs to be disposed of.
The flag starts off "no", moves to "yes" if the variable is initialized, and back to "no" if the variable is moved from.
Finally, when the time comes to drop the variable, the flag is checked, and the value is dropped only if necessary.
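These transitions can be observed from safe code. A sketch, using a hypothetical Noisy type whose destructor records an event, shows that whether the destructor runs at scope end depends on a run-time decision:

```rust
use std::sync::Mutex;

// Records "use", "drop", and "end" events in order.
static EVENTS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        EVENTS.lock().unwrap().push("drop");
    }
}

fn consume(_consumed: Noisy) {
    EVENTS.lock().unwrap().push("use");
    // `_consumed` is dropped here, at the end of `consume`.
}

fn test_case(do_move: bool) -> Vec<&'static str> {
    EVENTS.lock().unwrap().clear();
    {
        let a = Noisy;
        if do_move {
            // Moving `a` clears its drop flag: the destructor runs
            // inside `consume`, not at the end of this scope.
            consume(a);
        }
        EVENTS.lock().unwrap().push("end");
        // If `a` was not moved, its drop flag is still set and the
        // destructor runs here, after "end".
    }
    EVENTS.lock().unwrap().clone()
}

fn main() {
    assert_eq!(test_case(true), ["use", "drop", "end"]);
    assert_eq!(test_case(false), ["end", "drop"]);
}
```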
This is unrelated as to whether the compiler's flow analysis complains about potentially uninitialized variables: only when the flow analysis is satisfied is code generated.
Why may initializing with mem::uninitialized() lead to Undefined Behavior?
When using mem::uninitialized() you make a promise to the compiler: don't worry, I'm definitely initializing this.
As far as the compiler is concerned, the variable is therefore fully initialized, and the drop flag is set to "yes" (until you move out of it).
This, in turn, means that Drop will be called.
Using an uninitialized object is Undefined Behavior, and the compiler calling Drop on an uninitialized object on your behalf counts as "using it".
Bonus:
In my tests, nothing weird happened!
Note that Undefined Behavior means that anything can happen; anything, unfortunately, also includes "seems to work" (or even "works as intended despite the odds").
In particular, if you do NOT access the object's memory in Drop::drop (just printing), then it's very likely that everything will just work. If you do access it, however, you might see weird integers, pointers pointing into the wild, etc...
And if the optimizer is clever, even without accessing it, it might do weird things! Since we are using LLVM, I invite you to read What every C programmer should know about Undefined Behavior by Chris Lattner (LLVM's father).
First, there are drop flags - runtime information for tracking which variables have been initialized. If a variable was not assigned to, drop() will not be executed for it.
In stable, the drop flag is currently stored within the type itself, so writing uninitialized memory into it makes it undefined whether drop() will or will not be called. This information will soon be out of date, because the drop flag has been moved out of the type itself in nightly.
In nightly Rust, if you assign uninitialized memory to a variable, it would be safe to assume that drop() will be executed. However, any useful implementation of drop() will operate on the value. There is no way to detect if the type is properly initialized or not within the Drop trait implementation: it could result in trying to free an invalid pointer or any other random thing, depending on the Drop implementation of the type. Assigning uninitialized memory to a type with Drop is ill-advised anyway.

How does the Rust compiler know whether a value has been moved or not?

A simple example:
    struct A;

    fn main() {
        test(2);
        test(1);
    }

    fn test(i: i32) {
        println!("test");
        let a = A;
        if i == 2 {
            us(a);
        }
        println!("end");
    }

    impl Drop for A {
        fn drop(&mut self) {
            println!("drop");
        }
    }

    #[allow(unused_variables)]
    fn us(a: A) {
        println!("use");
    }
When I run it, the output is:
test
use
drop
end
test
end
drop
I understand that in the test(2) case, a is moved at us(a), so its output is "test-use-drop-end".
However, in the test(1), the output is "test-end-drop", meaning that the compiler knows that a was not moved.
If us(a) is called, there is no need to drop a in test(i): it is dropped inside us(a). If us(a) is not called, a must be dropped after println!("end").
Since it's impossible for the compiler to know whether us(a) is called or not, how does compiler know whether a.drop() shall be called or not after println!("end")?
This is explained in the Rustonomicon:
As of Rust 1.0, the drop flags are actually not-so-secretly stashed in a hidden field of any type that implements Drop.
The hidden field tells whether the current value has been dropped or not, and if it has not, it is dropped at that point. Thus, this is known at run time, and requires a bit of bookkeeping.
Looking to the future, there is an RFC to remove these hidden fields.
The idea of the RFC is to replace the hidden fields by:
Identifying unconditional drops (those don't need any run-time check)
Stash a hidden field on the stack, in the function frame, for those values conditionally being dropped
This new strategy has several advantages over the old one:
the main advantage is that #[repr(C)] will now always give a representation equivalent to C's, even if the struct implements Drop
another important advantage is saving memory (by not inflating the struct size)
a slighter advantage is a possible speed gain, due to unconditional drops and better caching (from the reduced memory size)
