A cell with interior mutability allowing arbitrary mutation actions

The standard Cell struct provides interior mutability but offers only a few mutation methods, such as set(), swap() and replace(). All of these methods replace the whole content of the Cell.
However, sometimes more specific manipulations are needed, for example, to change only a part of data contained in the Cell.
So I tried to implement some kind of universal Cell, allowing arbitrary data manipulation.
The manipulation is represented by a user-defined closure that accepts a single argument, a &mut reference to the interior data of the Cell, so the user can decide what to do with the Cell interior. The code below demonstrates the idea:
use std::cell::UnsafeCell;
struct MtCell<Data> {
    dcell: UnsafeCell<Data>,
}
impl<Data> MtCell<Data> {
    fn new(d: Data) -> MtCell<Data> {
        return MtCell { dcell: UnsafeCell::new(d) };
    }
    fn exec<F, RetType>(&self, func: F) -> RetType
    where
        RetType: Copy,
        F: Fn(&mut Data) -> RetType,
    {
        let p = self.dcell.get();
        let pd: &mut Data;
        unsafe { pd = &mut *p; }
        return func(pd);
    }
}
// test:
type MyCell = MtCell<usize>;
fn main() {
    let c: MyCell = MyCell::new(5);
    println!("initial state: {}", c.exec(|pd| { return *pd; }));
    println!("state changed to {}", c.exec(|pd| {
        *pd += 10; // modify the interior "in place"
        return *pd;
    }));
}
However, I have some concerns regarding the code.
Is it safe, i.e can some safe but malicious closure break Rust mutability/borrowing/lifetime rules by using this "universal" cell?
I consider it safe since lifetime of the interior reference parameter prohibits its exposition beyond the closure call time. But I still have doubts (I'm new to Rust).
Maybe I'm re-inventing the wheel and there exist some templates or techniques solving the problem?
Note: I posted the question here (not on code review) as it seems more related to the language rather than code itself (which represents just a concept).
[EDIT] I want a zero-cost abstraction with no possibility of runtime failures, so RefCell is not a perfect solution.

This is a very common pitfall for Rust beginners.
Is it safe, i.e can some safe but malicious closure break Rust mutability/borrowing/lifetime rules by using this "universal" cell? I consider it safe since lifetime of the interior reference parameter prohibits its exposition beyond the closure call time. But I still have doubts (I'm new to Rust).
In a word, no.
fn main() {
    let mt_cell = MtCell::new(123i8);
    mt_cell.exec(|ref1: &mut i8| {
        mt_cell.exec(|ref2: &mut i8| {
            println!("Double mutable ref!: {:?} {:?}", ref1, ref2);
        })
    })
}
You're absolutely right that the reference cannot be used outside of the closure, but inside the closure, all bets are off! In fact, pretty much any operation (read or write) on the cell within the closure is undefined behavior (UB), and may cause corruption/crashes anywhere in your program.
Maybe I'm re-inventing the wheel and there exist some templates or techniques solving the problem?
Using Cell is often not the best technique, but it's impossible to know what the best solution is without knowing more about the problem.
If you insist on Cell, there are safe ways to do this. The unstable (i.e. nightly-only at the time of writing) Cell::update() method is literally implemented with the following code (when T: Copy):
pub fn update<F>(&self, f: F) -> T
where
F: FnOnce(T) -> T,
{
let old = self.get();
let new = f(old);
self.set(new);
new
}
Or you could use Cell::get_mut(), but it requires a &mut reference to the Cell itself, which defeats the whole purpose of Cell.
However, usually the best way to change only part of a Cell is by breaking it up into separate Cells. For example, instead of Cell<(i8, i8, i8)>, use (Cell<i8>, Cell<i8>, Cell<i8>).
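As a rough sketch of the "separate Cells" idea: the Rgb type and its method names below are made up for illustration and are not from the question. Each field gets its own Cell, so a single field can be changed through a shared reference without replacing the others and without any unsafe code.

```rust
use std::cell::Cell;

// Hypothetical example type: every field is its own Cell.
struct Rgb {
    r: Cell<u8>,
    g: Cell<u8>,
    b: Cell<u8>,
}

impl Rgb {
    fn new(r: u8, g: u8, b: u8) -> Self {
        Rgb { r: Cell::new(r), g: Cell::new(g), b: Cell::new(b) }
    }

    // Takes &self, not &mut self: only the green channel is read and written.
    fn brighten_green(&self, amount: u8) {
        self.g.set(self.g.get().saturating_add(amount));
    }

    fn as_tuple(&self) -> (u8, u8, u8) {
        (self.r.get(), self.g.get(), self.b.get())
    }
}

fn main() {
    let color = Rgb::new(10, 20, 30);
    let shared = &color; // a shared reference is enough to mutate one part
    shared.brighten_green(5);
    println!("{:?}", color.as_tuple());
}
```

This only works nicely when the parts are Copy (or otherwise cheap to get and set); for larger non-Copy parts the trade-offs are different.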
Still, IMO, Cell is rarely the best solution. Interior mutability is a common design in C and many other languages, but it is somewhat more rare in Rust, at least via shared references and Cell, for a number of reasons (e.g. it's not Sync, and in general people don't expect interior mutability without &mut). Ask yourself why you are using Cell and if it is really impossible to reorganize your code to use normal &mut references.
IMO the bottom line is actually about safety: if no matter what you do, the compiler complains and it seems that you need to use unsafe, then I guarantee you that 99% of the time either:
There's a safe (but possibly complex/unintuitive) way to do it, or
It's actually undefined behavior (like in this case).
EDIT: Frxstrem's answer also has better info about when to use Cell/RefCell.

Your code is not safe, since you can call c.exec inside c.exec to get two mutable references to the cell contents, as demonstrated by this snippet containing only safe code:
use std::cell::RefCell;
let c: MyCell = MyCell::new(5);
c.exec(|n| {
    // need `RefCell` to access mutable reference from within `Fn` closure
    let n = RefCell::new(n);
    c.exec(|m| {
        let n = &mut *n.borrow_mut();
        // now `n` and `m` are mutable references to the same data, despite using
        // no unsafe code. this is BAD!
    })
})
In fact, this is exactly the reason why we have both Cell and RefCell:
Cell only allows you to get and set a value and does not allow you to get a mutable reference from an immutable one (thus avoiding the above issue), but it does not have any runtime cost.
RefCell allows you to get a mutable reference from an immutable one, but needs to perform checks at runtime to ensure that this is safe.
As far as I know, there's not really any safe way around this, so you need to make a choice in your code between no runtime cost but less flexibility, and more flexibility but with a small runtime cost.
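To make the RefCell side of that trade-off concrete, here is a minimal sketch. The helper names add_ten and nested_borrow_succeeds are made up for this example. It does the same in-place update as the question, and shows that the nested mutable access that was undefined behavior with MtCell is detected at runtime instead.

```rust
use std::cell::RefCell;

// In-place update through a shared reference, checked at runtime.
fn add_ten(cell: &RefCell<usize>) -> usize {
    let mut value = cell.borrow_mut();
    *value += 10;
    *value
}

// Hypothetical helper: tries to take a second mutable borrow while the first
// one is still alive and reports whether RefCell allowed it.
fn nested_borrow_succeeds(cell: &RefCell<usize>) -> bool {
    let _outer = cell.borrow_mut();
    let inner_ok = cell.try_borrow_mut().is_ok();
    inner_ok
}

fn main() {
    let c = RefCell::new(5);
    println!("state changed to {}", add_ten(&c));
    println!("nested mutable borrow allowed: {}", nested_borrow_succeeds(&c));
}
```

Using try_borrow_mut() instead of borrow_mut() turns the would-be panic into a Result, which may be easier to live with if runtime failures are a concern.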

Related

How to take, transform and replace a vector in a mutable reference?

I have a struct Database { events: Vec<Event> }. I would like to apply some maps and filters to events. What is a good way to do this?
Here's what I tried:
fn update(db: &mut Database) {
db.events = db.events.into_iter().filter(|e| !e.cancelled).collect();
}
This doesn't work:
cannot move out of `db.events` which is behind a mutable reference
...
move occurs because `db.events` has type `Vec<Event>`, which does not implement the `Copy` trait
Is there any way to persuade Rust compiler that I'm taking the field value only temporarily?

The conceptual reason this doesn't work is panics. If, for example, the filter callback panics, then db.events has already been moved out of by into_iter but has no replacement value yet: it would be uninitialized, and therefore unsafe.
Joël Hecht has what you really want to do in your specific instance: Vec::retain lets you filter out elements in place, and also reuses the storage.
Alexey Larionov also has an answer involving Vec::drain, which will leave an empty vector until the replacement happens. It requires a new allocation, though.
However, in the general case, the replace_with and take_mut crates offer functions to help accomplish what you are doing. You provide the function a closure that takes the value and returns its replacement, and the crates will run that closure and abort the process if it panics.
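For types that implement Default (Vec does), the standard library's std::mem::take is another option worth knowing. This is a swapped-in technique, not what the crates above do: it leaves an empty Vec in the field while the old value is being transformed, so a panic in the closure leaves db.events empty rather than uninitialized. The Event and Database definitions below are minimal guesses at the question's types.

```rust
// Hypothetical minimal versions of the question's types.
#[derive(Debug, PartialEq)]
struct Event {
    id: u32,
    cancelled: bool,
}

struct Database {
    events: Vec<Event>,
}

fn update(db: &mut Database) {
    // Move the Vec out, leaving Vec::default() (an empty Vec) in its place.
    let events = std::mem::take(&mut db.events);
    // `events` is owned here, so into_iter() is allowed.
    db.events = events.into_iter().filter(|e| !e.cancelled).collect();
}

fn main() {
    let mut db = Database {
        events: vec![
            Event { id: 1, cancelled: false },
            Event { id: 2, cancelled: true },
            Event { id: 3, cancelled: false },
        ],
    };
    update(&mut db);
    let ids: Vec<u32> = db.events.iter().map(|e| e.id).collect();
    println!("{:?}", ids);
}
```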

In the case you describe, the safer way is to use Vec::retain:
fn update(db: &mut Database) {
db.events.retain(|e| !e.cancelled);
}

As an alternative to Joël Hecht's answer, you can Vec::drain the elements and then rebuild the vector:
fn update(db: &mut Database) {
db.events = db.events
.drain(..)
.filter(|e| !e.cancelled)
.collect();
}

Is it available to drop a variable holding a primitive value in Rust?

Updated Question:
Or I can ask it this way: for every type T, if it's Copy, then there is no way for it to be moved, right? I mean, is there anything like std::move in C++ that can move a copyable value explicitly?
Original Question:
Suppose we have the piece of Rust code below, in which I define a variable x holding an i32 value. What I want to do is drop its value and invalidate it. I tried to use ptr::drop_in_place to drop it through a pointer, but it doesn't work. Why?
fn main() {
let mut x = 10;
use std::ptr;
unsafe {
ptr::drop_in_place(&mut x as *mut i32);
}
println!("{}", x); // x is still accessible here.
}

For every type T, if it's Copy, then there is no way for it to be moved, right?
That is one way to word it. The semantics of Copy are such that any move leaves the original object valid.
Because of this, and that Drop and Copy are mutually exclusive traits, there's no way to "drop" a Copy. The traditional method of calling std::mem::drop(x) won't work. The only meaningful thing you can do is let the variable fall out of scope:
fn main() {
{
let x = 10;
}
println!("{}", x); // x is no longer accessible here.
}
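A small sketch of the Copy-versus-move difference described above. The consume function is a made-up stand-in with the same shape as std::mem::drop (it takes its argument by value and does nothing); it is used here only for illustration.

```rust
// Hypothetical stand-in for std::mem::drop: takes ownership and does nothing.
fn consume<T>(_value: T) {}

fn copy_survives_consume() -> i32 {
    let x = 10;
    consume(x); // receives a bitwise copy; x itself stays valid
    x
}

fn string_is_moved() -> usize {
    let s = String::from("hello");
    let len = s.len();
    consume(s); // s is moved here; using s after this line would not compile
    len
}

fn main() {
    println!("{}", copy_survives_consume());
    println!("{}", string_is_moved());
}
```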
I mean is there any way like the std::move in C++ can move a copyable value explicitly?
The specifics of copying vs moving are quite different between C++ and Rust. All types are moveable in Rust, whereas it's opt-in for C++. Moving and copying in Rust are always bitwise copies; there's no room for custom code. Moving in Rust leaves the source object invalid, whereas it's still usable as a value in C++.
I can go on, but I'll leave off one last bit: moving a primitive in C++ isn't different than a copy either.

Avoiding borrowing mutable and immutable at the same time

To add the elements of two Vecs I wrote a function like
fn add_components(dest: &mut Vec<i32>, first: &Vec<i32>, second: &Vec<i32>){
for i in 0..first.len() {
dest[i] = first[i] + second[i];
}
}
And this works fine when dest is another Vec.
let mut new_components = vec![0; components.len()];
Vector::add_components(&mut new_components, &components, &other_components);
But it blows up when I am trying to add in-place:
Vector::add_components(&mut components, &components, &other_components);
because now I borrow components as mutable and immutable at the same time. But this obviously is what I am trying to achieve.
Are there any conventional and general (meaning not only concerning Vecs) solutions to this problem which don't involve unsafe code and pointer magic?
Another example of this problem:
Suppose I want to overload AddAssign for a numeric type like
impl AddAssign<&NumericType> for NumericType {
fn add_assign(&mut self, other: &NumericType) {
unimplemented!() // concrete implementation is not important
}
}
Notice that I want to take a reference as second argument to avoid copying. This works fine when adding two different objects, but adding an object to itself creates the exact same scenario:
let mut num = NumericType{};
num += &num;
I am borrowing num mutably and immutably at the same time. So obviously this should work and is safe, but it also is against Rust's borrowing rules.
What are the best practices (apart from copying of course) to deal with this issue, which arises in many forms?

There is no generic solution to this. Rust can't generically abstract over mutability in borrow checking.
You will need to have two versions of the function for in-place and destination versions.
Rust has strict aliasing rules, so dest[i] = first[i] + second[i] actually compiles to different code depending on whether the compiler has a guarantee that dest and first are different. Don't try to fudge it with unsafe, because it will be Undefined Behavior and will get miscompiled.
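A minimal sketch of what "two versions" could look like for the Vec example. The name add_components_in_place is made up here, and slices are used instead of &Vec for generality. The in-place variant takes only one other operand, so the mutable and immutable borrows never refer to the same data.

```rust
// Out-of-place version: `dest` is guaranteed not to alias `first` or `second`.
fn add_components(dest: &mut [i32], first: &[i32], second: &[i32]) {
    for ((d, a), b) in dest.iter_mut().zip(first).zip(second) {
        *d = a + b;
    }
}

// In-place version (hypothetical name): `dest` doubles as the first operand.
fn add_components_in_place(dest: &mut [i32], other: &[i32]) {
    for (d, o) in dest.iter_mut().zip(other) {
        *d += o;
    }
}

fn main() {
    let mut components = vec![1, 2, 3];
    let other_components = vec![10, 20, 30];

    let mut new_components = vec![0; components.len()];
    add_components(&mut new_components, &components, &other_components);
    println!("{:?}", new_components);

    add_components_in_place(&mut components, &other_components);
    println!("{:?}", components);
}
```

Note that zip stops at the shortest input, so mismatched lengths are silently truncated in this sketch rather than panicking as indexing would.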

How to move data into multiple Rust closures?

I have two widgets in a simple GTK app:
extern crate gdk;
extern crate gtk;
use super::desktop_entry::DesktopEntry;
use gdk::enums::key;
use gtk::prelude::*;
pub fn launch_ui(_desktop_entries: Vec<DesktopEntry>) {
gtk::init().unwrap();
let builder = gtk::Builder::new_from_string(include_str!("interface.glade"));
let window: gtk::Window = builder.get_object("main_window").unwrap();
let search_entry: gtk::SearchEntry = builder.get_object("search_entry").unwrap();
let list_box: gtk::ListBox = builder.get_object("list_box").unwrap();
window.show_all();
search_entry.connect_search_changed(move |_se| {
let _a = list_box.get_selected_rows();
});
window.connect_key_press_event(move |_, key| {
match key.get_keyval() {
key::Down => {
list_box.unselect_all();
}
_ => {}
}
gtk::Inhibit(false)
});
gtk::main();
}
I need to change list_box from both events. I have two closures that move, but it is not possible to move list_box to both closures simultaneously as I get the error:
error[E0382]: capture of moved value: `list_box`
What can I do?

As explained in Shepmaster's answer, you can only move a value out of a variable once, and the compiler will prevent you from doing it a second time. I'll try to add a bit of specific context for this use case. Most of this is from my memory of having used GTK from C ages ago, and a few bits I just looked up in the gtk-rs documentation, so I'm sure I got some details wrong, but I think the general gist is accurate.
Let's first take a look at why you need to move the value into the closures in the first place. The methods you call on list_box inside both closures take self by reference, so you don't actually consume the list box in the closures. This means it would be perfectly valid to define the two closures without the move specifiers – you only need read-only references to list_box, you are allowed to have more than one read-only reference at once, and list_box lives at least as long as the closures.
However, while you are allowed to define the two closures without moving list_box into them, you can't pass the closures defined this way to gtk-rs: all functions connecting event handlers only accept "static" functions, e.g.
fn connect_search_changed<F: Fn(&Self) + 'static>(
&self,
f: F
) -> SignalHandlerId
The type F of the handler has the trait bound Fn(&Self) + 'static, which means that the closure either can't hold any references at all, or all references it holds must have static lifetime. If we don't move list_box into the closure, the closure will hold a non-static reference to it. So we need to get rid of the reference before being able to use the function as an event handler.
Why does gtk-rs impose this limitation? The reason is that gtk-rs is a wrapper around a set of C libraries, and a pointer to the callback is eventually passed on to the underlying glib library. Since C does not have any concept of lifetimes, the only way to do this safely is to require that there aren't any references that may become invalid.
We have now established that our closures can't hold any references. We still need to access list_box from the closures, so what are our options? If you only have a single closure, using move does the trick – by moving list_box into the closure, the closure becomes its owner. However, we have seen that this doesn't work for more than one closure, because we can only move list_box once. We need to find a way to have multiple owners for it, and the Rust standard library provides such a way: the reference-counting pointers Rc and Arc. The former is used for values that are only accessed from the current thread, while the latter is safe to move to other threads.
If I remember correctly, glib executes all event handlers in the main thread, and the trait bounds for the closure reflect this: the closure isn't required to be Send or Sync, so we should be able to make do with Rc. Moreover, we only need read access to list_box in the closures, so we don't need RefCell or Mutex for interior mutability in this case. In summary, all you need is probably this:
use std::rc::Rc;
let list_box: gtk::ListBox = builder.get_object("list_box").unwrap();
let list_box_1 = Rc::new(list_box);
let list_box_2 = list_box_1.clone();
Now you have two "owned" pointers to the same list box, and these pointers can be moved into the two closures.
Disclaimer: I couldn't really test any of this, since your example code isn't self-contained.
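Since the GTK code can't be run standalone, here is a self-contained sketch of the same ownership pattern without GTK. Everything in it is a stand-in invented for illustration: ListBox is a plain struct, and Signals is a toy registry that, like the connect functions quoted above, only accepts 'static closures.

```rust
use std::rc::Rc;

// Hypothetical stand-in for a widget; only &self methods are needed.
struct ListBox {
    name: String,
}

impl ListBox {
    fn describe(&self) -> String {
        format!("list box '{}'", self.name)
    }
}

// Toy handler registry that requires 'static closures.
struct Signals {
    handlers: Vec<Box<dyn Fn() -> String + 'static>>,
}

impl Signals {
    fn connect<F: Fn() -> String + 'static>(&mut self, f: F) {
        self.handlers.push(Box::new(f));
    }

    fn emit_all(&self) -> Vec<String> {
        self.handlers.iter().map(|h| h()).collect()
    }
}

fn build_signals() -> Signals {
    let list_box_1 = Rc::new(ListBox { name: "main".to_string() });
    let list_box_2 = Rc::clone(&list_box_1);
    let mut signals = Signals { handlers: Vec::new() };
    // Each closure owns its own Rc, so neither holds a non-static reference.
    signals.connect(move || format!("search changed: {}", list_box_1.describe()));
    signals.connect(move || format!("key pressed: {}", list_box_2.describe()));
    signals
}

fn main() {
    for line in build_signals().emit_all() {
        println!("{}", line);
    }
}
```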

You can use cloning on the gtk-rs widgets.
In gtk-rs every object implementing gtk::Widget (so basically every GTK object you can use inside a gtk::Window) must also implement the Clone trait. Calling clone() is very cheap because it's just a pointer copy and a reference counter update.
Knowing this, the following is valid and cheap:
let list_box_clone = list_box.clone();
search_entry.connect_search_changed(move |_se| {
let _a = list_box_clone.get_selected_rows();
});
But since this solution is verbose and gets ugly very quickly if you have more than one object to move, the community came up with the following macro:
macro_rules! clone {
(#param _) => ( _ );
(#param $x:ident) => ( $x );
($($n:ident),+ => move || $body:expr) => (
{
$( let $n = $n.clone(); )+
move || $body
}
);
($($n:ident),+ => move |$($p:tt),+| $body:expr) => (
{
$( let $n = $n.clone(); )+
move |$(clone!(#param $p),)+| $body
}
);
}
The usage is very simple:
search_entry.connect_search_changed(clone!(list_box => move |_se| {
let _a = list_box.get_selected_rows();
}));
This macro is capable of cloning any number of objects that are moved into a closure.
For further explanation and examples check out this tutorial from the gtk-rs team: Callbacks and closures

You literally cannot do this. I encourage you to go back and re-read The Rust Programming Language to refresh yourself on ownership. When a non-Copy type is moved, it's gone — this is a giant reason that Rust even exists: to track this so the programmer doesn't have to.
If a type is Copy, the compiler will automatically make the copy for you. If a type is Clone, then you must invoke the clone explicitly.
You will need to change to shared ownership and most likely interior mutability.
Shared ownership allows a single piece of data to be jointly owned by multiple values, creating additional owners via cloning.
Interior mutability is needed because Rust disallows multiple mutable references to one item at the same time.
Wrap your list_box in a Mutex and then an Arc (Arc<Mutex<T>>). Clone the Arc for each handler and move that clone into the handler. You can then lock the list_box and make whatever changes you need.
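A minimal sketch of the Arc<Mutex<T>> pattern in isolation. A Vec<String> stands in for the widget here, which is an assumption made for the sake of a runnable example; whether a particular GTK type can be placed in an Arc and used this way is not something this sketch checks.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn run_handlers() -> Vec<String> {
    // Shared, mutable state; a Vec<String> stands in for the real data.
    let list = Arc::new(Mutex::new(Vec::<String>::new()));

    // One clone of the Arc per handler.
    let for_search = Arc::clone(&list);
    let for_key = Arc::clone(&list);

    let on_search = move || {
        for_search.lock().unwrap().push("search".to_string());
    };
    let on_key = move || {
        for_key.lock().unwrap().push("key".to_string());
    };

    on_search();
    // The second handler may even run on another thread.
    thread::spawn(on_key).join().unwrap();

    let result = list.lock().unwrap().clone();
    result
}

fn main() {
    println!("{:?}", run_handlers());
}
```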
See also:
What is the right way to share a reference between closures if the value outlives the closures?
How to share an Arc in multiple closures?

When is it necessary to circumvent Rust's borrow checker?

I'm implementing Conway's game of life to teach myself Rust. The idea is to implement a single-threaded version first, optimize it as much as possible, then do the same for a multi-threaded version.
I wanted to implement an alternative data layout which I thought might be more cache-friendly. The idea is to store the status of two cells for each point on a board next to each other in memory in a vector, one cell for reading the current generation's status from and one for writing the next generation's status to, alternating the access pattern for each generation's computation (which can be determined at compile time).
The basic data structures are as follows:
#[derive(Clone, Copy, PartialEq, Eq)] // needed for `==` and `*read` below
#[repr(u8)]
pub enum CellStatus {
DEAD,
ALIVE,
}
/** 2 bytes */
pub struct CellRW(CellStatus, CellStatus);
pub struct TupleBoard {
width: usize,
height: usize,
cells: Vec<CellRW>,
}
/** used to keep track of current pos with iterator e.g. */
pub struct BoardPos {
x_pos: usize,
y_pos: usize,
offset: usize,
}
pub struct BoardEvo {
board: TupleBoard,
}
The function that is causing me troubles:
impl BoardEvo {
fn evolve_step<T: RWSelector>(&mut self) {
for (pos, cell) in self.board.iter_mut() {
//pos: BoardPos, cell: &mut CellRW
let read: &CellStatus = T::read(cell); //chooses the right tuple half for the current evolution step
let write: &mut CellStatus = T::write(cell);
let alive_count = pos.neighbours::<T>(&self.board).iter() //<- can't borrow self.board again!
.filter(|&&status| status == CellStatus::ALIVE)
.count();
*write = CellStatus::evolve(*read, alive_count);
}
}
}
impl BoardPos {
/* ... */
pub fn neighbours<T: RWSelector>(&self, board: &TupleBoard) -> [CellStatus; 8] {
/* ... */
}
}
The trait RWSelector has static functions for reading from and writing to a cell tuple (CellRW). It is implemented for two zero-sized types L and R and is mainly a way to avoid having to write different methods for the different access patterns.
The iter_mut() method returns a BoardIter struct which is a wrapper around a mutable slice iterator for the cells vector and thus has &mut CellRW as Item type. It is also aware of the current BoardPos (x and y coordinates, offset).
I thought I'd iterate over all cell tuples, keep track of the coordinates, count the number of alive neighbours (I need to know coordinates/offsets for this) for each (read) cell, compute the cell status for the next generation and write it to the other half of the tuple.
Of course, in the end, the compiler showed me the fatal flaw in my design, as I borrow self.board mutably in the iter_mut() method and then try to borrow it again immutably to get all the neighbours of the read cell.
I have not been able to come up with a good solution for this problem so far. I did manage to get it working by making all references immutable and then using an UnsafeCell to turn the immutable reference to the write cell into a mutable one. I then write to the nominally immutable reference to the writing part of the tuple through the UnsafeCell.
However, that doesn't strike me as a sound design and I suspect I might run into issues with this when attempting to parallelize things.
Is there a way to implement the data layout I proposed in safe/idiomatic Rust, or is this a case where you actually have to use tricks to circumvent Rust's aliasing/borrow restrictions?
Also, as a broader question, is there a recognizable pattern for problems which require you to circumvent Rust's borrow restrictions?

When is it necessary to circumvent Rust's borrow checker?
It is needed when:
the borrow checker is not advanced enough to see that your usage is safe
you do not wish to (or cannot) write the code in a different pattern
As a concrete case, the compiler cannot tell that this is safe:
let mut array = [1, 2];
let a = &mut array[0];
let b = &mut array[1];
The compiler doesn't know what the implementation of IndexMut for a slice does at this point of compilation (this is a deliberate design choice). For all it knows, arrays always return the exact same reference, regardless of the index argument. We can tell that this code is safe, but the compiler disallows it.
You can rewrite this in a way that is obviously safe to the compiler:
let mut array = [1, 2];
let (a, b) = array.split_at_mut(1);
let a = &mut a[0];
let b = &mut b[0];
How is this done? split_at_mut performs a runtime check to ensure that it actually is safe:
fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
let len = self.len();
let ptr = self.as_mut_ptr();
unsafe {
assert!(mid <= len);
(from_raw_parts_mut(ptr, mid),
from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
}
}
For an example where the borrow checker is not yet as advanced as it can be, see What are non-lexical lifetimes?.
I borrow self.board mutably in the iter_mut() method and then try to borrow it again immutably to get all the neighbours of the read cell.
If you know that the references don't overlap, then you can choose to use unsafe code to express it. However, this means you are also choosing to take on the responsibility of upholding all of Rust's invariants and avoiding undefined behavior.
The good news is that this heavy burden is what every C and C++ programmer has to (or at least should) have on their shoulders for every single line of code they write. At least in Rust, you can let the compiler deal with 99% of the cases.
In many cases, there are tools like Cell and RefCell to allow for interior mutation. In other cases, you can rewrite your algorithm to take advantage of a value being a Copy type. In other cases you can use an index into a slice for a shorter period. In other cases you can have a multi-phase algorithm.
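As one possible illustration of the multi-phase idea applied to a Game of Life board: the sketch below uses a simplified layout (a flat Vec<bool> with non-wrapping edges), not the tuple layout from the question, and all names are made up. Phase one only reads self.cells and collects the next generation into a separate buffer; phase two replaces the old cells. No mutable and shared borrows of the board overlap.

```rust
// Hypothetical minimal board: `true` means alive, stored row by row.
struct Board {
    width: usize,
    height: usize,
    cells: Vec<bool>,
}

impl Board {
    // Counts alive neighbours of (x, y); cells outside the board count as dead.
    fn alive_neighbours(&self, x: usize, y: usize) -> usize {
        let mut count = 0;
        for dy in -1i64..=1 {
            for dx in -1i64..=1 {
                if dx == 0 && dy == 0 {
                    continue;
                }
                let nx = x as i64 + dx;
                let ny = y as i64 + dy;
                let inside = nx >= 0
                    && ny >= 0
                    && (nx as usize) < self.width
                    && (ny as usize) < self.height;
                if inside && self.cells[(ny as usize) * self.width + (nx as usize)] {
                    count += 1;
                }
            }
        }
        count
    }

    fn evolve_step(&mut self) {
        // Phase 1: read-only pass that computes the next generation.
        let mut next = Vec::with_capacity(self.cells.len());
        for y in 0..self.height {
            for x in 0..self.width {
                let alive = self.cells[y * self.width + x];
                let n = self.alive_neighbours(x, y);
                next.push(matches!((alive, n), (true, 2) | (true, 3) | (false, 3)));
            }
        }
        // Phase 2: write pass; the old generation is replaced in one go.
        self.cells = next;
    }
}

fn main() {
    // A horizontal "blinker" on a 3x3 board.
    let mut board = Board {
        width: 3,
        height: 3,
        cells: vec![false, false, false, true, true, true, false, false, false],
    };
    board.evolve_step();
    println!("{:?}", board.cells);
}
```

This allocates a new buffer per step; keeping two buffers around and swapping them with std::mem::swap would avoid that, at the cost of a slightly longer example.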
If you do need to resort to unsafe code, then try your best to hide it in a small area and expose safe interfaces.
Above all, many common problems have been asked about (many times) before:
How to iterate over mutable elements inside another mutable iteration over the same elements?
Mutating an item inside of nested loops
How can a nested loop with mutations on a HashMap be achieved in Rust?
What's the Rust way to modify a structure within nested loops?
Nesting an iterator's loops
