It might seem like a strange question, but "interior mutability" cells in Rust are always explained and used in a context where they're used together with some kind of shared pointer. And indeed it seems pretty useless on its own, since you get interior mutability with no way to share it.
So here is my question: Is it ever useful ? And if it is not (and never used), what's the point of not providing a RefCell that is also a shared pointer ?
The RefCell is not always directly inside the Rc. For example, this kind of struct can be quite useful:
use std::cell::RefCell;
struct SharedVec<T> {
inner: RefCell<Vec<T>>,
}
impl<T> SharedVec<T> {
fn push(&self, value: T) {
self.inner.borrow_mut().push(value);
}
fn pop(&self) -> Option<T> {
self.inner.borrow_mut().pop()
}
}
And then, you might have an Rc<SharedVec<T>>.
Related
My question has already been somewhat discussed here.
The problem is that I want to access multiple distinct fields of a struct in order to use them, but I don't want to work on the fields directly. Instead I'd like to encapsulate the access to them, to gain more flexibility.
I tried to achieve this by writing methods for this struct, but as you can see in the question mentioned above or my older question here this approach fails in Rust because the borrow checker will only allow you to borrow different parts of a struct if you do so directly, or if you use a method borrowing them together, since all the borrow checker knows from its signature is that self is borrowed and only one mutable reference to self can exist at any time.
Losing the flexibility to borrow different parts of a struct as mutable at the same time is of course not acceptable. Therefore I'd like to know whether there is any idiomatic way to do this in Rust.
My naive approach would be to write a macro instead of a function, performing (and encapsulating) the same functionality.
EDIT:
Because Frxstrem suggested that the question which I linked to at the top may answer my question I want to make it clear, that I'm not searching for some way to solve this. My question is which of the proposed solutions (if any) is the right way to do it?
After some more research it seems that there is no beautiful way to achieve partial borrowing without accessing the struct fields directly, at least as of now.
There are, however, multiple imperfect solutions to this which I'd like to discuss a little, so that anyone who might end up here may weigh the pros and cons for themself.
I'll use the code from my original question as an example to illustrate them here.
So let's say you have a struct:
struct GraphState {
nodes: Vec<Node>,
edges: Vec<Edge>,
}
and you're trying to do borrow one part of this struct mutably and the other immutably:
// THIS CODE DOESN'T PASS THE BORROW-CHECKER
impl GraphState {
pub fn edge_at(&self, edge_index: u16) -> &Edge {
&self.edges[usize::from(edge_index)]
}
pub fn node_at_mut(&mut self, node_index: u8) -> &mut Node {
&mut self.nodes[usize::from(node_index)]
}
pub fn remove_edge(&mut self, edge_index: u16) {
let edge = self.edge_at(edge_index); // first (immutable) borrow here
// update the edge-index collection of the nodes connected by this edge
for i in 0..2 {
let node_index = edge.node_indices[i];
self.node_at_mut(node_index).remove_edge(edge_index); // second (mutable)
// borrow here -> ERROR
}
}
}
}
But of course this fails, since you're not allowed to borrow self as mutable and immutable at the same time.
So to get around this you could simply access the fields directly:
impl GraphState {
pub fn remove_edge(&mut self, edge_index: u16) {
let edge = &self.edges[usize::from(edge_index)];
for i in 0..2 {
let node_index = edge.node_indices[i];
self.nodes[usize::from(node_index)].remove_edge(edge_index);
}
}
}
This approach works, but it has two major drawbacks:
The accessed fields need to be public (at least if you want to allow access to them from another scope). If they're implementation details that you'd rather have private you're in bad luck.
You always need to operate on the fields directly. This means that code like usize::from(node_index) needs to be repeated everywhere, which makes this approach both brittle and cumbersome.
So how can you solve this?
A) Borrow everything at once
Since borrowing self mutably multiple times is not allowed, mutably borrowing all the parts you want at once instead is one straightforward way to solve this problem:
pub fn edge_at(edges: &[Edge], edge_index: u16) -> &Edge {
&edges[usize::from(edge_index)]
}
pub fn node_at_mut(nodes: &mut [Node], node_index: u8) -> &mut Node {
&mut nodes[usize::from(node_index)]
}
impl GraphState {
pub fn data_mut(&mut self) -> (&mut [Node], &mut [Edge]) {
(&mut self.nodes, &mut self.edges)
}
pub fn remove_edge(&mut self, edge_index: u16) {
let (nodes, edges) = self.data_mut(); // first (mutable) borrow here
let edge = edge_at(edges, edge_index);
// update the edge-index collection of the nodes connected by this edge
for i in 0..2 {
let node_index = edge.node_indices[i];
node_at_mut(nodes, node_index).remove_edge(edge_index); // no borrow here
// -> no error
}
}
}
}
It's clearly a workaround and far from ideal, but it works and it allows you to keep the fields themselves private (though you'll probably need to expose the implementation somewhat, as the user has to hand the necessary data to the other functions by hand).
B) Use a macro
If code reuse is all you worry about and visibility is not an issue to you you can write macros like these:
macro_rules! node_at_mut {
($this:ident, $index:expr) => {
&mut self.nodes[usize::from($index)]
}
}
macro_rules! edge_at {
($this:ident, $index:expr) => {
&mut self.edges[usize::from($index)]
}
}
...
pub fn remove_edge(&mut self, edge_index: u16) {
let edge = edge_at!(self, edge_index);
// update the edge-index collection of the nodes connected by this edge
for i in 0..2 {
let node_index = edge.node_indices[i];
node_at_mut!(self, node_index).remove_edge(edge_index);
}
}
}
If your fields are public anyway I'd probably go with this solution, as it seems the most elegant to me.
C) Go unsafe
What we want to do here is obviously safe.
Sadly the borrow checker cannot see this, since the function signatures tell him no more than that self is being borrowed. Luckily Rust allows us to use the unsafe keyword for cases like these:
pub fn remove_edge(&mut self, edge_index: u16) {
let edge: *const Edge = self.edge_at(edge_index);
for i in 0..2 {
unsafe {
let node_index = (*edge).node_indices[i];
self.node_at_mut(node_index).remove_edge(edge_index); // first borrow here
// (safe though since remove_edge will not invalidate the first pointer)
}
}
}
This works and gives us all the benefits of solution A), but using unsafe for something that could easily be done without, if only the language had some syntax for actual partial borrowing, seems a bit ugly to me. On the other hand it may (in some cases) be preferable to solution A), due to its sheer clunkyness...
EDIT: After some thought I realised, that we know this approach to be safe here, only because we know about the implementation. In a different use case the data in question might actually be contained in one single field (say a Map), even if it looks like we are accessing two very distinct kinds of data when calling from outside.
This is why this last approach is rightfully unsafe, as there is no way for the borrow checker to check whether we are actually borrowing different things without exposing the private fields, rendering our effort pointless.
Contrary to my initial belief, this couldn't even really be changed by expanding the language. The reason is that one would still need to expose information about private fields in some way for this to work.
After writing this answer I also found this blog post which goes a bit deeper into possible solutions (and also mentions some advanced techniques that didn't come to my mind, none of which is universally applicable though). If you happen to know another solution, or a way to improve the ones proposed here, please let me know.
I have a struct that is not Send because it contains Rc. Lets say that Arc has too big overhead, so I want to keep using Rc. I would still like to occasionally Send this struct between threads, but only when I can verify that the Rc has strong_count 1 and weak_count 0.
Here is (hopefully safe) abstraction that I have in mind:
mod my_struct {
use std::rc::Rc;
#[derive(Debug)]
pub struct MyStruct {
reference_counted: Rc<String>,
// more fields...
}
impl MyStruct {
pub fn new() -> Self {
MyStruct {
reference_counted: Rc::new("test".to_string())
}
}
pub fn pack_for_sending(self) -> Result<Sendable, Self> {
if Rc::strong_count(&self.reference_counted) == 1 &&
Rc::weak_count(&self.reference_counted) == 0
{
Ok(Sendable(self))
} else {
Err(self)
}
}
// There are more methods, some may clone the `Rc`!
}
/// `Send`able wrapper for `MyStruct` that does not allow you to access it,
/// only unpack it.
pub struct Sendable(MyStruct);
// Safety: `MyStruct` is not `Send` because of `Rc`. `Sendable` can be
// only created when the `Rc` has strong count 1 and weak count 0.
unsafe impl Send for Sendable {}
impl Sendable {
/// Retrieve the inner `MyStruct`, making it not-sendable again.
pub fn unpack(self) -> MyStruct {
self.0
}
}
}
use crate::my_struct::MyStruct;
fn main() {
let handle = std::thread::spawn(|| {
let my_struct = MyStruct::new();
dbg!(&my_struct);
// Do something with `my_struct`, but at the end the inner `Rc` should
// not be shared with anybody.
my_struct.pack_for_sending().expect("Some Rc was still shared!")
});
let my_struct = handle.join().unwrap().unpack();
dbg!(&my_struct);
}
I did a demo on the Rust playground.
It works. My question is, is it actually safe?
I know that the Rc is owned only by a single onwer and nobody can change that under my hands, because it can't be accessed by other threads and we wrap it into Sendable which does not allow access to the contained value.
But in some crazy world Rc could for example internally use thread local storage and this would not be safe... So is there some guarantee that I can do this?
I know that I must be extremely careful to not introduce some additional reason for the MyStruct to not be Send.
No.
There are multiple points that need to be verified to be able to send Rc across threads:
There can be no other handle (Rc or Weak) sharing ownership.
The content of Rc must be Send.
The implementation of Rc must use a thread-safe strategy.
Let's review them in order!
Guaranteeing the absence of aliasing
While your algorithm -- checking the counts yourself -- works for now, it would be better to simply ask Rc whether it is aliased or not.
fn is_aliased<T>(t: &mut Rc<T>) -> bool { Rc::get_mut(t).is_some() }
The implementation of get_mut will be adjusted should the implementation of Rc change in ways you have not foreseen.
Sendable content
While your implementation of MyStruct currently puts String (which is Send) into Rc, it could tomorrow change to Rc<str>, and then all bets are off.
Therefore, the sendable check needs to be implemented at the Rc level itself, otherwise you need to audit any change to whatever Rc holds.
fn sendable<T: Send>(mut t: Rc<T>) -> Result<Rc<T>, ...> {
if !is_aliased(&mut t) {
Ok(t)
} else {
...
}
}
Thread-safe Rc internals
And that... cannot be guaranteed.
Since Rc is not Send, its implementation can be optimized in a variety of ways:
The entire memory could be allocated using a thread-local arena.
The counters could be allocated using a thread-local arena, separately, so as to seamlessly convert to/from Box.
...
This is not the case at the moment, AFAIK, however the API allows it, so the next release could definitely take advantage of this.
What should you do?
You could make pack_for_sending unsafe, and dutifully document all assumptions that are counted on -- I suggest using get_mut to remove one of them. Then, on each new release of Rust, you'd have to double-check each assumption to ensure that your usage if still safe.
Or, if you do not mind making an allocation, you could write a conversion to Arc<T> yourself (see Playground):
fn into_arc<T>(this: Rc<T>) -> Result<Arc<T>, Rc<T>> {
Rc::try_unwrap(this).map(|t| Arc::new(t))
}
Or, you could write a RFC proposing a Rc <-> Arc conversion!
The API would be:
fn Rc<T: Send>::into_arc(this: Self) -> Result<Arc<T>, Rc<T>>
fn Arc<T>::into_rc(this: Self) -> Result<Rc<T>, Arc<T>>
This could be made very efficiently inside std, and could be of use to others.
Then, you'd convert from MyStruct to MySendableStruct, just moving the fields and converting Rc to Arc as you go, send to another thread, then convert back to MyStruct.
And you would not need any unsafe...
The only difference between Arc and Rc is that Arc uses atomic counters. The counters are only accessed when the pointer is cloned or dropped, so the difference between the two is negligible in applications which just share pointers between long-lived threads.
If you have never cloned the Rc, it is safe to send between threads. However, if you can guarantee that the pointer is unique then you can make the same guarantee about a raw value, without using a smart pointer at all!
This all seems quite fragile, for little benefit; future changes to the code might not meet your assumptions, and you will end up with Undefined Behaviour. I suggest that you at least try making some benchmarks with Arc. Only consider approaches like this when you measure a performance problem.
You might also consider using the archery crate, which provides a reference-counted pointer that abstracts over atomicity.
There is a new Pin type in unstable Rust and the RFC is already merged. It is said to be kind of a game changer when it comes to passing references, but I am not sure how and when one should use it.
Can anyone explain it in layman's terms?
What is pinning ?
In programming, pinning X means instructing X not to move.
For example:
Pinning a thread to a CPU core, to ensure it always executes on the same CPU,
Pinning an object in memory, to prevent a Garbage Collector to move it (in C# for example).
What is the Pin type about?
The Pin type's purpose is to pin an object in memory.
It enables taking the address of an object and having a guarantee that this address will remain valid for as long as the instance of Pin is alive.
What are the usecases?
The primary usecase, for which it was developed, is supporting Generators.
The idea of generators is to write a simple function, with yield, and have the compiler automatically translate this function into a state machine. The state that the generator carries around is the "stack" variables that need to be preserved from one invocation to another.
The key difficulty of Generators that Pin is designed to fix is that Generators may end up storing a reference to one of their own data members (after all, you can create references to stack values) or a reference to an object ultimately owned by their own data members (for example, a &T obtained from a Box<T>).
This is a subcase of self-referential structs, which until now required custom libraries (and lots of unsafe). The problem of self-referential structs is that if the struct move, the reference it contains still points to the old memory.
Pin apparently solves this years-old issue of Rust. As a library type. It creates the extra guarantee that as long as Pin exist the pinned value cannot be moved.
The usage, therefore, is to first create the struct you need, return it/move it at will, and then when you are satisfied with its place in memory initialize the pinned references.
One of the possible uses of the Pin type is self-referencing objects; an article by ralfj provides an example of a SelfReferential struct which would be very complicated without it:
use std::ptr;
use std::pin::Pin;
use std::marker::PhantomPinned;
struct SelfReferential {
data: i32,
self_ref: *const i32,
_pin: PhantomPinned,
}
impl SelfReferential {
fn new() -> SelfReferential {
SelfReferential { data: 42, self_ref: ptr::null(), _pin: PhantomPinned }
}
fn init(self: Pin<&mut Self>) {
let this : &mut Self = unsafe { self.get_unchecked_mut() };
// Set up self_ref to point to this.data.
this.self_ref = &mut this.data as *const i32;
}
fn read_ref(self: Pin<&Self>) -> Option<i32> {
let this : &Self= self.get_ref();
// Dereference self_ref if it is non-NULL.
if this.self_ref == ptr::null() {
None
} else {
Some(unsafe { *this.self_ref })
}
}
}
fn main() {
let mut data: Pin<Box<SelfReferential>> = Box::new(SelfReferential::new()).into();
data.as_mut().init();
println!("{:?}", data.as_ref().read_ref()); // prints Some(42)
}
Background
I know that in Rust people prefer &str rather than &String. But in some case we were only given &String.
One example is when you call std::iter::Iterator::peekable. The return value is a Peekable<I> object that wraps the original iterator into it and gives you one extra method peek.
The point here is that peek only gives you a reference to the iterator item. So if you have an iterator that contains Strings, you only have &String in this case. Of cause, you can easily use as_str to get a &str but in the code I will show below it is equivalent to a call to clone.
The question
This code
#[derive(Debug)]
struct MyStruct(String);
impl MyStruct {
fn new<T>(t: T) -> MyStruct
where
T: Into<String>,
{
MyStruct(t.into())
}
}
fn main() {
let s: String = "Hello world!".into();
let st: MyStruct = MyStruct::new(&s);
println!("{:?}", st);
}
doesn't compile because String doesn't implement From<&String>. This is not intuitive.
Why does this not work? Is it just a missing feature of the standard library or there are some other reasons that prevent the standard library from implementing it?
In the real code, I only have a reference to a String and I know to make it work I only need to call clone instead, but I want to know why.
To solve your problem, one could imagine adding a new generic impl to the standard library:
impl<'a, T: Clone> From<&'a T> for T { ... }
Or to make it more generic:
impl<B, O> From<B> for O where B: ToOwned<Owned=O> { ... }
However, there are two problems with doing that:
Specialization: the specialization feature that allows to overlapping trait-impls is still unstable. It turns out that designing specialization in a sound way is way more difficult than expected (mostly due to lifetimes).
Without it being stable, the Rust devs are very careful not to expose that feature somewhere in the standard library's public API. This doesn't mean that it isn't used at all in std! A famous example is the specialized ToString impl for str. It was introduced in this PR. As you can read in the PR's discussion, they only accepted it because it does not change the API (to_string() was already implemented for str).
However, it's different when we would add the generic impl above: it would change the API. Thus, it's not allowed in std yet.
core vs std: the traits From and Into are defined in the
core library, whereas Clone and ToOwned are defined in std. This means that we can't add a generic impl in core, because core doesn't know anything about std. But we also can't add the generic impl in std, because generic impls need to be in the same crate as the trait (it's a consequence of the orphan rules).
Thus, it would required some form of refactoring and moving around definitions (which may or may not be difficult) before able to add such a generic impl.
Note that adding
impl<'a> From<&'a String> for String { ... }
... works just fine. It doesn't require specialization and doesn't have problems with orphan rules. But of course, we wouldn't want to add a specific impl, when the generic impl would make sense.
(thanks to the lovely people on IRC for explaining stuff to me)
Since String does implement From<&str>, you can make a simple change:
fn main() {
let s: String = "Hello world!".into();
// Replace &s with s.as_str()
let st: MyStruct = MyStruct::new(s.as_str());
println!("{:?}", st);
}
All &Strings can be trivially converted into &str via as_str, which is why all APIs should prefer to use &str; it's a strict superset of accepting &String.
I feel like rc::Weak could use a (sort of) AsRef trait implementation. I'm trying to borrow some shared content from a weak pointer, but this won't compile:
use std::rc::Weak;
struct Thing<T>(Weak<T>);
impl<T> Thing<T> {
fn as_ref(&self) -> Option<&T> {
self.0.upgrade().map(|rc| {
rc.as_ref()
})
}
// For clarity, without a confusing closure
fn unwrapped_as_ref(&self) -> &T {
self.0.upgrade().unwrap().as_ref()
}
}
I understand why: the upgraded Rc does not survive the as_ref call. However it seems to me that it is perfectly sound. A possible magic trick using unsafe that does compile:
impl<T> Thing<T> {
fn unwrapped_as_ref<'a>(&'a self) -> &'a T {
let rc = self.0.upgrade().unwrap();
unsafe {
std::mem::transmute(rc.as_ref())
}
}
}
So:
Are there any downsides to this solution? Is it sound? Can you think of a simpler alternative?
Would it make sense to implement a as_ref(&self) -> Option<&T> in the standard library?
You can’t borrow from a weak reference, you just can’t. It’s weak, it does not guarantee that the underlying object exists (that’s why upgrade() returns an Option). And even if you were lucky and the value was still alive at the point you accessed it through the weak reference (upgrade() returned Some), it can be freed the next moment, as soon as the upgraded reference goes out of scope.
In order to get a reference to the underlying value you need something that will keep it alive (e.g. a strong reference), but this means you’ll have to return it along with the reference.