Rust mutable container of immutable elements?

With Rust, is it in general possible to have a mutable container of immutable values?
Example:
struct TestStruct { value: i32 }
fn test_fn()
{
let immutable_instance = TestStruct{value: 123};
let immutable_box = Box::new(immutable_instance);
let mut mutable_vector = vec!(immutable_box);
mutable_vector[0].value = 456;
}
Here, my TestStruct instance is wrapped in two containers: a Box, then a Vec. From the perspective of a new Rust user it's surprising that moving the Box into the Vec makes both the Box and the TestStruct instance mutable.
Is there a similar construct whereby the boxed value is immutable, but the container of boxes is mutable? More generally, is it possible to have multiple "layers" of containers without the whole tree being either mutable or immutable?

Not really. You could easily create one (just create a wrapper object which implements Deref but not DerefMut), but the reality is that Rust doesn't really see (im)mutability that way, because its main concern is controlling sharing / visibility.
After all, for an external observer what difference is there between
mutable_vector[0].value = 456;
and
mutable_vector[0] = Box::new(TestStruct{value: 456});
?
None is the answer, because Rust's ownership system means it's not possible for an observer to have kept a handle on the original TestStruct, thus they can't know whether that structure was replaced or modified in place[1][2].
If you want to secure your internal state, use visibility instead: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8a9346072b32cedcf2fccc0eeb9f55c5
mod foo {
pub struct TestStruct { value: i32 }
impl TestStruct {
pub fn new(value: i32) -> Self { Self { value } }
}
}
fn test_fn() {
let immutable_instance = foo::TestStruct{value: 123};
let immutable_box = Box::new(immutable_instance);
let mut mutable_vector = vec!(immutable_box);
mutable_vector[0].value = 456;
}
does not compile because from the point of view of test_fn, TestStruct::value is not accessible. Therefore test_fn has no way to mutate a TestStruct unless you add an &mut method on it.
[1]: technically they could check the address in memory and that might tell them, but even then it's not a sure thing (in either direction) hence pinning being a thing.
[2]: this observability distinction is also embraced by other languages, for instance the Clojure language largely falls on the "immutable all the things" side, however it has a concept of transients which allow locally mutable objects
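The wrapper mentioned in the answer (one that implements Deref but not DerefMut) could look roughly like the sketch below. The names `frozen`, `Frozen` and `demo` are made up for illustration and do not come from any library. The wrapper lives in its own module so that its field stays private to the calling code.

```rust
mod frozen {
    use std::ops::Deref;

    /// Hypothetical read-only wrapper: implements `Deref` but not `DerefMut`.
    pub struct Frozen<T>(T); // private field, so outside code cannot reach inside

    impl<T> Frozen<T> {
        pub fn new(value: T) -> Self {
            Frozen(value)
        }
    }

    impl<T> Deref for Frozen<T> {
        type Target = T;

        fn deref(&self) -> &T {
            &self.0
        }
    }
}

use frozen::Frozen;

pub struct TestStruct {
    pub value: i32,
}

/// Builds a mutable Vec of frozen boxes and returns the value in slot 0.
fn demo() -> i32 {
    let mut mutable_vector = vec![Frozen::new(Box::new(TestStruct { value: 123 }))];
    // mutable_vector[0].value = 456; // rejected: `Frozen` has no `DerefMut`
    // The container itself is still mutable, so a whole slot can be replaced:
    mutable_vector[0] = Frozen::new(Box::new(TestStruct { value: 456 }));
    mutable_vector[0].value
}

fn main() {
    println!("{}", demo());
}
```

Note that replacing the slot is still allowed, which is exactly the point made above: to an outside observer, replacing an element and mutating it in place look the same.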

Related

References in rust self referential structs

Given the code snippet below:
use std::{io::BufWriter, pin::Pin};
pub struct SelfReferential {
pub writer: BufWriter<&'static mut [u8]>, // borrowed from buffer
pub buffer: Pin<Box<[u8]>>,
}
#[cfg(test)]
mod tests {
use std::io::Write;
use super::*;
fn init() -> SelfReferential {
let mut buffer = Pin::new(vec![0; 12].into_boxed_slice());
let writer = unsafe { buffer.as_mut().get_unchecked_mut() };
let writer = unsafe { (writer as *mut [u8]).as_mut().unwrap() };
let writer = BufWriter::new(writer);
SelfReferential { writer, buffer }
}
#[test]
fn move_works() {
let mut sr = init();
sr.writer.write(b"hello ").unwrap();
sr.writer.flush().unwrap();
let mut slice = &mut sr.buffer[6..];
slice.write(b"world!").unwrap();
assert_eq!(&sr.buffer[..], b"hello world!".as_ref());
let mut sr_moved = sr;
sr_moved.writer.write(b"W").unwrap();
sr_moved.writer.flush().unwrap();
assert_eq!(&sr_moved.buffer[..], b"hello World!".as_ref());
}
}
The first question: is it OK to assign 'static lifetime to mutable slice reference in BufWriter? As technically speaking, it's bound to the lifetime of struct instances themselves, and AFAIK there's no safe way to invalidate it.
The second question: besides the fact that unsafe instantiation of this type, in test example, creates two mutable references into the underlying buffer, is there any other potential dangers associated with such an "unidiomatic" (for the lack of better word) type?
is it OK to assign 'static lifetime to mutable slice reference in BufWriter?
Sort of, but there's a bigger problem. The lifetime itself is not worse than any other choice, because there is no lifetime that you can use here which is really accurate. But it is not safe to expose that reference, because then it can be taken:
let w: BufWriter<&'static mut [u8]> = {
let sr = init();
sr.writer
};
// `sr.buffer` has now been dropped, so `w` has a dangling reference
is there any other potential dangers associated with such an "unidiomatic" (for the lack of better word) type?
Yes, it's undefined behavior. Box isn't just managing an allocation; it also (currently) signals a claim of unique, non-aliasing access to the contents. You violate that non-aliasing by creating the writer and then moving the buffer: even though the heap memory is not actually touched, the move of buffer is counted as invalidating all references into it.
This is an area of Rust semantics which is not yet fully nailed down, but as far as the current compiler is concerned, this is UB. You can see this if you run your test code under the Miri interpreter.
The good news is, what you're trying to do is a very common desire and people have worked on the problem. I personally recommend using ouroboros — with the help of a macro, it allows you to create the struct you want without writing any new unsafe code. There will be some restrictions on how you use the writer, but nothing you can't tidy out of the way by writing an impl io::Write for SelfReferential. Another, newer library in this space is yoke; I haven't tried it.
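For this particular use case there may also be a way to avoid the self-reference entirely. The sketch below is not ouroboros or yoke; it is a different, deliberately simpler design that owns the buffer, remembers a write offset, and implements io::Write directly. The type name `OwnedWriter` and its methods are hypothetical, and it does not reproduce BufWriter's buffering.

```rust
use std::io::{self, Write};

/// Hypothetical alternative: own the buffer and track the write position,
/// so no reference into the buffer ever has to be stored in the struct.
pub struct OwnedWriter {
    buffer: Box<[u8]>,
    pos: usize,
}

impl OwnedWriter {
    pub fn with_len(len: usize) -> Self {
        OwnedWriter {
            buffer: vec![0; len].into_boxed_slice(),
            pos: 0,
        }
    }

    pub fn bytes(&self) -> &[u8] {
        &self.buffer
    }
}

impl Write for OwnedWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // `Write for &mut [u8]` copies as many bytes as fit into the slice.
        let mut rest: &mut [u8] = &mut self.buffer[self.pos..];
        let written = rest.write(data)?;
        self.pos += written;
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(()) // nothing is buffered separately
    }
}

fn main() -> io::Result<()> {
    let mut w = OwnedWriter::with_len(12);
    w.write_all(b"hello ")?;
    let mut moved = w; // moving is fine: there is no internal reference
    moved.write_all(b"world!")?;
    println!("{}", String::from_utf8_lossy(moved.bytes()));
    Ok(())
}
```

Because the borrow of the buffer is created inside each `write` call and ends when the call returns, the struct can be moved freely and no unsafe code is needed.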

How to create a Box<UnsafeCell<[T]>>

The recommended way to create a regular boxed slice (i.e. Box<[T]>) seems to be to first create a std::Vec<T>, and use .into_boxed_slice(). However, nothing similar to this seems to work if I want the slice to be wrapped in UnsafeCell.
A solution with unsafe code is fine, but I'd really like to avoid having to manually manage the memory.
The only (not-unsafe) way to create a Box<[T]> is via Box::from, given a &[T] as the parameter. This is because [T] is ?Sized and can't be passed as a parameter. This in turn effectively requires T: Copy, because T has to be copied from behind the reference into the new Box. But UnsafeCell is not Copy, regardless of whether T is. Discussion about making UnsafeCell Copy has been going on for years, yielding no final conclusion, due to safety concerns.
If you really, really want a Box<UnsafeCell<[T]>>, there are only two ways:
Because Box and UnsafeCell are both CoerceUnsized, and [T; N] is Unsize, you can create a Box<UnsafeCell<[T; N]>> and coerce it to a Box<UnsafeCell<[T]>>. This limits you to initializing from fixed-size arrays.
Unsize coercion:
fn main() {
use std::cell::UnsafeCell;
let x: [u8;3] = [1,2,3];
let c: Box<UnsafeCell<[_]>> = Box::new(UnsafeCell::new(x));
}
Because UnsafeCell is #[repr(transparent)], you can create a Box<[T]> and unsafely transmute it to a Box<UnsafeCell<[T]>>, as the UnsafeCell<[T]> is guaranteed to have the same memory layout as a [T], given that [T] doesn't use niche-values (even if T does).
Transmute:
use std::cell::UnsafeCell;
use std::mem;
// enclose the transmute in a function accepting and returning proper type-pairs
fn into_boxed_unsafecell<T>(inp: Box<[T]>) -> Box<UnsafeCell<[T]>> {
    unsafe {
        mem::transmute(inp)
    }
}
fn main() {
let x = vec![1,2,3];
let b = x.into_boxed_slice();
let c: Box<UnsafeCell<[_]>> = into_boxed_unsafecell(b);
}
Having said all this: I strongly suspect you are running into the XY problem. A Box<UnsafeCell<[T]>> is a very strange type (especially compared to UnsafeCell<Box<[T]>>). You may want to give details on what you are trying to accomplish with such a type.
Just swap the pointer types to UnsafeCell<Box<[T]>>:
use std::cell::UnsafeCell;
fn main() {
let mut res: UnsafeCell<Box<[u32]>> = UnsafeCell::new(vec![1, 2, 3, 4, 5].into_boxed_slice());
unsafe {
println!("{}", (*res.get())[1]);
res.get_mut()[1] = 10;
println!("{}", (*res.get())[1]);
}
}

How to achieve encapsulation of struct fields without borrowing the struct as a whole

My question has already been somewhat discussed here.
The problem is that I want to access multiple distinct fields of a struct in order to use them, but I don't want to work on the fields directly. Instead I'd like to encapsulate the access to them, to gain more flexibility.
I tried to achieve this by writing methods for this struct, but as you can see in the question mentioned above or my older question here this approach fails in Rust because the borrow checker will only allow you to borrow different parts of a struct if you do so directly, or if you use a method borrowing them together, since all the borrow checker knows from its signature is that self is borrowed and only one mutable reference to self can exist at any time.
Losing the flexibility to borrow different parts of a struct as mutable at the same time is of course not acceptable. Therefore I'd like to know whether there is any idiomatic way to do this in Rust.
My naive approach would be to write a macro instead of a function, performing (and encapsulating) the same functionality.
EDIT:
Because Frxstrem suggested that the question which I linked to at the top may answer my question, I want to make it clear that I'm not searching for just any way to solve this. My question is which of the proposed solutions (if any) is the right way to do it.
After some more research it seems that there is no beautiful way to achieve partial borrowing without accessing the struct fields directly, at least as of now.
There are, however, multiple imperfect solutions to this which I'd like to discuss a little, so that anyone who might end up here may weigh the pros and cons for themself.
I'll use the code from my original question as an example to illustrate them here.
So let's say you have a struct:
struct GraphState {
nodes: Vec<Node>,
edges: Vec<Edge>,
}
and you're trying to borrow one part of this struct mutably and the other immutably:
// THIS CODE DOESN'T PASS THE BORROW-CHECKER
impl GraphState {
pub fn edge_at(&self, edge_index: u16) -> &Edge {
&self.edges[usize::from(edge_index)]
}
pub fn node_at_mut(&mut self, node_index: u8) -> &mut Node {
&mut self.nodes[usize::from(node_index)]
}
pub fn remove_edge(&mut self, edge_index: u16) {
let edge = self.edge_at(edge_index); // first (immutable) borrow here
// update the edge-index collection of the nodes connected by this edge
for i in 0..2 {
let node_index = edge.node_indices[i];
self.node_at_mut(node_index).remove_edge(edge_index); // second (mutable)
// borrow here -> ERROR
}
}
}
But of course this fails, since you're not allowed to borrow self as mutable and immutable at the same time.
So to get around this you could simply access the fields directly:
impl GraphState {
pub fn remove_edge(&mut self, edge_index: u16) {
let edge = &self.edges[usize::from(edge_index)];
for i in 0..2 {
let node_index = edge.node_indices[i];
self.nodes[usize::from(node_index)].remove_edge(edge_index);
}
}
}
This approach works, but it has two major drawbacks:
The accessed fields need to be public (at least if you want to allow access to them from another scope). If they're implementation details that you'd rather keep private, you're out of luck.
You always need to operate on the fields directly. This means that code like usize::from(node_index) needs to be repeated everywhere, which makes this approach both brittle and cumbersome.
So how can you solve this?
A) Borrow everything at once
Since borrowing self mutably multiple times is not allowed, mutably borrowing all the parts you want at once instead is one straightforward way to solve this problem:
pub fn edge_at(edges: &[Edge], edge_index: u16) -> &Edge {
&edges[usize::from(edge_index)]
}
pub fn node_at_mut(nodes: &mut [Node], node_index: u8) -> &mut Node {
&mut nodes[usize::from(node_index)]
}
impl GraphState {
pub fn data_mut(&mut self) -> (&mut [Node], &mut [Edge]) {
(&mut self.nodes, &mut self.edges)
}
pub fn remove_edge(&mut self, edge_index: u16) {
let (nodes, edges) = self.data_mut(); // first (mutable) borrow here
let edge = edge_at(edges, edge_index);
// update the edge-index collection of the nodes connected by this edge
for i in 0..2 {
let node_index = edge.node_indices[i];
node_at_mut(nodes, node_index).remove_edge(edge_index); // no borrow here
// -> no error
}
}
}
It's clearly a workaround and far from ideal, but it works and it allows you to keep the fields themselves private (though you'll probably need to expose the implementation somewhat, as the user has to hand the necessary data to the other functions by hand).
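For reference, here is a self-contained sketch of approach A that compiles on its own. The `Node` and `Edge` definitions are assumptions, since the original post does not show them, and `parts_mut` is a hypothetical variant of `data_mut` that hands out the edges immutably because they are only read here.

```rust
// Hypothetical minimal types; the original post does not show Node or Edge.
struct Edge {
    node_indices: [u8; 2],
}

struct Node {
    edge_indices: Vec<u16>,
}

impl Node {
    fn remove_edge(&mut self, edge_index: u16) {
        self.edge_indices.retain(|&e| e != edge_index);
    }
}

struct GraphState {
    nodes: Vec<Node>,
    edges: Vec<Edge>,
}

fn edge_at(edges: &[Edge], edge_index: u16) -> &Edge {
    &edges[usize::from(edge_index)]
}

fn node_at_mut(nodes: &mut [Node], node_index: u8) -> &mut Node {
    &mut nodes[usize::from(node_index)]
}

impl GraphState {
    // A single borrow of `self`, split into two disjoint field borrows.
    fn parts_mut(&mut self) -> (&mut [Node], &[Edge]) {
        (self.nodes.as_mut_slice(), self.edges.as_slice())
    }

    // Updates the edge-index collections of the two nodes joined by the edge.
    fn remove_edge(&mut self, edge_index: u16) {
        let (nodes, edges) = self.parts_mut();
        let edge = edge_at(edges, edge_index);
        for &node_index in &edge.node_indices {
            node_at_mut(nodes, node_index).remove_edge(edge_index);
        }
    }
}

/// Two nodes joined by edges 0 and 1; edge 0 is then removed from both nodes.
fn demo() -> Vec<Vec<u16>> {
    let mut graph = GraphState {
        nodes: vec![
            Node { edge_indices: vec![0, 1] },
            Node { edge_indices: vec![0, 1] },
        ],
        edges: vec![
            Edge { node_indices: [0, 1] },
            Edge { node_indices: [0, 1] },
        ],
    };
    graph.remove_edge(0);
    graph.nodes.into_iter().map(|n| n.edge_indices).collect()
}

fn main() {
    println!("{:?}", demo());
}
```

The free functions only ever see slices, so the fields of `GraphState` can stay private while the index conversion is still written in one place.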
B) Use a macro
If code reuse is all you worry about and visibility is not an issue for you, you can write macros like these:
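macro_rules! node_at_mut {
    // `$this` is the struct value passed in by the caller (usually `self`)
    ($this:ident, $index:expr) => {
        &mut $this.nodes[usize::from($index)]
    };
}
macro_rules! edge_at {
    // edges are only read, so a shared borrow is enough
    ($this:ident, $index:expr) => {
        &$this.edges[usize::from($index)]
    };
}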
...
pub fn remove_edge(&mut self, edge_index: u16) {
let edge = edge_at!(self, edge_index);
// update the edge-index collection of the nodes connected by this edge
for i in 0..2 {
let node_index = edge.node_indices[i];
node_at_mut!(self, node_index).remove_edge(edge_index);
}
}
}
If your fields are public anyway I'd probably go with this solution, as it seems the most elegant to me.
C) Go unsafe
What we want to do here is obviously safe.
Sadly the borrow checker cannot see this, since the function signatures tell it no more than that self is being borrowed. Luckily Rust allows us to use the unsafe keyword for cases like these:
pub fn remove_edge(&mut self, edge_index: u16) {
let edge: *const Edge = self.edge_at(edge_index);
for i in 0..2 {
unsafe {
let node_index = (*edge).node_indices[i];
self.node_at_mut(node_index).remove_edge(edge_index); // first borrow here
// (safe though since remove_edge will not invalidate the first pointer)
}
}
}
This works and gives us all the benefits of solution A), but using unsafe for something that could easily be done without it, if only the language had some syntax for actual partial borrowing, seems a bit ugly to me. On the other hand, it may (in some cases) be preferable to solution A), due to the sheer clunkiness of the latter...
EDIT: After some thought I realised that we know this approach to be safe here only because we know about the implementation. In a different use case the data in question might actually be contained in one single field (say a Map), even if it looks like we are accessing two very distinct kinds of data when calling from outside.
This is why this last approach is rightfully unsafe, as there is no way for the borrow checker to check whether we are actually borrowing different things without exposing the private fields, rendering our effort pointless.
Contrary to my initial belief, this couldn't even really be changed by expanding the language. The reason is that one would still need to expose information about private fields in some way for this to work.
After writing this answer I also found this blog post which goes a bit deeper into possible solutions (and also mentions some advanced techniques that didn't come to my mind, none of which is universally applicable though). If you happen to know another solution, or a way to improve the ones proposed here, please let me know.

What are the use cases of the newly proposed Pin type?

There is a new Pin type in unstable Rust and the RFC is already merged. It is said to be kind of a game changer when it comes to passing references, but I am not sure how and when one should use it.
Can anyone explain it in layman's terms?
What is pinning?
In programming, pinning X means instructing X not to move.
For example:
Pinning a thread to a CPU core, to ensure it always executes on the same CPU,
Pinning an object in memory, to prevent a Garbage Collector from moving it (in C# for example).
What is the Pin type about?
The Pin type's purpose is to pin an object in memory.
It enables taking the address of an object and having a guarantee that this address will remain valid for as long as the instance of Pin is alive.
What are the usecases?
The primary usecase, for which it was developed, is supporting Generators.
The idea of generators is to write a simple function, with yield, and have the compiler automatically translate this function into a state machine. The state that the generator carries around is the "stack" variables that need to be preserved from one invocation to another.
The key difficulty of Generators that Pin is designed to fix is that Generators may end up storing a reference to one of their own data members (after all, you can create references to stack values) or a reference to an object ultimately owned by their own data members (for example, a &T obtained from a Box<T>).
This is a subcase of self-referential structs, which until now required custom libraries (and lots of unsafe). The problem of self-referential structs is that if the struct moves, the reference it contains still points to the old memory.
Pin apparently solves this years-old issue of Rust, and does so as a library type. It creates the extra guarantee that as long as the Pin exists, the pinned value cannot be moved.
The usage, therefore, is to first create the struct you need, return it/move it at will, and then when you are satisfied with its place in memory initialize the pinned references.
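A small sketch of what "cannot be moved" means in practice, assuming a heap-pinned value created with Box::pin. The type `Unmovable` and the helper functions are hypothetical. Moving the `Pin<Box<_>>` handle only moves the pointer, so the address of the pinned value stays the same, and safe code cannot get the value back out because the type is not Unpin.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical type; the `PhantomPinned` field opts it out of `Unpin`.
struct Unmovable {
    data: i32,
    _pin: PhantomPinned,
}

/// Address of the pinned value itself (not of the `Pin<Box<_>>` handle).
fn address_of(p: &Pin<Box<Unmovable>>) -> usize {
    let r: &Unmovable = &**p;
    r as *const Unmovable as usize
}

/// Returns true if the pinned value has the same address before and after
/// the handle is moved to a new variable.
fn address_survives_move() -> bool {
    let pinned: Pin<Box<Unmovable>> = Box::pin(Unmovable {
        data: 42,
        _pin: PhantomPinned,
    });
    let before = address_of(&pinned);
    let moved = pinned; // moves the pointer; the heap value stays where it is
    let after = address_of(&moved);
    // let raw = Pin::into_inner(moved); // rejected: `Unmovable` is not `Unpin`
    before == after && moved.data == 42
}

fn main() {
    println!("{}", address_survives_move());
}
```

A plain Box also keeps its contents at a fixed heap address when the Box is moved; what Pin adds is that safe code can no longer move the value out of that allocation or swap it with another one.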
One of the possible uses of the Pin type is self-referencing objects; an article by ralfj provides an example of a SelfReferential struct which would be very complicated without it:
use std::ptr;
use std::pin::Pin;
use std::marker::PhantomPinned;
struct SelfReferential {
data: i32,
self_ref: *const i32,
_pin: PhantomPinned,
}
impl SelfReferential {
fn new() -> SelfReferential {
SelfReferential { data: 42, self_ref: ptr::null(), _pin: PhantomPinned }
}
fn init(self: Pin<&mut Self>) {
let this : &mut Self = unsafe { self.get_unchecked_mut() };
// Set up self_ref to point to this.data.
this.self_ref = &mut this.data as *const i32;
}
fn read_ref(self: Pin<&Self>) -> Option<i32> {
let this : &Self= self.get_ref();
// Dereference self_ref if it is non-NULL.
if this.self_ref == ptr::null() {
None
} else {
Some(unsafe { *this.self_ref })
}
}
}
fn main() {
let mut data: Pin<Box<SelfReferential>> = Box::new(SelfReferential::new()).into();
data.as_mut().init();
println!("{:?}", data.as_ref().read_ref()); // prints Some(42)
}

How can I specify an iterator over traits?

I've been trying to make a websocket client, but one that has tons of options! I thought of using a builder style since the configuration can be stored in a nice way:
let client = Client::new()
.options(5)
.stuff(true)
// now users can store the config before calling build
.build();
I am having trouble creating a function that takes in a list of strings. Of course I have a few options:
fn strings(self, list: &[&str]) -> Self;
fn strings(self, list: Vec<String>) -> Self;
fn strings(self, list: &[&String]) -> Self;
// etc...
I would like to accept generously so I would like to accept &String, &str, and hopefully keys in a HashMap (since this might be used with a large routing table) so I thought I would accept an iterator over items that implement Borrow<str> like so:
fn strings<'p, P, Sp>(self, list: P) -> Self
where P: Iterator<Item = &'p Sp>,
Sp: Borrow<str> + 'p;
A full example is available here.
This was great until I needed to add another optional list of strings (extensions) to the builder.
This meant that if I created a builder without specifying both lists of strings that the compiler would complain that it couldn't infer the type of the Builder, which makes sense. The only reason this is not OK is that both these fields are optional so the user might never know the type of a field it hasn't yet set.
Does anyone have any ideas on how to specify an iterator over traits? Then I wouldn't have to specify the type fully at compile time. Or maybe just a better way to do this entirely?
A pragmatic solution is to simply erase the concrete types of the iterator and its items and introduce some indirection. We can Box the trait object and store that as a known type:
use std::borrow::Borrow;
struct Builder {
strings: Option<Box<dyn Iterator<Item = Box<dyn Borrow<str>>>>>,
}
impl Builder {
fn new() -> Self {
Builder { strings: None }
}
fn strings<I>(mut self, iter: I) -> Self
where I: IntoIterator + 'static,
I::Item: Borrow<str> + 'static,
{
let i = iter.into_iter().map(|x| Box::new(x) as Box<dyn Borrow<str>>);
self.strings = Some(Box::new(i));
self
}
fn build(self) -> String {
match self.strings {
Some(iter) => {
let mut s = String::new();
for i in iter {
s.push_str((*i).borrow());
}
s
},
None => format!("No strings here!"),
}
}
}
fn main() {
let s =
Builder::new()
.strings(vec!["a", "b"])
.build();
println!("{}", s);
}
Here we convert the input iterator to a boxed iterator of boxed things that implement Borrow. We have to do some gyrations to convert the specific concrete type we have into a conceptually higher level type but that is still concrete.
This remainder doesn't directly answer your question about an iterator of traits, but it provides an alternate solution that I would use.
You have to pick between something that might be a bit more optimal but has a worse user experience, and something that might be a bit suboptimal but has a nicer user experience.
You are currently storing the iterator in the builder struct:
struct Builder<I>
where I: Iterator
{
things: Option<I>,
}
This requires that the concrete type of I be known in order to instantiate a Builder. Specifically, the size of that type needs to be known in order to allocate enough space. There's no way around this; if you want to store a generic type, you need to know what type it is.
For the same reasons, you cannot have this standalone statement:
let foo = None;
How much space needs to be allocated for foo? You cannot know until you know what type the Some might hold.
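As a rough illustration of the point about space, the sketch below gives the `None` an explicit type and prints the size of two different `Option` types. The variable names and the choice of payload types are arbitrary.

```rust
use std::mem::size_of;

fn main() {
    // let foo = None; // rejected if nothing else pins down the type of `foo`

    // With an annotation, the compiler knows how much space to reserve.
    let small: Option<u8> = None;
    let large: Option<[u8; 16]> = None;

    println!("Option<u8>: {} bytes", size_of::<Option<u8>>());
    println!("Option<[u8; 16]>: {} bytes", size_of::<Option<[u8; 16]>>());
    assert!(small.is_none() && large.is_none());
}
```

The two `None` values need different amounts of storage, which is why a bare `None` with no type information cannot be compiled.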
The way I would go would be to not add type parameters for the struct, but have them on the function. This means that the struct has to have a fixed type to store the values. In your example, a String is a good fit:
struct Builder {
strings: Vec<String>,
}
impl Builder {
fn strings<I>(mut self, iter: I) -> Self
where I: IntoIterator,
I::Item: Into<String>,
{
self.strings.extend(iter.into_iter().map(Into::into));
self
}
}
A Vec has very compact storage (it only takes 3 machine-sized values), and doesn't allocate any heap memory when it is empty. For that reason, I wouldn't wrap it in an Option unless you needed to tell 0 items from the absence of a provided value.
If you are just appending each value to one big string, you might as well do that in the strings method. That depends on your application.
You mention that you might be providing a large amount of data, but I'm not sure that holding the iterator until the build call will really help. You are going to pay the cost earlier or later.
If you are going to reuse the builder, then it depends on what is expensive. If iterating is expensive, then doing it once and reusing that for each build call will be more efficient. If holding onto the memory is expensive, then you don't want to have multiple builders or built items around concurrently. Since the builder will transfer ownership of the memory to the new item, there shouldn't be any waste here.
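Putting the Vec<String> approach together, a complete version might look like the sketch below. The `new`, `extensions` and `build` methods are assumptions added for illustration (the question only mentions a second optional list called extensions), and `build` simply concatenates everything so that the result is easy to inspect.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct Builder {
    strings: Vec<String>,
    extensions: Vec<String>, // second optional list; empty if never set
}

impl Builder {
    fn new() -> Self {
        Builder::default()
    }

    // The type parameter lives on the method, not on the struct.
    fn strings<I>(mut self, iter: I) -> Self
    where
        I: IntoIterator,
        I::Item: Into<String>,
    {
        self.strings.extend(iter.into_iter().map(Into::into));
        self
    }

    fn extensions<I>(mut self, iter: I) -> Self
    where
        I: IntoIterator,
        I::Item: Into<String>,
    {
        self.extensions.extend(iter.into_iter().map(Into::into));
        self
    }

    // Hypothetical `build`: joins both lists into one String.
    fn build(self) -> String {
        let mut out = self.strings.concat();
        out.push_str(&self.extensions.concat());
        out
    }
}

/// Accepts `&str` items and map keys (`&String`) without setting extensions.
fn demo() -> String {
    let mut routes = BTreeMap::new();
    routes.insert(String::from("c"), 1);

    Builder::new()
        .strings(vec!["a", "b"])
        .strings(routes.keys())
        .build()
}

fn main() {
    println!("{}", demo());
}
```

Since `Builder` has no type parameters, leaving out `extensions` causes no inference problems, which was the original difficulty with storing the iterators themselves.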
