I'm writing some code that shares buffers between Rust and the Linux kernel. After the buffer is registered, its memory location should never move. Since the API takes a *mut u8 (actually a libc::iovec) but doesn't enforce the fixed memory location constraint, it works fine if I represent the buffer as RefCell<Vec<u8>>, Arc<RefCell<Vec<u8>>>, or Arc<Mutex<Vec<u8>>>. (But the Rust code actually never writes to the buffers.)
Does Pin offer additional safety against data moving in this use case? I think it may not, since one of the major risks is calling resize() and growing the vector, but I do intend to call resize() without growing the vector. The code works, but I'm interested in writing this the most correct way.
I don't really know about Pin but it is mostly useful for self-referential structs and coroutines AFAIK.
For you case, I would just create a struct which enforces invariant about data movement. E.g.:
use std::ops::{Deref, DerefMut};
struct MyData{
data: Vec<u8>,
}
impl MyData{
fn with_capacity(cap: usize)->Self{
Self{data: Vec::with_capacity(cap)}
}
fn resize(&mut self, new_len: usize, val_to_set: u8){
let old_ptr = self.data.as_ptr();
// Enforce invariant that memory never moves.
assert!(new_len <= self.data.capacity());
self.data.resize(new_len, val_to_set);
debug_assert_eq!(old_ptr, self.data.as_ptr());
}
}
// To get access to bytes
impl Deref for MyData{
type Target = [u8];
fn deref(&self)->&[u8]{
&*self.data
}
}
impl DerefMut for MyData{
fn deref_mut(&mut self)->&mut[u8]{
&mut *self.data
}
}
This struct enforces that bytes never move without involving Pin.
And you can wrap it into RefCell.
Related
I'm developing for a single-core embedded chip. In C & C++ it's common to statically-define mutable values that can be used globally. The Rust equivalent is roughly this:
static mut MY_VALUE: usize = 0;
pub fn set_value(val: usize) {
unsafe { MY_VALUE = val }
}
pub fn get_value() -> usize {
unsafe { MY_VALUE }
}
Now anywhere can call the free functions get_value and set_value.
I think that this should be entirely safe in single-threaded embedded Rust, but I've not been able to find a definitive answer. I'm only interested in types that don't require allocation or destruction (like the primitive in the example here).
The only gotcha I can see is with the compiler or processor reordering accesses in unexpected ways (which could be solves using the volatile access methods), but is that unsafe per se?
Edit:
The book suggests that this is safe so long as we can guarantee no multi-threaded data races (obviously the case here)
With mutable data that is globally accessible, it’s difficult to ensure there are no data races, which is why Rust considers mutable static variables to be unsafe.
The docs are phrased less definitively, suggesting that data races are only one way this can be unsafe but not expanding on other examples
accessing mutable statics can cause undefined behavior in a number of ways, for example due to data races in a multithreaded context
The nomicon suggests that this should be safe so long as you don't somehow dereference a bad pointer.
Be aware as there is no such thing as single-threaded code as long as interrupts are enabled. So even for microcontrollers, mutable statics are unsafe.
If you really can guarantee single-threaded access, your assumption is correct that accessing primitive types should be safe. That's why the Cell type exists, which allows mutability of primitive types with the exception that it is not Sync (meaning it explicitely prevents threaded access).
That said, to create a safe static variable, it needs to implement Sync for exactly the reason mentioned above; which Cell doesn't do, for obvious reasons.
To actually have a mutable global variable with a primitive type without using an unsafe block, I personally would use an Atomic. Atomics do not allocate and are available in the core library, meaning they work on microcontrollers.
use core::sync::atomic::{AtomicUsize, Ordering};
static MY_VALUE: AtomicUsize = AtomicUsize::new(0);
pub fn set_value(val: usize) {
MY_VALUE.store(val, Ordering::Relaxed)
}
pub fn get_value() -> usize {
MY_VALUE.load(Ordering::Relaxed)
}
fn main() {
println!("{}", get_value());
set_value(42);
println!("{}", get_value());
}
Atomics with Relaxed are zero-overhead on almost all architectures.
In this case it's not unsound, but you still should avoid it because it is too easy to misuse it in a way that is UB.
Instead, use a wrapper around UnsafeCell that is Sync:
pub struct SyncCell<T>(UnsafeCell<T>);
unsafe impl<T> Sync for SyncCell<T> {}
impl<T> SyncCell<T> {
pub const fn new(v: T) -> Self { Self(UnsafeCell::new(v)); }
pub unsafe fn set(&self, v: T) { *self.0.get() = v; }
}
impl<T: Copy> SyncCell<T> {
pub unsafe fn get(&self) -> T { *self.0.get() }
}
If you use nightly, you can use SyncUnsafeCell.
Mutable statics are unsafe in general because they circumvent the normal borrow checker rules that enforce either exactly 1 mutable borrow exists or any number of immutable borrows exist (including 0), which allows you to write code which causes undefined behavior. For instance, the following compiles and prints 2 2:
static mut COUNTER: i32 = 0;
fn main() {
unsafe {
let mut_ref1 = &mut COUNTER;
let mut_ref2 = &mut COUNTER;
*mut_ref1 += 1;
*mut_ref2 += 1;
println!("{mut_ref1} {mut_ref2}");
}
}
However we have two mutable references to the same location in memory existing concurrently, which is UB.
I believe the code that you posted there is safe, but I generally would not recommend using static mut. Use an atomic, SyncUnsafeCell/UnsafeCell, a wrapper around a Cell that implements Sync which is safe since your environment is single-threaded, or honestly just about anything else. static mut is wildly unsafe and its use is highly discouraged.
In order to sidestep the issue of exactly how mutable statics can be used safely in single-threaded code, another option is to use thread-local storage:
use std::cell::Cell;
thread_local! (static MY_VALUE: Cell<usize> = {
Cell::new(0)
});
pub fn set_value(val: usize) {
MY_VALUE.with(|cell| cell.set(val))
}
pub fn get_value() -> usize {
MY_VALUE.with(|cell| cell.get())
}
I'm considering making Rust my primary development language instead of Go, so I've been reading the docs.
One thing that my work requires a lot of is reading and writing from multiple threads into a custom database that is stored in memory as a single massive array and could be 32GB in size. The database functions are designed to avoid race conditions, and mutexes or atomic primitives are used where necessary.
The Rust doc implies that an array can only be either mutable (writeable) on a single thread, or non-mutable (read only) by many threads, and cannot be writable on one and readable on another simultaneously. How then can an in-memory database be used...? It doesn't make sense!
Do I have this wrong?
Forgive me that I can't give any specific Rust example, because I'm still learning the Rust syntax, and to be honest I need to know the answer to this question before using all my time learning a language I will be unable to use.
There is an unsafe way to do it, namely using the UnsafeCell, which returns mutable raw pointers to its interior data. These are not tracked by the Borrow-Checker and so you have to make sure the invariants are upheld
pub struct UnsafeVec<T> {
data: UnsafeCell<Vec<T>>
}
impl<T> UnsafeVec<T> {
pub fn new() -> Self {
UnsafeVec { data: UnsafeCell::new(Vec::new()) }
}
pub fn push(&mut self, arg: T) {
self.data.get_mut().push(arg)
}
pub unsafe fn index_mut(&self, index: usize) -> &mut T {
&mut (*self.data.get())[index]
}
}
unsafe impl<T> Sync for UnsafeVec<T> {}
unsafe impl<T> Send for UnsafeVec<T> {}
which allows you to write
fn main() {
let mut unsafe_vec = UnsafeVec::<i32>::new();
unsafe_vec.push(15);
unsafe {
*unsafe_vec.index_mut(0) += 1;
}
}
The method index_mut allows to modify the interior vector with an immutable reference.
The Sync and Send traits signal the compiler that the type can be safely shared across threads, which is only true if you prevent possible data races manually!.
Again, this is an unsafe option that requires you to uphold the invariants yourself.
I have a struct that is not Send because it contains Rc. Lets say that Arc has too big overhead, so I want to keep using Rc. I would still like to occasionally Send this struct between threads, but only when I can verify that the Rc has strong_count 1 and weak_count 0.
Here is (hopefully safe) abstraction that I have in mind:
mod my_struct {
use std::rc::Rc;
#[derive(Debug)]
pub struct MyStruct {
reference_counted: Rc<String>,
// more fields...
}
impl MyStruct {
pub fn new() -> Self {
MyStruct {
reference_counted: Rc::new("test".to_string())
}
}
pub fn pack_for_sending(self) -> Result<Sendable, Self> {
if Rc::strong_count(&self.reference_counted) == 1 &&
Rc::weak_count(&self.reference_counted) == 0
{
Ok(Sendable(self))
} else {
Err(self)
}
}
// There are more methods, some may clone the `Rc`!
}
/// `Send`able wrapper for `MyStruct` that does not allow you to access it,
/// only unpack it.
pub struct Sendable(MyStruct);
// Safety: `MyStruct` is not `Send` because of `Rc`. `Sendable` can be
// only created when the `Rc` has strong count 1 and weak count 0.
unsafe impl Send for Sendable {}
impl Sendable {
/// Retrieve the inner `MyStruct`, making it not-sendable again.
pub fn unpack(self) -> MyStruct {
self.0
}
}
}
use crate::my_struct::MyStruct;
fn main() {
let handle = std::thread::spawn(|| {
let my_struct = MyStruct::new();
dbg!(&my_struct);
// Do something with `my_struct`, but at the end the inner `Rc` should
// not be shared with anybody.
my_struct.pack_for_sending().expect("Some Rc was still shared!")
});
let my_struct = handle.join().unwrap().unpack();
dbg!(&my_struct);
}
I did a demo on the Rust playground.
It works. My question is, is it actually safe?
I know that the Rc is owned only by a single onwer and nobody can change that under my hands, because it can't be accessed by other threads and we wrap it into Sendable which does not allow access to the contained value.
But in some crazy world Rc could for example internally use thread local storage and this would not be safe... So is there some guarantee that I can do this?
I know that I must be extremely careful to not introduce some additional reason for the MyStruct to not be Send.
No.
There are multiple points that need to be verified to be able to send Rc across threads:
There can be no other handle (Rc or Weak) sharing ownership.
The content of Rc must be Send.
The implementation of Rc must use a thread-safe strategy.
Let's review them in order!
Guaranteeing the absence of aliasing
While your algorithm -- checking the counts yourself -- works for now, it would be better to simply ask Rc whether it is aliased or not.
fn is_aliased<T>(t: &mut Rc<T>) -> bool { Rc::get_mut(t).is_some() }
The implementation of get_mut will be adjusted should the implementation of Rc change in ways you have not foreseen.
Sendable content
While your implementation of MyStruct currently puts String (which is Send) into Rc, it could tomorrow change to Rc<str>, and then all bets are off.
Therefore, the sendable check needs to be implemented at the Rc level itself, otherwise you need to audit any change to whatever Rc holds.
fn sendable<T: Send>(mut t: Rc<T>) -> Result<Rc<T>, ...> {
if !is_aliased(&mut t) {
Ok(t)
} else {
...
}
}
Thread-safe Rc internals
And that... cannot be guaranteed.
Since Rc is not Send, its implementation can be optimized in a variety of ways:
The entire memory could be allocated using a thread-local arena.
The counters could be allocated using a thread-local arena, separately, so as to seamlessly convert to/from Box.
...
This is not the case at the moment, AFAIK, however the API allows it, so the next release could definitely take advantage of this.
What should you do?
You could make pack_for_sending unsafe, and dutifully document all assumptions that are counted on -- I suggest using get_mut to remove one of them. Then, on each new release of Rust, you'd have to double-check each assumption to ensure that your usage if still safe.
Or, if you do not mind making an allocation, you could write a conversion to Arc<T> yourself (see Playground):
fn into_arc<T>(this: Rc<T>) -> Result<Arc<T>, Rc<T>> {
Rc::try_unwrap(this).map(|t| Arc::new(t))
}
Or, you could write a RFC proposing a Rc <-> Arc conversion!
The API would be:
fn Rc<T: Send>::into_arc(this: Self) -> Result<Arc<T>, Rc<T>>
fn Arc<T>::into_rc(this: Self) -> Result<Rc<T>, Arc<T>>
This could be made very efficiently inside std, and could be of use to others.
Then, you'd convert from MyStruct to MySendableStruct, just moving the fields and converting Rc to Arc as you go, send to another thread, then convert back to MyStruct.
And you would not need any unsafe...
The only difference between Arc and Rc is that Arc uses atomic counters. The counters are only accessed when the pointer is cloned or dropped, so the difference between the two is negligible in applications which just share pointers between long-lived threads.
If you have never cloned the Rc, it is safe to send between threads. However, if you can guarantee that the pointer is unique then you can make the same guarantee about a raw value, without using a smart pointer at all!
This all seems quite fragile, for little benefit; future changes to the code might not meet your assumptions, and you will end up with Undefined Behaviour. I suggest that you at least try making some benchmarks with Arc. Only consider approaches like this when you measure a performance problem.
You might also consider using the archery crate, which provides a reference-counted pointer that abstracts over atomicity.
There is a new Pin type in unstable Rust and the RFC is already merged. It is said to be kind of a game changer when it comes to passing references, but I am not sure how and when one should use it.
Can anyone explain it in layman's terms?
What is pinning ?
In programming, pinning X means instructing X not to move.
For example:
Pinning a thread to a CPU core, to ensure it always executes on the same CPU,
Pinning an object in memory, to prevent a Garbage Collector to move it (in C# for example).
What is the Pin type about?
The Pin type's purpose is to pin an object in memory.
It enables taking the address of an object and having a guarantee that this address will remain valid for as long as the instance of Pin is alive.
What are the usecases?
The primary usecase, for which it was developed, is supporting Generators.
The idea of generators is to write a simple function, with yield, and have the compiler automatically translate this function into a state machine. The state that the generator carries around is the "stack" variables that need to be preserved from one invocation to another.
The key difficulty of Generators that Pin is designed to fix is that Generators may end up storing a reference to one of their own data members (after all, you can create references to stack values) or a reference to an object ultimately owned by their own data members (for example, a &T obtained from a Box<T>).
This is a subcase of self-referential structs, which until now required custom libraries (and lots of unsafe). The problem of self-referential structs is that if the struct move, the reference it contains still points to the old memory.
Pin apparently solves this years-old issue of Rust. As a library type. It creates the extra guarantee that as long as Pin exist the pinned value cannot be moved.
The usage, therefore, is to first create the struct you need, return it/move it at will, and then when you are satisfied with its place in memory initialize the pinned references.
One of the possible uses of the Pin type is self-referencing objects; an article by ralfj provides an example of a SelfReferential struct which would be very complicated without it:
use std::ptr;
use std::pin::Pin;
use std::marker::PhantomPinned;
struct SelfReferential {
data: i32,
self_ref: *const i32,
_pin: PhantomPinned,
}
impl SelfReferential {
fn new() -> SelfReferential {
SelfReferential { data: 42, self_ref: ptr::null(), _pin: PhantomPinned }
}
fn init(self: Pin<&mut Self>) {
let this : &mut Self = unsafe { self.get_unchecked_mut() };
// Set up self_ref to point to this.data.
this.self_ref = &mut this.data as *const i32;
}
fn read_ref(self: Pin<&Self>) -> Option<i32> {
let this : &Self= self.get_ref();
// Dereference self_ref if it is non-NULL.
if this.self_ref == ptr::null() {
None
} else {
Some(unsafe { *this.self_ref })
}
}
}
fn main() {
let mut data: Pin<Box<SelfReferential>> = Box::new(SelfReferential::new()).into();
data.as_mut().init();
println!("{:?}", data.as_ref().read_ref()); // prints Some(42)
}
I have the following problem: I have a have a data structure that is parsed from a buffer and contains some references into this buffer, so the parsing function looks something like
fn parse_bar<'a>(buf: &'a [u8]) -> Bar<'a>
So far, so good. However, to avoid certain lifetime issues I'd like to put the data structure and the underlying buffer into a struct as follows:
struct BarWithBuf<'a> {bar: Bar<'a>, buf: Box<[u8]>}
// not even sure if these lifetime annotations here make sense,
// but it won't compile unless I add some lifetime to Bar
However, now I don't know how to actually construct a BarWithBuf value.
fn make_bar_with_buf<'a>(buf: Box<[u8]>) -> BarWithBuf<'a> {
let my_bar = parse_bar(&*buf);
BarWithBuf {buf: buf, bar: my_bar}
}
doesn't work, since buf is moved in the construction of the BarWithBuf value, but we borrowed it for parsing.
I feel like it should be possible to do something along the lines of
fn make_bar_with_buf<'a>(buf: Box<[u8]>) -> BarWithBuf<'a> {
let mut bwb = BarWithBuf {buf: buf};
bwb.bar = parse_bar(&*bwb.buf);
bwb
}
to avoid moving the buffer after parsing the Bar, but I can't do that because the whole BarWithBuf struct has to be initalised in one go.
Now I suspect that I could use unsafe code to partially construct the struct, but I'd rather not do that.
What would be the best way to solve this problem? Do I need unsafe code? If I do, would it be safe do to this here? Or am I completely on the wrong track here and there is a better way to tie a data structure and its underlying buffer together?
I think you're right in that it's not possible to do this without unsafe code. I would consider the following two options:
Change the reference in Bar to an index. The contents of the box won't be protected by a borrow, so the index might become invalid if you're not careful. However, an index might convey the meaning of the reference in a clearer way.
Move Box<[u8]> into Bar, and add a function buf() -> &[u8] to the implementation of Bar; instead of references, store indices in Bar. Now Bar is the owner of the buffer, so it can control its modification and keep the indices valid (thereby avoiding the problem of option #1).
As per DK's suggestion below, store indices in BarWithBuf (or in a helper struct BarInternal) and add a function fn bar(&self) -> Bar to the implementation of BarWithBuf, which constructs a Bar on-the-fly.
Which of these options is the most appropriate one depends on the actual problem context. I agree that some form of "member-by-member construction" of structs would be immensely helpful in Rust.
Here's an approach that will work through a little bit of unsafe code. This approach requires that you are okay with putting the referred-to thing (here, your [u8]) on the heap, so it won't work for direct reference of a sibling field.
Let's start with a toy Bar<'a> implementation:
struct Bar<'a> {
refs: Vec<&'a [u8]>,
}
impl<'a> Bar<'a> {
pub fn parse(src: &'a [u8]) -> Self {
// placeholder for actually parsing goes here
Self { refs: vec![src] }
}
}
We'll make BarWithBuf that uses a Bar<'static>, as 'static is the only lifetime with an an accessible name. The buffer we store things in can be anything that doesn't move the target data around on us. I'm going to go with a Vec<u8>, but Box, Pin, whatever will work fine.
struct BarWithBuf {
buf: Vec<u8>,
bar: Bar<'static>,
}
The implementation requires a tiny bit of unsafe code.
impl BarWithBuf {
pub fn new(buf: Vec<u8>) -> Self {
// The `&'static [u8]` is inferred, but writing it here for demo
let buf_slice: &'static [u8] = unsafe {
// Going through a pointer is a "good" way to get around lifetime checks
std::slice::from_raw_parts(&buf[0], buf.len())
};
let bar = Bar::parse(buf_slice);
Self { buf, bar }
}
/// Access to Bar should always come through this function.
pub fn bar(&self) -> &Bar {
&self.bar
}
}
The BarWithBuf::bar is an important function to re-associate the proper lifetimes to the references. Rust's lifetime elision rules make the function equivalent to pub fn bar<'a>(&'a self) -> &'a Bar<'a>, which turns out to be exactly what we want. The lifetime of the slices in BarWithBuf::bar::refs are tied to the lifetime of BarWithBuf.
WARNING: You have to be very careful with your implementation here. You cannot make #[derive(Clone)] for BarWithBuf, since the default clone implementation will clone buf, but the elements of bar.refs will still point to the original. It is only one line of unsafe code, but the safety is still off in the "safe" bits.
For larger bits of self-referencing structures, there's the ouroboros crate, which wraps up a lot of unsafe bits for you. The techniques are similar to the one I described above, but they live behind macros, which is a more pleasant experience if you find yourself making a number of self-references.