I am currently writing a board support package for an embedded board and would like to set up a serial output over USB. I am basing my design on the hifive BSP.
The process has three steps:
1. Set up the USB bus (UsbBusAllocator, which refers to a UsbPeripheral).
2. Initialize a SerialPort instance, which refers to the UsbBusAllocator.
3. Initialize a UsbDevice instance, which refers to the UsbBusAllocator.
To make my life simpler, I wrapped the SerialPort and UsbDevice in a SerialWrapper struct:
pub struct SerialWrapper<'a> {
    port: SerialPort<'a, Usbd<UsbPeripheral<'a>>>,
    dev: UsbDevice<'a, Usbd<UsbPeripheral<'a>>>,
}

impl<'a> SerialWrapper<'a> {
    pub fn new(bus: &'a UsbPort<'a>) -> Self {
        // create the wrapper ...
    }
}
I would like a way to make the structure created by SerialWrapper::new global.
I tried to use:
static mut STDOUT: Option<SerialWrapper<'a>> = None;
but I can't use this as lifetime 'a is not declared.
I thought about using MaybeUninit or PhantomData but both will still need to have SerialWrapper<'a> as type parameter and I will get the same issue.
My goal is to be able to use code similar to this:
struct A;

struct B<'a> {
    s: &'a A,
}

static mut STDOUT: Option<B> = None;

fn init_stdout() {
    let a = A {};
    unsafe {
        STDOUT = Some(B { s: &a });
    }
}

// access_stdout is the important function,
// all the rest can be changed without issue
fn access_stdout() -> Result<(), ()> {
    unsafe {
        if let Some(_stdout) = STDOUT {
            // do stuff if the system is ready
            Ok(())
        } else {
            // do stuff if the system is not ready
            Err(())
        }
    }
}

fn main() {
    init_stdout();
    let _ = access_stdout();
}
Do you have any suggestions on how to proceed?
I don't mind having a solution requiring unsafe code, as long as I can have safe functions to access my serial port.
Short answer: whenever you have a lifetime in the type of a static variable, that lifetime needs to be 'static.
Rust does not allow for dangling references, so if you have anything living for shorter than the 'static lifetime in a static variable, there's the possibility of dangling. I think your code will need a fair amount of refactoring to satisfy this requirement. I would recommend figuring out a way to store the data you need without references, since that will make your life much easier. If it is absolutely imperative that you store references, you'll need to figure out a way to leak the data to extend its lifetime to 'static.
I am not terribly familiar with embedded development, and I know that static mut has some use cases there; however, usage of that language feature is pretty universally frowned upon. static muts are wildly unsafe and even bypass some borrow checker mechanisms by allowing multiple mutable references at the same time. If you were to encode this properly in Rust's type system, you'd probably want to make it just static STDOUT: SyncWrapperForUnsafeCell<Option<T>>, and then provide a safe interface (that might involve locking) for your wrapper, with comments explaining why your current environment makes it safe. However, if you think that a static mut is the appropriate option there, I trust your judgement.
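To make the leaking idea concrete, here is a minimal sketch built on the question's own A/B example rather than the real SerialWrapper. It assumes an allocator is available for Box::leak, and it keeps the static mut access pattern from the question even though a Sync wrapper would be cleaner:

struct A;

struct B<'a> {
    s: &'a A,
}

// The lifetime in a static's type must be 'static.
static mut STDOUT: Option<B<'static>> = None;

fn init_stdout() {
    // Box::leak gives up the allocation forever in exchange for a
    // &'static reference, which is usually acceptable for a one-time
    // peripheral handle.
    let a: &'static A = Box::leak(Box::new(A));
    unsafe { STDOUT = Some(B { s: a }) };
}

fn access_stdout() -> Result<(), ()> {
    // SAFETY: assumes a single-threaded environment with no interrupt
    // handler touching STDOUT (newer editions also lint on references to
    // `static mut`, which is one more reason to prefer a Sync wrapper).
    unsafe {
        if STDOUT.is_some() { Ok(()) } else { Err(()) }
    }
}

For the real BSP, the same approach should apply: leak (or otherwise give a 'static home to) the bus allocator, so the global can hold a SerialWrapper<'static>.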
Related
I'm developing for a single-core embedded chip. In C & C++ it's common to statically define mutable values that can be used globally. The Rust equivalent is roughly this:
static mut MY_VALUE: usize = 0;

pub fn set_value(val: usize) {
    unsafe { MY_VALUE = val }
}

pub fn get_value() -> usize {
    unsafe { MY_VALUE }
}
Now code anywhere can call the free functions get_value and set_value.
I think that this should be entirely safe in single-threaded embedded Rust, but I've not been able to find a definitive answer. I'm only interested in types that don't require allocation or destruction (like the primitive in the example here).
The only gotcha I can see is with the compiler or processor reordering accesses in unexpected ways (which could be solved using the volatile access methods), but is that unsafe per se?
Edit:
The book suggests that this is safe so long as we can guarantee no multi-threaded data races (obviously the case here)
With mutable data that is globally accessible, it's difficult to ensure there are no data races, which is why Rust considers mutable static variables to be unsafe.
The docs are phrased less definitively, suggesting that data races are only one way this can be unsafe but not expanding on other examples
accessing mutable statics can cause undefined behavior in a number of ways, for example due to data races in a multithreaded context
The nomicon suggests that this should be safe so long as you don't somehow dereference a bad pointer.
Be aware that there is no such thing as single-threaded code as long as interrupts are enabled. So even on microcontrollers, mutable statics are unsafe.
If you really can guarantee single-threaded access, your assumption is correct that accessing primitive types should be safe. That's why the Cell type exists, which allows mutability of primitive types, with the caveat that it is not Sync (meaning it explicitly prevents threaded access).
That said, to be usable in a safe static variable, a type needs to implement Sync for exactly the reason mentioned above, which Cell doesn't do, for obvious reasons.
To actually have a mutable global variable with a primitive type without using an unsafe block, I personally would use an Atomic. Atomics do not allocate and are available in the core library, meaning they work on microcontrollers.
use core::sync::atomic::{AtomicUsize, Ordering};

static MY_VALUE: AtomicUsize = AtomicUsize::new(0);

pub fn set_value(val: usize) {
    MY_VALUE.store(val, Ordering::Relaxed)
}

pub fn get_value() -> usize {
    MY_VALUE.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", get_value());
    set_value(42);
    println!("{}", get_value());
}
Atomics with Relaxed are zero-overhead on almost all architectures.
In this case it's not unsound, but you still should avoid it because it is too easy to misuse it in a way that is UB.
Instead, use a wrapper around UnsafeCell that is Sync:
use core::cell::UnsafeCell;

pub struct SyncCell<T>(UnsafeCell<T>);

// SAFETY: callers must guarantee the cell is never accessed concurrently.
unsafe impl<T> Sync for SyncCell<T> {}

impl<T> SyncCell<T> {
    pub const fn new(v: T) -> Self { Self(UnsafeCell::new(v)) }
    pub unsafe fn set(&self, v: T) { *self.0.get() = v; }
}

impl<T: Copy> SyncCell<T> {
    pub unsafe fn get(&self) -> T { *self.0.get() }
}
If you use nightly, you can use SyncUnsafeCell.
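For completeness, a sketch of how this wrapper could back the original get_value/set_value pair; the safety comments spell out the single-threaded assumption that makes the unsafe calls acceptable:

static MY_VALUE: SyncCell<usize> = SyncCell::new(0);

pub fn set_value(val: usize) {
    // SAFETY: assumes a single core and no interrupt handler touching
    // MY_VALUE while this runs.
    unsafe { MY_VALUE.set(val) }
}

pub fn get_value() -> usize {
    // SAFETY: same assumption as in set_value.
    unsafe { MY_VALUE.get() }
}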
Mutable statics are unsafe in general because they circumvent the normal borrow checker rules, which enforce that either exactly one mutable borrow exists or any number of immutable borrows exist (including zero). This allows you to write code which causes undefined behavior. For instance, the following compiles and prints 2 2:
static mut COUNTER: i32 = 0;

fn main() {
    unsafe {
        let mut_ref1 = &mut COUNTER;
        let mut_ref2 = &mut COUNTER;
        *mut_ref1 += 1;
        *mut_ref2 += 1;
        println!("{mut_ref1} {mut_ref2}");
    }
}
However we have two mutable references to the same location in memory existing concurrently, which is UB.
I believe the code that you posted there is safe, but I generally would not recommend using static mut. Use an atomic, SyncUnsafeCell/UnsafeCell, a wrapper around a Cell that implements Sync (which is sound only because your environment is single-threaded), or honestly just about anything else. static mut is wildly unsafe and its use is highly discouraged.
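As a sketch of the "wrapper around a Cell" option (the type name here is made up), the unsafe impl is the only unsafe line and every call site stays safe; the Sync claim rests entirely on the single-threaded assumption stated in the comment:

use core::cell::Cell;

pub struct SingleCoreCell<T>(Cell<T>);

// SAFETY: this claim is only valid on a single-threaded target where no
// interrupt handler touches the value; otherwise it is unsound.
unsafe impl<T> Sync for SingleCoreCell<T> {}

impl<T: Copy> SingleCoreCell<T> {
    pub const fn new(v: T) -> Self {
        Self(Cell::new(v))
    }
    pub fn get(&self) -> T {
        self.0.get()
    }
    pub fn set(&self, v: T) {
        self.0.set(v)
    }
}

static MY_VALUE: SingleCoreCell<usize> = SingleCoreCell::new(0);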
In order to sidestep the issue of exactly how mutable statics can be used safely in single-threaded code, another option is to use thread-local storage:
use std::cell::Cell;

thread_local!(static MY_VALUE: Cell<usize> = {
    Cell::new(0)
});

pub fn set_value(val: usize) {
    MY_VALUE.with(|cell| cell.set(val))
}

pub fn get_value() -> usize {
    MY_VALUE.with(|cell| cell.get())
}
I'm quite new to Rust programming, and I'm trying to convert some code that I had in JS to Rust.
A simplified version of the concept is below:
fn main() {
    let mut ds = DataSource::new();
    let mut pp = Processor::new(&mut ds);
}

struct DataSource {
    st2r: Option<&Processor>,
}

struct Processor {
    st1r: &DataSource,
}

impl DataSource {
    pub fn new() -> Self {
        DataSource {
            st2r: None,
        }
    }
}

impl Processor {
    pub fn new(ds: &mut DataSource) -> Self {
        let pp = Processor {
            st1r: ds,
        };
        ds.st2r = Some(&pp);
        pp
    }
}
As you can see, I have two main modules in my system that are interconnected, and I need a reference to each inside the other.
Well, this code would complain about lifetimes and such stuff, of course. So I started throwing lifetime specifiers around like a madman and even after all that, it still complains that in "Processor::new" I can't return something that has been borrowed. Legit. But I can't find any solution around it! No matter how I try to handle the referencing of each other, it ends with this borrowing error.
So, can anyone point out a solution for this situation? Is my app's structure simply not valid in Rust, so I should do it another way? Or is there a trick to this that my inexperienced mind can't find?
Thanks.
What you're trying to do can't be expressed with references and lifetimes because:
The DataSource must live longer than the Processor so that pp.st1r is guaranteed to be valid,
and the Processor must live longer than the DataSource so that ds.st2r is guaranteed to be valid. You might think that since ds.st2r is an Option and since the None variant doesn't contain a reference this allows a DataSource with a None value in st2r to outlive any Processors, but unfortunately the compiler can't know at compile-time whether st2r contains Some value, and therefore must assume it does.
Your problem is compounded by the fact that you need a mutable reference to the DataSource so that you can set its st2r field, at a time when you also have an outstanding immutable reference to it inside the Processor, which Rust won't allow.
You can make your code work by switching to dynamic lifetime and mutability tracking using Rc (for dynamic lifetime tracking) and RefCell (for dynamic mutability tracking):
use std::cell::RefCell;
use std::rc::{Rc, Weak};

fn main() {
    let ds = Rc::new(RefCell::new(DataSource::new()));
    let pp = Processor::new(Rc::clone(&ds));
}

struct DataSource {
    st2r: Weak<Processor>,
}

struct Processor {
    st1r: Rc<RefCell<DataSource>>,
}

impl DataSource {
    pub fn new() -> Self {
        DataSource {
            st2r: Weak::new(),
        }
    }
}

impl Processor {
    pub fn new(ds: Rc<RefCell<DataSource>>) -> Rc<Self> {
        let pp = Rc::new(Processor {
            st1r: ds,
        });
        pp.st1r.borrow_mut().st2r = Rc::downgrade(&pp);
        pp
    }
}
Playground
Note that I've replaced your Option<&Processor> with a Weak<Processor>. It would be possible to use an Option<Rc<Processor>> but this would risk leaking memory if you dropped all references to DataSource without setting st2r to None first. The Weak<Processor> behaves more or less like an Option<Rc<Processor>> that is set to None automatically when all other references are dropped, ensuring that memory will be freed properly.
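As a small illustration (this accessor is hypothetical, not part of the question's API), DataSource can reach its Processor back through the Weak pointer like this:

impl DataSource {
    // upgrade() returns Some(Rc<Processor>) only while at least one
    // strong Rc<Processor> is still alive, and None afterwards.
    pub fn processor(&self) -> Option<Rc<Processor>> {
        self.st2r.upgrade()
    }
}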
I have a type (specifically CFData from core-foundation), whose memory is managed by C APIs and that I need to pass and receive from C functions as a *c_void. For simplicity, consider the following struct:
use std::ffi;

struct Data {
    ptr: *mut ffi::c_void,
}

impl Data {
    pub fn new() -> Self {
        // allocate memory via an unsafe C-API
        Self {
            ptr: std::ptr::null_mut(), // Just so this compiles.
        }
    }

    pub fn to_raw(&self) -> *const ffi::c_void {
        self.ptr
    }
}

impl Drop for Data {
    fn drop(&mut self) {
        unsafe {
            // Free memory via a C-API
        }
    }
}
Its interface is safe, including to_raw(), since it only returns a raw pointer. It doesn't dereference it. And the caller doesn't dereference it. It's just used in a callback.
pub extern "C" fn called_from_C_ok(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// Do things to compute Data.
// This is actually async code, which is why there's a completion handler.
on_complete(data.to_raw()); // data survives through the function call
}
This is fine. Data is safe to manipulate, and (I strongly believe) Rust promises that data will live until the end of the on_complete function call.
On the other hand, this is not ok:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
In my code, I made this mistake and it started crashing. It's easy to see why and it's easy to fix. But it's subtle, and it's easy for a future developer (me) to modify the ok version into the broken version without realizing the problem (and it may not always crash).
What I'd like to do is to ensure data lives as long as ptr. In Swift, I'd do this with:
withExtendedLifetime(&data) { data in
// ...data cannot be dropped until this closure ends...
}
Is there a similar construct in Rust that explicitly marks the minimum lifetime for a variable to a scope (that the optimizer may not reorder), even if it's not directly accessed? (I'm sure it's trivial to build a custom with_extended_lifetime in Rust, but I'm looking for a more standard solution so that it will be obvious to other developers what's going on).
Playground
I do believe the following "works" but I'm not sure how flexible it is, or if it's just replacing a more standard solution:
fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
    F: Fn(&T) -> U,
{
    f(value)
}

with_extended_lifetime(&data, |data| {
    let ptr = data.to_raw();
    on_complete(ptr)
});
The optimizer is not allowed to change when a value is dropped. If you assign a value to a variable (and that value is not then moved elsewhere or overwritten by assignment), it will always be dropped at the end of the block, not earlier.
You say that this code is incorrect:
pub extern "C" fn called_from_C_broken(on_complete: extern "C" fn(*const ffi::c_void)) {
let data = Data::new();
// ...
let ptr = data.to_raw(); // data can be dropped at this point, so ptr is dangling.
on_complete(ptr); // This may crash when the receiver dereferences it.
}
but in fact data may not be dropped at that point, and this code is sound. What you may be confusing this with is the mistake of not assigning the value to a variable:
let ptr = Data::new().to_raw();
on_complete(ptr);
In this case, the pointer is dangling, because the result of Data::new() is stored in a temporary within the statement (dropped at the end of the statement) rather than in a local variable (dropped at the end of the block).
If you want to adopt a programming style which makes explicit when values are dropped, the usual pattern is to use the standard drop() function to mark the exact time of drop:
let data = Data::new();
...
on_complete(data.to_raw());
drop(data); // We have stopped using the raw pointer now
(Note that drop() is not magic: it is simply a function which takes one argument and does nothing with it other than dropping. Its presence in the standard library is to give a name to this pattern, not to provide special functionality.)
However, there isn't anything wrong with using your with_extended_lifetime (other than nonstandard style and arguably a misleading name) if you want the code structure to indicate the scope of the value even more strongly. One nitpick: the function parameter should be FnOnce, not Fn, for maximum generality (this allows it to be passed functions that can't be called more than once).
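For reference, here is the helper from the question with that nitpick applied:

fn with_extended_lifetime<T, U, F>(value: &T, f: F) -> U
where
    F: FnOnce(&T) -> U, // FnOnce is the most general bound here
{
    f(value)
}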
Other than explicitly dropping as the other answer mentions, there is another way to help prevent these types of accidental drops: use a wrapper around a raw pointer that has lifetime information.
use std::marker::PhantomData;

#[repr(transparent)]
struct PtrWithLifetime<'a> {
    ptr: *mut ffi::c_void,
    _data: PhantomData<&'a ffi::c_void>,
}

impl Data {
    fn to_raw(&self) -> PtrWithLifetime<'_> {
        PtrWithLifetime {
            ptr: self.ptr,
            _data: PhantomData,
        }
    }
}
The #[repr(transparent)] guarantees that PtrWithLifetime is stored in memory the same way as *const ffi::c_void, so you can adjust the declaration of on_complete to
fn called_from_c(on_complete: extern "C" fn(PtrWithLifetime<'_>)) {
    // stuff
}
without causing any major inconvenience to any downstream users, especially since the ffi bindings can be adjusted in a similar fashion.
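A short sketch of the payoff, adapted from the question's called_from_C_ok: because to_raw now borrows self, the borrow checker itself rejects the ordering that was broken before:

pub extern "C" fn called_from_c_ok(on_complete: extern "C" fn(PtrWithLifetime<'_>)) {
    let data = Data::new();
    let ptr = data.to_raw(); // `ptr` borrows `data`
    // drop(data);           // would not compile: `data` is still borrowed by `ptr`
    on_complete(ptr);        // `data` is guaranteed to still be alive here
}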
I have a struct that is not Send because it contains Rc. Let's say that Arc has too much overhead, so I want to keep using Rc. I would still like to occasionally Send this struct between threads, but only when I can verify that the Rc has strong_count 1 and weak_count 0.
Here is the (hopefully safe) abstraction that I have in mind:
mod my_struct {
    use std::rc::Rc;

    #[derive(Debug)]
    pub struct MyStruct {
        reference_counted: Rc<String>,
        // more fields...
    }

    impl MyStruct {
        pub fn new() -> Self {
            MyStruct {
                reference_counted: Rc::new("test".to_string())
            }
        }

        pub fn pack_for_sending(self) -> Result<Sendable, Self> {
            if Rc::strong_count(&self.reference_counted) == 1 &&
                Rc::weak_count(&self.reference_counted) == 0
            {
                Ok(Sendable(self))
            } else {
                Err(self)
            }
        }

        // There are more methods, some may clone the `Rc`!
    }

    /// `Send`able wrapper for `MyStruct` that does not allow you to access it,
    /// only unpack it.
    pub struct Sendable(MyStruct);

    // Safety: `MyStruct` is not `Send` because of `Rc`. `Sendable` can be
    // only created when the `Rc` has strong count 1 and weak count 0.
    unsafe impl Send for Sendable {}

    impl Sendable {
        /// Retrieve the inner `MyStruct`, making it not-sendable again.
        pub fn unpack(self) -> MyStruct {
            self.0
        }
    }
}
use crate::my_struct::MyStruct;

fn main() {
    let handle = std::thread::spawn(|| {
        let my_struct = MyStruct::new();
        dbg!(&my_struct);
        // Do something with `my_struct`, but at the end the inner `Rc` should
        // not be shared with anybody.
        my_struct.pack_for_sending().expect("Some Rc was still shared!")
    });

    let my_struct = handle.join().unwrap().unpack();
    dbg!(&my_struct);
}
I did a demo on the Rust playground.
It works. My question is, is it actually safe?
I know that the Rc is owned by only a single owner and nobody can change that under my hands, because it can't be accessed by other threads and we wrap it into Sendable, which does not allow access to the contained value.
But in some crazy world Rc could, for example, internally use thread-local storage and this would not be safe... So is there some guarantee that I can do this?
I know that I must be extremely careful to not introduce some additional reason for the MyStruct to not be Send.
No.
There are multiple points that need to be verified to be able to send Rc across threads:
There can be no other handle (Rc or Weak) sharing ownership.
The content of Rc must be Send.
The implementation of Rc must use a thread-safe strategy.
Let's review them in order!
Guaranteeing the absence of aliasing
While your algorithm -- checking the counts yourself -- works for now, it would be better to simply ask Rc whether it is aliased or not.
fn is_aliased<T>(t: &mut Rc<T>) -> bool { Rc::get_mut(t).is_some() }
The implementation of get_mut will be adjusted should the implementation of Rc change in ways you have not foreseen.
Sendable content
While your implementation of MyStruct currently puts String (which is Send) into Rc, it could tomorrow change to Rc<Rc<str>>, and then all bets are off.
Therefore, the sendable check needs to be implemented at the Rc level itself, otherwise you need to audit any change to whatever Rc holds.
fn sendable<T: Send>(mut t: Rc<T>) -> Result<Rc<T>, ...> {
    if !is_aliased(&mut t) {
        Ok(t)
    } else {
        ...
    }
}
Thread-safe Rc internals
And that... cannot be guaranteed.
Since Rc is not Send, its implementation can be optimized in a variety of ways:
The entire memory could be allocated using a thread-local arena.
The counters could be allocated using a thread-local arena, separately, so as to seamlessly convert to/from Box.
...
This is not the case at the moment, AFAIK, however the API allows it, so the next release could definitely take advantage of this.
What should you do?
You could make pack_for_sending unsafe, and dutifully document all the assumptions it relies on -- I suggest using get_mut to remove one of them. Then, on each new release of Rust, you'd have to double-check each assumption to ensure that your usage is still safe.
Or, if you do not mind making an allocation, you could write a conversion to Arc<T> yourself (see Playground):
fn into_arc<T>(this: Rc<T>) -> Result<Arc<T>, Rc<T>> {
    Rc::try_unwrap(this).map(|t| Arc::new(t))
}
Or, you could write an RFC proposing an Rc <-> Arc conversion!
The API would be:
fn Rc<T: Send>::into_arc(this: Self) -> Result<Arc<T>, Rc<T>>
fn Arc<T>::into_rc(this: Self) -> Result<Rc<T>, Arc<T>>
This could be made very efficiently inside std, and could be of use to others.
Then, you'd convert from MyStruct to MySendableStruct, just moving the fields and converting Rc to Arc as you go, send to another thread, then convert back to MyStruct.
And you would not need any unsafe...
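A hedged sketch of that refactor (MySendableStruct is the name the answer suggests; the methods are invented here, the elided extra fields are ignored, and the code is assumed to sit inside the my_struct module so the private field is reachable):

use std::sync::Arc; // Rc is already imported in the my_struct module

pub struct MySendableStruct {
    reference_counted: Arc<String>,
    // any other fields would move across unchanged
}

impl MyStruct {
    pub fn into_sendable(self) -> Result<MySendableStruct, MyStruct> {
        match Rc::try_unwrap(self.reference_counted) {
            // Not aliased: move the String into an Arc; the result is Send
            // without any unsafe impl.
            Ok(s) => Ok(MySendableStruct { reference_counted: Arc::new(s) }),
            // Still aliased: hand the struct back, as pack_for_sending did.
            Err(rc) => Err(MyStruct { reference_counted: rc }),
        }
    }
}

impl MySendableStruct {
    pub fn into_my_struct(self) -> MyStruct {
        // This Arc is never cloned while wrapped, so try_unwrap succeeds.
        let s = Arc::try_unwrap(self.reference_counted)
            .expect("MySendableStruct never clones its Arc");
        MyStruct { reference_counted: Rc::new(s) }
    }
}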
The only difference between Arc and Rc is that Arc uses atomic counters. The counters are only accessed when the pointer is cloned or dropped, so the difference between the two is negligible in applications which just share pointers between long-lived threads.
If you have never cloned the Rc, it is safe to send between threads. However, if you can guarantee that the pointer is unique then you can make the same guarantee about a raw value, without using a smart pointer at all!
This all seems quite fragile, for little benefit; future changes to the code might not meet your assumptions, and you will end up with Undefined Behaviour. I suggest that you at least try making some benchmarks with Arc. Only consider approaches like this when you measure a performance problem.
You might also consider using the archery crate, which provides a reference-counted pointer that abstracts over atomicity.
There is a new Pin type in unstable Rust and the RFC is already merged. It is said to be kind of a game changer when it comes to passing references, but I am not sure how and when one should use it.
Can anyone explain it in layman's terms?
What is pinning?
In programming, pinning X means instructing X not to move.
For example:
Pinning a thread to a CPU core, to ensure it always executes on the same CPU,
Pinning an object in memory, to prevent a Garbage Collector from moving it (in C#, for example).
What is the Pin type about?
The Pin type's purpose is to pin an object in memory.
It enables taking the address of an object and having a guarantee that this address will remain valid for as long as the instance of Pin is alive.
What are the use cases?
The primary use case, for which it was developed, is supporting Generators.
The idea of generators is to write a simple function, with yield, and have the compiler automatically translate this function into a state machine. The state that the generator carries around is the "stack" variables that need to be preserved from one invocation to another.
The key difficulty of Generators that Pin is designed to fix is that Generators may end up storing a reference to one of their own data members (after all, you can create references to stack values) or a reference to an object ultimately owned by their own data members (for example, a &T obtained from a Box<T>).
This is a subcase of self-referential structs, which until now required custom libraries (and lots of unsafe). The problem with self-referential structs is that if the struct moves, the reference it contains still points to the old memory.
Pin apparently solves this years-old issue of Rust, as a library type. It creates the extra guarantee that as long as the Pin exists, the pinned value cannot be moved.
The usage, therefore, is to first create the struct you need, return it/move it at will, and then, when you are satisfied with its place in memory, initialize the pinned references.
One of the possible uses of the Pin type is self-referencing objects; an article by ralfj provides an example of a SelfReferential struct which would be very complicated without it:
use std::ptr;
use std::pin::Pin;
use std::marker::PhantomPinned;

struct SelfReferential {
    data: i32,
    self_ref: *const i32,
    _pin: PhantomPinned,
}

impl SelfReferential {
    fn new() -> SelfReferential {
        SelfReferential { data: 42, self_ref: ptr::null(), _pin: PhantomPinned }
    }

    fn init(self: Pin<&mut Self>) {
        let this: &mut Self = unsafe { self.get_unchecked_mut() };
        // Set up self_ref to point to this.data.
        this.self_ref = &mut this.data as *const i32;
    }

    fn read_ref(self: Pin<&Self>) -> Option<i32> {
        let this: &Self = self.get_ref();
        // Dereference self_ref if it is non-NULL.
        if this.self_ref == ptr::null() {
            None
        } else {
            Some(unsafe { *this.self_ref })
        }
    }
}

fn main() {
    let mut data: Pin<Box<SelfReferential>> = Box::new(SelfReferential::new()).into();
    data.as_mut().init();
    println!("{:?}", data.as_ref().read_ref()); // prints Some(42)
}