Why is an immutable pointer to a static immutable variable not Sync? - multithreading

A static global C string (as in this answer) doesn't have the Sync trait.
pub static MY_STRING: *const u8 = "hello\0".as_ptr();
// Rejected by the compiler: a `static` must be `Sync`, and `*const u8` is not
// (error[E0277]: `*const u8` cannot be shared between threads safely).
Sync is described as
The precise definition is: a type T is Sync if &T is thread-safe. In other words, there is no possibility of data races when passing &T references between threads.
It seems like this is entirely readonly and has static lifetime, so why isn't it safe to pass a reference?

The chapter Send and Sync in The Rustonomicon describes what it means for a type to be Send or Sync. It mentions that:
raw pointers are neither Send nor Sync (because they have no safety guards).
But that just begs the question; why doesn't *const T implement Sync? Why do the safety guards matter?
Just before that, it says:
Send and Sync are also automatically derived traits. This means that, unlike every other trait, if a type is composed entirely of Send or Sync types, then it is Send or Sync. Almost all primitives are Send and Sync, and as a consequence pretty much all types you'll ever interact with are Send and Sync.
This is the key reason why raw pointers are neither Send nor Sync. If you defined a struct that encapsulates a raw pointer, but only expose it as a &T or &mut T in the struct's API, did you really make sure that your struct respects the contracts of Send and Sync? If raw pointers were Send, then Rc<T> would also be Send by default, so it would have to explicitly opt-out. (In the source, there is in fact an explicit opt-out for Rc<T>, but it's only for documentation purposes, because it's actually redundant.)
[...] they're unsafe traits. This means that they are unsafe to implement, and other unsafe code can assume that they are correctly implemented.
OK, let's recap: they're unsafe to implement, but they're automatically derived. Isn't that a weird combination? Actually, it's not as bad as it sounds. Most primitive types, like u32, are Send and Sync. Simply combining primitive values into a struct or enum is not enough to disqualify the type from being Send or Sync. Therefore, you need a struct or enum with a non-Send or non-Sync field before you need to write an unsafe impl.
Send and Sync are marker traits, which means they have no methods. Therefore, when a function or type puts a Send or Sync bound on a type parameter, it's relying on the type to respect a particular contract across all of its API. Because of this:
Incorrectly implementing Send or Sync can cause Undefined Behavior.
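As a rough sketch of what opting in looks like for the C string from the question: wrap the raw pointer in a newtype and write the unsafe impl yourself. The names StaticCStr, as_c_str and len_from_other_thread are hypothetical, chosen only for this illustration, and the safety argument in the comment is a promise made by the author of the wrapper, not something the compiler checks.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical newtype around a raw pointer to a NUL-terminated string.
#[repr(transparent)]
pub struct StaticCStr(*const c_char);

// SAFETY (our promise, not the compiler's): the pointer always refers to
// immutable, NUL-terminated data that lives for the whole program, and the
// wrapper exposes no way to write through it.
unsafe impl Sync for StaticCStr {}

/// Accepted as a `static` only because of the `unsafe impl Sync` above.
pub static MY_STRING: StaticCStr = StaticCStr(b"hello\0".as_ptr() as *const c_char);

impl StaticCStr {
    /// Raw pointer suitable for handing to C code.
    pub fn as_ptr(&self) -> *const c_char {
        self.0
    }

    /// Borrows the data as a `CStr` (the trailing NUL is not part of its bytes).
    pub fn as_c_str(&self) -> &CStr {
        // SAFETY: relies on the invariant documented on the `Sync` impl.
        unsafe { CStr::from_ptr(self.0) }
    }
}

/// Reads the static from another thread, which needs `StaticCStr: Sync`.
pub fn len_from_other_thread() -> usize {
    std::thread::spawn(|| MY_STRING.as_c_str().to_bytes().len())
        .join()
        .expect("reader thread panicked")
}
```

Removing the unsafe impl line should bring back the original compile error, which is the point: the raw pointer forces the author to state the thread-safety argument explicitly.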

Related

Is there a difference between Pin<Box<T>> and Box<Pin<T>>?

In Rust, are there any functional differences between Pin<Box<T>> and Box<Pin<T>>? I think that they should behave the same, but I'm not sure.
Pin<Box<T>> is what you want. Box<Pin<T>> will not work at all.
Pin requires its type parameter to be a pointer of some kind. It then prevents you from moving out of this pointer (if the pointee isn't Unpin) by requiring unsafe to access it mutably. In Pin<Box<T>>, Box<T> is the pointer. It is common because you can create it safely (as opposed to Pin<&mut T>, which without macros can only be created unsafely): you give ownership of the Box to the Pin, and thus you cannot access the inner T except through the Pin.
Box<Pin<T>>, on the other hand, is useless: it is impossible to create if T does not implement Deref (Pin's constructors require that, because they are meant to be used with pointers), and even if T does, the Box is redundant: you already have a pointer, so there is no need to wrap it in a Box. In addition, you cannot create an instance of Box<Pin<T>> without unsafe code if <T as Deref>::Target does not implement Unpin, and there is little benefit in Pin with Unpin types (it can be passed to APIs that require it, such as Future::poll(), but in that case you don't need the Box).
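A minimal sketch of the Pin<Box<T>> side of this, assuming a made-up type Pinned that opts out of Unpin via PhantomPinned (the type and the helper functions are illustrative, not from the question). It shows that construction and shared access are safe, while mutable access to a !Unpin pointee goes through unsafe.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical type that opts out of `Unpin`, as a self-referential type would.
pub struct Pinned {
    pub value: u32,
    _pin: PhantomPinned,
}

/// `Box::pin` takes ownership of the value, so no `unsafe` is needed here.
pub fn make_pinned(value: u32) -> Pin<Box<Pinned>> {
    Box::pin(Pinned { value, _pin: PhantomPinned })
}

/// Shared access works through `Deref` on the `Pin`.
pub fn read(p: &Pin<Box<Pinned>>) -> u32 {
    p.value
}

/// Mutable access to a `!Unpin` pointee requires `unsafe`.
pub fn set(p: &mut Pin<Box<Pinned>>, value: u32) {
    // SAFETY: we only overwrite a field in place and never move the pointee.
    unsafe { p.as_mut().get_unchecked_mut().value = value };
}
```

If Pinned were Unpin, Pin would also implement DerefMut and the unsafe block in set would be unnecessary.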

Is allowing library users to embed arbitrary data in your structures a correct usage of std::mem::transmute?

A library I'm working on stores various data structures in a graph-like manner.
I'd like to let users store metadata ("annotations") in nodes, so they can retrieve them later. Currently, they have to create their own data structure which mirrors the library's, which is very inconvenient.
I'm placing very few constraints on what an annotation can be, because I do not know what the users will want to store in the future.
The rest of this question is about my current attempt at solving this use case, but I'm open to completely different implementations as well.
User annotations are represented with a trait:
pub trait Annotation {
    fn some_important_method(&self);
}
This trait contains a few methods (all on &self) which are important for the domain, but these are always trivial to implement for users. The real data of an annotation implementation cannot be retrieved this way.
I can store a list of annotations this way:
pub struct Node {
// ...
annotations: Vec<Box<dyn Annotation>>,
}
I'd like to let the user retrieve whatever implementation they previously added to a list, something like this:
impl Node {
fn annotations_with_type<T>(&self) -> Vec<&T>
where
T: Annotation,
{
// ??
}
}
I originally aimed to convert dyn Annotation to dyn Any, then use downcast_ref; however, trait upcasting coercion is unstable.
Another solution would be to require each Annotation implementation to store its TypeId, compare it with annotations_with_type's type parameter's TypeId, and std::mem::transmute the resulting &dyn Annotation to &T… but the documentation of transmute is quite scary and I honestly don't know whether that's one of the allowed cases in which it is safe. I definitely would have done some kind of void * in C.
Of course it's also possible that there's a third (safe) way to go through this. I'm open to suggestions.
What you are describing is commonly solved by TypeMaps, allowing a type to be associated with some data.
If you are open to using a library, you might consider looking into using an existing implementation, such as https://crates.io/crates/typemap_rev, to store data. For example:
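use typemap_rev::{TypeMap, TypeMapKey};
struct MyAnnotation;
impl TypeMapKey for MyAnnotation {
    type Value = String;
}
let mut map = TypeMap::new();
// `Value` is `String`, so the string literal has to be converted first.
map.insert::<MyAnnotation>(String::from("Some Annotation"));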
If you are curious, it uses a HashMap<TypeId, Box<(dyn Any + Send + Sync)>> under the hood to store the data. To retrieve data, it calls downcast_ref on the Any type, which is stable. This could also be a pattern to implement it yourself if needed.
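One way to apply the same downcast_ref idea directly to the Node from the question, without trait upcasting, is to add an explicit conversion method to the trait. This is only a sketch: the as_any method, the annotate helper and the Label and Weight types are assumptions introduced for illustration and are not part of the original API.

```rust
use std::any::Any;

pub trait Annotation: Any {
    fn some_important_method(&self);
    /// Assumed extra method: every implementor simply returns `self`.
    fn as_any(&self) -> &dyn Any;
}

#[derive(Default)]
pub struct Node {
    annotations: Vec<Box<dyn Annotation>>,
}

impl Node {
    /// Hypothetical helper for adding an annotation of any concrete type.
    pub fn annotate(&mut self, annotation: impl Annotation) {
        self.annotations.push(Box::new(annotation));
    }

    /// Returns every stored annotation whose concrete type is `T`.
    pub fn annotations_with_type<T: Annotation>(&self) -> Vec<&T> {
        self.annotations
            .iter()
            .filter_map(|a| a.as_any().downcast_ref::<T>())
            .collect()
    }
}

/// Hypothetical user-defined annotations.
pub struct Label(pub String);
pub struct Weight(pub u32);

impl Annotation for Label {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Annotation for Weight {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any {
        self
    }
}
```

The cost is one boilerplate method per implementor; in exchange the lookup is entirely safe code, because downcast_ref performs the TypeId comparison itself.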
You don't have to worry whether this is valid - because it doesn't compile (playground):
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:7:18
|
7 | _ = unsafe { std::mem::transmute::<&dyn Annotation, &i32>(&*v) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: source type: `&dyn Annotation` (128 bits)
= note: target type: `&i32` (64 bits)
The error message should be clear, I hope: &dyn Trait is a fat pointer, and has size 2*size_of::<usize>(). &T, on the other hand, is a thin pointer (as long as T: Sized), of size of only one usize, and you cannot transmute between types of different sizes.
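The size difference can be observed directly. The sketch below uses an empty stand-in trait named Annotation and a hypothetical helper reference_sizes; the 2 * usize figure reflects how current compilers lay out trait-object references, not a documented layout guarantee.

```rust
use std::mem::size_of;

/// Empty stand-in for the trait from the question.
pub trait Annotation {}

/// Returns (size of `&i32`, size of `&dyn Annotation`) in bytes.
pub fn reference_sizes() -> (usize, usize) {
    (size_of::<&i32>(), size_of::<&dyn Annotation>())
}
```

On a 64-bit target this corresponds to the 64 bits and 128 bits reported in the error message.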
You can work around that with transmute_copy(), but it will just make things worse: it will work, but it is unsound and is not guaranteed to work in any way. It may become UB in future Rust versions. This is because the only guaranteed thing (as of now) for &dyn Trait references is:
Pointers to unsized types are sized. The size and alignment is guaranteed to be at least equal to the size and alignment of a pointer.
Nothing guarantees the order of the fields. It can be (data_ptr, vtable_ptr) (as it is now, and thus transmute_copy() works) or (vtable_ptr, data_ptr). Nothing is even guaranteed about the contents. It might not contain a data pointer at all (though I doubt anybody will ever do something like that). transmute_copy() copies the data from the beginning, meaning that for the code to work the data pointer should be there and should be first (which it is). For the code to be sound, this needs to be guaranteed (which it is not).
So what can we do? Let's check how Any does its magic:
// SAFETY: caller guarantees that T is the correct type
unsafe { &*(self as *const dyn Any as *const T) }
So it uses as for the conversion. Does it work? Certainly. And that means std can do that, because std can do things that are not guaranteed and rely on how things work in practice. But we shouldn't. So, is it guaranteed?
I don't have a firm answer, but I'm pretty sure the answer is no. I have found no authoritative source that guarantees the behavior of casts from unsized to sized pointers.
Edit: @CAD97 pointed out on Zulip that the reference promises that *[const|mut] T as *[const|mut] V where V: Sized will be a pointer-to-pointer cast, and that can be read as a guarantee this will work.
But I still feel fine with relying on that. Because, unlike the transmute_copy(), people are doing it. In production. And there is no better way in stable. So the chance it will become undefined behavior is very low. It is much more likely to be defined.
Does a guaranteed way even exist? Well, yes and no. Yes, but only using the unstable pointer metadata API:
#![feature(ptr_metadata)]
let v: &dyn Annotation;
let v = v as *const dyn Annotation;
let v: *const T = v.to_raw_parts().0.cast::<T>();
let v: &T = unsafe { &*v };
In conclusion, if you can use nightly features, I would prefer the pointer metadata API just to be extra safe. But in case you can't, I think the cast approach is fine.
Last point, there may be a crate that already does that. Prefer that, if it exists.

Does implementing the Sync trait change the compiler output?

If I mark my struct as Sync will the compiler output differ? Will the compiler implement some Mutex-like magic?
struct MyStruct {
data: RefCell<u32>,
}
unsafe impl Sync for MyStruct {}
unsafe impl Send for MyStruct {}
The compiler uses a mechanism named "language items" to reference items (types, traits, etc.) that are defined in a library (usually core) but are used by the compiler, whether that be in code generated by the compiler, for validating the code or for producing specialized error messages.
Send and Sync are defined in the core library. Sync is a language item, but Send isn't. The only reference to Sync I could find in the compiler is where it checks that the type of a static variable implements Sync. (Send and Sync used to be more special to the compiler. Before auto traits were added to the language, they were implemented as "auto traits" explicitly.)
Other than that, the compiler doesn't care about what Send and Sync mean. It's the libraries (specifically, types/functions that are generic over Send/Sync types) that give the traits their meaning.
Neither trait influences what code is emitted by the compiler regarding a particular type. Making a type "thread-safe" is not something that can be done automatically. Consider a struct with many fields: even if the fields are all atomic types, a partially updated struct might not be in a valid state. The compiler doesn't know about the invariants of a particular type; only the programmer knows them. Therefore, it's the programmer's responsibility to make the type thread-safe.
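To make "the programmer's responsibility" concrete, here is a sketch in which the RefCell from the question is swapped for a Mutex. With that change the struct is Sync automatically, with no unsafe impl, and the locking is code the programmer wrote rather than something the compiler inserted. The function bump_in_parallel is a hypothetical name for this illustration.

```rust
use std::sync::Mutex;
use std::thread;

/// Same shape as the struct in the question, but with `Mutex` instead of
/// `RefCell`, so `Sync` is derived automatically.
pub struct MyStruct {
    pub data: Mutex<u32>,
}

/// Increments the counter once from each of `threads` scoped threads and
/// returns the final value.
pub fn bump_in_parallel(threads: u32) -> u32 {
    let shared = MyStruct { data: Mutex::new(0) };
    thread::scope(|s| {
        for _ in 0..threads {
            // Borrowing `shared` across threads compiles because `MyStruct: Sync`.
            s.spawn(|| {
                *shared.data.lock().unwrap() += 1;
            });
        }
    });
    // All scoped threads have finished, so the mutex can be consumed.
    shared.data.into_inner().unwrap()
}
```

With the original RefCell field and the two unsafe impl lines, the same function would still compile, but concurrent borrow_mut calls would be a data race, which is undefined behavior.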

Why does AtomicUsize not implement Send?

std::sync::atomic::AtomicUsize implements Sync, which means immutable references are free of data races when shared between multiple threads. Why does AtomicUsize not implement Send? Is there state which is linked to the thread that created the atomic, or is this a language design decision relating to the way atomics are intended to be used, i.e. via an Arc<_>, etc.?
It's a trick! AtomicUsize does implement Send:
use std::sync::atomic::AtomicUsize;
fn checker<T>(_: T) where T: Send {}
fn main() {
checker(AtomicUsize::default());
}
In fact, there's even an automated test that ensures this is the case.
Rust 1.26
These auto traits are now documented, thanks to a change made to rustdoc.
Previous versions
The gotcha lies in how Send is implemented:
This trait is automatically derived when the compiler determines it's appropriate.
This means that Rustdoc doesn't know that Send is implemented for a type because most types don't implement it explicitly.
This explains why AtomicPtr<T> shows up in the implementers list: it has a special implementation that ignores the type of T.

When is it appropriate to mark a trait as unsafe, as opposed to marking all the functions in the trait as unsafe?

Saying the same thing in code, when would I pick either of the following examples?
unsafe trait MyCoolTrait {
fn method(&self) -> u8;
}
trait MyCoolTrait {
unsafe fn method(&self) -> u8;
}
The opt-in builtin traits (OIBIT) RFC states:
An unsafe trait is a trait that is unsafe to implement, because it represents some kind of trusted assertion. Note that unsafe traits are perfectly safe to use. Send and Share (note: now called Sync) are examples of unsafe traits: implementing these traits is effectively an assertion that your type is safe for threading.
There's another example of an unsafe trait in the standard library, Searcher. It says:
The trait is marked unsafe because the indices returned by the next() methods are required to lie on valid utf8 boundaries in the haystack. This enables consumers of this trait to slice the haystack without additional runtime checks.
Unfortunately, neither of these paragraphs really helps my understanding of when it is correct to mark the entire trait unsafe instead of some or all of the methods.
I've asked about marking a function as unsafe before, but this seems different.
A function is marked unsafe to indicate that it is possible to violate memory safety by calling it. A trait is marked unsafe to indicate that it is possible to violate memory safety by implementing it at all. This is commonly because the trait has invariants that other unsafe code relies on being upheld, and that these invariants cannot be expressed any other way.
In the case of Searcher, the methods themselves should be safe to call. That is, users should not have to worry about whether or not they're using a Searcher correctly; the interface contract says all calls are safe. There's nothing you can do that will cause the methods to violate memory safety.
However, unsafe code will be calling the methods of a Searcher, and such unsafe code will be relying on a given Searcher implementation to return offsets that are on valid UTF-8 code point boundaries. If this assumption is violated, then the unsafe code could end up causing a memory safety violation itself.
To put it another way: the correctness of unsafe code using Searchers depends on every single Searcher implementation also being correct. Or: implementing this trait incorrectly allows safe code to induce a memory safety violation in unrelated unsafe code.
So why not just mark the methods unsafe? Because they aren't unsafe at all! They don't do anything that could violate memory safety in and of themselves. next_match just scans for and returns an Option<(usize, usize)>. The danger only exists when unsafe code assumes that these usizes are valid indices into the string being searched.
So why not just check the result? Because that'd be slower. The searching code wants to be fast, which means it wants to avoid redundant checks. But those checks can't be expressed in the Searcher interface... so instead, the whole trait is flagged as unsafe to warn anyone implementing it that there are extra conditions not stated or enforced in the code that must be respected.
There's also Send and Sync: implementing those when you shouldn't violates the expectations of (among other things) code that has to deal with threads. The code that lets you create threads is safe, but only so long as Send and Sync are only implemented on types for which they're appropriate.
The rule of thumb is:
Use unsafe fn method() if callers of the method need to wrap the call in an unsafe block.
Use unsafe trait MyTrait if implementors of the trait need to write unsafe impl MyTrait.
unsafe is a signal to the Rust user: this code must be written carefully.
The key point is that unsafe comes in pairs: when an author declares a trait or function as unsafe, the implementor or caller must implement or call it with unsafe.
When a function is marked unsafe, the caller needs to use it carefully. The function's author is making assumptions that the caller must uphold.
When a trait is marked unsafe, the implementor needs to implement it carefully. The trait requires the implementor to uphold certain assumptions. But users of the unsafe trait can call the methods defined in the trait without any such concern.
As a concrete example, unsafe trait Searcher requires every Searcher implementation to return valid UTF-8 boundaries from next. Every implementation is written as unsafe impl Searcher, indicating that the implementor has taken on that obligation. But a user of Searcher can call searcher.next() without wrapping it in an unsafe block.
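The two forms can be put side by side in a small sketch. Everything here (the InBounds trait, the Last picker, select and select_unchecked) is hypothetical and only loosely modelled on the Searcher situation: the trait's contract lets a safe function skip a bounds check, while the unsafe fn pushes the same obligation onto its caller.

```rust
/// Hypothetical `unsafe trait`: implementors promise that `pick(len)` returns
/// an index strictly less than `len` whenever `len > 0`.
pub unsafe trait InBounds {
    /// Safe to call; the burden is on whoever implements the trait.
    fn pick(&self, len: usize) -> usize;
}

pub struct Last;

// `unsafe impl` is the implementor accepting responsibility for the promise.
unsafe impl InBounds for Last {
    fn pick(&self, len: usize) -> usize {
        len.saturating_sub(1)
    }
}

/// Safe function whose unsafe block trusts the trait's contract.
pub fn select<P: InBounds>(data: &[u8], picker: &P) -> Option<u8> {
    if data.is_empty() {
        return None;
    }
    let i = picker.pick(data.len());
    // SAFETY: `data` is non-empty, so `InBounds` guarantees `i < data.len()`.
    Some(unsafe { *data.get_unchecked(i) })
}

/// Contrast: an `unsafe fn` puts the burden on the caller instead.
///
/// # Safety
/// `i` must be less than `data.len()`.
pub unsafe fn select_unchecked(data: &[u8], i: usize) -> u8 {
    // SAFETY: guaranteed by the caller, per the contract above.
    unsafe { *data.get_unchecked(i) }
}
```

Calling select needs no unsafe at the call site, whereas select_unchecked must be wrapped in an unsafe block by every caller.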
