Does implementing the Sync trait change the compiler output? - rust

If I mark my struct as Sync will the compiler output differ? Will the compiler implement some Mutex-like magic?
use std::cell::RefCell;
struct MyStruct {
    data: RefCell<u32>,
}
// Asserted by hand: RefCell<u32> is not Sync on its own.
unsafe impl Sync for MyStruct {}
unsafe impl Send for MyStruct {}

The compiler uses a mechanism named "language items" to reference items (types, traits, etc.) that are defined in a library (usually core) but are used by the compiler, whether that be in code generated by the compiler, for validating the code or for producing specialized error messages.
Send and Sync are defined in the core library. Sync is a language item, but Send isn't. The only reference to Sync I could find in the compiler is where it checks that the type of a static variable implements Sync. (Send and Sync used to be more special to the compiler. Before auto traits were added to the language, they were implemented as "auto traits" explicitly.)
Other than that, the compiler doesn't care about what Send and Sync mean. It's the libraries (specifically, types/functions that are generic over Send/Sync types) that give the traits their meaning.
Neither trait influences what code is emitted by the compiler regarding a particular type. Making a type "thread-safe" is not something that can be done automatically. Consider a struct with many fields: even if the fields are all atomic types, a partially updated struct might not be in a valid state. The compiler doesn't know about the invariants of a particular type; only the programmer knows them. Therefore, it's the programmer's responsibility to make the type thread-safe.
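To make the last point concrete, here is a minimal sketch of a type that is Sync because the programmer chose a synchronizing field, with no unsafe impl involved. The names Counter, assert_sync and bump_from_threads are hypothetical and exist only for this illustration; the Sync bound is checked at compile time and generates no locking code by itself.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical type: it is Sync automatically because Mutex<u32> is Sync.
/// The locking comes from Mutex, not from the Sync trait.
struct Counter {
    data: Mutex<u32>,
}

/// Compiles only if T is Sync; it does nothing at run time.
fn assert_sync<T: Sync>() {}

/// Increments the counter once from each of `threads` scoped threads
/// and returns the final value.
fn bump_from_threads(counter: &Counter, threads: u32) -> u32 {
    thread::scope(|s| {
        for _ in 0..threads {
            // Sharing &Counter across threads is allowed because Counter: Sync.
            s.spawn(|| {
                *counter.data.lock().unwrap() += 1;
            });
        }
    });
    let total = *counter.data.lock().unwrap();
    total
}
```

Swapping the Mutex for a RefCell and adding unsafe impl Sync would still compile, but nothing would protect the data: the compiler would simply trust the assertion.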

Related

Why do traits in Rust require that no method have any type arguments to be object safe? [duplicate]

Is this requirement really necessary for object safety or is it just an arbitrary limitation, enacted to make the compiler implementation simpler?
A method with type arguments is just a template for constructing multiple distinct methods with concrete types. It is known at compile time, which variants of the method are used. Therefore, in the context of a program, a typed method has the semantics of a finite collection of non-typed methods.
I would like to see if there are any mistakes in this reasoning.
I will take this opportunity to present withoutboats's nomenclature of handshake patterns, a set of ideas for reasoning about the decomposition of a functionality into two interconnected traits:
you want any type which implements trait Alpha to be composable with any type which implements trait Omega…
The example given is for serialization (although other use cases apply): a trait Serialize for types the values of which can be serialized (e.g. a data record type); and Serializer for types implementing a serialization format (e.g. a JSON serializer).
When the types of both can be statically inferred, designing the traits with the static handshake is ideal. The compiler will create only the necessary functions monomorphized against the types S needed by the program, while also providing the most room for optimizations.
trait Serialize {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
    where S: Serializer;
}
trait Serializer {
    //...
    fn serialize_map_value<S>(&mut self, state: &mut Self::MapState, value: &S)
        -> Result<(), Self::Error>
    where S: Serialize;
    fn serialize_seq_elt<S>(&mut self, state: &mut Self::SeqState, elt: &S)
        -> Result<(), Self::Error>
    where S: Serialize;
    //...
}
However, it is established that these traits cannot do dynamic dispatching. This is because once the concrete type is erased from the receiving type, that trait object is bound to a fixed table of its trait implementation, one entry per method. With this design, the compiler is unable to reason with a method containing type parameters, because it cannot monomorphize over that implementation at compile time.
The question argues: "A method with type arguments is just a template for constructing multiple distinct methods with concrete types. It is known at compile time, which variants of the method are used. Therefore, in the context of a program, a typed method has the semantics of a finite collection of non-typed methods."
One may be led to think that all trait implementations available are known, and therefore one could revamp the concept of a trait object to create a virtual table with multiple "layers" for a generic method, thus being able to do a form of one-sided monomorphization of that trait object. However, this does not account for two things:
The number of implementations can be huge. Just look, for example, at how many types implement Read and Write in the standard library. The number of monomorphized implementations that would have to be present in the binary would be the product of all known implementations against the known parameter types of a given call. In the example above, it is particularly unwieldy: serializing dynamic data records to JSON and TOML would mean that there would have to be Serialize::serialize method implementations for both JSON and TOML, for each serializable type, regardless of how many of these types are actually serialized in practice. And this is without accounting for the other side of the handshake.
This expansion can only be done when all possible implementations are known at compile time, which is not necessarily the case. While not entirely common, it is currently possible for a trait object to be created from a dynamically linked shared object. In this case, there is never a chance to expand the method calls of that trait object against the target compilation item. With this in mind, the virtual function table created by a trait implementation is expected to be independent from the existence of other types and from how it is used.
To conclude: this is a conceptual limitation that makes sense once you dig deeper. It is definitely not arbitrary or applied lightly. Generic method calls on trait objects are unlikely ever to be supported, so consumers should instead rely on the right interface design for the task. Thinking in terms of handshake patterns is one possible way to map out these designs.
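As a rough illustration of "the right interface design", the sketch below makes both sides of the handshake usable as trait objects by replacing the generic parameter with a dyn reference. The traits here are deliberately simplified, hypothetical stand-ins (no error type, two primitive methods) and are not the API of any real serialization library; Record, TextSerializer and serialize_all are likewise made up for the example.

```rust
/// Hypothetical, simplified serializer interface with no generic methods,
/// so it can be used as `dyn Serializer`.
trait Serializer {
    fn write_u32(&mut self, value: u32);
    fn write_str(&mut self, value: &str);
}

/// The serializer is taken as a trait object instead of a type parameter,
/// so this trait can be used as `dyn Serialize` too.
trait Serialize {
    fn serialize(&self, serializer: &mut dyn Serializer);
}

struct Record {
    id: u32,
    name: String,
}

impl Serialize for Record {
    fn serialize(&self, serializer: &mut dyn Serializer) {
        serializer.write_u32(self.id);
        serializer.write_str(&self.name);
    }
}

/// Toy format: every value is followed by a semicolon.
struct TextSerializer {
    out: String,
}

impl Serializer for TextSerializer {
    fn write_u32(&mut self, value: u32) {
        self.out.push_str(&value.to_string());
        self.out.push(';');
    }
    fn write_str(&mut self, value: &str) {
        self.out.push_str(value);
        self.out.push(';');
    }
}

/// Both the items and the serializer are chosen at run time.
fn serialize_all(items: &[Box<dyn Serialize>], serializer: &mut dyn Serializer) {
    for item in items {
        item.serialize(serializer);
    }
}
```

The cost is that every call goes through a vtable and nothing is monomorphized, which is the trade-off the static handshake avoids.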
See also:
What is the cited problem with using generic type parameters in trait objects?
The Rust Programming Language, section 17.2: Object Safety Is Required for Trait Objects

Differences generic trait-bounded method vs 'direct' trait method

I have this code:
fn main() {
    let p = Person;
    let r = &p as &dyn Eatable;
    Consumer::consume(r);
    // Compile error
    Consumer::consume_generic(r);
}
trait Eatable {}
struct Person;
impl Eatable for Person {}
struct Consumer;
impl Consumer {
    fn consume(eatable: &dyn Eatable) {}
    fn consume_generic<T: Eatable>(eatable: &T) {}
}
Error:
the size for values of type dyn Eatable cannot be known at compilation time
I find this strange. I have a method that literally takes a dyn Eatable and compiles fine, so that method somehow knows the size of Eatable. The generic method (consume_generic) will be compiled separately for every type it is used with, for performance, and the consume method will not.
So a few questions arise: why the compiler error? Are there things inside the body of the methods in which I can do something which I can not do in the other method? When should I prefer the one over the other?
Sidenote: I asked this question for the language Swift as well: Differences generic protocol type parameter vs direct protocol type. In Swift I get the same compile error but the underlying cause is different: protocols/traits do not conform to themselves (because Swift protocols can hold initializers, static members etc., which makes it harder to reference them generically). I also tried it in Java, where I believe the generic type is erased and it makes absolutely no difference.
The problem is not with the functions themselves, but with the trait bounds on types.
Every generic type parameter in Rust has an implicit Sized bound: since this is correct in the majority of cases, it was decided not to force the developer to write it out every time. But if you only use the type behind some kind of reference, as you do here, you may want to lift this restriction by specifying T: ?Sized. If you add this, your code will compile fine:
impl Consumer {
    fn consume(eatable: &dyn Eatable) {}
    fn consume_generic<T: Eatable + ?Sized>(eatable: &T) {}
}
As for the other questions, the main difference is in static vs dynamic dispatch.
When you use the generic function (or the semantically equivalent impl Trait syntax), the function calls are dispatched statically. That is, for every type of argument you pass to the function, the compiler generates a separate definition, independently of the others. This will likely result in more optimized code in most cases, but the drawbacks are a possibly larger binary and some limitations in the API (e.g. you can't easily create a heterogeneous collection this way).
When you use dyn Trait syntax, you opt in for dynamic dispatch. The necessary data will be stored into the table attached to trait object, and the correct implementation for every trait method will be chosen at runtime. The consumer, however, needs to be compiled only once. This is usually slower, both due to the indirection and to the fact that individual optimizations are impossible, but more flexible.
As for recommendations (note that this is an opinion, not a fact): I'd say it's better to stick to generics whenever possible and only switch to trait objects if the goal is impossible to achieve otherwise.
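A small sketch of the two dispatch styles side by side may help. It assumes a calories method on Eatable and two extra types, Apple and Cake, none of which are in the question; they are added only so the calls do something observable.

```rust
/// Hypothetical extension of the question's trait with one method.
trait Eatable {
    fn calories(&self) -> u32;
}

struct Apple;
struct Cake;

impl Eatable for Apple {
    fn calories(&self) -> u32 {
        95
    }
}

impl Eatable for Cake {
    fn calories(&self) -> u32 {
        350
    }
}

/// Static dispatch: one copy per concrete T. With ?Sized, T may also be
/// `dyn Eatable`, in which case the call inside goes through the vtable.
fn consume_generic<T: Eatable + ?Sized>(eatable: &T) -> u32 {
    eatable.calories()
}

/// Dynamic dispatch: compiled once, the method is looked up at run time.
fn consume_dyn(eatable: &dyn Eatable) -> u32 {
    eatable.calories()
}

/// A heterogeneous collection is only possible with trait objects.
fn total_calories(items: &[Box<dyn Eatable>]) -> u32 {
    // &**item turns &Box<dyn Eatable> into &dyn Eatable.
    items.iter().map(|item| consume_generic(&**item)).sum()
}
```

Here consume_generic(&Apple) is resolved at compile time, while total_calories has to go through trait objects because its elements have different concrete types.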

Can #[inline] be used in both trait method declarations and implementations?

I have a trait with some small methods, which are generally implemented as one-line wrappers around other methods that the implementing structs have. If I want to make sure that the trait method is inlined, should I place #[inline(always)] inside the trait definition, or inside the impl for each struct? I'd prefer to simply put it in the trait definition, but as far as I can tell that doesn't work.
What does inline mean?
When a compiler inlines a call, it copies the body of the function at the call site. Essentially, it's as if the code had been copy/pasted at each call site where it's inlined.
What does #[inline(always)] mean?
This instructs the compiler to perform inlining, always.
Normally, the compiler performs inlining when:
the body of the function is known
its heuristics estimate that inlining is a good trade-off (they may be wrong, though), which notably depends on the size of the function body
Why can I not specify #[inline(always)] on a trait method?
Because there is no body.
This may sound trite, I know, but it is nonetheless true.
In Rust, traits may be used in two ways:
as bounds, for generic parameters
as runtime interfaces, aka trait objects
When used as a trait object, there is literally no body: the function to be called is determined at runtime!
Now, there are specific optimizations (devirtualization) where the compiler attempts to deduce or track the actual dynamic type of variables so that it can avoid the dynamic dispatch. I've even seen partial devirtualization in GCC, where the compiler computes the likelihood of each type and creates an if ladder for the sufficiently likely ones (if A { A::call(x); } else if B { B::call(x); } else { x.call(); }). However, those are not guaranteed to succeed, of course.
So, what would be the semantics of #[inline(always)] on a virtual call? Should the compiler just ignore the attribute silently (uh!)?
It seems to me that what you are looking for is a new attribute (require(inline(always))?) to enforce specific constraints on the implementations of trait methods.
As far as I know, this does not exist yet.
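In practice this means the attribute goes wherever a body exists: on each implementation, or on a provided (default) method in the trait. A minimal sketch, with a hypothetical Length trait and Wrapper type invented for the example:

```rust
/// Hypothetical trait: `len` is required (no body), `is_empty` is provided.
trait Length {
    fn len(&self) -> usize;

    // A provided method has a body, so an inline hint can be attached here.
    #[inline]
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

struct Wrapper(Vec<u8>);

impl Length for Wrapper {
    // The one-line wrapper lives in the impl, so the attribute goes here.
    #[inline(always)]
    fn len(&self) -> usize {
        self.0.len()
    }
}
```

The hint can only take effect where the call is statically dispatched (for example Wrapper(...).len() or a generic T: Length); a call through &dyn Length is still resolved at run time unless the optimizer manages to devirtualize it.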

Why does AtomicUsize not implement Send?

std::sync::atomic::AtomicUsize implements Sync, which means immutable references are free of data races when shared between multiple threads. Why does AtomicUsize not implement Send? Is there state linked to the thread that created the atomic, or is this a language design decision relating to the way atomics are intended to be used, i.e. via an Arc<_> etc.?
It's a trick! AtomicUsize does implement Send:
use std::sync::atomic::AtomicUsize;
fn checker<T>(_: T) where T: Send {}
fn main() {
    checker(AtomicUsize::default());
}
In fact, there's even an automated test that ensures this is the case.
Rust 1.26
These auto traits are now documented, thanks to a change made to rustdoc.
Previous versions
The gotcha lies in how Send is implemented:
This trait is automatically derived when the compiler determines it's appropriate.
This means that Rustdoc doesn't know that Send is implemented for a type because most types don't implement it explicitly.
This explains why AtomicPtr<T> shows up in the implementers list: it has a special implementation that ignores the type of T.
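For completeness, here is a small sketch of the Arc<_> usage the question mentions. It relies on AtomicUsize being both Send and Sync, since Arc<T> is only Send when T is Send + Sync. The function name count_in_threads is made up for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` threads that each add 1 to a shared counter
/// `increments` times, then returns the final count.
fn count_in_threads(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Moving the Arc into the thread requires Arc<AtomicUsize>: Send.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // join() synchronizes with each thread, so all increments are visible.
    counter.load(Ordering::Relaxed)
}
```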

Why is an immutable pointer to a static immutable variable not Sync?

A static global C string (as in this answer) doesn't have the Sync trait.
// Rejected by the compiler: a static's type must be Sync, and *const u8 is not.
pub static MY_STRING: *const u8 = b"hello\0".as_ptr();
// TODO: Simple assertion showing it's not Sync ;)
Sync is described as
The precise definition is: a type T is Sync if &T is thread-safe. In other words, there is no possibility of data races when passing &T references between threads.
It seems like this is entirely readonly and has static lifetime, so why isn't it safe to pass a reference?
The chapter Send and Sync in The Rustonomicon describes what it means for a type to be Send or Sync. It mentions that:
raw pointers are neither Send nor Sync (because they have no safety guards).
But that just begs the question; why doesn't *const T implement Sync? Why do the safety guards matter?
Just before that, it says:
Send and Sync are also automatically derived traits. This means that, unlike every other trait, if a type is composed entirely of Send or Sync types, then it is Send or Sync. Almost all primitives are Send and Sync, and as a consequence pretty much all types you'll ever interact with are Send and Sync.
This is the key reason why raw pointers are neither Send nor Sync. If you define a struct that encapsulates a raw pointer, but only exposes it as a &T or &mut T in the struct's API, did you really make sure that your struct respects the contracts of Send and Sync? If raw pointers were Send, then Rc<T> would also be Send by default, so it would have to explicitly opt out. (In the source, there is in fact an explicit opt-out for Rc<T>, but it's only for documentation purposes, because it's actually redundant.)
[...] they're unsafe traits. This means that they are unsafe to implement, and other unsafe code can assume that they are correctly implemented.
OK, let's recap: they're unsafe to implement, but they're automatically derived. Isn't that a weird combination? Actually, it's not as bad as it sounds. Most primitive types, like u32, are Send and Sync. Simply compounding primitive values into a struct or enum is not enough to disqualify the type from being Send or Sync. Therefore, you need a struct or enum with a non-Send or non-Sync field before you need to write an unsafe impl.
Send and Sync are marker traits, which means they have no methods. Therefore, when a function or type puts a Send or Sync bound on a type parameter, it's relying on the type to respect a particular contract across all of its API. Because of this:
Incorrectly implementing Send or Sync can cause Undefined Behavior.
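For the static C string from the question, one way to take on that responsibility is to wrap the raw pointer in a newtype and assert Sync for the wrapper. This is only a sketch: StaticCStr and as_str are hypothetical names, and the unsafe impl is sound only under the assumption stated in the comments, namely that the pointer always refers to immutable, NUL-terminated 'static data.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical wrapper around a pointer to immutable, NUL-terminated bytes.
/// The field is private so that only this module decides what it points to.
pub struct StaticCStr(*const u8);

// SAFETY (our promise, not something the compiler checks): the pointer
// refers to immutable 'static data and is never written through, so
// sharing &StaticCStr between threads cannot cause a data race.
unsafe impl Sync for StaticCStr {}

/// Accepted as a static because the wrapper, unlike *const u8, is Sync.
pub static MY_STRING: StaticCStr = StaticCStr(b"hello\0".as_ptr());

impl StaticCStr {
    /// Reads the string back through the raw pointer.
    pub fn as_str(&self) -> &str {
        // SAFETY: relies on the invariant above (valid, NUL-terminated data).
        let c_str = unsafe { CStr::from_ptr(self.0 as *const c_char) };
        c_str.to_str().expect("the literal is valid UTF-8")
    }
}
```

If the invariant were ever broken, for instance by a constructor that accepts arbitrary pointers, the same unsafe impl would become a source of undefined behavior.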
