The doc says
The spawned task may execute on the current thread, or it may be sent to a different thread to be executed.
So if you share a variable that is !Sync between two async blocks and spawn them, they may be executed simultaneously, which could cause a race condition.
How does tokio deal with it?
Spawning an async task (future) requires that it is Send. If two futures share access to something that is not Sync, then the future itself will not be Send. This is not something specific to Tokio or to async; notice that std::thread::spawn has the same requirement of Send + 'static on the spawned function.
The way this works is that anything that allows access to a value from multiple threads (such as Arc or &) will have a conditional implementation for Send like this one (from the standard library code for Arc):
unsafe impl<T: ?Sized + Sync + Send> Send for Arc<T> {}
This says: An Arc<T> will not be Send unless its contents T are Sync.
Anything you can find that will allow you to “share a variable” will similarly enforce that the data that is shared must be Sync.
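A minimal sketch of this rule, using std::thread::spawn (which has the same Send + 'static requirement) so that it runs without any external crate. The helper name count_across_threads is made up for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: every thread increments a shared counter `per_thread` times.
fn count_across_threads(threads: usize, per_thread: u32) -> u32 {
    // Mutex<u32> is Sync, so Arc<Mutex<u32>> is Send and can be moved into each closure.
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_across_threads(4, 1000));
    // Swapping Mutex<u32> for Cell<u32> (which is Send but !Sync) makes
    // Arc<Cell<u32>> !Send, and the thread::spawn call is rejected at compile time.
}
```

The same reasoning applies to a spawned future: if it captures an Arc of something that is not Sync, the future is not Send and the spawn call does not compile.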
I was thinking about the Rust async infrastructure, and at the heart of the API lies the Future trait, which is:
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
According to the docs
The core method of future, poll, attempts to resolve the future into a final value. This method does not block if the value is not ready. Instead, the current task is scheduled to be woken up when it’s possible to make further progress by polling again. The context passed to the poll method can provide a Waker, which is a handle for waking up the current task.
This implies that the Context value passed to .poll() (particularly the waker) needs some way to refer to the pinned future &mut self in order to wake it up. However &mut implies that the reference is not aliased. Am I misunderstanding something or is this a "special case" where aliasing &mut is allowed? If the latter, are there other such special cases besides UnsafeCell?
needs some way to refer to the pinned future &mut self in order to wake it up.
No, it doesn't. The key thing to understand is that a “task” is not the future: it is the future and what the executor knows about it. What exactly the waker mutates is up to the executor, but it must be something that isn't the Future. I say “must” not just because of Rust mutability rules, but because Futures don't contain any state that says whether they have been woken. So, there isn't anything there to usefully mutate; 100% of the bytes of the future's memory are dedicated to the specific Future implementation and none of the executor's business.
Well, on the very next page of the book you will notice that the task contains a boxed future and the waker is created from a reference to the task. So there is a reference to the future held by the task, albeit an indirect one.
OK, let's look at those data structures. Condensed:
struct Executor {
    ready_queue: Receiver<Arc<Task>>,
}
struct Task {
    future: Mutex<Option<BoxFuture<'static, ()>>>,
    task_sender: SyncSender<Arc<Task>>,
}
The reference to the task is an Arc<Task>, and the future itself is inside a Mutex (interior mutability) in the task. Therefore,
It is not possible to get an &mut Task from the Arc<Task>, because Arc doesn't allow that.
The future is in a Mutex which does run-time checking that there is at most one mutable reference to it.
The only things you can do with an Arc<Task> are
clone it and send it
get & access to the future in a Mutex (which allows requesting run-time-checked mutation access to the Future)
get & access to the task_sender (which allows sending things to ready_queue).
So, in this case, when the waker is called, it sort-of doesn't even mutate anything specific to the Task at all: it makes a clone of the Arc<Task> (which increments an atomic reference count stored next to the Task) and puts it on the ready_queue (which mutates storage shared between the Sender and Receiver).
Another executor might indeed have task-specific state in the Task that is mutated, such as a flag marking that the task is already woken and doesn't need to be woken again. That flag might be stored in an AtomicBool field in the task. But still, it does not alias with any &mut of the Future, because it is part of the task, not of the Future.
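To make this concrete, here is a small single-threaded sketch built only on the standard library. It reuses the Task layout shown above, but run_to_completion and YieldOnce are names invented for this example, and the book's executor differs in its details. Note that wake only clones the Arc<Task> and sends it; it never locks or touches the future.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

struct Task {
    future: Mutex<Option<BoxFuture>>,
    task_sender: SyncSender<Arc<Task>>,
}

impl Wake for Task {
    fn wake(self: Arc<Self>) {
        // Waking re-queues a clone of the Arc<Task>; the future is not accessed.
        let cloned = Arc::clone(&self);
        let _ = self.task_sender.send(cloned);
    }
}

// Hypothetical future: returns Pending once (after requesting a wake-up), then Ready.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Runs one future on the current thread and returns how many times it was polled.
fn run_to_completion(future: impl Future<Output = ()> + Send + 'static) -> usize {
    // Capacity must be non-zero: wake() sends while this thread is busy polling.
    let (task_sender, ready_queue) = sync_channel::<Arc<Task>>(16);
    let boxed: BoxFuture = Box::pin(future);
    let task = Arc::new(Task {
        future: Mutex::new(Some(boxed)),
        task_sender: task_sender.clone(),
    });
    task_sender.send(task).unwrap();
    drop(task_sender); // the queue closes once the last Arc<Task> is gone

    let mut polls = 0;
    while let Ok(task) = ready_queue.recv() {
        // The only &mut access to the future goes through this lock.
        let mut slot = task.future.lock().unwrap();
        if let Some(mut fut) = slot.take() {
            let waker = Waker::from(Arc::clone(&task));
            let mut cx = Context::from_waker(&waker);
            polls += 1;
            if fut.as_mut().poll(&mut cx).is_pending() {
                *slot = Some(fut); // not finished: keep it for the next wake-up
            }
        }
    }
    polls
}

fn main() {
    let polls = run_to_completion(async { YieldOnce { yielded: false }.await });
    println!("polled {} times", polls);
}
```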
All that said, there actually is something special about Futures and noalias — but it's not about executors, it's about Pin. Pinning explicitly allows the pinned type to contain “self-referential” pointers into itself, so Rust does not declare noalias for Pin<&mut T>. However, exactly what the language rules around this are is still not quite rigorously specified; the current situation is just considered a kludge so that async functions can be correctly compiled, I think.
There is no such special case. Judging from your comment about the Rust executor, you are misunderstanding how interior mutability works.
The example in the Rust book uses an Arc wrapped Task structure, with the future contained in a Mutex. When the task is run, it locks the mutex and obtains the singular &mut reference that's allowed to exist.
Now look at how the example implements wake_by_ref, and notice that it never touches the future at all. Indeed, that function could not lock the future even if it tried, because the code running the task already holds the lock. There is no safe way for it to obtain a second &mut reference, so no aliasing occurs.
The restriction for UnsafeCell and its wrappers is that only one &mut reference may exist for an object at any point in time. However, multiple & immutable references may exist to the UnsafeCell or structures containing it just fine - that is the point of interior mutability.
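The run-time check can be observed directly. In this small sketch (the function name is made up), two Arc handles point at the same Mutex; while one of them holds the lock, and with it the only &mut access, a try_lock through the other handle is refused.

```rust
use std::sync::{Arc, Mutex, TryLockError};

// Hypothetical helper: returns true if a second lock attempt is refused
// while the first guard is still alive.
fn second_lock_is_refused() -> bool {
    let shared = Arc::new(Mutex::new(0u32));
    let alias = Arc::clone(&shared); // many shared (&) handles are fine

    let mut guard = shared.lock().unwrap(); // the single mutable access
    *guard += 1;

    // The lock is held by `guard`, so no second mutable access is handed out.
    let refused = matches!(alias.try_lock(), Err(TryLockError::WouldBlock));
    refused
}

fn main() {
    println!("second lock refused: {}", second_lock_is_refused());
}
```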
I am reading https://doc.rust-lang.org/book/ch20-02-multithreaded.html and it states the following:
...taking a job off the channel queue involves mutating the receiver, so the threads need a safe way to share and modify receiver; otherwise, we might get race conditions...
However when I look at the Receiver::recv docs it shows that the method takes an immutable reference, so why is the book implying that: Receiving from the Receiver mutates it and thus should be Mutex'd?
Would it not work correctly just behind an Arc (no mutex)?
Yes, the Mutex is needed.
We know because in the Trait Implementation section of the documentation we see that it implements:
#[stable(feature = "rust1", since = "1.0.0")]
impl<T> !Sync for Receiver<T> {}
This means it cannot be shared safely between threads without some form of synchronization structure such as a mutex.
That being said, you probably won't want to use a mutex. mpsc stands for Multiple Producer, Single Consumer. You likely want mpmc (Multiple Producer, Multiple Consumer) channels instead. The crossbeam-channel crate provides this type of functionality and is the most popular crate (that I know of) for channels. You can create multiple receivers for a single channel by cloning the first receiver.
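For reference, this is roughly what the book's approach looks like with only the standard library: the Receiver sits behind Arc<Mutex<...>> so several workers can take turns receiving. The function name sum_with_workers and the summing workload are invented for this sketch.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: distributes `jobs` over `num_workers` threads and sums them.
fn sum_with_workers(num_workers: usize, jobs: Vec<u64>) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();
    // Receiver<u64> is !Sync, so Arc<Receiver<u64>> would not be Send.
    // Mutex<Receiver<u64>> is Sync, which makes the Arc shareable across threads.
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..num_workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut local = 0u64;
                loop {
                    // The lock guard is a temporary, released at the end of this statement.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        Ok(n) => local += n,
                        Err(_) => break, // channel closed and drained
                    }
                }
                local
            })
        })
        .collect();

    for job in jobs {
        tx.send(job).unwrap();
    }
    drop(tx); // lets the workers leave their loops

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", sum_with_workers(3, (1..=10).collect()));
}
```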
I want to be able to start a future running in the background, and not wait for it immediately in the parent function scope.
Something like a dynamic join_all where I can add new futures in a loop to a set, and then pass the set to another function which can .await the whole set (that is already running).
I want to be able to do something like this:
join_all(vec![
    log_arg(&c),
    log_arg(&c),
]).await;
But the issues are:
.await starts the future executing, but also waits for it at the current function.
How do I start the execution without waiting for it?
&c is not 'static
Seems to be a requirement for all of the Tokio APIs that "start the future executing without waiting for the result in the current fn scope", e.g. spawn_local
It is OK if all the futures are on a single thread.
Example:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=85aa3a517bd1906b285f5a5586d7fa6d
How do I start the execution without waiting for it?
spawn a task.
Seems to be a requirement for all of the Tokio APIs that "start the future executing without waiting for the result in the current fn scope", e.g. spawn_local
Well yes, since you're spawning a task it's possible that the task outlives whatever owns the item, resulting in a dangling reference, which is not allowed. In fact it's pretty much a guarantee when using spawn_local: it's going to spawn a task on the same thread (/scheduler), and that will not be able to run at all until the current task yields or terminates.
The alternative would be to use "scoped tasks" (which don't have to be immediately waited on, but have to eventually be joined). However, support for structured concurrency (scoped tasks) in Tokio has so far died on the vine. So there is no way for the Rust compiler to know that a task does not "escape" from the scope which initialised it; it therefore has to assume that it does, and thus that whatever the task captures must be able to outlive the current scope.
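For comparison, the standard library does offer this kind of scoping for threads (not for async tasks) through std::thread::scope. The sketch below is only an analogy for the scoped-task idea: log_arg here is a synchronous stand-in for the question's function, and run_scoped is a made-up name. Because the scope joins every thread before it returns, the borrowed &str does not need to be 'static.

```rust
use std::thread;

// Synchronous stand-in for the question's log_arg; returns the length it logged.
fn log_arg(c: &str) -> usize {
    println!("arg: {}", c);
    c.len()
}

// Hypothetical helper: starts two scoped threads that both borrow `c`.
fn run_scoped(c: &str) -> usize {
    thread::scope(|scope| {
        // Both threads start running here, without being waited on yet.
        let first = scope.spawn(|| log_arg(c));
        let second = scope.spawn(|| log_arg(c));
        // Joining is explicit here; anything left unjoined is joined
        // automatically before thread::scope returns.
        first.join().unwrap() + second.join().unwrap()
    })
}

fn main() {
    let c = String::from("hello");
    println!("{}", run_scoped(&c));
}
```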
I have a struct which needs to be Send + Sync:
struct Core {
    machines_by_id: DashMap<String, StateMachineManager>,
}
In my current implementation, StateMachineManager looks like this:
struct StateMachineManager {
    protected: Arc<Mutex<StateMachines>>,
}
This works fine as long as StateMachines is Send. However, it doesn't need to be, and it complicates the implementation where I'd like to use Rc.
Performance wise, there's no reason all the StateMachines can't live on one thread forever, so in theory there's no reason they need to be Send - they could be created on a thread dedicated to them and live there until no longer needed.
I know I can do this with channels, but that would seemingly mean recreating the API of StateMachines as messages sent back and forth over that channel. How might I avoid doing that, and tell Rust that all I need to do is serialize access to the thread they all live on?
Here is a minimal example (where I have added the Send + Sync bounds to Shepmaster's comment which omitted them) -- DashMap is a threadsafe map.
What I ended up doing here was using channels, but I was able to find a way to avoid needing to recreate the API of StateMachines.
The technique is to send a closure over a channel to the dedicated thread on which the StateMachines instances live. The closure accepts a &mut StateMachines argument, and its result is sent back over a second channel, which lives on the caller's stack while the access is happening.
Here is a playground implementing the key part
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e07972b928d0f0b7680b1e5a988dae84
The details of instantiating the machines on their dedicated thread are elided.
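A condensed sketch of the closure-passing technique, written independently of the playground and using only the standard library. The field names, the access method, and the contents of StateMachines are assumptions made for this example; the Sender is wrapped in a Mutex so the manager is Send + Sync regardless of compiler version.

```rust
use std::rc::Rc;
use std::sync::mpsc;
use std::sync::Mutex;
use std::thread;

// Stand-in for the real type; the Rc makes it !Send.
struct StateMachines {
    names: Vec<Rc<str>>,
}

// A unit of work shipped to the thread that owns the StateMachines.
type Job = Box<dyn FnOnce(&mut StateMachines) + Send>;

struct StateMachineManager {
    jobs: Mutex<mpsc::Sender<Job>>,
}

impl StateMachineManager {
    fn new() -> Self {
        let (jobs, inbox) = mpsc::channel::<Job>();
        thread::spawn(move || {
            // Created on the dedicated thread and never moved off it.
            let mut machines = StateMachines { names: Vec::new() };
            for job in inbox {
                job(&mut machines);
            }
        });
        Self { jobs: Mutex::new(jobs) }
    }

    // Runs `f` on the owning thread and blocks until its result comes back.
    fn access<F, R>(&self, f: F) -> R
    where
        F: FnOnce(&mut StateMachines) -> R + Send + 'static,
        R: Send + 'static,
    {
        // Per-call reply channel; it lives only for the duration of this call.
        let (reply_tx, reply_rx) = mpsc::channel();
        let job: Job = Box::new(move |machines: &mut StateMachines| {
            let _ = reply_tx.send(f(machines));
        });
        self.jobs.lock().unwrap().send(job).expect("worker thread exited");
        reply_rx.recv().expect("worker dropped the reply sender")
    }
}

fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<StateMachineManager>(); // compile-time check
    let manager = StateMachineManager::new();
    manager.access(|m| m.names.push(Rc::from("door")));
    let count = manager.access(|m| m.names.len());
    println!("{} machine(s)", count);
}
```

Only values that cross the channel (the closure and its result) need to be Send; the StateMachines value itself never leaves its thread.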
Just to get better understanding of the Send and Sync traits, are there examples of types that either:
Implement Send and do not implement Sync.
Implement Sync and do not implement Send.
First of all, it is important to realize that most structs (or enums) are Send:
any struct that does not contain any reference can be Send + 'static
any struct that contains references with a lower-bound lifetime of 'a can be Send + 'a
As a result, you would generally expect any Sync struct to be Send too, because Send is such an easy bar to reach (compared to the much harder bar of being Sync which requires safe concurrent modification from multiple threads).
However, nothing prevents the creator of a type from specifically marking it as not Send. For example, let's resuscitate conditions!
The idea of conditions, in Lisp, is that you set up a handler for a given condition (say: FileNotFound) and then, when this condition is met deep in the stack, your handler is called.
How would you implement this in Rust?
Well, to preserve thread independence, you would use thread-local storage for the condition handlers (see std::thread_local!). Each condition would be a stack of condition handlers, with either only the top one invoked or an iterative process starting from the top one but reaching down until one succeeds.
But then, how would you set them?
Personally, I'd use RAII! I would bind the condition handler in the thread-local stack and register it in the frame (for example, using an intrusive doubly-linked list as the stack).
This way, when I am done, the condition handler automatically un-registers itself.
Of course, the system has to account for users doing unexpected things (like storing the condition handlers in the heap and not dropping them in the order they were created), and this is why we use a doubly-linked list, so that the handler can un-register itself from the middle of the stack if necessary.
So we have a:
struct ConditionHandler<T> {
    handler: T,
    prev: Option<*mut ConditionHandler<T>>,
    next: Option<*mut ConditionHandler<T>>,
}
and the "real" handler is passed by the user as T.
Would this handler be Sync?
Possibly. It depends on how you create it, but there is no reason you could not create a handler such that a reference to it could be shared between multiple threads.
Note: those threads could not access its prev/next data members, which are private, and need not be Sync.
Would this handler be Send?
Unless specific care is taken, no.
The prev and next fields are not protected against concurrent accesses, and even worse if the handler were to be dropped while another thread had obtained a reference to it (for example, another handler trying to un-register itself) then this now dangling reference would cause Undefined Behavior.
Note: the latter issue means that just switching Option<*mut ConditionHandler<T>> for AtomicPtr<ConditionHandler<T>> is not sufficient; see Common Pitfalls in Writing Lock-Free Algorithms for more details.
And there you have it: a ConditionHandler<T> is Sync if T is Sync but will never be Send (as is).
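A much simplified sketch of the same idea, for illustration only. It replaces the intrusive doubly-linked list with a thread-local Vec keyed by an id, uses plain fn pointers as handlers, and opts out of both Send and Sync with a PhantomData<*mut ()> marker (the design above would keep Sync). All names here (HandlerGuard, register, signal) are invented for the example.

```rust
use std::cell::{Cell, RefCell};
use std::marker::PhantomData;

type Handler = fn(&str) -> bool;

thread_local! {
    // Per-thread stack of (id, handler) pairs; the top of the stack is the last element.
    static HANDLERS: RefCell<Vec<(u64, Handler)>> = RefCell::new(Vec::new());
    static NEXT_ID: Cell<u64> = Cell::new(0);
}

// RAII guard. The raw-pointer marker makes it !Send (and !Sync), so it cannot
// be moved to a thread whose handler stack it is not registered in.
struct HandlerGuard {
    id: u64,
    _not_send: PhantomData<*mut ()>,
}

fn register(handler: Handler) -> HandlerGuard {
    let id = NEXT_ID.with(|next| {
        let id = next.get();
        next.set(id + 1);
        id
    });
    HANDLERS.with(|h| h.borrow_mut().push((id, handler)));
    HandlerGuard { id, _not_send: PhantomData }
}

impl Drop for HandlerGuard {
    fn drop(&mut self) {
        // Removes this handler even if it is not on top of the stack.
        HANDLERS.with(|h| h.borrow_mut().retain(|(id, _)| *id != self.id));
    }
}

// Tries handlers from the top of the stack down until one reports success.
fn signal(condition: &str) -> bool {
    // Copy the handlers out first so a handler may itself call register/signal.
    let handlers: Vec<Handler> =
        HANDLERS.with(|h| h.borrow().iter().rev().map(|(_, f)| *f).collect());
    handlers.into_iter().any(|f| f(condition))
}

fn handle_file_not_found(condition: &str) -> bool {
    condition == "FileNotFound"
}

fn main() {
    let outer = register(handle_file_not_found);
    let inner = register(|_| false);
    drop(outer); // dropped out of order: removed from the middle of the stack
    println!("handled: {}", signal("FileNotFound"));
    drop(inner);
}
```

Moving a HandlerGuard into std::thread::spawn fails to compile, which is the compile-time version of the restriction described above.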
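For completeness, a number of types implement Send but not Sync, typically those with unsynchronized interior mutability: std::sync::mpsc::Receiver, for example. Plain containers such as Option<T> or Vec<T> do not belong in this group; they are Sync whenever T is Sync.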
Cell<T> and RefCell<T> implement Send (when T is Send) but never Sync, because they can safely be moved to another thread but cannot safely be shared between threads.
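These properties can be checked at compile time with two empty generic functions. The sketch below also includes std::sync::MutexGuard, a standard-library type that is Sync (for Sync contents) but not Send. The helper names are made up; the failing cases are left as comments because they do not compile.

```rust
use std::cell::{Cell, RefCell};
use std::sync::MutexGuard;
use std::thread;

fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// Moving a Cell to another thread is fine because Cell<i32> is Send.
fn bump_on_other_thread(cell: Cell<i32>) -> i32 {
    thread::spawn(move || {
        cell.set(cell.get() + 1);
        cell.get()
    })
    .join()
    .unwrap()
}

fn main() {
    // Send but not Sync:
    assert_send::<Cell<i32>>();
    assert_send::<RefCell<String>>();
    // assert_sync::<Cell<i32>>();            // error: Cell<i32> is not Sync

    // Sync but not Send:
    assert_sync::<MutexGuard<'static, i32>>();
    // assert_send::<MutexGuard<'static, i32>>(); // error: MutexGuard is not Send

    // Both, inherited from the element type:
    assert_send::<Vec<i32>>();
    assert_sync::<Vec<i32>>();

    println!("{}", bump_on_other_thread(Cell::new(41)));
}
```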