I'd like to build a Multi-Producer Multi-Consumer (MPMC) channel with several concurrent tasks producing and consuming data on it. Some of these tasks are responsible for interfacing with the filesystem or network.
Two examples:
PrintOutput(String) would be consumed by a logger, a console output, or a GUI.
NewJson(String) would be consumed by a logger or a parser.
To achieve this, I've selected chan as the MPMC channel provider and tokio as the system to manage event loops for each listener on the channel.
After reading the example on tokio's site, I began to implement futures::stream::Stream for chan::Receiver. This would allow a for_each future to listen on the channel. However, the documentation of these two libraries highlights a conflict:
fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error>
Attempt to pull out the next value of this stream, returning None if the stream is finished.
This method, like Future::poll, is the sole method of pulling out a value from a stream. This method must also be run within the context of a task typically and implementors of this trait must ensure that implementations of this method do not block, as it may cause consumers to behave badly.
fn recv(&self) -> Option<T>
Receive a value on this channel.
If this is an asynchronous channel, recv only blocks when the buffer is empty.
If this is a synchronous channel, recv only blocks when the buffer is empty.
If this is a rendezvous channel, recv blocks until a corresponding send sends a value.
For all channels, if the channel is closed and the buffer is empty, then recv always and immediately returns None. (If the buffer is non-empty on a closed channel, then values from the buffer are returned.)
Values are guaranteed to be received in the same order that they are sent.
This operation will never panic! but it can deadlock if the channel is never closed.
chan::Receiver may block when the buffer is empty, but futures::stream::Stream expects to never block when polled.
Since recv blocks exactly when the buffer is empty, there is no obvious way to confirm that it is empty without blocking. How do I check whether the buffer is empty so I can avoid blocking?
Although Kabuki is on my radar and seems to be the most mature of the actor model crates, it almost entirely lacks documentation.
This is my implementation so far:
extern crate chan;
extern crate futures;

struct RX<T>(chan::Receiver<T>);

impl<T> futures::stream::Stream for RX<T> {
    type Item = T;
    type Error = Box<std::error::Error>;

    fn poll(&mut self) -> futures::Poll<Option<Self::Item>, Self::Error> {
        let &mut RX(ref receiver) = self;
        // recv() blocks when the buffer is empty -- exactly the contract
        // violation described above.
        let item = receiver.recv();
        match item {
            Some(value) => Ok(futures::Async::Ready(Some(value))),
            // recv() returns None only when the channel is closed and
            // drained, so this should arguably be Ready(None) to end the
            // stream; NotReady without a scheduled wakeup means the task
            // will never be polled again.
            None => Ok(futures::Async::NotReady),
        }
    }
}
I've finished a quick test to see how it works. It seems alright but, as expected, it blocks once the buffer is drained. While this appears to work, I'm somewhat worried about what it means for a consumer to "behave badly". For now I'll continue to test this approach and hopefully I won't encounter bad behaviour.
extern crate chan;
extern crate futures;
extern crate tokio_core;

use futures::{Stream, Future};

fn my_test() {
    let mut core = tokio_core::reactor::Core::new().unwrap();
    let handle = core.handle();
    let (tx, rx) = chan::async::<String>();
    tx.send("Hello".to_string()); // fill the buffer before it blocks; single thread here.
    let incoming = RX(rx).for_each(|s| {
        println!("Result: {}", s);
        Ok(())
    });
    core.run(incoming).unwrap()
}
The chan crate provides a chan_select macro that would allow a non-blocking recv; but to implement Future for such primitives you also need to wake up the task when the channel becomes ready (see futures::task::current()).
You can implement Future by using existing primitives; implementing new ones is usually more difficult. In this case you probably have to fork chan to make it Future compatible.
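To make the missing piece concrete, here is a rough sketch of what a poll-friendly receiver would need. try_recv and park are hypothetical methods that chan does not expose; providing them (and having the sender call notify() on the parked Task) is exactly what a fork would have to add:

extern crate futures;

use std::marker::PhantomData;

use futures::task::{self, Task};
use futures::{Async, Poll, Stream};

// Hypothetical receiver exposing the two hooks a poll-based Stream needs:
// a non-blocking try_recv() and a way to park the current Task so that
// the sending side can notify() it later.
struct FutReceiver<T> {
    // channel internals elided
    _marker: PhantomData<T>,
}

impl<T> FutReceiver<T> {
    fn try_recv(&self) -> Option<T> {
        unimplemented!() // hypothetical: chan has no such method
    }
    fn park(&self, _task: Task) {
        unimplemented!() // hypothetical: the sender would call task.notify()
    }
}

impl<T> Stream for FutReceiver<T> {
    type Item = T;
    type Error = ();

    fn poll(&mut self) -> Poll<Option<T>, ()> {
        match self.try_recv() {
            Some(v) => Ok(Async::Ready(Some(v))),
            None => {
                // Park first, then re-check, to close the race where a
                // value arrives between the failed try_recv and the park.
                self.park(task::current());
                match self.try_recv() {
                    Some(v) => Ok(Async::Ready(Some(v))),
                    None => Ok(Async::NotReady),
                }
            }
        }
    }
}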
It seems the multiqueue crate has a Future compatible mpmc channel mpmc_fut_queue.
Consider the following example:
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (mut sender, receiver) = mpsc::channel::<u32>(32);
    sender.send(42).await.unwrap();
    std::mem::drop(sender);
    std::mem::drop(receiver);
}
Here I create an mpsc Sender / Receiver couple, I use sender to send a value, but I never receive that value on receiver. Then, without calling close on receiver, I drop both sender and receiver.
Tokio's mpsc documentation seems to state that dropping a Receiver without calling close and consuming all values is ill-advised, as values could linger forever in the channel without being dropped. I wonder whether this applies to the above example too. There I drop both the Receiver and all (i.e., the only) Senders. I have a hard time imagining this could cause a memory leak, but I want to make sure that what I'm doing is safe.
The point of the documentation is to advise good practice. It's fishy to have code that doesn't read all the items produced; if you need stop behaviour, it should be implemented in the sender. The receiver should never stop by itself.
close allows a middle ground: it prevents the senders from sending new messages, and the receiver can then be drained until no messages remain, so nothing is lost.
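For illustration, a minimal sketch of that close-then-drain pattern, written against the current tokio::sync::mpsc API (where Sender::send takes &self):

use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (sender, mut receiver) = mpsc::channel::<u32>(32);
    sender.send(42).await.unwrap();

    // Stop new messages from being sent, then drain what is buffered.
    receiver.close();
    while let Some(value) = receiver.recv().await {
        println!("drained {}", value);
    }
    // Nothing is lost: every sent value was received above.
}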
But there is no hard requirement; you don't have to do this if you don't want to. The Drop implementation will clean up the memory anyway, as we can see here:
impl<T, S: Semaphore> Drop for Rx<T, S> {
    fn drop(&mut self) {
        use super::block::Read::Value;

        self.close();

        self.inner.rx_fields.with_mut(|rx_fields_ptr| {
            let rx_fields = unsafe { &mut *rx_fields_ptr };

            while let Some(Value(_)) = rx_fields.list.pop(&self.inner.tx) {
                self.inner.semaphore.add_permit();
            }
        })
    }
}
In this code snippet (playground link), we have some simple communication between two threads. The main thread (which executes the second async block) sends 2 to thread 2 in the async move block, which receives it, adds its own value, and sends the result back over another channel to the main thread, which prints the value.
Thread 2 contains some local state, the thread_unsafe variable, which is neither Send nor Sync, and is maintained across an .await. Therefore the impl Future object that we are creating is itself neither Send nor Sync, and hence the call to pool.spawn_ok is a compile error.
However, this seems like it should be fine. I understand why spawn_ok() can't accept a future that is not Send, and I also understand why the compilation of the async block into a state machine results in a struct that contains a non-Send value, but in this example the only thing I want to send to the other thread is recv and send2. How do I express that the future should switch to non-thread safe mode only after it has been sent?
use std::rc::Rc;
use std::cell::RefCell;
use futures::channel::oneshot::channel;
use futures::executor::{ThreadPool, block_on};

fn main() {
    let pool = ThreadPool::new().unwrap();
    let (send, recv) = channel();
    let (send2, recv2) = channel();
    pool.spawn_ok(async move {
        // ERROR: `thread_unsafe` (an Rc) is held across the .await below,
        // so this future is not Send and spawn_ok rejects it.
        let thread_unsafe = Rc::new(RefCell::new(40));
        let a = recv.await.unwrap();
        send2.send(a + *thread_unsafe.borrow()).unwrap();
    });
    let r = block_on(async {
        send.send(2).unwrap();
        recv2.await.unwrap()
    });
    println!("the answer is {}", r)
}
but in this example the only thing I want to send to the other thread is recv and send2
There is also the local variable thread_unsafe which is used across an .await. Since .await can suspend an async function, and later resume it on another thread, this could send thread_unsafe to a different thread, which is not allowed.
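One way to make the example compile, assuming (as here) the Rc is only needed after the received value arrives, is to create it after the .await so it is never held across a suspension point:

use std::cell::RefCell;
use std::rc::Rc;

use futures::channel::oneshot::channel;
use futures::executor::{block_on, ThreadPool};

fn main() {
    let pool = ThreadPool::new().unwrap();
    let (send, recv) = channel();
    let (send2, recv2) = channel();
    pool.spawn_ok(async move {
        let a = recv.await.unwrap();
        // The non-Send value is created only after the .await, so it is
        // never held across a suspension point and the future stays Send.
        let thread_unsafe = Rc::new(RefCell::new(40));
        send2.send(a + *thread_unsafe.borrow()).unwrap();
    });
    let r = block_on(async {
        send.send(2).unwrap();
        recv2.await.unwrap()
    });
    println!("the answer is {}", r);
}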
I am making my own channel implementation, but std::task::Context doesn't make it clear how the waker was generated.
My fake code:
struct MyAtomicWaker {
    lock: SpinLock,
    is_waked: AtomicBool,
    waker: std::task::Waker,
}

struct WeakAtomicWaker(Weak<MyAtomicWaker>);

impl MyAtomicWaker {
    fn is_waked(&self) -> bool { /* ... */ }
    fn weak(self: Arc<MyAtomicWaker>) -> WeakAtomicWaker { /* ... */ }
    // Nullify the WeakAtomicWaker so this waker can no longer be woken by a future.
    fn cancel(&self) { /* ... */ }
}

impl WeakAtomicWaker {
    // Upgrade to Arc; wakes at most once, and only if not cancelled.
    fn wake(self) { /* ... */ }
}

struct ReceiveFuture<T> {
    recv: Receiver<T>,
    waker: Option<Arc<MyAtomicWaker>>,
}

impl<T> Drop for ReceiveFuture<T> {
    fn drop(&mut self) {
        if let Some(waker) = self.waker.take() {
            waker.cancel();
        }
    }
}

impl<T> Future for ReceiveFuture<T> {
    type Output = Result<T, RecvError>;

    fn poll(self: Pin<&mut Self>, ctx: &mut Context) -> Poll<Self::Output> {
        let _self = self.get_mut();
        if _self.waker.is_none() {
            // Wrap the waker in an Arc, store it inside _self, and send the
            // weak reference to the other side of the channel.
            let my_waker = _self.reg_waker(ctx.waker().clone());
            _self.waker.replace(my_waker);
        }
        // Do some polling.
        match _self.recv.try_recv() {
            Ok(item) => {
                // Cancel my waker and return Ready.
                if let Some(waker) = _self.waker.take() {
                    waker.cancel();
                }
                Poll::Ready(Ok(item))
            }
            Err(TryRecvError::Empty) => {
                if let Some(waker) = _self.waker.as_ref() {
                    if waker.is_waked() {
                        // The waker was triggered, but it was a false alarm;
                        // make a new one.
                        let my_waker = _self.reg_waker(ctx.waker().clone());
                        _self.waker.replace(my_waker);
                    } else {
                        // The waker has not fired yet; do we really have to make a new one?
                    }
                }
                Poll::Pending
            }
            Err(TryRecvError::Disconnected) => { /* ... */ }
        }
    }
}
Is it necessary to register a new waker every time poll() is called? In my code, there's a lot of timeouts and looping selects due to the combination of different futures.
I have a little experiment that works on the playground, but I'm not sure whether it will always work fine for both Tokio and async-std in various settings.
In my production code, I register a new waker and cancel the old waker in every poll() call. I don't know whether it is safe to only register a waker the first time and reuse it on the next polls.
Given the following order of events:
1. f.reg_waker(waker1)
2. f.poll() returns Poll::Pending
3. the combined future (e.g. a future::select) wakes up because another future in the selection became ready, but waker1 has not been woken
4. f.poll() returns Poll::Pending again
5. some outsider calls waker1.wake()
Is waker1.wake() then guaranteed to wake up f?
I'm asking this because:
I have a Stream that multiplexes multiple receiving channels
My MPMC and MPSC channel implementations are lockless. Some channels inside a multiplexed selection are used as close-notification channels and seldom receive a message. When such a channel is polled a lot (say a million times), a million wakers get thrown to the other side, which looks like a memory leak. Cancelling the previous wakers produced by the same future without a lock takes more complex logic than a lock-based implementation.
For these reasons I currently have a waker-cancelling solution, but it leads to a fairness problem, which needs to be avoided as much as possible.
I'm not interested in what the book states or what the API laws declare; I'm only interested in how the low level is actually implemented. Code showing why this works or why it does not work would be helpful. I code to ship a product; if necessary I'll pin a specific dependency or do some hacking to get the job done until there is a better way.
Yes, it is required to re-set the waker each time. Future::poll states (emphasis mine):
Note that on multiple calls to poll, only the Waker from the Context passed to the most recent call should be scheduled to receive a wakeup.
See also:
Why do I not get a wakeup for multiple futures when they use the same underlying socket?
Is it valid to wake a Rust future while it's being polled?
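A cheap way to comply with that contract is to store the most recent Waker and refresh it on every poll; std's Waker::will_wake lets you skip the clone when the task has not changed. A minimal sketch (WakerSlot is a name made up for this example):

use std::task::{Context, Waker};

// Keep the most recent waker; refresh it on every poll, cloning only
// when the task actually changed.
struct WakerSlot(Option<Waker>);

impl WakerSlot {
    fn register(&mut self, cx: &Context) {
        match &self.0 {
            // will_wake() avoids a clone when the stored waker would wake
            // the same task as the current one.
            Some(w) if w.will_wake(cx.waker()) => {}
            _ => self.0 = Some(cx.waker().clone()),
        }
    }

    fn wake(&mut self) {
        if let Some(w) = self.0.take() {
            w.wake();
        }
    }
}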
Given the fact that a waker can be woken in parallel with Future::poll:
Counter-evidence: presume that a waker must be cloned and re-registered on every poll for the future to wake up properly. That would invalidate the previous waker each time, making a concurrent wakeup from a different thread (e.g. in a future::select block) impossible. Since such concurrent wakeups do happen, the presumption cannot be true, which counter-proves it and supports this statement:
"A waker stays valid from the moment of ctx.waker().clone() until waker.wake()." This serves my purpose: the waker does not need to be re-registered on every poll if it has not been woken yet.
In addition, investigating tokio's waker implementation: every RawWaker produced by ctx.waker().clone() is just a reference count on a manually-dropped entry on the heap. As long as clones held elsewhere keep that reference count from reaching zero, the real waker entry continues to exist.
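That reference-counting behaviour is easy to observe with std's Arc-based Wake trait. This is only an illustration of the general mechanism (PrintWake is made up for the example), not tokio's actual internals:

use std::sync::Arc;
use std::task::{Wake, Waker};
use std::thread;

struct PrintWake;

impl Wake for PrintWake {
    fn wake(self: Arc<Self>) {
        println!("woken on {:?}", thread::current().id());
    }
}

fn main() {
    let waker = Waker::from(Arc::new(PrintWake));
    // A clone holds its own reference count, so the underlying entry stays
    // alive and may be woken later, from any thread.
    let cloned = waker.clone();
    thread::spawn(move || cloned.wake()).join().unwrap();
    drop(waker); // the entry is freed only when the last clone goes away
}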
I need to create some threads, some of which will keep running until their runner variable's value is changed. This is my minimal code.
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    let log_runner = Arc::new(Mutex::new(true));
    println!("{}", *log_runner.lock().unwrap());
    let mut threads = Vec::new();
    {
        let log_runner_ref = Arc::clone(&log_runner);
        // log runner thread
        let handle = thread::spawn(move || {
            while *log_runner_ref.lock().unwrap() {
                // DO SOME THINGS CONTINUOUSLY
                println!("I'm a separate thread!");
            }
        });
        threads.push(handle);
    }

    // let the main thread sleep for some time
    thread::sleep(Duration::from_millis(1));
    // stop the log_runner thread
    *log_runner.lock().unwrap() = false;
    // join all threads
    for handle in threads {
        handle.join().unwrap();
        println!("Thread joined!");
    }
    println!("{}", *log_runner.lock().unwrap());
}
It looks like I'm able to set log_runner to false after the sleep and stop the log runner thread. Is there a way to mark the threads with some name / ID or something similar, and then send a message to a specific thread using that marker (name / ID)?
If I understand it correctly, a single let (tx, rx) = mpsc::channel(); would be for sending messages to all the threads rather than to a specific one. I could send an identifier along with each message and have every thread check whether the identifier matches its own to decide whether to act on the message, but I would like to avoid that broadcasting effect.
MPSC stands for Multiple Producers, Single Consumer. As such, no, you cannot use it by itself to send a message to all threads, since for that you'd have to duplicate the consumer. There are tools for this, but choosing one requires a bit more info than just "MPMC" or "SPMC".
Honestly, if you can rely on channels for messaging (there are cases where it'd be a bad idea), you can create one channel per thread, assign the IDs outside of the threads, and keep a HashMap instead of a Vec, associating each ID with its thread. A Receiver<T> can be moved into the thread (it implements Send if T implements Send), so you can quite literally move it in.
You then keep the Sender outside and send stuff to it :-)
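A minimal sketch of that idea, with hypothetical string IDs ("logger", "network") standing in for whatever identifier scheme you choose:

use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

fn main() {
    let mut senders: HashMap<&'static str, mpsc::Sender<String>> = HashMap::new();
    let mut handles = Vec::new();

    for id in ["logger", "network"] {
        let (tx, rx) = mpsc::channel::<String>();
        senders.insert(id, tx);
        handles.push(thread::spawn(move || {
            // The Receiver moves into its thread; the loop ends when every
            // Sender for this channel has been dropped.
            for msg in rx {
                println!("[{}] got: {}", id, msg);
            }
        }));
    }

    // Address one specific thread by its ID:
    senders["logger"].send("only for the logger".to_string()).unwrap();

    drop(senders); // drop all Senders so the worker loops terminate
    for handle in handles {
        handle.join().unwrap();
    }
}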
I am writing a game and have a player list defined as follows:
pub struct PlayerList {
    by_name: HashMap<String, Arc<Mutex<Player>>>,
    by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
This struct has methods for adding, removing, getting players, and getting the player count.
The NetworkServer and Server share this list as follows:
NetworkServer {
    ...
    player_list: Arc<Mutex<PlayerList>>,
    ...
}

Server {
    ...
    player_list: Arc<Mutex<PlayerList>>,
    ...
}
This is inside an Arc<Mutex> because the NetworkServer accesses the list in a different thread (network loop).
When a player joins, a thread is spawned for them and they are added to the player_list.
Although the only operation I'm doing is adding to player_list, I'm forced to use Arc<Mutex<Player>> instead of the more natural Rc<RefCell<Player>> in the HashMaps, because Mutex<PlayerList> requires it. I am not accessing players from the network thread (or any other thread), so it makes no sense to put them under a Mutex. Only the HashMaps need to be locked, which I am already doing with Mutex<PlayerList>. But Rust is pedantic and wants to protect against all misuse.
As I'm only accessing Players in the main thread, locking every time to do so is both annoying and less performant. Is there a workaround that doesn't involve unsafe or something similar?
Here's an example:
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Uuid([u8; 16]);

struct Player {
    pub name: String,
    pub uuid: Uuid,
}

struct PlayerList {
    by_name: HashMap<String, Arc<Mutex<Player>>>,
    by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}

impl PlayerList {
    fn add_player(&mut self, p: Player) {
        let name = p.name.clone();
        let uuid = p.uuid;

        let p = Arc::new(Mutex::new(p));
        self.by_name.insert(name, Arc::clone(&p));
        self.by_uuid.insert(uuid, p);
    }
}

struct NetworkServer {
    player_list: Arc<Mutex<PlayerList>>,
}

impl NetworkServer {
    fn start(&mut self) {
        let player_list = Arc::clone(&self.player_list);
        thread::spawn(move || {
            loop {
                // fake network loop:
                // listen for incoming connections, accept players and add them to player_list.
                player_list.lock().unwrap().add_player(Player {
                    name: "blahblah".into(),
                    uuid: Uuid([0; 16]),
                });
            }
        });
    }
}

struct Server {
    player_list: Arc<Mutex<PlayerList>>,
    network_server: NetworkServer,
}

impl Server {
    fn start(&mut self) {
        self.network_server.start();
        // main game loop
        loop {
            // I am only accessing players in this loop, in this thread (the main thread),
            // so a Mutex for each individual Player is not needed, although Rust requires it.
        }
    }
}

fn main() {
    let player_list = Arc::new(Mutex::new(PlayerList {
        by_name: HashMap::new(),
        by_uuid: HashMap::new(),
    }));
    let network_server = NetworkServer {
        player_list: Arc::clone(&player_list),
    };
    let mut server = Server {
        player_list,
        network_server,
    };
    server.start();
}
As I'm only accessing Players in the main thread, locking every time to do that is both annoying and less performant.
You mean, as right now you are only accessing Players in the main thread, but at any time later you may accidentally introduce an access to them in another thread?
From the point of view of the language, if you can get a reference to a value, you may use the value. Therefore, if multiple threads have a reference to a value, this value should be safe to use from multiple threads. There is no way to enforce, at compile-time, that a particular value, although accessible, is actually never used.
This raises the question, however:
If the value is never used by a given thread, why does this thread have access to it in the first place?
It seems to me that you have a design issue. If you can manage to redesign your program so that only the main thread has access to the PlayerList, then you will immediately be able to use Rc<RefCell<...>>.
For example, you could instead have the network thread send a message to the main thread announcing that a new player connected.
At the moment, you are "Communicating by Sharing", and you could shift toward "Sharing by Communicating" instead. The former usually has synchronization primitives (such as mutexes, atomics, ...) all over the place, and may face contention/dead-lock issues, while the latter usually has communication queues (channels) and requires an "asynchronous" style of programming.
Send is a marker trait that governs which objects can have ownership transferred across thread boundaries. It is automatically implemented for any type that is entirely composed of Send types. It is also an unsafe trait because manually implementing this trait can cause the compiler to not enforce the concurrency safety that we love about Rust.
The problem is that Rc<RefCell<Player>> isn't Send and thus your PlayerList isn't Send and thus can't be sent to another thread, even when wrapped in an Arc<Mutex<>>. The unsafe workaround would be to unsafe impl Send for your PlayerList struct.
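The auto-trait mechanics can be seen with a tiny compile-time check (the failing line is commented out; assert_send is a helper made up for this example):

fn assert_send<T: Send>() {}

fn main() {
    assert_send::<Vec<u8>>(); // every field is Send, so Vec<u8> is automatically Send
    // assert_send::<std::rc::Rc<u8>>(); // ERROR: Rc uses non-atomic refcounts, so it is !Send
}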
Putting this code into your playground example allows it to compile the same way as the original with Arc<Mutex<Player>>:
struct PlayerList {
    by_name: HashMap<String, Rc<RefCell<Player>>>,
    by_uuid: HashMap<Uuid, Rc<RefCell<Player>>>,
}

unsafe impl Send for PlayerList {}

impl PlayerList {
    fn add_player(&mut self, p: Player) {
        let name = p.name.clone();
        let uuid = p.uuid;

        let p = Rc::new(RefCell::new(p));
        self.by_name.insert(name, Rc::clone(&p));
        self.by_uuid.insert(uuid, p);
    }
}
Playground
The Nomicon is sadly a little sparse on what rules the programmer has to uphold when unsafely implementing Send for a type containing Rcs, but accessing them from only one thread seems safe enough...
For completeness, here's TRPL's bit on Send and Sync
I suggest solving this threading problem with a multi-sender, single-receiver channel. The network threads get a Sender<Player> and no direct access to the player list.
The Receiver<Player> gets stored inside the PlayerList. The only thread accessing the PlayerList is then the main thread, so you can remove the Mutex around it. Instead, in the place where the main thread used to lock the mutex, it dequeues all pending players from the Receiver<Player>, wraps them in Rc<RefCell<>>, and adds them to the appropriate collections.
Looking at the bigger design though, I wouldn't use a per-player thread in the first place. Instead I'd use some kind of single-threaded, event-loop based design. (I didn't look into which Rust libraries are good in that area, but tokio seems popular.)
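A minimal sketch of that channel-based design, using std::sync::mpsc to carry newly connected players to the main thread, which is then free to use Rc<RefCell<>> (drain_incoming is a name made up for this example):

use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Uuid([u8; 16]);

struct Player {
    name: String,
    uuid: Uuid,
}

// Only the main thread ever touches the maps, so Rc<RefCell<>> is fine now.
struct PlayerList {
    incoming: Receiver<Player>,
    by_name: HashMap<String, Rc<RefCell<Player>>>,
    by_uuid: HashMap<Uuid, Rc<RefCell<Player>>>,
}

impl PlayerList {
    // Dequeue every player the network thread announced since the last tick.
    fn drain_incoming(&mut self) {
        while let Ok(p) = self.incoming.try_recv() {
            let (name, uuid) = (p.name.clone(), p.uuid);
            let p = Rc::new(RefCell::new(p));
            self.by_name.insert(name, Rc::clone(&p));
            self.by_uuid.insert(uuid, p);
        }
    }
}

fn main() {
    let (tx, rx): (Sender<Player>, Receiver<Player>) = channel();

    // The network thread announces new players instead of mutating shared state.
    let net = thread::spawn(move || {
        tx.send(Player { name: "blahblah".into(), uuid: Uuid([0; 16]) }).unwrap();
    });
    net.join().unwrap();

    let mut players = PlayerList {
        incoming: rx,
        by_name: HashMap::new(),
        by_uuid: HashMap::new(),
    };

    // One iteration of the main game loop:
    players.drain_incoming();
    println!("{} player(s) connected", players.by_name.len());
}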