In chapter 20 of "The Rust Programming Language", you go through an exercise of building a simple multi-threaded web server. The exercise uses a single std::sync::mpsc channel. The worker threads all access a single Receiver, which is wrapped like this: Arc<Mutex<mpsc::Receiver<Message>>>.
If we write the worker thread like:
let thread = thread::spawn(move || loop {
match receiver.lock().unwrap().recv().unwrap() {
Message::NewJob(job) => {
println!("Worker {} got a job; executing.", id);
job.call_box();
println!("Worker {} job complete.", id);
}
Message::Terminate => {
println!("Worker {} was told to terminate.", id);
break;
}
};
println!("hello, loop");
});
Then we do not achieve concurrency. Apparently the worker holds on to the mutex lock, I suppose, because no worker is able to pull off another job until the previous one is complete. However, if we simply change it to this (how the book shows the code):
let thread = thread::spawn(move || loop {
let message = receiver.lock().unwrap().recv().unwrap();
match message {
Message::NewJob(job) => {
println!("Worker {} got a job; executing.", id);
job.call_box();
println!("Worker {} job complete.", id);
}
Message::Terminate => {
println!("Worker {} was told to terminate.", id);
break;
}
};
println!("hello, loop");
});
Then everything works fine. If you fire off 5 requests you'll see each thread gets one immediately. Concurrency!
The question is "why does variable binding affect lifetime" (I'm assuming that's the reason). Or if not then I'm missing something and what is that?! The book itself talks about how you cannot implement the worker loop with while let Ok(job) = receiver.lock().unwrap().recv() { because of the scope of the lock but apparently even inside the loop there be dragons.
Because in Rust, "resource acquisition is initialization".
Specifically receiver.lock() returns a type which acquires the lock when it is initialized and releases the lock when it is dropped.
In your first example, the MutexGuard is a temporary created in the scrutinee expression of the match, and such temporaries live until the end of the match statement, so the lock is still held while job.call_box() is called.
match receiver.lock().unwrap().recv().unwrap() {
// ...
};
// `MutexGuard` is dropped and lock is released here
In your second example, the lock guard is only kept alive long enough to read a message from your message queue; the lock guard is dropped at the end of the statement and the lock is released before the match is entered.
let message = receiver.lock().unwrap().recv().unwrap();
// `MutexGuard` is dropped and lock is released here
match message {
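The difference between the two forms can be observed directly. The following is a minimal sketch, not the book's code: the function names are made up for illustration, and it uses try_lock from the same thread purely as a probe to see whether the guard is still alive.

```rust
use std::sync::Mutex;

/// Returns true if the mutex is still locked while the match arm runs.
/// (Hypothetical helper, for illustration only.)
fn held_in_match_arm(m: &Mutex<i32>) -> bool {
    let held;
    // The guard is a temporary in the scrutinee expression, so it lives
    // until the end of the whole `match` statement.
    match *m.lock().unwrap() {
        _ => held = m.try_lock().is_err(),
    };
    held
}

/// Same probe, but the value is bound with `let` before matching.
/// (Hypothetical helper, for illustration only.)
fn held_after_let(m: &Mutex<i32>) -> bool {
    // The guard is a temporary of this `let` statement and is dropped
    // when the statement ends, before the `match` is entered.
    let value = *m.lock().unwrap();
    let held;
    match value {
        _ => held = m.try_lock().is_err(),
    };
    held
}

fn main() {
    let m = Mutex::new(5);
    println!("locked inside match arm: {}", held_in_match_arm(&m));
    println!("locked after let binding: {}", held_after_let(&m));
}
```

The first function reports that the lock is still held inside the arm, and the second reports that it has already been released, which matches the behaviour of the two worker loops.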
I referred to this and also tried the tungstenite library, but I was only able to run one server at a time; it took over the whole thread.
I tried running multiple servers on different threads, but they never listen for anything and the program just exits.
Is there any way that I can run multiple WebSocket servers on different ports, and create and destroy a server at runtime?
Edit: If I run a server on the main thread and another one on another thread, it works. It looks like I'd have to keep the main thread busy somehow, but is there a better way?
here's some example code:
it uses:
use std::net::TcpListener;
use std::thread::spawn;
use tungstenite::accept;
this is the normal code that blocks the main thread
let server = TcpListener::bind("127.0.0.1:9002").expect("err: ");
for stream in server.incoming() {
spawn(move || {
let mut websocket = accept(stream.unwrap()).unwrap();
loop {
let msg = websocket.read_message().unwrap();
println!("{}", msg);
// We do not want to send back ping/pong messages.
if msg.is_binary() || msg.is_text() {
websocket.write_message(msg).unwrap();
}
}
});
}
here's the code with thread:
spawn(|| {
let server = TcpListener::bind("127.0.0.1:9001").expect("err: ");
for stream in server.incoming() {
spawn(move || {
let mut websocket = accept(stream.unwrap()).unwrap();
loop {
let msg = websocket.read_message().unwrap();
println!("{}", msg);
// We do not want to send back ping/pong messages.
if msg.is_binary() || msg.is_text() {
websocket.write_message(msg).unwrap();
}
}
});
}
});
But the above code needs the main thread to keep running somehow. I am indeed able to run multiple servers on different threads, but I need something to occupy the main thread.
Rust programs terminate when the end of main() is reached. What you need to do is wait until your secondary threads have finished.
std::thread::spawn returns a JoinHandle, which has a join method which does exactly that - it waits (blocks) until the thread that the handle refers to finishes, and returns an error if the thread panicked.
So, to keep your program alive as long as any threads are running, you need to collect all of these handles, and join() them one by one. Unlike a busy-loop, this will not waste CPU resources unnecessarily.
use std::net::TcpListener;
use std::thread::spawn;
use tungstenite::accept;
fn main() {
let mut handles = vec![];
// Spawn 3 identical servers on ports 9001, 9002, 9003
for i in 1..=3 {
let handle = spawn(move || {
let server = TcpListener::bind(("127.0.0.1", 9000 + i)).expect("err: ");
for stream in server.incoming() {
spawn(move || {
let mut websocket = accept(stream.unwrap()).unwrap();
loop {
let msg = websocket.read_message().unwrap();
println!("{}", msg);
// We do not want to send back ping/pong messages.
if msg.is_binary() || msg.is_text() {
websocket.write_message(msg).unwrap();
}
}
});
}
});
handles.push(handle);
}
// Wait for each thread to finish before exiting
for handle in handles {
if let Err(e) = handle.join() {
eprintln!("{:?}", e)
}
}
}
When you do all the work in a thread (or threads) and the main thread has nothing to do, it is usually made to wait for (join) that thread.
This has the additional advantage that if your secondary thread finishes or panics, then your program will also finish. Or you can wrap the whole create-thread/join-thread in a loop and make it more resilient:
fn main() {
loop {
let th = std::thread::spawn(|| {
// Do the real work here
std::thread::sleep(std::time::Duration::from_secs(1));
panic!("oh!");
});
if let Err(e) = th.join() {
eprintln!("Thread panic: {:?}", e)
}
}
}
Link to playground. I've changed the loop into a for _ in 0..3 because the playground does not like infinite loops.
My requirement is very simple, which is a very reasonable requirement in many programs. It is to send a specified message to my Channel after a specified time.
I've checked tokio for topics related to delay, interval or timeout, but none of them seem that straightforward to implement.
What I've come up with now is to spawn an asynchronous task, then wait or sleep for a certain amount of time, and finally send the message.
But, obviously, spawning an asynchronous task is a relatively heavy operation. Is there a better solution?
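use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time;
async fn my_handler(sender: mpsc::Sender<i32>, dur: Duration) {
    // Spawn a task that sleeps first and then sends the message.
    tokio::spawn(async move {
        time::sleep(dur).await;
        // Ignore the error returned when the receiver has been dropped.
        let _ = sender.send(0).await;
    });
}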
You could try adding a second channel and a continuously running task that buffers messages until the time they are to be received. Implementing this is more involved than it sounds, I hope I'm handling cancellations right here:
fn make_timed_channel<T: Ord + Send + Sync + 'static>() -> (Sender<(Instant, T)>, Receiver<T>) {
// Ord is an unnecessary requirement arising from me stuffing both the Instant and the T into the Binary heap
// You could drop this requirement by using the priority_queue crate instead
let (sender1, receiver1) = mpsc::channel::<(Instant, T)>(42);
let (sender2, receiver2) = mpsc::channel::<T>(42);
let mut receiver1 = Some(receiver1);
tokio::spawn(async move {
let mut buf = std::collections::BinaryHeap::<Reverse<(Instant, T)>>::new();
loop {
// Pretend we're a bounded channel or exit if the upstream closed
if buf.len() >= 42 || receiver1.is_none() {
match buf.pop() {
Some(Reverse((time, element))) => {
sleep_until(time).await;
if sender2.send(element).await.is_err() {
break;
}
}
None => break,
}
}
// We have some deadline to send a message at
else if let Some(Reverse((then, _))) = buf.peek() {
if let Ok(recv) = timeout_at(*then, receiver1.as_mut().unwrap().recv()).await {
match recv {
Some(recv) => buf.push(Reverse(recv)),
None => receiver1 = None,
}
} else {
if sender2.send(buf.pop().unwrap().0 .1).await.is_err() {
break;
}
}
}
// We're empty, wait around
else {
match receiver1.as_mut().unwrap().recv().await {
Some(recv) => buf.push(Reverse(recv)),
None => receiver1 = None,
}
}
}
});
(sender1, receiver2)
}
Whether this is more efficient than spawning tasks is something you'd have to benchmark. (I doubt it: if I recall correctly, Tokio has a much fancier solution than a BinaryHeap for waking up at the next timeout.)
One optimization you could make if you don't need a Receiver<T> but just something that .poll().await can be called on: You could drop the second channel and maintain the BinaryHeap inside a custom receiver.
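The ordering trick that both the snippet above and the custom-receiver idea rely on can be shown without Tokio. This is only a synchronous sketch under that assumption: DelayQueue, push and pop_due are hypothetical names, and a real receiver would additionally sleep until the next deadline instead of being polled with an explicit `now`.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::{Duration, Instant};

/// Hypothetical helper: a min-heap of (deadline, item) pairs.
/// `Reverse` turns the max-heap into a min-heap, so the earliest
/// deadline is always at the top.
struct DelayQueue<T: Ord> {
    heap: BinaryHeap<Reverse<(Instant, T)>>,
}

impl<T: Ord> DelayQueue<T> {
    fn new() -> Self {
        Self { heap: BinaryHeap::new() }
    }

    /// Queue `item` for delivery at `deadline`.
    fn push(&mut self, deadline: Instant, item: T) {
        self.heap.push(Reverse((deadline, item)));
    }

    /// Pop the earliest item, but only if its deadline is at or before `now`.
    fn pop_due(&mut self, now: Instant) -> Option<T> {
        let due = matches!(self.heap.peek(), Some(Reverse((t, _))) if *t <= now);
        if due {
            self.heap.pop().map(|Reverse((_, item))| item)
        } else {
            None
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut queue = DelayQueue::new();
    queue.push(start + Duration::from_secs(2), "late");
    queue.push(start + Duration::from_secs(1), "early");

    // Nothing is due yet at `start`.
    println!("{:?}", queue.pop_due(start));
    // One second in, only the earlier message is released.
    println!("{:?}", queue.pop_due(start + Duration::from_secs(1)));
}
```

As in the original snippet, the Ord bound on T only exists because the item is stored inside the heap entry next to its deadline.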
In my application I have a blocking task that synchronously reads messages from a queue and feeds them to a running task.
All of this works fine, but the problem that I'm having is that the process does not terminate correctly, since the queue_reader task does not stop.
I've constructed a small example based on the tokio documentation at: https://docs.rs/tokio/1.20.1/tokio/task/fn.spawn_blocking.html
use tokio::sync::mpsc;
use tokio::task;
#[tokio::main]
async fn main() {
let (incoming_tx, mut incoming_rx) = mpsc::channel(2);
// Some blocking task that never ends
let queue_reader = task::spawn_blocking(move || {
loop {
// Stand in for receiving messages from queue
incoming_tx.blocking_send(5).unwrap();
}
});
let mut acc = 0;
// Some complex condition that determines whether the job is done
while acc < 95 {
tokio::select! {
Some(v) = incoming_rx.recv() => {
acc += v;
}
}
}
assert_eq!(acc, 95);
println!("Finalizing thread");
queue_reader.abort(); // This doesn't seem to terminate the queue_reader task
queue_reader.await.unwrap(); // <-- The process hangs on this task.
println!("Done");
}
At first I expected that queue_reader.abort() would terminate the task, but it doesn't. My guess is that tokio can only do this for tasks that use .await internally, because that hands control back to tokio. Is this right?
In order to terminate the queue_reader task I introduced a oneshot channel, over which I signal the termination, as shown in the next snippet.
use tokio::task;
use tokio::sync::{oneshot, mpsc};
#[tokio::main]
async fn main() {
let (incoming_tx, mut incoming_rx) = mpsc::channel(2);
// A new channel to communicate when the process must finish.
let (term_tx, mut term_rx) = oneshot::channel();
// Some blocking task that never ends
let queue_reader = task::spawn_blocking(move || {
// As long as termination is not signalled
while term_rx.try_recv().is_err() {
// Stand in for receiving messages from queue
incoming_tx.blocking_send(5).unwrap();
}
});
let mut acc = 0;
// Some complex condition that determines whether the job is done
while acc < 95 {
tokio::select! {
Some(v) = incoming_rx.recv() => {
acc += v;
}
}
}
assert_eq!(acc, 95);
// Signal termination
term_tx.send(()).unwrap();
println!("Finalizing thread");
queue_reader.await.unwrap();
println!("Done");
}
My question is, is this the canonical/best way to do this, or are there better alternatives?
Tokio cannot terminate CPU-bound/blocking tasks.
It is technically possible to kill OS threads, but generally it is not a good idea, as it's expensive to create new threads and it can leave your program in an invalid state. Even if Tokio decided this was something worth implementing, it would severely limit its implementation: it would be forced into a multithreaded model just to support the possibility that you'd want to kill a blocking task before it has finished.
Your solution is pretty good; give your blocking task the responsibility for terminating itself and provide a way to tell it to do so. If this future was part of a library, you could abstract the mechanism away by returning a "handle" to the task that had a cancel() method.
Are there better alternatives? Maybe, but that would depend on other factors. Your solution is good and easily extended, for example if you later needed to send different types of signal to the task.
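The "handle with a cancel() method" idea could look roughly like the sketch below. It is an illustration under assumptions, not Tokio API: it uses a plain std::thread as a stand-in for spawn_blocking and an AtomicBool instead of the oneshot channel, and CancellableTask and spawn_cancellable are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle type; not part of Tokio or std.
pub struct CancellableTask<T> {
    stop: Arc<AtomicBool>,
    handle: JoinHandle<T>,
}

impl<T> CancellableTask<T> {
    /// Ask the task to stop. It only notices at its next flag check.
    pub fn cancel(&self) {
        self.stop.store(true, Ordering::SeqCst);
    }

    /// Block until the task has returned.
    pub fn join(self) -> thread::Result<T> {
        self.handle.join()
    }
}

/// Runs `work` on its own thread and passes it the stop flag to poll.
pub fn spawn_cancellable<T, F>(work: F) -> CancellableTask<T>
where
    T: Send + 'static,
    F: FnOnce(&AtomicBool) -> T + Send + 'static,
{
    let stop = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&stop);
    let handle = thread::spawn(move || work(&*flag));
    CancellableTask { stop, handle }
}

fn main() {
    let task = spawn_cancellable(|stop| {
        let mut iterations = 0u64;
        // Stand-in for reading messages from the queue.
        while !stop.load(Ordering::SeqCst) {
            iterations += 1;
            thread::sleep(Duration::from_millis(1));
        }
        iterations
    });

    thread::sleep(Duration::from_millis(20));
    task.cancel();
    let iterations = task.join().unwrap();
    println!("stopped after {iterations} iterations");
}
```

The caller never sees the flag itself, so the signalling mechanism could later be swapped for a channel carrying different kinds of signal without changing the call sites.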
Is it possible to force resume a sleeping thread which has been paused? For example, by calling sleep:
std::thread::sleep(std::time::Duration::from_secs(60 * 20));
I know that I can communicate between threads using std::sync::mpsc but if the thread is asleep, this does not force it to wake up before the time indicated.
I have thought about using std::sync::mpsc and maybe Builder and .name associated with the thread, but I do not know how to get the thread to wake up.
If you want to be woken up by an event, thread::sleep() is not the correct function to use, as it's not supposed to be stopped.
There are other methods of waiting while being able to be woken up by an event (this is usually called blocking). Probably the easiest way is to use a channel together with Receiver::recv_timeout(). Often it's also sufficient to send () through the channel. That way we just communicate a signal, but don't send actual data.
If you don't want to wake up after a specific timeout, but only when a signal arrives, just use Receiver::recv().
Example with timeout:
use std::thread;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;
use std::io;
fn main() {
let (sender, receiver) = mpsc::channel();
thread::spawn(move || {
loop {
match receiver.recv_timeout(Duration::from_secs(2)) {
Err(RecvTimeoutError::Timeout) => {
println!("Still waiting... I'm bored!");
// we'll try later...
}
Err(RecvTimeoutError::Disconnected) => {
// no point in waiting anymore :'(
break;
}
Ok(_) => {
println!("Finally got a signal! ♥♥♥");
// doing work now...
}
}
}
});
loop {
let mut s = String::new();
io::stdin().read_line(&mut s).expect("reading from stdin failed");
if s.trim() == "start" {
sender.send(()).unwrap();
}
}
}
Here, the second thread is woken up at least every two seconds (the timeout), but also earlier once something was sent through the channel.
park_timeout allows timed sleeps with wakeups from unpark, but it can also wake up early.
See std::thread module documentation
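A rough sketch of the park_timeout approach follows. Because park_timeout may return early for no reason, the sleeper re-checks a flag and the remaining time in a loop. The function name sleep_unless_woken and the use of an AtomicBool as the wake-up condition are assumptions made for this example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Sleeps for up to `max`. Returns true if `flag` was set before the
/// deadline, false if the full time elapsed. (Hypothetical helper.)
fn sleep_unless_woken(flag: &AtomicBool, max: Duration) -> bool {
    let deadline = Instant::now() + max;
    loop {
        if flag.load(Ordering::Acquire) {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // May return early (spuriously or via `unpark`), hence the loop.
        thread::park_timeout(deadline - now);
    }
}

fn main() {
    let flag = Arc::new(AtomicBool::new(false));
    let sleeper_flag = Arc::clone(&flag);

    let sleeper = thread::spawn(move || {
        sleep_unless_woken(&sleeper_flag, Duration::from_secs(60 * 20))
    });

    // Let the sleeper park, then wake it long before the 20 minutes are up.
    thread::sleep(Duration::from_millis(50));
    flag.store(true, Ordering::Release);
    sleeper.thread().unpark();

    let woken = sleeper.join().unwrap();
    println!("woken early: {woken}");
}
```

Setting the flag before calling unpark matters: the sleeper then sees the flag as soon as it wakes, whether it was already parked or had not parked yet.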
Here's an example, but what should I wait on to decide when it is done? Is there a better way to wait for the channel to be empty and all the threads to have completed? The full example is at http://github.com/posix4e/rust_webcrawl
loop {
let n_active_threads = running_threads.compare_and_swap(0, 0, Ordering::SeqCst);
match rx.try_recv() {
Ok(new_site) => {
let new_site_copy = new_site.clone();
let tx_copy = tx.clone();
counter += 1;
print!("{} ", counter);
if !found_urls.contains(&new_site) {
found_urls.insert(new_site);
running_threads.fetch_add(1, Ordering::SeqCst);
let my_running_threads = running_threads.clone();
pool.execute(move || {
for new_url in get_websites_helper(new_site_copy) {
if new_url.starts_with("http") {
tx_copy.send(new_url).unwrap();
}
}
my_running_threads.fetch_sub(1, Ordering::SeqCst);
});
}
}
Err(TryRecvError::Empty) if n_active_threads == 0 => break,
Err(TryRecvError::Empty) => {
writeln!(&mut std::io::stderr(),
"Channel is empty, but there are {} threads running",
n_active_threads);
thread::sleep_ms(10);
},
Err(TryRecvError::Disconnected) => unreachable!(),
}
}
This is actually a very complicated question, one with a great potential for race conditions! As I understand it, you:
Have an unbounded queue
Have a set of workers that operate on the queue items
The workers can put an unknown amount of items back into the queue
Want to know when everything is "done"
One obvious issue is that it may never be done. If every worker puts one item back into the queue, you've got an infinite loop.
That being said, I feel like the solution is to track
How many items are queued
How many items are in progress
When both of these values are zero, then you are done. Easier said than done...
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize,Ordering};
use std::sync::mpsc::{channel,TryRecvError};
use std::thread;
fn main() {
let running_threads = Arc::new(AtomicUsize::new(0));
let (tx, rx) = channel();
// We prime the channel with the first bit of work
tx.send(10).unwrap();
loop {
// In an attempt to avoid a race condition, we fetch the
// active thread count before checking the channel. Otherwise,
// we might read nothing from the channel, and *then* a thread
// finishes and adds something to the queue.
let n_active_threads = running_threads.load(Ordering::SeqCst);
match rx.try_recv() {
Ok(id) => {
// I lie a bit and increment the counter to start
// with. If we let the thread increment this, we might
// read from the channel before the thread ever has a
// chance to run!
running_threads.fetch_add(1, Ordering::SeqCst);
let my_tx = tx.clone();
let my_running_threads = running_threads.clone();
// You could use a threadpool, but I'm spawning
// threads to only rely on stdlib.
thread::spawn(move || {
println!("Working on {}", id);
// Simulate work
thread::sleep(std::time::Duration::from_millis(100));
if id != 0 {
my_tx.send(id - 1).unwrap();
// Send multiple sometimes
if id % 3 == 0 && id > 2 {
my_tx.send(id - 2).unwrap();
}
}
my_running_threads.fetch_sub(1, Ordering::SeqCst);
});
},
Err(TryRecvError::Empty) if n_active_threads == 0 => break,
Err(TryRecvError::Empty) => {
println!("Channel is empty, but there are {} threads running", n_active_threads);
// We sleep a bit here, to avoid quickly spinning
// through an empty channel while the worker threads
// work.
thread::sleep(std::time::Duration::from_millis(1));
},
Err(TryRecvError::Disconnected) => unreachable!(),
}
}
}
I make no guarantees that this implementation is perfect (I probably should guarantee that it's broken, because threading is hard). One big caveat is that I don't intimately know the meanings of all the variants of Ordering, so I chose the one that looked to give the strongest guarantees.
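One possible variation on the "queued plus in progress" idea is to fold both counts into a single counter of pending items: an item is counted just before it is sent and uncounted only after its worker has finished, including sending any follow-up items. The sketch below makes that assumption; run_to_completion is a hypothetical name, and the same caveat about threading being hard applies.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Processes `start` and everything it spawns, then returns how many
/// items were handled in total. (Hypothetical helper.)
fn run_to_completion(start: u32) -> usize {
    // Number of items that are either queued or currently being processed.
    let pending = Arc::new(AtomicUsize::new(0));
    let (tx, rx) = channel();

    // Count the first item before sending it.
    pending.fetch_add(1, Ordering::SeqCst);
    tx.send(start).unwrap();

    let mut handled = 0;
    let mut workers = Vec::new();
    loop {
        match rx.recv_timeout(Duration::from_millis(10)) {
            Ok(id) => {
                handled += 1;
                let tx = tx.clone();
                let pending = Arc::clone(&pending);
                workers.push(thread::spawn(move || {
                    if id != 0 {
                        // Count the follow-up item before sending it and
                        // before this item is marked as finished.
                        pending.fetch_add(1, Ordering::SeqCst);
                        tx.send(id - 1).unwrap();
                    }
                    // This item is done only after its follow-ups are queued.
                    pending.fetch_sub(1, Ordering::SeqCst);
                }));
            }
            // Zero pending items means nothing is queued or in progress.
            Err(RecvTimeoutError::Timeout) => {
                if pending.load(Ordering::SeqCst) == 0 {
                    break;
                }
            }
            Err(RecvTimeoutError::Disconnected) => unreachable!("we still hold `tx`"),
        }
    }

    for worker in workers {
        worker.join().unwrap();
    }
    handled
}

fn main() {
    println!("handled {} items", run_to_completion(3));
}
```

Because the counter can only reach zero when no item is in the channel and no worker is mid-flight, the loop does not need to read the counter before the channel, and recv_timeout replaces the manual sleep.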