I'm trying to implement the sieve of Eratosthenes in Rust using coroutines as a learning exercise (not homework), and I can't find any reasonable way of connecting each thread to the Receiver and Sender ends of two different channels.
The Receiver is involved in two distinct tasks, namely reporting the highest prime found so far, and supplying further candidate primes for the filter. This is fundamental to the algorithm.
Here is what I would like to do but can't because the Receiver cannot be transferred between threads. Using std::sync::Arc does not appear to help, unsurprisingly.
Please note that I do understand why this doesn't work
fn main() {
let (basetx, baserx): (Sender<u32>, Receiver<u32>) = channel();
let max_number = 103;
thread::spawn(move|| {
generate_natural_numbers(&basetx, max_number);
});
let oldrx = &baserx;
loop {
// we need the prime in this thread
let prime = match oldrx.recv() {
Ok(num) => num,
Err(_) => { break; 0 }
};
println!("{}",prime);
// create (newtx, newrx) in a deliberately unspecified way
// now we need to pass the receiver off to the sieve thread
thread::spawn(move || {
sieve(oldrx, newtx, prime); // forwards numbers if not divisible by prime
});
oldrx = newrx;
}
}
Equivalent working Go code:
func main() {
channel := make(chan int)
var ok bool = true;
var prime int = 0;
go generate(channel, 103)
for true {
prime, ok = <- channel
if !ok {
break;
}
new_channel := make(chan int)
go sieve(channel, new, prime)
channel = new_channel
fmt.Println(prime)
}
}
What is the best way to deal with a situation like this where a Receiver needs to be handed off to a different thread?
You don't really explain what the problem that you are having, but your code is close enough:
use std::sync::mpsc::{channel, Sender, Receiver};
use std::thread;
fn generate_numbers(tx: Sender<u8>) {
for i in 2..100 {
tx.send(i).unwrap();
}
}
fn filter(rx: Receiver<u8>, tx: Sender<u8>, val: u8) {
for v in rx {
if v % val != 0 {
tx.send(v).unwrap();
}
}
}
fn main() {
let (base_tx, base_rx) = channel();
thread::spawn(move || {
generate_numbers(base_tx);
});
let mut old_rx = base_rx;
loop {
let num = match old_rx.recv() {
Ok(v) => v,
Err(_) => break,
};
println!("prime: {}", num);
let (new_tx, new_rx) = channel();
thread::spawn(move || {
filter(old_rx, new_tx, num);
});
old_rx = new_rx;
}
}
using coroutines
Danger, Danger, Will Robinson! These are not coroutines; they are full-fledged threads! These are a lot more heavyweight compared to a coroutine.
What is the best way to deal with a situation like this where a Receiver needs to be handed off to a different thread?
Just... transfer ownership of the Receiver to the thread?
Related
I'm trying to do some parallel processing on a list of values:
fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
let chunk_size = 100;
let output_list = vec![0.0f32;list.len()];
thread::scope(|s| { // Block S
(0..list.len()).collect::<Vec<_>>().chunks(chunk_size).for_each(|chunk| { // Block T
s.spawn(|| {
chunk.into_iter().for_each(|&idx| {
let value = calc_value(list[idx]);
unsafe {
let out = (output_list.as_ptr() as *mut f32).offset(idx as isize);
*out = value;
}
});
});
});
});
output_list
}
The API says that thread::scope() only returns once each thread spawned by the scope it creates has finished. However, the compiler is telling me that the temporary range object (0..list.len()) is deconstructed while the threads that use it might still be alive.
I'm curious about what's actually happening under the hood. My intuition tells me that each thread spawned and variable created within Block S would both have Block S's lifetime. But clearly the threads have a lifetime longer than Block S.
Why aren't these lifetimes be the same?
Is the best practice here to create a variable in Block F that serves the purpose of the temporary like so:
fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
let chunk_size = 100;
let output_list = vec![0.0f32;list.len()];
let range = (0..list.len()).collect::<Vec<_>>();
thread::scope(|s| { // Block S
range.chunks(chunk_size).for_each(|chunk| { // Block T
s.spawn(|| {
chunk.into_iter().for_each(|&idx| {
let value = calc_value(list[idx]);
unsafe {
let out = (output_list.as_ptr() as *mut f32).offset(idx as isize);
*out = value;
}
});
});
});
});
output_list
}
You don't have to ask SO to find out what happens under the hood of std functions, their source code is readily available, but since you asked I'll try to explain a little more in the comments.
pub fn scope<'env, F, T>(f: F) -> T
where
F: for<'scope> FnOnce(&'scope Scope<'scope, 'env>) -> T,
{
// `Scope` creation not very relevant to the issue at hand
let scope = ...;
// here your 'Block S' gets run and returns.
let result = catch_unwind(AssertUnwindSafe(|| f(&scope)));
// the above is just a fancy way of calling `f` while catching any panics.
// but we're waiting for threads to finish running until here
while scope.data.num_running_threads.load(Ordering::Acquire) != 0 {
park();
}
// further not so relevant cleanup code
//...
}
So as you can see your assumption that 'Block S' will stick around as long as any of the threads is wrong.
And yes the solution is to capture the owner of the chunks before you call thread::scope.
There is also no reason to dive into unsafe for your example, you can use zip instead:
fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
let chunk_size = 100;
let mut output_list = vec![0.0f32; list.len()];
let mut zipped = list
.into_iter()
.zip(output_list.iter_mut())
.collect::<Vec<_>>();
thread::scope(|s| { // Block S
zipped.chunks_mut(chunk_size).for_each(|chunk| { // Block T
s.spawn(|| {
chunk.into_iter().for_each(|(v, out)| {
let value = calc_value(*v);
**out = value;
});
});
});
});
output_list
}
The test code below considers a situation in which there are three different threads.
Each thread has to do certain asynchronous tasks, that may take a certain time to finish.
This is "simulated" in the code below with a sleep.
On top of that, two of the threads collect information that they have to send to the third one for further processing. This is done using mpsc channels.
Due to the fact that there are out of our control information obtained from outside of the Rust application, the threads may get interrupted. This is emulated by generating a random number, and the loop on each thread breaks when that happens.
What I'm trying to achieve is a system in which whenever one of the threads has an error (simulated with the random number = 9), every other thread is cancelled too.
`
use std::sync::mpsc::channel;
use std::sync::mpsc::{Sender, Receiver, TryRecvError};
use std::thread::sleep;
use tokio::time::Duration;
use rand::distributions::{Uniform, Distribution};
#[tokio::main]
async fn main() {
execution_cycle().await;
}
async fn execution_cycle() {
let (tx_first, rx_first) = channel::<Message>();
let (tx_second, rx_second) = channel::<Message>();
let handle_sender_first = tokio::spawn(sender_thread(tx_first));
let handle_sender_second = tokio::spawn(sender_thread(tx_second));
let handle_receiver = tokio::spawn(receiver_thread(rx_first, rx_second));
let mut thread_rng = rand::thread_rng();
let rng_generator = Uniform::from(1..10);
let mut cancel_from_cycle = rng_generator.sample(&mut thread_rng);
while !&handle_sender_first.is_finished() && !&handle_sender_second.is_finished() && !&handle_receiver.is_finished() {
cancel_from_cycle = rng_generator.sample(&mut thread_rng);
if (cancel_from_cycle == 9) {
println!("Aborting from the execution cycle.");
handle_receiver.abort();
handle_sender_first.abort();
handle_sender_second.abort();
}
}
if handle_sender_first.is_finished() {
println!("handle_sender_first finished.");
} else {
println!("handle_sender_first ongoing.");
}
if handle_sender_second.is_finished() {
println!("handle_sender_second finished.");
} else {
println!("handle_sender_second ongoing.");
}
if handle_receiver.is_finished() {
println!("handle_receiver finished.");
} else {
println!("handle_receiver ongoing.");
}
}
async fn sender_thread(tx: Sender<Message>) {
let mut thread_rng = rand::thread_rng();
let rng_generator = Uniform::from(1..20);
let mut random_id = rng_generator.sample(&mut thread_rng);
while random_id != 9 {
let msg = Message {
id: random_id,
text: "hello".to_owned()
};
println!("Sending message {}.", msg.id);
random_id = rng_generator.sample(&mut thread_rng);
println!("Generated id {}.", random_id);
let result = tx.send(msg);
match result {
Ok(res) => {},
Err(error) => {
println!("Sending error {:?}", error);
random_id = 9;
}
}
sleep(Duration::from_millis(2000));
}
}
async fn receiver_thread(rx_first: Receiver<Message>, rx_second: Receiver<Message>) {
let mut channel_open_first = true;
let mut channel_open_second = true;
let mut thread_rng = rand::thread_rng();
let rng_generator = Uniform::from(1..15);
let mut random_event = rng_generator.sample(&mut thread_rng);
while channel_open_first && channel_open_second && random_event != 9 {
channel_open_first = receiver_inner(&rx_first);
channel_open_second = receiver_inner(&rx_second);
random_event = rng_generator.sample(&mut thread_rng);
println!("Generated event {}.", random_event);
sleep(Duration::from_millis(800));
}
}
fn receiver_inner(rx: &Receiver<Message>) -> bool {
let value = rx.try_recv();
match value {
Ok(msg) => {
println!("Message {} received: {}", msg.id, msg.text);
},
Err(error) => {
if error != TryRecvError::Empty {
println!("{}", error);
return false;
} else { /* Channel is empty.*/ }
}
}
return true;
}
struct Message {
id: usize,
text: String,
}
`
In the working example here, it does exactly that, however, it does it only from inside the threads, and I would like to add a "kill switch" in the execution_cycle() method, allowing to cancel all the three threads when a certain event takes place (the random number cancel_from_cycle == 9), and do that in the most simple way possible... I tried drop(handler_sender), and also panic!() from the execution_cycle() but the spawn threads keep running, preventing the application to finish. I also tried handle_receiver().abort() without success.
How can I achieve the wished result?
Given several threads that complete with an Output value, how do I get the first Output that's produced? Ideally while still being able to get the remaining Outputs later in the order they're produced, and bearing in mind that some threads may or may not terminate.
Example:
struct Output(i32);
fn main() {
let mut spawned_threads = Vec::new();
for i in 0..10 {
let join_handle: ::std::thread::JoinHandle<Output> = ::std::thread::spawn(move || {
// pretend to do some work that takes some amount of time
::std::thread::sleep(::std::time::Duration::from_millis(
(1000 - (100 * i)) as u64,
));
Output(i) // then pretend to return the `Output` of that work
});
spawned_threads.push(join_handle);
}
// I can do this to wait for each thread to finish and collect all `Output`s
let outputs_in_order_of_thread_spawning = spawned_threads
.into_iter()
.map(::std::thread::JoinHandle::join)
.collect::<Vec<::std::thread::Result<Output>>>();
// but how would I get the `Output`s in order of completed threads?
}
I could solve the problem myself using a shared queue/channels/similar, but are there built-in APIs or existing libraries which could solve this use case for me more elegantly?
I'm looking for an API like:
fn race_threads<A: Send>(
threads: Vec<::std::thread::JoinHandle<A>>
) -> (::std::thread::Result<A>, Vec<::std::thread::JoinHandle<A>>) {
unimplemented!("so far this doesn't seem to exist")
}
(Rayon's join is the closest I could find, but a) it only races 2 closures rather than an arbitrary number of closures, and b) the thread pool w/ work stealing approach doesn't make sense for my use case of having some closures that might run forever.)
It is possible to solve this use case using pointers from How to check if a thread has finished in Rust? just like it's possible to solve this use case using an MPSC channel, however here I'm after a clean API to race n threads (or failing that, n closures on n threads).
These problems can be solved by using a condition variable:
use std::sync::{Arc, Condvar, Mutex};
#[derive(Debug)]
struct Output(i32);
enum State {
Starting,
Joinable,
Joined,
}
fn main() {
let pair = Arc::new((Mutex::new(Vec::new()), Condvar::new()));
let mut spawned_threads = Vec::new();
let &(ref lock, ref cvar) = &*pair;
for i in 0..10 {
let my_pair = pair.clone();
let join_handle: ::std::thread::JoinHandle<Output> = ::std::thread::spawn(move || {
// pretend to do some work that takes some amount of time
::std::thread::sleep(::std::time::Duration::from_millis(
(1000 - (100 * i)) as u64,
));
let &(ref lock, ref cvar) = &*my_pair;
let mut joinable = lock.lock().unwrap();
joinable[i] = State::Joinable;
cvar.notify_one();
Output(i as i32) // then pretend to return the `Output` of that work
});
lock.lock().unwrap().push(State::Starting);
spawned_threads.push(Some(join_handle));
}
let mut should_stop = false;
while !should_stop {
let locked = lock.lock().unwrap();
let mut locked = cvar.wait(locked).unwrap();
should_stop = true;
for (i, state) in locked.iter_mut().enumerate() {
match *state {
State::Starting => {
should_stop = false;
}
State::Joinable => {
*state = State::Joined;
println!("{:?}", spawned_threads[i].take().unwrap().join());
}
State::Joined => (),
}
}
}
}
(playground link)
I'm not claiming this is the simplest way to do it. The condition variable will awake the main thread every time a child thread is done. The list can show the state of each thread, if one is (about to) finish, it can be joined.
No, there is no such API.
You've already been presented with multiple options to solve your problem:
Use channels
Use a CondVar
Use futures
Sometimes when programming, you have to go beyond sticking pre-made blocks together. This is supposed to be a fun part of programming. I encourage you to embrace it. Go create your ideal API using the components available and publish it to crates.io.
I really don't see what's so terrible about the channels version:
use std::{sync::mpsc, thread, time::Duration};
#[derive(Debug)]
struct Output(i32);
fn main() {
let (tx, rx) = mpsc::channel();
for i in 0..10 {
let tx = tx.clone();
thread::spawn(move || {
thread::sleep(Duration::from_millis((1000 - (100 * i)) as u64));
tx.send(Output(i)).unwrap();
});
}
// Don't hold on to the sender ourselves
// Otherwise the loop would never terminate
drop(tx);
for r in rx {
println!("{:?}", r);
}
}
I have a Rust application on on OSX firing up a large amount of threads as can be seen in the code below, however, after looking at how many max threads my version of OSX is allowed to create via the sysctl kern.num_taskthreads command, I can see that it is kern.num_taskthreads: 2048 which explains why I can't spin up over 2048 threads.
How do I go about getting past this hard limit?
let threads = 300000;
let requests = 1;
for _x in 0..threads {
println!("{}", _x);
let request_clone = request.clone();
let handle = thread::spawn(move || {
for _y in 0..requests {
request_clone.lock().unwrap().push((request::Request::new(request::Request::create_request())));
}
});
child_threads.push(handle);
}
Before starting, I'd encourage you to read about the C10K problem. When you get into this scale, there's a lot more things you need to keep in mind.
That being said, I'd suggest looking at mio...
a lightweight IO library for Rust with a focus on adding as little overhead as possible over the OS abstractions.
Specifically, mio provides an event loop, which allows you to handle a large number of connections without spawning threads. Unfortunately, I don't know of a HTTP library that currently supports mio. You could create one and be a hero to the Rust community!
Not sure how helpful this will be, but I was trying to create a small pool of threads that will create connections and then send them over to an event loop via a channel for reading.
I'm sure this code is probably pretty bad, but here it is anyways for examples. It uses the Hyper library, like you mentioned.
extern crate hyper;
use std::io::Read;
use std::thread;
use std::thread::{JoinHandle};
use std::sync::{Arc, Mutex};
use std::sync::mpsc::channel;
use hyper::Client;
use hyper::client::Response;
use hyper::header::Connection;
const TARGET: i32 = 100;
const THREADS: i32 = 10;
struct ResponseWithString {
index: i32,
response: Response,
data: Vec<u8>,
complete: bool
}
fn main() {
// Create a client.
let url: &'static str = "http://www.gooogle.com/";
let mut threads = Vec::<JoinHandle<()>>::with_capacity((TARGET * 2) as usize);
let conn_count = Arc::new(Mutex::new(0));
let (tx, rx) = channel::<ResponseWithString>();
for _ in 0..THREADS {
// Move var references into thread context
let conn_count = conn_count.clone();
let tx = tx.clone();
let t = thread::spawn(move || {
loop {
let idx: i32;
{
// Lock, increment, and release
let mut count = conn_count.lock().unwrap();
*count += 1;
idx = *count;
}
if idx > TARGET {
break;
}
let mut client = Client::new();
// Creating an outgoing request.
println!("Creating connection {}...", idx);
let res = client.get(url) // Get URL...
.header(Connection::close()) // Set headers...
.send().unwrap(); // Fire!
println!("Pushing response {}...", idx);
tx.send(ResponseWithString {
index: idx,
response: res,
data: Vec::<u8>::with_capacity(1024),
complete: false
}).unwrap();
}
});
threads.push(t);
}
let mut responses = Vec::<ResponseWithString>::with_capacity(TARGET as usize);
let mut buf: [u8; 1024] = [0; 1024];
let mut completed_count = 0;
loop {
if completed_count >= TARGET {
break; // No more work!
}
match rx.try_recv() {
Ok(r) => {
println!("Incoming response! {}", r.index);
responses.push(r)
},
_ => { }
}
for r in &mut responses {
if r.complete {
continue;
}
// Read the Response.
let res = &mut r.response;
let data = &mut r.data;
let idx = &r.index;
match res.read(&mut buf) {
Ok(i) => {
if i == 0 {
println!("No more data! {}", idx);
r.complete = true;
completed_count += 1;
}
else {
println!("Got data! {} => {}", idx, i);
for x in 0..i {
data.push(buf[x]);
}
}
}
Err(e) => {
panic!("Oh no! {} {}", idx, e);
}
}
}
}
}
use std::iter;
fn worker_sum(from: u64, to: u64) -> u64 {
range(from, to).fold(0u64, |sum, x| sum + x)
}
fn main() {
let max = 5u64;
let step = 2u64;
let (sender, receiver) = channel::<u64>();
for x in iter::range_step_inclusive(0u64, max, step) {
let end = if x + step > max { max } else { x + step };
//println!("{} -> {} = {}", x, end, worker_sum(x, end));
let local_sender = sender.clone();
spawn(proc(){
local_sender.send(worker_sum(x, end));
});
}
loop {
match receiver.try_recv() {
Ok(x) => println!("{}", x),
Err(_) => break,
}
}
}
I get the error:
task '' failed at 'sending on a closed channel', /home/rustbuild/src/rust-buildbot/slave/nightly-linux/build/src/libsync/comm/mod.rs:573
I somehow understand the problem, but how to properly "select" from the channel? The documentation is really sparse, even though I'm using the nightly build, which is said to have improved the docs (since version 0.13).
So my questions are:
How to solve the problem with as little structural changes in the code as possible?
How to make the code idiomatic?
The problem you have here is that the channel becomes closed by the reading task before all data is sent. Your loop is:
loop {
match receiver.try_recv() {
Ok(x) => println!("{}", x),
Err(_) => break,
}
}
In this loop, your receiver breaks as soon as it meets and error. Once the loop is broken, your function will reach the end of its scope and the receiver will be destroyed. Once this is done, any attempt to send more data will fail.
The problem here, is that your receiver gets an Err(Empty) because the senders have not yet sent anything. You must wait for them and only break when meeting an Err(Disconnected)
You need to change your code to something like this (explanations in comments) :
use std::iter;
fn worker_sum(from: u64, to: u64) -> u64 {
range(from, to).fold(0u64, |sum, x| sum + x)
}
fn main() {
let max = 5u64;
let step = 2u64;
let (sender, receiver) = channel::<u64>();
for x in iter::range_step_inclusive(0u64, max, step) {
let end = if x + step > max { max } else { x + step };
// here, each thread will own its own sender, and the channel will
// be closed once all senders are destroyed.
let local_sender = sender.clone();
spawn(proc(){
local_sender.send(worker_sum(x, end));
// Once we reach here, the sender of this task is destroyed.
});
}
// We destroy the sender of the main task,
// because we don't want to wait for it:
// it would deadlock the program
drop(sender);
loop {
match receiver.try_recv() {
Ok(x) => println!("{}", x),
// We break only if the channel is closed,
// it means that all senders are finished.
Err(e) if e == ::std::comm::Disconnected => { break; },
_ => {}
}
}
}