Channels for passing hashmap between threads | stuck in loop | Rust - multithreading

I am solving a problem for the website Exercism in rust, where I basically try to concurrently count how many times different letters occur in some text. I am doing this by passing hashmaps between threads, and somehow am in some kind of infinite loop. I think the issue is in my handling of the receiver, but I really don't know. Please help.
use std::collections::HashMap;
use std::thread;
use std::sync::mpsc;
use std::str;
pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
// Empty case
if input.is_empty() {
return HashMap::new();
}
// Flatten input, set workload for each thread, create hashmap to catch results
let mut flat_input = input.join("");
let workload = input.len() / worker_count;
let mut final_map: HashMap<char, usize> = HashMap::new();
let (tx, rx) = mpsc::channel();
for _i in 0..worker_count {
let task = flat_input.split_off(flat_input.len() - workload);
let tx_clone = mpsc::Sender::clone(&tx);
// Separate threads ---------------------------------------------
thread::spawn(move || {
let mut partial_map: HashMap<char, usize> = HashMap::new();
for letter in task.chars() {
match partial_map.remove(&letter) {
Some(count) => {
partial_map.insert(letter, count + 1);
},
None => {
partial_map.insert(letter, 1);
}
}
}
tx_clone.send(partial_map).expect("Didn't work fool");
});
// --------------------------------------------------
}
// iterate through the returned hashmaps to update the final map
for received in rx {
for (key, value) in received {
match final_map.remove(&key) {
Some(count) => {
final_map.insert(key, count + value);
},
None => {
final_map.insert(key, value);
}
}
}
}
return final_map;
}

Iterating on the receiver rx will block for new messages while senders exist. The ones you've cloned into the threads will drop out of scope when they're done, but you have the original sender tx still in scope.
You can force tx out of scope by dropping it manually:
for _i in 0..worker_count {
...
}
std::mem::drop(tx); // <--------
for received in rx {
...
}

Related

deno_runtime running multiple invokes on single worker concurrently

I'm trying to run multiple invocation of the same script on a single deno MainWorker concurrently, and waiting for their
results (since the scripts can be async). Conceptually, I want something like the loop in run_worker below.
type Tx = Sender<(String, Sender<String>)>;
type Rx = Receiver<(String, Sender<String>)>;
struct Runner {
worker: MainWorker,
futures: FuturesUnordered<Pin<Box<dyn Future<Output=(String, Result<Global<Value>, Error>)>>>>,
response_futures: FuturesUnordered<Pin<Box<dyn Future<Output=(String, Result<(), SendError<String>>)>>>>,
result_senders: HashMap<String, Sender<String>>,
}
impl Runner {
fn new() ...
async fn run_worker(&mut self, rx: &mut Rx, main_module: ModuleSpecifier, user_module: ModuleSpecifier) {
self.worker.execute_main_module(&main_module).await.unwrap();
self.worker.preload_side_module(&user_module).await.unwrap();
loop {
tokio::select! {
msg = rx.recv() => {
if let Some((id, sender)) = msg {
let global = self.worker.js_runtime.execute_script("test", "mod.entry()").unwrap();
self.result_senders.insert(id, sender);
self.futures.push(Box::pin(async {
let resolved = self.worker.js_runtime.resolve_value(global).await;
return (id, resolved);
}));
}
},
script_result = self.futures.next() => {
if let Some((id, out)) = script_result {
self.response_futures.push(Box::pin(async {
let value = deserialize_value(out.unwrap(), &mut self.worker);
let res = self.result_senders.remove(&id).unwrap().send(value).await;
return (id.clone(), res);
}));
}
},
// also handle response_futures here
else => break,
}
}
}
}
The worker can't be borrowed as mutable multiple times, so this won't work. So the worker has to be a RefCell, and
I've created a BorrowingFuture:
struct BorrowingFuture {
worker: RefCell<MainWorker>,
global: Global<Value>,
id: String
}
And its poll implementation:
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
match Pin::new(&mut Box::pin(self.worker.borrow_mut().js_runtime.resolve_value(self.global.clone()))).poll(cx) {
Poll::Ready(result) => Poll::Ready((self.id.clone(), result)),
Poll::Pending => {
cx.waker().clone().wake_by_ref();
Poll::Pending
}
}
}
So the above
self.futures.push(Box::pin(async {
let resolved = self.worker.js_runtime.resolve_value(global).await;
return (id, resolved);
}));
would become
self.futures.push(Box::pin(BorrowingFuture{worker: self.worker, global: global.clone(), id: id.clone()}));
and this would have to be done for the response_futures above as well.
But I see a few issues with this.
Creating a new future on every poll and then polling that seems wrong, but it does work.
It probably has a performance impact because new objects are created constantly.
The same issue would happen for the response futures, which would call send on each poll, which seems completely wrong.
The waker.wake_by_ref is called on every poll, because there is no way to know when a script result
will resolve. This results in the future being polled thousands (and more) times per second (always creating a new object),
which could be the same as checking it in a loop, I guess.
Note My current setup doesn't use select!, but an enum as Output from multiple Future implementations, pushed into
a single FuturesUnordered, and then matched to handle the correct type (script, send, receive). I used select here because
it's far less verbose, and gets the point across.
Is there a way to do this better/more efficiently? Or is it just not the way MainWorker was meant to be used?
main for completeness:
#[tokio::main]
async fn main() {
let main_module = deno_runtime::deno_core::resolve_url(MAIN_MODULE_SPECIFIER).unwrap();
let user_module = deno_runtime::deno_core::resolve_url(USER_MODULE_SPECIFIER).unwrap();
let (tx, mut rx) = channel(1);
let (result_tx, mut result_rx) = channel(1);
let handle = thread::spawn(move || {
let runtime = tokio::runtime::Builder::new_multi_thread().enable_all().build().unwrap();
let mut runner = Runner::new();
runtime.block_on(runner.run_worker(&mut rx, main_module, user_module));
});
tx.send(("test input".to_string(), result_tx)).await.unwrap();
let result = result_rx.recv().await.unwrap();
println!("result from worker {}", result);
handle.join().unwrap();
}

How to pass a value over a channel without borrow checking issues?

In the following code, I understand why I'm not allowed to do this(I think), but I'm not sure what to do to fix the issue. I'm simply trying to perform an action based upon an incoming message on a UDPSocket. However, by sending the reference to the slice over the channel, I get a problem where the buffer doesn't live long enough. I'm hoping for some suggestions because I don't know enough about Rust to move forward.
fn main() -> std::io::Result<()> {
let (tx, rx) = mpsc::channel();
thread::spawn(move || loop {
match rx.try_recv() {
Ok(msg) => {
match msg {
"begin" => // run an operation
"end" | _ => // kill the previous operation
}
}
Err = { //Error Handling }
}
}
// start listener
let socket: UdpSocket = UdpSocket::bind("0.0.0.0:9001")?;
loop {
let mut buffer = [0; 100];
let (length, src_address) = socket.recv_from(&mut buffer)?;
println!("Received message of {} bytes from {}", length, src_address);
let cmd= str::from_utf8(&buffer[0..length]).unwrap(); // <- buffer does not live long enough
println!("Command: {}", cmd);
tx.send(cmd).expect("unable to send message to channel"); // Error goes away if I remove this.
}
}
Generally you should avoid sending non-owned values over a channel since its unlikely that a lifetime would be valid for both the sender and receiver (its possible to do, but you'd have to plan for it).
In this situation, you're trying to share pass &str across the channel but since it just references buffer which isn't guaranteed to exist whenever rx receives it, you get a borrow checking error. You would probably want to convert the &str into an owned String and pass that over the channel:
use std::net::UdpSocket;
use std::sync::mpsc;
fn main() {
let (tx, rx) = mpsc::channel();
std::thread::spawn(move || loop {
match rx.recv().as_deref() {
Ok("begin") => { /* run an operation */ }
Ok("end") => { /* kill the previous operation */ }
Ok(_) => { /* unknown */ }
Err(_) => { break; }
}
});
let socket = UdpSocket::bind("0.0.0.0:9001").unwrap();
loop {
let mut buffer = [0; 100];
let (length, src_address) = socket.recv_from(&mut buffer).unwrap();
let cmd = std::str::from_utf8(&buffer[0..length]).unwrap();
tx.send(cmd.to_owned()).unwrap();
}
}
As proposed in the comments, you can avoid allocating a string if you parse the value into a known value for an enum and send that across the channel instead:
use std::net::UdpSocket;
use std::sync::mpsc;
enum Command {
Begin,
End,
}
fn main() {
let (tx, rx) = mpsc::channel();
std::thread::spawn(move || loop {
match rx.recv() {
Ok(Command::Begin) => { /* run an operation */ }
Ok(Command::End) => { /* kill the previous operation */ }
Err(_) => { break; }
}
});
let socket = UdpSocket::bind("0.0.0.0:9001").unwrap();
loop {
let mut buffer = [0; 100];
let (length, src_address) = socket.recv_from(&mut buffer).unwrap();
let cmd = std::str::from_utf8(&buffer[0..length]).unwrap();
let cmd = match cmd {
"begin" => Command::Begin,
"end" => Command::End,
_ => panic!("unknown command")
};
tx.send(cmd).unwrap();
}
}

How to create a ring communication between threads using mpsc channels?

I want to spawn n threads with the ability to communicate to other threads in a ring topology, e.g. thread 0 can send messages to thread 1, thread 1 to thread 2, etc. and thread n to thread 0.
This is an example of what I want to achieve with n=3:
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;
let (tx0, rx0): (Sender<i32>, Receiver<i32>) = mpsc::channel();
let (tx1, rx1): (Sender<i32>, Receiver<i32>) = mpsc::channel();
let (tx2, rx2): (Sender<i32>, Receiver<i32>) = mpsc::channel();
let child0 = thread::spawn(move || {
tx0.send(0).unwrap();
println!("thread 0 sent: 0");
println!("thread 0 recv: {:?}", rx2.recv().unwrap());
});
let child1 = thread::spawn(move || {
tx1.send(1).unwrap();
println!("thread 1 sent: 1");
println!("thread 1 recv: {:?}", rx0.recv().unwrap());
});
let child2 = thread::spawn(move || {
tx2.send(2).unwrap();
println!("thread 2 sent: 2");
println!("thread 2 recv: {:?}", rx1.recv().unwrap());
});
child0.join();
child1.join();
child2.join();
Here I create channels in a loop, store them in a vector, reorder the senders, store them in a new vector and then spawn threads each with their own Sender-Receiver (tx1/rx0, tx2/rx1, etc.) pair.
const NTHREADS: usize = 8;
// create n channels
let channels: Vec<(Sender<i32>, Receiver<i32>)> =
(0..NTHREADS).into_iter().map(|_| mpsc::channel()).collect();
// switch tupel entries for the senders to create ring topology
let mut channels_ring: Vec<(Sender<i32>, Receiver<i32>)> = (0..NTHREADS)
.into_iter()
.map(|i| {
(
channels[if i < channels.len() - 1 { i + 1 } else { 0 }].0,
channels[i].1,
)
})
.collect();
let mut children = Vec::new();
for i in 0..NTHREADS {
let (tx, rx) = channels_ring.remove(i);
let child = thread::spawn(move || {
tx.send(i).unwrap();
println!("thread {} sent: {}", i, i);
println!("thread {} recv: {:?}", i, rx.recv().unwrap());
});
children.push(child);
}
for child in children {
let _ = child.join();
}
This doesn't work, because Sender cannot be copied to create a new vector.
However, if I use refs (& Sender):
let mut channels_ring: Vec<(&Sender<i32>, Receiver<i32>)> = (0..NTHREADS)
.into_iter()
.map(|i| {
(
&channels[if i < channels.len() - 1 { i + 1 } else { 0 }].0,
channels[i].1,
)
})
.collect();
I cannot spawn the threads, because std::sync::mpsc::Sender<i32> cannot be shared between threads safely.
Senders and Receivers cannot be shared so you need to move them into their respective threads. That means removing them from the Vec or else consuming the Vec while iterating it - the vector is not permitted to be in an invalid state (with holes), even as an intermediate step. Iterating over the vectors with into_iter will achieve that by consuming them.
A little trick you can use to get the the senders and receivers to pair up in a cycle, is to create two vectors; one of senders and one of receivers; and then rotate one so that the same index into each vector will give you the pairs you want.
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;
fn main() {
const NTHREADS: usize = 8;
// create n channels
let (mut senders, receivers): (Vec<Sender<i32>>, Vec<Receiver<i32>>) =
(0..NTHREADS).into_iter().map(|_| mpsc::channel()).unzip();
// move the first sender to the back
senders.rotate_left(1);
let children: Vec<_> = senders
.into_iter()
.zip(receivers.into_iter())
.enumerate()
.map(|(i, (tx, rx))| {
thread::spawn(move || {
tx.send(i as i32).unwrap();
println!("thread {} sent: {}", i, i);
println!("thread {} recv: {:?}", i, rx.recv().unwrap());
})
})
.collect();
for child in children {
let _ = child.join();
}
}
This doesn't work, because Sender cannot be copied to create a new vector. However, if I use refs (& Sender):
While it's true that Sender cannot be copied, it does implement Clone, so you can always clone it manually. But that approach won't work for Receiver, which is not Clone and which you also need to extract from the vector.
The problem with your first code is that you cannot use let foo = vec[i] to move just one value out of a vector of non-Copy values. That would leave the vector in an invalid state, with one element invalid, subsequent access to which would cause undefined behavior. For this to work, Vec would need to track which elements were moved and which not, which would impose a cost on all Vecs. So instead, Vec disallows moving an element out of it, leaving it to the user to track moves.
A simple way to move a value out of Vec is to replace Vec<T> with Vec<Option<T>> and use Option::take. foo = vec[i] is replaced with foo = vec[i].take().unwrap(), which moves the T value from the option in vec[i] (while asserting that it's not None) and leaves None, a valid variant of Option<T>, in the vector. Here is your first attempt modified in that manner (playground):
const NTHREADS: usize = 8;
let channels_ring: Vec<_> = {
let mut channels: Vec<_> = (0..NTHREADS)
.into_iter()
.map(|_| {
let (tx, rx) = mpsc::channel();
(Some(tx), Some(rx))
})
.collect();
(0..NTHREADS)
.into_iter()
.map(|rxpos| {
let txpos = if rxpos < NTHREADS - 1 { rxpos + 1 } else { 0 };
(
channels[txpos].0.take().unwrap(),
channels[rxpos].1.take().unwrap(),
)
})
.collect()
};
let children: Vec<_> = channels_ring
.into_iter()
.enumerate()
.map(|(i, (tx, rx))| {
thread::spawn(move || {
tx.send(i as i32).unwrap();
println!("thread {} sent: {}", i, i);
println!("thread {} recv: {:?}", i, rx.recv().unwrap());
})
})
.collect();
for child in children {
child.join().unwrap();
}

Application on OSX cannot spawn more than 2048 threads

I have a Rust application on on OSX firing up a large amount of threads as can be seen in the code below, however, after looking at how many max threads my version of OSX is allowed to create via the sysctl kern.num_taskthreads command, I can see that it is kern.num_taskthreads: 2048 which explains why I can't spin up over 2048 threads.
How do I go about getting past this hard limit?
let threads = 300000;
let requests = 1;
for _x in 0..threads {
println!("{}", _x);
let request_clone = request.clone();
let handle = thread::spawn(move || {
for _y in 0..requests {
request_clone.lock().unwrap().push((request::Request::new(request::Request::create_request())));
}
});
child_threads.push(handle);
}
Before starting, I'd encourage you to read about the C10K problem. When you get into this scale, there's a lot more things you need to keep in mind.
That being said, I'd suggest looking at mio...
a lightweight IO library for Rust with a focus on adding as little overhead as possible over the OS abstractions.
Specifically, mio provides an event loop, which allows you to handle a large number of connections without spawning threads. Unfortunately, I don't know of a HTTP library that currently supports mio. You could create one and be a hero to the Rust community!
Not sure how helpful this will be, but I was trying to create a small pool of threads that will create connections and then send them over to an event loop via a channel for reading.
I'm sure this code is probably pretty bad, but here it is anyways for examples. It uses the Hyper library, like you mentioned.
extern crate hyper;
use std::io::Read;
use std::thread;
use std::thread::{JoinHandle};
use std::sync::{Arc, Mutex};
use std::sync::mpsc::channel;
use hyper::Client;
use hyper::client::Response;
use hyper::header::Connection;
const TARGET: i32 = 100;
const THREADS: i32 = 10;
struct ResponseWithString {
index: i32,
response: Response,
data: Vec<u8>,
complete: bool
}
fn main() {
// Create a client.
let url: &'static str = "http://www.gooogle.com/";
let mut threads = Vec::<JoinHandle<()>>::with_capacity((TARGET * 2) as usize);
let conn_count = Arc::new(Mutex::new(0));
let (tx, rx) = channel::<ResponseWithString>();
for _ in 0..THREADS {
// Move var references into thread context
let conn_count = conn_count.clone();
let tx = tx.clone();
let t = thread::spawn(move || {
loop {
let idx: i32;
{
// Lock, increment, and release
let mut count = conn_count.lock().unwrap();
*count += 1;
idx = *count;
}
if idx > TARGET {
break;
}
let mut client = Client::new();
// Creating an outgoing request.
println!("Creating connection {}...", idx);
let res = client.get(url) // Get URL...
.header(Connection::close()) // Set headers...
.send().unwrap(); // Fire!
println!("Pushing response {}...", idx);
tx.send(ResponseWithString {
index: idx,
response: res,
data: Vec::<u8>::with_capacity(1024),
complete: false
}).unwrap();
}
});
threads.push(t);
}
let mut responses = Vec::<ResponseWithString>::with_capacity(TARGET as usize);
let mut buf: [u8; 1024] = [0; 1024];
let mut completed_count = 0;
loop {
if completed_count >= TARGET {
break; // No more work!
}
match rx.try_recv() {
Ok(r) => {
println!("Incoming response! {}", r.index);
responses.push(r)
},
_ => { }
}
for r in &mut responses {
if r.complete {
continue;
}
// Read the Response.
let res = &mut r.response;
let data = &mut r.data;
let idx = &r.index;
match res.read(&mut buf) {
Ok(i) => {
if i == 0 {
println!("No more data! {}", idx);
r.complete = true;
completed_count += 1;
}
else {
println!("Got data! {} => {}", idx, i);
for x in 0..i {
data.push(buf[x]);
}
}
}
Err(e) => {
panic!("Oh no! {} {}", idx, e);
}
}
}
}
}

Split/gather pattern for jobs

I have a set of jobs that I am trying to run in parallel. I want to run each task on its own thread and gather the responses on the calling thread.
Some jobs may take much longer than others, so I'd like to start using each result as it comes in, and not have to wait for all jobs to complete.
Here is an attempt:
struct Container<T> {
items : Vec<T>
}
#[derive(Debug)]
struct Item {
x: i32
}
impl Item {
fn foo (&mut self) {
self.x += 1; //consider an expensive mutating computation
}
}
fn main() {
use std;
use std::sync::{Mutex, Arc};
use std::collections::RingBuf;
//set up a container with 2 items
let mut item1 = Item { x: 0};
let mut item2 = Item { x: 1};
let container = Container { items: vec![item1, item2]};
//set a gather system for our results
let ringBuf = Arc::new(Mutex::new(RingBuf::<Item>::new()));
//farm out each job to its own thread...
for item in container.items {
std::thread::Thread::spawn(|| {
item.foo(); //job
ringBuf.lock().unwrap().push_back(item); //push item back to caller
});
}
loop {
let rb = ringBuf.lock().unwrap();
if rb.len() > 0 { //gather results as soon as they are available
println!("{:?}",rb[0]);
rb.pop_front();
}
}
}
For starters, this does not compile due to the impenetrable cannot infer an appropriate lifetime due to conflicting requirements error.
What am I doing wrong and how do I do it right?
You've got a couple compounding issues, but the first one is a misuse / misunderstanding of Arc. You need to give each thread it's own copy of the Arc. Arc itself will make sure that changes are synchronized. The main changes were the addition of .clone() and the move keyword:
for item in container.items {
let mrb = ringBuf.clone();
std::thread::Thread::spawn(move || {
item.foo(); //job
mrb.lock().unwrap().push_back(item); //push item back to caller
});
}
After changing this, you'll run into some simpler errors about forgotten mut qualifiers, and then you hit another problem - you are trying to send mutable references across threads. Your for loop will need to return &mut Item to call foo, but this doesn't match your Vec. Changing it, we can get to something that compiles:
for mut item in container.items.into_iter() {
let mrb = ringBuf.clone();
std::thread::Thread::spawn(move || {
item.foo(); //job
mrb.lock().unwrap().push_back(item); //push item back to caller
});
}
Here, we consume the input vector, moving each of the Items to the worker thread. Unfortunately, this hits the Playpen timeout, so there's probably some deeper issue.
All that being said, I'd highly recommend using channels:
#![feature(std_misc)]
use std::sync::mpsc::channel;
#[derive(Debug)]
struct Item {
x: i32
}
impl Item {
fn foo(&mut self) { self.x += 1; }
}
fn main() {
let items = vec![Item { x: 0 }, Item { x: 1 }];
let rx = {
let (tx, rx) = channel();
for item in items.into_iter() {
let my_tx = tx.clone();
std::thread::Thread::spawn(move || {
let mut item = item;
item.foo();
my_tx.send(item).unwrap();
});
}
rx
};
for item in rx.iter() {
println!("{:?}", item);
}
}
This also times-out in the playpen, but works fine when compiled and run locally.

Resources