How can I send a message to a specific thread? - rust

I need to create some threads where some of them are going to run until their runner variable value has been changed. This is my minimal code.
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
fn main() {
let mut log_runner = Arc::new(Mutex::new(true));
println!("{}", *log_runner.lock().unwrap());
let mut threads = Vec::new();
{
let mut log_runner_ref = Arc::clone(&log_runner);
// log runner thread
let handle = thread::spawn(move || {
while *log_runner_ref.lock().unwrap() == true {
// DO SOME THINGS CONTINUOUSLY
println!("I'm a separate thread!");
}
});
threads.push(handle);
}
// let the main thread to sleep for x time
thread::sleep(Duration::from_millis(1));
// stop the log_runner thread
*log_runner.lock().unwrap() = false;
// join all threads
for handle in threads {
handle.join().unwrap();
println!("Thread joined!");
}
println!("{}", *log_runner.lock().unwrap());
}
It looks like I'm able to set the log_runner_ref in the log runner thread after 1 second to false. Is there a way to mark the treads with some name / ID or something similar and send a message to a specific thread using its specific marker (name / ID)?
If I understand it correctly, then the let (tx, rx) = mpsc::channel(); can be used for sending messages to all the threads simultaneously rather than to a specific one. I could send some identifier with the messages and each thread will be looking for its own identifier for the decision if to act on received message or not, but I would like to avoid the broadcasting effect.

MPSC stands for Multiple Producers, Single Consumer. As such, no, you cannot use that by itself to send a message to all threads, since for that you'd have to be able to duplicate the consumer. There are tools for this, but the choice of them requires a bit more info than just "MPMC" or "SPMC".
Honestly, if you can rely on channels for messaging (there are cases where it'd be a bad idea), you can create a channel per thread, assign the ID outside of the thread, and keep a HashMap instead of a Vec with the IDs associated to the threads. Receiver<T> can be moved into the thread (it implements Send if T implements Send), so you can quite literally move it in.
You then keep the Sender outside and send stuff to it :-)

Related

Why the channel in the example code of tokio::sync::Notify is a mpsc?

I'm learning the synchronizing primitive of tokio. From the example code of Notify, I found it is confused to understand why Channel<T> is mpsc.
use tokio::sync::Notify;
use std::collections::VecDeque;
use std::sync::Mutex;
struct Channel<T> {
values: Mutex<VecDeque<T>>,
notify: Notify,
}
impl<T> Channel<T> {
pub fn send(&self, value: T) {
self.values.lock().unwrap()
.push_back(value);
// Notify the consumer a value is available
self.notify.notify_one();
}
// This is a single-consumer channel, so several concurrent calls to
// `recv` are not allowed.
pub async fn recv(&self) -> T {
loop {
// Drain values
if let Some(value) = self.values.lock().unwrap().pop_front() {
return value;
}
// Wait for values to be available
self.notify.notified().await;
}
}
}
If there are elements in values, the consumer tasks will take it away
If there is no element in values, the consumer tasks will yield until the producer nitify it
But after I writen some test code, I found in no case the consumer will lose the notice from producer.
Could some one give me test code to prove the above Channel<T> fail to work well as a mpmc?
The following code shows why it is unsafe to use the above channel as mpmc.
use std::sync::Arc;
#[tokio::main]
async fn main() {
let mut i = 0;
loop{
let ch = Arc::new(Channel {
values: Mutex::new(VecDeque::new()),
notify: Notify::new(),
});
let mut handles = vec![];
for i in 0..100{
if i % 2 == 1{
for _ in 0..2{
let sender = ch.clone();
tokio::spawn(async move{
sender.send(1);
});
}
}else{
for _ in 0..2{
let receiver = ch.clone();
let handle = tokio::spawn(async move{
receiver.recv().await;
});
handles.push(handle);
}
}
}
futures::future::join_all(handles).await;
i += 1;
println!("No.{i} loop finished.");
}
}
Not running the next loop means that there are consumer tasks not finishing, and consumer tasks miss a notify.
Quote from the documentation you linked:
If you have two calls to recv and two calls to send in parallel, the following could happen:
Both calls to try_recv return None.
Both new elements are added to the vector.
The notify_one method is called twice, adding only a single permit to the Notify.
Both calls to recv reach the Notified future. One of them consumes the permit, and the other sleeps forever.
Replace try_recv with self.values.lock().unwrap().pop_front() in our case; the rest of the explanation stays identical.
The third point is the important one: Multiple calls to notify_one only result in a single token if no thread is waiting yet. And there is a short time window where it is possible that multiple threads already checked for the existance of an item but aren't waiting yet.

How can I move the data between threads safely?

I'm currently trying to call a function to which I pass multiple file names and expect the function to read the files and generate the appropriate structs and return them in a Vec<Audit>. I've been able to accomplish it reading the files one by one but I want to achieve it using threads.
This is the function:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut audits = Arc::new(Mutex::new(vec![]));
let mut handlers = vec![];
for file in files {
let audits = Arc::clone(&audits);
handlers.push(thread::spawn(move || {
let mut audits = audits.lock().unwrap();
audits.push(audit_from_xml_file(file.clone()));
audits
}));
}
for handle in handlers {
let _ = handle.join();
}
audits
.lock()
.unwrap()
.into_iter()
.fold(vec![], |mut result, audit| {
result.push(audit);
result
})
}
But it won't compile due to the following error:
error[E0277]: `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
--> src/main.rs:82:23
|
82 | handlers.push(thread::spawn(move || {
| ^^^^^^^^^^^^^ `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
|
::: /home/enthys/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:618:8
I have tried wrapping the generated Audit structs in Some(Audit) to avoid the MutexGuard but then I stumble with Poisonned Thread issues.
The cause of the error is that after after pushing the new Audit into the (locked) audits vec you then try to return the vec's MutexGuard.
In Rust, a thread's function can actually return values, the point of doing that is to send the value back to whoever is join-ing the thread. This means the value is going to move between threads, so the value needs to be movable betweem threads (aka Send), which mutex guards have no reason to be[0].
The easy solution is to just... not do that. Just delete the last line of the spawn function. Though it's not like the code works after that as you still have borrowing issue related to the thing at the end.
An alternative is to lean into the feature (especially if Audit objects are not too big): drop the audits vec entirely and instead have each thread return its audit, then collect from the handlers when you join them:
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers.into_iter()
.map(|handler| handler.join().unwrap())
.collect()
}
Though at that point you might as well just let Rayon handle it:
use rayon::prelude::*;
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
files.into_par_iter().map(audit_from_xml_file).collect()
}
That also avoids crashing the program or bringing the machine to its knees if you happen to have millions of files.
[0] and all the reasons not to be, locking on one thread and unlocking on an other is not necessarily supported e.g. ReleaseMutex
The ReleaseMutex function fails if the calling thread does not own the mutex object.
(NB: in the windows lingo, "owning" a mutex means having acquired it via WaitForSingleObject, which translates to lock in posix lingo)
and can be plain UB e.g. pthread_mutex_unlock
If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, undefined behavior results.
Your problem is that you are passing your Vec<Audit> (or more precisely the MutexGuard<Vec<Audit>>), to the threads and back again, without really needing it.
And you don't need Mutex or Arc for this simpler task:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers
.into_iter()
.flat_map(|x| x.join())
.collect()
}

Using thread unsafe values in an async block

In this code snippet (playground link), we have some simple communication between two threads. The main thread (which executes the second async block) sends 2 to thread 2 in the async move block, which receives it, adds its own value, and sends the result back over another channel to the main thread, which prints the value.
Thread 2 contains some local state, the thread_unsafe variable, which is neither Send nor Sync, and is maintained across an .await. Therefore the impl Future object that we are creating is itself neither Send nor Sync, and hence the call to pool.spawn_ok is a compile error.
However, this seems like it should be fine. I understand why spawn_ok() can't accept a future that is not Send, and I also understand why the compilation of the async block into a state machine results in a struct that contains a non-Send value, but in this example the only thing I want to send to the other thread is recv and send2. How do I express that the future should switch to non-thread safe mode only after it has been sent?
use std::rc::Rc;
use std::cell::RefCell;
use futures::channel::oneshot::channel;
use futures::executor::{ThreadPool, block_on};
fn main() {
let pool = ThreadPool::new().unwrap();
let (send, recv) = channel();
let (send2, recv2) = channel();
pool.spawn_ok(async move {
let thread_unsafe = Rc::new(RefCell::new(40));
let a = recv.await.unwrap();
send2.send(a + *thread_unsafe.borrow()).unwrap();
});
let r = block_on(async {
send.send(2).unwrap();
recv2.await.unwrap()
});
println!("the answer is {}", r)
}
but in this example the only thing I want to send to the other thread is recv and send2
There is also the local variable thread_unsafe which is used across an .await. Since .await can suspend an async function, and later resume it on another thread, this could send thread_unsafe to a different thread, which is not allowed.

What are idiomatic ways to send data between threads?

I want to do some calculation in a separate thread, and then recover the data from the main thread. What are the canonical ways to pass some data from a thread to another in Rust?
fn main() {
let handle = std::thread::spawn(|| {
// I want to send this to the main thread:
String::from("Hello world!")
});
// How to recover the data from the other thread?
handle.join().unwrap();
}
There are lots of ways to send send data between threads -- without a clear "best" solution. It depends on your situation.
Using just thread::join
Many people do not realize that you can very easily send data with only the thread API, but only twice: once to the new thread and once back.
use std::thread;
let data_in = String::from("lots of data");
let handle = thread::spawn(move || {
println!("{}", data_in); // we can use the data here!
let data_out = heavy_compuations();
data_out // <-- simply return the data from the closure
});
let data_out = handle.join().expect("thread panicked :(");
println!("{}", data_out); // we can use the data generated in the thread here!
(Playground)
This is immensely useful for threads that are just spawned to do one specific job. Note the move keyword before the closure that makes sure all referenced variables are moved into the closure (which is then moved to another thread).
Channels from std
The standard library offers a multi producer single consumer channel in std::sync::mpsc. You can send arbitrarily many values through a channel, so it can be used in more situations. Simple example:
use std::{
sync::mpsc::channel,
thread,
time::Duration,
};
let (sender, receiver) = channel();
thread::spawn(move || {
sender.send("heavy computation 1").expect("receiver hung up :(");
thread::sleep(Duration::from_millis(500));
sender.send("heavy computation 2").expect("receiver hung up :(");
});
let result1 = receiver.recv().unwrap();
let result2 = receiver.recv().unwrap();
(Playground)
Of course you can create another channel to provide communication in the other direction as well.
More powerful channels by crossbeam
Unfortunately, the standard library currently only provides channels that are restricted to a single consumer (i.e. Receiver can't be cloned). To get more powerful channels, you probably want to use the channels from the awesome crossbeam library. Their description:
This crate is an alternative to std::sync::mpsc with more features and better performance.
In particular, it is a mpmc (multi consumer!) channel. This provides a nice way to easily share work between multiple threads. Example:
use std::thread;
// You might want to use a bounded channel instead...
let (sender, receiver) = crossbeam_channel::unbounded();
for _ in 0..num_cpus::get() {
let receiver = receiver.clone(); // clone for this thread
thread::spawn(move || {
for job in receiver {
// process job
}
});
}
// Generate jobs
for x in 0..10_000 {
sender.send(x).expect("all threads hung up :(");
}
(Playground)
Again, adding another channel allows you to communicate results back to the main thread.
Other methods
There are plenty of other crates that offer some other means of sending data between threads. Too many to list them here.
Note that sending data is not the only way to communicate between threads. There is also the possibility to share data between threads via Mutex, atomics, lock-free data structures and many other ways. This is conceptually very different. It depends on the situation whether sending or sharing data is the better way to describe your cross thread communication.
The idiomatic way to do so is to use a channel. It conceptually behaves like an unidirectional tunnel: you put something in one end and it comes out the other side.
use std::sync::mpsc::channel;
fn main() {
let (sender, receiver) = channel();
let handle = std::thread::spawn(move || {
sender.send(String::from("Hello world!")).unwrap();
});
let data = receiver.recv().unwrap();
println!("Got {:?}", data);
handle.join().unwrap();
}
The channel won't work anymore when the receiver is dropped.
They are mainly 3 ways to recover the data:
recv will block until something is received
try_recv will return immediately. If the channel is not closed, it is either Ok(data) or Err(TryRevcError::Empty).
recv_timeout is the same as try_recv but it waits to get a data a certain amount of time.

Is it possible to share a HashMap between threads without locking the entire HashMap?

I would like to have a shared struct between threads. The struct has many fields that are never modified and a HashMap, which is. I don't want to lock the whole HashMap for a single update/remove, so my HashMap looks something like HashMap<u8, Mutex<u8>>. This works, but it makes no sense since the thread will lock the whole map anyways.
Here's this working version, without threads; I don't think that's necessary for the example.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
fn main() {
let s = Arc::new(Mutex::new(S::new()));
let z = s.clone();
let _ = z.lock().unwrap();
}
struct S {
x: HashMap<u8, Mutex<u8>>, // other non-mutable fields
}
impl S {
pub fn new() -> S {
S {
x: HashMap::default(),
}
}
}
Playground
Is this possible in any way? Is there something obvious I missed in the documentation?
I've been trying to get this working, but I'm not sure how. Basically every example I see there's always a Mutex (or RwLock, or something like that) guarding the inner value.
I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures; what should happen if multiple threads need to insert new values that hash to the same location?
In previous work, I've used a RwLock<HashMap<K, Mutex<V>>>. When inserting a value into the hash, you get an exclusive lock for a short period. The rest of the time, you can have multiple threads with reader locks to the HashMap and thus to a given element. If they need to mutate the data, they can get exclusive access to the Mutex.
Here's an example:
use std::{
collections::HashMap,
sync::{Arc, Mutex, RwLock},
thread,
time::Duration,
};
fn main() {
let data = Arc::new(RwLock::new(HashMap::new()));
let threads: Vec<_> = (0..10)
.map(|i| {
let data = Arc::clone(&data);
thread::spawn(move || worker_thread(i, data))
})
.collect();
for t in threads {
t.join().expect("Thread panicked");
}
println!("{:?}", data);
}
fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {
loop {
// Assume that the element already exists
let map = data.read().expect("RwLock poisoned");
if let Some(element) = map.get(&id) {
let mut element = element.lock().expect("Mutex poisoned");
// Perform our normal work updating a specific element.
// The entire HashMap only has a read lock, which
// means that other threads can access it.
*element += 1;
thread::sleep(Duration::from_secs(1));
return;
}
// If we got this far, the element doesn't exist
// Get rid of our read lock and switch to a write lock
// You want to minimize the time we hold the writer lock
drop(map);
let mut map = data.write().expect("RwLock poisoned");
// We use HashMap::entry to handle the case where another thread
// inserted the same key while where were unlocked.
thread::sleep(Duration::from_millis(50));
map.entry(id).or_insert_with(|| Mutex::new(0));
// Let the loop start us over to try again
}
}
This takes about 2.7 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.
This solution isn't without issues, however. When there's a huge amount of contention for that one master lock, getting a write lock can take a while and completely kills parallelism.
In that case, you can switch to a RwLock<HashMap<K, Arc<Mutex<V>>>>. Once you have a read or write lock, you can then clone the Arc of the value, returning it and unlocking the hashmap.
The next step up would be to use a crate like arc-swap, which says:
Then one would lock, clone the [RwLock<Arc<T>>] and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc) which makes it relatively slow. Depending on the implementation, an update may be blocked for arbitrary long time by a steady inflow of readers.
The ArcSwap can be used instead, which solves the above problems and has better performance characteristics than the RwLock, both in contended and non-contended scenarios.
I often advocate for performing some kind of smarter algorithm. For example, you could spin up N threads each with their own HashMap. You then shard work among them. For the simple example above, you could use id % N_THREADS, for example. There are also complicated sharding schemes that depend on your data.
As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.
Suppose the key of the data is map-able to a u8
You can have Arc<HashMap<u8,Mutex<HashMap<Key,Value>>>
When you initialize the data structure you populate all the first level map before putting it in Arc (it will be immutable after initialization)
When you want a value from the map you will need to do a double get, something like:
data.get(&map_to_u8(&key)).unwrap().lock().expect("poison").get(&key)
where the unwrap is safe because we initialized the first map with all the value.
to write in the map something like:
data.get(&map_to_u8(id)).unwrap().lock().expect("poison").entry(id).or_insert_with(|| value);
It's easy to see contention is reduced because we now have 256 Mutex and the probability of multiple threads asking the same Mutex is low.
#Shepmaster example with 100 threads takes about 10s on my machine, the following example takes a little more than 1 second.
use std::{
collections::HashMap,
sync::{Arc, Mutex, RwLock},
thread,
time::Duration,
};
fn main() {
let mut inner = HashMap::new( );
for i in 0..=u8::max_value() {
inner.insert(i, Mutex::new(HashMap::new()));
}
let data = Arc::new(inner);
let threads: Vec<_> = (0..100)
.map(|i| {
let data = Arc::clone(&data);
thread::spawn(move || worker_thread(i, data))
})
.collect();
for t in threads {
t.join().expect("Thread panicked");
}
println!("{:?}", data);
}
fn worker_thread(id: u8, data: Arc<HashMap<u8,Mutex<HashMap<u8,Mutex<i32>>>>> ) {
loop {
// first unwrap is safe to unwrap because we populated for every `u8`
if let Some(element) = data.get(&id).unwrap().lock().expect("poison").get(&id) {
let mut element = element.lock().expect("Mutex poisoned");
// Perform our normal work updating a specific element.
// The entire HashMap only has a read lock, which
// means that other threads can access it.
*element += 1;
thread::sleep(Duration::from_secs(1));
return;
}
// If we got this far, the element doesn't exist
// Get rid of our read lock and switch to a write lock
// You want to minimize the time we hold the writer lock
// We use HashMap::entry to handle the case where another thread
// inserted the same key while where were unlocked.
thread::sleep(Duration::from_millis(50));
data.get(&id).unwrap().lock().expect("poison").entry(id).or_insert_with(|| Mutex::new(0));
// Let the loop start us over to try again
}
}
Maybe you want to consider evmap:
A lock-free, eventually consistent, concurrent multi-value map.
The trade-off is eventual-consistency: Readers do not see changes until the writer refreshes the map. A refresh is atomic and the writer decides when to do it and expose new data to the readers.

Resources