Rust deadlock with shared struct: Arc + channel + atomic - rust

I'm new to Rust and was trying to generate plenty of JSON data on the fly for a project, but I'm having deadlocks.
I've tried removing the serialization (json_serde) and sending the HashMaps in the channel instead but I still get deadlocks on my computer. If I however comment the send(generator.next()) line and send a string myself, code works flawlessly, thus the deadlock is caused by my DatasetGenerator, but I don't understand why.
Code summary:
Have a DatasetGenerator object that can generate sequences of "events" and serialize them to JSON.
generator.next() works like an "iterator" - It increments an internal atomic counter in the generator and then generates the i-th item in the sequence + serializes the JSON.
Have a generator threadpool generate these JSONs at high throughput (very large payloads each)
Send these JSONs through a channel to other thread (which will send them through network but irrelevant for this question)
Depending if I comment tx_ref.send(generator_ref.next()) or tx_ref.send(some_new_string) below my code deadlocks or succeeds:
src/main.rs:
extern crate threads_pool;
use threads_pool::*;
mod generator;
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
fn main() {
// N will be an argument, and a very high number. For tests use this:
const N: i64 = 12; // Increase this if you're not getting the deadlock yet, or run cargo run again until it happens.
let (tx, rx) = mpsc::channel();
let tx_producer = tx.clone();
let producer_thread = thread::spawn(move || {
let pool = ThreadPool::new(4);
let generator = Arc::new(generator::data_generator::DatasetGenerator::new(3000));
for i in 0..N {
println!("Generating #{}", i);
let tx_ref = tx_producer.clone();
let generator_ref = generator.clone();
pool.execute(move || {
////////// v !!!DEADLOCK HERE!!! v //////////
tx_ref.send(generator_ref.next()).expect("tx failed."); // This locks!
//tx_ref.send(format!(" {} ", i)).expect("tx failed."); // This works!
////////// ^ !!!DEADLOCK HERE!!! ^ //////////
})
.unwrap();
}
println!("Generator done!");
});
println!("-» Consumer consuming!");
for j in 0..N {
let s = rx.recv().expect("rx failed");
println!("-» Consumed #{}: {} ... ", j, &s[..10]);
}
println!("Consumer done!!");
producer_thread.join().unwrap();
println!("Success. Exit!");
}
This is my DatasetGenerator which seems to be causing all the trouble (as not using serde but outputting the HashMaps still gives deadlocks). src/generator/dataset_generator.rs:
use serde_json::Value;
use std::collections::HashMap;
use std::sync::atomic;
pub struct DatasetGenerator {
num_features: usize,
pub counter: atomic::AtomicI64,
feature_names: Vec<String>,
}
type Datapoint = HashMap<String, Value>;
type Out = String;
impl DatasetGenerator {
pub fn new(num_features: usize) -> DatasetGenerator {
let mut feature_names = Vec::new();
for i in 0..num_features {
feature_names.push(format!("f_{}", i));
}
DatasetGenerator {
num_features,
counter: atomic::AtomicI64::new(0),
feature_names,
}
}
/// Generates the next item in the sequence (iterator-like).
pub fn next(&self) -> Out {
let value = self.counter.fetch_add(1, atomic::Ordering::SeqCst);
self.gen(value)
}
/// Generates the ith item in the sequence. DEADLOCKS!!! ///////////////////////////
pub fn gen(&self, ith: i64) -> Out {
let mut data = Datapoint::with_capacity(self.num_features);
for f in 0..self.num_features {
let name = self.feature_names.get(f).unwrap();
data.insert(name.to_string(), Value::from(ith));
}
serde_json::json!(data).to_string() // Tried without serialization and still deadlocks!
}
}
Commit with deadlock code is here if you want to try out yourself with cargo run: https://github.com/AlbertoEAF/learn-rust/tree/dc5fa867e5a70b605553ef65796fdc9dd42d38a0/rest-injector
Deadlock on Windows with Rust 1.60.0:
Thank you for the help! it's greatly appreciated :)
Update
I've followed the suggestions from #kmdreko's answer below, and apparently the problem is in the generator: not all the items are generated. Even though pool.execute() is called N times, only a random number of closures c < N are executed even if I place pool.close() before leaving the producer_thread. Why does that happen / How can it be fixed?
Fix: Turns out this lockup is caused by the threads_pool library (0.2.6). I switched the thread pool to rayon's and it worked smoothly at the first try.

One thing you should change: an mpsc::Receiver will return an error on .recv() if it cannot possibly yield a result by realizing that all the associated mpsc::Senders have dropped, which is a good indicator that all the work is done. Your tx_refs and even tx_producer will be dropped when their respective tasks/threads complete, however you still have tx in scope that can theoretically give a value. This is what gives you the apparent deadlock. You should simply remove tx_producer and use tx directly so it is moved into the producer thread and dropped accordingly.
Now, you'll see either all N tasks complete, or you'll get an error indicating that some tasks did not complete. The reason not all tasks are completing is because you're creating the thread pool, spawning all the tasks, and then immediately destroying it. The threads_pool documentation says that the threads will finish their current job when the pool is destroyed, but you want to wait until all jobs have completed. For that you need to call the .close() method provided by the PoolManager trait before the end of the closure.
The reason you saw inconsistent behavior, but was benefited by returning a string directly is because the jobs required less work and the threads could get away with completing all them before they saw their signal to exit. Your generator_ref.next() requires much more computation so its not surprising they'd only process 4-plus-a-bit jobs before they see they've been told to exit.

Related

Why the channel in the example code of tokio::sync::Notify is a mpsc?

I'm learning the synchronizing primitive of tokio. From the example code of Notify, I found it is confused to understand why Channel<T> is mpsc.
use tokio::sync::Notify;
use std::collections::VecDeque;
use std::sync::Mutex;
struct Channel<T> {
values: Mutex<VecDeque<T>>,
notify: Notify,
}
impl<T> Channel<T> {
pub fn send(&self, value: T) {
self.values.lock().unwrap()
.push_back(value);
// Notify the consumer a value is available
self.notify.notify_one();
}
// This is a single-consumer channel, so several concurrent calls to
// `recv` are not allowed.
pub async fn recv(&self) -> T {
loop {
// Drain values
if let Some(value) = self.values.lock().unwrap().pop_front() {
return value;
}
// Wait for values to be available
self.notify.notified().await;
}
}
}
If there are elements in values, the consumer tasks will take it away
If there is no element in values, the consumer tasks will yield until the producer nitify it
But after I writen some test code, I found in no case the consumer will lose the notice from producer.
Could some one give me test code to prove the above Channel<T> fail to work well as a mpmc?
The following code shows why it is unsafe to use the above channel as mpmc.
use std::sync::Arc;
#[tokio::main]
async fn main() {
let mut i = 0;
loop{
let ch = Arc::new(Channel {
values: Mutex::new(VecDeque::new()),
notify: Notify::new(),
});
let mut handles = vec![];
for i in 0..100{
if i % 2 == 1{
for _ in 0..2{
let sender = ch.clone();
tokio::spawn(async move{
sender.send(1);
});
}
}else{
for _ in 0..2{
let receiver = ch.clone();
let handle = tokio::spawn(async move{
receiver.recv().await;
});
handles.push(handle);
}
}
}
futures::future::join_all(handles).await;
i += 1;
println!("No.{i} loop finished.");
}
}
Not running the next loop means that there are consumer tasks not finishing, and consumer tasks miss a notify.
Quote from the documentation you linked:
If you have two calls to recv and two calls to send in parallel, the following could happen:
Both calls to try_recv return None.
Both new elements are added to the vector.
The notify_one method is called twice, adding only a single permit to the Notify.
Both calls to recv reach the Notified future. One of them consumes the permit, and the other sleeps forever.
Replace try_recv with self.values.lock().unwrap().pop_front() in our case; the rest of the explanation stays identical.
The third point is the important one: Multiple calls to notify_one only result in a single token if no thread is waiting yet. And there is a short time window where it is possible that multiple threads already checked for the existance of an item but aren't waiting yet.

Testing a thread Worker with an anonymous function

I am adding tests to the 'hello' web server from the rust book.
My issue/error is around how to test whether a Worker has processed a Job.
My idea is to pass an anonymous function which updates a bool from false to true.
I think ownership is an issue here. I tried wrapping f in a Box, thinking it would prevent passing bool as a value as opposed to a reference. Using Box I struggled to mutate the value of state_updated when it was wrapped in this way.
I also tried writing a basic struct to wrap and update the bool. I have since reverted back to a mut bool.
First question: What changes do I need to make to get the test to pass?
Second question: Is there a better way for me to test this?
Below is a minimal version which reproduces my issue.
The full code is available at the bottom of this page in the rust book.
My current test creates a Worker, sends a Job to the worker, and asserts on an expected change
that could only have occurred if the Worker has processed the Job.
I intend to iterate on this test to add proper thread cleanup in the future.
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use hello_server_help::Worker;
use std::thread;
use std::time::Duration;
#[test]
fn test_worker_processes_job() {
let (sender, r) = mpsc::channel();
let receiver = Arc::new(Mutex::new(r));
let _ = Worker::new(0, receiver);
let mut state_updated = false;
let f = move || state_updated = true;
sender.send(Box::new(f)).unwrap();
thread::sleep(Duration::from_secs(1)); // primitive wait, for now
assert_eq!(state_updated, true);
}
It's my understanding that f is taking ownership of state_updated. In the assert line, however,
at the end, there is no error along the lines of "referenced after move".
Running the tests gives me the output:
running 1 test
test test_worker_processes_job ... FAILED
failures:
---- test_worker_processes_job stdout ----
thread 'test_worker_processes_job' panicked at 'assertion failed: `(left == right)`
left: `false`,
right: `true`', tests/worker_tests.rs:19:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
The MRE implementation:
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use std::thread;
pub type Job = Box<dyn FnOnce() + Send + 'static>;
pub struct Worker {
id: usize,
handle: Option<thread::JoinHandle<()>>,
}
impl Worker {
pub fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
let thread = thread::spawn(move || loop {
let job = receiver
.lock()
.expect("Error obtaining lock.")
.recv()
.unwrap();
job();
});
Worker {
id,
handle: Some(thread),
}
}
}
state_updated is a boolean so it implements Copy, which is why you can move it into your closure and keep using it afterwards, and why you can't see the changes: the one that is modified by the closure is the copy and not the original.
If you want to update a boolean in the thread and have it visible in the caller, you will need to make sure that you send a reference and you will need to have some synchronization mechanism. Two solutions:
Use an Arc<Mutex<bool>>:
use std::sync::Arc;
use std::sync::Mutex;
let state_updated = Arc::new (Mutex::new (false));
let state_ref = state_updated.clone()
let f = move || *state_ref.lock().unwrap() = true;
…
assert_eq!(*state_updated.lock().unwrap(), true);
Or use an AtomicBool:
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering;
let state_updated = AtomicBool::new (false);
let state_ref = &state_updated;
let f = move || state_ref.store (true, Ordering::Release);
…
assert_eq!(state_updated.load (Ordering::Acquire), true);
The compiler will complain that "state_ref does not live long enough", but you can get around that by using a scoped thread (or from rayon or crossbeam), or with a bit of unsafe: let state_ref: &'static AtomicBool = unsafe { transmute (&state_updated) }; (just make sure you join the child thread before state_updated goes out of scope).
It might however be better to use a channel for the return value:
use use std::sync::mpsc;
let (rsend, rrecv) = mpsc::channel();
let f = move || rsend.send(());
…
assert_eq!(rrecv.recv_timeout (Duration::from_secs (1)), Ok(()));
that way you only wait until the result is available (the duration is just a timeout if the thread takes too long to compute the result).

How can I move the data between threads safely?

I'm currently trying to call a function to which I pass multiple file names and expect the function to read the files and generate the appropriate structs and return them in a Vec<Audit>. I've been able to accomplish it reading the files one by one but I want to achieve it using threads.
This is the function:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut audits = Arc::new(Mutex::new(vec![]));
let mut handlers = vec![];
for file in files {
let audits = Arc::clone(&audits);
handlers.push(thread::spawn(move || {
let mut audits = audits.lock().unwrap();
audits.push(audit_from_xml_file(file.clone()));
audits
}));
}
for handle in handlers {
let _ = handle.join();
}
audits
.lock()
.unwrap()
.into_iter()
.fold(vec![], |mut result, audit| {
result.push(audit);
result
})
}
But it won't compile due to the following error:
error[E0277]: `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
--> src/main.rs:82:23
|
82 | handlers.push(thread::spawn(move || {
| ^^^^^^^^^^^^^ `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
|
::: /home/enthys/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:618:8
I have tried wrapping the generated Audit structs in Some(Audit) to avoid the MutexGuard but then I stumble with Poisonned Thread issues.
The cause of the error is that after after pushing the new Audit into the (locked) audits vec you then try to return the vec's MutexGuard.
In Rust, a thread's function can actually return values, the point of doing that is to send the value back to whoever is join-ing the thread. This means the value is going to move between threads, so the value needs to be movable betweem threads (aka Send), which mutex guards have no reason to be[0].
The easy solution is to just... not do that. Just delete the last line of the spawn function. Though it's not like the code works after that as you still have borrowing issue related to the thing at the end.
An alternative is to lean into the feature (especially if Audit objects are not too big): drop the audits vec entirely and instead have each thread return its audit, then collect from the handlers when you join them:
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers.into_iter()
.map(|handler| handler.join().unwrap())
.collect()
}
Though at that point you might as well just let Rayon handle it:
use rayon::prelude::*;
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
files.into_par_iter().map(audit_from_xml_file).collect()
}
That also avoids crashing the program or bringing the machine to its knees if you happen to have millions of files.
[0] and all the reasons not to be, locking on one thread and unlocking on an other is not necessarily supported e.g. ReleaseMutex
The ReleaseMutex function fails if the calling thread does not own the mutex object.
(NB: in the windows lingo, "owning" a mutex means having acquired it via WaitForSingleObject, which translates to lock in posix lingo)
and can be plain UB e.g. pthread_mutex_unlock
If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, undefined behavior results.
Your problem is that you are passing your Vec<Audit> (or more precisely the MutexGuard<Vec<Audit>>), to the threads and back again, without really needing it.
And you don't need Mutex or Arc for this simpler task:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers
.into_iter()
.flat_map(|x| x.join())
.collect()
}

Multi-threaded memoisation in Rust

I am developing an algorithm in Rust that I want to multi-thread. The nature of the algorithm is that it produces solutions to overlapping subproblems, hence why I am looking for a way to achieve multi-threaded memoisation.
An implementation of (single-threaded) memoisation is presented by Pritchard in this article.
I would like to have this functionality extended such that:
Whenever the underlying function must be invoked, including recursively, the result is evaluated asynchronously on a new thread.
Continuing on from the previous point, suppose we have some memoised function f, and f(x) that needs to recursively invoke f(x1), f(x2), … f(xn). It should be possible for all of these recursive invocations to be evaluated concurrently on separate threads.
If the memoised function is called on an input whose result is currently being evaluated, the current thread should block on this thread, and somehow obtain the result after it is released. This ensures that we don't end up with multiple threads attempting to evaluate the same result.
There is a means of forcing f(x) to be evaluated and cached (if it isn't already) without blocking the current thread. This allows the programmer to preemptively begin the evaluation of a result on a particular value that they know will be (or is likely to be) needed later.
One way you could do this is by storing a HashMap, where the key is the paramaters to f and the value is the receiver of a oneshot message containing the result. Then for any value that you need:
If there is already a receiver in the map, await it.
Otherwise, spawn a future to start calculating the result, and store the receiver in the map.
Here is a very contrived example that took way longer than it should have, but successfully runs (Playground):
use futures::{
future::{self, BoxFuture},
prelude::*,
ready,
};
use std::{
collections::HashMap,
pin::Pin,
sync::Arc,
task::{Context, Poll},
};
use tokio::sync::{oneshot, Mutex};
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
struct MemoInput(usize);
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
struct MemoReturn(usize);
/// This is necessary in order to make a concrete type for the `HashMap`.
struct OneshotReceiverUnwrap<T>(oneshot::Receiver<T>);
impl<T> Future for OneshotReceiverUnwrap<T> {
type Output = T;
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
// Don't worry too much about this part
Poll::Ready(ready!(Pin::new(&mut self.0).poll(cx)).unwrap())
}
}
type MemoMap = Mutex<HashMap<MemoInput, future::Shared<OneshotReceiverUnwrap<MemoReturn>>>>;
/// Compute (2^n)-1, super inefficiently.
fn compute(map: Arc<MemoMap>, x: MemoInput) -> BoxFuture<'static, MemoReturn> {
async move {
// First, get all dependencies.
let dependencies: Vec<MemoReturn> = future::join_all({
let map2 = map.clone();
let mut map_lock = map.lock().await;
// This is an iterator of futures that resolve to the results of the
// dependencies.
(0..x.0).map(move |i| {
let key = MemoInput(i);
let key2 = key.clone();
(*map_lock)
.entry(key)
.or_insert_with(|| {
// If the value is not currently being calculated (ie.
// is not in the map), start calculating it
let (tx, rx) = oneshot::channel();
let map3 = map2.clone();
tokio::spawn(async move {
// Compute the value, then send it to the receiver
// that we put in the map. This will awake all
// threads that were awaiting it.
tx.send(compute(map3, key2).await).unwrap();
});
// Return a shared future so that multiple threads at a
// time can await it
OneshotReceiverUnwrap(rx).shared()
})
.clone() // Clone one instance of the shared future for us
})
})
.await;
// At this point, all dependencies have been resolved!
let result = dependencies.iter().map(|r| r.0).sum::<usize>() + x.0;
MemoReturn(result)
}
.boxed() // Box in order to prevent a recursive type
}
#[tokio::main]
async fn main() {
let map = Arc::new(MemoMap::default());
let result = compute(map, MemoInput(10)).await.0;
println!("{}", result); // 1023
}
Note: this could certainly be better optimized, this is just a POC example.

How do I modify a value in one thread and read the value in another thread using shared memory?

The following Python code creates a thread (actually a process) with an array containing two floats passed to it, the thread counts up 1 by the first float and -1 by the second float every 5 seconds, while the main thread is continuously printing the two floats:
from multiprocessing import Process, Array
from time import sleep
def target(states):
while True:
states[0] -= 1
states[1] += 1
sleep(5)
def main():
states = Array("d", [0.0, 0.0])
process = Process(target=target, args=(states,))
process.start()
while True:
print(states[0])
print(states[1])
if __name__ == "__main__":
main()
How can I do the same thing using shared memory in Rust? I've tried doing the following (playground):
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let data = Arc::new(Mutex::new([0.0]));
let data = data.clone();
thread::spawn(move || {
let mut data = data.lock().unwrap();
data[0] = 1.0;
});
print!("{}", data[0]);
}
But that's giving a compile error:
error: cannot index a value of type `std::sync::Arc<std::sync::Mutex<[_; 1]>>`
--> <anon>:12:18
|>
12 |> print!("{}", data[0]);
|> ^^^^^^^
And even if that'd work, it does something different. I've read this, but I've still no idea how to do it.
Your code is not that far off! :)
Let's look at the compiler error first: it says that you are apparently attempting to index something. This is true, you want to index the data variable (with data[0]), but the compiler complains that the value you want to index is of type std::sync::Arc<std::sync::Mutex<[_; 1]>> and cannot be indexed.
If you look at the type, you can quickly see: my array is still wrapped in a Mutex<T> which is wrapped in an Arc<T>. This brings us to the solution: you have to lock for read access, too. So you have to add the lock().unwrap() like in the other thread:
print!("{}", data.lock().unwrap()[0]);
But now a new compiler error arises: use of moved value: `data`. Dang! This comes from your name shadowing. You say let data = data.clone(); before starting the thread; this shadows the original data. So how about we replace it by let data_for_thread = data.clone() and use data_for_thread in the other thread? You can see the working result here on the playground.
Making it do the same thing as the Python example isn't that hard anymore then, is it?
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
let data = Arc::new(Mutex::new([0.0, 0.0]));
let data_for_thread = data.clone();
thread::spawn(move || {
loop {
thread::sleep(Duration::from_secs(5))
let mut data = data_for_thread.lock().unwrap();
data[0] += 1.0;
data[1] -= 1.0;
}
});
loop {
let data = data.lock().unwrap();
println!("{}, {}", data[0], data[1]);
}
You can try it here on the playground, although I changed a few minor things to allow running on the playground.
Ok, so let's first fix the compiler error:
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let data = Arc::new(Mutex::new([0.0]));
let thread_data = data.clone();
thread::spawn(move || {
let mut data = thread_data.lock().unwrap();
data[0] = 1.0;
});
println!("{}", data.lock().unwrap()[0]);
}
The variable thread_data is always moved into the thread, that is why it cannot be accessed after the thread is spawned.
But this still has a problem: you are starting a thread that will run concurrently with the main thread and the last print statement will execute before the thread changes the value most of the time (it will be random).
To fix this you have to wait for the thread to finish before printing the value:
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let data = Arc::new(Mutex::new([0.0]));
let thread_data = data.clone();
let t = thread::spawn(move || {
let mut data = thread_data.lock().unwrap();
data[0] = 1.0;
});
t.join().unwrap();
println!("{}", data.lock().unwrap()[0]);
}
This will always produce the correct result.
If you update common data by a thread, the other threads might not see the updated value, unless you do the following:
Declare the variable as volatile which makes sure that the latest update is given back to the threads that read the variable. The data is read from the memory block but not from cache.
Make all updates and reads as synchronized which might turn out to be costly in terms of performance but is sure to deal with data corruptions/in-consistency due to non-synchronization methods of writes and reads by distinct threads.

Resources