Testing a thread Worker with an anonymous function - rust

I am adding tests to the 'hello' web server from the rust book.
My issue/error is around how to test whether a Worker has processed a Job.
My idea is to pass an anonymous function which updates a bool from false to true.
I think ownership is an issue here. I tried wrapping f in a Box, thinking it would prevent passing bool as a value as opposed to a reference. Using Box I struggled to mutate the value of state_updated when it was wrapped in this way.
I also tried writing a basic struct to wrap and update the bool. I have since reverted back to a mut bool.
First question: What changes do I need to make to get the test to pass?
Second question: Is there a better way for me to test this?
Below is a minimal version which reproduces my issue.
The full code is available at the bottom of this page in the rust book.
My current test creates a Worker, sends a Job to the worker, and asserts on an expected change
that could only have occurred if the Worker has processed the Job.
I intend to iterate on this test to add proper thread cleanup in the future.
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use hello_server_help::Worker;
use std::thread;
use std::time::Duration;
#[test]
fn test_worker_processes_job() {
let (sender, r) = mpsc::channel();
let receiver = Arc::new(Mutex::new(r));
let _ = Worker::new(0, receiver);
let mut state_updated = false;
let f = move || state_updated = true;
sender.send(Box::new(f)).unwrap();
thread::sleep(Duration::from_secs(1)); // primitive wait, for now
assert_eq!(state_updated, true);
}
It's my understanding that f is taking ownership of state_updated. In the assert line, however,
at the end, there is no error along the lines of "referenced after move".
Running the tests gives me the output:
running 1 test
test test_worker_processes_job ... FAILED
failures:
---- test_worker_processes_job stdout ----
thread 'test_worker_processes_job' panicked at 'assertion failed: `(left == right)`
left: `false`,
right: `true`', tests/worker_tests.rs:19:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
The MRE implementation:
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use std::thread;
pub type Job = Box<dyn FnOnce() + Send + 'static>;
pub struct Worker {
id: usize,
handle: Option<thread::JoinHandle<()>>,
}
impl Worker {
pub fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
let thread = thread::spawn(move || loop {
let job = receiver
.lock()
.expect("Error obtaining lock.")
.recv()
.unwrap();
job();
});
Worker {
id,
handle: Some(thread),
}
}
}

state_updated is a boolean so it implements Copy, which is why you can move it into your closure and keep using it afterwards, and why you can't see the changes: the one that is modified by the closure is the copy and not the original.
If you want to update a boolean in the thread and have it visible in the caller, you will need to make sure that you send a reference and you will need to have some synchronization mechanism. Two solutions:
Use an Arc<Mutex<bool>>:
use std::sync::Arc;
use std::sync::Mutex;
let state_updated = Arc::new (Mutex::new (false));
let state_ref = state_updated.clone()
let f = move || *state_ref.lock().unwrap() = true;
…
assert_eq!(*state_updated.lock().unwrap(), true);
Or use an AtomicBool:
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering;
let state_updated = AtomicBool::new (false);
let state_ref = &state_updated;
let f = move || state_ref.store (true, Ordering::Release);
…
assert_eq!(state_updated.load (Ordering::Acquire), true);
The compiler will complain that "state_ref does not live long enough", but you can get around that by using a scoped thread (or from rayon or crossbeam), or with a bit of unsafe: let state_ref: &'static AtomicBool = unsafe { transmute (&state_updated) }; (just make sure you join the child thread before state_updated goes out of scope).
It might however be better to use a channel for the return value:
use use std::sync::mpsc;
let (rsend, rrecv) = mpsc::channel();
let f = move || rsend.send(());
…
assert_eq!(rrecv.recv_timeout (Duration::from_secs (1)), Ok(()));
that way you only wait until the result is available (the duration is just a timeout if the thread takes too long to compute the result).

Related

Why the channel in the example code of tokio::sync::Notify is a mpsc?

I'm learning the synchronizing primitive of tokio. From the example code of Notify, I found it is confused to understand why Channel<T> is mpsc.
use tokio::sync::Notify;
use std::collections::VecDeque;
use std::sync::Mutex;
struct Channel<T> {
values: Mutex<VecDeque<T>>,
notify: Notify,
}
impl<T> Channel<T> {
pub fn send(&self, value: T) {
self.values.lock().unwrap()
.push_back(value);
// Notify the consumer a value is available
self.notify.notify_one();
}
// This is a single-consumer channel, so several concurrent calls to
// `recv` are not allowed.
pub async fn recv(&self) -> T {
loop {
// Drain values
if let Some(value) = self.values.lock().unwrap().pop_front() {
return value;
}
// Wait for values to be available
self.notify.notified().await;
}
}
}
If there are elements in values, the consumer tasks will take it away
If there is no element in values, the consumer tasks will yield until the producer nitify it
But after I writen some test code, I found in no case the consumer will lose the notice from producer.
Could some one give me test code to prove the above Channel<T> fail to work well as a mpmc?
The following code shows why it is unsafe to use the above channel as mpmc.
use std::sync::Arc;
#[tokio::main]
async fn main() {
let mut i = 0;
loop{
let ch = Arc::new(Channel {
values: Mutex::new(VecDeque::new()),
notify: Notify::new(),
});
let mut handles = vec![];
for i in 0..100{
if i % 2 == 1{
for _ in 0..2{
let sender = ch.clone();
tokio::spawn(async move{
sender.send(1);
});
}
}else{
for _ in 0..2{
let receiver = ch.clone();
let handle = tokio::spawn(async move{
receiver.recv().await;
});
handles.push(handle);
}
}
}
futures::future::join_all(handles).await;
i += 1;
println!("No.{i} loop finished.");
}
}
Not running the next loop means that there are consumer tasks not finishing, and consumer tasks miss a notify.
Quote from the documentation you linked:
If you have two calls to recv and two calls to send in parallel, the following could happen:
Both calls to try_recv return None.
Both new elements are added to the vector.
The notify_one method is called twice, adding only a single permit to the Notify.
Both calls to recv reach the Notified future. One of them consumes the permit, and the other sleeps forever.
Replace try_recv with self.values.lock().unwrap().pop_front() in our case; the rest of the explanation stays identical.
The third point is the important one: Multiple calls to notify_one only result in a single token if no thread is waiting yet. And there is a short time window where it is possible that multiple threads already checked for the existance of an item but aren't waiting yet.

Rust deadlock with shared struct: Arc + channel + atomic

I'm new to Rust and was trying to generate plenty of JSON data on the fly for a project, but I'm having deadlocks.
I've tried removing the serialization (json_serde) and sending the HashMaps in the channel instead but I still get deadlocks on my computer. If I however comment the send(generator.next()) line and send a string myself, code works flawlessly, thus the deadlock is caused by my DatasetGenerator, but I don't understand why.
Code summary:
Have a DatasetGenerator object that can generate sequences of "events" and serialize them to JSON.
generator.next() works like an "iterator" - It increments an internal atomic counter in the generator and then generates the i-th item in the sequence + serializes the JSON.
Have a generator threadpool generate these JSONs at high throughput (very large payloads each)
Send these JSONs through a channel to other thread (which will send them through network but irrelevant for this question)
Depending if I comment tx_ref.send(generator_ref.next()) or tx_ref.send(some_new_string) below my code deadlocks or succeeds:
src/main.rs:
extern crate threads_pool;
use threads_pool::*;
mod generator;
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
fn main() {
// N will be an argument, and a very high number. For tests use this:
const N: i64 = 12; // Increase this if you're not getting the deadlock yet, or run cargo run again until it happens.
let (tx, rx) = mpsc::channel();
let tx_producer = tx.clone();
let producer_thread = thread::spawn(move || {
let pool = ThreadPool::new(4);
let generator = Arc::new(generator::data_generator::DatasetGenerator::new(3000));
for i in 0..N {
println!("Generating #{}", i);
let tx_ref = tx_producer.clone();
let generator_ref = generator.clone();
pool.execute(move || {
////////// v !!!DEADLOCK HERE!!! v //////////
tx_ref.send(generator_ref.next()).expect("tx failed."); // This locks!
//tx_ref.send(format!(" {} ", i)).expect("tx failed."); // This works!
////////// ^ !!!DEADLOCK HERE!!! ^ //////////
})
.unwrap();
}
println!("Generator done!");
});
println!("-» Consumer consuming!");
for j in 0..N {
let s = rx.recv().expect("rx failed");
println!("-» Consumed #{}: {} ... ", j, &s[..10]);
}
println!("Consumer done!!");
producer_thread.join().unwrap();
println!("Success. Exit!");
}
This is my DatasetGenerator which seems to be causing all the trouble (as not using serde but outputting the HashMaps still gives deadlocks). src/generator/dataset_generator.rs:
use serde_json::Value;
use std::collections::HashMap;
use std::sync::atomic;
pub struct DatasetGenerator {
num_features: usize,
pub counter: atomic::AtomicI64,
feature_names: Vec<String>,
}
type Datapoint = HashMap<String, Value>;
type Out = String;
impl DatasetGenerator {
pub fn new(num_features: usize) -> DatasetGenerator {
let mut feature_names = Vec::new();
for i in 0..num_features {
feature_names.push(format!("f_{}", i));
}
DatasetGenerator {
num_features,
counter: atomic::AtomicI64::new(0),
feature_names,
}
}
/// Generates the next item in the sequence (iterator-like).
pub fn next(&self) -> Out {
let value = self.counter.fetch_add(1, atomic::Ordering::SeqCst);
self.gen(value)
}
/// Generates the ith item in the sequence. DEADLOCKS!!! ///////////////////////////
pub fn gen(&self, ith: i64) -> Out {
let mut data = Datapoint::with_capacity(self.num_features);
for f in 0..self.num_features {
let name = self.feature_names.get(f).unwrap();
data.insert(name.to_string(), Value::from(ith));
}
serde_json::json!(data).to_string() // Tried without serialization and still deadlocks!
}
}
Commit with deadlock code is here if you want to try out yourself with cargo run: https://github.com/AlbertoEAF/learn-rust/tree/dc5fa867e5a70b605553ef65796fdc9dd42d38a0/rest-injector
Deadlock on Windows with Rust 1.60.0:
Thank you for the help! it's greatly appreciated :)
Update
I've followed the suggestions from #kmdreko's answer below, and apparently the problem is in the generator: not all the items are generated. Even though pool.execute() is called N times, only a random number of closures c < N are executed even if I place pool.close() before leaving the producer_thread. Why does that happen / How can it be fixed?
Fix: Turns out this lockup is caused by the threads_pool library (0.2.6). I switched the thread pool to rayon's and it worked smoothly at the first try.
One thing you should change: an mpsc::Receiver will return an error on .recv() if it cannot possibly yield a result by realizing that all the associated mpsc::Senders have dropped, which is a good indicator that all the work is done. Your tx_refs and even tx_producer will be dropped when their respective tasks/threads complete, however you still have tx in scope that can theoretically give a value. This is what gives you the apparent deadlock. You should simply remove tx_producer and use tx directly so it is moved into the producer thread and dropped accordingly.
Now, you'll see either all N tasks complete, or you'll get an error indicating that some tasks did not complete. The reason not all tasks are completing is because you're creating the thread pool, spawning all the tasks, and then immediately destroying it. The threads_pool documentation says that the threads will finish their current job when the pool is destroyed, but you want to wait until all jobs have completed. For that you need to call the .close() method provided by the PoolManager trait before the end of the closure.
The reason you saw inconsistent behavior, but was benefited by returning a string directly is because the jobs required less work and the threads could get away with completing all them before they saw their signal to exit. Your generator_ref.next() requires much more computation so its not surprising they'd only process 4-plus-a-bit jobs before they see they've been told to exit.

Get HashMap from thread

I am trying to get a value from a thread, in this case a HashMap. I reduced the code to the following (I originally tried to share a HashMap containig a Vec):
use std::thread;
use std::sync::mpsc;
use std::sync::Mutex;
use std::sync::Arc;
use std::collections::HashMap;
fn main() {
let(tx, rx) = mpsc::channel();
let n_handle= thread::spawn( || {
tx.send(worker());
});
print!("{:?}", rx.recv().unwrap().into_inner().unwrap());
}
fn worker() -> Arc<Mutex<HashMap<String, i32>>>{
let result: HashMap<String, i32> = HashMap::new();
// some computation
Arc::from(Mutex::from(result))
}
Still Rust says that:
std::sync::mpsc::Sender<std::sync::Arc<std::sync::Mutex<std::collections::HashMap<std::string::String, i32>>>> cannot be shared between threads safely
I read some confusing stuff about putting everything into Arc<Mutex<..>> which I also tried with the value:
let result: HashMap<String, Arc<Mutex<i32>>> = HashMap::new();
Can anyone point me to a document that explains the usage of the mpsc::channel with values such as HashMaps? I understand why it is not working, as the trait Sync is not implemented for the HashMap, which is required to share the stuff. Still I have no idea how to get it to work.
You can pass the values between threads with using mpsc channel.
Until you tag your thread::spawn with the move keyword like following:
thread::spawn(move || {});
Since you did not tag it with move keyword then it is not moving the outer variables into the thread scope but only sharing their references. Thus you need to implement Sync trait that every outer variable you use.
mpsc::Sender does not implement Sync that is why you get the error cannot be shared between threads.
The solution for your case would be ideal to move the sender to inside of the thread scope with move like following:
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use std::thread;
fn main() {
let (tx, rx) = mpsc::channel();
thread::spawn(move || {
let _ = tx.send(worker());
});
let arc = rx.recv().unwrap();
let hashmap_guard = arc.lock().unwrap();
print!(
"HashMap that retrieved from thread : {:?}",
hashmap_guard.get("Hello").unwrap()
);
}
fn worker() -> Arc<Mutex<HashMap<String, i32>>> {
let mut result: HashMap<String, i32> = HashMap::new();
result.insert("Hello".to_string(), 2);
// some computation
Arc::new(Mutex::new(result))
}
Playground
For further info: I'd recommend reading The Rust Programming Language, specifically the chapter on concurrency. In it, you are introduced to Arc: especially if you want to share your data in between threads.

How do I modify a value in one thread and read the value in another thread using shared memory?

The following Python code creates a thread (actually a process) with an array containing two floats passed to it, the thread counts up 1 by the first float and -1 by the second float every 5 seconds, while the main thread is continuously printing the two floats:
from multiprocessing import Process, Array
from time import sleep
def target(states):
while True:
states[0] -= 1
states[1] += 1
sleep(5)
def main():
states = Array("d", [0.0, 0.0])
process = Process(target=target, args=(states,))
process.start()
while True:
print(states[0])
print(states[1])
if __name__ == "__main__":
main()
How can I do the same thing using shared memory in Rust? I've tried doing the following (playground):
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let data = Arc::new(Mutex::new([0.0]));
let data = data.clone();
thread::spawn(move || {
let mut data = data.lock().unwrap();
data[0] = 1.0;
});
print!("{}", data[0]);
}
But that's giving a compile error:
error: cannot index a value of type `std::sync::Arc<std::sync::Mutex<[_; 1]>>`
--> <anon>:12:18
|>
12 |> print!("{}", data[0]);
|> ^^^^^^^
And even if that'd work, it does something different. I've read this, but I've still no idea how to do it.
Your code is not that far off! :)
Let's look at the compiler error first: it says that you are apparently attempting to index something. This is true, you want to index the data variable (with data[0]), but the compiler complains that the value you want to index is of type std::sync::Arc<std::sync::Mutex<[_; 1]>> and cannot be indexed.
If you look at the type, you can quickly see: my array is still wrapped in a Mutex<T> which is wrapped in an Arc<T>. This brings us to the solution: you have to lock for read access, too. So you have to add the lock().unwrap() like in the other thread:
print!("{}", data.lock().unwrap()[0]);
But now a new compiler error arises: use of moved value: `data`. Dang! This comes from your name shadowing. You say let data = data.clone(); before starting the thread; this shadows the original data. So how about we replace it by let data_for_thread = data.clone() and use data_for_thread in the other thread? You can see the working result here on the playground.
Making it do the same thing as the Python example isn't that hard anymore then, is it?
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
let data = Arc::new(Mutex::new([0.0, 0.0]));
let data_for_thread = data.clone();
thread::spawn(move || {
loop {
thread::sleep(Duration::from_secs(5))
let mut data = data_for_thread.lock().unwrap();
data[0] += 1.0;
data[1] -= 1.0;
}
});
loop {
let data = data.lock().unwrap();
println!("{}, {}", data[0], data[1]);
}
You can try it here on the playground, although I changed a few minor things to allow running on the playground.
Ok, so let's first fix the compiler error:
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let data = Arc::new(Mutex::new([0.0]));
let thread_data = data.clone();
thread::spawn(move || {
let mut data = thread_data.lock().unwrap();
data[0] = 1.0;
});
println!("{}", data.lock().unwrap()[0]);
}
The variable thread_data is always moved into the thread, that is why it cannot be accessed after the thread is spawned.
But this still has a problem: you are starting a thread that will run concurrently with the main thread and the last print statement will execute before the thread changes the value most of the time (it will be random).
To fix this you have to wait for the thread to finish before printing the value:
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let data = Arc::new(Mutex::new([0.0]));
let thread_data = data.clone();
let t = thread::spawn(move || {
let mut data = thread_data.lock().unwrap();
data[0] = 1.0;
});
t.join().unwrap();
println!("{}", data.lock().unwrap()[0]);
}
This will always produce the correct result.
If you update common data by a thread, the other threads might not see the updated value, unless you do the following:
Declare the variable as volatile which makes sure that the latest update is given back to the threads that read the variable. The data is read from the memory block but not from cache.
Make all updates and reads as synchronized which might turn out to be costly in terms of performance but is sure to deal with data corruptions/in-consistency due to non-synchronization methods of writes and reads by distinct threads.

Code not running in parallel when using thread::scoped

Can someone please explain why the code below does not run in parallel? I guess I don't understand how thread::scoped works..
use std::thread;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use std::old_io::timer;
fn main() {
let buf = Arc::new(Mutex::new(Vec::<String>::new()));
let res = test(buf);
println!("{:?}", *res.lock().unwrap());
}
fn test(buf: Arc<Mutex<Vec<String>>>) -> Arc<Mutex<Vec<String>>> {
let guards: Vec<_> = (0..3).map( |i| {
let mtx = buf.clone();
thread::scoped(|| {
println!("Thread: {}", i);
let mut res = mtx.lock().unwrap();
timer::sleep(Duration::seconds(5));
res.push(format!("thread {}", i));
});
}).collect();
buf
}
The code is based on the examples here where it's stated:
The scoped function takes one argument, a closure, indicated by the double bars ||. This closure is executed in a new thread created by scoped. The method is called scoped because it returns a 'join guard', which will automatically join the child thread when it goes out of scope. Because we collect these guards into a Vec, and that vector goes out of scope at the end of our program, our program will wait for every thread to finish before finishing.
Thanks
This is a tricky case. The problem is the humble semicolon. Look at this minimized code:
thread::scoped(|| {});
That semicolon means that the result of the collect isn't a vector of JoinGuards — it's a Vec<()>! Each JoinGuard is dropped immediately, forcing the thread to finish before the next iteration starts.
When you fix this issue, you'll hit the next problem, which is that i and mtx don't live long enough. You'll need to move them into the closure:
thread::scoped(move || {})

Resources