Lifetimes of thread::scope()'s variables and spawned threads - multithreading

I'm trying to do some parallel processing on a list of values:
fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
    let chunk_size = 100;
    let output_list = vec![0.0f32; list.len()];
    thread::scope(|s| { // Block S
        (0..list.len()).collect::<Vec<_>>().chunks(chunk_size).for_each(|chunk| { // Block T
            s.spawn(|| {
                chunk.into_iter().for_each(|&idx| {
                    let value = calc_value(list[idx]);
                    unsafe {
                        let out = (output_list.as_ptr() as *mut f32).offset(idx as isize);
                        *out = value;
                    }
                });
            });
        });
    });
    output_list
}
The API says that thread::scope() only returns once every thread spawned inside the scope has finished. However, the compiler is telling me that the temporary range object (0..list.len()) is dropped while the threads that use it might still be alive.
I'm curious about what's actually happening under the hood. My intuition tells me that each thread spawned and each variable created within Block S would both have Block S's lifetime. But clearly the threads live longer than Block S.
Why aren't these lifetimes the same?
Is the best practice here to create a variable in Block F that takes the place of the temporary, like so:
fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
    let chunk_size = 100;
    let output_list = vec![0.0f32; list.len()];
    let range = (0..list.len()).collect::<Vec<_>>();
    thread::scope(|s| { // Block S
        range.chunks(chunk_size).for_each(|chunk| { // Block T
            s.spawn(|| {
                chunk.into_iter().for_each(|&idx| {
                    let value = calc_value(list[idx]);
                    unsafe {
                        let out = (output_list.as_ptr() as *mut f32).offset(idx as isize);
                        *out = value;
                    }
                });
            });
        });
    });
    output_list
}

You don't have to ask SO to find out what happens under the hood of std functions; their source code is readily available. But since you asked, I'll try to explain a little more in the comments.
pub fn scope<'env, F, T>(f: F) -> T
where
    F: for<'scope> FnOnce(&'scope Scope<'scope, 'env>) -> T,
{
    // `Scope` creation not very relevant to the issue at hand
    let scope = ...;

    // here your 'Block S' gets run and returns.
    let result = catch_unwind(AssertUnwindSafe(|| f(&scope)));
    // the above is just a fancy way of calling `f` while catching any panics.

    // but we're waiting for threads to finish running until here
    while scope.data.num_running_threads.load(Ordering::Acquire) != 0 {
        park();
    }

    // further not so relevant cleanup code
    //...
}
So, as you can see, your assumption that Block S sticks around as long as the spawned threads is wrong: the closure (Block S) returns first, and only then does scope() wait for the threads to finish.
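To make the distinction concrete, here is a minimal sketch of my own (not from the question): a local owned by the closure body is gone before the spawned thread is joined, while data owned by the enclosing function is fine.

use std::thread;

fn main() {
    // Fails to compile: `local` is owned by the closure body ("Block S") and is
    // dropped when the closure returns, but the spawned thread borrows it for `'scope`.
    // thread::scope(|s| {
    //     let local = vec![1, 2, 3];
    //     s.spawn(|| println!("{:?}", local)); // error: `local` does not live long enough
    // });

    // Compiles: the owner lives in the enclosing function ("Block F"),
    // so it outlives thread::scope() and every thread the scope joins.
    let outer = vec![1, 2, 3];
    thread::scope(|s| {
        s.spawn(|| println!("{:?}", outer));
    });
}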
And yes, the solution is to capture the owner of the chunks before you call thread::scope.
There is also no reason to dive into unsafe for your example; you can use zip instead:
fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
    let chunk_size = 100;
    let mut output_list = vec![0.0f32; list.len()];
    let mut zipped = list
        .into_iter()
        .zip(output_list.iter_mut())
        .collect::<Vec<_>>();
    thread::scope(|s| { // Block S
        zipped.chunks_mut(chunk_size).for_each(|chunk| { // Block T
            s.spawn(|| {
                chunk.into_iter().for_each(|(v, out)| {
                    let value = calc_value(*v);
                    **out = value;
                });
            });
        });
    });
    output_list
}
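A variant of the same idea, as a sketch of mine rather than part of the original answer (calc_value stands in for the question's function): skip the intermediate zipped Vec and zip chunks of the input with mutable chunks of the output, so each thread gets exclusive access to its own slice of output_list.

use std::thread;

fn calc_value(x: f32) -> f32 { x * 2.0 } // stand-in for the question's function

fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
    let chunk_size = 100;
    let mut output_list = vec![0.0f32; list.len()];
    thread::scope(|s| { // Block S
        // Pair up corresponding chunks; `chunks_mut` hands each thread a
        // disjoint `&mut [f32]`, so no unsafe aliasing is needed.
        for (in_chunk, out_chunk) in list.chunks(chunk_size).zip(output_list.chunks_mut(chunk_size)) {
            s.spawn(move || {
                for (v, out) in in_chunk.iter().zip(out_chunk.iter_mut()) {
                    *out = calc_value(*v);
                }
            });
        }
    });
    output_list
}

fn main() {
    let out = process_list((0..250).map(|i| i as f32).collect());
    assert_eq!(out[3], 6.0);
}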

Related

Sharing state between threads with notify-rs

I'm new to Rust.
I'm trying to write a file_sensor that will start a counter after a file is created. The plan is that after an amount of time, if a second file is not received, the sensor will exit with a zero exit code.
I could write the code to continue that work, but I feel the code below illustrates the problem (I have also omitted some pieces, for example the post function referred to).
I have been struggling with this problem for several hours; I've tried Arc and mutexes and even global variables.
The Timer implementation is Ticktock-rs.
I need to be able to either get heartbeat in the match body for EventKind::Create(CreateKind::Folder) or file_count in the loop.
The code I've attached here runs, but file_count is always zero in the loop.
use std::env;
use std::path::Path;
use std::{thread, time};
use std::process::ExitCode;
use ticktock::Timer;
use notify::{
    Watcher,
    RecommendedWatcher,
    RecursiveMode,
    Result,
    event::{EventKind, CreateKind, ModifyKind, Event}
};

fn main() -> Result<()> {
    let now = time::Instant::now();
    let mut heartbeat = Timer::apply(
        |_, count| {
            *count += 1;
            *count
        },
        0,
    )
    .every(time::Duration::from_millis(500))
    .start(now);
    let mut file_count = 0;
    let args = Args::parse();
    let REQUEST_SENSOR_PATH = env::var("REQUEST_SENSOR_PATH").expect("$REQUEST_SENSOR_PATH} is not set");
    let mut watcher = notify::recommended_watcher(move |res: Result<Event>| {
        match res {
            Ok(event) => {
                match event.kind {
                    EventKind::Create(CreateKind::File) => {
                        file_count += 1;
                        // do something with file
                    }
                    _ => { /* something else changed */ }
                }
                println!("{:?}", event);
            },
            Err(e) => {
                println!("watch error: {:?}", e);
                ExitCode::from(101);
            },
        }
    })?;
    watcher.watch(Path::new(&REQUEST_SENSOR_PATH), RecursiveMode::Recursive)?;
    loop {
        let now = time::Instant::now();
        if let Some(n) = heartbeat.update(now) {
            println!("Heartbeat: {}, fileCount: {}", n, file_count);
            if n > 10 {
                heartbeat.set_value(0);
                // This function will reset timer when a file arrives
            }
        }
    }
    Ok(())
}
Your compiler warnings show you the problem:
warning: unused variable: `file_count`
--> src/main.rs:31:25
|
31 | file_count += 1;
| ^^^^^^^^^^
|
= note: `#[warn(unused_variables)]` on by default
= help: did you mean to capture by reference instead?
The problem here is that you use file_count inside of a move || closure. file_count is an i32, which is Copy. Using it in a move || closure actually creates a copy of it, so assigning to it no longer updates the original variable.
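To make that concrete, here is a minimal sketch of the same effect outside of notify (my own example, not from the question):

fn main() {
    let mut file_count = 0;
    let mut handler = move || {
        // `file_count` is an i32, which is Copy: the move closure captured a copy,
        // so this increments the closure's private copy only.
        file_count += 1;
    };
    handler();
    handler();
    println!("{}", file_count); // prints 0: the original variable was never touched
}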
Either way, it's impossible to modify a variable in main() from an event handler. Event handlers require 'static lifetime if they reference things, because Rust cannot guarantee that the event handler lives shorter than main.
One solution for this problem is to use reference counters and interior mutability. In this case, I will use Arc for reference counters and AtomicI32 for interior mutability. Note that notify::recommended_watcher requires thread safety, otherwise instead of an Arc<AtomicI32> we could have used an Rc<Cell<i32>>, which is the same thing but only for single-threaded environments, with a little less overhead.
use notify::{
    event::{CreateKind, Event, EventKind},
    RecursiveMode, Result, Watcher,
};
use std::time;
use std::{env, sync::atomic::Ordering};
use std::{path::Path, sync::Arc};
use std::{process::ExitCode, sync::atomic::AtomicI32};
use ticktock::Timer;

fn main() -> Result<()> {
    let now = time::Instant::now();
    let mut heartbeat = Timer::apply(
        |_, count| {
            *count += 1;
            *count
        },
        0,
    )
    .every(time::Duration::from_millis(500))
    .start(now);
    let file_count = Arc::new(AtomicI32::new(0));
    let REQUEST_SENSOR_PATH =
        env::var("REQUEST_SENSOR_PATH").expect("$REQUEST_SENSOR_PATH} is not set");
    let mut watcher = notify::recommended_watcher({
        let file_count = Arc::clone(&file_count);
        move |res: Result<Event>| {
            match res {
                Ok(event) => {
                    match event.kind {
                        EventKind::Create(CreateKind::File) => {
                            file_count.fetch_add(1, Ordering::AcqRel);
                            // do something with file
                        }
                        _ => { /* something else changed */ }
                    }
                    println!("{:?}", event);
                }
                Err(e) => {
                    println!("watch error: {:?}", e);
                    ExitCode::from(101);
                }
            }
        }
    })?;
    watcher.watch(Path::new(&REQUEST_SENSOR_PATH), RecursiveMode::Recursive)?;
    loop {
        let now = time::Instant::now();
        if let Some(n) = heartbeat.update(now) {
            println!(
                "Heartbeat: {}, fileCount: {}",
                n,
                file_count.load(Ordering::Acquire)
            );
            if n > 10 {
                heartbeat.set_value(0);
                // This function will reset timer when a file arrives
            }
        }
    }
}
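For reference, the single-threaded counterpart mentioned above (Rc<Cell<i32>>) would look something like this. This is only a sketch of mine; it would not work with recommended_watcher, which, as noted, needs a thread-safe handler:

use std::{cell::Cell, rc::Rc};

fn main() {
    let file_count = Rc::new(Cell::new(0));

    let handler = {
        // Clone the Rc so the closure owns its own handle to the shared counter.
        let file_count = Rc::clone(&file_count);
        move || file_count.set(file_count.get() + 1)
    };

    handler();
    handler();
    println!("{}", file_count.get()); // prints 2
}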
Also, note that the ExitCode::from(101); gives you a warning. It does not actually exit the program; it only creates an exit-code value and then discards it again. You probably intended to write std::process::exit(101);, although I would discourage that, because it does not clean up properly (it does not call any Drop implementations). I'd use a panic here instead; this is the exact use case panics are meant for.
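To see the difference between those options, here is a small sketch of my own:

use std::process::ExitCode;

fn main() {
    // Constructing an ExitCode is a no-op on its own: it just builds a value
    // that is immediately discarded, which is what the warning points out.
    let _unused = ExitCode::from(101);
    println!("still running");

    // std::process::exit(101) would terminate the process right here without
    // unwinding, so no Drop implementations further down would run.
    // panic!("watch error") would unwind instead (by default), running Drop
    // implementations on the way out.
}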

Pass a matrix (Vec<Vec<f64>>) readonly to multiple threads

I am new to Rust and I am struggling with the concept of borrowing.
I want to load a Vec<Vec<f64>> matrix and then process it in parallel. However, when I try to compile this piece of code, I get error: capture of moved value: `matrix` [E0382] at the let _ = line.
This matrix is supposed to be readonly for the threads, they won't modify it. How can I pass it readonly and make the "moved value" error go away?
fn process(matrix: &Vec<Vec<f64>>) {
    // do nothing for now
}

fn test() {
    let filename = "matrix.tsv";
    // loads matrix into a Vec<Vec<f64>>
    let mut matrix = load_matrix(filename);
    // Determine number of cpus
    let ncpus = num_cpus::get();
    println!("Number of cpus on this machine: {}", ncpus);
    for i in 0..ncpus {
        // In the next line the "error: capture of moved value: matrix" happens
        let _ = thread::spawn(move || {
            println!("Thread number: {}", i);
            process(&matrix);
        });
    }
    let d = Duration::from_millis(1000 * 1000);
    thread::sleep(d);
}
Wrap the object to be shared in an Arc, which stands for atomically reference-counted (pointer). For each thread, clone this pointer and pass ownership of the clone to the thread. The wrapped object will be deallocated once nothing uses it any more.
fn process(matrix: &Vec<Vec<f64>>) {
    // do nothing for now
}

fn test() {
    use std::sync::Arc;
    let filename = "matrix.tsv";
    // loads matrix into a Vec<Vec<f64>>
    let matrix = Arc::new(load_matrix(filename));
    // Determine number of cpus
    let ncpus = num_cpus::get();
    println!("Number of cpus on this machine: {}", ncpus);
    for i in 0..ncpus {
        let matrix = matrix.clone();
        let _ = thread::spawn(move || {
            println!("Thread number: {}", i);
            process(&matrix);
        });
    }
    let d = Duration::from_millis(1000 * 1000);
    thread::sleep(d);
}
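If you would rather have the program end as soon as the work is done instead of sleeping for a fixed time, the usual pattern is to keep the JoinHandles and join them. A sketch along those lines (process is a stub and the matrix literal stands in for load_matrix, which is not shown in the question):

use std::sync::Arc;
use std::thread;

fn process(matrix: &Vec<Vec<f64>>) {
    // do nothing for now
    let _ = matrix.len();
}

fn main() {
    // Stand-in for load_matrix(filename).
    let matrix = Arc::new(vec![vec![1.0, 2.0], vec![3.0, 4.0]]);
    let ncpus = 4;

    // Keep the handles so we can wait for every thread instead of sleeping.
    let handles: Vec<_> = (0..ncpus)
        .map(|i| {
            let matrix = Arc::clone(&matrix);
            thread::spawn(move || {
                println!("Thread number: {}", i);
                process(&matrix);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}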

How to daisy chain threads using channels in Rust?

I'm trying to implement the sieve of Eratosthenes in Rust using coroutines as a learning exercise (not homework), and I can't find any reasonable way of connecting each thread to the Receiver and Sender ends of two different channels.
The Receiver is involved in two distinct tasks, namely reporting the highest prime found so far, and supplying further candidate primes for the filter. This is fundamental to the algorithm.
Here is what I would like to do but can't because the Receiver cannot be transferred between threads. Using std::sync::Arc does not appear to help, unsurprisingly.
Please note that I do understand why this doesn't work
fn main() {
    let (basetx, baserx): (Sender<u32>, Receiver<u32>) = channel();
    let max_number = 103;
    thread::spawn(move || {
        generate_natural_numbers(&basetx, max_number);
    });
    let oldrx = &baserx;
    loop {
        // we need the prime in this thread
        let prime = match oldrx.recv() {
            Ok(num) => num,
            Err(_) => { break; 0 }
        };
        println!("{}", prime);
        // create (newtx, newrx) in a deliberately unspecified way
        // now we need to pass the receiver off to the sieve thread
        thread::spawn(move || {
            sieve(oldrx, newtx, prime); // forwards numbers if not divisible by prime
        });
        oldrx = newrx;
    }
}
Equivalent working Go code:
func main() {
    channel := make(chan int)
    var ok bool = true
    var prime int = 0
    go generate(channel, 103)
    for true {
        prime, ok = <-channel
        if !ok {
            break
        }
        new_channel := make(chan int)
        go sieve(channel, new_channel, prime)
        channel = new_channel
        fmt.Println(prime)
    }
}
What is the best way to deal with a situation like this where a Receiver needs to be handed off to a different thread?
You don't really explain what problem you are having, but your code is close enough:
use std::sync::mpsc::{channel, Sender, Receiver};
use std::thread;

fn generate_numbers(tx: Sender<u8>) {
    for i in 2..100 {
        tx.send(i).unwrap();
    }
}

fn filter(rx: Receiver<u8>, tx: Sender<u8>, val: u8) {
    for v in rx {
        if v % val != 0 {
            tx.send(v).unwrap();
        }
    }
}

fn main() {
    let (base_tx, base_rx) = channel();
    thread::spawn(move || {
        generate_numbers(base_tx);
    });

    let mut old_rx = base_rx;
    loop {
        let num = match old_rx.recv() {
            Ok(v) => v,
            Err(_) => break,
        };
        println!("prime: {}", num);

        let (new_tx, new_rx) = channel();
        thread::spawn(move || {
            filter(old_rx, new_tx, num);
        });
        old_rx = new_rx;
    }
}
using coroutines
Danger, Danger, Will Robinson! These are not coroutines; they are full-fledged threads! They are a lot more heavyweight than coroutines.
What is the best way to deal with a situation like this where a Receiver needs to be handed off to a different thread?
Just... transfer ownership of the Receiver to the thread?
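One detail worth spelling out about the code above: the chain shuts itself down. When generate_numbers returns, base_tx is dropped; each filter's for v in rx loop then ends and drops its own sender, until old_rx.recv() in main finally returns an Err and breaks the loop. A minimal sketch of that drop-to-disconnect behaviour (my own example):

use std::sync::mpsc::channel;

fn main() {
    let (tx, rx) = channel();
    tx.send(1).unwrap();
    drop(tx); // the last Sender is gone, so the channel becomes disconnected

    assert_eq!(rx.recv().unwrap(), 1); // values already sent are still delivered
    assert!(rx.recv().is_err());       // then recv() reports the disconnect
}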

Application on OSX cannot spawn more than 2048 threads

I have a Rust application on OSX firing up a large number of threads, as can be seen in the code below. However, after checking how many threads my version of OSX allows a task to create via the sysctl kern.num_taskthreads command, I can see that it is kern.num_taskthreads: 2048, which explains why I can't spin up more than 2048 threads.
How do I go about getting past this hard limit?
let threads = 300000;
let requests = 1;
for _x in 0..threads {
    println!("{}", _x);
    let request_clone = request.clone();
    let handle = thread::spawn(move || {
        for _y in 0..requests {
            request_clone.lock().unwrap().push((request::Request::new(request::Request::create_request())));
        }
    });
    child_threads.push(handle);
}
Before starting, I'd encourage you to read about the C10K problem. When you get into this scale, there's a lot more things you need to keep in mind.
That being said, I'd suggest looking at mio...
a lightweight IO library for Rust with a focus on adding as little overhead as possible over the OS abstractions.
Specifically, mio provides an event loop, which allows you to handle a large number of connections without spawning threads. Unfortunately, I don't know of an HTTP library that currently supports mio. You could create one and be a hero to the Rust community!
Not sure how helpful this will be, but I was trying to create a small pool of threads that will create connections and then send them over to an event loop via a channel for reading.
I'm sure this code is probably pretty bad, but here it is anyway as an example. It uses the Hyper library, like you mentioned.
extern crate hyper;

use std::io::Read;
use std::thread;
use std::thread::{JoinHandle};
use std::sync::{Arc, Mutex};
use std::sync::mpsc::channel;

use hyper::Client;
use hyper::client::Response;
use hyper::header::Connection;

const TARGET: i32 = 100;
const THREADS: i32 = 10;

struct ResponseWithString {
    index: i32,
    response: Response,
    data: Vec<u8>,
    complete: bool
}

fn main() {
    // Create a client.
    let url: &'static str = "http://www.gooogle.com/";
    let mut threads = Vec::<JoinHandle<()>>::with_capacity((TARGET * 2) as usize);
    let conn_count = Arc::new(Mutex::new(0));
    let (tx, rx) = channel::<ResponseWithString>();
    for _ in 0..THREADS {
        // Move var references into thread context
        let conn_count = conn_count.clone();
        let tx = tx.clone();
        let t = thread::spawn(move || {
            loop {
                let idx: i32;
                {
                    // Lock, increment, and release
                    let mut count = conn_count.lock().unwrap();
                    *count += 1;
                    idx = *count;
                }
                if idx > TARGET {
                    break;
                }
                let mut client = Client::new();
                // Creating an outgoing request.
                println!("Creating connection {}...", idx);
                let res = client.get(url) // Get URL...
                    .header(Connection::close()) // Set headers...
                    .send().unwrap(); // Fire!
                println!("Pushing response {}...", idx);
                tx.send(ResponseWithString {
                    index: idx,
                    response: res,
                    data: Vec::<u8>::with_capacity(1024),
                    complete: false
                }).unwrap();
            }
        });
        threads.push(t);
    }
    let mut responses = Vec::<ResponseWithString>::with_capacity(TARGET as usize);
    let mut buf: [u8; 1024] = [0; 1024];
    let mut completed_count = 0;
    loop {
        if completed_count >= TARGET {
            break; // No more work!
        }
        match rx.try_recv() {
            Ok(r) => {
                println!("Incoming response! {}", r.index);
                responses.push(r)
            },
            _ => { }
        }
        for r in &mut responses {
            if r.complete {
                continue;
            }
            // Read the Response.
            let res = &mut r.response;
            let data = &mut r.data;
            let idx = &r.index;
            match res.read(&mut buf) {
                Ok(i) => {
                    if i == 0 {
                        println!("No more data! {}", idx);
                        r.complete = true;
                        completed_count += 1;
                    }
                    else {
                        println!("Got data! {} => {}", idx, i);
                        for x in 0..i {
                            data.push(buf[x]);
                        }
                    }
                }
                Err(e) => {
                    panic!("Oh no! {} {}", idx, e);
                }
            }
        }
    }
}

Finding a way to solve "...does not live long enough"

I'm building a multiplexer in Rust. It's one of my first applications and a great learning experience!
However, I'm facing a problem and I cannot find out how to solve it in rust:
Whenever a new channel is added to the multiplex, I have to listen for data on this channel.
The new channel is allocated on the stack when it is requested by the open() function.
However, this channel must not be allocated on the stack but on the heap somehow, because it should stay alive and should not be freed in the next iteration of my receiving loop.
Right now my code looks like this (v0.10-pre):
extern crate collections;
extern crate sync;

use std::comm::{Chan, Port, Select};
use std::mem::size_of_val;
use std::io::ChanWriter;
use std::io::{ChanWriter, PortReader};
use collections::hashmap::HashMap;
use sync::{rendezvous, SyncPort, SyncChan};
use std::task::try;
use std::rc::Rc;

struct MultiplexStream {
    internal_port: Port<(u32, Option<(Port<~[u8]>, Chan<~[u8]>)>)>,
    internal_chan: Chan<u32>
}

impl MultiplexStream {
    fn new(downstream: (Port<~[u8]>, Chan<~[u8]>)) -> ~MultiplexStream {
        let (downstream_port, downstream_chan) = downstream;
        let (p1, c1): (Port<u32>, Chan<u32>) = Chan::new();
        let (p2, c2):
            (Port<(u32, Option<(Port<~[u8]>, Chan<~[u8]>)>)>,
             Chan<(u32, Option<(Port<~[u8]>, Chan<~[u8]>)>)>) = Chan::new();

        let mux = ~MultiplexStream {
            internal_port: p2,
            internal_chan: c1
        };

        spawn(proc() {
            let mut pool = Select::new();
            let mut by_port_num = HashMap::new();
            let mut by_handle_id = HashMap::new();
            let mut handle_id2port_num = HashMap::new();

            let mut internal_handle = pool.handle(&p1);
            let mut downstream_handle = pool.handle(&downstream_port);
            unsafe {
                internal_handle.add();
                downstream_handle.add();
            }

            loop {
                let handle_id = pool.wait();

                if handle_id == internal_handle.id() {
                    // setup new port
                    let port_num: u32 = p1.recv();
                    if by_port_num.contains_key(&port_num) {
                        c2.send((port_num, None))
                    }
                    else {
                        let (p1_, c1_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
                        let (p2_, c2_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();

                        /********************************/
                        let mut h = pool.handle(&p1_); // <--
                        /********************************/
                        /* the error is HERE ^^^ */
                        /********************************/

                        unsafe { h.add() };
                        by_port_num.insert(port_num, c2_);
                        handle_id2port_num.insert(h.id(), port_num);
                        by_handle_id.insert(h.id(), h);
                        c2.send((port_num, Some((p2_, c1_))));
                    }
                }
                else if handle_id == downstream_handle.id() {
                    // demultiplex
                    let res = try(proc() {
                        let mut reader = PortReader::new(downstream_port);
                        let port_num = reader.read_le_u32().unwrap();
                        let data = reader.read_to_end().unwrap();
                        return (port_num, data);
                    });
                    if res.is_ok() {
                        let (port_num, data) = res.unwrap();
                        by_port_num.get(&port_num).send(data);
                    }
                    else {
                        // TODO: handle error
                    }
                }
                else {
                    // multiplex
                    let h = by_handle_id.get_mut(&handle_id);
                    let port_num = handle_id2port_num.get(&handle_id);
                    let port_num = *port_num;
                    let data = h.recv();
                    try(proc() {
                        let mut writer = ChanWriter::new(downstream_chan);
                        writer.write_le_u32(port_num);
                        writer.write(data);
                        writer.flush();
                    });
                    // todo check if chan was closed
                }
            }
        });
        return mux;
    }

    fn open(self, port_num: u32) -> Result<(Port<~[u8]>, Chan<~[u8]>), ()> {
        let res = try(proc() {
            self.internal_chan.send(port_num);
            let (n, res) = self.internal_port.recv();
            assert!(n == port_num);
            return res;
        });

        if res.is_err() {
            return Err(());
        }
        let res = res.unwrap();
        if res.is_none() {
            return Err(());
        }
        let (p, c) = res.unwrap();
        return Ok((p, c));
    }
}
And the compiler raises this error:
multiplex_stream.rs:81:31: 81:35 error: `p1_` does not live long enough
multiplex_stream.rs:81 let mut h = pool.handle(&p1_);
^~~~
multiplex_stream.rs:48:16: 122:4 note: reference must be valid for the block at 48:15...
multiplex_stream.rs:48 spawn(proc() {
multiplex_stream.rs:49 let mut pool = Select::new();
multiplex_stream.rs:50 let mut by_port_num = HashMap::new();
multiplex_stream.rs:51 let mut by_handle_id = HashMap::new();
multiplex_stream.rs:52 let mut handle_id2port_num = HashMap::new();
multiplex_stream.rs:53
...
multiplex_stream.rs:77:11: 87:7 note: ...but borrowed value is only valid for the block at 77:10
multiplex_stream.rs:77 else {
multiplex_stream.rs:78 let (p1_,c1_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
multiplex_stream.rs:79 let (p2_,c2_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
multiplex_stream.rs:80
multiplex_stream.rs:81 let mut h = pool.handle(&p1_);
multiplex_stream.rs:82 unsafe { h.add() };
Does anyone have an idea how to solve this issue?
The problem is that the new channel you create does not live long enough: its scope is only that of the else block. You need to ensure it lives longer; its scope must be at least that of pool.
I haven't made the effort to understand precisely what your code is doing, but what I would expect to be the simplest way to make the ports live long enough is to place them in a vector declared at the same scope as pool, e.g. let ports = ~[];, inserting with ports.push(p1_); and then taking the reference as &ports[ports.len() - 1]. Sorry, that won't cut it: you can't add new items to a vector while references to its elements are active. You'll need to restructure things somewhat if you want that approach to work.
