I am new to Rust and I am struggling with the concept of borrowing.
I want to load a Vec<Vec<f64>> matrix and then process it in parallel. However when I try to compile this piece of code I get error: capture of moved value: `matrix` [E0382] at the let _ = line.
This matrix is supposed to be readonly for the threads, they won't modify it. How can I pass it readonly and make the "moved value" error go away?
fn process(matrix: &Vec<Vec<f64>>) {
// do nothing for now
}
fn test() {
let filename = "matrix.tsv";
// loads matrix into a Vec<Vec<f64>>
let mut matrix = load_matrix(filename);
// Determine number of cpus
let ncpus = num_cpus::get();
println!("Number of cpus on this machine: {}", ncpus);
for i in 0..ncpus {
// In the next line the "error: capture of moved value: matrix" happens
let _ = thread::spawn(move || {
println!("Thread number: {}", i);
process(&matrix);
});
}
let d = Duration::from_millis(1000 * 1000);
thread::sleep(d);
}
Wrap the object to be shared into an Arc which stands for atomically reference counted (pointer). For each thread, clone this pointer and pass ownership of the clone to the thread. The wrapped object will be deallocated when it's no longer used by anything.
fn process(matrix: &Vec<Vec<f64>>) {
// do nothing for now
}
fn test() {
use std::sync::Arc;
let filename = "matrix.tsv";
// loads matrix into a Vec<Vec<f64>>
let mut matrix = Arc::new(load_matrix(filename));
// Determine number of cpus
let ncpus = num_cpus::get();
println!("Number of cpus on this machine: {}", ncpus);
for i in 0..ncpus {
let matrix = matrix.clone();
let _ = thread::spawn(move || {
println!("Thread number: {}", i);
process(&matrix);
});
}
let d = Duration::from_millis(1000 * 1000);
thread::sleep(d);
}
Related
I'm trying to do some parallel processing on a list of values:
fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
let chunk_size = 100;
let output_list = vec![0.0f32;list.len()];
thread::scope(|s| { // Block S
(0..list.len()).collect::<Vec<_>>().chunks(chunk_size).for_each(|chunk| { // Block T
s.spawn(|| {
chunk.into_iter().for_each(|&idx| {
let value = calc_value(list[idx]);
unsafe {
let out = (output_list.as_ptr() as *mut f32).offset(idx as isize);
*out = value;
}
});
});
});
});
output_list
}
The API says that thread::scope() only returns once each thread spawned by the scope it creates has finished. However, the compiler is telling me that the temporary range object (0..list.len()) is deconstructed while the threads that use it might still be alive.
I'm curious about what's actually happening under the hood. My intuition tells me that each thread spawned and variable created within Block S would both have Block S's lifetime. But clearly the threads have a lifetime longer than Block S.
Why aren't these lifetimes be the same?
Is the best practice here to create a variable in Block F that serves the purpose of the temporary like so:
fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
let chunk_size = 100;
let output_list = vec![0.0f32;list.len()];
let range = (0..list.len()).collect::<Vec<_>>();
thread::scope(|s| { // Block S
range.chunks(chunk_size).for_each(|chunk| { // Block T
s.spawn(|| {
chunk.into_iter().for_each(|&idx| {
let value = calc_value(list[idx]);
unsafe {
let out = (output_list.as_ptr() as *mut f32).offset(idx as isize);
*out = value;
}
});
});
});
});
output_list
}
You don't have to ask SO to find out what happens under the hood of std functions, their source code is readily available, but since you asked I'll try to explain a little more in the comments.
pub fn scope<'env, F, T>(f: F) -> T
where
F: for<'scope> FnOnce(&'scope Scope<'scope, 'env>) -> T,
{
// `Scope` creation not very relevant to the issue at hand
let scope = ...;
// here your 'Block S' gets run and returns.
let result = catch_unwind(AssertUnwindSafe(|| f(&scope)));
// the above is just a fancy way of calling `f` while catching any panics.
// but we're waiting for threads to finish running until here
while scope.data.num_running_threads.load(Ordering::Acquire) != 0 {
park();
}
// further not so relevant cleanup code
//...
}
So as you can see your assumption that 'Block S' will stick around as long as any of the threads is wrong.
And yes the solution is to capture the owner of the chunks before you call thread::scope.
There is also no reason to dive into unsafe for your example, you can use zip instead:
fn process_list(list: Vec<f32>) -> Vec<f32> { // Block F
let chunk_size = 100;
let mut output_list = vec![0.0f32; list.len()];
let mut zipped = list
.into_iter()
.zip(output_list.iter_mut())
.collect::<Vec<_>>();
thread::scope(|s| { // Block S
zipped.chunks_mut(chunk_size).for_each(|chunk| { // Block T
s.spawn(|| {
chunk.into_iter().for_each(|(v, out)| {
let value = calc_value(*v);
**out = value;
});
});
});
});
output_list
}
i'm new to rust.
I'm trying to write a file_sensor that will start a counter after a file is created. The plan is that after an amount of time, if a second file is not received the sensor will exit with a zero exit code.
I could write the code to continue that work but i feel the code below illustrates the problem (i have also missed for example the post function referred to)
I have been struggling with this problem for several hours, i've tried Arc and mutex's and even global variables.
The Timer implementation is Ticktock-rs
I need to be able to either get heartbeat in the match body for EventKind::Create(CreateKind::Folder) or file_count in the loop
The code i've attached here runs but file_count is always zero in the loop.
use std::env;
use std::path::Path;
use std::{thread, time};
use std::process::ExitCode;
use ticktock::Timer;
use notify::{
Watcher,
RecommendedWatcher,
RecursiveMode,
Result,
event::{EventKind, CreateKind, ModifyKind, Event}
};
fn main() -> Result<()> {
let now = time::Instant::now();
let mut heartbeat = Timer::apply(
|_, count| {
*count += 1;
*count
},
0,
)
.every(time::Duration::from_millis(500))
.start(now);
let mut file_count = 0;
let args = Args::parse();
let REQUEST_SENSOR_PATH = env::var("REQUEST_SENSOR_PATH").expect("$REQUEST_SENSOR_PATH} is not set");
let mut watcher = notify::recommended_watcher(move|res: Result<Event>| {
match res {
Ok(event) => {
match event.kind {
EventKind::Create(CreateKind::File) => {
file_count += 1;
// do something with file
}
_ => { /* something else changed */ }
}
println!("{:?}", event);
},
Err(e) => {
println!("watch error: {:?}", e);
ExitCode::from(101);
},
}
})?;
watcher.watch(Path::new(&REQUEST_SENSOR_PATH), RecursiveMode::Recursive)?;
loop {
let now = time::Instant::now();
if let Some(n) = heartbeat.update(now){
println!("Heartbeat: {}, fileCount: {}", n, file_count);
if n > 10 {
heartbeat.set_value(0);
// This function will reset timer when a file arrives
}
}
}
Ok(())
}
Your compiler warnings show you the problem:
warning: unused variable: `file_count`
--> src/main.rs:31:25
|
31 | file_count += 1;
| ^^^^^^^^^^
|
= note: `#[warn(unused_variables)]` on by default
= help: did you mean to capture by reference instead?
The problem here is that you use file_count inside of a move || closure. file_count is an i32, which is Copy. Using it in a move || closure actually creates a copy of it, which does no longer update the original variable if you assign to it.
Either way, it's impossible to modify a variable in main() from an event handler. Event handlers require 'static lifetime if they reference things, because Rust cannot guarantee that the event handler lives shorter than main.
One solution for this problem is to use reference counters and interior mutability. In this case, I will use Arc for reference counters and AtomicI32 for interior mutability. Note that notify::recommended_watcher requires thread safety, otherwise instead of an Arc<AtomicI32> we could have used an Rc<Cell<i32>>, which is the same thing but only for single-threaded environments, with a little less overhead.
use notify::{
event::{CreateKind, Event, EventKind},
RecursiveMode, Result, Watcher,
};
use std::time;
use std::{env, sync::atomic::Ordering};
use std::{path::Path, sync::Arc};
use std::{process::ExitCode, sync::atomic::AtomicI32};
use ticktock::Timer;
fn main() -> Result<()> {
let now = time::Instant::now();
let mut heartbeat = Timer::apply(
|_, count| {
*count += 1;
*count
},
0,
)
.every(time::Duration::from_millis(500))
.start(now);
let file_count = Arc::new(AtomicI32::new(0));
let REQUEST_SENSOR_PATH =
env::var("REQUEST_SENSOR_PATH").expect("$REQUEST_SENSOR_PATH} is not set");
let mut watcher = notify::recommended_watcher({
let file_count = Arc::clone(&file_count);
move |res: Result<Event>| {
match res {
Ok(event) => {
match event.kind {
EventKind::Create(CreateKind::File) => {
file_count.fetch_add(1, Ordering::AcqRel);
// do something with file
}
_ => { /* something else changed */ }
}
println!("{:?}", event);
}
Err(e) => {
println!("watch error: {:?}", e);
ExitCode::from(101);
}
}
}
})?;
watcher.watch(Path::new(&REQUEST_SENSOR_PATH), RecursiveMode::Recursive)?;
loop {
let now = time::Instant::now();
if let Some(n) = heartbeat.update(now) {
println!(
"Heartbeat: {}, fileCount: {}",
n,
file_count.load(Ordering::Acquire)
);
if n > 10 {
heartbeat.set_value(0);
// This function will reset timer when a file arrives
}
}
}
}
Also, note that the ExitCode::from(101); gives you a warning. It does not actually exit the program, it only creates an exit code variable and then discards it again. You probably intended to write std::process::exit(101);. Although I would discourage it, because it does not properly clean up (does not call any Drop implementations). I'd use panic here, instead. This is the exact usecase panic is meant for.
I want to spawn n threads with the ability to communicate to other threads in a ring topology, e.g. thread 0 can send messages to thread 1, thread 1 to thread 2, etc. and thread n to thread 0.
This is an example of what I want to achieve with n=3:
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;
let (tx0, rx0): (Sender<i32>, Receiver<i32>) = mpsc::channel();
let (tx1, rx1): (Sender<i32>, Receiver<i32>) = mpsc::channel();
let (tx2, rx2): (Sender<i32>, Receiver<i32>) = mpsc::channel();
let child0 = thread::spawn(move || {
tx0.send(0).unwrap();
println!("thread 0 sent: 0");
println!("thread 0 recv: {:?}", rx2.recv().unwrap());
});
let child1 = thread::spawn(move || {
tx1.send(1).unwrap();
println!("thread 1 sent: 1");
println!("thread 1 recv: {:?}", rx0.recv().unwrap());
});
let child2 = thread::spawn(move || {
tx2.send(2).unwrap();
println!("thread 2 sent: 2");
println!("thread 2 recv: {:?}", rx1.recv().unwrap());
});
child0.join();
child1.join();
child2.join();
Here I create channels in a loop, store them in a vector, reorder the senders, store them in a new vector and then spawn threads each with their own Sender-Receiver (tx1/rx0, tx2/rx1, etc.) pair.
const NTHREADS: usize = 8;
// create n channels
let channels: Vec<(Sender<i32>, Receiver<i32>)> =
(0..NTHREADS).into_iter().map(|_| mpsc::channel()).collect();
// switch tupel entries for the senders to create ring topology
let mut channels_ring: Vec<(Sender<i32>, Receiver<i32>)> = (0..NTHREADS)
.into_iter()
.map(|i| {
(
channels[if i < channels.len() - 1 { i + 1 } else { 0 }].0,
channels[i].1,
)
})
.collect();
let mut children = Vec::new();
for i in 0..NTHREADS {
let (tx, rx) = channels_ring.remove(i);
let child = thread::spawn(move || {
tx.send(i).unwrap();
println!("thread {} sent: {}", i, i);
println!("thread {} recv: {:?}", i, rx.recv().unwrap());
});
children.push(child);
}
for child in children {
let _ = child.join();
}
This doesn't work, because Sender cannot be copied to create a new vector.
However, if I use refs (& Sender):
let mut channels_ring: Vec<(&Sender<i32>, Receiver<i32>)> = (0..NTHREADS)
.into_iter()
.map(|i| {
(
&channels[if i < channels.len() - 1 { i + 1 } else { 0 }].0,
channels[i].1,
)
})
.collect();
I cannot spawn the threads, because std::sync::mpsc::Sender<i32> cannot be shared between threads safely.
Senders and Receivers cannot be shared so you need to move them into their respective threads. That means removing them from the Vec or else consuming the Vec while iterating it - the vector is not permitted to be in an invalid state (with holes), even as an intermediate step. Iterating over the vectors with into_iter will achieve that by consuming them.
A little trick you can use to get the the senders and receivers to pair up in a cycle, is to create two vectors; one of senders and one of receivers; and then rotate one so that the same index into each vector will give you the pairs you want.
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;
fn main() {
const NTHREADS: usize = 8;
// create n channels
let (mut senders, receivers): (Vec<Sender<i32>>, Vec<Receiver<i32>>) =
(0..NTHREADS).into_iter().map(|_| mpsc::channel()).unzip();
// move the first sender to the back
senders.rotate_left(1);
let children: Vec<_> = senders
.into_iter()
.zip(receivers.into_iter())
.enumerate()
.map(|(i, (tx, rx))| {
thread::spawn(move || {
tx.send(i as i32).unwrap();
println!("thread {} sent: {}", i, i);
println!("thread {} recv: {:?}", i, rx.recv().unwrap());
})
})
.collect();
for child in children {
let _ = child.join();
}
}
This doesn't work, because Sender cannot be copied to create a new vector. However, if I use refs (& Sender):
While it's true that Sender cannot be copied, it does implement Clone, so you can always clone it manually. But that approach won't work for Receiver, which is not Clone and which you also need to extract from the vector.
The problem with your first code is that you cannot use let foo = vec[i] to move just one value out of a vector of non-Copy values. That would leave the vector in an invalid state, with one element invalid, subsequent access to which would cause undefined behavior. For this to work, Vec would need to track which elements were moved and which not, which would impose a cost on all Vecs. So instead, Vec disallows moving an element out of it, leaving it to the user to track moves.
A simple way to move a value out of Vec is to replace Vec<T> with Vec<Option<T>> and use Option::take. foo = vec[i] is replaced with foo = vec[i].take().unwrap(), which moves the T value from the option in vec[i] (while asserting that it's not None) and leaves None, a valid variant of Option<T>, in the vector. Here is your first attempt modified in that manner (playground):
const NTHREADS: usize = 8;
let channels_ring: Vec<_> = {
let mut channels: Vec<_> = (0..NTHREADS)
.into_iter()
.map(|_| {
let (tx, rx) = mpsc::channel();
(Some(tx), Some(rx))
})
.collect();
(0..NTHREADS)
.into_iter()
.map(|rxpos| {
let txpos = if rxpos < NTHREADS - 1 { rxpos + 1 } else { 0 };
(
channels[txpos].0.take().unwrap(),
channels[rxpos].1.take().unwrap(),
)
})
.collect()
};
let children: Vec<_> = channels_ring
.into_iter()
.enumerate()
.map(|(i, (tx, rx))| {
thread::spawn(move || {
tx.send(i as i32).unwrap();
println!("thread {} sent: {}", i, i);
println!("thread {} recv: {:?}", i, rx.recv().unwrap());
})
})
.collect();
for child in children {
child.join().unwrap();
}
This question already has an answer here:
How do I pass disjoint slices from a vector to different threads?
(1 answer)
Closed 4 years ago.
I've got an embarrassingly parallel bit of graphics rendering code that I would like to run across my CPU cores. I've coded up a test case (the function computed is nonsense) to explore how I might parallelize it. I'd like to code this using std Rust in order to learn about using std::thread. But, I don't understand how to give each thread a portion of the framebuffer. I'll put the full testcase code below, but I'll try to break it down first.
The sequential form is super simple:
let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
for j in 0..HEIGHT {
for i in 0..WIDTH {
buffer0[j][i] = compute(i as i32,j as i32);
}
}
I thought that it would help to make a buffer that was the same size, but re-arranged to be 3D & indexed by core first. This is the same computation, just a reordering of the data to show the workings.
let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
for c in 0..num_logical_cores {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
buffer1[c][y][i] = compute(i as i32,j as i32)
}
}
}
But, when I try to put the inner part of the code in a closure & create a thread, I get errors about the buffer & lifetimes. I basically don't understand what to do & could use some guidance. I want per_core_buffer to just temporarily refer to the data in buffer2 that belongs to that core & allow it to be written, synchronize all the threads & then read buffer2 afterwards. Is this possible?
let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let mut handles = Vec::new();
for c in 0..num_logical_cores {
let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
let handle = thread::spawn(move || {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
per_core_buffer[y][i] = compute(i as i32,j as i32)
}
}
});
handles.push(handle)
}
for handle in handles {
handle.join().unwrap();
}
The error is this & I don't understand:
error[E0597]: `buffer2` does not live long enough
--> src/main.rs:50:36
|
50 | let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
| ^^^^^^^ borrowed value does not live long enough
...
88 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
The full testcase is:
extern crate num_cpus;
use std::time::Instant;
use std::thread;
fn compute(x: i32, y: i32) -> i32 {
(x*y) % (x+y+10000)
}
fn main() {
let num_logical_cores = num_cpus::get();
const WIDTH: usize = 40000;
const HEIGHT: usize = 10000;
let y_per_core = HEIGHT/num_logical_cores + 1;
// ------------------------------------------------------------
// Serial Calculation...
let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
let start0 = Instant::now();
for j in 0..HEIGHT {
for i in 0..WIDTH {
buffer0[j][i] = compute(i as i32,j as i32);
}
}
let dur0 = start0.elapsed();
// ------------------------------------------------------------
// On the way to Parallel Calculation...
// Reorder the data buffer to be 3D with one 2D region per core.
let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let start1 = Instant::now();
for c in 0..num_logical_cores {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
buffer1[c][y][i] = compute(i as i32,j as i32)
}
}
}
let dur1 = start1.elapsed();
// ------------------------------------------------------------
// Actual Parallel Calculation...
let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let mut handles = Vec::new();
let start2 = Instant::now();
for c in 0..num_logical_cores {
let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
let handle = thread::spawn(move || {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
per_core_buffer[y][i] = compute(i as i32,j as i32)
}
}
});
handles.push(handle)
}
for handle in handles {
handle.join().unwrap();
}
let dur2 = start2.elapsed();
println!("Runtime: Serial={0:.3}ms, AlmostParallel={1:.3}ms, Parallel={2:.3}ms",
1000.*dur0.as_secs() as f64 + 1e-6*(dur0.subsec_nanos() as f64),
1000.*dur1.as_secs() as f64 + 1e-6*(dur1.subsec_nanos() as f64),
1000.*dur2.as_secs() as f64 + 1e-6*(dur2.subsec_nanos() as f64));
// Sanity check
for j in 0..HEIGHT {
let c = j % num_logical_cores;
let y = j / num_logical_cores;
for i in 0..WIDTH {
if buffer0[j][i] != buffer1[c][y][i] {
println!("wtf1? {0} {1} {2} {3}",i,j,buffer0[j][i],buffer1[c][y][i])
}
if buffer0[j][i] != buffer2[c][y][i] {
println!("wtf2? {0} {1} {2} {3}",i,j,buffer0[j][i],buffer2[c][y][i])
}
}
}
}
Thanks to #Shepmaster for the pointers and clarification that this is not an easy problem for Rust, and that I needed to consider crates to find a reasonable solution. I'm only just starting out in Rust, so this really wasn't clear to me.
I liked the ability to control the number of threads that scoped_threadpool gives, so I went with that. Translating my code from above directly, I tried to use the 4D buffer with core as the most-significant-index and that ran into troubles because that 3D vector does not implement the Copy trait. The fact that it implements Copy makes me concerned about performance, but I went back to the original problem and implemented it more directly & found a reasonable speedup by making each row a thread. Copying each row will not be a large memory overhead.
The code that works for me is:
let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
let mut pool = Pool::new(num_logical_cores as u32);
pool.scoped(|scope| {
let mut y = 0;
for e in &mut buffer2 {
scope.execute(move || {
for x in 0..WIDTH {
(*e)[x] = compute(x as i32,y as i32);
}
});
y += 1;
}
});
On a 6 core, 12 thread i7-8700K for 400000x4000 testcase this runs in 3.2 seconds serially & 481ms in parallel--a reasonable speedup.
EDIT: I continued to think about this issue and got a suggestion from Rustlang on twitter that I should consider rayon. I converted my code to rayon and got similar speedup with the following code.
let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
buffer2
.par_iter_mut()
.enumerate()
.map(|(y,e): (usize, &mut Vec<i32>)| {
for x in 0..WIDTH {
(*e)[x] = compute(x as i32,y as i32);
}
})
.collect::<Vec<_>>();
I'm not sure I understand Rust's concurrency support with Mutexes and condition variables. In the following code, the main thread sets the poll_thread to be idle for two seconds, then to "read a register" for 2 seconds, and then return to "idle":
use std::thread;
use std::sync::{Arc, Mutex, Condvar};
use std::time;
#[derive(PartialEq, Debug)]
enum Command {
Idle,
ReadRegister(u32),
}
fn poll_thread(sync_pair: Arc<(Mutex<Command>, Condvar)>) {
let &(ref mutex, ref cvar) = &*sync_pair;
loop {
let mut flag = mutex.lock().unwrap();
while *flag == Command::Idle {
flag = cvar.wait(flag).unwrap();
}
match *flag {
Command::Idle => {
println!("WHAT IMPOSSIBLE!");
panic!();
}
Command::ReadRegister(i) => {
println!("You want me to read {}?", i);
thread::sleep(time::Duration::from_millis(450));
println!("Ok, here it is: {}", 42);
}
}
}
}
pub fn main() {
let pair = Arc::new((Mutex::new(Command::Idle), Condvar::new()));
let pclone = pair.clone();
let rx_thread = thread::spawn(|| poll_thread(pclone));
let &(ref mutex, ref cvar) = &*pair;
for i in 0..10 {
thread::sleep(time::Duration::from_millis(500));
if i == 4 {
println!("Setting ReadRegister");
let mut flag = mutex.lock().unwrap();
*flag = Command::ReadRegister(5);
println!("flag is = {:?}", *flag);
cvar.notify_one();
} else if i == 8 {
println!("Setting Idle");
let mut flag = mutex.lock().unwrap();
*flag = Command::Idle;
println!("flag is = {:?}", *flag);
cvar.notify_one();
}
}
println!("after notify_one()");
rx_thread.join();
}
This works as expected, but when the line to sleep for 450 milliseconds is uncommented, the code will often remain in the "read" state and not return to waiting on the condition variable cvar.wait(). Sometimes it will return to idle after, say, 15 seconds!
I would think that when poll_thread reaches the bottom of the loop, it would release the lock, allowing main to acquire and set flag = Command::Idle, and within roughly half a second, poll_thread would return to idle, but it appears that isn't happening when poll_thread sleeps. Why?