Sharing arrays between threads in Rust - rust

I'm new to Rust and I'm struggling with some ownership semantics.
The goal is to do some nonsense measurements on multiplying 2 f64 arrays and writing the result in a third array.
In the single-threaded version, a single thread takes care of the whole range. In the multi-threaded version, each thread takes care of a segment of the range.
The single-threaded version is easy, but my problem is with the multithreaded version where I'm struggling with the ownership rules.
I was thinking to use raw pointers, to bypass the borrow checker. But I'm still not able to make it pass.
#![feature(box_syntax)]
use std::time::SystemTime;
use rand::Rng;
use std::thread;
fn main() {
let nCells = 1_000_000;
let concurrency = 1;
let mut one = box [0f64; 1_000_000];
let mut two = box [0f64; 1_000_000];
let mut res = box [0f64; 1_000_000];
println!("Creating data");
let mut rng = rand::thread_rng();
for i in 0..nCells {
one[i] = rng.gen::<f64>();
two[i] = rng.gen::<f64>();
res[i] = 0 as f64;
}
println!("Finished creating data");
let rounds = 100000;
let start = SystemTime::now();
let one_raw = Box::into_raw(one);
let two_raw = Box::into_raw(two);
let res_raw = Box::into_raw(res);
let mut handlers = Vec::new();
for _ in 0..rounds {
let sizePerJob = nCells / concurrency;
for j in 0..concurrency {
let from = j * sizePerJob;
let to = (j + 1) * sizePerJob;
handlers.push(thread::spawn(|| {
unsafe {
unsafe {
processData(one_raw, two_raw, res_raw, from, to);
}
}
}));
}
for j in 0..concurrency {
handlers.get_mut(j).unwrap().join();
}
handlers.clear();
}
let durationUs = SystemTime::now().duration_since(start).unwrap().as_micros();
let durationPerRound = durationUs / rounds;
println!("duration per round {} us", durationPerRound);
}
// Make sure we can find the function in the generated Assembly
#[inline(never)]
pub fn processData(one: *const [f64;1000000],
two: *const [f64;1000000],
res: *mut [f64;1000000],
from: usize,
to: usize) {
unsafe {
for i in from..to {
(*res)[i] = (*one)[i] * (*two)[i];
}
}
}
This is the error I'm getting
error[E0277]: `*mut [f64; 1000000]` cannot be shared between threads safely
--> src/main.rs:38:27
|
38 | handlers.push(thread::spawn(|| {
| ^^^^^^^^^^^^^ `*mut [f64; 1000000]` cannot be shared between threads safely
|
= help: the trait `Sync` is not implemented for `*mut [f64; 1000000]`
= note: required because of the requirements on the impl of `Send` for `&*mut [f64; 1000000]`
note: required because it's used within this closure
--> src/main.rs:38:41
|
38 | handlers.push(thread::spawn(|| {
| ^^
note: required by a bound in `spawn`
--> /home/pveentjer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:653:8
|
653 | F: Send + 'static,
| ^^^^ required by this bound in `spawn`
[edit] I know that spawning threads is very expensive. I'll convert this to a pool of worker threads that can be recycled once this code is up and running.

You can use chunks_mut or split_at_mut to get non-overlapping slices of one two and res. You can then access different slices from different threads safely. See: documentation for chunks_mut and documentation for split_at_mut
I was able to compile it using scoped threads and chunks_mut. I have removed all the unsafe stuff because there is no need. See the code:
#![feature(box_syntax)]
#![feature(scoped_threads)]
use rand::Rng;
use std::thread;
use std::time::SystemTime;
fn main() {
let nCells = 1_000_000;
let concurrency = 2;
let mut one = box [0f64; 1_000_000];
let mut two = box [0f64; 1_000_000];
let mut res = box [0f64; 1_000_000];
println!("Creating data");
let mut rng = rand::thread_rng();
for i in 0..nCells {
one[i] = rng.gen::<f64>();
two[i] = rng.gen::<f64>();
res[i] = 0 as f64;
}
println!("Finished creating data");
let rounds = 1000;
let start = SystemTime::now();
for _ in 0..rounds {
let size_per_job = nCells / concurrency;
thread::scope(|s| {
for it in one
.chunks_mut(size_per_job)
.zip(two.chunks_mut(size_per_job))
.zip(res.chunks_mut(size_per_job))
{
let ((one, two), res) = it;
s.spawn(|| {
processData(one, two, res);
});
}
});
}
let durationUs = SystemTime::now().duration_since(start).unwrap().as_micros();
let durationPerRound = durationUs / rounds;
println!("duration per round {} us", durationPerRound);
}
// Make sure we can find the function in the generated Assembly
#[inline(never)]
pub fn processData(one: &[f64], two: &[f64], res: &mut [f64]) {
for i in 0..one.len() {
res[i] = one[i] * two[i];
}
}

Related

How to create threads that last entire duration of program and pass immutable chunks for threads to operate on?

I have a bunch of math that has real time constraints. My main loop will just call this function repeatedly and it will always store results into an existing buffer. However, I want to be able to spawn the threads at init time and then allow the threads to run and do their work and then wait for more data. The synchronization I will use a Barrier and have that part working. What I can't get working and have tried various iterations of Arc or crossbeam is splitting the thread spawning up and the actual workload. This is what I have now.
pub const WORK_SIZE: usize = 524_288;
pub const NUM_THREADS: usize = 6;
pub const NUM_TASKS_PER_THREAD: usize = WORK_SIZE / NUM_THREADS;
fn main() {
let mut work: Vec<f64> = Vec::with_capacity(WORK_SIZE);
for i in 0..WORK_SIZE {
work.push(i as f64);
}
crossbeam::scope(|scope| {
let threads: Vec<_> = work
.chunks(NUM_TASKS_PER_THREAD)
.map(|chunk| scope.spawn(move |_| chunk.iter().cloned().sum::<f64>()))
.collect();
let threaded_time = std::time::Instant::now();
let thread_sum: f64 = threads.into_iter().map(|t| t.join().unwrap()).sum();
let threaded_micros = threaded_time.elapsed().as_micros() as f64;
println!("threaded took: {:#?}", threaded_micros);
let serial_time = std::time::Instant::now();
let no_thread_sum: f64 = work.iter().cloned().sum();
let serial_micros = serial_time.elapsed().as_micros() as f64;
println!("serial took: {:#?}", serial_micros);
assert_eq!(thread_sum, no_thread_sum);
println!(
"Threaded performace was {:?}",
serial_micros / threaded_micros
);
})
.unwrap();
}
But I can't find a way to spin these threads up in an init function and then in a do_work function pass work into them. I attempted to do something like this with Arc's and Mutex's but couldn't get everything straight there either. What I want to turn this into is something like the following
use std::sync::{Arc, Barrier, Mutex};
use std::{slice::Chunks, thread::JoinHandle};
pub const WORK_SIZE: usize = 524_288;
pub const NUM_THREADS: usize = 6;
pub const NUM_TASKS_PER_THREAD: usize = WORK_SIZE / NUM_THREADS;
//simplified version of what actual work that code base will do
fn do_work(data: &[f64], result: Arc<Mutex<f64>>, barrier: Arc<Barrier>) {
loop {
barrier.wait();
let sum = data.into_iter().cloned().sum::<f64>();
let mut result = *result.lock().unwrap();
result += sum;
}
}
fn init(
mut data: Chunks<'_, f64>,
result: &Arc<Mutex<f64>>,
barrier: &Arc<Barrier>,
) -> Vec<std::thread::JoinHandle<()>> {
let mut handles = Vec::with_capacity(NUM_THREADS);
//spawn threads, in actual code these would be stored in a lib crate struct
for i in 0..NUM_THREADS {
let result = result.clone();
let barrier = barrier.clone();
let chunk = data.nth(i).unwrap();
handles.push(std::thread::spawn(|| {
//Pass the particular thread the particular chunk it will operate on.
do_work(chunk, result, barrier);
}));
}
handles
}
fn main() {
let mut work: Vec<f64> = Vec::with_capacity(WORK_SIZE);
let mut result = Arc::new(Mutex::new(0.0));
for i in 0..WORK_SIZE {
work.push(i as f64);
}
let work_barrier = Arc::new(Barrier::new(NUM_THREADS + 1));
let threads = init(work.chunks(NUM_TASKS_PER_THREAD), &result, &work_barrier);
loop {
work_barrier.wait();
//actual code base would do something with summation stored in result.
println!("{:?}", result.lock().unwrap());
}
}
I hope this expresses the intent clearly enough of what I need to do. The issue with this specific implementation is that the chunks don't seem to live long enough and when I tried wrapping them in an Arc as it just moved the argument doesn't live long enough to the Arc::new(data.chunk(_)) line.
use std::sync::{Arc, Barrier, Mutex};
use std::thread;
pub const WORK_SIZE: usize = 524_288;
pub const NUM_THREADS: usize = 6;
pub const NUM_TASKS_PER_THREAD: usize = WORK_SIZE / NUM_THREADS;
//simplified version of what actual work that code base will do
fn do_work(data: &[f64], result: Arc<Mutex<f64>>, barrier: Arc<Barrier>) {
loop {
barrier.wait();
let sum = data.iter().sum::<f64>();
*result.lock().unwrap() += sum;
}
}
fn init(
work: Vec<f64>,
result: Arc<Mutex<f64>>,
barrier: Arc<Barrier>,
) -> Vec<thread::JoinHandle<()>> {
let mut handles = Vec::with_capacity(NUM_THREADS);
//spawn threads, in actual code these would be stored in a lib crate struct
for i in 0..NUM_THREADS {
let slice = work[i * NUM_TASKS_PER_THREAD..(i + 1) * NUM_TASKS_PER_THREAD].to_owned();
let result = Arc::clone(&result);
let w = Arc::clone(&barrier);
handles.push(thread::spawn(move || {
do_work(&slice, result, w);
}));
}
handles
}
fn main() {
let mut work: Vec<f64> = Vec::with_capacity(WORK_SIZE);
let result = Arc::new(Mutex::new(0.0));
for i in 0..WORK_SIZE {
work.push(i as f64);
}
let work_barrier = Arc::new(Barrier::new(NUM_THREADS + 1));
let _threads = init(work, Arc::clone(&result), Arc::clone(&work_barrier));
loop {
thread::sleep(std::time::Duration::from_secs(3));
work_barrier.wait();
//actual code base would do something with summation stored in result.
println!("{:?}", result.lock().unwrap());
}
}

How to use a clone in a Rust thread

In this rust program, inside the run function, I am trying to pass the "pair_clone" as a parameter for both threads but I keep getting a mismatched type error? I thought I was passing the pair but it says I'm passing an integer instead.
use std::sync::{Arc, Mutex, Condvar};
fn producer(pair: &(Mutex<bool>, Condvar), num_of_loops: u32) {
let (mutex, cv) = pair;
//prints "producing"
}
}
fn consumer(pair: &(Mutex<bool>, Condvar), num_of_loops: u32) {
let (mutex, cv) = pair;
//prints "consuming"
}
}
pub fn run() {
println!("Main::Begin");
let num_of_loops = 5;
let num_of_threads = 4;
let mut array_of_threads = vec!();
let pair = Arc ::new((Mutex::new(true), Condvar::new()));
for pair in 0..num_of_threads {
let pair_clone = pair.clone();
array_of_threads.push(std::thread::spawn( move || producer(&pair_clone, num_of_loops)));
array_of_threads.push(std::thread::spawn( move || consumer(&pair_clone, num_of_loops)));
}
for i in array_of_threads {
i.join().unwrap();
}
println!("Main::End");
}
You have two main errors
The first: you are using the name of the pair as the loop index. This makes pair be the integer the compiler complains about.
The second: you are using one copy while you need two, one for the producer and the other for the consumer
After Edit
use std::sync::{Arc, Mutex, Condvar};
fn producer(pair: &(Mutex<bool>, Condvar), num_of_loops: u32) {
let (mutex, cv) = pair;
//prints "producing"
}
fn consumer(pair: &(Mutex<bool>, Condvar), num_of_loops: u32) {
let (mutex, cv) = pair;
//prints "consuming"
}
pub fn run() {
println!("Main::Begin");
let num_of_loops = 5;
let num_of_threads = 4;
let mut array_of_threads = vec![];
let pair = Arc ::new((Mutex::new(true), Condvar::new()));
for _ in 0..num_of_threads {
let pair_clone1 = pair.clone();
let pair_clone2 = pair.clone();
array_of_threads.push(std::thread::spawn( move || producer(&pair_clone1, num_of_loops)));
array_of_threads.push(std::thread::spawn( move || consumer(&pair_clone2, num_of_loops)));
}
for i in array_of_threads {
i.join().unwrap();
}
println!("Main::End");
}
Demo
Note that I haven't given any attention to the code quality. just fixed the compile errors.

Future created by async block is not `Send` because of *mut u8 [duplicate]

This question already has an answer here:
How to send a pointer to another thread?
(1 answer)
Closed 5 months ago.
I was able to proceed forward to implement my asynchronous udp server. However I have this error showing up twice because my variable data has type *mut u8 which is not Send:
error: future cannot be sent between threads safely
help: within `impl std::future::Future`, the trait `std::marker::Send` is not implemented for `*mut u8`
note: captured value is not `Send`
And the code (MRE):
use std::error::Error;
use std::time::Duration;
use std::env;
use tokio::net::UdpSocket;
use tokio::{sync::mpsc, task, time}; // 1.4.0
use std::alloc::{alloc, Layout};
use std::mem;
use std::mem::MaybeUninit;
use std::net::SocketAddr;
const UDP_HEADER: usize = 8;
const IP_HEADER: usize = 20;
const AG_HEADER: usize = 4;
const MAX_DATA_LENGTH: usize = (64 * 1024 - 1) - UDP_HEADER - IP_HEADER;
const MAX_CHUNK_SIZE: usize = MAX_DATA_LENGTH - AG_HEADER;
const MAX_DATAGRAM_SIZE: usize = 0x10000;
/// A wrapper for [ptr::copy_nonoverlapping] with different argument order (same as original memcpy)
unsafe fn memcpy(dst_ptr: *mut u8, src_ptr: *const u8, len: usize) {
std::ptr::copy_nonoverlapping(src_ptr, dst_ptr, len);
}
// Different from https://doc.rust-lang.org/std/primitive.u32.html#method.next_power_of_two
// Returns the [exponent] from the smallest power of two greater than or equal to n.
const fn next_power_of_two_exponent(n: u32) -> u32 {
return 32 - (n - 1).leading_zeros();
}
async fn run_server(socket: UdpSocket) {
let mut missing_indexes: Vec<u16> = Vec::new();
let mut peer_addr = MaybeUninit::<SocketAddr>::uninit();
let mut data = std::ptr::null_mut(); // ptr for the file bytes
let mut len: usize = 0; // total len of bytes that will be written
let mut layout = MaybeUninit::<Layout>::uninit();
let mut buf = [0u8; MAX_DATA_LENGTH];
let mut start = false;
let (debounce_tx, mut debounce_rx) = mpsc::channel::<(usize, SocketAddr)>(3300);
let (network_tx, mut network_rx) = mpsc::channel::<(usize, SocketAddr)>(3300);
loop {
// Listen for events
let debouncer = task::spawn(async move {
let duration = Duration::from_millis(3300);
loop {
match time::timeout(duration, debounce_rx.recv()).await {
Ok(Some((size, peer))) => {
eprintln!("Network activity");
}
Ok(None) => {
if start == true {
eprintln!("Debounce finished");
break;
}
}
Err(_) => {
eprintln!("{:?} since network activity", duration);
}
}
}
});
// Listen for network activity
let server = task::spawn({
// async{
let debounce_tx = debounce_tx.clone();
async move {
while let Some((size, peer)) = network_rx.recv().await {
// Received a new packet
debounce_tx.send((size, peer)).await.expect("Unable to talk to debounce");
eprintln!("Received a packet {} from: {}", size, peer);
let packet_index: u16 = (buf[0] as u16) << 8 | buf[1] as u16;
if start == false { // first bytes of a new file: initialization // TODO: ADD A MUTEX to prevent many initializations
start = true;
let chunks_cnt: u32 = (buf[2] as u32) << 8 | buf[3] as u32;
let n: usize = MAX_DATAGRAM_SIZE << next_power_of_two_exponent(chunks_cnt);
unsafe {
layout.as_mut_ptr().write(Layout::from_size_align_unchecked(n, mem::align_of::<u8>()));
// /!\ data has type `*mut u8` which is not `Send`
data = alloc(layout.assume_init());
peer_addr.as_mut_ptr().write(peer);
}
let a: Vec<u16> = vec![0; chunks_cnt as usize]; //(0..chunks_cnt).map(|x| x as u16).collect(); // create a sorted vector with all the required indexes
missing_indexes = a;
}
missing_indexes[packet_index as usize] = 1;
unsafe {
let dst_ptr = data.offset((packet_index as usize * MAX_CHUNK_SIZE) as isize);
memcpy(dst_ptr, &buf[AG_HEADER], size - AG_HEADER);
};
println!("receiving packet {} from: {}", packet_index, peer);
}
}
});
// Prevent deadlocks
drop(debounce_tx);
match socket.recv_from(&mut buf).await {
Ok((size, src)) => {
network_tx.send((size, src)).await.expect("Unable to talk to network");
}
Err(e) => {
eprintln!("couldn't recieve a datagram: {}", e);
}
}
}
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
let addr = env::args().nth(1).unwrap_or_else(|| "127.0.0.1:8080".to_string());
let socket = UdpSocket::bind(&addr).await?;
println!("Listening on: {}", socket.local_addr()?);
run_server(socket);
Ok(())
}
Since I was converting from synchronous to asynchronous code I know that, potentially, multiple thread would be writing to data, and that is probably why I encounter such error. But I don't know which syntax I could use to "clone" the mut ptr and make it unique for each thread (and same for the buffer).
As suggested by user4815162342 I think the best would be
to make pointer Send by wrapping it in a struct and declaring unsafe impl Send for NewStruct {}.
Any help strongly appreciated!
PS: Full code can be found on my github repository
Short version
Thanks to the comment of user4815162342 I decided to add an implementation for the mut ptr to be able to use it with Send and Sync, which allowed me to solve this part (there are still other issues, but beyond the scope of this question):
pub struct FileBuffer {
data: *mut u8
}
unsafe impl Send for FileBuffer {}
unsafe impl Sync for FileBuffer {}
//let mut data = std::ptr::null_mut(); // ptr for the file bytes
let mut fileBuffer: FileBuffer = FileBuffer { data: std::ptr::null_mut() };
Long version
use std::error::Error;
use std::time::Duration;
use std::env;
use tokio::net::UdpSocket;
use tokio::{sync::mpsc, task, time}; // 1.4.0
use std::alloc::{alloc, Layout};
use std::mem;
use std::mem::MaybeUninit;
use std::net::SocketAddr;
const UDP_HEADER: usize = 8;
const IP_HEADER: usize = 20;
const AG_HEADER: usize = 4;
const MAX_DATA_LENGTH: usize = (64 * 1024 - 1) - UDP_HEADER - IP_HEADER;
const MAX_CHUNK_SIZE: usize = MAX_DATA_LENGTH - AG_HEADER;
const MAX_DATAGRAM_SIZE: usize = 0x10000;
/// A wrapper for [ptr::copy_nonoverlapping] with different argument order (same as original memcpy)
unsafe fn memcpy(dst_ptr: *mut u8, src_ptr: *const u8, len: usize) {
std::ptr::copy_nonoverlapping(src_ptr, dst_ptr, len);
}
// Different from https://doc.rust-lang.org/std/primitive.u32.html#method.next_power_of_two
// Returns the [exponent] from the smallest power of two greater than or equal to n.
const fn next_power_of_two_exponent(n: u32) -> u32 {
return 32 - (n - 1).leading_zeros();
}
pub struct FileBuffer {
data: *mut u8
}
unsafe impl Send for FileBuffer {}
unsafe impl Sync for FileBuffer {}
async fn run_server(socket: UdpSocket) {
let mut missing_indexes: Vec<u16> = Vec::new();
let mut peer_addr = MaybeUninit::<SocketAddr>::uninit();
//let mut data = std::ptr::null_mut(); // ptr for the file bytes
let mut fileBuffer: FileBuffer = FileBuffer { data: std::ptr::null_mut() };
let mut len: usize = 0; // total len of bytes that will be written
let mut layout = MaybeUninit::<Layout>::uninit();
let mut buf = [0u8; MAX_DATA_LENGTH];
let mut start = false;
let (debounce_tx, mut debounce_rx) = mpsc::channel::<(usize, SocketAddr)>(3300);
let (network_tx, mut network_rx) = mpsc::channel::<(usize, SocketAddr)>(3300);
loop {
// Listen for events
let debouncer = task::spawn(async move {
let duration = Duration::from_millis(3300);
loop {
match time::timeout(duration, debounce_rx.recv()).await {
Ok(Some((size, peer))) => {
eprintln!("Network activity");
}
Ok(None) => {
if start == true {
eprintln!("Debounce finished");
break;
}
}
Err(_) => {
eprintln!("{:?} since network activity", duration);
}
}
}
});
// Listen for network activity
let server = task::spawn({
// async{
let debounce_tx = debounce_tx.clone();
async move {
while let Some((size, peer)) = network_rx.recv().await {
// Received a new packet
debounce_tx.send((size, peer)).await.expect("Unable to talk to debounce");
eprintln!("Received a packet {} from: {}", size, peer);
let packet_index: u16 = (buf[0] as u16) << 8 | buf[1] as u16;
if start == false { // first bytes of a new file: initialization // TODO: ADD A MUTEX to prevent many initializations
start = true;
let chunks_cnt: u32 = (buf[2] as u32) << 8 | buf[3] as u32;
let n: usize = MAX_DATAGRAM_SIZE << next_power_of_two_exponent(chunks_cnt);
unsafe {
layout.as_mut_ptr().write(Layout::from_size_align_unchecked(n, mem::align_of::<u8>()));
// /!\ data has type `*mut u8` which is not `Send`
fileBuffer.data = alloc(layout.assume_init());
peer_addr.as_mut_ptr().write(peer);
}
let a: Vec<u16> = vec![0; chunks_cnt as usize]; //(0..chunks_cnt).map(|x| x as u16).collect(); // create a sorted vector with all the required indexes
missing_indexes = a;
}
missing_indexes[packet_index as usize] = 1;
unsafe {
let dst_ptr = fileBuffer.data.offset((packet_index as usize * MAX_CHUNK_SIZE) as isize);
memcpy(dst_ptr, &buf[AG_HEADER], size - AG_HEADER);
};
println!("receiving packet {} from: {}", packet_index, peer);
}
}
});
// Prevent deadlocks
drop(debounce_tx);
match socket.recv_from(&mut buf).await {
Ok((size, src)) => {
network_tx.send((size, src)).await.expect("Unable to talk to network");
}
Err(e) => {
eprintln!("couldn't recieve a datagram: {}", e);
}
}
}
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
let addr = env::args().nth(1).unwrap_or_else(|| "127.0.0.1:8080".to_string());
let socket = UdpSocket::bind(&addr).await?;
println!("Listening on: {}", socket.local_addr()?);
run_server(socket);
Ok(())
}

How to give each CPU core mutable access to a portion of a Vec? [duplicate]

This question already has an answer here:
How do I pass disjoint slices from a vector to different threads?
(1 answer)
Closed 4 years ago.
I've got an embarrassingly parallel bit of graphics rendering code that I would like to run across my CPU cores. I've coded up a test case (the function computed is nonsense) to explore how I might parallelize it. I'd like to code this using std Rust in order to learn about using std::thread. But, I don't understand how to give each thread a portion of the framebuffer. I'll put the full testcase code below, but I'll try to break it down first.
The sequential form is super simple:
let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
for j in 0..HEIGHT {
for i in 0..WIDTH {
buffer0[j][i] = compute(i as i32,j as i32);
}
}
I thought that it would help to make a buffer that was the same size, but re-arranged to be 3D & indexed by core first. This is the same computation, just a reordering of the data to show the workings.
let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
for c in 0..num_logical_cores {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
buffer1[c][y][i] = compute(i as i32,j as i32)
}
}
}
But, when I try to put the inner part of the code in a closure & create a thread, I get errors about the buffer & lifetimes. I basically don't understand what to do & could use some guidance. I want per_core_buffer to just temporarily refer to the data in buffer2 that belongs to that core & allow it to be written, synchronize all the threads & then read buffer2 afterwards. Is this possible?
let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let mut handles = Vec::new();
for c in 0..num_logical_cores {
let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
let handle = thread::spawn(move || {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
per_core_buffer[y][i] = compute(i as i32,j as i32)
}
}
});
handles.push(handle)
}
for handle in handles {
handle.join().unwrap();
}
The error is this & I don't understand:
error[E0597]: `buffer2` does not live long enough
--> src/main.rs:50:36
|
50 | let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
| ^^^^^^^ borrowed value does not live long enough
...
88 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
The full testcase is:
extern crate num_cpus;
use std::time::Instant;
use std::thread;
fn compute(x: i32, y: i32) -> i32 {
(x*y) % (x+y+10000)
}
fn main() {
let num_logical_cores = num_cpus::get();
const WIDTH: usize = 40000;
const HEIGHT: usize = 10000;
let y_per_core = HEIGHT/num_logical_cores + 1;
// ------------------------------------------------------------
// Serial Calculation...
let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
let start0 = Instant::now();
for j in 0..HEIGHT {
for i in 0..WIDTH {
buffer0[j][i] = compute(i as i32,j as i32);
}
}
let dur0 = start0.elapsed();
// ------------------------------------------------------------
// On the way to Parallel Calculation...
// Reorder the data buffer to be 3D with one 2D region per core.
let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let start1 = Instant::now();
for c in 0..num_logical_cores {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
buffer1[c][y][i] = compute(i as i32,j as i32)
}
}
}
let dur1 = start1.elapsed();
// ------------------------------------------------------------
// Actual Parallel Calculation...
let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let mut handles = Vec::new();
let start2 = Instant::now();
for c in 0..num_logical_cores {
let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
let handle = thread::spawn(move || {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
per_core_buffer[y][i] = compute(i as i32,j as i32)
}
}
});
handles.push(handle)
}
for handle in handles {
handle.join().unwrap();
}
let dur2 = start2.elapsed();
println!("Runtime: Serial={0:.3}ms, AlmostParallel={1:.3}ms, Parallel={2:.3}ms",
1000.*dur0.as_secs() as f64 + 1e-6*(dur0.subsec_nanos() as f64),
1000.*dur1.as_secs() as f64 + 1e-6*(dur1.subsec_nanos() as f64),
1000.*dur2.as_secs() as f64 + 1e-6*(dur2.subsec_nanos() as f64));
// Sanity check
for j in 0..HEIGHT {
let c = j % num_logical_cores;
let y = j / num_logical_cores;
for i in 0..WIDTH {
if buffer0[j][i] != buffer1[c][y][i] {
println!("wtf1? {0} {1} {2} {3}",i,j,buffer0[j][i],buffer1[c][y][i])
}
if buffer0[j][i] != buffer2[c][y][i] {
println!("wtf2? {0} {1} {2} {3}",i,j,buffer0[j][i],buffer2[c][y][i])
}
}
}
}
Thanks to #Shepmaster for the pointers and clarification that this is not an easy problem for Rust, and that I needed to consider crates to find a reasonable solution. I'm only just starting out in Rust, so this really wasn't clear to me.
I liked the ability to control the number of threads that scoped_threadpool gives, so I went with that. Translating my code from above directly, I tried to use the 4D buffer with core as the most-significant-index and that ran into troubles because that 3D vector does not implement the Copy trait. The fact that it implements Copy makes me concerned about performance, but I went back to the original problem and implemented it more directly & found a reasonable speedup by making each row a thread. Copying each row will not be a large memory overhead.
The code that works for me is:
let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
let mut pool = Pool::new(num_logical_cores as u32);
pool.scoped(|scope| {
let mut y = 0;
for e in &mut buffer2 {
scope.execute(move || {
for x in 0..WIDTH {
(*e)[x] = compute(x as i32,y as i32);
}
});
y += 1;
}
});
On a 6 core, 12 thread i7-8700K for 400000x4000 testcase this runs in 3.2 seconds serially & 481ms in parallel--a reasonable speedup.
EDIT: I continued to think about this issue and got a suggestion from Rustlang on twitter that I should consider rayon. I converted my code to rayon and got similar speedup with the following code.
let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
buffer2
.par_iter_mut()
.enumerate()
.map(|(y,e): (usize, &mut Vec<i32>)| {
for x in 0..WIDTH {
(*e)[x] = compute(x as i32,y as i32);
}
})
.collect::<Vec<_>>();

How do I use a Condvar to limit multithreading?

I'm trying to use a Condvar to limit the number of threads that are active at any given time. I'm having a hard time finding good examples on how to use Condvar. So far I have:
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
fn main() {
let thread_count_arc = Arc::new((Mutex::new(0), Condvar::new()));
let mut i = 0;
while i < 100 {
let thread_count = thread_count_arc.clone();
thread::spawn(move || {
let &(ref num, ref cvar) = &*thread_count;
{
let mut start = num.lock().unwrap();
if *start >= 20 {
cvar.wait(start);
}
*start += 1;
}
println!("hello");
cvar.notify_one();
});
i += 1;
}
}
The compiler error given is:
error[E0382]: use of moved value: `start`
--> src/main.rs:16:18
|
14 | cvar.wait(start);
| ----- value moved here
15 | }
16 | *start += 1;
| ^^^^^ value used here after move
|
= note: move occurs because `start` has type `std::sync::MutexGuard<'_, i32>`, which does not implement the `Copy` trait
I'm entirely unsure if my use of Condvar is correct. I tried staying as close as I could to the example on the Rust API. Wwat is the proper way to implement this?
Here's a version that compiles:
use std::{
sync::{Arc, Condvar, Mutex},
thread,
};
fn main() {
let thread_count_arc = Arc::new((Mutex::new(0u8), Condvar::new()));
let mut i = 0;
while i < 100 {
let thread_count = thread_count_arc.clone();
thread::spawn(move || {
let (num, cvar) = &*thread_count;
let mut start = cvar
.wait_while(num.lock().unwrap(), |start| *start >= 20)
.unwrap();
// Before Rust 1.42, use this:
//
// let mut start = num.lock().unwrap();
// while *start >= 20 {
// start = cvar.wait(start).unwrap()
// }
*start += 1;
println!("hello");
cvar.notify_one();
});
i += 1;
}
}
The important part can be seen from the signature of Condvar::wait_while or Condvar::wait:
pub fn wait_while<'a, T, F>(
&self,
guard: MutexGuard<'a, T>,
condition: F
) -> LockResult<MutexGuard<'a, T>>
where
F: FnMut(&mut T) -> bool,
pub fn wait<'a, T>(
&self,
guard: MutexGuard<'a, T>
) -> LockResult<MutexGuard<'a, T>>
This says that wait_while / wait consumes the guard, which is why you get the error you did - you no longer own start, so you can't call any methods on it!
These functions are doing a great job of reflecting how Condvars work - you give up the lock on the Mutex (represented by start) for a while, and when the function returns you get the lock again.
The fix is to give up the lock and then grab the lock guard return value from wait_while / wait. I've also switched from an if to a while, as encouraged by huon.
For reference, the usual way to have a limited number of threads in a given scope is with a Semaphore.
Unfortunately, Semaphore was never stabilized, was deprecated in Rust 1.8 and was removed in Rust 1.9. There are crates available that add semaphores on top of other concurrency primitives.
let sema = Arc::new(Semaphore::new(20));
for i in 0..100 {
let sema = sema.clone();
thread::spawn(move || {
let _guard = sema.acquire();
println!("{}", i);
})
}
This isn't quite doing the same thing: since each thread is not printing the total number of the threads inside the scope when that thread entered it.
I realized the code I provided didn't do exactly what I wanted it to, so I'm putting this edit of Shepmaster's code here for future reference.
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
fn main() {
let thread_count_arc = Arc::new((Mutex::new(0u8), Condvar::new()));
let mut i = 0;
while i < 150 {
let thread_count = thread_count_arc.clone();
thread::spawn(move || {
let x;
let &(ref num, ref cvar) = &*thread_count;
{
let start = num.lock().unwrap();
let mut start = if *start >= 20 {
cvar.wait(start).unwrap()
} else {
start
};
*start += 1;
x = *start;
}
println!("{}", x);
{
let mut counter = num.lock().unwrap();
*counter -= 1;
}
cvar.notify_one();
});
i += 1;
}
println!("done");
}
Running this in the playground should show more or less expected behavior.
You want to use a while loop, and re-assign start at each iteration, like:
fn main() {
let thread_count_arc = Arc::new((Mutex::new(0), Condvar::new()));
let mut i = 0;
while i < 100 {
let thread_count = thread_count_arc.clone();
thread::spawn(move || {
let &(ref num, ref cvar) = &*thread_count;
let mut start = num.lock().unwrap();
while *start >= 20 {
let current = cvar.wait(start).unwrap();
start = current;
}
*start += 1;
println!("hello");
cvar.notify_one();
});
i += 1;
}
}
See also some article on the topic:
https://medium.com/#polyglot_factotum/rust-concurrency-five-easy-pieces-871f1c62906a
https://medium.com/#polyglot_factotum/rust-concurrency-patterns-condvars-and-locks-e278f18db74f

Resources