Parallelizing a self-modifying loop in Rust

I have a loop in Rust, which basically looks like this:
while let Some(next) = myqueue.pop_front() {
    let result = next.activate();
    if result.0 {
        myqueue.extend(result.1.into_iter());
    }
}
I want to parallelize this loop. Naturally, the rayon crate came to mind, with its parallel for_each loop. The problem is that this would require myqueue to be accessible from both the main thread and the worker threads, since the collection being iterated over is also being modified by the threads, and I do not want to use an unsafe block. So I'm stuck, and would like help doing this with rayon (or with a completely different approach).
Any and all help is welcome.
Thank you!

Related

Reading a vector from multiple threads [duplicate]

This question already has an answer here:
How can I pass a reference to a stack variable to a thread?
(1 answer)
Closed last month.
I have a function that returns a vector of strings, which is read by multiple threads later. How can I do this in Rust?
fn get_list() -> Vec<String> { ... }

fn read_vec() {
    let v = get_list();
    let mut handles = vec![];
    for _ in 1..10 {
        handles.push(thread::spawn(|| { do_work(&v); }));
    }
    for h in handles {
        h.join().unwrap();
    }
}
I think I need to extend the lifetime of v to 'static and pass it as an immutable reference to the threads, but I am not sure how.
The problem you are facing is that the threads spawned by thread::spawn run for an unknown amount of time. You'll need to make sure that your Vec<String> outlives these threads.
You can use atomic reference-counting by creating an Arc<Vec<String>>, and create a clone for each thread. The Vec<String> will be deallocated only when all Arcs are dropped. Docs
You can leak the Vec<String>. I personally like this approach, but only if you need the Vec<String> for the entire runtime of your program. To achieve this, you can turn your Vec<String> into a &'static [String] by using Vec::leak. Docs
You can ensure that your threads will not run after the read_vec function returns - This is what you're essentially doing by calling handles.join(). However, the compiler doesn't see that these threads are joined later, and there might be edge cases where they are not joined (what happens when the 2nd thread::spawn panics?). To make this explicit, use the scope function in std::thread. Docs
Of course, you can also just clone the Vec<String>, and give each thread a unique copy.
TL;DR:
For this particular use-case, I'd recommend std::thread::scope. If the Vec<String> lives for the entire duration of your program, leaking it using Vec::leak is a great and often under-used solution. For more complex scenarios, wrapping the Vec<String> in an Arc is probably the right way to go.

How to use async to parallelize heavy computation?

I would like to perform the following processing in multiple threads using tokio or async-std. I have read tutorials, but I haven't seen any mention of parallelizing a for loop. In my program, all threads read the same array but write to different locations:
let input_array: Array2<f32>;
let mut output_array: Array2<f32>;
for idx in 0..loop_num {
    let res = do_some_func(&input_array, idx);
    output_array.slice_mut(s![idx, ..]).assign(&res);
}
I would like to change the for loop to use parallel processing.
Tokio and async-std deal with concurrency, not parallelism. If you need data parallelism, then rayon is a better choice. If you are using an Iterator, the .chunks() method is good; for a more imperative approach you can use .par_chunks_mut().

How to ensure a piece of code is always used by one thread at any given time?

I'm writing a function that needs to use mutable static variables to work (a weird implementation of a message loop). In order to allow only one writer to access those variables at any given time, I need to make the access to this function exclusive (to the first thread that accesses it).
Using AtomicBool
My first guess was to use an AtomicBool:
use std::sync::atomic::{Ordering, AtomicBool};

static FLAG: AtomicBool = AtomicBool::new(false);

fn my_exclusive_function() {
    if FLAG.load(Ordering::SeqCst) {
        panic!("Haha, too late!")
    }
    FLAG.store(true, Ordering::SeqCst);
    /* Do stuff */
}
This code has an obvious flaw: if two threads happen to read FLAG at the same time, they would both think that this is ok for them to continue.
Using Mutex<()>
Then I thought about using a Mutex<()> for its lock.
use std::sync::Mutex;

static FLAG: Mutex<()> = Mutex::new(());

fn my_exclusive_function() {
    let _lock = match FLAG.try_lock() {
        Ok(lock) => lock,
        Err(_) => panic!("Haha, too late!"),
    };
    /* Do stuff */
    // `_lock` gets dropped here, another thread can now access the function
}
There are two problems here:
Mutex::new is not a const function, which means I cannot initialize the static Mutex. I could use a library like lazy_static here, but if I can avoid an additional dependency, that would be great. (Since Rust 1.63, Mutex::new is const, so this initializer compiles on current toolchains.)
Even if I use lazy_static, a user-defined function (the handler inside the message loop) could panic and poison the mutex. Here again, I could use a third-party library like parking_lot, but that would be yet another additional dependency.
I'm willing to use those if there is no alternatives but if I can avoid it, that is better.
What would the right way to do this look like? Is it just a matter of Ordering?
Related question that did not help me
How to ensure a portion of code is run by just one thread at a time? Answers are specifically focused on C# and .NET and involve features I don't have in Rust.
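The race in the AtomicBool version can be closed without any extra dependency by using a single atomic read-modify-write. A minimal sketch with compare_exchange (the release store on exit is an addition here so the function is reusable; drop it if only the very first caller should ever succeed):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static FLAG: AtomicBool = AtomicBool::new(false);

fn my_exclusive_function() {
    // compare_exchange reads and writes the flag in one atomic step, so
    // two threads can never both observe `false` and proceed together.
    if FLAG
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        panic!("Haha, too late!");
    }
    /* Do stuff */
    FLAG.store(false, Ordering::Release); // allow the next caller in
}

fn main() {
    my_exclusive_function();
    // The flag was released, so a second call succeeds too.
    my_exclusive_function();
    println!("both calls ran exclusively");
}
```

So it is not just a matter of Ordering: the load-then-store pair has to be collapsed into one atomic operation.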

Is there an alternative to Arc<> wrapping for long-running threads which share data?

I have some long-running threads which are fed by a Deque that another long-running thread pushes data into. Currently I'm using std::thread::spawn, and have to wrap the Deque in an Arc<> to share it between the threads; if I use &deque, I run into the classic 'static lifetime issue, hence the Arc<>. I've looked at scoped threads; however, the closure the threads run won't return for a very long time, so I don't think that will work for this case. Is anyone aware of an alternative solution, short of using unsafe? I'm not satisfied with the Arc<> solution: each time I touch the Deque, the code digs into the Arc<>'s inner value to get to the Deque, incurring overhead which I'd like to avoid. I've also considered making the Deque static, but it would need to be a lazy static due to the allocation restriction on statics, and that comes with its own access overhead.
You can get a &Deque out of the Arc<Deque> just once at the beginning of your long-running thread and keep using that immutable reference throughout its life. Something like this:
let dq: Arc<Deque<T>> = ....;
....
{
    let dq2 = Arc::clone(&dq);
    thread::spawn(move || {
        let dq_ref: &Deque<T> = &*dq2;
        // long-running calculation using dq_ref
        // dq2 is dropped here
    });
}

How can Rust be told that a thread does not live longer than its caller? [duplicate]

This question already has an answer here:
How can I pass a reference to a stack variable to a thread?
(1 answer)
Closed 5 years ago.
I have the following code:
fn main() {
    let message = "Can't shoot yourself in the foot if you ain't got no gun";
    let t1 = std::thread::spawn(|| {
        println!("{}", message);
    });
    t1.join();
}
rustc gives me the compilation error:
closure may outlive the current function, but it borrows message, which is owned by the current function
This is wrong since:
The function it's referring to here is (I believe) main. The threads will be killed (or hit undefined behavior) once main finishes executing.
The function it's referring to clearly invokes .join() on said thread.
Is the previous code unsafe in any way? If so, why? If not, how can I get the compiler to understand that?
Edit: Yes I am aware I can just move the message in this case, my question is specifically asking how can I pass a reference to it (preferably without having to heap allocate it, similarly to how this code would do it:
std::thread([&message]() -> void {/* etc */});
(Just to clarify, what I'm actually trying to do is access a thread safe data structure from two threads... other solutions to the problem that don't involve making the copy work would also help).
Edit2: The question this has been marked as a duplicate of is 5 pages long, and as such I'd consider it an invalid question in its own right.
Is the previous code 'unsafe' in any way? If so, why?
The goal of Rust's type-checking and borrow-checking system is to disallow unsafe programs, but that does not mean that all programs that fail to compile are unsafe. In this specific case, your code is not unsafe, but it does not satisfy the type constraints of the functions you are using.
The function it's referring to clearly invokes .join() on said thread.
But there is nothing from a type-checker standpoint that requires the call to .join. A type-checking system (on its own) can't enforce that a function has or has not been called on a given object. You could just as easily imagine an example like
let message = "Can't shoot yourself in the foot if you ain't got no gun";
let mut handles = vec![];
for i in 0..3 {
    let t1 = std::thread::spawn(|| {
        println!("{} {}", message, i);
    });
    handles.push(t1);
}
for t1 in handles {
    t1.join();
}
where a human can tell that each thread is joined before main exits. But a typechecker has no way to know that.
The function it's referring to here is (I believe) main. So presumably those threads will be killed when main exits anyway (and them running after main exits is UB).
From the standpoint of the checkers, main is just another function. There is no special knowledge that this specific function can have extra behavior. If this were any other function, the thread would not be auto-killed. Expanding on that, even for main there is no guarantee that the child threads will be killed instantly. If it takes 5ms for the child threads to be killed, that is still 5ms where the child threads could be accessing the content of a variable that has gone out of scope.
To gain the behavior that you are looking for with this specific snippet (as-is), the lifetime of the closure would have to be tied to the lifetime of the t1 object, such that the closure was guaranteed to never be used after the handles have been cleaned up. While that is certainly an option, it is significantly less flexible in the general case. Because it would be enforced at the type level, there would be no way to opt out of this behavior.
You could consider using crossbeam, specifically crossbeam::scope's .spawn, which enforces this lifetime requirement where the standard library does not, meaning a thread must stop execution before the scope is finished. (Since Rust 1.63, std::thread::scope provides the same guarantee in the standard library.)
In your specific case, your code works fine as long as you transfer ownership of message to the child thread instead of borrowing it from the main function, because there is no risk of unsafe code with or without your call to .join. Your code works fine if you change
let t1 = std::thread::spawn(|| {
to
let t1 = std::thread::spawn(move || {
