Is there any way to keep threads spawned in Rust - i.e., to "memoize" a thread object? - multithreading

On my runtime, I have the following Rust code:
pub fn reduce(heap: &Heap, prog: &Program, tids: &[usize], root: u64, debug: bool) -> Ptr {
// Halting flag
let stop = &AtomicBool::new(false);
let barr = &Barrier::new(tids.len());
let locs = &tids.iter().map(|x| AtomicU64::new(u64::MAX)).collect::<Vec<AtomicU64>>();
// Spawn a thread for each worker
std::thread::scope(|s| {
for tid in tids {
s.spawn(move || {
reducer(heap, prog, tids, stop, barr, locs, root, *tid, debug);
});
}
});
// Return whnf term ptr
return load_ptr(heap, root);
}
This will spawn many threads, in order to perform a parallel computation. The problem is, the reduce function is called thousands of times, and the overhead of spawning threads can be considerable. When I implemented the same thing in C, I just kept the threads open, and sent a message in order to activate them. In Rust, with the std::thread::scope idiom, I'm not sure how to do so. Is it possible to keep the threads spawned after the first call to reduce, by just modifying that one function? That is, without changing anything else on my code?

Threads spawned using the threads::scoped api won't be able to outlive the the calling function. For long-running threads you'll need to spawn them using std::thread::spawn.
Once you've made that change rustc will be very upset with you due to lifetime errors because you are sending non-static references into the spawned threads. Fixing those errors will require some changes. That path is long and full of learning.
If you want something that works really well and is simple, consider instead using the excellent rayon crate. Rayon maintains it's own global thread pool so you don't have to.
Using rayon will look something like this:
tids.par_iter()
.for_each(|tid| {
reducer(heap, prog, tids, stop, barr, locs, root, *tid, debug);
});

Related

Rust: Safe multi threading with recursion

I'm new to Rust.
For learning purposes, I'm writing a simple program to search for files in Linux, and it uses a recursive function:
fn ffinder(base_dir:String, prmtr:&'static str, e:bool, h:bool) -> std::io::Result<()>{
let mut handle_vec = vec![];
let pth = std::fs::read_dir(&base_dir)?;
for p in pth {
let p2 = p?.path().clone();
if p2.is_dir() {
if !h{ //search doesn't include hidden directories
let sstring:String = get_fname(p2.display().to_string());
let slice:String = sstring[..1].to_string();
if slice != ".".to_string() {
let handle = thread::spawn(move || {
ffinder(p2.display().to_string(),prmtr,e,h);
});
handle_vec.push(handle);
}
}
else {//search include hidden directories
let handle2 = thread::spawn(move || {
ffinder(p2.display().to_string(),prmtr,e,h);
});
handle_vec.push(handle2);
}
}
else {
let handle3 = thread::spawn(move || {
if compare(rmv_underline(get_fname(p2.display().to_string())),rmv_underline(prmtr.to_string()),e){
println!("File found at: {}",p2.display().to_string().blue());
}
});
handle_vec.push(handle3);
}
}
for h in handle_vec{
h.join().unwrap();
}
Ok(())
}
I've tried to use multi threading (thread::spawn), however it can create too many threads, exploding the OS limit, which breaks the program execution.
Is there a way to multi thread with recursion, using a safe,limited (fixed) amount of threads?
As one of the commenters mentioned, this is an absolutely perfect case for using Rayon. The blog post mentioned doesn't show how Rayon might be used in recursion, only making an allusion to crossbeam's scoped threads with a broken link. However, Rayon provides its own scoped threads implementation that solves your problem as well in that only uses as many threads as you have cores available, avoiding the error you ran into.
Here's the documentation for it:
https://docs.rs/rayon/1.0.1/rayon/fn.scope.html
Here's an example from some code I recently wrote. Basically what it does is recursively scan a folder, and each time it nests into a folder it creates a new job to scan that folder while the current thread continues. In my own tests it vastly outperforms a single threaded approach.
let source = PathBuf::from("/foo/bar/");
let (tx, rx) = mpsc::channel();
rayon::scope(|s| scan(&source, tx, s));
fn scan<'a, U: AsRef<Path>>(
src: &U,
tx: Sender<(Result<DirEntry, std::io::Error>, u64)>,
scope: &Scope<'a>,
) {
let dir = fs::read_dir(src).unwrap();
dir.into_iter().for_each(|entry| {
let info = entry.as_ref().unwrap();
let path = info.path();
if path.is_dir() {
let tx = tx.clone();
scope.spawn(move |s| scan(&path, tx, s)) // Recursive call here
} else {
// dbg!("{}", path.as_os_str().to_string_lossy());
let size = info.metadata().unwrap().len();
tx.send((entry, size)).unwrap();
}
});
}
I'm not an expert on Rayon, but I'm fairly certain the threading strategy works like this:
Rayon creates a pool of threads to match the number of logical cores you have available in your environment. The first call to the scoped function creates a job that the first available thread "steals" from the queue of jobs available. Each time we make another recursive call, it doesn't necessarily execute immediately, but it creates a new job that an idle thread can then "steal" from the queue. If all of the threads are busy, the job queue just fills up each time we make another recursive call, and each time a thread finishes its current job it steals another job from the queue.
The full code can be found here: https://github.com/1Dragoon/fcp
(Note that repo is a work in progress and the code there is currently typically broken and probably won't work at the time you're reading this.)
As an exercise to the reader, I'm more of a sys admin than an actual developer, so I also don't know if this is the ideal approach. From Rayon's documentation linked earlier:
scope() is a more flexible building block compared to join(), since a loop can be used to spawn any number of tasks without recursing
The language of that is a bit confusing. I'm not sure what they mean by "without recursing". Join seems to intend for you to already have tasks known about ahead of time and to execute them in parallel as threads become available, whereas scope seems to be more aimed at only creating jobs when you need them rather than having to know everything you need to do in advance. Either that or I'm understanding their meaning backwards.

How can I release a std::io::StdinLock externally? [duplicate]

Coming from Java, I am used to idioms along the lines of
while (true) {
try {
someBlockingOperation();
} catch (InterruptedException e) {
Thread.currentThread.interrupt(); // re-set the interrupted flag
cleanup(); // whatever is necessary
break;
}
}
This works, as far as I know, across the whole JDK for anything that might block, like reading from files, from sockets, from a queue and even for Thread.sleep().
Reading on how this is done in Rust, I find lots of seemingly special solutions mentioned like mio, tokio. I also find ErrorKind::Interrupted and tried to get this ErrorKind with sending SIGINT to the thread, but the thread seems to die immediately without leaving any (back)trace.
Here is the code I used (note: not very well versed in Rust yet, so it might look a bit strange, but it runs):
use std::io;
use std::io::Read;
use std::thread;
pub fn main() {
let sub_thread = thread::spawn(|| {
let mut buffer = [0; 10];
loop {
let d = io::stdin().read(&mut buffer);
println!("{:?}", d);
let n = d.unwrap();
if n == 0 {
break;
}
println!("-> {:?}", &buffer[0..n]);
}
});
sub_thread.join().unwrap();
}
By "blocking operations", I mean:
sleep
socket IO
file IO
queue IO (not sure yet where the queues are in Rust)
What would be the respective means to signal to a thread, like Thread.interrupt() in Java, that its time to pack up and go home?
There is no such thing. Blocking means blocking.
Instead, you deliberately use tools that are non-blocking. That's where libraries like mio, Tokio, or futures come in — they handle the architecture of sticking all of these non-blocking, asynchronous pieces together.
catch (InterruptedException e)
Rust doesn't have exceptions. If you expect to handle a failure case, that's better represented with a Result.
Thread.interrupt()
This doesn't actually do anything beyond setting a flag in the thread that some code may check and then throw an exception for. You could build the same structure yourself. One simple implementation:
use std::{
sync::{
atomic::{AtomicBool, Ordering},
Arc,
},
thread,
time::Duration,
};
fn main() {
let please_stop = Arc::new(AtomicBool::new(false));
let t = thread::spawn({
let should_i_stop = please_stop.clone();
move || {
while !should_i_stop.load(Ordering::SeqCst) {
thread::sleep(Duration::from_millis(100));
println!("Sleeping");
}
}
});
thread::sleep(Duration::from_secs(1));
please_stop.store(true, Ordering::SeqCst);
t.join().unwrap();
}
Sleep
No way of interrupting, as far as I know. The documentation even says:
On Unix platforms this function will not return early due to a signal
Socket IO
You put the socket into nonblocking mode using methods like set_nonblocking and then handle ErrorKind::WouldBlock.
See also:
Tokio
async-std
File IO
There isn't really a good cross-platform way of performing asynchronous file IO. Most implementations spin up a thread pool and perform blocking operations there, sending the data over something that does non-blocking.
See also:
Tokio
async-std
Queue IO
Perhaps you mean something like a MPSC channel, in which case you'd use tools like try_recv.
See also:
How to terminate or suspend a Rust thread from another thread?
What is the best approach to encapsulate blocking I/O in future-rs?
What does java.lang.Thread.interrupt() do?

What is the standard way to get a Rust thread out of blocking operations?

Coming from Java, I am used to idioms along the lines of
while (true) {
try {
someBlockingOperation();
} catch (InterruptedException e) {
Thread.currentThread.interrupt(); // re-set the interrupted flag
cleanup(); // whatever is necessary
break;
}
}
This works, as far as I know, across the whole JDK for anything that might block, like reading from files, from sockets, from a queue and even for Thread.sleep().
Reading on how this is done in Rust, I find lots of seemingly special solutions mentioned like mio, tokio. I also find ErrorKind::Interrupted and tried to get this ErrorKind with sending SIGINT to the thread, but the thread seems to die immediately without leaving any (back)trace.
Here is the code I used (note: not very well versed in Rust yet, so it might look a bit strange, but it runs):
use std::io;
use std::io::Read;
use std::thread;
pub fn main() {
let sub_thread = thread::spawn(|| {
let mut buffer = [0; 10];
loop {
let d = io::stdin().read(&mut buffer);
println!("{:?}", d);
let n = d.unwrap();
if n == 0 {
break;
}
println!("-> {:?}", &buffer[0..n]);
}
});
sub_thread.join().unwrap();
}
By "blocking operations", I mean:
sleep
socket IO
file IO
queue IO (not sure yet where the queues are in Rust)
What would be the respective means to signal to a thread, like Thread.interrupt() in Java, that its time to pack up and go home?
There is no such thing. Blocking means blocking.
Instead, you deliberately use tools that are non-blocking. That's where libraries like mio, Tokio, or futures come in — they handle the architecture of sticking all of these non-blocking, asynchronous pieces together.
catch (InterruptedException e)
Rust doesn't have exceptions. If you expect to handle a failure case, that's better represented with a Result.
Thread.interrupt()
This doesn't actually do anything beyond setting a flag in the thread that some code may check and then throw an exception for. You could build the same structure yourself. One simple implementation:
use std::{
sync::{
atomic::{AtomicBool, Ordering},
Arc,
},
thread,
time::Duration,
};
fn main() {
let please_stop = Arc::new(AtomicBool::new(false));
let t = thread::spawn({
let should_i_stop = please_stop.clone();
move || {
while !should_i_stop.load(Ordering::SeqCst) {
thread::sleep(Duration::from_millis(100));
println!("Sleeping");
}
}
});
thread::sleep(Duration::from_secs(1));
please_stop.store(true, Ordering::SeqCst);
t.join().unwrap();
}
Sleep
No way of interrupting, as far as I know. The documentation even says:
On Unix platforms this function will not return early due to a signal
Socket IO
You put the socket into nonblocking mode using methods like set_nonblocking and then handle ErrorKind::WouldBlock.
See also:
Tokio
async-std
File IO
There isn't really a good cross-platform way of performing asynchronous file IO. Most implementations spin up a thread pool and perform blocking operations there, sending the data over something that does non-blocking.
See also:
Tokio
async-std
Queue IO
Perhaps you mean something like a MPSC channel, in which case you'd use tools like try_recv.
See also:
How to terminate or suspend a Rust thread from another thread?
What is the best approach to encapsulate blocking I/O in future-rs?
What does java.lang.Thread.interrupt() do?

How can I work around not being able to export functions with lifetimes when using wasm-bindgen?

I'm trying to write a simple game that runs in the browser, and I'm having a hard time modeling a game loop given the combination of restrictions imposed by the browser, rust, and wasm-bindgen.
A typical game loop in the browser follows this general pattern:
function mainLoop() {
update();
draw();
requestAnimationFrame(mainLoop);
}
If I were to model this exact pattern in rust/wasm-bindgen, it would look like this:
let main_loop = Closure::wrap(Box::new(move || {
update();
draw();
window.request_animation_frame(main_loop.as_ref().unchecked_ref()); // Not legal
}) as Box<FnMut()>);
Unlike javascript, I'm unable to reference main_loop from within itself, so this doesn't work.
An alternative approach that someone suggested is to follow the pattern illustrated in the game of life example. At a high-level, it involves exporting a type that contains the game state and includes public tick() and render() functions that can be called from within a javascript game loop. This doesn't work for me because my gamestate requires lifetime parameters, since it effectively just wraps a specs World and Dispatcher struct, the latter of which has lifetime parameters. Ultimately, this means that I can't export it using #[wasm_bindgen].
I'm having a hard time finding ways to work around these restrictions, and am looking for suggestions.
The easiest way to model this is likely to leave invocations of requestAnimationFrame to JS and instead just implement the update/draw logic in Rust.
In Rust, however, what you can also do is to exploit the fact that a closure which doesn't actually capture any variables is zero-size, meaning that Closure<T> of that closure won't allocate memory and you can safely forget it. For example something like this should work:
#[wasm_bindgen]
pub fn main_loop() {
update();
draw();
let window = ...;
let closure = Closure::wrap(Box::new(|| main_loop()) as Box<Fn()>);
window.request_animation_frame(closure.as_ref().unchecked_ref());
closure.forget(); // not actually leaking memory
}
If your state has lifetimes inside of it, that is unfortunately incompatible with returning back to JS because when you return all the way back to the JS event loop then all WebAssembly stack frames have been popped, meaning that any lifetime is invalidated. This means that your game state persisted across iterations of the main_loop will need to be 'static
I'm a Rust novice, but here's how I addressed the same issue.
You can eliminate the problematic window.request_animation_frame recursion and implement an FPS cap at the same time by invoking window.request_animation_frame from a window.set_interval callback which checks a Rc<RefCell<bool>> or something to see if there's an animation frame request still pending. I'm not sure if the inactive tab behavior will be any different in practice.
I put the bool into my application state since I'm using an Rc<RefCell<...>> to that anyway for other event handling. I haven't checked that this below compiles as is, but here's the relevant parts of how I'm doing this:
pub struct MyGame {
...
should_request_render: bool, // Don't request another render until the previous runs, init to false since we'll fire the first one immediately.
}
...
let window = web_sys::window().expect("should have a window in this context");
let application_reference = Rc::new(RefCell::new(MyGame::new()));
let request_animation_frame = { // request_animation_frame is not forgotten! Its ownership is moved into the timer callback.
let application_reference = application_reference.clone();
let request_animation_frame_callback = Closure::wrap(Box::new(move || {
let mut application = application_reference.borrow_mut();
application.should_request_render = true;
application.handle_animation_frame(); // handle_animation_frame being your main loop.
}) as Box<FnMut()>);
let window = window.clone();
move || {
window
.request_animation_frame(
request_animation_frame_callback.as_ref().unchecked_ref(),
)
.unwrap();
}
};
request_animation_frame(); // fire the first request immediately
let timer_closure = Closure::wrap(
Box::new(move || { // move both request_animation_frame and application_reference here.
let mut application = application_reference.borrow_mut();
if application.should_request_render {
application.should_request_render = false;
request_animation_frame();
}
}) as Box<FnMut()>
);
window.set_interval_with_callback_and_timeout_and_arguments_0(
timer_closure.as_ref().unchecked_ref(),
25, // minimum ms per frame
)?;
timer_closure.forget(); // this leaks it, you could store it somewhere or whatever, depends if it's guaranteed to live as long as the page
You can store the result of set_interval and the timer_closure in Options in your game state so that your game can clean itself up if needed for some reason (maybe? I haven't tried this, and it would seem to cause a free of self?). The circular reference won't erase itself unless broken (you're then storing Rcs to the application inside the application effectively). It should also enable you to change the max fps while running, by stopping the interval and creating another using the same closure.

How to freeze a thread and notify it from another?

I need to pause the current thread in Rust and notify it from another thread. In Java I would write:
synchronized(myThread) {
myThread.wait();
}
and from the second thread (to resume main thread):
synchronized(myThread){
myThread.notify();
}
Is is possible to do the same in Rust?
Using a channel that sends type () is probably easiest:
use std::sync::mpsc::channel;
use std::thread;
let (tx,rx) = channel();
// Spawn your worker thread, giving it `send` and whatever else it needs
thread::spawn(move|| {
// Do whatever
tx.send(()).expect("Could not send signal on channel.");
// Continue
});
// Do whatever
rx.recv().expect("Could not receive from channel.");
// Continue working
The () type is because it's effectively zero-information, which means it's pretty clear you're only using it as a signal. The fact that it's size zero means it's also potentially faster in some scenarios (but realistically probably not any faster than a normal machine word write).
If you just need to notify the program that a thread is done, you can grab its join guard and wait for it to join.
let guard = thread::spawn( ... ); // This will automatically join when finished computing
guard.join().expect("Could not join thread");
You can use std::thread::park() and std::thread::Thread::unpark() to achieve this.
In the thread you want to wait,
fn worker_thread() {
std::thread::park();
}
in the controlling thread, which has a thread handle already,
fn main_thread(worker_thread: std::thread::Thread) {
worker_thread.unpark();
}
Note that the parking thread can wake up spuriously, which means the thread can sometimes wake up without the any other threads calling unpark on it. You should prepare for this situation in your code, or use something like std::sync::mpsc::channel that is suggested in the accepted answer.
There are multiple ways to achieve this in Rust.
The underlying model in Java is that each object contains both a mutex and a condition variable, if I remember correctly. So using a mutex and condition variable would work...
... however, I would personally switch to using a channel instead:
the "waiting" thread has the receiving end of the channel, and waits for it
the "notifying" thread has the sending end of the channel, and sends a message
It is easier to manipulate than a condition variable, notably because there is no risk to accidentally use a different mutex when locking the variable.
The std::sync::mpsc has two channels (asynchronous and synchronous) depending on your needs. Here, the asynchronous one matches more closely: std::sync::mpsc::channel.
There is a monitor crate that provides this functionality by combining Mutex with Condvar in a convenience structure.
(Full disclosure: I am the author.)
Briefly, it can be used like this:
let mon = Arc::new(Monitor::new(false));
{
let mon = mon.clone();
let _ = thread::spawn(move || {
thread::sleep(Duration::from_millis(1000));
mon.with_lock(|mut done| { // done is a monitor::MonitorGuard<bool>
*done = true;
done.notify_one();
});
});
}
mon.with_lock(|mut done| {
while !*done {
done.wait();
}
println!("finished waiting");
});
Here, mon.with_lock(...) is semantically equivalent to Java's synchronized(mon) {...}.

Resources