Why doesn't Rayon require Arc<_>? - rust

On page 465 of Programming Rust you can find the code and explanation (emphasis added by me)
use std::sync::Arc;
fn process_files_in_parallel(filenames: Vec<String>,
glossary: Arc<GigabyteMap>)
-> io::Result<()>
{
...
for worklist in worklists {
// This call to .clone() only clones the Arc and bumps the
// reference count. It does not clone the GigabyteMap.
let glossary_for_child = glossary.clone();
thread_handles.push(
spawn(move || process_files(worklist, &glossary_for_child))
);
}
...
}
We have changed the type of glossary: to run the analysis in parallel, the caller must pass in an Arc<GigabyteMap>, a smart pointer to a GigabyteMap that’s been moved into the heap, by doing Arc::new(giga_map). When we call glossary.clone(), we are making a copy of the Arc smart pointer, not the whole GigabyteMap. This amounts to incrementing a reference count. With this change, the program compiles and runs, because it no longer depends on reference lifetimes. As long as any thread owns an Arc<GigabyteMap>, it will keep the map alive, even if the parent thread bails out early. There won’t be any data races, because data in an Arc is immutable.
In the next section they show this rewritten with Rayon,
extern crate rayon;
use rayon::prelude::*;
fn process_files_in_parallel(filenames: Vec<String>, glossary: &GigabyteMap)
-> io::Result<()>
{
filenames.par_iter()
.map(|filename| process_file(filename, glossary))
.reduce_with(|r1, r2| {
if r1.is_err() { r1 } else { r2 }
})
.unwrap_or(Ok(()))
}
You can see in the section rewritten to use Rayon that it accepts &GigabyteMap rather than Arc<GigabyteMap>. They don't explain how this works though. Why doesn't Rayon require Arc<GigabyteMap>? How does Rayon get away with accepting a direct reference?

Rayon can guarantee that the iterator does not outlive the current stack frame, unlike what I assume is thread::spawn in the first code example. Specifically, par_iter under the hood uses something like Rayon's scope function, which allows one to spawn a unit of work that's "attached" to the stack and will join before the stack ends.
Because Rayon can guarantee (via lifetime bounds, from the user's perspective) that the tasks/threads are joined before the function calling par_iter exits, it can provide this API which is more ergonomic to use than the standard library's thread::spawn.
Rayon expands on this in the scope function's documentation.
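The same guarantee can be seen without Rayon by using the standard library's std::thread::scope (stable since Rust 1.63). The following is only a minimal sketch of the idea, not Rayon's implementation: Glossary is a hypothetical stand-in for the book's GigabyteMap, and count_known_words is a made-up helper.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical stand-in for the book's GigabyteMap.
type Glossary = HashMap<String, usize>;

/// Counts how many of `words` appear in the borrowed glossary, splitting the
/// work across scoped threads. No Arc is needed: `thread::scope` joins every
/// spawned thread before it returns, so `glossary` outlives all of them.
fn count_known_words(words: &[String], glossary: &Glossary) -> usize {
    // Split into at most four chunks; `chunks` panics on a size of zero.
    let chunk_size = ((words.len() + 3) / 4).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = words
            .chunks(chunk_size)
            .map(|chunk| {
                // `chunk` and `glossary` are plain references copied into the closure.
                s.spawn(move || chunk.iter().filter(|w| glossary.contains_key(*w)).count())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}
```

Swapping thread::scope for thread::spawn here would be rejected by the compiler, because thread::spawn requires the closure to be 'static and a borrowed glossary is not.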

Related

Can the borrow checker know when an Arc is "released"? Can a 'static lifetime be granted temporarily?

I'm trying to speed up a computationally-heavy Rust function by making it concurrent using only the built-in thread support. In particular, I want to alternate between quick single-threaded phases (where the main thread has mutable access to a big structure) and concurrent phases (where many worker threads run with read-only access to the structure). I don't want to make extra copies of the structure or force it to be 'static. Where I'm having trouble is convincing the borrow checker that the worker threads have finished.
Ignoring the borrow checker, an Arc reference seems like it does all that is needed. The reference count in the Arc increases with the .clone() for each worker, then decreases as the workers conclude and I join all the worker threads. If (and only if) the Arc reference count is 1, it should be safe for the main thread to resume. The borrow checker, however, doesn't seem to know about Arc reference counts, and insists that my structure needs to be 'static.
Here's some sample code which works fine if I don't use threads, but won't compile if I switch the comments to enable the multi-threaded case.
struct BigStruct {
data: Vec<usize>
// Lots more
}
pub fn main() {
let ref_bigstruct = &mut BigStruct { data: Vec::new() };
for i in 0..3 {
ref_bigstruct.data.push(i); // Phase where main thread has write access
run_threads(ref_bigstruct); // Phase where worker threads have read-only access
}
}
fn run_threads(ref_bigstruct: &BigStruct) {
let arc_bigstruct = Arc::new(ref_bigstruct);
{
let arc_clone_for_worker = arc_bigstruct.clone();
// SINGLE-THREADED WORKS:
worker_thread(arc_clone_for_worker);
// MULTI-THREADED DOES NOT COMPILE:
// let handle = thread::spawn(move || { worker_thread(arc_clone_for_worker); } );
// handle.join();
}
assert!(Arc::strong_count(&arc_bigstruct) == 1);
println!("??? How can I tell the borrow checker that all borrows of ref_bigstruct are done?")
}
fn worker_thread(my_struct: Arc<&BigStruct>) {
println!(" worker says len()={}", my_struct.data.len());
}
I'm still learning about Rust lifetimes, but what I think (fear?) I need is an operation that will take an ordinary (not 'static) reference to my structure and give me an Arc that I can clone into immutable references with a 'static lifetime for use by the workers. Once all the worker Arc references are dropped, the borrow checker needs to allow my thread-spawning function to return. For safety, I assume this would panic if the reference count is >1. While this seems like it would generally conform with Rust's safety requirements, I don't see how to do it.
The underlying problem is not that the borrow checker fails to follow Arc, and the solution is not to use Arc. The problem is that the borrow checker cannot understand that a thread must be 'static only because it may outlive the spawning thread, so it cannot see that immediately calling .join() on it makes the borrow fine.
The solution is to use scoped threads, that is, threads that allow you to use non-'static data because they are always joined before the scope ends, so the spawned thread cannot outlive the spawning thread. The problem is that there are no scoped threads in the standard library. Well, there are, but they're unstable.
So if you insist on not using crates, for some reason, you have no choice but to use unsafe code (don't, really). But if you can use external crates, then you can use the well-known crossbeam crate with its crossbeam::scope function, at least until std's scoped threads are stabilized.
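For readers on Rust 1.63 or later, std::thread::scope is available on stable, so the scoped-thread approach can be sketched without any external crate. This is a hedged illustration rather than the answer's own code: the alternate_phases helper and the worker count are assumptions made for the example.

```rust
use std::thread;

struct BigStruct {
    data: Vec<usize>,
    // Lots more
}

/// Read-only phase: every worker borrows `big` directly. All workers are
/// joined before `thread::scope` returns, so the shared borrow ends there.
fn run_threads(big: &BigStruct, workers: usize) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|_| s.spawn(move || big.data.len()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

/// Alternates single-threaded write phases with multi-threaded read phases
/// and records the length each worker observed in every round.
fn alternate_phases(rounds: usize) -> Vec<Vec<usize>> {
    let mut big = BigStruct { data: Vec::new() };
    let mut seen = Vec::new();
    for i in 0..rounds {
        big.data.push(i); // main thread has exclusive access
        seen.push(run_threads(&big, 2)); // workers have shared access
    }
    seen
}
```

No Arc, no 'static bound, and no copy of the structure is involved; the borrow checker accepts the mutable access in the next round because the shared borrow provably ended when run_threads returned.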
In Rust, the T inside an Arc<T> is immutable by default. That means that to use Arc to give threads access to data that is going to change, you also need to wrap the data in a type that provides interior mutability.
Rust provides a type that is especially suited for a single write or multiple read accesses in parallel, called RwLock.
So for your simple example, this would probably look something like this:
use std::{sync::{Arc, RwLock}, thread};
struct BigStruct {
data: Vec<usize>
// Lots more
}
pub fn main() {
let arc_bigstruct = Arc::new(RwLock::new(BigStruct { data: Vec::new() }));
for i in 0..3 {
arc_bigstruct.write().unwrap().data.push(i); // Phase where main thread has write access
run_threads(&arc_bigstruct); // Phase where worker threads have read-only access
}
}
fn run_threads(ref_bigstruct: &Arc<RwLock<BigStruct>>) {
{
let arc_clone_for_worker = ref_bigstruct.clone();
//MULTI-THREADED
let handle = thread::spawn(move || { worker_thread(&arc_clone_for_worker); } );
handle.join().unwrap();
}
assert!(Arc::strong_count(&ref_bigstruct) == 1);
}
fn worker_thread(my_struct: &Arc<RwLock<BigStruct>>) {
println!(" worker says len()={}", my_struct.read().unwrap().data.len());
}
Which outputs
worker says len()=1
worker says len()=2
worker says len()=3
As for your question, the borrow checker does not know when an Arc is released, as far as I know. The references are counted at runtime.

`RefCell<std::string::String>` cannot be shared between threads safely?

This is a continuation of How to re-use a value from the outer scope inside a closure in Rust?, opened as a new question for better presentation.
// main.rs
// The value will be modified eventually inside `main`
// and a http request should respond with whatever "current" value it holds.
let mut test_for_closure :Arc<RefCell<String>> = Arc::new(RefCell::from("Foo".to_string()));
// ...
// Handler for HTTP requests
// From https://docs.rs/hyper/0.14.8/hyper/service/fn.service_fn.html
let make_svc = make_service_fn(|_conn| async {
Ok::<_, Infallible>(service_fn(|req: Request<Body>| async move {
if req.version() == Version::HTTP_11 {
let foo:String = *test_for_closure.borrow();
Ok(Response::new(Body::from(foo.as_str())))
} else {
Err("not HTTP/1.1, abort connection")
}
}))
});
Unfortunately, I get RefCell<std::string::String> cannot be shared between threads safely:
RefCell only works on single threads. You will need to use Mutex which is similar but works on multiple threads. You can read more about Mutex here: https://doc.rust-lang.org/std/sync/struct.Mutex.html.
Here is an example of moving an Arc<Mutex<>> into a closure:
use std::sync::{Arc, Mutex};
fn main() {
let test: Arc<Mutex<String>> = Arc::new(Mutex::from("Foo".to_string()));
let test_for_closure = Arc::clone(&test);
let closure = || async move {
// lock it so it can't be used by other threads while we hold the guard
let foo = test_for_closure.lock().unwrap();
println!("{}", foo);
};
}
The first error in your error message is that Sync is not implemented for RefCell<String>. This is by design, as stated by Sync's rustdoc:
Types that are not Sync are those that have “interior mutability” in a
non-thread-safe form, such as Cell and RefCell. These types allow for
mutation of their contents even through an immutable, shared
reference. For example the set method on Cell takes &self, so it
requires only a shared reference &Cell. The method performs no
synchronization, thus Cell cannot be Sync.
Thus it's not safe to share RefCells between threads, because you can cause a data race through a regular, shared reference.
But what if you wrap it in Arc ? Well, the rustdoc is quite clear again:
Arc will implement Send and Sync as long as the T implements Send
and Sync. Why can’t you put a non-thread-safe type T in an Arc to
make it thread-safe? This may be a bit counter-intuitive at first:
after all, isn’t the point of Arc thread safety? The key is this:
Arc makes it thread safe to have multiple ownership of the same
data, but it doesn’t add thread safety to its data. Consider
Arc<RefCell>. RefCell isn’t Sync, and if Arc was always Send,
Arc<RefCell> would be as well. But then we’d have a problem:
RefCell is not thread safe; it keeps track of the borrowing count
using non-atomic operations.
In the end, this means that you may need to pair Arc with some sort
of std::sync type, usually Mutex.
Arc<T> will not be Sync unless T is Sync, for the same reason. Given that, you should probably use a std or tokio Mutex instead of RefCell.
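To make the Arc plus Mutex pairing concrete, here is a small sketch using only the standard library (no hyper or tokio). The update_from_thread function is a hypothetical helper written for this illustration; it modifies the shared String from a second thread and then reads it back on the calling thread.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Appends `suffix` to the shared string from a second thread, then returns
/// the value the calling thread observes afterwards.
fn update_from_thread(shared: &Arc<Mutex<String>>, suffix: &'static str) -> String {
    // Cloning the Arc only bumps the reference count; the String is shared.
    let for_worker = Arc::clone(shared);
    let handle = thread::spawn(move || {
        // The guard is a temporary, so the lock is released at the end of
        // this statement.
        for_worker.lock().unwrap().push_str(suffix);
    });
    handle.join().unwrap();
    shared.lock().unwrap().clone()
}
```

Replacing Mutex with RefCell in this sketch fails to compile with the same "cannot be shared between threads safely" error, because Arc<RefCell<String>> is neither Send nor Sync.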

Is it safe to `Send` struct containing `Rc` if strong_count is 1 and weak_count is 0?

I have a struct that is not Send because it contains Rc. Let's say that Arc has too much overhead, so I want to keep using Rc. I would still like to occasionally Send this struct between threads, but only when I can verify that the Rc has strong_count 1 and weak_count 0.
Here is (hopefully safe) abstraction that I have in mind:
mod my_struct {
use std::rc::Rc;
#[derive(Debug)]
pub struct MyStruct {
reference_counted: Rc<String>,
// more fields...
}
impl MyStruct {
pub fn new() -> Self {
MyStruct {
reference_counted: Rc::new("test".to_string())
}
}
pub fn pack_for_sending(self) -> Result<Sendable, Self> {
if Rc::strong_count(&self.reference_counted) == 1 &&
Rc::weak_count(&self.reference_counted) == 0
{
Ok(Sendable(self))
} else {
Err(self)
}
}
// There are more methods, some may clone the `Rc`!
}
/// `Send`able wrapper for `MyStruct` that does not allow you to access it,
/// only unpack it.
pub struct Sendable(MyStruct);
// Safety: `MyStruct` is not `Send` because of `Rc`. `Sendable` can be
// only created when the `Rc` has strong count 1 and weak count 0.
unsafe impl Send for Sendable {}
impl Sendable {
/// Retrieve the inner `MyStruct`, making it not-sendable again.
pub fn unpack(self) -> MyStruct {
self.0
}
}
}
use crate::my_struct::MyStruct;
fn main() {
let handle = std::thread::spawn(|| {
let my_struct = MyStruct::new();
dbg!(&my_struct);
// Do something with `my_struct`, but at the end the inner `Rc` should
// not be shared with anybody.
my_struct.pack_for_sending().expect("Some Rc was still shared!")
});
let my_struct = handle.join().unwrap().unpack();
dbg!(&my_struct);
}
I did a demo on the Rust playground.
It works. My question is, is it actually safe?
I know that the Rc is owned only by a single owner and nobody can change that under my hands, because it can't be accessed by other threads and we wrap it into Sendable which does not allow access to the contained value.
But in some crazy world Rc could for example internally use thread local storage and this would not be safe... So is there some guarantee that I can do this?
I know that I must be extremely careful to not introduce some additional reason for the MyStruct to not be Send.
No.
There are multiple points that need to be verified to be able to send Rc across threads:
There can be no other handle (Rc or Weak) sharing ownership.
The content of Rc must be Send.
The implementation of Rc must use a thread-safe strategy.
Let's review them in order!
Guaranteeing the absence of aliasing
While your algorithm -- checking the counts yourself -- works for now, it would be better to simply ask Rc whether it is aliased or not.
fn is_aliased<T>(t: &mut Rc<T>) -> bool { Rc::get_mut(t).is_none() }
The implementation of get_mut will be adjusted should the implementation of Rc change in ways you have not foreseen.
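Rc::get_mut returns Some only when no other Rc or Weak points at the same allocation, which covers both the strong and the weak count in one call. A minimal sketch of that check, using a hypothetical helper named is_unique:

```rust
use std::rc::Rc;

/// Returns true when no other `Rc` or `Weak` points at the same allocation.
/// `Rc::get_mut` yields `None` as soon as either kind of handle exists.
fn is_unique<T>(rc: &mut Rc<T>) -> bool {
    Rc::get_mut(rc).is_some()
}
```

Both a cloned Rc and an outstanding Weak (from Rc::downgrade) make is_unique return false until they are dropped.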
Sendable content
While your implementation of MyStruct currently puts String (which is Send) into Rc, it could tomorrow change to Rc<str>, and then all bets are off.
Therefore, the sendable check needs to be implemented at the Rc level itself, otherwise you need to audit any change to whatever Rc holds.
fn sendable<T: Send>(mut t: Rc<T>) -> Result<Rc<T>, ...> {
if !is_aliased(&mut t) {
Ok(t)
} else {
...
}
}
Thread-safe Rc internals
And that... cannot be guaranteed.
Since Rc is not Send, its implementation can be optimized in a variety of ways:
The entire memory could be allocated using a thread-local arena.
The counters could be allocated using a thread-local arena, separately, so as to seamlessly convert to/from Box.
...
This is not the case at the moment, AFAIK, however the API allows it, so the next release could definitely take advantage of this.
What should you do?
You could make pack_for_sending unsafe, and dutifully document all assumptions that are counted on -- I suggest using get_mut to remove one of them. Then, on each new release of Rust, you'd have to double-check each assumption to ensure that your usage is still safe.
Or, if you do not mind making an allocation, you could write a conversion to Arc<T> yourself (see Playground):
fn into_arc<T>(this: Rc<T>) -> Result<Arc<T>, Rc<T>> {
Rc::try_unwrap(this).map(|t| Arc::new(t))
}
Or, you could write a RFC proposing a Rc <-> Arc conversion!
The API would be:
fn Rc<T: Send>::into_arc(this: Self) -> Result<Arc<T>, Rc<T>>
fn Arc<T>::into_rc(this: Self) -> Result<Rc<T>, Arc<T>>
This could be made very efficiently inside std, and could be of use to others.
Then, you'd convert from MyStruct to MySendableStruct, just moving the fields and converting Rc to Arc as you go, send to another thread, then convert back to MyStruct.
And you would not need any unsafe...
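A sketch of that conversion using only stable APIs (Rc::try_unwrap and Arc::try_unwrap), at the cost of one reallocation in each direction. MySendableStruct and build_on_worker are names assumed for this illustration, and the struct is reduced to its single Rc field.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

#[derive(Debug)]
struct MyStruct {
    reference_counted: Rc<String>,
}

/// Hypothetical `Send` counterpart of `MyStruct`, holding an `Arc` instead.
#[derive(Debug)]
struct MySendableStruct {
    reference_counted: Arc<String>,
}

impl MyStruct {
    /// Succeeds only if this `Rc` is the sole strong owner; otherwise the
    /// original struct is handed back unchanged.
    fn pack_for_sending(self) -> Result<MySendableStruct, MyStruct> {
        match Rc::try_unwrap(self.reference_counted) {
            Ok(value) => Ok(MySendableStruct { reference_counted: Arc::new(value) }),
            Err(rc) => Err(MyStruct { reference_counted: rc }),
        }
    }
}

impl MySendableStruct {
    /// Converts back to the `Rc`-based struct on the receiving thread.
    fn unpack(self) -> Result<MyStruct, MySendableStruct> {
        match Arc::try_unwrap(self.reference_counted) {
            Ok(value) => Ok(MyStruct { reference_counted: Rc::new(value) }),
            Err(arc) => Err(MySendableStruct { reference_counted: arc }),
        }
    }
}

/// Builds a `MyStruct` on a worker thread, sends it back, and returns its content.
fn build_on_worker() -> String {
    let handle = thread::spawn(|| {
        let my_struct = MyStruct { reference_counted: Rc::new("test".to_string()) };
        my_struct.pack_for_sending().expect("Some Rc was still shared!")
    });
    let my_struct = handle.join().unwrap().unpack().expect("Some Arc was still shared!");
    my_struct.reference_counted.as_str().to_owned()
}
```

The compiler checks the Send requirement here on its own: MySendableStruct is Send only because Arc<String> is, so swapping the String for a non-Send type would turn into a compile error instead of undefined behaviour.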
The only difference between Arc and Rc is that Arc uses atomic counters. The counters are only accessed when the pointer is cloned or dropped, so the difference between the two is negligible in applications which just share pointers between long-lived threads.
If you have never cloned the Rc, it is safe to send between threads. However, if you can guarantee that the pointer is unique then you can make the same guarantee about a raw value, without using a smart pointer at all!
This all seems quite fragile, for little benefit; future changes to the code might not meet your assumptions, and you will end up with Undefined Behaviour. I suggest that you at least try making some benchmarks with Arc. Only consider approaches like this when you measure a performance problem.
You might also consider using the archery crate, which provides a reference-counted pointer that abstracts over atomicity.

How to return a Rust closure to JavaScript via WebAssembly?

The comments on closure.rs are pretty great, however I can't make it work for returning a closure from a WebAssembly library.
I have a function like this:
#[wasm_bindgen]
pub fn start_game(
start_time: f64,
screen_width: f32,
screen_height: f32,
on_render: &js_sys::Function,
on_collision: &js_sys::Function,
) -> ClosureTypeHere {
// ...
}
Inside that function I make a closure (assuming Closure::wrap is one piece of the puzzle, and copying from closure.rs):
let cb = Closure::wrap(Box::new(move |time| time * 4.2) as Box<dyn FnMut(f64) -> f64>);
How do I return this callback from start_game and what should ClosureTypeHere be?
The idea is that start_game will create local mutable objects - like a camera, and the JavaScript side should be able to call the function Rust returns in order to update that camera.
This is a good question, and one that has some nuance too! It's worth calling out the closures example in the wasm-bindgen guide (and the section about passing closures to JavaScript) as well, and it'd be good to contribute back to that as well if necessary!
To get you started, though, you can do something like this:
use wasm_bindgen::prelude::*;
#[wasm_bindgen]
pub fn start_game(
start_time: f64,
screen_width: f32,
screen_height: f32,
on_render: &js_sys::Function,
on_collision: &js_sys::Function,
) -> JsValue {
let cb = Closure::wrap(Box::new(move |time| {
time * 4.2
}) as Box<dyn FnMut(f64) -> f64>);
// Extract the `JsValue` from this `Closure`, the handle
// on a JS function representing the closure
let ret = cb.as_ref().clone();
// Once `cb` is dropped it'll "neuter" the closure and
// cause invocations to throw a JS exception. Memory
// management here will come later, so just leak it
// for now.
cb.forget();
return ret;
}
Above the return value is just a plain-old JS object (here as a JsValue) and we create that with the Closure type you've seen already. This will allow you to quickly return a closure to JS and you'll be able to call it from JS as well.
You've also asked about storing mutable objects and such, and that can all be done through normal Rust closures, capturing, etc. For example the declaration of FnMut(f64) -> f64 above is the signature of the JS function, and that can be any set of types such as FnMut(String, MyCustomWasmBindgenType, f64) -> Vec<u8> if you really want. For capturing local objects you can do:
let mut camera = Camera::new();
let mut state = State::new();
let cb = Closure::wrap(Box::new(move |arg1, arg2| { // note the `move`
if arg1 {
camera.update(&arg2);
} else {
state.update(&arg2);
}
}) as Box<_>);
(or something like that)
Here the camera and state variables will be owned by the closure and dropped at the same time. More info about just closures can be found in the Rust book.
It's also worth briefly covering the memory management aspect here. In the example above we're calling forget(), which leaks memory and can be a problem if the Rust function is called many times (as it would leak a lot of memory). The fundamental problem here is that there's memory allocated on the WASM heap which the created JS function object references. This allocated memory in theory needs to be deallocated whenever the JS function object is GC'd, but we have no way of knowing when that happens (until WeakRef exists!).
In the meantime we've chosen an alternate strategy. The associated memory is deallocated whenever the Closure type itself is dropped, providing deterministic destruction. This, however, makes it difficult to work with as we need to figure out manually when to drop the Closure. If forget doesn't work for your use case, some ideas for dropping the Closure are:
First, if it's a JS closure only invoked once, then you can use Rc/RefCell to drop the Closure inside the closure itself (using some interior mutability shenanigans). We should also eventually provide native support for FnOnce in wasm-bindgen as well!
Next, you can return an auxiliary JS object to Rust which has a manual free method. For example a #[wasm_bindgen]-annotated wrapper. This wrapper would then need to be manually freed in JS when appropriate.
If you can get by, forget is by far the easiest thing to do for now, but this is definitely a pain point! We can't wait for WeakRef to exist :)
As far as I understand from the documentation, Rust closures aren't supposed to be exported; they can only be passed as parameters to imported JS functions, and all of this happens in Rust code.
https://rustwasm.github.io/wasm-bindgen/reference/passing-rust-closures-to-js.html#passing-rust-closures-to-imported-javascript-functions
I ran a couple of experiments, and when a Rust function returns the mentioned Closure type, the compiler reports an error: the trait wasm_bindgen::convert::IntoWasmAbi is not implemented for wasm_bindgen::prelude::Closure<(dyn std::ops::FnMut() -> u32 + 'static)>
In all the examples, closures are wrapped in an arbitrary struct, but after that you can no longer call them on the JS side.

How can I mutably share an i32 between threads?

I'm new to Rust and threading and I'm trying to print out a number while adding to it in another thread. How can I accomplish this?
use std::thread;
use std::time::Duration;
fn main() {
let mut num = 5;
thread::spawn(move || {
loop {
num += 1;
thread::sleep(Duration::from_secs(10));
}
});
output(num);
}
fn output(num: i32) {
loop {
println!("{:?}", num);
thread::sleep(Duration::from_secs(5));
}
}
The above code doesn't work: it always prints 5 as if the number is never incremented.
Please read the "Shared-State Concurrency" chapter of The Rust Book, it explains how to do this in detail.
In short:
Your program does not work because num is copied, so output() and the thread operate on different copies of the number. The Rust compiler will fail to compile with an error if num is not copyable.
Since you need to share the same variable between multiple threads, you need to wrap it in an Arc (an atomically reference-counted pointer).
Since you need to modify the variable inside the Arc, you need to put it in a Mutex or RwLock. You use the .lock() method to obtain a mutable reference out of a Mutex. The method will ensure exclusive access across the whole process during the lifetime of that mutable reference.
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
fn main() {
let num = Arc::new(Mutex::new(5));
// allow `num` to be shared across threads (Arc) and modified
// (Mutex) safely without a data race.
let num_clone = num.clone();
// create a cloned reference before moving `num` into the thread.
thread::spawn(move || {
loop {
*num.lock().unwrap() += 1;
// modify the number.
thread::sleep(Duration::from_secs(10));
}
});
output(num_clone);
}
fn output(num: Arc<Mutex<i32>>) {
loop {
println!("{:?}", *num.lock().unwrap());
// read the number.
// - lock(): obtains a mutable reference; may fail,
// thus return a Result
// - unwrap(): ignore the error and get the real
// reference / cause panic on error.
thread::sleep(Duration::from_secs(5));
}
}
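The first point above, that num is copied into the thread, can be checked in isolation. This is a small sketch with a hypothetical helper, increment_in_thread, which returns both the value the caller still sees and the value seen inside the thread.

```rust
use std::thread;

/// Increments a captured `i32` on another thread and returns
/// (value seen by the caller afterwards, value seen inside the thread).
fn increment_in_thread(start: i32) -> (i32, i32) {
    let mut num = start;
    let handle = thread::spawn(move || {
        // `i32` is `Copy`, so `move` gives the closure its own copy of `num`.
        num += 1;
        num
    });
    let inside = handle.join().unwrap();
    // The caller's `num` was never touched by the thread.
    (num, inside)
}
```

Calling increment_in_thread(5) yields (5, 6): the thread's copy was incremented while the caller's variable kept its original value, which is exactly why the question's program keeps printing 5.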
You may also want to read:
Why does Rust have mutexes and other synchronization primitives, if sharing of mutable state between tasks is not allowed?
What happens when an Arc is cloned?
How do I share a mutable object between threads using Arc? (for why we need Arc<Mutex<i32>> instead of Arc<i32>)
When would you use a Mutex without an Arc? (for why we need Arc<Mutex<i32>> instead of Mutex<i32>)
The other answer solves the problem for any type, but as pnkfelix observes, atomic wrapper types are another solution that will work for the specific case of i32.
Since Rust 1.0, you can use AtomicBool, AtomicPtr<T>, AtomicIsize and AtomicUsize to synchronize multi-threaded access to bool, *mut T, isize and usize values. In Rust 1.34, several new Atomic types have been stabilized, including AtomicI32. (Check the std::sync::atomic documentation for the current list.)
Using an atomic type is most likely more efficient than locking a Mutex or RwLock, but requires more attention to the low-level details of memory ordering. If your threads share more data than can fit in one of the standard atomic types, you probably want a Mutex instead of multiple Atomics.
That said, here's a version of kennytm's answer using AtomicI32 instead of Mutex<i32>:
use std::sync::{
atomic::{AtomicI32, Ordering},
Arc,
};
use std::thread;
use std::time::Duration;
fn main() {
let num = Arc::new(AtomicI32::new(5));
let num_clone = num.clone();
thread::spawn(move || loop {
num.fetch_add(1, Ordering::SeqCst);
thread::sleep(Duration::from_secs(10));
});
output(num_clone);
}
fn output(num: Arc<AtomicI32>) {
loop {
println!("{:?}", num.load(Ordering::SeqCst));
thread::sleep(Duration::from_secs(5));
}
}
Arc is still required for shared ownership (but see How can I pass a reference to a stack variable to a thread?).
Choosing the right memory Ordering is far from trivial. SeqCst is the most conservative choice, but if there is only one memory address being shared, Relaxed should also work. See the links below for more information.
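As an illustration of the single-address case, here is a sketch of a shared counter that uses only Ordering::Relaxed. The relaxed_count helper and its parameters are assumptions made for the example; it relies on join() to make the workers' increments visible to the final load.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter `per_thread`
/// times using `Ordering::Relaxed`, then returns the final value.
fn relaxed_count(threads: usize, per_thread: i32) -> i32 {
    let counter = Arc::new(AtomicI32::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Each fetch_add is atomic, so no increment is lost.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        // Joining synchronizes with the finished thread.
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

Relaxed is enough here because only the counter itself is shared; if the counter were used to signal that other, non-atomic data is ready, stronger orderings such as Release and Acquire would be needed.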
Links
std::sync::atomic module documentation
Atomics (chapter of The Rustonomicon)
LLVM Memory Model for Concurrent Operations and Atomic Instructions and Concurrency Guide

Resources