Avoid locking on single threaded async - rust

Let's say I have the following struct:
struct Time {
hour: u8,
minute: u8
}
impl Time {
pub fn set_time(&mut self, hour: u8, minute: u8) {
self.hour = hour;
self.minute = minute;
}
}
In a multithreaded program, sharing a mutable reference to it across multiple threads could cause race conditions, but in a single-threaded one that can't happen (there is no way for the task to yield inside set_time).
Is there any way to avoid having to use locks in such a situation?
Here is an example where two tasks are running on a single thread and could share the mutable reference without a problem:
use tokio::join;
struct Time {
hour: u8,
minute: u8
}
impl Time {
pub fn set_time(&mut self, hour: u8, minute: u8) {
self.hour = hour;
self.minute = minute;
}
}
fn main() {
let mut runtime_builder = tokio::runtime::Builder::new_current_thread();
runtime_builder.enable_time();
let runtime = runtime_builder.build().unwrap();
runtime.block_on(async_main());
}
async fn async_main() {
let mut time = Time {hour: 0, minute: 0};
join!(
task_1(&mut time),
task_2(&mut time) // <- Rust won't allow this
);
}
async fn task_1(time: &mut Time) {
loop {
// Do something
tokio::task::yield_now().await; // futures do nothing unless awaited
}
}
async fn task_2(time: &mut Time) {
loop {
// Do something
tokio::task::yield_now().await; // futures do nothing unless awaited
}
}

Mutable reference aliasing is never allowed by the borrow checker. This is one of the rules the borrow checker enforces to guarantee memory safety at compile time.
This is why the following code doesn't compile: there are two mutable references at the same time.
join!(
task_1(&mut time), // <- first mutable reference
task_2(&mut time) // <- second mutable reference ERROR
);
This rule is not specific to multithreaded context.
On the other hand, immutable reference aliasing is allowed by the borrow checker, but the following code would not compile because we can't mutate a field behind a shared reference.
use tokio::join;
struct Time {
hour: u8,
minute: u8
}
impl Time {
pub fn set_time(&self, hour: u8, minute: u8) {
self.hour = hour; // Error
self.minute = minute; // Error
}
}
[...]
async fn async_main() {
let time = Time {hour: 0, minute: 0};
join!(
task_1(&time),
task_2(&time)
);
}
[...]
To have both aliasing and mutation capabilities, you need to use the interior mutability pattern.
Interior mutability:
"A type has interior mutability if its internal state can be changed through a shared reference to it. This goes against the usual requirement that the value pointed to by a shared reference is not mutated." (see the Rust Reference)
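To make the definition concrete, here is a minimal std-only sketch, not taken from the original answer. The helper name bump_both is hypothetical. The same Cell is passed twice as a shared reference and is mutated through both aliases, with no &mut anywhere.

```rust
use std::cell::Cell;

// Both parameters may point at the same Cell: two shared references, no `&mut`.
fn bump_both(a: &Cell<u8>, b: &Cell<u8>) {
    a.set(a.get() + 1);
    b.set(b.get() + 1);
}

fn main() {
    let minute = Cell::new(0u8);
    // `minute` is aliased twice and still mutated through each alias.
    bump_both(&minute, &minute);
    println!("minute = {}", minute.get()); // prints "minute = 2"
}
```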
There are several types that implement the interior mutability pattern:
Cell
RefCell
Atomic types such as AtomicU8
Mutex
RwLock
With a single-threaded runtime, you can use either RefCell or Cell.
With RefCell<T>, the borrowing rules still apply, but they are checked at runtime instead of compile time. With a single-threaded runtime, RefCell lets separate tasks mutate the same value through shared references, since there is no parallelism involved.
use std::cell::RefCell;
use tokio::join;
#[derive(Debug)]
struct Time {
hour: RefCell<u8>,
minute: RefCell<u8>,
}
impl Time {
pub fn set_time(&self, hour: u8, minute: u8) {
*self.hour.borrow_mut() = hour;
*self.minute.borrow_mut() = minute;
}
}
async fn task_1(time: &Time) {
time.set_time(11, 54);
println!("Task 1: {:?}", time);
tokio::task::yield_now().await;
}
async fn task_2(time: &Time) {
time.set_time(8, 12);
println!("Task 2: {:?}", time);
tokio::task::yield_now().await;
}
fn main() {
let mut runtime_builder = tokio::runtime::Builder::new_current_thread();
runtime_builder.enable_time();
let runtime = runtime_builder.build().unwrap();
runtime.block_on(async {
let time = Time {
hour: RefCell::new(0),
minute: RefCell::new(0),
};
let _ = join!(task_1(&time), task_2(&time));
});
}
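In the example above, each borrow_mut() is a temporary that is released at the end of its statement, so the two tasks never hold overlapping borrows. The rule is still enforced at runtime, though. This std-only sketch (the function name runtime_borrow_check is hypothetical) shows a second mutable borrow being refused while the first one is alive. The same thing would happen if a RefMut were kept alive across an .await.

```rust
use std::cell::RefCell;

// Returns (second borrow refused while first alive?, borrow allowed after drop?).
fn runtime_borrow_check() -> (bool, bool) {
    let hour = RefCell::new(0u8);
    let first = hour.borrow_mut();
    // A plain `borrow_mut()` here would panic; `try_borrow_mut` reports the conflict.
    let refused = hour.try_borrow_mut().is_err();
    drop(first);
    let allowed = hour.try_borrow_mut().is_ok();
    (refused, allowed)
}

fn main() {
    let (refused, allowed) = runtime_borrow_check();
    println!("refused while borrowed: {refused}, allowed after drop: {allowed}");
}
```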
[based on trentcl's suggestion]
If you want to avoid the cost of the runtime check, you can use Cell<T>. Its API is most convenient when T is Copy.
use std::cell::Cell;
use tokio::join;
#[derive(Debug)]
struct Time {
hour: Cell<u8>,
minute: Cell<u8>,
}
impl Time {
pub fn set_time(&self, hour: u8, minute: u8) {
self.hour.set(hour); // `set` overwrites; `replace` would also return the old value
self.minute.set(minute);
}
}
async fn task_1(time: &Time) {
time.set_time(11, 54);
println!("Task 1: {:?}", time);
tokio::task::yield_now().await;
}
async fn task_2(time: &Time) {
time.set_time(8, 12);
println!("Task 2: {:?}", time);
tokio::task::yield_now().await;
}
fn main() {
let mut runtime_builder = tokio::runtime::Builder::new_current_thread();
runtime_builder.enable_time();
let runtime = runtime_builder.build().unwrap();
runtime.block_on(async {
let time = Time {
hour: Cell::new(0),
minute: Cell::new(0),
};
let _ = join!(task_1(&time), task_2(&time));
});
}
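A possible variation, assuming Time can derive Copy (it only holds two u8 fields): keep the whole struct in a single Cell instead of one Cell per field, so a single set replaces hour and minute together. The wrapper name Clock is hypothetical. This is a sketch, not part of the original answer.

```rust
use std::cell::Cell;

// Deriving Copy lets the whole struct live in one Cell.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Time {
    hour: u8,
    minute: u8,
}

// Hypothetical wrapper holding the shared, interior-mutable time.
struct Clock {
    time: Cell<Time>,
}

impl Clock {
    fn set_time(&self, hour: u8, minute: u8) {
        // One `set` replaces both fields at once.
        self.time.set(Time { hour, minute });
    }

    fn get_time(&self) -> Time {
        self.time.get()
    }
}

fn main() {
    let clock = Clock { time: Cell::new(Time { hour: 0, minute: 0 }) };
    let (a, b) = (&clock, &clock); // two shared references
    a.set_time(11, 54);
    b.set_time(8, 12);
    println!("{:?}", clock.get_time()); // prints "Time { hour: 8, minute: 12 }"
}
```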
With a multithreaded runtime, you can use atomic types, Mutex, RwLock, or message passing over channels.
If we take the specific case of AtomicU8, this type provides interior mutability and its store method is lock-free (at least on x86).
By using AtomicU8, we can alias and mutate our Time struct with no data races and no blocking.
AtomicU8 is Sync and Send, but tokio::spawn requires a 'static future, so passing a shared reference is not an option. We need to wrap the structure in an Arc.
use std::sync::atomic::{AtomicU8, Ordering};
use tokio::join;
use std::sync::Arc;
#[derive(Debug)]
struct Time {
hour: AtomicU8,
minute: AtomicU8,
}
impl Time {
pub fn set_time(&self, hour: u8, minute: u8) {
self.hour.store(hour, Ordering::SeqCst); // <-- mutation OK
self.minute.store(minute, Ordering::SeqCst); // <-- mutation OK
}
}
async fn task_1(time: Arc<Time>) {
time.set_time(11, 54);
println!("Task 1: {:?}", time);
tokio::task::yield_now().await;
}
async fn task_2(time: Arc<Time>) {
time.set_time(8, 12);
println!("Task 2: {:?}", time);
tokio::task::yield_now().await;
}
fn main() {
let mut runtime_builder = tokio::runtime::Builder::new_multi_thread();
runtime_builder.enable_time();
let runtime = runtime_builder.build().unwrap();
runtime.block_on(async {
let time = Arc::new(Time {
hour: AtomicU8::new(0),
minute: AtomicU8::new(0),
});
let h1 = tokio::spawn(task_1(time.clone()));
let h2 = tokio::spawn(task_2(time));
let _ = join!(
h1, // <-- aliasing Ok
h2 // <-- aliasing Ok
);
});
}
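One caveat with two separate AtomicU8 stores: each field is free of data races, but another thread can observe the new hour together with the old minute. If the pair must always be read consistently, one option is to pack both values into a single AtomicU16. This is a sketch, not part of the original answer. It uses std::thread::scope (Rust 1.63+) instead of tokio, because scoped threads may borrow `time` directly, so no Arc is needed.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::thread;

// (hour, minute) packed into one 16-bit word, so they are stored and loaded together.
struct Time {
    packed: AtomicU16,
}

impl Time {
    fn new(hour: u8, minute: u8) -> Self {
        Time { packed: AtomicU16::new(u16::from_be_bytes([hour, minute])) }
    }

    fn set_time(&self, hour: u8, minute: u8) {
        self.packed.store(u16::from_be_bytes([hour, minute]), Ordering::SeqCst);
    }

    fn get_time(&self) -> (u8, u8) {
        let [hour, minute] = self.packed.load(Ordering::SeqCst).to_be_bytes();
        (hour, minute)
    }
}

fn main() {
    let time = Time::new(0, 0);
    // Scoped threads are joined before `scope` returns, so borrowing `time` is fine.
    thread::scope(|s| {
        s.spawn(|| time.set_time(11, 54));
        s.spawn(|| time.set_time(8, 12));
    });
    // Whichever store landed last wins, but hour and minute are never mixed.
    let (h, m) = time.get_time();
    assert!((h, m) == (11, 54) || (h, m) == (8, 12));
    println!("{h:02}:{m:02}");
}
```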

Related

How to idiomatically share data between closures with wasm-bindgen?

In my browser application, two closures access data stored in a Rc<RefCell<T>>. One closure mutably borrows the data, while the other immutably borrows it. The two closures are invoked independently of one another, and this will occasionally result in a BorrowError or BorrowMutError.
Here is my attempt at an MWE, though it uses a future to artificially inflate the likelihood of the error occurring:
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll, Waker};
use wasm_bindgen::prelude::*;
use wasm_bindgen::JsValue;
#[wasm_bindgen]
extern "C" {
#[wasm_bindgen(js_namespace = console)]
pub fn log(s: &str);
#[wasm_bindgen(js_name = setTimeout)]
fn set_timeout(closure: &Closure<dyn FnMut()>, millis: u32) -> i32;
#[wasm_bindgen(js_name = setInterval)]
fn set_interval(closure: &Closure<dyn FnMut()>, millis: u32) -> i32;
}
pub struct Counter(u32);
#[wasm_bindgen(start)]
pub async fn main() -> Result<(), JsValue> {
console_error_panic_hook::set_once();
let counter = Rc::new(RefCell::new(Counter(0)));
let counter_clone = counter.clone();
let log_closure = Closure::wrap(Box::new(move || {
let c = counter_clone.borrow();
log(&c.0.to_string());
}) as Box<dyn FnMut()>);
set_interval(&log_closure, 1000);
log_closure.forget();
let counter_clone = counter.clone();
let increment_closure = Closure::wrap(Box::new(move || {
let counter_clone = counter_clone.clone();
wasm_bindgen_futures::spawn_local(async move {
let mut c = counter_clone.borrow_mut();
// In reality this future would be replaced by some other
// time-consuming operation manipulating the borrowed data
SleepFuture::new(5000).await;
c.0 += 1;
});
}) as Box<dyn FnMut()>);
set_timeout(&increment_closure, 3000);
increment_closure.forget();
Ok(())
}
struct SleepSharedState {
waker: Option<Waker>,
completed: bool,
closure: Option<Closure<dyn FnMut()>>,
}
struct SleepFuture {
shared_state: Rc<RefCell<SleepSharedState>>,
}
impl Future for SleepFuture {
type Output = ();
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
let mut shared_state = self.shared_state.borrow_mut();
if shared_state.completed {
Poll::Ready(())
} else {
shared_state.waker = Some(cx.waker().clone());
Poll::Pending
}
}
}
impl SleepFuture {
fn new(duration: u32) -> Self {
let shared_state = Rc::new(RefCell::new(SleepSharedState {
waker: None,
completed: false,
closure: None,
}));
let state_clone = shared_state.clone();
let closure = Closure::wrap(Box::new(move || {
let mut state = state_clone.borrow_mut();
state.completed = true;
if let Some(waker) = state.waker.take() {
waker.wake();
}
}) as Box<dyn FnMut()>);
set_timeout(&closure, duration);
shared_state.borrow_mut().closure = Some(closure);
SleepFuture { shared_state }
}
}
panicked at 'already mutably borrowed: BorrowError'
The error makes sense, but how should I go about resolving it?
My current solution is to have the closures use try_borrow or try_borrow_mut, and if unsuccessful, use setTimeout for an arbitrary amount of time before attempting to borrow again.
Think about this problem independently of Rust's borrow semantics. You have a long-running operation that's updating some shared state.
How would you do it if you were using threads? You would put the shared state behind a lock. RefCell is like a lock, except that you can't block waiting for it to be released, but you can emulate blocking by using some kind of message passing to wake up the reader.
How would you do it if you were using pure JavaScript? You don't automatically have anything like RefCell, so either:
The state can be safely read while the operation is still ongoing (in a concurrency-not-parallelism sense): in this case, emulate that by not holding a single RefMut (result of borrow_mut()) alive across an await boundary.
The state is not safe to be read: you'd either write something lock-like as described above, or perhaps arrange so that it's only written once when the operation is done, and until then, the long-running operation has its own private state not shared with the rest of the application (so there can be no BorrowError conflicts).
Think about what your application actually needs and pick a suitable solution. Implementing any of these solutions will most likely involve having additional interior-mutable objects used for communication.
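Both options come down to not keeping a RefMut alive across an .await. They can be sketched without a browser by polling a future by hand.

The following names are hypothetical stand-ins for the question's SleepFuture and closures, not wasm-bindgen APIs:

- NoopWaker
- YieldOnce
- increment
- demo

The operation copies the value out with a short borrow, awaits, and then publishes its result with a short borrow_mut. A reader can therefore borrow the counter while the operation is suspended. This "work privately, publish once" shape can lose updates if another writer changes the counter in the meantime, so it only fits the second case above.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; the future is polled by hand below.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Stand-in for SleepFuture: pending on the first poll, ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// No RefCell borrow is alive while this future is suspended at `.await`.
async fn increment(counter: Rc<RefCell<u32>>) {
    let mut local = *counter.borrow(); // temporary Ref dropped at end of statement
    YieldOnce(false).await; // other code may run here
    local += 1;
    *counter.borrow_mut() = local; // short mutable borrow to publish the result
}

// Returns (could a reader borrow while the operation was suspended?, final value).
fn demo() -> (bool, u32) {
    let counter = Rc::new(RefCell::new(0u32));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(increment(counter.clone()));

    assert!(fut.as_mut().poll(&mut cx).is_pending());
    // This is what the log closure would do between the two polls.
    let reader_ok = counter.try_borrow().is_ok();
    assert!(fut.as_mut().poll(&mut cx).is_ready());
    let value = *counter.borrow();
    (reader_ok, value)
}

fn main() {
    let (reader_ok, value) = demo();
    println!("reader could borrow: {reader_ok}, counter = {value}");
}
```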

How to update in one thread and read from many?

I've failed to get this code past the borrow-checker:
use std::sync::Arc;
use std::thread::{sleep, spawn};
use std::time::Duration;
#[derive(Debug, Clone)]
struct State {
count: u64,
not_copyable: Vec<u8>,
}
fn bar(thread_num: u8, arc_state: Arc<State>) {
let state = arc_state.clone();
loop {
sleep(Duration::from_millis(1000));
println!("thread_num: {}, state.count: {}", thread_num, state.count);
}
}
fn main() -> std::io::Result<()> {
let mut state = State {
count: 0,
not_copyable: vec![],
};
let arc_state = Arc::new(state);
for i in 0..2 {
spawn(move || {
bar(i, arc_state.clone());
});
}
loop {
sleep(Duration::from_millis(300));
state.count += 1;
}
}
I'm probably trying the wrong thing.
I want one (main) thread which can update state and many threads which can read state.
How should I do this in Rust?
I have read the Rust book on shared state, but that uses mutexes which seem overly complex for a single writer / multiple reader situation.
In C I would achieve this with a generous sprinkling of _Atomic.
Atomics are indeed a proper way, and there are plenty of them in std (see std::sync::atomic). Your example needs two fixes.
The Arc must be cloned before it is moved into the closure, so your loop becomes:
for i in 0..2 {
let arc_state = arc_state.clone();
spawn(move || { bar(i, arc_state); });
}
Using AtomicU64 is fairly straightforward, though you need to call its methods explicitly, each with a specified Ordering (Playground):
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{sleep, spawn};
use std::time::Duration;
#[derive(Debug)]
struct State {
count: AtomicU64,
not_copyable: Vec<u8>,
}
fn bar(thread_num: u8, arc_state: Arc<State>) {
let state = arc_state.clone();
loop {
sleep(Duration::from_millis(1000));
println!(
"thread_num: {}, state.count: {}",
thread_num,
state.count.load(Ordering::Relaxed)
);
}
}
fn main() -> std::io::Result<()> {
let state = State {
count: AtomicU64::new(0),
not_copyable: vec![],
};
let arc_state = Arc::new(state);
for i in 0..2 {
let arc_state = arc_state.clone();
spawn(move || {
bar(i, arc_state);
});
}
loop {
sleep(Duration::from_millis(300));
// you can't use `state` here, because it moved
arc_state.count.fetch_add(1, Ordering::Relaxed);
}
}
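Atomics only cover the count field. If the main thread also needs to change not_copyable, std::sync::RwLock is the standard single-writer/multiple-reader lock: many threads may hold the read lock at once, and a writer gets exclusive access. Here is a minimal sketch. The helper names reader_len and writer_push are hypothetical, and State is trimmed to the one field.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical variant of State: the non-Copy field sits behind an RwLock.
struct State {
    not_copyable: RwLock<Vec<u8>>,
}

fn reader_len(state: &State) -> usize {
    // Shared read lock; other readers are not blocked.
    state.not_copyable.read().unwrap().len()
}

fn writer_push(state: &State, byte: u8) {
    // Exclusive write lock, held only for the push.
    state.not_copyable.write().unwrap().push(byte);
}

fn main() {
    let state = Arc::new(State { not_copyable: RwLock::new(Vec::new()) });
    let readers: Vec<_> = (0..2)
        .map(|i| {
            let state = Arc::clone(&state);
            thread::spawn(move || println!("thread {}: len = {}", i, reader_len(&state)))
        })
        .collect();
    for b in 0..3 {
        writer_push(&state, b);
    }
    for r in readers {
        r.join().unwrap();
    }
    println!("final len = {}", reader_len(&state));
}
```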

Maintaining a mutable reference to struct in HashMap

Is it possible to borrow a mutable reference to the contents of a HashMap and use it for an extended period of time without impeding read-only access?
This is for trying to maintain a window into the state of various components in a system that are running independently (via Tokio) and need to be monitored.
As an example:
use std::sync::Arc;
use std::collections::HashMap;
struct Container {
running : bool,
count : u8
}
impl Container {
fn run(&mut self) {
for i in 1..100 {
self.count = i;
}
self.running = false;
}
}
fn main() {
let mut map = HashMap::new();
let mut container = Arc::new(
Box::new(
Container {
running: true,
count: 0
}
)
);
map.insert(0, container.clone());
container.run();
map.remove(&0);
}
This is for a Tokio-driven program where multiple operations will be happening asynchronously and visibility into the overall state of them is required.
There's this question where a temporary mutable reference can be borrowed, but that won't work as the run() function needs time to complete.
Based on suggestions from Jmb and Stargateur, I reworked this to use an RwLock internally. The internals could be wrapped in methods that manipulate them, but the basics are here:
use std::sync::Arc;
use std::sync::RwLock;
use std::collections::HashMap;
#[derive(Debug)]
struct ContainerState {
running : bool,
count : u8
}
struct Container {
state : Arc<RwLock<ContainerState>>
}
impl Container {
fn run(&self) {
for i in 1..100 {
let mut state = self.state.write().unwrap();
state.count = i;
}
{
let mut state = self.state.write().unwrap();
state.running = false;
}
}
}
fn main() {
let mut map = HashMap::new();
let state = Arc::new(
RwLock::new(
ContainerState {
running: true,
count: 0
}
)
);
map.insert(0, state);
let container = Container {
state: map[&0].clone()
};
container.run();
println!("Final state: {:?}", map[&0]);
map.remove(&0);
}
The key thing I was missing is that you can have either one mutable reference or multiple immutable references, and the two are mutually exclusive. My initial understanding was that these two limits were independent.
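The code above still calls run() on the main thread, so nothing reads the map while run() is in progress. Here is a sketch of the monitoring the question describes. It uses std::thread in place of Tokio (an assumption, to keep the sketch self-contained) and a hypothetical snapshot helper. The worker takes the write lock briefly on each step, and the monitor reads every entry of the map with read locks in between.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
struct ContainerState {
    running: bool,
    count: u8,
}

// Runs on a worker thread; the write lock is held only for each single update.
fn run(state: Arc<RwLock<ContainerState>>) {
    for i in 1..100 {
        state.write().unwrap().count = i;
        thread::sleep(Duration::from_millis(1));
    }
    state.write().unwrap().running = false;
}

// Hypothetical helper: copies every entry out of the map under read locks.
fn snapshot(map: &HashMap<u32, Arc<RwLock<ContainerState>>>) -> Vec<(u32, ContainerState)> {
    map.iter().map(|(id, s)| (*id, *s.read().unwrap())).collect()
}

fn main() {
    let mut map = HashMap::new();
    let state = Arc::new(RwLock::new(ContainerState { running: true, count: 0 }));
    map.insert(0u32, Arc::clone(&state));

    let worker = thread::spawn(move || run(state));
    // The monitor reads the map while `run` is still going.
    while !worker.is_finished() {
        println!("{:?}", snapshot(&map));
        thread::sleep(Duration::from_millis(20));
    }
    worker.join().unwrap();
    println!("final: {:?}", snapshot(&map));
    map.remove(&0);
}
```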

How to implement a long running process with progress in Rust, available via a Rest api?

I am a beginner in Rust.
I have a long-running IO-bound process that I want to spawn and monitor via a REST API. I chose Iron for that, following this tutorial. Monitoring means getting its progress and its final result.
When I spawn it, I give it an id and map that id to a resource that I can GET to get the progress. I don't have to be exact with the progress; I can report the progress from 5 seconds ago.
My first attempt was to have a channel via which I send a request for progress and receive the status. I got stuck on where to store the receiver, since in my understanding it belongs to one thread only. I wanted to put it in the context of the request, but that won't work because different threads handle subsequent requests.
What would be the idiomatic way to do this in Rust?
I have a sample project.
Later edit:
Here is a self-contained example that follows the same principle as the answer, namely a map where each thread updates its progress:
extern crate iron;
extern crate router;
extern crate rustc_serialize;
use iron::prelude::*;
use iron::status;
use router::Router;
use rustc_serialize::json;
use std::io::Read;
use std::sync::{Mutex, Arc};
use std::thread;
use std::time::Duration;
use std::collections::HashMap;
#[derive(Debug, Clone, RustcEncodable, RustcDecodable)]
pub struct Status {
pub progress: u64,
pub context: String
}
#[derive(RustcEncodable, RustcDecodable)]
struct StartTask {
id: u64
}
fn start_process(status: Arc<Mutex<HashMap<u64, Status>>>, task_id: u64) {
let c = status.clone();
thread::spawn(move || {
for i in 1..100 {
{
let m = &mut c.lock().unwrap();
m.insert(task_id, Status{ progress: i, context: "in progress".to_string()});
}
thread::sleep(Duration::from_secs(1));
}
let m = &mut c.lock().unwrap();
m.insert(task_id, Status{ progress: 100, context: "done".to_string()});
});
}
fn main() {
let status: Arc<Mutex<HashMap<u64, Status>>> = Arc::new(Mutex::new(HashMap::new()));
let status_clone: Arc<Mutex<HashMap<u64, Status>>> = status.clone();
let mut router = Router::new();
router.get("/:taskId", move |r: &mut Request| task_status(r, &status.lock().unwrap()));
router.post("/start", move |r: &mut Request|
start_task(r, status_clone.clone()));
fn task_status(req: &mut Request, statuses: & HashMap<u64,Status>) -> IronResult<Response> {
let ref task_id = req.extensions.get::<Router>().unwrap().find("taskId").unwrap_or("/").parse::<u64>().unwrap();
let payload = json::encode(&statuses.get(&task_id)).unwrap();
Ok(Response::with((status::Ok, payload)))
}
// Receive a message by POST and play it back.
fn start_task(request: &mut Request, statuses: Arc<Mutex<HashMap<u64, Status>>>) -> IronResult<Response> {
let mut payload = String::new();
request.body.read_to_string(&mut payload).unwrap();
let task_start_request: StartTask = json::decode(&payload).unwrap();
start_process(statuses, task_start_request.id);
Ok(Response::with((status::Ok, json::encode(&task_start_request).unwrap())))
}
Iron::new(router).http("localhost:3000").unwrap();
}
One possibility is to use a global HashMap that associates each worker id with its progress (and result). Here is a simple example (without the REST parts):
#[macro_use]
extern crate lazy_static;
use std::sync::Mutex;
use std::collections::HashMap;
use std::thread;
use std::time::Duration;
lazy_static! {
static ref PROGRESS: Mutex<HashMap<usize, usize>> = Mutex::new(HashMap::new());
}
fn set_progress(id: usize, progress: usize) {
// insert replaces the old value if there was one.
PROGRESS.lock().unwrap().insert(id, progress);
}
fn get_progress(id: usize) -> Option<usize> {
PROGRESS.lock().unwrap().get(&id).cloned()
}
fn work(id: usize) {
println!("Creating {}", id);
set_progress(id, 0);
for i in 0..100 {
set_progress(id, i + 1);
// simulates work
thread::sleep(Duration::new(0, 50_000_000));
}
}
fn monitor(id: usize) {
loop {
if let Some(p) = get_progress(id) {
if p == 100 {
println!("Done {}", id);
// to avoid leaks, remove id from PROGRESS.
// maybe save that the task ended in a database.
PROGRESS.lock().unwrap().remove(&id);
return
} else {
println!("Progress {}: {}", id, p);
}
}
thread::sleep(Duration::new(1, 0));
}
}
fn main() {
let w = thread::spawn(|| work(1));
let m = thread::spawn(|| monitor(1));
w.join().unwrap();
m.join().unwrap();
}
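On Rust 1.70+, the lazy_static dependency can be swapped for std::sync::OnceLock. This is a substitution, not the original answer's code. The sketch below keeps the same set_progress/get_progress shape. It adds a hypothetical take_if_done helper that removes the finished entry, which is the cleanup the original comment mentions.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};
use std::thread;
use std::time::Duration;

// Lazily initialized global map, without the lazy_static crate.
fn progress() -> &'static Mutex<HashMap<usize, usize>> {
    static PROGRESS: OnceLock<Mutex<HashMap<usize, usize>>> = OnceLock::new();
    PROGRESS.get_or_init(|| Mutex::new(HashMap::new()))
}

fn set_progress(id: usize, value: usize) {
    progress().lock().unwrap().insert(id, value);
}

fn get_progress(id: usize) -> Option<usize> {
    progress().lock().unwrap().get(&id).copied()
}

// Hypothetical helper: removes the entry once it reaches 100, so finished ids don't leak.
fn take_if_done(id: usize) -> bool {
    let mut map = progress().lock().unwrap();
    if map.get(&id) == Some(&100) {
        map.remove(&id);
        true
    } else {
        false
    }
}

fn main() {
    let worker = thread::spawn(|| {
        for i in 0..100 {
            set_progress(1, i + 1);
            thread::sleep(Duration::from_millis(5)); // simulates work
        }
    });
    loop {
        if take_if_done(1) {
            println!("Done 1");
            break;
        }
        if let Some(p) = get_progress(1) {
            println!("Progress 1: {p}");
        }
        thread::sleep(Duration::from_millis(50));
    }
    worker.join().unwrap();
}
```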
You need to register one channel per request thread: if Receivers could be cloned, responses would end up on the wrong thread whenever two requests run at the same time.
Instead of having your thread create a channel for answering requests, use a future. A future gives you a handle to an object that doesn't exist yet. You can change the input channel to receive a Promise, which the worker then fulfills; no output channel is necessary.
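std has no built-in promise type, but a fresh std::sync::mpsc channel created per request can play the same role. The request sends the worker a query that carries its own Sender, then waits on the matching Receiver, so each answer goes back to the request that asked. This is a sketch of that idea with plain threads, not the original answer's code. ProgressQuery, spawn_worker and query are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

// A progress query carries its own reply channel.
struct ProgressQuery {
    reply: mpsc::Sender<u64>,
}

// Hypothetical worker: performs `steps` steps and answers queries between steps.
fn spawn_worker(steps: u64) -> (mpsc::Sender<ProgressQuery>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<ProgressQuery>();
    let handle = thread::spawn(move || {
        for done in 0..=steps {
            // Answer every query that arrived since the last step.
            while let Ok(q) = rx.try_recv() {
                let _ = q.reply.send(done);
            }
            // One unit of real work would go here.
        }
        // After finishing, keep answering until every Sender has been dropped.
        for q in rx {
            let _ = q.reply.send(steps);
        }
    });
    (tx, handle)
}

// What a request handler would do: create a one-off channel, send the query, wait.
fn query(worker: &mpsc::Sender<ProgressQuery>) -> Option<u64> {
    let (reply, answer) = mpsc::channel();
    worker.send(ProgressQuery { reply }).ok()?;
    answer.recv().ok()
}

fn main() {
    let (worker, handle) = spawn_worker(10);
    // Each request thread clones the Sender; Receivers are never shared.
    let requests: Vec<_> = (0..3)
        .map(|_| {
            let worker = worker.clone();
            thread::spawn(move || query(&worker))
        })
        .collect();
    for r in requests {
        println!("progress: {:?}", r.join().unwrap());
    }
    drop(worker);
    handle.join().unwrap();
}
```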

Threaded calling of functions in a vector

I have an EventRegistry which people can use to register event listeners. It then calls the appropriate listeners when an event is broadcast. But, when I try to multithread it, it doesn't compile. How would I get this code working?
use std::collections::HashMap;
use std::thread;
struct EventRegistry<'a> {
event_listeners: HashMap<&'a str, Vec<Box<dyn Fn() + Sync>>>
}
impl<'a> EventRegistry<'a> {
fn new() -> EventRegistry<'a> {
EventRegistry {
event_listeners: HashMap::new()
}
}
fn add_event_listener(&mut self, event: &'a str, listener: Box<dyn Fn() + Sync>) {
match self.event_listeners.get_mut(event) {
Some(listeners) => {
listeners.push(listener);
return
},
None => {}
};
let mut listeners = Vec::with_capacity(1);
listeners.push(listener);
self.event_listeners.insert(event, listeners);
}
fn broadcast_event(&mut self, event: &str) {
match self.event_listeners.get(event) {
Some(listeners) => {
for listener in listeners.iter() {
let _ = thread::spawn(|| {
listener();
});
}
}
None => {}
}
}
}
fn main() {
let mut main_registry = EventRegistry::new();
main_registry.add_event_listener("player_move", Box::new(|| {
println!("Hey, look, the player moved!");
}));
main_registry.broadcast_event("player_move");
}
Playpen (not sure if it's minimal, but it produces the error)
If I use thread::scoped, it works too, but that's unstable, and I think it only works because it immediately joins back to the main thread.
Updated question
I meant "call them in their own thread"
The easiest thing to do is avoid the Fn* traits, if possible. If you know that you are only using plain functions (not closures), then it's straightforward:
use std::thread;
use std::time::Duration;
fn a() { println!("a"); }
fn b() { println!("b"); }
fn main() {
let fns = vec![a as fn(), b as fn()];
for &f in &fns {
thread::spawn(move || f());
}
// Give the detached threads time to run before main exits.
thread::sleep(Duration::from_millis(500));
}
If you can't use that for some reason (like you want to accept closures), then you will need to be a bit more explicit and use Arc:
use std::thread;
use std::sync::Arc;
use std::time::Duration;
fn a() { println!("a"); }
fn b() { println!("b"); }
fn main() {
let fns = vec![
Arc::new(Box::new(a) as Box<dyn Fn() + Send + Sync>),
Arc::new(Box::new(b) as Box<dyn Fn() + Send + Sync>),
];
for f in &fns {
// Cloning the Arc bumps the reference count; each thread owns its own handle.
let my_f = f.clone();
thread::spawn(move || my_f());
}
thread::sleep(Duration::from_millis(500));
}
Here, we can create a reference-counted trait object. We can clone the trait object (increasing the reference count) each time we spawn a new thread. Each thread gets its own reference to the trait object.
If I use thread::scoped, it works too
thread::scoped is pretty awesome; it's really unfortunate that it needed to be marked unstable due to some complex interactions that weren't the best.
One of the benefits of a scoped thread is that the thread is guaranteed to end by a specific time: when the JoinGuard is dropped. That means that scoped threads are allowed to contain non-'static references, so long as those references last longer than the thread!
A spawned thread has no such guarantee about how long it lives; it may live "forever". Any references it holds must also live "forever", hence the 'static restriction.
This explains your original problem. You have a vector with a non-'static lifetime, but you are handing references that point into that vector to the thread. If the vector were deallocated before the thread exited, the thread could access freed memory, which leads to crashes in C or C++ programs. This is Rust helping you out!
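thread::scoped never became stable, but Rust 1.63 stabilized std::thread::scope. It gives the same guarantee: every thread spawned inside the scope is joined before scope returns, so those threads may borrow non-'static data.

Here is a sketch of the question's broadcast_event rewritten with it, assuming Rust 1.65+ (for let-else). Two changes are additions for illustration, not part of the original question:

- The usize return value (the number of listeners run).
- The switch to &self.

```rust
use std::collections::HashMap;
use std::thread;

struct EventRegistry<'a> {
    event_listeners: HashMap<&'a str, Vec<Box<dyn Fn() + Sync>>>,
}

impl<'a> EventRegistry<'a> {
    fn new() -> Self {
        EventRegistry { event_listeners: HashMap::new() }
    }

    fn add_event_listener(&mut self, event: &'a str, listener: Box<dyn Fn() + Sync>) {
        self.event_listeners.entry(event).or_default().push(listener);
    }

    // Runs each listener on its own scoped thread. `scope` joins them all before
    // returning, so borrowing `listener` from `self` satisfies the borrow checker.
    fn broadcast_event(&self, event: &str) -> usize {
        let Some(listeners) = self.event_listeners.get(event) else {
            return 0;
        };
        thread::scope(|s| {
            for listener in listeners {
                s.spawn(move || listener());
            }
        });
        listeners.len()
    }
}

fn main() {
    let mut main_registry = EventRegistry::new();
    main_registry.add_event_listener("player_move", Box::new(|| {
        println!("Hey, look, the player moved!");
    }));
    let ran = main_registry.broadcast_event("player_move");
    println!("{ran} listener(s) ran");
}
```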
Original question
Call functions in vector without consuming them
The answer is that you just call them:
fn a() { println!("a"); }
fn b() { println!("b"); }
fn main() {
let fns = vec![Box::new(a) as Box<dyn Fn()>, Box::new(b) as Box<dyn Fn()>];
fns[0]();
fns[1]();
fns[0]();
fns[1]();
}

Resources