I need to have a global boolean flag that will be accessed by multiple threads.
Here is an example of what I need:
static GLOBAL_FLAG: SyncLazy<Mutex<bool>> = SyncLazy::new(|| {
Mutex::new(false)
});
fn set_flag_to_true() { // can be called by 2+ threads concurrently
*GLOBAL_FLAG.lock().unwrap() = true;
}
fn get_flag_and_set_to_true() -> bool { // only one thread is calling this function
let v = *GLOBAL_FLAG.lock().unwrap(); // Obtain current flag value
*GLOBAL_FLAG.lock().unwrap() = true; // Always set the flag to true
v // Return the previous value
}
The get_flag_and_set_to_true() implementation doesn't feel quite right. I imagine it would be best if I only locked once. What's the best way to do that?
BTW I suppose Arc<AtomicBool> can also be used and should in theory be faster, although in my particular case the speed benefit will be unnoticeable.
It's not just about benefit in performance, but also in amount of code and ease of reasoning about the code. With AtomicBool you don't need either SyncLazy or the mutex, and the code is shorter and clearer:
use std::sync::atomic::{AtomicBool, Ordering};
static GLOBAL_FLAG: AtomicBool = AtomicBool::new(false);
pub fn set_flag_to_true() {
GLOBAL_FLAG.store(true, Ordering::SeqCst);
}
pub fn get_flag_and_set_to_true() -> bool {
GLOBAL_FLAG.swap(true, Ordering::SeqCst)
}
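Since swap returns the value that was stored before the write, the latch behaviour is easy to check; here is a small usage sketch built on the two functions above:
fn main() {
    // The flag starts out false, so the first call reports false...
    assert_eq!(get_flag_and_set_to_true(), false);
    // ...and every later call sees the latched value.
    assert_eq!(get_flag_and_set_to_true(), true);
    set_flag_to_true(); // fine to call from any number of threads
}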
Conceivably, another thread could come in between when you read GLOBAL_FLAG and when you set GLOBAL_FLAG to true. To work around this you can directly store the MutexGuard that GLOBAL_FLAG.lock().unwrap() returns:
fn get_flag_and_set_to_true() -> bool { // only one thread is calling this function
let mut global_flag = GLOBAL_FLAG.lock().unwrap();
let v = *global_flag; // Obtain current flag value
*global_flag = true; // Always set the flag to true
v // Return the previous value
}
global_flag will keep the mutex locked until it gets dropped.
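Equivalently, std::mem::replace expresses the read-then-overwrite as a single locked expression:
fn get_flag_and_set_to_true() -> bool {
    // One lock: swap in `true` and get the previous value back.
    std::mem::replace(&mut *GLOBAL_FLAG.lock().unwrap(), true)
}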
lazy_static doesn't work because I need to assign to this variable at runtime after some user interaction. thread_local doesn't work because I need to read this variable across threads.
From a system perspective, I think what I'm trying to do should be simple. At the beginning of execution I'm single threaded, I initialize some things, and then I tokio::spawn some tasks which only need to read those things.
I can get past the problem by using a mutex, but I don't really see why I should need one when I can guarantee that no tasks will ever try to get mutable access, other than at the very beginning of runtime when I'm still in a single thread. Is there a better way than using a mutex?
This is what I have so far, in case anyone is curious:
lazy_static! {
pub static ref KEYPAIRSTORE_GLOBAL: Mutex<KeypairStore> = Mutex::new(KeypairStore::new());
}
// ...
// at top of main:
let mut keypairstore = KEYPAIRSTORE_GLOBAL.lock().unwrap();
*keypairstore = KeypairStore::new_from_password();
// somewhere later in a tokio::spawn:
let keypair_store = KEYPAIRSTORE_GLOBAL.lock().unwrap();
let keypair = keypair_store.get_keypair();
println!("{}", keypair.address());
I don't see why I need to use this mutex... I'd be happy to use unsafe during assignment, but I'd rather not have to use it every time I want to read.
As written, you need the Mutex because you are mutating it after it is initialised. Instead, do the mutation during the initialisation:
lazy_static! {
pub static ref KEYPAIRSTORE_GLOBAL: KeypairStore = {
let mut keystore = KeypairStore::new_from_password();
// ... more initialisation here...
keystore
};
}
// somewhere later in a tokio::spawn:
let keypair = KEYPAIRSTORE_GLOBAL.get_keypair();
println!("{}", keypair.address());
This is assuming that the signature of get_keypair is:
pub fn get_keypair(&self) -> Keypair;
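If initialisation really must happen at runtime (e.g. after the password prompt), a minimal sketch using the standard library's OnceLock (stable since Rust 1.70) gives the same write-once, read-everywhere behaviour without a Mutex; KeypairStore and its methods are the ones from the question:
use std::sync::OnceLock;
static KEYPAIRSTORE_GLOBAL: OnceLock<KeypairStore> = OnceLock::new();
fn main() {
    // Still single-threaded here: set() stores the value exactly once.
    if KEYPAIRSTORE_GLOBAL.set(KeypairStore::new_from_password()).is_err() {
        panic!("keypair store was already initialised");
    }
    // Later, from any thread or tokio task: get() is a cheap read.
    let keypair = KEYPAIRSTORE_GLOBAL.get().unwrap().get_keypair();
    println!("{}", keypair.address());
}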
I would like to have a shared struct between threads. The struct has many fields that are never modified, and a HashMap, which is. I don't want to lock the whole HashMap for a single update/remove, so my HashMap looks something like HashMap<u8, Mutex<u8>>. This works, but it makes no sense since the thread will lock the whole map anyway.
Here's a working version, without threads; I don't think they're necessary for the example.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
fn main() {
let s = Arc::new(Mutex::new(S::new()));
let z = s.clone();
let _ = z.lock().unwrap();
}
struct S {
x: HashMap<u8, Mutex<u8>>, // other non-mutable fields
}
impl S {
pub fn new() -> S {
S {
x: HashMap::default(),
}
}
}
Is this possible in any way? Is there something obvious I missed in the documentation?
I've been trying to get this working, but I'm not sure how. Basically every example I see there's always a Mutex (or RwLock, or something like that) guarding the inner value.
I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures; what should happen if multiple threads need to insert new values that hash to the same location?
In previous work, I've used a RwLock<HashMap<K, Mutex<V>>>. When inserting a value into the hash, you get an exclusive lock for a short period. The rest of the time, you can have multiple threads with reader locks to the HashMap and thus to a given element. If they need to mutate the data, they can get exclusive access to the Mutex.
Here's an example:
use std::{
collections::HashMap,
sync::{Arc, Mutex, RwLock},
thread,
time::Duration,
};
fn main() {
let data = Arc::new(RwLock::new(HashMap::new()));
let threads: Vec<_> = (0..10)
.map(|i| {
let data = Arc::clone(&data);
thread::spawn(move || worker_thread(i, data))
})
.collect();
for t in threads {
t.join().expect("Thread panicked");
}
println!("{:?}", data);
}
fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {
loop {
// Assume that the element already exists
let map = data.read().expect("RwLock poisoned");
if let Some(element) = map.get(&id) {
let mut element = element.lock().expect("Mutex poisoned");
// Perform our normal work updating a specific element.
// The entire HashMap only has a read lock, which
// means that other threads can access it.
*element += 1;
thread::sleep(Duration::from_secs(1));
return;
}
// If we got this far, the element doesn't exist
// Get rid of our read lock and switch to a write lock
// You want to minimize the time we hold the writer lock
drop(map);
let mut map = data.write().expect("RwLock poisoned");
// We use HashMap::entry to handle the case where another thread
// inserted the same key while we were unlocked.
thread::sleep(Duration::from_millis(50));
map.entry(id).or_insert_with(|| Mutex::new(0));
// Let the loop start us over to try again
}
}
This takes about 2.7 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.
This solution isn't without issues, however. When there's a huge amount of contention for that one master lock, getting a write lock can take a while and completely kills parallelism.
In that case, you can switch to a RwLock<HashMap<K, Arc<Mutex<V>>>>. Once you have a read or write lock, you can then clone the Arc of the value, returning it and unlocking the hashmap.
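As a rough sketch of that variant (the helper name get_or_create is mine, not from any crate), the Arc is cloned out so the outer lock is already released by the time the element is used:
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
fn get_or_create(data: &RwLock<HashMap<u8, Arc<Mutex<i32>>>>, id: u8) -> Arc<Mutex<i32>> {
    if let Some(element) = data.read().expect("RwLock poisoned").get(&id) {
        return Arc::clone(element); // the read guard is dropped on return
    }
    // Not found: take the write lock briefly and insert.
    let mut map = data.write().expect("RwLock poisoned");
    Arc::clone(map.entry(id).or_insert_with(|| Arc::new(Mutex::new(0))))
}
fn main() {
    let data = RwLock::new(HashMap::new());
    // The outer map is unlocked by the time we lock the element.
    *get_or_create(&data, 1).lock().expect("Mutex poisoned") += 1;
    assert_eq!(*get_or_create(&data, 1).lock().unwrap(), 1);
}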
The next step up would be to use a crate like arc-swap, which says:
Then one would lock [the RwLock<Arc<T>>], clone the Arc and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc) which makes it relatively slow. Depending on the implementation, an update may be blocked for an arbitrarily long time by a steady inflow of readers.
The ArcSwap can be used instead, which solves the above problems and has better performance characteristics than the RwLock, both in contended and non-contended scenarios.
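For illustration, here is a minimal read-copy-update sketch against arc-swap's ArcSwap type (assuming the arc-swap 1.x crate and its from_pointee/load/store API):
use std::collections::HashMap;
use std::sync::Arc;
use arc_swap::ArcSwap;
fn main() {
    let map: ArcSwap<HashMap<u8, i32>> = ArcSwap::from_pointee(HashMap::new());
    // Readers take a lock-free snapshot; they never block the writer.
    let snapshot = map.load();
    assert!(snapshot.get(&1).is_none());
    // A writer copies the current map, modifies the copy, then publishes it atomically.
    let mut next = (**snapshot).clone();
    next.insert(1, 42);
    map.store(Arc::new(next));
    assert_eq!(map.load().get(&1), Some(&42));
}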
I often advocate for performing some kind of smarter algorithm. For example, you could spin up N threads each with their own HashMap. You then shard work among them. For the simple example above, you could use id % N_THREADS, for example. There are also complicated sharding schemes that depend on your data.
As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.
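In that spirit, here is a minimal sketch of the sharding idea just described, using plain std channels; N_SHARDS and the counter payload are arbitrary choices for the example:
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;
const N_SHARDS: usize = 4;
fn main() {
    let mut handles = Vec::new();
    // One channel per shard; each worker owns its HashMap outright, so no locks.
    let senders: Vec<_> = (0..N_SHARDS)
        .map(|_| {
            let (tx, rx) = mpsc::channel::<(u8, i32)>();
            handles.push(thread::spawn(move || {
                let mut map: HashMap<u8, i32> = HashMap::new();
                for (key, delta) in rx {
                    *map.entry(key).or_insert(0) += delta;
                }
                map // the shard's final state
            }));
            tx
        })
        .collect();
    // Shard work by key instead of sharing one map.
    for key in 0u8..=255 {
        senders[key as usize % N_SHARDS].send((key, 1)).unwrap();
    }
    drop(senders); // closing the channels lets the worker loops end
    for h in handles {
        println!("{:?}", h.join().unwrap());
    }
}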
Suppose the key of the data can be mapped to a u8.
You can then use Arc<HashMap<u8, Mutex<HashMap<Key, Value>>>>.
When you initialize the data structure, you populate the entire first-level map before putting it in the Arc (it will be immutable after initialization).
When you want a value from the map, you do a double get, something like:
data.get(&map_to_u8(&key)).unwrap().lock().expect("poison").get(&key)
where the unwrap is safe because we initialized the first-level map with every possible key.
To write to the map, do something like:
data.get(&map_to_u8(&key)).unwrap().lock().expect("poison").entry(key).or_insert_with(|| value);
It's easy to see why contention is reduced: we now have 256 Mutexes, and the probability of multiple threads contending for the same Mutex is low.
@Shepmaster's example with 100 threads takes about 10 seconds on my machine; the following example takes a little more than 1 second.
use std::{
collections::HashMap,
sync::{Arc, Mutex, RwLock},
thread,
time::Duration,
};
fn main() {
let mut inner = HashMap::new();
for i in 0..=u8::max_value() {
inner.insert(i, Mutex::new(HashMap::new()));
}
let data = Arc::new(inner);
let threads: Vec<_> = (0..100)
.map(|i| {
let data = Arc::clone(&data);
thread::spawn(move || worker_thread(i, data))
})
.collect();
for t in threads {
t.join().expect("Thread panicked");
}
println!("{:?}", data);
}
fn worker_thread(id: u8, data: Arc<HashMap<u8, Mutex<HashMap<u8, Mutex<i32>>>>>) {
loop {
// the unwrap is safe because we populated an entry for every `u8`
if let Some(element) = data.get(&id).unwrap().lock().expect("poison").get(&id) {
let mut element = element.lock().expect("Mutex poisoned");
// Perform our normal work updating a specific element.
// Only this shard's Mutex is locked, so threads
// working on other shards are unaffected.
*element += 1;
thread::sleep(Duration::from_secs(1));
return;
}
// If we got this far, the element doesn't exist.
// Lock just this shard and insert the element.
// We use HashMap::entry to handle the case where another thread
// inserted the same key while we were unlocked.
thread::sleep(Duration::from_millis(50));
data.get(&id).unwrap().lock().expect("poison").entry(id).or_insert_with(|| Mutex::new(0));
// Let the loop start us over to try again
}
}
Maybe you want to consider evmap:
A lock-free, eventually consistent, concurrent multi-value map.
The trade-off is eventual consistency: readers do not see changes until the writer refreshes the map. A refresh is atomic, and the writer decides when to perform it and expose the new data to readers.
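A rough usage sketch, assuming evmap's split read/write handles (API as of evmap 10; names such as get_one may differ between versions):
fn main() {
    // evmap hands back separate read and write handles.
    let (read, mut write) = evmap::new::<u8, i32>();
    write.insert(1, 42);
    // Readers see nothing until the writer publishes...
    assert!(read.get_one(&1).is_none());
    write.refresh(); // ...which is a single atomic swap.
    assert_eq!(read.get_one(&1).map(|v| *v), Some(42));
}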
I've been reading questions like Why does a function that accepts a Box<MyType> complain of a value being moved when a function that accepts self works?, Preferable pattern for getting around the "moving out of borrowed self" checker, and How to capture self consuming variable in a struct?, and now I'm curious about the performance characteristics of consuming self but possibly returning it to the caller.
To make a simpler example, imagine I want to make a collection type that's guaranteed to be non-empty. To achieve this, the "remove" operation needs to consume the collection and optionally return itself.
struct NonEmptyCollection { ... }
impl NonEmptyCollection {
fn pop(mut self) -> Option<Self> {
if self.len() == 1 {
None
} else {
// really remove the element here
Some(self)
}
}
}
(I suppose it should return the value it removed from the list too, but it's just an example.) Now let's say I call this function:
let mut c = NonEmptyCollection::new(...);
if let Some(new_c) = c.pop() {
c = new_c
} else {
// never use c again
}
What actually happens to the memory of the object? What if I have some code like:
let mut opt: Option<NonEmptyCollection> = Some(NonEmptyCollection::new(...));
opt = opt.take().and_then(|c| c.pop());
The function's signature can't guarantee that the returned object is actually the same one, so what optimizations are possible? Does something like the C++ return value optimization apply, allowing the returned object to be "constructed" in the same memory it was in before? If I have the choice between an interface like the above, and an interface where the caller has to deal with the lifetime:
enum PopResult {
StillValid,
Dead
}
impl NonEmptyCollection {
fn pop(&mut self) -> PopResult {
// really remove the element
if self.len() == 0 { PopResult::Dead } else { PopResult::StillValid }
}
}
is there ever a reason to choose this dirtier interface for performance reasons? In the answer to the second example I linked, trentcl recommends storing Options in a data structure to allow the caller to do a change in-place instead of doing remove followed by insert every time. Would this dirty interface be a faster alternative?
YMMV
Depending on the optimizer's whim, you may end up with:
close to a no-op,
a few register moves,
a number of bit-copies.
This will depend on:
whether the call is inlined or not,
whether the caller re-assigns to the original variable or creates a fresh variable (and how well LLVM handles reusing dead space),
the size_of::<Self>().
The only guarantee you get is that no deep copy will occur, as there is no .clone() call.
For anything else, you need to check the LLVM IR or assembly.
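For instance, stock cargo can dump both; the output files land under target/release/deps/:
cargo rustc --release -- --emit=llvm-ir
cargo rustc --release -- --emit=asm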
I am trying to optimize my function using Rayon's par_iter().
The single threaded version is something like:
fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
let result = txs.iter().map(|tx| {
tx.verify_and_store(store)
}).collect();
...
}
Each Store instance must be used only by one thread, but multiple instances of Store can be used concurrently, so I can make this multithreaded by clone-ing store:
fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
let result = txs.par_iter().map(|tx| {
let mut local_store = store.clone();
tx.verify_and_store(&mut local_store)
}).collect();
...
}
However, this clones the store on every iteration, which is way too slow. I would like to use one store instance per thread.
Is this possible with Rayon? Or should I resort to manual threading and a work-queue?
It is possible to use a thread-local variable to ensure that local_store is not created more than once in a given thread.
For example, this compiles:
fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
use std::cell::RefCell;
thread_local!(static STORE: RefCell<Option<Store>> = RefCell::new(None));
let mut result = Vec::new();
txs.par_iter().map(|tx| {
STORE.with(|cell| {
let mut local_store = cell.borrow_mut();
if local_store.is_none() {
*local_store = Some(store.clone());
}
tx.verify_and_store(local_store.as_mut().unwrap())
})
}).collect_into_vec(&mut result);
}
There are two problems with this code, however. One, if the clones of store need to do something when par_iter() is done, such as flush their buffers, it simply won't happen - their Drop will only be called when Rayon's worker threads exit, and even that is not guaranteed.
The second, and more serious problem, is that the clones of store are created exactly once per worker thread. If Rayon caches its thread pool (and I believe it does), this means that an unrelated later call to verify_and_store will continue working with last known clones of store, which possibly have nothing to do with the current store.
This can be rectified by complicating the code somewhat:
Store the cloned variables in a Mutex<Option<...>> instead of Option, so that they can be accessed by the thread that invoked par_iter(). This will incur a mutex lock on every access, but the lock will be uncontested and therefore cheap.
Use an Arc around the mutex in order to collect references to the created store clones in a vector. This vector is used to clean up the stores by resetting them to None after the iteration has finished.
Wrap the whole call in an unrelated mutex, so that two parallel calls to verify_and_store don't end up seeing each other's store clones. (This might be avoidable if a new thread pool were created and installed before the iteration.) Hopefully this serialization won't affect the performance of verify_and_store, since each call will utilize the whole thread pool.
The result is not pretty, but it compiles, uses only safe code, and appears to work:
fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
use std::sync::{Arc, Mutex};
type SharedStore = Arc<Mutex<Option<Store>>>;
lazy_static! {
static ref STORE_CLONES: Mutex<Vec<SharedStore>> = Mutex::new(Vec::new());
static ref NO_REENTRY: Mutex<()> = Mutex::new(());
}
thread_local!(static STORE: SharedStore = Arc::new(Mutex::new(None)));
let mut result = Vec::new();
let _no_reentry = NO_REENTRY.lock().unwrap();
txs.par_iter().map({
|tx| {
STORE.with(|arc_mtx| {
let mut local_store = arc_mtx.lock().unwrap();
if local_store.is_none() {
*local_store = Some(store.clone());
STORE_CLONES.lock().unwrap().push(arc_mtx.clone());
}
tx.verify_and_store(local_store.as_mut().unwrap())
})
}
}).collect_into_vec(&mut result);
let mut store_clones = STORE_CLONES.lock().unwrap();
for store in store_clones.drain(..) {
store.lock().unwrap().take();
}
}
Old question, but I feel the answer needs revisiting. In general, there are two methods:
Use map_with. This will clone every time a thread steals a work item from another thread, which may produce more clones than there are threads, but the number should be fairly low. If the clones are too expensive, you can increase the size of the chunks rayon splits the workload into using with_min_len, as sketched after the code below.
fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
let result = txs.par_iter().map_with(store.clone(), |store, tx| {
tx.verify_and_store(store)
}).collect();
...
}
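The with_min_len variant mentioned above might look like this; the chunk size of 1000 is an arbitrary value for the sketch:
fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
    let result = txs.par_iter()
        .with_min_len(1000) // fewer work splits mean fewer clones of `store`
        .map_with(store.clone(), |store, tx| tx.verify_and_store(store))
        .collect();
    ...
}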
Or use the scoped ThreadLocal from the thread_local crate. This will ensure that you only use as many objects as there are threads, and that they are destroyed once the ThreadLocal object goes out of scope.
fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
use std::cell::RefCell;
use thread_local::ThreadLocal;
let tl = ThreadLocal::new();
let result = txs.par_iter().map(|tx| {
let store = tl.get_or(|| RefCell::new(store.clone()));
tx.verify_and_store(&mut *store.borrow_mut())
}).collect();
...
}
In an attempt to build an "emulated" Reentrant mutex, I need an identifier that is unique to each thread. I can get the current thread via thread::current, but Thread doesn't seem to have anything that could be used (or abused) as an identifier.
For my purposes, I believe the identifier can be reused once a thread exits, although I would be also interested in answers that didn't reuse identifiers as those may be useful in other cases.
Another way, if you can use libc:
fn get_thread_id() -> libc::pthread_t {
unsafe { libc::pthread_self() }
}
pthread_t will map to the right type on each platform.
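A quick usage sketch (assuming the libc crate is listed in Cargo.toml):
fn main() {
    // Each live OS thread observes a distinct pthread_t.
    println!("main thread: {:?}", get_thread_id());
    std::thread::spawn(|| println!("worker thread: {:?}", get_thread_id()))
        .join()
        .unwrap();
}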
Although it would be much nicer to use something built-in to the threading system, one solution is to track our own thread IDs. These can be created using a combination of atomic and thread-local variables:
use std::sync::atomic;
use std::thread;
static THREAD_COUNT: atomic::AtomicUsize = atomic::AtomicUsize::new(0);
thread_local!(static THREAD_ID: usize = THREAD_COUNT.fetch_add(1, atomic::Ordering::SeqCst));
fn thread_id() -> usize {
THREAD_ID.with(|&id| id)
}
// Example usage
fn main() {
println!("{}", thread_id());
let handles: Vec<_> = (0..10).map(|_| {
thread::spawn(|| {
println!("{}", thread_id());
})
}).collect();
for h in handles { h.join().unwrap() }
}
but Thread doesn't seem to have anything that could be used (or abused) as an identifier.
This was rectified in Rust 1.19 via Thread::id.
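On modern Rust the whole question therefore reduces to a one-liner; ThreadId is opaque but implements Eq and Hash, so it can serve directly as a map key:
use std::thread;
fn main() {
    let main_id = thread::current().id();
    println!("{:?}", main_id);
    let handle = thread::spawn(|| thread::current().id());
    // IDs are unique among live threads, so the two must differ.
    assert_ne!(main_id, handle.join().unwrap());
}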