Atomic wrappers vs primitives - multithreading

I'm trying to understand a few differences between the std::sync::atomic::Atomic* structs and primitives such as i32, usize, and bool in the context of multithreading.
First question: will another thread see changes made to a non-atomic type by a different thread?
fn main() {
    let mut counter = 0;
    std::thread::scope(|scope| {
        scope.spawn(|| counter += 1);
    });
    println!("{counter}");
}
Can I be sure that counter will be 1 right after the other thread writes to it, or could a thread cache the value? If not, does it work only with an atomic type?
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let counter = AtomicI32::new(0);
    std::thread::scope(|scope| {
        scope.spawn(|| counter.store(1, Ordering::Release));
    });
    println!("{}", counter.load(Ordering::Acquire)); // Ordering::Acquire to prevent reordering with previous instructions
}
Second question: does the Ordering affect when the value from a store becomes visible to other threads, or is it visible right after the store even with Ordering::Relaxed? For example, will the same code, but with Ordering::Relaxed and no instruction reordering, print 1 for counter?
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let counter = AtomicI32::new(0);
    std::thread::scope(|scope| {
        scope.spawn(|| counter.store(1, Ordering::Relaxed));
    });
    println!("{}", counter.load(Ordering::Relaxed));
}
I understand the difference between atomic and non-atomic writes to the same variable; I'm only interested in whether another thread will see the changes, even if those changes aren't consistent.

First question: will another thread see changes made to a non-atomic type by a different thread?
Yes. The difference between atomic and non-atomic variables is that atomic variables can be changed through shared references, &AtomicX, and not just through mutable references, &mut X. This means they can be changed in parallel from different threads. For primitives, the compiler will reject any attempt to do that, e.g.:
fn main() {
    let mut counter = 0;
    std::thread::scope(|scope| {
        scope.spawn(|| counter += 1);
        scope.spawn(|| counter += 1); // error: cannot borrow `counter` as mutable more than once at a time
    });
    println!("{counter}");
}
Or even the following, where we use the variable on the main thread but before the spawned thread is joined:
fn main() {
    let mut counter = 0;
    std::thread::scope(|scope| {
        scope.spawn(|| counter += 1);
        counter += 1; // error: `counter` is still mutably borrowed by the spawned closure
    });
    println!("{counter}");
}
While with atomics this will work:
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let counter = AtomicI32::new(0);
    std::thread::scope(|scope| {
        scope.spawn(|| counter.store(1, Ordering::Relaxed));
        scope.spawn(|| counter.store(1, Ordering::Relaxed));
    });
    println!("{}", counter.load(Ordering::Relaxed));
}
Second question: does the Ordering affect when the value from a store becomes visible to other threads, or is it visible right after the store even with Ordering::Relaxed? For example, will the same code, but with Ordering::Relaxed and no instruction reordering, print 1 for counter?
No. The ordering does not change what other threads will observe for this variable. Therefore, your usage of Release and Acquire here is misplaced.
On the other hand, Relaxed here will suffice, for other reasons.
You are guaranteed to see the value 1 in your code no matter what ordering you use, because std::thread::scope() implicitly joins all spawned threads on exit, and joining a thread forms a happens-before relationship between everything done in that thread and the code after the join. In other words, everything done in the thread (and that includes storing to counter) is guaranteed to happen before everything you do after you join it (and that includes reading counter).
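For illustration, the same guarantee holds with a plain spawned thread, where the join is explicit — a minimal sketch (the static is only there to satisfy thread::spawn's 'static bound):
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

static COUNTER: AtomicI32 = AtomicI32::new(0);

fn main() {
    let handle = thread::spawn(|| COUNTER.store(1, Ordering::Relaxed));
    // join() creates the happens-before edge, so Relaxed is enough.
    handle.join().unwrap();
    println!("{}", COUNTER.load(Ordering::Relaxed)); // guaranteed to print 1
}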
If there were no join, for example, in this code:
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let counter = AtomicI32::new(0);
    std::thread::scope(|scope| {
        scope.spawn(|| counter.store(1, Ordering::Release));
        scope.spawn(|| println!("{}", counter.load(Ordering::Acquire)));
    });
}
Then you are not guaranteed, despite the Release and Acquire orderings, to read the updated value: you may read the new value, or you may read the old one.
Orderings are useful for creating happens-before relationships involving other variables and code. But this is a complicated subject; I recommend reading this book (written by a Rust libs team member).
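As a brief illustration of what Release/Acquire actually buy you — publishing data written to one variable via a flag on another (a standard pattern, not from the original question):
use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};
use std::thread;

static DATA: AtomicI32 = AtomicI32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    thread::scope(|s| {
        s.spawn(|| {
            DATA.store(42, Ordering::Relaxed);
            READY.store(true, Ordering::Release); // publish DATA
        });
        s.spawn(|| {
            // Spin until the flag is set.
            while !READY.load(Ordering::Acquire) {}
            // The Acquire load pairs with the Release store, so this
            // read is guaranteed to see 42.
            assert_eq!(DATA.load(Ordering::Relaxed), 42);
        });
    });
}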

Related

Rust Multithreading only lock specific indices of vector

Situation
I have an array of f32
I have some threads that each will change a small part of the array
I do not know which indices will be changed
Every thread has to lock the array and then spend some time on an expensive calculation
Afterwards, it changes the value at that index and releases the array
Take a look at the commented minimal example below
The Problem
The first thread locks the array, and the other threads cannot edit it anymore, wasting a lot of time. Other threads that need to edit different indices, and would never touch the ones required by the first thread, could have run at the same time.
Possible Solution
I know that the array outlives all threads so unsafe Rust is a viable option
I already posted a solution using the external atomic_float crate.
You may come up with a stdlib-only solution.
Minimal example:
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
use rand::Rng;

fn main() {
    // Store the mutex
    let container = Arc::new(Mutex::new([0.0; 10]));
    // This will keep track of the created threads
    let mut threads = vec![];
    // Create new threads
    for _ in 0..10 {
        // Create a copy of the mutex reference
        let clone = Arc::clone(&container);
        threads.push(thread::spawn(move || {
            // The function somehow calculates the index that has to be changed.
            // In our case it's simulated by picking a random index to emphasize that we do not know the index.
            let mut rng = rand::thread_rng();
            let index = rng.gen_range(0..10);
            // Unfortunately we have to lock the array before the intense calculation!
            // If we could just lock the index of the array, other threads could change other indices in parallel.
            // But now all of them need to wait for the lock.
            let mut myarray = clone.lock().unwrap();
            // Simulate intense calculation
            thread::sleep(Duration::from_millis(1000));
            // Now the index can be changed
            println!("Changing index {}", index);
            myarray[index] += 1.0;
        }));
    }
    // Wait for all threads to finish
    for thread in threads {
        thread.join().unwrap();
    }
    // I know that myarray outlives the runtime of all threads.
    // Therefore someone may come up with an unsafe solution.
    // Print the result
    println!("{:?}", container);
}
This is a solution I came up with using the atomic_float crate.
Introduction to scoped threads
AtomicF32 docs
It works because the scoped thread ensures that all values live as long as the scope.
AtomicF32 protects the values from concurrent access
Note that this does not lock the index before the heavy work. When describing my problem I simplified it a bit. In reality, I had a loop that accesses the data often, thus almost constantly locking the array.
use atomic_float::AtomicF32;
use rand::Rng;
use std::sync::atomic::Ordering;
use std::thread::{self, sleep};
use std::time::Duration;

fn main() {
    // Create a new atomic float array.
    // Atomic floats ensure that each value is protected from concurrent access,
    // thus locking only the indices.
    // Notice the array is not mutable.
    let myarray: [AtomicF32; 10] = std::array::from_fn(|_| AtomicF32::new(0.0));
    // This is a scoped thread.
    thread::scope(|s| {
        // The loop has to be inside the scope.
        // All threads spawned within thread::scope must terminate before thread::scope can return.
        // That's how it makes sure the scoped variables exist at least as long as the spawned threads.
        for _ in 0..10 {
            // Create a new thread
            s.spawn(|| {
                let mut rng = rand::thread_rng();
                let index = rng.gen_range(0..10);
                // Simulate heavy work
                sleep(Duration::from_millis(3000));
                // Now the index can be changed
                println!("Changing index {}", index);
                // This is the atomic operation. The value is therefore protected from concurrent access.
                myarray[index].fetch_add(1.0, Ordering::SeqCst)
            });
        }
    });
    println!("{:?}", myarray);
}
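For a stdlib-only variant, as the problem statement invited, a minimal sketch is to give every element its own Mutex, since std has no atomic floats. The index here is the loop counter rather than a computed one, purely for illustration:
use std::sync::Mutex;
use std::thread;

fn main() {
    // One Mutex per element: threads contend only on the index they touch.
    let myarray: [Mutex<f32>; 10] = std::array::from_fn(|_| Mutex::new(0.0));
    thread::scope(|s| {
        for i in 0..10 {
            let arr = &myarray;
            s.spawn(move || {
                // Heavy work would happen here, before taking the per-element lock.
                *arr[i].lock().unwrap() += 1.0;
            });
        }
    });
    println!("{:?}", myarray);
}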

Multi thread using Arc without a lock

Is it possible to use Arc without a lock? I really don't care about the order of reading the data. Here is my playground.
use std::{sync::Arc, thread};

fn main() {
    println!("Hello, world!");
    let x = Arc::new(0);
    let y = Arc::clone(&x);
    thread::spawn(move || {
        for i in 0..10 {
            println!("exe:{},{}", i, y);
        }
    });
    while *x < 100000 {
        // Is it ok to change the value of x here?
        *x += 1;
    }
}
No. Even if you don't care about anything at all, Rust still does not allow data races. And even if you only don't care about the order, you can still get unexpected results, because the operation isn't atomic at the CPU level or because of compiler optimizations.
However, you don't have to use mutexes or similar locks; you can use atomics. And since you don't care about order, you can use Ordering::Relaxed, which is going to be almost free (loads are free on x86, and I think on ARM too; adds will have some overhead, but it is probably marginal):
use std::sync::atomic::{AtomicI32, Ordering};
use std::{sync::Arc, thread};

fn main() {
    println!("Hello, world!");
    let x = Arc::new(AtomicI32::new(0));
    let y = Arc::clone(&x);
    thread::spawn(move || {
        for i in 0..10 {
            println!("exe:{},{}", i, y.load(Ordering::Relaxed));
        }
    });
    while x.load(Ordering::Relaxed) < 100000 {
        x.fetch_add(1, Ordering::Relaxed);
    }
}
Arc protects the reference count itself; it doesn't protect the data it references. Per the Arc docs on Thread Safety:
Arc<T> makes it thread safe to have multiple ownership of the same data, but it doesn’t add thread safety to its data.
The general docs for Arc note:
Shared references in Rust disallow mutation by default, and Arc is no exception: you cannot generally obtain a mutable reference to something inside an Arc. If you need to mutate through an Arc, use Mutex, RwLock, or one of the Atomic types.
In short, Arc is not the appropriate type for what you're doing by itself. You could make it work just fine with an Arc<AtomicI32> or the like though, where the Arc maintains the lifetime, and the AtomicI32 protects access to the data itself.

rust closures definition inside a for loop

I faced the same problem as mentioned in this question. In short, the problem is that an object is borrowed as mutable, due to its usage inside a closure, and also borrowed as immutable, due to its usage inside a function (or macro in this case).
fn main() {
    let mut count = 0;
    let mut inc = || {
        count += 2;
    };
    for _index in 1..5 {
        inc();
        println!("{}", count); // error: cannot borrow `count` as immutable because it is also borrowed as mutable
    }
}
One solution to this problem is to define the closure inside the for loop instead of outside it; another is to avoid capturing the variable by passing a mutable reference as a parameter of the closure:
1.
fn main() {
    let mut count = 0;
    for _index in 1..5 {
        let mut inc = || {
            count += 2;
        };
        inc();
        println!("{}", count);
    }
}
2.
fn main() {
    let mut count = 0;
    let inc = |count: &mut i32| {
        *count += 2;
    };
    for _index in 1..5 {
        inc(&mut count);
        println!("{}", count);
    }
}
So I have the following questions on my mind:
Which one of these follows best practice?
Is there a third way of doing things the right way?
According to my understanding, closures are just anonymous functions, so defining them multiple times is as efficient as defining them a single time. But I am not able to find a definite answer to this question in the official Rust reference. Help!
Regarding which one is the right solution, I would say it depends on the use case. They are so similar that it shouldn't matter in most cases, unless there is something else to sway the decision. I don't know of any third solution.
However, closures are not just anonymous functions but also anonymous structs: a closure is an anonymous struct that calls an anonymous function. The members of the struct are the references to borrowed values. This is important because structs need to be initialized and potentially moved around, unlike functions. This means the more values your closure borrows, the more expensive it is to initialize and to pass to functions by value. Likewise, if you initialize your closure inside a loop, the initialization might happen every iteration (if it is not optimized out of the loop), making it less performant than initializing it outside the loop.
We can try and desugar the first example into the following code:
struct IncClosureStruct<'a> {
    count: &'a mut i32,
}

fn inc_closure_fn<'a>(borrows: &mut IncClosureStruct<'a>) {
    *borrows.count += 2;
}

fn main() {
    let mut count = 0;
    for _index in 1..5 {
        let mut inc_struct = IncClosureStruct { count: &mut count };
        inc_closure_fn(&mut inc_struct);
        println!("{}", count);
    }
}
Note: the compiler doesn't necessarily do it exactly like this, but it is a useful approximation.
Here you can see the closure struct IncClosureStruct and its function inc_closure_fn, which together provide the functionality of inc. You can see we initialize the struct in the loop and then call it immediately. If we were to desugar the second example, IncClosureStruct would have no members, but inc_closure_fn would take an additional argument that references the counter. The counter reference would then go into the function call instead of the struct initializer, roughly as sketched below.
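Following the same approximation, the second example might desugar roughly like this (a sketch, not what the compiler literally emits):
struct IncClosureStruct; // no captured state

fn inc_closure_fn(_borrows: &mut IncClosureStruct, count: &mut i32) {
    // The counter arrives through the call, not through the struct.
    *count += 2;
}

fn main() {
    let mut count = 0;
    let mut inc_struct = IncClosureStruct;
    for _index in 1..5 {
        inc_closure_fn(&mut inc_struct, &mut count);
        println!("{}", count);
    }
}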
These two examples end up the same efficiency-wise, because the number of actual values passed to the function is the same in both cases: one reference. Initializing a struct with one member is the same as simply initializing the member itself; the wrapping struct is gone by the time you reach machine code. I tried this on Godbolt, and as far as I can tell, the resulting assembly is the same.
However, optimizations don't catch all situations. So, if performance is important, benchmarking is the way to go.

Is it possible to share a HashMap between threads without locking the entire HashMap?

I would like to have a shared struct between threads. The struct has many fields that are never modified, and a HashMap, which is. I don't want to lock the whole HashMap for a single update/remove, so my HashMap looks something like HashMap<u8, Mutex<u8>>. This works, but it makes no sense, since the thread will lock the whole map anyway.
Here's this working version, without threads; I don't think that's necessary for the example.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

fn main() {
    let s = Arc::new(Mutex::new(S::new()));
    let z = s.clone();
    let _ = z.lock().unwrap();
}

struct S {
    x: HashMap<u8, Mutex<u8>>, // other non-mutable fields
}

impl S {
    pub fn new() -> S {
        S {
            x: HashMap::default(),
        }
    }
}
Playground
Is this possible in any way? Is there something obvious I missed in the documentation?
I've been trying to get this working, but I'm not sure how. Basically every example I see there's always a Mutex (or RwLock, or something like that) guarding the inner value.
I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures; what should happen if multiple threads need to insert new values that hash to the same location?
In previous work, I've used a RwLock<HashMap<K, Mutex<V>>>. When inserting a value into the hash, you get an exclusive lock for a short period. The rest of the time, you can have multiple threads with reader locks to the HashMap and thus to a given element. If they need to mutate the data, they can get exclusive access to the Mutex.
Here's an example:
use std::{
    collections::HashMap,
    sync::{Arc, Mutex, RwLock},
    thread,
    time::Duration,
};

fn main() {
    let data = Arc::new(RwLock::new(HashMap::new()));
    let threads: Vec<_> = (0..10)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || worker_thread(i, data))
        })
        .collect();
    for t in threads {
        t.join().expect("Thread panicked");
    }
    println!("{:?}", data);
}
fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {
    loop {
        // Assume that the element already exists
        let map = data.read().expect("RwLock poisoned");
        if let Some(element) = map.get(&id) {
            let mut element = element.lock().expect("Mutex poisoned");
            // Perform our normal work updating a specific element.
            // The entire HashMap only has a read lock, which
            // means that other threads can access it.
            *element += 1;
            thread::sleep(Duration::from_secs(1));
            return;
        }
        // If we got this far, the element doesn't exist.
        // Get rid of our read lock and switch to a write lock.
        // You want to minimize the time we hold the writer lock.
        drop(map);
        let mut map = data.write().expect("RwLock poisoned");
        // We use HashMap::entry to handle the case where another thread
        // inserted the same key while we were unlocked.
        thread::sleep(Duration::from_millis(50));
        map.entry(id).or_insert_with(|| Mutex::new(0));
        // Let the loop start us over to try again
    }
}
This takes about 2.7 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.
This solution isn't without issues, however. When there's a huge amount of contention for that one master lock, getting a write lock can take a while and completely kills parallelism.
In that case, you can switch to a RwLock<HashMap<K, Arc<Mutex<V>>>>. Once you have a read or write lock, you can clone the Arc of the value, return it, and unlock the HashMap, roughly like this:
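A minimal sketch of that variant (the helper name is illustrative): the map lock is held only long enough to clone the Arc and is dropped before the element is used.
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

fn get_or_insert(map: &RwLock<HashMap<u8, Arc<Mutex<i32>>>>, id: u8) -> Arc<Mutex<i32>> {
    // Fast path: read lock, clone the Arc, release the lock.
    if let Some(v) = map.read().unwrap().get(&id) {
        return Arc::clone(v);
    }
    // Slow path: brief write lock; entry() handles a racing insert.
    Arc::clone(map.write().unwrap().entry(id).or_insert_with(|| Arc::new(Mutex::new(0))))
}

fn main() {
    let map = RwLock::new(HashMap::new());
    let value = get_or_insert(&map, 7);
    *value.lock().unwrap() += 1; // the HashMap itself is not locked here
    println!("{:?}", map.read().unwrap());
}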
The next step up would be to use a crate like arc-swap, which says:
Then one would lock, clone the [RwLock<Arc<T>>] and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc) which makes it relatively slow. Depending on the implementation, an update may be blocked for arbitrary long time by a steady inflow of readers.
The ArcSwap can be used instead, which solves the above problems and has better performance characteristics than the RwLock, both in contended and non-contended scenarios.
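For illustration, a minimal sketch assuming the arc-swap crate's ArcSwap::from_pointee/load/store API (check the crate docs for details):
use std::sync::Arc;
use arc_swap::ArcSwap;

fn main() {
    let shared = ArcSwap::from_pointee(vec![1, 2, 3]);
    // Readers take a cheap snapshot without blocking the writer.
    let snapshot = shared.load();
    println!("{:?}", **snapshot);
    // The writer swaps in a whole new value atomically.
    shared.store(Arc::new(vec![4, 5, 6]));
    println!("{:?}", **shared.load());
}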
I often advocate for some kind of smarter algorithm instead. For example, you could spin up N threads, each with its own HashMap, and then shard work among them. For the simple example above, you could shard by id % N_THREADS. There are also more complicated sharding schemes that depend on your data.
As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.
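Along those lines, a minimal sketch of the sharding idea with std channels; the shard count and the counting workload are illustrative:
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

const N_SHARDS: usize = 4;

fn main() {
    let mut senders = Vec::new();
    let mut workers = Vec::new();
    for _ in 0..N_SHARDS {
        let (tx, rx) = mpsc::channel::<u8>();
        senders.push(tx);
        // Each worker owns its own HashMap, so no locking is needed at all.
        workers.push(thread::spawn(move || {
            let mut map: HashMap<u8, i32> = HashMap::new();
            for id in rx {
                *map.entry(id).or_insert(0) += 1;
            }
            map
        }));
    }
    // Route each piece of work to the shard that owns its key.
    for id in 0u8..=255 {
        senders[id as usize % N_SHARDS].send(id).unwrap();
    }
    drop(senders); // closing the channels lets the workers finish
    for w in workers {
        println!("{:?}", w.join().unwrap());
    }
}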
Suppose the key of the data is mappable to a u8.
You can have Arc<HashMap<u8, Mutex<HashMap<Key, Value>>>>.
When you initialize the data structure, you populate the entire first-level map before putting it in the Arc (it will be immutable after initialization).
When you want a value from the map, you need to do a double get, something like:
data.get(&map_to_u8(&key)).unwrap().lock().expect("poison").get(&key)
where the unwrap is safe because we initialized the first map with all the values.
To write to the map, something like:
data.get(&map_to_u8(id)).unwrap().lock().expect("poison").entry(id).or_insert_with(|| value);
It's easy to see that contention is reduced because we now have 256 Mutexes, and the probability of multiple threads asking for the same Mutex is low.
Shepmaster's example with 100 threads takes about 10 seconds on my machine; the following example takes a little more than 1 second.
use std::{
    collections::HashMap,
    sync::{Arc, Mutex},
    thread,
    time::Duration,
};

fn main() {
    let mut inner = HashMap::new();
    for i in 0..=u8::max_value() {
        inner.insert(i, Mutex::new(HashMap::new()));
    }
    let data = Arc::new(inner);
    let threads: Vec<_> = (0..100)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || worker_thread(i, data))
        })
        .collect();
    for t in threads {
        t.join().expect("Thread panicked");
    }
    println!("{:?}", data);
}

fn worker_thread(id: u8, data: Arc<HashMap<u8, Mutex<HashMap<u8, Mutex<i32>>>>>) {
    loop {
        // The first unwrap is safe because we populated an entry for every u8.
        if let Some(element) = data.get(&id).unwrap().lock().expect("poison").get(&id) {
            let mut element = element.lock().expect("Mutex poisoned");
            // Perform our normal work updating a specific element.
            // Only this shard's Mutex is held, so threads working on
            // other shards can proceed in parallel.
            *element += 1;
            thread::sleep(Duration::from_secs(1));
            return;
        }
        // If we got this far, the element doesn't exist yet.
        // We use HashMap::entry to handle the case where another thread
        // inserted the same key while we were unlocked.
        thread::sleep(Duration::from_millis(50));
        data.get(&id).unwrap().lock().expect("poison").entry(id).or_insert_with(|| Mutex::new(0));
        // Let the loop start us over to try again
    }
}
Maybe you want to consider evmap:
A lock-free, eventually consistent, concurrent multi-value map.
The trade-off is eventual consistency: readers do not see changes until the writer refreshes the map. A refresh is atomic, and the writer decides when to do it and expose the new data to the readers.
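For illustration, a minimal sketch assuming evmap's split read/write handle API (evmap::new, insert, refresh, get_one; verify the exact names against the crate docs):
fn main() {
    // evmap::new() hands back a read handle and a write handle.
    let (read, mut write) = evmap::new();
    write.insert("counter", 1);
    // Readers do not see the insert until the writer publishes it:
    assert!(read.get_one(&"counter").is_none());
    write.refresh(); // atomically expose all pending changes
    assert_eq!(*read.get_one(&"counter").unwrap(), 1);
}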

Explain the behavior of *Rc::make_mut and why it differs compared to Mutex

I needed to pass a resource between several functions that take a closure as an argument. The data was handled inside those closures, but I expected the changes made to the variable in one of them to be reflected in the rest.
The first thing I thought of was to use Rc. I had previously used Arc to handle data between different threads, but since these functions aren't running in different threads, I chose Rc instead.
The most simplified code that I have, to show my doubts:
I used RefCell because I suspected that this syntax would not work as I expected:
*Rc::make_mut(&mut rc_pref_temp)...
use std::rc::Rc;
use std::sync::Mutex;
use std::cell::RefCell;
fn main() {
    test2();
    println!("---");
    test();
}

#[derive(Debug, Clone)]
struct Prefe {
    name_test: RefCell<u64>,
}

impl Prefe {
    fn new() -> Prefe {
        Prefe {
            name_test: RefCell::new(3 as u64),
        }
    }
}

fn test2() {
    let prefe: Prefe = Prefe::new();
    let rc_pref = Rc::new(Mutex::new(prefe));
    println!("rc_pref Mutex: {:?}", rc_pref.lock().unwrap().name_test);
    let rc_pref_temp = rc_pref.clone();
    *rc_pref_temp.lock().unwrap().name_test.get_mut() += 1;
    println!("rc_pref_clone Mutex: {:?}", rc_pref_temp.lock().unwrap().name_test);
    *rc_pref_temp.lock().unwrap().name_test.get_mut() += 1;
    println!("rc_pref_clone Mutex: {:?}", rc_pref_temp.lock().unwrap().name_test);
    println!("rc_pref Mutex: {:?}", rc_pref.lock().unwrap().name_test);
}

fn test() {
    let prefe: Prefe = Prefe::new();
    let rc_pref = Rc::new(prefe);
    println!("rc_pref: {:?}", rc_pref.name_test);
    let mut rc_pref_temp = rc_pref.clone();
    *((*Rc::make_mut(&mut rc_pref_temp)).name_test).get_mut() += 1;
    println!("rc_pref_clone: {:?}", rc_pref_temp.name_test);
    *((*Rc::make_mut(&mut rc_pref_temp)).name_test).get_mut() += 1;
    println!("rc_pref_clone: {:?}", rc_pref_temp.name_test);
    println!("rc_pref: {:?}", rc_pref.name_test);
}
The code is simplified, the scenario where it is used is totally different. I note this to avoid comments like "you can lend a value to the function", because what interests me is to know why the cases exposed work in this way.
stdout:
rc_pref Mutex: RefCell { value: 3 }
rc_pref_clone Mutex: RefCell { value: 4 }
rc_pref_clone Mutex: RefCell { value: 5 }
rc_pref Mutex: RefCell { value: 5 }
---
rc_pref: RefCell { value: 3 }
rc_pref_clone: RefCell { value: 4 }
rc_pref_clone: RefCell { value: 5 }
rc_pref: RefCell { value: 3 }
About test()
I'm new to Rust so I don't know if this crazy syntax is the right way.
*((*Rc::make_mut(&mut rc_pref_temp)).name_test).get_mut() += 1;
When running test() you can see that the previous syntax works, because it increases the value, but the increase does not affect the clones. I expected that with *Rc::make_mut(&mut rc_pref_temp)... the clones of a shared reference would reflect the same values.
If Rc has references to the same object, why do the changes to an object not apply to the rest of the clones? Why does this work this way? Am I doing something wrong?
Note: I use RefCell because in some tests I thought it might have something to do with the problem.
About test2()
I've got it working as expected using Mutex with Rc, but I do not know if this is the correct way. I have some idea of how Mutex and Arc work, but after using this syntax:
*Rc::make_mut(&mut rc_pref_temp)...
with the use of Mutex in test2(), I wonder whether the Mutex is not only responsible for protecting the data, but also in charge of reflecting the changes in all the cloned references.
Do the shared references actually point to the same object? I want to think they do, but with the above code, where the changes are not reflected without the use of Mutex, I have some doubts.
You need to read and understand the documentation for functions you use before you use them. Rc::make_mut says, emphasis mine:
Makes a mutable reference into the given Rc.
If there are other Rc or Weak pointers to the same value, then make_mut will invoke clone on the inner value to ensure unique ownership. This is also referred to as clone-on-write.
See also get_mut, which will fail rather than cloning.
You have multiple Rc pointers because you called rc_pref.clone(). Thus, when you call make_mut, the inner value will be cloned and the Rc pointers will now be disassociated from each other:
use std::rc::Rc;

fn main() {
    let counter = Rc::new(100);
    let mut counter_clone = counter.clone();

    println!("{}", Rc::strong_count(&counter)); // 2
    println!("{}", Rc::strong_count(&counter_clone)); // 2

    *Rc::make_mut(&mut counter_clone) += 50;

    println!("{}", Rc::strong_count(&counter)); // 1
    println!("{}", Rc::strong_count(&counter_clone)); // 1

    println!("{}", counter); // 100
    println!("{}", counter_clone); // 150
}
The version with the Mutex works because it's completely different. You aren't calling a function which clones the inner value anymore. Of course, it doesn't make sense to use a Mutex when you don't have threads. The single-threaded equivalent of a Mutex is... RefCell!
I honestly don't know how you found Rc::make_mut; I've never even heard of it before. The module documentation for cell doesn't mention it, nor does the module documentation for rc.
I'd highly encourage you to take a step back and re-read through the documentation. The second edition of The Rust Programming Language has a chapter on smart pointers, including Rc and RefCell. Read the module-level documentation for rc and cell as well.
Here's what your code should look like. Note the usage of borrow_mut.
fn main() {
    let prefe = Rc::new(Prefe::new());
    println!("prefe: {:?}", prefe.name_test); // 3

    let prefe_clone = prefe.clone();
    *prefe_clone.name_test.borrow_mut() += 1;
    println!("prefe_clone: {:?}", prefe_clone.name_test); // 4

    *prefe_clone.name_test.borrow_mut() += 1;
    println!("prefe_clone: {:?}", prefe_clone.name_test); // 5

    println!("prefe: {:?}", prefe.name_test); // 5
}
