The documentation says it is deprecated. What's the system semaphore? And what's the best replacement for this struct now?
Deprecated since 1.7.0: easily confused with system semaphore and not used enough to pull its weight
System semaphore refers to whatever semaphore the operating system provides. On POSIX systems (Linux, macOS) these are the functions you get from #include <semaphore.h> (see the man page). std::sync::Semaphore was implemented in Rust and was separate from the OS's semaphore, although it did use some OS-level synchronization primitives (std::sync::Condvar, which is based on pthread_cond_t on Linux).
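For concreteness, a minimal sketch of what driving that system semaphore from Rust looks like via the libc crate (an illustration only, assuming a POSIX target; note that sem_init is deprecated on macOS, where named semaphores are preferred):

use std::mem::MaybeUninit;

fn main() {
    unsafe {
        // Unnamed POSIX semaphore: pshared = 0 (process-local), initial count = 1.
        let mut sem = MaybeUninit::<libc::sem_t>::uninit();
        libc::sem_init(sem.as_mut_ptr(), 0, 1);
        libc::sem_wait(sem.as_mut_ptr()); // acquire: decrements, blocks at zero
        libc::sem_post(sem.as_mut_ptr()); // release: increments, wakes a waiter
        libc::sem_destroy(sem.as_mut_ptr());
    }
}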
std::sync::Semaphore was never stabilized. The source code for Semaphore contains an unstable attribute
#![unstable(feature = "semaphore",
            reason = "the interaction between semaphores and the acquisition/release \
                      of resources is currently unclear",
            issue = "27798")]
The issue number in the attribute points to the tracking issue where this feature was discussed.
The best replacement within std is either a std::sync::Condvar or a busy loop paired with a std::sync::Mutex. Pick a Condvar over a busy loop if you expect to wait more than a few thousand clock cycles.
The documentation for Condvar has a good example of how to use it as a (binary) semaphore:
use std::sync::{Arc, Mutex, Condvar};
use std::thread;

let pair = Arc::new((Mutex::new(false), Condvar::new()));
let pair2 = Arc::clone(&pair);

// Inside of our lock, spawn a new thread, and then wait for it to start.
thread::spawn(move || {
    let (lock, cvar) = &*pair2;
    let mut started = lock.lock().unwrap();
    *started = true;
    // We notify the condvar that the value has changed.
    cvar.notify_one();
});

// Wait for the thread to start up.
let (lock, cvar) = &*pair;
let mut started = lock.lock().unwrap();
while !*started {
    started = cvar.wait(started).unwrap();
}
This example could be adapted to work as a counting semaphore by changing Mutex::new(false) to Mutex::new(0) and making a few corresponding changes to the wait and notify logic.
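For illustration, a minimal sketch of such a counting semaphore; the Semaphore type here is hand-rolled for the example, not a std API:

use std::sync::{Condvar, Mutex};

// A counting semaphore built from a mutex-guarded counter and a condvar.
pub struct Semaphore {
    count: Mutex<usize>,
    cvar: Condvar,
}

impl Semaphore {
    pub fn new(count: usize) -> Self {
        Semaphore { count: Mutex::new(count), cvar: Condvar::new() }
    }

    // Block until a permit is available, then take it.
    pub fn acquire(&self) {
        let mut count = self.count.lock().unwrap();
        while *count == 0 {
            count = self.cvar.wait(count).unwrap();
        }
        *count -= 1;
    }

    // Return a permit and wake one waiter.
    pub fn release(&self) {
        *self.count.lock().unwrap() += 1;
        self.cvar.notify_one();
    }
}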
I need to execute parallel calls (in this example, 2) once and insert the result values into the same mutable HashMap defined earlier; only after all are completed (each running once) should the program progress further and extract the HashMap from the Mutex<>.
let mut REZN: Mutex<HashMap<u8, (u128, u128)>> = Mutex::new(HashMap::new());

let b = vec![0, 1, 2, (...), 4999, 5000];
let payload0 = &b[0..2500];
let payload1 = &b[2500..5000];

tokio::spawn(async move {
    let result_ = //make calls
    for (i, j) in izip!(payload0.iter(), result_.iter()) {
        REZN.lock().unwrap().insert(*i, (j[0], j[1]));
    };
});

tokio::spawn(async move {
    let result_ = //make calls
    for (i, j) in izip!(payload1.iter(), result_.iter()) {
        REZN.lock().unwrap().insert(*i, (j[0], j[1]));
    };
});
I'm just starting with multithreading in Rust. Both the HashMap and the object used to make calls are moved into the spawned task. I read that cloning should be done and I tried it, but the compiler says:
&mut REZN.lock().unwrap().clone().insert(*i, (j[0], j[1]));
| |---- use occurs due to use in generator
What does that mean? What's a generator in that context?
and
"value moved here, in previous iteration of loop" errors are abundant.
I don't want it to do more than 1 iteration. How can I stop each task once it has done its job of inserting into the HashMap?
Later, I'm trying to escape the lock and extract the HashMap from inside the Mutex<>:
let mut REZN:HashMap<u8, (u128, u128)> = *REZN.lock().unwrap();
| ^^^^^^^^^^^^^^^^^^^^^
| |
| move occurs because value has type `HashMap<u8, (u128, u128)>`, which does not implement the `Copy` trait
| help: consider borrowing here: `&*REZN.lock().unwrap()`
But if I borrow here, errors appear elsewhere. Could this work if there were no conflict? I read that the Mutex is removed automatically when threads are done working on it, but I don't know how that happens exactly at a lower level (if you can recommend resources I'll be glad to read up on that).
I tried clone() both in the threads and in the later attempt at extracting the HashMap, and they fail, unfortunately. Am I doing it wrong?
Finally, how can I await until both are completed to proceed further in my program?
What does that mean? What's a generator in that context?
An async block compiles to a generator (a compiler-generated state machine that implements the resulting future).
I tried clone() both in the threads and in the later attempt at extracting the HashMap, and they fail, unfortunately. Am I doing it wrong?
Yes. If you clone inside the tasks, then the map is first moved into the task and only then cloned when used. That's not helpful, because once the map has been moved it can't be used from the caller anymore.
A common solution to that is the "capture clause pattern", where you use an outer block which can then do the setup for a closure or inner block:
tokio::spawn({
    let REZN = REZN.clone();
    async move {
        let result_ = [[6, 406], [7, 407]]; //make calls
        for (i, j) in izip!(payload0.iter(), result_.iter()) {
            REZN.lock().unwrap().insert(*i, (j[0], j[1]));
        }
    }
});
This way only the cloned map will be moved into the async block.
However this is not very useful, efficient, or convenient: by cloning the map, each task gets its own map (a copy of the original), and you're left with just the unmodified original. This means there's nothing to extract, because in practice it's as if nothing had happened. It also makes the mutex redundant: since each task has its own copy of the map, there's no need for synchronisation because there's no sharing.
The solution is to use shared ownership primitives, namely Arc:
let REZN: Arc<Mutex<HashMap<u8, (u128, u128)>>> = Arc::new(Mutex::new(HashMap::new()));
This way you can share the map between all the tasks, and the mutex will synchronise access: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=33ce606b1ab7c2dfc7f4897de69855ef
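For reference, a condensed sketch of that shared-map approach (the linked playground has the full version; the hard-coded result array below is a stand-in for the real calls):

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[tokio::main]
async fn main() {
    let rezn: Arc<Mutex<HashMap<u8, (u128, u128)>>> = Arc::new(Mutex::new(HashMap::new()));

    let task = tokio::spawn({
        let rezn = Arc::clone(&rezn);
        async move {
            let results = [(6u128, 406u128), (7, 407)]; // stand-in for the real calls
            for (i, (a, b)) in results.iter().enumerate() {
                rezn.lock().unwrap().insert(i as u8, (*a, *b));
            }
        }
    });

    task.await.unwrap();
    println!("{:?}", rezn.lock().unwrap());
}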
Alternatively, Rust threads and tasks can return values, so each task could create a map internally, return it after it's done, and the parent can get those maps and merge them:
let task1 = tokio::spawn(async move {
    let mut map = HashMap::new();
    let result_ = [[6, 406], [7, 407]]; //make calls
    for (i, j) in izip!(payload0.iter(), result_.iter()) {
        map.insert(*i, (j[0], j[1]));
    }
    map
});

let task2 = tokio::spawn(async move {
    let mut map = HashMap::new();
    let result_ = [[6, 106], [7, 907]]; //make calls
    for (i, j) in izip!(payload1.iter(), result_.iter()) {
        map.insert(*i, (j[0], j[1]));
    }
    map
});

match tokio::join![task1, task2] {
    (Ok(mut m1), Ok(m2)) => {
        m1.extend(m2.into_iter());
        eprintln!("{:?}", m1);
    }
    e => eprintln!("Error {:?}", e),
}
This has a higher number of allocations, but there is no synchronisation necessary between the workers.
A mutex will give you safe multithreaded access, but you also need to share ownership of the mutex itself between those threads.
If you used scoped threads, you could just use a &Mutex<HashMap<...>> (a sketch of that variant appears at the end of this answer), but if you want or need to use normal tokio spawned tasks, you cannot pass a reference: tokio::spawn requires the future to be 'static, and a reference to a local variable will not comply. In this case the idiomatic solution is to use an Arc<Mutex<HashMap<...>>>.
let REZN: Mutex<HashMap<u8, (u128, u128)>> = Mutex::new(HashMap::new());
let REZN = Arc::new(REZN);
And then pass a clone of the Arc to the spawned tasks. There are several ways to write that but my favourite currently is this:
let task1 = {
    let REZN = Arc::clone(&REZN);
    tokio::spawn(async move {
        //...
    })
};
A little-known fact about Arc is that you can extract the inner value using Arc::try_unwrap(), but that only works if your Arc is the only one pointing to the value. In your case you can ensure that by waiting for (joining) the spawned tasks.
task1.await.unwrap();
task2.await.unwrap();
And then you can unwrap the Arc and the Mutex with this nice-looking line:
let REZN = Arc::try_unwrap(REZN).unwrap().into_inner().unwrap();
These four chained calls do the following:
Arc::try_unwrap(REZN) gets to the inner value of the Arc.
But only if this is the only clone of the Arc, so it returns a Result that we have to unwrap().
We get a Mutex that we unwrap using into_inner(). Note that we do not lock the mutex to extract the inner value: since into_inner() takes the mutex by value, we are sure that it is not borrowed anywhere and that we have exclusive access.
But this can fail too if the mutex is poisoned, so another unwrap() to get the real value. This last unwrap is not needed if you use tokio::sync::Mutex instead, because it doesn't have poisoning.
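For comparison, a sketch of the same extraction using tokio's mutex; tokio::sync::Mutex::into_inner returns the value directly because there is no poisoning, so one unwrap() disappears:

use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::Mutex;

#[tokio::main]
async fn main() {
    let rezn = Arc::new(Mutex::new(HashMap::<u8, (u128, u128)>::new()));
    // ... spawn and await the tasks holding clones of `rezn` here ...

    // Only one unwrap() left: the one ensuring this is the last Arc clone.
    let rezn: HashMap<u8, (u128, u128)> = Arc::try_unwrap(rezn).unwrap().into_inner();
    assert!(rezn.is_empty()); // empty because no tasks ran in this sketch
}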
You can see the whole thing in this playground.
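And for completeness, a sketch of the scoped-threads variant mentioned at the start of this answer: std::thread::scope (stable since Rust 1.63) lets the threads borrow a plain Mutex on the stack, so no Arc is needed at all:

use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

fn main() {
    let rezn: Mutex<HashMap<u8, (u128, u128)>> = Mutex::new(HashMap::new());

    thread::scope(|s| {
        for t in 0..2u8 {
            let rezn = &rezn; // scoped threads may borrow locals
            s.spawn(move || {
                rezn.lock().unwrap().insert(t, (t as u128, 0));
            });
        }
    });

    // No Arc to unwrap; just take the map out of the Mutex.
    println!("{:?}", rezn.into_inner().unwrap());
}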
I have read that the RP2040 has two cores. How can I use the second core in a Rust program?
I do not need to go all the way to generic multithreading, I just want to have two threads, each of which owns one of the cores, and they can communicate with each other.
The Rust book's section about Fearless Concurrency (suggested by Jeremy) is not much help.
thread::spawn(|| {
    let mut x = 0;
    x = x + 1;
});
fails to compile
error[E0433]: failed to resolve: use of undeclared crate or module `thread`
--> src/main.rs:108:5
|
108 | thread::spawn(|| {
| ^^^^^^ use of undeclared crate or module `thread`
which is hardly surprising given that thread is part of std and the RP2040 is a #![no_std] environment.
In the C API there is a function multicore_launch_core1. Is there an equivalent Rust API?
As you have already discovered, the multithreading facilities of the Rust std library rely on the facilities of an OS kernel, which are not available when working in a bare-metal embedded environment.
The actual process of getting the second core to execute code is a little complex and low-level. It is described in the RP2040 datasheet in the section titled "2.8.2. Launching Code On Processor Core 1".
In summary: after the second core boots up, it goes into a sleep state, waiting for instructions to be sent to it over the SIO FIFO, which is a communications channel between the two cores. The instructions sent through provide an interrupt vector table, a stack pointer and an entry point for the core to begin executing.
Luckily, the rp2040_hal crate provides a higher-level abstraction for this. The example below is from the multicore module of that crate:
use rp2040_hal::{pac, gpio::Pins, sio::Sio, multicore::{Multicore, Stack}};

static mut CORE1_STACK: Stack<4096> = Stack::new();

fn core1_task() -> ! {
    loop {}
}

fn main() -> ! {
    let mut pac = pac::Peripherals::take().unwrap();
    let mut sio = Sio::new(pac.SIO);
    // Other init code above this line
    let mut mc = Multicore::new(&mut pac.PSM, &mut pac.PPB, &mut sio.fifo);
    let cores = mc.cores();
    let core1 = &mut cores[1];
    let _test = core1.spawn(unsafe { &mut CORE1_STACK.mem }, core1_task);
    // The rest of your application below this line
    loop {} // main returns `!`, so it must never exit
}
In the above example, the code within the core1_task function will be executed on the second core, while the first core continues to execute the main function. There are more complete examples in the crate's examples directory.
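The question also asks about communication between the cores. The same SIO FIFO that launches core 1 can afterwards pass u32 values between the cores; a rough sketch, assuming rp2040_hal's blocking SioFifo methods (untested, like the rest of this answer):

use rp2040_hal::sio::SioFifo;

// Illustrative helper: send a word to the other core, then block until it replies.
// write_blocking/read_blocking are the blocking FIFO accessors in rp2040_hal.
fn ping(fifo: &mut SioFifo, value: u32) -> u32 {
    fifo.write_blocking(value);
    fifo.read_blocking()
}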
Disclaimer: I have not used this crate or microcontroller myself - all info was found from online documentation.
Does into_inner() return all the relaxed writes in this example program? If so, which concept guarantees this?
extern crate crossbeam;

use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let thread_count = 10;
    let increments_per_thread = 100000;
    let i = AtomicUsize::new(0);

    crossbeam::scope(|scope| {
        for _ in 0..thread_count {
            scope.spawn(|| {
                for _ in 0..increments_per_thread {
                    i.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    println!(
        "Result of {}*{} increments: {}",
        thread_count,
        increments_per_thread,
        i.into_inner()
    );
}
(https://play.rust-lang.org/?gist=96f49f8eb31a6788b970cf20ec94f800&version=stable)
I understand that crossbeam guarantees that all threads are finished, and since the ownership goes back to the main thread, I also understand that there will be no outstanding borrows. But the way I see it, there could still be outstanding pending writes, if not on the CPUs then in the caches.
Which concept guarantees that all writes are finished and all caches are synced back to the main thread when into_inner() is called? Is it possible to lose writes?
Does into_inner() return all the relaxed writes in this example program? If so, which concept guarantees this?
It's not into_inner that guarantees it, it's join.
What into_inner guarantees is that either some synchronization has been performed since the final concurrent write (join of thread, last Arc having been dropped and unwrapped with try_unwrap, etc.), or the atomic was never sent to another thread in the first place. Either case is sufficient to make the read data-race-free.
Crossbeam documentation is explicit about using join at the end of a scope:
This [the thread being guaranteed to terminate] is ensured by having the parent thread join on the child thread before the scope exits.
Regarding losing writes:
Which concept guarantees that all writes are finished and all caches are synced back to the main thread when into_inner() is called? Is it possible to lose writes?
As stated in various places in the documentation, Rust inherits the C++ memory model for atomics. In C++11 and later, the completion of a thread synchronizes with the corresponding successful return from join. This means that by the time join completes, all actions performed by the joined thread must be visible to the thread that called join, so it is not possible to lose writes in this scenario.
In terms of atomics, you can think of a join as an acquire read of an atomic that the thread performed a release store on just before it finished executing.
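To make that concrete, here is a small sketch that models a join by hand with a release store and an acquire load (illustrative only; a real join does this for you):

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static DATA: AtomicUsize = AtomicUsize::new(0);
static DONE: AtomicBool = AtomicBool::new(false);

fn main() {
    let t = thread::spawn(|| {
        DATA.fetch_add(1, Ordering::Relaxed); // the thread's "work"
        DONE.store(true, Ordering::Release);  // like the thread finishing
    });

    // Like join: the Acquire load synchronizes with the Release store,
    // which also makes the Relaxed write to DATA visible here.
    while !DONE.load(Ordering::Acquire) {}
    assert_eq!(DATA.load(Ordering::Relaxed), 1);

    t.join().unwrap();
}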
I will include this answer as a potential complement to the other two.
The kind of inconsistency that was mentioned, namely whether some writes could be missing before the final reading of the counter, is not possible here. It would be undefined behaviour if writes to a value could be postponed until after its consumption with into_inner. However, there are no unexpected race conditions in this program, even without the counter being consumed with into_inner, and even without the help of crossbeam scopes.
Let us write a new version of the program without crossbeam scopes and where the counter is not consumed (Playground):
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

fn main() {
    let thread_count = 10;
    let increments_per_thread = 100000;
    let i = Arc::new(AtomicUsize::new(0));

    let threads: Vec<_> = (0..thread_count)
        .map(|_| {
            let i = i.clone();
            thread::spawn(move || for _ in 0..increments_per_thread {
                i.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();

    for t in threads {
        t.join().unwrap();
    }

    println!(
        "Result of {}*{} increments: {}",
        thread_count,
        increments_per_thread,
        i.load(Ordering::Relaxed)
    );
}
This version still works pretty well! Why? Because a synchronizes-with relation is established between the ending thread and its corresponding join. And so, as well explained in a separate answer, all actions performed by the joined thread must be visible to the caller thread.
One could probably also wonder whether even the relaxed memory ordering constraint is sufficient to guarantee that the full program behaves as expected. This part is addressed by the Rust Nomicon, emphasis mine:
Relaxed accesses are the absolute weakest. They can be freely re-ordered and provide no happens-before relationship. Still, relaxed operations are still atomic. That is, they don't count as data accesses and any read-modify-write operations done to them occur atomically. Relaxed operations are appropriate for things that you definitely want to happen, but don't particularly otherwise care about. For instance, incrementing a counter can be safely done by multiple threads using a relaxed fetch_add if you're not using the counter to synchronize any other accesses.
The mentioned use case is exactly what we are doing here. Each thread is not required to observe the incremented counter in order to make decisions, and yet all operations are atomic. In the end, the thread joins synchronize with the main thread, thus implying a happens-before relation and guaranteeing that the operations are made visible there. Since Rust adopts the same memory model as C++11 (this is implemented by LLVM internally), we can see from the C++ std::thread::join documentation that "The completion of the thread identified by *this synchronizes with the corresponding successful return". In fact, the very same example in C++ is available on cppreference.com as part of the explanation of the relaxed memory order constraint:
#include <vector>
#include <iostream>
#include <thread>
#include <atomic>

std::atomic<int> cnt = {0};

void f()
{
    for (int n = 0; n < 1000; ++n) {
        cnt.fetch_add(1, std::memory_order_relaxed);
    }
}

int main()
{
    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(f);
    }
    for (auto& t : v) {
        t.join();
    }
    std::cout << "Final counter value is " << cnt << '\n';
}
The fact that you can call into_inner (which consumes the AtomicUsize) means that there are no more borrows on that backing storage.
Each fetch_add is an atomic operation with Relaxed ordering, so once the threads are complete there shouldn't be anything that changes it (if so, then there's a bug in crossbeam).
See the description on into_inner for more info
I am a newbie to Rust, and I want to sum up a large amount of numbers using concurrency. I found this code:
use std::thread;
use std::sync::{Arc, Mutex};

static NTHREAD: usize = 10;

fn main() {
    let mut threads = Vec::new();
    let x = 0;
    // A thread-safe, sharable mutex object
    let data = Arc::new(Mutex::new(x));

    for i in 1..(NTHREAD + 1) {
        // Increment the count of the mutex
        let mutex = data.clone();
        threads.push(thread::spawn(move || {
            // Lock the mutex
            let n = mutex.lock();
            match n {
                Ok(mut n) => *n += i,
                Err(str) => println!("{}", str),
            }
        }));
    }

    // Wait for all threads to end
    for thread in threads {
        let _ = thread.join().unwrap();
    }

    assert_eq!(*data.lock().unwrap(), 55);
}
This works when there are 10 threads, but does not work when there are more than 20.
I think it should be fine with any number of threads.
Do I misunderstand something? Is there another way to sum up from 1 to 1000000 with concurrency?
There are several problems with the provided code.
thread::spawn creates an OS-level thread, which means the existing code cannot possibly scale to numbers up to a million as indicated in the title. That would require a million threads in parallel, where typical modern OSes support up to a few thousand threads at best. More constrained environments, such as embedded systems or virtual/paravirtual machines, allow far fewer than that; for example, the Rust playground appears to allow a maximum of 24 concurrent threads. Instead, one needs to create a fixed small number of threads and carefully divide the work among them.
The function executing in each thread runs inside a lock, which effectively serializes the work done by the threads. Even if one could spawn arbitrarily many threads, the loop as written would execute no faster than a single thread; in practice it would be orders of magnitude slower, because it would spend a lot of time on locking and unlocking a heavily contended mutex.
One good way to approach this kind of problem while still managing threads manually is provided in the comment by Boiethios: if you have 4 threads, just sum 1..250k, 250k..500k, etc. in each thread and then sum up the values returned by the threaded functions.
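A sketch of that manual division of work: four threads, each summing its own disjoint range, with the partial sums combined at the end:

use std::thread;

fn main() {
    const N_THREADS: u64 = 4;
    const MAX: u64 = 1_000_000;
    let chunk = MAX / N_THREADS;

    let handles: Vec<_> = (0..N_THREADS)
        .map(|t| {
            let start = t * chunk + 1;
            let end = if t == N_THREADS - 1 { MAX } else { (t + 1) * chunk };
            // No shared state: each thread returns its partial sum.
            thread::spawn(move || (start..=end).sum::<u64>())
        })
        .collect();

    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    assert_eq!(total, 500_000_500_000);
}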
Or is there another way to sum up from 1 to 1000000 with concurrency?
I would recommend using a higher-level library that encapsulates the creation and pooling of worker threads and the division of work among them. Rayon is an excellent one, providing a "parallel iteration" facility which works like ordinary iteration but automatically divides the work among multiple cores. Using Rayon, parallel summing of integers would look like this:
extern crate rayon;

use rayon::prelude::*;

fn main() {
    // Ranges implement IntoParallelIterator, so no intermediate Vec is needed.
    let sum: usize = (1..1000001).into_par_iter().sum();
    assert_eq!(sum, 500000500000);
}
I'm porting my C++ chess engine to Rust. I have a big hash table shared between search threads, and in the C++ version this table is lock-less; there is no mutex for sharing read/write access. Here is the theory, if you are interested.
In the Rust version of this code, it works fine, but it uses a Mutex:
let shared_hash = Arc::new(Mutex::new(new_hash()));

for _ in 0..n_cpu {
    println!("start thread");
    let my_hash = shared_hash.clone();
    thread_pool.push(thread::spawn(move || {
        let mut my_hash = my_hash.lock().unwrap();
        let mut search_engine = SearchEngine::new();
        search_engine.search(&mut my_hash);
    }));
}

for i in thread_pool {
    let _ = i.join();
}
How could I share the table between threads without a mutex?
Quite simply, actually: the Mutex is unnecessary if the underlying structure is already Sync.
In your case, an array of structs of atomics, for example, would work. You can find Rust's available atomic types here.
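A minimal sketch of that idea: a fixed-size table of atomics shared across threads with no Mutex (the index computation is a toy stand-in for a real hash function):

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Every slot is an atomic, so concurrent reads/writes are data-race-free.
    let table: Arc<Vec<AtomicU64>> =
        Arc::new((0..1024).map(|_| AtomicU64::new(0)).collect());

    let mut workers = Vec::new();
    for id in 0..4u64 {
        let table = Arc::clone(&table);
        workers.push(thread::spawn(move || {
            let slot = id.wrapping_mul(31) as usize % table.len(); // toy "hash"
            table[slot].store(id, Ordering::Relaxed);
            table[slot].load(Ordering::Relaxed)
        }));
    }
    for w in workers {
        w.join().unwrap();
    }
}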
Data races are undefined behavior in both C++ and Rust. Just Say No.
The right way is to build your table out of atomic integers. It's not rocket science, but you do have to decide case by case how much you care about the order of memory operations. This does clutter up your code:
// non-atomic array access
table[h] = 0;
// atomic array access
table[h].store(0, Ordering::SeqCst);
But it's worth it.
There's no telling what the performance penalty will be; you just have to try it out.