std::sync::mpsc::channel always in the same order - Rust

No matter how many times I run the program, it always shows the numbers in the same order:
use std::sync::mpsc::channel;
use std::thread;

fn main() {
    let (tx, rx) = channel();
    for i in 0..10 {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(i).unwrap();
        });
    }
    for _ in 0..10 {
        println!("{}", rx.recv().unwrap());
    }
}
Code on the playground. The output is:
6
7
8
5
9
4
3
2
1
0
If I rebuild the project, the sequence will change. Is the sequence decided at compile time?

What order would you expect them to be in? For what it's worth, on my machine I ran the same binary twice and got slightly different results.
Ultimately, this comes down to how your operating system decides to schedule threads. You create 10 new threads and then ask the OS to run each of them when convenient. A hypothetical thread scheduler might look like this:
for thread in threads {
    if thread.runnable() {
        thread.run_for_a_time_slice();
    }
}
Where threads stores the threads in the order they were created. It's unlikely that any OS would be this naïve, but it shows the idea.
In your case, every thread is ready to run immediately, and each is so short that it runs all the way to completion within its time slice.
Additionally, there might be some fairness being applied to the lock that guards the channel. Perhaps it always lets the first of multiple competing threads submit a value. Unfortunately, the implementation of channels is reasonably complex, so I can't immediately say if that's the case or not.
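To see this in action, you can perturb when each thread becomes runnable. Here is a minimal sketch (my own illustration, not from the question) that delays each sender by a different amount; the receive order then tracks arrival order at the channel rather than spawn order:

use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = channel();
    for i in 0..10 {
        let tx = tx.clone();
        thread::spawn(move || {
            // Delay each thread by a different amount, so they reach
            // the channel at different, roughly predictable times.
            thread::sleep(Duration::from_millis(10 * (10 - i) as u64));
            tx.send(i).unwrap();
        });
    }
    for _ in 0..10 {
        // Prints roughly 9, 8, ..., 0: arrival order, not spawn order.
        println!("{}", rx.recv().unwrap());
    }
}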

Related

CPU time sleep instead of wall-clock time sleep

Currently, I have the following Rust toy program:
use rayon::prelude::*;
use std::{env, thread, time};

/// Sleeps 1 second n times in parallel using rayon
fn rayon_sleep(n: usize) {
    let millis = vec![0; n];
    millis
        .par_iter()
        .for_each(|_| thread::sleep(time::Duration::from_millis(1000)));
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let n = args[1].parse::<usize>().unwrap();
    let now = time::Instant::now();
    rayon_sleep(n);
    println!("rayon: {:?}", now.elapsed());
}
Basically, my program accepts one input argument n. Then, I sleep for 1 second n times. The program executes the sleep tasks in parallel using rayon.
However, this is not exactly what I want. As far as I know, thread::sleep sleeps according to wall-clock time, but I would like to keep a virtual CPU busy for 1 second of CPU time.
Is there any way to do this?
EDIT
I would like to make this point clear: I don't mind if the OS preempts the tasks. However, if this happens, then I don't want to consider the time the task spends in the ready/waiting queue.
EDIT
This is a simple, illustrative example of what I need to do. In reality, I have to develop a benchmark for a crate that allows defining and simulating models using the DEVS formalism. The benchmark aims to compare DEVS-compliant libraries with each other, and it explicitly states that the models must spend a fixed, known amount of CPU time. That is why I need to make sure of that. Thus, I cannot simply sleep, nor use a busy loop based on wall-clock time.
I followed Sven Marnach's suggestions and implemented the following function:
use cpu_time::ThreadTime;
use rayon::prelude::*;
use std::{env, time};

/// Keeps a thread busy for 1 second of CPU time, n times in parallel using rayon
fn rayon_sleep(n: usize) {
    let millis = vec![0; n];
    millis.par_iter().for_each(|_| {
        let duration = time::Duration::from_millis(1000);
        let mut x: u32 = 0;
        let now = ThreadTime::now(); // get current thread time
        // Active sleep: spin until 1 second of CPU time has elapsed.
        while now.elapsed() < duration {
            std::hint::black_box(&mut x); // to avoid compiler optimizations
            x = x.wrapping_add(1);
        }
    });
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let n = args[1].parse::<usize>().unwrap();
    let now = time::Instant::now();
    rayon_sleep(n);
    println!("rayon: {:?}", now.elapsed());
}
If I set n to 8, it takes roughly 2 seconds. I'd expect better performance (1 second, as I have 8 vCPUs), but I guess the overhead corresponds to the OS scheduling policy.
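One way to convince yourself that each task really consumed 1 second of CPU time, whatever the wall clock says, is to record both clocks per task. A small sketch along the same lines (my own addition, using the same cpu_time crate as above):

use cpu_time::ThreadTime;
use std::time;

/// Busy-waits for 1 second of CPU time and reports both clocks.
fn busy_one_cpu_second() {
    let wall = time::Instant::now();
    let cpu = ThreadTime::now();
    let duration = time::Duration::from_secs(1);
    let mut x: u32 = 0;
    while cpu.elapsed() < duration {
        std::hint::black_box(&mut x);
        x = x.wrapping_add(1);
    }
    // CPU time is ~1 s by construction; wall time can be longer if the
    // thread was preempted or waiting in the ready queue.
    println!("cpu: {:?}, wall: {:?}", cpu.elapsed(), wall.elapsed());
}

fn main() {
    busy_one_cpu_second();
}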

Is rayon's parallelism limited to the cores of the machine?

I have the following toy Rust program:
use rayon::prelude::*;
use std::{env, thread, time};

/// Sleeps 1 second n times sequentially
fn seq_sleep(n: usize) {
    for _ in 0..n {
        thread::sleep(time::Duration::from_millis(1000));
    }
}

/// Launches n threads that each sleep 1 second
fn thread_sleep(n: usize) {
    let mut handles = Vec::new();
    for _ in 0..n {
        handles.push(thread::spawn(|| {
            thread::sleep(time::Duration::from_millis(1000))
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
}

/// Sleeps 1 second n times in parallel using rayon
fn rayon_sleep(n: usize) {
    let millis = vec![0; n];
    millis
        .par_iter()
        .for_each(|_| thread::sleep(time::Duration::from_millis(1000)));
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let n = args[1].parse::<usize>().unwrap();
    let now = time::Instant::now();
    seq_sleep(n);
    println!("sequential: {:?}", now.elapsed());
    let now = time::Instant::now();
    thread_sleep(n);
    println!("thread: {:?}", now.elapsed());
    let now = time::Instant::now();
    rayon_sleep(n);
    println!("rayon: {:?}", now.elapsed());
}
Basically, I want to compare the degree of parallelism of i) sequential code, ii) basic threads, and iii) rayon. To do so, my program accepts one input parameter n and, depending on the method, it sleeps for 1 second n times.
For n = 8, I get the following output:
sequential: 8.016809707s
thread: 1.006029845s
rayon: 1.004957395s
So far so good. However, for n = 9, I get the following output:
sequential: 9.012422104s
thread: 1.003085005s
rayon: 2.011378713s
The sequential and basic thread versions make sense to me. However, I expected rayon to take 1 second. My machine has 4 cores with hyper-threading. This leads me to think that rayon internally limits the number of parallel threads to the number of logical CPUs the machine supports. Is this correct?
Yes:
rayon::ThreadPoolBuilder::build_global():
Initializes the global thread pool. This initialization is optional. If you do not call this function, the thread pool will be automatically initialized with the default configuration.
rayon::ThreadPoolBuilder::num_threads():
If num_threads is 0, or you do not call this function, then the Rayon runtime will select the number of threads automatically. At present, this is based on the RAYON_NUM_THREADS environment variable (if set), or the number of logical CPUs (otherwise). In the future, however, the default behavior may change to dynamically add or remove threads as needed.
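Because the tasks here sleep rather than compute, you can oversubscribe the pool and get n = 9 to finish in about 1 second. A sketch using the builder API quoted above (the thread count of 9 is just for this example):

use rayon::prelude::*;
use std::{thread, time};

fn main() {
    // Size the global pool explicitly; more threads than logical CPUs
    // is fine here because the tasks sleep instead of computing.
    rayon::ThreadPoolBuilder::new()
        .num_threads(9)
        .build_global()
        .unwrap();
    let now = time::Instant::now();
    (0..9).into_par_iter().for_each(|_| {
        thread::sleep(time::Duration::from_millis(1000));
    });
    println!("rayon: {:?}", now.elapsed());
}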

How do I use both cores on an RP2040 in Rust?

I have read that the RP2040 has two cores. How can I use the second core in a Rust program?
I do not need to go all the way to generic multithreading, I just want to have two threads, each of which owns one of the cores, and they can communicate with each other.
The Rust book's section about Fearless Concurrency (suggested by Jeremy) is not much help.
thread::spawn(|| {
    let mut x = 0;
    x = x + 1;
});
fails to compile:
error[E0433]: failed to resolve: use of undeclared crate or module `thread`
   --> src/main.rs:108:5
    |
108 |     thread::spawn(|| {
    |     ^^^^^^ use of undeclared crate or module `thread`
which is hardly surprising given that thread is part of std and the RP2040 is a #![no_std] environment.
In the C API there is a function multicore_launch_core1. Is there an equivalent Rust API?
As you have already discovered, the multithreading facilities of the Rust std library rely on an OS kernel, which is not available when working in a bare-metal embedded environment.
The actual process of getting the second core to execute code is a little complex and low level. It is described in the RP2040 datasheet in the section titled "2.8.2. Launching Code On Processor Core 1".
In summary: after the second core boots, it goes into a sleep state, waiting for instructions sent to it over the SIO FIFO, a communications channel between the two cores. The instructions sent through provide an interrupt vector table, a stack pointer, and an entry point for the core to begin executing.
Luckily, the rp2040_hal crate provides a higher-level abstraction for this. The example below is from the multicore module of that crate:
use rp2040_hal::{pac, gpio::Pins, sio::Sio, multicore::{Multicore, Stack}};

static mut CORE1_STACK: Stack<4096> = Stack::new();

fn core1_task() -> ! {
    loop {}
}

fn main() -> ! {
    let mut pac = pac::Peripherals::take().unwrap();
    let mut sio = Sio::new(pac.SIO);
    // Other init code above this line
    let mut mc = Multicore::new(&mut pac.PSM, &mut pac.PPB, &mut sio.fifo);
    let cores = mc.cores();
    let core1 = &mut cores[1];
    let _test = core1.spawn(unsafe { &mut CORE1_STACK.mem }, core1_task);
    // The rest of your application below this line
}
In the above example, the code within the core1_task function will be executed on the second core, while the first core continues to execute the main function. There are more complete examples in the crate's examples directory.
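The same SIO FIFO also covers the inter-core communication you asked about. Here is a hedged sketch of the idea, based on the crate's multicore FIFO examples (untested on hardware): core 1 takes its own handle to the peripherals and echoes back every word core 0 sends it.

fn core1_task() -> ! {
    // Core 0 already called Peripherals::take(), so core 1 must
    // steal() its own handle to reach its side of the FIFO.
    let pac = unsafe { pac::Peripherals::steal() };
    let mut sio = Sio::new(pac.SIO);
    loop {
        // Block until core 0 sends a word, then echo it back.
        let word = sio.fifo.read_blocking();
        sio.fifo.write_blocking(word);
    }
}

// On core 0, after core1.spawn(...):
//     sio.fifo.write_blocking(42);
//     let echoed = sio.fifo.read_blocking();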
Disclaimer: I have not used this crate or microcontroller myself - all info was found from online documentation.

How can I do these operations in parallel?

So, here is what I'm trying to do:
use std::io::{self, Read, Write};
use std::thread;
use std::time::Duration;
use termion::color;
use termion::event::Key;
use termion::input::TermRead;
use termion::raw::IntoRawMode;
use chrono::{DateTime, TimeZone, Utc};

fn main() {
    // Initialize stdios.
    let stdout = io::stdout();
    let stdout = stdout.lock();
    let mut stdout = stdout.into_raw_mode().unwrap();
    let stdin = termion::async_stdin();
    let mut keys = stdin.keys();
    let period = 30;
    let mut scheduled_time = Utc::now().timestamp() + period;
    loop {
        let now = Utc::now().timestamp();
        if now > scheduled_time {
            // Do some operations; this function needs to be called at a
            // fixed period, e.g. every 30 seconds or every hour.
            foo();
            scheduled_time += period;
        }
        write!(stdout, "Log after foo is done\r\n").unwrap();
        stdout.flush().unwrap();
        // Wait for some fixed time before performing foo again
        // (30 seconds in this example).
        thread::sleep(Duration::from_secs(period as u64 - 1));
        // Check for input from the user in parallel; termion's
        // AsyncReader does not block.
        match keys.next() {
            Some(Ok(Key::Char('q'))) => break,
            _ => (),
        }
    }
}
The first 8-10 lines initialize the stdio handles.
In my main loop, I want to call a function foo that does some operations, but it needs to be called at a fixed period. That is why I inserted the thread::sleep call there: without it, the loop would constantly re-check the condition between calls to foo, causing 100% CPU usage all the time.
However, sleeping causes another problem. Say the period is 1 hour: if the user wants to quit the program in the middle of the sleep, it does not quit until the thread wakes (I guess).
I'm very unfamiliar with threads, but I need some idea of how to do this. I know a little about AsyncReader. I guess it creates a thread so that waiting for input does not block the main thread.
The issue is that the line let mut stdout = stdout.into_raw_mode().unwrap(); puts stdout into raw mode, which disables the TTY device's processing of input characters. Commenting out that line (and making the previous stdout definition mutable) will allow a ^C interrupt to kill the process.
Even though it is the stdout device being put into raw mode, the file pointers for stdout and stdin usually refer to the same file (unless they have been redirected), so putting stdout into raw mode affects stdin as well. The TTY drivers in the OS are responsible for converting the ^C character into a SIGINT signal, which is sent to your process and normally terminates it.
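For the original scheduling problem (quitting promptly in the middle of a long period), one option is to move input onto its own thread and let the main loop block on a channel with a timeout, so it wakes either on input or when the next period is due. A minimal std-only sketch of the shape (stdin here is line-buffered, so quitting needs q plus Enter; termion's AsyncReader avoids that):

use std::io::{self, BufRead};
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

fn foo() {
    println!("periodic work");
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Input thread: forward each line typed on stdin to the main loop.
    thread::spawn(move || {
        for line in io::stdin().lock().lines() {
            if tx.send(line.unwrap_or_default()).is_err() {
                break;
            }
        }
    });
    let period = Duration::from_secs(30);
    let mut scheduled = Instant::now() + period;
    loop {
        // Block at most until the next scheduled run, but wake early
        // if the user typed something; no busy-waiting either way.
        let timeout = scheduled.saturating_duration_since(Instant::now());
        match rx.recv_timeout(timeout) {
            Ok(line) if line.trim() == "q" => break,
            Ok(_) => {}
            Err(mpsc::RecvTimeoutError::Timeout) => {
                foo();
                scheduled += period;
            }
            Err(mpsc::RecvTimeoutError::Disconnected) => break,
        }
    }
}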

How can I sum up using concurrency from 1 to 1000000 with Rust?

I am a newbie to Rust, and I want to sum up a large amount of numbers using concurrency. I found this code:
use std::thread;
use std::sync::{Arc, Mutex};

static NTHREAD: usize = 10;

fn main() {
    let mut threads = Vec::new();
    let x = 0;
    // A thread-safe, sharable mutex object
    let data = Arc::new(Mutex::new(x));
    for i in 1..(NTHREAD + 1) {
        // Increment the count of the mutex
        let mutex = data.clone();
        threads.push(thread::spawn(move || {
            // Lock the mutex
            let n = mutex.lock();
            match n {
                Ok(mut n) => *n += i,
                Err(str) => println!("{}", str),
            }
        }));
    }
    // Wait for all threads to finish
    for thread in threads {
        let _ = thread.join().unwrap();
    }
    assert_eq!(*data.lock().unwrap(), 55);
}
This works with 10 threads, but it fails when there are more than 20.
I think it should be fine with any number of threads.
Do I misunderstand something? Is there another way to sum up from 1 to 1000000 with concurrency?
There are several problems with the provided code.
thread::spawn creates an OS-level thread, which means the existing code cannot possibly scale to numbers up to a million, as indicated in the title. That would require a million threads running in parallel, where typical modern OSes support a few thousand threads at best. More constrained environments, such as embedded systems or virtual/paravirtual machines, allow far fewer; for example, the Rust playground appears to allow a maximum of 24 concurrent threads. Instead, one needs to create a small, fixed number of threads and carefully divide the work among them.
The function executed in each thread runs inside a lock, which effectively serializes the work done by the threads. Even if one could spawn arbitrarily many threads, the loop as written would execute no faster than a single thread - and in practice it would be orders of magnitude slower, because it would spend a lot of time locking and unlocking a heavily contended mutex.
One good way to approach this kind of problem while still managing threads manually is provided in the comment by Boiethios: if you have 4 threads, just sum 1..250k, 250k..500k, etc. in each thread and then sum up the return values of the threaded functions, as sketched below.
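A minimal sketch of that manual division (my own illustration of the comment's idea; no mutex is needed because each thread owns a disjoint range and returns its partial sum through join()):

use std::thread;

fn main() {
    const N: u64 = 1_000_000;
    const NTHREADS: u64 = 4;
    let chunk = N / NTHREADS;
    let handles: Vec<_> = (0..NTHREADS)
        .map(|t| {
            // Each thread sums its own disjoint sub-range.
            let start = t * chunk + 1;
            let end = if t == NTHREADS - 1 { N } else { (t + 1) * chunk };
            thread::spawn(move || (start..=end).sum::<u64>())
        })
        .collect();
    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    assert_eq!(total, N * (N + 1) / 2); // 500000500000
}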
Or is there another way to sum up from 1 to 1000000 with concurrency?
I would recommend using a higher-level library that encapsulates the creation/pooling of worker threads and the division of work among them. Rayon is an excellent one, providing a "parallel iteration" facility that works like ordinary iteration but automatically divides the work among multiple cores. Using Rayon, parallel summing of integers would look like this:
extern crate rayon;
use rayon::prelude::*;

fn main() {
    let sum: usize = (1..1000001).collect::<Vec<_>>().par_iter().sum();
    assert_eq!(sum, 500000500000);
}
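As a side note, reasonably recent versions of rayon implement parallel iteration for integer ranges directly, so the intermediate Vec can be skipped (a small sketch, assuming rayon 1.2+ for inclusive ranges):

use rayon::prelude::*;

fn main() {
    // Ranges implement IntoParallelIterator, so no buffer is needed.
    let sum: u64 = (1..=1_000_000u64).into_par_iter().sum();
    assert_eq!(sum, 500_000_500_000);
}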
