Why do my Futures not max out the CPU? - multithreading

I am creating a few hundred requests to download the same file (this is a toy example). When I run the equivalent logic in Go, I get 200% CPU usage and it returns in ~5 seconds with 800 requests. In Rust, with only 100 requests, it takes nearly 5 seconds, spawns 16 OS threads, and shows 37% CPU utilization.
Why is there such a difference?
From what I understand, if I have a CpuPool managing Futures across N cores, this is functionally what the Go runtime/goroutine combo is doing, just via fibers instead of futures.
From the perf data, it seems like I am only using one core, despite the 16-thread CpuPool.
extern crate curl;
extern crate fibers;
extern crate futures;
extern crate futures_cpupool;

use std::io::{Write, BufWriter};
use curl::easy::Easy;
use futures::future::*;
use std::fs::File;
use futures_cpupool::CpuPool;

fn make_file(x: i32, data: &mut Vec<u8>) {
    let f = File::create(format!("./data/{}.txt", x)).expect("Unable to open file");
    let mut writer = BufWriter::new(&f);
    writer.write_all(data.as_mut_slice()).unwrap();
}

fn collect_request(x: i32, url: &str) -> Result<i32, ()> {
    let mut data = Vec::new();
    let mut easy = Easy::new();
    easy.url(url).unwrap();
    {
        let mut transfer = easy.transfer();
        transfer
            .write_function(|d| {
                data.extend_from_slice(d);
                Ok(d.len())
            })
            .unwrap();
        transfer.perform().unwrap();
    }
    make_file(x, &mut data);
    Ok(x)
}

fn main() {
    let url = "https://en.wikipedia.org/wiki/Immanuel_Kant";
    let pool = CpuPool::new(16);
    let output_futures: Vec<_> = (0..100)
        .into_iter()
        .map(|ind| {
            pool.spawn_fn(move || {
                let output = collect_request(ind, url);
                output
            })
        })
        .collect();
    // println!("{:?}", output_futures.Item());
    for i in output_futures {
        i.wait().unwrap();
    }
}
My equivalent Go code

From what I understand, if I have a CpuPool managing Futures across N cores, this is functionally what the Go runtime/goroutine combo is doing, just via fibers instead of futures.
This is not correct. The documentation for CpuPool states, emphasis mine:
A thread pool intended to run *CPU intensive* work.
Downloading a file is not CPU-bound, it's IO-bound. All you have done is spin up many threads and then tell each thread to block while waiting for IO to complete.
Instead, use tokio-curl, which adapts the curl library to the Future abstraction. You can then remove the threadpool completely. This should drastically improve your throughput.
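For reference, a minimal sketch of that approach, assuming the tokio-curl 0.1 API driven by a tokio-core event loop (the per-request file writing from the question is omitted, so this is an illustration rather than a drop-in replacement):

extern crate curl;
extern crate tokio_core;
extern crate tokio_curl;

use curl::easy::Easy;
use tokio_core::reactor::Core;
use tokio_curl::Session;

fn main() {
    // A single event loop drives every transfer; no thread pool needed.
    let mut core = Core::new().unwrap();
    let session = Session::new(core.handle());

    let mut easy = Easy::new();
    easy.url("https://en.wikipedia.org/wiki/Immanuel_Kant").unwrap();
    easy.write_function(|data| Ok(data.len())).unwrap();

    // perform() returns a Future that resolves when the transfer
    // finishes, without blocking an OS thread in the meantime.
    let request = session.perform(easy);
    core.run(request).unwrap();
}

Many such perform() futures can be collected and combined (for example with futures::future::join_all) so that all the transfers overlap on the one event loop.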

Related

How can I execute an action after each end of thread?

In Rust, I would like to run multiple tasks in parallel and, when each task finishes, run another task handled by the main process.
I know that the tasks will finish at different times, and I don't want to wait for all of them before starting the next task.
I've tried running multiple threads managed by the main process, but it seems I have to wait for all the threads to finish before doing another action (or maybe I did not understand something):
for handle in handles {
    handle.join().unwrap();
}
How can I run a task on the main process after each thread ends, without blocking the whole main thread?
If I'm not clear, or if you have a better idea for handling my problem, feel free to tell me!
Here's an example of how to implement this using FuturesUnordered and Tokio:
use futures::{stream::FuturesUnordered, StreamExt};
use tokio::time::sleep;
use std::{time::Duration, future::ready};

#[tokio::main]
async fn main() {
    let tasks = FuturesUnordered::new();
    tasks.push(some_task(1000));
    tasks.push(some_task(2000));
    tasks.push(some_task(500));
    tasks.push(some_task(1500));
    tasks.for_each(|result| {
        println!("Task finished after {} ms.", result);
        ready(())
    }).await;
}

async fn some_task(delay_ms: u64) -> u64 {
    sleep(Duration::from_millis(delay_ms)).await;
    delay_ms
}
If you run this code, you can see that the closure passed to for_each() is executed immediately whenever a task finishes, even though they don't finish in the order they were created.
Note that Tokio takes care of scheduling the tasks to different threads for you. By default, there will be one thread per CPU core.
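If you need a specific number of worker threads instead of the default, the runtime can be configured through the macro's arguments (a sketch; this is not required for the example above):

#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    // The async body now runs on a runtime with exactly 4 worker threads.
    println!("running on 4 worker threads");
}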
To compile this, you need to add this to your Cargo.toml file:
[dependencies]
futures = "0.3"
tokio = { version = "1", features = ["full"] }
If you want to add some proper error propagation, the code becomes only slightly more complex – most of the added code is for the custom error type:
use futures::{stream::FuturesUnordered, TryStreamExt};
use tokio::time::sleep;
use std::{time::Duration, future::ready};

#[tokio::main]
async fn main() -> Result<(), MyError> {
    let tasks = FuturesUnordered::new();
    tasks.push(some_task(1000));
    tasks.push(some_task(2000));
    tasks.push(some_task(500));
    tasks.push(some_task(1500));
    tasks.try_for_each(|result| {
        println!("Task finished after {} ms.", result);
        ready(Ok(()))
    }).await
}

async fn some_task(delay_ms: u64) -> Result<u64, MyError> {
    sleep(Duration::from_millis(delay_ms)).await;
    Ok(delay_ms)
}

#[derive(Debug)]
struct MyError {}

impl std::fmt::Display for MyError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "MyError occurred")
    }
}

impl std::error::Error for MyError {}

How can I change the number of threads Rayon uses?

I'm using the Rayon library:
extern crate rayon;

const N: usize = 1_000_000_000;
const W: f64 = 1f64 / (N as f64);

fn f(x: f64) -> f64 {
    4.0 / (1.0 + x * x)
}

fn main() {
    use rayon::prelude::*;
    let sum: f64 = (0..N)
        .into_par_iter()
        .map(|i| f(W * ((i as f64) + 0.5)))
        .sum::<f64>();
    println!("pi = {}", W * sum);
}
I want to run this code using different numbers of threads: 1, 2, 3, and 4.
I have read the documentation entry How many threads will Rayon spawn?, which says:
By default, Rayon uses the same number of threads as the number of CPUs available. Note that on systems with hyperthreading enabled this equals the number of logical cores and not the physical ones.
If you want to alter the number of threads spawned, you can set the environmental variable RAYON_NUM_THREADS to the desired number of threads or use the ThreadPoolBuilder::build_global function.
However, the steps are not clear to me. How can I do this on my Windows 10 PC?
Just include this at the start of fn main(); num_threads takes the desired number of threads:
rayon::ThreadPoolBuilder::new().num_threads(4).build_global().unwrap();
If you don't want to set a global, you can create a helper function, here called create_pool, that constructs a rayon::ThreadPool from num_threads:
pub fn create_pool(num_threads: usize) -> Result<rayon::ThreadPool, YOURERRORENUM> {
    match rayon::ThreadPoolBuilder::new()
        .num_threads(num_threads)
        .build()
    {
        Err(e) => Err(e.into()),
        Ok(pool) => Ok(pool),
    }
}
Then call your code from inside the pool's install method. This will limit all nested Rayon functions to the num_threads you set:
[...]
create_pool(num_threads)?.install(|| {
    YOURCODE
})?;
[...]
For more see https://towardsdatascience.com/nine-rules-for-writing-python-extensions-in-rust-d35ea3a4ec29
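For example, applied to the pi computation from the question, the non-global variant looks like this (a sketch; the build error is simply boxed rather than converted into a custom error enum):

extern crate rayon;

use rayon::prelude::*;
use std::error::Error;

const N: usize = 1_000_000_000;
const W: f64 = 1f64 / (N as f64);

fn f(x: f64) -> f64 {
    4.0 / (1.0 + x * x)
}

fn main() -> Result<(), Box<dyn Error>> {
    // Build a private pool with exactly 4 threads...
    let pool = rayon::ThreadPoolBuilder::new().num_threads(4).build()?;
    // ...and run the parallel iterator inside it.
    let sum: f64 = pool.install(|| {
        (0..N)
            .into_par_iter()
            .map(|i| f(W * ((i as f64) + 0.5)))
            .sum::<f64>()
    });
    println!("pi = {}", W * sum);
    Ok(())
}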

Different Context or Waker for each Future in Async Rust

I am trying to understand how polling works for an async Rust Future. Using the following code, I tried to run two futures, Fut0 and Fut1, such that they interleave as follows: Fut0 -> Fut1 -> Fut0 -> Fut0.
extern crate futures; // 0.3.1

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};
use std::cell::RefCell;
use std::rc::Rc;
use std::collections::HashMap;
use futures::executor::block_on;
use futures::future::join_all;

#[derive(Default, Debug)]
struct Fut {
    id: usize,
    step: usize,
    wakers: Rc<RefCell<HashMap<usize, Waker>>>,
}

impl Future for Fut {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        self.step += 1;
        println!("Fut{} at step {}", self.id, self.step);
        {
            let mut wakers = self.wakers.borrow_mut();
            wakers.insert(self.id, cx.waker().clone());
        }
        {
            let next_id = (self.id + self.step) % 2;
            let wakers = self.wakers.borrow();
            if let Some(w) = wakers.get(&next_id) {
                println!("Waking up Fut{} from Fut{}", next_id, self.id);
                w.wake_by_ref();
            }
        }
        if self.step > 1 {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

macro_rules! create_fut {
    ($i:ident, $e:expr, $w:expr) => (
        let $i = Fut {
            id: $e,
            step: 0,
            wakers: $w.clone(),
        };
    )
}

fn main() {
    let wakers = Rc::new(RefCell::new(HashMap::new()));
    create_fut!(fut0, 0, wakers);
    create_fut!(fut1, 1, wakers);
    block_on(join_all(vec![fut0, fut1]));
}
But they are always polled in round-robin fashion, i.e. Fut0 -> Fut1 -> Fut0 -> Fut1 -> ...:
Fut0 at step 1
Fut1 at step 1
Waking up Fut0 from Fut1
Fut0 at step 2
Waking up Fut0 from Fut0
Fut1 at step 2
Waking up Fut1 from Fut1
It seems all of their Contexts are the same, and hence the Wakers for each Future are the same too. So waking one of them wakes the other. Is it possible to have a different Context (or Waker) for each future?
The method futures::future::join_all returns a future that polls the given futures in sequence, not in parallel. The way to look at it is that futures are nested: the executor only holds a reference to the top-most future that is scheduled (in this case, the future returned by futures::future::join_all).
This means that when the join_all future is polled, it passes the context to the nested future it's currently executing. The join_all future then passes it to the next nested future, and so on, effectively using the same context for all nested futures. This can be verified by viewing the source code of the JoinAll future in the futures crate.
The block_on executor can only execute a single future at a time. Executors such as Tokio that use thread pools can actually execute futures in parallel, and thus use different contexts for different scheduled futures (but still the same one within a JoinAll, for the reasons described above).
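To see distinct Wakers, each future has to be spawned as its own top-level task rather than nested under join_all. A minimal sketch using the single-threaded LocalPool executor from the futures crate (the interleaving logic from the question is dropped; each future just records its Waker and completes, and Waker::will_wake is used to compare them):

use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll, Waker};

struct RecordWaker {
    id: usize,
    wakers: Rc<RefCell<Vec<Waker>>>,
}

impl Future for RecordWaker {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        println!("polling future {}", self.id);
        self.wakers.borrow_mut().push(cx.waker().clone());
        Poll::Ready(())
    }
}

fn main() {
    let wakers = Rc::new(RefCell::new(Vec::new()));
    let mut pool = LocalPool::new();
    let spawner = pool.spawner();
    // Each spawn_local creates a separate top-level task, so each
    // future is polled with its own Context/Waker.
    spawner.spawn_local(RecordWaker { id: 0, wakers: wakers.clone() }).unwrap();
    spawner.spawn_local(RecordWaker { id: 1, wakers: wakers.clone() }).unwrap();
    pool.run();
    let w = wakers.borrow();
    // Prints false with this executor: the two tasks got distinct Wakers.
    println!("same waker? {}", w[0].will_wake(&w[1]));
}

Had both futures been driven through block_on(join_all(...)) instead, they would have recorded the same Waker.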

Is it safe to modify an Arc<Mutex<T>> from both a Rust thread and a foreign thread?

Are there any general rules, design documentation or something similar that explains how the Rust standard library deals with threads that were not spawned by std::thread?
I have a cdylib crate and want to use it from another language in a threaded manner:
use std::mem;
use std::sync::{Arc, Mutex};
use std::thread;

type jlong = usize;
type SharedData = Arc<Mutex<u32>>;

struct Foo {
    data: SharedData,
}

#[no_mangle]
pub fn Java_com_example_Foo_init(shared_data: &SharedData) -> jlong {
    let this = Box::into_raw(Box::new(Foo { data: shared_data.clone() }));
    this as jlong
}

#[cfg(target_pointer_width = "32")]
unsafe fn jlong_to_pointer<T>(val: jlong) -> *mut T {
    mem::transmute::<u32, *mut T>(val as u32)
}

#[cfg(target_pointer_width = "64")]
unsafe fn jlong_to_pointer<T>(val: jlong) -> *mut T {
    mem::transmute::<jlong, *mut T>(val)
}

#[no_mangle]
pub fn Java_com_example_Foo_f(this: jlong) {
    let mut this = unsafe { jlong_to_pointer::<Foo>(this).as_mut().unwrap() };
    let data = this.data.clone();
    let mut data = data.lock().unwrap();
    *data = *data + 5;
}
specifically in
let shared_data = Arc::new(Mutex::new(5));
let foo = Java_com_example_Foo_init(&shared_data);
is it safe to modify shared_data from a thread spawned by thread::spawn if Java_com_example_Foo_f will be called from an unknown JVM thread?
Possible reason why it can be bad.
Yes. The issue you linked relates to librustrt, which was removed before Rust 1.0. RFC 230, which removed librustrt, specifically notes:
When embedding Rust code into other contexts -- whether calling from C code or embedding in high-level languages -- there is a fair amount of setup needed to provide the "runtime" infrastructure that libstd relies on. If libstd was instead bound to the native threading and I/O system, the embedding setup would be much simpler.
Additionally, see PR #19654 which implemented that RFC:
When using Rust in an embedded context, it should now be possible to call a Rust function directly as a C function with absolutely no setup, though in that case panics will cause the process to abort. In this regard, the C/Rust interface will look much like the C/C++ interface.
For current documentation, the examples in the Rustonomicon's FFI chapter that show Rust code being called from C make use of libstd (including Mutex, I believe, though that's an implementation detail of println!) without any caveats about runtime setup.
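As an illustration, the question's exported functions can be exercised from arbitrary threads spawned with std::thread, which stand in for JVM threads here (a sketch reusing Java_com_example_Foo_init and Java_com_example_Foo_f from the question; the Foo box is deliberately leaked, as in the original):

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let shared_data = Arc::new(Mutex::new(5));
    let foo = Java_com_example_Foo_init(&shared_data);

    // Call into the library from two "foreign" threads; the Mutex has
    // no thread affinity, so any thread may lock it.
    let handles: Vec<_> = (0..2)
        .map(|_| thread::spawn(move || Java_com_example_Foo_f(foo)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }

    // 5 + 5 + 5: both calls observed and updated the same shared value.
    assert_eq!(*shared_data.lock().unwrap(), 15);
}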

Running interruptible Rust program that spawns threads

I am trying to write a program that spawns a bunch of threads and then joins the threads at the end. I want it to be interruptible, because my plan is to make this a constantly running program in a UNIX service.
The idea is that worker_pool will contain all the threads that have been spawned, so terminate can be called at any time to collect them.
I can't seem to find a way to use the chan_select crate to do this, because it requires me to spawn a thread first in order to spawn my child threads; once I do that, I can no longer use the worker_pool variable when joining the threads on interrupt, because it has been moved into the closure. If you comment out the line in the interrupt handler that terminates the workers, it compiles.
I'm a little frustrated, because this would be really easy to do in C. I could set up a static pointer, but when I try to do that in Rust I get an error, because I am using a vector for my threads and I can't initialize a static with an empty vector. I know it is safe to join the workers in the interrupt code, because execution stops there while waiting for the signal.
Perhaps there is a better way to do the signal handling, or maybe I'm missing something.
The error and code follow:
MacBook8088:video_ingest pjohnson$ cargo run
Compiling video_ingest v0.1.0 (file:///Users/pjohnson/projects/video_ingest)
error[E0382]: use of moved value: `worker_pool`
--> src/main.rs:30:13
|
24 | thread::spawn(move || run(sdone, &mut worker_pool));
| ------- value moved (into closure) here
...
30 | worker_pool.terminate();
| ^^^^^^^^^^^ value used here after move
<chan macros>:42:47: 43:23 note: in this expansion of chan_select! (defined in <chan macros>)
src/main.rs:27:5: 35:6 note: in this expansion of chan_select! (defined in <chan macros>)
|
= note: move occurs because `worker_pool` has type `video_ingest::WorkerPool`, which does not implement the `Copy` trait
main.rs
#[macro_use]
extern crate chan;
extern crate chan_signal;
extern crate video_ingest;

use chan_signal::Signal;
use video_ingest::WorkerPool;
use std::thread;
use std::ptr;

///
/// Starts processing
///
fn main() {
    let mut worker_pool = WorkerPool { join_handles: vec![] };

    // Signal gets a value when the OS sent a INT or TERM signal.
    let signal = chan_signal::notify(&[Signal::INT, Signal::TERM]);

    // When our work is complete, send a sentinel value on `sdone`.
    let (sdone, rdone) = chan::sync(0);

    // Run work.
    thread::spawn(move || run(sdone, &mut worker_pool));

    // Wait for a signal or for work to be done.
    chan_select! {
        signal.recv() -> signal => {
            println!("received signal: {:?}", signal);
            worker_pool.terminate(); // <-- Comment out to compile
        },
        rdone.recv() => {
            println!("Program completed normally.");
        }
    }
}

fn run(sdone: chan::Sender<()>, worker_pool: &mut WorkerPool) {
    loop {
        worker_pool.ingest();
        worker_pool.terminate();
    }
}
lib.rs
extern crate libc;

use std::thread;
use std::thread::JoinHandle;
use std::os::unix::thread::JoinHandleExt;
use libc::pthread_join;
use libc::c_void;
use std::ptr;
use std::time::Duration;

pub struct WorkerPool {
    pub join_handles: Vec<JoinHandle<()>>
}

impl WorkerPool {
    ///
    /// Does the actual ingestion
    ///
    pub fn ingest(&mut self) {
        // Use 9 threads for an example.
        for i in 0..10 {
            self.join_handles.push(
                thread::spawn(move || {
                    // Get the videos
                    println!("Getting videos for thread {}", i);
                    thread::sleep(Duration::new(5, 0));
                })
            );
        }
    }

    ///
    /// Joins all threads
    ///
    pub fn terminate(&mut self) {
        println!("Total handles: {}", self.join_handles.len());
        for handle in &self.join_handles {
            println!("Joining thread...");
            unsafe {
                let mut state_ptr: *mut *mut c_void = 0 as *mut *mut c_void;
                pthread_join(handle.as_pthread_t(), state_ptr);
            }
        }
        self.join_handles = vec![];
    }
}
terminate can be called at any time to collect them.
I don't want to stop the threads; I want to collect them with join. I agree stopping them would not be a good idea.
These two statements don't make sense to me. You can only join a thread once it's complete. The words "interruptible" and "at any time" suggest that you want to be able to stop a thread while it is still doing some processing. Which behavior do you want?
If you want to be able to stop a thread that has partially completed, you have to enhance your code to check if it should exit early. This is usually complicated by the fact that you are doing some big computation that you don't have control over. Ideally, you break that up into chunks and check your exit flag frequently. For example, with video work, you could check every frame. Then the response delay is roughly the time to process a frame.
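A sketch of that shape for per-frame work (Frame and process are hypothetical placeholders, not part of the question's code):

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

struct Frame; // hypothetical stand-in for one decoded video frame

fn process(_frame: Frame) {
    // expensive per-frame work would happen here
}

fn process_frames(frames: Vec<Frame>, please_stop: &AtomicBool) {
    for frame in frames {
        // Poll the flag once per frame, so the response delay is
        // bounded by the cost of processing a single frame.
        if please_stop.load(Ordering::SeqCst) {
            return;
        }
        process(frame);
    }
}

fn main() {
    let please_stop = Arc::new(AtomicBool::new(false));
    process_frames(vec![Frame, Frame], &please_stop);
}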
this would be really easy to do in C.
This would be really easy to do incorrectly. For example, the code as presented attempts to mutate the pool from two different threads without any kind of synchronization. That's a sure-fire recipe for broken, hard-to-debug code.
// Use 9 threads for an example.
0..10 creates 10 threads.
Anyway, it seems like the missing piece of knowledge is Arc and Mutex. Arc allows sharing ownership of a single item between threads, and Mutex allows for run-time mutable borrowing between threads.
#[macro_use]
extern crate chan;
extern crate chan_signal;

use chan_signal::Signal;
use std::thread::{self, JoinHandle};
use std::sync::{Arc, Mutex};

fn main() {
    let worker_pool = Arc::new(Mutex::new(WorkerPool::new()));

    let signal = chan_signal::notify(&[Signal::INT, Signal::TERM]);
    let (work_done_tx, work_done_rx) = chan::sync(0);

    let worker_pool_clone = worker_pool.clone();
    thread::spawn(move || run(work_done_tx, worker_pool_clone));

    // Wait for a signal or for work to be done.
    chan_select! {
        signal.recv() -> signal => {
            println!("received signal: {:?}", signal);
            let mut pool = worker_pool.lock().expect("Unable to lock the pool");
            pool.terminate();
        },
        work_done_rx.recv() => {
            println!("Program completed normally.");
        }
    }
}

fn run(_work_done_tx: chan::Sender<()>, worker_pool: Arc<Mutex<WorkerPool>>) {
    loop {
        let mut worker_pool = worker_pool.lock().expect("Unable to lock the pool");
        worker_pool.ingest();
        worker_pool.terminate();
    }
}

pub struct WorkerPool {
    join_handles: Vec<JoinHandle<()>>,
}

impl WorkerPool {
    pub fn new() -> Self {
        WorkerPool {
            join_handles: vec![],
        }
    }

    pub fn ingest(&mut self) {
        self.join_handles.extend(
            (0..10).map(|i| {
                thread::spawn(move || {
                    println!("Getting videos for thread {}", i);
                })
            })
        )
    }

    pub fn terminate(&mut self) {
        for handle in self.join_handles.drain(..) {
            handle.join().expect("Unable to join thread")
        }
    }
}
Beware that the program logic itself is still poor; even though an interrupt is sent, the loop in run continues to execute. The main thread will lock the mutex, join all the current threads¹, unlock the mutex and exit the program. However, the loop can lock the mutex before the main thread has exited and start processing some new data! And then the program exits right in the middle of processing. It's almost the same as if you didn't handle the interrupt at all.
¹ Haha, tricked you! There are no running threads at that point. Since the mutex is locked for the entire loop, the only time another lock can be made is when the loop is resetting. However, since the last instruction in the loop is to join all the threads, there won't be any more running.
I don't want to let the program terminate before all threads have completed.
Perhaps it's an artifact of the reduced problem, but I don't see how the infinite loop can ever exit, so the "I'm done" channel seems superfluous.
I'd probably just add a flag that says "please stop" when an interrupt is received. Then I'd check that instead of the infinite loop and wait for the running thread to finish before exiting the program.
use std::sync::atomic::{AtomicBool, Ordering};

// (This builds on the previous listing and reuses its imports and the
// WorkerPool type.)

fn main() {
    let worker_pool = WorkerPool::new();

    let signal = chan_signal::notify(&[Signal::INT, Signal::TERM]);

    let please_stop = Arc::new(AtomicBool::new(false));
    let threads_please_stop = please_stop.clone();

    let runner = thread::spawn(|| run(threads_please_stop, worker_pool));

    // Wait for a signal
    chan_select! {
        signal.recv() -> signal => {
            println!("received signal: {:?}", signal);
            please_stop.store(true, Ordering::SeqCst);
        },
    }

    runner.join().expect("Unable to join runner thread");
}

fn run(please_stop: Arc<AtomicBool>, mut worker_pool: WorkerPool) {
    while !please_stop.load(Ordering::SeqCst) {
        worker_pool.ingest();
        worker_pool.terminate();
    }
}
