How can I change the number of threads Rayon uses?

How can I change the number of threads Rayon uses? - multithreading

I'm using the Rayon library:
extern crate rayon;
const N: usize = 1_000_000_000;
const W: f64 = 1f64/(N as f64);
fn f(x: f64) -> f64 {
4.0/(1.0+x*x)
}
fn main() {
use rayon::prelude::*;
let sum : f64 = (0..N)
.into_par_iter()
.map(|i| f(W*((i as f64)+0.5)))
.sum::<f64>();
println!("pi = {}", W*sum);
}
I want to run this code using different number of threads: 1, 2, 3 and 4.
I have read the documentation about How many threads will Rayon spawn? which says:
By default, Rayon uses the same number of threads as the number of CPUs available. Note that on systems with hyperthreading enabled this equals the number of logical cores and not the physical ones.
If you want to alter the number of threads spawned, you can set the environmental variable RAYON_NUM_THREADS to the desired number of threads or use the ThreadPoolBuilder::build_global function method.
However, the steps are not clear for me. How can I do this on my Windows 10 PC?

Just include in fn main(). The num_threads accepts the number of threads.
rayon::ThreadPoolBuilder::new().num_threads(4).build_global().unwrap();

If you don't want to set a global, you can create a function called 'create_pool'. This helper function constructs a Rayon ThreadPool object from num_threads.
pub fn create_pool(num_threads: usize) -> Result<rayon::ThreadPool, YOURERRORENUM> {
match rayon::ThreadPoolBuilder::new()
.num_threads(num_threads)
.build()
{
Err(e) => Err(e.into()),
Ok(pool) => Ok(pool),
}
}
Then call your code from inside this create_pool. This will limit all Rayon functions to the num_threads you set.
[...]
create_pool(num_threads)?.install(|| {
YOURCODE
})?;
[...]
For more see https://towardsdatascience.com/nine-rules-for-writing-python-extensions-in-rust-d35ea3a4ec29

Related

Different Context or Waker for each Future in Async Rust

I am trying to understand how polling works in a Async Rust Future. Using this following code, I tried to run two futures Fut0 and Fut1, such that they interleave as following Fut0 -> Fut1 -> Fut0 -> Fut0.
extern crate futures; // 0.3.1
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};
use std::cell::RefCell;
use std::rc::Rc;
use std::collections::HashMap;
use futures::executor::block_on;
use futures::future::join_all;
#[derive(Default, Debug)]
struct Fut {
id: usize,
step: usize,
wakers: Rc<RefCell<HashMap<usize, Waker>>>,
}
impl Future for Fut {
type Output = ();
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
self.step += 1;
println!("Fut{} at step {}", self.id, self.step);
{
let mut wakers = self.wakers.borrow_mut();
wakers.insert(self.id, cx.waker().clone());
}
{
let next_id = (self.id + self.step) % 2;
let wakers = self.wakers.borrow();
if let Some(w) = wakers.get(&next_id) {
println!("Waking up Fut{} from Fut{}", next_id, self.id);
w.wake_by_ref();
}
}
if self.step > 1 {
Poll::Ready(())
} else {
Poll::Pending
}
}
}
macro_rules! create_fut {
($i:ident, $e:expr, $w:expr) => (
let $i = Fut {
id: $e,
step: 0,
wakers: $w.clone(),
};
)
}
fn main() {
let wakers = Rc::new(RefCell::new(HashMap::new()));
create_fut!(fut0, 0, wakers);
create_fut!(fut1, 1, wakers);
block_on(join_all(vec![fut0, fut1]));
}
But they are always being polled in round robin fashion i.e. Fut0 -> Fut1 -> Fut0 -> Fut1 -> ....
Fut0 at step 1
Fut1 at step 1
Waking up Fut0 from Fut1
Fut0 at step 2
Waking up Fut0 from Fut0
Fut1 at step 2
Waking up Fut1 from Fut1
It seems, all of their Contexts are same, hence the Wakers for the each Futures are same too. So waking one of them wakes the other. Is it possible to have different Context(or Waker) for each future?

The method futures::future::join_all returns a future that polls the given futures in sequence, instead of in parallel. The way you should look at it, is that futures are nested and the executor will only have a reference to the top-most future that is scheduled (in this case the future returned by futures::future::join_all).
This means that when the join_all future is polled, it passes the context to the nested future its currently executing. Thereafter the join_all future will pass it to the next nested future and so on. Effectively using the same context for all nested futures. This can be verified by viewing the source code of the JoinAll future in the futures crate.
The block_on executor can only execute a single future at the time. Executors such as tokio that use thread pools can actually execute futures in parallel, and thus will use different contexts for different scheduled futures (But still the same one for JoinAll futures for the reasons described above).

How to create a pool of contigous memory across multiple structs in Rust?

I'm trying to create a set of structs in Rust that use a contiguous block of memory. E.g:
<------------ Memory Pool -------------->
[ Child | Child | Child | Child ]
These structs:
may each contain slices of the pool of different sizes
should allow access to their slice of the pool without any blocking operations once initialized (I intend to access them on an audio thread).
I'm very new to Rust but I'm well versed in C++ so the main hurdle thus far has been working with the ownership semantics - I'm guessing there's a trivial way to achieve this (without using unsafe) but the solution isn't clear to me. I've written a small (broken) example of what I'm trying to do:
pub struct Child<'a> {
pub slice: &'a mut [f32],
}
impl Child<'_> {
pub fn new<'a>(s: &mut [f32]) -> Child {
Child {
slice: s,
}
}
}
pub struct Parent<'a> {
memory_pool: Vec<f32>,
children: Vec<Child<'a>>,
}
impl Parent<'_> {
pub fn new<'a>() -> Parent<'a> {
const SIZE: usize = 100;
let p = vec![0f32; SIZE];
let mut p = Parent {
memory_pool: p,
children: Vec::new(),
};
// Two children using different parts of the memory pool:
let (lower_pool, upper_pool) = p.memory_pool.split_at_mut(SIZE / 2);
p.children = vec!{ Child::new(lower_pool), Child::new(upper_pool) };
return p; // ERROR - p.memory_pool is borrowed 2 lines earlier
}
}
I would prefer a solution that doesn't involve unsafe but I'm not entirely opposed to using it. Any suggestions would be very much appreciated, as would any corrections on how I'm (mis?)using Rust in my example.

Yes, it's currently impossible (or quite hard) in Rust to contain references into sibling data, for example, as you have here, a Vec and slices into that Vec as fields in the same struct. Depending on the architecture of your program, you might solve this by storing the original Vec at some higher-level of your code (for example, it could live on the stack in main() if you're not writing a library) and the slice references at some lower level in a way that the compiler can clearly infer it won't go out of scope before the Vec (doing it in main() after the Vec has been instantiated could work, for example).

This is the perfect use case for an arena allocator. There are quite a few. The following demonstration uses bumpalo:
//# bumpalo = "2.6.0"
use bumpalo::Bump;
use std::mem::size_of;
struct Child1(u32, u32);
struct Child2(f32, f32, f32);
fn main() {
let arena = Bump::new();
let c1 = arena.alloc(Child1(1, 2));
let c2 = arena.alloc(Child2(1.0, 2.0, 3.0));
let c3 = arena.alloc(Child1(10, 11));
// let's verify that they are indeed continuous in memory
let ptr1 = c1 as *mut _ as usize;
let ptr2 = c2 as *mut _ as usize;
let ptr3 = c3 as *mut _ as usize;
assert_eq!(ptr1 + size_of::<Child1>(), ptr2);
assert_eq!(ptr1 + size_of::<Child1>() + size_of::<Child2>(), ptr3);
}
There are caveats, too. The main concern is of course the alignment; there may be some padding between two consecutive allocations. It is up to you to make sure that doesn't gonna happen if it is a deal breaker.
The other is allocator specific. The bumpalo arena allocator used here, for example, doesn't drop object when itself gets deallocated.
Other than that, I do believe a higher level abstraction like this will benefit your project. Otherwise, it'll just be pointer manipulating c/c++ disguised as rust.

Why do my Futures not max out the CPU?

I am creating a few hundred requests to download the same file (this is a toy example). When I run the equivalent logic with Go, I get 200% CPU usage and return in ~5 seconds w/ 800 reqs. In Rust with only 100 reqs, it takes nearly 5 seconds and spawns 16 OS threads with 37% CPU utilization.
Why is there such a difference?
From what I understand, if I have a CpuPool managing Futures across N cores, this is functionally what the Go runtime/goroutine combo is doing, just via fibers instead of futures.
From the perf data, it seems like I am only using 1 core despite the ThreadPoolExecutor.
extern crate curl;
extern crate fibers;
extern crate futures;
extern crate futures_cpupool;
use std::io::{Write, BufWriter};
use curl::easy::Easy;
use futures::future::*;
use std::fs::File;
use futures_cpupool::CpuPool;
fn make_file(x: i32, data: &mut Vec<u8>) {
let f = File::create(format!("./data/{}.txt", x)).expect("Unable to open file");
let mut writer = BufWriter::new(&f);
writer.write_all(data.as_mut_slice()).unwrap();
}
fn collect_request(x: i32, url: &str) -> Result<i32, ()> {
let mut data = Vec::new();
let mut easy = Easy::new();
easy.url(url).unwrap();
{
let mut transfer = easy.transfer();
transfer
.write_function(|d| {
data.extend_from_slice(d);
Ok(d.len())
})
.unwrap();
transfer.perform().unwrap();
}
make_file(x, &mut data);
Ok(x)
}
fn main() {
let url = "https://en.wikipedia.org/wiki/Immanuel_Kant";
let pool = CpuPool::new(16);
let output_futures: Vec<_> = (0..100)
.into_iter()
.map(|ind| {
pool.spawn_fn(move || {
let output = collect_request(ind, url);
output
})
})
.collect();
// println!("{:?}", output_futures.Item());
for i in output_futures {
i.wait().unwrap();
}
}
My equivalent Go code

From what I understand, if I have a CpuPool managing Futures across N cores, this is functionally what the Go runtime/goroutine combo is doing, just via fibers instead of futures.
This is not correct. The documentation for CpuPool states, emphasis mine:
A thread pool intended to run CPU intensive work.
Downloading a file is not CPU-bound, it's IO-bound. All you have done is spin up many threads then told each thread to block while waiting for IO to complete.
Instead, use tokio-curl, which adapts the curl library to the Future abstraction. You can then remove the threadpool completely. This should drastically improve your throughput.

Is it safe to modify an Arc<Mutex<T>> from both a Rust thread and a foreign thread?

Are there any general rules, design documentation or something similar that explains how the Rust standard library deals with threads that were not spawned by std::thread?
I have a cdylib crate and want to use it from another language in a threaded manner:
use std::mem;
use std::sync::{Arc, Mutex};
use std::thread;
type jlong = usize;
type SharedData = Arc<Mutex<u32>>;
struct Foo {
data: SharedData,
}
#[no_mangle]
pub fn Java_com_example_Foo_init(shared_data: &SharedData) -> jlong {
let this = Box::into_raw(Box::new(Foo { data: shared_data.clone() }));
this as jlong
}
#[cfg(target_pointer_width = "32")]
unsafe fn jlong_to_pointer<T>(val: jlong) -> *mut T {
mem::transmute::<u32, *mut T>(val as u32)
}
#[cfg(target_pointer_width = "64")]
unsafe fn jlong_to_pointer<T>(val: jlong) -> *mut T {
mem::transmute::<jlong, *mut T>(val)
}
#[no_mangle]
pub fn Java_com_example_Foo_f(this: jlong) {
let mut this = unsafe { jlong_to_pointer::<Foo>(this).as_mut().unwrap() };
let data = this.data.clone();
let mut data = data.lock().unwrap();
*data = *data + 5;
}
specifically in
let shared_data = Arc::new(Mutex::new(5));
let foo = Java_com_example_Foo_init(&shared_data);
is it safe to modify shared_data from a thread spawned by thread::spawn if Java_com_example_Foo_f will be called from an unknown JVM thread?
Possible reason why it can be bad.

Yes. The issue you linked relates to librustrt, which was removed before Rust 1.0. RFC 230, which removed librustrt, specifically notes:
When embedding Rust code into other contexts -- whether calling from C code or embedding in high-level languages -- there is a fair amount of setup needed to provide the "runtime" infrastructure that libstd relies on. If libstd was instead bound to the native threading and I/O system, the embedding setup would be much simpler.
Additionally, see PR #19654 which implemented that RFC:
When using Rust in an embedded context, it should now be possible to call a Rust function directly as a C function with absolutely no setup, though in that case panics will cause the process to abort. In this regard, the C/Rust interface will look much like the C/C++ interface.
For current documentation, the Rustonomicon chapter on FFI's examples of Rust code to be called from C make use of libstd (including Mutex, I believe, though that's an implementation detail of println!) without any caveats relating to runtime setup.

thread '<main>' has overflowed its stack in Rust

I got an error trying this code, which realizes a simple linked list.
use std::rc::Rc;
use std::cell::RefCell;
struct Node {
a : Option<Rc<RefCell<Node>>>,
value: i32
}
impl Node {
fn new(value: i32) -> Rc<RefCell<Node>> {
let node = Node {
a: None,
value: value
};
Rc::new(RefCell::new(node))
}
}
fn main() {
let first = Node::new(0);
let mut t = first.clone();
for i in 1 .. 10_000
{
if t.borrow().a.is_none() {
t.borrow_mut().a = Some(Node::new(i));
}
if t.borrow().a.is_some() {
t = t.borrow().a.as_ref().unwrap().clone();
}
}
println!("Done!");
}
Why does it happen? Does this mean that Rust is not as safe as positioned?
UPD:
If I add this method, the program does not crash.
impl Drop for Node {
fn drop(&mut self) {
let mut children = mem::replace(&mut self.a, None);
loop {
children = match children {
Some(mut n) => mem::replace(&mut n.borrow_mut().a, None),
None => break,
}
}
}
}
But I am not sure that this is the right solution.

Does this mean that Rust is not as safe as positioned?
Rust is only safe against certain kinds of failures; specifically memory corrupting crashes, which are documented here: http://doc.rust-lang.org/reference.html#behavior-considered-undefined
Unfortunately there is a tendency to sometimes expect rust to be more robust against certain sorts of failures that are not memory corrupting. Specifically, you should read http://doc.rust-lang.org/reference.html#behavior-considered-undefined.
tldr; In rust, many things can cause a panic. A panic will cause the current thread to halt, performing shutdown operations.
This may superficially appear similar to a memory corrupting crash from other languages, but it is important to understand although it is an application failure, it is not a memory corrupting failure.
For example, you can treat panic's like exceptions by running actions in a different thread and gracefully handling failure when the thread panics (for whatever reason).
In this specific example, you're using up too much memory on the stack.
This simple example will also fail:
fn main() {
let foo:&mut [i8] = &mut [1i8; 1024 * 1024];
}
(On most rustc; depending on the stack size on that particularly implementation)
I would have thought that moving your allocations to the stack using Box::new() would fix it in this example...
use std::rc::Rc;
use std::cell::RefCell;
#[derive(Debug)]
struct Node {
a : Option<Box<Rc<RefCell<Node>>>>,
value: i32
}
impl Node {
fn new(value: i32) -> Box<Rc<RefCell<Node>>> {
let node = Node {
a: None,
value: value
};
Box::new(Rc::new(RefCell::new(node)))
}
}
fn main() {
let first = Node::new(0);
let mut t = first.clone();
for i in 1 .. 10000
{
if t.borrow().a.is_none() {
t.borrow_mut().a = Some(Node::new(i));
}
if t.borrow().a.is_some() {
let c:Box<Rc<RefCell<Node>>>;
{ c = t.borrow().a.as_ref().unwrap().clone(); }
t = c;
println!("{:?}", t);
}
}
println!("Done!");
}
...but it doesn't. I don't really understand why, but hopefully someone else can look at this and post a more authoritative answer about what exactly is causing stack exhaustion in your code.

For those who come here and are specifically interested in the case where the large struct is a contiguous chunk of memory (instead of a tree of boxes), I found this GitHub issue with further discussion, as well as a solution that worked for me:
https://github.com/rust-lang/rust/issues/53827
Vec's method into_boxed_slice() returns a Box<[T]>, and does not overflow the stack for me.
vec![-1; 3000000].into_boxed_slice()
A note of difference with the vec! macro and array expressions from the docs:
This will use clone to duplicate an expression, so one should be careful using this with types having a nonstandard Clone implementation.
There is also the with_capacity() method on Vec, which is shown in the into_boxed_slice() examples.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How can I change the number of threads Rayon uses? - multithreading

Just include in fn main(). The num_threads accepts the number of threads. rayon::ThreadPoolBuilder::new().num_threads(4).build_global().unwrap();

Related

Different Context or Waker for each Future in Async Rust

How to create a pool of contigous memory across multiple structs in Rust?

Why do my Futures not max out the CPU?

Is it safe to modify an Arc<Mutex<T>> from both a Rust thread and a foreign thread?

thread '<main>' has overflowed its stack in Rust

Categories

Resources