How to have seedable RNG in parallel in rust

How to have seedable RNG in parallel in rust - rust

I am learning rust by implementing a raytracer. I have a working prototype that is single threaded and I am trying to make it multithreaded.
In my code, I have a sampler which is basically a wrapper around StdRng::seed_from_u64(123) (this will change when I will add different types of samplers) that is mutable because of StdRNG. I need to have a repeatable behaviour that is why i am seeding the random number generator.
In my rendering loop I use the sampler in the following way
let mut sampler = create_sampler(&self.sampler_value);
let sample_count = sampler.sample_count();
println!("Rendering ...");
let progress_bar = get_progress_bar(image.size());
// Generate multiple rays for each pixel in the image
for y in 0..image.size_y {
for x in 0..image.size_x {
image[(x, y)] = (0..sample_count)
.into_iter()
.map(|_| {
let pixel = Vec2::new(x as f32, y as f32) + sampler.next2f();
let ray = self.camera.generate_ray(&pixel);
self.integrator.li(self, &mut sampler, &ray)
})
.sum::<Vec3>()
/ (sample_count as f32);
progress_bar.inc(1);
}
}
When I replace into_iter by par_into_iter the compiler tells me cannot borrow sampler as mutable, as it is a captured variable in a Fn closure
What should I do in this situation?
Thanks!
P.s. If it is of any use, this is the repo : https://github.com/jgsimard/rustrt

Even if Rust wasn't stopping you, you cannot just use a seeded PRNG with parallelism and get a reproducible result out.
Think about it this way: a PRNG with a certain seed/state produces a certain sequence of numbers. Reproducibility (determinism) requires not just that the numbers are the same, but that the way they are taken from the sequence is the same. But if you have multiple threads computing different pixels (different uses) which are racing with each other to fetch numbers from the single PRNG, then the pixels will fetch different numbers on different runs.
In order to get the determinism you want, you must deterministically choose which random number is used for which purpose.
One way to do this would be to make up an “image” of random numbers, computed sequentially, and pass that to the parallel loop. Then each ray has its own random number, which it can use as its seed for another PRNG that only that ray uses.
Another way that can be much more efficient and usable (because it doesn't require any sequentiality at all) is to use hash functions instead of PRNGs. Whenever you want a random number, use a hash function (like those which implement the std::hash::Hasher trait in Rust, but not necessarily the particular one std provides since it's not the fastest) to combine a bunch of information, like
the seed value
the pixel x and y location
which bounce or secondary ray of this pixel you're computing
into a single value which you can use as a pseudorandom number. This way, the “random” results are the same for the same circumstances (because you explicitly specfied that it should be computed from them) even if some other part of the program execution changes (whether that's a code change or a thread scheduling decision by the OS).

Your sampler is not thread-safe, if only because it is a &mut Sampler and mutable references cannot be shared between threads, obviously.
The easy thing would be to wrap it into an Arc<Mutex<Sampler>> and clone it to every closure. Something like (untested):
let sampler = Arc::new(Mutex::new(create_sampler(&self.sampler_value)));
//...
for y in 0..image.size_y {
for x in 0..image.size_x {
image[(x, y)] = (0..sample_count)
.par_into_iter()
.map({
let sampler = Arc::clone(sampler);
move |_| {
let mut sampler = sampler.lock().unwrap();
// use the sampler
}
})
.sum::<Vec3>() //...
But that may not be very efficient way because the mutex will be locked most of the time, and you will kill the paralellism. You may try locking/unlocking the mutex during the ray tracing and see if it improves.
The ideal solution would be to make the Sampler thread-safe and inner mutable, so that the next2f and friends do not need the &mut self part (Sampler::next2f(&self)). Again, the easiest way is having an internal mutex.
Or you can try going lock-less! I mean, your current implementation of that function is:
fn next2f(&mut self) -> Vec2 {
self.current_dimension += 2;
Vec2::new(self.rng.gen(), self.rng.gen())
}
You could replace the current_dimension with an AtomicI32 and the rng with a rand::thread_rng (also untested):
fn next2f(&self) -> Vec2 {
self.current_dimension.fetch_add(2, Ordering::SeqCst);
let mut rng = rand::thread_rng();
Vec2::new(rng.gen(), rng.gen())
}

This is the way I did it. I used this ressource : rust-random.github.io/book/guide-parallel.html. So I used ChaCha8Rng with the set_stream function to get seedable PRNG in parallel. I had to put the image[(x, y)] outside of the iterator because into_par_iter does not allow mutable borrow inside a closure. If you see something dumb in my solution, please tell me!
let size_x = image.size_x;
let img: Vec<Vec<Vec3>> = (0..image.size_y)
.into_par_iter()
.map(|y| {
(0..image.size_x)
.into_par_iter()
.map(|x| {
let mut rng = ChaCha8Rng::seed_from_u64(sampler.seed());
rng.set_stream((y * size_x + x) as u64);
let v = (0..sample_count)
.into_iter()
.map(|_| {
let pixel = Vec2::new(x as f32, y as f32) + sampler.next2f(&mut rng);
let ray = self.camera.generate_ray(&pixel);
self.integrator.li(self, &sampler, &mut rng, &ray)
})
.sum::<Vec3>()
/ (sample_count as f32);
progress_bar.inc(1);
v
}).collect()
}).collect();
for (y, row) in img.into_iter().enumerate() {
for (x, p) in row.into_iter().enumerate() {
image[(x, y)] = p;
}
}

Related

Is there a zero-copy way to find the intersection of an arbitrary number of sets?

Here is a simple example demonstrating what I'm trying to do:
use std::collections::HashSet;
fn main() {
let mut sets: Vec<HashSet<char>> = vec![];
let mut set = HashSet::new();
set.insert('a');
set.insert('b');
set.insert('c');
set.insert('d');
sets.push(set);
let mut set = HashSet::new();
set.insert('a');
set.insert('b');
set.insert('d');
set.insert('e');
sets.push(set);
let mut set = HashSet::new();
set.insert('a');
set.insert('b');
set.insert('f');
set.insert('g');
sets.push(set);
// Simple intersection of two sets
let simple_intersection = sets[0].intersection(&sets[1]);
println!("Intersection of 0 and 1: {:?}", simple_intersection);
let mut iter = sets.iter();
let base = iter.next().unwrap().clone();
let intersection = iter.fold(base, |acc, set| acc.intersection(set).map(|x| x.clone()).collect());
println!("Intersection of all: {:?}", intersection);
}
This solution uses fold to "accumulate" the intersection, using the first element as the initial value.
Intersections are lazy iterators which iterate through references to the involved sets. Since the accumulator has to have the same type as the first element, we have to clone each set's elements. We can't make a set of owned data from references without cloning. I think I understand this.
For example, this doesn't work:
let mut iter = sets.iter();
let mut base = iter.next().unwrap();
let intersection = iter.fold(base, |acc, set| acc.intersection(set).collect());
println!("Intersection of all: {:?}", intersection);
error[E0277]: a value of type `&HashSet<char>` cannot be built from an iterator over elements of type `&char`
--> src/main.rs:41:73
|
41 | let intersection = iter.fold(base, |acc, set| acc.intersection(set).collect());
| ^^^^^^^ value of type `&HashSet<char>` cannot be built from `std::iter::Iterator<Item=&char>`
|
= help: the trait `FromIterator<&char>` is not implemented for `&HashSet<char>`
Even understanding this, I still don't want to clone the data. In theory it shouldn't be necessary, I have the data in the original vector, I should be able to work with references. That would speed up my algorithm a lot. This is a purely academic pursuit, so I am interested in getting it to be as fast as possible.
To do this, I would need to accumulate in a HashSet<&char>s, but I can't do that because I can't intersect a HashSet<&char> with a HashSet<char> in the closure. So it seems like I'm stuck. Is there any way to do this?
Alternatively, I could make a set of references for each set in the vector, but that doesn't really seem much better. Would it even work? I might run into the same problem but with double references instead.
Finally, I don't actually need to retain the original data, so I'd be okay moving the elements into the accumulator set. I can't figure out how to make this happen, since I have to go through intersection which gives me references.
Are any of the above proposals possible? Is there some other zero copy solution that I'm not seeing?

Finally, I don't actually need to retain the original data.
This makes it really easy.
First, optionally sort the sets by size. Then:
let (intersection, others) = sets.split_at_mut(1);
let intersection = &mut intersection[0];
for other in others {
intersection.retain(|e| other.contains(e));
}

You can do it in a fully lazy way using filter and all:
sets[0].iter().filter (move |c| sets[1..].iter().all (|s| s.contains (c)))
Playground

Finally, I don't actually need to retain the original data, so I'd be okay moving the elements into the accumulator set.
The retain method will work perfectly for your requirements then:
fn intersection(mut sets: Vec<HashSet<char>>) -> HashSet<char> {
if sets.is_empty() {
return HashSet::new();
}
if sets.len() == 1 {
return sets.pop().unwrap();
}
let mut result = sets.pop().unwrap();
result.retain(|item| {
sets.iter().all(|set| set.contains(item))
});
result
}
playground

How can I use Rayon to split a big range into chunks of ranges and have each thread find within a chunk?

I am making a program that brute forces a password by parallelization. At the moment the password to crack is already available as plain text, I'm just attempting to brute force it anyway.
I have a function called generate_char_array() which, based on an integer seed, converts base and returns a u8 slice of characters to try and check. This goes through the alphabet first for 1 character strings, then 2, etc.
let found_string_index = (0..1e12 as u64).into_par_iter().find_any(|i| {
let mut array = [0u8; 20];
let bytes = generate_char_array(*i, &mut array);
return &password_bytes == &bytes;
});
With the found string index (or seed integer rather), I can generate the found string.
The problem is that the way Rayon parallelizes this for me is split the arbitrary large integer range into thread_count-large slices (e.g. for 4 threads, 0..2.5e11, 2.5e11..5e11 etc). This is not good, because the end of the range is for arbitrarily super large password lengths (10+, I don't know), whereas most passwords (including the fixed "zzzzz" I tend to try) are much shorter, and thus what I get is that the first thread does all the work, and the rest of the threads just waste time testing way too long passwords and synchronizing; getting actually slower than single thread performance as a result.
How could I instead split the arbitrary big range (doesn't have to have an end actually) into chunks of ranges and have each thread find within chunks? That would make the workers in different threads actually useful.

This goes through the alphabet first for 1 character strings, then 2
You wish to impose some sequencing on your data processing, but the whole point of Rayon is to go in parallel.
Instead, use regular iterators to sequentially go up in length and then use parallel iterators inside a specific length to quickly process all of the values of that length.
Since you haven't provided enough code for a runnable example, I've made this rough approximation to show the general shape of such a solution:
extern crate rayon;
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};
use std::ops::RangeInclusive;
type Seed = u8;
const LENGTHS: RangeInclusive<usize> = 1..=3;
const SEEDS: RangeInclusive<Seed> = 0..=std::u8::MAX;
fn find<F>(test_password: F) -> Option<(usize, Seed)>
where
F: Fn(usize, Seed) -> bool + Sync,
{
// Rayon doesn't support RangeInclusive yet
let seeds: Vec<_> = SEEDS.collect();
// Step 1-by-1 through the lengths, sequentially
LENGTHS.flat_map(|length| {
// In parallel, investigate every value in this length
// This doesn't do that, but it shows how the parallelization
// would be introduced
seeds
.par_iter()
.find_any(|&&seed| test_password(length, seed))
.map(|&seed| (length, seed))
}).next()
}
fn main() {
let pass = find(|l, s| {
println!("{}, {}", l, s);
// Actually generate and check the password based on the search criteria
l == 3 && s == 250
});
println!("Found password length and seed: {:?}", pass);
}
This can "waste" a little time at the end of each length as the parallel threads spin down one-by-one before spinning back up for the next length, but that seems unlikely to be a primary concern.

This is a version of what I suggested in my comment.
The main loop is parallel and is only over the first byte of each attempt. For each first byte, do the full brute force search for the remainder.
let matched_bytes = (0 .. 0xFFu8).into_par_iter().filter_map(|n| {
let mut array = [0u8; 8];
// the first digit is always the same in this run
array[0] = n;
// The highest byte is 0 because it's provided from the outer loop
(0 ..= 0x0FFFFFFFFFFFFFFF as u64).into_iter().filter_map(|i| {
// pass a slice so that the first byte is not affected
generate_char_array(i, &mut array[1 .. 8]);
if &password_bytes[..] == &array[0 .. password_bytes.len()] {
Some(array.clone())
} else {
None
}
}).next()
}).find_any(|_| true);
println!("found = {:?}", matched_bytes);
Also, even for a brute force method, this is probably highly inefficient still!

If Rayon splits the slices as you described, then apply simple math to balance the password lengths:
let found_string_index = (0..max_val as u64).into_par_iter().find_any(|i| {
let mut array = [0u8; 20];
let v = i/span + (i%span) * num_cpu;
let bytes = generate_char_array(*v, &mut array);
return &password_bytes == &bytes;
});
The span value depends on the number of CPUs (the number of threads used by Rayon), in your case:
let num_cpu = 4;
let span = 2.5e11 as u64;
let max_val = span * num_cpu;
Note the performance of this approach is highly dependent on how Rayon performs the split of the sequence on parallel threads. Verify that it works as you reported in the question.

Why doesn't this compile - use of undeclared type name `thread::scoped`

I'm trying to get my head around Rust. I've got an alpha version of 1.
Here's the problem I'm trying to program: I have a vector of floats. I want to set up some threads asynchronously. Each thread should wait for the number of seconds specified by each element of the vector, and return the value of the element, plus 10. The results need to be in input order.
It's an artificial example, to be sure, but I wanted to see if I could implement something simple before moving onto more complex code. Here is my code so far:
use std::thread;
use std::old_io::timer;
use std::time::duration::Duration;
fn main() {
let mut vin = vec![1.4f64, 1.2f64, 1.5f64];
let mut guards: Vec<thread::scoped> = Vec::with_capacity(3);
let mut answers: Vec<f64> = Vec::with_capacity(3);
for i in 0..3 {
guards[i] = thread::scoped( move || {
let ms = (1000.0f64 * vin[i]) as i64;
let d = Duration::milliseconds(ms);
timer::sleep(d);
println!("Waited {}", vin[i]);
answers[i] = 10.0f64 + (vin[i] as f64);
})};
for i in 0..3 {guards[i].join(); };
for i in 0..3 {println!("{}", vin[i]); }
}
So the input vector is [1.4, 1.2, 1.5], and I'm expecting the output vector to be [11.4, 11.2, 11.5].
There appear to be a number of problems with my code, but the first one is that I get a compilation error:
threads.rs:7:25: 7:39 error: use of undeclared type name `thread::scoped`
threads.rs:7 let mut guards: Vec<thread::scoped> = Vec::with_capacity(3);
^~~~~~~~~~~~~~
error: aborting due to previous error
There also seem to be a number of other problems, including using vin within a closure. Also, I have no idea what move does, other than the fact that every example I've seen seems to use it.

Your error is due to the fact that thread::scoped is a function, not a type. What you want is a Vec<T> where T is the result type of the function. Rust has a neat feature that helps you here: It automatically detects the correct type of your variables in many situations.
If you use
let mut guards = Vec::with_capacity(3);
the type of guards will be chosen when you use .push() the first time.
There also seem to be a number of other problems.
you are accessing guards[i] in the first for loop, but the length of the guards vector is 0. Its capacity is 3, which means that you won't have any unnecessary allocations as long as the vector never contains more than 3 elements. use guards.push(x) instead of guards[i] = x.
thread::scoped expects a Fn() -> T, so your closure can return an object. You get that object when you call .join(), so you don't need an answer-vector.
vin is moved to the closure. Therefore in the second iteration of the loop that creates your guards, vin isn't available anymore to be moved to the "second" closure. Every loop iteration creates a new closure.
i is moved to the closure. I have no idea what's going on there. But the solution is to let inval = vin[i]; outside the closure, and then use inval inside the closure. This also solves Point 3.
vin is mutable. Yet you never mutate it. Don't bind variables mutably if you don't need to.
vin is an array of f64. Therefore (vin[i] as f64) does nothing. Therefore you can simply use vin[i] directly.
join moves out of the guard. Since you cannot move out of an array, your cannot index into an array of guards and join the element at the specified index. What you can do is loop over the elements of the array and join each guard.
Basically this means: don't iterate over indices (for i in 1..3), but iterate over elements (for element in vector) whenever possible.
All of the above implemented:
use std::thread;
use std::old_io::timer;
use std::time::duration::Duration;
fn main() {
let vin = vec![1.4f64, 1.2f64, 1.5f64];
let mut guards = Vec::with_capacity(3);
for inval in vin {
guards.push(thread::scoped( move || {
let ms = (1000.0f64 * inval) as i64;
let d = Duration::milliseconds(ms);
timer::sleep(d);
println!("Waited {}", inval);
10.0f64 + inval
}));
}
for guard in guards {
let answer = guard.join();
println!("{}", answer);
};
}

In supplement of Ker's answer: if you really need to mutate arrays within a thread, I suppose the most closest valid solution for your task will be something like this:
use std::thread::spawn;
use std::old_io::timer;
use std::sync::{Arc, Mutex};
use std::time::duration::Duration;
fn main() {
let vin = Arc::new(vec![1.4f64, 1.2f64, 1.5f64]);
let answers = Arc::new(Mutex::new(vec![0f64, 0f64, 0f64]));
let mut workers = Vec::new();
for i in 0..3 {
let worker_vin = vin.clone();
let worker_answers = answers.clone();
let worker = spawn( move || {
let ms = (1000.0f64 * worker_vin[i]) as i64;
let d = Duration::milliseconds(ms);
timer::sleep(d);
println!("Waited {}", worker_vin[i]);
let mut answers = worker_answers.lock().unwrap();
answers[i] = 10.0f64 + (worker_vin[i] as f64);
});
workers.push(worker);
}
for worker in workers { worker.join().unwrap(); }
for answer in answers.lock().unwrap().iter() {
println!("{}", answer);
}
}
In order to share vectors between several threads, I have to prove, that these vectors outlive all of my threads. I cannot use just Vec, because it will be destroyed at the end of main block, and another thread could live longer, possibly accessing freed memory. So I took Arc reference counter, which guarantees, that my vectors will be destroyed only when the counter downs to zero.
Arc allows me to share read-only data. In order to mutate answers array, I should use some synchronize tools, like Mutex. That is how Rust prevents me to make data races.

How to make a vector of received size?

I have a vector data with size unknown at compile time. I want to create a new vector of the exact that size. These variants don't work:
let size = data.len();
let mut try1: Vec<u32> = vec![0 .. size]; //ah, you need compile-time constant
let mut try2: Vec<u32> = Vec::new(size); //ah, there is no constructors with arguments
I'm a bit frustrated - there is no any information in Rust API, book, reference or rustbyexample.com about how to do such simple base task with vector.
This solution works but I don't think it is good to do so, it is strange to generate elements one by one and I don't have need in any exact values of elements:
let mut temp: Vec<u32> = range(0u32, data.len() as u32).collect();

The recommended way of doing this is in fact to form an iterator and collect it to a vector. What you want is not precisely clear, however; if you want [0, 1, 2, …, size - 1], you would create a range and collect it to a vector:
let x = (0..size).collect::<Vec<_>>();
(range(0, size) is better written (0..size) now; the range function will be disappearing from the prelude soon.)
If you wish a vector of zeroes, you would instead write it thus:
let x = std::iter::repeat(0).take(size).collect::<Vec<_>>();
If you merely want to preallocate the appropriate amount of space but not push values onto the vector, Vec::with_capacity(capacity) is what you want.
You should also consider whether you need it to be a vector or whether you can work directly with the iterator.

You can use Vec::with_capacity() constructor followed by an unsafe set_len() call:
let n = 128;
let v: Vec<u32> = Vec::with_capacity(n);
unsafe { v.set_len(n); }
v[12] = 64; // won't panic
This way the vector will "extend" over the uninitialized memory. If you're going to use it as a buffer it is a valid approach, as long as the type of elements is Copy (primitives are ok, but it will break horribly if the type has a destructor).

How to reference multiple members in an array?

I have an array of two players. I have a variable, current_num which is equal to which player in the array is the current player. I have a while loop which iterates through the main game logic where sometimes current_num is updated, sometimes it stays the same. I would like to assigned a current_player and next_player variable each iteration of the loop as like so:
while !self.board.check_win() {
let ref mut current_player = self.players[current_num];
let ref mut next_player = self.players[(current_num+1)%2];
/* Game Logic */
}
This doesn't work because I try to borrow something from self.players[..] twice. I honestly don't even need the next_player variable if I could somehow store the next player inside the first player object, but you can't create cyclic data structures in rust it seems. I fought so hard with the compiler to accomplish the following:
player1.next = &player2;
player2.next = &player1;
Unfortunately that doesn't seem to be possible.... If it is, I would rather do that so that I could do something along the lines of:
current_player.next().do_something();
instead of needing a next_player variable. I would also be able to do:
current_player = current_player.next();
for switching to the next player so I wouldn't even have to keep an index variable (current_num).
Now I do have a working mode where I always refer to the current player as:
self.players[current_num].do_something() //current_player
self.players[(current_num+1)%2).do_something() //next_player
This avoids the borrowing issues, but makes for VERY verbose code that's hard to read. C/C++ are really much easier with regards to getting this kind of design working. I feel like I'm constantly fighting the compiler to get what I want done...
Any help would be greatly appreciated!

To solve your immediate problem you can use the mut_split_at method, this uses unsafe code internally to give you two disjoint slices into a vector, resolving all your borrowing issues. You might write a wrapper like:
fn double_index<'a, T>(x: &'a mut [T],
i: uint, j: uint) -> (&'a mut T, &'a mut T) {
assert!(i != j, "cannot double_index with equal indices");
if i < j {
let (low, hi) = x.mut_split_at(j);
(&mut low[i], &mut hi[0])
} else { // i > j
let (low, hi) = x.mut_split_at(i);
(&mut hi[0], &mut low[j])
}
}
then write
let (current_player, next_player) = double_index(self.players,
current_num,
(current_num + 1) % 2);
(Assuming self.players is a [Player, .. 2] or &mut [Player]. If it is a Vec<Player> you will need to call .as_mut_slice() explicitly.)
you can't create cyclic data structures in rust it seems
You can, using Rc and Weak. For shared mutability, you'll need to use RefCell or Cell. E.g.
use std::rc::{Rc, Weak};
use std::cell::RefCell;
struct Player {
// ...
next: Option<Weak<RefCell<Player>>>
}
impl Player {
fn next(&self) -> Rc<RefCell<Player>> {
// upgrade goes from Weak to Rc & fails if this is invalid
self.next.unwrap().upgrade()
}
}
let player1 = Rc::new(RefCell::new(Player { ..., next: None }));
let player2 = Rc::new(RefCell::new(Player { ..., next: None }));
// join them up; downgrade goes from Rc to Weak
player1.borrow_mut().next = Some(player2.downgrade());
player2.borrow_mut().next = Some(player1.downgrade());
This avoids the borrowing issues, but makes for VERY verbose code that's hard to read. C/C++ are really much easier with regards to getting this kind of design working. I feel like I'm constantly fighting the compiler to get what I want done...
Any sort of shared mutability is very easy to get wrong; and in languages without a GC, this can lead to dangling pointers, segfaults & security holes. Unfortunately sealing all the holes in the manner that Rust does leads to this type of thing being rather ugly at times.

Alternatively:
use std::cell::RefCell;
struct Player;
fn main() {
let players = Vec::from_fn(3, |_| RefCell::new(Player));
let mut player_1 = players.get(0).borrow_mut();
let mut player_2 = players.get(1).borrow_mut();
//Happily mutate both players from here on
}
Normally mutating an object which has been borrowed multiple times isn't allowed. You can't have multiple &mut references to the same object, but multiple & references are allowed because they don't allow mutation. Since Cell and RefCell have internal mutability, we can borrow them via & reference while still mutating their contents.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string