How can I share immutable data between threads in Rust? - multithreading

I'm working on a ray tracer in Rust as a way to learn the language, and the single-threaded version works fine. I want to speed it up by multithreading it, and multithreading a raytracer in C/C++ is relatively easy because most of the data shared is read-only (the only problems occur when writing the pixel data). But I'm having a lot more trouble with Rust and its ownership rules.
I have a trait Hittable: Send + Sync for the different types of things (spheres, meshes) that can get hit in the world, and I left the implementation for Send and Sync blank because I don't actually need either of them. And then I have a vec of the world objects of type Vec<Box<dyn Hittable>>. For the actual multithreading, I'm trying something like this:
let pixels_mutex: Arc<Mutex<Vec<Vec<(f64, f64, f64, u32)>>>> = Arc::new(Mutex::new(pixels));
let vec_arc: Arc<Vec<Box<dyn Hittable>>> = Arc::new(vec);
let mut thread_vec: Vec<thread::JoinHandle<()>> = Vec::new();
for _ in 0..NUM_THREADS {
let camera_clone = camera.clone();
thread_vec.push(thread::spawn(move || {
for r in 0..RAYS_PER_THREAD {
if r % THREAD_UPDATE == 0 {
println!("Thread drawing ray {} of {} ({:.2}%)", r, RAYS_PER_THREAD, (r as f64 * 100.) / (RAYS_PER_THREAD as f64));
}
let u: f64 = util::rand();
let v: f64 = util::rand();
let ray = camera_clone.get_ray(u, v);
let res = geometry::thread_safe_cast_ray(&ray, &vec_arc, MAX_DEPTH);
let i = (u * IMAGE_WIDTH as f64).floor() as usize;
let j = (v * IMAGE_HEIGHT as f64).floor() as usize;
util::thread_safe_increment_color(&pixels_mutex, j, i, &res);
}
}));
}
for handle in thread_vec {
handle.join().unwrap();
}
I have the thread_safe_increment_color implemented and that seems fine, but I'm holding off on doing thread_safe_cast_ray until I get this loop working. The problem I'm running into with this code is that each thread tries to move vec_arc into its closure, which violates the ownership rule. I tried making a clone of vec_arc like I did with camera, but the compiler wouldn't let me, which I think is because my Hittable trait doesn't require the Copy or Clone trait. And my structs that implement Hittable can't simply derive(Copy, Clone) because they contain a Box<dyn Material>, where Material is another trait that represents the material of the object.
I thought this would be way easier since I know most of the data (other than pixels_mutex) is read-only. How can I share vec_arc (and for that matter, pixels_mutex) between the threads I'm creating?

You should clone vec_arc like you do with camera. Since it is an Arc, cloning it is cheap and doesn't require the Vec and its elements to be cloneable. You just need to ensure that you clone the smart pointer and not the vector, and use the associated function:
let vec_arc = Arc::clone(&vec_arc);
Another option is to use scoped threads, in which case you don't need to use Arc at all, you can simply make the closure non-move and access the local variables directly. Your code would then look like this:
crossbeam::scope(|s| {
for _ in 0..NUM_THREADS {
let camera_clone = camera.clone();
thread_vec.push(s.spawn(|_| {
// your code here, using `vec` directly instead of `vec_arc`
...
}).unwrap());
}
}).unwrap(); // here threads are joined automatically

Related

Rust mutable container of immutable elements?

With Rust, is it in general possible to have a mutable container of immutable values?
Example:
struct TestStruct { value: i32 }
fn test_fn()
{
let immutable_instance = TestStruct{value: 123};
let immutable_box = Box::new(immutable_instance);
let mut mutable_vector = vec!(immutable_box);
mutable_vector[0].value = 456;
}
Here, my TestStruct instance is wrapped in two containers: a Box, then a Vec. From the perspective of a new Rust user it's surprising that moving the Box into the Vec makes both the Box and the TestStruct instance mutable.
Is there a similar construct whereby the boxed value is immutable, but the container of boxes is mutable? More generally, is it possible to have multiple "layers" of containers without the whole tree being either mutable or immutable?
Is there a similar construct whereby the boxed value is immutable, but the container of boxes is mutable? More generally, is it possible to have multiple "layers" of containers without the whole tree being either mutable or immutable?
Not really. You could easily create one (just create a wrapper object which implements Deref but not DerefMut), but the reality is that Rust doesn't really see (im)mutability that way, because its main concern is controlling sharing / visibility.
After all, for an external observer what difference is there between
mutable_vector[0].value = 456;
and
mutable_vector[0] = Box::new(TestStruct{value: 456});
?
None is the answer, because Rust's ownership system means it's not possible for an observer to have kept a handle on the original TestStruct, thus they can't know whether that structure was replaced or modified in place[1][2].
If you want to secure your internal state, use visibility instead: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8a9346072b32cedcf2fccc0eeb9f55c5
mod foo {
pub struct TestStruct { value: i32 }
impl TestStruct {
pub fn new(value: i32) -> Self { Self { value } }
}
}
fn test_fn() {
let immutable_instance = foo::TestStruct{value: 123};
let immutable_box = Box::new(immutable_instance);
let mut mutable_vector = vec!(immutable_box);
mutable_vector[0].value = 456;
}
does not compile because from the point of view of test_fn, TestStruct::value is not accessible. Therefore test_fn has no way to mutate a TestStruct unless you add an &mut method on it.
[1]: technically they could check the address in memory and that might tell them, but even then it's not a sure thing (in either direction) hence pinning being a thing.
[2]: this observability distinction is also embraced by other languages, for instance the Clojure language largely falls on the "immutable all the things" side, however it has a concept of transients which allow locally mutable objects

How to create a Box<UnsafeCell<[T]>>

The recommended way to create a regular boxed slice (i.e. Box<[T]>) seems to be to first create a std::Vec<T>, and use .into_boxed_slice(). However, nothing similar to this seems to work if I want the slice to be wrapped in UnsafeCell.
A solution with unsafe code is fine, but I'd really like to avoid having to manually manage the memory.
The only (not-unsafe) way to create a Box<[T]> is via Box::from, given a &[T] as the parameter. This is because [T] is ?Sized and can't be passed a parameter. This in turn effectively requires T: Copy, because T has to be copied from behind the reference into the new Box. But UnsafeCell is not Copy, regardless if T is. Discussion about making UnsafeCell Copy has been going on for years, yielding no final conclusion, due to safety concerns.
If you really, really want a Box<UnsafeCell<[T]>>, there are only two ways:
Because Box and UnsafeCell are both CoerceUnsize, and [T; N] is Unsize, you can create a Box<UnsafeCell<[T; N]>> and coerce it to a Box<UnsafeCell<[T]>. This limits you to initializing from fixed-sized arrays.
Unsize coercion:
fn main() {
use std::cell::UnsafeCell;
let x: [u8;3] = [1,2,3];
let c: Box<UnsafeCell<[_]>> = Box::new(UnsafeCell::new(x));
}
Because UnsafeCell is #[repr(transparent)], you can create a Box<[T]> and unsafely mutate it to a Box<UnsafeCell<[T]>, as the UnsafeCell<[T]> is guaranteed to have the same memory layout as a [T], given that [T] doesn't use niche-values (even if T does).
Transmute:
// enclose the transmute in a function accepting and returning proper type-pairs
fn into_boxed_unsafecell<T>(inp: Box<[T]>) -> Box<UnsafeCell<[T]>> {
unsafe {
mem::transmute(inp)
}
}
fn main() {
let x = vec![1,2,3];
let b = x.into_boxed_slice();
let c: Box<UnsafeCell<[_]>> = into_boxed_unsafecell(b);
}
Having said all this: I strongly suggest you are suffering from the xy-problem. A Box<UnsafeCell<[T]>> is a very strange type (especially compared to UnsafeCell<Box<[T]>>). You may want to give details on what you are trying to accomplish with such a type.
Just swap the pointer types to UnsafeCell<Box<[T]>>:
use std::cell::UnsafeCell;
fn main() {
let mut res: UnsafeCell<Box<[u32]>> = UnsafeCell::new(vec![1, 2, 3, 4, 5].into_boxed_slice());
unsafe {
println!("{}", (*res.get())[1]);
res.get_mut()[1] = 10;
println!("{}", (*res.get())[1]);
}
}
Playground

Extract original slice from SliceStorage and SliceStorageMut

I am working on some software where I am managing a buffer of floats in a Vec<T> where T is either an f32 or f64. I sometimes need to interpret this buffer, or sections of it, as a mathematical vector. To this end, I am taking advantage of MatrixSlice and friends in nalgebra.
I can create a DVectorSliceMut, for example, the following way
fn as_vector<'a>(slice: &'a mut [f64]) -> DVectorSliceMut<'a, f64> {
DVectorSliceMut::from(slice)
}
However, sometimes I need to later extract the original slice from the DVectorSliceMut with the original lifetime 'a. Is there a way to do this?
The StorageMut trait has a as_mut_slice member function, but the lifetime of the returned slice is the lifetime of the reference to the Storage implementor, not the original slice. I am okay with a solution which consumes the DVectorSliceMut if necessary.
Update: Methods into_slice and into_slice_mut have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.
Given the current API of nalgebra (v0.27.1) there isn't much that you can do, except:
life with the shorter life-time of StorageMut::as_mut_slice
make a feature request for such a function at nalgebra (which seems you already did)
employ your own unsafe code to make StorageMut::ptr_mut into a &'a mut
You could go with the third option until nalgebra gets update and implement something like this in your own code:
use nalgebra::base::dimension::Dim;
use nalgebra::base::storage::Storage;
use nalgebra::base::storage::StorageMut;
fn into_slice<'a>(vec: DVectorSliceMut<'a, f64>) -> &'a mut [f64] {
let mut inner = vec.data;
// from nalgebra
// https://docs.rs/nalgebra/0.27.1/src/nalgebra/base/matrix_slice.rs.html#190
let (nrows, ncols) = inner.shape();
if nrows.value() != 0 && ncols.value() != 0 {
let sz = inner.linear_index(nrows.value() - 1, ncols.value() - 1);
unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), sz + 1) }
} else {
unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), 0) }
}
}
Methods into_slice and into_slice_mut which return the original slice have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.

How to create a pool of contigous memory across multiple structs in Rust?

I'm trying to create a set of structs in Rust that use a contiguous block of memory. E.g:
<------------ Memory Pool -------------->
[ Child | Child | Child | Child ]
These structs:
may each contain slices of the pool of different sizes
should allow access to their slice of the pool without any blocking operations once initialized (I intend to access them on an audio thread).
I'm very new to Rust but I'm well versed in C++ so the main hurdle thus far has been working with the ownership semantics - I'm guessing there's a trivial way to achieve this (without using unsafe) but the solution isn't clear to me. I've written a small (broken) example of what I'm trying to do:
pub struct Child<'a> {
pub slice: &'a mut [f32],
}
impl Child<'_> {
pub fn new<'a>(s: &mut [f32]) -> Child {
Child {
slice: s,
}
}
}
pub struct Parent<'a> {
memory_pool: Vec<f32>,
children: Vec<Child<'a>>,
}
impl Parent<'_> {
pub fn new<'a>() -> Parent<'a> {
const SIZE: usize = 100;
let p = vec![0f32; SIZE];
let mut p = Parent {
memory_pool: p,
children: Vec::new(),
};
// Two children using different parts of the memory pool:
let (lower_pool, upper_pool) = p.memory_pool.split_at_mut(SIZE / 2);
p.children = vec!{ Child::new(lower_pool), Child::new(upper_pool) };
return p; // ERROR - p.memory_pool is borrowed 2 lines earlier
}
}
I would prefer a solution that doesn't involve unsafe but I'm not entirely opposed to using it. Any suggestions would be very much appreciated, as would any corrections on how I'm (mis?)using Rust in my example.
Yes, it's currently impossible (or quite hard) in Rust to contain references into sibling data, for example, as you have here, a Vec and slices into that Vec as fields in the same struct. Depending on the architecture of your program, you might solve this by storing the original Vec at some higher-level of your code (for example, it could live on the stack in main() if you're not writing a library) and the slice references at some lower level in a way that the compiler can clearly infer it won't go out of scope before the Vec (doing it in main() after the Vec has been instantiated could work, for example).
This is the perfect use case for an arena allocator. There are quite a few. The following demonstration uses bumpalo:
//# bumpalo = "2.6.0"
use bumpalo::Bump;
use std::mem::size_of;
struct Child1(u32, u32);
struct Child2(f32, f32, f32);
fn main() {
let arena = Bump::new();
let c1 = arena.alloc(Child1(1, 2));
let c2 = arena.alloc(Child2(1.0, 2.0, 3.0));
let c3 = arena.alloc(Child1(10, 11));
// let's verify that they are indeed continuous in memory
let ptr1 = c1 as *mut _ as usize;
let ptr2 = c2 as *mut _ as usize;
let ptr3 = c3 as *mut _ as usize;
assert_eq!(ptr1 + size_of::<Child1>(), ptr2);
assert_eq!(ptr1 + size_of::<Child1>() + size_of::<Child2>(), ptr3);
}
There are caveats, too. The main concern is of course the alignment; there may be some padding between two consecutive allocations. It is up to you to make sure that doesn't gonna happen if it is a deal breaker.
The other is allocator specific. The bumpalo arena allocator used here, for example, doesn't drop object when itself gets deallocated.
Other than that, I do believe a higher level abstraction like this will benefit your project. Otherwise, it'll just be pointer manipulating c/c++ disguised as rust.

Using a HashSet to canonicalize objects in Rust

As an educational exercise, I'm looking at porting cvs-fast-export to Rust.
Its basic mode of operation is to parse a number of CVS master files into a intermediate form, and then to analyse the intermediate form with the goal of transforming it into a git fast-export stream.
One of the things that is done when parsing is to convert common parts of the intermediate form into a canonical representation. A motivating example is commit authors. A CVS repository may have hundreds of thousands of individual file commits, but probably less than a thousand authors. So an interning table is used when parsing where you input the author as you parse it from the file and it will give you a pointer to a canonical version, creating a new one if it hasn't seen it before. (I've heard this called atomizing or interning too). This pointer then gets stored on the intermediate object.
My first attempt to do something similar in Rust attempted to use a HashSet as the interning table. Note this uses CVS version numbers rather than authors, this is just a sequence of digits such as 1.2.3.4, represented as a Vec.
use std::collections::HashSet;
use std::hash::Hash;
#[derive(PartialEq, Eq, Debug, Hash, Clone)]
struct CvsNumber(Vec<u16>);
fn intern<T:Eq + Hash + Clone>(set: &mut HashSet<T>, item: T) -> &T {
let dupe = item.clone();
if !set.contains(&item) {
set.insert(item);
}
set.get(&dupe).unwrap()
}
fn main() {
let mut set: HashSet<CvsNumber> = HashSet::new();
let c1 = CvsNumber(vec![1, 2]);
let c2 = intern(&mut set, c1);
let c3 = CvsNumber(vec![1, 2]);
let c4 = intern(&mut set, c3);
}
This fails with error[E0499]: cannot borrow 'set' as mutable more than once at a time. This is fair enough, HashSet doesn't guarantee references to its keys will be valid if you add more items after you have obtained a reference. The C version is careful to guarantee this. To get this guarantee, I think the HashSet should be over Box<T>. However I can't explain the lifetimes for this to the borrow checker.
The ownership model I am going for here is that the interning table owns the canonical versions of the data, and hands out references. The references should be valid as long the interning table exists. We should be able to add new things to the interning table without invalidating the old references. I think the root of my problem is that I'm confused how to write the interface for this contract in a way consistent with the Rust ownership model.
Solutions I see with my limited Rust knowledge are:
Do two passes, build a HashSet on the first pass, then freeze it and use references on the second pass. This means additional temporary storage (sometimes substantial).
Unsafe
Does anyone have a better idea?
I somewhat disagree with #Shepmaster on the use of unsafe here.
While right now it does not cause issue, should someone decide in the future to change the use of HashSet to include some pruning (for example, to only ever keep a hundred authors in there), then unsafe will bite you sternly.
In the absence of a strong performance reason, I would simply use a Rc<XXX>. You can alias it easily enough: type InternedXXX = Rc<XXX>;.
use std::collections::HashSet;
use std::hash::Hash;
use std::rc::Rc;
#[derive(PartialEq, Eq, Debug, Hash, Clone)]
struct CvsNumber(Rc<Vec<u16>>);
fn intern<T:Eq + Hash + Clone>(set: &mut HashSet<T>, item: T) -> T {
if !set.contains(&item) {
let dupe = item.clone();
set.insert(dupe);
item
} else {
set.get(&item).unwrap().clone()
}
}
fn main() {
let mut set: HashSet<CvsNumber> = HashSet::new();
let c1 = CvsNumber(Rc::new(vec![1, 2]));
let c2 = intern(&mut set, c1);
let c3 = CvsNumber(Rc::new(vec![1, 2]));
let c4 = intern(&mut set, c3);
}
Your analysis is correct. The ultimate issue is that when modifying the HashSet, the compiler cannot guarantee that the mutations will not affect the existing allocations. Indeed, in general they might affect them, unless you add another layer of indirection, as you have identified.
This is a prime example of a place that unsafe is useful. You, the programmer, can assert that the code will only ever be used in a particular way, and that particular way will allow the variable to be stable through any mutations. You can use the type system and module visibility to help enforce these conditions.
Note that String already introduces a heap allocation. So long as you don't change the String once it's allocated, you don't need an extra Box.
Something like this seems like an OK start:
use std::{cell::RefCell, collections::HashSet, mem};
struct EasyInterner(RefCell<HashSet<String>>);
impl EasyInterner {
fn new() -> Self {
EasyInterner(RefCell::new(HashSet::new()))
}
fn intern<'a>(&'a self, s: &str) -> &'a str {
let mut set = self.0.borrow_mut();
if !set.contains(s) {
set.insert(s.into());
}
let interned = set.get(s).expect("Impossible missing string");
// TODO: Document the pre- and post-conditions that the code must
// uphold to make this unsafe code valid instead of copying this
// from Stack Overflow without reading it
unsafe { mem::transmute(interned.as_str()) }
}
}
fn main() {
let i = EasyInterner::new();
let a = i.intern("hello");
let b = i.intern("world");
let c = i.intern("hello");
// Still strings
assert_eq!(a, "hello");
assert_eq!(a, c);
assert_eq!(b, "world");
// But with the same address
assert_eq!(a.as_ptr(), c.as_ptr());
assert!(a.as_ptr() != b.as_ptr());
// This shouldn't compile; a cannot outlive the interner
// let x = {
// let i = EasyInterner::new();
// let a = i.intern("hello");
// a
// };
let the_pointer;
let i = {
let i = EasyInterner::new();
{
// Introduce a scope to contstrain the borrow of `i` for `s`
let s = i.intern("inner");
the_pointer = s.as_ptr();
}
i // moving i to a new location
// All outstanding borrows are invalidated
};
// but the data is still allocated
let s = i.intern("inner");
assert_eq!(the_pointer, s.as_ptr());
}
However, it may be much more expedient to use a crate like:
string_cache, which has the collective brainpower of the Servo project behind it.
typed-arena
generational-arena

Resources