fn subslice<T>(a: Arc<[T]>, begin: usize, end: usize) -> Arc<[T]> {
    Arc::new(a[begin..end])
}
The above "obvious implementation" of the subslicing operation for Arc<[T]> does not work because a[begin..end] has type [T], which is unsized. Arc<T> has the curious property that the type itself does not require T: Sized, but the constructor Arc::new does, so I'm at a loss for how to construct this subslice.
You can't.
To explain why, let's look at what Arc actually is under the covers.
pub struct Arc<T: ?Sized> {
    ptr: Shared<ArcInner<T>>,
}
Shared<T> is an internal wrapper type that essentially amounts to "*const T, but can't be zero"; so it's basically a &T without a lifetime. This means that you can't adjust the slice at this level; if you did, you'd end up trying to point to an ArcInner that doesn't exist. Thus, if this is possible, it must involve some manipulation of the ArcInner.
ArcInner<T> is defined as follows:
struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T,
}
strong and weak are just the number of strong and weak handles to this allocation respectively. data is the actual contents of the allocation, stored inline. And that's the problem.
In order for your code to work as you want, Arc would not only have to refer to data by another pointer (rather than storing it inline), but it would also have to store the reference counts and data in different places, so that you could take a slice of the data, but retain the same reference counts.
So you can't do what you're asking.
One thing you can do instead is to store the slicing information alongside the Arc. The owning_ref crate has an example that does exactly this.
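A minimal sketch of that idea (illustrative names only, not the owning_ref API): keep the whole Arc<[T]> alive and store the bounds of the interesting part next to it.

use std::ops::Range;
use std::sync::Arc;

struct ArcSlice<T> {
    buf: Arc<[T]>,
    range: Range<usize>,
}

impl<T> ArcSlice<T> {
    fn subslice(buf: Arc<[T]>, range: Range<usize>) -> Self {
        assert!(range.start <= range.end && range.end <= buf.len());
        ArcSlice { buf, range }
    }

    // The slice view is recovered on demand; the reference counts stay with buf.
    fn as_slice(&self) -> &[T] {
        &self.buf[self.range.clone()]
    }
}

impl<T> Clone for ArcSlice<T> {
    fn clone(&self) -> Self {
        // Cloning only bumps the reference count of the original allocation.
        ArcSlice { buf: Arc::clone(&self.buf), range: self.range.clone() }
    }
}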
Context
I'm a beginner, and, at a high level, what I want to do is store some mutable state (to power a state machine) in a struct, with the following constraints:
Mutating the state doesn't require the entire struct to be mut (since I'd have to update a ton of callsites to be mut + I don't want every field to be mutable)
The state is represented as an enum, and can, in the right state, store a way to index into the correct position in a vec that's in the same struct
I came up with two different approaches/examples that seem quite complicated, and I want to see if there's a way to simplify. Here are some playgrounds that minimally reproduce what I'm exploring:
Using a Cell and a usize
use std::cell::Cell;

#[derive(Clone, Copy)]
enum S {
    A,
    B(usize),
}

struct Test {
    a: Vec<i32>,
    b: Cell<S>,
}
where usage looks like this
println!("{}", t.a[index]);
t.b.set(S::B(index + 1));
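Put together, a small runnable sketch of this approach (the step function and its logic are just my illustration) shows why it satisfies the first constraint: Cell::get and Cell::set only need &self, so advancing the state works through a shared &Test.

fn step(t: &Test) {
    match t.b.get() {
        S::A => t.b.set(S::B(0)),
        S::B(index) if index < t.a.len() => {
            println!("{}", t.a[index]);
            t.b.set(S::B(index + 1));
        }
        S::B(_) => {}
    }
}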
Using a RefCell and an iterator
use std::cell::RefCell;
use std::slice::Iter;

enum S<'a> {
    A,
    B(Iter<'a, i32>),
}

struct Test<'a> {
    a: Vec<i32>,
    b: RefCell<S<'a>>,
}
where usage looks like this
println!("{}", iter.next().unwrap());
Questions
Is there a better way to model this in general vs. what I've tried?
I like approach #2 with the iterator better in theory since it feels cleaner, but I don't like how it introduces explicit lifetime annotations into the struct...in the actual codebase I'm working on, I'd need to update a ton of callsites to add the lifetime annotation and the tiny bit of convenience doesn't seem worth it. Is there some way to do #2 without introducing lifetimes?
I am currently learning Rust for fun. I have some experience in C/C++, as well as experience in other programming languages that use more complex paradigms like generics.
Background
For my first project (after the tutorial), I wanted to create an N-dimensional array (or matrix) data structure to practice development in Rust.
Here is what I have so far for my Matrix struct and a basic fill and new initializations.
Forgive the absent bounds checking and parameter validation.
pub struct Matrix<'a, T> {
    data: Vec<Option<T>>,
    dimensions: &'a [usize],
}

impl<'a, T: Clone> Matrix<'a, T> {
    pub fn fill(dimensions: &'a [usize], fill: T) -> Matrix<'a, T> {
        let mut total = if dimensions.len() > 0 { 1 } else { 0 };
        for dim in dimensions.iter() {
            total *= dim;
        }
        Matrix {
            data: vec![Some(fill); total],
            dimensions: dimensions,
        }
    }

    pub fn new(dimensions: &'a [usize]) -> Matrix<'a, T> {
        ...
        Matrix {
            data: vec![None; total],
            dimensions: dimensions,
        }
    }
}
I wanted the ability to create an "empty" N-dimensional array using the new fn. I thought the Option enum would be the best way to accomplish this, as I can fill the array with None and it allocates space for the T generic automatically.
So then it comes down to being able to set the entries. I found the IndexMut and Index traits, which looked like they would let me do something like m[&[2, 3]] = 23. Since the logic for the two is similar, here is the IndexMut impl for Matrix.
impl<'a, T> ops::IndexMut<&[usize]> for Matrix<'a, T> {
    fn index_mut(&mut self, indices: &[usize]) -> &mut Self::Output {
        match self.data[get_matrix_index(self.dimensions, indices)].as_mut() {
            Some(x) => x,
            None => {
                // NOT SURE WHAT TO DO HERE.
            }
        }
    }
}
Ideally what would happen is that the value (if present) would be changed, i.e.
let mut mat = Matrix::fill(&[4, 4], 0);
mat[&[2, 3]] = 23;
This would set the value from 0 to 23 (which the above fn does via returning &mut x from Some(x)). But I also want to be able to set the value when the entry is None, i.e.
let mut mat = Matrix::new(&[4, 4]);
mat[&[2, 3]] = 23;
Question
Finally, is there a way to make m[&[2, 3]] = 23 possible, given what the Vec struct requires to allocate the memory? If not, what should I change, and how can I still have an array with "empty" spots? Open to any suggestions as I am trying to learn. :)
Final Thoughts
Through my research of the Vec struct impls, I see that the type T has to be Sized. I thought this could be useful for allocating the Vec with the appropriate size, via something like vec![a null pointer of T's size; total], but I am unsure how to do this.
So there are a few ways to make this more idiomatic Rust, but first, let's look at why the None branch doesn't make sense.
I'm going to assume the Output type for Index is T, so index_mut returns &mut T; you don't show the Index definition, but I feel safe in that assumption. The type &mut T means a mutable reference to an initialized T, unlike pointers in C/C++, which can point to initialized or uninitialized memory. What this means is that you have to return an initialized T, which the None branch cannot do because there is no initialized value. This leads to the first of the more idiomatic ways.
Return an Option<T>
The easiest way would be to change Index::Output to Option<T>. This is better because the user can decide what to do if there was no value there before, and it is close to what you are actually storing. You can then also remove the panic in your index method and allow the caller to choose what to do if there is no value. At this point, I think you can go a little further and tidy up the structure, which leads to the next option.
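A sketch of what that could look like for the Index half (get_matrix_index is the helper from your code; the IndexMut side is analogous and returns &mut Option<T>):

impl<'a, T> ops::Index<&[usize]> for Matrix<'a, T> {
    type Output = Option<T>;

    fn index(&self, indices: &[usize]) -> &Option<T> {
        &self.data[get_matrix_index(self.dimensions, indices)]
    }
}

With this, the None arm disappears entirely, and assignment becomes mat[&[2, 3]] = Some(23).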
Store a T directly
This method lets the caller decide what type is actually stored rather than wrapping it in an Option. It cleans up most of your index code nicely, as you just have to access what's already stored. The main problem is now initialization: how do you represent uninitialized values? You were correct that Option is the best way to do this1, but now the caller can opt into that capability by storing an Option themselves. That means we can always store initialized Ts without losing functionality. This only really changes your new function so that it no longer fills with None values. My suggestion here is to add a T: Default bound for the new function2:
impl<'a, T: Default> Matrix<'a, T> {
    pub fn new(dimensions: &'a [usize]) -> Matrix<'a, T> {
        // total is computed the same way as in `fill`;
        // data is now a Vec<T> rather than a Vec<Option<T>>.
        let total: usize = if dimensions.is_empty() { 0 } else { dimensions.iter().product() };
        Matrix {
            data: (0..total).map(|_| Default::default()).collect(),
            dimensions: dimensions,
        }
    }
}
This method is much more common in the Rust world and lets the caller choose whether to allow uninitialized values. Option<T> also implements Default for every T (the default is None), so the functionality stays very similar to what you have currently.
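For example (a usage sketch, assuming the IndexMut impl now hands out &mut T directly):

// A caller who still wants "empty" spots picks Option<i32> as the element type.
let mut with_holes: Matrix<Option<i32>> = Matrix::new(&[4, 4]);
with_holes[&[2, 3]] = Some(23);

// A caller who doesn't need holes stores i32 directly.
let mut dense: Matrix<i32> = Matrix::new(&[4, 4]);
dense[&[2, 3]] = 23;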
Additional Info
As you're new to Rust, there are a few comments I can make about traps I've fallen into before. To start, your struct contains a reference to the dimensions, with a lifetime. What this means is that your structs cannot outlive the dimensions object they borrow from. This hasn't caused you a problem so far, as all you've been passing are statically created dimensions: dimensions that are typed into the code and stored in static memory. This gives your object a 'static lifetime, but that won't be the case if you use dynamic dimensions.
How else can you store these dimensions so that your object always has a 'static lifetime (the same as having no lifetime parameter)? Since you want an N-dimensional array, stack allocation is out of the question, because the size of a stack array must be known at compile time (const in Rust). This means you have to use the heap, which leaves two real options: Box<[usize]> or Vec<usize>. Box is just another way of saying "this lives on the heap" and turns a ?Sized value into a Sized handle. Vec is a little more self-explanatory and adds the ability to resize at the cost of a little overhead. Either would allow your matrix object to always have a 'static lifetime.
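For example, a sketch of the same struct with owned dimensions, so Matrix needs no lifetime parameter at all (only fill is shown; the other methods change the same way):

pub struct Matrix<T> {
    data: Vec<Option<T>>,
    dimensions: Vec<usize>,
}

impl<T: Clone> Matrix<T> {
    pub fn fill(dimensions: &[usize], fill: T) -> Matrix<T> {
        let total: usize = if dimensions.is_empty() { 0 } else { dimensions.iter().product() };
        Matrix {
            data: vec![Some(fill); total],
            // to_vec() copies the dimensions onto the heap, so Matrix owns them.
            dimensions: dimensions.to_vec(),
        }
    }
}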
1. The other way to represent this without Option<T>'s discriminant is MaybeUninit<T>, which is unsafe territory. It gives you a chunk of uninitialized memory big enough to hold a T, which you then unsafely assume is initialized. This can cause a lot of problems and is usually not worth it, since Option is already heavily optimized: if it wraps a type containing a non-null pointer, the compiler stores the discriminant in whether or not that pointer is null.
2. The reason this section doesn't just use vec![Default::default(); total] is that it would require T: Clone: the macro evaluates the first expression once and clones it until there are enough values. That's an extra requirement we don't need, so the interface is smoother without it.
I am looking at the code of from_raw_parts_mut:
pub unsafe fn from_raw_parts_mut<'a, T>(p: *mut T, len: usize) -> &'a mut [T] {
    mem::transmute(Repr { data: p, len: len })
}
It uses transmute to reinterpret a Repr as a &mut [T]. As far as I understand, Repr is a 128-bit struct. How does this transmute between differently sized types work?
mem::transmute() only works when transmuting to a type of the same size, so that means a &mut [T] slice reference must also be the same size as Repr.
Looking at Repr:
#[repr(C)]
struct Repr<T> {
    pub data: *const T,
    pub len: usize,
}
It has a pointer to some data and a length. This is exactly what a slice is - a pointer to an array of items (which might be an actual array, or owned by a Vec<T>, etc.) with a length to say how many items are valid.
The object which is passed around as a slice is (under the covers) exactly what the Repr looks like, even though the data it refers to can be anything from 0 to as many T as will fit into memory.
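A small illustration of that equivalence (my own example, not from the standard library docs): taking a slice apart into its pointer and length and rebuilding it gives back a view of the same data.

fn main() {
    let v = vec![10, 20, 30, 40];
    let ptr = v.as_ptr();
    let len = v.len();

    // Safety: ptr and len come from a live Vec that outlives the new slice.
    let s: &[i32] = unsafe { std::slice::from_raw_parts(ptr, len) };
    assert_eq!(s, &v[..]);
}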
In Rust, some references are not implemented as just a pointer, as they are in some other languages: some types are "fat pointers". This might not be obvious at first, especially if you are familiar with references/pointers in other languages! Some examples are:
Slices &[T] and &mut [T], which, as described above, are actually a pointer and a length. The length is needed for bounds checks. For example, you can pass a slice corresponding to part of an array or Vec to a function.
Trait objects like &Trait or Box<Trait>, where Trait is a trait rather than a concrete type, are actually a pointer to the concrete type and a pointer to a vtable — the information needed to call trait methods on the object, given that its concrete type is not known.
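A quick way to see this for yourself (an illustrative check; the sizes below hold on the usual 32-bit and 64-bit targets):

use std::mem::size_of;

fn main() {
    // A plain reference is one pointer wide.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
    // A slice reference carries a pointer plus a length.
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<usize>());
    // A trait object reference carries a pointer plus a vtable pointer.
    assert_eq!(size_of::<&dyn std::fmt::Debug>(), 2 * size_of::<usize>());
}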
I have a trait that defines an interface for objects that can hold a value. The trait has a way of getting the current value:
pub trait HasValue<T> {
    fn get_current_value(&self) -> &T;
}
This is fine, but I realized that depending on the actual implementation, sometimes it's convenient to return a reference if T is stored in a field, and sometimes it's convenient to return a clone of T if the backing field was being shared across threads (for example). I'm struggling to figure out how to represent this in the trait. I could have something like this:
pub enum BorrowedOrOwned<'a, T: 'a> {
    Borrowed(&'a T),
    Owned(T),
}

impl<'a, T: 'a> Deref for BorrowedOrOwned<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        use self::BorrowedOrOwned::*;

        match self {
            &Borrowed(b) => b,
            &Owned(ref o) => o,
        }
    }
}
And change get_current_value() to return a BorrowedOrOwned<T> but I'm not sure that this is idiomatic. BorrowedOrOwned<T> kind of reminds me of Cow<T> but since the point of Cow is to copy-on-write and I will be discarding any writes, that seems semantically wrong.
Is Cow<T> the correct way to abstract over a reference or an owned value? Is there a better way than BorrowedOrOwned<T>?
I suggest you use a Cow, since your BorrowedOrOwned is no different from Cow except that it has fewer convenience methods. Anybody who gets hold of a BorrowedOrOwned object could match on it and get the owned value or a mutable reference to it. If you want to prevent the confusion of being able to get a mutable reference or the object itself, the solution below applies, too.
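A sketch of the trait returning a Cow (the trait is from your question; the Field implementor is just an example), so each implementor can decide whether to hand out a borrow or an owned clone:

use std::borrow::Cow;

pub trait HasValue<T: Clone> {
    fn get_current_value(&self) -> Cow<'_, T>;
}

struct Field(u32);

impl HasValue<u32> for Field {
    fn get_current_value(&self) -> Cow<'_, u32> {
        // This implementor stores the value directly, so it can lend it out.
        Cow::Borrowed(&self.0)
    }
}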
For your use case I'd simply stay with &T, since there's no reason to make the API more complex. If a user wants a usize when T is usize, they can simply dereference the reference.
An owned return value only makes sense if you expect the user to actually consume it as an owned value. And even then, Cow is meant to abstract over big/heavy objects that you pass by ownership so that nobody is forced to clone them. Your use case is the opposite: you'd be passing small objects by ownership to save the user a copy, which just moves the copy to your side.
Consider the following two structs:
pub struct BitVector<S: BitStorage> {
    data: Vec<S>,
    capacity: usize,
    storage_size: usize,
}

pub struct BitSlice<'a, S: BitStorage> {
    data: &'a [S],
    storage_size: usize,
}
where BitStorage is essentially a trait restricted to the unsigned integer types (u8, u16, u32, u64, usize).
How do I implement the Deref trait? (BitVector<S> should deref to BitSlice<S>, similar to how Vec<S> derefs to &[S].)
I have tried the following (note that it doesn't compile, due to issues with lifetimes, but more importantly because I try to return a reference to a value on the stack):
impl<'b, S: BitStorage> Deref for BitVector<S> {
    type Target = BitSlice<'b, S>;

    fn deref<'a>(&'a self) -> &'a BitSlice<'b, S> {
        let slice = BitSlice {
            data: &self.data,
            storage_size: self.storage_size,
        };
        &slice
    }
}
I am aware that it is possible to return a field of a struct by reference, so for example I could return &Vec<S> or &usize from Deref. But is it possible to return a BitSlice, given that I essentially have all the data in the BitVector already? Vec<S> can be borrowed as a &[S], and storage_size is already there.
I would think this is possible if I could create a struct from both values and somehow tell the compiler to ignore the fact that the struct is created on the stack, and instead just use the existing values, but I have no clue how.
Deref is required to return a reference. A reference always points to some existing memory, and any local variable will not exist long enough. While there are, in theory, some sick tricks you could play to create a new object in deref and return a reference to it, all that I'm aware of result in a memory leak. Let's ignore these technicalities and just say it's plain impossible.
Now what? You'll have to change your API. Vec can implement Deref because it derefs to [T], not to &[T] or anything like that. You may have success with the same strategy: make BitSlice<S> an unsized type containing only a slice [S], so that the return type is &'a BitSlice<S>. This assumes the storage_size member is not needed. But it seems that this refers to the number of bits that are logically valid (i.e., can be accessed without extending the bit vector); if so, that seems unavoidable1.
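For illustration, a sketch of that strategy, under the assumption that storage_size really can be dropped or recomputed (the BitStorage bound is omitted here for brevity):

use std::ops::Deref;

pub struct BitVector<S> {
    data: Vec<S>,
    capacity: usize,
}

// Unsized newtype over [S]: &BitSlice<S> is itself a fat pointer.
#[repr(transparent)]
pub struct BitSlice<S> {
    data: [S],
}

impl<S> Deref for BitVector<S> {
    type Target = BitSlice<S>;

    fn deref(&self) -> &BitSlice<S> {
        // repr(transparent) guarantees BitSlice<S> has the same layout as [S],
        // so reinterpreting the fat pointer this way is sound.
        unsafe { &*(self.data.as_slice() as *const [S] as *const BitSlice<S>) }
    }
}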
The other alternative, of course, is to not implement a Deref. Inconvenient, but if your slice data type is too far from an actual slice, it may be the only option.
There is also RFC PR #1524, which proposed custom dynamically-sized types; with that, you could have a type BitSlice<S> that is like a slice but can have additional contents such as storage_size. However, this doesn't exist yet, and it's far from certain that it ever will.
1. The capacity member on BitVector, however, seems pointless. Isn't that just size_of::<S>() * 8?