Initialize boxed slice without clone or copy - rust

I'm trying to initialize a boxed slice of None values, such that the underlying type T does not need to implement Clone or Copy. Here are a few ideal solutions:
fn by_vec<T>() -> Box<[Option<T>]> {
vec![None; 5].into_boxed_slice()
}
fn by_arr<T>() -> Box<[Option<T>]> {
Box::new([None; 5])
}
Unfortunately, the by_vec implementation requires T: Clone and the by_arr implementation requires T: Copy. I've experimented with a few more approaches:
fn by_vec2<T>() -> Box<[Option<T>]> {
let v = &mut Vec::with_capacity(5);
for i in 0..v.len() {
v[i] = None;
}
v.into_boxed_slice() // Doesn't work: cannot move out of borrowed content
}
fn by_iter<T>() -> Box<[Option<T>]> {
(0..5).map(|_| None).collect::<Vec<Option<T>>>().into_boxed_slice()
}
by_vec2 doesn't get past the compiler (I'm not sure I understand why), but by_iter does. I'm concerned about the performance of collect -- will it need to resize the vector it is collecting into as it iterates, or can it allocate the correct sized vector to begin with?
Maybe I'm going about this all wrong -- I'm very new to Rust, so any tips would be appreciated!

Let's start with by_vec2. You are taking a &mut reference to a Vec. You shouldn't do that; work directly with the Vec and make the v binding mutable.
Then you are iterating over the length of a Vec with a capacity of 5 and a length of 0. That means your loop never gets executed. What you wanted was to iterate over 0..v.capacity().
Since your v is still of length 0, accessing v[i] in the loop will panic at runtime. What you actually want is v.push(None). This would normally cause reallocations, but in your case you already allocated with Vec::with_capacity, so pushing 5 times will not allocate.
This time around we did not take a reference to the Vec so into_boxed_slice will actually work.
fn by_vec2<T>() -> Box<[Option<T>]> {
let mut v = Vec::with_capacity(5);
for _ in 0..v.capacity() {
v.push(None);
}
v.into_boxed_slice()
}
Your by_iter function actually only allocates once. The Range iterator created by 0..5 knows that it is exactly 5 elements long, and map preserves that size hint. So collect will in fact check that length and allocate only once.
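As a rough sketch of the same idea (not part of the original answer), the helper below collects straight into a Box<[Option<T>]> with no bounds on T. The names boxed_nones, range_hint, and NotClone are made up for illustration; range_hint only exposes the size hint that collect is able to consult.

```rust
/// Hypothetical marker type that is neither `Clone` nor `Copy`.
#[derive(Debug)]
pub struct NotClone;

/// Builds a boxed slice of `n` `None` values with no bounds on `T`.
pub fn boxed_nones<T>(n: usize) -> Box<[Option<T>]> {
    // `repeat_with` calls the closure once per element, so nothing is cloned.
    // `take(n)` gives the iterator an exact size hint of `n`.
    std::iter::repeat_with(|| None).take(n).collect()
}

/// Returns the size hint of the mapped range used in `by_iter`.
pub fn range_hint(n: usize) -> (usize, Option<usize>) {
    (0..n).map(|_| ()).size_hint()
}
```

Calling boxed_nones::<NotClone>(5) should give a slice of five None values, and range_hint(5) reports both a lower and an upper bound of 5.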

Related

Creating a `Pin<Box<[T; N]>>` in Rust when `[T; N]` is too large to be created on the stack

Generalized Question
How can I implement a general function pinned_array_of_default in stable Rust where [T; N] is too large to fit on the stack?
fn pinned_array_of_default<T: Default, const N: usize>() -> Pin<Box<[T; N]>> {
unimplemented!()
}
Alternatively, T can implement Copy if that makes the process easier.
fn pinned_array_of_element<T: Copy, const N: usize>(x: T) -> Pin<Box<[T; N]>> {
unimplemented!()
}
Keeping the solution in safe Rust would have been preferable, but it seems unlikely that it is possible.
Approaches
Initially I was hoping that by implementing Default I might be able to get Default to handle the initial allocation; however, it still creates the array on the stack, so this will not work for large values of N.
let boxed: Box<[T; N]> = Box::default();
let foo = Pin::new(boxed);
I suspect I need to use MaybeUninit to achieve this and there is a Box::new_uninit() function, but it is currently unstable and I would ideally like to keep this within stable Rust. I am also somewhat unsure if transmuting Pin<Box<MaybeUninit<B>>> to Pin<Box<B>> could somehow have negative effects on the Pin.
Background
The purpose behind using a Pin<Box<[T; N]>> is to hold a block of pointers where N is some constant factor/multiple of the page size.
#[repr(C)]
#[derive(Copy, Clone)]
pub union Foo<R: ?Sized> {
assigned: NonNull<R>,
next_unused: Option<NonNull<Self>>,
}
Each pointer may or may not be in use at a given point in time. An in-use Foo points to R, and an unused/empty Foo has a pointer to either the next empty Foo in the block or None. A pointer to the first unused Foo in the block is stored separately. When a block is full, a new block is created and the pointer chain of unused positions continues through the next block.
The box needs to be pinned since it will contain self referential pointers as well as outside structs holding pointers into assigned positions in each block.
I know that Foo is wildly unsafe by Rust standards, but the general question of creating a Pin<Box<[T; N]>> still stands.
A way to construct a large array on the heap and avoid creating it on the stack is to proxy through a Vec. You can construct the elements and use .into_boxed_slice() to get a Box<[T]>. You can then use .try_into() to convert it to a Box<[T; N]>. And then use .into() to convert it to a Pin<Box<[T; N]>>:
fn pinned_array_of_default<T: Default, const N: usize>() -> Pin<Box<[T; N]>> {
let mut vec = vec![];
vec.resize_with(N, T::default);
let boxed: Box<[T; N]> = match vec.into_boxed_slice().try_into() {
Ok(boxed) => boxed,
Err(_) => unreachable!(),
};
boxed.into()
}
You can optionally make this look more straight-forward if you add T: Clone so that you can do vec![T::default(); N] and/or add T: Debug so you can use .unwrap() or .expect().
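The T: Copy variant from the question can be handled the same way. This is only a sketch under the assumption that going through vec![x; N] is acceptable; it reuses the question's function name and the same Vec, boxed slice, boxed array, Pin chain.

```rust
use std::convert::TryInto;
use std::pin::Pin;

/// Sketch: fills a heap-allocated `[T; N]` with copies of `x` and pins it.
pub fn pinned_array_of_element<T: Copy, const N: usize>(x: T) -> Pin<Box<[T; N]>> {
    // `vec![x; N]` allocates directly on the heap, so the array never
    // has to exist on the stack.
    let boxed: Box<[T; N]> = match vec![x; N].into_boxed_slice().try_into() {
        Ok(boxed) => boxed,
        // The Vec was built with exactly N elements, so the lengths match.
        Err(_) => unreachable!(),
    };
    boxed.into()
}
```

For example, pinned_array_of_element::<u64, 1_000_000>(7) builds an 8 MB array without placing it on the stack first.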
See also:
Creating a fixed-size array on heap in Rust

What is the difference between TryFrom<&[T]> and TryFrom<Vec<T>>?

There seem to be two ways to try to turn a vector into an array, either via a slice (fn a) or directly (fn b):
use std::array::TryFromSliceError;
use std::convert::TryInto;
type Input = Vec<u8>;
type Output = [u8; 1000];
// Rust 1.47
pub fn a(vec: Input) -> Result<Output, TryFromSliceError> {
vec.as_slice().try_into()
}
// Rust 1.48
pub fn b(vec: Input) -> Result<Output, Input> {
vec.try_into()
}
Practically speaking, what's the difference between these? Is it just the error type? The fact that the latter was added makes me wonder whether there's more to it than that.
They have slightly different behavior.
The slice to array implementation will copy the elements from the slice, which is why it requires T: Copy. It has to copy instead of move because the slice doesn't own the elements.
The Vec to array implementation will consume the Vec and move its contents to the new array. It can do this because it does own the elements.
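A small sketch of the difference, with function names invented for illustration: the slice conversion leaves the source untouched and only works for Copy element types, while the Vec conversion also works for non-Copy types such as String and hands the Vec back on a length mismatch.

```rust
use std::convert::TryInto;

/// Copies four bytes out of a borrowed slice; the caller keeps the data.
pub fn copy_from_slice_to_array(v: &[u8]) -> Option<[u8; 4]> {
    v.try_into().ok()
}

/// Moves two Strings out of an owned Vec; no String is cloned.
/// On a length mismatch the original Vec is returned as the error.
pub fn move_vec_into_array(v: Vec<String>) -> Result<[String; 2], Vec<String>> {
    v.try_into()
}
```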

Is it possible to map a function over a Vec without allocating a new Vec?

I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it was removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while the Vec has ownership of its elements. One of the two has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect.
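Here is a minimal sketch of the Option::take variant, assuming you are free to store Option<SomeType> in the Vec. transform is the by-value function from the question; transform_all is a name made up for this example.

```rust
#[derive(Debug, PartialEq)]
pub enum SomeType {
    VariantA(String),
    VariantB(String, i32),
}

/// The by-value transformation from the question.
pub fn transform(x: SomeType) -> SomeType {
    match x {
        SomeType::VariantA(s) => SomeType::VariantB(s, 0),
        SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
    }
}

/// Applies `transform` to every occupied slot, reusing the existing storage.
pub fn transform_all(data: &mut [Option<SomeType>]) {
    for slot in data.iter_mut() {
        // `take` moves the value out and leaves `None` behind, so the slot
        // always holds a valid value, even if `transform` panics.
        if let Some(old) = slot.take() {
            *slot = Some(transform(old));
        }
    }
}
```

Here None plays the role of the Dummy variant, but it lives in the wrapper instead of in SomeType itself.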
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's a time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact moment, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
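To make that concrete, here is a small sketch (function names invented, and a String standing in for SomeType) that swaps in a placeholder, panics, and then looks at the slot after the panic has been caught.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Swaps a placeholder into the slot and then panics, imitating a
/// transformation that fails midway.
pub fn replace_then_panic(slot: &mut String) {
    let _old = std::mem::replace(slot, String::from("dummy"));
    panic!("transform failed");
}

/// Returns whether the call panicked and what the slot held afterwards.
pub fn observe_after_panic() -> (bool, String) {
    let mut value = String::from("real");
    // `AssertUnwindSafe` is needed because the closure captures `&mut value`.
    let result = catch_unwind(AssertUnwindSafe(|| replace_then_panic(&mut value)));
    (result.is_err(), value)
}
```

After the caught panic the slot holds "dummy", not "real": the placeholder is observable by any code that runs afterwards, including destructors.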
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of it having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
// Pass `&f` so the closure is borrowed rather than moved on the first iteration.
take_mut::take(e, &f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.
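If you would rather not pull in a crate, a std-only variant of the on-demand placeholder is possible when T: Default. This is a sketch of a swapped-in technique (std::mem::take), not the take_mut API, and map_in_place_default is a hypothetical name.

```rust
/// Maps `f` over the slice in place, parking `T::default()` in each slot
/// while `f` runs on the value that was moved out.
pub fn map_in_place_default<T, F>(v: &mut [T], f: F)
where
    T: Default,
    F: Fn(T) -> T,
{
    for e in v {
        // `mem::take` moves the element out and leaves `T::default()` behind.
        let old = std::mem::take(e);
        *e = f(old);
    }
}
```

If f panics, the slot being processed is left holding the default value, and the Vec's allocation is reused throughout.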

Is there a way to pre- & un-leak a value?

I'm currently looking into doing more stuff with arrays, but I think the performance of those operations could be even better if we were allowed to somehow transmute into a Leaked<T> the array up front, only to un-leak it when the function ends. This would let us use leak amplification without a) introducing unsafety and b) setting up a catch_panic(_). Is this somehow possible in Rust?
For example, creating a generic array from an iterator (this obviously does not work):
#[inline]
fn map_inner<I, S, F, T, N>(list: I, f: F) -> GenericArray<T, N>
where I: IntoIterator<Item=S>, F: Fn(&S) -> T, N: ArrayLength<T> {
unsafe {
// pre-leak the whole array, it's uninitialized anyway
let mut res : GenericArray<Leaked<T>, N> = std::mem::uninitialized();
let i = list.into_iter();
for r in res.iter_mut() {
// this could panic anytime
std::ptr::write(r, Leaked::new(f(i.next().unwrap())))
}
// transmuting un-leaks the array
std::mem::transmute::<GenericArray<Leaked<T>, N>,
GenericArray<T, N>>(res)
}
}
I should note that if we either had compile-time access to the size of T or a type that can hide its innards from borrowck (like Leaked<T> in the example), this is perfectly feasible.
It is possible using nodrop, but it could leak.
fn map_inner<I, S, F, T, N>(list: I, f: F) -> GenericArray<T, N>
where I: IntoIterator<Item=S>, F: Fn(&S) -> T, N: ArrayLength<T> {
unsafe {
// pre-leak the whole array, it's uninitialized anyway
let mut res : NoDrop<GenericArray<T, N>> = NoDrop::new(std::mem::uninitialized());
let mut i = list.into_iter();
for r in res.iter_mut() {
// this could panic anytime
std::ptr::write(r, f(&i.next().unwrap()))
}
res.into_inner()
}
}
Let's suppose that after the first item (a) is consumed from i and written to r, a panic happens. The remaining items from i would be dropped, but the item a would not. Although leaking memory is not considered unsafe, it is not desirable.
I think that the approach described in the question link is the way to go. It is similar to the Vec and ArrayVec implementations. I'm using a similar approach in an array library that I'm writing.
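For comparison, here is a safe stand-in that borrows Vec's bookkeeping instead of hand-written unsafe code. It is not the approach from the linked question, it uses a plain const-generic array in place of GenericArray, and it pays for a heap allocation; map_to_array is a made-up name.

```rust
use std::convert::TryInto;

/// Maps up to `N` items through `f` and returns them as an array,
/// or `None` if the input yields fewer than `N` items.
pub fn map_to_array<I, S, F, T, const N: usize>(list: I, f: F) -> Option<[T; N]>
where
    I: IntoIterator<Item = S>,
    F: Fn(&S) -> T,
{
    // The Vec tracks how many elements are initialized. If `f` panics,
    // unwinding drops the Vec and every element pushed so far, so
    // nothing is leaked.
    let mut v: Vec<T> = Vec::with_capacity(N);
    for s in list.into_iter().take(N) {
        v.push(f(&s));
    }
    // Moves the elements into an array; fails only on a length mismatch.
    v.try_into().ok()
}
```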

error: cannot assign to immutable indexed content `i[..]`

In the following rust code I am trying to change the contents of an array:
let mut example_state = [[0;8]; 2];
for mut i in example_state.iter() {
let mut k = 0;
for j in i.iter(){
i[k] = 9u8;
k +=1
}
}
However I get the error message:
src/main.rs:18:13: 18:23 error: cannot assign to immutable indexed content `i[..]`
src/main.rs:18 i[k] = 9u8;
which I'm confused by because I am defining i to be mut and example_state is also mutable.
I also don't know if this is the best way to change the contents of an array - do I need the counter k or can I simply use the iterator j in some way?
UPDATE:
So I found that this block of code works:
let mut example_state = [[0;8]; 2];
for i in example_state.iter_mut() {
for j in i.iter_mut(){
*j = 9u8;
}
}
but I would appreciate some explanation of what the difference is between them; iter_mut doesn't throw up much on Google.
Let's look at the signatures of the two methods, iter and iter_mut:
fn iter(&self) -> Iter<T>;
fn iter_mut(&mut self) -> IterMut<T>;
And the structs they return, Iter and IterMut, specifically the implementation of Iterator:
// Iter
type Item = &'a T
// IterMut
type Item = &'a mut T
These are associated types, but basically in this case, they specify the return type of calling Iterator::next (wrapped in an Option). When you used iter, even though it was on a mutable variable, you were asking for an iterator to immutable references to a type T (&T). That's why you weren't able to mutate them!
When you switched to iter_mut, the return type of Iterator::next is &mut T, a mutable reference to a type T. You are allowed to set these values!
As an aside, your question used arrays, not slices, but there aren't documentation links for arrays (that I could find quickly), and slices are close enough to arrays so I used them for this explanation.
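Putting the two side by side, as a small sketch with invented function names: fill needs &mut access and therefore iter_mut, while total only reads and can use iter.

```rust
/// Writes `value` into every cell; requires mutable access to the array.
pub fn fill(state: &mut [[u8; 8]; 2], value: u8) {
    for row in state.iter_mut() {
        // `row` is `&mut [u8; 8]`, `cell` is `&mut u8`.
        for cell in row.iter_mut() {
            *cell = value;
        }
    }
}

/// Sums every cell; shared access is enough, so `iter` works.
pub fn total(state: &[[u8; 8]; 2]) -> u32 {
    state
        .iter()
        .flat_map(|row| row.iter())
        .map(|&cell| u32::from(cell))
        .sum()
}
```

Note that neither loop needs a counter like k from the question, because the iterator already hands out a reference to each element.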
There are two orthogonal concepts going on here:
Whether the reference itself is mutable. That's the difference between i and mut i.
Whether the data it points to is mutable. That's the difference between .iter()/&T and .iter_mut()/&mut T.
If you use C, this distinction should be familiar. Your initial code creates mutable references to immutable data, or const char * in C. So while you can assign to the reference itself (i = ...), you can't modify the data it points to (*i = ...). That's why the compiler stops you.
On the other hand, your fixed code creates immutable references to mutable data. That's char * const in C. This doesn't let you assign to the reference itself, but it does let you modify the underlying array, so it compiles as expected.
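The two cases can be sketched with plain integers (the function name is made up for illustration). The commented-out lines are the ones the compiler would reject.

```rust
/// Demonstrates a `mut` binding to `&T` versus a plain binding to `&mut T`.
pub fn binding_vs_data() -> (i32, i32, i32) {
    let a = 1;
    let b = 2;

    // Mutable binding to immutable data.
    let mut shared: &i32 = &a;
    let first = *shared;
    shared = &b; // re-pointing the reference is allowed
    let second = *shared;
    // *shared = 5; // would not compile: the data behind `&i32` is immutable

    // Immutable binding to mutable data.
    let mut c = 3;
    let unique: &mut i32 = &mut c;
    *unique += 10; // modifying the pointee is allowed
    // unique = &mut other; // would not compile: `unique` is not a `mut` binding

    (first, second, c)
}
```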
So why does Rust have a separate .iter() and .iter_mut()? Because in Rust, while you can take as many &T to a structure as you want, you can only modify it through a single &mut T. In other words, mutable references are unique and never alias.
Having both .iter() and .iter_mut() gives you a choice. On one hand, you can have any number of immutable iterators in scope at once, all pointing to the same array. Here's a silly example that iterates forwards and backwards at the same time:
for (i, j) in array.iter().zip(array.iter().rev()) {
println!("{} {}", i, j);
}
But if you want a mutable iterator, you have to guarantee the references never alias. So this won't work:
// Won't compile
for (i, j) in array.iter_mut().zip(array.iter_mut().rev()) {
println!("{} {}", i, j);
}
because the compiler can't guarantee i and j don't point to the same location in memory.
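When you really do need two mutable cursors into the same array, one option is to split it into disjoint halves first. The sketch below assumes the standard slice method split_at_mut and uses it to reverse a slice in place; reverse_in_place is just an illustrative name.

```rust
/// Reverses a slice by walking forwards through the front half and
/// backwards through the back half at the same time.
pub fn reverse_in_place(array: &mut [i32]) {
    let mid = array.len() / 2;
    // `split_at_mut` returns two non-overlapping `&mut` slices, so the
    // compiler can see that `i` and `j` never alias.
    let (front, back) = array.split_at_mut(mid);
    for (i, j) in front.iter_mut().zip(back.iter_mut().rev()) {
        std::mem::swap(i, j);
    }
}
```

For odd lengths the middle element sits at the start of the back half and is never paired, so it stays where it is.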
