Extract original slice from SliceStorage and SliceStorageMut


I am working on some software where I am managing a buffer of floats in a Vec<T> where T is either an f32 or f64. I sometimes need to interpret this buffer, or sections of it, as a mathematical vector. To this end, I am taking advantage of MatrixSlice and friends in nalgebra.
For example, I can create a DVectorSliceMut like this:
fn as_vector<'a>(slice: &'a mut [f64]) -> DVectorSliceMut<'a, f64> {
DVectorSliceMut::from(slice)
}
However, sometimes I need to later extract the original slice from the DVectorSliceMut with the original lifetime 'a. Is there a way to do this?
The StorageMut trait has a as_mut_slice member function, but the lifetime of the returned slice is the lifetime of the reference to the Storage implementor, not the original slice. I am okay with a solution which consumes the DVectorSliceMut if necessary.
Update: Methods into_slice and into_slice_mut have been respectively added to the SliceStorage and SliceStorageMut types as of nalgebra v0.28.0.
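The lifetime difference described above can be reproduced without nalgebra. Below is a minimal sketch using a hypothetical VecViewMut wrapper (not a nalgebra type; every name in it is made up for illustration). It shows that a &mut self accessor can only hand out a reborrow tied to the view, while a consuming method can return the reference with its original lifetime 'a.

```rust
/// Hypothetical stand-in for a mutable vector view; not part of nalgebra.
struct VecViewMut<'a> {
    data: &'a mut [f64],
}

impl<'a> VecViewMut<'a> {
    /// Reborrows: the returned slice cannot outlive the borrow of `self`.
    fn as_mut_slice(&mut self) -> &mut [f64] {
        &mut *self.data
    }

    /// Consumes the view and returns the original reference, lifetime 'a intact.
    fn into_slice(self) -> &'a mut [f64] {
        self.data
    }
}

/// Wraps `buf`, edits it through the view, then recovers the original slice.
fn round_trip<'a>(buf: &'a mut [f64]) -> &'a mut [f64] {
    let mut view = VecViewMut { data: buf };
    if let Some(first) = view.as_mut_slice().first_mut() {
        *first = 1.0; // short-lived borrow of `view`
    }
    // Returning `view.as_mut_slice()` here would not compile, because that
    // slice borrows the local `view`; consuming the view avoids the problem.
    view.into_slice()
}
```

Calling round_trip(&mut buf) returns a slice that is still usable for the whole lifetime of the borrow of buf, which is the behaviour the question asks for.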

Given the current API of nalgebra (v0.27.1), there isn't much you can do except:
live with the shorter lifetime of StorageMut::as_mut_slice
make a feature request for such a function at nalgebra (which it seems you already did)
employ your own unsafe code to turn StorageMut::ptr_mut into a &'a mut
You could go with the third option until nalgebra gets updated and implement something like this in your own code:
use nalgebra::DVectorSliceMut;
use nalgebra::base::dimension::Dim;
use nalgebra::base::storage::Storage;
use nalgebra::base::storage::StorageMut;
fn into_slice<'a>(vec: DVectorSliceMut<'a, f64>) -> &'a mut [f64] {
let mut inner = vec.data;
// from nalgebra
// https://docs.rs/nalgebra/0.27.1/src/nalgebra/base/matrix_slice.rs.html#190
let (nrows, ncols) = inner.shape();
if nrows.value() != 0 && ncols.value() != 0 {
let sz = inner.linear_index(nrows.value() - 1, ncols.value() - 1);
unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), sz + 1) }
} else {
unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), 0) }
}
}

Methods into_slice and into_slice_mut, which return the original slice, have been respectively added to the SliceStorage and SliceStorageMut types as of nalgebra v0.28.0.

Related

Rust: how to assign `iter().map()` or `iter().enumerate()` to same variable

struct A {...whatever...};
const MY_CONST_USIZE:usize = -127;
// somewhere in function
// vec1_of_A:Vec<A> vec2_of_A_refs:Vec<&A> have values from different data sources and have different inside_item types
let my_iterator;
if my_rand_condition() { // my_rand_condition is random and compiles for sake of simplicity
my_iterator = vec1_of_A.iter().map(|x| (MY_CONST_USIZE, &x)); // Map<Iter<Vec<A>>>
} else {
my_iterator = vec2_of_A_refs.iter().enumerate(); // Enumerate<Iter<Vec<&A>>>
}
How can I make this code compile?
In the end (based on the condition), I would like to have a single iterator that can be built from either input, and I don't know how to fit these Map and Enumerate types into one variable without calling collect() to materialize the iterator as a Vec.
Pointers to reading material are welcome.

In the vec1_of_A case, first you need to replace &x with x in your map function. The code you have will never compile because the mapping closure tries to return a reference to one of its parameters, which is never allowed in Rust. To make the types match up, you need to dereference the &&A in the vec2_of_A_refs case to &A instead of trying to add a reference to the other.
Also, -127 is an invalid value for usize, so you need to pick a valid value, or use a different type than usize.
Having fixed those, now you need some type of dynamic dispatch. The simplest approach would be boxing into a Box<dyn Iterator>.
Here is a complete example:
#![allow(unused)]
#![allow(non_snake_case)]
struct A;
// Fixed to be a valid usize.
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator: Box<dyn Iterator<Item=(usize, &A)>>;
if my_rand_condition() {
// Fixed to return x instead of &x
my_iterator = Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
// Added map to deref &&A to &A to make the types match
my_iterator = Box::new(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
Instead of a boxed trait object, you could also use the Either type from the either crate. This is an enum with Left and Right variants, but the Either type itself implements Iterator if both the left and right types also do, with the same type for the Item associated type. For example:
#![allow(unused)]
#![allow(non_snake_case)]
use either::Either;
struct A;
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator;
if my_rand_condition() {
my_iterator = Either::Left(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
my_iterator = Either::Right(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
Why would you choose one approach over the other?
Pros of the Either approach:
It does not require a heap allocation to store the iterator.
It implements dynamic dispatch via match which is likely (but not guaranteed) to be faster than dynamic dispatch via vtable lookup.
Pros of the boxed trait object approach:
It does not depend on any external crates.
It scales easily to many different types of iterators; the Either approach quickly becomes unwieldy with more than two types.
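To make the "dynamic dispatch via match" point concrete, here is a rough sketch of a hand-written two-variant enum that forwards next() to whichever iterator it holds. EitherIter and tagged are hypothetical names used only for illustration; this is a simplified stand-in, not the actual implementation in the either crate, and it uses u8 items instead of the struct A from the question.

```rust
/// Hypothetical minimal stand-in for `either::Either`, for illustration only.
enum EitherIter<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Iterator for EitherIter<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // The "dispatch" is a plain match on the variant: no heap, no vtable.
        match self {
            EitherIter::Left(l) => l.next(),
            EitherIter::Right(r) => r.next(),
        }
    }
}

/// Builds one of two differently typed iterators behind a single return type.
fn tagged<'a>(
    owned: &'a [u8],
    refs: &'a [&'a u8],
    use_owned: bool,
) -> impl Iterator<Item = (usize, &'a u8)> + 'a {
    if use_owned {
        EitherIter::Left(owned.iter().map(|x| (usize::MAX, x)))
    } else {
        // `copied()` turns the `&&u8` items into `&u8` so both sides agree.
        EitherIter::Right(refs.iter().copied().enumerate())
    }
}
```

Adding a third source would need a third variant (or nesting), which is where this approach starts to become unwieldy compared to a boxed trait object.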

You can do this using a boxed trait object like so:
let my_iterator: Box<dyn Iterator<Item = _>> = if my_rand_condition() {
Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)))
} else {
Box::new(vec2_of_A_refs.iter().enumerate().map(|(i, x)| (i, *x)))
};
I don't think this is a good idea generally though. A few things to note:
The use of trait objects means the iterator's concrete type must be resolved dynamically. This adds overhead: a heap allocation for the box and a virtual call for every next().
The closure in vec1's iterator's map method cannot return a reference to its argument. Instead, the second map must be added to vec2's iterator. The effect of this is that all the items are being copied regardless. If you are doing this, why not collect()? The overhead for creating the Vec or whatever you choose should be less than that of the dynamic resolution.
A bit pedantic, but remember that if statements are expressions in Rust, so the assignment can be expressed a little more cleanly, as I have done above.

How to create a Box<UnsafeCell<[T]>>

The recommended way to create a regular boxed slice (i.e. Box<[T]>) seems to be to first create a std::Vec<T>, and use .into_boxed_slice(). However, nothing similar to this seems to work if I want the slice to be wrapped in UnsafeCell.
A solution with unsafe code is fine, but I'd really like to avoid having to manually manage the memory.

Without going through a Vec, the main (not-unsafe) way to create a Box<[T]> directly is via Box::from, given a &[T] as the parameter. This is because [T] is ?Sized and can't be passed as a parameter. This in turn effectively requires T: Copy, because T has to be copied from behind the reference into the new Box. But UnsafeCell is not Copy, regardless of whether T is. Discussion about making UnsafeCell Copy has been going on for years without a final conclusion, due to safety concerns.
If you really, really want a Box<UnsafeCell<[T]>>, there are only two ways:
Because Box and UnsafeCell both implement CoerceUnsized, and [T; N] implements Unsize<[T]>, you can create a Box<UnsafeCell<[T; N]>> and coerce it to a Box<UnsafeCell<[T]>>. This limits you to initializing from fixed-size arrays.
Unsize coercion:
fn main() {
use std::cell::UnsafeCell;
let x: [u8;3] = [1,2,3];
let c: Box<UnsafeCell<[_]>> = Box::new(UnsafeCell::new(x));
}
Because UnsafeCell is #[repr(transparent)], you can create a Box<[T]> and unsafely transmute it to a Box<UnsafeCell<[T]>>, as the UnsafeCell<[T]> is guaranteed to have the same memory layout as a [T], given that [T] doesn't use niche-values (even if T does).
Transmute:
use std::cell::UnsafeCell;
use std::mem;
// enclose the transmute in a function accepting and returning proper type-pairs
fn into_boxed_unsafecell<T>(inp: Box<[T]>) -> Box<UnsafeCell<[T]>> {
unsafe {
mem::transmute(inp)
}
}
fn main() {
let x = vec![1,2,3];
let b = x.into_boxed_slice();
let c: Box<UnsafeCell<[_]>> = into_boxed_unsafecell(b);
}
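As an alternative sketch (my own variation, not something the answer above prescribes), the same conversion can be written with Box::into_raw and Box::from_raw plus a pointer cast, which avoids mem::transmute while relying on the same layout guarantee. The demo function is a hypothetical helper that only shows how the contents can then be reached through UnsafeCell::get.

```rust
use std::cell::UnsafeCell;

/// Converts via raw pointers instead of `mem::transmute`.
fn into_boxed_unsafecell<T>(inp: Box<[T]>) -> Box<UnsafeCell<[T]>> {
    let raw: *mut [T] = Box::into_raw(inp);
    // SAFETY: UnsafeCell<[T]> has the same in-memory representation as [T],
    // and the pointer came from Box::into_raw, so ownership is handed back
    // to exactly one Box.
    unsafe { Box::from_raw(raw as *mut UnsafeCell<[T]>) }
}

/// Hypothetical helper: builds the cell, mutates it, and copies the result out.
fn demo() -> Vec<i32> {
    let c = into_boxed_unsafecell(vec![1, 2, 3].into_boxed_slice());
    // SAFETY: `c` is not shared, so no other reference to the contents exists.
    let contents: &mut [i32] = unsafe { &mut *c.get() };
    contents[0] = 10;
    contents.to_vec()
}
```

Either way, the memory is still owned and freed by the Box, so no manual memory management is needed.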
Having said all this: I strongly suspect you are running into the XY problem. A Box<UnsafeCell<[T]>> is a very strange type (especially compared to UnsafeCell<Box<[T]>>). You may want to give details on what you are trying to accomplish with such a type.

Just swap the pointer types to UnsafeCell<Box<[T]>>:
use std::cell::UnsafeCell;
fn main() {
let mut res: UnsafeCell<Box<[u32]>> = UnsafeCell::new(vec![1, 2, 3, 4, 5].into_boxed_slice());
unsafe {
println!("{}", (*res.get())[1]);
res.get_mut()[1] = 10;
println!("{}", (*res.get())[1]);
}
}

Lifetimes in lambda-based iterators

My question seems to be closely related to Rust error "cannot infer an appropriate lifetime for borrow expression" when attempting to mutate state inside a closure returning an Iterator, but I think it's not the same. So, this
use std::iter;
fn example(text: String) -> impl Iterator<Item = Option<String>> {
let mut i = 0;
let mut chunk = None;
iter::from_fn(move || {
if i <= text.len() {
let p_chunk = chunk;
chunk = Some(&text[..i]);
i += 1;
Some(p_chunk.map(|s| String::from(s)))
} else {
None
}
})
}
fn main() {}
does not compile. The compiler says it cannot determine the appropriate lifetime for &text[..i]. This is the smallest example I could come up with. The idea being, there is an internal state, which is a slice of text, and the iterator returns new Strings allocated from that internal state. I'm new to Rust, so maybe it's all obvious, but how would I annotate lifetimes so that this compiles?
Note that this example is different from the linked example, because there point was passed as a reference, while here text is moved. Also, the answer there is one and a half years old by now, so maybe there is an easier way.
EDIT: Added p_chunk to emphasize that chunk needs to be persistent across calls to next and so cannot be local to the closure but should be captured by it.

Your code is an example of attempting to create a self-referential struct, where the struct is implicitly created by the closure. Since both text and chunk are moved into the closure, you can think of both as members of a struct. As chunk refers to the contents in text, the result is a self-referential struct, which is not supported by the current borrow checker.
While self-referential structs are unsafe in general due to moves, in this case it would be safe because text is heap-allocated and is not subsequently mutated, nor does it escape the closure. Therefore it is impossible for the contents of text to move, and a sufficiently smart borrow checker could prove that what you're trying to do is safe and allow the closure to compile.
The answer to the [linked question] says that referencing through an Option is possible but the structure cannot be moved afterwards. In my case, the self-reference is created after text and chunk were moved in place, and they are never moved again, so in principle it should work.
Agreed - it should work in principle, but it is well known that the current borrow checker doesn't support it. The support would require multiple new features: the borrow checker should special-case heap-allocated types like Box or String whose moves don't affect references into their content, and in this case also prove that you don't resize or mem::replace() the closed-over String.
In this case the best workaround is the "obvious" one: instead of persisting the chunk slice, persist a pair of usize indices (or a Range) and create the slice when you need it.
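A minimal sketch of that workaround, keeping the shape of the question's code but storing a Range<usize> instead of a &str borrowed from text. Like the original, it slices at byte offsets, so it assumes ASCII input; a multi-byte character would make the slicing panic.

```rust
use std::iter;
use std::ops::Range;

fn example(text: String) -> impl Iterator<Item = Option<String>> {
    let mut i = 0;
    // Persist only the bounds of the previous chunk, not a borrow of `text`,
    // so the closure never holds a reference into its own captured state.
    let mut chunk: Option<Range<usize>> = None;
    iter::from_fn(move || {
        if i <= text.len() {
            // Range is not Copy, so take() moves the old value out of the
            // captured variable and leaves None behind.
            let p_chunk = chunk.take();
            chunk = Some(0..i);
            i += 1;
            // The slice is created only when it is actually needed.
            Some(p_chunk.map(|r| text[r].to_string()))
        } else {
            None
        }
    })
}
```

For the input "ab" this yields None, then Some(""), then Some("a"): each call returns the chunk recorded by the previous call.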

If you move the chunk Option into the closure, your code compiles. I can't quite answer why declaring chunk outside the closure results in a lifetime error for the borrow of text inside the closure, but the chunk Option looks superfluous anyways and the following code should be equivalent:
fn example(text: String) -> impl Iterator<Item = Option<String>> {
let mut i = 0;
iter::from_fn(move || {
if i <= text.len() {
let chunk = text[..i].to_string();
i += 1;
Some(Some(chunk))
} else {
None
}
})
}
Additionally, it seems unlikely that you really want an Iterator<Item = Option<String>> here instead of an Iterator<Item = String>, since the iterator never yields Some(None) anyways.
fn example(text: String) -> impl Iterator<Item = String> {
let mut i = 0;
iter::from_fn(move || {
if i <= text.len() {
let chunk = text[..i].to_string();
i += 1;
Some(chunk)
} else {
None
}
})
}
Note, you can also go about this iterator without allocating a String for each chunk, if you take a &str as an argument and tie the lifetime of the output to the input argument:
fn example<'a>(text: &'a str) -> impl Iterator<Item = &'a str> + 'a {
let mut i = 0;
iter::from_fn(move || {
if i <= text.len() {
let chunk = &text[..i];
i += 1;
Some(chunk)
} else {
None
}
})
}

Can a type know when a mutable borrow to itself has ended?

I have a struct and I want to call one of the struct's methods every time a mutable borrow to it has ended. To do so, I would need to know when the mutable borrow to it has been dropped. How can this be done?

Disclaimer: The answer that follows describes a possible solution, but it's not a very good one, as described by this comment from Sebastien Redl:
[T]his is a bad way of trying to maintain invariants. Mostly because dropping the reference can be suppressed with mem::forget. This is fine for RefCell, where if you don't drop the ref, you will simply eventually panic because you didn't release the dynamic borrow, but it is bad if violating the "fraction is in shortest form" invariant leads to weird results or subtle performance issues down the line, and it is catastrophic if you need to maintain the "thread doesn't outlive variables in the current scope" invariant.
Nevertheless, it's possible to use a temporary struct as a "staging area" that updates the referent when it's dropped, and thus maintain the invariant correctly; however, that version basically amounts to making a proper wrapper type and a kind of weird way to use it. The best way to solve this problem is through an opaque wrapper struct that doesn't expose its internals except through methods that definitely maintain the invariant.
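As a rough illustration of the opaque-wrapper idea, here is a sketch in which the fields are private and every mutating method restores the invariant before returning, so nothing depends on a destructor running. The method names add_to_numerator and parts, and the gcd helper, are hypothetical choices for this example.

```rust
/// A fraction that is always stored fully reduced.
pub struct Fraction {
    // Private fields: outside code cannot break the invariant directly.
    numerator: i32,
    denominator: i32,
}

/// Greatest common divisor (Euclid's algorithm), always non-negative.
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a.abs()
}

impl Fraction {
    pub fn new(numerator: i32, denominator: i32) -> Self {
        assert!(denominator != 0, "denominator must be nonzero");
        let mut f = Fraction { numerator, denominator };
        f.reduce();
        f
    }

    /// Restores the invariant; called at the end of every mutating method.
    fn reduce(&mut self) {
        let g = gcd(self.numerator, self.denominator);
        self.numerator /= g;
        self.denominator /= g;
    }

    /// Hypothetical mutator: changes the value, then reduces.
    pub fn add_to_numerator(&mut self, n: i32) {
        self.numerator += n;
        self.reduce();
    }

    /// Read-only access to the (numerator, denominator) pair.
    pub fn parts(&self) -> (i32, i32) {
        (self.numerator, self.denominator)
    }
}
```

Because callers never get a &mut to the raw fields, mem::forget has nothing to suppress: the reduction happens inside the method call itself.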
Without further ado, the original answer:
Not exactly... but pretty close. We can use RefCell<T> as a model for how this can be done. It's a bit of an abstract question, but I'll use a concrete example to demonstrate. (This won't be a complete example, but something to show the general principles.)
Let's say you want to make a Fraction struct that is always in simplest form (fully reduced, e.g. 3/5 instead of 6/10). You write a struct RawFraction that will contain the bare data. RawFraction instances are not always in simplest form, but they have a method fn reduce(&mut self) that reduces them.
Now you need a smart pointer type that you will always use to mutate the RawFraction, which calls .reduce() on the pointed-to struct when it's dropped. Let's call it RefMut, because that's the naming scheme RefCell uses. You implement Deref<Target = RawFraction>, DerefMut, and Drop on it, something like this:
use std::ops::{Deref, DerefMut};
pub struct RefMut<'a>(&'a mut RawFraction);
impl<'a> Deref for RefMut<'a> {
type Target = RawFraction;
fn deref(&self) -> &RawFraction {
self.0
}
}
impl<'a> DerefMut for RefMut<'a> {
fn deref_mut(&mut self) -> &mut RawFraction {
self.0
}
}
impl<'a> Drop for RefMut<'a> {
fn drop(&mut self) {
self.0.reduce();
}
}
Now, whenever you have a RefMut to a RawFraction and drop it, you know the RawFraction will be in simplest form afterwards. All you need to do at this point is ensure that RefMut is the only way to get &mut access to the RawFraction part of a Fraction.
pub struct Fraction(RawFraction);
impl Fraction {
pub fn new(numerator: i32, denominator: i32) -> Self {
// create a RawFraction, reduce it and wrap it up
}
pub fn borrow_mut(&mut self) -> RefMut<'_> {
RefMut(&mut self.0)
}
}
Pay attention to the pub markings (and lack thereof): I'm using those to ensure the soundness of the exposed interface. All three types should be placed in a module by themselves. It would be incorrect to mark the RawFraction field pub inside Fraction, since then it would be possible (for code outside the module) to create an unreduced Fraction without using new or get a &mut RawFraction without going through RefMut.
Supposing all this code is placed in a module named frac, you can use it something like this (assuming Fraction implements Display):
let mut f = frac::Fraction::new(3, 10);
println!("{}", f); // prints 3/10
f.borrow_mut().numerator += 3;
println!("{}", f); // prints 3/5
The types encode the invariant: Wherever you have Fraction, you can know that it's fully reduced. When you have a RawFraction, &RawFraction, etc., you can't be sure. If you want, you may also make RawFraction's fields non-pub, so that you can't get an unreduced fraction at all except by calling borrow_mut on a Fraction.

Basically the same thing is done in RefCell. There you want to reduce the runtime borrow-count when a borrow ends. Here you want to perform an arbitrary action.
So let's re-use the concept of writing a function that returns a wrapped reference:
struct Data {
content: i32,
}
impl Data {
fn borrow_mut(&mut self) -> DataRef {
println!("borrowing");
DataRef { data: self }
}
fn check_after_borrow(&self) {
if self.content > 50 {
println!("Hey, content should be <= {:?}!", 50);
}
}
}
struct DataRef<'a> {
data: &'a mut Data
}
impl<'a> Drop for DataRef<'a> {
fn drop(&mut self) {
println!("borrow ends");
self.data.check_after_borrow()
}
}
fn main() {
let mut d = Data { content: 42 };
println!("content is {}", d.content);
{
let b = d.borrow_mut();
//let c = &d; // Compiler won't let you have another borrow at the same time
b.data.content = 123;
println!("content set to {}", b.data.content);
} // borrow ends here
println!("content is now {}", d.content);
}
This results in the following output:
content is 42
borrowing
content set to 123
borrow ends
Hey, content should be <= 50!
content is now 123
Be aware that you can still obtain an unchecked mutable borrow with e.g. let c = &mut d;. This will be silently dropped without calling check_after_borrow.

How to achieve equivalent of take_while on a slice?

Rust slices do not currently support some iterator methods, e.g. take_while. What is the best way to implement take_while for slices?
const STRHELLO:&'static[u8] = b"HHHello";
fn main() {
let subslice:&[u8] = STRHELLO.iter().take_while(|c|(**c=='H' as u8)).collect();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3),subslice);
assert!(subslice==STRHELLO.slice_to(3));
}
results in the error:
<anon>:6:74: 6:83 error: the trait `core::iter::FromIterator<&u8>` is not implemented for the type `&[u8]`
This code in the playpen:
http://is.gd/1xkcUa

First of all, the issue you have is that collect is about creating a new collection, while a slice is about referencing a contiguous range of items in an existing array (be it dynamically allocated or not).
I am afraid that due to the nature of traits, the fact that the original container (STRHELLO) was a contiguous range has been lost, and cannot be reconstructed after the fact. I am also afraid that any use of "generic" iterators simply cannot lead to the desired output; the type system would have to somehow carry the fact that:
the original container was a contiguous range
the chain of operations performed so far conserve this property
This may be doable or not, but I do not see it done now, and I am unsure in what way it could be elegantly implemented.
On the other hand, you can go about it in the do-it-yourself way:
fn take_while<'a>(initial: &'a [u8], predicate: |&u8| -> bool) -> &'a [u8] {
let mut i = 0u;
for c in initial.iter() {
if predicate(c) { i += 1; } else { break; }
}
initial.slice_to(i)
}
And then:
fn main() {
let subslice: &[u8] = take_while(STRHELLO, |c|(*c==b'H'));
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
}
Note: 'H' as u8 can be rewritten as b'H' as shown here, which is symmetric with the strings.
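The snippets above use pre-1.0 syntax (0u, slice_to, and |&u8| -> bool closure types), which current compilers reject. A sketch of the same do-it-yourself approach in current Rust might look like this, assuming a generic element type and an FnMut predicate (both are my choices, not part of the original answer):

```rust
/// Returns the longest prefix of `initial` whose elements all satisfy `predicate`.
fn take_while<'a, T>(initial: &'a [T], mut predicate: impl FnMut(&T) -> bool) -> &'a [T] {
    // Index of the first element that fails the predicate, or the full length.
    let end = initial
        .iter()
        .position(|x| !predicate(x))
        .unwrap_or(initial.len());
    // Slicing the original keeps the result tied to the input's lifetime
    // and avoids any allocation.
    &initial[..end]
}
```

With this, take_while(b"HHHello", |c| *c == b'H') yields the sub-slice b"HHH" of the original data.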

It is possible via some heavy gymnastics to implement this functionality using the stock iterators:
use std::raw::Slice;
use std::mem::transmute;
/// Splice together two slices of the same type that are contiguous in memory.
/// Panics if the slices aren't contiguous with "a" coming first.
/// i.e. slice b must follow slice a immediately in memory.
fn splice<'a>(a:&'a[u8], b:&'a[u8]) -> &'a[u8] {
unsafe {
let aa:Slice<u8> = transmute(a);
let bb:Slice<u8> = transmute(b);
let pa = aa.data as *const u8;
let pb = bb.data as *const u8;
let off = aa.len as int; // Risks overflow into negative!!!
assert!(pa.offset(off) == pb, "Slices were not contiguous!");
let cc = Slice{data:aa.data,len:aa.len+bb.len};
transmute(cc)
}
}
/// Wrapper around splice that lets you use None as a base case for fold
/// Will panic if the slices cannot be spliced! See splice.
fn splice_for_fold<'a>(oa:Option<&'a[u8]>, b:&'a[u8]) -> Option<&'a[u8]> {
match oa {
Some(a) => Some(splice(a,b)),
None => Some(b),
}
}
/// Implementation using pure iterators
fn take_while<'a>(initial: &'a [u8],
predicate: |&u8| -> bool) -> Option<&'a [u8]> {
initial
.chunks(1)
.take_while(|x|(predicate(&x[0])))
.fold(None, splice_for_fold)
}
usage:
const STRHELLO:&'static[u8] = b"HHHello";
let subslice: &[u8] = super::take_while(STRHELLO, |c|(*c==b'H')).unwrap();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
Matthieu's implementation is way cleaner if you just need take_while. I am posting this anyway since it may be a path towards solving the more general problem of using iterator functions on slices cleanly.
