How to achieve equivalent of take_while on a slice? - rust

Rust slices do not currently support some iterator methods, e.g. take_while. What is the best way to implement take_while for slices?
const STRHELLO:&'static[u8] = b"HHHello";
fn main() {
let subslice:&[u8] = STRHELLO.iter().take_while(|c|(**c=='H' as u8)).collect();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3),subslice);
assert!(subslice==STRHELLO.slice_to(3));
}
results in the error:
<anon>:6:74: 6:83 error: the trait `core::iter::FromIterator<&u8>` is not implemented for the type `&[u8]`
This code in the playpen:
http://is.gd/1xkcUa

First of all, the issue you have is that collect is about creating a new collection, while a slice is about referencing a contiguous range of items in an existing array (be it dynamically allocated or not).
I am afraid that due to the nature of traits, the fact that the original container (STRHELLO) was a contiguous range has been lost, and cannot be reconstructed after the fact. I am also afraid that any use of "generic" iterators simply cannot lead to the desired output; the type system would have to somehow carry the fact that:
the original container was a contiguous range
the chain of operations performed so far conserves this property
This may be doable or not, but I do not see it done now, and I am unsure in what way it could be elegantly implemented.
On the other hand, you can go about it in the do-it-yourself way:
fn take_while<'a>(initial: &'a [u8], predicate: |&u8| -> bool) -> &'a [u8] {
let mut i = 0u;
for c in initial.iter() {
if predicate(c) { i += 1; } else { break; }
}
initial.slice_to(i)
}
And then:
fn main() {
let subslice: &[u8] = take_while(STRHELLO, |c|(*c==b'H'));
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
}
Note: 'H' as u8 can be rewritten as b'H' as shown here, which is symmetric with the strings.
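The code above uses pre-1.0 syntax (boxed closure types, 0u, slice_to) that current compilers reject. A minimal sketch of the same do-it-yourself idea in present-day Rust, assuming a generic element type and a closure parameter (neither of which is in the original answer), could look like this:

```rust
/// Returns the longest prefix of `initial` whose elements all satisfy `predicate`.
/// The result borrows from `initial`, so nothing is copied or allocated.
fn take_while<'a, T>(initial: &'a [T], mut predicate: impl FnMut(&T) -> bool) -> &'a [T] {
    // Index of the first element that fails the predicate, or the full
    // length if every element passes.
    let end = initial
        .iter()
        .position(|item| !predicate(item))
        .unwrap_or(initial.len());
    &initial[..end]
}

fn main() {
    const STRHELLO: &[u8] = b"HHHello";
    let subslice = take_while(STRHELLO, |c| *c == b'H');
    assert_eq!(subslice, &STRHELLO[..3]);
    println!("Expecting: {:?}, Got {:?}", &STRHELLO[..3], subslice);
}
```

The key point is unchanged from the answer: the iterator is only used to find an index, and the subslice is then taken from the original slice.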

It is possible via some heavy gymnastics to implement this functionality using the stock iterators:
use std::raw::Slice;
use std::mem::transmute;
/// Splice together two slices of the same type that are contiguous in memory.
/// Panics if the slices aren't contiguous with "a" coming first.
/// i.e. slice b must follow slice a immediately in memory.
fn splice<'a>(a:&'a[u8], b:&'a[u8]) -> &'a[u8] {
unsafe {
let aa:Slice<u8> = transmute(a);
let bb:Slice<u8> = transmute(b);
let pa = aa.data as *const u8;
let pb = bb.data as *const u8;
let off = aa.len as int; // Risks overflow into negative!!!
assert!(pa.offset(off) == pb, "Slices were not contiguous!");
let cc = Slice{data:aa.data,len:aa.len+bb.len};
transmute(cc)
}
}
/// Wrapper around splice that lets you use None as a base case for fold
/// Will panic if the slices cannot be spliced! See splice.
fn splice_for_fold<'a>(oa:Option<&'a[u8]>, b:&'a[u8]) -> Option<&'a[u8]> {
match oa {
Some(a) => Some(splice(a,b)),
None => Some(b),
}
}
/// Implementation using pure iterators
fn take_while<'a>(initial: &'a [u8],
predicate: |&u8| -> bool) -> Option<&'a [u8]> {
initial
.chunks(1)
.take_while(|x|(predicate(&x[0])))
.fold(None, splice_for_fold)
}
usage:
const STRHELLO:&'static[u8] = b"HHHello";
let subslice: &[u8] = super::take_while(STRHELLO, |c|(*c==b'H')).unwrap();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
Matthieu's implementation is way cleaner if you just need take_while. I am posting this anyway since it may be a path towards solving the more general problem of using iterator functions on slices cleanly.

Related

Rust: how to assign `iter().map()` or `iter().enumerate()` to same variable

struct A {...whatever...};
const MY_CONST_USIZE:usize = 127;
// somewhere in function
// vec1_of_A:Vec<A> vec2_of_A_refs:Vec<&A> have values from different data sources and have different inside_item types
let my_iterator;
if my_rand_condition() { // my_rand_condition is random and compiles for sake of simplicity
my_iterator = vec1_of_A.iter().map(|x| (MY_CONST_USIZE, &x)); // Map<Iter<Vec<A>>>
} else {
my_iterator = vec2_of_A_refs.iter().enumerate(); // Enumerate<Iter<Vec<&A>>>
}
How can I make this code compile?
In the end (based on the condition) I would like to have an iterator that can be built from either input, and I don't know how to fit these Map and Enumerate types into a single variable without calling collect() to materialize the iterator as a Vec.
Reading material is welcome.
In the vec1_of_A case, first you need to replace &x with x in your map function. The code you have will never compile because the mapping closure tries to return a reference to one of its parameters, which is never allowed in Rust. To make the types match up, you need to dereference the &&A in the vec2_of_A_refs case to &A instead of trying to add a reference to the other.
Also, -127 is an invalid value for usize, so you need to pick a valid value, or use a different type than usize.
Having fixed those, now you need some type of dynamic dispatch. The simplest approach would be boxing into a Box<dyn Iterator>.
Here is a complete example:
#![allow(unused)]
#![allow(non_snake_case)]
struct A;
// Fixed to be a valid usize.
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator: Box<dyn Iterator<Item=(usize, &A)>>;
if my_rand_condition() {
// Fixed to return x instead of &x
my_iterator = Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
// Added map to deref &&A to &A to make the types match
my_iterator = Box::new(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
Instead of a boxed trait object, you could also use the Either type from the either crate. This is an enum with Left and Right variants, but the Either type itself implements Iterator if both the left and right types also do, with the same type for the Item associated type. For example:
#![allow(unused)]
#![allow(non_snake_case)]
use either::Either;
struct A;
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator;
if my_rand_condition() {
my_iterator = Either::Left(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
my_iterator = Either::Right(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
Why would you choose one approach over the other?
Pros of the Either approach:
It does not require a heap allocation to store the iterator.
It implements dynamic dispatch via match which is likely (but not guaranteed) to be faster than dynamic dispatch via vtable lookup.
Pros of the boxed trait object approach:
It does not depend on any external crates.
It scales easily to many different types of iterators; the Either approach quickly becomes unwieldy with more than two types.
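If the external dependency is the only objection to Either, the same match-based dispatch can be written by hand. The sketch below is an illustration under assumptions: EitherIter and pick are hypothetical names, A is given a field only so the output can be checked, and this is not the either crate's actual implementation.

```rust
/// Hypothetical hand-written stand-in for `either::Either`.
enum EitherIter<L, R> {
    Left(L),
    Right(R),
}

/// The wrapper is an iterator whenever both sides yield the same item type.
impl<L, R> Iterator for EitherIter<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Dispatch with a plain `match`; no heap allocation, no vtable.
        match self {
            EitherIter::Left(l) => l.next(),
            EitherIter::Right(r) => r.next(),
        }
    }
}

struct A(u32);

/// Builds one of two differently typed iterators behind a single return type.
fn pick<'a>(
    use_first: bool,
    owned: &'a [A],
    refs: &'a [&'a A],
) -> impl Iterator<Item = (usize, &'a A)> + 'a {
    if use_first {
        EitherIter::Left(owned.iter().map(|x| (usize::MAX, x)))
    } else {
        // `copied()` turns `&&A` into `&A` so both branches agree on the item type.
        EitherIter::Right(refs.iter().copied().enumerate())
    }
}

fn main() {
    let owned = vec![A(1), A(2)];
    let a = A(7);
    let refs = vec![&a, &a];
    for (i, item) in pick(false, &owned, &refs) {
        println!("{} -> {}", i, item.0);
    }
}
```

With more than two iterator types this enum needs one variant per type, which is the scaling problem mentioned in the list above.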
You can do this using a Boxed trait object like so:
let my_iterator: Box<dyn Iterator<Item = _>> = if my_rand_condition() {
Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)))
} else {
Box::new(vec2_of_A_refs.iter().enumerate().map(|(i, x)| (i, *x)))
};
I don't think this is a good idea generally though. A few things to note:
The use of trait objects means the types here must be resolved dynamically. This adds a lot of overhead.
The closure in vec1's iterator's map method cannot return a reference to its argument. Instead, the second map must be added to vec2's iterator. The effect of this is that all the items are being copied regardless. If you are doing this, why not collect()? The overhead for creating the Vec or whatever you choose should be less than that of the dynamic resolution.
Bit pedantic, but remember if statements are expressions in Rust, and so the assignment can be expressed a little more cleanly as I have done above.
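For completeness, here is a minimal sketch of the collect() alternative suggested above. The function name pairs and the field on A are assumptions made for the example, and whether this is actually cheaper than boxing depends on the sizes involved, so it is worth measuring rather than assuming.

```rust
struct A(u32);

const MY_CONST_USIZE: usize = usize::MAX;

/// Materializes either input as a `Vec` of `(index, reference)` pairs.
/// Only references are copied; the `A` values themselves stay where they are.
fn pairs<'a>(use_first: bool, vec1_of_a: &'a [A], vec2_of_a_refs: &[&'a A]) -> Vec<(usize, &'a A)> {
    // `if` is an expression, so both branches simply evaluate to the result.
    if use_first {
        vec1_of_a.iter().map(|x| (MY_CONST_USIZE, x)).collect()
    } else {
        vec2_of_a_refs.iter().copied().enumerate().collect()
    }
}

fn main() {
    let vec1 = vec![A(1), A(2)];
    let extra = A(9);
    let vec2 = vec![&extra];
    for (i, a) in pairs(false, &vec1, &vec2) {
        println!("{} -> {}", i, a.0);
    }
}
```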

Multi-dimensional vector borrowing

I'm trying to implement a coding exercise, but I've run into a wall regarding multi-dimensional vectors and borrowing.
The code is accessible in this playground, but I'll add here a snippet for reference:
type Matrix = Rc<RefCell<Vec<Vec<String>>>>;
/// sequence -> target string
/// dictionary -> array of 'words' that can be used to construct the 'sequence'
/// returns -> 2d array of all the possible combinations to create the 'sequence' from the 'dictionary'
pub fn all_construct<'a>(sequence: &'a str, dictionary: &'a [&str]) -> Matrix {
let memo: Rc<RefCell<HashMap<&str, Matrix>>> = Rc::new(RefCell::new(HashMap::new()));
all_construct_memo(sequence, dictionary, Rc::clone(&memo))
}
fn all_construct_memo<'a>(
sequence: &'a str,
dictionary: &'a [&str],
memo: Rc<RefCell<HashMap<&'a str, Matrix>>>,
) -> Matrix {
if memo.borrow().contains_key(sequence) {
return Rc::clone(&memo.borrow()[sequence]);
}
if sequence.is_empty() {
return Rc::new(RefCell::new(Vec::new()));
}
let ways = Rc::new(RefCell::new(Vec::new()));
for word in dictionary {
if let Some(new_sequence) = sequence.strip_prefix(word) {
let inner_ways = all_construct_memo(new_sequence, dictionary, Rc::clone(&memo));
for mut entry in inner_ways.borrow_mut().into_iter() { // error here
entry.push(word.to_string());
ways.borrow_mut().push(entry);
}
}
}
memo.borrow_mut().insert(sequence, Rc::clone(&ways));
Rc::clone(&ways)
}
The code doesn't compile.
Questions:
1. This feels overly complicated. Is there a simpler way to do it?
1.1. For the Matrix type, I tried getting by with just Vec<Vec<String>>, but that didn't get me very far. What's the proper way to encode a 2D vector that allows for mutability and sharing, without using extra crates?
1.2. Is there a better way to pass the memo object?
2. I don't really understand the compiler error here. Can you help me with that?
error[E0507]: cannot move out of dereference of `RefMut<'_, Vec<Vec<String>>>`
--> src/lib.rs:31:30
|
31 | for mut entry in inner_ways.borrow_mut().into_iter() { // error here
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ move occurs because value has type `Vec<Vec<String>>`, which does not implement the `Copy` trait
For more information about this error, try `rustc --explain E0507`.
Thank you!
2d vecs work fine, and for jagged arrays like yours, your implementation is correct. Your issues stem from a needless use of Rc and RefCell. Because of the way you're calling things, a single, mutable reference will work.
Consider the following, modified, example:
use std::collections::HashMap;
type Vec2<T> = Vec<Vec<T>>;
fn all_constructs<'a>(sequence: &'a str, segments: &[&'a str]) -> Vec2<&'a str> {
let mut cache = HashMap::new();
all_constructs_memo(sequence, segments, &mut cache)
}
fn all_constructs_memo<'a>(
sequence: &'a str,
segments: &[&'a str],
cache: &mut HashMap<&'a str, Vec2<&'a str>>
) -> Vec2<&'a str> {
// If we have the answer cached, return the cache
if let Some(constructs) = cache.get(sequence) {
return constructs.to_vec();
}
// We don't have it cached, so figure it out
let mut constructs = Vec::new();
for segment in segments {
if *segment == sequence {
constructs.push(vec![*segment]);
} else if let Some(sub_sequence) = sequence.strip_suffix(segment) {
let mut sub_constructs = all_constructs_memo(sub_sequence, segments, cache);
sub_constructs.iter_mut().for_each(|c| c.push(segment));
constructs.append(&mut sub_constructs);
}
}
cache.insert(sequence, constructs.clone());
return constructs;
}
It's identical, except for 4 differences:
1.) I removed all Rc and RefCell. There is a single HashMap reference
2.) Instead of having all_constructs_memo("", ...) -> Vec::new(), I just added a branch in the iterator if *segment == sequence to test for single-segment matches that way.
3.) I wrote Vec2 instead of Matrix
4.) strip_suffix instead of strip_prefix, just because adding to the end of vecs is a little more efficient than adding to the front.
Here's a playground link with tests against a non-memoized reference implementation
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1b488aafda6466629c17c8a7de8f3e42
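On the compiler error itself (question 2): Vec's into_iter() takes the vector by value, but borrow_mut() only gives a guard that dereferences to &mut Vec, so the vector cannot be moved out of it. That is what E0507 is reporting. If the Rc<RefCell<...>> layout were kept, one way around it would be to iterate by reference and clone each entry. The following is a sketch under that assumption; append_word is a hypothetical helper, not part of the original code.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Matrix = Rc<RefCell<Vec<Vec<String>>>>;

/// Returns a copy of every entry in `shared` with `word` appended,
/// leaving the shared (memoized) data untouched.
fn append_word(shared: &Matrix, word: &str) -> Vec<Vec<String>> {
    let mut out = Vec::new();
    // `borrow()` derefs to `&Vec<_>`, so `iter()` yields `&Vec<String>`;
    // nothing is moved out of the `RefCell`.
    for entry in shared.borrow().iter() {
        let mut entry = entry.clone();
        entry.push(word.to_string());
        out.push(entry);
    }
    out
}

fn main() {
    let shared: Matrix = Rc::new(RefCell::new(vec![vec!["a".to_string()], vec![]]));
    let extended = append_word(&shared, "b");
    println!("{:?}", extended);
    // The shared matrix still holds its original two entries.
    assert_eq!(shared.borrow().len(), 2);
}
```

Cloning also matters for correctness here: draining the entries out of a memoized matrix would leave the cache empty for later lookups.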

Extract original slice from SliceStorage and SliceStorageMut

I am working on some software where I am managing a buffer of floats in a Vec<T> where T is either an f32 or f64. I sometimes need to interpret this buffer, or sections of it, as a mathematical vector. To this end, I am taking advantage of MatrixSlice and friends in nalgebra.
I can create a DVectorSliceMut, for example, the following way
fn as_vector<'a>(slice: &'a mut [f64]) -> DVectorSliceMut<'a, f64> {
DVectorSliceMut::from(slice)
}
However, sometimes I need to later extract the original slice from the DVectorSliceMut with the original lifetime 'a. Is there a way to do this?
The StorageMut trait has an as_mut_slice member function, but the lifetime of the returned slice is the lifetime of the reference to the Storage implementor, not the original slice. I am okay with a solution which consumes the DVectorSliceMut if necessary.
Update: Methods into_slice and into_slice_mut have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.
Given the current API of nalgebra (v0.27.1) there isn't much that you can do, except:
live with the shorter lifetime of StorageMut::as_mut_slice
make a feature request for such a function at nalgebra (which seems you already did)
employ your own unsafe code to make StorageMut::ptr_mut into a &'a mut
You could go with the third option until nalgebra gets updated and implement something like this in your own code:
use nalgebra::base::dimension::Dim;
use nalgebra::base::storage::Storage;
use nalgebra::base::storage::StorageMut;
fn into_slice<'a>(vec: DVectorSliceMut<'a, f64>) -> &'a mut [f64] {
let mut inner = vec.data;
// from nalgebra
// https://docs.rs/nalgebra/0.27.1/src/nalgebra/base/matrix_slice.rs.html#190
let (nrows, ncols) = inner.shape();
if nrows.value() != 0 && ncols.value() != 0 {
let sz = inner.linear_index(nrows.value() - 1, ncols.value() - 1);
unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), sz + 1) }
} else {
unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), 0) }
}
}
Methods into_slice and into_slice_mut which return the original slice have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.
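The lifetime difference between a borrowing accessor and a consuming one can be shown without nalgebra at all. In the sketch below, ViewMut is a hypothetical stand-in for a mutable vector view (it is not nalgebra's type, and the method names merely mirror the ones discussed above): as_mut_slice is tied to the borrow of the view, while into_slice gives back the original 'a.

```rust
/// Hypothetical stand-in for a mutable vector view; not nalgebra's type.
struct ViewMut<'a> {
    data: &'a mut [f64],
}

impl<'a> ViewMut<'a> {
    /// Borrowing accessor: the returned slice cannot outlive the borrow of `self`.
    fn as_mut_slice(&mut self) -> &mut [f64] {
        self.data
    }

    /// Consuming accessor: returns the slice with its original lifetime `'a`.
    fn into_slice(self) -> &'a mut [f64] {
        self.data
    }
}

/// Uses the view temporarily, then hands the original slice back to the caller.
fn double_in_place<'a>(slice: &'a mut [f64]) -> &'a mut [f64] {
    let mut view = ViewMut { data: slice };
    for x in view.as_mut_slice() {
        *x *= 2.0;
    }
    // Returning `view.as_mut_slice()` here would not compile, because that
    // borrow ends when the local `view` is dropped.
    view.into_slice()
}

fn main() {
    let mut buffer = [1.0, 2.5];
    let slice = double_in_place(&mut buffer);
    slice[0] += 1.0;
    println!("{:?}", buffer);
}
```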

How to share parts of a string with Rc?

I want to create some references to a str with Rc, without cloning str:
fn main() {
let s = Rc::<str>::from("foo");
let t = Rc::clone(&s); // Creating a new pointer to the same address is easy
let u = Rc::clone(&s[1..2]); // But how can I create a new pointer to a part of `s`?
let w = Rc::<str>::from(&s[0..2]); // This seems to clone str
assert_ne!(&w as *const _, &s as *const _);
}
How can I do this?
While it's possible in principle, the standard library's Rc does not support the case you're trying to create: a counted reference to a part of reference-counted memory.
However, we can get the effect for strings using a fairly straightforward wrapper around Rc which remembers the substring range:
use std::ops::{Deref, Range};
use std::rc::Rc;
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct RcSubstr {
string: Rc<str>,
span: Range<usize>,
}
impl RcSubstr {
fn new(string: Rc<str>) -> Self {
let span = 0..string.len();
Self { string, span }
}
fn substr(&self, span: Range<usize>) -> Self {
// A full implementation would also have bounds checks to ensure
// the requested range is not larger than the current substring
Self {
string: Rc::clone(&self.string),
span: (self.span.start + span.start)..(self.span.start + span.end)
}
}
}
impl Deref for RcSubstr {
type Target = str;
fn deref(&self) -> &str {
&self.string[self.span.clone()]
}
}
fn main() {
let s = RcSubstr::new(Rc::<str>::from("foo"));
let u = s.substr(1..2);
// We need to deref to print the string rather than the wrapper struct.
// A full implementation would `impl Debug` and `impl Display` to produce
// the expected substring.
println!("{}", &*u);
}
There are a lot of conveniences missing here, such as suitable implementations of Display, Debug, AsRef, Borrow, From, and Into — I've provided only enough code to illustrate how it can work. Once supplemented with the appropriate trait implementations, this should be just as usable as Rc<str> (with the one edge case that it can't be passed to a library type that wants to store Rc<str> in particular).
The crate arcstr claims to offer a finished version of this basic idea, but I haven't used or studied it and so can't guarantee its quality.
The crate owning_ref provides a way to hold references to parts of an Rc or other smart pointer, but there are concerns about its soundness and I don't fully understand which circumstances that applies to (issue search which currently has 3 open issues).
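As a rough illustration of two of the pieces the wrapper above leaves out, here is a sketch that adds a bounds-checked substring method and a Display implementation. The name try_substr and the choice to return Option are assumptions made for this example; a finished type would likely want more than this.

```rust
use std::fmt;
use std::ops::{Deref, Range};
use std::rc::Rc;

#[derive(Clone, Debug)]
pub struct RcSubstr {
    string: Rc<str>,
    span: Range<usize>,
}

impl RcSubstr {
    pub fn new(string: Rc<str>) -> Self {
        let span = 0..string.len();
        Self { string, span }
    }

    /// Returns `None` if `span` is out of range for the current substring
    /// or does not fall on UTF-8 character boundaries.
    pub fn try_substr(&self, span: Range<usize>) -> Option<Self> {
        // `str::get` performs both the bounds check and the boundary check.
        self.deref().get(span.clone())?;
        Some(Self {
            string: Rc::clone(&self.string),
            span: (self.span.start + span.start)..(self.span.start + span.end),
        })
    }
}

impl Deref for RcSubstr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.string[self.span.clone()]
    }
}

impl fmt::Display for RcSubstr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print only the visible substring, not the whole backing string.
        f.write_str(self)
    }
}

fn main() {
    let s = RcSubstr::new(Rc::<str>::from("foobar"));
    let u = s.try_substr(1..4).expect("range is within the string");
    println!("{}", u);
    assert!(u.try_substr(0..4).is_none());
}
```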

How to return new data from a function as a reference without borrow checker issues?

I'm writing a function that takes a reference to an integer and returns a vector of that integer times 2, 5 times. I think that'd look something like:
fn foo(x: &i64) -> Vec<&i64> {
let mut v = vec![];
for i in 0..5 {
let q = x * 2;
v.push(&q);
}
v
}
fn main() {
let x = 5;
let q = foo(&x);
println!("{:?}", q);
}
The borrow checker goes nuts because I define a new variable, it's allocated on the stack, and goes out of scope at the end of the function.
What do I do? Certainly I can't go through life without writing functions that create new data! I'm aware there's Box, and Copy-type workarounds, but I'm interested in an idiomatic Rust solution.
I realize I could return a Vec<i64> but I think that'd run into the same issues? Mainly trying to come up with an "emblematic" problem for the general issue :)
EDIT: I only just realized that you wrote "I'm aware there's Box, Copy etc type workaround but I'm mostly interested in an idiomatic rust solution", but I've already typed the whole answer. :P And the solutions below are idiomatic Rust, this is all just how memory works! Don't go trying to return pointers to stack-allocated data in C or C++, because even if the compiler doesn't stop you, that doesn't mean anything good will come of it. ;)
Any time that you return a reference, that reference must have been a parameter to the function. In other words, if you're returning references to data, all that data must have been allocated outside of the function. You seem to understand this, I just want to make sure it's clear. :)
There are many potential ways of solving this problem depending on what your use case is.
In this particular example, because you don't need x for anything afterward, you can just give ownership to foo without bothering with references at all:
fn foo(x: i64) -> Vec<i64> {
std::iter::repeat(x * 2).take(5).collect()
}
fn main() {
let x = 5;
println!("{:?}", foo(x));
}
But let's say that you don't want to pass ownership into foo. You could still return a vector of references as long as you didn't want to mutate the underlying value:
fn foo(x: &i64) -> Vec<&i64> {
std::iter::repeat(x).take(5).collect()
}
fn main() {
let x = 5;
println!("{:?}", foo(&x));
}
...and likewise you could mutate the underlying value as long as you didn't want to hand out new pointers to it:
fn foo(x: &mut i64) -> &mut i64 {
*x *= 2;
x
}
fn main() {
let mut x = 5;
println!("{:?}", foo(&mut x));
}
...but of course, you want to do both. So if you're allocating memory and you want to return it, then you need to do it somewhere other than the stack. One thing you can do is just stuff it on the heap, using Box:
// Just for illustration, see the next example for a better approach
fn foo(x: &i64) -> Vec<Box<i64>> {
std::iter::repeat(Box::new(x * 2)).take(5).collect()
}
fn main() {
let x = 5;
println!("{:?}", foo(&x));
}
...though with the above I just want to make sure you're aware of Box as a general means of using the heap. Truthfully, simply using a Vec means that your data will be placed on the heap, so this works:
fn foo(x: &i64) -> Vec<i64> {
std::iter::repeat(x * 2).take(5).collect()
}
fn main() {
let x = 5;
println!("{:?}", foo(&x));
}
The above is probably the most idiomatic example here, though as ever your use case might demand something different.
Alternatively, you could pull a trick from C's playbook and pre-allocate the memory outside of foo, and then pass in a reference to it:
fn foo(x: &i64, v: &mut [i64; 5]) {
for i in v {
*i = x * 2;
}
}
fn main() {
let x = 5;
let mut v = [0; 5]; // fixed-size array on the stack
foo(&x, &mut v);
println!("{:?}", v);
}
Finally, if the function must take a reference as its parameter and you must mutate the referenced data and you must copy the reference itself and you must return these copied references, then you can use Cell for this:
use std::cell::Cell;
fn foo(x: &Cell<i64>) -> Vec<&Cell<i64>> {
x.set(x.get() * 2);
std::iter::repeat(x).take(5).collect()
}
fn main() {
let x = Cell::new(5);
println!("{:?}", foo(&x));
}
Cell is both efficient and non-surprising, though note that Cell::get works only on types that implement the Copy trait (which all the primitive numeric types do). If your type doesn't implement Copy then you can still do this same thing with RefCell, but it imposes a slight runtime overhead and opens up the possibilities for panics at runtime if you get the "borrowing" wrong.
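To make the RefCell variant concrete, here is a minimal sketch using String as an example of a non-Copy type (the choice of String and of appending "!" is only for illustration). It also shows the runtime borrow check in action, using try_borrow_mut so that the conflict is reported as an error instead of a panic.

```rust
use std::cell::RefCell;

fn foo(x: &RefCell<String>) -> Vec<&RefCell<String>> {
    // The mutable borrow is released at the end of this statement.
    x.borrow_mut().push_str("!");
    // Shared references are `Copy`, so they can be repeated freely.
    std::iter::repeat(x).take(5).collect()
}

fn main() {
    let x = RefCell::new(String::from("hi"));
    let refs = foo(&x);
    assert_eq!(refs.len(), 5);
    assert_eq!(*x.borrow(), "hi!");

    // While a shared borrow is alive, a mutable borrow is refused at runtime.
    // `borrow_mut()` would panic here; `try_borrow_mut()` returns an error.
    let guard = x.borrow();
    assert!(x.try_borrow_mut().is_err());
    drop(guard);
    assert!(x.try_borrow_mut().is_ok());
}
```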

Resources