What's the purpose of SizeHint in Iterator::unzip?

From the Rust standard library implementation of unzip:
fn unzip<A, B, FromA, FromB>(self) -> (FromA, FromB) where
    FromA: Default + Extend<A>,
    FromB: Default + Extend<B>,
    Self: Sized + Iterator<Item=(A, B)>,
{
    struct SizeHint<A>(usize, Option<usize>, marker::PhantomData<A>);
    impl<A> Iterator for SizeHint<A> {
        type Item = A;
        fn next(&mut self) -> Option<A> { None }
        fn size_hint(&self) -> (usize, Option<usize>) {
            (self.0, self.1)
        }
    }

    let (lo, hi) = self.size_hint();
    let mut ts: FromA = Default::default();
    let mut us: FromB = Default::default();
    ts.extend(SizeHint(lo, hi, marker::PhantomData));
    us.extend(SizeHint(lo, hi, marker::PhantomData));
    for (t, u) in self {
        ts.extend(Some(t));
        us.extend(Some(u));
    }
    (ts, us)
}
These two lines:
ts.extend(SizeHint(lo, hi, marker::PhantomData));
us.extend(SizeHint(lo, hi, marker::PhantomData));
don't actually extend ts or us by anything, since the next method of SizeHint returns None. What's the purpose of doing so?

It is a cool trick. By giving this size hint, it gives ts and us the chance to reserve space for the extend calls in the loop. According to the documentation:
size_hint() is primarily intended to be used for optimizations such as reserving space for the elements of the iterator, but must not be trusted to e.g. omit bounds checks in unsafe code. An incorrect implementation of size_hint() should not lead to memory safety violations.
Note that creating SizeHint is necessary because each extend call inside the for loop is made with a Some value (Option implements IntoIterator), and the size hint for a Some value is (1, Some(1)). That doesn't help with preallocation.
But looking at the code for Vec, this has no effect (nor does it for HashMap or VecDeque); other Extend implementations may behave differently.
Executing ts.extend(SizeHint(lo, hi, marker::PhantomData)); does not trigger a reservation, because Vec's extend only consults size_hint after next has produced an element, and here next immediately returns None. Maybe someone should write a patch.
impl<T> Vec<T> {
    fn extend_desugared<I: Iterator<Item = T>>(&mut self, mut iterator: I) {
        // This function should be the moral equivalent of:
        //
        //     for item in iterator {
        //         self.push(item);
        //     }
        while let Some(element) = iterator.next() {
            let len = self.len();
            if len == self.capacity() {
                let (lower, _) = iterator.size_hint();
                self.reserve(lower.saturating_add(1));
            }
            unsafe {
                ptr::write(self.get_unchecked_mut(len), element);
                // NB can't overflow since we would have had to alloc the address space
                self.set_len(len + 1);
            }
        }
    }
}
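To see both effects concretely, here is a hedged, stand-alone sketch (Hint is a made-up iterator, not part of std): an iterator that yields nothing never reaches the size_hint check inside the loop above, while a real iterator's lower bound does drive a reservation.
struct Hint(usize);

impl Iterator for Hint {
    type Item = u32;
    fn next(&mut self) -> Option<u32> { None }
    fn size_hint(&self) -> (usize, Option<usize>) { (self.0, Some(self.0)) }
}

fn main() {
    let mut v: Vec<u32> = Vec::new();
    v.extend(Hint(100)); // advertises 100 elements but yields none
    // With the extend_desugared above, size_hint is only consulted after next()
    // has produced an element, so this likely prints 0.
    println!("capacity after hinted extend: {}", v.capacity());

    let mut w: Vec<u32> = Vec::new();
    w.extend(0..100u32); // a real iterator with an accurate size hint
    // Here the hint does help: capacity is typically exactly 100.
    println!("capacity after real extend: {}", w.capacity());
}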

It's a dubious hack!
It implements an iterator with a fake (overestimated) size hint to encourage the produced collection to reserve the eventually appropriate capacity up front.
It's a cool trick, but it does so by implementing a size hint whose lower bound is greater than the number of elements the iterator actually produces (zero). If the lower bound is not known, an iterator is supposed to report a lower bound of 0. This implementation is arguably buggy for that reason, and the collection's Extend impl may react with bugginess as a result (but of course not memory unsafety).

Related

Panic if the capacity of a vector is increased

I am working on implementing a sieve of Atkin as my first decently sized program in Rust. This algorithm takes a number and returns a vector of all primes below that number. There are two different collections I need for this function:
BitVec: 1 for prime, 0 for not prime (flipped back and forth as part of the algorithm).
A vector containing all known primes.
The size of the BitVec is known as soon as the function is called. While the final size of the vector containing all known primes is not known, there are relatively accurate upper limits for the number of primes in a range. Using these, I can reserve capacity for an upper bound and then shrink_to_fit the vector before returning. The upshot is that neither collection should ever need to have its capacity increased while the algorithm is running; if that happens, something has gone horribly wrong with the algorithm.
Therefore, I would like my function to panic if the capacity of either the vector or the BitVec is changed during the running of the function. Is this possible, and if so, how would I be best off implementing it?
Thanks,
You can assert that the Vec's capacity() and len() are different before each push:
assert_ne!(v.capacity(), v.len());
v.push(value);
If you want it done automatically you'd have to wrap your vec in a newtype:
struct FixedSizeVec<T>(Vec<T>);

impl<T> FixedSizeVec<T> {
    pub fn push(&mut self, value: T) {
        assert_ne!(self.0.len(), self.0.capacity());
        self.0.push(value)
    }
}
To save on forwarding unchanged methods you can impl Deref(Mut) for your newtype.
use std::ops::{Deref, DerefMut};

impl<T> Deref for FixedSizeVec<T> {
    type Target = Vec<T>;
    fn deref(&self) -> &Vec<T> {
        &self.0
    }
}

impl<T> DerefMut for FixedSizeVec<T> {
    fn deref_mut(&mut self) -> &mut Vec<T> {
        &mut self.0
    }
}
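With those impls in place, a minimal usage sketch (assuming the wrapper above; with_capacity fixes the size up front) looks like this:
fn main() {
    let mut v = FixedSizeVec(Vec::with_capacity(2));
    v.push(1); // checked: the inherent push shadows Vec::push
    v.push(2);
    // Read-only Vec methods are still available through Deref:
    println!("len = {}, capacity = {}", v.len(), v.capacity());
    // v.push(3); // would panic: len == capacity
}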
An alternative to the newtype pattern is to create a new trait with a method that performs the check, and implement it for the vector like so:
trait PushCheck<T> {
    fn push_with_check(&mut self, value: T);
}

impl<T> PushCheck<T> for std::vec::Vec<T> {
    fn push_with_check(&mut self, value: T) {
        let prev_capacity = self.capacity();
        self.push(value);
        assert!(prev_capacity == self.capacity());
    }
}
fn main() {
    let mut v = Vec::new();
    v.reserve(4);
    dbg!(v.capacity());

    v.push_with_check(1);
    v.push_with_check(1);
    v.push_with_check(1);
    v.push_with_check(1);
    // This push will panic
    v.push_with_check(1);
}
The upside is that you aren't creating a new type, but the obvious downside is you need to remember to use the newly defined method.

How do I avoid running into lifetime issues when refactoring a function?

Playground if you want to jump directly into the code.
Problem
I'm trying to implement a function filter_con<T, F>(v: Vec<T>, predicate: F) that allows concurrently filtering a Vec via async predicates.
That is, instead of doing:
let arr = vec![...];
let arr_filtered = join_all(arr.into_iter().map(|it| async move {
    if some_getter(&it).await > some_value {
        Some(it)
    } else {
        None
    }
}))
.await
.into_iter()
.filter_map(|it| it)
.collect::<Vec<T>>()
every time I need to filter a Vec, I want to be able to write:
let arr = vec![...];
let arr_filtered = filter_con(arr, |it| async move {
    some_getter(&it).await > some_value
}).await
Tentative implementation
I've extracted the logic into its own function, but I am running into lifetime issues:
async fn filter_con<T, B, F>(arr: Vec<T>, predicate: F) -> Vec<T>
where
    F: FnMut(&T) -> B,
    B: futures::Future<Output = bool>,
{
    join_all(arr.into_iter().map(|it| async move {
        if predicate(&it).await {
            Some(it)
        } else {
            None
        }
    }))
    .await
    .into_iter()
    .filter_map(|p| p)
    .collect::<Vec<_>>()
}
error[E0507]: cannot move out of a shared reference
I don't know what I'm moving out of. Is it predicate?
For more details, see the playground.
You won't be able to make the predicate an FnOnce, because, if you have 10 items in your Vec, you'll need to call the predicate 10 times, but an FnOnce only guarantees it can be called once, which could lead to something like this:
let vec = vec![1, 2, 3];
let has_drop_impl = String::from("hello");
filter_con(vec, |&i| async {
    drop(has_drop_impl);
    i < 5
});
So F must be either an FnMut or an Fn. The standard library Iterator::filter takes an FnMut, though this can be a source of confusion (it is the captured variables of the closure that need a mutable reference, not the elements of the iterator).
Because the predicate is an FnMut, any caller needs to be able to get an &mut F. For Iterator::filter, this can be used to do something like this:
let vec = vec![1, 2, 3];
let mut count = 0;
vec.into_iter().filter(|&x| {
    count += 1; // this line makes the closure an `FnMut`
    x < 2
})
However, by sending the iterator to join_all, you are essentially allowing your async runtime to schedule these calls however it wants, potentially at the same time, which could create aliasing &mut references, which is always undefined behaviour. This issue has a slightly more cut-down version of the same problem: https://github.com/rust-lang/rust/issues/69446.
I'm still not 100% sure on the details, but it seems the compiler is being conservative here and doesn't even let you create the closure in the first place, to prevent soundness issues.
I'd recommend making your function only accept Fns. This way, your runtime is free to call the function however it wants. This does mean that your closure cannot have mutable state, but this is unlikely to be a problem in a tokio application. For the counting example, the "correct" solution is to use an AtomicUsize (or equivalent), which allows mutation via a shared reference. If you're referencing mutable state in your filter call, it should be thread safe, and thread-safe data structures generally allow mutation via shared references.
Given that restriction, the following gives the answer you expect:
use futures::future::join_all;
use std::future::Future;

async fn filter_con<T, B, F>(arr: Vec<T>, predicate: F) -> Vec<T>
where
    F: Fn(&T) -> B,
    B: Future<Output = bool>,
{
    join_all(arr.into_iter().map(|it| async {
        if predicate(&it).await {
            Some(it)
        } else {
            None
        }
    }))
    .await
    .into_iter()
    .filter_map(|p| p)
    .collect::<Vec<_>>()
}
Playground
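As a hedged follow-up to the AtomicUsize suggestion above (count_calls_demo is a made-up name, and this assumes the Fn-based filter_con just shown plus the futures crate), counting from a shared-reference closure looks like this:
use std::sync::atomic::{AtomicUsize, Ordering};

async fn count_calls_demo() {
    let calls = AtomicUsize::new(0);
    let kept = filter_con(vec![1, 2, 3, 4], |&x| {
        // Mutation through a shared reference keeps the closure an `Fn`.
        calls.fetch_add(1, Ordering::Relaxed);
        async move { x < 3 }
    })
    .await;
    assert_eq!(kept, vec![1, 2]);
    assert_eq!(calls.load(Ordering::Relaxed), 4);
}
// Drive it with, e.g., futures::executor::block_on(count_calls_demo());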

Extract original slice from SliceStorage and SliceStorageMut

I am working on some software where I am managing a buffer of floats in a Vec<T> where T is either an f32 or f64. I sometimes need to interpret this buffer, or sections of it, as a mathematical vector. To this end, I am taking advantage of MatrixSlice and friends in nalgebra.
I can create a DVectorSliceMut, for example, the following way
fn as_vector<'a>(slice: &'a mut [f64]) -> DVectorSliceMut<'a, f64> {
    DVectorSliceMut::from(slice)
}
However, sometimes I need to later extract the original slice from the DVectorSliceMut with the original lifetime 'a. Is there a way to do this?
The StorageMut trait has an as_mut_slice member function, but the lifetime of the returned slice is the lifetime of the reference to the Storage implementor, not the original slice. I am okay with a solution which consumes the DVectorSliceMut if necessary.
Update: Methods into_slice and into_slice_mut have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.
Given the current API of nalgebra (v0.27.1) there isn't much that you can do, except:
live with the shorter lifetime of StorageMut::as_mut_slice
make a feature request for such a function at nalgebra (which it seems you already did)
employ your own unsafe code to turn StorageMut::ptr_mut into a &'a mut
You could go with the third option until nalgebra gets updated, and implement something like this in your own code:
use nalgebra::base::dimension::Dim;
use nalgebra::base::storage::Storage;
use nalgebra::base::storage::StorageMut;
use nalgebra::DVectorSliceMut;

fn into_slice<'a>(vec: DVectorSliceMut<'a, f64>) -> &'a mut [f64] {
    let mut inner = vec.data;
    // from nalgebra
    // https://docs.rs/nalgebra/0.27.1/src/nalgebra/base/matrix_slice.rs.html#190
    let (nrows, ncols) = inner.shape();
    if nrows.value() != 0 && ncols.value() != 0 {
        let sz = inner.linear_index(nrows.value() - 1, ncols.value() - 1);
        unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), sz + 1) }
    } else {
        unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), 0) }
    }
}
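As a hedged usage sketch (first_to_one is a hypothetical helper, not from the question), the function above lets you mutate the buffer through the nalgebra view and still hand back a slice with the original lifetime:
fn first_to_one(buf: &mut [f64]) -> &mut [f64] {
    let mut view = DVectorSliceMut::from(&mut *buf); // reborrow the buffer as a vector view
    view[0] = 1.0; // mutate through nalgebra's API (assumes buf is non-empty)
    into_slice(view) // recover the slice with the original borrow's lifetime
}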
Methods into_slice and into_slice_mut which return the original slice have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.

Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?

In the Rustonomicon's guide to PhantomData, there is a part about what happens if a Vec-like struct has a *const T field, but no PhantomData<T>:
The drop checker will generously determine that Vec<T> does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.
What does it mean? If I implement Drop for a struct and manually destroy all Ts in it, why should I care whether the compiler knows that my struct owns some Ts?
The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
Deep dive: We have a combination of factors here:
We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).
Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.
However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows a vector to hold references that have the same lifetime as the vector itself. This is considered sound; after all, the vector itself will not dereference the data pointed to by such references (all it's doing is dropping values and deallocating the backing array), as long as an important caveat holds.
The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
So, diving in even more deeply: the way we confirm dropck validity for a given structure S is this: we first double-check whether S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself, and double-check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)
In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).
BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
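A hedged, stable-Rust illustration of that end effect, using Vec itself (Loud is a made-up type for this sketch):
struct Loud<'a>(&'a str);

// `Loud` has a destructor that could read the borrowed string.
impl<'a> Drop for Loud<'a> {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    // Accepted: `&str` has no destructor, so dropck lets the vector hold a
    // borrow that is already dangling by the time the vector is dropped.
    let mut quiet: Vec<&str> = Vec::new();
    let s = String::from("hi");
    quiet.push(&s); // `s` is dropped before `quiet`, and that's fine

    // Rejected if uncommented: `Loud` has a destructor that may read `&s2`,
    // so dropck insists that `s2` strictly outlive the vector, which it doesn't.
    // let mut loud: Vec<Loud> = Vec::new();
    // let s2 = String::from("bye");
    // loud.push(Loud(&s2)); // error: `s2` does not live long enough
}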
If one wants to try to understand this via concrete exploration, you can try comparing how the compiler differs in its treatment of little container types that vary in their use of #[may_dangle] and PhantomData.
Here is some sample code I have whipped up to illustrate this:
// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case,
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)
#![feature(alloc, dropck_eyepatch, generic_param_attrs, heap_api)]
extern crate alloc;
use alloc::heap;
use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;
#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }
#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);
impl<T: fmt::Debug> PrintOnDrop<T> {
    fn new(name: &'static str, t: T) -> Self {
        PrintOnDrop(name, t, State::Valid)
    }
}

impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
    fn drop(&mut self) {
        println!("drop PrintOnDrop({}, {:?}, {:?})",
                 self.0,
                 self.1,
                 self.2);
        self.2 = State::INVALID;
    }
}
struct MyBox1<T> {
    v: Box<T>,
}

impl<T> MyBox1<T> {
    fn new(t: T) -> Self {
        MyBox1 { v: Box::new(t) }
    }
}

struct MyBox2<T> {
    v: *const T,
}

impl<T> MyBox2<T> {
    fn new(t: T) -> Self {
        unsafe {
            let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
            let p = p as *mut T;
            ptr::write(p, t);
            MyBox2 { v: p }
        }
    }
}

unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
    fn drop(&mut self) {
        unsafe {
            // We want this to be *legal*. This destructor is not
            // allowed to call methods on `T` (since it may be in
            // an invalid state), but it should be allowed to drop
            // instances of `T` as it deconstructs itself.
            //
            // (Note however that the compiler has no knowledge
            // that `MyBox2<T>` owns an instance of `T`.)
            ptr::read(self.v);
            heap::deallocate(self.v as *mut u8,
                             mem::size_of::<T>(),
                             mem::align_of::<T>());
        }
    }
}
struct MyBox3<T> {
    v: *const T,
    _pd: PhantomData<T>,
}

impl<T> MyBox3<T> {
    fn new(t: T) -> Self {
        unsafe {
            let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
            let p = p as *mut T;
            ptr::write(p, t);
            MyBox3 { v: p, _pd: Default::default() }
        }
    }
}

unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
    fn drop(&mut self) {
        unsafe {
            ptr::read(self.v);
            heap::deallocate(self.v as *mut u8,
                             mem::size_of::<T>(),
                             mem::align_of::<T>());
        }
    }
}
fn f1() {
    // `let (v, _mb1);` and `let (_mb1, v)` won't compile due to dropck
    let v1; let _mb1;
    v1 = PrintOnDrop::new("v1", 13);
    _mb1 = MyBox1::new(PrintOnDrop::new("mb1", &v1));
}

fn f2() {
    {
        let (v2a, _mb2a); // Sound, but not distinguished from below by rustc!
        v2a = PrintOnDrop::new("v2a", 13);
        _mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
    }
    {
        let (_mb2b, v2b); // Unsound!
        v2b = PrintOnDrop::new("v2b", 13);
        _mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
        // namely, v2b dropped before _mb2b, but latter contains
        // value that attempts to access v2b when being dropped.
    }
}

fn f3() {
    let v3; let _mb3; // `let (v, mb3);` won't compile due to dropck
    v3 = PrintOnDrop::new("v3", 13);
    _mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}

fn main() {
    f1(); f2(); f3();
}
Caveat emptor — I'm not that strong in the extremely deep theory that truly answers your question. I'm just a layperson who has used Rust a bit and has read the related RFCs. Always refer back to those original sources for a less-diluted version of the truth.
RFC 769 introduced the actual The Drop-Check Rule:
Let v be some value (either temporary or named) and 'a be some lifetime (scope); if the type of v owns data of type D, where (1.) D has a lifetime- or type-parametric Drop implementation, and (2.) the structure of D can reach a reference of type &'a _, and (3.) either:
(A.) the Drop impl for D instantiates D at 'a directly, i.e. D<'a>, or,
(B.) the Drop impl for D has some type parameter with a trait bound T where T is a trait that has at least one method,
then 'a must strictly outlive the scope of v.
It then goes on to define some of those terms, including what it means for one type to own another, and mentions PhantomData specifically:
Therefore, as an additional special case to the criteria above for when the type E owns data of type D, we include:
If E is PhantomData<T>, then recurse on T.
A key problem occurs when two variables are defined at the same time:
struct Noisy<'a>(&'a str);

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) { println!("Dropping {}", self.0) }
}

fn main() -> () {
    let (mut v, s) = (Vec::new(), "hi".to_string());
    let noisy = Noisy(&s);
    v.push(noisy);
}
As I understand it, without The Drop-Check Rule and without indicating that Vec owns Noisy, code like this might compile. When the Vec is dropped, the drop implementation could access an invalid reference, introducing unsafety.
Returning to your points:
If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The compiler must know that you own the value because you can/will call drop. Since the implementation of drop is arbitrary, if you are going to call it, the compiler must forbid you from accepting values that would cause unsafe behavior during drop.
Always remember that any arbitrary T can be a value, a reference, a value containing a reference, etc. When trying to puzzle out these types of things, it's important to try to use the most complicated variant for any thought experiments.
All of that should provide enough pieces to connect the dots; for full understanding, reading the RFC a few times is probably better than relying on my flawed interpretation.
Then it gets more complicated. RFC 1238 further modifies The Drop-Check Rule, removing this specific reasoning. It does say:
parametricity is a necessary but not sufficient condition to justify the inferences that dropck makes
Continuing to use PhantomData seems the safest thing to do, but it may not be required. An anonymous Twitter benefactor pointed out this code:
use std::marker::PhantomData;

#[derive(Debug)] struct MyGeneric<T> { x: Option<T> }
#[derive(Debug)] struct MyDropper<T> { x: Option<T> }
#[derive(Debug)] struct MyHiddenDropper<T> { x: *const T }
#[derive(Debug)] struct MyHonestHiddenDropper<T> { x: *const T, boo: PhantomData<T> }

impl<T> Drop for MyDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHiddenDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHonestHiddenDropper<T> { fn drop(&mut self) { } }

fn main() {
    // Does Compile! (magic annotation on destructor)
    {
        let (a, mut b) = (0, vec![]);
        b.push(&a);
    }

    // Does Compile! (no destructor)
    {
        let (a, mut b) = (0, MyGeneric { x: None });
        b.x = Some(&a);
    }

    // Doesn't Compile! (has destructor, no attribute)
    {
        let (a, mut b) = (0, MyDropper { x: None });
        b.x = Some(&a);
    }

    {
        let (a, mut b) = (0, MyHiddenDropper { x: 0 as *const _ });
        b.x = &&a;
    }

    {
        let (a, mut b) = (0, MyHonestHiddenDropper { x: 0 as *const _, boo: PhantomData });
        b.x = &&a;
    }
}
This suggests that the changes in RFC 1238 made the compiler more conservative, such that simply having a lifetime or type parameter is enough to prevent it from compiling.
You can also note that Vec doesn't have this problem because it uses the unsafe_destructor_blind_to_params attribute described in the RFC.

How to achieve equivalent of take_while on a slice?

Rust slices do not currently support some iterator methods, e.g. take_while. What is the best way to implement take_while for slices?
const STRHELLO: &'static [u8] = b"HHHello";

fn main() {
    let subslice: &[u8] = STRHELLO.iter().take_while(|c| (**c == 'H' as u8)).collect();
    println!("Expecting: {}, Got {}", STRHELLO.slice_to(3), subslice);
    assert!(subslice == STRHELLO.slice_to(3));
}
results in the error:
<anon>:6:74: 6:83 error: the trait `core::iter::FromIterator<&u8>` is not implemented for the type `&[u8]`
This code in the playpen:
http://is.gd/1xkcUa
First of all, the issue you have is that collect is about creating a new collection, while a slice is about referencing a contiguous range of items in an existing array (be it dynamically allocated or not).
I am afraid that due to the nature of traits, the fact that the original container (STRHELLO) was a contiguous range has been lost, and cannot be reconstructed after the fact. I am also afraid that any use of "generic" iterators simply cannot lead to the desired output; the type system would have to somehow carry the fact that:
the original container was a contiguous range
the chain of operations performed so far conserve this property
This may be doable or not, but I do not see it done now, and I am unsure in what way it could be elegantly implemented.
On the other hand, you can go about it in the do-it-yourself way:
fn take_while<'a>(initial: &'a [u8], predicate: |&u8| -> bool) -> &'a [u8] { // '
    let mut i = 0u;
    for c in initial.iter() {
        if predicate(c) { i += 1; } else { break; }
    }
    initial.slice_to(i)
}
And then:
fn main() {
    let subslice: &[u8] = take_while(STRHELLO, |c| (*c == b'H'));
    println!("Expecting: {}, Got {}", STRHELLO.slice_to(3), subslice);
    assert!(subslice == STRHELLO.slice_to(3));
}
Note: 'H' as u8 can be rewritten as b'H' as shown here, which is symmetric with the byte-string literals.
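For comparison, here is a hedged sketch of the same do-it-yourself approach in modern (post-1.0) Rust, using iterator position and range slicing in place of the old closure-type syntax and slice_to (take_while_slice is a name invented for this sketch):
fn take_while_slice<'a, F>(initial: &'a [u8], mut predicate: F) -> &'a [u8]
where
    F: FnMut(&u8) -> bool,
{
    // Index of the first element that fails the predicate; everything before it matches.
    let end = initial
        .iter()
        .position(|c| !predicate(c))
        .unwrap_or(initial.len());
    &initial[..end]
}

fn main() {
    const STRHELLO: &[u8] = b"HHHello";
    let subslice = take_while_slice(STRHELLO, |&c| c == b'H');
    assert_eq!(subslice, &STRHELLO[..3]);
}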
It is possible via some heavy gymnastics to implement this functionality using the stock iterators:
use std::raw::Slice;
use std::mem::transmute;

/// Splice together two slices of the same type that are contiguous in memory.
/// Panics if the slices aren't contiguous with "a" coming first,
/// i.e. slice b must follow slice a immediately in memory.
fn splice<'a>(a: &'a [u8], b: &'a [u8]) -> &'a [u8] {
    unsafe {
        let aa: Slice<u8> = transmute(a);
        let bb: Slice<u8> = transmute(b);
        let pa = aa.data as *const u8;
        let pb = bb.data as *const u8;
        let off = aa.len as int; // Risks overflow into negative!!!
        assert!(pa.offset(off) == pb, "Slices were not contiguous!");
        let cc = Slice { data: aa.data, len: aa.len + bb.len };
        transmute(cc)
    }
}

/// Wrapper around splice that lets you use None as a base case for fold.
/// Will panic if the slices cannot be spliced! See splice.
fn splice_for_fold<'a>(oa: Option<&'a [u8]>, b: &'a [u8]) -> Option<&'a [u8]> {
    match oa {
        Some(a) => Some(splice(a, b)),
        None => Some(b),
    }
}

/// Implementation using pure iterators
fn take_while<'a>(initial: &'a [u8],
                  predicate: |&u8| -> bool) -> Option<&'a [u8]> {
    initial
        .chunks(1)
        .take_while(|x| (predicate(&x[0])))
        .fold(None, splice_for_fold)
}
usage:
const STRHELLO: &'static [u8] = b"HHHello";
let subslice: &[u8] = super::take_while(STRHELLO, |c| (*c == b'H')).unwrap();
println!("Expecting: {}, Got {}", STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
Matthieu's implementation is way cleaner if you just need take_while. I am posting this anyway since it may be a path towards solving the more general problem of using iterator functions on slices cleanly.

Resources