I am trying to benchmark getting keys from a Rust hash map. I have the following benchmark:
#[bench]
fn rust_get(b: &mut Bencher) {
    let (hash, keys) =
        get_random_hash::<HashMap<String, usize>>(&HashMap::with_capacity, &rust_insert_fn);
    let mut keys = test::black_box(keys);
    b.iter(|| {
        for k in keys.drain(..) {
            hash.get(&k);
        }
    });
}
where get_random_hash is defined as:
fn get_random_hash<T>(
    new: &Fn(usize) -> T,
    insert: &Fn(&mut T, String, usize) -> (),
) -> (T, Vec<String>) {
    let mut keys = Vec::with_capacity(HASH_SIZE);
    let mut hash = new(HASH_CAPACITY);
    for i in 0..HASH_SIZE {
        let k: String = format!("{}", Uuid::new_v4());
        keys.push(k.clone());
        insert(&mut hash, k, i);
    }
    return (hash, keys);
}
and rust_insert_fn is:
fn rust_insert_fn(map: &mut HashMap<String, usize>, key: String, value: usize) {
    map.insert(key, value);
}
However, when I run the benchmark, it is clearly optimized out:
test benchmarks::benchmarks::rust_get ... bench: 1 ns/iter (+/- 0)
I thought test::black_box would solve the problem, but it doesn't look like it does. I have even tried wrapping the hash.get(&k) in the for loop with test::black_box, but the code is still optimized out. How do I correctly get the code to run without it being optimized out?
EDIT - Even the following still optimizes out the get operation:
#[bench]
fn rust_get(b: &mut Bencher) {
    let (hash, keys) = get_random_hash::<HashMap<String, usize>>(&HashMap::with_capacity, &rust_insert_fn);
    let mut keys = test::black_box(keys);
    b.iter(|| {
        let mut n = 0;
        for k in keys.drain(..) {
            hash.get(&k);
            n += 1;
        }
        return n;
    });
}
Interestingly, the following benchmarks work:
#[bench]
fn rust_get_random(b: &mut Bencher) {
    let (hash, _) = get_random_hash::<HashMap<String, usize>>(&HashMap::with_capacity, &rust_insert_fn);
    b.iter(|| {
        for _ in 0..HASH_SIZE {
            hash.get(&format!("{}", Uuid::new_v4()));
        }
    });
}

#[bench]
fn rust_insert(b: &mut Bencher) {
    b.iter(|| {
        let mut hash = HashMap::with_capacity(HASH_CAPACITY);
        for i in 0..HASH_SIZE {
            let k: String = format!("{}", Uuid::new_v4());
            hash.insert(k, i);
        }
    });
}
but this also does not:
#[bench]
fn rust_del(b: &mut Bencher) {
    let (mut hash, keys) = get_random_hash::<HashMap<String, usize>>(&HashMap::with_capacity, &rust_insert_fn);
    let mut keys = test::black_box(keys);
    b.iter(|| {
        for k in keys.drain(..) {
            hash.remove(&k);
        }
    });
}
Here is the full gist.
How does a compiler optimizer work?
An optimizer is nothing more than a pipeline of analyses and transformations. Each individual analysis or transformation is relatively simple, and the optimal order to apply them is unknown and generally determined by heuristics.
How does this affect my benchmark?
Benchmarks are complicated in that in general you wish to measure optimized code, but at the same time some analyses or transformations may remove the code you were interested in, rendering the benchmark useless.
It is therefore important to have a passing acquaintance with the analyses and transformation passes of the particular optimizer you are using so as to be able to understand:
which ones are undesirable,
how to foil them.
As mentioned, most passes are relatively simple, and therefore foiling them is relatively simple as well. The difficulty lies in the fact that there are a hundred or more of them and you have to know which one is kicking in to be able to foil it.
What optimizations am I running afoul of?
There are a few specific optimizations which very often play havoc with benchmarks (a toy illustration follows the list):
Constant Propagation: allows evaluating part of the code at compile-time,
Loop Invariant Code Motion: allows lifting the evaluation of some piece of code outside the loop,
Dead Code Elimination: removes code that is not useful.
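As a toy illustration of how these passes combine (hypothetical code, not taken from the benchmark in question):

fn toy() -> usize {
    let x = 2 + 3;         // Constant Propagation: x is known to be 5 at compile time
    let mut sum = 0;
    for _ in 0..1000 {
        let y = x * 2;     // Loop Invariant Code Motion: y does not depend on the loop, so it is hoisted out
        sum += y;
    }
    let _unused = sum + 1; // Dead Code Elimination: never observed, so this line is removed
    sum                    // the whole function may even be folded to the constant 10000
}

fn main() {
    assert_eq!(toy(), 10_000);
}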
What? How dare the optimizer mangle my code so?
The optimizer operates under the so-called as-if rule. This basic rule allows the optimizer to perform any transformation which does not change the output of the program. That is, it should not change the observable behavior of the program in general.
On top of that, a few changes are generally explicitly allowed. The most obvious is that the run-time is expected to shrink; this in turn means that thread interleaving may differ, and some languages give even more wiggle room.
I used black_box!
What is black_box? It's a function whose definition is specifically opaque to the optimizer. This has some implications on the optimizations the compiler is allowed to perform, since the function may have side-effects. It therefore means:
the transformed code must perform the very same number of calls to black_box as the original code,
the transformed code must perform said calls in the same order with regard to the passed-in arguments,
no assumption can be made on the value returned by black_box.
Thus, surgical use of black_box can foil certain optimizations. Blind use, however, may not foil the right ones.
What optimizations am I running afoul of?
Let's start from the naive code:
#[bench]
fn rust_get(b: &mut Bencher) {
    let (hash, mut keys): (HashMap<String, usize>, _) =
        get_random_hash(&HashMap::with_capacity, &rust_insert_fn);
    b.iter(|| {
        for k in keys.drain(..) {
            hash.get(&k);
        }
    });
}
The assumption is that the loop inside b.iter() will iterate over all keys and perform a hash.get() for each of them. However:
The result of hash.get() is unused,
hash.get() is a pure function, meaning that it has no side effects.
Thus, this loop can be rewritten as:
b.iter(|| { for k in keys.drain(..) {} })
We are running afoul of Dead Code Elimination (or some variant): the code serves no purpose, thus it is eliminated.
It may even be that the compiler is smart enough to realize that for k in keys.drain(..) {} can be optimized into drop(keys).
A surgical application of black_box can, however, foil DCE:
b.iter(|| {
    for k in keys.drain(..) {
        black_box(hash.get(&k));
    }
});
As per the effects of black_box described above:
the loop can no longer be optimized out, as it would change the number of calls to black_box,
each call to black_box must be performed with the expected argument.
There is still one possible hurdle: Constant Propagation. Specifically, if the compiler realizes that all keys yield the same value, it could optimize out hash.get(&k) and replace it by said value.
This can be foiled by obfuscating either the keys (let mut keys = black_box(keys);, as you did above) or the map. If you were to benchmark an empty map, obfuscating the map would be necessary; here, either works.
We thus get:
#[bench]
fn rust_get(b: &mut Bencher) {
    let (hash, keys): (HashMap<String, usize>, _) =
        get_random_hash(&HashMap::with_capacity, &rust_insert_fn);
    let mut keys = test::black_box(keys);
    b.iter(|| {
        for k in keys.drain(..) {
            test::black_box(hash.get(&k));
        }
    });
}
A final tip.
Benchmarks are complicated enough that you should be extra careful to only benchmark what you wish to benchmark.
In this particular case, there are two method calls:
keys.drain(),
hash.get().
The benchmark name suggests, to me, that what you aim to measure is the performance of get, so I can only assume that the call to keys.drain(..) is a mistake.
Thus, the benchmark really should be:
#[bench]
fn rust_get(b: &mut Bencher) {
    let (hash, keys): (HashMap<String, usize>, _) =
        get_random_hash(&HashMap::with_capacity, &rust_insert_fn);
    let keys = test::black_box(keys);
    b.iter(|| {
        for k in &keys {
            test::black_box(hash.get(k));
        }
    });
}
In this instance, this is even more critical in that the closure passed to b.iter() is expected to run multiple times: if you drain the keys the first time, what's left afterward? An empty Vec...
... which may actually be all that is really happening here: since b.iter() runs the closure until its time stabilizes, it may just be draining the Vec in the first run and then timing an empty loop.
Related
I have multiple threads doing a computation and want to collect the results into a pre-allocated vector. To turn off the borrow checker, I wrote the function:
fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}
With that we can fill a vector concurrently (e.g. using Rayon):
let vec = vec![0; 10];
(0..10).into_par_iter().for_each(|i| set_unsync(&vec, i, i));
It compiles, it works, and even Clippy likes it, but is it sound? After reading about things that appear to be sound but actually are Undefined Behavior, I'm unsure. For example, the documentation of the as_ptr method says:
The caller must also ensure that the memory the pointer
(non-transitively) points to is never written to (except inside an
UnsafeCell) using this pointer or any pointer derived from it.
Strictly speaking, the solution violates this. However, it feels sound to me. If it is not, how can we let multiple threads write to nonoverlapping parts of the same vector without using locks?
Assuming this is your minimal reproducible example:
use rayon::prelude::*;

fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}

fn main() {
    let input = vec![2, 3, 9];
    let output = vec![0; 100];
    input.par_iter().for_each(|&i| {
        for j in i * 10..(i + 1) * 10 {
            set_unsync(&output, j, i);
        }
    });
    dbg!(output);
}
If you are asking whether this code works and always will work, then I'd answer with yes.
BUT: it violates many rules on how safe and unsafe code should interact with each other.
If you write a function that is not marked unsafe, you indicate that this function can be used by callers in any way possible without causing undefined behaviour (note that "users" here is not just other people; it also means your own code in safe sections). If you cannot guarantee this, you should mark the function unsafe, requiring the caller to mark the invocation as unsafe as well, because the caller then again has to make sure they are using your function correctly.
Every point in your code that requires a programmer to manually prove that it is free of undefined behaviour must require an unsafe as well. If it's possible to have sections that require a human to prove this but do not require an unsafe, there is something unsound in your code.
In your case, the set_unsync function is not marked unsafe, but the following code causes undefined behaviour:
fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}

fn main() {
    let output = vec![0; 5];
    set_unsync(&output, 100000000000000, 42);
    dbg!(output);
}
Note that at no point in your main did you need an unsafe, and yet a segfault happens here.
Now if you say "but set_unsync is not pub, so no one else can call it, and I, in my par_iter, have ensured that I am using it correctly", then that is the best indicator that you should mark set_unsync as unsafe. The act of having to ensure that the function is used correctly is more or less the definition of an unsafe function. unsafe doesn't mean it will break horribly; it just means that the caller has to manually make sure that they are using it correctly, because the compiler can't. It's unsafe from the compiler's point of view.
Here is an example of how your code could be rewritten in a more sound way.
I don't claim that it is 100% sound, because I haven't thought about it enough.
But I hope this demonstrates how to cleanly interface between safe and unsafe code:
use rayon::prelude::*;

// mark as unsafe, as it's possible to provide parameters that
// cause undefined behaviour
unsafe fn set_unsync(vec: &[usize], idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    // No need to use unsafe{} here, as the entire function is already unsafe
    *first_elem.add(idx) = val
}

// Does not need to be marked `unsafe` as no combination of parameters
// could cause undefined behaviour.
// Also note that output is marked `&mut`, which is also crucial.
// Mutating data behind a non-mutable reference is also considered undefined
// behaviour.
fn do_something(input: &[usize], output: &mut [usize]) {
    input.par_iter().for_each(|&i| {
        // This assert is crucial for soundness, otherwise an incorrect value
        // in `input` could cause an out-of-bounds access on `output`
        assert!((i + 1) * 10 <= output.len());
        for j in i * 10..(i + 1) * 10 {
            unsafe {
                // This is the critical point where we interface
                // from safe to unsafe code.
                // This call requires the programmer to manually verify that
                // `set_unsync` never gets called with dangerous parameters.
                set_unsync(&output, j, i);
            }
        }
    });
}

fn main() {
    // note that we now have to declare output `mut`, as it should be
    let input = vec![2, 3, 9];
    let mut output = vec![0; 100];
    do_something(&input, &mut output);
    dbg!(output);
}
Playground if you want to jump directly into the code.
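As an aside, the question's motivating example (each task writing its own disjoint elements) can be expressed entirely in safe code with rayon's mutable parallel iterators, with no locks involved. A minimal sketch, independent of the code above:

use rayon::prelude::*;

fn main() {
    let mut vec = vec![0usize; 10];
    // par_iter_mut hands each closure a disjoint &mut to a single element,
    // so the compiler itself can prove the writes never overlap.
    vec.par_iter_mut()
        .enumerate()
        .for_each(|(i, slot)| *slot = i);
    dbg!(vec);
}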
Problem
I'm trying to implement a function filter_con<T, F>(v: Vec<T>, predicate: F) that allows concurrent filtering of a Vec via async predicates.
That is, instead of doing:
let arr = vec![...];
let arr_filtered = join_all(arr.into_iter().map(|it| async move {
    if some_getter(&it).await > some_value {
        Some(it)
    } else {
        None
    }
}))
.await
.into_iter()
.filter_map(|it| it)
.collect::<Vec<T>>();
every time I need to filter a Vec, I want to be able to:
let arr = vec![...];
let arr_filtered = filter_con(arr, |it| async move {
    some_getter(&it).await > some_value
}).await;
Tentative implementation
I've extracted the function into its own definition, but I am running into lifetime issues:
async fn filter_con<T, B, F>(arr: Vec<T>, predicate: F) -> Vec<T>
where
    F: FnMut(&T) -> B,
    B: futures::Future<Output = bool>,
{
    join_all(arr.into_iter().map(|it| async move {
        if predicate(&it).await {
            Some(it)
        } else {
            None
        }
    }))
    .await
    .into_iter()
    .filter_map(|p| p)
    .collect::<Vec<_>>()
}
error[E0507]: cannot move out of a shared reference
I don't understand what is being moved out of here. Is it predicate?
For more details, see the playground.
You won't be able to make the predicate an FnOnce, because, if you have 10 items in your Vec, you'll need to call the predicate 10 times, but an FnOnce only guarantees it can be called once, which could lead to something like this:
let vec = vec![1, 2, 3];
let has_drop_impl = String::from("hello");
filter_con(vec, |&i| async {
    drop(has_drop_impl);
    i < 5
});
So F must be either an FnMut or an Fn. The standard library Iterator::filter takes an FnMut, though this can be a source of confusion (it is the captured variables of the closure that need a mutable reference, not the elements of the iterator).
Because the predicate is an FnMut, any caller needs to be able to get an &mut F. For Iterator::filter, this can be used to do something like this:
let vec = vec![1, 2, 3];
let mut count = 0;
vec.into_iter().filter(|&x| {
    count += 1; // this line makes the closure an `FnMut`
    x < 2
})
However, by sending the iterator to join_all, you are essentially allowing your async runtime to schedule these calls as it wants, potentially at the same time, which would cause an aliased &mut F, and aliasing a mutable reference is always undefined behaviour. This GitHub issue shows a slightly more cut-down version of the same problem: https://github.com/rust-lang/rust/issues/69446.
I'm still not 100% sure on the details, but it seems the compiler is being conservative here and doesn't even let you create the closure in the first place, to prevent soundness issues.
I'd recommend making your function only accept Fns. This way, your runtime is free to call the function however it wants. This does mean that your closure cannot have mutable state, but that is unlikely to be a problem in a tokio application. For the counting example, the "correct" solution is to use an AtomicUsize (or equivalent), which allows mutation via shared reference. If you're referencing mutable state in your filter call, it should be thread safe, and thread-safe data structures generally allow mutation via shared reference.
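For illustration, here is a sketch of the counting example adapted to a shared reference (it assumes the Fn-based filter_con defined just below; count_while_filtering is a name made up for this example):

use std::sync::atomic::{AtomicUsize, Ordering};

// Counts predicate invocations through a shared reference,
// which keeps the closure an `Fn`.
async fn count_while_filtering(arr: Vec<i32>) -> Vec<i32> {
    let count = AtomicUsize::new(0);
    let filtered = filter_con(arr, |&it| {
        count.fetch_add(1, Ordering::Relaxed); // mutation via &self
        async move { it < 2 }
    })
    .await;
    println!("predicate ran {} times", count.load(Ordering::Relaxed));
    filtered
}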
Given that restriction, the following gives the answer you expect:
async fn filter_con<T, B, F>(arr: Vec<T>, predicate: F) -> Vec<T>
where
    F: Fn(&T) -> B,
    B: Future<Output = bool>,
{
    join_all(arr.into_iter().map(|it| async {
        if predicate(&it).await {
            Some(it)
        } else {
            None
        }
    }))
    .await
    .into_iter()
    .filter_map(|p| p)
    .collect::<Vec<_>>()
}
Playground
I am developing an algorithm in Rust that I want to multi-thread. The nature of the algorithm is that it produces solutions to overlapping subproblems, hence why I am looking for a way to achieve multi-threaded memoisation.
An implementation of (single-threaded) memoisation is presented by Pritchard in this article.
I would like to have this functionality extended such that:
Whenever the underlying function must be invoked, including recursively, the result is evaluated asynchronously on a new thread.
Continuing on from the previous point, suppose we have some memoised function f, where f(x) needs to recursively invoke f(x1), f(x2), … f(xn). It should be possible for all of these recursive invocations to be evaluated concurrently on separate threads.
If the memoised function is called on an input whose result is currently being evaluated, the current thread should block on this thread, and somehow obtain the result after it is released. This ensures that we don't end up with multiple threads attempting to evaluate the same result.
There is a means of forcing f(x) to be evaluated and cached (if it isn't already) without blocking the current thread. This allows the programmer to preemptively begin the evaluation of a result on a particular value that they know will be (or is likely to be) needed later.
One way you could do this is by storing a HashMap where the key is the parameters to f and the value is the receiver of a oneshot message containing the result. Then, for any value that you need:
If there is already a receiver in the map, await it.
Otherwise, spawn a future to start calculating the result, and store the receiver in the map.
Here is a very contrived example that took way longer than it should have, but successfully runs (Playground):
use futures::{
    future::{self, BoxFuture},
    prelude::*,
    ready,
};
use std::{
    collections::HashMap,
    pin::Pin,
    sync::Arc,
    task::{Context, Poll},
};
use tokio::sync::{oneshot, Mutex};

#[derive(Clone, Debug, Eq, Hash, PartialEq)]
struct MemoInput(usize);

#[derive(Clone, Debug, Eq, Hash, PartialEq)]
struct MemoReturn(usize);

/// This is necessary in order to make a concrete type for the `HashMap`.
struct OneshotReceiverUnwrap<T>(oneshot::Receiver<T>);

impl<T> Future for OneshotReceiverUnwrap<T> {
    type Output = T;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Don't worry too much about this part
        Poll::Ready(ready!(Pin::new(&mut self.0).poll(cx)).unwrap())
    }
}

type MemoMap = Mutex<HashMap<MemoInput, future::Shared<OneshotReceiverUnwrap<MemoReturn>>>>;

/// Compute (2^n)-1, super inefficiently.
fn compute(map: Arc<MemoMap>, x: MemoInput) -> BoxFuture<'static, MemoReturn> {
    async move {
        // First, get all dependencies.
        let dependencies: Vec<MemoReturn> = future::join_all({
            let map2 = map.clone();
            let mut map_lock = map.lock().await;
            // This is an iterator of futures that resolve to the results of
            // the dependencies.
            (0..x.0).map(move |i| {
                let key = MemoInput(i);
                let key2 = key.clone();
                (*map_lock)
                    .entry(key)
                    .or_insert_with(|| {
                        // If the value is not currently being calculated (ie.
                        // is not in the map), start calculating it
                        let (tx, rx) = oneshot::channel();
                        let map3 = map2.clone();
                        tokio::spawn(async move {
                            // Compute the value, then send it to the receiver
                            // that we put in the map. This will awake all
                            // threads that were awaiting it.
                            tx.send(compute(map3, key2).await).unwrap();
                        });
                        // Return a shared future so that multiple threads at a
                        // time can await it
                        OneshotReceiverUnwrap(rx).shared()
                    })
                    .clone() // Clone one instance of the shared future for us
            })
        })
        .await;

        // At this point, all dependencies have been resolved!
        let result = dependencies.iter().map(|r| r.0).sum::<usize>() + x.0;
        MemoReturn(result)
    }
    .boxed() // Box in order to prevent a recursive type
}

#[tokio::main]
async fn main() {
    let map = Arc::new(MemoMap::default());
    let result = compute(map, MemoInput(10)).await.0;
    println!("{}", result); // 1023
}
Note: this could certainly be better optimized; it is just a proof-of-concept example.
I can shuffle a regular vector quite simply like this:
extern crate rand;

use rand::Rng;

fn shuffle(coll: &mut Vec<i32>) {
    rand::thread_rng().shuffle(coll);
}
The problem is, my code now requires the use of a std::collections::VecDeque instead, which causes this code to not compile.
What's the simplest way of getting around this?
As of Rust 1.48, VecDeque supports the make_contiguous() method. That method doesn't allocate and has complexity of O(n), like shuffling itself. Therefore you can shuffle a VecDeque by calling make_contiguous() and then shuffling the returned slice:
use rand::prelude::*;
use std::collections::VecDeque;

pub fn shuffle<T>(v: &mut VecDeque<T>, rng: &mut impl Rng) {
    v.make_contiguous().shuffle(rng);
}
Playground
Historical answer follows below.
Unfortunately, the rand::Rng::shuffle method is defined to shuffle slices. Due to its own complexity constraints, a VecDeque is a ring buffer and cannot guarantee that its elements are stored in a single contiguous slice, so shuffle can never be directly invoked on a VecDeque.
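A quick illustration of the wrap-around (the exact split between the two slices depends on the deque's capacity and history, so the output shown is only indicative):

use std::collections::VecDeque;

fn main() {
    let mut d: VecDeque<i32> = (1..=4).collect();
    d.pop_front();
    d.push_back(5);
    // The ring buffer may have wrapped, so the contents are exposed as
    // up to two disjoint slices rather than a single one.
    let (front, back) = d.as_slices();
    println!("{:?} {:?}", front, back); // possibly [2, 3, 4] [5]
}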
The real requirements of the values argument to a shuffle algorithm are a finite sequence length, O(1) element access, and the ability to swap elements, all of which VecDeque fulfills. It would be nice if there were a trait that incorporated these, so that values could be generic over it, but there isn't one.
With the current library, you have two options:
Use Vec::from(deque) to move the elements into a temporary Vec, shuffle the vector, and move the contents back into the VecDeque (a sketch follows below this list). The complexity of the operation will remain O(n), but it will require a potentially large and costly heap allocation of the temporary vector.
Implement the shuffle on VecDeque yourself. The Fisher-Yates shuffle used by rand::Rng is well understood and easy to implement. While in theory the standard library could switch to a different shuffle algorithm, that is not likely to happen in practice.
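A minimal sketch of the first option (assuming rand 0.8, where shuffle comes from the SliceRandom trait; shuffle_via_vec is a name made up here):

use rand::seq::SliceRandom;
use std::collections::VecDeque;

fn shuffle_via_vec<T>(deque: &mut VecDeque<T>, rng: &mut impl rand::Rng) {
    // Move the elements into a temporary Vec, shuffle the resulting
    // slice, and move them back. Both conversions are O(n).
    let mut vec: Vec<T> = std::mem::take(deque).into();
    vec.shuffle(rng);
    *deque = VecDeque::from(vec);
}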
A generic form of the second option, using a trait to express the len-and-swap requirement, and taking the code of rand::Rng::shuffle, could look like this:
use std::collections::VecDeque;

// Real requirement for shuffle
trait LenAndSwap {
    fn len(&self) -> usize;
    fn swap(&mut self, i: usize, j: usize);
}

// A copy of an earlier version of rand::Rng::shuffle, with the signature
// modified to accept any type that implements LenAndSwap
fn shuffle(values: &mut impl LenAndSwap, rng: &mut impl rand::Rng) {
    let mut i = values.len();
    while i >= 2 {
        // invariant: elements with index >= i have been locked in place.
        i -= 1;
        // lock element i in place.
        values.swap(i, rng.gen_range(0..=i));
    }
}

// VecDeque trivially fulfills the LenAndSwap requirement, but
// we have to spell it out.
impl<T> LenAndSwap for VecDeque<T> {
    fn len(&self) -> usize {
        self.len()
    }
    fn swap(&mut self, i: usize, j: usize) {
        self.swap(i, j)
    }
}

fn main() {
    let mut v: VecDeque<u64> = [1, 2, 3, 4].into_iter().collect();
    shuffle(&mut v, &mut rand::thread_rng());
    println!("{:?}", v);
}
You can use make_contiguous (documentation) to create a mutable slice that you can then shuffle:
use rand::prelude::*;
use std::collections::VecDeque;

fn main() {
    let mut deque = VecDeque::new();
    for p in 0..10 {
        deque.push_back(p);
    }
    deque.make_contiguous().shuffle(&mut rand::thread_rng());
    println!("Random deque: {:?}", deque)
}
Playground Link if you want to try it out online.
Shuffle the components of the VecDeque separately, starting with VecDeque::as_mut_slices:
use rand::seq::SliceRandom; // 0.6.5
use std::collections::VecDeque;

fn shuffle(coll: &mut VecDeque<i32>) {
    let mut rng = rand::thread_rng();
    let (a, b) = coll.as_mut_slices();
    a.shuffle(&mut rng);
    b.shuffle(&mut rng);
}
As Lukas Kalbertodt points out, this solution never swaps elements between the two slices, so a certain amount of randomization will not happen. Depending on your randomization needs, this may be unnoticeable or a deal breaker.
From the Rust standard library implementation of unzip:
fn unzip<A, B, FromA, FromB>(self) -> (FromA, FromB) where
    FromA: Default + Extend<A>,
    FromB: Default + Extend<B>,
    Self: Sized + Iterator<Item=(A, B)>,
{
    struct SizeHint<A>(usize, Option<usize>, marker::PhantomData<A>);
    impl<A> Iterator for SizeHint<A> {
        type Item = A;

        fn next(&mut self) -> Option<A> { None }
        fn size_hint(&self) -> (usize, Option<usize>) {
            (self.0, self.1)
        }
    }

    let (lo, hi) = self.size_hint();
    let mut ts: FromA = Default::default();
    let mut us: FromB = Default::default();

    ts.extend(SizeHint(lo, hi, marker::PhantomData));
    us.extend(SizeHint(lo, hi, marker::PhantomData));

    for (t, u) in self {
        ts.extend(Some(t));
        us.extend(Some(u));
    }

    (ts, us)
}
These two lines:
ts.extend(SizeHint(lo, hi, marker::PhantomData));
us.extend(SizeHint(lo, hi, marker::PhantomData));
don't actually extend ts or us by anything, since the next method of SizeHint returns None. What's the purpose of doing so?
It is a cool trick. By giving this size hint, it gives ts and us the chance to reserve the space for the extend calls in the loop. According to the documentation
size_hint() is primarily intended to be used for optimizations such as reserving space for the elements of the iterator, but must not be trusted to e.g. omit bounds checks in unsafe code. An incorrect implementation of size_hint() should not lead to memory safety violations.
Note that the creation of SizeHint is necessary because the extend call in the for loop is made with a Some value (Option implements IntoIterator), and the size_hint of a Some value is (1, Some(1)). That doesn't help with preallocation.
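A quick standalone check of that claim:

fn main() {
    // A `Some` yields exactly one element, and its hint says so.
    assert_eq!(Some(5).into_iter().size_hint(), (1, Some(1)));
    // A `None` yields nothing.
    assert_eq!(None::<i32>.into_iter().size_hint(), (0, Some(0)));
}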
But looking at the code for Vec, this has no effect (nor does it for HashMap or VecDeque); other Extend implementations may differ. The execution of ts.extend(SizeHint(lo, hi, marker::PhantomData)); does not trigger a resize, since next immediately returns None. Maybe someone should write a patch.
impl<T> Vec<T> {
    fn extend_desugared<I: Iterator<Item = T>>(&mut self, mut iterator: I) {
        // This function should be the moral equivalent of:
        //
        // for item in iterator {
        //     self.push(item);
        // }
        while let Some(element) = iterator.next() {
            let len = self.len();
            if len == self.capacity() {
                let (lower, _) = iterator.size_hint();
                self.reserve(lower.saturating_add(1));
            }
            unsafe {
                ptr::write(self.get_unchecked_mut(len), element);
                // NB can't overflow since we would have had to alloc the address space
                self.set_len(len + 1);
            }
        }
    }
}
It's a dubious hack!
It implements an iterator with a fake (overestimated) size hint to encourage the produced collection to reserve the eventually appropriate capacity up front.
It is a cool trick, but it works by implementing a size hint whose estimated lower bound is greater than the actual number of elements produced (zero). If the lower bound is not known, an iterator should return a lower bound of 0. This implementation is arguably very buggy for this reason, and a collection's Extend impl may react with bugginess as a result (but of course not memory unsafety).
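For contrast, a hint that respects the contract could look like the sketch below (HonestSizeHint is a name made up here). Note that the honest version also defeats the trick, since Extend implementations that reserve do so based on the lower bound:

use std::marker;

// The lower bound must be 0 because `next` never yields an element;
// only the upper bound may carry an estimate.
struct HonestSizeHint<A>(Option<usize>, marker::PhantomData<A>);

impl<A> Iterator for HonestSizeHint<A> {
    type Item = A;
    fn next(&mut self) -> Option<A> {
        None
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, self.0)
    }
}

fn main() {
    let mut v: Vec<i32> = Vec::new();
    v.extend(HonestSizeHint(Some(100), marker::PhantomData));
    // With today's std implementation nothing is reserved, because no
    // element is ever pushed.
    println!("capacity: {}", v.capacity()); // prints 0
}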