If one uses the standard .flatten().collect::<Box<[T]>>() on an Iterator<Item=&[T]> where T: Copy, does it:
perform a single allocation; and
use memcpy to copy each item to the destination
or does it do something less efficient?
Box<[T]> does not implement FromIterator<&T>, so I'll assume your actual inner iterator is something that yields owned Ts.
FromIterator<T> for Box<[T]> forwards to Vec<T>, which uses size_hint() to reserve space for lower + 1 items, and reallocates as it grows beyond that (moving elements as necessary). So the question is, what does Flatten<I> return for size_hint?
The implementation of Iterator::size_hint for Flatten<I> forwards to the internal struct FlattenCompat<I>, which is a little complicated because it supports double-ended iteration, but ultimately returns (0, None) if the outer iterator has not been advanced or exhausted.
So the answer to your question is: it does something less efficient. Namely, (unless you have already called next or next_back on the iterator at least once) it creates an empty Vec<T> and grows it progressively according to whatever growth strategy Vec uses (which is unspecified, but guaranteed by the documentation to result in O(1) amortized push).
This isn't an artificial limitation; it is fundamental to the way Flatten works. The only way you could pre-calculate the size of the flattened iterator is by exhausting the outer iterator and adding up all the inner size_hints. This is a bad idea both because it doesn't always work (the inner iterators may not return useful size_hints) and because you also have to find a way to keep the inner iterators around after exhausting the outer one; there's no solution that would be acceptable for a general purpose iterator adapter.
If you know something about your particular iterator that enables you to know what the final size should be, you can reserve the allocation yourself by calling Vec::with_capacity and use Extend to fill it from the flattened iterator, rather than using collect.
Related
I have a Rayon parallel iterator that generates a decreasing sequence of i32 values. At some point the values in this iterator become negative. I want to stop iteration there and minimize the amount of needless computation.
Since Rayon uses size_hint to split an iterator into parts, I should probably implement some trait for ParallelIterator and do something like a binary search.
I have tried implementing this trait, but it turns out to be quite a difficult task. Is there an easier way?
I'm trying to implement a Lexer. Since lexers emit tokens, I suppose that we can perceive a Lexer as a special iterator that maps certain chunks of chars to Tokens. I therefore expect Lexer to store an Iterator<Item=char> and manipulate that iterator instead of a &str to enable maximum flexibility.
struct Lexer<T: Iterator<Item=char>> {
source: T
}
Yet I find it hard to manipulate the iterator, since almost all iterator adaptors take ownership, and with generics I cannot change the type of T at runtime, unless I switch to Box.
self.source.take_while(|x| x.is_whitespace())
A possible workaround is to require that the iterator implement Clone, use a clone every time I want to transform it, remember how many characters I have seen, and call next that many times. I believe that it is too clumsy.
I wonder if there is an idiomatic way to write iterators that map an uncertain number of input items (in this case, chars) into another object (in this case, Tokens)?
The most elegant way I can come up with so far is to use while let etc. which are not so fluent-style-like. I inspected the implementation of GroupBy in itertools and found that they use the while let approach too.
If we need FIFO or LIFO collections (with basically push, pop and front/back) what should we use in Rust? Something like std::queue or std::stack from C++.
First of all, Rust does not offer (in the Standard library) any collection with guaranteed latency for adding elements: Rust collections may generally allocate memory when adding new elements, and allocating memory may take an unbounded amount of time in the worst case.
That being said, there are two contenders for each case:
a stack may be implemented either on top of Vec or LinkedList (both feature pop_back and push_back)
a queue may be implemented either on top of VecDeque or LinkedList (both feature pop_front and push_back)
The difference between Vec* and LinkedList is that the latter is simplistic: each call to push_back makes a memory allocation. On the one hand, this is great because the cost of push_back is independent of the number of elements already in the collection; on the other hand... well, a memory allocation may take a really long time.
The former is a bit more complicated:
it has better throughput, thanks to being more cache-friendly
it has additional capacity, guaranteeing non-allocating push_back as long as there is excess capacity
it still maintains amortized O(1) push_back even when not reserving excess capacity ahead of time
In general, I would advise to use Vec for a stack and VecDeque for a queue.
Both VecDeque and LinkedList have push/pop_front/back.
Matthieu M. has it just about perfect. Vec is your stack (LIFO) and VecDeque is a double ended queue that supports all 4 variants (FIFO, FILO, LIFO, and LILO) using:
.push_front(x) | .front() | .pop_front()
.push_back(x) | .back() | .pop_back()
If you're looking to maximize your efficiency, I recommend checking out "Understanding Rust's Vec and its capacity for fast and efficient programs". It goes into a lot more detail about how allocation and reallocation occur in Vec and VecDeque, but the biggest takeaway is that if you can predict the maximum number of elements you're going to need in the queue, you can use VecDeque::with_capacity(x) if you know it when you initialize the queue, or .reserve_exact(x) if at some point you know exactly how many more slots you're going to need.
I strongly recommend checking out the Rust docs on std::collections; it has an excellent list of the most common collections used in Rust, along with suggestions on when to pick each.
One last thing, VecDeque isn't part of the default prelude in Rust so if you want to use it you need to include this:
use std::collections::VecDeque;
I have an easy question regarding Box<X>.
I understand what it does: it allocates X on the heap.
In C++ you use the new operator to allocate something on the heap so it can outlive the current scope (because if you create something on the stack it goes away at the end of the current block).
But reading Rust's documentation, it looks like you can create something on the stack and still return it taking advantage of the language's move semantics without having to resort to the heap.
Then it's not clear to me when to use Box<X> as opposed to simply X.
I just started reading about Rust so I apologize if I'm missing something obvious.
First of all: C++11 (and newer) has move semantics with rvalue references, too. So your question would also apply to C++. Keep in mind, though, that C++'s move semantics are -- unlike Rust's -- highly unsafe.
Second: the term "move semantics" somehow hints at the absence of a "copy", which is not true. Suppose you have a struct with 100 64-bit integers. If you transfer an object of this struct via move semantics, those 100 integers are copied (of course, the compiler's optimizer can often remove those copies, but anyway...). The advantage of move semantics comes into play when dealing with objects that manage some kind of data on the heap (or pointers in general).
For example, take a look at Vec (similar to C++'s vector): the type itself only contains a pointer and two pointer-sized integers (ptr, len and cap). Those three words are still copied when the vector is moved, but the main data of the vector (which lives on the heap) is not touched.
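You can observe that three-word footprint directly (this is how Vec is laid out today, not a formal guarantee of the language):

```rust
use std::mem::size_of;

fn main() {
    // Moving a Vec copies only its (ptr, len, cap) triple,
    // never the heap buffer it points to.
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());
}
```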
That being said, let's discuss the main question: "Why to use Box at all?". There are actually many use cases:
Unsized types: some types (e.g. trait objects, which also include closures) are unsized, meaning their size is not known to the compiler. But the compiler has to know the size of each stack frame; hence those unsized types cannot live on the stack.
Recursive data structures: think of a BinaryTreeNode struct. It saves two members named "left" and "right" of type... BinaryTreeNode? That won't work. So you can box both children so that the compiler knows the size of your struct.
Huge structs: think of the 100 integer struct mentioned above. If you don't want to copy it every time, you can allocate it on the heap (this happens pretty seldom).
There are cases where you can't return X, e.g. if X is ?Sized (traits, non-compile-time-sized arrays, etc.). In those cases Box<X> will still work.
I'm going through a couple of Rust examples, and there's a particular snippet of code which I don't really understand how it works. In particular, this example of Higher Order Functions. My focus is on this snippet of code:
let sum_of_squared_odd_numbers: u32 =
(0..).map(|n| n * n) // All natural numbers squared
.take_while(|&n| n < upper) // Below upper limit
.filter(|n| is_odd(*n)) // That are odd
.fold(0, |sum, i| sum + i); // Sum them
Here are my questions:
How does the compiler know when (0..) ends? Is the loop unrolled at compile time and are all the lambdas evaluated?
Isn't this extremely memory inefficient compared to the imperative version? For example (0..).map(|n| n * n) alone would end up taking O(n) memory.
How does the compiler know when (0..) ends?
The compiler doesn't know at all. That is a range literal, specifically a RangeFrom. Note that it implements the Iterator trait. The core piece of Iterator is next:
fn next(&mut self) -> Option<Self::Item>
That is, given a mutable borrow to the iterator, it can return another item (Some) or signal that there are no more items (None). It is completely possible to have iterators that go on forever.
In this particular example:
The range will yield every unsigned 32-bit number before stopping †.
The map will stop when the underlying iterator stops.
The take_while will stop when the predicate fails or the underlying iterator stops.
The filter will stop when the underlying iterator stops.
The fold will stop when the underlying iterator stops.
Isn't this extremely memory inefficient compared to the imperative version?
Nope! In fact, the compiler is very likely to compile this to the same code as the imperative version! You'd want to check the LLVM IR or assembly to be 100% sure, but Rust's monomorphization capabilities combined with LLVM's optimizer do some pretty amazing things.
Each iterator adapter pulls just enough items from the previous adapter to calculate the next value. In your example, I'd expect a constant memory allocation for the entire process.
The only component of the pipeline that requires any extra space would be the fold, and it just needs an accumulator value that is an u32. All the other adapters have no extra state.
An important thing to note is that calling the map, filter, and take_while iterator adaptors doesn't do any iterator computation at that point in time. They simply return new objects:
// Note the type is
// Filter<TakeWhile<Map<RangeFrom<_>, [closure]>, [closure]>, [closure]>
let _ =
(0..)
.map(|n| n * n)
.take_while(|&n| n < 20)
.filter(|n| n % 2 == 0);
// At this point, we still haven't even looked at a single value
Whenever you call next on the final adaptor, each layer of the adaptor stack does enough work to get the next value. In the original example, fold is an iterator terminator that consumes the entire iterator, calling next until there are no more values.
† As bluss points out, you don't really want to try and go past the maximum value of a range, as it will either panic or loop forever, depending on whether it is built in debug or release mode.
The compiler doesn't know when (0..) ends. However, the iterators are lazy (as is mentioned on the page you linked to), and the .take_while(|&n| n < upper) statement will stop the sequence as soon as n is greater than or equal to upper.
I am not a Rust expert, but the comment
// All natural numbers squared
tells me that the sequence (or stream, enumeration, whatever) cannot be fully evaluated. So I would put my bet on some kind of lazy evaluation.
In that case, the range carries internal state, and on every use the next element is computed. Operations like fold and map are then implemented by "storing" the respective function instead of evaluating it directly.
update: I overlooked the take_while part. This method seems to be responsible, also according to the comment, for actually forcing the evaluation. Everything before just computes the range abstractly, i.e. without any concrete elements. This is possible because functions can be composed.
The standard example for such behavior is Haskell.