I'm going through a couple of Rust examples, and there's a particular snippet of code whose behavior I don't really understand: this example of Higher Order Functions. My focus is on this snippet of code:
fn is_odd(n: u32) -> bool { n % 2 == 1 } // helper defined earlier in the example
let upper = 1000;                        // limit defined earlier in the example

let sum_of_squared_odd_numbers: u32 =
    (0..).map(|n| n * n)             // All natural numbers squared
         .take_while(|&n| n < upper) // Below upper limit
         .filter(|n| is_odd(*n))     // That are odd
         .fold(0, |sum, i| sum + i); // Sum them
Here are my questions:
How does the compiler know when (0..) ends? Is the loop unrolled at compile time and are all the lambdas evaluated?
Isn't this extremely memory inefficient compared to the imperative version? For example (0..).map(|n| n * n) alone would end up taking O(n) memory.
How does the compiler know when (0..) ends?
The compiler doesn't know at all. That is a range literal, specifically a RangeFrom. Note that it implements the Iterator trait. The core piece of Iterator is next:
fn next(&mut self) -> Option<Self::Item>
That is, given a mutable borrow to the iterator, it can return another item (Some) or signal that there are no more items (None). It is completely possible to have iterators that go on forever.
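To make that concrete, here is a minimal sketch of an iterator that never returns None (the type and names are my own, not from the standard library):

struct Naturals {
    next: u32,
}

impl Iterator for Naturals {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let current = self.next;
        self.next += 1; // overflows eventually; panics in debug builds
        Some(current)   // never signals exhaustion with None
    }
}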
In this particular example:
The range will yield every unsigned 32-bit number before stopping †.
The map will stop when the underlying iterator stops.
The take_while will stop when the predicate fails or the underlying iterator stops.
The filter will stop when the underlying iterator stops.
The fold will stop when the underlying iterator stops.
Isn't this extremely memory inefficient compared to the imperative version?
Nope! In fact, the compiler is very likely to compile this to the same code as the imperative version! You'd want to check the LLVM IR or assembly to be 100% sure, but Rust's monomorphization capabilities combined with LLVM's optimizer do some pretty amazing things.
Each iterator adapter pulls just enough items from the previous adapter to calculate the next value. In your example, I'd expect a constant memory allocation for the entire process.
The only component of the pipeline that requires any extra space is the fold, and it just needs an accumulator value that is a u32. All the other adapters carry no extra state.
An important thing to note is that calling the map, filter, and take_while iterator adaptors doesn't do any iterator computation at that point in time. They simply return new objects:
// Note the type is
// Filter<TakeWhile<Map<RangeFrom<_>, [closure]>, [closure]>, [closure]>
let adapted =
    (0..)
        .map(|n| n * n)
        .take_while(|&n| n < 20)
        .filter(|n| n % 2 == 0);
// At this point, we still haven't even looked at a single value
Whenever you call next on the final adaptor, each layer of the adaptor stack does enough work to get the next value. In the original example, fold is an iterator terminator that consumes the entire iterator, calling next until there are no more values.
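As a hedged sketch of what that consumption boils down to, here is a hand-rolled equivalent of the original example (not the actual library source; the limit 1000 and the inlined oddness test stand in for upper and is_odd):

let mut sum: u32 = 0;
let mut iter = (0..)
    .map(|n| n * n)
    .take_while(|&n| n < 1000) // standing in for upper
    .filter(|&n| n % 2 == 1);  // inlined stand-in for is_odd
while let Some(i) = iter.next() {
    sum += i; // the same step as the closure passed to fold
}

Each call to next pulls exactly one value through the whole adaptor stack, which is why the memory usage stays constant.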
† As bluss points out, you don't really want to try to go past the maximum value of a range, as it will either panic or loop forever, depending on whether it is built in debug or release mode.
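A quick demonstration of that edge case (behavior as I understand current rustc; the exact panic message may vary):

let mut r = (u32::MAX - 1)..;
assert_eq!(r.next(), Some(u32::MAX - 1));
// The following call must advance the range past u32::MAX:
// in a debug build it panics with an arithmetic overflow error;
// in a release build the start wraps to 0 and iteration continues forever.
// r.next();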
The compiler doesn't know when (0..) ends. However, the iterators are lazy (as is mentioned on the page you linked to) and the .take_while(|&n| n < upper) adaptor will stop the sequence as soon as n is greater than or equal to upper.
I am not a Rust expert, but the comment
// All natural numbers squared
tells me that the list (or stream, enumeration, whatever) cannot be fully evaluated, so I would put my bet on some kind of lazy evaluation. In that case, the range carries internal state, and each use computes the next element. Adaptors like map then work by "storing" the respective operation instead of evaluating it directly.
Update: I overlooked the take_while part. According to its comment, it is what bounds the otherwise infinite sequence; the fold at the end is what actually forces the evaluation. Everything before that just describes the range abstractly, i.e. without computing any concrete elements. This is possible because functions can be composed.
The standard example of such behavior is Haskell.
What is the computational complexity of HashSet::len?
Usually, the computational complexity of len is O(1). Is there a statement that indicates this? The documentation (https://doc.rust-lang.org/stable/std/collections/index.html#maps) says that get and insert for HashMap (not HashSet) are O(1)-like, but says nothing about HashSet or about len.
Had to go down a bit of a rabbit hole for this one. In short, reading the source code of std::collections::HashMap shows that the standard HashMap inherits its len() functionality from the hashbrown crate, a Rust port of Google's SwissTable hash table design (github). Tracking down the implementation of len() in that crate leads to the underlying struct, RawTable, which contains an instance of RawTableInner<A>, generic over the allocator type A. This struct has a field items: usize whose value is returned whenever len() is called. In other words, len() simply returns an internally stored integer count that keeps track of the number of entries.
Overall, this means the time complexity of len() is O(1): it doesn't do any iteration or counting, but simply returns the value of an entry counter that has been maintained over the course of the table's construction and mutation.
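A simplified sketch of the shape of that code (field and type names abbreviated from the actual hashbrown source; this is not the real implementation):

struct RawTableInner {
    items: usize, // live entry count, updated on every insert/remove
    // ... control bytes, bucket mask, etc.
}

impl RawTableInner {
    fn len(&self) -> usize {
        self.items // O(1): just read the stored counter
    }
}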
If one uses the standard .flatten().collect::<Box<[T]>>() on an Iterator<Item=&[T]> where T: Copy, does it:
perform a single allocation; and
use memcpy to copy each item to the destination
or does it do something less efficient?
Box<[T]> does not implement FromIterator<&T>, so I'll assume your actual inner iterator is something that yields owned Ts.
FromIterator<T> for Box<[T]> forwards to Vec<T>, which uses size_hint() to reserve space for lower + 1 items, and reallocates as it grows beyond that (moving elements as necessary). So the question is, what does Flatten<I> return for size_hint?
The implementation of Iterator::size_hint for Flatten<I> forwards to the internal struct FlattenCompat<I>, which is a little complicated because it supports double-ended iteration, but ultimately returns (0, None) if the outer iterator has not been advanced or exhausted.
So the answer to your question is: it does something less efficient. Namely, (unless you have already called next or next_back on the iterator at least once) it creates an empty Vec<T> and grows it progressively according to whatever growth strategy Vec uses (which is unspecified, but guaranteed by the documentation to result in O(1) amortized push).
This isn't an artificial limitation; it is fundamental to the way Flatten works. The only way you could pre-calculate the size of the flattened iterator is by exhausting the outer iterator and adding up all the inner size_hints. This is a bad idea both because it doesn't always work (the inner iterators may not return useful size_hints) and because you also have to find a way to keep the inner iterators around after exhausting the outer one; there's no solution that would be acceptable for a general purpose iterator adapter.
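You can observe this directly; a small check that I believe holds for the current standard library, since the outer iterator has not been advanced:

let outer: Vec<Vec<u8>> = vec![vec![1, 2], vec![3]];
let flat = outer.iter().flatten();
assert_eq!(flat.size_hint(), (0, None)); // no useful bound before advancing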
If you know something about your particular iterator that enables you to know what the final size should be, you can reserve the allocation yourself by calling Vec::with_capacity and use Extend to fill it from the flattened iterator, rather than using collect.
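A hedged sketch of that workaround, assuming (purely as an illustration) that you happen to know the lengths of the inner collections up front:

let chunks: Vec<Vec<u32>> = vec![vec![1, 2], vec![3, 4, 5]];
let total: usize = chunks.iter().map(|c| c.len()).sum();

let mut flat: Vec<u32> = Vec::with_capacity(total); // single allocation
flat.extend(chunks.into_iter().flatten());          // fills without reallocating
let boxed: Box<[u32]> = flat.into_boxed_slice();    // no copy: len == capacity
assert_eq!(&*boxed, &[1, 2, 3, 4, 5][..]);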
I have an iterator and I would like to fold it with a nice method (say Iterator::sum):
let it = ...;
let sum = it.sum::<u64>();
Then I notice that I also need to know the number of elements in the iterator. I could write a for loop and do the counting and summing manually, but that's not nice since I would have to change a potentially long iterator adapter chain. Additionally, in my real code I'm not using sum, but a more complex "folding method" whose logic I don't want to replicate.
I had the idea to (ab)use Iterator::inspect:
let it = ...;
let mut count = 0;
let sum = it.inspect(|_| count += 1).sum::<u64>();
This works, but does it work by coincidence or is this behavior guaranteed? The documentation of inspect mentions that the closure is called for each element, but also states that it's mostly used as debugging tool. I'm not sure if using it this way in production code is a good idea.
I'd say it's guaranteed, but you'll never find it explicitly stated as such. As you mention, the documentation states:
Do something with each element of an iterator, passing the value on.
Since the function guarantees to run the closure for each element, and the language guarantees what happens when a closure is run (by definition of a closure), the behavior is safe to rely on.
That being said, once you have one or more side-effects, it might be better to eschew heavy chaining and move to a boring for loop for readability, but that will depend on the exact case.
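For comparison, a hedged sketch of the boring-loop version (with a stand-in iterator, since the original chain is elided):

let it = vec![1u64, 2, 3].into_iter(); // stand-in for the real chain
let mut count = 0u64;
let mut sum = 0u64;
for x in it {
    count += 1;
    sum += x;
}
assert_eq!((count, sum), (3, 6));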
Rust supports two methods for accessing the elements of a vector:
let mut v = vec![1, 2, 3];
let first_element = &v[0];
let second_element = v.get(1);
The get() method returns an Option, which seems like a useful safety feature. The C-like syntax &v[0] seems shorter to type, but gives up the safety benefit: an invalid read causes a run-time panic rather than returning None to indicate that the read was out of bounds.
It's not clear to me when I would want to use the direct access approach, because it seems like the only advantage is that it's quicker to type (I save 3 characters). Is there some other advantage (perhaps a speedup?) that I'm not seeing? I guess I would save the conditional of a match expression, but that doesn't seem like it offers much benefit compared to the costs.
Neither of them is quicker because they both do bounds checks. In fact, your question is quite generic because there are other pairs of methods where one of them panics while the other returns an option, such as String::reserve vs String::try_reserve.
If you are sure that you are in bounds, use the brackets version. It is effectively a shortcut for get().unwrap(): both panic on an out-of-bounds index.
If you are unsure of this, use the get() method and do your check.
If you critically need maximum speed and you cannot use an iterator and you have determined through benchmarks that the indexing is the bottleneck and you are sure to be in bounds, you can use the get_unchecked() method. Be careful about this because it is unsafe: it is always better to not have any unsafe block in your code.
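A hedged sketch contrasting the three options (the unsafe variant only under the preconditions stated above):

let v = vec![10, 20, 30];

let a = v[1];      // panics if the index is out of bounds
let b = v.get(1);  // Some(&20) here; None if out of bounds

// SAFETY: 1 < v.len(), so the access is in bounds.
let c = unsafe { *v.get_unchecked(1) }; // no bounds check at all
assert_eq!((a, *b.unwrap(), c), (20, 20, 20));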
Just a little bit of advice: if you are concerned about your program's performance, avoid these indexing methods and prefer iterators as much as you can. For example, the second loop below is faster than the first one because the first performs one million bounds checks:
let v: Vec<_> = (0..1_000_000).collect();

// One bounds check per iteration:
for idx in 0..1_000_000 {
    // do something with v[idx]
}

// No bounds checks:
for num in &v {
    // do something with num
}
Sometimes you may want to avoid, or at least minimize, the garbage collector, so I want to be sure about how to do it. I think the following are correct:
Declare variables at the beginning of the function.
Use an array instead of a slice.
Any more?
To minimize garbage collection in Go, you must minimize heap allocations. To minimize heap allocations, you must understand when allocations happen.
The following things always cause allocations (at least in the gc compiler as of Go 1):
Using the new built-in function
Using the make built-in function (except in a few unlikely corner cases)
Composite literals when the value type is a slice or a map, or when the address of a struct literal is taken with the & operator
Putting a value larger than a machine word into an interface. (For example, strings, slices, and some structs are larger than a machine word.)
Converting between string, []byte, and []rune
As of Go 1.3, the compiler special cases this expression to not allocate: m[string(b)], where m is a map and b is a []byte
Converting a non-constant integer value to a string
defer statements
go statements
Function literals that capture local variables
The following things can cause allocations, depending on the details:
Taking the address of a variable. Note that addresses can be taken implicitly. For example a.b() might take the address of a if a isn't a pointer and the b method has a pointer receiver type.
Using the append built-in function
Calling a variadic function or method
Slicing an array
Adding an element to a map
The list is intended to be complete and I'm reasonably confident in it, but am happy to consider additions or corrections.
If you're uncertain of where your allocations are happening, you can always profile as others suggested or look at the assembly produced by the compiler.
Avoiding garbage is relatively straightforward. You need to understand where the allocations are being made and see if you can avoid them.
First, declaring variables at the beginning of a function will NOT help. The compiler does not know the difference. However, humans will know the difference and it will annoy them.
Use of an array instead of a slice will work, but that is because arrays (unless dereferenced) are put on the stack. Arrays have other issues such as the fact that they are passed by value (copied) between functions. Anything on the stack is "not garbage" since it will be freed when the function returns. Any pointer or slice that may escape the function is put on the heap which the garbage collector must deal with at some point.
The best thing you can do is avoid allocation. When you are done with large bits of data which you don't need, reuse them. This is the method used in the profiling tutorial on the Go blog. I suggest reading it.
Another example besides the one in the profiling tutorial: let's say you have a slice of type []int named xs. You continually append to xs until you reach a condition and then reset it so you can start over. If you do xs = nil, you declare the underlying array of the slice to be garbage for collection, and append will have to reallocate the next time you use xs. If instead you do xs = xs[:0], you still reset the length but keep the old backing array.
For the most part, trying to avoid creating garbage is premature optimization. For most of your code it does not matter. But every once in a while you may find a function that is called a great many times and allocates a lot each time it runs, or a loop that reallocates instead of reusing. I would wait until you see the bottleneck before going overboard.