Is `iter().map().sum()` as fast as `iter().fold()`? - rust

Does the compiler generate the same code for iter().map().sum() and iter().fold()? In the end they achieve the same goal, but the first code would iterate two times, once for the map and once for the sum.
Here is an example. Which version would be faster in total?
pub fn square(s: u32) -> u64 {
    match s {
        s @ 1..=64 => 2u64.pow(s - 1),
        _ => panic!("Square must be between 1 and 64"),
    }
}
pub fn total() -> u64 {
    // A fold
    (0..64).fold(0u64, |r, s| r + square(s + 1))
    // or a map
    (1..64).map(square).sum()
}
What would be good tools to look at the assembly or benchmark this?

For them to generate the same code, they'd first have to do the same thing. Your two examples do not:
fn total_fold() -> u64 {
    (0..64).fold(0u64, |r, s| r + square(s + 1))
}
fn total_map() -> u64 {
    (1..64).map(square).sum()
}
fn main() {
    println!("{}", total_fold());
    println!("{}", total_map());
}
18446744073709551615
9223372036854775807
Let's assume you meant
fn total_fold() -> u64 {
    (1..64).fold(0u64, |r, s| r + square(s + 1))
}
fn total_map() -> u64 {
    (1..64).map(|i| square(i + 1)).sum()
}
There are a few avenues to check:
The generated LLVM IR
The generated assembly
Benchmark
The easiest source for the IR and assembly is one of the playgrounds (official or alternate). These both have buttons to view the assembly or IR. You can also pass --emit=llvm-ir or --emit=asm to the compiler to generate these files.
Make sure to generate assembly or IR in release mode. The attribute #[inline(never)] is often useful to keep functions separate so they are easier to find in the output.
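For example, here is one way to set that up (a sketch, reusing square from the question; the cargo invocation shown is just one option):
// Sketch: keep the two versions out of line so they are easy to find in the
// emitted assembly. Compile in release mode, e.g.:
//   cargo rustc --release -- --emit=asm
#[inline(never)]
pub fn total_fold() -> u64 {
    (1..64).fold(0u64, |r, s| r + square(s + 1))
}
#[inline(never)]
pub fn total_map() -> u64 {
    (1..64).map(|i| square(i + 1)).sum()
}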
Benchmarking is documented in The Rust Programming Language, so there's no need to repeat all that valuable information.
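Still, as a minimal sketch of what a benchmark could look like (nightly-only, using the built-in test crate; black_box keeps the compiler from optimizing the calls away):
#![feature(test)]
extern crate test;
use test::{black_box, Bencher};

#[bench]
fn bench_fold(b: &mut Bencher) {
    b.iter(|| black_box(total_fold()));
}

#[bench]
fn bench_map(b: &mut Bencher) {
    b.iter(|| black_box(total_map()));
}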
Before Rust 1.14, these did not produce exactly the same assembly. I'd wait for benchmarking / profiling data to see if there's any meaningful impact on performance before worrying about it.
As of Rust 1.14, they do produce the same assembly! This is one reason I love Rust: you can write clear and idiomatic code, and smart people come along and make it just as fast.
but the first code would iterate two times, once for the map and once for the sum.
This is incorrect, and I'd love to know what source told you this so we can correct it and prevent future misunderstandings. An iterator operates on a pull basis: one element is processed at a time. The core method is next, which yields a single value, running just enough computation to produce that value.
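A small sketch (my own illustration, not from the answer) of this pull model: map does no work up front, and each call to next processes exactly one element:
fn main() {
    let mut iter = (1..4).map(|x| {
        println!("mapping {}", x);
        x * 2
    });
    println!("nothing has run yet");
    println!("first: {:?}", iter.next());  // prints "mapping 1", then "first: Some(2)"
    println!("second: {:?}", iter.next()); // prints "mapping 2", then "second: Some(4)"
}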

First, let's fix those examples to actually return the same result:
pub fn total_fold_iter() -> u64 {
    (1..65).fold(0u64, |r, s| r + square(s))
}
pub fn total_map_iter() -> u64 {
    (1..65).map(square).sum()
}
Now, let's develop them, starting with fold. A fold is just a loop and an accumulator; it is roughly equivalent to:
pub fn total_fold_explicit() -> u64 {
    let mut total = 0;
    for i in 1..65 {
        total = total + square(i);
    }
    total
}
Then, let's move on to map and sum, unwrapping the sum first; it is roughly equivalent to:
pub fn total_map_partial_iter() -> u64 {
    let mut total = 0;
    for i in (1..65).map(square) {
        total += i;
    }
    total
}
It's just a simple accumulator! And now, let's unwrap the map layer (which only applies a function), obtaining something that is roughly equivalent to:
pub fn total_map_explicit() -> u64 {
    let mut total = 0;
    for i in 1..65 {
        let s = square(i);
        total += s;
    }
    total
}
As you can see, both of them are extremely similar: they apply the same operations in the same order and have the same overall complexity.
Which is faster? I have no idea. And a micro-benchmark may only tell half the truth anyway: just because something is faster in a micro-benchmark does not mean it is faster in the midst of other code.
What I can say, however, is that they both have equivalent complexity and should therefore behave similarly, i.e., within a constant factor of each other.
I would personally go for map + sum, because it expresses the intent more clearly, whereas fold is the "kitchen sink" of Iterator methods and therefore far less informative.

Related

Equivalent of __builtin_ia32_addcarryx_u64 in Rust (addition with carry)

There's not much information about __builtin_ia32_addcarryx_u64; the only thing I could find is this: https://clang.llvm.org/doxygen/adxintrin_8h_source.html
It looks like __builtin_ia32_addcarryx_u64 returns the carry of the addition of two u64 numbers as a u8. While I could implement this in Rust myself, it would be nice to have the fast version of it.
What you're looking for is probably the u64::overflowing_add function (which exists for u64 and all other primitive integer types):
let x: u64 = 0x0123456789abcdef;
let y: u64 = 0xfedcba9876543212;
let (sum, carry) = x.overflowing_add(y);
// sum = 1, carry = true
This function returns a tuple with the resulting sum and the carry bit (as a bool), and is implemented using the add_with_overflow compiler intrinsic, so it should be very fast.
If you need to take a carry value as input, you can (as you mentioned in a comment) use the nightly-only u64::carrying_add. It's just implemented in terms of two overflowing_add operations, though, and you can just use these instead on stable Rust if you need it:
impl u64 {
    pub const fn carrying_add(self, rhs: u64, carry: bool) -> (u64, bool) {
        // note: longer-term this should be done via an intrinsic, but
        // this has been shown to generate optimal code for now, and
        // LLVM doesn't have an equivalent intrinsic
        let (a, b) = self.overflowing_add(rhs);
        let (c, d) = a.overflowing_add(carry as u64);
        (c, b || d)
    }
}
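Here is a sketch of the same idea as a free function on stable Rust (the name carrying_add_u64 is mine, not part of any library):
// Chain two overflowing adds and combine the carries; this mirrors the
// standard library implementation quoted above.
pub fn carrying_add_u64(a: u64, b: u64, carry: bool) -> (u64, bool) {
    let (sum, c1) = a.overflowing_add(b);
    let (sum, c2) = sum.overflowing_add(carry as u64);
    (sum, c1 || c2)
}

fn main() {
    // u64::MAX + 0 + 1 wraps to 0 and sets the carry.
    assert_eq!(carrying_add_u64(u64::MAX, 0, true), (0, true));
    assert_eq!(carrying_add_u64(1, 2, false), (3, false));
}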

How do I create a non-recursive calculation of factorial using iterators and ranges?

I ran into a Rustlings exercise that keeps bugging me:
pub fn factorial(num: u64) -> u64 {
    // Complete this function to return factorial of num
    // Do not use:
    // - return
    // For extra fun don't use:
    // - imperative style loops (for, while)
    // - additional variables
    // For the most fun don't use:
    // - recursion
    // Execute `rustlings hint iterators4` for hints.
}
The hint for the solution tells me...
In an imperative language you might write a for loop to iterate
through and multiply the values into a mutable variable. Or you might
write code more functionally with recursion and a match clause. But
you can also use ranges and iterators to solve this in Rust.
I tried this approach, but I am missing something:
if num > 1 {
    (2..=num).map(|n| n * ( n - 1 ) ??? ).???
} else {
    1
}
Do I have to use something like .take_while instead of if?
The factorial is defined as the product of all the numbers from a starting number down to 1. We use that definition and Iterator::product:
fn factorial(num: u64) -> u64 {
    (1..=num).product()
}
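A quick usage note (my addition, not part of the original answer): for num == 0 the range 1..=0 is empty, and the product of an empty iterator is 1, so the edge case falls out for free:
fn main() {
    assert_eq!((1..=0u64).product::<u64>(), 1); // empty product is 1
    assert_eq!((1..=5u64).product::<u64>(), 120);
}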
If you look at the implementation of Product for the integers, you'll see that it uses Iterator::fold under the hood:
impl Product for $a {
    fn product<I: Iterator<Item = Self>>(iter: I) -> Self {
        iter.fold($one, Mul::mul)
    }
}
You could hard-code this yourself:
fn factorial(num: u64) -> u64 {
    (1..=num).fold(1, |acc, v| acc * v)
}
See also:
How to sum the values in an array, slice, or Vec in Rust?
How do I sum a vector using fold?
Although using .product() or .fold() is probably the best answer, you can also use .for_each().
fn factorial(num: u64) -> u64 {
    let mut x = 1;
    (1..=num).for_each(|i| x *= i);
    x
}

C-style switch statement with fall-through in Rust [duplicate]

I’m new to Rust, but as a fan of Haskell, I greatly appreciate the way match works in Rust. Now I’m faced with the rare case where I do need fall-through – in the sense that I would like all matching cases of several overlapping ones to be executed. This works:
fn options(stairs: i32) -> i32 {
    if stairs == 0 {
        return 1;
    }
    let mut count: i32 = 0;
    if stairs >= 1 {
        count += options(stairs - 1);
    }
    if stairs >= 2 {
        count += options(stairs - 2);
    }
    if stairs >= 3 {
        count += options(stairs - 3);
    }
    count
}
My question is whether this is idiomatic in Rust or whether there is a better way.
The context is a question from Cracking the Coding Interview: “A child is running up a staircase with n steps and can hop either 1 step, 2 steps, or 3 steps at a time. Implement a method to count how many possible ways the child can run up the stairs.”
Based on the definition of the tribonacci sequence I found, you could write it in a more concise manner like this:
fn options(stairs: i32) -> i32 {
    match stairs {
        0 => 0,
        1 => 1,
        2 => 1,
        3 => 2,
        _ => options(stairs - 1) + options(stairs - 2) + options(stairs - 3),
    }
}
I would also recommend changing the function definition to only accept positive integers, e.g. u32.
To answer the generic question, I would argue that match and fallthrough are somewhat antithetical.
match is used to perform different actions based on different patterns. Most of the time, the values extracted via pattern matching are so different that a fallthrough does not make sense.
A fallthrough, instead, points to a sequence of actions. There are many ways to express sequences: recursion, iteration, ...
In your case, for example, one could use a loop:
for i in 1..4 {
    if stairs >= i {
        count += options(stairs - i);
    }
}
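Embedded in a full function, keeping the question's base case, that might look like this (my completion of the fragment above, not code from the answer):
fn options(stairs: i32) -> i32 {
    if stairs == 0 {
        return 1;
    }
    let mut count = 0;
    for i in 1..4 {
        if stairs >= i {
            count += options(stairs - i);
        }
    }
    count
}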
Of course, I find @ljedrz' solution even more elegant in this particular instance.
I would advise avoiding recursion in Rust; it is better to use iterators:
struct Trib(usize, usize, usize);

impl Default for Trib {
    fn default() -> Trib {
        Trib(1, 0, 0)
    }
}

impl Iterator for Trib {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let &mut Trib(a, b, c) = self;
        let d = a + b + c;
        *self = Trib(b, c, d);
        Some(d)
    }
}

fn options(stairs: usize) -> usize {
    Trib::default().take(stairs + 1).last().unwrap()
}

fn main() {
    for (i, v) in Trib::default().enumerate().take(10) {
        println!("i={}, t={}", i, v);
    }
    println!("{}", options(0));
    println!("{}", options(1));
    println!("{}", options(3));
    println!("{}", options(7));
}
Playground
Your code looks pretty idiomatic to me, although @ljedrz has suggested an even more elegant rewriting of the same strategy.
Since this is an interview problem, it's worth mentioning that neither solution is going to be seen as an amazing answer because both solutions take exponential time in the number of stairs.
Here is what I might write if I were trying to crack a coding interview:
fn options(stairs: usize) -> u128 {
    let mut o = vec![1, 1, 2, 4];
    for _ in 3..stairs {
        o.push(o[o.len() - 1] + o[o.len() - 2] + o[o.len() - 3]);
    }
    o[stairs]
}
Instead of recomputing options(n) each time, we cache each value in an array. So, this should run in linear time instead of exponential time. I also switched to a u128 to be able to return solutions for larger inputs.
Keep in mind that this is not the most efficient solution because it uses linear space. You can get away with using constant space by only keeping track of the final three elements of the array. I chose this as a compromise between conciseness, readability, and efficiency.
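As a sketch of that constant-space idea (my own, not from the answer), you can keep a three-element window instead of the whole table:
fn options(stairs: usize) -> u128 {
    // window holds the last three computed values as we advance
    let mut window = [1u128, 1, 2]; // options(0), options(1), options(2)
    if stairs < 3 {
        return window[stairs];
    }
    for _ in 3..=stairs {
        let next = window.iter().sum();
        window = [window[1], window[2], next];
    }
    window[2]
}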

Take slice of certain length known at compile time

In this code:
fn unpack_u32(data: &[u8]) -> u32 {
    assert_eq!(data.len(), 4);
    let res = data[0] as u32 |
        (data[1] as u32) << 8 |
        (data[2] as u32) << 16 |
        (data[3] as u32) << 24;
    res
}

fn main() {
    let v = vec![0_u8, 1_u8, 2_u8, 3_u8, 4_u8, 5_u8, 6_u8, 7_u8, 8_u8];
    println!("res: {:X}", unpack_u32(&v[1..5]));
}
the function unpack_u32 accepts only slices of length 4. Is there any way to replace the runtime check assert_eq with a compile time check?
Yes, kind of. The first step is easy: change the argument type from &[u8] to [u8; 4]:
fn unpack_u32(data: [u8; 4]) -> u32 { ... }
But transforming a slice (like &v[1..5]) into an object of type [u8; 4] is hard. You can of course create such an array simply by specifying all elements, like so:
unpack_u32([v[1], v[2], v[3], v[4]]);
But this is rather ugly to type and doesn't scale well with array size. So the question is "How to get a slice as an array in Rust?". I used a slightly modified version of Matthieu M.'s answer to said question (playground):
fn unpack_u32(data: [u8; 4]) -> u32 {
    // as before without assert
}

use std::convert::AsMut;

fn clone_into_array<A, T>(slice: &[T]) -> A
where
    A: Default + AsMut<[T]>,
    T: Clone,
{
    assert_eq!(slice.len(), std::mem::size_of::<A>() / std::mem::size_of::<T>());
    let mut a = Default::default();
    <A as AsMut<[T]>>::as_mut(&mut a).clone_from_slice(slice);
    a
}

fn main() {
    let v = vec![0_u8, 1, 2, 3, 4, 5, 6, 7, 8];
    println!("res: {:X}", unpack_u32(clone_into_array(&v[1..5])));
}
As you can see, there is still an assert and thus the possibility of runtime failure. The Rust compiler isn't able to know that v[1..5] is 4 elements long, because 1..5 is just syntactic sugar for Range which is just a type the compiler knows nothing special about.
I think the answer is no as it is; a slice doesn't have a size (or minimum size) as part of the type, so there's nothing for the compiler to check; and similarly a vector is dynamically sized so there's no way to check at compile time that you can take a slice of the right size.
The only way I can see for the information to be even in principle available at compile time is if the function is applied to a compile-time known array. I think you'd still need to implement a procedural macro to do the check (so nightly Rust only, and it's not easy to do).
If the problem is efficiency rather than compile-time checking, you may be able to adjust your code so that, for example, you do one check for n*4 elements being available before n calls to your function; you could use the unsafe get_unchecked to avoid later redundant bounds checks. Obviously you'd need to be careful to avoid mistakes in the implementation.
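A rough sketch of that idea (my own; the helper name unpack_u32_at is hypothetical and not from the answer): perform one length check up front, then use get_unchecked inside the loop:
// One up-front length check, then unchecked indexing for the n conversions.
unsafe fn unpack_u32_at(data: &[u8], offset: usize) -> u32 {
    *data.get_unchecked(offset) as u32
        | (*data.get_unchecked(offset + 1) as u32) << 8
        | (*data.get_unchecked(offset + 2) as u32) << 16
        | (*data.get_unchecked(offset + 3) as u32) << 24
}

fn unpack_all(data: &[u8], n: usize) -> Vec<u32> {
    assert!(data.len() >= n * 4); // single bounds check for all n reads
    (0..n).map(|i| unsafe { unpack_u32_at(data, i * 4) }).collect()
}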
I had a similar problem: creating a fixed byte array on the stack whose length corresponds to the const length of another byte array (which may change during development).
A combination of a compiler plugin and a macro was the solution:
https://github.com/frehberg/rust-sizedbytes
