Lazy sequence generation in Rust

How can I create what other languages call a lazy sequence or a "generator" function?
In Python, I can use yield as in the following example (from Python's docs) to lazily generate a sequence that is iterable in a way that does not use the memory of an intermediary list:
# a generator that yields items instead of returning a list
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

sum_of_first_n = sum(firstn(1000000))
How can I do something similar in Rust?

Rust does have generators, but they are highly experimental and not currently available in stable Rust.
Works in stable Rust 1.0 and above
Range handles your concrete example. You can use it with the syntactic sugar of ..:
fn main() {
    let sum: u64 = (0..1_000_000).sum();
    println!("{}", sum)
}
What if Range didn't exist? We can create an iterator that models it!
struct MyRange {
    start: u64,
    end: u64,
}

impl MyRange {
    fn new(start: u64, end: u64) -> MyRange {
        MyRange {
            start: start,
            end: end,
        }
    }
}

impl Iterator for MyRange {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        // `>=` rather than `==`, so a range with start > end is simply empty
        if self.start >= self.end {
            None
        } else {
            let result = Some(self.start);
            self.start += 1;
            result
        }
    }
}

fn main() {
    let sum: u64 = MyRange::new(0, 1_000_000).sum();
    println!("{}", sum)
}
The guts are the same, but more explicit than the Python version. Notably, Python's generators keep track of the state for you. Rust prefers explicitness, so we have to create our own state and update it manually. The important part is the implementation of the Iterator trait. We specify that the iterator yields values of a specific type (type Item = u64) and then deal with stepping each iteration and how to tell we have reached the end of iteration.
This example is not as powerful as the real Range, which uses generics, but shows an example of how to go about it.
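To give a feel for what "uses generics" could mean here, the following is a minimal sketch of a generic counterpart to MyRange. The name GenericRange, the helper sum_u32, and the choice of trait bounds (in particular From<u8> as a way of obtaining the value one) are assumptions made for illustration; the standard library's Range is implemented differently.

```rust
use std::ops::AddAssign;

/// Hypothetical generic counterpart to `MyRange`.
struct GenericRange<T> {
    start: T,
    end: T,
}

impl<T> Iterator for GenericRange<T>
where
    // `From<u8>` is only used to obtain the value one for stepping
    T: Copy + PartialOrd + AddAssign + From<u8>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        if self.start >= self.end {
            return None;
        }
        let current = self.start;
        self.start += T::from(1u8);
        Some(current)
    }
}

/// Sums a half-open range of `u32` values using the generic iterator.
fn sum_u32(start: u32, end: u32) -> u32 {
    GenericRange { start, end }.sum()
}
```

The same struct then works for any integer type that satisfies the bounds, for example u32, u64 or i64, without writing a separate iterator for each.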
Works in nightly Rust
Nightly Rust does have generators, but they are highly experimental. You need to bring in a few unstable features to create one. However, it looks pretty close to the Python example, with some Rust-specific additions:
// 1.43.0-nightly (2020-02-09 71c7e149e42cb0fc78a8)
#![feature(generators, generator_trait)]
use std::{
ops::{Generator, GeneratorState},
pin::Pin,
};
fn firstn(n: u64) -> impl Generator<Yield = u64, Return = ()> {
move || {
let mut num = 0;
while num < n {
yield num;
num += 1;
}
}
}
Since everything in current Rust operates on iterators, we create an adapter that converts a generator into an iterator in order to play with the broader ecosystem. I'd expect that such an adapter would be present in the standard library eventually:
struct GeneratorIteratorAdapter<G>(Pin<Box<G>>);
impl<G> GeneratorIteratorAdapter<G>
where
G: Generator<Return = ()>,
{
fn new(gen: G) -> Self {
Self(Box::pin(gen))
}
}
impl<G> Iterator for GeneratorIteratorAdapter<G>
where
G: Generator<Return = ()>,
{
type Item = G::Yield;
fn next(&mut self) -> Option<Self::Item> {
match self.0.as_mut().resume(()) {
GeneratorState::Yielded(x) => Some(x),
GeneratorState::Complete(_) => None,
}
}
}
Now we can use it:
fn main() {
let generator_iterator = GeneratorIteratorAdapter::new(firstn(1_000_000));
let sum: u64 = generator_iterator.sum();
println!("{}", sum);
}
What's interesting about this is that it's less powerful than an implementation of Iterator. For example, iterators have the size_hint method, which allows consumers of the iterator to have an idea of how many elements are remaining. This allows optimizations when collecting into a container. Generators do not have any such information.
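As a rough illustration of the size_hint point, here is a sketch of a hand-rolled range that reports how many items remain. CountedRange and collect_range are hypothetical names introduced for this example only; they are not part of the answer's code.

```rust
use std::convert::TryFrom;

/// Hypothetical hand-rolled range that also reports how many items remain.
struct CountedRange {
    start: u64,
    end: u64,
}

impl Iterator for CountedRange {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.start >= self.end {
            None
        } else {
            let current = self.start;
            self.start += 1;
            Some(current)
        }
    }

    // Report the exact number of remaining items when it fits in a usize
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.end.saturating_sub(self.start);
        match usize::try_from(remaining) {
            Ok(n) => (n, Some(n)),
            // More items than usize can express: only a lower bound is known
            Err(_) => (usize::MAX, None),
        }
    }
}

/// Collects the range; `Vec` may use the hint to reserve capacity up front.
fn collect_range(start: u64, end: u64) -> Vec<u64> {
    CountedRange { start, end }.collect()
}
```

A generator wrapped in an adapter such as GeneratorIteratorAdapter has no way to supply this information, so it falls back to the default hint of (0, None).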

As of Rust 1.34 stable, you have the convenient std::iter::from_fn utility. It is not a coroutine (i.e. you still have to return each time), but at least it saves you from defining another struct.
from_fn accepts a closure FnMut() -> Option<T> and repeatedly calls it to create an Iterator<T>. In pseudo-Python, def from_fn(f): while (val := f()) is not None: yield val.
// -> Box<dyn std::iter::Iterator<Item=u64>> in Rust 2015
fn firstn(n: u64) -> impl std::iter::Iterator<Item = u64> {
    let mut num = 0;
    std::iter::from_fn(move || {
        let result;
        if num < n {
            result = Some(num);
            num += 1
        } else {
            result = None
        }
        result
    })
}

fn main() {
    let sum_of_first_n = firstn(1000000).sum::<u64>();
    println!("sum(0 to 999999): {}", sum_of_first_n);
}
std::iter::successors is also available. It is less general but might be a bit easier to use since you just pass around the seed value explicitly. In pseudo-Python: def successors(seed, f): while seed is not None: yield seed; seed = f(seed).
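fn firstn(n: u64) -> impl std::iter::Iterator<Item = u64> {
    std::iter::successors(
        // Start at 0 only if the range is non-empty, so firstn(0) yields nothing
        if n > 0 { Some(0) } else { None },
        move |&num| {
            let next = num + 1;
            if next < n {
                Some(next)
            } else {
                None
            }
        },
    )
}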
However, Shepmaster's note applies to these utilities too (tl;dr: hand-rolled Iterators are often more memory efficient):
What's interesting about this is that it's less powerful than an implementation of Iterator. For example, iterators have the size_hint method, which allows consumers of the iterator to have an idea of how many elements are remaining. This allows optimizations when collecting into a container. Generators do not have any such information.
(Note: returning impl is a Rust 2018 feature. See the Edition Guide for configuration and Announcement or Rust By Example for explanation)

Rust 1.0 does not have generator functions, so you'd have to do it manually with explicit iterators.
First, rewrite your Python example as a class with a next() method, since that is closer to the model you're likely to get in Rust. Then you can rewrite it in Rust with a struct that implements the Iterator trait.
You might also be able to use a function that returns a closure to achieve a similar result, but I don't think it would be possible to have that implement the Iterator trait (since it would require being called to generate a new result).

You can use my stackful Rust generator library which supports stable Rust:
#[macro_use]
extern crate generator;
use generator::{Generator, Gn};
fn firstn(n: usize) -> Generator<'static, (), usize> {
Gn::new_scoped(move |mut s| {
let mut num = 0;
while num < n {
s.yield_(num);
num += 1;
}
done!();
})
}
fn main() {
let sum_of_first_n: usize = firstn(1000000).sum();
println!("sum ={}", sum_of_first_n);
}
or more simply:
let n = 100000;
let range = Gn::new_scoped(move |mut s| {
let mut num = 0;
while num < n {
s.yield_(num);
num += 1;
}
done!();
});
let sum: usize = range.sum();

Related

Temporarily cache owned value between iterator adapters

I'd like to know if there's a way to cache an owned value between iterator adapters, so that adapters later in the chain can reference it.
(Or if there's another way to allow later adapters to reference an owned value that lives inside the iterator chain.)
To illustrate what I mean, let's look at this (contrived) example:
I have a function that returns a String, which is called in an Iterator map() adapter, yielding an iterator over Strings. I'd like to get an iterator over the chars() in those Strings, but the chars() method requires a string slice, meaning a reference.
Is this possible to do, without first collecting the Strings?
Here's a minimal example that of course fails:
fn greet(c: &str) -> String {
"Hello, ".to_owned() + c
}
fn main() {
let names = ["Martin", "Helena", "Ingrid", "Joseph"];
let iterator = names.into_iter().map(greet);
let fails = iterator.flat_map(<str>::chars);
}
Using a closure instead of <str>::chars - |s| s.chars() - does of course not work either. It makes the types match, but breaks lifetimes.
Edit (2022-10-03): In response to the comments, here's some pseudocode of what I have in mind, but with incorrect lifetimes:
struct IteratorCache<'a, T, I>{
item : Option<T>,
inner : I,
_p : core::marker::PhantomData<&'a T>
}
impl<'a, T, I> Iterator for IteratorCache<'a, T,I>
where I: Iterator<Item=T>
{
type Item=&'a T;
fn next(&mut self) -> Option<&'a T> {
self.item = self.inner.next();
if let Some(x) = &self.item {
Some(&x)
} else {
None
}
}
}
The idea would be that the reference could stay valid until the next call to next(). However I don't know if this can be expressed with the function signature of the Iterator trait. (Or if this can be expressed at all.)
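For what it's worth, the "valid until the next call to next()" idea can be sketched with a separate trait that uses a generic associated type (stable since Rust 1.65). The trait name LendingIterator and the helper total_chars below are assumptions made for this illustration; this is not the standard Iterator trait, so adapters such as flat_map are not available on it.

```rust
/// Hypothetical trait whose items may borrow from the iterator itself.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Caches the most recent owned item so that a reference to it can be lent out.
struct IteratorCache<I: Iterator> {
    item: Option<I::Item>,
    inner: I,
}

impl<I: Iterator> LendingIterator for IteratorCache<I> {
    type Item<'a> = &'a I::Item
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        // Overwrite the cached value; the returned reference borrows `self`,
        // so it cannot outlive the next call to `next`
        self.item = self.inner.next();
        self.item.as_ref()
    }
}

/// Counts the chars of all strings without collecting them first.
fn total_chars<I: Iterator<Item = String>>(strings: I) -> usize {
    let mut cache = IteratorCache { item: None, inner: strings };
    let mut total = 0;
    while let Some(s) = cache.next() {
        total += s.chars().count();
    }
    total
}
```

The borrow checker accepts the while let loop because each lent reference ends before the following call to next. The price is that the whole chain has to be written against the custom trait rather than Iterator.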
I don't think something like this exists yet, and collecting into a Vec<char> creates some overhead, but you can write such an iterator yourself with a little bit of trickery:
struct OwnedCharsIter {
s: String,
index: usize,
}
impl OwnedCharsIter {
pub fn new(s: String) -> Self {
Self { s, index: 0 }
}
}
impl Iterator for OwnedCharsIter {
type Item = char;
fn next(&mut self) -> Option<Self::Item> {
// Slice of leftover characters
let slice = &self.s[self.index..];
// Iterator over leftover characters
let mut chars = slice.chars();
// Query the next char
let next_char = chars.next()?;
// Compute the new index by looking at how many bytes are left
// after querying the next char
self.index = self.s.len() - chars.as_str().len();
// Return next char
Some(next_char)
}
}
fn greet(c: &str) -> String {
"Hello, ".to_owned() + c
}
fn main() {
let names = ["Martin", "Helena", "Ingrid", "Joseph"];
let iterator = names.into_iter().map(greet);
let chars_iter = iterator.flat_map(OwnedCharsIter::new);
println!("{:?}", chars_iter.collect::<String>())
}
"Hello, MartinHello, HelenaHello, IngridHello, Joseph"

Recursive closure inside a function [duplicate]

This is a very simple example, but how would I do something similar to:
let fact = |x: u32| {
match x {
0 => 1,
_ => x * fact(x - 1),
}
};
I know that this specific example can be easily done with iteration, but I'm wondering if it's possible to make a recursive function in Rust for more complicated things (such as traversing trees) or if I'm required to use my own stack instead.
There are a few ways to do this.
You can put closures into a struct and pass this struct to the closure. You can even define structs inline in a function:
fn main() {
struct Fact<'s> { f: &'s dyn Fn(&Fact, u32) -> u32 }
let fact = Fact {
f: &|fact, x| if x == 0 {1} else {x * (fact.f)(fact, x - 1)}
};
println!("{}", (fact.f)(&fact, 5));
}
This gets around the problem of having an infinite type (a function that takes itself as an argument) and the problem that fact isn't yet defined inside the closure itself when one writes let fact = |x| {...} and so one can't refer to it there.
Another option is to just write a recursive function as a fn item, which can also be defined inline in a function:
fn main() {
fn fact(x: u32) -> u32 { if x == 0 {1} else {x * fact(x - 1)} }
println!("{}", fact(5));
}
This works fine if you don't need to capture anything from the environment.
One more option is to use the fn item solution but explicitly pass the args/environment you want.
fn main() {
struct FactEnv { base_case: u32 }
fn fact(env: &FactEnv, x: u32) -> u32 {
if x == 0 {env.base_case} else {x * fact(env, x - 1)}
}
let env = FactEnv { base_case: 1 };
println!("{}", fact(&env, 5));
}
All of these work with Rust 1.17 and have probably worked since version 0.6. The fn's defined inside fns are no different to those defined at the top level, except they are only accessible within the fn they are defined inside.
As of Rust 1.62 (July 2022), there's still no direct way to recurse in a closure. As the other answers have pointed out, you need at least a bit of indirection, like passing the closure to itself as an argument, or moving it into a cell after creating it. These things can work, but in my opinion they're kind of gross, and they're definitely hard for Rust beginners to follow. If you want to use recursion but you have to have a closure, for example because you need something that implements FnOnce() to use with thread::spawn, then I think the cleanest approach is to use a regular fn function for the recursive part and to wrap it in a non-recursive closure that captures the environment. Here's an example:
let x = 5;
let fact = || {
fn helper(arg: u64) -> u64 {
match arg {
0 => 1,
_ => arg * helper(arg - 1),
}
}
helper(x)
};
assert_eq!(120, fact());
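The thread::spawn case mentioned above might look like the following sketch. The function name factorial_on_thread is made up for the example; the closure is marked move so that it owns x and can be sent to the new thread.

```rust
use std::thread;

/// Computes x! on a worker thread using a non-recursive closure
/// that wraps a recursive inner fn.
fn factorial_on_thread(x: u64) -> u64 {
    let fact = move || {
        // The recursion lives in a plain fn item, which can refer to itself
        fn helper(arg: u64) -> u64 {
            match arg {
                0 => 1,
                _ => arg * helper(arg - 1),
            }
        }
        // The closure only captures the environment and starts the recursion
        helper(x)
    };
    thread::spawn(fact).join().expect("worker thread panicked")
}
```

Calling factorial_on_thread(5) returns 120, the same value as the closure in the example above.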
Here's a really ugly and verbose solution I came up with:
use std::{
cell::RefCell,
rc::{Rc, Weak},
};
fn main() {
let weak_holder: Rc<RefCell<Weak<dyn Fn(u32) -> u32>>> =
Rc::new(RefCell::new(Weak::<fn(u32) -> u32>::new()));
let weak_holder2 = weak_holder.clone();
let fact: Rc<dyn Fn(u32) -> u32> = Rc::new(move |x| {
let fact = weak_holder2.borrow().upgrade().unwrap();
if x == 0 {
1
} else {
x * fact(x - 1)
}
});
weak_holder.replace(Rc::downgrade(&fact));
println!("{}", fact(5)); // prints "120"
println!("{}", fact(6)); // prints "720"
}
The advantages of this are that you call the function with the expected signature (no extra arguments needed), it's a closure that can capture variables (by move), and it doesn't require defining any new structs. The closure can also be returned from the function or otherwise stored in a place that outlives the scope where it was created (as an Rc<dyn Fn...>), and it still works.
A closure is just a struct with additional context. Therefore, you can do this to achieve recursion (suppose you want to compute a factorial recursively while accumulating the result in mutable state):
#[derive(Default)]
struct Fact {
ans: i32,
}
impl Fact {
fn call(&mut self, n: i32) -> i32 {
if n == 0 {
self.ans = 1;
return 1;
}
self.call(n - 1);
self.ans *= n;
self.ans
}
}
To use this struct, just:
let mut fact = Fact::default();
let ans = fact.call(5);

How to use a macro to generate compile-time unique integers?

I need several parts of a program, in different modules, to have a unique integer.
for example:
pub fn foo() -> u64 {
unique_integer!()
}
pub fn bar() -> u64 {
unique_integer!()
}
(foo() should never return the same as bar(), but the values themselves are meaningless and do not need to be stable across builds. All invocations of foo() must return the same values, as must all invocations to bar(). It is preferred, but not essential, that the values are contiguous.)
Is there a way of using a macro to do this?
You could compute a compile-time hash using the module path (which contains the crate and modules leading up to the file), the file name, column and line number of the macro invocation like this:
pub const fn hash(module_path: &'static str, file: &'static str, line: u32, column: u32) -> u64 {
let mut hash = 0xcbf29ce484222325;
let prime = 0x00000100000001B3;
let mut bytes = module_path.as_bytes();
let mut i = 0;
while i < bytes.len() {
hash ^= bytes[i] as u64;
hash = hash.wrapping_mul(prime);
i += 1;
}
bytes = file.as_bytes();
i = 0;
while i < bytes.len() {
hash ^= bytes[i] as u64;
hash = hash.wrapping_mul(prime);
i += 1;
}
hash ^= line as u64;
hash = hash.wrapping_mul(prime);
hash ^= column as u64;
hash = hash.wrapping_mul(prime);
hash
}
macro_rules! unique_number {
() => {{
const UNIQ: u64 = crate::hash(module_path!(), file!(), line!(), column!());
UNIQ
}};
}
fn foo() -> u64 {
unique_number!()
}
fn bar() -> u64 {
unique_number!()
}
fn main() {
println!("{} {}", foo(), bar()); // 2098219922142993841 2094402417770602149 on the playground
}
This has the benefit of consistent results, unlike the top answer, which can return different values depending on the order of invocation. It is also computed entirely at compile time, which removes the runtime overhead of maintaining a counter.
The only downside is that you could get hash collisions, although the chance is low. If you want, you could try implementing an algorithm that computes perfect hash values. The example shown uses the FNV algorithm, which should be decent but not perfect.
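If collisions are a concern, one option is to let the compiler check for them. The sketch below assumes Rust 1.57 or later (for assert! in const contexts) and uses a simplified single-input FNV-1a helper, fnv1a, plus two made-up constants, FOO_ID and BAR_ID, standing in for values that would normally come from unique_number!().

```rust
/// FNV-1a over a byte string, usable in const contexts.
const fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x00000100000001B3);
        i += 1;
    }
    hash
}

// Hypothetical IDs; in practice these would come from `unique_number!()`
const FOO_ID: u64 = fnv1a(b"foo");
const BAR_ID: u64 = fnv1a(b"bar");

// Evaluated at compile time: the build fails if the two IDs ever collide
const _: () = assert!(FOO_ID != BAR_ID, "hash collision between FOO_ID and BAR_ID");
```

This only scales to a handful of IDs, since every pair needs its own assertion, but it turns a silent collision into a build error.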
Not exactly a macro but anyway it's a proposition:
#[repr(u64)]
enum Unique {
Foo,
Bar,
}
pub fn foo() -> u64 {
Unique::Foo as u64
}
pub fn bar() -> u64 {
Unique::Bar as u64
}
The compiler should warn you if a variant is never used.
No, you can not use a regular macro for this. However, you might be able to find a procedural macro crate which might give this functionality.
That being said...
This does not count as safe Rust, but if we are okay with throwing safety out the window, then this should do the trick.
macro_rules! unique_u64 {
() => {{
struct PlaceHolder;
let id = ::std::any::TypeId::of::<PlaceHolder>();
unsafe { ::std::mem::transmute::<_, u64>(id) }
}};
}
This is probably undefined behavior, but since we know that every type should have a unique TypeId it would have the desired effect. The only reason I know that this is even possible is because I have looked at the structure of TypeId and know it contains a single u64 to distinguish types. However, there are currently plans to change TypeId from being a u64 to something more stable and less prone to this kind of unsafe code. We have no guarantees on what the contents of TypeId might change to and when it does change it might silently fail if it still has the same size as a u64.
Alternatively,
We can achieve a similar result in safe Rust by hashing the TypeId. This slightly bends the rules, since we have no guarantee that it will always produce a unique result. However, it seems highly unlikely that two different TypeIds would hash to the same value. Plus, this stays within safe Rust and is unlikely to break in future releases of Rust.
macro_rules! unique_u64 {
() => {{
use ::std::hash::{Hash, Hasher};
struct PlaceHolder;
let id = ::std::any::TypeId::of::<PlaceHolder>();
let mut hasher = ::std::collections::hash_map::DefaultHasher::new();
id.hash(&mut hasher);
hasher.finish()
}};
}
It's possible to do something like this with once_cell, using a static atomic variable as a counter:
use core::sync::atomic::{Ordering, AtomicU64};
use once_cell::sync::Lazy;
static COUNTER: AtomicU64 = AtomicU64::new(0);
fn foo() -> u64 {
static LOCAL_COUNTER: Lazy<u64> = Lazy::new(|| COUNTER.fetch_add(1, Ordering::Relaxed));
*LOCAL_COUNTER
}
fn bar() -> u64 {
static LOCAL_COUNTER: Lazy<u64> = Lazy::new(|| COUNTER.fetch_add(1, Ordering::Relaxed));
*LOCAL_COUNTER
}
fn main() {
dbg!(foo()); // 0
dbg!(foo()); // still 0
dbg!(bar()); // 1
dbg!(foo()); // unchanged - 0
dbg!(bar()); // unchanged - 1
}
And, yes, the repeated code can be, as usual, wrapped in a macro:
macro_rules! unique_integer {
() => {{
static LOCAL_COUNTER: Lazy<u64> = Lazy::new(|| COUNTER.fetch_add(1, Ordering::Relaxed));
*LOCAL_COUNTER
}}
}
fn foo() -> u64 {
unique_integer!()
}
fn bar() -> u64 {
unique_integer!()
}
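The same pattern can probably be written without the once_cell dependency. The sketch below swaps once_cell's Lazy for std::sync::OnceLock, which assumes Rust 1.70 or later; apart from that substitution it follows the macro above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

// Global counter shared by every macro invocation
static COUNTER: AtomicU64 = AtomicU64::new(0);

macro_rules! unique_integer {
    () => {{
        // One static per call site; initialised on first use and then fixed
        static LOCAL: OnceLock<u64> = OnceLock::new();
        *LOCAL.get_or_init(|| COUNTER.fetch_add(1, Ordering::Relaxed))
    }};
}

pub fn foo() -> u64 {
    unique_integer!()
}

pub fn bar() -> u64 {
    unique_integer!()
}
```

As with the once_cell version, the concrete values depend on which call site runs first, but each call site keeps its value for the rest of the program.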

Function comparing 2 variables of any type

A tricky problem. I have to implement a function fn eq( a, b ) comparing a and b. The function should return false if the types of the variables differ or if the variables have different values. It should return true only if both type and value are the same.
A possible solution is to use dyn Any as Netwave advised. But such a solution has limited application because it restricts the arguments of eq to 'static types. Maybe it is possible to come up with a more practical implementation? Playground of such a solution.
Well, with Any it is not so difficult to implement something:
use std::any::Any;
use std::any::TypeId;
fn eq<T: Any + Eq, Q: Any + Eq>(a: T, b: Q) -> bool {
if TypeId::of::<T>() == TypeId::of::<Q>() {
let b_as_t = &b as &dyn Any;
// safe to unwrap, we matched the type already
&a == b_as_t.downcast_ref::<T>().unwrap()
} else {
false
}
}
fn main() {
assert!(!eq("foo", 1));
assert!(eq(1, 1));
assert!(eq(&1, &1));
assert!(!eq(&'a', &1));
}
As per the comments, it may be possible to have another version that works over references directly:
use std::any::Any;
use std::any::TypeId;
fn eq<T: Any + Eq, Q: Any + Eq>(a: &T, b: &Q) -> bool {
if TypeId::of::<T>() == TypeId::of::<Q>() {
let b_as_t = b as &dyn Any;
// safe to unwrap, we matched the type already
*a == *b_as_t.downcast_ref::<T>().unwrap()
} else {
false
}
}
fn main() {
assert!(!eq(&"foo", &1));
assert!(eq(&1, &1));
assert!(eq(&1, &1));
assert!(!eq(&'a', &1));
let s1 = "foo".to_owned();
let s2 = "foo".to_owned();
assert!(eq(&s1, &s2));
}

How to express integers other than zero and one in generic code using the num crate?

The num crate in Rust provides a way of representing zeros and ones via T::zero() and T::one(). Is there a way of representing other integers, such as two, three, etc.?
Consider the following (artificial) example:
extern crate num;
trait IsTwo {
fn is_two(self) -> bool;
}
impl<T: num::Integer> IsTwo for T {
fn is_two(self) -> bool {
self == (T::one() + T::one())
}
}
Is there a better way of representing T::one() + T::one() as 2?
One way of representing arbitrary integers in generic code is to use the num::NumCast trait:
impl<T: num::Integer + num::NumCast> IsTwo for T {
fn is_two(self) -> bool {
self == T::from(2).unwrap()
}
}
A related way is to use the num::FromPrimitive trait:
impl<T: num::Integer + num::FromPrimitive> IsTwo for T {
fn is_two(self) -> bool {
self == T::from_i32(2).unwrap()
}
}
You can write a function:
fn two<T>() -> T
where
    T: num::Integer,
{
    let mut v = T::zero();
    for _ in 0..2 {
        v = v + T::one();
    }
    v
}
I've chosen this form because it's easily made into a macro, which can be reused for any set of values:
num_constant!(two, 2);
num_constant!(forty_two, 42);
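The macro definition itself is not shown, so here is one possible sketch of it. To keep the block free of external crates, it uses a small stand-in trait, ZeroOne, in place of num::Integer; that trait and its two impls are assumptions for illustration only. With the num crate, the bound would simply be num::Integer.

```rust
use std::ops::Add;

/// Stand-in for the `zero()` / `one()` constructors that num provides.
trait ZeroOne: Add<Output = Self> + Sized {
    fn zero() -> Self;
    fn one() -> Self;
}

impl ZeroOne for i32 {
    fn zero() -> Self { 0 }
    fn one() -> Self { 1 }
}

impl ZeroOne for u64 {
    fn zero() -> Self { 0 }
    fn one() -> Self { 1 }
}

/// Generates a generic function `$name` that builds `$value` by repeated addition.
macro_rules! num_constant {
    ($name:ident, $value:expr) => {
        fn $name<T: ZeroOne>() -> T {
            let mut v = T::zero();
            for _ in 0..$value {
                v = v + T::one();
            }
            v
        }
    };
}

num_constant!(two, 2);
num_constant!(forty_two, 42);
```

Each invocation expands to a function with the same body as two, so two::<i32>() yields 2 and forty_two::<u64>() yields 42.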
I hear the concerns now... "but that's a loop and inefficient!". That's what optimizing compilers are for. Here's the LLVM IR for two when compiled in release mode:
; Function Attrs: noinline readnone uwtable
define internal fastcc i32 @_ZN10playground3two17hbef99995c3606e93E() unnamed_addr #3 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
bb3:
br label %bb8
bb8: ; preds = %bb3
ret i32 2
}
That's right - it's been optimized to the value 2. No loops.
It's relatively simple to forge any number from 0 and 1:
you need to create 2, which is hardly difficult
you then proceed in converting your number to base 2, which takes O(log2(N)) operations
The algorithm is dead simple:
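use num::Integer;

fn convert<T: Integer + Clone>(n: usize) -> T {
    let two = T::one() + T::one();
    let mut n = n;
    let mut acc = T::one();
    let mut result = T::zero();
    while n > 0 {
        if n % 2 != 0 {
            // Integer does not imply AddAssign, so add and reassign instead of `+=`
            result = result + acc.clone();
        }
        // Likewise for MulAssign; `two` is cloned because the loop reuses it
        acc = acc * two.clone();
        n /= 2;
    }
    result
}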
It will be efficient in both Debug (O(log2(N)) iterations) and Release (the compiler optimizes it out completely).
For those who wish to see it in action, here on the playground we can see that convert::<i32>(12345) is optimized to 12345 as expected.
As an exercise for the reader, implement a generic version of convert which takes any Integer parameter; there aren't many operations required on n, after all.

Resources