How to use a macro to generate compile-time unique integers? - rust

I need several parts of a program, in different modules, to have a unique integer.
for example:
pub fn foo() -> u64 {
unique_integer!()
}
pub fn bar() -> u64 {
unique_integer!()
}
(foo() should never return the same as bar(), but the values themselves are meaningless and do not need to be stable across builds. All invocations of foo() must return the same values, as must all invocations to bar(). It is preferred, but not essential, that the values are contiguous.)
Is there a way of using a macro to do this?

You could compute a compile-time hash using the module path (which contains the crate and modules leading up to the file), the file name, column and line number of the macro invocation like this:
pub const fn hash(module_path: &'static str, file: &'static str, line: u32, column: u32) -> u64 {
let mut hash = 0xcbf29ce484222325;
let prime = 0x00000100000001B3;
let mut bytes = module_path.as_bytes();
let mut i = 0;
while i < bytes.len() {
hash ^= bytes[i] as u64;
hash = hash.wrapping_mul(prime);
i += 1;
}
bytes = file.as_bytes();
i = 0;
while i < bytes.len() {
hash ^= bytes[i] as u64;
hash = hash.wrapping_mul(prime);
i += 1;
}
hash ^= line as u64;
hash = hash.wrapping_mul(prime);
hash ^= column as u64;
hash = hash.wrapping_mul(prime);
hash
}
macro_rules! unique_number {
() => {{
const UNIQ: u64 = crate::hash(module_path!(), file!(), line!(), column!());
UNIQ
}};
}
fn foo() -> u64 {
unique_number!()
}
fn bar() -> u64 {
unique_number!()
}
fn main() {
println!("{} {}", foo(), bar()); // 2098219922142993841 2094402417770602149 on the playground
}
(playground)
This has the benefit of consistent results, when compared to the top answer that can return different values depending on the order of invocation, and this is also entirely computed in compile time, which remove the runtime overhead of maintaining a counter.
The only downside to this is that you could get hash value collisions. But the chance is low. If you want, you could try implementing an algorithm that computes perfect hash values. The example shown uses the FNV algorithm which should be decent but not perfect.

Not exactly a macro but anyway it's a proposition:
#[repr(u64)]
enum Unique {
Foo,
Bar,
}
pub fn foo() -> u64 {
Unique::Foo as u64
}
pub fn bar() -> u64 {
Unique::Bar as u64
}
Compiler should warn you if you don't use a variant.

No, you can not use a regular macro for this. However, you might be able to find a procedural macro crate which might give this functionality.
That being said...
This does not count as safe rust, but if we are okay with throwing safety out the window then this should do the trick.
macro_rules! unique_u64 {
() => {{
struct PlaceHolder;
let id = ::std::any::TypeId::of::<PlaceHolder>();
unsafe { ::std::mem::transmute::<_, u64>(id) }
}};
}
This is probably undefined behavior, but since we know that every type should have a unique TypeId it would have the desired effect. The only reason I know that this is even possible is because I have looked at the structure of TypeId and know it contains a single u64 to distinguish types. However, there are currently plans to change TypeId from being a u64 to something more stable and less prone to this kind of unsafe code. We have no guarantees on what the contents of TypeId might change to and when it does change it might silently fail if it still has the same size as a u64.
Alternatively,
We can achieve a similar result in safe rust by hashing the TypeId. Now, it slightly breaks the rules since we do not have any guarantee that it will always produce a unique result. However, it seems highly unlikely that 2 different TypeIds would hash to the same value. Plus this stays within safe rust and is unlikely to break for future releases of Rust.
macro_rules! unique_u64 {
() => {{
use ::std::hash::{Hash, Hasher};
struct PlaceHolder;
let id = ::std::any::TypeId::of::<PlaceHolder>();
let mut hasher = ::std::collections::hash_map::DefaultHasher::new();
id.hash(&mut hasher);
hasher.finish()
}};
}

It's possible to do something like this with once_cell, using a static atomic variable as a counter:
use core::sync::atomic::{Ordering, AtomicU64};
use once_cell::sync::Lazy;
static COUNTER: AtomicU64 = AtomicU64::new(0);
fn foo() -> u64 {
static LOCAL_COUNTER: Lazy<u64> = Lazy::new(|| COUNTER.fetch_add(1, Ordering::Relaxed));
*LOCAL_COUNTER
}
fn bar() -> u64 {
static LOCAL_COUNTER: Lazy<u64> = Lazy::new(|| COUNTER.fetch_add(1, Ordering::Relaxed));
*LOCAL_COUNTER
}
fn main() {
dbg!(foo()); // 0
dbg!(foo()); // still 0
dbg!(bar()); // 1
dbg!(foo()); // unchanged - 0
dbg!(bar()); // unchanged - 1
}
Playground
And, yes, the repeating code can be, as usual, wrapped in macro:
macro_rules! unique_integer {
() => {{
static LOCAL_COUNTER: Lazy<u64> = Lazy::new(|| COUNTER.fetch_add(1, Ordering::Relaxed));
*LOCAL_COUNTER
}}
}
fn foo() -> u64 {
unique_integer!()
}
fn bar() -> u64 {
unique_integer!()
}

Related

Recursive closure inside a function [duplicate]

This is a very simple example, but how would I do something similar to:
let fact = |x: u32| {
match x {
0 => 1,
_ => x * fact(x - 1),
}
};
I know that this specific example can be easily done with iteration, but I'm wondering if it's possible to make a recursive function in Rust for more complicated things (such as traversing trees) or if I'm required to use my own stack instead.
There are a few ways to do this.
You can put closures into a struct and pass this struct to the closure. You can even define structs inline in a function:
fn main() {
struct Fact<'s> { f: &'s dyn Fn(&Fact, u32) -> u32 }
let fact = Fact {
f: &|fact, x| if x == 0 {1} else {x * (fact.f)(fact, x - 1)}
};
println!("{}", (fact.f)(&fact, 5));
}
This gets around the problem of having an infinite type (a function that takes itself as an argument) and the problem that fact isn't yet defined inside the closure itself when one writes let fact = |x| {...} and so one can't refer to it there.
Another option is to just write a recursive function as a fn item, which can also be defined inline in a function:
fn main() {
fn fact(x: u32) -> u32 { if x == 0 {1} else {x * fact(x - 1)} }
println!("{}", fact(5));
}
This works fine if you don't need to capture anything from the environment.
One more option is to use the fn item solution but explicitly pass the args/environment you want.
fn main() {
struct FactEnv { base_case: u32 }
fn fact(env: &FactEnv, x: u32) -> u32 {
if x == 0 {env.base_case} else {x * fact(env, x - 1)}
}
let env = FactEnv { base_case: 1 };
println!("{}", fact(&env, 5));
}
All of these work with Rust 1.17 and have probably worked since version 0.6. The fn's defined inside fns are no different to those defined at the top level, except they are only accessible within the fn they are defined inside.
As of Rust 1.62 (July 2022), there's still no direct way to recurse in a closure. As the other answers have pointed out, you need at least a bit of indirection, like passing the closure to itself as an argument, or moving it into a cell after creating it. These things can work, but in my opinion they're kind of gross, and they're definitely hard for Rust beginners to follow. If you want to use recursion but you have to have a closure, for example because you need something that implements FnOnce() to use with thread::spawn, then I think the cleanest approach is to use a regular fn function for the recursive part and to wrap it in a non-recursive closure that captures the environment. Here's an example:
let x = 5;
let fact = || {
fn helper(arg: u64) -> u64 {
match arg {
0 => 1,
_ => arg * helper(arg - 1),
}
}
helper(x)
};
assert_eq!(120, fact());
Here's a really ugly and verbose solution I came up with:
use std::{
cell::RefCell,
rc::{Rc, Weak},
};
fn main() {
let weak_holder: Rc<RefCell<Weak<dyn Fn(u32) -> u32>>> =
Rc::new(RefCell::new(Weak::<fn(u32) -> u32>::new()));
let weak_holder2 = weak_holder.clone();
let fact: Rc<dyn Fn(u32) -> u32> = Rc::new(move |x| {
let fact = weak_holder2.borrow().upgrade().unwrap();
if x == 0 {
1
} else {
x * fact(x - 1)
}
});
weak_holder.replace(Rc::downgrade(&fact));
println!("{}", fact(5)); // prints "120"
println!("{}", fact(6)); // prints "720"
}
The advantages of this are that you call the function with the expected signature (no extra arguments needed), it's a closure that can capture variables (by move), it doesn't require defining any new structs, and the closure can be returned from the function or otherwise stored in a place that outlives the scope where it was created (as an Rc<Fn...>) and it still works.
Closure is just a struct with additional contexts. Therefore, you can do this to achieve recursion (suppose you want to do factorial with recursive mutable sum):
#[derive(Default)]
struct Fact {
ans: i32,
}
impl Fact {
fn call(&mut self, n: i32) -> i32 {
if n == 0 {
self.ans = 1;
return 1;
}
self.call(n - 1);
self.ans *= n;
self.ans
}
}
To use this struct, just:
let mut fact = Fact::default();
let ans = fact.call(5);

Initialize a Vec with not-None values only

If I have variables like this:
let a: u32 = ...;
let b: Option<u32> = ...;
let c: u32 = ...;
, what is the shortest way to make a vector of those values, so that b is only included if it's Some?
In other words, is there something simpler than this:
let v = match b {
None => vec![a, c],
Some(x) => vec![a, x, c],
};
P.S. I would prefer a solution where we don't need to use the variables more than once. Consider this example:
let some_person: String = ...;
let best_man: Option<String> = ...;
let a_third_person: &str = ...;
let another_opt: Option<String> = ...;
...
As can be seen, we might have to use longer variable names, more than one Option (None), expressions (like a_third_person.to_string()), etc.
Yours is fine, but here's a sophisticated one:
[Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>()
This works since Option impls IntoIterator.
If it depends on just one variable:
b.map(|b| vec![a, b, c]).unwrap_or_else(|| vec![a, c]);
Playground
After some thinking and investigating, I've come with the following crazy thing.
The end goal is to have a macro, optional_vec![], that you can pass it either T or Option<T> and it should behave like described in the question. However, I decided on a strong restriction: it should have the best performance possible. So, you write:
optional_vec![a, b, c]
And get at least the performance of hand-written match, if not more. This forbids the use of the simple [Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>(), suggested in my other answer (though even this solution needs some way to differentiate between Option<T> and just T, which, like we'll see, is not an easy problem at all).
I will first warn that I've not found a way to make my macro work with Option. That is, if you want to build a vector of Option<T> from Option<T> and Option<Option<T>>, it will not work.
When a design a complex macro, I like to think first how the expanded code will look like. And in this macro, we have several hard problems to solve.
First, the macro take plain expressions. But somehow, it needs to switch on their type being T or Option<T>. How should such thing be done?
The feature we use to do such things is specialization.
#![feature(specialization)]
pub trait Optional {
fn some_method(self);
}
impl<T> Optional for T {
default fn some_method(self) {
// Just T
}
}
impl<T> Optional for Option<T> {
fn some_method(self) {
// Option<T>
}
}
Like you probably noticed, now we have two problems: first, specialization is unstable, and I'd like to stay with stable. Second, what should be inside the trait? The second problem is easier to solve, so let's begin with it.
Turns out that the most performant way to do the pushing to the vector is to pre-allocate capacity (Vec::with_capacity), write to the vector by using pointers (don't push(), it optimizes badly!) then set the length (Vec::set_len()).
We can get a pointer to the internal buffer of the vector using Vec::as_mut_ptr(), and advance the pointer via <*mut T>::add(1).
So, we need two methods: one to hint us about the capacity (can be zero for None or one for Some() and non-Option elements), and a write_and_advance() method:
pub trait Optional {
type Item;
fn len(&self) -> usize;
unsafe fn write_and_advance(self, place: &mut *mut Self::Item);
}
impl<T> Optional for T {
default type Item = Self;
default fn len(&self) -> usize { 1 }
default unsafe fn write_and_advance(self, place: &mut *mut Self) {
place.write(self);
*place = place.add(1);
}
}
impl<T> Optional<T> for Option<T> {
type Item = T;
fn len(&self) -> usize { self.is_some() as usize }
unsafe fn write_and_advance(self, place: &mut *mut T) {
if let Some(value) = self {
place.write(value);
*place = place.add(1);
}
}
}
It doesn't even compile! For the why, see Mismatch between associated type and type parameter only when impl is marked `default`. Luckily for us, the trick we'll use to workaround specialization not being stable does work in this situation. But for now, let's assume it works. How will the code using this trait look like?
match (a, b, c) { // The match is here because it's the best binding for liftimes: see https://stackoverflow.com/a/54855986/7884305
(a, b, c) => {
let len = Optional::len(&a) + Optional::len(&b) + Optional::len(&c);
let mut result = ::std::vec::Vec::with_capacity(len);
let mut next_element = result.as_mut_ptr();
unsafe {
Optional::write_and_advance(a, &mut next_element);
Optional::write_and_advance(b, &mut next_element);
Optional::write_and_advance(c, &mut next_element);
result.set_len(len);
}
result
}
}
And it works! Except that it does not, because the specialization does not compile as I said, and we also want to not repeat all of this boilerplate but insert it into a macro.
So, how do we solve the problems with specialization: being unstable and not working?
dtonlay has a very cool trick he calls autoref specialization (BTW, all of this repo is a very recommended reading!). This is a trick that can be used to emulate specialization. It works only in macros, but we're in a macro so this is fine.
I will not elaborate about the trick here (I recommend to read his post; he also used this trick in the excellent and very widely used anyhow crate). In short, the idea is to trick the typechecker by implementing a trait for T under certain conditions (the specialized impl) and other trait for &T for the general case (this could be inherent impl if not coherence). Since Rust performs automatic referencing during method resolution, that is take reference to the receiver as needed, this will work - the typechecker will autoref if needed, and will stop in the first applicable impl - i.e. the specialized impl if it matches, or the general impl otherwise.
Here's an example:
use std::fmt;
pub trait Display {
fn foo(&self);
}
// Level 1
impl<T: fmt::Display> Display for T {
fn foo(&self) { println!("Display({}), {}", std::any::type_name::<T>(), self); }
}
pub trait Debug {
fn foo(&self);
}
// Level 2
impl<T: fmt::Debug> Debug for &T {
fn foo(&self) { println!("Debug({}), {:?}", std::any::type_name::<T>(), self); }
}
macro_rules! foo {
($e:expr) => ((&$e).foo());
}
Playground.
We can use this trick in our case:
#[doc(hidden)]
pub mod autoref_specialization {
#[derive(Copy, Clone)]
pub struct OptionTag;
pub trait OptionKind {
fn optional_kind(&self) -> OptionTag;
}
impl<T> OptionKind for Option<T> {
#[inline(always)]
fn optional_kind(&self) -> OptionTag { OptionTag }
}
impl OptionTag {
#[inline(always)]
pub fn len<T>(self, this: &Option<T>) -> usize { this.is_some() as usize }
#[inline(always)]
pub unsafe fn write_and_advance<T>(self, this: Option<T>, place: &mut *mut T) {
if let Some(value) = this {
place.write(value);
*place = place.add(1);
}
}
}
#[derive(Copy, Clone)]
pub struct DefaultTag;
pub trait DefaultKind {
fn optional_kind(&self) -> DefaultTag;
}
impl<T> DefaultKind for &'_ T {
#[inline(always)]
fn optional_kind(&self) -> DefaultTag { DefaultTag }
}
impl DefaultTag {
#[inline(always)]
pub fn len<T>(self, _this: &T) -> usize { 1 }
#[inline(always)]
pub unsafe fn write_and_advance<T>(self, this: T, place: &mut *mut T) {
place.write(this);
*place = place.add(1);
}
}
}
And the expanded code will look like:
use autoref_specialization::{DefaultKind as _, OptionKind as _};
match (a, b, c) {
(a, b, c) => {
let (a_tag, b_tag, c_tag) = (
(&a).optional_kind(),
(&b).optional_kind(),
(&c).optional_kind(),
);
let len = a_tag.len(&a) + b_tag.len(&b) + c_tag.len(&c);
let mut result = ::std::vec::Vec::with_capacity(len);
let mut next_element = result.as_mut_ptr();
unsafe {
a_tag.write_and_advance(a, &mut next_element);
b_tag.write_and_advance(b, &mut next_element);
c_tag.write_and_advance(c, &mut next_element);
result.set_len(len);
}
result
}
}
It may be tempting to try to convert this immediately into a macro, but we still have one unsolved problem: our macro need to generate identifiers. This may not be obvious, but what if we pass optional_vec![1, Some(2), 3]? We need to generate the bindings for the match (in our case, (a, b, c) => ...) and the tag names ((a_tag, b_tag, c_tag)).
Unfortunately, generating names is not something macro_rules! can do in today's Rust. Fortunately, there is an excellent crate paste (another one from dtonlay!) that is a small proc-macro that allows you to do that. It is even available on the playground!
However, we need a series of identifiers. That can be done with tt-munching, by repeatedly adding some letter (I used a), so you get a, aa, aaa, ... you get the idea.
#[doc(hidden)]
pub mod reexports {
pub use std::vec::Vec;
pub use paste::paste;
}
#[macro_export]
macro_rules! optional_vec {
// Empty case
{ #generate_idents
exprs = []
processed_exprs = [$($e:expr,)*]
match_bindings = [$($binding:ident)*]
tags = [$($tag:ident)*]
} => {{
use $crate::autoref_specialization::{DefaultKind as _, OptionKind as _};
match ($($e,)*) {
($($binding,)*) => {
let ($($tag,)*) = (
$((&$binding).optional_kind(),)*
);
let len = 0 $(+ $tag.len(&$binding))*;
let mut result = $crate::reexports::Vec::with_capacity(len);
let mut next_element = result.as_mut_ptr();
unsafe {
$($tag.write_and_advance($binding, &mut next_element);)*
result.set_len(len);
}
result
}
}
}};
{ #generate_idents
exprs = [$e:expr, $($rest:expr,)*]
processed_exprs = [$($processed_exprs:tt)*]
match_bindings = [$first_binding:ident $($bindings:ident)*]
tags = [$($tags:ident)*]
} => {
$crate::reexports::paste! {
$crate::optional_vec! { #generate_idents
exprs = [$($rest,)*]
processed_exprs = [$($processed_exprs)* $e,]
match_bindings = [
[< $first_binding a >]
$first_binding
$($bindings)*
]
tags = [
[< $first_binding a_tag >]
$($tags)*
]
}
}
};
// Entry
[$e:expr $(, $exprs:expr)* $(,)?] => {
$crate::optional_vec! { #generate_idents
exprs = [$($exprs,)+]
processed_exprs = [$e,]
match_bindings = [__optional_vec_a]
tags = [__optional_vec_a_tag]
}
};
}
Playground.
I can also personally recommend
let mut v = vec![a, c];
v.extend(b);
Short and clear.
Sometime the straight forward solution is the best:
fn jim_power(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
let mut acc = Vec::with_capacity(3);
acc.push(a);
if let Some(b) = b {
acc.push(b);
}
acc.push(c);
acc
}
fn ys_iii(
some_person: String,
best_man: Option<String>,
a_third_person: String,
another_opt: Option<String>,
) -> Vec<String> {
let mut acc = Vec::with_capacity(4);
acc.push(some_person);
best_man.map(|x| acc.push(x));
acc.push(a_third_person);
another_opt.map(|x| acc.push(x));
acc
}
If you don't care about the order of the values, another option is
Iterator::chain(
[a, c].into_iter(),
[b].into_iter().flatten()
).collect()
Playground

How can I create a fixed size array of Strings using constant generics?

I have a function using a constant generic:
fn foo<const S: usize>() -> Vec<[String; S]> {
// Some code
let mut row: [String; S] = Default::default(); //It sucks because of default arrays are specified up to 32 only
// Some code
}
How can I create a fixed size array of Strings in my case? let mut row: [String; S] = ["".to_string(), S]; doesn't work because String doesn't implement the Copy trait.
You can do it with MaybeUninit and unsafe:
use std::mem::MaybeUninit;
fn foo<const S: usize>() -> Vec<[String; S]> {
// Some code
let mut row: [String; S] = unsafe {
let mut result = MaybeUninit::uninit();
let start = result.as_mut_ptr() as *mut String;
for pos in 0 .. S {
// SAFETY: safe because loop ensures `start.add(pos)`
// is always on an array element, of type String
start.add(pos).write(String::new());
}
// SAFETY: safe because loop ensures entire array
// has been manually initialised
result.assume_init()
};
// Some code
todo!()
}
Of course, it might be easier to abstract such logic to your own trait:
use std::mem::MaybeUninit;
trait DefaultArray {
fn default_array() -> Self;
}
impl<T: Default, const S: usize> DefaultArray for [T; S] {
fn default_array() -> Self {
let mut result = MaybeUninit::uninit();
let start = result.as_mut_ptr() as *mut T;
unsafe {
for pos in 0 .. S {
// SAFETY: safe because loop ensures `start.add(pos)`
// is always on an array element, of type T
start.add(pos).write(T::default());
}
// SAFETY: safe because loop ensures entire array
// has been manually initialised
result.assume_init()
}
}
}
(The only reason for using your own trait rather than Default is that implementations of the latter would conflict with those provided in the standard library for arrays of up to 32 elements; I wholly expect the standard library to replace its implementation of Default with something similar to the above once const generics have stabilised).
In which case you would now have:
fn foo<const S: usize>() -> Vec<[String; S]> {
// Some code
let mut row: [String; S] = DefaultArray::default_array();
// Some code
todo!()
}
See it on the Playground.
As of now, there is no way to compile constant generics. As #AlexLarionov said, you can try to use procedural macros, but that approach still has its bugs and limitations.
If you need a generic that has to be a number, you can use the Num crate, or the more verbose std::num.

Thread-safe mutable non-owning pointer in Rust?

I'm trying to parallelize an algorithm I have. This is a sketch of how I would write it in C++:
void thread_func(std::vector<int>& results, int threadid) {
results[threadid] = threadid;
}
std::vector<int> foo() {
std::vector<int> results(4);
for(int i = 0; i < 4; i++)
{
spawn_thread(thread_func, results, i);
}
join_threads();
return results;
}
The point here is that each thread has a reference to a shared, mutable object that it does not own. It seems like this is difficult to do in Rust. Should I try to cobble it together in terms of (and I'm guessing here) Mutex, Cell and &mut, or is there a better pattern I should follow?
The proper way is to use Arc<Mutex<...>> or, for example, Arc<RWLock<...>>. Arc is a shared ownership-based concurrency-safe pointer to immutable data, and Mutex/RWLock introduce synchronized internal mutability. Your code then would look like this:
use std::sync::{Arc, Mutex};
use std::thread;
fn thread_func(results: Arc<Mutex<Vec<i32>>>, thread_id: i32) {
let mut results = results.lock().unwrap();
results[thread_id as usize] = thread_id;
}
fn foo() -> Arc<Mutex<Vec<i32>>> {
let results = Arc::new(Mutex::new(vec![0; 4]));
let guards: Vec<_> = (0..4).map(|i| {
let results = results.clone();
thread::spawn(move || thread_func(results, i))
}).collect();
for guard in guards {
guard.join();
}
results
}
This unfortunately requires you to return Arc<Mutex<Vec<i32>>> from the function because there is no way to "unwrap" the value. An alternative is to clone the vector before returning.
However, using a crate like scoped_threadpool (whose approach could only be recently made sound; something like it will probably make into the standard library instead of the now deprecated thread::scoped() function, which is unsafe) it can be done in a much nicer way:
extern crate scoped_threadpool;
use scoped_threadpool::Pool;
fn thread_func(result: &mut i32, thread_id: i32) {
*result = thread_id;
}
fn foo() -> Vec<i32> {
let results = vec![0; 4];
let mut pool = Pool::new(4);
pool.scoped(|scope| {
for (i, e) in results.iter_mut().enumerate() {
scope.execute(move || thread_func(e, i as i32));
}
});
results
}
If your thread_func needs to access the whole vector, however, you can't get away without synchronization, so you would need a Mutex, and you would still get the unwrapping problem:
extern crate scoped_threadpool;
use std::sync::Mutex;
use scoped_threadpool::Pool;
fn thread_func(results: &Mutex<Vec<u32>>, thread_id: i32) {
let mut results = results.lock().unwrap();
result[thread_id as usize] = thread_id;
}
fn foo() -> Vec<i32> {
let results = Mutex::new(vec![0; 4]);
let mut pool = Pool::new(4);
pool.scoped(|scope| {
for i in 0..4 {
scope.execute(move || thread_func(&results, i));
}
});
results.lock().unwrap().clone()
}
But at least you don't need any Arcs here. Also execute() method is unsafe if you use stable compiler because it does not have a corresponding fix to make it safe. It is safe on all compiler versions greater than 1.4.0, according to its build script.

Lazy sequence generation in Rust

How can I create what other languages call a lazy sequence or a "generator" function?
In Python, I can use yield as in the following example (from Python's docs) to lazily generate a sequence that is iterable in a way that does not use the memory of an intermediary list:
# a generator that yields items instead of returning a list
def firstn(n):
num = 0
while num < n:
yield num
num += 1
sum_of_first_n = sum(firstn(1000000))
How can I do something similar in Rust?
Rust does have generators, but they are highly experimental and not currently available in stable Rust.
Works in stable Rust 1.0 and above
Range handles your concrete example. You can use it with the syntactical sugar of ..:
fn main() {
let sum: u64 = (0..1_000_000).sum();
println!("{}", sum)
}
What if Range didn't exist? We can create an iterator that models it!
struct MyRange {
start: u64,
end: u64,
}
impl MyRange {
fn new(start: u64, end: u64) -> MyRange {
MyRange {
start: start,
end: end,
}
}
}
impl Iterator for MyRange {
type Item = u64;
fn next(&mut self) -> Option<u64> {
if self.start == self.end {
None
} else {
let result = Some(self.start);
self.start += 1;
result
}
}
}
fn main() {
let sum: u64 = MyRange::new(0, 1_000_000).sum();
println!("{}", sum)
}
The guts are the same, but more explicit than the Python version. Notably, Python's generators keep track of the state for you. Rust prefers explicitness, so we have to create our own state and update it manually. The important part is the implementation of the Iterator trait. We specify that the iterator yields values of a specific type (type Item = u64) and then deal with stepping each iteration and how to tell we have reached the end of iteration.
This example is not as powerful as the real Range, which uses generics, but shows an example of how to go about it.
Works in nightly Rust
Nightly Rust does have generators, but they are highly experimental. You need to bring in a few unstable features to create one. However, it looks pretty close to the Python example, with some Rust-specific additions:
// 1.43.0-nightly (2020-02-09 71c7e149e42cb0fc78a8)
#![feature(generators, generator_trait)]
use std::{
ops::{Generator, GeneratorState},
pin::Pin,
};
fn firstn(n: u64) -> impl Generator<Yield = u64, Return = ()> {
move || {
let mut num = 0;
while num < n {
yield num;
num += 1;
}
}
}
Since everything in current Rust operates on iterators, we create an adapter that converts a generator into an iterator in order to play with the broader ecosystem. I'd expect that such an adapter would be present in the standard library eventually:
struct GeneratorIteratorAdapter<G>(Pin<Box<G>>);
impl<G> GeneratorIteratorAdapter<G>
where
G: Generator<Return = ()>,
{
fn new(gen: G) -> Self {
Self(Box::pin(gen))
}
}
impl<G> Iterator for GeneratorIteratorAdapter<G>
where
G: Generator<Return = ()>,
{
type Item = G::Yield;
fn next(&mut self) -> Option<Self::Item> {
match self.0.as_mut().resume(()) {
GeneratorState::Yielded(x) => Some(x),
GeneratorState::Complete(_) => None,
}
}
}
Now we can use it:
fn main() {
let generator_iterator = GeneratorIteratorAdapter::new(firstn(1_000_000));
let sum: u64 = generator_iterator.sum();
println!("{}", sum);
}
What's interesting about this is that it's less powerful than an implementation of Iterator. For example, iterators have the size_hint method, which allows consumers of the iterator to have an idea of how many elements are remaining. This allows optimizations when collecting into a container. Generators do not have any such information.
As of Rust 1.34 stable, you have convenient std::iter::from_fn utility. It is not a coroutine (i.e. you still have to return each time), but at least it saves you from defining another struct.
from_fn accepts a closure FnMut() -> Option<T> and repeatedly calls it to create an Iterator<T>. In pseudo-Python, def from_fn(f): while (val := f()) is not None: yield val.
// -> Box<dyn std::iter::Iterator<Item=u64>> in Rust 2015
fn firstn(n: u64) -> impl std::iter::Iterator<Item = u64> {
let mut num = 0;
std::iter::from_fn(move || {
let result;
if num < n {
result = Some(num);
num += 1
} else {
result = None
}
result
})
}
fn main() {
let sum_of_first_n = firstn(1000000).sum::<u64>();
println!("sum(0 to 999999): {}", sum_of_first_n);
}
std::iter::successors is also available. It is less general but might be a bit easier to use since you just pass around the seed value explicitly. In pseudo-Python: def successors(seed, f): while seed is not None: yield seed; seed = f(seed).
fn firstn(n: u64) -> impl std::iter::Iterator<Item = u64> {
std::iter::successors(
Some(0),
move |&num| {
let next = num + 1;
if next < n {
Some(next)
} else {
None
}
},
)
}
However, Shepmaster's note applies to these utility too. (tldr: often hand-rolled Iterators are more memory efficient)
What's interesting about this is that it's less powerful than an implementation of Iterator. For example, iterators have the size_hint method, which allows consumers of the iterator to have an idea of how many elements are remaining. This allows optimizations when collecting into a container. Generators do not have any such information.
(Note: returning impl is a Rust 2018 feature. See the Edition Guide for configuration and Announcement or Rust By Example for explanation)
Rust 1.0 does not have generator functions, so you'd have to do it manually with explicit iterators.
First, rewrite your Python example as a class with a next() method, since that is closer to the model you're likely to get in Rust. Then you can rewrite it in Rust with a struct that implements the Iterator trait.
You might also be able to use a function that returns a closure to achieve a similar result, but I don't think it would be possible to have that implement the Iterator trait (since it would require being called to generate a new result).
You can use my stackful Rust generator library which supports stable Rust:
#[macro_use]
extern crate generator;
use generator::{Generator, Gn};
fn firstn(n: usize) -> Generator<'static, (), usize> {
Gn::new_scoped(move |mut s| {
let mut num = 0;
while num < n {
s.yield_(num);
num += 1;
}
done!();
})
}
fn main() {
let sum_of_first_n: usize = firstn(1000000).sum();
println!("sum ={}", sum_of_first_n);
}
or more simply:
let n = 100000;
let range = Gn::new_scoped(move |mut s| {
let mut num = 0;
while num < n {
s.yield_(num);
num += 1;
}
done!();
});
let sum: usize = range.sum();

Resources