I tried a recursive factorial algorithm in Rust. I'm using this version of the compiler:
rustc 1.12.0 (3191fbae9 2016-09-23)
cargo 0.13.0-nightly (109cb7c 2016-08-19)
Code:
extern crate num_bigint;
extern crate num_traits;

use num_bigint::{BigUint, ToBigUint};
use num_traits::One;

fn factorial(num: u64) -> BigUint {
    let current: BigUint = num.to_biguint().unwrap();
    if num <= 1 {
        return One::one();
    }
    return current * factorial(num - 1);
}

fn main() {
    let num: u64 = 100000;
    println!("Factorial {}! = {}", num, factorial(num))
}
I got this error:
$ cargo run
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
error: Process didn't exit successfully
How to fix that? And why does this error occur in Rust?
Rust doesn't have tail call elimination, so your recursion depth is limited by your stack size. It may become a feature of Rust in the future (you can read more about it in the Rust FAQ), but in the meantime you will have to either not recurse so deeply or use loops.
Why?
This is a stack overflow, which occurs whenever there is no stack memory left. Stack memory is used for things such as:
local variables
function arguments
return values
Recursion uses a lot of stack memory, because for every recursive call the memory for all local variables, function arguments, and so on has to be allocated on the stack anew.
How to fix that?
The obvious solution is to write your algorithm in a non-recursive manner (you should do this if you want to use the algorithm in production; a loop-based sketch follows below). But you can also just increase the stack size. While the stack size of the main thread can't be modified, you can create a new thread and set a specific stack size:
use std::thread;

fn main() {
    let num: u64 = 100_000;

    // Size of one stack frame for `factorial()` was measured experimentally
    thread::Builder::new()
        .stack_size(num as usize * 0xFF)
        .spawn(move || {
            println!("Factorial {}! = {}", num, factorial(num));
        })
        .unwrap()
        .join()
        .unwrap();
}
This code works and, when executed via cargo run --release (with optimizations!), outputs the solution after only a couple of seconds of calculation.
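For completeness, here is the loop-based rewrite mentioned above. This is a minimal sketch using the same num-bigint types as the question; it uses constant stack space regardless of the input.

extern crate num_bigint;
extern crate num_traits;

use num_bigint::{BigUint, ToBigUint};
use num_traits::One;

// Iterative factorial: no recursion, so stack usage stays constant.
fn factorial(num: u64) -> BigUint {
    let mut acc: BigUint = One::one();
    for i in 2..=num {
        acc = acc * i.to_biguint().unwrap();
    }
    acc
}

fn main() {
    println!("10! = {}", factorial(10)); // 3628800
}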
Measuring stack frame size
In case you want to know how the stack frame size (the memory requirement for one call) of factorial() was measured: I printed the address of the function argument num on each call to factorial():
fn factorial(num: u64) -> BigUint {
    println!("{:p}", &num);
    // ...
}
The difference between two successive calls' addresses is (more or less) the stack frame size. On my machine, the difference was slightly less than 0xFF (255) bytes, so I just used that as the size.
In case you're wondering why the stack frame size isn't smaller: the Rust compiler doesn't really optimize for this metric. Usually it's really not important, so optimizers tend to sacrifice this memory requirement for better execution speed. I took a look at the assembly and in this case many BigUint methods were inlined. This means that the local variables of other functions are using stack space as well!
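If you want to reproduce this kind of measurement yourself, here is a minimal sketch with a hypothetical probe function (my own example, not the factorial above). It assumes the stack grows downward, as on x86-64, and is best run without optimizations so the frames aren't inlined away:

fn probe(depth: u32, prev_addr: usize) {
    let marker = 0u8;
    let addr = &marker as *const u8 as usize;
    if prev_addr != 0 {
        // On a downward-growing stack, the caller's frame sits at a
        // higher address, so this difference approximates the frame size.
        println!("frame size ~ {} bytes", prev_addr - addr);
    }
    if depth > 0 {
        probe(depth - 1, addr);
    }
}

fn main() {
    probe(5, 0);
}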
Just as an alternative (which I do not recommend):
Matt's answer is true to an extent. There is a crate called stacker (here) that can artificially increase the stack size for use in recursive algorithms. It does this by allocating some heap memory to overflow into.
As a word of warning: this takes a very long time to run... but it runs, and it doesn't blow the stack. Compiling with optimizations brings the time down, but it's still pretty slow. You're likely to get better performance from a loop, as Matt suggests. I thought I would throw this out there anyway.
extern crate num_bigint;
extern crate num_traits;
extern crate stacker;

use num_bigint::{BigUint, ToBigUint};
use num_traits::One;

fn factorial(num: u64) -> BigUint {
    // println!("Called with: {}", num);
    let current: BigUint = num.to_biguint().unwrap();
    if num <= 1 {
        // println!("Returning...");
        return One::one();
    }
    stacker::maybe_grow(1024 * 1024, 1024 * 1024, || {
        current * factorial(num - 1)
    })
}

fn main() {
    let num: u64 = 100000;
    println!("Factorial {}! = {}", num, factorial(num));
}
I have commented out the debug println!s; you can uncomment them if you like.
Related
How do I determine the current stack frame depth?
fn print_depths() {
    println!("stack frame depth {}", stack_frame_depth());
    fn print_depth2() {
        println!("stack frame depth {}", stack_frame_depth());
    }
    print_depth2();
}

pub fn main() {
    print_depths();
}
would print
stack frame depth 1
stack frame depth 2
I know main has callers so the particular numbers might be different.
I have tried stacker::remaining_stack. However, that prints a count of bytes remaining in the stack. Since the arguments passed to a function affect the stack byte "height", it is not possible to derive a simple function-call "height" from it. I want a count of the current function-call depth.
If, having noted all the caveats in Peter Hall's answer, it is still desirable to obtain the actual stack depth, consider backtrace::Backtrace:
use backtrace::Backtrace;

#[inline]
fn stack_frame_depth() -> usize {
    Backtrace::new_unresolved().frames().len()
}
See it on the playground.
If you want to avoid the allocation involved in instantiating a Backtrace, consider backtrace::trace instead:
#[inline]
fn stack_frame_depth() -> usize {
    let mut depth = 0;
    backtrace::trace(|_| {
        depth += 1;
        true
    });
    depth
}
Both of the above approaches require std for thread-safety, however. If your crate is no_std and/or you are ensuring synchronization some other way, there's backtrace::trace_unsynchronized.
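A sketch of the unsynchronized variant, with the same counting logic as above (the unsafe block reflects that you must rule out concurrent tracing yourself):

#[inline]
fn stack_frame_depth() -> usize {
    let mut depth = 0;
    // Safety: assumes no other thread is unwinding/tracing concurrently.
    unsafe {
        backtrace::trace_unsynchronized(|_| {
            depth += 1;
            true
        });
    }
    depth
}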
After optimisations, the code that actually runs is very unlikely to resemble in any way the structure of functions that you originally wrote.
If Rust exposed a function to give you the number of call frames on the stack, the number would vary hugely depending on optimisation settings and the exact contents of these functions. For example, adding a debug print statement could change whether a function is inlined or not. The number could differ between debug and release builds, change if LTO is enabled, or even change when you upgrade to a newer compiler.
In order to accurately track the number of function calls, corresponding to functions in your source code, the compiler would need to add a counter increment to every function body, breaking Rust's promise of zero-cost abstraction.
You could write a macro for declaring functions that increments a counter at the start of each function and decrements it at the end; then you only pay for what you use. For example, here is a simplified declarative macro (it doesn't support generics) which does this; a procedural macro would be better suited for something like this, and you could define it to be called as an attribute instead.
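A sketch of what such a macro could look like (the depth_counted! name and the thread-local counter are my own illustration, not the code originally linked):

use std::cell::Cell;

thread_local! {
    // Current call depth for functions declared through the macro.
    static DEPTH: Cell<usize> = Cell::new(0);
}

// Wraps the function body so the counter is incremented on entry
// and decremented on exit. No generics support, as noted above.
macro_rules! depth_counted {
    (fn $name:ident($($arg:ident: $ty:ty),*) -> $ret:ty $body:block) => {
        fn $name($($arg: $ty),*) -> $ret {
            DEPTH.with(|d| d.set(d.get() + 1));
            let result = (|| $body)();
            DEPTH.with(|d| d.set(d.get() - 1));
            result
        }
    };
}

depth_counted! {
    fn fib(n: u64) -> u64 {
        DEPTH.with(|d| println!("depth {}: fib({})", d.get(), n));
        if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
    }
}

fn main() {
    fib(3);
}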
I thought that once an object is moved, the memory occupied by it on the stack can be reused for other purpose. However, the minimal example below shows the opposite.
#[inline(never)]
fn consume_string(s: String) {
    drop(s);
}

fn main() {
    println!(
        "String occupies {} bytes on the stack.",
        std::mem::size_of::<String>()
    );

    let s = String::from("hello");
    println!("s at {:p}", &s);
    consume_string(s);

    let r = String::from("world");
    println!("r at {:p}", &r);
    consume_string(r);
}
After compiling the code with --release flag, it gives the following output on my computer.
String occupies 24 bytes on the stack.
s at 0x7ffee3b011b0
r at 0x7ffee3b011c8
It is pretty clear that even though s is moved, r does not reuse the 24-byte chunk on the stack that originally belonged to s. I suppose that reusing the stack memory of a moved object is safe, but why does the Rust compiler not do it? Am I missing any corner case?
Update:
If I enclose s in curly brackets, r can reuse the 24-byte chunk on the stack.
#[inline(never)]
fn consume_string(s: String) {
    drop(s);
}

fn main() {
    println!(
        "String occupies {} bytes on the stack.",
        std::mem::size_of::<String>()
    );

    {
        let s = String::from("hello");
        println!("s at {:p}", &s);
        consume_string(s);
    }

    let r = String::from("world");
    println!("r at {:p}", &r);
    consume_string(r);
}
The code above gives the output below.
String occupies 24 bytes on the stack.
s at 0x7ffee2ca31f8
r at 0x7ffee2ca31f8
I thought that the curly brackets should not make any difference, because the lifetime of s ends after calling consume_string(s) and its drop handler is called within consume_string(). Why does adding the curly brackets enable the optimization?
The version of the Rust compiler I am using is given below.
rustc 1.54.0-nightly (5c0292654 2021-05-11)
binary: rustc
commit-hash: 5c029265465301fe9cb3960ce2a5da6c99b8dcf2
commit-date: 2021-05-11
host: x86_64-apple-darwin
release: 1.54.0-nightly
LLVM version: 12.0.1
Update 2:
I would like to clarify the focus of this question. I want to know which of the following categories the proposed "stack reuse optimization" falls into.
This is an invalid optimization. In certain cases the compiled code may fail if we perform the "optimization".
This is a valid optimization, but the compiler (including both rustc frontend and llvm) is not capable of performing it.
This is a valid optimization, but is temporarily turned off, like this.
This is a valid optimization, but is missed. It will be added in the future.
My TLDR conclusion: A missed optimization opportunity.
So the first thing I did was look into whether your consume_string function actually makes a difference. To do this I created the following (a bit more) minimal example:
struct Obj([u8; 8]);

fn main() {
    println!(
        "Obj occupies {} bytes on the stack.",
        std::mem::size_of::<Obj>()
    );

    let s = Obj([1, 2, 3, 4, 5, 6, 7, 8]);
    println!("{:p}", &s);
    std::mem::drop(s);

    let r = Obj([11, 12, 13, 14, 15, 16, 17, 18]);
    println!("{:p}", &r);
    std::mem::drop(r);
}
Instead of consume_string I use std::mem::drop, which is dedicated to simply consuming an object. This code behaves just like yours:
Obj occupies 8 bytes on the stack.
0x7ffe81a43fa0
0x7ffe81a43fa8
Removing the drop doesn't affect the result.
So the question is then why rustc doesn't notice that s is dead before r goes live. As your second example shows, enclosing s in a scope will allow the optimization.
Why does this work? Because the Rust semantics dictate that an object is dropped at the end of its scope. Since s is in an inner scope, it is dropped before the scope exits. Without the scope, s is alive until the main function exits.
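A quick way to see that scoping rule in action (a minimal sketch of mine with a noisy destructor):

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let _a = Noisy("a");
    {
        let _b = Noisy("b");
    } // `_b` is dropped here, at the end of its scope.
    let _c = Noisy("c");
} // Prints "dropping b", then "dropping c", "dropping a" (reverse order).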
Why doesn't it work when moving s into a function, where it should be dropped on exit?
Probably because rustc doesn't correctly flag the memory location used by s as free after the function call. As has been mentioned in the comments, it is LLVM that actually handles this optimization (called "stack coloring" as far as I can tell), which means rustc must correctly tell it when the memory is no longer in use. Clearly, from your last example, rustc does so on scope exit, but apparently not when an object is moved.
I think drop() does not free the stack memory of s; it only runs the destructor.
In the first case, s still occupies its stack slot, so that memory cannot be reused.
In the second case, because of the {} scope, the slot is released at the end of the scope, so the stack memory is reused.
I have complex-number data filled into a Vec<f64> by an external C library (which I'd prefer not to change) in the form [i_0_real, i_0_imag, i_1_real, i_1_imag, ...], and it appears that this Vec<f64> has the same memory layout that a Vec<num_complex::Complex<f64>> of half the length would have, given that num_complex::Complex<f64>'s data structure is memory-layout compatible with [f64; 2], as documented here. I'd like to use it as such without a re-allocation of a potentially large buffer.
I'm assuming that it's valid to use std::vec::Vec::from_raw_parts() to fake a new Vec that takes ownership of the old Vec's memory (by forgetting the old Vec), using len / 2 and capacity / 2, but that requires unsafe code. Is there a "safe" way to do this kind of data re-interpretation?
The Vec is allocated in Rust as a Vec<f64> and is populated by a C function using .as_mut_ptr() that fills in the Vec<f64>.
My current compiling unsafe implementation:
extern crate num_complex;

pub fn convert_to_complex_unsafe(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
    let new_vec = unsafe {
        Vec::from_raw_parts(
            buffer.as_mut_ptr() as *mut num_complex::Complex<f64>,
            buffer.len() / 2,
            buffer.capacity() / 2,
        )
    };
    std::mem::forget(buffer);
    return new_vec;
}

fn main() {
    println!(
        "Converted vector: {:?}",
        convert_to_complex_unsafe(vec![3.0, 4.0, 5.0, 6.0])
    );
}
Is there a "safe" way to do this kind of data re-interpretation?
No. At the very least, this is because the information you need to know is not expressed in the Rust type system but is expressed via prose (a.k.a. the docs):
Complex<T> is memory layout compatible with an array [T; 2].
— Complex docs
If a Vec has allocated memory, then [...] its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice),
— Vec docs
Arrays coerce to slices ([T])
— Array docs
Since a Complex is memory-compatible with an array, an array's data is memory-compatible with a slice, and a Vec's data is memory-compatible with a slice, this transformation should be safe, even though the compiler cannot tell this.
This information should be attached (via a comment) to your unsafe block.
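As a small supplement of mine (not from the quoted docs): on a reasonably recent compiler you can at least machine-check the size and alignment at compile time. This catches regressions, though it does not by itself prove layout compatibility:

use std::mem::{align_of, size_of};

// Compile-time sanity checks (needs const panics, roughly Rust 1.57+).
// These catch size/alignment regressions, but the full layout guarantee
// still comes only from the num-complex documentation.
const _: () = {
    assert!(size_of::<num_complex::Complex<f64>>() == size_of::<[f64; 2]>());
    assert!(align_of::<num_complex::Complex<f64>>() == align_of::<[f64; 2]>());
};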
I would make some small tweaks to your function:
Having two Vecs at the same time pointing to the same data makes me very nervous. This can be trivially avoided by introducing some variables and forgetting one before creating the other.
Remove the return keyword to be more idiomatic
Add some asserts that the starting length of the data is a multiple of two.
As rodrigo points out, the capacity could easily be an odd number. To attempt to avoid this, we call shrink_to_fit. This has the downside that the Vec may need to reallocate and copy the memory, depending on the implementation.
Expand the unsafe block to cover all of the related code that is required to ensure that the safety invariants are upheld.
pub fn convert_to_complex(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
    // This is where I'd put the rationale for why this `unsafe` block
    // upholds the guarantees that I must ensure. Too bad I
    // copy-and-pasted from Stack Overflow without reading this comment!
    unsafe {
        buffer.shrink_to_fit();

        let ptr = buffer.as_mut_ptr() as *mut num_complex::Complex<f64>;
        let len = buffer.len();
        let cap = buffer.capacity();

        assert!(len % 2 == 0);
        assert!(cap % 2 == 0);

        std::mem::forget(buffer);

        Vec::from_raw_parts(ptr, len / 2, cap / 2)
    }
}
To avoid all the worrying about the capacity, you could just convert a slice of the Vec instead. This doesn't require any extra memory allocation either. It's simpler because we can "lose" any odd trailing value; the original Vec still owns it.
pub fn convert_to_complex(buffer: &[f64]) -> &[num_complex::Complex<f64>] {
    // This is where I'd put the rationale for why this `unsafe` block
    // upholds the guarantees that I must ensure. Too bad I
    // copy-and-pasted from Stack Overflow without reading this comment!
    unsafe {
        let ptr = buffer.as_ptr() as *const num_complex::Complex<f64>;
        let len = buffer.len();

        assert!(len % 2 == 0);

        std::slice::from_raw_parts(ptr, len / 2)
    }
}
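For example, calling the slice version (the expected output assumes num-complex's derived Debug formatting):

fn main() {
    let buffer = vec![3.0, 4.0, 5.0, 6.0];
    let complex = convert_to_complex(&buffer);
    // Prints: [Complex { re: 3.0, im: 4.0 }, Complex { re: 5.0, im: 6.0 }]
    println!("{:?}", complex);
}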
When doing integer arithmetic with checks for overflows, calculations often need to compose several arithmetic operations. A straightforward way of chaining checked arithmetic in Rust uses checked_* methods and Option chaining:
fn calculate_size(elem_size: usize,
                  length: usize,
                  offset: usize)
                  -> Option<usize> {
    elem_size.checked_mul(length)
        .and_then(|acc| acc.checked_add(offset))
}
However, this tells the compiler to generate a branch for each elementary operation. I have encountered a more unrolled approach using the overflowing_* methods:
fn calculate_size(elem_size: usize,
                  length: usize,
                  offset: usize)
                  -> Option<usize> {
    let (acc, oflo1) = elem_size.overflowing_mul(length);
    let (acc, oflo2) = acc.overflowing_add(offset);
    if oflo1 | oflo2 {
        None
    } else {
        Some(acc)
    }
}
Continuing the computation regardless of overflows and aggregating the overflow flags with bitwise OR ensures that at most one branch is performed in the entire evaluation (provided that the implementations of overflowing_* generate branchless code). This optimization-friendly approach is more cumbersome and requires some caution when dealing with intermediate values.
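For reference, both variants agree at the call site; a minimal sanity check of mine:

fn main() {
    assert_eq!(calculate_size(8, 10, 4), Some(84));
    // The multiplication overflows, so the whole chain yields None.
    assert_eq!(calculate_size(usize::MAX, 2, 0), None);
}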
Does anyone have experience with how the Rust compiler optimizes either of the patterns above on various CPU architectures, to tell whether the explicit unrolling is worthwhile, especially for more complex expressions?
Does anyone have experience with how the Rust compiler optimizes either of the patterns above on various CPU architectures, to tell whether the explicit unrolling is worthwhile, especially for more complex expressions?
You can use the playground to check how LLVM optimizes things: just click on "LLVM IR" or "ASM" instead of "Run". Stick a #[inline(never)] on the function you wish to check, and take care to pass it run-time arguments to avoid constant folding. As in here:
use std::env;

#[inline(never)]
fn calculate_size(elem_size: usize,
                  length: usize,
                  offset: usize)
                  -> Option<usize> {
    let (acc, oflo1) = elem_size.overflowing_mul(length);
    let (acc, oflo2) = acc.overflowing_add(offset);
    if oflo1 | oflo2 {
        None
    } else {
        Some(acc)
    }
}

fn main() {
    // Skip the program name so all three values parse as usize.
    let vec: Vec<usize> = env::args().skip(1).map(|s| s.parse().unwrap()).collect();
    let result = calculate_size(vec[0], vec[1], vec[2]);
    println!("{:?}", result);
}
The answer you'll get, however, is that the overflow intrinsics in Rust and LLVM have been coded for convenience and not performance, unfortunately. This means that while the explicit unrolling optimizes well, counting on LLVM to optimize the checked code is not realistic for now.
Normally this is not an issue; but for a performance hotspot, you may want to unroll manually.
Note: this lack of performance is also the reason that overflow checking is disabled by default in Release mode.
I've tried to use the following code:
fn main() {
    let array = box [1, 2, 3];
}
in my program, and it results in a compile error: error: obsolete syntax: ~[T] is no longer a type.
AFAIU, there are no dynamically sized arrays in Rust (the size has to be known at compile time). However, in my code snippet the array does have a static size and should be of type ~[T, ..3] (owned static array of size 3), whereas the compiler says it has the type ~[T]. Is there any deep reason why it isn't possible to get a statically sized array allocated on the heap?
P.S. Yeah, I've heard about Vec.
Since I ended up here, others might as well. Rust has moved along; at the time of this answer, Rust is at 1.53 for stable and 1.55 for nightly.
Box::new([1, 2, 3]) is the recommended way and does its job; however, there is a catch: the array is created on the stack and then copied over to the heap. This is documented behaviour of Box:
Move a value from the stack to the heap by creating a Box:
Meaning it involves a hidden memcpy, and with a large array the program overflows the stack before the data ever reaches the heap:
const X: usize = 10_000_000;
let failing_huge_heap_array = Box::new([1; X]);
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
There are several workarounds for this as of now (Rust 1.53); the most straightforward is to create a vector and turn the vector into a boxed slice:
const X: usize = 10_000_000;
let huge_heap_array = vec![1; X].into_boxed_slice();
This works, but has two small catches: it loses the type information (what should be Box<[i32; 10000000]> is now Box<[i32]>), and it additionally takes up 16 bytes on the stack (a fat pointer: data pointer plus length) as opposed to the 8 bytes of a plain Box of an array.
...
println!("{}", mem::size_of_val(&huge_heap_array);
16
Not a huge deal, but it hurts my personal Monk factor.
Upon further research, discarding options that need nightly, like the OP's box [1, 2, 3] (which seems to be coming back behind the #![feature(box_syntax)] feature gate) and the arr crate (which is nice but also needs nightly), the best solution I found for allocating an array on the heap without the hidden memcpy was a suggestion by Simias:
/// A macro similar to `vec![$elem; $size]` which returns a boxed array.
///
/// ```rust
/// let _: Box<[u8; 1024]> = box_array![0; 1024];
/// ```
macro_rules! box_array {
    ($val:expr ; $len:expr) => {{
        // Use a generic function so that the pointer cast remains type-safe
        fn vec_to_boxed_array<T>(vec: Vec<T>) -> Box<[T; $len]> {
            let boxed_slice = vec.into_boxed_slice();

            let ptr = ::std::boxed::Box::into_raw(boxed_slice) as *mut [T; $len];

            unsafe { Box::from_raw(ptr) }
        }

        vec_to_boxed_array(vec![$val; $len])
    }};
}
const X: usize = 10_000_000;
let huge_heap_array = box_array![1; X];
It does not overflow the stack and only takes up 8 bytes while preserving the type.
It uses unsafe, but limits it to a single line of code. Until the arrival of the box [1; X] syntax, IMHO it's a clean option.
As far as I know, the box expression is still experimental. You can use Box::new() instead, with something like the code below.
fn main() {
    let array1 = Box::new([1, 2, 3]);
    // or even
    let array2: Box<[i32]> = Box::new([1, 2, 3]);
}
Check out the comment by Shepmaster below, as these are different types.
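To make that type difference concrete, a small sketch of mine (sizes shown are for a 64-bit target):

use std::mem::size_of_val;

fn main() {
    let array1 = Box::new([1, 2, 3]);             // Box<[i32; 3]>: sized array
    let array2: Box<[i32]> = Box::new([1, 2, 3]); // Box<[i32]>: unsized slice

    println!("{}", size_of_val(&array1)); // 8: a thin pointer
    println!("{}", size_of_val(&array2)); // 16: pointer + length (fat pointer)
}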
Just write:
let mut buffer = vec![0u8; k];
This creates a heap-allocated Vec<u8> of length k. Note the explicit u8: a bare 0 would default to i32 unless the surrounding code pins the type.
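A runnable version of that one-liner (my own minimal example):

fn main() {
    let k = 16;
    let mut buffer = vec![0u8; k]; // Vec<u8> of length k, zero-filled, on the heap
    buffer[0] = 0xFF;
    println!("{} bytes, first byte = {}", buffer.len(), buffer[0]);
}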