How do I determine the current stack frame depth?
fn print_depths() {
    println!("stack frame depth {}", stack_frame_depth());
    fn print_depth2() {
        println!("stack frame depth {}", stack_frame_depth());
    }
    print_depth2();
}

pub fn main() {
    print_depths();
}
would print
stack frame depth 1
stack frame depth 2
I know main has callers so the particular numbers might be different.
I have tried stacker::remaining_stack. However, that returns a count of bytes remaining in the stack. Since the arguments passed to a function affect the stack byte "height", a simple function call depth cannot be derived from it. I want a count of the current function call depth.
If, having noted all the caveats in Peter Hall's answer, it is still desirable to obtain the actual stack depth, consider backtrace::Backtrace:
use backtrace::Backtrace;

#[inline]
fn stack_frame_depth() -> usize {
    Backtrace::new_unresolved().frames().len()
}
If you want to avoid the allocation involved in instantiating a Backtrace, consider backtrace::trace instead:
#[inline]
fn stack_frame_depth() -> usize {
    let mut depth = 0;
    backtrace::trace(|_| {
        depth += 1;
        true
    });
    depth
}
Both of the above approaches require std for thread-safety, however. If your crate is no_std and/or you are ensuring synchronization some other way, there's backtrace::trace_unsynchronized.
After optimisations, the code that actually runs is very unlikely to resemble in any way the structure of functions that you originally wrote.
If Rust exposed a function to give you the number of call frames on the stack, the number would vary hugely depending on optimisation settings and the exact contents of those functions. For example, adding a debug print statement could change whether a function is inlined. The number could differ depending on whether the build is debug or release, whether LTO is enabled, or even whether you upgraded to a newer compiler.
In order to accurately track the number of function calls, corresponding to functions in your source code, the compiler would need to add a counter increment to every function body, breaking Rust's promise of zero cost abstraction.
You could write a macro for declaring functions that increments a counter at the start of each function and decrements it at the end. Then you pay only for what you use. A simplified declarative macro (one that doesn't support generics) can do this; a procedural macro would be better suited for something like this, and you could define it to be invoked as an attribute instead.
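A minimal sketch of that idea, assuming a thread-local counter; the names counted_fn, DepthGuard and DEPTH are illustrative, not from any crate:

use std::cell::Cell;

thread_local! {
    // Current call depth of functions declared through `counted_fn!`.
    static DEPTH: Cell<usize> = Cell::new(0);
}

// Decrements the counter when the function returns, even on panic.
struct DepthGuard;

impl Drop for DepthGuard {
    fn drop(&mut self) {
        DEPTH.with(|d| d.set(d.get() - 1));
    }
}

// Wraps a function body in increment/decrement bookkeeping.
// Simplified: no generics, no `where` clauses, no visibility modifiers,
// and the return type must be written explicitly.
macro_rules! counted_fn {
    (fn $name:ident($($arg:ident: $ty:ty),*) -> $ret:ty $body:block) => {
        fn $name($($arg: $ty),*) -> $ret {
            DEPTH.with(|d| d.set(d.get() + 1));
            let _guard = DepthGuard;
            $body
        }
    };
}

fn stack_frame_depth() -> usize {
    DEPTH.with(|d| d.get())
}

counted_fn! {
    fn print_depth2() -> () {
        println!("stack frame depth {}", stack_frame_depth());
    }
}

counted_fn! {
    fn print_depths() -> () {
        println!("stack frame depth {}", stack_frame_depth());
        print_depth2();
    }
}

fn main() {
    print_depths(); // prints "stack frame depth 1" then "stack frame depth 2"
}

Only functions declared through the macro are counted, which is exactly the "pay for what you use" trade-off described above.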
Related
I'd like to try to eliminate bounds checking on code generated by Rust. I have variables that are rarely zero, and my code paths ensure they do not run into trouble, but because they can be zero, I cannot use NonZeroU64. When I am sure they are non-zero, how can I signal this to the compiler?
For example, if I have the following function, I know it will be non-zero. Can I tell the compiler this or do I have to have the unnecessary check?
pub fn f(n: u64) -> u32 {
    n.trailing_zeros()
}
I can wrap the number in NonZeroU64 when I am sure, but then I've already incurred the check, which defeats the purpose ...
Redundant checks within a single function body can usually be optimized out. So you just need to convert the number to NonZeroU64 before calling trailing_zeros(), and rely on the compiler to optimize the redundant checks away.
use std::num::NonZeroU64;

pub fn g(n: NonZeroU64) -> u32 {
    n.trailing_zeros()
}

pub fn other_fun(n: u64) -> u32 {
    if n != 0 {
        println!("Do something with non-zero!");
        let n = NonZeroU64::new(n).unwrap();
        g(n)
    } else {
        42
    }
}
In the above code, the if n != 0 check ensures n cannot be zero within the block, and the compiler is smart enough to remove the unwrap call, making NonZeroU64::new(n).unwrap() a zero-cost operation. You can check the generated assembly to verify this.
core::intrinsics::assume
Informs the optimizer that a condition is always true. If the
condition is false, the behavior is undefined.
No code is generated for this intrinsic, but the optimizer will try to
preserve it (and its condition) between passes, which may interfere
with optimization of surrounding code and reduce performance. It
should not be used if the invariant can be discovered by the optimizer
on its own, or if it does not enable any significant optimizations.
This intrinsic does not have a stable counterpart.
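For illustration, a nightly-only sketch of how the intrinsic could be applied to the function from the question (an assumption on my part: if this is ever called with n == 0, the behavior is undefined, so the invariant must be guaranteed by every caller):

#![feature(core_intrinsics)] // unstable; nightly compilers only

pub fn f(n: u64) -> u32 {
    unsafe {
        // Promise the optimizer that `n` is non-zero, letting it drop
        // any zero-input handling around the count-trailing-zeros code.
        std::intrinsics::assume(n != 0);
    }
    n.trailing_zeros()
}

(Newer Rust versions have since stabilized std::hint::assert_unchecked, which serves the same purpose on stable.)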
I have the following functions in a module:
pub fn square(s: u32) -> u64 {
    if s < 1 || s > 64 {
        panic!("Square must be between 1 and 64")
    }
    total_for_square(s) - total_for_square(s - 1)
}

fn total_for_square(s: u32) -> u64 {
    if s == 64 {
        return u64::max_value();
    }
    2u64.pow(s) - 1
}

pub fn total() -> u64 {
    u64::max_value()
}
This works fine when calling individual functions directly. However, I want to optimize it by caching the values of total_for_square to speed up future lookups (storing them in a HashMap). How should I approach where to store the HashMap so it's available between calls? I know I could refactor to put all of this in a struct, but in this case, I cannot change the API.
In other, higher level languages I have used, I would just have a variable in the same scope as the functions. However, it's not clear whether that is possible in Rust at the module level.
In other, higher level languages I have used, I would just have a variable in the same scope as the functions.
You can use something similar in Rust but it's syntactically more complicated: you need to create a global for your cache using lazy_static or once_cell for instance.
The cache will need to be thread-safe though, so either a regular map sitting behind a Mutex or RwLock, or some sort of concurrent map.
Although given you only have 64 inputs, you could just precompute the entire thing and return precomputed values directly.
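A minimal sketch of that global-cache approach, assuming the once_cell crate is available (CACHE is an illustrative name):

use once_cell::sync::Lazy;
use std::collections::HashMap;
use std::sync::Mutex;

// Global, lazily-initialized, thread-safe cache shared by all calls.
static CACHE: Lazy<Mutex<HashMap<u32, u64>>> =
    Lazy::new(|| Mutex::new(HashMap::new()));

fn total_for_square(s: u32) -> u64 {
    let mut cache = CACHE.lock().unwrap();
    *cache.entry(s).or_insert_with(|| {
        if s == 64 {
            u64::MAX
        } else {
            2u64.pow(s) - 1
        }
    })
}

Note that the lock is held while the value is computed; for this tiny computation that is fine, but a concurrent map would scale better under contention.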
The cached crate comes in handy:
use cached::proc_macro::cached;

#[cached]
fn total_for_square(s: u32) -> u64 {
    if s == 64 {
        return u64::MAX;
    }
    2u64.pow(s) - 1
}
Indeed, you only need to write two lines, and the crate will take care of everything. Internally, the cached values are stored in a hash map.
(Note that u64::max_value() has been superseded by u64::MAX)
Side note: in this specific case, the simplest solution is probably to modify square to compute its result directly: square(s) is just 2u64.pow(s - 1), with no caching needed at all.
I've got a question about my code:
pub fn get_signals(path: &String) -> Vec<Vec<f64>> {
    let mut rdr = csv::ReaderBuilder::new()
        .delimiter(b';')
        .from_path(&path)
        .unwrap();
    let mut signals: Vec<Vec<f64>> = Vec::new();
    for record in rdr.records() {
        let r = record.unwrap();
        for (i, value) in r.iter().enumerate() {
            match signals.get(i) {
                Some(_) => {}
                None => signals.push(Vec::new()),
            }
            signals[i].push(value.parse::<f64>().unwrap());
        }
    }
    signals
}
How exactly does Rust handle return? When I, for example, write let signals = get_signals(&"data.csv".to_string());, does Rust assume I want a new instance of Vec (copying all the data) or just pass a pointer to the previously allocated (via Vec::new()) memory? What is the most efficient way to do this? Also, what happens with rdr? I assume, given Rust's memory safety, it's destroyed.
How exactly does Rust handle return?
The only guarantee Rust, the language, makes is that values are never cloned without an explicit .clone() in the code. Therefore, from a semantic point of view, the value is moved, which does not require allocating memory.
does Rust assume I want a new instance of Vec(copies all the data) or just pass a pointer to previously allocated (via Vec::new()) memory?
This is implementation specific, and part of the ABI (Application Binary Interface). The Rust ABI is not formalized, and not stable, so there is no standard describing it and no guarantee about this holding up.
Furthermore, this will depend on whether the function call is inlined or not: if it is inlined there is, of course, no return any longer, yet the same behavior should be observed.
For small values, they should be returned via a register (or a couple of registers).
For larger values:
the caller should reserve memory on the stack (properly sized and aligned) and pass a pointer to this area to the callee,
the callee will then construct the return value at the place pointed to, so that by the time it returns the value exists there for the caller to use.
Note: the size here is the size on the stack, as returned by std::mem::size_of; so size_of::<Vec<_>>() == 24 on 64-bit architectures.
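A quick check of that claim (a Vec is three machine words: pointer, capacity and length):

use std::mem::size_of;

fn main() {
    // A Vec's stack footprint is independent of how many elements it holds.
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>()); // 24 bytes on 64-bit
}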
What is the most efficient way to do this?
Returning is as efficient as it gets for a single call.
If, however, you find yourself in a situation where, say, you want to read a file line by line, then it makes sense to reuse the buffer from one call to the next, which can be accomplished either by:
taking a &mut reference to the buffer (a String or Vec<u8>, say),
or taking the buffer by value and returning it.
The point being to avoid memory allocations. The first pattern is sketched below.
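A minimal sketch of that &mut-buffer pattern (read_one_line is an illustrative name):

use std::io::{self, BufRead};

// Takes a `&mut` reference to a caller-owned buffer, so repeated calls
// reuse the same allocation instead of allocating a fresh String each time.
fn read_one_line(reader: &mut impl BufRead, buf: &mut String) -> io::Result<usize> {
    buf.clear(); // read_line appends, so clear leftovers first
    reader.read_line(buf)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut reader = stdin.lock();
    let mut line = String::new(); // one allocation, reused for every line
    while read_one_line(&mut reader, &mut line)? != 0 {
        print!("{line}");
    }
    Ok(())
}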
I like using partial application because it permits (among other things) splitting a complicated function call into something more readable.
An example of partial application:
fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    let add7 = |x| add(7, x);
    println!("{}", add7(35));
}
Is there overhead to this practice?
Here is the kind of thing I like to do (from real code):
fn foo(n: u32, mut things: Vec<Things>) {
    let create_new_multiplier = |thing| ThingMultiplier::new(thing, n); // ThingMultiplier is an Iterator
    let new_things = things.clone().into_iter().flat_map(create_new_multiplier);
    things.extend(new_things);
}
This is purely for readability; I do not like to nest things too deeply.
There should not be a performance difference between defining the closure before it's used and defining and using it directly. There is a type-system difference, though: the compiler doesn't fully know how to infer types in a closure that isn't immediately called.
In code:
let create_new_multiplier = |thing| ThingMultiplier::new(thing, n);
things.clone().into_iter().flat_map(create_new_multiplier)
will be the exact same as
things.clone().into_iter().flat_map(|thing| {
    ThingMultiplier::new(thing, n)
})
In general, there should not be a performance cost for using closures. This is what Rust means by "zero cost abstraction": the programmer could not have written it better themselves.
The compiler converts a closure into implementations of the Fn* traits on an anonymous struct. At that point, all the normal compiler optimizations kick in. Because of techniques like monomorphization, it may even be faster. This does mean that you need to do normal profiling to see if they are a bottleneck.
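For intuition, here is roughly what that anonymous struct looks like for the closure from the question, written out by hand (a sketch only: the real generated type is unnameable and implements the unstable Fn* traits directly; Thing and ThingMultiplier are minimal stand-ins):

struct Thing;
struct ThingMultiplier;

impl ThingMultiplier {
    fn new(_thing: Thing, _n: u32) -> Self {
        ThingMultiplier
    }
}

// `|thing| ThingMultiplier::new(thing, n)` becomes a struct holding the
// captured `n`, plus a call method.
struct CreateNewMultiplier {
    n: u32, // captured by copy, since u32: Copy
}

impl CreateNewMultiplier {
    fn call(&self, thing: Thing) -> ThingMultiplier {
        ThingMultiplier::new(thing, self.n)
    }
}

Spelled out like this, there is nothing left for the optimizer to pay for: calling the closure is an ordinary method call on a one-field struct.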
In your particular example, yes, extend can get inlined as a loop, containing another loop for the flat_map which in turn just puts ThingMultiplier instances into the same stack slots holding n and thing.
But you're barking up the wrong efficiency tree here. Instead of wondering whether an allocation of a small struct holding two fields gets optimized away you should rather wonder how efficient that clone is, especially for large inputs.
A simple example:
struct A;

fn main() {
    test(2);
    test(1);
}

fn test(i: i32) {
    println!("test");
    let a = A;
    if i == 2 {
        us(a);
    }
    println!("end");
}

impl Drop for A {
    fn drop(&mut self) {
        println!("drop");
    }
}

#[allow(unused_variables)]
fn us(a: A) {
    println!("use");
}
When I run it, the output is:
test
use
drop
end
test
end
drop
I understand that in the test(2) case, a is moved at us(a), so its output is "test-use-drop-end".
However, in the test(1) case, the output is "test-end-drop", meaning that the compiler knows that a was not moved.
If us(a) is called, it is unnecessary to drop a in test(i); it will be dropped in us(a). And if us(a) is not called, a must be dropped after println!("end").
Since it's impossible for the compiler to know in general whether us(a) is called or not, how does the compiler know whether a.drop() shall be called after println!("end")?
This is explained in the Rustonomicon:
As of Rust 1.0, the drop flags are actually not-so-secretly stashed in a hidden field of any type that implements Drop.
The hidden field tells whether the current value has been dropped or not; if it has not, it is dropped when it goes out of scope. Thus whether to run the destructor is known at run-time, which requires a bit of bookkeeping.
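As a mental model, the flag can be pictured as an Option wrapped around the value (a sketch only; the compiler tracks this without any Option):

struct A;

impl Drop for A {
    fn drop(&mut self) {
        println!("drop");
    }
}

fn us(_a: A) {
    println!("use");
}

// Conceptual desugaring of `test`: `Some`/`None` plays the role of the
// hidden drop flag.
fn test(i: i32) {
    println!("test");
    let mut a = Some(A);
    if i == 2 {
        us(a.take().unwrap()); // value moved out; flag flips to "already gone"
    }
    println!("end");
    // leaving scope drops the Option: A::drop runs only if `a` is still Some
}

fn main() {
    test(2); // test, use, drop, end
    test(1); // test, end, drop
}

This reproduces exactly the output from the question.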
Looking to the future, there is an RFC to remove these hidden fields.
The idea of the RFC is to replace the hidden fields by:
identifying unconditional drops (those need no run-time check),
stashing a hidden flag on the stack, in the function frame, for those values that are dropped conditionally.
This new strategy has several advantages over the old one:
the main advantage is that #[repr(C)] will now always give a representation equivalent to C's, even if the struct implements Drop
another important advantage is saving memory (by NOT inflating the struct size)
a slighter advantage is a possible speed gain, due to unconditional drops and better caching (from the reduced memory size)