Rust has a tracing library that seems quite popular. It uses a building block called "span":
Spans represent periods of time in the execution of a program.
Now that I've set spans all throughout my app, how can I actually log their duration?
I've so far found:
tracing-timing. Great, but a bit elaborate: it prints a whole histogram when I'd like simple durations.
tracing-tree. This one is really close to what I'm looking for. My current setup is failing, which I'll figure out, but it still prints spans as a tree, and I'm looking for plain durations with no tree.
Any way to do that with tracing?
The basic formatting layer from tracing-subscriber is very flexible about what is logged. By default it only shows log events, but it can also emit events for the span lifecycle (new, enter, exit, close). You'd be interested in logging the "close" events, which are emitted when a span ends and report the time elapsed since it started.
You can do this simply using .with_span_events() and FmtSpan::CLOSE. Here's a sample:
[dependencies]
tracing = "0.1.36"
tracing-subscriber = "0.3.15"
use std::time::Duration;
use tracing_subscriber::fmt;
use tracing_subscriber::fmt::format::FmtSpan;
#[tracing::instrument]
fn do_some_work(n: i32) {
std::thread::sleep(Duration::from_millis(100));
if n == 1 {
do_some_more_work();
}
}
#[tracing::instrument]
fn do_some_more_work() {
std::thread::sleep(Duration::from_millis(100));
}
fn main() {
fmt::fmt()
.with_span_events(FmtSpan::CLOSE)
.with_target(false)
.with_level(false)
.init();
for n in 0..3 {
do_some_work(n);
}
}
2022-09-14T15:47:01.684149Z do_some_work{n=0}: close time.busy=110ms time.idle=5.10µs
2022-09-14T15:47:01.904656Z do_some_work{n=1}:do_some_more_work: close time.busy=109ms time.idle=3.00µs
2022-09-14T15:47:01.904986Z do_some_work{n=1}: close time.busy=220ms time.idle=1.60µs
2022-09-14T15:47:02.014846Z do_some_work{n=2}: close time.busy=110ms time.idle=2.20µs
You can customize it to your liking further with other methods or by creating a custom FormatEvent implementation.
I do want to mention that tracing is "a framework for instrumenting Rust programs to collect structured, event-based diagnostic information." While function timing is part of that diagnostic information, it is designed in a way to collect that information in the field. If you're trying to assess the performance of your code in a synthetic environment, I'd encourage you to use a more robust benchmarking library like criterion.
I needed to call Rust code from my Go code. Then I used C as my interface. I did this:
I've created a Rust library that takes a CStr as a parameter, and returns a new processed string back as CStr.
This code is statically compiled to a static C library my_lib.a.
Then, this library is statically linked with my Go code, which calls the library API using CGo (with Go's representation of a C string, much like Rust's CStr).
The final Go binary sits inside a Docker container in my Kubernetes cluster. Now, my problem is that in some cases where the library's API is called, my pod (container) crashes. Due to the nature of using CStr and his friends, I must use unsafe scopes in some places, and I highly suspect a segfault that is caused by one of the ptrs used in my code, but I have no way of communicating the error back to the Go code that could be then printed OR alternatively get some sort of a core dump from Rust/C so I can point out the problematic code. The pod just crashes with no info whatsoever, at least to my knowledge.
So my question is, how can I:
Recover from panic/crashes that happen inside an unsafe code? or maybe wrap it with a recoverable safe scope?
Override the SIG handlers so I can at least "catch" the errors and not crash? So I can debug it.
Perhaps communicate a signal interruption that was caused in my c-lib that was generated off Rust back to the caller?
I realize that once Rust is compiled to a c-library, it is a matter of C, but I have no idea how to tackle this one.
Thanks!
I've created a Rust library that takes a CStr as a parameter, and returns a new processed string back as CStr.
Neither operation seems OK:
the CStr documentation specifically notes that CStr is not repr(C)
CStr is a borrowed string, a "new processed string" would have to be owned (so a CString, which also isn't repr(C)).
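A minimal sketch of what a C-compatible boundary could look like instead: raw `*const c_char` / `*mut c_char` pointers cross the boundary, and `CStr`/`CString` are only used on the Rust side. The names `process_string` and `free_string` and the uppercase "processing" are assumptions for illustration, not taken from the question.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// In a real library each export also needs `#[no_mangle]`
// (spelled `#[unsafe(no_mangle)]` in the 2024 edition) to keep its symbol name.

/// Hypothetical export: takes a NUL-terminated C string and returns a newly
/// allocated one, or null on error. The caller must release the result
/// with `free_string`.
pub unsafe extern "C" fn process_string(input: *const c_char) -> *mut c_char {
    if input.is_null() {
        return std::ptr::null_mut();
    }
    // Borrow the caller's bytes; this does not take ownership of them.
    let borrowed = unsafe { CStr::from_ptr(input) };
    let text = match borrowed.to_str() {
        Ok(s) => s,
        Err(_) => return std::ptr::null_mut(), // input was not valid UTF-8
    };
    // Stand-in for the real processing.
    let processed = text.to_uppercase();
    match CString::new(processed) {
        Ok(owned) => owned.into_raw(), // ownership moves to the caller
        Err(_) => std::ptr::null_mut(), // result contained an interior NUL
    }
}

/// Hypothetical export: gives a string returned by `process_string`
/// back to Rust so it can be freed by the allocator that created it.
pub unsafe extern "C" fn free_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

The pairing matters: memory handed out with `CString::into_raw` has to come back through `CString::from_raw`, not through C's `free`.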
Due to the nature of using CStr and his friends, I must use unsafe scopes in some places, and I highly suspect a segfault that is caused by one of the ptrs used in my code, but I have no way of communicating the error back to the Go code that could be then printed OR alternatively get some sort of a core dump from Rust/C so I can point out the problematic code. [...] Recover from panic/crashes that happen inside an unsafe code? or maybe wrap it with a recoverable safe scope?
If you're segfaulting there's no panic or crash which Rust can catch or manipulate in any way. A segfault means the OS itself makes your program go away. However, you should still be able to get a core dump the usual way; if you can't, this might be a configuration issue with your container setup.
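Genuine Rust panics are a different matter from segfaults: they can be stopped at the FFI boundary with `std::panic::catch_unwind` and turned into an error code. A minimal sketch, assuming a hypothetical `checked_divide` export and an error-code convention made up for this example; it does nothing for a real SIGSEGV.

```rust
use std::panic;

/// Hypothetical export. Returns 0 on success and writes the quotient to `out`;
/// returns -1 if `out` is null and -2 if the Rust code panicked.
pub unsafe extern "C" fn checked_divide(a: i32, b: i32, out: *mut i32) -> i32 {
    if out.is_null() {
        return -1;
    }
    // Integer division by zero panics in Rust. Catch the panic here instead
    // of letting it unwind across the FFI boundary.
    // Note: this only works when the crate is built with panic = "unwind".
    match panic::catch_unwind(|| a / b) {
        Ok(value) => {
            unsafe { *out = value };
            0
        }
        Err(_) => -2,
    }
}
```

The default panic hook still prints the panic message to stderr, which may already be enough to show up in the pod's logs.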
Override the SIG handlers so I can at least "catch" the errors and not crash? So I can debug it.
You can certainly try to handle SIGSEGV, but after a SIGSEGV I'd expect the program state to be completely hosed; this is not an innocuous signal.
Is it possible to benchmark programs in Rust? If yes, how? For example, how would I get execution time of program in seconds?
It might be worth noting two years later (to help any future Rust programmers who stumble on this page) that there are now tools to benchmark Rust code as a part of one's test suite.
(From the guide link below) Using the #[bench] attribute, one can use the standard Rust tooling to benchmark methods in their code.
#![feature(test)] // nightly only
extern crate test;
use test::Bencher;
#[bench]
fn bench_xor_1000_ints(b: &mut Bencher) {
    b.iter(|| {
        // Use `test::black_box` to prevent the compiler from optimizing away
        // unused values
        test::black_box((0..1000).fold(0, |old, new| old ^ new));
    });
}
For the command cargo bench this outputs something like:
running 1 test
test bench_xor_1000_ints ... bench: 375 ns/iter (+/- 148)
test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
Links:
The Rust Book (section on benchmark tests)
"The Nightly Book" (section on the test crate)
test::Bencher docs
For measuring time without adding third-party dependencies, you can use std::time::Instant:
fn main() {
use std::time::Instant;
let now = Instant::now();
// Code block to measure.
{
my_function_to_measure();
}
let elapsed = now.elapsed();
println!("Elapsed: {:.2?}", elapsed);
}
There are several ways to benchmark your Rust program. For most real benchmarks, you should use a proper benchmarking framework as they help with a couple of things that are easy to screw up (including statistical analysis). Please also read the "Why writing benchmarks is hard" section at the very bottom!
Quick and easy: Instant and Duration from the standard library
To quickly check how long a piece of code runs, you can use the types in std::time. The module is fairly minimal, but it is fine for simple time measurements. You should use Instant instead of SystemTime as the former is a monotonically increasing clock and the latter is not. Example (Playground):
use std::time::Instant;
let before = Instant::now();
workload();
println!("Elapsed time: {:.2?}", before.elapsed());
The underlying platform-specific implementations of std's Instant are specified in the documentation. In short: currently (and probably forever) you can assume that it uses the best precision that the platform can provide (or something very close to it). From my measurements and experience, this is typically around 20 ns.
If std::time does not offer enough features for your case, you could take a look at chrono. However, for measuring durations, it's unlikely you need that external crate.
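If you time several pieces of code this way, the pattern can be wrapped in a small helper. This is only a sketch; `time_it` is a name I made up, not something from std.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed
/// wall-clock time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}
```

For example, `let (sum, took) = time_it(|| (1..=1_000u64).sum::<u64>());` followed by `println!("{sum} in {took:.2?}");`. Returning the result also means the computed value is actually used by the caller.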
Using a benchmarking framework
Using frameworks is often a good idea, because they try to prevent you from making common mistakes.
Rust's built-in benchmarking framework (nightly only)
Rust has a convenient built-in benchmarking feature, which is unfortunately still unstable as of 2019-07. You have to add the #[bench] attribute to your function and make it accept one &mut test::Bencher argument:
#![feature(test)]
extern crate test;
use test::Bencher;
#[bench]
fn bench_workload(b: &mut Bencher) {
b.iter(|| workload());
}
Executing cargo bench will print:
running 1 test
test bench_workload ... bench: 78,534 ns/iter (+/- 3,606)
test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 0 filtered out
Criterion
The crate criterion is a framework that runs on stable, but it is a bit more complicated than the built-in solution. It does more sophisticated statistical analysis, offers a richer API, produces more information and can even automatically generate plots.
See the "Quickstart" section for more information on how to use Criterion.
Why writing benchmarks is hard
There are many pitfalls when writing benchmarks. A single mistake can make your benchmark results meaningless. Here is a list of important but commonly forgotten points:
Compile with optimizations: rustc -C opt-level=3 or cargo build --release. When you are executing your benchmarks with cargo bench, Cargo will automatically enable optimizations. This step is important as there are often large performance differences between optimized and unoptimized Rust code.
Repeat the workload: only running your workload once is almost always useless. There are many things that can influence your timing: overall system load, the operating system doing stuff, CPU throttling, file system caches, and so on. So repeat your workload as often as possible. For example, Criterion runs every benchmark for at least 5 seconds (even if the workload only takes a few nanoseconds). All measured times can then be analyzed, with mean and standard deviation being the standard tools.
Make sure your benchmark isn't completely removed: benchmarks are very artificial by nature. Usually, the result of your workload is not inspected as you only want to measure the duration. However, this means that a good optimizer could remove your whole benchmark because it does not have side-effects (well, apart from the passage of time). So to trick the optimizer, you have to somehow use your result value so that your workload cannot be removed. An easy way is to print the result. A better solution is something like black_box. This function basically hides a value from LLVM in that LLVM cannot know what will happen with the value. Nothing happens, but LLVM doesn't know. That is the point.
Good benchmarking frameworks use a black box in several situations. For example, the closure given to the iter method (for both the built-in and the Criterion Bencher) can return a value. That value is automatically passed into a black_box.
Beware of constant values: similarly to the point above, if you specify constant values in a benchmark, the optimizer might generate code specifically for that value. In extreme cases, your whole workload could be constant-folded into a single constant, meaning that your benchmark is useless. Pass all constant values through black_box to avoid LLVM optimizing too aggressively.
Beware of measurement overhead: measuring a duration takes time itself. That is usually only tens of nanoseconds, but can influence your measured times. So for all workloads that are faster than a few tens of nanoseconds, you should not measure each execution time individually. You could execute your workload 100 times and measure how long all 100 executions took. Dividing that by 100 gives you the average single time. The benchmarking frameworks mentioned above also use this trick. Criterion also has a few methods for measuring very short workloads that have side effects (like mutating something).
Many other things: sadly, I cannot list all difficulties here. If you want to write serious benchmarks, please read more online resources.
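Several of the points above (repeating the workload, hiding inputs and outputs from the optimizer, and amortizing the measurement overhead) can be combined in a small hand-rolled sketch. `workload` and `average_time` are hypothetical names; a real framework does considerably more, such as warm-up and statistical analysis. `std::hint::black_box` needs a reasonably recent stable compiler.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload: XOR of all integers below `n`.
fn workload(n: u64) -> u64 {
    (0..n).fold(0, |acc, x| acc ^ x)
}

/// Times `iterations` runs as one batch and returns the average per run.
/// Panics if `iterations` is zero (division by zero).
fn average_time(iterations: u32, n: u64) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box on the input hides the constant from the optimizer;
        // black_box on the output keeps the result "used".
        black_box(workload(black_box(n)));
    }
    // One clock read per batch instead of one per run keeps the
    // measurement overhead small relative to the workload.
    start.elapsed() / iterations
}
```

Calling `average_time(100, 1_000)` measures 100 runs in one go; remember to build with optimizations before trusting the number.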
If you simply want to time a piece of code, you can use the time crate. It has since been deprecated, though; a follow-up crate is chrono.
Add time = "*" to your Cargo.toml.
Add
extern crate time;
use time::PreciseTime;
before your main function and
let start = PreciseTime::now();
// whatever you want to do
let end = PreciseTime::now();
println!("{} seconds for whatever you did.", start.to(end));
Complete example
Cargo.toml
[package]
name = "hello_world" # the name of the package
version = "0.0.1" # the current version, obeying semver
authors = [ "you@example.com" ]
[[bin]]
name = "rust"
path = "rust.rs"
[dependencies]
rand = "*" # Or a specific version
time = "*"
rust.rs
extern crate rand;
extern crate time;
use rand::Rng;
use time::PreciseTime;
fn main() {
// Creates an array of 10000000 random integers in the range 0 - 1000000000
//let mut array: [i32; 10000000] = [0; 10000000];
let n = 10000000;
let mut array = Vec::new();
// Fill the array
let mut rng = rand::thread_rng();
for _ in 0..n {
//array[i] = rng.gen::<i32>();
array.push(rng.gen::<i32>());
}
// Sort
let start = PreciseTime::now();
array.sort();
let end = PreciseTime::now();
println!("{} seconds for sorting {} integers.", start.to(end), n);
}
This answer is outdated! The time crate does not offer any advantages over std::time in regards to benchmarking. Please see the answers below for up to date information.
You might try timing individual components within the program using the time crate.
A quick way to find out the execution time of a program, regardless of implementation language, is to run time prog on the command line. For example:
~$ time sleep 4
real 0m4.002s
user 0m0.000s
sys 0m0.000s
The most interesting measurement is usually user, which measures the actual amount of work done by the program, regardless of what's going on in the system (sleep is a pretty boring program to benchmark). real measures the actual time that elapsed, and sys measures the amount of work done by the OS on behalf of the program.
Currently, there is no interface to any of the following Linux functions:
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts)
getrusage
times (manpage: man 2 times)
The available ways to measure the CPU time and hotspots of a Rust program on Linux are:
/usr/bin/time program
perf stat program
perf record --freq 100000 program; perf report
valgrind --tool=callgrind program; kcachegrind callgrind.out.*
The output of perf report and valgrind depends on the availability of debugging information in the program. It may not work.
I created a small crate for this (measure_time), which logs or prints the time until end of scope.
#[macro_use]
extern crate measure_time;
fn main() {
print_time!("measure function");
do_stuff();
}
Another way to measure execution time is to create a custom type, for example a struct, and implement the Drop trait for it.
For example:
struct Elapsed(&'static str, std::time::Instant);
impl Drop for Elapsed {
    fn drop(&mut self) {
        println!(
            "operation {} finished in {} ms",
            self.0,
            self.1.elapsed().as_millis()
        );
    }
}
impl Elapsed {
    pub fn start(op: &'static str) -> Elapsed {
        // Instant is monotonic, unlike SystemTime, so elapsed() cannot fail.
        let now = std::time::Instant::now();
        Elapsed(op, now)
    }
}
And using it in some function:
fn some_heavy_work() {
let _exec_time = Elapsed::start("some_heavy_work_fn");
// Here's some code.
}
When the function ends, the drop method for _exec_time will be called and the message will be printed.
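One pitfall with this pattern: the guard has to be bound to a named variable such as `_exec_time`. Binding it to a bare `_` drops it immediately, so the timer would stop before the work even starts. A small sketch that makes the drop order visible, using a hypothetical `Guard` type that records events in a log instead of printing:

```rust
use std::cell::RefCell;

/// Hypothetical guard that appends a message to a shared log when dropped.
struct Guard<'a>(&'a RefCell<Vec<&'static str>>, &'static str);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.borrow_mut().push(self.1);
    }
}

/// Returns the order in which the events happened.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _ = Guard(&log, "unnamed guard dropped"); // dropped right here
        let _named = Guard(&log, "named guard dropped"); // lives until the end of the scope
        log.borrow_mut().push("work done");
    }
    log.into_inner()
}
```

Here the unnamed guard is logged before "work done" and the named one after it, which is why the `_exec_time` form above is the one to use.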
I'm trying to get some performance metrics using the flame crate with code I've written using Rayon:
extern crate flame;
flame::start("TAG-A");
//Assume vec is a Vec<i32>
vec.par_iter_mut().filter(|a| **a == 1).for_each(|b| func(b));
//func(b) operates on each i32 and sends some results to a channel
flame::end("TAG-A");
//More code but unrelated
flame::dump_stdout();
This works fine, but only gives information for the entire parallel iterator. I would like to get some more fine grained details on the function func.
I've tried adding a start/end within the function, but the runtime information is only available when I call flame::commit_thread(), and then it seems to only print this to stdout. Ideally I'd like to print out the time spent within a given tag when I call dump at the end of my code.
Is there a way to dump tags from all threads? The documentation for flame isn't great.
I'm using the lazy-init and sysinfo crates together. Getting information about a process is quite expensive so I thought I would hide it behind a Lazy<T>, in fact a Lazy<Process>. So I have a little struct - just focusing on the pertinent bits:
pub struct ProgramInfo {
process: Lazy<Process>
}
and a function to get the Process:
impl ProgramInfo {
pub fn process(&self) -> &Process {
self.process.get_or_create(|| {
let system = System::new();
let pid = sysinfo::get_current_pid();
let ref_to_process = system.get_process(pid).unwrap();
ref_to_process.clone()
})
}
}
I added the clone() to get it to compile, but it bothers me because it appears unnecessary. A second copy of the Process struct is being made simply so that it can be moved into ProgramInfo.process. Is there a way of just moving the Process referenced by ref_to_process instead? I tried changing the last line to just
*ref_to_process
but that won't compile, giving error "cannot move out of borrowed content".
From a cursory reading of the sysinfo crate, the answer is no.
There does not appear to be a method which returns anything but references to Process; and therefore System never relinquishes ownership and it would be unsafe to attempt to steal it...
A solution, which seems more palatable to me, would be to change ProgramInfo to:
hold onto a system: Lazy<System>,
query system each time for the current PID.
How efficient that is would depend on whether system re-reads the process info each time, or not.
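A rough sketch of that layout, using `std::sync::OnceLock` in place of lazy-init and stand-in `System` and `Process` types, since I have not checked the exact sysinfo API. Every type and method below is hypothetical and only mimics the shape of the real crate.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Stand-ins for the sysinfo types; the real API may differ.
#[derive(Debug)]
struct Process {
    name: String,
}

struct System {
    processes: HashMap<u32, Process>,
}

impl System {
    fn new() -> Self {
        // Pretend this is the expensive scan of running processes.
        let mut processes = HashMap::new();
        processes.insert(std::process::id(), Process { name: "current".to_string() });
        System { processes }
    }

    fn process(&self, pid: u32) -> Option<&Process> {
        self.processes.get(&pid)
    }
}

struct ProgramInfo {
    system: OnceLock<System>,
}

impl ProgramInfo {
    fn new() -> Self {
        ProgramInfo { system: OnceLock::new() }
    }

    /// Borrows the process from the lazily created `System`; no clone needed,
    /// because the `System` that owns the `Process` lives inside `self`.
    fn process(&self) -> Option<&Process> {
        self.system
            .get_or_init(System::new)
            .process(std::process::id())
    }
}
```

The returned reference is tied to the lifetime of `ProgramInfo`, which is what makes it possible to avoid both the clone and any unsafe code.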
That being said, from a purely theoretical standpoint, you could indeed steal it anyway:
you can use ptr::read to create a copy of the instance,
then call mem::forget on system so the System instance is leaked and thus never destroyed.
I doubt this is what you want, and I would certainly never recommend it.