Missed optimization based on using a reference to a counter variable - rust

In the following example I read a 2.7GiB file from my desktop, count the bytes, and do something with them that prevents the loop from being optimized away.
use std::fs::File;
use std::io::Read;
use std::io::{BufReader, ErrorKind};
fn main() {
let file = File::open("/home/seb/Desktop/ubuntu-20.04.1-desktop-amd64.iso")
.expect("Cannot read file.");
let buf = BufReader::new(file);
let mut x = 0u8;
let mut num_bytes = 0usize;
read(buf, &mut num_bytes, &mut x);
print(num_bytes, x);
}
fn read(mut buf: BufReader<File>, num_bytes: &mut usize, x: &mut u8) {
let mut bytes = [0; 512];
loop {
match buf.read(&mut bytes) {
Ok(0) => break,
Ok(n) => {
for i in 0..n {
*num_bytes += 1;
*x += bytes[i];
}
}
Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
Err(e) => panic!("{:?}", e),
};
}
}
fn print(num_bytes: usize, x: u8) {
println!("{}", num_bytes);
println!("{}", x);
}
I get roughly 20% differences in throughput depending on whether num_bytes is passed to print by reference or by value. Unscientific testing with time shows consistent differences, always in the same ballpark.
fn print(num_bytes: usize, x: u8) {
println!("{}", num_bytes);
println!("{}", x);
}
# passing num_bytes by value
cargo build --release && time target/release/test_read
Compiling test_read v0.1.0 (/data/rust_projects/test_read)
Finished release [optimized] target(s) in 0.23s
2785017856
101
target/release/test_read 0,67s user 0,29s system 99% cpu 0,966 total
As soon as a reference to num_bytes is used later, the program is about 20% slower.
fn print(num_bytes: &usize, x: u8) {
println!("{}", num_bytes);
println!("{}", x);
}
# passing num_bytes by reference
cargo build --release && time target/release/test_read
Compiling test_read v0.1.0 (/data/rust_projects/test_read)
Finished release [optimized] target(s) in 0.22s
2785017856
101
target/release/test_read 1,00s user 0,26s system 99% cpu 1,258 total
I'm running this on Ubuntu 20.04 and with Rust 1.50.
The example is mostly taken from this question: What is a faster way to iterate through the bytes of a file in Rust?
Is there any explanation for why using a reference to a counter variable causes this?

After narrowing down the example, it turns out that using the reference to the counter variable stops the compiler from optimizing away the additions on every loop iteration. The by_ref and by_value benches here have a ten-fold difference in run time, while by_ref_use_length matches by_value:
// benches/bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
pub fn by_ref(buf: &[u8]) {
let mut x = 0u8;
let mut num_bytes = 0usize;
for &byte in buf {
num_bytes += 1;
x += byte;
}
black_box((&num_bytes, x));
}
pub fn by_value(buf: &[u8]) {
let mut x = 0u8;
let mut num_bytes = 0usize;
for &byte in buf {
num_bytes += 1;
x += byte;
}
black_box((num_bytes, x));
}
pub fn by_ref_use_length(buf: &[u8]) {
let mut x = 0u8;
let num_bytes = buf.len();
for &byte in buf {
x += byte;
}
black_box((&num_bytes, x));
}
fn criterion_benchmark(c: &mut Criterion) {
let buf = vec![0; 1024 * 1024 * 64];
c.bench_function("len num_bytes by ref", |b| b.iter(|| by_ref_use_length(&buf)));
c.bench_function("num_bytes by ref", |b| b.iter(|| by_ref(&buf)));
c.bench_function("num_bytes by value", |b| b.iter(|| by_value(&buf)));
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
# Cargo.toml
[package]
name = "test_read"
version = "0.1.0"
authors = [""]
edition = "2018"
[dev-dependencies]
criterion = "0.3"
[[bench]]
name = "bench"
harness = false
cargo bench --all
num_bytes by ref time: [723.42 us 726.62 us 729.54 us]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) low mild
len num_bytes by ref time: [696.26 us 699.16 us 701.87 us]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) low mild
num_bytes by value time: [721.68 us 725.48 us 729.55 us]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) low mild
https://rust.godbolt.org/z/zqfn7n shows no instructions related to the counter in the loop when the counter variable is not passed by reference, whereas passing by reference prevents such optimizations. I'm not familiar with reading assembly, but I'm assuming that the length of the input slice is used rather than actually incrementing.
In the original example, the differences were only about 20%, probably because the dependency on IO restricts the optimization to adding n on every read iteration rather than simply returning the length of the input array. In fact, replacing the per-byte counter increment with a single addition outside the inner loop makes both implementations perform the same:
let mut bytes = [0; 512];
loop {
match buf.read(&mut bytes) {
Ok(0) => break,
Ok(n) => {
// pulling the num_bytes addition out of the inner loop
// seems to imitate the optimization done by the compiler
// when num_bytes is not passed anywhere by reference.
*num_bytes += n;
for i in 0..n {
*x += bytes[i];
}
},
Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
Err(e) => panic!("{:?}", e),
};
}
This should answer what happens, but I have no clue why the optimization depends on a reference being used somewhere downstream.
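The hoisted counter can also be sketched as a self-contained function. This is a minimal sketch, not the code from the question: the name count_and_sum and the generic reader parameter are assumptions made for illustration. It keeps both accumulators in locals and returns them by value, so no reference to the counter is handed out, and it uses wrapping_add so the byte sum cannot panic on overflow in debug builds.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper: counts the bytes in `reader` and sums them with
/// wrapping arithmetic.
fn count_and_sum<R: Read>(mut reader: R) -> io::Result<(usize, u8)> {
    let mut bytes = [0u8; 512];
    let mut num_bytes = 0usize;
    let mut x = 0u8;
    loop {
        match reader.read(&mut bytes) {
            Ok(0) => return Ok((num_bytes, x)),
            Ok(n) => {
                // One addition per chunk instead of one per byte.
                num_bytes += n;
                for &byte in &bytes[..n] {
                    x = x.wrapping_add(byte);
                }
            }
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Any Read implementation works as input, for example a BufReader<File> as in the question or a plain &[u8] slice.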

Related

Accessing Vector elements gives me an error Rust

I'm trying to write a program that finds the median of any given list.
In the final stretch, I ran into an error.
I tried to access elements of a Vec through a variable.
Take a look at the calc_med() function.
use std::io;
use std::sync::Mutex;
#[macro_use]
extern crate lazy_static;
lazy_static! {
static ref num_list: Mutex<Vec<f64>> = Mutex::new(Vec::new());
}
fn main() {
loop {
println!("Enter: ");
let mut inp: String = String::new();
io::stdin().read_line(&mut inp).expect("Failure");
let upd_inp: f64 = match inp.trim().parse() {
Ok(num) => num,
Err(_) => {
if inp.trim() == String::from("q") {
break;
} else if inp.trim() == String::from("d") {
break {
println!("Done!");
calc_med();
};
} else {
continue;
}
}
};
num_list.lock().unwrap().push(upd_inp);
num_list
.lock()
.unwrap()
.sort_by(|a, b| a.partial_cmp(b).unwrap());
println!("{:?}", num_list.lock().unwrap());
}
}
fn calc_med() {
// FOR THE ATTENTION OF STACKOVERFLOW
let n: f64 = ((num_list.lock().unwrap().len()) as f64 + 1.0) / 2.0;
if n.fract() == 0.0 {
let n2: usize = n as usize;
} else {
let n3: u64 = n.round() as u64;
let n4: usize = n3 as usize;
let median: f64 = (num_list[n4] + num_list[n4 - 1]) / 2;
println!("{}", median);
}
}
The error is as follows:
Compiling FindTheMedian v0.1.0 (/home/isaak/Documents/Code/Rusty/FindTheMedian)
error[E0608]: cannot index into a value of type `num_list`
--> src/main.rs:50:28
|
50 | let median: f64 = (num_list[n4] + num_list[n4 - 1]) / 2;
| ^^^^^^^^^^^^
error[E0608]: cannot index into a value of type `num_list`
--> src/main.rs:50:43
|
50 | let median: f64 = (num_list[n4] + num_list[n4 - 1]) / 2;
| ^^^^^^^^^^^^^^^^
The current code is trying to index a value of type Mutex<Vec<f64>>, which is not valid. To access the data inside a mutex, call .lock() on it. That returns a LockResult<MutexGuard<Vec<f64>>>, which you can treat much like a Result<Vec<f64>, Error>: unwrapping it yields a guard that dereferences to the Vec.
So, fixing only the line would look like this:
let num_list_vec = num_list.lock().unwrap();
let median: f64 = (num_list_vec[n4] + num_list_vec[n4 - 1]) / 2.0;
Locking again at that point is fine, because the guard created by the earlier num_list.lock().unwrap().len() call is a temporary that is dropped at the end of that statement. Still, the cleaner way is to do the locking and unwrapping once at the start of the function and use the underlying value everywhere:
fn calc_med() {
let num_list_vec = num_list.lock().unwrap();
let n: f64 = ((num_list_vec.len()) as f64 + 1.0) / 2.0;
if n.fract() == 0.0 {
let n2: usize = n as usize;
} else {
let n3: u64 = n.round() as u64;
let n4: usize = n3 as usize;
let median: f64 = (num_list_vec[n4] + num_list_vec[n4 - 1]) / 2.0;
println!("{}", median);
}
}
Edit: Checking your main, I see you are also calling lock().unwrap() repeatedly in sequence, which is not the way Mutex should be used. Mutex is mainly used in multi-threaded programs, so that different threads cannot access the same variable at the same time. It also incurs a performance hit, so most of the time you shouldn't use it in single-threaded scenarios.
Unless there's a bigger picture we're missing, you should just define your Vec in main and pass it to calc_med as an argument. If the reason you did what you did was to get it as a global, there are other ways to do that in Rust without performance hits, but because of Rust's safety-oriented design these ways are not encouraged and should only be used if you know 100% what you want.
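Passing the numbers as an argument could look like the sketch below. The function name median, the Option return type, and the use of the standard median definition (middle element for odd lengths, mean of the two middle elements for even lengths) are assumptions for illustration, not the asker's code.

```rust
/// Hypothetical replacement for `calc_med`: takes the numbers as a slice
/// instead of reading a global. Assumes `sorted` is already in ascending order.
fn median(sorted: &[f64]) -> Option<f64> {
    let len = sorted.len();
    if len == 0 {
        return None; // no median for an empty list
    }
    let mid = len / 2;
    if len % 2 == 1 {
        // Odd length: the single middle element.
        Some(sorted[mid])
    } else {
        // Even length: the mean of the two middle elements.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

In main, the Vec would then be a plain local that is sorted after each push and handed over as median(&nums).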
Your error is that num_list is not a vector; it's a mutex with a vector inside of it. To access the value inside of a mutex, you must lock it, and then unwrap the result. You do this correctly in main.
To avoid continually unlocking and locking, it is generally best practice to lock the mutex once, at the start of the function. Rust will automatically release the lock when the guard goes out of scope. See the updated example:
fn calc_med() { // FOR THE ATTENTION OF STACKOVERFLOW
let nums = num_list.lock().unwrap();
let n: f64 = (nums.len() as f64 + 1.0) / 2.0;
if n.fract() == 0.0 {
let n2: usize = n as usize;
} else {
let n3: u64 = n.round() as u64;
let n4: usize = n3 as usize;
let median: f64 = (nums[n4] + nums[n4 - 1]) / 2.0;
println!("{}", median);
}
}
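The lock-once pattern and the automatic release can be checked in isolation. This is a minimal sketch with a hypothetical helper name (first_and_last) and a Mutex passed by reference rather than the lazy_static global from the question.

```rust
use std::sync::Mutex;

/// Hypothetical helper: locks once, indexes through the guard, and lets the
/// guard drop at the end of the function, which releases the lock.
fn first_and_last(shared: &Mutex<Vec<f64>>) -> Option<(f64, f64)> {
    // `nums` is a MutexGuard<Vec<f64>>; it dereferences to the Vec,
    // so indexing works on it even though it does not work on the Mutex.
    let nums = shared.lock().unwrap();
    if nums.is_empty() {
        return None;
    }
    Some((nums[0], nums[nums.len() - 1]))
}
```

After the function returns, the mutex can be locked again, because the guard was dropped when it went out of scope.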

How to use common BTreeMap variable in rust(single thread)

Here is my simplified original code. I want to use a global variable instead of separate variables in each function. What is the suggested method in Rust?
BTW, I've tried both a global and a function parameter; both are a nightmare for a beginner. The lifetime and type issues are too difficult to solve.
This simple program is a single-threaded tool, so in C it would not need an extra mutex.
// version 1
use std::collections::BTreeMap;
// Trying but failed
// let mut guess_number = BTreeMap::new();
// | ^^^ expected item
fn read_csv() {
let mut guess_number = BTreeMap::new();
let lines = ["Tom,4", "John,6"];
for line in lines.iter() {
let split = line.split(",");
let vec: Vec<_> = split.collect();
println!("{} {:?}", line, vec);
let number: u16 = vec[1].trim().parse().unwrap();
guess_number.insert(vec[0], number);
}
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
fn main() {
let mut guess_number = BTreeMap::new();
guess_number.insert("Tom", 3);
guess_number.insert("John", 7);
if guess_number.contains_key("John") {
println!("John's number={:?}", guess_number.get("John").unwrap());
}
read_csv();
}
To show how hard this is for a beginner, here is the attempt that passes the map as a parameter:
// version 2
use std::collections::BTreeMap;
fn read_csv(guess_number: BTreeMap) {
// ^^^^^^^^ expected 2 generic arguments
let lines = ["Tom,4", "John,6"];
for line in lines.iter() {
let split = line.split(",");
let vec: Vec<_> = split.collect();
println!("{} {:?}", line, vec);
let number: u16 = vec[1].trim().parse().unwrap();
guess_number.insert(vec[0], number);
}
}
fn main() {
let mut guess_number = BTreeMap::new();
guess_number.insert("Tom", 3);
guess_number.insert("John", 7);
if guess_number.contains_key("John") {
println!("John's number={:?}", guess_number.get("John").unwrap());
}
read_csv(guess_number);
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
After some effort and trial and error, I arrived at a type that might work, BTreeMap<&str, i32>:
// version 3
use std::collections::BTreeMap;
fn read_csv(guess_number: &BTreeMap<&str, i32>) {
// let mut guess_number = BTreeMap::new();
let lines = ["Tom,4", "John,6"];
for line in lines.iter() {
let split = line.split(",");
let vec: Vec<_> = split.collect();
println!("{} {:?}", line, vec);
let number: i32 = vec[1].trim().parse().unwrap();
guess_number.insert(vec[0], number);
}
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
fn main() {
let mut guess_number: BTreeMap<&str, i32> = BTreeMap::new();
guess_number.insert("Tom", 3);
guess_number.insert("John", 7);
if guess_number.contains_key("John") {
println!("John's number={:?}", guess_number.get("John").unwrap());
}
read_csv(&guess_number);
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
This causes the following error:
7 | fn read_csv(guess_number: &BTreeMap<&str, i32>) {
| -------------------- help: consider changing this to be a mutable reference: `&mut BTreeMap<&str, i32>`
...
16 | guess_number.insert(vec[0], number);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `guess_number` is a `&` reference, so the data it refers to cannot be borrowed as mutable
The final answer (globals seem to be discouraged in Rust, so use a mutable reference):
// version 4
use std::collections::BTreeMap;
fn read_csv(guess_number: &mut BTreeMap<&str, i32>) {
let lines = ["Tom,4", "John,6"];
for line in lines.iter() {
let split = line.split(",");
let vec: Vec<_> = split.collect();
println!("{} {:?}", line, vec);
let number: i32 = vec[1].trim().parse().unwrap();
guess_number.insert(vec[0], number);
}
}
fn main() {
let mut guess_number: BTreeMap<&str, i32> = BTreeMap::new();
guess_number.insert("Tom", 3);
guess_number.insert("John", 7);
if guess_number.contains_key("John") {
println!("John's number={:?}", guess_number.get("John").unwrap());
}
read_csv(&mut guess_number);
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
This question is not specific to BTreeMaps; it applies to pretty much all data types, such as numbers, strings, vectors, enums, etc.
If you want to pass a variable (value) from one function to another, you can do that in various ways in Rust. Typically you either move the value or you pass a reference to it. Moving is something quite specific to Rust and its ownership model. This is really essential, so if you have serious intentions to learn Rust, I strongly suggest you read the chapter Understanding Ownership from "the book". Don't get discouraged if you don't understand it from one reading. Spend as much time as needed, as you really can't move forward w/o this knowledge.
As for global variables, there are very few situations where they should be used. In Rust using global variables is slightly more difficult, compared to most other languages. This thread is quite useful, although you might find it a bit difficult to comprehend. My advice to a beginner would be to first fully understand the basic concept of moving and passing references.
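As a small illustration of passing a reference instead of moving, here is a sketch of read_csv that borrows the map mutably. Two things are assumptions made here and not part of the question: the keys are owned Strings (which sidesteps lifetime annotations on &str keys), and the lines are passed in as a parameter so the function can be exercised with any input.

```rust
use std::collections::BTreeMap;

/// Hypothetical variant of `read_csv`: borrows the map mutably, so the caller
/// keeps ownership and can still use the map afterwards.
/// Malformed lines are skipped instead of panicking.
fn read_csv(guess_number: &mut BTreeMap<String, i32>, lines: &[&str]) {
    for line in lines {
        let mut parts = line.split(',');
        if let (Some(name), Some(value)) = (parts.next(), parts.next()) {
            if let Ok(number) = value.trim().parse::<i32>() {
                // insert overwrites the value if the key already exists
                guess_number.insert(name.trim().to_string(), number);
            }
        }
    }
}
```

The caller writes read_csv(&mut guess_number, &lines) and can iterate over guess_number afterwards. Passing guess_number without the &mut would move the map into the function, and the later loop in main would no longer compile.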

Is creating a large Vec full of sequential u64 faster via loop and push() or via collect()?

I am looking for the most efficient way of doing this as I have to create a vector of about 600,000 u64 integers.
Here is my first attempt:
fn latest_ids(current_id: u64, latest_id: u64) -> Vec<u64> {
let mut ids: Vec<u64> = vec![];
let mut start = current_id;
while !(start >= latest_id) {
start += 1;
ids.push(start);
}
ids
}
Second attempt:
fn latest_ids(current_id: u64, latest_id: u64) -> Vec<u64> {
let ids: Vec<u64> = (current_id+1..latest_id).collect();
ids
}
The second version is much shorter/cleaner, but I am not sure how efficient collect() is going to be? Or perhaps there is a better way?
If you're ever in doubt about performance in Rust, don't forget about benchmarks.
#![feature(test)]
extern crate test;
#[cfg(test)]
mod tests {
use test::Bencher;
const CURRENT_ID: u64 = 1;
const LATEST_ID: u64 = 60000;
#[bench]
fn push(b: &mut Bencher) {
b.iter(|| {
let mut ids: Vec<u64> = vec![];
let mut start = CURRENT_ID;
while !(start >= LATEST_ID) {
start += 1;
ids.push(start);
}
});
}
#[bench]
fn collect(b: &mut Bencher) {
b.iter(|| {
let _ids: Vec<u64> = (CURRENT_ID + 1..LATEST_ID).collect();
});
}
}
Running cargo bench,
running 2 tests
test tests::collect ... bench: 29,931 ns/iter (+/- 6,842)
test tests::push ... bench: 85,701 ns/iter (+/- 18,096)
You can see that collect is actually faster than push (by a lot). I'm guessing this is because push sometimes has to reallocate the Vec's buffer and move its contents to a different location in memory, whereas collect on a range knows the final length up front and can allocate once (don't quote me on that though).
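If the loop form is preferred, reserving the capacity up front avoids the repeated reallocation. The sketch below uses hypothetical function names (latest_ids_push, latest_ids_collect) and was not benchmarked here. Note that the loop in the question includes latest_id, while the range current_id+1..latest_id excludes it, so the range version below uses ..= to produce the same ids as the loop.

```rust
/// Hypothetical variant of the loop version that reserves space up front,
/// so `push` never has to grow the buffer.
fn latest_ids_push(current_id: u64, latest_id: u64) -> Vec<u64> {
    let mut ids = Vec::with_capacity(latest_id.saturating_sub(current_id) as usize);
    let mut start = current_id;
    while start < latest_id {
        start += 1;
        ids.push(start);
    }
    ids
}

/// Range version; `..=` includes `latest_id`, like the loop does.
fn latest_ids_collect(current_id: u64, latest_id: u64) -> Vec<u64> {
    (current_id + 1..=latest_id).collect()
}
```

Both functions return the ids from current_id + 1 up to and including latest_id, and an empty Vec when latest_id is not greater than current_id.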

Why does a generic function replicating C's fread for unsigned integers always return zero?

I am trying to read in binary 16-bit machine instructions from a 16-bit architecture (the exact nature of that is irrelevant here), and print them back out as hexadecimal values. In C, I found this simple by using the fread function to read 16 bits into a uint16_t.
I figured that I would try to replicate fread in Rust. It seems to be reasonably trivial if I can know ahead-of-time the exact size of the variable that is being read into, and I had that working specifically for 16 bits.
I decided that I wanted to try to make the fread function generic over the various built-in unsigned integer types. For that I came up with the below function, using some traits from the Num crate:
fn fread<T>(
buffer: &mut T,
element_count: usize,
stream: &mut BufReader<File>,
) -> Result<usize, std::io::Error>
where
T: num::PrimInt + num::Unsigned,
{
let type_size = std::mem::size_of::<T>();
let mut buf = Vec::with_capacity(element_count * type_size);
let buf_slice = buf.as_mut_slice();
let bytes_read = match stream.read_exact(buf_slice) {
Ok(()) => element_count * type_size,
Err(ref e) if e.kind() == std::io::ErrorKind::UnexpectedEof => 0,
Err(e) => panic!("{}", e),
};
*buffer = buf_slice
.iter()
.enumerate()
.map(|(i, &b)| {
let mut holder2: T = num::zero();
holder2 = holder2 | T::from(b).expect("Casting from u8 to T failed");
holder2 << ((type_size - i) * 8)
})
.fold(num::zero(), |acc, h| acc | h);
Ok(bytes_read)
}
The issue is that when I call it in the main function, I seem to always get 0x00 back out, but the number of bytes read that is returned by the function is always 2, so that the program enters an infinite loop:
extern crate num;
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::Read;
fn main() -> Result<(), std::io::Error> {
let cmd_line_args = std::env::args().collect::<Vec<_>>();
let f = File::open(&cmd_line_args[1])?;
let mut reader = BufReader::new(f);
let mut instructions: Vec<u16> = Vec::new();
let mut next_instruction: u16 = 0;
fread(&mut next_instruction, 1, &mut reader)?;
let base_address = next_instruction;
while fread(&mut next_instruction, 1, &mut reader)? > 0 {
instructions.push(next_instruction);
}
println!("{:#04x}", base_address);
for i in instructions {
println!("0x{:04x}", i);
}
Ok(())
}
It appears to me that I'm somehow never reading anything from the file, so the function always just returns the number of bytes it was supposed to read. I'm clearly not using something correctly here, but I'm honestly unsure what I'm doing wrong.
This is compiled on Rust 1.26 stable for Windows if that matters.
What am I doing wrong, and what should I do differently to replicate fread? I realise that this is probably a case of the XY problem (in that there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer), but I'm really curious as to what I'm doing wrong here.
Your problem is that this line:
let mut buf = Vec::with_capacity(element_count * type_size);
creates a zero-length vector, even though it allocates memory for element_count * type_size bytes. Therefore you are asking stream.read_exact to read zero bytes. One way to fix this is to replace the above line with:
let mut buf = vec![0; element_count * type_size];
Side note: when the read succeeds, bytes_read receives the number of bytes you expected to read, not the number of bytes you actually read. You should probably use std::mem::size_of_val(buf_slice) to get the true byte count.
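The difference between capacity and length can be seen in a small sketch. The function name read_two_ways is hypothetical, and a byte slice stands in for the file, since &[u8] implements Read.

```rust
use std::io::Read;

/// Hypothetical demo: returns the length of the `with_capacity` Vec after
/// `read_exact`, plus the contents of a zero-filled Vec after `read_exact`.
fn read_two_ways(input: &[u8], n: usize) -> std::io::Result<(usize, Vec<u8>)> {
    // Capacity is reserved, but the length is still 0, so the slice is empty
    // and read_exact has nothing to fill: it succeeds without reading.
    let mut empty: Vec<u8> = Vec::with_capacity(n);
    let mut reader = input;
    reader.read_exact(empty.as_mut_slice())?;
    let len_after = empty.len();

    // vec![0; n] has length n, so read_exact fills all n bytes.
    let mut filled = vec![0u8; n];
    let mut reader = input;
    reader.read_exact(&mut filled)?;
    Ok((len_after, filled))
}
```

The first read reports success even though nothing was read, which matches the behaviour described in the question.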
"in that there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer"
Yes, use the byteorder crate. This requires no unneeded heap allocation (the Vec in the original code):
extern crate byteorder;
use byteorder::{LittleEndian, ReadBytesExt};
use std::{
fs::File, io::{self, BufReader, Read},
};
fn read_instructions_to_end<R>(mut rdr: R) -> io::Result<Vec<u16>>
where
R: Read,
{
let mut instructions = Vec::new();
loop {
match rdr.read_u16::<LittleEndian>() {
Ok(instruction) => instructions.push(instruction),
Err(e) => {
return if e.kind() == std::io::ErrorKind::UnexpectedEof {
Ok(instructions)
} else {
Err(e)
}
}
}
}
}
fn main() -> Result<(), std::io::Error> {
let name = std::env::args().skip(1).next().expect("no file name");
let f = File::open(name)?;
let mut f = BufReader::new(f);
let base_address = f.read_u16::<LittleEndian>()?;
let instructions = read_instructions_to_end(f)?;
println!("{:#04x}", base_address);
for i in &instructions {
println!("0x{:04x}", i);
}
Ok(())
}
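If adding a dependency is not an option, a similar loop can be sketched with the standard library alone. This is an assumption-laden variant rather than part of the answer above: the name read_u16s_le is hypothetical, it assumes little-endian input as in the byteorder example, and u16::from_le_bytes needs a newer compiler than the Rust 1.26 mentioned in the question.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical std-only counterpart of `read_instructions_to_end`.
fn read_u16s_le<R: Read>(mut rdr: R) -> io::Result<Vec<u16>> {
    let mut out = Vec::new();
    // Fixed-size stack buffer: no heap allocation per value.
    let mut pair = [0u8; 2];
    loop {
        match rdr.read_exact(&mut pair) {
            Ok(()) => out.push(u16::from_le_bytes(pair)),
            // End of input (including a trailing odd byte) ends the loop.
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => return Ok(out),
            Err(e) => return Err(e),
        }
    }
}
```

For big-endian input, u16::from_be_bytes would take the place of u16::from_le_bytes.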

Can I reset a borrow of a local in a loop?

I have a processing loop that needs a pointer to a large lookup table.
The pointer is unfortunately triply indirected from the source data, so keeping that pointer around for the inner loop is essential for performance.
Is there any way I can tell the borrow checker that I'm "unborrowing" the state variable in the unlikely event I need to modify the state... so I can only re-lookup the slice in the event that the modify_state function triggers?
One solution I thought of was to change data to be a slice reference and do a mem::replace on the struct at the beginning of the function and pull the slice into local scope, then replace it back at the end of the function — but that is very brittle and error prone (as I need to remember to replace the item on every return). Is there another way to accomplish this?
struct DoubleIndirect {
data: [u8; 512 * 512],
lut: [usize; 16384],
lut_index: usize,
}
#[cold]
fn modify_state(s: &mut DoubleIndirect) {
s.lut_index += 63;
s.lut_index %= 16384;
}
fn process(state: &mut DoubleIndirect) -> [u8; 65536] {
let mut ret: [u8; 65536] = [0; 65536];
let mut count = 0;
let mut data_slice = &state.data[state.lut[state.lut_index]..];
for ret_item in ret.iter_mut() {
*ret_item = data_slice[count];
if count % 197 == 196 {
data_slice = &[];
modify_state(state);
data_slice = &state.data[state.lut[state.lut_index]..];
}
count += 1
}
return ret;
}
The simplest way to do this is to ensure the borrows of state are all disjoint:
#[cold]
fn modify_state(lut_index: &mut usize) {
*lut_index += 63;
*lut_index %= 16384;
}
fn process(state: &mut DoubleIndirect) -> [u8; 65536] {
let mut ret: [u8; 65536] = [0; 65536];
let mut count = 0;
let lut_index = &mut state.lut_index;
let mut data_slice = &state.data[state.lut[*lut_index]..];
for ret_item in ret.iter_mut() {
*ret_item = data_slice[count];
if count % 197 == 196 {
modify_state(lut_index);
data_slice = &state.data[state.lut[*lut_index]..];
}
count += 1
}
return ret;
}
The problem is basically two things: first, Rust will not look beyond a function's signature to find out what it does. As far as the compiler knows, your call to modify_state could be changing state.data as well, and it can't allow that.
The second problem is that borrows are lexical; the compiler looks at the block of code where the borrow might be used and goes with that. It doesn't (currently) bother to try and reduce the length of borrows to match where they're actually active.
You can also play games with, for example, using std::mem::replace to pull state.data out into a local variable, do your work, then replace it back just before you return.
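Another way to get disjoint borrows is to destructure the struct once at the top of the function. The sketch below is a scaled-down stand-in, not the code from the question: the Table type, the Vec fields, and the out_len and period parameters are assumptions made so the example stays small.

```rust
/// Scaled-down, hypothetical stand-in for `DoubleIndirect`.
struct Table {
    data: Vec<u8>,
    lut: Vec<usize>,
    lut_index: usize,
}

/// Only needs the index, so it only borrows the index.
fn modify_state(lut_index: &mut usize, lut_len: usize) {
    *lut_index = (*lut_index + 1) % lut_len;
}

/// Copies `out_len` bytes, re-deriving the slice every `period` items.
/// Assumes `lut` is non-empty, `period > 0`, and every `lut` entry leaves
/// at least `period` bytes in `data`.
fn process(state: &mut Table, out_len: usize, period: usize) -> Vec<u8> {
    // Destructuring splits the one `&mut Table` into three disjoint borrows.
    let Table { data, lut, lut_index } = state;
    let mut data_slice = &data[lut[*lut_index]..];
    let mut out = Vec::with_capacity(out_len);
    for count in 0..out_len {
        out.push(data_slice[count % period]);
        if count % period == period - 1 {
            // Mutating the index does not touch `data`, so `data_slice`
            // may stay borrowed across this call.
            modify_state(lut_index, lut.len());
            data_slice = &data[lut[*lut_index]..];
        }
    }
    out
}
```

Because modify_state receives only the index, the compiler can see from the signature alone that data is not modified.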
