Why does HashMap::iter.nth(0) give different output for each execution run? [duplicate]

This question already has answers here:
Get first element from HashMap
(2 answers)
Closed 4 years ago.
Given the below program:
use std::collections::HashMap;
fn main() {
let mut hm = HashMap::new();
hm.insert(0, 1);
hm.insert(1, 1);
let mut iter = hm.iter();
println!("{:?}", iter.nth(0).expect("Fatal.").0)
}
I get a different output for each execution run of the code:
procyclinsur@procyclinsur:~/Documents/Rust/t1$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/t1`
1
procyclinsur@procyclinsur:~/Documents/Rust/t1$ vim src/main.rs
procyclinsur@procyclinsur:~/Documents/Rust/t1$ cargo run
Compiling t1 v0.1.0 (file:///home/procyclinsur/Documents/Rust/t1)
Finished dev [unoptimized + debuginfo] target(s) in 1.12 secs
Running `target/debug/t1`
1
procyclinsur@procyclinsur:~/Documents/Rust/t1$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/t1`
1
procyclinsur@procyclinsur:~/Documents/Rust/t1$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/t1`
0
procyclinsur@procyclinsur:~/Documents/Rust/t1$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/t1`
1
I'd expect to see the same output on every run of the program. Does anyone know why this code behaves this way? How do I get it to always output 0, as expected?

It's because a HashMap is unordered. Its iter method
visits all key-value pairs in arbitrary order
The documentation on collections describes this behavior:
For unordered collections like HashMap, the items will be yielded in
whatever order the internal representation made most convenient.
To always retrieve a specific value, you need to look it up by key (or use one of the Iterator methods that don't depend on order, e.g. find). For example:
use std::collections::HashMap;
fn main() {
let mut hm = HashMap::new();
hm.insert(0, "a");
hm.insert(1, "b");
println!("{:?}", hm.get(&0)) // always Some("a")
}

As others have said, the order is not predictable. However, using a different hasher can at least give you reproducible results. For example, the FNV hash function can be used like this:
extern crate fnv;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;
use fnv::FnvHasher;
type HashMapFnv<K, V> = HashMap<K, V, BuildHasherDefault<FnvHasher>>;
fn main() {
let mut hm = HashMapFnv::default();
hm.insert(0, 1);
hm.insert(1, 1);
let mut iter = hm.iter();
println!("{:?}", iter.nth(0).expect("Fatal.").0)
}
This should give you the same results each time. However, there are no guarantees about what that order is, so you might get different results on a different operating system, if you update FNV or if you use a different version of Rust itself.
Note that you shouldn't use this hash function in any application that processes external data, as it is vulnerable to hash-collision attacks. The default hash function in Rust is designed to resist them.

It is worth mentioning BTreeMap and corresponding iter function
Gets an iterator over the entries of the map, sorted by key.
This may be a simple drop-in replacement.
use std::collections::BTreeMap;
fn main() {
let mut hm = BTreeMap::new();
hm.insert(0, 1);
hm.insert(1, 1);
let mut iter = hm.iter();
println!("{:?}", iter.nth(0).expect("Fatal.").0);
}
playground

Related

CPU time sleep instead of wall-clock time sleep

Currently, I have the following Rust toy program:
use rayon::prelude::*;
use std::{env, thread, time};
/// Sleeps 1 second n times in parallel using rayon
fn rayon_sleep(n: usize) {
let millis = vec![0; n];
millis
.par_iter()
.for_each(|_| thread::sleep(time::Duration::from_millis(1000)));
}
fn main() {
let args: Vec<String> = env::args().collect();
let n = args[1].parse::<usize>().unwrap();
let now = time::Instant::now();
rayon_sleep(n);
println!("rayon: {:?}", now.elapsed());
}
Basically, my program accepts one input argument n. Then, I sleep for 1 second n times. The program executes the sleep tasks in parallel using rayon.
However, this is not exactly what I want. As far as I know, thread::sleep sleeps according to wall-clock time. However, I would like to keep a virtual CPU busy for 1 second in CPU time.
Is there any way to do this?
EDIT
I would like to make this point clear: I don't mind if the OS preempts the tasks. However, if this happens, then I don't want to consider the time the task spends in the ready/waiting queue.
EDIT
This is a simple, illustrative example of what I need to do. In reality, I have to develop a benchmark for a crate that allows defining and simulating models using the DEVS formalism. The benchmark aims to compare DEVS-compliant libraries with each other, and it explicitly says that the models must spend a fixed, known amount of CPU time. That is why I need to make sure of that. Thus, I cannot use a simple busy loop nor simply sleep.
I followed Sven Marnach's suggestions and implemented the following function:
use cpu_time::ThreadTime;
use rayon::prelude::*;
use std::{env, thread, time};
/// Sleeps 1 second n times in parallel using rayon
fn rayon_sleep(n: usize) {
let millis = vec![0; n];
millis.par_iter().for_each(|_| {
let duration = time::Duration::from_millis(1000);
let mut x: u32 = 0;
let now = ThreadTime::now(); // get current thread time
while now.elapsed() < duration { // active sleep
std::hint::black_box(&mut x); // to avoid compiler optimizations
x = x.wrapping_add(1);
}
});
}
fn main() {
let args: Vec<String> = env::args().collect();
let n = args[1].parse::<usize>().unwrap();
let now = time::Instant::now();
rayon_sleep(n);
println!("rayon: {:?}", now.elapsed());
}
If I set n to 8, it takes roughly 2 seconds. I'd expect better performance (1 second, as I have 8 vCPUs), but I guess the overhead comes from the OS scheduling policy.

How do I modify values in a HashMap in Rust?

If I had a HashMap in Rust and wanted to get a value and modify it (for example, say the value is a u32 and I want to increment it), what is the best way to do that? I found an example that works, but I am wondering if there is a "best practices" way to do it. The example I found that does the job is:
use std::collections::HashMap;
use std::cell::RefCell;
fn main() {
let mut map = HashMap::new();
map.insert("Key".to_owned(), RefCell::new(0));
let value = map.get("Key").unwrap();
*value.borrow_mut() += 1;
println!("{:?}", value.borrow());
}
Which worked for me, but I was suspicious of using RefCell for this. Is there a better way to do it? Thanks.
Yeah, I'm suspicious of that RefCell too. You'd use it only for a very specific requirement, such as the interior mutability that RefCell provides.
I don't see why you can't just use the example code and ditch the RefCell.
use std::collections::HashMap;
fn main() {
let mut map = HashMap::new();
map.insert("Key".to_owned(), 0);
let value = map.get_mut("Key").unwrap();
*value += 1;
println!("{:?}", value);
let read_value = map.get("Key").unwrap();
println!("{:?}", read_value);
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a23a90c4df7eb010945980b9b95eb031
RefCell provides run time borrow checking, as opposed to compile time borrow checking which you would otherwise get.
In many cases you do not need that - you can just use get_mut as suggested in the comments and by @cadolphs.
However if you need to get mutable access to individual elements within the map at the same time, you might use RefCell. For example, consider this code:
use std::collections::HashMap;
fn main() {
let mut map = HashMap::new();
map.insert("Key".to_owned(), 0);
map.insert("Key2".to_owned(), 1);
let value = map.get_mut("Key").unwrap();
let value2 = map.get("Key2").unwrap();
*value += *value2;
println!("{:?}", *value);
}
This will fail to compile because I am trying to get a second value from the hashmap while I am still holding a mutable reference to the first:
error[E0502]: cannot borrow `map` as immutable because it is also borrowed as mutable
--> src/main.rs:8:18
|
7 | let value = map.get_mut("Key").unwrap();
| ------------------ mutable borrow occurs here
8 | let value2 = map.get("Key2").unwrap();
| ^^^^^^^^^^^^^^^ immutable borrow occurs here
9 | *value += *value2;
| ----------------- mutable borrow later used here
You could solve that using RefCell like this:
use std::collections::HashMap;
use std::cell::RefCell;
fn main() {
let mut map = HashMap::new();
map.insert("Key".to_owned(), RefCell::new(0));
map.insert("Key2".to_owned(), RefCell::new(1));
let value = map.get("Key").unwrap();
let value2 = map.get("Key2").unwrap();
*value.borrow_mut() += *value2.borrow();
println!("{:?}", *value);
}
Here we can get the value we intend to modify out of the hash using get rather than get_mut, so we are not borrowing the hash mutably. The hash itself is not being modified, just the values inside it - this is the pattern referred to in the Rust community as interior mutability.
This pattern should be used very sparingly though, only when really needed.
For one thing, you are trading a compile-time check for a run-time check. If you have made a mistake in your logic, you won't find out at compile time; you will find out when the code panics at runtime! You can work around that by using the try_borrow* versions of these methods (e.g. try_borrow_mut), which return a Result instead of panicking, but then you need to add error handling to deal with it.
Another reason is that a run time borrow check may harm the performance of your code.
My example above is a case where you can easily avoid the whole thing, because the values in the hashmap are just integers which are Copy, so we can just do this instead:
let value2 = *map.get("Key2").unwrap();
let value = map.get_mut("Key").unwrap();
*value += value2;

Why isn't this out-of-bounds error detected at compile-time?

I've just started coding in Rust a few days ago and have stumbled upon this case that I don't understand:
struct Foo {
arr: [u8; 5]
}
fn main() {
let foo = Foo{ arr: [0; 5] };
let bar = &foo;
println!("{}", bar.arr[100]);
}
Why does this code compile? Can't the compiler see that there is an out-of-bounds error there? It can detect it when I try to print foo.arr[100] directly, so what gives?
In general, not every error that could be detected at compile time is reported by the compiler. In this case, though, it should be detected: Rust 1.60 and earlier report this problem, but Rust 1.61 and 1.62 do not:
$ cargo +1.60 build
Compiling mycrate v0.1.0 (/rust-tests)
error: this operation will panic at runtime
--> src/main.rs:133:20
|
133 | println!("{}", bar.arr[100]);
| ^^^^^^^^^^^^ index out of bounds: the length is 5 but the index is 100
|
= note: `#[deny(unconditional_panic)]` on by default
$ cargo +1.61 build
Compiling mycrate v0.1.0 (/rust-tests)
Finished dev [unoptimized + debuginfo] target(s) in 0.78s
This has already been reported as issue #98444: Taking a shared reference of an array suppresses the unconditional_panic lint. You can downgrade your toolchain but hopefully it is resolved soon.

Why is running cargo bench faster than running release build?

I want to benchmark my Rust programs, and was comparing some alternatives to do that. I noted, however, that when running a benchmark with cargo bench and the bencher crate, the code runs consistently faster than when running a production build (cargo build --release) with the same code. For example:
Main code:
use dot_product;
const N: usize = 1000000;
use std::time;
fn main() {
let start = time::Instant::now();
dot_product::rayon_parallel([1; N].to_vec(), [2; N].to_vec());
println!("Time: {:?}", start.elapsed());
}
Average time: ~20ms
Benchmark code:
#[macro_use]
extern crate bencher;
use dot_product;
use bencher::Bencher;
const N: usize = 1000000;
fn parallel(bench: &mut Bencher) {
bench.iter(|| dot_product::rayon_parallel([1; N].to_vec(), [2; N].to_vec()))
}
benchmark_group!(benches, sequential, parallel);
benchmark_main!(benches);
Time: 5,006,199 ns/iter (+/- 1,320,975)
I tried the same with some other programs and cargo bench gives consistently faster results. Why could this happen?
As the comments suggested, you should use criterion::black_box() on all (final) results in the benchmarking code. This function does nothing - and simply gives back its only parameter - but is opaque to the optimizer, so the compiler has to assume the function does something with the input.
When not using black_box(), the benchmarking code doesn't actually do anything, as the compiler is able to figure out that the results of your code are unused and no side-effects can be observed. So it removes all your code during dead-code elimination and what you end up benchmarking is the benchmarking-suite itself.

Parsing 40MB file noticeably slower than equivalent Pascal code [duplicate]

This question already has an answer here:
Why is my Rust program slower than the equivalent Java program?
(1 answer)
Closed 2 years ago.
use std::fs::File;
use std::io::Read;
fn main() {
let mut f = File::open("binary_file_path").expect("no file found");
let mut buf = vec![0u8;15000*707*4];
f.read(&mut buf).expect("Something went berserk");
let result: Vec<_> = buf.chunks(2).map(|chunk| i16::from_le_bytes([chunk[0],chunk[1]])).collect();
}
I want to read a binary file. The last line takes around 15s. I'd expect it to only take a fraction of a second. How can I optimise it?
Your code looks like the compiler should be able to optimise it decently. Make sure that you compile it in release mode using cargo build --release. Converting 40MB of data to native endianness should only take a fraction of a second.
You can simplify the code and avoid some unnecessary copying by using the byteorder crate. It defines an extension trait for all implementors of Read, which allows you to call read_i16_into() directly on the file object:
use byteorder::{LittleEndian, ReadBytesExt};
use std::fs::File;

fn main() {
    let mut f = File::open("binary_file_path").expect("no file found");
    // 15000 * 707 * 4 bytes hold 15000 * 707 * 2 i16 values
    let mut result = vec![0i16; 15000 * 707 * 2];
    f.read_i16_into::<LittleEndian>(&mut result).unwrap();
}
cargo build --release improved the performance
