How do I clone a vector with fixed capacity in Rust?

By default, when a Vec is cloned, only the minimum capacity needed is allocated for the new Vec:
fn main() {
let mut x = Vec::with_capacity(10);
x.push(1);
x.push(2);
x.push(3);
println!("x capacity: {}", x.capacity()); // 10
let y = x.clone();
println!("y capacity: {}", y.capacity()); // 3
}
What's the most efficient way to clone the Vec if I want to keep the original capacity? Is there a way to allocate a new vector with a capacity of x.capacity() and just memcopy the values from my first vector?

let mut original_vec: Vec<usize> = Vec::with_capacity(10);
original_vec.push(1);
original_vec.push(2);
original_vec.push(3);
let mut target_vec = Vec::with_capacity(original_vec.capacity());
target_vec.extend(&original_vec);
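As a minimal sketch, the same idea can be wrapped in a small helper. The name clone_with_capacity is hypothetical, not a standard library function. It relies only on Vec::with_capacity guaranteeing at least the requested capacity and on extend_from_slice cloning the elements into the existing allocation:

```rust
/// Hypothetical helper (not part of std): clones `src` into a new Vec
/// whose capacity is at least `src.capacity()`.
fn clone_with_capacity<T: Clone>(src: &Vec<T>) -> Vec<T> {
    // Allocate once, up front, with the source's capacity.
    let mut dst = Vec::with_capacity(src.capacity());
    // Clone the elements into the already-allocated buffer.
    dst.extend_from_slice(src);
    dst
}

fn main() {
    let mut x: Vec<u32> = Vec::with_capacity(10);
    x.extend_from_slice(&[1, 2, 3]);

    let y = clone_with_capacity(&x);
    assert_eq!(x, y);
    // with_capacity may round up, so only a lower bound is guaranteed.
    assert!(y.capacity() >= x.capacity());
    println!("y capacity: {}", y.capacity());
}
```

The parameter is deliberately &Vec<T> rather than &[T], because a slice does not carry the capacity of the vector it came from.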

Related

How to use common BTreeMap variable in rust(single thread)

Here is my simplified original code. I want to use one shared (global) variable instead of separate variables in each function. What is the suggested approach in Rust?
BTW, I've tried both a global variable and a function parameter, and both are a nightmare for a beginner: the lifetime and type errors are too difficult to solve.
This simple program is a single-threaded tool, so in C no extra mutex would be necessary.
// version 1
use std::collections::BTreeMap;
// Trying but failed
// let mut guess_number = BTreeMap::new();
// | ^^^ expected item
fn read_csv() {
let mut guess_number = BTreeMap::new();
let lines = ["Tom,4", "John,6"];
for line in lines.iter() {
let split = line.split(",");
let vec: Vec<_> = split.collect();
println!("{} {:?}", line, vec);
let number: u16 = vec[1].trim().parse().unwrap();
guess_number.insert(vec[0], number);
}
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
fn main() {
let mut guess_number = BTreeMap::new();
guess_number.insert("Tom", 3);
guess_number.insert("John", 7);
if guess_number.contains_key("John") {
println!("John's number={:?}", guess_number.get("John").unwrap());
}
read_csv();
}
To show how hard this is for a beginner, here is my attempt at passing the map as a parameter:
// version 2
use std::collections::BTreeMap;
fn read_csv(guess_number: BTreeMap) {
// ^^^^^^^^ expected 2 generic arguments
let lines = ["Tom,4", "John,6"];
for line in lines.iter() {
let split = line.split(",");
let vec: Vec<_> = split.collect();
println!("{} {:?}", line, vec);
let number: u16 = vec[1].trim().parse().unwrap();
guess_number.insert(vec[0], number);
}
}
fn main() {
let mut guess_number = BTreeMap::new();
guess_number.insert("Tom", 3);
guess_number.insert("John", 7);
if guess_number.contains_key("John") {
println!("John's number={:?}", guess_number.get("John").unwrap());
}
read_csv(guess_number);
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
After some trial and error, I arrived at a type that might work, BTreeMap<&str, i32>:
// version 3
use std::collections::BTreeMap;
fn read_csv(guess_number: &BTreeMap<&str, i32>) {
// let mut guess_number = BTreeMap::new();
let lines = ["Tom,4", "John,6"];
for line in lines.iter() {
let split = line.split(",");
let vec: Vec<_> = split.collect();
println!("{} {:?}", line, vec);
let number: i32 = vec[1].trim().parse().unwrap();
guess_number.insert(vec[0], number);
}
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
fn main() {
let mut guess_number: BTreeMap<&str, i32> = BTreeMap::new();
guess_number.insert("Tom", 3);
guess_number.insert("John", 7);
if guess_number.contains_key("John") {
println!("John's number={:?}", guess_number.get("John").unwrap());
}
read_csv(&guess_number);
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
This causes the following error:
7 | fn read_csv(guess_number: &BTreeMap<&str, i32>) {
| -------------------- help: consider changing this to be a mutable reference: `&mut BTreeMap<&str, i32>`
...
16 | guess_number.insert(vec[0], number);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `guess_number` is a `&` reference, so the data it refers to cannot be borrowed as mutable
The final answer (globals do not seem to be recommended in Rust, so I use a mutable reference instead):
// version 4
use std::collections::BTreeMap;
fn read_csv(guess_number: &mut BTreeMap<&str, i32>) {
let lines = ["Tom,4", "John,6"];
for line in lines.iter() {
let split = line.split(",");
let vec: Vec<_> = split.collect();
println!("{} {:?}", line, vec);
let number: i32 = vec[1].trim().parse().unwrap();
guess_number.insert(vec[0], number);
}
}
fn main() {
let mut guess_number: BTreeMap<&str, i32> = BTreeMap::new();
guess_number.insert("Tom", 3);
guess_number.insert("John", 7);
if guess_number.contains_key("John") {
println!("John's number={:?}", guess_number.get("John").unwrap());
}
read_csv(&mut guess_number);
for (k, v) in guess_number {
println!("{} {:?}", k, v);
}
}
This question is not specific to BTreeMaps; it applies to pretty much all data types, such as numbers, strings, vectors, enums, etc.
If you want to pass a variable (value) from one function to another, you can do that in various ways in Rust. Typically you either move the value or you pass a reference to it. Moving is something quite specific to Rust and its ownership model. This is really essential, so if you have serious intentions to learn Rust, I strongly suggest you read the chapter Understanding Ownership from "the book". Don't get discouraged if you don't understand it from one reading. Spend as much time as needed, as you really can't move forward without this knowledge.
As for global variables, there are very few situations where they should be used. In Rust using global variables is slightly more difficult, compared to most other languages. This thread is quite useful, although you might find it a bit difficult to comprehend. My advice to a beginner would be to first fully understand the basic concept of moving and passing references.
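To make the difference between moving and passing a reference concrete, here is a small sketch. The function names (add_by_move, add_by_ref, total) are made up for illustration, and the map uses owned String keys as an assumption, which sidesteps the lifetime questions that &str keys raise:

```rust
use std::collections::BTreeMap;

// Takes ownership: the caller's map is moved in and handed back
// through the return value.
fn add_by_move(mut guesses: BTreeMap<String, i32>) -> BTreeMap<String, i32> {
    guesses.insert("Tom".to_string(), 4);
    guesses
}

// Borrows mutably: the caller keeps ownership and sees the change.
fn add_by_ref(guesses: &mut BTreeMap<String, i32>) {
    guesses.insert("John".to_string(), 6);
}

// Borrows immutably: read-only access.
fn total(guesses: &BTreeMap<String, i32>) -> i32 {
    guesses.values().sum()
}

fn main() {
    let guesses = BTreeMap::new();
    // `guesses` is moved into the function; rebind the returned map.
    let mut guesses = add_by_move(guesses);
    add_by_ref(&mut guesses);
    println!("total = {}", total(&guesses)); // total = 10
}
```

For a map that several functions fill in, the &mut form is usually the least awkward, which matches version 4 in the question.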

Rust double mut borrow in loops

I'm looking for a way to push to both a Vec<Vec<>> and its inner Vec<>. I understand why it fails, but I still struggle to find a graceful way to solve it.
fn example() {
let mut vec: Vec<Vec<i32>> = vec![];
vec.push(vec![]);
for i in &mut vec {
i.push(1);
if vec.len() < 10 {
vec.push(vec![]); // second mut borrow
}
}
}
The borrow checker won't allow you to iterate over a vector by reference and modify it during iteration. The reason for that is that modifying the vector can reallocate its storage, which would invalidate the references used for iteration. (And there is also the question of what it means to iterate over a changing vector, do you want to visit the elements added during iteration or just the elements that were present originally.)
The easiest fix that allows you to do what you want is to just iterate the vector using an index:
fn example() {
let mut vec: Vec<Vec<i32>> = vec![];
vec.push(vec![]);
let mut ind = 0;
while ind < vec.len() {
let i = &mut vec[ind];
i.push(1);
if vec.len() < 10 {
vec.push(vec![]);
}
ind += 1;
}
}
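If you only want to visit the elements that were present before the loop started (the other reading of "iterate over a changing vector"), one possible variation is to snapshot the length first. This is a sketch; the function name push_to_original_rows is invented for the example:

```rust
// Visits only the rows that existed when the loop started,
// while still appending new rows during the loop.
fn push_to_original_rows(rows: &mut Vec<Vec<i32>>) {
    let original_len = rows.len(); // snapshot before any push
    for ind in 0..original_len {
        // Each borrow of `rows` lasts only for this statement.
        rows[ind].push(1);
        if rows.len() < 10 {
            rows.push(vec![]);
        }
    }
}

fn main() {
    let mut rows: Vec<Vec<i32>> = vec![vec![]];
    push_to_original_rows(&mut rows);
    println!("{:?}", rows); // [[1], []]
}
```

Rows added during the loop are left untouched here, whereas the while loop above also visits them.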

Missed optimization based on using a reference to a counter variable

In the following example I read a 2.7GiB file from my desktop, count the bytes, and do something with the bytes that prevents the loop from being optimized away.
use std::fs::File;
use std::io::Read;
use std::io::{BufReader, ErrorKind};
fn main() {
let file = File::open("/home/seb/Desktop/ubuntu-20.04.1-desktop-amd64.iso")
.expect("Cannot read file.");
let buf = BufReader::new(file);
let mut x = 0u8;
let mut num_bytes = 0usize;
read(buf, &mut num_bytes, &mut x);
print(num_bytes, x);
}
fn read(mut buf: BufReader<File>, num_bytes: &mut usize, x: &mut u8) {
let mut bytes = [0; 512];
loop {
match buf.read(&mut bytes) {
Ok(0) => break,
Ok(n) => {
for i in 0..n {
*num_bytes += 1;
*x += bytes[i];
}
}
Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
Err(e) => panic!("{:?}", e),
};
}
}
fn print(num_bytes: usize, x: u8) {
println!("{}", num_bytes);
println!("{}", x);
}
I get a roughly 20% difference in throughput depending on whether num_bytes is passed to print by reference or by value. Unscientific testing with time shows consistent differences, always in the same ballpark.
fn print(num_bytes: usize, x: u8) {
println!("{}", num_bytes);
println!("{}", x);
}
# passing num_bytes by value
cargo build --release && time target/release/test_read
Compiling test_read v0.1.0 (/data/rust_projects/test_read)
Finished release [optimized] target(s) in 0.23s
2785017856
101
target/release/test_read 0,67s user 0,29s system 99% cpu 0,966 total
As soon as a reference to num_bytes is used later, the program is about 20% slower.
fn print(num_bytes: &usize, x: u8) {
println!("{}", num_bytes);
println!("{}", x);
}
# passing num_bytes by reference
cargo build --release && time target/release/test_read
Compiling test_read v0.1.0 (/data/rust_projects/test_read)
Finished release [optimized] target(s) in 0.22s
2785017856
101
target/release/test_read 1,00s user 0,26s system 99% cpu 1,258 total
I'm running this on Ubuntu 20.04 and with Rust 1.50.
The example is mostly taken from this question: What is a faster way to iterate through the bytes of a file in Rust?
Is there an explanation for why using a reference to a counter variable causes this?
After narrowing down the example, it turns out that using the reference to the counter variable stops the compiler from optimizing away the additions on every loop iteration. The two benches by_ref and by_value below have a ten-fold difference in run time, while by_ref_use_length matches by_value:
// benches/bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
pub fn by_ref(buf: &[u8]) {
let mut x = 0u8;
let mut num_bytes = 0usize;
for &byte in buf {
num_bytes += 1;
x += byte;
}
black_box((&num_bytes, x));
}
pub fn by_value(buf: &[u8]) {
let mut x = 0u8;
let mut num_bytes = 0usize;
for &byte in buf {
num_bytes += 1;
x += byte;
}
black_box((num_bytes, x));
}
pub fn by_ref_use_length(buf: &[u8]) {
let mut x = 0u8;
let num_bytes = buf.len();
for &byte in buf {
x += byte;
}
black_box((&num_bytes, x));
}
fn criterion_benchmark(c: &mut Criterion) {
let buf = vec![0; 1024 * 1024 * 64];
c.bench_function("len num_bytes by ref", |b| b.iter(|| by_ref_use_length(&buf)));
c.bench_function("num_bytes by ref", |b| b.iter(|| by_ref(&buf)));
c.bench_function("num_bytes by value", |b| b.iter(|| by_value(&buf)));
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
# Cargo.toml
[package]
name = "test_read"
version = "0.1.0"
authors = [""]
edition = "2018"
[dev-dependencies]
criterion = "0.3"
[[bench]]
name = "bench"
harness = false
cargo bench --all
num_bytes by ref time: [723.42 us 726.62 us 729.54 us]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) low mild
len num_bytes by ref time: [696.26 us 699.16 us 701.87 us]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) low mild
num_bytes by value time: [721.68 us 725.48 us 729.55 us]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) low mild
https://rust.godbolt.org/z/zqfn7n shows no instructions related to the counter in the loop when the counter variable is not passed by reference, whereas passing by reference prevents such optimizations. I'm not familiar with reading assembly, but I'm assuming that the length of the input slice is used rather than actually incrementing.
In the original example, the differences were only about 20%, which is probably due to the dependency on IO which restricts the optimization to adding n on every read iteration rather than simply returning the length of the input array. In fact, changing the counter in the loop to a single addition outside the inner loop matches the performance of both implementations:
let mut bytes = [0; 512];
loop {
match buf.read(&mut bytes) {
Ok(0) => break,
Ok(n) => {
// pulling the num_bytes addition out of the inner loop
// seems to imitate the optimization done by the compiler
// when num_bytes is not passed anywhere by reference.
*num_bytes += n;
for i in 0..n {
*x += bytes[i];
}
},
Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
Err(e) => panic!("{:?}", e),
};
}
This should answer what happens, but I have no clue why the optimization depends on a reference being used somewhere downstream.
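One way to sidestep the question, sketched here under the assumption that the references only need to hold the final result, is to accumulate into local variables and write through the references once at the end. The function name read_all and the generic R: Read parameter are choices made for this example, and wrapping_add is used so the byte sum cannot panic on overflow in debug builds. Whether this changes throughput on a given compiler version would have to be measured:

```rust
use std::io::{ErrorKind, Read};

// Accumulates into locals and touches the references only once, after the loop.
fn read_all<R: Read>(mut reader: R, num_bytes: &mut usize, x: &mut u8) {
    let mut local_count = 0usize;
    let mut local_sum = 0u8;
    let mut bytes = [0u8; 512];
    loop {
        match reader.read(&mut bytes) {
            Ok(0) => break,
            Ok(n) => {
                local_count += n;
                for &b in &bytes[..n] {
                    local_sum = local_sum.wrapping_add(b);
                }
            }
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => panic!("{:?}", e),
        }
    }
    // Single write-back through each reference.
    *num_bytes += local_count;
    *x = x.wrapping_add(local_sum);
}

fn main() {
    // A byte slice implements Read, so no file is needed for a quick check.
    let data = vec![1u8; 1000];
    let mut num_bytes = 0usize;
    let mut x = 0u8;
    read_all(&data[..], &mut num_bytes, &mut x);
    println!("{} {}", num_bytes, x); // 1000 232
}
```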

Why does calling filter on a vector not remove elements from the vector?

I am writing a small program that finds a winner of a marathon.
Everything seems logical until I try to filter the vector to drop runners who finished later than the best time. The vector remains the same after the filter call, and if I use iter_mut() I get type errors.
fn main() {
let mut input_line = String::new();
std::io::stdin().read_line(&mut input_line);
let n = input_line.trim().parse::<u8>().unwrap();
let mut v = Vec::with_capacity(n as usize);
for _ in 0..n {
let mut input_line = String::new();
std::io::stdin().read_line(&mut input_line);
let separated = input_line.trim().split(":").collect::<Vec<_>>();
let hours = separated[0].parse::<u8>().unwrap();
let minutes = separated[1].parse::<u8>().unwrap();
let seconds = separated[2].parse::<u8>().unwrap();
v.push((hours, minutes, seconds));
}
//println!("{:?}", v);
filter_hours(&mut v);
filter_minutes(&mut v);
filter_seconds(&mut v);
println!("{:?}", v[0]);
println!("{:?}", v);
}
fn filter_hours(v: &mut Vec<(u8, u8, u8)>) {
let (mut minimum, _, _) = v[0];
for &i in v.iter() {
let (h, _, _) = i;
if h < minimum {
minimum = h;
}
}
v.iter().filter(|&&(h, _, _)| h == minimum);
}
fn filter_minutes(v: &mut Vec<(u8, u8, u8)>) {
let (_, mut minimum, _) = v[0];
for &i in v.iter() {
let (_, m, _) = i;
if m < minimum {
minimum = m;
}
}
v.iter().filter(|&&(_, m, _)| m == minimum);
}
fn filter_seconds(v: &mut Vec<(u8, u8, u8)>) {
let (_, _, mut minimum) = v[0];
for &i in v.iter() {
let (_, _, s) = i;
if s < minimum {
minimum = s;
}
}
v.iter().filter(|&&(_, _, s)| s == minimum);
}
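Note that filter operates on an iterator, not on the vector; it removes elements from the iterator and not from the vector. Since the iterator is lazy and its result is discarded, your filter calls do nothing at all. One way to do what you want is to collect the result of filter into a new vector and replace the old one with it: *v = v.iter().filter(whatever).cloned().collect(); (the dereference is needed because v is a &mut Vec in your functions, and cloned() turns the borrowed items back into owned tuples). However, this will allocate space for a new vector, copy the elements from the old vector into the new one, then free the old vector.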
There is an experimental API, drain_filter, which allows you to modify the vector and remove matching elements in place. However since it is experimental, this API is only available in nightly for the time being.
If you want to keep to stable Rust and avoid the overhead of collect, you will need to remove the elements by hand. Something like this should do it (taken from the drain_filter docs):
let mut i = 0;
while i != vec.len() {
if some_predicate(&mut vec[i]) {
let val = vec.remove(i);
// your code here
} else {
i += 1;
}
}
Iterators do not alter the number of items in the original data structure. Instead, you want to use retain:
fn filter_hours(v: &mut Vec<(u8, u8, u8)>) {
let min = v.iter().map(|&(h, _, _)| h).min().unwrap();
v.retain(|&(h, _, _)| h == min);
}
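To see the difference side by side, here is a small sketch with made-up times. The helper name keep_min_hours is hypothetical, and it uses if let instead of unwrap so that an empty vector does not panic. The last line relies on the fact that tuples compare lexicographically, so for (hours, minutes, seconds) the overall winner can also be found with a plain min():

```rust
// Keeps only the runners with the smallest hour value, in place.
fn keep_min_hours(v: &mut Vec<(u8, u8, u8)>) {
    if let Some(min) = v.iter().map(|&(h, _, _)| h).min() {
        v.retain(|&(h, _, _)| h == min);
    }
}

fn main() {
    let mut v = vec![(2, 10, 5), (1, 59, 0), (1, 30, 45)];

    // filter only describes a view over the items; v itself is unchanged.
    let matching = v.iter().filter(|&&(h, _, _)| h == 1).count();
    println!("{} of {}", matching, v.len()); // 2 of 3

    // retain removes the non-matching elements from v itself.
    keep_min_hours(&mut v);
    println!("{:?}", v); // [(1, 59, 0), (1, 30, 45)]

    // Tuples are ordered field by field, so min() picks the best time.
    println!("{:?}", v.iter().min()); // Some((1, 30, 45))
}
```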

Can I reset a borrow of a local in a loop?

I have a processing loop that needs a pointer to a large lookup table.
The pointer is unfortunately triply indirected from the source data, so keeping that pointer around for the inner loop is essential for performance.
Is there any way I can tell the borrow checker that I'm "unborrowing" the state variable in the unlikely event I need to modify the state... so I can only re-lookup the slice in the event that the modify_state function triggers?
One solution I thought of was to change data to be a slice reference and do a mem::replace on the struct at the beginning of the function and pull the slice into local scope, then replace it back at the end of the function — but that is very brittle and error prone (as I need to remember to replace the item on every return). Is there another way to accomplish this?
struct DoubleIndirect {
data: [u8; 512 * 512],
lut: [usize; 16384],
lut_index: usize,
}
#[cold]
fn modify_state(s: &mut DoubleIndirect) {
s.lut_index += 63;
s.lut_index %= 16384;
}
fn process(state: &mut DoubleIndirect) -> [u8; 65536] {
let mut ret: [u8; 65536] = [0; 65536];
let mut count = 0;
let mut data_slice = &state.data[state.lut[state.lut_index]..];
for ret_item in ret.iter_mut() {
*ret_item = data_slice[count];
if count % 197 == 196 {
data_slice = &[];
modify_state(state);
data_slice = &state.data[state.lut[state.lut_index]..];
}
count += 1
}
return ret;
}
The simplest way to do this is to ensure the borrows of state are all disjoint:
#[cold]
fn modify_state(lut_index: &mut usize) {
*lut_index += 63;
*lut_index %= 16384;
}
fn process(state: &mut DoubleIndirect) -> [u8; 65536] {
let mut ret: [u8; 65536] = [0; 65536];
let mut count = 0;
let lut_index = &mut state.lut_index;
let mut data_slice = &state.data[state.lut[*lut_index]..];
for ret_item in ret.iter_mut() {
*ret_item = data_slice[count];
if count % 197 == 196 {
modify_state(lut_index);
data_slice = &state.data[state.lut[*lut_index]..];
}
count += 1
}
return ret;
}
The problem is basically two things: first, Rust will not look beyond a function's signature to find out what it does. As far as the compiler knows, your call to modify_state could be changing state.data as well, and it can't allow that.
The second problem is that borrows are lexical; the compiler looks at the block of code where the borrow might be used and goes with that. It doesn't (currently) bother to try to reduce the length of borrows to match where they're actually active.
You can also play games with, for example, using std::mem::replace to pull state.data out into a local variable, do your work, then replace it back just before you return.
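A rough sketch of that last idea follows. It is not the code from the question: the State struct, its Vec<u8> field (chosen so that moving the buffer is cheap, unlike a large array), and the loop body are all simplified assumptions. std::mem::take is the shorthand for mem::replace with a default value:

```rust
use std::mem;

// Simplified stand-in for the struct in the question (assumed layout).
struct State {
    data: Vec<u8>,
    lut_index: usize,
}

fn modify_state(s: &mut State) {
    s.lut_index = (s.lut_index + 1) % 4;
}

fn process(state: &mut State) -> Vec<u8> {
    // Move the buffer out; state.data is an empty Vec until it is put back.
    let data = mem::take(&mut state.data);
    let mut out = Vec::new();
    for i in 0..8 {
        out.push(data[(state.lut_index + i) % data.len()]);
        if i % 3 == 2 {
            // No borrow of `state` is alive here, so this is accepted.
            modify_state(state);
        }
    }
    // Must be restored on every exit path, which is what makes this brittle.
    state.data = data;
    out
}

fn main() {
    let mut state = State { data: vec![10, 20, 30, 40], lut_index: 0 };
    let out = process(&mut state);
    println!("{:?}", out);
}
```

As the question already notes, forgetting the final assignment on an early return silently leaves state.data empty, so the disjoint-borrow version above is usually the safer choice.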
