Add or increment value in HashMap in Rust

I am new to Rust and recently started learning. I wrote a simple program which does the following.
Read a text file
Split the words and store them in a HashMap with their occurrence count
Iterate over the map and print the word and occurrence count
use std::io;
use std::env;
use std::collections::HashMap;
use std::fs;
fn main() {
let path = env::args().nth(1).unwrap();
let contents = read_file(path);
let map = count_words(contents);
print(map);
}
fn read_file(path: String) -> (String){
let contents = fs::read_to_string(path).unwrap();
return contents;
}
fn count_words(contents: String) -> (HashMap<String, i32>){
let split = contents.split(&[' ', '\n'][..]);
let mut map = HashMap::new();
for w in split{
if map.get(w) == None{
map.insert(w.to_owned(), 1);
}
let count = map.get(w);
map.insert(w.to_owned(), count + 1); // error here
}
return map;
}
fn print(map: HashMap<String, i32>){
println!("Occurences..");
for (key, value) in map.iter() {
println!("{key}: {value}");
}
}
I am able to read the file and add the words into a HashMap and print. However, while trying to add or increment, I get below error.
error[E0369]: cannot add {integer} to Option<&{integer}>
  --> src\main.rs:27:40
   |
27 |         map.insert(w.to_owned(), count + 1);
   |                                  ----- ^ - {integer}
   |                                  |
   |                                  Option<&{integer}>
I know this approach should work in other languages like Java, C# etc. but unsure about how it should work in Rust.

In this block:
if map.get(w) == None{
map.insert(w.to_owned(), 1);
}
let count = map.get(w);
map.insert(w.to_owned(), count + 1); // error here
map.get(w) gives you an Option<&i32>: an optional reference to the stored count, not the count itself.
You seem to know this to some extent, as you check for None earlier. The other possibility is the one you want: a Some(&count), not a bare count.
You cannot add an integer to an Option, as the compiler tells you. You have to take the count out of the Some first.
You can unwrap it, though that is not advisable, because unwrap panics on None.
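You can use pattern matching:
match map.get(w) {
    // `&count` copies the integer out, so the borrow of `map` ends before the insert
    Some(&count) => { map.insert(w.to_owned(), count + 1); }
    None => { map.insert(w.to_owned(), 1); }
}
Or, written differently:
if let Some(&count) = map.get(w) {
    map.insert(w.to_owned(), count + 1);
} else {
    map.insert(w.to_owned(), 1);
}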
Or, better yet, you can use the Entry API, which turns it into a simple one liner:
*map.entry(w.to_owned()).or_default() += 1;
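For completeness, here is a minimal sketch of how count_words from the question might look with the Entry API. It takes a &str instead of a String and uses split_whitespace rather than the original split on ' ' and '\n'; both are my substitutions, not something the question requires.

```rust
use std::collections::HashMap;

// Counts how often each whitespace-separated word occurs.
fn count_words(contents: &str) -> HashMap<String, i32> {
    let mut map: HashMap<String, i32> = HashMap::new();
    for w in contents.split_whitespace() {
        // `or_default` inserts 0 for a new word and returns `&mut i32`,
        // so the same line handles both "add" and "increment".
        *map.entry(w.to_owned()).or_default() += 1;
    }
    map
}
```

Calling count_words("a b a") should give a map with "a" mapped to 2 and "b" mapped to 1.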

Related

How to return moved value as `Err` in loop using question mark operator?

In the code below, version 1 does not compile, whereas version 2 does.
fn foo(text: String) -> Result<u32, String> {
let mut acc: u32 = 0;
for string in text.split("_") {
let result: Option<u32> = string.parse::<u32>().ok();
// version 1
let x: u32 = result.ok_or(text)?;
acc += x;
// version 2
// if let Some(x) = result {
// acc += x;
// } else {
// return Err(text)
// }
}
Ok(acc)
}
error[E0382]: use of moved value: `text`
--> src/main.rs:102:35
|
96 | fn foo(text: String) -> Result<u32, String> {
| ---- move occurs because `text` has type `String`, which does not implement the `Copy` trait
...
102 | let x: u32 = result.ok_or(text)?;
| ^^^^ value moved here, in previous iteration of loop
The issue is that I'm moving text into another function (ok_or) on each loop iteration.
So I understand the error message, but is there still a way out to use the shorthand ? notation in this case? The version 2 is the shortest I could get but it still seems too verbose.
(this is just a MWE / toy example, my question is not about summing numbers in a string)
If you cannot afford a copy on the Err case, and you have a lot of places like that, you can use a macro:
macro_rules! try_opt {
( $v:expr, $e:expr $(,)? ) => {
match $v {
Some(v) => v,
None => return Err($e),
}
}
}
let x: u32 = try_opt!(result, text);
If you can afford a string clone in the Err case (not the happy path), you can just take a reference, as String implements From<&String>:
let x: u32 = result.ok_or(&text)?;
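Put back into the function from the question, the reference variant might look like the sketch below. The only change is `&text`; the clone happens inside the `?` conversion and only when parsing fails.

```rust
fn foo(text: String) -> Result<u32, String> {
    let mut acc: u32 = 0;
    for string in text.split('_') {
        let result: Option<u32> = string.parse::<u32>().ok();
        // `ok_or(&text)` yields Result<u32, &String>; `?` turns the
        // `&String` into a `String` through `From<&String> for String`.
        let x: u32 = result.ok_or(&text)?;
        acc += x;
    }
    Ok(acc)
}
```

With this, foo("1_2_3".to_string()) returns Ok(6), and an input with a non-numeric part returns the whole input as the Err value.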

Rust - Use threads to iterate over a vector

My program creates a grid of numbers, then based on the sum of the numbers surrounding each on the grid the number will change in a set way. I'm using two vectors currently, filling the first with random numbers, calculating the changes, then putting the new values in the second vector. After the new values go into the second vector I then push them back into the first vector before going through the next loop. The error I get currently is:
error[E0499]: cannot borrow `grid_a` as mutable more than once at a time
--> src\main.rs:40:29
|
38 | thread::scope(|s| {
| - has type `&crossbeam::thread::Scope<'1>`
39 | for j in 0..grid_b.len() {
40 | s.spawn(|_| {
| - ^^^ `grid_a` was mutably borrowed here in the previous iteration of the loop
| _____________________|
| |
41 | | grid_a[j] = grid_b[j];
| | ------ borrows occur due to use of `grid_a` in closure
42 | | });
| |______________________- argument requires that `grid_a` is borrowed for `'1`
My current code is below. I'm way more familiar with C++ and C#, and I'm in the process of learning Rust for this assignment. If I remove the thread, everything compiles and runs properly. I don't understand how to avoid the multiple borrow. Ideally I'd also like to use a separate thread::scope with the for loop above the existing thread::scope.
use crossbeam::thread;
use rand::Rng;
use std::sync::{Arc, Mutex};
use std::thread::sleep;
use std::time::Duration;
use std::time::Instant;
static NUMROWS: i32 = 4;
static NUMCOLUMNS: i32 = 7;
static GRIDSIZE: i32 = NUMROWS * NUMCOLUMNS;
static PLUSNC: i32 = NUMCOLUMNS + 1;
static MINUSNC: i32 = NUMCOLUMNS - 1;
static NUMLOOP: i32 = 7;
static HIGH: u32 = 35;
fn main() {
let start = Instant::now();
let length = usize::try_from(GRIDSIZE).unwrap();
let total_checks = Arc::new(Mutex::new(0));
let mut grid_a = Vec::<u32>::with_capacity(length);
let mut grid_b = Vec::<u32>::with_capacity(length);
grid_a = fill_grid();
for h in 1..=NUMLOOP {
println!("-- {} --", h);
print_grid(&grid_a);
if h != NUMLOOP {
for i in 0..grid_a.len() {
let mut total_checks = total_checks.lock().unwrap();
grid_b[i] = checker(&grid_a, i.try_into().unwrap());
*total_checks += 1;
}
grid_a.clear();
thread::scope(|s| {
for j in 0..grid_b.len() {
s.spawn(|_| {
grid_a[j] = grid_b[j];
});
}
})
.unwrap();
grid_b.clear();
}
}
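When you access a vector (or any slice) by index, you borrow the whole vector, so every spawned closure that writes grid_a[j] needs its own mutable borrow of grid_a, and only one such borrow may exist at a time.
You can use iterators instead: iter_mut() hands out a separate mutable reference to each item, and those references can be moved into different threads and used in parallel.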
use crossbeam::thread;
static NUMROWS: i32 = 4;
static NUMCOLUMNS: i32 = 7;
static GRIDSIZE: i32 = NUMROWS * NUMCOLUMNS;
static NUMLOOP: i32 = 7;
fn fill_grid() -> Vec<u32> {
(0..GRIDSIZE as u32).into_iter().collect()
}
fn main() {
let length = usize::try_from(GRIDSIZE).unwrap();
// just do this since else we create a vec and throw it away immediately
let mut grid_a = fill_grid();
let mut grid_b = Vec::<u32>::with_capacity(length);
for h in 1..=NUMLOOP {
println!("-- {} --", h);
println!("{grid_a:?}");
if h != NUMLOOP {
// removed a bunch of unrelated code
for i in 0..grid_a.len() {
grid_b.push(grid_a[i]);
}
// since we overwrite grid_a anyways we don't need to clear it.
// it would give us headaches anyways since grid_a[j] on an empty
// Vec panics.
// grid_a.clear();
thread::scope(|s| {
// instead of accessing the element via index we iterate over
// mutable references so we don't have to borrow the whole
// vector inside the thread
for (pa, b) in grid_a.iter_mut().zip(grid_b.iter().copied()) {
s.spawn(move |_| {
*pa = b + 1;
});
}
})
.unwrap();
grid_b.clear();
}
}
} // add missing }
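The same idea can be sketched without crossbeam, assuming Rust 1.63 or later, where std::thread::scope is available. The helper name copy_incremented and the chunk_size parameter are hypothetical, and spawning one thread per chunk instead of one per element is my own choice to keep the thread count small.

```rust
use std::thread;

// Writes `b + 1` into `grid_a` for every `b` in `grid_b`, one thread per chunk.
// `chunk_size` must be non-zero, otherwise `chunks_mut` panics.
fn copy_incremented(grid_a: &mut [u32], grid_b: &[u32], chunk_size: usize) {
    thread::scope(|s| {
        // `chunks_mut` yields disjoint mutable sub-slices, so each thread
        // owns its own part of `grid_a` and no borrow overlaps another.
        for (chunk_a, chunk_b) in grid_a.chunks_mut(chunk_size).zip(grid_b.chunks(chunk_size)) {
            s.spawn(move || {
                for (a, b) in chunk_a.iter_mut().zip(chunk_b) {
                    *a = *b + 1;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}
```

For a grid this small, threads will almost certainly cost more than they save; the sketch is only meant to show how the borrows can be split.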

How do I tackle lifetimes in Rust?

I am having issues with the concept of lifetimes in Rust. I am trying to use the crate bgpkit_parser to read in a bz2 file via a URL and then create a radix trie.
One field extracted from the file is the AS path, which I have named path in my code within the build_routetable function. I don't understand why Rust rejects let origin = clean_path.last(), which takes the last element of the vector.
fn as_parser(element: &BgpElem) -> Vec<u32> {
let x = &element.as_path.as_ref().unwrap().segments[0];
let mut as_vec = &Vec::new();
let mut as_path: Vec<u32> = Vec::new();
if let AsPathSegment::AsSequence(value) = x {
as_vec = value;
}
for i in as_vec {
as_path.push(i.asn);
}
return as_path;
}
fn prefix_parser(element: &BgpElem) -> String {
let subnet_id = element.prefix.prefix.ip().to_string().to_owned();
let prefix_id = element.prefix.prefix.prefix().to_string().to_owned();
let prefix = format!("{}/{}", subnet_id, prefix_id);//.as_str();
return prefix;
}
fn get_aspath(raw_aspath: Vec<u32>) -> Vec<u32> {
let mut as_path = Vec::new();
for i in raw_aspath {
if i < 64511 {
if as_path.contains(&i) {
continue;
}
else {
as_path.push(i);
}
}
else if 65535 < i && i < 4000000000 {
if as_path.contains(&i) {
continue;
}
else {
as_path.push(i);
}
}
}
return as_path;
}
fn build_routetable(mut trie4: Trie<String, Option<&u32>>, mut trie6: Trie<String, Option<&u32>>) {
let url: &str = "http://archive.routeviews.org/route-views.chile/\
bgpdata/2022.06/RIBS/rib.20220601.0000.bz2";
let parser = BgpkitParser::new(url).unwrap();
let mut count = 0;
for elem in parser {
if elem.elem_type == bgpkit_parser::ElemType::ANNOUNCE {
let record_timestamp = &elem.timestamp;
let record_type = "A";
let peer = &elem.peer_ip;
let prefix = prefix_parser(&elem);
let path = as_parser(&elem);
let clean_path = get_aspath(path);
// Issue is on the below line
// `clean_path` does not live long enough
// borrowed value does not live long
// enough rustc E0597
// main.rs(103, 9): `clean_path` dropped
// here while still borrowed
// main.rs(77, 91): let's call the
// lifetime of this reference `'1`
// main.rs(92, 17): argument requires
// that `clean_path` is borrowed for `'1`
let origin = clean_path.last(); //issue line
if prefix.contains(":") {
trie6.insert(prefix, origin);
}
else {
trie4.insert(prefix, origin);
}
count+=1;
if count >= 10000 {
println!("{:?} | {:?} | {:?} | {:?} | {:?}",
record_type, record_timestamp, peer, prefix, path);
count=0
}
};
}
println!("Trie4 size: {:?} prefixes", trie4.len());
println!("Trie6 size: {:?} prefixes", trie6.len());
}
Short answer: you're "inserting" a reference. But what's being referenced doesn't outlive what it's being inserted into.
Longer: The hint is your trie4 argument, the signature of which is this:
mut trie4: Trie<String, Option<&u32>>
So that lives beyond the length of the loop where things are declared. This is all in the loop:
let origin = clean_path.last(); //issue line
if prefix.contains(":") {
trie6.insert(prefix, origin);
}
While clean_path is a Vec<u32> and that's fine, origin is an Option<&u32> that borrows from it, and insert stores that reference in the trie. Here's your problem: the value has to live as long as the collection, but it points at the last element of clean_path, which is dropped at the end of each loop iteration. You can't put something into the container that will not live as long as the container itself. Rust has just saved you from a dangling reference, as it is supposed to.
Basically, your containers should be Trie<String, Option<u32>>, without the reference, with the value copied out (for example via clean_path.last().copied()), and then this will all just work. The problem is that the elements are references rather than owned values. Storing the u32 directly costs nothing extra: a u32 is 4 bytes, a reference is pointer-sized, and once Option and alignment are taken into account the two will likely end up the same size anyway.
Also of note: trie4 and trie6 will both be gone at the end of this function call, because they were moved into this function (not references or mutable references). I hope that's what you want.
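To make both points concrete, here is a small sketch. I don't know the Trie crate's API, so a std HashMap stands in for it, and the helper name insert_origin is hypothetical. The value is copied out of the vector before it is stored, and the table is passed as &mut so the caller still owns it afterwards.

```rust
use std::collections::HashMap;

// HashMap is only a stand-in for the question's Trie type.
fn insert_origin(table: &mut HashMap<String, Option<u32>>, prefix: String, clean_path: Vec<u32>) {
    // `last()` returns Option<&u32>; `copied()` turns it into an owned
    // Option<u32>, so nothing in the table borrows from `clean_path`.
    let origin: Option<u32> = clean_path.last().copied();
    table.insert(prefix, origin);
    // `clean_path` is dropped here, which is fine: the table owns its values.
}
```

After the call, the caller can keep using the table, for example to print its size once the parsing loop is done.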

Possible to combine assignment and comparison in an expression?

In C, it's common to assign and compare in a single expression:
n = n_init;
do {
func(n);
} while ((n = n.next) != n_init);
As I understand it this can be expressed in Rust as:
n = n_init;
loop {
func(n);
n = n.next;
if n == n_init {
break;
}
}
Which works the same as the C version (assuming the body of the loop doesn't use continue).
Is there a more terse way to express this in Rust, or is the example above ideal?
For the purposes of this question, assume ownership or satisfying the borrow checker isn't an issue. It's up to developer to satisfy these requirements.
For example, as an integer:
n = n_init;
loop {
func(&vec[n]);
n = vec[n].next;
if n == n_init {
break;
}
}
It may seem obvious that the Rust example is already idiomatic. However, I'm looking to move quite a lot of this style of loop to Rust, so I'm interested to know whether there is a better or different way to express it.
The idiomatic way to represent iteration in Rust is to use an Iterator. Thus you would implement an iterator that does the n = n.next and then use a for loop to iterate over the iterator.
struct MyIter<'a> {
pos: &'a MyData,
start: &'a MyData,
}
impl<'a> Iterator for MyIter<'a> {
type Item = &'a MyData;
fn next(&mut self) -> Option<&'a MyData> {
if self.pos as *const _ == self.start as *const _ {
None
} else {
let pos = self.pos;
self.pos = self.pos.next;
Some(pos)
}
}
}
It is left as an exercise to the reader to adapt this iterator so that it starts from the first element instead of the second.
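One possible take on that exercise, sketched with indices into a slice as in the question's vec[n].next example. The Node layout and the names Cycle and cycle_from are assumptions made for illustration. Keeping the position as an Option lets the iterator yield the start element first and stop once the ring wraps around.

```rust
struct Node {
    value: u32,
    next: usize, // index of the following node in the same slice
}

struct Cycle<'a> {
    nodes: &'a [Node],
    start: usize,
    pos: Option<usize>, // None once the walk is back at `start`
}

impl<'a> Iterator for Cycle<'a> {
    type Item = &'a Node;

    fn next(&mut self) -> Option<&'a Node> {
        let idx = self.pos?;
        let nodes = self.nodes;
        let node = &nodes[idx];
        // Stop after this item if following the link leads back to the start.
        self.pos = if node.next == self.start { None } else { Some(node.next) };
        Some(node)
    }
}

fn cycle_from(nodes: &[Node], start: usize) -> Cycle<'_> {
    Cycle { nodes, start, pos: Some(start) }
}
```

The loop from the question then becomes `for n in cycle_from(&vec, n_init) { func(n); }`, which visits the start node exactly once.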
Rust supports pattern matching in if and while:
instead of having a boolean condition, the test is considered successful if the pattern matches
as part of pattern matching, you bind the values matched to names
Thus, if instead of having a boolean condition you were building an Option...
fn check(next: *mut Node, init: *mut Node) -> Option<*mut Node>;
let mut n = n_init;
loop {
func(n);
if let Some(x) = check(n.next, n_init) {
n = x;
} else {
break;
}
}
However, if you can use an Iterator instead you'll be much more idiomatic.
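If writing a dedicated iterator type feels heavy, std::iter::successors may be enough. The sketch below assumes the index-based layout from the question (a slice whose nodes store the index of the next node); the Node struct and the function name visit_cycle are hypothetical. It also needs Rust 1.62 or later for bool::then_some.

```rust
use std::iter::successors;

struct Node {
    value: u32,
    next: usize, // index of the following node in the same slice
}

// Visits every node of the ring once, starting at `n_init`,
// and collects the values in visiting order.
fn visit_cycle(nodes: &[Node], n_init: usize) -> Vec<u32> {
    successors(Some(n_init), |&n| {
        let next = nodes[n].next;
        // Returning None ends the iteration once the ring wraps around.
        (next != n_init).then_some(next)
    })
    .map(|n| nodes[n].value)
    .collect()
}
```

Because the first item is passed in up front, the body runs for n_init before any comparison happens, which matches the do-while shape of the C loop.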
An assignment in Rust returns the empty tuple. If you are fine with non-idiomatic code you can compare the assignment-result with such an empty tuple and use a logical conjunction to chain your actual loop condition.
let mut current = 3;
let mut parent;
while (parent = get_parent(current)) == () && parent != current {
println!("currently {}, parent is {}", current, parent);
current = parent;
}
// example function
fn get_parent(x: usize) -> usize {
if x > 0 { x - 1 } else { x }
}
// currently 3, parent is 2
// currently 2, parent is 1
// currently 1, parent is 0
This has the disadvantage that entering the loop needs to run logic (which you can avoid with C's do {..} while(); style loops).
You can use this approach inside a do-while macro, but readability isn't that great and at that point a refactoring might be preferable. In any case, this is how it could look:
do_it!({
println!("{}", n);
} while (n = n + 1) == () && n < 4);
This is the code for the macro:
macro_rules! do_it {
($b: block while $e:expr) => {
loop {
$b
if !($e) { break };
}
}
}

How do I return a struct or anything more complicated than a primitive?

I've been tinkering with Rust and I'm a little confused with function return types. As an experiment I'm writing an IRC log parser. I'm familiar with the primitive types, and having functions return those. What about more complex types when returning multiple pieces of data?
/* Log line example from log.txt */
/* [17:35] <#botname> name1 [460/702] has challenged name2 [224/739] and taken them in combat! */
#[derive(Show)]
struct Challenger {
challenger: String,
defender: String
}
fn main() {
let path = Path::new("log.txt");
let mut file = BufferedReader::new(File::open(&path));
for line in file.lines() {
let mut unwrapped_line = line.unwrap();
let mut chal = challenges3(unwrapped_line);
println!("Challenger: {}", chal.challenger);
println!("Defender: {}", chal.defender);
}
}
fn challenges3(text: String)-> Challenger {
let s: String = text;
let split: Vec<&str> = s.as_slice().split(' ').collect();
if(split[4] == "has" && split[5] == "challenged") {
let mychallenger = Challenger { challenger: split[2].to_string(), defender: split[6].to_string()};
return mychallenger;
}
}
I realize this code isn't very idiomatic, I'm getting familiar with the language.
I get an error with this code:
"mismatched types: expected `Challenger`, found `()` (expected struct Challenger, found ())"
How can I return a Struct or a HashMap? Is there a better way to return multiple fields of data?
The if in challenges3 has no else block, so if the condition isn't met, execution continues after the if block. There's nothing there, so the function implicitly returns () at this point. You must also return a Challenger after the if block, or panic! to abort the program.
Alternatively, you could change the return type of your function to Option<Challenger>. Return Some(mychallenger) in the if block, and None after the if block:
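fn challenges3(text: String) -> Option<Challenger> {
    // Split the line on spaces; in current Rust, split can be called on the String directly.
    let split: Vec<&str> = text.split(' ').collect();
    // Check the length first so short lines return None instead of panicking on the index.
    if split.len() > 6 && split[4] == "has" && split[5] == "challenged" {
        let mychallenger = Challenger { challenger: split[2].to_string(), defender: split[6].to_string() };
        return Some(mychallenger);
    }
    None
}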
You can also use Result instead of Option if you want to return some information about the error.
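A sketch of the Result variant in current Rust follows. The function name parse_challenge, the String error type, and the use of a slice pattern instead of indexing are my own choices; the struct mirrors the one in the question, with Debug in place of the old Show derive.

```rust
#[derive(Debug, PartialEq)]
struct Challenger {
    challenger: String,
    defender: String,
}

// Expected shape of a line:
// [17:35] <#botname> name1 [460/702] has challenged name2 [224/739] ...
fn parse_challenge(line: &str) -> Result<Challenger, String> {
    let split: Vec<&str> = line.split(' ').collect();
    match split.as_slice() {
        // The slice pattern checks length and keywords in one step,
        // so short or unrelated lines cannot cause an index panic.
        [_, _, challenger, _, "has", "challenged", defender, ..] => Ok(Challenger {
            challenger: challenger.to_string(),
            defender: defender.to_string(),
        }),
        _ => Err(format!("not a challenge line: {line}")),
    }
}
```

The caller can then match on the result, skipping or logging the lines that come back as Err.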
