First try at Rust for logfile analysis - rust

I often perform the same analysis on log files. Initially, I had a small Awk script that I used with grep and sort. For fun, I rewrote it in Python:
#!/usr/bin/python3
import sys
months = { "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
           "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12 }
months_r = { v:k for k,v in months.items() }
totals = {}
for line in sys.stdin:
    if "redis" in line and "Partial" in line:
        f1, f2 = line.split()[:2]
        w = (months[f1], int(f2))
        totals[w] = totals.get(w, 0) + 1
for k in sorted(totals.keys()):
    print(months_r[k[0]], k[1], totals[k])
and then in Go (I will not quote the 68-line Go version here, to keep this short).
I am now trying to express it in Rust (I had only written some toy examples so far) and I am quite stuck. At first, I had tons of errors, and it is getting a bit better, but now I have one that I cannot fix...
Could someone give me a hand with expressing this in Rust? The Python version is quite short, so it would be nice to have something idiomatic and not too verbose.
Here is where I am so far, but I cannot make any further progress.
use std::array::IntoIter;
use std::collections::HashMap;
use std::io;
use std::io::prelude::*;
use std::iter::FromIterator;
fn main() {
let m = HashMap::<_, _>::from_iter(IntoIter::new([
("Jan", 1),
("Feb", 2),
("Mar", 3),
("Apr", 4),
("May", 5),
("Jun", 6),
("Jul", 7),
("Aug", 8),
("Sep", 9),
("Oct", 10),
("Nov", 11),
("Dec", 12),
]));
let mut totals = HashMap::new();
for l in io::stdin().lock().lines() {
let ul = l.unwrap();
if ul.contains("redis") && ul.contains("Partial") {
let mut wi = ul.split_whitespace();
let f1 = wi.next().unwrap();
let f2 = wi.next().unwrap();
let count = totals.entry((m.get(&f1).unwrap(), f2)).or_insert(0);
*count += 1;
}
}
}
Simple hints would be much appreciated, I am not asking for a full working solution, which is more work (but I will welcome one if it comes, of course).
Thanks a lot!

The problem is that your f2 refers to data owned by the current line string ul. However, ul is dropped at each iteration of the loop and a new one is allocated by the lines() iterator. If you were to insert a slice referring to ul into the totals hashmap, the slice would get invalidated in the next iteration of the loop and the program would crash or malfunction when you later tried to access the deallocated data.
split_whitespace() behaves like that for efficiency: when calling it, you often only need to inspect the strings, and it would be a waste to return freshly allocated copies of everything. Instead, ul.split_whitespace() gives out cheap slices which are effectively views into ul, allowing you to choose whether or not to copy the returned string slices.
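The borrow relationship shows up directly in a function signature. In this minimal sketch (the helper names first_two_fields and first_two_owned are made up for illustration), the first helper returns slices whose lifetime is tied to the input line, while the second pays for two allocations to get values that can outlive it:

```rust
/// Returns the first two whitespace-separated fields as views into `line`.
/// The returned slices borrow from `line` and cannot outlive it.
fn first_two_fields(line: &str) -> Option<(&str, &str)> {
    let mut fields = line.split_whitespace();
    Some((fields.next()?, fields.next()?))
}

/// Same fields, but copied into owned Strings so they can be stored
/// after `line` has been dropped.
fn first_two_owned(line: &str) -> Option<(String, String)> {
    first_two_fields(line).map(|(a, b)| (a.to_string(), b.to_string()))
}

fn main() {
    let mut kept = Vec::new();
    for raw in ["Jan 3 redis Partial", "Feb 14 redis Partial"] {
        // A fresh String per iteration, similar to what lines() hands out.
        let line = raw.to_string();
        if let Some(pair) = first_two_owned(&line) {
            kept.push(pair); // fine: `pair` owns its data
        }
        // Pushing first_two_fields(&line) into `kept` instead would be
        // rejected, because `line` is dropped at the end of each iteration.
    }
    println!("{:?}", kept);
}
```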
The solution is simple: create an owned string from the returned slice using to_string(). For example, this compiles:
fn main() {
let m = HashMap::<_, _>::from_iter(IntoIter::new([
("Jan", 1),
// ...
]));
let mut totals = HashMap::new();
for l in io::stdin().lock().lines() {
let ul = l.unwrap();
if ul.contains("redis") && ul.contains("Partial") {
let mut wi = ul.split_whitespace();
let f1 = wi.next().unwrap();
let f2 = wi.next().unwrap().to_string();
*totals.entry((m.get(f1).unwrap(), f2)).or_insert(0) += 1;
}
}
}
An even simpler option is to do what your Python code does and the Rust translation doesn't, which is to parse f2 as an integer. The integer can be copied by value into the map, and you're no longer required to allocate a copy of f2:
for l in io::stdin().lock().lines() {
let ul = l.unwrap();
if ul.contains("redis") && ul.contains("Partial") {
let mut wi = ul.split_whitespace();
let f1 = wi.next().unwrap();
let f2 = wi.next().unwrap();
let w = (m.get(f1).unwrap(), f2.parse::<u32>().unwrap());
*totals.entry(w).or_insert(0) += 1;
}
}
Finally, the sorting and printing should be achieved with a reasonably straightforward translation of the original Python:
let months_r: HashMap<_, _> = m.iter().map(|(k, v)| (v, k)).collect();
let mut totals_keys: Vec<_> = totals.keys().collect();
totals_keys.sort();
for k in totals_keys {
println!(
"{} {} {}", months_r.get(k.0).unwrap(),
k.1, totals.get(k).unwrap()
);
}
Playground
All in all, 48 lines with the stock rustfmt - longer than Python, but still shorter than Go.
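For comparison, here is one possible way to assemble the pieces into a single program. It is a sketch rather than the version linked above, and it makes a few choices of its own: a BTreeMap replaces the HashMap so the keys come out sorted without a separate sort step, a MONTHS array replaces the two month maps, and lines that do not start with a known month and a number are skipped instead of unwrapped. The name count_by_day is made up for this example.

```rust
use std::collections::BTreeMap;
use std::io::{self, BufRead};

const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Counts matching lines per (month index, day of month).
/// Lines whose first two fields are not a known month and a number
/// are skipped rather than causing a panic.
fn count_by_day<I, S>(lines: I) -> BTreeMap<(usize, u32), u32>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut totals = BTreeMap::new();
    for line in lines {
        let line = line.as_ref();
        if !(line.contains("redis") && line.contains("Partial")) {
            continue;
        }
        let mut fields = line.split_whitespace();
        let month = fields
            .next()
            .and_then(|m| MONTHS.iter().position(|&name| name == m));
        let day = fields.next().and_then(|d| d.parse::<u32>().ok());
        if let (Some(month), Some(day)) = (month, day) {
            *totals.entry((month, day)).or_insert(0) += 1;
        }
    }
    totals
}

fn main() {
    let stdin = io::stdin();
    // Stop at the first read error instead of unwrapping it.
    let lines = stdin.lock().lines().map_while(Result::ok);
    // A BTreeMap iterates in key order, so no explicit sort is needed.
    for ((month, day), count) in count_by_day(lines) {
        println!("{} {} {}", MONTHS[month], day, count);
    }
}
```

Keeping the counting in a function that accepts any iterator of string-like items makes it possible to exercise the logic without going through stdin.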

Related

How to sum elements of Vec<Vec<f64>> together into a Vec<f64>?

I am looking for a "rusty" way to accumulate a Vec<Vec<f64>> into a Vec<f64> such that the 1st element of every inner Vec is summed together, the 2nd element of every inner Vec is summed together, etc., and the results are collected into a Vec. If I just use sum(), fold(), or accumulate(), I believe I will sum the entire 1st Vec into a single element, rather than the 1st element of each inner Vec contained in the 2D Vec.
pub fn main() {
let v1 = vec![1.1, 2.2, 3.3];
let vv = vec![v1; 3];
let desired_result = vec![3.3, 6.6, 9.9];
}
Sometimes it's easy to forget in Rust that the imperative approach exists and is an easy solution.
let mut sums = vec![0.0; vv[0].len()];
for v in vv {
for (i, x) in v.into_iter().enumerate() {
sums[i] += x;
}
}
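The loop above indexes vv[0], which panics if the outer vector is empty. A minimal sketch of the same imperative idea wrapped in a function (sum_columns is a name made up here), assuming that an empty input should give an empty result and that shorter rows simply contribute nothing to the columns they lack:

```rust
/// Sums a slice of rows column by column.
/// Returns an empty Vec for empty input; the result is as wide as the
/// longest row, and shorter rows only add to the columns they have.
fn sum_columns(rows: &[Vec<f64>]) -> Vec<f64> {
    let width = rows.iter().map(Vec::len).max().unwrap_or(0);
    let mut sums = vec![0.0; width];
    for row in rows {
        // zip stops at the end of the shorter side, so no index can go
        // out of bounds.
        for (sum, x) in sums.iter_mut().zip(row) {
            *sum += x;
        }
    }
    sums
}

fn main() {
    let vv = vec![vec![1.0, 2.0, 3.0]; 3];
    println!("{:?}", sum_columns(&vv));
}
```

Taking a slice by reference also leaves the caller's data usable afterwards, unlike the consuming loop.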
While I prefer @orlp's solution, if you're hell-bent on doing this in the most functional way possible, you could do it like this:
let v1 = vec![1.1, 2.2, 3.3];
let vv = vec![v1; 3];
let sums = vec![0.0; vv[0].len()];
let summed = vv.into_iter().fold(sums, |mut sums, v| {
v.into_iter().enumerate().for_each(|(i, x)| sums[i] += x);
sums
});
Also, if you know the size of the inner vectors beforehand (or take it from the first element of the vv vector), you can use a range iterator:
pub fn main() {
let v1 = vec![1.1, 2.2, 3.3];
let v1_len = v1.len();
let vv = vec![v1; 3];
let res: Vec<f64> = (0..v1_len)
.map(|i| vv.iter().map(|v| v.get(i).unwrap()).sum())
.collect();
println!("{res:?}");
}
Playground

Parallelizing nested loops in rust with rayon

I am trying to parallelize a simple nested for loop in Rust with rayon but am unable to:
fn repulsion_force(object: &mut Vec<Node>) {
let v0 = 500.0;
let dx = 0.1;
for i in 0..object.len() {
for j in i + 1..object.len() {
let dir = object[j].position - object[i].position;
let l = dir.length();
let mi = object[i].mass;
let mj = object[j].mass;
let c = (dx / l).powi(13);
let v = dir.normalize() * 3.0 * (v0 / dx) * c;
object[i].current_acceleration -= v / mi;
object[j].current_acceleration += v / mj;
}
}
}
Tried to follow this post and created this:
use rayon::prelude::*;
object.par_iter_mut()
.enumerate()
.zip(object.par_iter_mut().enumerate())
.for_each(|((i, a), (j, b))| {
if j > i {
// code here
}
});
cannot borrow *object as mutable more than once at a time
second mutable borrow occurs here
But it didn't work. My problem is a bit different from the one in the post because I modify two elements in one iteration and try to borrow them both as mutable, which Rust does not like, and I don't like the idea of doing double the calculations when it's not necessary.
Another try was to iterate through Range:
use rayon::prelude::*;
let length = object.len();
(0..length).par_bridge().for_each(|i| {
    (i+1..length).for_each(|j| {
        let dir = object[j].position - object[i].position;
        let l = dir.length();
        let mi = object[i].mass;
        let mj = object[j].mass;
        let c = (dx / l).powi(13);
        let v = dir.normalize() * 3.0 * (v0 / dx) * c;
        object[i].current_acceleration -= v / mi;
        object[j].current_acceleration += v / mj;
    });
});
cannot borrow object as mutable, as it is a captured variable in a Fn closure
This one I honestly don't understand at all, and E0596 isn't much help, since my object is a &mut. I am new to Rust and would appreciate any help!
What you're trying to do is not as trivial as you might imagine :D
But let's give it a shot!
First, let's make a minimal reproducible example; this is the common way to ask questions on Stack Overflow. As you can imagine, we don't know what your code should do, nor do we have the time to try to figure it out.
We would like to get a simple code piece, which fully describes the problem, copy-paste it, run it and derive a solution.
So here's my minimal example:
#[derive(Debug)]
pub struct Node {
value: i32,
other_value: i32,
}
fn repulsion_force(object: &mut [Node]) {
for i in 0..object.len() {
for j in i + 1..object.len() {
let mi = 2 * object[i].value;
let mj = mi + object[j].value;
object[i].other_value -= mi;
object[j].other_value += mj;
}
}
}
Firstly, I've created a simple node type. Secondly, I've simplified the operations.
Note that instead of passing a vector, I'm passing a mutable slice. This form retains more flexibility, in case I might need to pass a slice from an array, for example. Since you're not using push(), there's no need to reference a vector.
Next, let's reformulate the problem for parallel computation.
First, consider the structure of your loops and the access pattern.
You're iterating over all the elements in the slice, but for each i iteration, you're only modifying the objects at [i] and [j > i].
So let's split the slice according to that pattern:
fn repulsion_force(object: &mut [Node]) {
    for i in 0..object.len() {
        // left holds [0..=i], right holds everything after i
        let (left, right) = object.split_at_mut(i + 1);
        let node_i = &mut left[i];
        right.iter_mut().for_each(|node_j| {
            let mi = 2 * node_i.value;
            let mj = mi + node_j.value;
            node_i.other_value -= mi;
            node_j.other_value += mj;
        });
    }
}
By splitting the slice we get two slices: the left slice contains [i] and the right slice contains [j > i]. Next, we rely on an iterator instead of indices for the iteration.
The next step would be to make the inner loop parallel. However, the inner loop modifies node_i at each iteration. That means more than one thread might try to write to node_i at the same time, causing a data race, so the compiler won't allow it. The solution is to add a synchronization mechanism.
For a general type, that might be a mutex. But since you're using standard arithmetic operations, I've opted for an atomic, as these are usually faster.
So we modify the Node type and the inner loop to:
use rayon::prelude::*;
use std::sync::atomic::{AtomicI32, Ordering::Relaxed};

#[derive(Debug)]
pub struct Node {
    value: i32,
    other_value: AtomicI32,
}
fn repulsion_force(object: &mut [Node]) {
    for i in 0..object.len() {
        let (left, right) = object.split_at_mut(i + 1);
        // A shared reference is enough: the atomic is updated through &self
        let node_i = &left[i];
        right.iter_mut().par_bridge().for_each(|node_j| {
            let mi = 2 * node_i.value;
            let mj = mi + node_j.value;
            node_i.other_value.fetch_sub(mi, Relaxed);
            node_j.other_value.fetch_add(mj, Relaxed);
        });
    }
}
You can test the code with this snippet:
fn main() {
// some arbitrary object vector
let mut object: Vec<Node> = (0..100).map(|k| Node { value: k, other_value: AtomicI32::new(k) }).collect();
repulsion_force(&mut object);
println!("{:?}", object);
}
Hope this helps! ;)
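If the accumulated values are floating-point, as in the question, the standard library has no atomic float type, so the atomic approach does not carry over directly. One alternative is to accept the doubled work the question wanted to avoid: each node sums the force from every other node using a read-only snapshot, so no two threads ever write to the same element. The sketch below rests on several assumptions of its own: it uses std::thread::scope instead of rayon so that it runs without external crates, a 1-D f64 position stands in for the question's vector type, and force_on is a hypothetical inverse-square repulsion, not the formula from the question.

```rust
use std::thread;

#[derive(Debug, Clone)]
pub struct Node {
    pub position: f64, // 1-D stand-in for the question's vector type
    pub mass: f64,
    pub current_acceleration: f64,
}

/// Hypothetical repulsive force exerted on `a` by `b`: inverse-square,
/// pointing away from `b`. Like the original, it misbehaves if two nodes
/// share a position (division by zero).
fn force_on(a: &Node, b: &Node) -> f64 {
    let d = b.position - a.position;
    -d.signum() / (d * d)
}

pub fn repulsion_force(nodes: &mut [Node], workers: usize) {
    // Read-only copy of the inputs; every thread may share it freely.
    let snapshot = nodes.to_vec();
    let workers = workers.max(1);
    let chunk_len = ((nodes.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Each thread gets exclusive access to one disjoint chunk.
        for (c, chunk) in nodes.chunks_mut(chunk_len).enumerate() {
            let snapshot = &snapshot;
            scope.spawn(move || {
                for (k, node) in chunk.iter_mut().enumerate() {
                    let i = c * chunk_len + k; // index in the full slice
                    let total: f64 = snapshot
                        .iter()
                        .enumerate()
                        .filter(|&(j, _)| j != i)
                        .map(|(_, other)| force_on(&snapshot[i], other))
                        .sum();
                    node.current_acceleration += total / node.mass;
                }
            });
        }
    });
}

fn main() {
    let mut nodes: Vec<Node> = (0..8)
        .map(|k| Node { position: k as f64, mass: 1.0, current_acceleration: 0.0 })
        .collect();
    repulsion_force(&mut nodes, 4);
    println!("{:?}", nodes);
}
```

Whether the doubled arithmetic is cheaper than the synchronization it removes depends on the number of nodes and cores, so it is worth measuring both.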

How do I get the key associated with the maximum value of a Rust HashMap?

For example, given the data:
2 : 4
1 : 3
5 : 2
The function would return 2 since its value (4) is the highest.
I am doing:
let mut max_val = 0;
let mut max_key = "";
for (k, v) in a_hash_map.iter() {
if *v > max_val {
max_key = k;
max_val = *v;
}
}
Is there a nicer or quicker or simpler way to do this?
Iterate through all the key-value pairs in the hashmap, comparing them by the values, keeping only the key of the maximum:
use std::collections::HashMap;
fn example<K, V>(a_hash_map: &HashMap<K, V>) -> Option<&K>
where
V: Ord,
{
a_hash_map
.iter()
.max_by(|a, b| a.1.cmp(&b.1))
.map(|(k, _v)| k)
}
fn main() {
let map: HashMap<_, _> = vec![(2, 4), (1, 3), (5, 2)].into_iter().collect();
dbg!(example(&map));
}
See also:
How do I create a map from a list in a functional way?
How can min_by_key or max_by_key be used with references to a value created during iteration?
let key_with_max_value = a_hashmap.iter().max_by_key(|entry | entry.1).unwrap();
dbg!(key_with_max_value.0);
You will need to do better error handling. This code just does an unwrap, expecting that there would be at least one element.
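Both approaches above need the values to implement Ord, which floating-point types do not. A minimal sketch for f64 values, assuming the total ordering given by f64::total_cmp is acceptable for your data (key_of_max_f64 is a name made up here), and returning an Option instead of unwrapping:

```rust
use std::collections::HashMap;

/// Returns the key whose f64 value is largest, or None for an empty map.
/// Values are compared with f64::total_cmp, so NaN does not cause a panic.
/// If several keys share the maximum, which one is returned is
/// unspecified, because HashMap iteration order is arbitrary.
fn key_of_max_f64<K>(map: &HashMap<K, f64>) -> Option<&K> {
    map.iter()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(k, _)| k)
}

fn main() {
    let map = HashMap::from([('a', 0.5), ('b', 2.5), ('c', -1.0)]);
    match key_of_max_f64(&map) {
        Some(k) => println!("key with max value: {k}"),
        None => println!("map is empty"),
    }
}
```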
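Perhaps you can try this: if let Some((k, _)) = a_hash_map.iter().max_by_key(|&(_, v)| v) { println!("max: {}", k); } (Note that a_hash_map.keys().max() would return the largest key, not the key associated with the largest value.)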

Is there any way to insert multiple entries into a HashMap at once in Rust?

Is there any way to insert multiple entries into a HashMap at once in Rust? Or to initialize it with multiple entries? Anything other than manually calling insert on every single element you're inserting?
Edit for an example using English letter frequencies:
I basically want:
let frequencies = {
'a': 0.08167,
'b': 0.01492,
...
'z': 0.00074
}
I know I can achieve the same result by doing a for loop like the following, but I want to know if there is a way to do this without creating additional arrays and then looping over them, or a more elegant solution in general.
let mut frequencies = HashMap::new();
let letters = ['a','b','c', ...... 'z'];
let freqs = [0.08167, 0.01492, 0.02782, ......., 0.00074];
for i in 0..26 {
frequencies.insert(letters[i], freqs[i]);
}
For a literal, I could use the answer here, which will probably work fine for this example, but I'm curious whether there's a way to do this without it being a literal, in case this comes up in the future.
Is there any way to insert multiple entries into a HashMap at once in Rust?
Yes, you can extend a HashMap with values from an Iterator, like this:
use std::collections::HashMap;
fn main() {
let mut map = HashMap::new();
map.extend((1..3).map(|n| (format!("{}*2=", n), n * 2)));
map.extend((7..9).map(|n| (format!("{}*2=", n), n * 2)));
println!("{:?}", map); // Prints {"1*2=": 2, "8*2=": 16, "7*2=": 14, "2*2=": 4}.
}
It is even a bit faster than calling the insert manually, because extend uses the size hint provided by the Iterator in order to reserve some space beforehand.
Check out the source code of the method here, in map.rs.
Or to initialize it with multiple entries?
This is possible as well, thanks to HashMap implementing the FromIterator trait. When a collection implements FromIterator, you can use the Iterator::collect shorthand to construct it. Consider the following examples, all of them generating the same map:
use std::collections::HashMap;
fn main() {
let mut map: HashMap<_, _> = (1..3).map(|n| (format!("{}*2=", n), n * 2)).collect();
map.extend((7..9).map(|n| (format!("{}*2=", n), n * 2)));
println!("{:?}", map); // Prints {"1*2=": 2, "8*2=": 16, "7*2=": 14, "2*2=": 4}.
}
use std::collections::HashMap;
fn main() {
let map: HashMap<_, _> = (1..3)
.chain(7..9)
.map(|n| (format!("{}*2=", n), n * 2))
.collect();
println!("{:?}", map); // Prints {"1*2=": 2, "8*2=": 16, "7*2=": 14, "2*2=": 4}.
}
use std::collections::HashMap;
use std::iter::FromIterator;
fn main() {
let iter = (1..3).chain(7..9).map(|n| (format!("{}*2=", n), n * 2));
let map = HashMap::<String, u32>::from_iter(iter);
println!("{:?}", map); // Prints {"1*2=": 2, "8*2=": 16, "7*2=": 14, "2*2=": 4}.
}
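Applied to the two parallel arrays from the question, the same collect approach can be combined with zip, which avoids the index loop. This is a sketch with a made-up function name (build_frequencies), and it only includes a few letters rather than the whole alphabet:

```rust
use std::collections::HashMap;

/// Pairs each letter with the frequency at the same position.
/// zip stops at the shorter input, so a length mismatch silently drops
/// the extra items rather than panicking.
fn build_frequencies(letters: &[char], freqs: &[f64]) -> HashMap<char, f64> {
    letters.iter().copied().zip(freqs.iter().copied()).collect()
}

fn main() {
    // Only three letters here; the rest of the alphabet is omitted.
    let letters = ['a', 'b', 'z'];
    let freqs = [0.08167, 0.01492, 0.00074];
    let frequencies = build_frequencies(&letters, &freqs);
    println!("{:?}", frequencies.get(&'a'));
}
```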
use std::collections::HashMap;
fn main() {
let pairs = [
("a", 1),
("b", 2),
("c", 3),
("z", 50),
];
println!("1. Insert multiple entries into a HashMap at once");
let mut map = HashMap::new();
map.extend(pairs);
println!("map: {map:#?}\n");
println!("2. Initialize with multiple entries");
let map = HashMap::from([
("a", 1),
("b", 2),
("c", 3),
("z", 50),
]);
println!("map: {map:#?}\n");
println!("3. Initialize with multiple entries");
let map = HashMap::from(pairs);
println!("map: {map:#?}\n");
println!("4. Initialize with multiple entries");
let map: HashMap<_, _> = pairs.into();
println!("map: {map:#?}");
}
See the Rust Playground.

How do I concatenate two slices in Rust?

I want to take the x first and last elements from a vector and concatenate them. I have the following code:
fn main() {
let v = (0u64 .. 10).collect::<Vec<_>>();
let l = v.len();
vec![v.iter().take(3), v.iter().skip(l-3)];
}
This gives me the error
error[E0308]: mismatched types
--> <anon>:4:28
|
4 | vec![v.iter().take(3), v.iter().skip(l-3)];
| ^^^^^^^^^^^^^^^^^^ expected struct `std::iter::Take`, found struct `std::iter::Skip`
<anon>:4:5: 4:48 note: in this expansion of vec! (defined in <std macros>)
|
= note: expected type `std::iter::Take<std::slice::Iter<'_, u64>>`
= note: found type `std::iter::Skip<std::slice::Iter<'_, u64>>`
How do I get my vec of 1, 2, 3, 8, 9, 10? I am using Rust 1.12.
Just use .concat() on a slice of slices:
fn main() {
let v = (0u64 .. 10).collect::<Vec<_>>();
let l = v.len();
let first_and_last = [&v[..3], &v[l - 3..]].concat();
println!("{:?}", first_and_last);
// The output is `[0, 1, 2, 7, 8, 9]`
}
This creates a new vector, and it works with an arbitrary number of slices.
(Playground link)
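Slicing with &v[..3] and &v[l - 3..] panics when the vector has fewer than three elements. A minimal sketch of a helper around concat() that guards against this (first_and_last is a made-up name, and returning the whole slice when the two ranges would overlap is a policy chosen for this example, not the only reasonable one):

```rust
/// Returns the first `n` and last `n` elements of `v` as a new Vec.
/// If the two ranges would overlap (v.len() < 2 * n), the whole slice
/// is returned once instead.
fn first_and_last<T: Clone>(v: &[T], n: usize) -> Vec<T> {
    if v.len() < n.saturating_mul(2) {
        return v.to_vec();
    }
    [&v[..n], &v[v.len() - n..]].concat()
}

fn main() {
    let v: Vec<u64> = (1..11).collect();
    println!("{:?}", first_and_last(&v, 3));
}
```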
Ok, first of all, your initial sequence definition is wrong. You say you want 1, 2, 3, 8, 9, 10 as output, so it should look like:
let v = (1u64 .. 11).collect::<Vec<_>>();
Next, you say you want to concatenate slices, so let's actually use slices:
let head = &v[..3];
let tail = &v[l-3..];
At this point, it's really down to which approach you like the most. You can turn those slices into iterators, chain, then collect...
let v2: Vec<_> = head.iter().chain(tail.iter()).collect();
...or make a vec and extend it with the slices directly...
let mut v3 = vec![];
v3.extend_from_slice(head);
v3.extend_from_slice(tail);
...or extend using more general iterators (which will become equivalent in the future with specialisation, but I don't believe it's as efficient just yet)...
let mut v4: Vec<u64> = vec![];
v4.extend(head);
v4.extend(tail);
...or you could use Vec::with_capacity and push in a loop, or do the chained iterator thing, but using extend... but I have to stop at some point.
Full example code:
fn main() {
let v = (1u64 .. 11).collect::<Vec<_>>();
let l = v.len();
let head = &v[..3];
let tail = &v[l-3..];
println!("head: {:?}", head);
println!("tail: {:?}", tail);
let v2: Vec<_> = head.iter().chain(tail.iter()).collect();
println!("v2: {:?}", v2);
let mut v3 = vec![];
v3.extend_from_slice(head);
v3.extend_from_slice(tail);
println!("v3: {:?}", v3);
// Explicit type to help inference.
let mut v4: Vec<u64> = vec![];
v4.extend(head);
v4.extend(tail);
println!("v4: {:?}", v4);
}
You should collect() the results of the take() and extend() them with the collect()ed results of skip():
let mut p1 = v.iter().take(3).collect::<Vec<_>>();
let p2 = v.iter().skip(l-3);
p1.extend(p2);
println!("{:?}", p1);
Edit: as Neikos said, you don't even need to collect the result of skip(), since extend() accepts arguments implementing IntoIterator (which Skip does, as it is an Iterator).
Edit 2: your numbers are a bit off, though; since a Range is left-closed and right-open, you should declare v as follows in order to get 1, 2, 3, 8, 9, 10:
let v = (1u64 .. 11).collect::<Vec<_>>();

Resources