Get next value in ordered tree - rust

Given an ordered map (like BTreeMap) and a value in the map, how do you get the "next" larger (or smaller) value in the map?
There are many ordered tree libraries. It would be amazing to get an answer for:
BTreeMap
rudy::rudymap::RudyMap (https://docs.rs/rudy/latest/rudy/)
art_tree (https://docs.rs/art-tree/latest/art_tree/)
judy arrays

You can get the next smaller or larger key of a BTreeMap by using an iterator from .range(..):
use std::ops::Bound;
use std::collections::BTreeMap;
fn main() {
let map: BTreeMap<&str, i32> = [
("a", 1),
("c", 4),
("h", 0),
("m", 2),
("z", 5),
].into_iter().collect();
let mut iter = map.range(.."h");
let next_smaller = iter.next_back();
let mut iter = map.range::<&str, _>((Bound::Excluded("h"), Bound::Unbounded));
let next_larger = iter.next();
println!("next_smaller: {:?}", next_smaller);
println!("next_larger : {:?}", next_larger);
}
next_smaller: Some(("c", 4))
next_larger : Some(("m", 2))
Getting the next larger key looks a bit gnarly because there is no range syntax like "h".. that excludes the start bound "h". You can of course use "h".. if you want to, but then you'd have to call .next() one extra time to skip over the known element (which only works if "h" is actually in the map).
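If you need this in more than one place, the two range calls can be wrapped in small generic helpers. This is a sketch, and next_larger / next_smaller are hypothetical names, not std APIs. Because both bounds exclude the key, the helpers give the right answer whether or not the key itself is in the map, so you don't need the extra .next() call.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical helper: the entry with the smallest key strictly greater than `key`.
/// `key` does not have to be present in the map.
fn next_larger<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
    // The turbofish fixes the bound type to K, so the compiler doesn't have to guess it.
    map.range::<K, _>((Bound::Excluded(key), Bound::Unbounded)).next()
}

/// Hypothetical helper: the entry with the largest key strictly smaller than `key`.
fn next_smaller<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
    // `..key` already excludes `key`; walk the range from the back.
    map.range::<K, _>(..key).next_back()
}

fn main() {
    let map = BTreeMap::from([("a", 1), ("c", 4), ("h", 0), ("m", 2), ("z", 5)]);
    println!("{:?}", next_smaller(&map, &"h")); // Some(("c", 4))
    println!("{:?}", next_larger(&map, &"h")); // Some(("m", 2))
    println!("{:?}", next_larger(&map, &"n")); // Some(("z", 5)), "n" is not in the map
    println!("{:?}", next_larger(&map, &"z")); // None
}
```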

Related

Vector of iterators and reversed iterators in Rust

I have a Vec with iterators over slices. Some of them are in normal order, some are reversed.
let mut output = Vec::new();
let input = b"1234567890";
let cuts = [(0,1), (1,3), (3, 5)];
for (start, end) in cuts{
output.push(input[start..end].iter());
output.push(input[start..end].iter().rev());
}
But I can't compile it, because of
expected struct `std::slice::Iter`, found struct `Rev`
I understand the error, but I wonder if I can convert both iterators into some common type (without collecting them).
UPD: I even tried to use .iter().rev().rev(), but it's a different type from iter().rev()...
An iterator and a reverse iterator are completely different types and cannot be stored in the same vector directly. They most likely aren't even the same size.
However, they both implement Iterator, of course. So you can store them via indirection, either as mutable trait object references (&mut dyn Iterator, since Iterator::next takes &mut self) or as boxed trait objects (Box<dyn Iterator>). Which one to use depends on your use case: the first has less overhead but only borrows the iterators; the second has a small overhead (one heap allocation per iterator) but owns them.
In your case, as you don't keep the iterator objects around and instead want to store them in the list directly, the proper solution would be to use Box<dyn Iterator>, like this:
fn main() {
let mut output: Vec<Box<dyn Iterator<Item = &u8>>> = Vec::new();
let input = b"1234567890";
let cuts = [(0, 1), (1, 3), (3, 5)];
for (start, end) in cuts {
output.push(Box::new(input[start..end].iter()));
output.push(Box::new(input[start..end].iter().rev()));
}
for iter in output {
println!("{:?}", iter.collect::<Vec<_>>());
}
}
[49]
[49]
[50, 51]
[51, 50]
[52, 53]
[53, 52]
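For comparison, a sketch of the borrowed variant. Because Iterator::next takes &mut self, the Vec has to hold &mut dyn Iterator rather than &dyn Iterator, and every concrete iterator needs its own binding that outlives the Vec, which is why this form fits poorly into the loop from the question. drain_all is a hypothetical helper name.

```rust
/// Hypothetical helper: drains each borrowed iterator into its own Vec<u8>.
fn drain_all<'a>(iters: Vec<&mut dyn Iterator<Item = &'a u8>>) -> Vec<Vec<u8>> {
    // `&mut dyn Iterator` itself implements Iterator, so `copied()` works on it.
    iters.into_iter().map(|it| it.copied().collect()).collect()
}

fn main() {
    let input = b"1234567890";
    // The Vec only borrows, so the concrete iterators must live in local variables.
    let mut forward = input[1..3].iter();
    let mut backward = input[1..3].iter().rev();
    let mut output: Vec<&mut dyn Iterator<Item = &u8>> = Vec::new();
    output.push(&mut forward);
    output.push(&mut backward);
    for items in drain_all(output) {
        println!("{:?}", items); // [50, 51], then [51, 50]
    }
}
```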
Minor nitpick:
Iterators over &u8 are discouraged, because a &u8 is actually larger than a u8. As u8 is Copy, adding .copied() to the iterator reduces the item size at no cost, most likely even with a performance benefit.
The reason is that returning a &u8 is more expensive than returning a u8 (a &u8 is 4 or 8 bytes, depending on the target's pointer width, while a u8 is a single byte). Furthermore, reading a value through a &u8 costs one indirection, while a u8 can be used directly.
So I'd rewrite your code like this:
fn main() {
let mut output: Vec<Box<dyn Iterator<Item = u8>>> = Vec::new();
let input = b"1234567890";
let cuts = [(0, 1), (1, 3), (3, 5)];
for (start, end) in cuts {
output.push(Box::new(input[start..end].iter().copied()));
output.push(Box::new(input[start..end].iter().copied().rev()));
}
for iter in output {
println!("{:?}", iter.collect::<Vec<_>>());
}
}
[49]
[49]
[50, 51]
[51, 50]
[52, 53]
[53, 52]
Of course this is only true for &u8 and not for &mut u8; if you want to mutate the original items, copying them is counterproductive.
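The size difference behind the nitpick can be checked directly with std::mem::size_of. A minimal sketch; the exact reference size depends on the target's pointer width.

```rust
use std::mem::size_of;

fn main() {
    // A u8 is always exactly one byte.
    println!("u8:  {} byte", size_of::<u8>());
    // A reference is pointer-sized: 8 bytes on 64-bit targets, 4 on 32-bit ones.
    println!("&u8: {} bytes", size_of::<&u8>());
}
```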
To convert iterators into a common type you can use dynamic dispatch by storing trait objects into your output vector:
let mut output: Vec<Box<dyn Iterator<Item = _>>> = Vec::new();
let input = b"1234567890";
let cuts = [(0, 1), (1, 3), (3, 5)];
for (start, end) in cuts {
output.push(Box::new(input[start..end].iter()));
output.push(Box::new(input[start..end].iter().rev()));
}
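If you want a common type without dynamic dispatch or a heap allocation per iterator, a small enum that forwards next() is a possible alternative. This swaps in a different technique from the trait-object answers, and SliceIter is a hypothetical name; it only covers the two iterator types from the question.

```rust
/// Hypothetical enum: either a forward or a reversed slice iterator.
enum SliceIter<'a> {
    Forward(std::slice::Iter<'a, u8>),
    Backward(std::iter::Rev<std::slice::Iter<'a, u8>>),
}

impl<'a> Iterator for SliceIter<'a> {
    type Item = &'a u8;

    // Forward each call to whichever iterator this variant holds.
    fn next(&mut self) -> Option<Self::Item> {
        match self {
            SliceIter::Forward(it) => it.next(),
            SliceIter::Backward(it) => it.next(),
        }
    }
}

fn main() {
    let input = b"1234567890";
    let cuts = [(0, 1), (1, 3), (3, 5)];
    let mut output: Vec<SliceIter<'_>> = Vec::new();
    for (start, end) in cuts {
        output.push(SliceIter::Forward(input[start..end].iter()));
        output.push(SliceIter::Backward(input[start..end].iter().rev()));
    }
    for iter in output {
        println!("{:?}", iter.collect::<Vec<_>>());
    }
}
```

The trade-off is that every new iterator type needs another variant, whereas Box<dyn Iterator> accepts any iterator.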

How to convert Vec<T> to HashMap<T,T>?

I have a vector of strings. I need to convert it to a HashMap.
The element at index 0 should become a key and the element at index 1 its value; the same for indices 2 and 3, and so on.
The obvious solution is a for loop that adds the pairs to the HashMap one by one. However, that takes several lines of code, and I am curious whether there is a cleaner one-liner.
I know you can do vec.into_iter().collect(). However, this requires the vector to contain tuples (rather than being a flat vector).
You can use chunks_exact plus a few combinators to achieve this, although I wouldn't recommend squeezing it onto one line for readability reasons. It has one downside: if the vector has an odd number of elements, the extra element is discarded.
use std::collections::HashMap;
fn main() {
// vector with elements
let vector = vec!["a", "b", "c", "d", "e", "f"];
let map = vector.chunks_exact(2) // chunks_exact returns an iterator of slices
.map(|chunk| (chunk[0], chunk[1])) // map slices to tuples
.collect::<HashMap<_, _>>(); // collect into a hashmap
// outputs: Map {"e": "f", "c": "d", "a": "b"}
println!("Map {:?}", map);
}
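If silently dropping the odd element is a problem, ChunksExact::remainder() exposes the unpaired tail before the chunks are consumed. A sketch under that assumption; pairs_to_map is a hypothetical name.

```rust
use std::collections::HashMap;

/// Hypothetical helper: pairs up consecutive elements and also returns the
/// unpaired trailing element (if any) instead of silently dropping it.
fn pairs_to_map<'a>(v: &[&'a str]) -> (HashMap<&'a str, &'a str>, Option<&'a str>) {
    let chunks = v.chunks_exact(2);
    // `remainder()` holds the 0 or 1 leftover elements; read it before consuming `chunks`.
    let leftover = chunks.remainder().first().copied();
    let map = chunks.map(|chunk| (chunk[0], chunk[1])).collect();
    (map, leftover)
}

fn main() {
    let vector = vec!["a", "b", "c", "d", "e"];
    let (map, leftover) = pairs_to_map(&vector);
    println!("Map {:?}, leftover: {:?}", map, leftover);
}
```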
slice::array_chunks is currently unstable, but once it is stabilized I would prefer it over .chunks(2):
#![feature(array_chunks)]
use std::collections::HashMap;
fn main() {
let vec = vec![1, 2, 3, 4, 5, 6, 7];
let map = vec
.array_chunks::<2>()
.map(|[k, v]| (k, v))
.collect::<HashMap<_, _>>();
dbg!(map);
}
Output:
[src/main.rs:11] map = {
1: 2,
3: 4,
5: 6,
}
Using itertools's tuples:
use itertools::Itertools;
use std::collections::HashMap;
fn main() {
let v: Vec<String> = vec!["key1".into(), "val1".into(), "key2".into(), "val2".into()];
// Extra elements are discarded
let hm: HashMap<String, String> = v.into_iter().tuples().collect();
assert_eq!(hm, HashMap::from([("key1".into(), "val1".into()), ("key2".into(), "val2".into())]));
}
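Without the itertools dependency, std::iter::from_fn can pull two items at a time from an owned iterator on stable Rust, moving the Strings into the map without cloning. A sketch; into_pairs_map is a hypothetical name. As with tuples(), a trailing odd element is dropped.

```rust
use std::collections::HashMap;

/// Hypothetical helper: consumes the vector and pairs consecutive elements.
fn into_pairs_map(v: Vec<String>) -> HashMap<String, String> {
    let mut it = v.into_iter();
    // Each call pulls two items; `?` ends the iteration when either one is missing.
    std::iter::from_fn(move || Some((it.next()?, it.next()?))).collect()
}

fn main() {
    let v: Vec<String> = vec!["key1".into(), "val1".into(), "key2".into(), "val2".into()];
    let hm = into_pairs_map(v);
    println!("{:?}", hm.get("key1")); // Some("val1")
}
```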

Is it possible to detect collisions when collecting into a HashMap?

I want to detect and warn about collisions in the logs when collecting an IntoIterator into a HashMap. The current Rust behavior of collecting into a HashMap is to silently overwrite earlier values with the latest one.
fn main() {
let a = vec![(0, 1), (0, 2)];
let b: std::collections::HashMap<_, _> = a.into_iter().collect();
println!("{}", b[&0]);
}
Output:
2
A possible workaround is collecting into a Vec and then manually writing the conversion code, but that introduces extra allocation overhead and unreadable code. Not consuming the original collection and comparing the len()s is less noisy, but it still keeps the data in memory twice (I think) and can't tell where exactly the collision happens.
Is there a more elegant way of handling HashMap collisions?
Depending on what you want to do in the case of a collision, you might be able to use fold (or try_fold) and the entry API to implement your custom functionality:
use std::collections::HashMap;
fn main() {
let a = vec![(0, 1), (0, 3), (0, 2)];
let b: std::collections::HashMap<_, _> = a.into_iter().fold(HashMap::new(), |mut map, (k,v)| {
map.entry(k)
.and_modify(|_| println!("Collision with {}, {}!", k, v))
.or_insert(v);
map
});
println!("{}", b[&0]);
}
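The try_fold variant mentioned in the answer can stop at the first duplicate instead of only printing. A sketch; collect_unique is a hypothetical name. On a collision it returns the key, the value already stored, and the new value, and uses OccupiedEntry::remove_entry to get the stored pair back by value, so neither K nor V needs to be Clone.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: builds a map, failing with (key, old value, new value)
/// at the first duplicate key.
fn collect_unique<K, V, I>(iter: I) -> Result<HashMap<K, V>, (K, V, V)>
where
    K: Eq + Hash,
    I: IntoIterator<Item = (K, V)>,
{
    iter.into_iter().try_fold(HashMap::new(), |mut map, (k, v)| {
        match map.entry(k) {
            // Duplicate: hand back the stored pair together with the new value.
            Entry::Occupied(e) => {
                let (key, old) = e.remove_entry();
                Err((key, old, v))
            }
            Entry::Vacant(e) => {
                e.insert(v);
                Ok(map)
            }
        }
    })
}

fn main() {
    let ok = collect_unique(vec![(0, 1), (1, 2)]);
    println!("{:?}", ok.map(|m| m.len())); // Ok(2)
    let dup = collect_unique(vec![(0, 1), (0, 3), (0, 2)]);
    println!("{:?}", dup.map(|m| m.len())); // Err((0, 1, 3))
}
```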

First try at Rust for logfile analysis

I often perform the same analysis on log files. Initially, I had a small Awk script used with grep and sort. For fun, I rewrote it to Python:
#!/usr/bin/python3
import sys
months = { "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
           "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12 }
months_r = { v:k for k,v in months.items() }
totals = {}
for line in sys.stdin:
    if "redis" in line and "Partial" in line:
        f1, f2 = line.split()[:2]
        w = (months[f1], int(f2))
        totals[w] = totals.get(w, 0) + 1
for k in sorted(totals.keys()):
    print(months_r[k[0]], k[1], totals[k])
and then to Go (68 lines, which I will not quote here to keep the question short).
I am now trying to express it in Rust (I had only written some toy examples so far) and I am quite stuck. At first, I had tons of errors; it is getting a bit better, but now I have one that I cannot fix...
Could someone give me a hand on how to express this in Rust? The Python version is quite short, so it would be nice to have something idiomatic and not too verbose.
Here is where I am so far, but I cannot make any further progress.
use std::array::IntoIter;
use std::collections::HashMap;
use std::io;
use std::io::prelude::*;
use std::iter::FromIterator;
fn main() {
let m = HashMap::<_, _>::from_iter(IntoIter::new([
("Jan", 1),
("Feb", 2),
("Mar", 3),
("Apr", 4),
("May", 5),
("Jun", 6),
("Jul", 7),
("Aug", 8),
("Sep", 9),
("Oct", 10),
("Nov", 11),
("Dec", 12),
]));
let mut totals = HashMap::new();
for l in io::stdin().lock().lines() {
let ul = l.unwrap();
if ul.contains("redis") && ul.contains("Partial") {
let mut wi = ul.split_whitespace();
let f1 = wi.next().unwrap();
let f2 = wi.next().unwrap();
let count = totals.entry((m.get(&f1).unwrap(), f2)).or_insert(0);
*count += 1;
}
}
}
Simple hints would be much appreciated, I am not asking for a full working solution, which is more work (but I will welcome one if it comes, of course).
Thanks a lot!
The problem is that your f2 refers to data owned by the current line string ul. However, ul is dropped at each iteration of the loop and a new one is allocated by the lines() iterator. If you were to insert a slice referring to ul into the totals hashmap, the slice would get invalidated in the next iteration of the loop and the program would crash or malfunction when you later tried to access the deallocated data.
split_whitespace() behaves like that for efficiency: when calling it, you often only need to inspect the strings, and it would be a waste to return freshly allocated copies of everything. Instead, ul.split_whitespace() gives out cheap slices which are effectively views into ul, allowing you to choose whether or not to copy the returned string slices.
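The solution is simple: create an owned String from the returned slice using to_string(). For example, this compiles (with the same imports as in the question):
use std::array::IntoIter;
use std::collections::HashMap;
use std::io;
use std::io::prelude::*;
use std::iter::FromIterator;
fn main() {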
let m = HashMap::<_, _>::from_iter(IntoIter::new([
("Jan", 1),
// ...
]));
let mut totals = HashMap::new();
for l in io::stdin().lock().lines() {
let ul = l.unwrap();
if ul.contains("redis") && ul.contains("Partial") {
let mut wi = ul.split_whitespace();
let f1 = wi.next().unwrap();
let f2 = wi.next().unwrap().to_string();
*totals.entry((m.get(f1).unwrap(), f2)).or_insert(0) += 1;
}
}
}
An even simpler option is to do what your Python code does and the Rust translation doesn't, which is parse f2 as integer. The integer can be copied by value into the map and you're no longer required to allocate a copy of f2:
for l in io::stdin().lock().lines() {
let ul = l.unwrap();
if ul.contains("redis") && ul.contains("Partial") {
let mut wi = ul.split_whitespace();
let f1 = wi.next().unwrap();
let f2 = wi.next().unwrap();
let w = (m.get(f1).unwrap(), f2.parse::<u32>().unwrap());
*totals.entry(w).or_insert(0) += 1;
}
}
Finally, the sorting and printing should be achieved with a reasonably straightforward translation of the original Python:
let months_r: HashMap<_, _> = m.iter().map(|(k, v)| (v, k)).collect();
let mut totals_keys: Vec<_> = totals.keys().collect();
totals_keys.sort();
for k in totals_keys {
println!(
"{} {} {}", months_r.get(k.0).unwrap(),
k.1, totals.get(k).unwrap()
);
}
All in all, 48 lines with the stock rustfmt - longer than Python, but still shorter than Go.
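For reference, one possible complete version. It is a sketch that deliberately differs from the answer in a few places: it reads from any BufRead (so it can be fed a string in tests), it looks months up by their position in a const array instead of a HashMap, it uses a BTreeMap so the keys come out sorted without a separate sort step, and it skips malformed lines instead of panicking. count_partial and MONTHS are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::io::{self, BufRead};

// Month abbreviations; the index (0..12) replaces the HashMap lookup.
const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Counts lines containing both "redis" and "Partial", keyed by
/// (0-based month index, day). BTreeMap iterates in key order.
fn count_partial(input: impl BufRead) -> io::Result<BTreeMap<(usize, u32), u32>> {
    let mut totals = BTreeMap::new();
    for line in input.lines() {
        let line = line?;
        if !(line.contains("redis") && line.contains("Partial")) {
            continue;
        }
        let mut fields = line.split_whitespace();
        let month = fields.next().and_then(|m| MONTHS.iter().position(|&name| name == m));
        let day = fields.next().and_then(|d| d.parse::<u32>().ok());
        // Unlike the Python version, malformed lines are skipped instead of aborting.
        if let (Some(month), Some(day)) = (month, day) {
            // The key holds only owned integers, so nothing borrows `line`.
            *totals.entry((month, day)).or_insert(0) += 1;
        }
    }
    Ok(totals)
}

fn main() -> io::Result<()> {
    let totals = count_partial(io::stdin().lock())?;
    for (&(month, day), count) in &totals {
        println!("{} {} {}", MONTHS[month], day, count);
    }
    Ok(())
}
```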

Is there any way to insert multiple entries into a HashMap at once in Rust?

Is there any way to insert multiple entries into a HashMap at once in Rust? Or to initialize it with multiple entries? Anything other than manually calling insert on every single element you're inserting?
Edit for an example using English letter frequencies:
I basically want:
let frequencies = {
'a': 0.08167,
'b': 0.01492,
...
'z': 0.00074
}
I know I can achieve the same result by doing a for loop like the following, but I want to know if there is a way to do this without creating additional arrays and then looping over them, or a more elegant solution in general.
let mut frequencies = HashMap::new();
let letters = ['a','b','c', ...... 'z'];
let freqs = [0.08167, 0.01492, 0.02782, ......., 0.00074];
for i in 0..26 {
frequencies.insert(letters[i], freqs[i]);
}
For a literal, I could use the answer here, which will probably work fine for this example, but I'm curious whether there's a way to do this without it being a literal, in case this comes up in the future.
Is there any way to insert multiple entries into a HashMap at once in Rust?
Yes, you can extend a HashMap with values from an Iterator, like this:
use std::collections::HashMap;
fn main() {
let mut map = HashMap::new();
map.extend((1..3).map(|n| (format!("{}*2=", n), n * 2)));
map.extend((7..9).map(|n| (format!("{}*2=", n), n * 2)));
println!("{:?}", map); // Prints {"1*2=": 2, "8*2=": 16, "7*2=": 14, "2*2=": 4}.
}
It is even a bit faster than calling the insert manually, because extend uses the size hint provided by the Iterator in order to reserve some space beforehand.
Check out the source code of the method here, in map.rs.
Or to initialize it with multiple entries?
This is possible as well, thanks to HashMap implementing the FromIterator trait. When a collection implements FromIterator, you can use the Iterator::collect shorthand to construct it. Consider the following examples, all of them generating the same map:
use std::collections::HashMap;
fn main() {
let mut map: HashMap<_, _> = (1..3).map(|n| (format!("{}*2=", n), n * 2)).collect();
map.extend((7..9).map(|n| (format!("{}*2=", n), n * 2)));
println!("{:?}", map); // Prints {"1*2=": 2, "8*2=": 16, "7*2=": 14, "2*2=": 4}.
}
use std::collections::HashMap;
fn main() {
let map: HashMap<_, _> = (1..3)
.chain(7..9)
.map(|n| (format!("{}*2=", n), n * 2))
.collect();
println!("{:?}", map); // Prints {"1*2=": 2, "8*2=": 16, "7*2=": 14, "2*2=": 4}.
}
use std::collections::HashMap;
use std::iter::FromIterator;
fn main() {
let iter = (1..3).chain(7..9).map(|n| (format!("{}*2=", n), n * 2));
let map = HashMap::<String, u32>::from_iter(iter);
println!("{:?}", map); // Prints {"1*2=": 2, "8*2=": 16, "7*2=": 14, "2*2=": 4}.
}
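Applied to the two parallel arrays from the question, the same FromIterator approach lets you zip them instead of indexing in a loop. A sketch using a shortened excerpt of the question's arrays; build_frequencies is a hypothetical name. Note that zip stops at the shorter input, so a length mismatch silently truncates.

```rust
use std::collections::HashMap;

/// Hypothetical helper: pairs the i-th letter with the i-th frequency.
fn build_frequencies(letters: &[char], freqs: &[f64]) -> HashMap<char, f64> {
    letters.iter().copied().zip(freqs.iter().copied()).collect()
}

fn main() {
    // Shortened excerpt of the arrays from the question.
    let letters = ['a', 'b', 'z'];
    let freqs = [0.08167, 0.01492, 0.00074];
    let frequencies = build_frequencies(&letters, &freqs);
    println!("{:?}", frequencies.get(&'a')); // Some(0.08167)
}
```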
use std::collections::HashMap;
fn main() {
let pairs = [
("a", 1),
("b", 2),
("c", 3),
("z", 50),
];
println!("1. Insert multiple entries into a HashMap at once");
let mut map = HashMap::new();
map.extend(pairs);
println!("map: {map:#?}\n");
println!("2. Initialize with multiple entries");
let map = HashMap::from([
("a", 1),
("b", 2),
("c", 3),
("z", 50),
]);
println!("map: {map:#?}\n");
println!("3. Initialize with multiple entries");
let map = HashMap::from(pairs);
println!("map: {map:#?}\n");
println!("4. Initialize with multiple entries");
let map: HashMap<_, _> = pairs.into();
println!("map: {map:#?}");
}
