Use Rayon to chunk a hash map

Is it possible to use Rayon to chunk the data in a HashMap? I see several chunking methods, but they seem to want to work on a slice (or something similar).
use rayon::prelude::*;
use std::collections::HashMap;
use log::info;

fn main() {
    // Works: `par_chunks` is available on a Vec.
    let foo = vec![1, 2, 3, 4, 5, 6, 7, 8];
    foo.par_chunks(3).for_each(|x| {
        info!("x: {:?}", x);
    });

    // None of the attempts below compile.
    let bar = HashMap::<String, String>::default();
    bar.par_chunks(3).for_each(|x| {
        info!("x: {:?}", x);
    });
    bar.chunks(3).for_each(|x| {
        info!("x: {:?}", x);
    });
    bar.par_iter().chunks(3).for_each(|x| {
        info!("x: {:?}", x);
    });
}
The vec code compiles without error, but all of the HashMap attempts fail with "no method named ..." errors.
Edit: The question about how to use an existing iterator with rayon does not answer this question. This question is how to get an iterator that chunks a hash map.
Answer
One way to chunk a hash map is with chunks() from the itertools crate (note that this iterates sequentially, not in parallel):
use itertools::Itertools;
use std::collections::HashMap;

fn main() {
    let mut m: HashMap<usize, usize> = HashMap::default();
    for n in 0..100 {
        m.insert(n, 2 * n);
    }
    println!("m: {:?}", m);
    let res: HashMap<usize, usize> = (&m)
        .into_iter()
        .chunks(7)
        .into_iter()
        .map(|c| c.map(|(a, b)| (a + b, b - a)))
        .flatten()
        .collect();
    println!("M still usable: {}", m.len());
    println!("res: {:?}", res);
}
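If the goal is only to obtain chunks that a slice-based method can work on, one possible approach (a sketch, not something the answer above states) is to first collect references to the entries into a Vec. The example below uses only the standard library's `chunks`; the helper name `chunk_value_sums` and the `HashMap<String, usize>` type are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

// Hypothetical helper: splits the map's entries into chunks of `size`
// and returns the sum of the values in each chunk.
// Panics if `size` is 0, because `slice::chunks` does.
fn chunk_value_sums(map: &HashMap<String, usize>, size: usize) -> Vec<usize> {
    // Collect references only; the map itself is neither copied nor consumed.
    let entries: Vec<(&String, &usize)> = map.iter().collect();
    entries
        .chunks(size)
        .map(|chunk| chunk.iter().map(|(_, v)| **v).sum::<usize>())
        .collect()
}

fn main() {
    let map: HashMap<String, usize> = (0..8).map(|n| (n.to_string(), n)).collect();
    let sums = chunk_value_sums(&map, 3);
    println!("sums per chunk: {:?}", sums);
    println!("map still usable: {}", map.len());
}
```

Because the entries now live in a Vec, Rayon's `par_chunks` (the method the question already calls on `foo`) should apply to `entries` in the same way, provided the element types are `Sync`. The iteration order of a HashMap is unspecified, so the contents of each chunk may differ between runs.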

Related

How to use common BTreeMap variable in rust(single thread)

Here is my original, simplified code. I want to use one global variable instead of separate variables in each function. What is the recommended way to do this in Rust?
By the way, I have tried both a global and a function parameter; both are a nightmare for a beginner, because the lifetime and type issues are too difficult to solve.
This simple program is a single-threaded tool, so in C no extra mutex would be necessary.
// version 1
use std::collections::BTreeMap;

// Trying but failed
// let mut guess_number = BTreeMap::new();
// | ^^^ expected item

fn read_csv() {
    let mut guess_number = BTreeMap::new();
    let lines = ["Tom,4", "John,6"];
    for line in lines.iter() {
        let split = line.split(",");
        let vec: Vec<_> = split.collect();
        println!("{} {:?}", line, vec);
        let number: u16 = vec[1].trim().parse().unwrap();
        guess_number.insert(vec[0], number);
    }
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}

fn main() {
    let mut guess_number = BTreeMap::new();
    guess_number.insert("Tom", 3);
    guess_number.insert("John", 7);
    if guess_number.contains_key("John") {
        println!("John's number={:?}", guess_number.get("John").unwrap());
    }
    read_csv();
}
To show how hard this is for a beginner, here is the attempt that passes the map as a parameter:
// version 2
use std::collections::BTreeMap;

fn read_csv(guess_number: BTreeMap) {
//                        ^^^^^^^^ expected 2 generic arguments
    let lines = ["Tom,4", "John,6"];
    for line in lines.iter() {
        let split = line.split(",");
        let vec: Vec<_> = split.collect();
        println!("{} {:?}", line, vec);
        let number: u16 = vec[1].trim().parse().unwrap();
        guess_number.insert(vec[0], number);
    }
}

fn main() {
    let mut guess_number = BTreeMap::new();
    guess_number.insert("Tom", 3);
    guess_number.insert("John", 7);
    if guess_number.contains_key("John") {
        println!("John's number={:?}", guess_number.get("John").unwrap());
    }
    read_csv(guess_number);
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}
After some trial and error, I arrived at a type that might work, BTreeMap<&str, i32>:
// version 3
use std::collections::BTreeMap;

fn read_csv(guess_number: &BTreeMap<&str, i32>) {
    // let mut guess_number = BTreeMap::new();
    let lines = ["Tom,4", "John,6"];
    for line in lines.iter() {
        let split = line.split(",");
        let vec: Vec<_> = split.collect();
        println!("{} {:?}", line, vec);
        let number: i32 = vec[1].trim().parse().unwrap();
        guess_number.insert(vec[0], number);
    }
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}

fn main() {
    let mut guess_number: BTreeMap<&str, i32> = BTreeMap::new();
    guess_number.insert("Tom", 3);
    guess_number.insert("John", 7);
    if guess_number.contains_key("John") {
        println!("John's number={:?}", guess_number.get("John").unwrap());
    }
    read_csv(&guess_number);
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}
This causes the following error:
7  | fn read_csv(guess_number: &BTreeMap<&str, i32>) {
   |                           -------------------- help: consider changing this to be a mutable reference: `&mut BTreeMap<&str, i32>`
...
16 |         guess_number.insert(vec[0], number);
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `guess_number` is a `&` reference, so the data it refers to cannot be borrowed as mutable
The final answer (globals seem to be discouraged in Rust, so this uses a mutable reference):
// version 4
use std::collections::BTreeMap;

fn read_csv(guess_number: &mut BTreeMap<&str, i32>) {
    let lines = ["Tom,4", "John,6"];
    for line in lines.iter() {
        let split = line.split(",");
        let vec: Vec<_> = split.collect();
        println!("{} {:?}", line, vec);
        let number: i32 = vec[1].trim().parse().unwrap();
        guess_number.insert(vec[0], number);
    }
}

fn main() {
    let mut guess_number: BTreeMap<&str, i32> = BTreeMap::new();
    guess_number.insert("Tom", 3);
    guess_number.insert("John", 7);
    if guess_number.contains_key("John") {
        println!("John's number={:?}", guess_number.get("John").unwrap());
    }
    read_csv(&mut guess_number);
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}

This question is not specific to BTreeMap; it applies to pretty much all data types, such as numbers, strings, vectors, enums, etc.
If you want to pass a variable (value) from one function to another, you can do that in various ways in Rust. Typically you either move the value or pass a reference to it. Moving is quite specific to Rust and its ownership model. It is essential, so if you are serious about learning Rust, I strongly suggest you read the chapter Understanding Ownership from "the book". Don't get discouraged if you don't understand it after one reading. Spend as much time as needed, as you really can't move forward without this knowledge.
As for global variables, there are very few situations where they should be used. In Rust using global variables is slightly more difficult, compared to most other languages. This thread is quite useful, although you might find it a bit difficult to comprehend. My advice to a beginner would be to first fully understand the basic concept of moving and passing references.
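As a minimal sketch of the options described above (moving a value, passing a shared reference, passing a mutable reference), the following uses only the standard library. The function names `consume`, `inspect` and `add_entries` are hypothetical, and the map uses owned `String` keys, an assumption made here to sidestep the lifetime questions raised in the question.

```rust
use std::collections::BTreeMap;

// Hypothetical helper: takes ownership, so the caller can no longer use the map.
fn consume(map: BTreeMap<String, i32>) -> usize {
    map.len()
}

// Hypothetical helper: borrows immutably; the caller keeps the map.
fn inspect(map: &BTreeMap<String, i32>, name: &str) -> Option<i32> {
    map.get(name).copied()
}

// Hypothetical helper: borrows mutably so that it can insert entries.
fn add_entries(map: &mut BTreeMap<String, i32>, lines: &[&str]) {
    for line in lines {
        // Skip lines that are not of the form "name,number".
        if let Some((name, number)) = line.split_once(',') {
            if let Ok(number) = number.trim().parse::<i32>() {
                map.insert(name.to_string(), number);
            }
        }
    }
}

fn main() {
    let mut guess_number = BTreeMap::new();
    guess_number.insert("Tom".to_string(), 3);
    add_entries(&mut guess_number, &["Tom,4", "John,6"]);
    println!("John: {:?}", inspect(&guess_number, "John"));
    println!("entries: {}", consume(guess_number));
    // `guess_number` has been moved into `consume` and cannot be used here.
}
```

Only `add_entries` needs `&mut`, because it is the only function that changes the map; `inspect` can make do with a shared reference.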

How can I iterate over a sequence multiple times within a function?

I have a function that I would like to take an argument that can be looped over. However I would like to loop over it twice. I tried using the Iterator trait however I can only iterate over it once because it consumes the struct when iterating.
How should I make it so my function can loop twice? I know I could use values: Vec<usize> however I would like to make it generic over any object that is iterable.
Here's an example of what I would like to do: (Please ignore what the loops are actually doing. In my real code I can't condense the two loops into one.)
fn perform<'a, I>(values: I) -> usize
where
    I: Iterator<Item = &'a usize>,
{
    // Loop one: This works.
    let sum = values.sum::<usize>();
    // Loop two: This doesn't work due to `error[E0382]: use of moved value:
    // `values``.
    let max = values.max().unwrap();
    sum * max
}

fn main() {
    let v: Vec<usize> = vec![1, 2, 3, 4];
    let result = perform(v.iter());
    print!("Result: {}", result);
}

You can't iterate over the same iterator twice: iterating consumes it, and iterators are not guaranteed to be restartable or randomly accessible. For example, std::iter::from_fn produces an iterator that is most definitely not randomly accessible.
As @mousetail already mentioned, one way to get around this problem is to require an iterator that implements Clone:
fn perform<'a, I>(values: I) -> usize
where
    I: Iterator<Item = &'a usize> + Clone,
{
    // Loop one: iterate over a clone, leaving `values` untouched.
    let sum = values.clone().sum::<usize>();
    // Loop two: this now works, because `values` has not been moved yet.
    let max = values.max().unwrap();
    sum * max
}

fn main() {
    let v: Vec<usize> = vec![1, 2, 3, 4];
    let result = perform(v.iter());
    println!("Result: {}", result);
}
Result: 40
Although in your specific example, I'd compute both sum and max in the same iteration:
fn perform<'a, I>(values: I) -> usize
where
    I: Iterator<Item = &'a usize>,
{
    let (sum, max) = values.fold((0, usize::MIN), |(sum, max), &el| {
        (sum + el, usize::max(max, el))
    });
    sum * max
}

fn main() {
    let v: Vec<usize> = vec![1, 2, 3, 4];
    let result = perform(v.iter());
    println!("Result: {}", result);
}
Result: 40
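Since the question asks for a function that is generic over "any object that is iterable", a further variation might bound the argument on `IntoIterator + Clone` instead of `Iterator + Clone`. This is only a sketch: the name `perform_twice` is hypothetical, and returning 0 for an empty input is an assumption made here rather than something the answers above specify.

```rust
// Hypothetical variant of `perform`: accepts anything that can be turned
// into an iterator more than once, e.g. `&Vec<usize>`, `&[usize]` or a
// cloneable iterator such as `v.iter()`.
fn perform_twice<'a, C>(values: C) -> usize
where
    C: IntoIterator<Item = &'a usize> + Clone,
{
    // Loop one: iterate over a clone (cheap for shared references).
    let sum = values.clone().into_iter().sum::<usize>();
    // Loop two: consume the original; fall back to 0 for empty input.
    let max = values.into_iter().max().copied().unwrap_or(0);
    sum * max
}

fn main() {
    let v: Vec<usize> = vec![1, 2, 3, 4];
    // Passing a shared reference to the collection.
    println!("Result: {}", perform_twice(&v));
    // Passing a cloneable iterator also satisfies the bound.
    println!("Result: {}", perform_twice(v.iter()));
}
```

Cloning a shared reference such as `&Vec<usize>` copies only the reference, not the underlying data.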

Returning all moved values from thread back to original context in Rust

As far as I have learned, in Rust a spawned thread can only access variables from the surrounding context if I move them into it (borrowing is not enough), which is fine.
Let's consider example:
use std::thread;

fn main() {
    let v1 = vec![1, 2, 3];
    let v2 = vec![4, 5, 6];
    let handle = thread::spawn(move || {
        println!("{:?}", v1);
        println!("{:?}", v2);
        (v1, v2)
    });
    let (v1, v2) = handle.join().unwrap();
    println!("{:?}", v1);
    println!("{:?}", v2);
}
Here v1 and v2 are moved into the thread, and if I want to use them again in the main thread I need to return them from the thread and assign them again using handle.join() (which waits until the thread is done, which is also nice).
My question: is it possible to somehow return all moved values back to their original variables? I can imagine moving many more than two variables, and writing them all out to return and reassign them would look obscure.

If you need to move a lot of variables together, the obvious way to do that is with a struct.
use std::thread;

struct ManyFields {
    v1: Vec<i32>,
    v2: Vec<i32>,
    // ...and many others...
}

fn main() {
    let fields = ManyFields {
        v1: vec![1, 2, 3],
        v2: vec![4, 5, 6],
    };
    let handle = thread::spawn(move || {
        println!("{:?}", fields.v1);
        println!("{:?}", fields.v2);
        fields
    });
    let fields = handle.join().unwrap();
    println!("{:?}", fields.v1);
    println!("{:?}", fields.v2);
    // and many others...
}
Depending on exactly why you needed the thread to take ownership, you may be able to avoid that altogether using scoped threads. A scope introduces an explicit lifetime within which a thread is guaranteed to finish, allowing you to borrow values as long as they outlive the scope.
use crossbeam::scope;

fn main() {
    let v1 = vec![1, 2, 3];
    let v2 = vec![4, 5, 6];
    scope(|scope| {
        scope.spawn(|_| {
            println!("{:?}", v1);
            println!("{:?}", v2);
        });
    })
    .unwrap();
    println!("{:?}", v1);
    println!("{:?}", v2);
}
Since Rust 1.63, you can do this without a third-party crate, because scoped threads are part of std (std::thread::scope).
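For comparison, here is a minimal sketch of the same idea using `std::thread::scope` from the standard library (Rust 1.63 or later). The helper name `sum_both` is hypothetical, and summing the vectors is an assumption chosen so that the threads return something observable.

```rust
use std::thread;

// Hypothetical helper: sums two slices on separate scoped threads while
// only borrowing them.
fn sum_both(v1: &[i32], v2: &[i32]) -> (i32, i32) {
    thread::scope(|s| {
        // The closures borrow `v1` and `v2`; no `move` of the data is needed.
        let h1 = s.spawn(|| v1.iter().sum::<i32>());
        let h2 = s.spawn(|| v2.iter().sum::<i32>());
        // All scoped threads are guaranteed to finish before `scope` returns.
        (h1.join().unwrap(), h2.join().unwrap())
    })
}

fn main() {
    let v1 = vec![1, 2, 3];
    let v2 = vec![4, 5, 6];
    let sums = sum_both(&v1, &v2);
    println!("{:?}", sums);
    // Both vectors are still owned by `main` and usable here.
    println!("{:?}", v1);
    println!("{:?}", v2);
}
```

Unlike the crossbeam version, the closure passed to `spawn` takes no argument, and `scope` returns the closure's value directly rather than a `Result`.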

You can easily create a helper function to return them automatically. You can even use a macro to avoid the re-assignment:
macro_rules! spawn_with_data {
    {
        | $( $captured:ident ),* $(,)? | $code:expr
    } => {
        let ( $( $captured, )* ) = ::std::thread::spawn(move || {
            $code;
            ( $( $captured, )* )
        }).join().unwrap();
    };
}

fn main() {
    let v1 = vec![1, 2, 3];
    let v2 = vec![4, 5, 6];
    spawn_with_data!(|v1, v2| {
        println!("{:?}", v1);
        println!("{:?}", v2);
    });
    println!("{:?}", v1);
    println!("{:?}", v2);
}
But I agree using scoped threads is better here.

Splitting a UTF-8 string into chunks

I want to split a UTF-8 string into chunks of equal size. I came up with a solution that does exactly that. Now I want to simplify it by removing the first collect call, if possible. Is there a way to do it?
fn main() {
    let strings = "ĄĆĘŁŃÓŚĆŹŻ"
        .chars()
        .collect::<Vec<char>>()
        .chunks(3)
        .map(|chunk| chunk.iter().collect::<String>())
        .collect::<Vec<String>>();
    println!("{:?}", strings);
}

You can use chunks() from Itertools.
use itertools::Itertools; // 0.10.1

fn main() {
    let strings = "ĄĆĘŁŃÓŚĆŹŻ"
        .chars()
        .chunks(3)
        .into_iter()
        .map(|chunk| chunk.collect::<String>())
        .collect::<Vec<String>>();
    println!("{:?}", strings);
}

This doesn't require Itertools as a dependency and also does not allocate, as it iterates over slices of the original string:
fn chunks(s: &str, length: usize) -> impl Iterator<Item = &str> {
    assert!(length > 0);
    // Byte offsets at which each character starts.
    let mut indices = s.char_indices().map(|(idx, _)| idx).peekable();
    std::iter::from_fn(move || {
        // Start of this chunk; `?` ends the iteration once the string is exhausted.
        let start_idx = indices.next()?;
        // Skip the remaining characters of this chunk.
        for _ in 0..length - 1 {
            indices.next();
        }
        // The chunk ends where the next one starts, or at the end of the string.
        let end_idx = match indices.peek() {
            Some(idx) => *idx,
            None => s.len(),
        };
        Some(&s[start_idx..end_idx])
    })
}

fn main() {
    let strings = chunks("ĄĆĘŁŃÓŚĆŹŻ", 3).collect::<Vec<&str>>();
    println!("{:?}", strings);
}

Having considered the problem with graphemes I ended up with the following solution.
I used the unicode-segmentation crate.
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    // Number of graphemes per chunk.
    let length = 3;
    let strings = "ĄĆĘŁŃÓŚĆŹŻèèèèè"
        .graphemes(true)
        .collect::<Vec<&str>>()
        .chunks(length)
        .map(|chunk| chunk.concat())
        .collect::<Vec<String>>();
    println!("{:?}", strings);
}
I hope some simplifications can still be made.
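The grapheme problem mentioned above can be illustrated with the standard library alone. The sketch below reuses the char-based chunking from the question in a hypothetical helper `char_chunks`; the input string, an "e" followed by the combining acute accent U+0301, is an assumption chosen to show a user-perceived character being split across chunks.

```rust
// Hypothetical helper: the char-based chunking from the question.
// Panics if `size` is 0, because `slice::chunks` does.
fn char_chunks(s: &str, size: usize) -> Vec<String> {
    s.chars()
        .collect::<Vec<char>>()
        .chunks(size)
        .map(|chunk| chunk.iter().collect::<String>())
        .collect()
}

fn main() {
    // "é" written as 'e' followed by U+0301 (combining acute accent):
    // one user-perceived character, but two `char`s.
    let s = "ae\u{301}";
    println!("number of chars: {}", s.chars().count());
    // With chunks of two chars, the accent is separated from its base letter.
    println!("{:?}", char_chunks(s, 2));
}
```

Chunking by grapheme cluster, as the unicode-segmentation version does, keeps the base letter and its accent in the same chunk.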

How can I replace `.unwrap()` with `?` when mapping over an `ndarray::Array`?

I'd like to remove the use of .unwrap() from code which maps over an ndarray::Array and use a Result type for get_data() instead.
extern crate ndarray;
use ndarray::prelude::*;
use std::convert::TryFrom;
use std::error::Error;

fn get_data() -> Array2<usize> {
    // In actual code, "a" comes from an external source, and the type
    // is predetermined
    let a: Array2<i32> = arr2(&[[1, 2, 3], [4, 5, 6]]);
    let b: Array2<usize> = a.map(|x| usize::try_from(*x).unwrap());
    b
}

fn main() -> Result<(), Box<dyn Error>> {
    let a = get_data();
    println!("{:?}", a);
    Ok(())
}
For Vec, I've found this trick: How do I stop iteration and return an error when Iterator::map returns a Result::Err?.
However, this does not work with Arrays (collect isn't defined, and the semantics don't quite match up, since ndarray::Array defines a block of primitive types, which (AFAIU) can't hold Results).
Is there a nice way to handle this?

A native try_map implementation from ndarray would be ideal. It could short-circuit the computation and return as soon as an error occurs. It would also be more composable.
Short of that, there is nothing wrong with a good old mutable sentinel variable:
extern crate ndarray;
use ndarray::prelude::*;
use std::convert::TryFrom;
use std::error::Error;
use std::num::TryFromIntError;

fn get_data() -> Result<Array2<usize>, TryFromIntError> {
    let mut err = None;
    let a: Array2<i32> = arr2(&[[1, 2, 3], [4, 5, 6]]);
    let b: Array2<usize> = a.map(|&x| {
        usize::try_from(x).unwrap_or_else(|e| {
            err = Some(e);
            Default::default()
        })
    });
    err.map_or(Ok(b), Err)
}

fn main() -> Result<(), Box<dyn Error>> {
    let a = get_data()?;
    println!("{:?}", a);
    Ok(())
}
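For reference, the Vec trick mentioned in the question relies on `collect` being able to build a `Result<Vec<_>, _>` that stops at the first error. The sketch below shows it with the standard library only, on a flat slice; the helper name `try_convert` is hypothetical. Applying it to an ndarray array would additionally require flattening the elements and rebuilding the array from the resulting Vec (for example with `Array2::from_shape_vec`), which is not shown or tested here.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

// Hypothetical helper: converts every element, returning the first
// conversion error instead of panicking.
fn try_convert(flat: &[i32]) -> Result<Vec<usize>, TryFromIntError> {
    // Collecting an iterator of `Result`s into a `Result<Vec<_>, _>`
    // short-circuits on the first `Err`.
    flat.iter().map(|&x| usize::try_from(x)).collect()
}

fn main() {
    println!("{:?}", try_convert(&[1, 2, 3, 4, 5, 6]));
    // A negative value cannot be represented as `usize`.
    println!("{:?}", try_convert(&[1, -2, 3]).is_err());
}
```

Unlike the sentinel approach, this stops at the first failing element, at the cost of an intermediate Vec.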
