Does Rust have an equivalent to Python's dictionary comprehension syntax? - rust

How would one translate the following Python, in which several files are read and their contents are used as values to a dictionary (with filename as key), to Rust?
countries = {region: open("{}.txt".format(region)).read() for region in ["canada", "usa", "mexico"]}
My attempt is shown below, but I was wondering if a one-line, idiomatic solution is possible.
use std::{
fs::File,
io::{prelude::*, BufReader},
path::Path,
collections::HashMap,
};
macro_rules! map(
{ $($key:expr => $value:expr),+ } => {
{
let mut m = HashMap::new();
$(
m.insert($key, $value);
)+
m
}
};
);
fn lines_from_file<P>(filename: P) -> Vec<String>
where
P: AsRef<Path>,
{
let file = File::open(filename).expect("no such file");
let buf = BufReader::new(file);
buf.lines()
.map(|l| l.expect("Could not parse line"))
.collect()
}
fn main() {
let _countries = map!{ "canada" => lines_from_file("canada.txt"),
"usa" => lines_from_file("usa.txt"),
"mexico" => lines_from_file("mexico.txt") };
}

Rust's iterators have map/filter/collect methods which are enough to do anything Python's comprehensions can. You can create a HashMap with collect on an iterator of pairs, but collect can return various types of collections, so you may have to specify the type you want.
For example,
use std::collections::HashMap;
fn main() {
println!(
"{:?}",
(1..5).map(|i| (i + i, i * i)).collect::<HashMap<_, _>>()
);
}
Is roughly equivalent to the Python
print({i+i: i*i for i in range(1, 5)})
But translated very literally, it's actually closer to
from builtins import dict
def main():
print("{!r}".format(dict(map(lambda i: (i+i, i*i), range(1, 5)))))
if __name__ == "__main__":
main()
not that you would ever say it that way in Python.

Python's comprehensions are just sugar for a for loop and accumulator. Rust has macros--you can make any sugar you want.
Take this simple Python example,
print({i+i: i*i for i in range(1, 5)})
You could easily re-write this as a loop and accumulator:
map = {}
for i in range(1, 5):
map[i+i] = i*i
print(map)
You could do it basically the same way in Rust.
use std::collections::HashMap;
fn main() {
let mut hm = HashMap::new();
for i in 1..5 {
hm.insert(i + i, i * i);
}
println!("{:?}", hm);
}
You can use a macro to do the rewriting to this form for you.
use std::collections::HashMap;
macro_rules! hashcomp {
($name:ident = $k:expr => $v:expr; for $i:ident in $itr:expr) => {
let mut $name = HashMap::new();
for $i in $itr {
$name.insert($k, $v);
}
};
}
When you use it, the resulting code is much more compact. And this choice of separator tokens makes it resemble the Python.
fn main() {
hashcomp!(hm = i+i => i*i; for i in 1..5);
println!("{:?}", hm);
}
This is just a basic example that can handle a single loop. Python's comprehensions also can have filters and additional loops, but a more advanced macro could probably do that too.

Without using your own macros I think the closest to
countries = {region: open("{}.txt".format(region)).read() for region in ["canada", "usa", "mexico"]}
in Rust would be
let countries: HashMap<_, _> = ["canada", "usa", "mexico"].iter().map(|&c| {(c,read_to_string(c.to_owned() + ".txt").expect("Error reading file"),)}).collect();
but running a formatter, will make it more readable:
let countries: HashMap<_, _> = ["canada", "usa", "mexico"]
.iter()
.map(|&c| {
(
c,
read_to_string(c.to_owned() + ".txt").expect("Error reading file"),
)
})
.collect();
A few notes:
To map a vector, you need to transform it into an iterator, thus iter().map(...).
To transform an iterator back into a tangible data structure, e.g. a HashMap (dict), use .collect(). This is the advantage and pain of Rust, it is very strict with types, no unexpected conversions.
A complete test program:
use std::collections::HashMap;
use std::fs::{read_to_string, File};
use std::io::Write;
fn create_files() -> std::io::Result<()> {
let regios = [
("canada", "Ottawa"),
("usa", "Washington"),
("mexico", "Mexico city"),
];
for (country, capital) in regios {
let mut file = File::create(country.to_owned() + ".txt")?;
file.write_fmt(format_args!("The capital of {} is {}", country, capital))?;
}
Ok(())
}
fn create_hashmap() -> HashMap<&'static str, String> {
let countries = ["canada", "usa", "mexico"]
.iter()
.map(|&c| {
(
c,
read_to_string(c.to_owned() + ".txt").expect("Error reading file"),
)
})
.collect();
countries
}
fn main() -> std::io::Result<()> {
println!("Hello, world!");
create_files().expect("Failed to create files");
let countries = create_hashmap();
{
println!("{:#?}", countries);
}
std::io::Result::Ok(())
}
Not that specifying the type of countries is not needed here, because the return type of create_hashmap() is defined.

Related

Splitting a UTF-8 string into chunks

I want to split a UTF-8 string into chunks of equal size. I came up with a solution that does exactly that. Now I want to simplify it removing the first collect call if possible. Is there a way to do it?
fn main() {
let strings = "ĄĆĘŁŃÓŚĆŹŻ"
.chars()
.collect::<Vec<char>>()
.chunks(3)
.map(|chunk| chunk.iter().collect::<String>())
.collect::<Vec<String>>();
println!("{:?}", strings);
}
Playground link
You can use chunks() from Itertools.
use itertools::Itertools; // 0.10.1
fn main() {
let strings = "ĄĆĘŁŃÓŚĆŹŻ"
.chars()
.chunks(3)
.into_iter()
.map(|chunk| chunk.collect::<String>())
.collect::<Vec<String>>();
println!("{:?}", strings);
}
This doesn't require Itertools as a dependency and also does not allocate, as it iterates over slices of the original string:
fn chunks(s: &str, length: usize) -> impl Iterator<Item=&str> {
assert!(length > 0);
let mut indices = s.char_indices().map(|(idx, _)| idx).peekable();
std::iter::from_fn(move || {
let start_idx = match indices.next() {
Some(idx) => idx,
None => return None,
};
for _ in 0..length - 1 {
indices.next();
}
let end_idx = match indices.peek() {
Some(idx) => *idx,
None => s.bytes().len(),
};
Some(&s[start_idx..end_idx])
})
}
fn main() {
let strings = chunks("ĄĆĘŁŃÓŚĆŹŻ", 3).collect::<Vec<&str>>();
println!("{:?}", strings);
}
Having considered the problem with graphemes I ended up with the following solution.
I used the unicode-segmentation crate.
use unicode_segmentation::UnicodeSegmentation;
fn main() {
let strings = "ĄĆĘŁŃÓŚĆŹŻèèèèè"
.graphemes(true)
.collect::<Vec<&str>>()
.chunks(length)
.map(|chunk| chunk.concat())
.collect::<Vec<String>>();
println!("{:?}", strings);
}
I hope some simplifications can still be made.

Read lines in pairs from stdin in rust

My input data is structured as follows:
label_1
value_1
label_2
value_2
...
And my end goal is to read that data into a HashMap
My current working approach is to put even and odd lines in two separate vectors and then read from both vectors to add to Hashmap.
use std::io;
use std::io::prelude::*;
use std::collections::HashMap;
fn main() {
let mut labels: Vec<String> = Vec::new();
let mut values: Vec<String> = Vec::new();
let stdin = io::stdin();
/* Read lines piped from stdin*/
for (i, line) in stdin.lock().lines().enumerate() {
if i % 2 == 0 {
/* store labels (even lines) in labels vector */
labels.push(line.unwrap());
} else {
/* Store values (odd lines) in values vector */
values.push(line.unwrap());
}
}
println!("number of labels: {}", labels.len());
println!("number of values: {}", values.len());
/* Zip labels and values into one iterator */
let double_iter = labels.iter().zip(values.iter());
/* insert (label: value) pairs into hashmap */
let mut records: HashMap<&String, &String> = HashMap::new();
for (label, value) in double_iter {
records.insert(label, value);
}
}
I would like ask how to achieve this result without going though an intermediary step with vectors ?
You can use .tuples() from the itertools crate:
use itertools::Itertools;
use std::io::{stdin, BufRead};
fn main() {
for (label, value) in stdin().lock().lines().tuples() {
println!("{}: {}", label.unwrap(), value.unwrap());
}
}
See also:
This answer on "Are there equivalents to slice::chunks/windows for iterators to loop over pairs, triplets etc?"
You can manually advance an iterator with .next()
use std::io;
use std::io::prelude::*;
use std::collections::HashMap;
fn main() {
let stdin = io::stdin();
let mut lines = stdin.lock().lines();
let mut records = HashMap::new();
while let Some(label) = lines.next() {
let value = lines.next().expect("No value for label");
records.insert(label.unwrap(), value.unwrap());
}
}
Playground
How about:
fn main() {
let lines = vec![1,2,3,4,5,6];
let mut records = std::collections::HashMap::new();
for i in (0..lines.len()).step_by(2) {
// make sure the `i+1` is existed
println!("{}{}", lines[i], lines[i + 1]);
records.insert(lines[i], lines[i + 1]);
}
}

How can I set a struct field value by string name?

Out of habit from interpreted programming languages, I want to rewrite many values based on their key. I assumed that I would store all the information in the struct prepared for this project. So I started iterating:
struct Container {
x: String,
y: String,
z: String
}
impl Container {
// (...)
fn load_data(&self, data: &HashMap<String, String>) {
let valid_keys = vec_of_strings![ // It's simple vector with Strings
"x", "y", "z"
] ;
for key_name in &valid_keys {
if data.contains_key(key_name) {
self[key_name] = Some(data.get(key_name);
// It's invalid of course but
// I do not know how to write it correctly.
// For example, in PHP I would write it like this:
// $this[$key_name] = $data[$key_name];
}
}
}
// (...)
}
Maybe macros? I tried to use them. key_name is always interpreted as it is, I cannot get value of key_name instead.
How can I do this without repeating the code for each value?
With macros, I always advocate starting from the direct code, then seeing what duplication there is. In this case, we'd start with
fn load_data(&mut self, data: &HashMap<String, String>) {
if let Some(v) = data.get("x") {
self.x = v.clone();
}
if let Some(v) = data.get("y") {
self.y = v.clone();
}
if let Some(v) = data.get("z") {
self.z = v.clone();
}
}
Note the number of differences:
The struct must take &mut self.
It's inefficient to check if a value is there and then get it separately.
We need to clone the value because we only only have a reference.
We cannot store an Option in a String.
Once you have your code working, you can see how to abstract things. Always start by trying to use "lighter" abstractions (functions, traits, etc.). Only after exhausting that, I'd start bringing in macros. Let's start by using stringify
if let Some(v) = data.get(stringify!(x)) {
self.x = v.clone();
}
Then you can extract out a macro:
macro_rules! thing {
($this: ident, $data: ident, $($name: ident),+) => {
$(
if let Some(v) = $data.get(stringify!($name)) {
$this.$name = v.clone();
}
)+
};
}
impl Container {
fn load_data(&mut self, data: &HashMap<String, String>) {
thing!(self, data, x, y, z);
}
}
fn main() {
let mut c = Container::default();
let d: HashMap<_, _> = vec![("x".into(), "alpha".into())].into_iter().collect();
c.load_data(&d);
println!("{:?}", c);
}
Full disclosure: I don't think this is a good idea.

Implement slice_shift_char using the std library

I'd like to use the &str method slice_shift_char, but it is marked as unstable in the documentation:
Unstable: awaiting conventions about shifting and slices and may not
be warranted with the existence of the chars and/or char_indices
iterators
What would be a good way to implement this method, with Rust's current std library? So far I have:
fn slice_shift_char(s: &str) -> Option<(char, &str)> {
let mut ixs = s.char_indices();
let next = ixs.next();
match next {
Some((next_pos, ch)) => {
let rest = unsafe {
s.slice_unchecked(next_pos, s.len())
};
Some((ch, rest))
},
None => None
}
}
I'd like to avoid the call to slice_unchecked. I'm using Rust 1.1.
Well, you can look at the source code, and you'll get https://github.com/rust-lang/rust/blob/master/src/libcollections/str.rs#L776-L778 and https://github.com/rust-lang/rust/blob/master/src/libcore/str/mod.rs#L1531-L1539 . The second:
fn slice_shift_char(&self) -> Option<(char, &str)> {
if self.is_empty() {
None
} else {
let ch = self.char_at(0);
let next_s = unsafe { self.slice_unchecked(ch.len_utf8(), self.len()) };
Some((ch, next_s))
}
}
If you don't want the unsafe, you can just use a normal slice:
fn slice_shift_char(&self) -> Option<(char, &str)> {
if self.is_empty() {
None
} else {
let ch = self.char_at(0);
let len = self.len();
let next_s = &self[ch.len_utf8().. len];
Some((ch, next_s))
}
}
The unstable slice_shift_char function has been deprecated since Rust 1.9.0 and removed completely in Rust 1.11.0.
As of Rust 1.4.0, the recommended approach of implementing this is:
Use .chars() to get an iterator of the char content
Iterate on this iterator once to get the first character.
Call .as_str() on that iterator to recover the remaining uniterated string.
fn slice_shift_char(a: &str) -> Option<(char, &str)> {
let mut chars = a.chars();
chars.next().map(|c| (c, chars.as_str()))
}
fn main() {
assert_eq!(slice_shift_char("hello"), Some(('h', "ello")));
assert_eq!(slice_shift_char("ĺḿńóṕ"), Some(('ĺ', "ḿńóṕ")));
assert_eq!(slice_shift_char(""), None);
}

Using the same iterator multiple times in Rust

Editor's note: This code example is from a version of Rust prior to 1.0 when many iterators implemented Copy. Updated versions of this code produce a different errors, but the answers still contain valuable information.
I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek() {
None => return bits,
Some(c) => if c.is_digit() {
bits.push(iter.take_while(|c| c.is_digit()).collect());
} else {
bits.push(iter.take_while(|c| !c.is_digit()).collect());
}
}
}
return bits;
}
However, this doesn't work, looping forever. It seems that it is using a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator over all the each_times. Is this possible?
As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 — editor)
You want to be modifying the iterator each time, that is, for take_while to be operating on an &mut to your iterator. Which is exactly what the .by_ref adaptor is for:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek().map(|c| *c) {
None => return bits,
Some(c) => if c.is_digit(10) {
bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
} else {
bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
},
}
}
}
fn main() {
println!("{:?}", split("123abc456def"))
}
Prints
["123", "bc", "56", "ef"]
However, I imagine this is not correct.
I would actually recommend writing this as a normal for loop, using the char_indices iterator:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(input[start..i].to_string());
is_digit = this_is_digit;
start = i;
}
}
bits.push(input[start..].to_string());
bits
}
This form also allows for doing this with much fewer allocations (that is, the Strings are not required), because each returned value is just a slice into the input, and we can use lifetimes to state this:
pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
let mut bits = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(&input[start..i]);
is_digit = this_is_digit;
start = i;
}
}
bits.push(&input[start..]);
bits
}
All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.
One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<&'a str>.
take_while takes self by value: it consumes the iterator. Before Rust 1.0 it also was unfortunately able to be implicitly copied, leading to the surprising behaviour that you are observing.
You cannot use take_while for what you are wanting for these reasons. You will need to manually unroll your take_while invocations.
Here is one of many possible ways of dealing with this:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
let seeking_digits = match iter.peek() {
None => return bits,
Some(c) => c.is_digit(10),
};
if seeking_digits {
bits.push(take_while(&mut iter, |c| c.is_digit(10)));
} else {
bits.push(take_while(&mut iter, |c| !c.is_digit(10)));
}
}
}
fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String
where
I: Iterator<Item = char>,
F: Fn(&char) -> bool,
{
let mut out = String::new();
loop {
match iter.peek() {
Some(c) if predicate(c) => out.push(*c),
_ => return out,
}
let _ = iter.next();
}
}
fn main() {
println!("{:?}", split("test123test"));
}
This yields a solution with two levels of looping; another valid approach would be to model it as a state machine one level deep only. Ask if you aren’t sure what I mean and I’ll demonstrate.

Resources