My input data is structured as follows:
label_1
value_1
label_2
value_2
...
And my end goal is to read that data into a HashMap.
My current working approach is to put the even and odd lines into two separate vectors and then read from both vectors to fill the HashMap.
use std::io;
use std::io::prelude::*;
use std::collections::HashMap;
fn main() {
let mut labels: Vec<String> = Vec::new();
let mut values: Vec<String> = Vec::new();
let stdin = io::stdin();
/* Read lines piped from stdin*/
for (i, line) in stdin.lock().lines().enumerate() {
if i % 2 == 0 {
/* store labels (even lines) in labels vector */
labels.push(line.unwrap());
} else {
/* Store values (odd lines) in values vector */
values.push(line.unwrap());
}
}
println!("number of labels: {}", labels.len());
println!("number of values: {}", values.len());
/* Zip labels and values into one iterator */
let double_iter = labels.iter().zip(values.iter());
/* insert (label: value) pairs into hashmap */
let mut records: HashMap<&String, &String> = HashMap::new();
for (label, value) in double_iter {
records.insert(label, value);
}
}
I would like to ask how to achieve this result without going through an intermediate step with vectors.
You can use .tuples() from the itertools crate:
use itertools::Itertools;
use std::io::{stdin, BufRead};
fn main() {
for (label, value) in stdin().lock().lines().tuples() {
println!("{}: {}", label.unwrap(), value.unwrap());
}
}
See also:
This answer on "Are there equivalents to slice::chunks/windows for iterators to loop over pairs, triplets etc?"
You can manually advance an iterator with .next()
use std::io;
use std::io::prelude::*;
use std::collections::HashMap;
fn main() {
let stdin = io::stdin();
let mut lines = stdin.lock().lines();
let mut records = HashMap::new();
while let Some(label) = lines.next() {
let value = lines.next().expect("No value for label");
records.insert(label.unwrap(), value.unwrap());
}
}
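If panicking on bad input is not acceptable, the same loop can return an error instead. The following is only a sketch: the `read_pairs` name and the decision to treat a label without a value as an `InvalidData` error are my assumptions, not part of the answer above.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Hypothetical helper: reads alternating label/value lines into a map.
fn read_pairs<R: BufRead>(reader: R) -> io::Result<HashMap<String, String>> {
    let mut lines = reader.lines();
    let mut records = HashMap::new();
    while let Some(label) = lines.next() {
        let label = label?;
        // Assumption: a label with no following value is reported as an error
        // rather than being silently dropped.
        let value = lines.next().ok_or_else(|| {
            io::Error::new(
                io::ErrorKind::InvalidData,
                format!("no value for label {:?}", label),
            )
        })??;
        records.insert(label, value);
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let records = read_pairs(stdin.lock())?;
    println!("{} records read", records.len());
    Ok(())
}
```

Because the helper takes any `BufRead`, it can be exercised with an in-memory `std::io::Cursor` as well as with stdin.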
How about:
fn main() {
let lines = vec![1,2,3,4,5,6];
let mut records = std::collections::HashMap::new();
for i in (0..lines.len()).step_by(2) {
// Panics if `lines.len()` is odd, because `i + 1` would then be out of bounds
println!("{}{}", lines[i], lines[i + 1]);
records.insert(lines[i], lines[i + 1]);
}
}
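If the lines are already in a slice or Vec, `chunks_exact(2)` might be a safer way to express the same idea, since it never indexes out of bounds. This is a sketch only; `pairs_to_map` is a made-up name and the `i32` element type is just carried over from the example above.

```rust
use std::collections::HashMap;

/// Hypothetical helper: pairs up consecutive elements of a slice.
fn pairs_to_map(lines: &[i32]) -> HashMap<i32, i32> {
    lines
        .chunks_exact(2)
        // Every chunk has exactly two elements, so indexing cannot panic.
        .map(|pair| (pair[0], pair[1]))
        .collect()
}

fn main() {
    let lines = vec![1, 2, 3, 4, 5];
    let records = pairs_to_map(&lines);
    // The trailing 5 has no partner and is skipped instead of causing a panic.
    println!("{:?}", records);
}
```

Note that a dangling last element is dropped silently here, so check `lines.len() % 2` first if that case should be an error.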
I am attempting to relearn data science in Rust.
I have a Vec<String> whose contents use "|" as a field delimiter and "!end" as an end-of-row marker.
What I'd like to end up with is a Vec<Vec<String>> that can be put into a 2D ndarray.
I have this Python code:
file = open('somefile.dat')
lst = []
for line in file:
lst += [line.split('|')]
df = pd.DataFrame(lst)
SAMV2FinalDataFrame = pd.DataFrame(lst,columns=column_names)
And I've recreated it here in Rust:
fn lines_from_file(filename: impl AsRef<Path>) -> Vec<String> {
let file = File::open(filename).expect("no such file");
let buf = BufReader::new(file);
buf.lines()
.map(|l| l.expect("Could not parse line"))
.collect()
}
fn main() {
let lines = lines_from_file(".dat");
let mut new_arr = vec![];
// Here I get an immutable-borrow error for `lines`
for line in lines{
new_arr.push([*line.split("!end")]);
}
// Here I get "expected closure, found str"
let x = lines.split("!end");
let array = Array::from(lines)
What I have: ['1','1','1','!end','2','2','2','!end']
What I need: [['1','1','1'],['2','2','2']]
Edit: also, why does a turbofish disappear when I type it on Stack Overflow?
I think part of the issue you ran into was due to how you worked with arrays. For example, Vec::push only adds a single element, so you would want to use Vec::extend instead. I also ran into a few empty strings, because splitting by "!end" leaves trailing '|' characters on the ends of the substrings. The errors were quite strange; I am not completely sure where the closure came from.
let lines = vec!["1|1|1|!end|2|2|2|!end".to_string()];
let mut new_arr = Vec::new();
// Iterate over &lines so we don't consume lines and it can be used again later
for line in &lines {
new_arr.extend(line.split("!end")
// Remove trailing empty string
.filter(|x| !x.is_empty())
// Convert each &str into a Vec<String>
.map(|x| {
x.split('|')
// Remove empty strings from ends split (Ex split: "|2|2|2|")
.filter(|x| !x.is_empty())
// Convert &str into owned String
.map(|x| x.to_string())
// Turn iterator into Vec<String>
.collect::<Vec<_>>()
}));
}
println!("{:?}", new_arr);
I also came up with this other version which should handle your use case better. The earlier approach dropped all empty strings, while this one should preserve them while correctly handling the "!end".
use std::io::{self, BufRead, BufReader, Read, Cursor};
fn split_data<R: Read>(buffer: &mut R) -> io::Result<Vec<Vec<String>>> {
let mut sections = Vec::new();
let mut current_section = Vec::new();
for line in BufReader::new(buffer).lines() {
for item in line?.split('|') {
if item != "!end" {
current_section.push(item.to_string());
} else {
sections.push(current_section);
current_section = Vec::new();
}
}
}
Ok(sections)
}
In this example, I used Read for easier testing, but it will also work with a file.
let sample_input = b"1|1|1|!end|2|2|2|!end";
println!("{:?}", split_data(&mut Cursor::new(sample_input)));
// Output: Ok([["1", "1", "1"], ["2", "2", "2"]])
// You can also use a file instead (this needs `use std::fs::File;`)
let mut file = File::open("somefile.dat").unwrap();
let solution: Vec<Vec<String>> = split_data(&mut file).unwrap();
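If the whole input is already in memory as one string, a shorter alternative might be to tokenize first and then split the token slice on the "!end" marker. This is a sketch under my own assumptions: `split_sections` is a hypothetical name, empty sections are dropped, and (unlike `split_data` above) trailing items that are not followed by "!end" are kept as a final section.

```rust
/// Hypothetical helper: splits "a|b|!end|c|d|!end" into [[a, b], [c, d]].
fn split_sections(input: &str) -> Vec<Vec<String>> {
    // First break the input into fields, keeping empty fields intact.
    let tokens: Vec<&str> = input.split('|').collect();
    tokens
        // `slice::split` cuts the token list at every "!end" marker.
        .split(|token| *token == "!end")
        // A trailing "!end" leaves one empty section behind; drop it.
        .filter(|section| !section.is_empty())
        .map(|section| section.iter().map(|s| s.to_string()).collect())
        .collect()
}

fn main() {
    let sections = split_sections("1|1|1|!end|2|2|2|!end");
    println!("{:?}", sections);
}
```

Empty fields inside a section (for example "1||1") survive, because only whole empty sections are filtered out.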
I would like to take a vector of characters and duplicate the first letter and the last one.
The only way I managed to do that is with this ugly code:
fn repeat_ends(s: &Vec<char>) -> Vec<char> {
let mut result: Vec<char> = Vec::new();
let first = s.first().unwrap();
let last = s.last().unwrap();
result.push(*first);
result.append(&mut s.clone());
result.push(*last);
result
}
fn main() {
let test: Vec<char> = String::from("Hello world !").chars().collect();
println!("{:?}", repeat_ends(&test)); // "HHello world !!"
}
What would be a better way to do it?
I am not sure if it is "better" but one way is using slice patterns:
fn repeat_ends(s: &Vec<char>) -> Vec<char> {
match s[..] {
[first, .. , last ] => {
let mut out = Vec::with_capacity(s.len() + 2);
out.push(first);
out.extend(s);
out.push(last);
out
},
_ => panic!("whatever"), // or s.clone()
}
}
If it can be mutable:
fn repeat_ends(s: &mut Vec<char>) {
if let [first, .. , last ] = s[..] {
s.insert(0, first);
s.push(last);
}
}
If it's ok to mutate the original vector, this does the job:
fn repeat_ends(s: &mut Vec<char>) {
let first = *s.first().unwrap();
s.insert(0, first);
let last = *s.last().unwrap();
s.push(last);
}
fn main() {
let mut test: Vec<char> = String::from("Hello world !").chars().collect();
repeat_ends(&mut test);
println!("{}", test.into_iter().collect::<String>()); // "HHello world !!"
}
Vec::insert:
Inserts an element at position index within the vector, shifting all elements after it to the right.
This means the function repeat_ends would be O(n) with n being the number of characters in the vector. I'm not sure if there is a more efficient method if you need to use a vector, but I'd be curious to hear it if there is.
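For the non-mutating variant, one option that avoids the shifting done by `insert(0, ..)` is to build the result with an iterator chain. This is only a sketch, still O(n) overall, and it makes its own choices: it takes a `&[char]` rather than a `&Vec<char>`, and it returns an empty vector for empty input instead of panicking.

```rust
/// Sketch: duplicate the first and last element without mutating the input.
fn repeat_ends(s: &[char]) -> Vec<char> {
    s.first()
        // `Option<&char>` iterates over zero or one element.
        .into_iter()
        .chain(s.iter())
        .chain(s.last())
        // Turn the `&char` items into owned `char`s.
        .copied()
        .collect()
}

fn main() {
    let test: Vec<char> = "Hello world !".chars().collect();
    let repeated: String = repeat_ends(&test).into_iter().collect();
    println!("{}", repeated); // HHello world !!
}
```

A one-element input yields that element three times, since it is both the first and the last element.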
I have three structs:
struct A;
struct B;
struct C {
a: Option<A>,
b: Option<B>
}
Given inputs Vec<A> and Vec<B> and some predicate function, I want to create an output Vec<C>, which is a combination of the elements of the inputs, something like the following:
let aVec: Vec<A> = vec![];
let bVec: Vec<B> = vec![];
let mut cVec: Vec<C> = vec![];
for a in aVec {
if let Some(b) = bVec.into_iter().find(predicate) {
cVec.push(C{a: Some(a), b: Some(b)});
}
}
Is there a way to do this without needing B to be copyable? Both input vectors aren't required after the operation. Also, is this possible without the loop?
You can:
Find the index of the element satisfying predicate. (I would use Iterator::position.)
remove or swap_remove the element at the position obtained by the previous step.
push the previously removed element into result.
In code:
struct A {}
struct B {
n: usize,
}
struct C {
a: Option<A>,
b: Option<B>
}
fn main() {
let aVec: Vec<A> = vec![];
let mut bVec: Vec<B> = vec![];
let mut cVec: Vec<C> = vec![];
for a in aVec {
if let Some(idx) = bVec.iter()
.position(|b| b.n==42)
{
let b = bVec.remove(idx); // or swap_remove if ordering does not need to be preserved
cVec.push(C{a: Some(a), b: Some(b)});
}
}
}
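As for doing it without an explicit loop, the same position-then-remove idea can be written with `filter_map`. Treat this as a sketch: the `pair_up` name, the `id` and `n` fields, and a predicate that takes both an `A` and a `B` are assumptions made for illustration.

```rust
struct A {
    id: u32, // hypothetical field, used only by the example predicate
}
struct B {
    n: u32, // hypothetical field, used only by the example predicate
}
struct C {
    a: Option<A>,
    b: Option<B>,
}

/// Moves matching elements out of both vectors; neither A nor B needs Copy or Clone.
fn pair_up<P>(a_vec: Vec<A>, mut b_vec: Vec<B>, predicate: P) -> Vec<C>
where
    P: Fn(&A, &B) -> bool,
{
    a_vec
        .into_iter()
        .filter_map(|a| {
            // Find a matching B by reference first; skip this A if there is none.
            let idx = b_vec.iter().position(|b| predicate(&a, b))?;
            // swap_remove is O(1) but does not preserve the order of b_vec.
            let b = b_vec.swap_remove(idx);
            Some(C { a: Some(a), b: Some(b) })
        })
        .collect()
}

fn main() {
    let a_vec = vec![A { id: 1 }, A { id: 2 }, A { id: 3 }];
    let b_vec = vec![B { n: 2 }, B { n: 1 }, B { n: 5 }];
    let c_vec = pair_up(a_vec, b_vec, |a, b| a.id == b.n);
    for c in &c_vec {
        let a = c.a.as_ref().map(|a| a.id);
        let b = c.b.as_ref().map(|b| b.n);
        println!("{:?} paired with {:?}", a, b);
    }
}
```

Each `B` is used at most once, because it is removed from `b_vec` as soon as it is matched.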
I have the following:
A Vec<&str>.
A &str that may contain $0, $1, etc. referencing the elements in the vector.
I want to get a version of my &str where all occurrences of $i are replaced by the ith element of the vector. So if I have vec!["foo", "bar"] and $0$1, the result would be foobar.
My first naive approach was to iterate over i = 1..N and do a search and replace for every index. However, this is quite an ugly and inefficient solution. Also, it gives undesired output if any of the values in the vector contains the $ character.
Is there a better way to do this in Rust?
This solution is inspired (including copied test cases) by Shepmaster's, but simplifies things by using the replace_all method.
use regex::{Regex, Captures};
fn template_replace(template: &str, values: &[&str]) -> String {
let regex = Regex::new(r#"\$(\d+)"#).unwrap();
regex.replace_all(template, |captures: &Captures| {
values
.get(index(captures))
.unwrap_or(&"")
}).to_string()
}
fn index(captures: &Captures) -> usize {
captures.get(1)
.unwrap()
.as_str()
.parse()
.unwrap()
}
fn main() {
assert_eq!("ab", template_replace("$0$1", &["a", "b"]));
assert_eq!("$1b", template_replace("$0$1", &["$1", "b"]));
assert_eq!("moo", template_replace("moo", &[]));
assert_eq!("abc", template_replace("a$0b$0c", &[""]));
assert_eq!("abcde", template_replace("a$0c$1e", &["b", "d"]));
println!("It works!");
}
I would use a regex:
use regex::Regex; // 1.1.0
fn example(s: &str, vals: &[&str]) -> String {
let r = Regex::new(r#"\$(\d+)"#).unwrap();
let mut start = 0;
let mut new = String::new();
for caps in r.captures_iter(s) {
let m = caps.get(0).expect("Regex group 0 missing");
let d = caps.get(1).expect("Regex group 1 missing");
let d: usize = d.as_str().parse().expect("Could not parse index");
// Copy non-placeholder
new.push_str(&s[start..m.start()]);
// Copy placeholder
new.push_str(&vals[d]);
start = m.end()
}
// Copy non-placeholder
new.push_str(&s[start..]);
new
}
fn main() {
assert_eq!("ab", example("$0$1", &["a", "b"]));
assert_eq!("$1b", example("$0$1", &["$1", "b"]));
assert_eq!("moo", example("moo", &[]));
assert_eq!("abc", example("a$0b$0c", &[""]));
}
See also:
Split a string keeping the separators
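If adding the regex crate is not an option, a hand-written scanner using only the standard library could look roughly like this. It is a sketch with its own assumptions: only ASCII digits count as an index, a `$` without digits is copied through unchanged, and an index that is out of range or too large to parse is replaced by the empty string.

```rust
/// Sketch: replaces `$<digits>` with the matching entry of `values`.
fn template_replace(template: &str, values: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        // Copy everything before the `$` verbatim.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            // A lone `$` is not a placeholder; keep it as it is.
            out.push('$');
            rest = after;
            continue;
        }
        // Assumption: unknown or unparsable indices expand to nothing.
        if let Ok(index) = after[..digits].parse::<usize>() {
            out.push_str(values.get(index).copied().unwrap_or(""));
        }
        rest = &after[digits..];
    }
    // Copy whatever follows the last placeholder.
    out.push_str(rest);
    out
}

fn main() {
    println!("{}", template_replace("$0$1", &["foo", "bar"]));
}
```

Substituted values are never rescanned, so a value that itself contains `$1` is inserted literally.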
How would one translate the following Python, in which several files are read and their contents are used as values to a dictionary (with filename as key), to Rust?
countries = {region: open("{}.txt".format(region)).read() for region in ["canada", "usa", "mexico"]}
My attempt is shown below, but I was wondering if a one-line, idiomatic solution is possible.
use std::{
fs::File,
io::{prelude::*, BufReader},
path::Path,
collections::HashMap,
};
macro_rules! map(
{ $($key:expr => $value:expr),+ } => {
{
let mut m = HashMap::new();
$(
m.insert($key, $value);
)+
m
}
};
);
fn lines_from_file<P>(filename: P) -> Vec<String>
where
P: AsRef<Path>,
{
let file = File::open(filename).expect("no such file");
let buf = BufReader::new(file);
buf.lines()
.map(|l| l.expect("Could not parse line"))
.collect()
}
fn main() {
let _countries = map!{ "canada" => lines_from_file("canada.txt"),
"usa" => lines_from_file("usa.txt"),
"mexico" => lines_from_file("mexico.txt") };
}
Rust's iterators have map/filter/collect methods which are enough to do anything Python's comprehensions can. You can create a HashMap with collect on an iterator of pairs, but collect can return various types of collections, so you may have to specify the type you want.
For example,
use std::collections::HashMap;
fn main() {
println!(
"{:?}",
(1..5).map(|i| (i + i, i * i)).collect::<HashMap<_, _>>()
);
}
Is roughly equivalent to the Python
print({i+i: i*i for i in range(1, 5)})
But translated very literally, it's actually closer to
from builtins import dict
def main():
print("{!r}".format(dict(map(lambda i: (i+i, i*i), range(1, 5)))))
if __name__ == "__main__":
main()
not that you would ever say it that way in Python.
Python's comprehensions are just sugar for a for loop and accumulator. Rust has macros--you can make any sugar you want.
Take this simple Python example,
print({i+i: i*i for i in range(1, 5)})
You could easily re-write this as a loop and accumulator:
map = {}
for i in range(1, 5):
map[i+i] = i*i
print(map)
You could do it basically the same way in Rust.
use std::collections::HashMap;
fn main() {
let mut hm = HashMap::new();
for i in 1..5 {
hm.insert(i + i, i * i);
}
println!("{:?}", hm);
}
You can use a macro to do the rewriting to this form for you.
use std::collections::HashMap;
macro_rules! hashcomp {
($name:ident = $k:expr => $v:expr; for $i:ident in $itr:expr) => {
let mut $name = HashMap::new();
for $i in $itr {
$name.insert($k, $v);
}
};
}
When you use it, the resulting code is much more compact. And this choice of separator tokens makes it resemble the Python.
fn main() {
hashcomp!(hm = i+i => i*i; for i in 1..5);
println!("{:?}", hm);
}
This is just a basic example that can handle a single loop. Python's comprehensions also can have filters and additional loops, but a more advanced macro could probably do that too.
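To give an idea of how a filter could be supported, here is a sketch with a second macro arm. It differs from the macro above in one respect that is my own choice: it expands to an expression that evaluates to the map, rather than declaring a variable, and the `; if ...` syntax is likewise just an assumption about what would read well.

```rust
macro_rules! hashcomp {
    // Plain form: key => value; for var in iterator
    ($k:expr => $v:expr; for $i:ident in $itr:expr) => {{
        let mut map = ::std::collections::HashMap::new();
        for $i in $itr {
            map.insert($k, $v);
        }
        map
    }};
    // Filtered form: key => value; for var in iterator; if condition
    ($k:expr => $v:expr; for $i:ident in $itr:expr; if $cond:expr) => {{
        let mut map = ::std::collections::HashMap::new();
        for $i in $itr {
            // Only insert the entries for which the condition holds.
            if $cond {
                map.insert($k, $v);
            }
        }
        map
    }};
}

fn main() {
    // Roughly {i+i: i*i for i in range(1, 5) if i % 2 == 0} in Python.
    let evens = hashcomp!(i + i => i * i; for i in 1..5; if i % 2 == 0);
    println!("{:?}", evens);
}
```

Nested loops would need further arms (or a recursive macro), which is left out here.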
Without using your own macros I think the closest to
countries = {region: open("{}.txt".format(region)).read() for region in ["canada", "usa", "mexico"]}
in Rust would be
let countries: HashMap<_, _> = ["canada", "usa", "mexico"].iter().map(|&c| {(c,read_to_string(c.to_owned() + ".txt").expect("Error reading file"),)}).collect();
but running a formatter will make it more readable:
let countries: HashMap<_, _> = ["canada", "usa", "mexico"]
.iter()
.map(|&c| {
(
c,
read_to_string(c.to_owned() + ".txt").expect("Error reading file"),
)
})
.collect();
A few notes:
To map a vector, you need to transform it into an iterator, thus iter().map(...).
To transform an iterator back into a tangible data structure, e.g. a HashMap (dict), use .collect(). This is both the advantage and the pain of Rust: it is very strict with types and performs no unexpected conversions.
A complete test program:
use std::collections::HashMap;
use std::fs::{read_to_string, File};
use std::io::Write;
fn create_files() -> std::io::Result<()> {
let regions = [
("canada", "Ottawa"),
("usa", "Washington"),
("mexico", "Mexico city"),
];
for (country, capital) in regions {
let mut file = File::create(country.to_owned() + ".txt")?;
file.write_fmt(format_args!("The capital of {} is {}", country, capital))?;
}
Ok(())
}
fn create_hashmap() -> HashMap<&'static str, String> {
let countries = ["canada", "usa", "mexico"]
.iter()
.map(|&c| {
(
c,
read_to_string(c.to_owned() + ".txt").expect("Error reading file"),
)
})
.collect();
countries
}
fn main() -> std::io::Result<()> {
println!("Hello, world!");
create_files().expect("Failed to create files");
let countries = create_hashmap();
{
println!("{:#?}", countries);
}
Ok(())
}
Note that specifying the type of countries is not needed here, because the return type of create_hashmap() is defined.
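If a missing file should not panic, `.collect()` can also build a `Result` around the HashMap and stop at the first error. The following is a sketch only; `read_regions` and its `dir` parameter are hypothetical names that are not part of the program above.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: maps each region name to the contents of `<dir>/<region>.txt`.
fn read_regions<'a>(dir: &Path, regions: &[&'a str]) -> io::Result<HashMap<&'a str, String>> {
    regions
        .iter()
        .map(|&region| {
            let path = dir.join(format!("{}.txt", region));
            // Keep the io::Result and turn the Ok value into a (key, value) pair.
            fs::read_to_string(path).map(|text| (region, text))
        })
        // Collecting into io::Result<HashMap<..>> returns the first error, if any.
        .collect()
}

fn main() {
    match read_regions(Path::new("."), &["canada", "usa", "mexico"]) {
        Ok(countries) => println!("{:#?}", countries),
        Err(e) => eprintln!("could not read a region file: {}", e),
    }
}
```

The caller then decides what to do about a missing file, instead of the closure calling `.expect(...)`.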