Populating a HashMap with a vector of string slices in Rust

I've been pulling my hair out with this one.
I apologize in advance if it's a poorly worded question.
So, I have a HashMap in the outer scope and want to populate it with string slices.
// HashMap declaration.
let mut words: std::collections::HashMap<&str, Vec<&str>> = std::collections::HashMap::new();
for file_name in ["conjunctions", "nouns", "verbs"].iter() {
    // Reading some file.
    let path = format!("../wordlists/{file_name}.txt");
    let file_content = std::fs::read_to_string(&path);
    let fc = match file_content {
        Ok(file_content) => file_content,
        Err(_) => panic!("Failed to read the file: {path}"),
    };
    let wordlist_vec: Vec<&str> = fc.split("\n").collect();
    words.insert(file_name, wordlist_vec);
}
println!("{:?}", words["conjunctions"]);
// Using it outside the above scope throws an error: `fc` was dropped while still borrowed.
So basically, my question is: how can I use the hash map outside the scope of the loop above?
I think the issue comes from using string slices (split returns slices, I guess), but I'm not too sure.

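You simply need to use owned Strings instead of &strs. The slices in wordlist_vec borrow from fc, which is dropped at the end of each loop iteration, so the map has to own its data to outlive the loop.
use std::collections::HashMap;
let mut words: HashMap<String, Vec<String>> = HashMap::new();
// ...
// We use map to change the elements of the iterator to owned Strings.
let wordlist_vec: Vec<String> = fc.split("\n").map(String::from).collect();
words.insert(file_name.to_string(), wordlist_vec);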
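For a version that runs without the wordlist files, here is a minimal sketch of the same idea. The helper name build_words and the in-memory sample contents are assumptions made for illustration, and it uses lines() rather than split("\n"), which also copes with "\r\n" line endings.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds the map from (name, contents) pairs so the
/// example runs without the question's `../wordlists/*.txt` files.
fn build_words(sources: &[(&str, &str)]) -> HashMap<String, Vec<String>> {
    let mut words = HashMap::new();
    for (name, contents) in sources {
        // `contents` stands in for the `fc` String read from disk.
        // Each slice is copied into an owned String, so nothing in the map
        // borrows from `contents` once this iteration ends.
        let list: Vec<String> = contents.lines().map(String::from).collect();
        words.insert(name.to_string(), list);
    }
    words
}

fn main() {
    let words = build_words(&[
        ("conjunctions", "and\nbut\nor"),
        ("nouns", "cat\ndog"),
    ]);
    // The map is usable here, well outside the loop that filled it.
    println!("{:?}", words["conjunctions"]);
}
```

The cost is one allocation per word; if that matters, the alternative is to keep every file's String alive for as long as the map and store slices into them.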

Related

Splitting a Vec of strings into Vec<Vec<String>>

I am attempting to relearn data science in Rust.
I have a Vec<String> whose data uses "|" as the field delimiter and "!end" as the end-of-row marker.
What I'd like to end up with is a Vec<Vec<String>> that can be put into a 2D ndarray.
I have this Python code:
file = open('somefile.dat')
lst = []
for line in file:
    lst += [line.split('|')]
df = pd.DataFrame(lst)
SAMV2FinalDataFrame = pd.DataFrame(lst,columns=column_names)
And I've recreated it here in Rust:
fn lines_from_file(filename: impl AsRef<Path>) -> Vec<String> {
    let file = File::open(filename).expect("no such file");
    let buf = BufReader::new(file);
    buf.lines()
        .map(|l| l.expect("Could not parse line"))
        .collect()
}
fn main() {
    let lines = lines_from_file(".dat");
    let mut new_arr = vec![];
    // Here I get an immutable borrow error on `lines`
    for line in lines {
        new_arr.push([*line.split("!end")]);
    }
    // Here I get "expected closure, found str"
    let x = lines.split("!end");
    let array = Array::from(lines);
}
What I have: ['1','1','1','!end','2','2','2','!end']
What I need: [['1','1','1'],['2','2','2']]
Edit: also, why does the turbofish disappear when I type it on Stack Overflow?
I think part of the issue you ran into was due to how you worked with arrays. For example, Vec::push only adds a single element, so you would want to use Vec::extend instead. I also ran into a few empty strings, because splitting by "!end" leaves trailing '|' characters on the ends of the substrings. The errors were quite strange; I am not completely sure where the closure came from.
let lines = vec!["1|1|1|!end|2|2|2|!end".to_string()];
let mut new_arr = Vec::new();
// Iterate over &lines so we don't consume lines and it can be used again later
for line in &lines {
    new_arr.extend(
        line.split("!end")
            // Remove trailing empty string
            .filter(|x| !x.is_empty())
            // Convert each &str into a Vec<String>
            .map(|x| {
                x.split('|')
                    // Remove empty strings from ends split (Ex split: "|2|2|2|")
                    .filter(|x| !x.is_empty())
                    // Convert &str into owned String
                    .map(|x| x.to_string())
                    // Turn iterator into Vec<String>
                    .collect::<Vec<_>>()
            }),
    );
}
println!("{:?}", new_arr);
I also came up with this other version which should handle your use case better. The earlier approach dropped all empty strings, while this one should preserve them while correctly handling the "!end".
use std::io::{self, BufRead, BufReader, Read, Cursor};
fn split_data<R: Read>(buffer: &mut R) -> io::Result<Vec<Vec<String>>> {
    let mut sections = Vec::new();
    let mut current_section = Vec::new();
    for line in BufReader::new(buffer).lines() {
        for item in line?.split('|') {
            if item != "!end" {
                current_section.push(item.to_string());
            } else {
                sections.push(current_section);
                current_section = Vec::new();
            }
        }
    }
    Ok(sections)
}
In this example, I used Read for easier testing, but it will also work with a file.
let sample_input = b"1|1|1|!end|2|2|2|!end";
println!("{:?}", split_data(&mut Cursor::new(sample_input)));
// Output: Ok([["1", "1", "1"], ["2", "2", "2"]])
// You can also use a file instead (this needs `use std::fs::File;`)
let mut file = File::open("somefile.dat").expect("no such file");
let solution: Vec<Vec<String>> = split_data(&mut file).unwrap();
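To make the difference between the two versions concrete, here is a minimal sketch that works on an in-memory &str instead of a reader. The name split_sections is hypothetical, and unlike split_data it also keeps a final section that is not terminated by "!end", which is an assumption about the desired behavior rather than something the original answer does.

```rust
/// Hypothetical helper: splits `input` into sections terminated by "!end",
/// with fields separated by '|'. Empty fields are preserved.
fn split_sections(input: &str) -> Vec<Vec<String>> {
    let mut sections = Vec::new();
    let mut current = Vec::new();
    for item in input.lines().flat_map(|line| line.split('|')) {
        if item == "!end" {
            // Move the finished section out and leave an empty Vec behind.
            sections.push(std::mem::take(&mut current));
        } else {
            current.push(item.to_string());
        }
    }
    // Assumption: keep a trailing section even if it lacks a closing "!end".
    if !current.is_empty() {
        sections.push(current);
    }
    sections
}

fn main() {
    // The empty field between the two '|' characters survives.
    let sections = split_sections("1||1|!end|2|2|2|!end");
    println!("{:?}", sections); // prints [["1", "", "1"], ["2", "2", "2"]]
}
```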

How to move a range of elements from BytesMut?

I have a method that takes a mutable instance of BytesMut. I want to move chunks of it into other instances of BytesMut but am not sure about the syntax to do so. Are there any examples out there?
You could slice the original buf with a range to copy bytes into another buffer, or call split_off at some offset to move the tail out of the original without copying it. For example:
use bytes::{BufMut, BytesMut};
fn main() {
    let mut buf = BytesMut::with_capacity(64);
    let mut buf_to = BytesMut::with_capacity(64);
    buf.put_u8(b't');
    buf.put_u8(b'e');
    buf.put_u8(b's');
    buf.put_u8(b't');
    // copy the last 2 bytes (buf itself is left unchanged)
    buf_to.put(&buf[2..]);
    println!("{:#?}", buf_to); // b"st"
    // You can also split_off the original value, which moves the tail out of buf
    let another_buf = buf.split_off(2);
    println!("{:#?}", another_buf); // b"st"
    println!("{:#?}", buf); // b"te"
}

How do I modify Rc<RefCell> from inside the closure?

I am trying to pass RefCell to a function in a closure and then modify the same variable from inside the closure. Here is my code:
let path: Rc<RefCell<Option<Vec<PathBuf>>>> = Rc::new(RefCell::new(None));
...
//valid value assigned to path
...
let cloned_path = path.clone();
button_run.connect_clicked(move |_| {
    let to_remove: usize = open_dir(&mut cloned_path.borrow_mut().deref_mut());
    // Here I need to remove "to_remove" index from cloned_path
});
// Choose a random directory from Vec and open it. Returns opened index.
fn open_dir(path_two: &mut Option<Vec<PathBuf>>) -> usize {
    let vec = path_two.clone();
    let vec_length = vec.unwrap().len();
    let mut rng = thread_rng();
    let rand_number = rng.gen_range(0, vec_length);
    let p: &str = &*path_two.clone().expect("8")[rand_number].to_str().unwrap().to_string();
    Command::new("explorer.exe").arg(p).output();
    rand_number.clone()
}
First I thought that since my open_dir() function accepts &mut, I can modify the vector inside the function. But no matter what I tried I kept getting cannot move out of borrowed content error.
Then I thought - ok, I can return the index from the function and access cloned_path from the closure itself. But the only code that I could get to compile is
button_run.connect_clicked(move |_| {
    let to_remove: usize = open_dir(&mut cloned_path.borrow_mut().deref_mut());
    let x = &*cloned_path.borrow_mut().clone().unwrap().remove(to_remove);
});
It works, but it removes from a cloned version of cloned_path, leaving the original unaffected. Is there a way to access cloned_path directly to modify its contents, and if there is one, how do I approach this task?
The main way to modify the contents of an enum value (and Option is an enum) is pattern matching:
fn do_something(path_two: &mut Option<Vec<PathBuf>>) {
    if let Some(ref mut paths) = *path_two {
        paths.push(Path::new("abcde").to_path_buf());
    }
}
Note that the paths pattern variable is bound with the ref mut qualifier, which means it will be of type &mut Vec<PathBuf>: a mutable reference to the internals of the option, which is exactly what you need to modify the vector if it is present.
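Applied to the removal case from the question, the same pattern-matching idea might look like the sketch below. The helper remove_at and the plain closure standing in for the button callback are assumptions for illustration; the point is that matching on the RefMut guard reaches the original vector instead of a clone.

```rust
use std::cell::RefCell;
use std::path::PathBuf;
use std::rc::Rc;

/// Hypothetical helper: removes the entry at `index` from the shared vector,
/// if the option is populated and the index is in range.
fn remove_at(shared: &Rc<RefCell<Option<Vec<PathBuf>>>>, index: usize) -> Option<PathBuf> {
    // `borrow_mut()` returns a guard; `&mut *guard` is `&mut Option<Vec<PathBuf>>`.
    let mut guard = shared.borrow_mut();
    match &mut *guard {
        // `paths` is `&mut Vec<PathBuf>` pointing into the shared value, not a clone.
        Some(paths) if index < paths.len() => Some(paths.remove(index)),
        _ => None,
    }
}

fn main() {
    let path = Rc::new(RefCell::new(Some(vec![
        PathBuf::from("a"),
        PathBuf::from("b"),
        PathBuf::from("c"),
    ])));
    let cloned_path = Rc::clone(&path);

    // Stands in for the `connect_clicked` callback from the question.
    let on_click = move || remove_at(&cloned_path, 1);
    let removed = on_click();

    println!("removed: {:?}", removed);
    // The original `path` sees the change because both Rcs share one RefCell.
    println!("remaining: {:?}", path.borrow());
}
```

Keeping the guard in a local variable also makes it obvious that only one mutable borrow of the RefCell is alive at a time.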

Default mutable value from HashMap

Suppose I have a HashMap and I want to get a mutable reference to an entry, or if that entry does not exist I want a mutable reference to a new object, how can I do it? I've tried using unwrap_or(), something like this:
fn foo() {
let mut map: HashMap<&str, Vec<&str>> = HashMap::new();
let mut ref = map.get_mut("whatever").unwrap_or( &mut Vec::<&str>::new() );
// Modify ref.
}
But that doesn't work because the lifetime of the Vec isn't long enough. Is there any way to tell Rust that I want the returned Vec to have the same lifetime as foo()? I mean there is this obvious solution but I feel like there should be a better way:
fn foo() {
let mut map: HashMap<&str, Vec<&str>> = HashMap::new();
let mut dummy: Vec<&str> = Vec::new();
let mut ref = map.get_mut("whatever").unwrap_or( &dummy );
// Modify ref.
}
As mentioned by Shepmaster, here is an example of using the entry pattern. It seems verbose at first, but it avoids creating a vector unless you actually need it. I'm sure you could write a generic function around this to cut down on the chatter :)
use std::collections::HashMap;
use std::collections::hash_map::Entry::{Occupied, Vacant};
fn foo() {
    let mut map = HashMap::<&str, Vec<&str>>::new();
    let result = match map.entry("whatever") {
        Vacant(entry) => entry.insert(Vec::new()),
        Occupied(entry) => entry.into_mut(),
    };
    // Do the work
    result.push("One thing");
    result.push("Then another");
}
This can also be shortened with or_insert, as I just discovered!
use std::collections::HashMap;
fn foo() {
    let mut map = HashMap::<&str, Vec<&str>>::new();
    let result = map.entry("whatever").or_insert(Vec::new());
    // Do the work
    result.push("One thing");
    result.push("Then another");
}
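On current stable Rust the entry API also offers or_default and or_insert_with, which build the default value only when the key is missing. A minimal sketch, where push_item is a hypothetical helper name:

```rust
use std::collections::HashMap;

/// Hypothetical helper: appends `item` to the list stored under `key`,
/// creating an empty list first if the key is missing.
fn push_item<'a>(map: &mut HashMap<&'a str, Vec<&'a str>>, key: &'a str, item: &'a str) {
    // `or_default()` inserts `Vec::default()` only when the entry is vacant
    // and returns `&mut Vec<&str>` either way.
    map.entry(key).or_default().push(item);
}

fn main() {
    let mut map = HashMap::new();
    push_item(&mut map, "whatever", "One thing");
    push_item(&mut map, "whatever", "Then another");

    // `or_insert_with` is the lazy form of `or_insert`: the closure (here
    // `Vec::new`) only runs when the key is missing.
    map.entry("other").or_insert_with(Vec::new).push("Something else");

    println!("{:?}", map["whatever"]); // prints ["One thing", "Then another"]
}
```

For Vec specifically the eager or_insert(Vec::new()) is already cheap, since an empty Vec does not allocate; the lazy forms matter more when the default is expensive to build.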
If you want to add your dummy into the map, then this is a duplicate of How to properly use HashMap::entry? or Want to add to HashMap using pattern match, get borrow mutable more than once at a time (or any question about the entry API).
If you don't want to add it, then your code is fine; you just need to follow the compiler error messages to fix it. You are trying to use a keyword as an identifier (ref), and you need to get a mutable reference to dummy (&mut dummy):
use std::collections::HashMap;
fn foo() {
    let mut map: HashMap<&str, Vec<&str>> = HashMap::new();
    let mut dummy: Vec<&str> = Vec::new();
    let f = map.get_mut("whatever").unwrap_or(&mut dummy);
}
fn main() {}

Creating a vector of strings using the new std::fs::File

I am porting my code from old_io to the new std::io:
let path = Path::new("src/wordslist/english.txt");
let display = path.display();
let mut file = match File::open(&path) {
    // The `desc` field of `IoError` is a string that describes the error
    Err(why) => panic!("couldn't open {}: {}", display,
                       Error::description(&why)),
    Ok(file) => file,
};
let mut s = String::new();
match file.read_to_string(&mut s) {
    Err(why) => panic!("couldn't read {}: {}", display,
                       Error::description(&why)),
    Ok(s) => s,
};
let words: Vec<_> = s.words().collect();
So this works, but it requires me to have a mutable string s to read the file contents into, and then use words().collect() to gather the words into a vector.
Is there a way to read the contents of a file to a vector using something like words() WITHOUT reading it to the mutable buffer string first? My thought is that this would be more performant in situations where the collect() call might happen at a later point, or after a words().map(something).
Your approach has a problem. .words() operates on an &str (string slice) which needs a parent String to refer to. Your example works fine because the Vec produced by s.words().collect() resides in the same scope as s, so it won't outlive the source string. But if you want to move it somewhere else, you'll need to end up with a Vec<String> instead of a Vec<&str>, which I'm assuming you already want if you're concerned about intermediate buffers.
You do have some options. Here are two that I can think of.
You can iterate over the characters of the file and split on whitespace:
// `.peekable()` gives us `.peek()`, so we can check whether anything is left
// `.chars()` yields a `Result<char, CharsError>` which needs to be dealt with
// (note: `Read::chars` was an unstable API and has since been removed from std)
let mut chars = file.chars().map(Result::unwrap).peekable();
let mut words: Vec<String> = Vec::new();
while chars.peek().is_some() {
    // This needs a type hint because it can't rely on info
    // from the following `if` block.
    // `.by_ref()` borrows the iterator so the loop can keep using it.
    let word: String = chars.by_ref().take_while(|ch| !ch.is_whitespace()).collect();
    // We'll have an empty string if there's more than one
    // whitespace character between words
    // (more than one because the first is eaten
    // by the last iteration of `.take_while()`)
    if !word.is_empty() {
        words.push(word);
    }
}
You can wrap the File object in a std::io::BufReader and read it line-by-line with the .lines() iterator:
let mut reader = BufReader::new(file);
let mut words = Vec::new();
// `.lines()` yields `Result<String, io::Error>` so we have to handle that.
// (it will not yield an EOF error, this is for abnormal errors during reading)
for line in reader.lines().map(Result::unwrap) {
    words.extend(line.words().map(String::from_str));
}
// Or alternately (this may not work due to lifetime errors in `flat_map()`)
let words: Vec<_> = reader.lines().map(Result::unwrap)
    .flat_map(|line| line.words().map(String::from_str))
    .collect();
It's up to you to decide which of the two solutions you prefer. The former is probably more efficient but maybe less intuitive. The latter is easier to read, especially the for-loop version, but allocates intermediate buffers.
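The snippets above date from before Rust 1.0. On current stable Rust, str::words has been replaced by split_whitespace, so the BufReader approach might be sketched as follows. The function name read_words is hypothetical, and a Cursor over an in-memory string stands in for the file so the example runs on its own.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read};

/// Hypothetical helper: collects whitespace-separated words from any reader.
fn read_words<R: Read>(reader: R) -> io::Result<Vec<String>> {
    let mut words = Vec::new();
    for line in BufReader::new(reader).lines() {
        // Propagate I/O errors instead of unwrapping.
        let line = line?;
        // Each word is copied into an owned String, so `line` can be dropped
        // at the end of the iteration.
        words.extend(line.split_whitespace().map(String::from));
    }
    Ok(words)
}

fn main() -> io::Result<()> {
    // A `std::fs::File` could be passed here instead of the Cursor.
    let input = Cursor::new("alpha beta\n  gamma\n");
    let words = read_words(input)?;
    println!("{:?}", words); // prints ["alpha", "beta", "gamma"]
    Ok(())
}
```

Only one line is buffered at a time, which is the trade-off the answer describes: easier to read than the character-by-character version, at the cost of an intermediate String per line.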
