Confusion about Rust HashMap and String borrowing

This program accepts an integer N, followed by N lines containing two strings separated by a space. I want to put those lines into a HashMap using the first string as the key and the second string as the value:
use std::collections::HashMap;
use std::io;
fn main() {
let mut input = String::new();
io::stdin().read_line(&mut input)
.expect("unable to read line");
let desc_num: u32 = match input.trim().parse() {
Ok(num) => num,
Err(_) => panic!("unable to parse")
};
let mut map = HashMap::<&str, &str>::new();
for _ in 0..desc_num {
input.clear();
io::stdin().read_line(&mut input)
.expect("unable to read line");
let data = input.split_whitespace().collect::<Vec<&str>>();
println!("{:?}", data);
// map.insert(data[0], data[1]);
}
}
The program works as intended:
3
a 1
["a", "1"]
b 2
["b", "2"]
c 3
["c", "3"]
When I try to put those parsed strings into a HashMap and uncomment map.insert(data[0], data[1]);, the compilation fails with this error:
error: cannot borrow `input` as mutable because it is also borrowed as immutable [E0502]
input.clear();
^~~~~
note: previous borrow of `input` occurs here; the immutable borrow prevents subsequent moves or mutable borrows of `input` until the borrow ends
let data = input.split_whitespace().collect::<Vec<&str>>();
^~~~~
note: previous borrow ends here
fn main() {
...
}
^
I don't understand why this error would come up, since I think the map.insert() expression doesn't borrow the string input at all.

split_whitespace() doesn't give you two new Strings containing (copies of) the non-whitespace parts of the input. Instead you get two references into the memory managed by input, of type &str. So when you then try to clear input and read the next line of input into it, you try overwriting memory that's still being used by the hash map.
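Why does split_whitespace (and many other string methods, I should add) complicate matters by returning &str? Because borrowing is often enough, and in those cases it avoids unnecessary copies. In this specific case, however, it's probably best to explicitly copy the relevant parts of the string into owned Strings. Note that calling .clone() on a &str only copies the reference, not the text, so use .to_string() (or .to_owned()) and declare the map as owning its keys and values:
let mut map = HashMap::<String, String>::new(); // owns its keys and values
// ...
map.insert(data[0].to_string(), data[1].to_string()); // copies the text out of `input`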
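Putting the fix together, here is a minimal sketch of the whole loop with an owning HashMap<String, String>. To keep it testable it reads from any BufRead instead of stdin; the helper name read_pairs is hypothetical, and like the original program it panics on malformed input.

```rust
use std::collections::HashMap;
use std::io::BufRead;

/// Hypothetical helper: reads a count N, then N "key value" lines,
/// reusing a single String buffer for every line.
fn read_pairs<R: BufRead>(mut reader: R) -> HashMap<String, String> {
    let mut input = String::new();
    reader.read_line(&mut input).expect("unable to read line");
    let desc_num: u32 = input.trim().parse().expect("unable to parse");

    let mut map = HashMap::new();
    for _ in 0..desc_num {
        // Allowed: the map holds owned Strings, not borrows of `input`.
        input.clear();
        reader.read_line(&mut input).expect("unable to read line");
        // `data` borrows `input`, but only until the end of this iteration.
        let data: Vec<&str> = input.split_whitespace().collect();
        // Copy both words into owned Strings (panics if a line has fewer than two words).
        map.insert(data[0].to_string(), data[1].to_string());
    }
    map
}

fn main() {
    // &[u8] implements BufRead, so a byte string can stand in for stdin.
    let map = read_pairs("3\na 1\nb 2\nc 3\n".as_bytes());
    println!("{:?}", map.get("b"));
}
```

In a real program you would call read_pairs(std::io::stdin().lock()).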

Related

How do I read a String from a File, split it, and create a Vec<&str> in one statement?

I need help converting file input, read as a string, into a vector.
I tried
let content = fs::read_to_string(file_path).expect("Failed to read input");
let content: Vec<&str> = content.split("\n").collect();
This works, but I wanted to convert it to one statement. Something like
let content: Vec<&str> = fs::read_to_string(file_path)
.expect("Failed to read input")
.split("\n")
.collect();
I tried using
let content: Vec<&str> = match fs::read_to_string(file_path) {
Ok(value) => value.split("\n").collect(),
Err(err) => {
println!("Error Unable to read the file {}", err);
return ();
}
};
and
let content: Vec<&str> = match fs::read_to_string(file_path) {
Ok(value) => value,
Err(err) => {
println!("Error Unable to read the file {}", err);
return ();
}
}
.split("\n")
.collect();
The compiler says that the borrowed value does not live long enough (1st) and that the value is freed while still in use (2nd), so it is a problem with borrowing, scope and ownership.
error[E0716]: temporary value dropped while borrowed
--> src/lib.rs:4:26
|
4 | let content: Vec<&str> = fs::read_to_string("")
| __________________________^
5 | | .expect("Failed to read input")
| |___________________________________^ creates a temporary which is freed while still in use
6 | .split("\n")
7 | .collect();
| - temporary value is freed at the end of this statement
8 |
9 | dbg!(content);
| ------- borrow later used here
|
= note: consider using a `let` binding to create a longer lived value
I still don't really understand how to fix them.
It is impossible to do this in one statement while keeping a Vec<&str>. Use two statements with a let, as you already do and as the compiler suggests.
The problem is that split produces string slices (&str) that reference the temporary String. That String is deallocated at the end of the statement, making the references invalid. Rust is preventing you from introducing memory unsafety:
fs::read_to_string(file_path) // Creates a String
.expect("Failed to read input")
.split("\n") // Takes references into the String
.collect(); // String is dropped, invalidating references
If you didn't need a Vec<&str>, you could have a Vec<String>:
let content: Vec<String> = fs::read_to_string(file_path)
.expect("Failed to read input")
.split("\n")
.map(|s| s.to_string()) // Convert &str to String
.collect();
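A small runnable sketch of both working shapes. The function read_input is a hypothetical stand-in for fs::read_to_string(file_path).expect(...) so that the example runs without a file; borrowed_lines and owned_lines are likewise illustrative names.

```rust
/// Hypothetical stand-in for fs::read_to_string(file_path).expect("...").
fn read_input() -> String {
    String::from("alpha\nbeta\ngamma")
}

/// Two statements: the caller's `let` keeps the String alive,
/// so the &str slices returned here stay valid.
fn borrowed_lines(content: &str) -> Vec<&str> {
    content.split('\n').collect()
}

/// One statement: every slice is copied into an owned String
/// before the temporary String is dropped at the end of the statement.
fn owned_lines() -> Vec<String> {
    read_input().split('\n').map(|s| s.to_string()).collect()
}

fn main() {
    let content = read_input(); // the owner lives until the end of main
    let lines = borrowed_lines(&content); // borrows from `content`
    println!("{:?}", lines);
    println!("{:?}", owned_lines());
}
```

str::lines() would also work in place of split('\n'); it additionally strips a trailing \r and does not yield an empty final element when the text ends with a newline.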
See also:
Temporary value dropped while borrowed, but I don't want to do a let
Using a `let` binding to increase a values lifetime
"borrowed value does not live long enough" seems to blame the wrong thing
Why does the compiler tell me to consider using a `let` binding" when I already am?
Do I need to use a `let` binding to create a longer lived value?
Why is it legal to borrow a temporary?
You create a String with read_to_string, but it is not bound to any variable because you immediately call split on it. That String is the temporary value mentioned in the error. The split function returns references into the contents of the string, but this String is deallocated at the end of the statement because, again, it is not bound to a variable.
If you really need to do it in one statement, you can collect owned Strings instead, at the cost of copying every line:
let content: Vec<String> = fs::read_to_string(file_path)
.expect("Failed to read input")
.split("\n")
.map(|line| line.to_string())
.collect();

How can I iterate over a delimited string, accumulating state from previous iterations without explicitly tracking the state?

I want to produce an iterator over a delimited string such that each substring separated by the delimiter is returned on each iteration with the substring from the previous iteration, including the delimiter.
For example, given the string "ab:cde:fg", the iterator should return the following:
"ab"
"ab:cde"
"ab:cde:fg"
Simple Solution
A simple solution is to just iterate over the parts returned by splitting on the delimiter, keeping track of the accumulated prefix:
let mut state = String::new();
for part in "ab:cde:fg".split(':') {
if !state.is_empty() {
state.push_str(":");
}
state.push_str(part);
dbg!(&state);
}
The downside here is the need to explicitly keep track of the state with an extra mutable variable.
Using scan
I thought scan could be used to hide the state:
"ab:cde:fg"
.split(":")
.scan(String::new(), |state, x| {
if !state.is_empty() {
state.push_str(":");
}
state.push_str(x);
Some(&state)
})
.for_each(|x| { dbg!(x); });
However, this fails with the error:
cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
What is the problem with the scan version and how can it be fixed?
Why even build a new string?
You can find the indices of each : and take slices of the original string.
fn main() {
let test = "ab:cde:fg";
let strings = test
.match_indices(":") // get the positions of the `:`
.map(|(i, _)| &test[0..i]) // get the string to that position
.chain(std::iter::once(test)); // let's not forget about the entire string
for substring in strings {
println!("{:?}", substring);
}
}
First of all, let us cheat and get your code to compile, so that we can inspect the issue at hand. We can do so by cloning the state. Also, let's add some debug messages:
fn main() {
"ab:cde:fg"
.split(":")
.scan(String::new(), |state, x| { // (1)
if !state.is_empty() {
state.push_str(":");
}
state.push_str(x);
eprintln!(">>> scan with {} {}", state, x);
Some(state.clone())
})
.for_each(|x| { // (2)
dbg!(x);
});
}
This results in the following output:
>>> scan with ab ab
[src/main.rs:13] x = "ab"
>>> scan with ab:cde cde
[src/main.rs:13] x = "ab:cde"
>>> scan with ab:cde:fg fg
[src/main.rs:13] x = "ab:cde:fg"
Note how the eprintln! and dbg! outputs are interleaved: that is a result of iterator laziness. In practice, this means that our intermediate String is borrowed twice:
in the anonymous function |state, x| in state (1)
in the anonymous function |x| in, well, x (2)
Those would be overlapping borrows, and at least one of them is mutable. The mutable borrow of state in (1) only lasts for a single call of the closure, whereas the reference handed on to (2) would have to remain valid after that call returns. Iterator::next returns items whose lifetime cannot be tied to the iterator's own state, so scan cannot yield a reference into its state. Even if we somehow managed to annotate lifetimes, we would just end up with an invalid borrow in (2), as the value would be borrowed mutably again for the next element.
The easy way out is a clone. The smarter way out uses match_indices and string slices.
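A third option, not used in the answers above: keep scan, but make its state a plain byte offset instead of a String. The yielded &str values then borrow from the original string rather than from the scan state, so the lifetime conflict never arises. A minimal sketch; the function name prefixes is hypothetical.

```rust
/// Yields the cumulative prefixes of `s` up to each delimiter, e.g.
/// "ab:cde:fg" -> "ab", "ab:cde", "ab:cde:fg".
/// The scan state is only the byte offset where the next part starts;
/// every yielded &str borrows from `s`, never from the state.
fn prefixes<'a>(s: &'a str, delim: char) -> impl Iterator<Item = &'a str> + 'a {
    s.split(delim).scan(0usize, move |start, part| {
        let end = *start + part.len(); // end of the current part within `s`
        *start = end + delim.len_utf8(); // skip the delimiter before the next part
        Some(&s[..end])
    })
}

fn main() {
    for p in prefixes("ab:cde:fg", ':') {
        println!("{:?}", p);
    }
}
```

Because the state is a Copy integer, nothing is cloned and no extra mutable variable is visible to the caller.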

How to push a value to a Vec and append it to a String at the same time?

I want to write a program that builds a command line for the system's nslookup program from its own arguments, while also collecting those arguments into a Vec:
fn main() {
let mut v: Vec<String> = Vec::new();
let mut newstr = String::from("nslookup");
for arg in std::env::args() {
v.push(arg);
newstr.push_str(&format!(" {}", arg));
}
println!("{:?}", v);
println!("{}", newstr);
}
error[E0382]: borrow of moved value: `arg`
--> src/main.rs:6:41
|
5 | v.push(arg);
| --- value moved here
6 | newstr.push_str(&format!(" {}", arg));
| ^^^ value borrowed here after move
|
= note: move occurs because `arg` has type `std::string::String`, which does not implement the `Copy` trait
How to correct the code without traversing env::args() again?
Reverse the order of the lines that use arg:
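for arg in std::env::args() {
    // newstr.push_str(&format!(" {}", arg)); // also works, but builds a temporary String
    write!(&mut newstr, " {}", arg).unwrap(); // only borrows `arg`; needs `use std::fmt::Write;`
    v.push(arg); // moves `arg`, so it must come last
}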
Vec::push takes its argument by value, which moves ownership of arg so it can't be used anymore after v.push(arg). format! and related macros implicitly borrow their arguments, so you can use arg again after using it in one of those.
If you really needed to move the same String to two different locations, you would need to add .clone(), which copies the string. But that's not necessary in this case.
Also note that format! creates a new String, which is wasteful when all you want is to add on to the end of an existing String. If you add use std::fmt::Write; to the top of your file, you can use write! instead (as shown above), which is more concise and may be more performant.
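For completeness, a sketch of the corrected loop as a whole program. The helper collect_args is hypothetical; it accepts any iterator of Strings so it can be exercised without real command-line arguments.

```rust
use std::fmt::Write; // provides write! for String

/// Hypothetical helper: collects the arguments into a Vec and, in the same pass,
/// appends each one to an "nslookup ..." command string.
fn collect_args<I: IntoIterator<Item = String>>(args: I) -> (Vec<String>, String) {
    let mut v = Vec::new();
    let mut newstr = String::from("nslookup");
    for arg in args {
        // Borrow `arg` first: write! only needs a reference.
        write!(newstr, " {}", arg).expect("writing to a String failed");
        // Then move it into the Vec; `arg` is not used afterwards.
        v.push(arg);
    }
    (v, newstr)
}

fn main() {
    let (v, newstr) = collect_args(std::env::args());
    println!("{:?}", v);
    println!("{}", newstr);
}
```

As in the question, std::env::args() includes the program's own path as its first element.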
See also
What are move semantics in Rust?
error: use of moved value - should I use "&" or "mut" or something else?
Does println! borrow or own the variable?
You can do it like this:
fn main() {
let args: Vec<_> = std::env::args().collect();
let s = args.join(" ");
println!("{}", s);
}
First, you create the vector, and then you create your string.

How to know when a borrow ends

I wrote this for simple input parsing:
use std::io;
fn main() {
let mut line = String::new();
io::stdin().read_line(&mut line)
.expect("Cannot read line.");
let parts = line.split_whitespace();
for p in parts {
println!("{}", p);
}
line.clear();
io::stdin().read_line(&mut line)
.expect("Cannot read line.");
}
The above code creates a String object, reads a line into it, splits it by whitespace and prints the output. Then it tries to do the same using the same String object. On compilation I get this error:
--> src/main.rs:15:5
|
9 | let parts = line.split_whitespace();
| ---- immutable borrow occurs here
...
15 | line.clear();
| ^^^^ mutable borrow occurs here
...
19 | }
| - immutable borrow ends here
As I understand it, the String is borrowed by the iterator. The suggested solution is:
let parts: Vec<String> = line.split_whitespace()
.map(|s| String::from(s))
.collect();
I have few questions here:
I have already consumed the iterator by iterating over it with for. Its borrow should have ended.
How do I know the lifetimes of borrows from function definitions?
If a function borrows an object, how do I know when it releases it? For example, in the solution, using collect() releases the borrow.
I think I am missing an important concept here.
The problem in your code is that you bind the result of line.split_whitespace() to a name (parts). If you write this instead:
io::stdin().read_line(&mut line)
.expect("Cannot read line.");
for p in line.split_whitespace() { // <-- pass directly into loop
println!("{}", p);
}
line.clear();
io::stdin().read_line(&mut line)
.expect("Cannot read line.");
That way it just works. Another possibility is to artificially restrict the scope of parts with an extra block, like so:
io::stdin().read_line(&mut line)
.expect("Cannot read line.");
{
let parts = line.split_whitespace();
for p in parts {
println!("{}", p);
}
}
line.clear();
io::stdin().read_line(&mut line)
.expect("Cannot read line.");
This also works.
So why is that? This was due to how the compiler worked at the time, often called "lexical borrows". The problem was that each non-temporary value which contains a borrow stayed "alive" until the end of its scope.
In your case: since you assign the result of split_whitespace() (which borrows the string) to parts, the borrow is "alive" until the end of the scope of parts, not merely until parts is consumed.
In the first version in this answer, we don't bind a name to the value, so the result of split_whitespace() is only a temporary and the borrow doesn't extend to the whole scope. That's also why your collect() example works: not because of collect(), but because there is never a name bound to something borrowing the string. In my second version, we just restrict the scope.
Note that this was a known shortcoming of the compiler: you are right, the compiler just didn't see it. It has since been lifted by non-lexical lifetimes (NLL), which shipped with the Rust 2018 edition and were later extended to the 2015 edition, so your original code now compiles as written.
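With a current compiler, a sketch that keeps the question's original shape (the iterator bound to the name parts) compiles. It reads from any BufRead so it can be tried without stdin; the helper name words_of_two_lines is hypothetical.

```rust
use std::io::BufRead;

/// Hypothetical helper: reads two lines into the same buffer, binding the
/// split iterator to a name exactly like the question does.
fn words_of_two_lines<R: BufRead>(mut reader: R) -> (Vec<String>, String) {
    let mut line = String::new();
    reader.read_line(&mut line).expect("Cannot read line.");

    let parts = line.split_whitespace(); // immutable borrow of `line` starts here
    let mut words = Vec::new();
    for p in parts {
        words.push(p.to_string()); // last use of the borrow
    }

    // Mutable borrow: accepted under NLL because `parts` is no longer used.
    line.clear();
    reader.read_line(&mut line).expect("Cannot read line.");
    (words, line)
}

fn main() {
    let (words, second) = words_of_two_lines("a b c\nnext\n".as_bytes());
    println!("{:?} {:?}", words, second);
}
```

The borrow now ends at its last use rather than at the end of the enclosing scope, which is exactly the behavior the question expected.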

Rust string lifetimes and iterator adapters (lifetime compile error)

I'm working with CSV so I need to trim newlines and split on commas for each line, and filter out any lines that have a '?' in them.
let instances: Vec<Vec<&str>> = file.lines()
.map(|x| x.unwrap())
.filter(|x| !(x.contains("?")))
.map(|x| x.as_slice().trim_chars('\n').split_str(",").collect()).collect();
This is the compiler error message I'm getting:
.../src/main.rs:13:18: 13:19 error: `x` does not live long enough
.../src/main.rs:13 .map(|x| x.as_slice().trim_chars('\n').split_str(",").collect()).collect();
^
.../src/main.rs:7:11: 21:2 note: reference must be valid for the block at 7:10...
.../src/main.rs:7 fn main() {
.../src/main.rs:8 let path = Path::new("./...");
.../src/main.rs:9 let mut file = BufferedReader::new(File::open(&path));
.../src/main.rs:10 let instances: Vec<Vec<&str>> = file.lines()
.../src/main.rs:11 .map(|x| x.unwrap())
.../src/main.rs:12 .filter(|x| !(x.contains("?")))
...
.../src/main.rs:13:18: 13:72 note: ...but borrowed value is only valid for the block at 13:17
.../src/main.rs:13 .map(|x| x.as_slice().trim_chars('\n').split_str(",").collect()).collect();
I don't understand how the lifetimes of the string types in Rust are supposed to be used in this context. Changing instances to Vec<Vec<String>> doesn't fix the problem either.
What's extra confusing to me is that the following works with a single String:
let x: Vec<&str> = some_string.as_slice().trim_chars('\n').split_str(",").collect();
What am I doing wrong with the lifetime of these values to cause this compiler error?
If iterator adapters are not an idiomatic approach to this problem please explain why and how I should approach this differently.
Each &str here is a reference into the contents of a String, namely the ones yielded by lines(). A &str can only live as long as the String it points into, and you're not storing those Strings anywhere: each x is dropped at the end of its closure. You would need to either store the lines in another variable:
let lines = file.lines().map(|x| x.unwrap()).collect::<Vec<_>>();
// trim_matches / split are the current std equivalents of trim_chars / split_str
let instances: Vec<Vec<&str>> = lines.iter()
    .filter(|x| !x.contains('?'))
    .map(|x| x.trim_matches('\n').split(',').collect())
    .collect();
Or else you would convert all of the &strs into Strings:
let instances: Vec<Vec<String>> = file.lines()
    .map(|x| x.unwrap())
    .filter(|x| !x.contains('?'))
    .map(|x| x.trim_matches('\n').split(',')
        .map(|s| s.to_string()).collect()) // copy each field before `x` is dropped
    .collect();
As an incidental note, the collect() calls can be written collect::<Vec<_>>(), allowing you to remove the type annotation from the instances variable. Which is better? Up to you to decide.
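A self-contained sketch of both approaches with today's standard library, reading from an in-memory byte slice instead of a file. The function names fields_borrowed and fields_owned are hypothetical, and, like the question, the sketch does no real CSV parsing (no quoting rules).

```rust
use std::io::BufRead;

/// Approach 1: the owned lines live in `lines`; the returned &str fields borrow from them.
fn fields_borrowed(lines: &[String]) -> Vec<Vec<&str>> {
    lines
        .iter()
        .filter(|line| !line.contains('?'))
        .map(|line| line.split(',').collect())
        .collect()
}

/// Approach 2: every field is copied into its own String, so nothing borrows the line.
fn fields_owned<R: BufRead>(reader: R) -> Vec<Vec<String>> {
    reader
        .lines()
        .map(|line| line.expect("read error"))
        .filter(|line| !line.contains('?'))
        .map(|line| line.split(',').map(str::to_string).collect())
        .collect()
}

fn main() {
    let data = "1,2,3\n4,?,6\n7,8,9\n";

    // BufRead::lines already removes the line terminator, so no trim is needed.
    let lines: Vec<String> = data
        .as_bytes()
        .lines()
        .map(|l| l.expect("read error"))
        .collect();
    println!("{:?}", fields_borrowed(&lines));
    println!("{:?}", fields_owned(data.as_bytes()));
}
```

The borrowed version avoids per-field allocations but ties the result to lines; the owned version can outlive the input entirely.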

Resources