String concatenation in rust - rust

I am trying to concatenate a &str with a &str in a for loop, with the intention of using the new combined string after a number of parts have been added to it. A general layout of the for loop can be seen below, but I am having a lot of trouble combining strings due to numerous errors.
for line in reader.lines() {
let split_line = line.unwrap().split(",");
let mut edited_line = "";
for word in split_line {
if !word.contains("substring") {
let test_string = [edited_line, word].join(",");
edited_line = &test_string;
}
}
let _ = writeln!(outfile, "{}", edited_line).expect("Unable to write to file");
}
First error:
error[E0716]: temporary value dropped while borrowed
Comes when running the above.
Second error:
error[E0308]: mismatched types expected &str, found struct std::string::String
happens when you remove the & from test_string when it is assigned to edited_line
Note: format! and concat! macros both also give error 2.
It seems that if I get error 2 and convert the std::string::String to a &str, I get an error stating that the variables don't live long enough.
How am I supposed to go about building a string of many parts?

Note that Rust has two string types, String and &str (actually, there are more, but that's irrelevant here).
String is an owned string and can grow and shrink dynamically.
&str is a borrowed string and is immutable.
Calling [edited_line, word].join(",") creates a new String, which is allocated on the heap. edited_line = &test_string then borrows the String and implicitly converts it to a &str.
The problem is that its memory is freed as soon as the owner (test_string) goes out of scope, but the borrow lives longer than test_string. This is fundamentally impossible in Rust, since it would otherwise be a use-after-free bug.
The correct and most efficient way to do this is to create an empty String outside of the loop and only append to it in the loop:
let mut edited_line = String::new();
for word in split_line {
if !word.contains("substring") {
edited_line.push(',');
edited_line.push_str(word);
}
}
Note that the resulting string will start with a comma, which might not be desired. To avoid it, you can write
let mut edited_line = String::new();
for word in split_line {
if !word.contains("substring") {
if !edited_line.is_empty() {
edited_line.push(',');
}
edited_line.push_str(word);
}
}
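As a self-contained check of the loop above, it can be wrapped in a small function. This is only a sketch: the name filter_join and the sample input are made up for illustration, and only String::new, push and push_str from the standard library are used.

```rust
/// Joins the comma-separated fields of `line` that do not contain `needle`.
/// `filter_join` is a hypothetical helper name, not something from the question.
fn filter_join(line: &str, needle: &str) -> String {
    let mut edited_line = String::new();
    for word in line.split(',') {
        if !word.contains(needle) {
            // Only add the separator between fields, never at the start.
            if !edited_line.is_empty() {
                edited_line.push(',');
            }
            edited_line.push_str(word);
        }
    }
    edited_line
}

fn main() {
    let out = filter_join("a,has_substring,b,c", "substring");
    println!("{}", out); // prints "a,b,c"
}
```

Because the function returns the owned String, the caller never has to hold a borrow of a value that has already gone out of scope.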
This could be done more elegantly with the itertools crate, which provides a join method for iterators:
use itertools::Itertools;
let edited_line: String = line
.unwrap()
.split(",")
.filter(|word| !word.contains("substring"))
.join(",");

let mut edited_line = ""; makes edited_line a &str with a static lifetime.
To actually make edited_line a string, either append .to_owned(), or use String::new():
let mut edited_line = String::new();
// Or
let mut edited_line = "".to_owned();
See What are the differences between Rust's `String` and `str`? if you are unfamiliar with the differences.
Most importantly for your case, you can't extend a &str, but you can extend a String.
Once you switched edited_line to a String, using the method of setting edited_line to [edited_line, word].join(","); works:
for line in reader.lines() {
let line = line.unwrap(); // Keep the owned String alive while split_line borrows from it
let split_line = line.split(",");
let mut edited_line = String::new();
for word in split_line {
if !word.contains("substring") {
let test_string = [edited_line.as_str(), word].join(","); // Added .as_str() to edited_line
edited_line = test_string; // Removed the & here
}
}
let _ = writeln!(outfile, "{}", edited_line).expect("Unable to write to file");
}
However, this is neither very efficient nor ergonomic. It also has the (probably unintended) result of prepending each line with a ,.
Here is an alternative that uses only one String instance:
for line in reader.lines() {
let line = line.unwrap(); // Keep the owned String alive while split_line borrows from it
let split_line = line.split(",");
let mut edited_line = String::new();
for word in split_line {
if !word.contains("substring") {
edited_line.push(',');
edited_line.push_str(word);
}
}
let _ = writeln!(outfile, "{}", edited_line).expect("Unable to write to file");
}
This still prepends the , character before each line however. You can probably fix that by checking if edited_line is not empty before pushing the ,.
The third option is to change the for loop into an iterator:
for line in reader.lines() {
let edited_line = line.unwrap().split(",") // lines() yields Results, so unwrap before splitting
.filter(|word| !word.contains("substring"))
.collect::<Vec<&str>>() // Collecting allows us to use the join function.
.join(",");
let _ = writeln!(outfile, "{}", edited_line).expect("Unable to write to file");
}
This way we can use the join function as intended, neatly eliminating the initial , at the start of each line.
PS: If you have trouble knowing what types each variable is, I suggest using an IDE like Intellij-rust, which shows type hints for each variable as you write them.
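To try the iterator version end to end without touching the file system, here is a minimal sketch that is generic over the reader and the writer. The function name filter_lines and the in-memory Cursor and Vec<u8> stand-ins are assumptions for illustration; in the real program they would be the BufReader and the output file from the question.

```rust
use std::io::{self, BufRead, Cursor, Write};

/// Copies `reader` to `outfile` line by line, dropping every comma-separated
/// field that contains `needle`. `filter_lines` is a hypothetical name.
fn filter_lines<R: BufRead, W: Write>(reader: R, mut outfile: W, needle: &str) -> io::Result<()> {
    for line in reader.lines() {
        // Bind the owned String first so the slices produced by `split` stay valid.
        let line = line?;
        let edited_line = line
            .split(',')
            .filter(|word| !word.contains(needle))
            .collect::<Vec<&str>>()
            .join(",");
        writeln!(outfile, "{}", edited_line)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // In-memory stand-ins for the input file and the output file.
    let input = Cursor::new("a,substring_x,b\nc,d\n");
    let mut output: Vec<u8> = Vec::new();
    filter_lines(input, &mut output, "substring")?;
    print!("{}", String::from_utf8_lossy(&output));
    Ok(())
}
```

Returning io::Result and using ? instead of unwrap and expect is a design choice of this sketch, not something the original answers require.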

Related

Populating a Hashmap with a vector of string slices in rust

I've been pulling my hair out with this one.
I apologize in advance if it's a poorly worded question.
So, I have a Hashmap in the outer scope and want to populate it with string slices.
// Hashmap declaration.
let mut words: std::collections::HashMap< &str, std::vec::Vec<&str> > = std::collections::HashMap::new();
for file_name in ["conjunctions", "nouns", "verbs"].iter() { // Reading some file.
let file_content = std::fs::read_to_string(format!("../wordlists/{file_name}.txt"));
let mut fc = match file_content {
Ok(file_content) => file_content,
Err(_) => panic!("Failed to read the file: ../wordlists/{file_name}.txt"),
};
let mut wordlist_vec: Vec<&str> = fc.split("\n").collect();
words.insert( file_name, wordlist_vec );
}
println!("{:?}", words["conjunctions"]);
// Using it outside the above scope throws an error. That FC was dropped but still borrowed.
So basically, my question is, how can I use the hash map outside the scope for the loop above?
I think the issues emanate from using string slices (split returns slices ig) but I'm not too sure.
You simply need to use an owned String instead of &strs.
let mut words: HashMap<String, Vec<String>> = HashMap::new();
// ...
// We use map to change the elements of the iterator to owned Strings.
let mut wordlist_vec: Vec<String> = fc.split("\n").map(String::from).collect();
words.insert(file_name.to_string(), wordlist_vec);
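A minimal, runnable sketch of this idea follows. To keep it testable, the file contents are passed in as strings instead of being read from ../wordlists; the function name build_wordlists and the sample words are hypothetical.

```rust
use std::collections::HashMap;

/// Builds the map from (name, file content) pairs. The content is passed in
/// directly instead of being read from disk, purely to keep the sketch testable.
fn build_wordlists(sources: &[(&str, &str)]) -> HashMap<String, Vec<String>> {
    let mut words: HashMap<String, Vec<String>> = HashMap::new();
    for (file_name, content) in sources {
        // `content` plays the role of `fc`. It may be dropped after this
        // iteration because the map stores owned copies of every word.
        let wordlist_vec: Vec<String> = content.split('\n').map(String::from).collect();
        words.insert(file_name.to_string(), wordlist_vec);
    }
    words
}

fn main() {
    let words = build_wordlists(&[("conjunctions", "and\nbut\nor"), ("nouns", "cat\ndog")]);
    // The map owns its data, so it can be used after the loop has finished.
    println!("{:?}", words["conjunctions"]);
}
```

The cost is one allocation per word. If that matters, the alternative is to keep all file contents alive in the outer scope and store slices into them.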

Save command line argument to variable and use a default if it is missing or invalid

The idea here is simple but I have tried three different ways with different errors each time: read in a string as an argument, but if the string is invalid or the string isn't provided, use a default.
Can this be done using Result to detect a valid string or a panic?
The basic structure I expect:
use std::env;
use std::io;
fn main() {
let args: Vec<String> = args().collect();
let word: Result<String, Error> = &args[1].expect("Valid string");
let word: String = match word {
Ok(word) = word,
Err(_) = "World",
}
println!("Hello, {}", word);
}
So, there are a lot of issues in your code.
First and foremost, in a match statement, you do not use =, you use =>.
Additionally, your match is used as an expression: its result is bound to a variable with let. A let statement must end with a semicolon, so the closing brace of the match needs one after it.
So your match statement would become:
let word: String = match word {
Ok(word) => word,
Err(_) => ...,
};
Next, when you write use std::env, you do not import all of the functions from it into your namespace. You only bring the module name into scope, so that the compiler resolves env::<something> to std::env::<something>.
Therefore, this needs to be changed:
let args: Vec<String> = env::args().collect();
The same problem exists in your next line. What is Error? What you actually mean is io::Error, which is not in scope either, for the same reason. You might be wondering why Result does not need to be imported. That is because a certain set of types, traits and functions, the prelude, is automatically imported into every module. Error is not one of them.
let word: Result<String, io::Error> = ...;
The next part is wrong twice (or even thrice).
First of all, the operation [x] does not return a Result, it returns the value and panics if it is out-of-bounds.
Now, even if it was a result, this line would still be wrong. Why? Because you expect(...) the result. That would turn any Result into its value.
Now, what you are looking for is the .get(index) operation. It tries to get a value and if it fails, it returns None, so it returns an option. What is an option? It is like a result, but there is no error value. It must be noted that get() returns the option filled with a reference to the string.
The line should look something like this:
let word: Option<&String> = args.get(1);
Now you have two options to handle default values, but before we come to that, I need to tell you why your error value is wrong.
In Rust, there are two kinds of Strings.
There is &str, which you can create like this:
let a: &str = "Hello, World!";
These are immutable, borrowed string slices. A literal like the one above is stored in the compiled binary, not on the heap. So you cannot just create a new one with arbitrary values on the fly.
On the other hand, we have mutable and heap-allocated Strings.
let mut a: String = String::new();
a.push_str("Hello, World!");
// Or...
let b: String = String::from("Hello, World");
You store your arguments as a String, but in your match statement, you try to return a &str.
So, there are two ways to handle your error:
let word: Option<&String> = args.get(1);
let word: String = match word {
Some(word) => word.to_string(),
None => String::from("World"),
};
If you do not want to allocate that second string, you can also use
let word: Option<&String> = args.get(1);
let word: &str = match word {
Some(word) => word.as_str(),
None => "World",
};
The second option, unwrap_or
let args: Vec<String> = env::args().collect();
let default = &String::from("World");
let word: &String = args.get(1).unwrap_or(default);
println!("Hello, {}", word);
is a bit uglier in that it requires you to bind the default value to a variable, but it does what your match statement above does in a single line.
This works too:
let word: &str = args.get(1).unwrap_or(default);
So this is my favourite version of your program above:
use std::env;
fn main() {
let args: Vec<String> = env::args().collect();
let default = &String::from("World");
let word: &str = args.get(1).unwrap_or(default);
println!("Hello, {}", word);
}
But this one works too:
use std::env;
fn main() {
let args: Vec<String> = env::args().collect();
let word: Option<&String> = args.get(1); // Index 0 is the program name
let word: &str = match word {
Some(word) => word.as_str(),
None => "World",
};
println!("Hello, {}", word);
}
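If you prefer to end up with an owned String and want the logic to be testable without real command-line arguments, the following sketch takes any iterator of Strings. The function name pick_word is hypothetical, and treating an empty argument as "invalid" is an assumption, since the question does not say what counts as invalid.

```rust
use std::env;

/// Returns the first real argument, or "World" when it is missing or empty.
/// Treating an empty string as "invalid" is an assumption for this sketch.
fn pick_word(mut args: impl Iterator<Item = String>) -> String {
    // `nth(1)` skips the program name stored at index 0.
    args.nth(1)
        .filter(|word| !word.is_empty())
        .unwrap_or_else(|| String::from("World"))
}

fn main() {
    let word = pick_word(env::args());
    println!("Hello, {}", word);
}
```

unwrap_or_else only builds the default String when it is actually needed, which avoids binding a default to a variable up front.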

Splitting a Vec of strings into Vec<Vec<String>>

I am attempting to relearn data-science in rust.
I have a Vec<String> that includes a delimiter "|" and a new line "!end".
What I'd like to end up with is Vec<Vec<String>> that can be put into a 2D ND array.
I have this python Code:
file = open('somefile.dat')
lst = []
for line in file:
lst += [line.split('|')]
df = pd.DataFrame(lst)
SAMV2FinalDataFrame = pd.DataFrame(lst,columns=column_names)
And i've recreated it here in rust:
fn lines_from_file(filename: impl AsRef<Path>) -> Vec<String> {
let file = File::open(filename).expect("no such file");
let buf = BufReader::new(file);
buf.lines()
.map(|l| l.expect("Could not parse line"))
.collect()
}
fn main() {
let lines = lines_from_file(".dat");
let mut new_arr = vec![];
// Here I get an immutable borrow error on lines
for line in lines{
new_arr.push([*line.split("!end")]);
}
// Here I get: expected closure, found str
let x = lines.split("!end");
let array = Array::from(lines)
What I have: ['1','1','1','!end','2','2','2','!end']
What i need: [['1','1','1'],['2','2','2']]
Edit: also why when i turbo fish does it make it disappear on Stack Overflow?
I think part of the issue you ran into was due to how you worked with arrays. For example, Vec::push only adds a single element, so you would want to use Vec::extend instead. I also ran into a few empty strings, because splitting by "!end" leaves a trailing '|' on the ends of the substrings. The errors were quite strange; I am not completely sure where the closure came from.
let lines = vec!["1|1|1|!end|2|2|2|!end".to_string()];
let mut new_arr = Vec::new();
// Iterate over &lines so we don't consume lines and it can be used again later
for line in &lines {
new_arr.extend(line.split("!end")
// Remove trailing empty string
.filter(|x| !x.is_empty())
// Convert each &str into a Vec<String>
.map(|x| {
x.split('|')
// Remove empty strings from ends split (Ex split: "|2|2|2|")
.filter(|x| !x.is_empty())
// Convert &str into owned String
.map(|x| x.to_string())
// Turn iterator into Vec<String>
.collect::<Vec<_>>()
}));
}
println!("{:?}", new_arr);
I also came up with this other version which should handle your use case better. The earlier approach dropped all empty strings, while this one should preserve them while correctly handling the "!end".
use std::io::{self, BufRead, BufReader, Read, Cursor};
fn split_data<R: Read>(buffer: &mut R) -> io::Result<Vec<Vec<String>>> {
let mut sections = Vec::new();
let mut current_section = Vec::new();
for line in BufReader::new(buffer).lines() {
for item in line?.split('|') {
if item != "!end" {
current_section.push(item.to_string());
} else {
sections.push(current_section);
current_section = Vec::new();
}
}
}
Ok(sections)
}
In this example, I used Read for easier testing, but it will also work with a file.
let sample_input = b"1|1|1|!end|2|2|2|!end";
println!("{:?}", split_data(&mut Cursor::new(sample_input)));
// Output: Ok([["1", "1", "1"], ["2", "2", "2"]])
// You can also use a file instead
let mut file = std::fs::File::open("somefile.dat").expect("no such file");
let solution: Vec<Vec<String>> = split_data(&mut file).unwrap();
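If the data is already a flat list of tokens, as in the "what I have" line of the question, the grouping can also be sketched with the standard slice::split method, which splits a slice wherever a predicate matches. The function name group_rows is hypothetical, and dropping empty rows is an assumption about what is wanted.

```rust
/// Groups a flat list of tokens into rows, using "!end" as the row terminator.
/// `group_rows` is a hypothetical name; empty rows are dropped.
fn group_rows(tokens: &[String]) -> Vec<Vec<String>> {
    tokens
        // `slice::split` yields the sub-slices between matching elements.
        .split(|token| token == "!end")
        // A trailing "!end" produces one final empty sub-slice; skip it.
        .filter(|row| !row.is_empty())
        .map(|row| row.to_vec())
        .collect()
}

fn main() {
    let tokens: Vec<String> = "1|1|1|!end|2|2|2|!end"
        .split('|')
        .map(String::from)
        .collect();
    println!("{:?}", group_rows(&tokens)); // prints [["1", "1", "1"], ["2", "2", "2"]]
}
```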

Using str and String interchangeably

Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:
fn main() {
let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();
for t in v.iter_mut() {
if (t.contains("$world")) {
*t = &t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?
Rust has exactly what you want in the form of the Cow (clone-on-write) type.
use std::borrow::Cow;
fn main() {
let mut v: Vec<_> = "Hello there $world!".split_whitespace()
.map(|s| Cow::Borrowed(s))
.collect();
for t in v.iter_mut() {
if t.contains("$world") {
*t.to_mut() = t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
As @sellibitze correctly notes, to_mut() creates a new String, which causes a heap allocation to store the previously borrowed value. If you are sure you only have borrowed strings, then you can use
*t = Cow::Owned(t.replace("$world", "Earth"));
In case the Vec contains Cow::Owned elements, this would still throw away the allocation. You can prevent that using the following very fragile and unsafe code inside your for loop. It does direct byte-based manipulation of UTF-8 strings and relies on the fact that, once the $ sign is removed, the replacement has exactly the same number of bytes as the text it overwrites.
let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
let p = pos + last_pos; // find always starts at last_pos
last_pos = p + 5; // continue after the replaced text (p is the absolute position)
unsafe {
let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
s.remove(p); // remove $ sign
for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
*sc = c;
}
}
}
Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
Use std::borrow::Cow, specifically as Cow<'a, str>, where 'a is the lifetime of the string being parsed.
use std::borrow::Cow;
fn main() {
let mut v: Vec<Cow<'static, str>> = vec![];
v.push("oh hai".into());
v.push(format!("there, {}.", "Mark").into());
println!("{:?}", v);
}
Produces:
["oh hai", "there, Mark."]
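For the variable-substitution case from the question, the same idea can be sketched as a function that only allocates when a substitution actually happens. The function name substitute is hypothetical; Cow::Borrowed, Cow::Owned and str::replace are standard library items.

```rust
use std::borrow::Cow;

/// Returns the token unchanged (borrowed) unless it contains "$world",
/// in which case a new owned String is allocated for the substitution.
fn substitute(token: &str) -> Cow<'_, str> {
    if token.contains("$world") {
        Cow::Owned(token.replace("$world", "Earth"))
    } else {
        Cow::Borrowed(token)
    }
}

fn main() {
    let v: Vec<Cow<str>> = "Hello there $world!"
        .split_whitespace()
        .map(substitute)
        .collect();
    println!("{:?}", v); // prints ["Hello", "there", "Earth!"]
}
```

Tokens that need no change stay zero-copy slices of the input, so the parser only pays for the strings it modifies.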

Creating a vector of strings using the new std::fs::File

Porting my code from old_io to the new std::io
let path = Path::new("src/wordslist/english.txt");
let display = path.display();
let mut file = match File::open(&path) {
// The `desc` field of `IoError` is a string that describes the error
Err(why) => panic!("couldn't open {}: {}", display,
Error::description(&why)),
Ok(file) => file,
};
let mut s = String::new();
match file.read_to_string(&mut s) {
Err(why) => panic!("couldn't read {}: {}", display,
Error::description(&why)),
Ok(s) => s,
};
let words: Vec<_> = s.words().collect();
So this works, but it requires me to have a mutable string s to read the file contents into, and then use words().collect() to gather the words into a vector.
Is there a way to read the contents of a file to a vector using something like words() WITHOUT reading it to the mutable buffer string first? My thought is that this would be more performant in situations where the collect() call might happen at a later point, or after a words().map(something).
Your approach has a problem. .words() operates on an &str (string slice) which needs a parent String to refer to. Your example works fine because the Vec produced by s.words().collect() resides in the same scope as s, so it won't outlive the source string. But if you want to move it somewhere else, you'll need to end up with a Vec<String> instead of a Vec<&str>, which I'm assuming you already want if you're concerned about intermediate buffers.
You do have some options. Here are two that I can think of.
You can iterate over the characters of the file and split on whitespace:
// `.peekable()` gives us `.is_empty()` for an `Iterator`
// `.chars()` yields a `Result<char, CharsError>` which needs to be dealt with
let mut chars = file.chars().map(Result::unwrap).peekable();
let mut words: Vec<String> = Vec::new();
while !chars.is_empty() {
// This needs a type hint because it can't rely on info
// from the following `if` block
let word: String = chars.take_while(|ch| !ch.is_whitespace()).collect();
// We'll have an empty string if there's more than one
// whitespace character between words
// (more than one because the first is eaten
// by the last iteration of `.take_while()`)
if !word.is_empty() {
words.push(word);
}
}
You can wrap the File object in a std::io::BufReader and read it line-by-line with the .lines() iterator:
let mut reader = BufReader::new(file);
let mut words = Vec::new();
// `.lines()` yields `Result<String, io::Error>` so we have to handle that.
// (it will not yield an EOF error, this is for abnormal errors during reading)
for line in reader.lines().map(Result::unwrap) {
words.extend(line.words().map(String::from_str));
}
// Or alternately (this may not work due to lifetime errors in `flat_map()`)
let words: Vec<_> = reader.lines().map(Result::unwrap)
.flat_map(|line| line.words().map(String::from_str))
.collect();
It's up to you to decide which of the two solutions you prefer. The former is probably more efficient but maybe less intuitive. The latter is easier to read, especially the for-loop version, but allocates intermediate buffers.
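The snippets above use APIs from before Rust 1.0. As a rough sketch of the line-by-line variant in current stable Rust, str::split_whitespace takes the place of words() and String::from the place of String::from_str. The function name words_from_reader is hypothetical, and a Cursor stands in for the file so that the example runs without one.

```rust
use std::io::{self, BufRead, Cursor};

/// Collects all whitespace-separated words from any buffered reader.
/// `words_from_reader` is a hypothetical helper name.
fn words_from_reader<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut words = Vec::new();
    for line in reader.lines() {
        // `?` propagates real I/O errors; reaching EOF simply ends the loop.
        let line = line?;
        words.extend(line.split_whitespace().map(String::from));
    }
    Ok(words)
}

fn main() -> io::Result<()> {
    // A Cursor stands in for `BufReader::new(File::open(path)?)`.
    let words = words_from_reader(Cursor::new("alpha beta\n  gamma\n"))?;
    println!("{:?}", words);
    Ok(())
}
```

This still allocates one String per line as an intermediate buffer, but it never holds the whole file in memory at once.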
