Isn't passing `&str` bad for performance in Rust?

In general it's suggested to accept &str instead of String in Rust.
Let's assume I have a couple of functions and a String instance:
use std::collections::HashMap;
fn do_something_str(string: &str) {
    let mut map = HashMap::new();
    map.insert(string.to_owned() /* copying (expensive)? */, "value");
}
fn do_something_string(string: String) {
    let mut map = HashMap::new();
    map.insert(string /* moving (cheap)? */, "value");
}
fn main() {
    let string = String::from("123");
    do_something_str(&string);
    do_something_string(string);
}
Does copying happen in do_something_str(), meaning it will be slower or use more temporary memory?
P.S. I know I don't have to call .to_owned() explicitly; the following also works:
fn do_something_str(string: &str) {
    let mut map = HashMap::new();
    map.insert(string /* copying (expensive)? */, "value");
}
But since a HashMap owns its keys, I believe it will clone the string implicitly. Please correct me if I'm wrong.

In general it's suggested to accept &str instead of String in Rust.
Not quite. The general wisdom is to accept &str instead of &String. That is, when you already intend to operate on a reference, the more general type is considered to be better.
If you need an owned String, it is in fact wasteful to pass a reference only to clone it immediately. Accepting a String in the first place leaves the choice to the caller: if they need to keep a copy of the string for themselves, they can call clone(). Otherwise, the caller just moves the String into the function, and from there into the HashMap, with no cloning involved.
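A minimal sketch of that caller-side choice, using a hypothetical `remember` function (not from the question) that stores its key in a map it is given. Because the function takes `String` by value, the caller decides whether to clone or to move.

```rust
use std::collections::HashMap;

// Hypothetical helper: takes the key by value, so storing it is a move, not a copy.
fn remember(map: &mut HashMap<String, u32>, key: String, value: u32) {
    map.insert(key, value);
}

fn main() {
    let mut map = HashMap::new();

    // The caller still needs its string afterwards, so it pays for the clone itself.
    let kept = String::from("kept");
    remember(&mut map, kept.clone(), 1);
    println!("caller still owns {:?}", kept);

    // The caller is done with its string, so it moves it: no allocation, no copy.
    let moved = String::from("moved");
    remember(&mut map, moved, 2);

    println!("{:?}", map.get("moved"));
}
```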
But since a hashmap owns keys i believe it will clone it implicitly. Please correct me, if i'm wrong.
HashMap does not clone on insertion. That would require a Clone bound on the key type, and HashMap::insert has no such bound.
HashMap<&str, T> would still "own" the key, but the key in this case is &str, not its owned equivalent (String). Naturally, this would prevent the HashMap from outliving the string that the key references. The following example fails the borrow check:
use std::collections::HashMap;
fn main() {
    let mut map: HashMap<&str, ()> = HashMap::new();
    {
        let key = String::from("foo");
        map.insert(&key, ()); // error[E0597]: `key` does not live long enough
    }
    println!("{:?}", map.get("foo"));
}
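If the map has to outlive that inner scope, one fix is to make the key type the owned `String` and move `key` into the map. As described above, that move involves no clone. This is a sketch, not code from the original answer; the `build` function is hypothetical.

```rust
use std::collections::HashMap;

// Builds the map inside an inner scope and returns it; the owned keys move in.
fn build() -> HashMap<String, ()> {
    let mut map: HashMap<String, ()> = HashMap::new();
    {
        let key = String::from("foo");
        map.insert(key, ()); // `key` is moved into the map; nothing is cloned
    }
    map
}

fn main() {
    let map = build();
    // Lookups can still use a &str because String: Borrow<str>.
    println!("{:?}", map.get("foo"));
}
```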

Calling .to_owned() on a &str does create an owned copy of the string (a new heap allocation plus a copy of the bytes), with the corresponding performance and memory cost.
As you correctly noted, you can also have a HashMap<&str, V>, i.e. a HashMap whose keys are of type &str. This does not require cloning the string.
However, this means the HashMap cannot outlive the strings that its &str keys borrow from.
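Here is a minimal sketch of the borrowed-key approach, using a hypothetical word-counting function. The keys are slices of `text`, so no key is copied, and the returned map cannot outlive `text`.

```rust
use std::collections::HashMap;

// Every key borrows from `text`; the lifetime 'a ties the map to that input.
fn count_words<'a>(text: &'a str) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // `word` is a &str slice of `text`: inserting it allocates no new String.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let text = String::from("to be or not to be");
    let counts = count_words(&text);
    println!("{:?}", counts.get("to"));
    // drop(text); // would not compile while `counts` is still used below
    println!("{}", counts.len());
}
```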

Related

How to write a function that accepts a `Vec` of borrowed or owned elements?

In Rust, how can I pass a vector of owned objects to a function that expects a vector of borrowed objects? Is my only option to create a new vector?
What is the best practice for the signature of a function in which I care about the type of the contained generic of a struct but don't care about if it is borrowed or not?
Example situation:
fn using_vec_of_borrows(borrows: &Vec<&String>) {
    //...
}
fn main() {
    let foo: Vec<String> = Vec::new();
    using_vec_of_borrows(&foo);
}
How could I write using_vec_of_borrows() so that it accepts a vector of borrowed strings or a vector of owned strings? Or if it was part of an external library, could I convert my vector of owned strings to a vector of borrowed strings without iterating over it?
Keep in mind here that String is just used as an example type. It could be anything.
Quoting @trent from this answer:
When you hear "generic over references and non-references", think Borrow<T>.
Making your function generic over elements that implement Borrow<String> will allow it to accept both a &Vec<String> and &Vec<&String>:
use std::borrow::Borrow;
fn using_vec_of_borrows<T: Borrow<String>>(borrows: &[T]) {
    //...
}
fn main() {
    let foo: Vec<String> = Vec::new();
    using_vec_of_borrows(&foo);
    let foo: Vec<&String> = Vec::new();
    using_vec_of_borrows(&foo);
}
You can even change it to an iterator for more flexibility if the elements don't need to be contiguous.
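Here is a sketch of that iterator-based variant. The elided body is replaced with a hypothetical `total_len` function. It accepts anything that can be iterated over, as long as the items implement `Borrow<String>`.

```rust
use std::borrow::Borrow;
use std::collections::HashSet;

// Accepts any iterable whose items can be borrowed as a &String.
fn total_len<I, T>(items: I) -> usize
where
    I: IntoIterator<Item = T>,
    T: Borrow<String>,
{
    items
        .into_iter()
        .map(|item| {
            let s: &String = item.borrow();
            s.len()
        })
        .sum()
}

fn main() {
    let owned: Vec<String> = vec![String::from("ab"), String::from("cde")];
    println!("{}", total_len(&owned)); // items are &String

    let borrowed: Vec<&String> = owned.iter().collect();
    println!("{}", total_len(borrowed)); // items are &String, consumed by value

    // Elements need not be contiguous: a HashSet works too.
    let set: HashSet<String> = owned.iter().cloned().collect();
    println!("{}", total_len(&set));
}
```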
Note: I changed the signature from &Vec<T> to &[T] since the latter is more idiomatic. And I would've changed Borrow<String> into Borrow<str> for a similar reason, but &String does not implement Borrow<str>.
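Because &String does not implement Borrow<str>, a commonly used alternative is to bound on `AsRef<str>` instead. This is a different technique, not part of the original answer. `String`, `&String` and `&str` all implement `AsRef<str>`. The sketch below uses a hypothetical `shout_all` function.

```rust
// `AsRef<str>` is implemented for `String` and `str`, and the blanket impl for
// references extends it to `&String` and `&str`.
fn shout_all<T: AsRef<str>>(items: &[T]) -> Vec<String> {
    items
        .iter()
        .map(|item| {
            let s: &str = item.as_ref();
            s.to_uppercase()
        })
        .collect()
}

fn main() {
    let owned: Vec<String> = vec![String::from("a"), String::from("b")];
    let borrowed: Vec<&String> = owned.iter().collect();
    let slices: Vec<&str> = vec!["c", "d"];

    println!("{:?}", shout_all(&owned));
    println!("{:?}", shout_all(&borrowed));
    println!("{:?}", shout_all(&slices));
}
```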
See also:
Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?

`HashMap::get_mut` leading to "returns reference to local value", any efficient work-around?

There have been a fair number of questions around this, and the solution is mostly "use Entry".
However, this is a problem because HashMap::entry() requires an owned key. That means possibly expensive copies or allocations even when the key is already present and we just want to update the value in place, hence the use of get_mut. But using get_mut with a reference to a local leads rustc to assume that the reference gets stored in the HashMap, and therefore that returning the HashMap is an error:
use std::borrow::Cow;
use std::collections::HashMap;
fn get_string() -> String { String::from("xxxxxxx") }
fn foo() -> HashMap<Cow<'static, str>, usize> {
    let mut v = HashMap::new();
    // stand-in for "get a string slice as key",
    // real case is getting a String from an
    // mpsc and the key being a segment of that string
    let s = get_string();
    // stand-in for a structure which contains an `Option<Cow>`
    let k = Cow::from(&s[2..3]);
    // because of get_mut, `&s` is apparently considered to be stored in `v`?
    if let Some(e) = v.get_mut(&k) {
        *e += 1;
    } else {
        v.insert(Cow::from(k.into_owned()), 0);
    }
    v
}
Note that the if let/else manipulations are there to clarify the point of the pattern, but get_mut alone is sufficient to trigger the issue.
Is there a way around this without the efficiency hit, or is an eager allocation the only way? (Note: because this is a compile-time issue, runtime checks like contains_key or get don't help.)
According to the docs, HashMap::get_mut() takes an argument of type &Q such that the map's key type implements Borrow<Q>.
The key type of your map is Cow<'static, str>, which implements Borrow<str>. This means that you can pass either a &Cow<'static, str> or a &str. But you are passing a &Cow<'local, str> for some local lifetime 'local. The compiler tries to match that 'local with 'static and issues a somewhat confusing error message about lifetimes.
The solution is actually easy: you can get a &str from the Cow either by calling k.as_ref() or with &*k, and that &str is not required to be 'static:
let k = Cow::from(&s[2..3]);
if let Some(e) = v.get_mut(k.as_ref()) { /* ...*/ }
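If you put the fix back into the question's function, it might look like the sketch below. `get_string` is the question's stand-in. The two-iteration loop is added only so that both the update branch and the insert branch run. It uses the `&*k` form, which needs no trait-method resolution.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Stand-in from the question: the real code receives the String over a channel.
fn get_string() -> String {
    String::from("xxxxxxx")
}

fn foo() -> HashMap<Cow<'static, str>, usize> {
    let mut v: HashMap<Cow<'static, str>, usize> = HashMap::new();
    let s = get_string();
    // Look up the same one-character slice twice, to exercise both branches.
    for _ in 0..2 {
        let k = Cow::from(&s[2..3]);
        // `&*k` is a plain &str; Cow<'static, str>: Borrow<str>, so the lookup
        // no longer tries to unify the local borrow with 'static.
        if let Some(e) = v.get_mut(&*k) {
            *e += 1;
        } else {
            // Allocate an owned key only when the key is missing.
            v.insert(Cow::from(k.into_owned()), 0);
        }
    }
    v
}

fn main() {
    let counts = foo();
    println!("{:?}", counts);
}
```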

Does PathBuf::from(&some_other_pathbuf) clone the data of some_other_pathbuf?

I'm writing a custom_rename function that receives a String and an immutable reference to a PathBuf:
use std::path::PathBuf;
fn custom_rename(new_name: String, old_path: &PathBuf) {
    let mut new_path = PathBuf::from(&old_path);
    new_path.pop();
    new_path.push(new_name);
    std::fs::rename(old_path, new_path).expect("error");
}
Does the PathBuf::from() function clone the data of old_path? According to The Rust Programming Language, Rustaceans try to avoid cloning.
Yes, a PathBuf owns the data. The only way to own the data when presented with a reference is to clone it.
I'd write this as
use std::{fs, path::Path};
fn custom_rename(new_name: &str, old_path: &Path) {
    let mut new_path = old_path.to_owned();
    new_path.pop();
    new_path.push(new_name);
    fs::rename(old_path, new_path).expect("error");
}
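An alternative to pop() + push() is `Path::with_file_name`, which returns a new `PathBuf` with the last component replaced. This is a different technique from the answer's. It still allocates once, because the result has to own its data. The sketch below only computes the path, using a hypothetical `renamed_path` helper, so it can be checked without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: compute the target path for a rename.
fn renamed_path(old_path: &Path, new_name: &str) -> PathBuf {
    // For a path that ends in a file name, this gives the same result as
    // to_owned() followed by pop() and push(new_name).
    old_path.with_file_name(new_name)
}

fn main() {
    let target = renamed_path(Path::new("dir/old.txt"), "new.txt");
    println!("{}", target.display());
    // std::fs::rename(Path::new("dir/old.txt"), &target) would then perform the move.
}
```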
See also:
Is it more conventional to pass-by-value or pass-by-reference when the method needs ownership of the value?
Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?

Why is it discouraged to accept a reference &String, &Vec, or &Box as a function argument?

I wrote some Rust code that takes a &String as an argument:
fn awesome_greeting(name: &String) {
    println!("Wow, you are awesome, {}!", name);
}
I've also written code that takes in a reference to a Vec or Box:
fn total_price(prices: &Vec<i32>) -> i32 {
    prices.iter().sum()
}
fn is_even(value: &Box<i32>) -> bool {
    **value % 2 == 0
}
However, I received some feedback that doing it like this isn't a good idea. Why not?
TL;DR: One can instead use &str, &[T] or &T to allow for more generic code.
One of the main reasons to use a String or a Vec is that they allow increasing or decreasing the capacity. However, when you accept an immutable reference, you cannot use any of those interesting methods on the Vec or String.
Accepting a &String, &Vec or &Box also requires the argument to be allocated on the heap before you can call the function. Accepting a &str allows a string literal (saved in the program data), and accepting a &[T] or &T allows a stack-allocated array or variable. Unnecessary allocation is a performance loss. This is usually exposed right away when you try to call these functions in a test or a main function:
awesome_greeting(&String::from("Anna"));
total_price(&vec![42, 13, 1337]);
is_even(&Box::new(42));
Another performance consideration is that &String, &Vec and &Box introduce an unnecessary layer of indirection as you have to dereference the &String to get a String and then perform a second dereference to end up at &str.
Instead, you should accept a string slice (&str), a slice (&[T]), or just a reference (&T). A &String, &Vec<T> or &Box<T> will be automatically coerced (via deref coercion) to a &str, &[T] or &T, respectively.
fn awesome_greeting(name: &str) {
    println!("Wow, you are awesome, {}!", name);
}
fn total_price(prices: &[i32]) -> i32 {
    prices.iter().sum()
}
fn is_even(value: &i32) -> bool {
    *value % 2 == 0
}
Now you can call these functions with a broader set of types. For example, awesome_greeting can be called with a string literal ("Anna") or an allocated String. total_price can be called with a reference to an array (&[1, 2, 3]) or an allocated Vec.
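Here is a small runnable sketch of those broader call sites, reusing the three functions from the answer:

```rust
fn awesome_greeting(name: &str) {
    println!("Wow, you are awesome, {}!", name);
}

fn total_price(prices: &[i32]) -> i32 {
    prices.iter().sum()
}

fn is_even(value: &i32) -> bool {
    *value % 2 == 0
}

fn main() {
    // A string literal and an owned String both work.
    awesome_greeting("Anna");
    awesome_greeting(&String::from("Anna"));

    // An array, a Vec and a sub-slice all work; the array needs no heap allocation.
    println!("{}", total_price(&[1, 2, 3]));
    println!("{}", total_price(&vec![42, 13, 1337]));
    println!("{}", total_price(&[1, 2, 3, 4][1..]));

    // A plain stack value and a Box both work.
    println!("{}", is_even(&42));
    println!("{}", is_even(&Box::new(7_i32)));
}
```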
If you'd like to add or remove items from the String or Vec<T>, you can take a mutable reference (&mut String or &mut Vec<T>):
fn add_greeting_target(greeting: &mut String) {
    greeting.push_str("world!");
}
fn add_candy_prices(prices: &mut Vec<i32>) {
    prices.push(5);
    prices.push(25);
}
Specifically for slices, you can also accept a &mut [T] or &mut str. This allows you to mutate a specific value inside the slice, but you cannot change the number of items inside the slice (which means it's very restricted for strings):
fn reset_first_price(prices: &mut [i32]) {
    prices[0] = 0;
}
fn lowercase_first_ascii_character(s: &mut str) {
    if let Some(f) = s.get_mut(0..1) {
        f.make_ascii_lowercase();
    }
}
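Here is a sketch of the difference at the call site, reusing the answer's functions. The &mut Vec function can change the length. The slice functions accept a Vec, an array, or a String, but they only modify elements in place. The comments on edge cases are additions, not part of the original answer.

```rust
fn add_candy_prices(prices: &mut Vec<i32>) {
    prices.push(5);
    prices.push(25);
}

fn reset_first_price(prices: &mut [i32]) {
    // Indexing panics on an empty slice; `first_mut` would avoid that.
    prices[0] = 0;
}

fn lowercase_first_ascii_character(s: &mut str) {
    // `get_mut(0..1)` returns None if byte 1 is not a char boundary
    // (e.g. the first character is multi-byte), so nothing changes then.
    if let Some(f) = s.get_mut(0..1) {
        f.make_ascii_lowercase();
    }
}

fn main() {
    let mut prices = vec![10, 20];
    add_candy_prices(&mut prices); // the length grows
    reset_first_price(&mut prices); // &mut Vec<i32> coerces to &mut [i32]

    let mut fixed = [7, 8, 9];
    reset_first_price(&mut fixed); // arrays work too

    let mut name = String::from("HELLO");
    lowercase_first_ascii_character(&mut name); // &mut String coerces to &mut str

    println!("{:?} {:?} {}", prices, fixed, name);
}
```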
In addition to Shepmaster's answer, another reason to accept a &str (and similarly &[T] etc) is because of all of the other types besides String and &str that also satisfy Deref<Target = str>. One of the most notable examples is Cow<str>, which lets you be very flexible about whether you are dealing with owned or borrowed data.
If you have:
fn awesome_greeting(name: &String) {
    println!("Wow, you are awesome, {}!", name);
}
But you need to call it with a Cow<str>, you'll have to do this:
let c: Cow<str> = Cow::from("hello");
// Allocates an owned String from a str reference and then takes a reference to it anyway!
awesome_greeting(&c.to_string());
When you change the argument type to &str, you can use Cow seamlessly, without any unnecessary allocation, just like with String:
let c: Cow<str> = Cow::from("hello");
// Just pass the same reference along
awesome_greeting(&c);
let c: Cow<str> = Cow::from(String::from("hello"));
// Pass a reference to the owned string that you already have
awesome_greeting(&c);
Accepting &str makes calling your function more uniform and convenient, and the "easiest" way is now also the most efficient. These examples will also work with Cow<[T]> etc.
The recommendation is to use &str over &String because a &String coerces to &str. A &str parameter therefore accepts both owned strings and string slices, whereas a &String parameter accepts only references to owned Strings:
use std::borrow::Cow;
fn greeting_one(name: &String) {
    println!("Wow, you are awesome, {}!", name);
}
fn greeting_two(name: &str) {
    println!("Wow, you are awesome, {}!", name);
}
fn main() {
    let s1 = "John Doe".to_string();
    let s2 = "Jenny Doe";
    let s3 = Cow::Borrowed("Sally Doe");
    let s4: Cow<str> = Cow::Owned("Sally Doe".to_string());
    greeting_one(&s1);
    // greeting_one(&s2); // Does not compile
    // greeting_one(&s3); // Does not compile
    // greeting_one(&s4); // Does not compile either: Cow<str> derefs to str, not String
    greeting_two(&s1);
    greeting_two(s2);
    greeting_two(&s3);
    greeting_two(&s4);
}
Using vectors to manipulate text is never a good idea and does not even deserve discussion, because you lose all the sanity checks and performance optimizations. The String type uses a vector internally anyway. Remember, Rust stores strings as UTF-8 for storage efficiency; if you use a vector, you have to repeat all that hard work. Other than that, borrowing vectors or boxed values should be OK.
Because references to those types can be coerced to &str, &[T] and &T, a function that takes the more specific reference accepts fewer types:
1. A reference to a String can be coerced to a string slice (&str). For example, take this function:
fn count_wovels(words: &String) -> usize {
    // Count the lowercase ASCII vowels in the string.
    words.chars().filter(|&c| matches!(c, 'a' | 'e' | 'i' | 'o' | 'u')).count()
}
If you pass a &str, it is not accepted:
let name = "yilmaz".to_string();
println!("{}", count_wovels(&name));
// Not allowed: the parameter is &String, but we are passing a &str
// println!("{}", count_wovels("yilmaz"));
But if the function accepts &str instead:
fn count_wovels(words: &str) -> usize {
    words.chars().filter(|&c| matches!(c, 'a' | 'e' | 'i' | 'o' | 'u')).count()
}
We can pass both types to the function:
let name = "yilmaz".to_string();
println!("{}", count_wovels(&name));
println!("{}", count_wovels("yilmaz"));
With this signature, the function accepts more types.
2. Similarly, a reference to a Box (&Box<T>) is coerced to a reference to the value inside the Box (&T). For example:
fn length(name: &Box<&str>) {
    println!("length {}", name.len())
}
This accepts only the &Box<&str> type:
let boxed_str=Box::new("Hello");
length(&boxed_str);
// expected reference `&Box<&str>` found reference `&'static str`
// length("hello")
If we declare the parameter as &str instead, we can pass both types.
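Here is a sketch of that version. In it, `length` returns the length instead of printing it, so the result is easy to check. A &Box<&str> reaches &str through two deref steps (Box<&str> to &str, then &str to str), and deref coercion applies them automatically.

```rust
fn length(name: &str) -> usize {
    name.len()
}

fn main() {
    let boxed_str = Box::new("Hello");
    println!("length {}", length(&boxed_str)); // &Box<&str> coerces to &str
    println!("length {}", length("hello")); // a plain &str works as before
}
```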
3. A similar relation exists between a reference to a Vec and a reference to an array: both coerce to a slice (&[T]).
fn square(nums: &Vec<i32>) {
    for num in nums {
        println!("square of {} is {}", num, num * num)
    }
}
fn main() {
    let nums = vec![1, 2, 3, 4, 5];
    let nums_array = [1, 2, 3, 4, 5];
    // only &Vec<i32> is accepted
    square(&nums);
    // mismatched types: expected reference `&Vec<i32>`, found reference `&[{integer}; 5]`
    // square(&nums_array)
}
This works for both types:
fn square(nums: &[i32]) {
    for num in nums {
        println!("square of {} is {}", num, num * num)
    }
}

How would I create and use a string to string Hashmap in Rust?

How would I idiomatically create a string-to-string HashMap in Rust? The following works, but is it the right way to do it? Is there a different kind of string I should be using?
use std::collections::HashMap;
fn main() {
    let mut mymap = HashMap::new();
    mymap.insert("foo".to_string(), "bar".to_string());
    println!("{0}", mymap[&"foo".to_string()]);
}
Assuming you would like the flexibility of String, HashMap<String, String> is correct. The other choice is &str, but that imposes significant restrictions on how the HashMap can be used and where it can be passed around; if it works, though, changing one or both parameters to &str will be more efficient. This choice should be dictated by what sort of ownership semantics you need and how dynamic the strings are; see this answer and the strings guide for more.
BTW, searching a HashMap<String, ...> with a String can be expensive: if you don't already have one, it requires allocating a new String. We have a workaround in the form of find_equiv, which allows you to pass a string literal (and, more generally, any &str) without allocating a new String:
use std::collections::HashMap;
fn main() {
    let mut mymap = HashMap::new();
    mymap.insert("foo".to_string(), "bar".to_string());
    println!("{}", mymap.find_equiv(&"foo"));
    println!("{}", mymap.find_equiv(&"not there"));
}
Note that I've left the Option in the return value; one could call .unwrap() or handle a missing key properly.
Another slightly different option (more general in some circumstances, less in others), is the std::string::as_string function, which allows viewing the data in &str as if it were a &String, without allocating (as the name suggests). It returns an object that can be dereferenced to a String, e.g.
use std::collections::HashMap;
use std::string;
fn main() {
    let mut mymap = HashMap::new();
    mymap.insert("foo".to_string(), "bar".to_string());
    println!("{}", mymap[*string::as_string("foo")]);
}
(There is a similar std::vec::as_vec.)
Writing this answer for future readers: huon's answer was correct at the time, but the *_equiv methods were removed some time ago.
The HashMap documentation provides an example of a String-to-String HashMap where &str can be used for lookups.
The following code works just fine, with no new String allocation necessary:
use std::collections::HashMap;
fn main() {
    let mut mymap = HashMap::new();
    mymap.insert("foo".to_string(), "bar".to_string());
    println!("{0}", mymap["foo"]);
    println!("{0}", mymap.get("foo").unwrap());
}
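Here is a sketch of a lookup helper in the same spirit; the name `lookup` is hypothetical. get takes a &str because String: Borrow<str>, and String::as_str turns the stored value into a &str without copying it. It also handles a missing key instead of calling unwrap().

```rust
use std::collections::HashMap;

// Borrowed key in, borrowed value out: neither side allocates.
fn lookup<'a>(map: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
    map.get(key).map(String::as_str)
}

fn main() {
    let mut mymap = HashMap::new();
    mymap.insert("foo".to_string(), "bar".to_string());

    // Handle the missing-key case instead of unwrapping.
    match lookup(&mymap, "not there") {
        Some(v) => println!("found {}", v),
        None => println!("no such key"),
    }
    println!("{:?}", lookup(&mymap, "foo"));
}
```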
