Convert Vec<String> into a slice of &str in Rust?

Per Steve Klabnik's writeup in the pre-Rust 1.0 documentation on the difference between String and &str, in Rust you should use &str unless you really need to have ownership over a String. Similarly, it's recommended to use references to slices (&[]) instead of Vecs unless you really need ownership over the Vec.
I have a Vec<String> and I want to write a function that uses this sequence of strings without needing ownership of the Vec or the String instances. Should that function take &[&str]? If so, what's the best way to turn a reference to the Vec<String> into a &[&str]? Or is this coercion overkill?

You can create a function that accepts both &[String] and &[&str] using the AsRef trait:
fn test<T: AsRef<str>>(inp: &[T]) {
    for x in inp {
        print!("{} ", x.as_ref());
    }
    println!();
}

fn main() {
    let vref = vec!["Hello", "world!"];
    let vown = vec!["May the Force".to_owned(), "be with you.".to_owned()];
    test(&vref);
    test(&vown);
}

This is actually impossible without either a memory allocation or a per-element call [1].
Going from String to &str is not just viewing the bits in a different light; String and &str have a different memory layout, and thus going from one to the other requires creating a new object. The same applies to Vec and &[].
Therefore, whilst you can go from Vec<T> to &[T], and thus from Vec<String> to &[String], you cannot directly go from Vec<String> to &[&str]. Your choices are:
either accept &[String]
allocate a new Vec<&str> referencing the first Vec, and convert that into a &[&str]
As an example of the allocation:
fn usage(_: &[&str]) {}

fn main() {
    let owned = vec![String::new()];
    let half_owned: Vec<_> = owned.iter().map(String::as_str).collect();
    usage(&half_owned);
}
[1] Using generics and the AsRef<str> bound as shown in @aSpex's answer, you get a slightly more verbose function declaration with the flexibility you were asking for, but you do have to call .as_ref() on each element.
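The layout argument above can be checked directly. The following is a minimal sketch, not part of the original answer: the helper name borrow_all is hypothetical, and the exact size of String is an implementation detail of the standard library rather than a documented guarantee.

```rust
use std::mem::size_of;

// Hypothetical helper: builds the intermediate Vec<&str> described above.
// This is the one allocation that a &[&str] parameter forces on the caller.
fn borrow_all(owned: &[String]) -> Vec<&str> {
    owned.iter().map(String::as_str).collect()
}

fn main() {
    let word = size_of::<usize>();

    // A &str is a (pointer, length) pair.
    assert_eq!(size_of::<&str>(), 2 * word);

    // A String additionally tracks its capacity, so it is typically one
    // word larger (observed behaviour, not a documented layout guarantee).
    println!("size_of::<String>() = {} bytes", size_of::<String>());

    // Because the element sizes differ, a &[String] cannot simply be
    // reinterpreted as a &[&str]; a new sequence has to be built.
    let owned = vec!["a".to_owned(), "b".to_owned()];
    let borrowed = borrow_all(&owned);
    assert_eq!(borrowed, ["a", "b"]);
}
```

The borrowed Vec<&str> holds no string data of its own; it only points into the String values owned by the first Vec.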

Related

Proper signature for a function accepting an iterator of strings

I'm confused about the proper type to use for an iterator yielding string slices.
fn print_strings<'a>(seq: impl IntoIterator<Item = &'a str>) {
    for s in seq {
        println!("- {}", s);
    }
}

fn main() {
    let arr: [&str; 3] = ["a", "b", "c"];
    let vec: Vec<&str> = vec!["a", "b", "c"];
    let it: std::str::Split<'_, char> = "a b c".split(' ');
    print_strings(&arr);
    print_strings(&vec);
    print_strings(it);
}
Using <Item = &'a str>, the arr and vec calls don't compile. If, instead, I use <Item = &'a &'a str>, they work, but the it call doesn't compile.
Of course, I can make the Item type generic too, and do
fn print_strings<'a, I: std::fmt::Display>(seq: impl IntoIterator<Item = I>)
but it's getting silly. Surely there must be a single canonical "iterator of string values" type?
The error you are seeing is expected because seq is &Vec<&str> and &Vec<T> implements IntoIterator with Item=&T, so with your code, you end up with Item=&&str where you are expecting it to be Item=&str in all cases.
The correct way to do this is to widen the Item type so that it can handle both &str and &&str. You can do this by using more generics, e.g.
fn print_strings(seq: impl IntoIterator<Item = impl AsRef<str>>) {
    for s in seq {
        let s = s.as_ref();
        println!("- {}", s);
    }
}
This requires the Item to be something that you can retrieve a &str from, and then in your loop .as_ref() will return the &str you are looking for.
This also has the added bonus that your code will also work with Vec<String> and any other type that implements AsRef<str>.
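To make that claim easy to check, here is a small sketch that uses the same bound but returns a String instead of printing, so each call can be asserted. The function name join_strings is made up for this illustration and does not appear in the question.

```rust
// Same bound as print_strings above, but the result is returned so it can
// be compared. `join_strings` is a hypothetical name for this sketch.
fn join_strings(seq: impl IntoIterator<Item = impl AsRef<str>>) -> String {
    seq.into_iter()
        .map(|s| s.as_ref().to_owned())
        .collect::<Vec<String>>()
        .join(" ")
}

fn main() {
    let arr: [&str; 3] = ["a", "b", "c"];
    let vec: Vec<&str> = vec!["a", "b", "c"];
    let owned: Vec<String> = vec!["a".to_owned(), "b".to_owned()];

    // &[&str; 3] and &Vec<&str> yield &&str, which implements AsRef<str>.
    assert_eq!(join_strings(&arr), "a b c");
    assert_eq!(join_strings(&vec), "a b c");
    // Split yields &str directly.
    assert_eq!(join_strings("a b c".split(' ')), "a b c");
    // &Vec<String> yields &String; passing the Vec by value yields String.
    assert_eq!(join_strings(&owned), "a b");
    assert_eq!(join_strings(owned), "a b");
}
```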
TL;DR The signature you use is fine; it's the callers that provide iterators with the wrong Item type, and that is easily fixed.
As explained in the other answer, print_strings() doesn't accept &arr and &vec because the IntoIterator implementations for &[T; n] and &Vec<T> yield references to T. This is because &Vec, itself a reference, is not allowed to consume the Vec in order to move T values out of it. What it can do is hand out references to the T items sitting inside the Vec, i.e. items of type &T. In the case of your callers that don't compile, the containers contain &str, so their iterators hand out &&str.
Other than making print_strings() more generic, another way to fix the issue is to call it correctly to begin with. For example, these all compile:
print_strings(arr.iter().map(|sref| *sref));
print_strings(vec.iter().copied());
print_strings(it);
iter() is the method provided by slices (and therefore available on arrays and Vec) that iterates over references to elements, just like IntoIterator of &Vec. We call it explicitly to be able to call map() to convert &&str to &str the obvious way - by using the * operator to dereference the &&str. The copied() iterator adapter is another way of expressing the same, possibly a bit less cryptic than map(|x| *x). (There is also cloned(), equivalent to map(|x| x.clone()).)
It's also possible to call print_strings() if you have a container with String values:
let v = vec!["foo".to_owned(), "bar".to_owned()];
print_strings(v.iter().map(|s| s.as_str()));
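The call-site fixes above can be collected into one self-contained program. This is only a sketch: collect_strs is a hypothetical stand-in for print_strings that keeps the original Item = &'a str signature but returns what it received, so that the results can be compared.

```rust
// Hypothetical stand-in for print_strings with the unchanged signature.
fn collect_strs<'a>(seq: impl IntoIterator<Item = &'a str>) -> Vec<&'a str> {
    seq.into_iter().collect()
}

fn main() {
    let arr: [&str; 3] = ["a", "b", "c"];
    let vec: Vec<&str> = vec!["a", "b", "c"];
    let owned: Vec<String> = vec!["a".to_owned(), "b".to_owned(), "c".to_owned()];
    let expected = vec!["a", "b", "c"];

    // iter() yields &&str; dereference each item to get &str.
    assert_eq!(collect_strs(arr.iter().map(|sref| *sref)), expected);
    // copied() does the same dereference for any Copy item type.
    assert_eq!(collect_strs(vec.iter().copied()), expected);
    // Split already yields &str.
    assert_eq!(collect_strs("a b c".split(' ')), expected);
    // For owned strings, borrow each element as &str.
    assert_eq!(collect_strs(owned.iter().map(|s| s.as_str())), expected);
    // Passing the array by value also works (Rust 1.53 or later), because
    // [T; N] then iterates over T rather than &T.
    assert_eq!(collect_strs(arr), expected);
}
```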

How to look up values in a HashMap<Option<String>, V> without copying?

I have a HashMap with Option<String> keys; is it possible to do lookups with a key of type Option<&str>? I know I can use a &str to look up in a HashMap<String, V> because String implements Borrow<str>.
Do I have to convert to an owned string just to do a lookup?
It's slightly less efficient, but you could use Cow here. It avoids the problem with the Borrow trait by instead having a single type that can represent either a reference or an owned value, like so:
use std::borrow::Cow;
use std::collections::HashMap;

fn main() {
    let mut map = HashMap::<Option<Cow<'static, str>>, i32>::new();
    map.insert(None, 5);
    map.insert(Some(Cow::Borrowed("hello")), 10);
    map.insert(Some(Cow::Borrowed("world")), 15);
    // works with None and constant string slices...
    assert_eq!(map.get(&None), Some(&5));
    assert_eq!(map.get(&Some(Cow::Borrowed("hello"))), Some(&10));
    // ...and also works with heap-allocated strings, without copies
    let stack = String::from("world");
    assert_eq!(map.get(&Some(Cow::Borrowed(&stack))), Some(&15));
}
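If lookups by Option<&str> happen in many places, the wrapping in Cow::Borrowed can be hidden behind a small helper. The sketch below assumes the same key type as the answer; the alias Key and the function lookup are hypothetical names introduced for illustration.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Hypothetical alias for the key type used in the answer above.
type Key<'a> = Option<Cow<'a, str>>;

// Hypothetical helper: looks up an Option<&str> without allocating.
// The borrowed key is wrapped in Cow::Borrowed, which hashes and compares
// like the underlying str, so it matches owned keys as well.
fn lookup<'a, V>(map: &'a HashMap<Key<'a>, V>, key: Option<&'a str>) -> Option<&'a V> {
    map.get(&key.map(Cow::Borrowed))
}

fn main() {
    let mut map: HashMap<Key<'static>, i32> = HashMap::new();
    map.insert(None, 5);
    map.insert(Some(Cow::Owned(String::from("hello"))), 10);

    // The lookup key borrows from a runtime String; nothing is copied.
    let runtime = String::from("hello");
    assert_eq!(lookup(&map, Some(runtime.as_str())), Some(&10));
    assert_eq!(lookup(&map, None), Some(&5));
    assert_eq!(lookup(&map, Some("missing")), None);
}
```

The trade-off is the one the answer mentions: every key carries the Cow discriminant, and the map's key type changes from Option<String> to Option<Cow<str>>.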

Generic operation on slice of Cow<str>

I'm trying to implement the following code, which removes the prefix from a slice of Cow<str>'s.
fn remove_prefix(v: &mut [Cow<str>], prefix: &str) {
    for t in v.iter_mut() {
        match *t {
            Borrowed(&s) => s = s.trim_left_matches(prefix),
            Owned(s) => s = s.trim_left_matches(prefix).to_string(),
        }
    }
}
I have two questions:
I can't get this to compile - I've tried loads of combinations of &'s and *'s but to no avail.
Is there a better way to apply functions to a Cow<str> without having to match it to Borrowed and Owned every time? It seems like I should just be able to do something like *t = t.trim_left_matches(prefix), and if t is a Borrowed(&str) it stays a &str (since trim_left_matches allows that), and if it is an Owned(String) it stays a String. Similarly, for replace() it would realise it has to convert both to a String (since replace() always returns a new String). Is something like that possible?
Question #1 strongly implies that how you think pattern matching and/or pointers work in Rust doesn't quite line up with how they actually work. The following code compiles:
use std::borrow::Cow;

fn remove_prefix(v: &mut [Cow<str>], prefix: &str) {
    use std::borrow::Cow::*;
    for t in v.iter_mut() {
        match *t {
            // bind a mutable reference to the stored value, then overwrite it
            Borrowed(ref mut s) => *s = s.trim_left_matches(prefix),
            Owned(ref mut s) => *s = s.trim_left_matches(prefix).to_string(),
        }
    }
}
In your case, Borrowed(&s) is matched against Borrowed(&str), meaning that s is of type str. This is impossible: you absolutely cannot have a variable of a dynamically sized type. It's also counter-productive. Given that you want to modify s, binding to it by value won't help at all.
What you want is to modify the thing contained in the Borrowed variant. This means you want a mutable pointer to that storage location. Hence, Borrowed(ref mut s): this is not destructuring the value inside the Borrowed at all. Rather, it binds directly to the &str, meaning that s is of type &mut &str; a mutable pointer to a (pointer to a str). In other words: a mutable pointer to a string slice.
At that point, mutating the contents of the Borrowed is done by re-assigning the value through the mutable pointer: *s = ....
Finally, the exact same reasoning applies to the Owned case: you were trying to bind by-value, then mutate it, which cannot possibly do what you want. Instead, bind by mutable pointer to the storage location, then re-assign it.
As for question #2... not really. That would imply some kind of overloading, which Rust doesn't do (by deliberate choice). If you are doing this a lot, you could write an extension trait that adds methods of interest to Cow.
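As a rough sketch of what such an extension trait could look like: the trait name CowStrExt and its method trim_prefix_in_place are invented for this example, and it uses trim_start_matches, which is the name newer Rust versions use for the now-deprecated trim_left_matches.

```rust
use std::borrow::Cow;

// Hypothetical extension trait; neither the trait nor the method is part
// of the standard library.
trait CowStrExt {
    /// Removes every leading repetition of `prefix`, keeping a Borrowed
    /// value borrowed and an Owned value owned.
    fn trim_prefix_in_place(&mut self, prefix: &str);
}

impl<'a> CowStrExt for Cow<'a, str> {
    fn trim_prefix_in_place(&mut self, prefix: &str) {
        match self {
            // `s` is a &mut &str here: re-point the slice, no allocation.
            Cow::Borrowed(s) => *s = s.trim_start_matches(prefix),
            // `s` is a &mut String here: replace it with the trimmed copy.
            Cow::Owned(s) => *s = s.trim_start_matches(prefix).to_string(),
        }
    }
}

fn remove_prefix(v: &mut [Cow<str>], prefix: &str) {
    for t in v.iter_mut() {
        t.trim_prefix_in_place(prefix);
    }
}

fn main() {
    let mut v = vec![Cow::Borrowed("xxabc"), Cow::Owned(String::from("xxdef"))];
    remove_prefix(&mut v, "xx");
    assert_eq!(&*v[0], "abc");
    assert_eq!(&*v[1], "def");
    // The variants are preserved.
    assert!(matches!(v[0], Cow::Borrowed(_)));
    assert!(matches!(v[1], Cow::Owned(_)));
}
```

The match still exists, but it is written once inside the trait implementation instead of at every call site.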
You can definitely do it.
fn remove_prefix(v: &mut [Cow<str>], prefix: &str) {
    for t in v.iter_mut() {
        match *t {
            Cow::Borrowed(ref mut s) => *s = s.trim_left_matches(prefix),
            Cow::Owned(ref mut s) => *s = s.trim_left_matches(prefix).to_string(),
        }
    }
}
ref mut s means “take a mutable reference to the value and call it s” in a pattern. Thus you have s of type &mut &str or &mut String. You must then use *s = ... in order to change what that mutable reference is pointing to (thus, change the string inside the Cow).

Extend lifetime of variable

I'm trying to return a slice from a vector which is built inside my function. Obviously this doesn't work because v's lifetime expires too soon. I'm wondering if there's a way to extend v's lifetime. I want to return a plain slice, not a vector.
pub fn find<'a>(&'a self, name: &str) -> &'a [&'a Element] {
    let v: Vec<&'a Element> = self
        .iter_elements()
        .filter(|&elem| elem.name.borrow().local_name == name)
        .collect();
    v.as_slice()
}
You can't forcibly extend a value's lifetime; you just have to return the full Vec. If I may ask, why do you want to return the slice itself? It is almost always unnecessary, since a Vec can be cheaply (both in the sense of easy syntax and low-overhead at runtime) coerced to a slice.
Alternatively, you could return the iterator:
pub fn find<'a>(&'a self, name: &'a str) -> Box<dyn Iterator<Item = &'a Element> + 'a> {
    // `name` is borrowed for 'a because the returned closure holds on to it
    Box::new(self.iter_elements()
        .filter(move |&elem| elem.name.borrow().local_name == name))
}
When this was written, you had to use an iterator trait object, since closures have types that are unnameable; newer Rust versions can also return impl Iterator.
NB. I had to change the filter closure to capture by move (the move keyword) to ensure that it can be returned; otherwise the closure would capture name by reference, i.e. as a pointer into find's stack frame, and hence would be prevented from leaving find.
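Both suggestions can be sketched in a self-contained form. The Element and Document types below are simplified placeholders, not the asker's real types (whose name field evidently sits behind a borrow() call), and find_vec is a made-up name. The iterator version uses impl Iterator in return position, which has been available since Rust 1.26 and avoids the Box.

```rust
// Placeholder types for illustration only.
struct Element {
    name: String,
}

struct Document {
    elements: Vec<Element>,
}

impl Document {
    fn iter_elements(&self) -> impl Iterator<Item = &Element> + '_ {
        self.elements.iter()
    }

    // Option 1: return the Vec itself; callers can borrow it as a slice.
    fn find_vec<'a>(&'a self, name: &str) -> Vec<&'a Element> {
        self.iter_elements().filter(|elem| elem.name == name).collect()
    }

    // Option 2: return a lazy iterator. `name` must live as long as the
    // iterator because the `move` closure stores the &str inside it.
    fn find<'a>(&'a self, name: &'a str) -> impl Iterator<Item = &'a Element> + 'a {
        self.iter_elements().filter(move |elem| elem.name == name)
    }
}

fn main() {
    let doc = Document {
        elements: vec![
            Element { name: "a".to_owned() },
            Element { name: "b".to_owned() },
            Element { name: "a".to_owned() },
        ],
    };

    let found = doc.find_vec("a");
    let as_slice: &[&Element] = &found; // cheap Vec-to-slice coercion
    assert_eq!(as_slice.len(), 2);

    assert_eq!(doc.find("b").count(), 1);
}
```

In both cases the caller, not find, owns whatever storage the results need, which is why no lifetime has to be extended.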
