vector of string slices goes out of scope but original string remains, why is checker saying there is an error? - rust

Beginner at rust here. I understand why the code below has an error. test(x) creates y then returns a value that references the &str owned by y. y is destroyed as it goes out of scope so it can't do that.
Here's my issue: the &str owned by y is actually a slice of x, which has NOT gone out of scope yet... so technically the reference should still work.
enum TestThing<'a> {
    Blah(&'a str)
}
fn test(x: &str) -> Vec<TestThing> {
    let y = x.split(" ").collect::<Vec<&str>>();
    parse(&y)
}
fn parse<'a>(x: &'a Vec<&str>) -> Vec<TestThing<'a>> {
    let mut result: Vec<TestThing> = vec![];
    for v in x {
        result.push(TestThing::Blah(v));
    }
    result
}
Is the checker just being over-zealous here? Is there a method around this? Am I missing something? Is this just something to do with split? I also tried cloning v, and that didn't work either.

Move the lifetime here: x: &'a Vec<&str> -> x: &Vec<&'a str>.
P.S. Using a slice (&[&'a str]) would be better, since it's smaller and more flexible, see Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?. Some kind of impl Iterator or impl IntoIterator would be even more flexible.
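A minimal sketch of that fix, assuming the slice-based signature suggested above. The derive attributes, the parameter name words, and the main function are additions for illustration and are not part of the question:

```rust
#[derive(Debug, PartialEq)]
enum TestThing<'a> {
    Blah(&'a str),
}

// 'a is attached to the string data, not to the borrow of the slice,
// so the result may outlive the temporary Vec that held the slices.
fn parse<'a>(words: &[&'a str]) -> Vec<TestThing<'a>> {
    words.iter().map(|w| TestThing::Blah(*w)).collect()
}

fn test(x: &str) -> Vec<TestThing<'_>> {
    let y: Vec<&str> = x.split(' ').collect();
    // `y` is dropped at the end of this function, but the returned
    // values only borrow from `x`.
    parse(&y)
}

fn main() {
    let text = String::from("a b c");
    let things = test(&text);
    println!("{:?}", things);
}
```

The explicit `*w` copies the inner `&'a str` out of the `&&'a str` that the slice iterator yields, which keeps the longer lifetime visible in the code.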

Related

Proper signature for a function accepting an iterator of strings

I'm confused about the proper type to use for an iterator yielding string slices.
fn print_strings<'a>(seq: impl IntoIterator<Item = &'a str>) {
    for s in seq {
        println!("- {}", s);
    }
}
fn main() {
    let arr: [&str; 3] = ["a", "b", "c"];
    let vec: Vec<&str> = vec!["a", "b", "c"];
    let it: std::str::Split<'_, char> = "a b c".split(' ');
    print_strings(&arr);
    print_strings(&vec);
    print_strings(it);
}
Using <Item = &'a str>, the arr and vec calls don't compile. If, instead, I use <Item = &'a &'a str>, they work, but the it call doesn't compile.
Of course, I can make the Item type generic too, and do
fn print_strings<'a, I: std::fmt::Display>(seq: impl IntoIterator<Item = I>)
but it's getting silly. Surely there must be a single canonical "iterator of string values" type?
The error you are seeing is expected because seq is &Vec<&str> and &Vec<T> implements IntoIterator with Item=&T, so with your code, you end up with Item=&&str where you are expecting it to be Item=&str in all cases.
The correct way to do this is to expand the Item type so that it can handle both &str and &&str. You can do this by using more generics, e.g.
fn print_strings(seq: impl IntoIterator<Item = impl AsRef<str>>) {
    for s in seq {
        let s = s.as_ref();
        println!("- {}", s);
    }
}
This requires the Item to be something that you can retrieve a &str from, and then in your loop .as_ref() will return the &str you are looking for.
This also has the added bonus that your code will also work with Vec<String> and any other type that implements AsRef<str>.
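To make the claim easy to check, here is a small sketch that uses the same bound but returns the strings instead of printing them. The function name collect_strings is hypothetical and exists only so the result can be inspected:

```rust
// Same bound as above: any iterable whose items can be viewed as &str.
fn collect_strings(seq: impl IntoIterator<Item = impl AsRef<str>>) -> Vec<String> {
    seq.into_iter().map(|s| s.as_ref().to_owned()).collect()
}

fn main() {
    let arr: [&str; 3] = ["a", "b", "c"];
    let vec: Vec<&str> = vec!["a", "b", "c"];
    let owned: Vec<String> = vec!["a".to_owned(), "b".to_owned()];

    println!("{:?}", collect_strings(&arr)); // items are &&str
    println!("{:?}", collect_strings(&vec)); // items are &&str
    println!("{:?}", collect_strings("a b c".split(' '))); // items are &str
    println!("{:?}", collect_strings(&owned)); // items are &String
}
```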
TL;DR The signature you use is fine, it's the callers that are providing iterators with wrong Item - but can be easily fixed.
As explained in the other answer, print_strings() doesn't accept &arr and &vec because IntoIterator for &[T; n] and &Vec<T> yields references to T. This is because &Vec, itself a reference, is not allowed to consume the Vec in order to move T values out of it. What it can do is hand out references to the T items sitting inside the Vec, i.e. items of type &T. In the case of your callers that don't compile, the containers contain &str, so their iterators hand out &&str.
Other than making print_strings() more generic, another way to fix the issue is to call it correctly to begin with. For example, these all compile:
print_strings(arr.iter().map(|sref| *sref));
print_strings(vec.iter().copied());
print_strings(it);
iter() is the method provided by slices (and therefore available on arrays and Vec) that iterates over references to elements, just like IntoIterator of &Vec. We call it explicitly to be able to call map() to convert &&str to &str the obvious way - by using the * operator to dereference the &&str. The copied() iterator adapter is another way of expressing the same, possibly a bit less cryptic than map(|x| *x). (There is also cloned(), equivalent to map(|x| x.clone()).)
It's also possible to call print_strings() if you have a container with String values:
let v = vec!["foo".to_owned(), "bar".to_owned()];
print_strings(v.iter().map(|s| s.as_str()));

rust lifetimes and borrow checker

I'm reading through the "Learn rust" tutorial, and I'm trying to understand lifetimes. Chapter 10-3 has the following non-working example:
fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}
The paragraph below explains the error as :
When we’re defining this function, we don’t know the concrete values that will be passed into this function, so we don’t know whether the if case or the else case will execute.
However, if I change the block of code to do something else; say, print x and return y; so that we know what is being returned each time, the same error occurs. Why?
fn longest(x: &str, y: &str) -> &str {
    println!("{}", x);
    y
}
The book also says:
The borrow checker can’t determine this either, because it doesn’t know how the lifetimes of x and y relate to the lifetime of the return value.
My doubts are:
Is the borrow checker capable of tracking lifetimes across functions? If so, can you please provide an example?
I don't understand the error. x and y are references passed into longest, so the compiler should know that its owner is elsewhere(and that its lifetime would continue beyond longest). When the compiler sees that the return values are either x or y, why is there a confusion on lifetimes?
Think of the function as a black box. Neither you, nor the compiler knows what happens inside. You may say that the compiler "knows", but that's not really true. Imagine that it returns X or Y based on the result of a remote HTTP call. How can it know in advance ?
Yet it needs to provide some guarantee that the returned reference is safe to use. That works by forcing you (i.e. the developer) to explicitly specify the relationships between the input parameters and the returned value.
First you need to specify the lifetimes of the parameters. I'll use 'x, for x, 'y for y and 'r for the result. Thus our function will look like:
fn longest<'x, 'y, 'r>(x: &'x str, y: &'y str) -> &'r str
But this is not enough. We still need to tell the compiler what the relationships are. There are two ways to do it (the magic syntax will be explained later):
Inside the <> brackets like that: <'a, 'b: 'a>
In a where clause like that: where 'b: 'a
Both options are the same but the where clause will be more readable if you have a large number of generic parameters.
Back to the problem. We need to tell the compiler that 'r depends on both 'x and 'y and that it will be valid as long as they are valid. We can do that by saying 'long: 'short which translates to "lifetime 'long must be valid at least as long as lifetime 'short".
Thus we need to modify our function like that:
fn longest<'x, 'y, 'r>(x: &'x str, y: &'y str) -> &'r str
where
    'x: 'r,
    'y: 'r,
{
    if x.len() > y.len() {
        x
    } else {
        y
    }
}
I.e. we are saying that our returned value will not outlive the function parameters, thus preventing a "use after free" situation.
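The same reasoning covers the variant from the question that prints x and returns y. The compiler does not look at the body to work out which input the result comes from, so the signature has to say it. A sketch, where the name print_first_return_second is made up for illustration:

```rust
// Only `y` shares a lifetime with the result; `x` gets its own elided lifetime.
fn print_first_return_second<'a>(x: &str, y: &'a str) -> &'a str {
    println!("{}", x);
    y
}

fn main() {
    let y = String::from("kept");
    let result;
    {
        let x = String::from("temporary");
        result = print_first_return_second(&x, &y);
    } // `x` is dropped here, which is fine: the result only borrows from `y`
    println!("{}", result);
}
```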
PS: In this example you can actually do it with only one lifetime parameter, as we are not interested in the relationship between them. In this case the lifetime will be the smaller one of x/y:
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str
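As for tracking lifetimes across functions: the borrow checker checks each function against its signature only, and then checks every call site against that same signature. A minimal sketch of what that looks like for the caller, using the single-lifetime version (the variable names are illustrative):

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let outer = String::from("a fairly long string");
    let result;
    {
        let inner = String::from("short");
        // 'a is chosen at the call site: it cannot exceed the scope of `inner`.
        result = longest(outer.as_str(), inner.as_str());
        println!("{}", result); // accepted: both inputs are still alive
    }
    // Using `result` here would be rejected, because the signature says
    // the result may borrow from `inner`, which has been dropped:
    // println!("{}", result);
}
```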

Partial application at compile-time in Rust?

I have a function that takes two parameters (let's say two strings):
fn foo(x: String, y: String) -> String {
    x + &y
}
I always know x at compile-time, but I do not know y until run-time.
How can I write this for maximum efficiency, without copy-pasting the function for each x?
Note that your function foo currently requires two heap-allocated strings unnecessarily. Here is another version, which is more versatile and efficient (although YMMV as I will describe next):
fn foo<T>(x: T, y: &str) -> String
where
    T: Into<String>,
{
    x.into() + y
}
assert_eq!(foo("mobile ", "phones"), "mobile phones");
Concatenation will almost always require a memory allocation somewhere, but this one can take heap-allocated strings as well as arbitrary string slices.
It can also avoid a reallocation if the capacity of x is large enough, although this is not very likely to be the case, considering that x is obtained from a string known at compile time. String::insert_str would have allowed us to swap the positions of the two parameters, but an insertion at the front of the string has an O(n) cost. Knowing the first operand of a string concatenation a priori is not very beneficial to the compiler in terms of what optimizations it can employ.
Let us assume that we still want to perform partial application at compile time. This seems to be another use case where const generics would shine. With this feature, one could indeed monomorphize this function over a &'static str. As of nightly-2022-06-29, being able to use a &'static str as a const parameter is still unstable, but the code below compiles and works as intended. This feature is tracked in issue 95174.
#![feature(adt_const_params)]
fn foo<const X: &'static str>(y: &str) -> String {
    X.to_string() + y
}
let s = "string".to_string();
println!("{}", foo::<"I am ">(&s));
Alas, const generics applied to string slices are still unstable, and not quite ready for this yet. Albeit less ergonomic, we can instead replicate the effect of instancing one function for each string literal with rule-based macros:
macro_rules! define_foo {
    ($fname: ident, $x: literal) => {
        fn $fname (y: &str) -> String {
            $x.to_string() + y
        }
    }
}
Using:
define_foo!(bar, "Conan ");
assert_eq!(bar("Osíris"), "Conan Osíris");
See also:
Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?
How do I concatenate strings?
I was able to do it in nightly using a const function returning closure:
#![feature(const_fn)]
fn foo(x: String, y: String) -> String {
    x + &y
}
const fn foo_applied(x: String) -> impl Fn(String) -> String {
    move |y| foo(x.clone(), y)
}
fn main() {
    let foo_1 = foo_applied("1 ".into());
    println!("{}", foo_1("2".into()));
    let foo_2 = foo_applied("2 ".into());
    println!("{}", foo_2("1".into()));
}
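For comparison, here is a sketch of the same closure idea on stable Rust, assuming x is always a string literal. It is not the code from either answer: it drops const fn, takes string slices instead of String, and does nothing at compile time beyond fixing x once per closure:

```rust
// Concatenate with a single allocation of exactly the needed size.
fn foo(x: &str, y: &str) -> String {
    let mut out = String::with_capacity(x.len() + y.len());
    out.push_str(x);
    out.push_str(y);
    out
}

// Fix the first argument and return a closure that waits for the second.
fn foo_applied(x: &'static str) -> impl Fn(&str) -> String {
    move |y| foo(x, y)
}

fn main() {
    let foo_1 = foo_applied("1 ");
    println!("{}", foo_1("2"));
    let foo_2 = foo_applied("2 ");
    println!("{}", foo_2("1"));
}
```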

How do I fix a missing lifetime specifier?

I have a very simple method. The first argument takes in vector components ("A", 5, 0) and I will compare this to every element of another vector to see if they have the same ( _ , 5 , _) and then print out the found element's string.
Comparing ("A", 5, 0 ) and ("Q", 5, 2) should print out Q.
fn is_same_space(x: &str, y1: i32, p: i32, vector: &Vec<(&str, i32, i32)>) -> (&str) {
    let mut foundString = "";
    for i in 0..vector.len() {
        if y1 == vector[i].1 {
            foundString = vector[i].0;
        }
    }
    foundString
}
However, I get this error
error[E0106]: missing lifetime specifier
--> src/main.rs:1:80
|
1 | fn is_same_space(x: &str, y1: i32, p: i32, vector: &Vec<(&str, i32, i32)>) -> (&str) {
| ^ expected lifetime parameter
|
= help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or one of `vector`'s 2 elided lifetimes
By specifying a lifetime:
fn is_same_space<'a>(x: &'a str, y1: i32, p: i32, vector: &'a Vec<(&'a str, i32, i32)>) -> (&'a str)
This is only one of many possible interpretations of what you might have meant for the function to do, and as such it's a very conservative choice - it uses a unified lifetime of all the referenced parameters.
Perhaps you wanted to return a string that lives as long as x or as long as vector or as long as the strings inside vector; all of those are potentially valid.
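Those three readings can be written as three different signatures. The sketch below uses made-up function names and simplified bodies (each just returns the first candidate) purely to show where the lifetime goes in each case:

```rust
// 1. The result borrows from `x`.
fn from_x<'a>(x: &'a str, _vector: &[(&str, i32, i32)]) -> &'a str {
    x
}

// 2. The result lives only as long as the borrow of the vector itself.
fn from_vector<'a>(vector: &'a [(&str, i32, i32)]) -> &'a str {
    vector.first().map(|item| item.0).unwrap_or("")
}

// 3. The result lives as long as the strings stored inside the vector.
fn from_items<'a>(vector: &[(&'a str, i32, i32)]) -> &'a str {
    vector.first().map(|item| item.0).unwrap_or("")
}

fn main() {
    let name = String::from("Q");
    let found;
    {
        let v = vec![(name.as_str(), 5, 2)];
        found = from_items(&v);
        println!("{}", from_x("A", &v));
        println!("{}", from_vector(&v));
    } // `v` is dropped here; `found` is still valid because it borrows from `name`
    println!("{}", found);
}
```

Swapping from_items for from_vector in the assignment to found would make the last line fail to compile, since that signature ties the result to the borrow of v.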
I strongly recommend that you go back and re-read The Rust Programming Language. It's free, and aimed at beginners to Rust, and it covers all the things that make Rust unique and are new to programmers. Many people have spent a lot of time on this book and it answers many beginner questions such as this one.
Specifically, you should read the chapters on:
ownership
references and borrowing
lifetimes
There's even a second edition in the works, with chapters like:
Understanding Ownership
Generic Types, Traits, and Lifetimes
For fun, I'd rewrite your code using iterators:
fn is_same_space<'a>(y1: i32, vector: &[(&'a str, i32, i32)]) -> &'a str {
    vector.iter()
        .rev() // start from the end
        .filter(|item| item.1 == y1) // element that matches
        .map(|item| item.0) // first element of the tuple
        .next() // take the first (from the end)
        .unwrap_or("") // Use a default value
}
Removed the unneeded parameters.
Using an iterator avoids the overhead of bounds checks, and more clearly exposes your intent.
Why is it discouraged to accept a reference to a String (&String) or Vec (&Vec) as a function argument?
Rust does not use camelCase variable names.
I assume that you do want to return the string from inside vector.
Remove the redundant parens on the return type
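A short usage sketch of the iterator version with the values from the question. The function body is the one given above; the main function and the extra ("P", 3, 1) entry are added for illustration:

```rust
fn is_same_space<'a>(y1: i32, vector: &[(&'a str, i32, i32)]) -> &'a str {
    vector
        .iter()
        .rev() // start from the end
        .filter(|item| item.1 == y1) // element that matches
        .map(|item| item.0) // first element of the tuple
        .next() // take the first (from the end)
        .unwrap_or("") // use a default value
}

fn main() {
    let others = vec![("P", 3, 1), ("Q", 5, 2)];
    // Looking for the entry whose middle field matches ("A", 5, 0).
    println!("{}", is_same_space(5, &others));
}
```

Passing &others works even though the parameter is a slice, because a &Vec<T> coerces to &[T] at the call site.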
So the problem comes from the fact that vector has two inferred lifetimes, one for vector itself (the &Vec part) and one for the &str inside the vector. You also have an inferred lifetime on x, but that's really inconsequential.
To fix it, just specify that the returned &str lives as long as the &str in the vector:
fn is_same_space<'a>(                     // Must declare the lifetime here
    x: &str,                              // This borrow doesn't have to be related (x isn't even used)
    y1: i32,                              // Not borrowed
    p: i32,                               // Not borrowed or used
    vector: &'a Vec<(&'a str, i32, i32)>  // Vector and some of its data are borrowed here
) -> &'a str {                            // This tells rustc how long the return value should live
    ...
}

Extend lifetime of variable

I'm trying to return a slice from a vector which is built inside my function. Obviously this doesn't work because v's lifetime expires too soon. I'm wondering if there's a way to extend v's lifetime. I want to return a plain slice, not a vector.
pub fn find<'a>(&'a self, name: &str) -> &'a[&'a Element] {
    let v: Vec<&'a Element> = self.iter_elements().filter(|&elem| elem.name.borrow().local_name == name).collect();
    v.as_slice()
}
You can't forcibly extend a value's lifetime; you just have to return the full Vec. If I may ask, why do you want to return the slice itself? It is almost always unnecessary, since a Vec can be cheaply (both in the sense of easy syntax and low-overhead at runtime) coerced to a slice.
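A minimal sketch of returning the Vec and letting callers borrow it as a slice. The Document and Element types below are simplified stand-ins (a plain String name instead of the question's elem.name.borrow().local_name), and count is a hypothetical consumer:

```rust
struct Element {
    name: String,
}

struct Document {
    elements: Vec<Element>,
}

impl Document {
    // Return the owned Vec of references; the caller decides how long it lives.
    fn find<'a>(&'a self, name: &str) -> Vec<&'a Element> {
        self.elements.iter().filter(|e| e.name == name).collect()
    }
}

// A function that only needs a slice.
fn count(elems: &[&Element]) -> usize {
    elems.len()
}

fn main() {
    let doc = Document {
        elements: vec![
            Element { name: "a".to_owned() },
            Element { name: "b".to_owned() },
            Element { name: "a".to_owned() },
        ],
    };
    let found = doc.find("a");
    // &Vec<&Element> coerces to &[&Element] with no copying.
    println!("{}", count(&found));
}
```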
Alternatively, you could return the iterator:
pub fn find<'a>(&'a self, name: &'a str) -> Box<dyn Iterator<Item = &'a Element> + 'a> {
    // `name` is captured by the returned iterator, so it must live for 'a too
    Box::new(self.iter_elements()
        .filter(move |&elem| elem.name.borrow().local_name == name))
}
This uses an iterator trait object, since closures have types that are unnameable; on Rust 1.26 and later, returning impl Iterator<Item = &'a Element> + 'a is another option.
NB. I had to change the filter closure to capture-by-move (the move keyword) to ensure that it can be returned, or else the name variable would just be passed into the closure as a pointer into find's stack frame, and hence would be restricted from leaving find.
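A self-contained sketch of the boxed-iterator approach on current Rust, again with simplified stand-in types (Document and Element with a plain String name are assumptions, not the question's types). Note that name carries the lifetime 'a here because the returned iterator holds on to it:

```rust
struct Element {
    name: String,
}

struct Document {
    elements: Vec<Element>,
}

impl Document {
    fn find<'a>(&'a self, name: &'a str) -> Box<dyn Iterator<Item = &'a Element> + 'a> {
        // `move` copies the `name` reference into the closure, so the closure
        // does not point into this function's stack frame.
        Box::new(self.elements.iter().filter(move |elem| elem.name == name))
    }
}

fn main() {
    let doc = Document {
        elements: vec![
            Element { name: "a".to_owned() },
            Element { name: "b".to_owned() },
            Element { name: "a".to_owned() },
        ],
    };
    for elem in doc.find("a") {
        println!("{}", elem.name);
    }
}
```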
