I wish to extract all named groups from a match into a HashMap and I'm running into a "does not live long enough" error while trying to compile this code:
extern crate regex;
use std::collections::HashMap;
use regex::Regex;
pub struct Route {
regex: Regex,
}
pub struct Router<'a> {
pub namespace_seperator: &'a str,
routes: Vec<Route>,
}
impl<'a> Router<'a> {
// ...
pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
for route in &self.routes {
if route.regex.is_match(path) {
let mut hash = HashMap::new();
for cap in route.regex.captures_iter(path) {
for (name, value) in cap.iter_named() {
hash.insert(name, value.unwrap());
}
}
return Some(hash);
}
}
None
}
}
fn main() {}
Here's the error output:
error: `cap` does not live long enough
--> src/main.rs:23:42
|>
23 |> for (name, value) in cap.iter_named() {
|> ^^^
note: reference must be valid for the anonymous lifetime #1 defined on the block at 18:79...
--> src/main.rs:18:80
|>
18 |> pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
|> ^
note: ...but borrowed value is only valid for the for at 22:16
--> src/main.rs:22:17
|>
22 |> for cap in route.regex.captures_iter(path) {
|> ^
Obviously I still have a thing or two to learn about Rust lifetimes.
Let's follow the lifetime lines:
route.regex.captures_iter(path) creates a FindCapture<'r, 't> where the lifetime 'r is that of route.regex and the lifetime 't is that of path
this iterator yields a Captures<'t>, only linked to the lifetime of path
whose method iter_named(&'t self) yields a SubCapture<'t> itself linked to the lifetime of path and the lifetime of the cap
this iterator yields a (&'t str, Option<&'t str>) so that both keys and values of the HashMap are linked to the lifetime of path and the lifetime of the cap
Therefore, it is unfortunately impossible to have the HashMap outlive the cap variable as this variable is used by the code as a "marker" to keep the buffers containing the groups alive.
I am afraid that the only solution without significant re-structuring is to return a HashMap<String, String>, as unsatisfying as it is. It also occurs to me that a single capture group may match multiple times, not sure if you want to bother with this.
Matthieu M. already explained the lifetime situation well. The good news is that the regex crate recognized the problem and there's a fix in the pipeline for 1.0.
As stated in the commit message:
It was always possible to work around this by using indices.
It is also possible to work around this by using Regex::capture_names, although it's a bit more nested this way:
pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
for route in &self.routes {
if let Some(captures) = route.regex.captures(path) {
let mut hash = HashMap::new();
for name in route.regex.capture_names() {
if let Some(name) = name {
if let Some(value) = captures.name(name) {
hash.insert(name, value);
}
}
}
return Some(hash);
}
}
None
}
Note that I also removed the outer is_match — it's inefficient to run the regex once and then again.
Related
Context
Link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9a9ffa99023735f4fbedec09e1c7ac55
Here's a contrived repro of what I'm running into
fn main() {
let mut s = String::from("Hello World");
example(&mut s);
}
fn example(s: &mut str) -> Option<String> {
other_func(Some(s.to_owned()))
// other random mutable stuff happens
}
fn other_func(s: Option<String>) {
match s {
Some(ref s) => other_func2(*s),
None => panic!()
}
}
fn other_func2(s: String) {
println!("{}", &s)
}
and the error
Compiling playground v0.0.1 (/playground)
error[E0507]: cannot move out of `*s` which is behind a shared reference
--> src/main.rs:12:36
|
12 | Some(ref s) => other_func2(*s),
| ^^ move occurs because `*s` has type `String`, which does not implement the `Copy` trait
Question
In the following code, why can't I deference the &String without having to do some sort of clone/copy? i.e. this doesn't work
fn other_func(s: Option<String>) {
match s {
Some(ref s) => other_func2(*s),
None => panic!()
}
}
but it works if I replace *s with s.to_owned()/s.to_string()/s.clone()
As an aside, I understand this can probably be solved by refactoring to use &str, but I'm specifically interested in turning &String -> String
Why would the compiler allow you to?
s is &String. And you cannot get a String from a &String without cloning. That's obvious.
And the fact that it was created from an owned String? The compiler doesn't care, and it is right. This is not different from the following code:
let s: String = ...;
let r: &String = ...;
let s2: String = *r; // Error
Which is in turn not different from the following code, for instance, as far as the compiler is concerned:
let r: &String = ...;
let s: String = *s;
And we no longer have an owned string at the beginning. In general, the compiler doesn't track data flow. And rightfully so - when it type-checks the move it doesn't even can confirm that this reference isn't aliased. Or that the owned value is not used anymore. References are just references, they give you no right to drop the value.
Changing that will not be feasible in the general case (for example, the compiler will have to track data flow across function calls), and will require some form of manual annotation to say "this value is mine". And you already have such annotation - use an owned value, String, instead of &String: this is exactly what it's about.
Running into an ownership issue when attempting to reference multiple values from a HashMap in a struct as parameters in a function call. Here is a PoC of the issue.
use std::collections::HashMap;
struct Resource {
map: HashMap<String, String>,
}
impl Resource {
pub fn new() -> Self {
Resource {
map: HashMap::new(),
}
}
pub fn load(&mut self, key: String) -> &mut String {
self.map.get_mut(&key).unwrap()
}
}
fn main() {
// Initialize struct containing a HashMap.
let mut res = Resource {
map: HashMap::new(),
};
res.map.insert("Item1".to_string(), "Value1".to_string());
res.map.insert("Item2".to_string(), "Value2".to_string());
// This compiles and runs.
let mut value1 = res.load("Item1".to_string());
single_parameter(value1);
let mut value2 = res.load("Item2".to_string());
single_parameter(value2);
// This has ownership issues.
// multi_parameter(value1, value2);
}
fn single_parameter(value: &String) {
println!("{}", *value);
}
fn multi_parameter(value1: &mut String, value2: &mut String) {
println!("{}", *value1);
println!("{}", *value2);
}
Uncommenting multi_parameter results in the following error:
28 | let mut value1 = res.load("Item1".to_string());
| --- first mutable borrow occurs here
29 | single_parameter(value1);
30 | let mut value2 = res.load("Item2".to_string());
| ^^^ second mutable borrow occurs here
...
34 | multi_parameter(value1, value2);
| ------ first borrow later used here
It would technically be possible for me to break up the function calls (using the single_parameter function approach), but it would be more convenient to pass the
variables to a single function call.
For additional context, the actual program where I'm encountering this issue is an SDL2 game where I'm attempting to pass multiple textures into a single function call to be drawn, where the texture data may be modified within the function.
This is currently not possible, without resorting to unsafe code or interior mutability at least. There is no way for the compiler to know if two calls to load will yield mutable references to different data as it cannot always infer the value of the key. In theory, mutably borrowing both res.map["Item1"] and res.map["Item2"] would be fine as they would refer to different values in the map, but there is no way for the compiler to know this at compile time.
The easiest way to do this, as already mentioned, is to use a structure that allows interior mutability, like RefCell, which typically enforces the memory safety rules at run-time before returning a borrow of the wrapped value. You can also work around the borrow checker in this case by dealing with mut pointers in unsafe code:
pub fn load_many<'a, const N: usize>(&'a mut self, keys: [&str; N]) -> [&'a mut String; N] {
// TODO: Assert that keys are distinct, so that we don't return
// multiple references to the same value
keys.map(|key| self.load(key) as *mut _)
.map(|ptr| unsafe { &mut *ptr })
}
Rust Playground
The TODO is important, as this assertion is the only way to ensure that the safety invariant of only having one mutable reference to any value at any time is upheld.
It is, however, almost always better (and easier) to use a known safe interior mutation abstraction like RefCell rather than writing your own unsafe code.
Here's a Thing:
struct Thing(i32);
impl Thing {
pub fn increment_self(&mut self) {
self.0 += 1;
println!("incremented: {}", self.0);
}
}
And here's a function that tries to mutate a Thing and returns either true or false, depending on if a Thing is available:
fn try_increment(handle: Option<&mut Thing>) -> bool {
if let Some(t) = handle {
t.increment_self();
true
} else {
println!("warning: increment failed");
false
}
}
Here's a sample of usage:
fn main() {
try_increment(None);
let mut thing = Thing(0);
try_increment(Some(&mut thing));
try_increment(Some(&mut thing));
try_increment(None);
}
As written, above, it works just fine (link to Rust playground). Output below:
warning: increment failed
incremented: 1
incremented: 2
warning: increment failed
The problem arises when I want to write a function that mutates the Thing twice. For example, the following does not work:
fn try_increment_twice(handle: Option<&mut Thing>) {
try_increment(handle);
try_increment(handle);
}
fn main() {
try_increment_twice(None);
let mut thing = Thing(0);
try_increment_twice(Some(&mut thing));
try_increment_twice(None);
}
The error makes perfect sense. The first call to try_increment(handle) gives ownership of handle away and so the second call is illegal. As is often the case, the Rust compiler yields a sensible error message:
|
24 | try_increment(handle);
| ------ value moved here
25 | try_increment(handle);
| ^^^^^^ value used here after move
|
In an attempt to solve this, I thought it would make sense to pass handle by reference. It should be an immutable reference, mind, because I don't want try_increment to be able to change handle itself (assigning None to it, for example) only to be able to call mutations on its value.
My problem is that I couldn't figure out how to do this.
Here is the closest working version that I could get:
struct Thing(i32);
impl Thing {
pub fn increment_self(&mut self) {
self.0 += 1;
println!("incremented: {}", self.0);
}
}
fn try_increment(handle: &mut Option<&mut Thing>) -> bool {
// PROBLEM: this line is allowed!
// (*handle) = None;
if let Some(ref mut t) = handle {
t.increment_self();
true
} else {
println!("warning: increment failed");
false
}
}
fn try_increment_twice(mut handle: Option<&mut Thing>) {
try_increment(&mut handle);
try_increment(&mut handle);
}
fn main() {
try_increment_twice(None);
let mut thing = Thing(0);
try_increment_twice(Some(&mut thing));
try_increment_twice(None);
}
This code runs, as expected, but the Option is now passed about by mutable reference and that is not what I want:
I'm allowed to mutate the Option by reassigning None to it, breaking all following mutations. (Uncomment line 12 ((*handle) = None;) for example.)
It's messy: There are a whole lot of extraneous &mut's lying about.
It's doubly messy: Heaven only knows why I must use ref mut in the if let statement while the convention is to use &mut everywhere else.
It defeats the purpose of having the complicated borrow-checking and mutability checking rules in the compiler.
Is there any way to actually achieve what I want: passing an immutable Option around, by reference, and actually being able to use its contents?
You can't extract a mutable reference from an immutable one, even a reference to its internals. That's kind of the point! Multiple aliases of immutable references are allowed so, if Rust allowed you to do that, you could have a situation where two pieces of code are able to mutate the same data at the same time.
Rust provides several escape hatches for interior mutability, for example the RefCell:
use std::cell::RefCell;
fn try_increment(handle: &Option<RefCell<Thing>>) -> bool {
if let Some(t) = handle {
t.borrow_mut().increment_self();
true
} else {
println!("warning: increment failed");
false
}
}
fn try_increment_twice(handle: Option<RefCell<Thing>>) {
try_increment(&handle);
try_increment(&handle);
}
fn main() {
let mut thing = RefCell::new(Thing(0));
try_increment_twice(Some(thing));
try_increment_twice(None);
}
TL;DR: The answer is No, I can't.
After the discussions with #Peter Hall and #Stargateur, I have come to understand why I need to use &mut Option<&mut Thing> everywhere. RefCell<> would also be a feasible work-around but it is no neater and does not really achieve the pattern I was originally seeking to implement.
The problem is this: if one were allowed to mutate the object for which one has only an immutable reference to an Option<&mut T> one could use this power to break the borrowing rules entirely. Concretely, you could, essentially, have many mutable references to the same object because you could have many such immutable references.
I knew there was only one mutable reference to the Thing (owned by the Option<>) but, as soon as I started taking references to the Option<>, the compiler no longer knew that there weren't many of those.
The best version of the pattern is as follows:
fn try_increment(handle: &mut Option<&mut Thing>) -> bool {
if let Some(ref mut t) = handle {
t.increment_self();
true
}
else {
println!("warning: increment failed");
false
}
}
fn try_increment_twice(mut handle: Option<&mut Thing>) {
try_increment(&mut handle);
try_increment(&mut handle);
}
fn main() {
try_increment_twice(None);
let mut thing = Thing(0);
try_increment_twice(Some(&mut thing));
try_increment_twice(None);
}
Notes:
The Option<> holds the only extant mutable reference to the Thing
try_increment_twice() takes ownership of the Option<>
try_increment() must take the Option<> as &mut so that the compiler knows that it has the only mutable reference to the Option<>, during the call
If the compiler knows that try_increment() has the only mutable reference to the Option<> which holds the unique mutable reference to the Thing, the compiler knows that the borrow rules have not been violated.
Another Experiment
The problem of the mutability of Option<> remains because one can call take() et al. on a mutable Option<>, breaking everything following.
To implement the pattern that I wanted, I need something that is like an Option<> but, even if it is mutable, it cannot be mutated. Something like this:
struct Handle<'a> {
value: Option<&'a mut Thing>,
}
impl<'a> Handle<'a> {
fn new(value: &'a mut Thing) -> Self {
Self {
value: Some(value),
}
}
fn empty() -> Self {
Self {
value: None,
}
}
fn try_mutate<T, F: Fn(&mut Thing) -> T>(&mut self, mutation: F) -> Option<T> {
if let Some(ref mut v) = self.value {
Some(mutation(v))
}
else {
None
}
}
}
Now, I thought, I can pass around &mut Handle's all day long and know that someone who has a Handle can only mutate its contents, not the handle itself. (See Playground)
Unfortunately, even this gains nothing because, if you have a mutable reference, you can always reassign it with the dereferencing operator:
fn try_increment(handle: &mut Handle) -> bool {
if let Some(_) = handle.try_mutate(|t| { t.increment_self() }) {
// This breaks future calls:
(*handle) = Handle::empty();
true
}
else {
println!("warning: increment failed");
false
}
}
Which is all fine and well.
Bottom line conclusion: just use &mut Option<&mut T>
I'm writing a library that should read from something implementing the BufRead trait; a network data stream, standard input, etc. The first function is supposed to read a data unit from that reader and return a populated struct filled mostly with &'a str values parsed from a frame from the wire.
Here is a minimal version:
mod mymod {
use std::io::prelude::*;
use std::io;
pub fn parse_frame<'a, T>(mut reader: T)
where
T: BufRead,
{
for line in reader.by_ref().lines() {
let line = line.expect("reading header line");
if line.len() == 0 {
// got empty line; done with header
break;
}
// split line
let splitted = line.splitn(2, ':');
let line_parts: Vec<&'a str> = splitted.collect();
println!("{} has value {}", line_parts[0], line_parts[1]);
}
// more reads down here, therefore the reader.by_ref() above
// (otherwise: use of moved value).
}
}
use std::io;
fn main() {
let stdin = io::stdin();
let locked = stdin.lock();
mymod::parse_frame(locked);
}
An error shows up which I cannot fix after trying different solutions:
error: `line` does not live long enough
--> src/main.rs:16:28
|
16 | let splitted = line.splitn(2, ':');
| ^^^^ does not live long enough
...
20 | }
| - borrowed value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the body at 8:4...
--> src/main.rs:8:5
|
8 | / {
9 | | for line in reader.by_ref().lines() {
10 | | let line = line.expect("reading header line");
11 | | if line.len() == 0 {
... |
22 | | // (otherwise: use of moved value).
23 | | }
| |_____^
The lifetime 'a is defined on a struct and implementation of a data keeper structure because the &str requires an explicit lifetime. These code parts were removed as part of the minimal example.
BufReader has a lines() method which returns Result<String, Err>. I handle errors using expect or match and thus unpack the Result so that the program now has the bare String. This will then be done multiple times to populate a data structure.
Many answers say that the unwrap result needs to be bound to a variable otherwise it gets lost because it is a temporary value. But I already saved the unpacked Result value in the variable line and I still get the error.
How to fix this error - could not get it working after hours trying.
Does it make sense to do all these lifetime declarations just for &str in a data keeper struct? This will be mostly a readonly data structure, at most replacing whole field values. String could also be used, but have found articles saying that String has lower performance than &str - and this frame parser function will be called many times and is performance-critical.
Similar questions exist on Stack Overflow, but none quite answers the situation here.
For completeness and better understanding, following is an excerpt from complete source code as to why lifetime question came up:
Data structure declaration:
// tuple
pub struct Header<'a>(pub &'a str, pub &'a str);
pub struct Frame<'a> {
pub frameType: String,
pub bodyType: &'a str,
pub port: &'a str,
pub headers: Vec<Header<'a>>,
pub body: Vec<u8>,
}
impl<'a> Frame<'a> {
pub fn marshal(&'a self) {
//TODO
println!("marshal!");
}
}
Complete function definition:
pub fn parse_frame<'a, T>(mut reader: T) -> Result<Frame<'a>, io::Error> where T: BufRead {
Your problem can be reduced to this:
fn foo<'a>() {
let thing = String::from("a b");
let parts: Vec<&'a str> = thing.split(" ").collect();
}
You create a String inside your function, then declare that references to that string are guaranteed to live for the lifetime 'a. Unfortunately, the lifetime 'a isn't under your control — the caller of the function gets to pick what the lifetime is. That's how generic parameters work!
What would happen if the caller of the function specified the 'static lifetime? How would it be possible for your code, which allocates a value at runtime, to guarantee that the value lives longer than even the main function? It's not possible, which is why the compiler has reported an error.
Once you've gained a bit more experience, the function signature fn foo<'a>() will jump out at you like a red alert — there's a generic parameter that isn't used. That's most likely going to mean bad news.
return a populated struct filled mostly with &'a str
You cannot possibly do this with the current organization of your code. References have to point to something. You are not providing anywhere for the pointed-at values to live. You cannot return an allocated String as a string slice.
Before you jump to it, no you cannot store a value and a reference to that value in the same struct.
Instead, you need to split the code that creates the String and that which parses a &str and returns more &str references. That's how all the existing zero-copy parsers work. You could look at those for inspiration.
String has lower performance than &str
No, it really doesn't. Creating lots of extraneous Strings is a bad idea, sure, just like allocating too much is a bad idea in any language.
Maybe the following program gives clues for others who also also having their first problems with lifetimes:
fn main() {
// using String und &str Slice
let my_str: String = "fire".to_owned();
let returned_str: MyStruct = my_func_str(&my_str);
println!("Received return value: {ret}", ret = returned_str.version);
// using Vec<u8> und &[u8] Slice
let my_vec: Vec<u8> = "fire".to_owned().into_bytes();
let returned_u8: MyStruct2 = my_func_vec(&my_vec);
println!("Received return value: {ret:?}", ret = returned_u8.version);
}
// using String -> str
fn my_func_str<'a>(some_str: &'a str) -> MyStruct<'a> {
MyStruct {
version: &some_str[0..2],
}
}
struct MyStruct<'a> {
version: &'a str,
}
// using Vec<u8> -> & [u8]
fn my_func_vec<'a>(some_vec: &'a Vec<u8>) -> MyStruct2<'a> {
MyStruct2 {
version: &some_vec[0..2],
}
}
struct MyStruct2<'a> {
version: &'a [u8],
}
I'm trying to execute a function on chunks of a vector and then send the result back using the message passing library.
However, I get a strange error about the lifetime of the vector that isn't even participating in the thread operations:
src/lib.rs:153:27: 154:25 error: borrowed value does not live long enough
src/lib.rs:153 let extended_segments = (segment_size..max_val)
error: src/lib.rs:154 .collect::<Vec<_>>()borrowed value does not live long enough
note: reference must be valid for the static lifetime...:153
let extended_segments = (segment_size..max_val)
src/lib.rs:153:3: 155:27: 154 .collect::<Vec<_>>()
note: but borrowed value is only valid for the statement at 153:2:
reference must be valid for the static lifetime...
src/lib.rs:
let extended_segments = (segment_size..max_val)
consider using a `let` binding to increase its lifetime
I tried moving around the iterator and adding lifetimes to different places, but I couldn't get the checker to pass and still stay on type.
The offending code is below, based on the concurrency chapter in the Rust book. (Complete code is at github.)
use std::sync::mpsc;
use std::thread;
fn sieve_segment(a: &[usize], b: &[usize]) -> Vec<usize> {
vec![]
}
fn eratosthenes_sieve(val: usize) -> Vec<usize> {
vec![]
}
pub fn segmented_sieve_parallel(max_val: usize, mut segment_size: usize) -> Vec<usize> {
if max_val <= ((2 as i64).pow(16) as usize) {
// early return if the highest value is small enough (empirical)
return eratosthenes_sieve(max_val);
}
if segment_size > ((max_val as f64).sqrt() as usize) {
segment_size = (max_val as f64).sqrt() as usize;
println!("Segment size is larger than √{}. Reducing to {} to keep resource use down.",
max_val,
segment_size);
}
let small_primes = eratosthenes_sieve((max_val as f64).sqrt() as usize);
let mut big_primes = small_primes.clone();
let (tx, rx): (mpsc::Sender<Vec<usize>>, mpsc::Receiver<Vec<usize>>) = mpsc::channel();
let extended_segments = (segment_size..max_val)
.collect::<Vec<_>>()
.chunks(segment_size);
for this_segment in extended_segments.clone() {
let small_primes = small_primes.clone();
let tx = tx.clone();
thread::spawn(move || {
let sieved_segment = sieve_segment(&small_primes, this_segment);
tx.send(sieved_segment).unwrap();
});
}
for _ in 1..extended_segments.count() {
big_primes.extend(&rx.recv().unwrap());
}
big_primes
}
fn main() {}
How do I understand and avoid this error? I'm not sure how to make the lifetime of the thread closure static as in this question and still have the function be reusable (i.e., not main()). I'm not sure how to "consume all things that come into [the closure]" as mentioned in this question. And I'm not sure where to insert .map(|s| s.into()) to ensure that all references become moves, nor am I sure I want to.
When trying to reproduce a problem, I'd encourage you to create a MCVE by removing all irrelevant code. In this case, something like this seems to produce the same error:
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
let foo = (segment_size..max_val)
.collect::<Vec<_>>()
.chunks(segment_size);
}
fn main() {}
Let's break that down:
Create an iterator between numbers.
Collect all of them into a Vec<usize>.
Return an iterator that contains references to the vector.
Since the vector isn't bound to any variable, it's dropped at the end of the statement. This would leave the iterator pointing to an invalid region of memory, so that's disallowed.
Check out the definition of slice::chunks:
fn chunks(&self, size: usize) -> Chunks<T>
pub struct Chunks<'a, T> where T: 'a {
// some fields omitted
}
The lifetime marker 'a lets you know that the iterator contains a reference to something. Lifetime elision has removed the 'a from the function, which looks like this, expanded:
fn chunks<'a>(&'a self, size: usize) -> Chunks<'a, T>
Check out this line of the error message:
help: consider using a let binding to increase its lifetime
You can follow that as such:
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
let foo = (segment_size..max_val)
.collect::<Vec<_>>();
let bar = foo.chunks(segment_size);
}
fn main() {}
Although I'd write it as
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
let foo: Vec<_> = (segment_size..max_val).collect();
let bar = foo.chunks(segment_size);
}
fn main() {}
Re-inserting this code back into your original problem won't solve the problem, but it will be much easier to understand. That's because you are attempting to pass a reference to thread::spawn, which may outlive the current thread. Thus, everything passed to thread::spawn must have the 'static lifetime. There are tons of questions that detail why that must be prevented and a litany of solutions, including scoped threads and cloning the vector.
Cloning the vector is the easiest, but potentially inefficient:
for this_segment in extended_segments.clone() {
let this_segment = this_segment.to_vec();
// ...
}