I am trying to store into a HashMap the result of a parsing operation on a text file (parsed with nom). The result is comprised of a Vec buffer and some slices over that buffer. The goal is to store those together in a tuple or struct as a value in the hash map (with String key). But I can't work around the lifetime issues.
Context
The parsing itself takes an &[u8] and returns some data structure containing slices over that same input, e.g.:
struct Cmd<'a> {
pub name: &'a str
}
fn parse<'a>(input: &'a [u8]) -> Vec<Cmd<'a>> {
[...]
}
Now, because the parsing operates on slices without storage, I need to first store the input text in a Vec so that the output slices remain valid, so something like:
struct Entry<'a> {
pub input_data: Vec<u8>,
pub parsed_result: Vec<Cmd<'a>>
}
Then I would ideally store this Entry into a HashMap. This is were troubles arise. I tried two different approaches:
Attempt A: Store then parse
Create the HashMap entry first with the input, parse referencing the HashMap entry directly, and then update it.
pub fn store_and_parse(filename: &str, map: &mut HashMap<String, Entry>) {
let buffer: Vec<u8> = load_from_file(filename);
let mut entry = Entry{ input_data: buffer, parsed_result: vec![] };
let cmds = parse(&entry.input_data[..]);
entry.parsed_result = cmds;
map.insert(filename.to_string(), entry);
}
This doesn't work because the borrow checker complains that &entry.input_data[..] borrows with the same lifetime as entry, and therefore cannot be moved into map as there's an active borrow.
error[E0597]: `entry.input_data` does not live long enough
--> src\main.rs:26:23
|
23 | pub fn store_and_parse(filename: &str, map: &mut HashMap<String, Entry>) {
| --- has type `&mut std::collections::HashMap<std::string::String, Entry<'1>>`
...
26 | let cmds = parse(&entry.input_data[..]);
| ^^^^^^^^^^^^^^^^ borrowed value does not live long enough
27 | entry.parsed_result = cmds;
28 | map.insert(filename.to_string(), entry);
| --------------------------------------- argument requires that `entry.input_data` is borrowed for `'1`
29 | }
| - `entry.input_data` dropped here while still borrowed
error[E0505]: cannot move out of `entry` because it is borrowed
--> src\main.rs:28:38
|
26 | let cmds = parse(&entry.input_data[..]);
| ---------------- borrow of `entry.input_data` occurs here
27 | entry.parsed_result = cmds;
28 | map.insert(filename.to_string(), entry);
| ------ ^^^^^ move out of `entry` occurs here
| |
| borrow later used by call
Attempt B: Parse then store
Parse first, then try to store both the Vec buffer and the data slices into it all together into the HashMap.
pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
let buffer: Vec<u8> = load_from_file(filename);
let cmds = parse(&buffer[..]);
let entry = Entry{ input_data: buffer, parsed_result: cmds };
map.insert(filename.to_string(), entry);
}
This doesn't work because the borrow checker complains that cmds has same lifetime as &buffer[..] but buffer will be dropped by the end of the function. It ignores the fact that cmds and buffer have the same lifetime, and are both (I wish) moved into entry, which is itself moved into map, so there should be no lifetime issue here.
error[E0597]: `buffer` does not live long enough
--> src\main.rs:33:21
|
31 | pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
| --- has type `&mut std::collections::HashMap<std::string::String, Entry<'1>>`
32 | let buffer: Vec<u8> = load_from_file(filename);
33 | let cmds = parse(&buffer[..]);
| ^^^^^^ borrowed value does not live long enough
34 | let entry = Entry{ input_data: buffer, parsed_result: cmds };
35 | map.insert(filename.to_string(), entry);
| --------------------------------------- argument requires that `buffer` is borrowed for `'1`
36 | }
| - `buffer` dropped here while still borrowed
error[E0505]: cannot move out of `buffer` because it is borrowed
--> src\main.rs:34:34
|
31 | pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
| --- has type `&mut std::collections::HashMap<std::string::String, Entry<'1>>`
32 | let buffer: Vec<u8> = load_from_file(filename);
33 | let cmds = parse(&buffer[..]);
| ------ borrow of `buffer` occurs here
34 | let entry = Entry{ input_data: buffer, parsed_result: cmds };
| ^^^^^^ move out of `buffer` occurs here
35 | map.insert(filename.to_string(), entry);
| --------------------------------------- argument requires that `buffer` is borrowed for `'1`
Minimal (non-)working example
use std::collections::HashMap;
#[derive(Debug, PartialEq)]
struct Cmd<'a> {
name: &'a str
}
fn parse<'a>(input: &'a [u8]) -> Vec<Cmd<'a>> {
Vec::new()
}
fn load_from_file(filename: &str) -> Vec<u8> {
Vec::new()
}
#[derive(Debug, PartialEq)]
struct Entry<'a> {
pub input_data: Vec<u8>,
pub parsed_result: Vec<Cmd<'a>>
}
// pub fn store_and_parse(filename: &str, map: &mut HashMap<String, Entry>) {
// let buffer: Vec<u8> = load_from_file(filename);
// let mut entry = Entry{ input_data: buffer, parsed_result: vec![] };
// let cmds = parse(&entry.input_data[..]);
// entry.parsed_result = cmds;
// map.insert(filename.to_string(), entry);
// }
pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
let buffer: Vec<u8> = load_from_file(filename);
let cmds = parse(&buffer[..]);
let entry = Entry{ input_data: buffer, parsed_result: cmds };
map.insert(filename.to_string(), entry);
}
fn main() {
println!("Hello, world!");
}
Edit: Attempt with 2 maps
As Kevin pointed, and this is what threw me off the first time (above attempts), the borrow checker doesn't understand that moving a Vec doesn't invalidate the slices because the heap buffer of the Vec is not touched. Fair enough.
Side note: I am ignoring the parts of Kevin's answer related to using indexes (the Rust documentation explicitly states slices are a better replacement for indices, so I feel this is working against the language) and the use of external crates (which also are explicitly working against the language). I am trying to learn and understand how to do this "the Rust way", not at all costs.
So my immediate reaction to that was to change the data structure: first insert the storage Vec into a first HashMap, and once it's there call the parse() function to create the slices directly pointing into the HashMap value. Then store those into a second HashMap, which would naturally dissociate the two. However that also doesn't work as soon as I put all of that in a loop, which is the broader goal of this code:
fn two_maps<'a>(
filename: &str,
input_map: &'a mut HashMap<String, Vec<u8>>,
cmds_map: &mut HashMap<String, Vec<Cmd<'a>>>,
queue: &mut Vec<String>) {
{
let buffer: Vec<u8> = load_from_file(filename);
input_map.insert(filename.to_string(), buffer);
}
{
let buffer = input_map.get(filename).unwrap();
let cmds = parse(&buffer[..]);
for cmd in &cmds {
// [...] Find further dependencies to load and parse
queue.push("...".to_string());
}
cmds_map.insert(filename.to_string(), cmds);
}
}
fn main() {
let mut input_map = HashMap::new();
let mut cmds_map = HashMap::new();
let mut queue = Vec::new();
queue.push("file1.txt".to_string());
while let Some(path) = queue.pop() {
println!("Loading file: {}", path);
two_maps(&path[..], &mut input_map, &mut cmds_map, &mut queue);
}
}
The problem here is that once the input buffer is in the first map input_map, referencing it binds the lifetime of each new parsed result to the entry of that HashMap, and therefore the &'a mut reference (the 'a lifetime added). Without this, the compiler complains that data flows from input_map into cmds_map with unrelated lifetimes, which is fair enough. But with this, the &'a mut reference to input_map becomes locked on the first loop iteration and never released, and the borrow checker chokes on the second iteration, quite rightfully so.
So I am stuck again. Is what I am trying to do completely unreasonable and impossible in Rust? How can I approach the problem (algorithms, data structures) to make things work lifetime-wise? I really don't see what's the "Rust way" here to store a collection of buffers and slices over those buffers. Is the only solution (that I want to avoid) to first load all files, and then parse them? This is very impractical in my case because most files contain references to other files, and I want to load the minimum chain of dependencies (likely < 10 files), not the entire collection (which is something like 3000+ files), and I can only access dependencies by parsing each file.
It seems the core of the issue is that storing the input buffers into any kind of data structure requires a mutable reference to said data structure for the duration of the insert operation, which is incompatible with having long-lived immutable references to each single buffer (for the slices) because those references need to have the same lifetime as per the HashMap definition. Is there any other data structure (maybe immutable ones) that lifts this? Or am I completely on the wrong track?
Now, because the parsing operates on slices without storage, I need to first store the input text in a Vec so that the output slices remain valid, so something like:
struct Entry<'a> {
pub input_data: Vec<u8>,
pub parsed_result: Vec<Cmd<'a>>
}
What you are attempting here is a “self-referential structure”, where parsed_result refers to input_data. There is an incidental and a fundamental reason why this cannot work as written.
The incidental reason is that this struct declaration contains the lifetime parameter 'a, but actually the lifetime you're attempting to give parsed_result is the lifetime of the struct itself, and there is no Rust syntax to specify that lifetime.
The fundamental reason is that Rust allows structs (and other values) to be moved to other locations in memory, and references are just statically checked pointers. So, when you write
map.insert(filename.to_string(), entry);
you're causing the value of entry to be moved from the stack frame to the HashMap's storage. That move invalidates any references into entry, whether or not entry contains those references itself. That's what the error "cannot move out of entry because it is borrowed" means; the borrow checker is not allowing the move to happen.
In your Attempt B,
let buffer: Vec<u8> = load_from_file(filename);
let cmds = parse(&buffer[..]);
let entry = Entry{ input_data: buffer, parsed_result: cmds };
the problem is that you're moving buffer (into the Entry) while cmds borrows it. Again, that means the references (just fancy pointers!) into buffer would become invalid, so it's not allowed.
(Now, since Vec stores its actual data in a heap-allocated vector that will stay put while the Vec is moved, this might actually be safe, but the Rust borrow checker doesn't care about that.)
Solutions
The simplest solution (from a language perspective) is to have each Cmd store indices into input_data instead of references. Indices don't become invalid when the object is moved since they're relative. The disadvantage of this is of course that you have to slice the input data every time — code has to carry around the Entry as well as the Cmd.
However, there are tools available to make self-referential structures, without even needing to write any unsafe code. The crates ouroboros and rental both allow you to define self-referential structs, at the price of having to use special functions to access the struct fields.
For example, your code might look something like this using ouroboros (I haven't tested this):
use ouroboros::self_referencing;
#[self_referencing]
struct Entry {
input_data: Vec<u8>,
#[borrows(input_data)]
parsed_result: Vec<Cmd<'this>> // 'this is a special lifetime name provided by ouroboros
}
fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
let entry = EntryBuilder { // EntryBuilder is defined by ouroboros to help construct Entry
input_data: load_from_file(filename),
// Note that instead of giving a value for parsed_result, we give
// a function to compute it.
parsed_result_builder: |input_data: &[u8]| parse(input_data),
}.build();
map.insert(filename.to_string(), entry);
}
fn do_something_with_entry(entry: &Entry) {
entry.with_parsed_result(|cmds| {
// cmds is a reference to `self.parsed_result` which only lives as
// long as this lambda and therefore can't be invalidated by a move.
});
}
ouroboros (and rental) provide a fairly odd interface for accessing fields. If, like me, you don't want to expose that interface to your users (or the rest of your code), you can write a wrapper struct around the self-referential struct whose impl contains methods designed for how you want the structure to be used, so all of the odd field access methods can remain private.
Related
I have a struct which maps ids to indices and vice versa.
struct IdMapping<'a> {
external_2_internal: HashMap<&'a str, usize>,
internal_2_external: HashMap<usize, String>,
}
impl<'a> IdMapping<'a> {
fn new() -> IdMapping<'a> {
IdMapping {
external_2_internal: HashMap::new(),
internal_2_external: HashMap::new(),
}
}
fn insert(&'a mut self, internal: usize, external: String) {
self.internal_2_external.insert(internal, external);
let mapped_external = self.internal_2_external.get(&internal).unwrap();
self.external_2_internal.insert(mapped_external, internal);
}
}
If I am using this structure the following way
fn map_ids<'a>(ids: Vec<String>) -> IdMapping<'a> {
let mut mapping = IdMapping::new();
for (i, id) in ids.iter().enumerate() {
mapping.insert(i, id.clone());
}
mapping
}
I receive the following compiler error:
error[E0499]: cannot borrow `mapping` as mutable more than once at a time
--> src/lib.rs:28:9
|
24 | fn map_ids<'a>(ids: Vec<String>) -> IdMapping<'a> {
| -- lifetime `'a` defined here
...
28 | mapping.insert(i, id.clone());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `mapping` was mutably borrowed here in the previous iteration of the loop
...
31 | mapping
| ------- returning this value requires that `mapping` is borrowed for `'a`
Playground link
Why can't I mutably borrow the mapping for its insert method each iteration of the loop? How should I implement this use case?
The problem is due to the self reference in your struct.
Let's first look at whether this is theoretically sound (assuming we're writing unsafe code):
HashMap uses a flat array (quadratic probing), so objects in the HashMap aren't address stable under insertion of a new element. This means that an insertion into internal_2_external may move the existing Strings around in memory.
String stores its content in a separate heap allocation, the heap allocation remains at the same location. So even if the String is moved around, a &str referencing it will remain pointing to valid memory.
So this would actually work if implemented in unsafe code.
While it's logically sound to do those operations, the type system is unable to recognise that you can move a String while keeping borrowed ranges of it valid. This means that if you borrow a &str from a String, the type system will prevent you from doing any mutation operation on your String, including moving it. As such, you can't do any mutation operation on the hash map either, giving rise to your error.
I can see two safe ways to work around this:
Make external_2_internal keep its own copy of the strings, i.e. useHashMap<String, usize>.
Keep the strings in a separate Vec:
struct IdMapping {
strings: Vec<String>,
external_2_internal: HashMap<usize /* index */, usize>,
internal_2_external: HashMap<usize, usize /* index */>,
}
Alternatively you could work with unsafe code, if you don't want to change the layout of your struct.
Running into an ownership issue when attempting to reference multiple values from a HashMap in a struct as parameters in a function call. Here is a PoC of the issue.
use std::collections::HashMap;
struct Resource {
map: HashMap<String, String>,
}
impl Resource {
pub fn new() -> Self {
Resource {
map: HashMap::new(),
}
}
pub fn load(&mut self, key: String) -> &mut String {
self.map.get_mut(&key).unwrap()
}
}
fn main() {
// Initialize struct containing a HashMap.
let mut res = Resource {
map: HashMap::new(),
};
res.map.insert("Item1".to_string(), "Value1".to_string());
res.map.insert("Item2".to_string(), "Value2".to_string());
// This compiles and runs.
let mut value1 = res.load("Item1".to_string());
single_parameter(value1);
let mut value2 = res.load("Item2".to_string());
single_parameter(value2);
// This has ownership issues.
// multi_parameter(value1, value2);
}
fn single_parameter(value: &String) {
println!("{}", *value);
}
fn multi_parameter(value1: &mut String, value2: &mut String) {
println!("{}", *value1);
println!("{}", *value2);
}
Uncommenting multi_parameter results in the following error:
28 | let mut value1 = res.load("Item1".to_string());
| --- first mutable borrow occurs here
29 | single_parameter(value1);
30 | let mut value2 = res.load("Item2".to_string());
| ^^^ second mutable borrow occurs here
...
34 | multi_parameter(value1, value2);
| ------ first borrow later used here
It would technically be possible for me to break up the function calls (using the single_parameter function approach), but it would be more convenient to pass the
variables to a single function call.
For additional context, the actual program where I'm encountering this issue is an SDL2 game where I'm attempting to pass multiple textures into a single function call to be drawn, where the texture data may be modified within the function.
This is currently not possible, without resorting to unsafe code or interior mutability at least. There is no way for the compiler to know if two calls to load will yield mutable references to different data as it cannot always infer the value of the key. In theory, mutably borrowing both res.map["Item1"] and res.map["Item2"] would be fine as they would refer to different values in the map, but there is no way for the compiler to know this at compile time.
The easiest way to do this, as already mentioned, is to use a structure that allows interior mutability, like RefCell, which typically enforces the memory safety rules at run-time before returning a borrow of the wrapped value. You can also work around the borrow checker in this case by dealing with mut pointers in unsafe code:
pub fn load_many<'a, const N: usize>(&'a mut self, keys: [&str; N]) -> [&'a mut String; N] {
// TODO: Assert that keys are distinct, so that we don't return
// multiple references to the same value
keys.map(|key| self.load(key) as *mut _)
.map(|ptr| unsafe { &mut *ptr })
}
Rust Playground
The TODO is important, as this assertion is the only way to ensure that the safety invariant of only having one mutable reference to any value at any time is upheld.
It is, however, almost always better (and easier) to use a known safe interior mutation abstraction like RefCell rather than writing your own unsafe code.
I have the following code:
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &mut HashMap<i32, HashSet<i32>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get_mut(&start).unwrap();
let pipes = conns.get(&num).unwrap();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
The logic is not very important, I'm trying to create a function which will itself and walk over pipes.
My issue is that this doesn't compile:
error[E0502]: cannot borrow `*conns` as immutable because it is also borrowed as mutable
--> src/main.rs:10:17
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- mutable borrow occurs here
10 | let pipes = conns.get(&num).unwrap();
| ^^^^^ immutable borrow occurs here
...
19 | }
| - mutable borrow ends here
error[E0499]: cannot borrow `*conns` as mutable more than once at a time
--> src/main.rs:16:46
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- first mutable borrow occurs here
...
16 | populate_connections(start, num, conns, ancs);
| ^^^^^ second mutable borrow occurs here
...
19 | }
| - first borrow ends here
I don't know how to make it work. At the beginning, I'm trying to get two HashSets stored in a HashMap (orig_conns and pipes).
Rust won't let me have both mutable and immutable variables at the same time. I'm confused a bit because this will be completely different objects but I guess if &start == &num, then I would have two different references to the same object (one mutable, one immutable).
Thats ok, but then how can I achieve this? I want to iterate over one HashSet and read and modify other one. Let's assume that they won't be the same HashSet.
If you can change your datatypes and your function signature, you can use a RefCell to create interior mutability:
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &HashMap<i32, RefCell<HashSet<i32>>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get(&start).unwrap().borrow_mut();
let pipes = conns.get(&num).unwrap().borrow();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
Note that if start == num, the thread will panic because this is an attempt to have both mutable and immutable access to the same HashSet.
Safe alternatives to RefCell
Depending on your exact data and code needs, you can also use types like Cell or one of the atomics. These have lower memory overhead than a RefCell and only a small effect on codegen.
In multithreaded cases, you may wish to use a Mutex or RwLock.
Use hashbrown::HashMap
If you can switch to using hashbrown, you may be able to use a method like get_many_mut:
use hashbrown::HashMap; // 0.12.1
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
if let Some([a, b]) = map.get_many_mut([&1, &2]) {
std::mem::swap(a, b);
}
dbg!(&map);
}
As hashbrown is what powers the standard library hashmap, this is also available in nightly Rust as HashMap::get_many_mut.
Unsafe code
If you can guarantee that your two indices are different, you can use unsafe code and avoid interior mutability:
use std::collections::HashMap;
fn get_mut_pair<'a, K, V>(conns: &'a mut HashMap<K, V>, a: &K, b: &K) -> (&'a mut V, &'a mut V)
where
K: Eq + std::hash::Hash,
{
unsafe {
let a = conns.get_mut(a).unwrap() as *mut _;
let b = conns.get_mut(b).unwrap() as *mut _;
assert_ne!(a, b, "The two keys must not resolve to the same value");
(&mut *a, &mut *b)
}
}
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
let (a, b) = get_mut_pair(&mut map, &1, &2);
std::mem::swap(a, b);
dbg!(&map);
}
Similar code can be found in libraries like multi_mut.
This code tries to have an abundance of caution. An assertion enforces that the two values are distinct pointers before converting them back into mutable references and we explicitly add lifetimes to the returned variables.
You should understand the nuances of unsafe code before blindly using this solution. Notably, previous versions of this answer were incorrect. Thanks to #oberien for finding the unsoundness in the original implementation of this and proposing a fix. This playground demonstrates how purely safe Rust code could cause the old code to result in memory unsafety.
An enhanced version of this solution could accept an array of keys and return an array of values:
fn get_mut_pair<'a, K, V, const N: usize>(conns: &'a mut HashMap<K, V>, mut ks: [&K; N]) -> [&'a mut V; N]
It becomes more difficult to ensure that all the incoming keys are unique, however.
Note that this function doesn't attempt to solve the original problem, which is vastly more complex than verifying that two indices are disjoint. The original problem requires:
tracking three disjoint borrows, two of which are mutable and one that is immutable.
tracking the recursive call
must not modify the HashMap in any way which would cause resizing, which would invalidate any of the existing references from a previous level.
must not alias any of the references from a previous level.
Using something like RefCell is a much simpler way to ensure you do not trigger memory unsafety.
I'm writing a library that should read from something implementing the BufRead trait; a network data stream, standard input, etc. The first function is supposed to read a data unit from that reader and return a populated struct filled mostly with &'a str values parsed from a frame from the wire.
Here is a minimal version:
mod mymod {
use std::io::prelude::*;
use std::io;
pub fn parse_frame<'a, T>(mut reader: T)
where
T: BufRead,
{
for line in reader.by_ref().lines() {
let line = line.expect("reading header line");
if line.len() == 0 {
// got empty line; done with header
break;
}
// split line
let splitted = line.splitn(2, ':');
let line_parts: Vec<&'a str> = splitted.collect();
println!("{} has value {}", line_parts[0], line_parts[1]);
}
// more reads down here, therefore the reader.by_ref() above
// (otherwise: use of moved value).
}
}
use std::io;
fn main() {
let stdin = io::stdin();
let locked = stdin.lock();
mymod::parse_frame(locked);
}
An error shows up which I cannot fix after trying different solutions:
error: `line` does not live long enough
--> src/main.rs:16:28
|
16 | let splitted = line.splitn(2, ':');
| ^^^^ does not live long enough
...
20 | }
| - borrowed value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the body at 8:4...
--> src/main.rs:8:5
|
8 | / {
9 | | for line in reader.by_ref().lines() {
10 | | let line = line.expect("reading header line");
11 | | if line.len() == 0 {
... |
22 | | // (otherwise: use of moved value).
23 | | }
| |_____^
The lifetime 'a is defined on a struct and implementation of a data keeper structure because the &str requires an explicit lifetime. These code parts were removed as part of the minimal example.
BufReader has a lines() method which returns Result<String, Err>. I handle errors using expect or match and thus unpack the Result so that the program now has the bare String. This will then be done multiple times to populate a data structure.
Many answers say that the unwrap result needs to be bound to a variable otherwise it gets lost because it is a temporary value. But I already saved the unpacked Result value in the variable line and I still get the error.
How to fix this error - could not get it working after hours trying.
Does it make sense to do all these lifetime declarations just for &str in a data keeper struct? This will be mostly a readonly data structure, at most replacing whole field values. String could also be used, but have found articles saying that String has lower performance than &str - and this frame parser function will be called many times and is performance-critical.
Similar questions exist on Stack Overflow, but none quite answers the situation here.
For completeness and better understanding, following is an excerpt from complete source code as to why lifetime question came up:
Data structure declaration:
// tuple
pub struct Header<'a>(pub &'a str, pub &'a str);
pub struct Frame<'a> {
pub frameType: String,
pub bodyType: &'a str,
pub port: &'a str,
pub headers: Vec<Header<'a>>,
pub body: Vec<u8>,
}
impl<'a> Frame<'a> {
pub fn marshal(&'a self) {
//TODO
println!("marshal!");
}
}
Complete function definition:
pub fn parse_frame<'a, T>(mut reader: T) -> Result<Frame<'a>, io::Error> where T: BufRead {
Your problem can be reduced to this:
fn foo<'a>() {
let thing = String::from("a b");
let parts: Vec<&'a str> = thing.split(" ").collect();
}
You create a String inside your function, then declare that references to that string are guaranteed to live for the lifetime 'a. Unfortunately, the lifetime 'a isn't under your control — the caller of the function gets to pick what the lifetime is. That's how generic parameters work!
What would happen if the caller of the function specified the 'static lifetime? How would it be possible for your code, which allocates a value at runtime, to guarantee that the value lives longer than even the main function? It's not possible, which is why the compiler has reported an error.
Once you've gained a bit more experience, the function signature fn foo<'a>() will jump out at you like a red alert — there's a generic parameter that isn't used. That's most likely going to mean bad news.
return a populated struct filled mostly with &'a str
You cannot possibly do this with the current organization of your code. References have to point to something. You are not providing anywhere for the pointed-at values to live. You cannot return an allocated String as a string slice.
Before you jump to it, no you cannot store a value and a reference to that value in the same struct.
Instead, you need to split the code that creates the String and that which parses a &str and returns more &str references. That's how all the existing zero-copy parsers work. You could look at those for inspiration.
String has lower performance than &str
No, it really doesn't. Creating lots of extraneous Strings is a bad idea, sure, just like allocating too much is a bad idea in any language.
Maybe the following program gives clues for others who also also having their first problems with lifetimes:
fn main() {
// using String und &str Slice
let my_str: String = "fire".to_owned();
let returned_str: MyStruct = my_func_str(&my_str);
println!("Received return value: {ret}", ret = returned_str.version);
// using Vec<u8> und &[u8] Slice
let my_vec: Vec<u8> = "fire".to_owned().into_bytes();
let returned_u8: MyStruct2 = my_func_vec(&my_vec);
println!("Received return value: {ret:?}", ret = returned_u8.version);
}
// using String -> str
fn my_func_str<'a>(some_str: &'a str) -> MyStruct<'a> {
MyStruct {
version: &some_str[0..2],
}
}
struct MyStruct<'a> {
version: &'a str,
}
// using Vec<u8> -> & [u8]
fn my_func_vec<'a>(some_vec: &'a Vec<u8>) -> MyStruct2<'a> {
MyStruct2 {
version: &some_vec[0..2],
}
}
struct MyStruct2<'a> {
version: &'a [u8],
}
I'm learning Rust and I'm trying to cargo-cult this code into compiling:
use std::vec::Vec;
use std::collections::BTreeMap;
struct Occ {
docnum: u64,
weight: f32,
}
struct PostWriter<'a> {
bytes: Vec<u8>,
occurrences: BTreeMap<&'a [u8], Vec<Occ>>,
}
impl<'a> PostWriter<'a> {
fn new() -> PostWriter<'a> {
PostWriter {
bytes: Vec::new(),
occurrences: BTreeMap::new(),
}
}
fn add_occurrence(&'a mut self, term: &[u8], occ: Occ) {
let occurrences = &mut self.occurrences;
match occurrences.get_mut(term) {
Some(x) => x.push(occ),
None => {
// Add the term bytes to the big vector of all terms
let termstart = self.bytes.len();
self.bytes.extend(term);
// Create a new occurrences vector
let occs = vec![occ];
// Take the appended term as a slice to use as a key
// ERROR: cannot borrow `*occurrences` as mutable more than once at a time
occurrences.insert(&self.bytes[termstart..], occs);
}
}
}
}
fn main() {}
I get an error:
error[E0499]: cannot borrow `*occurrences` as mutable more than once at a time
--> src/main.rs:34:17
|
24 | match occurrences.get_mut(term) {
| ----------- first mutable borrow occurs here
...
34 | occurrences.insert(&self.bytes[termstart..], occs);
| ^^^^^^^^^^^ second mutable borrow occurs here
35 | }
36 | }
| - first borrow ends here
I don't understand... I'm just calling a method on a mutable reference, why would that line involve borrowing?
I'm just calling a method on a mutable reference, why would that line involve borrowing?
When you call a method on an object that's going to mutate the object, you can't have any other references to that object outstanding. If you did, your mutation could invalidate those references and leave your program in an inconsistent state. For example, say that you had gotten a value out of your hashmap and then added a new value. Adding the new value hits a magic limit and forces memory to be reallocated, your value now points off to nowhere! When you use that value... bang goes the program!
In this case, it looks like you want to do the relatively common "append or insert if missing" operation. You will want to use entry for that:
use std::collections::BTreeMap;
fn main() {
let mut map = BTreeMap::new();
{
let nicknames = map.entry("joe").or_insert(Vec::new());
nicknames.push("shmoe");
// Using scoping to indicate that we are done with borrowing `nicknames`
// If we didn't, then we couldn't borrow map as
// immutable because we could still change it via `nicknames`
}
println!("{:?}", map)
}
Because you're calling a method that borrows as mutable
I had a similar question yesterday about Hash, until I noticed something in the docs. The docs for BTreeMap show a method signature for insert starting with fn insert(&mut self..
So when you call .insert, you're implicitly asking that function to borrow the BTreeMap as mutable.