value dropped when trying to assign a slice - rust

I have a very simple case: a function takes an Option<Vec<u8>> and needs to inspect it. If it is None, the result should be an empty byte string; if it is Some, the function should call another function that performs some transformation of its input.
Sketched out, it looks like this:
pub fn transform(ad: &[u8]) -> Vec<u8> {
ad.to_vec()
}
pub fn func(plaintext: Option<Vec<u8>>) {
let out = "".as_bytes();
if plaintext != None {
let out = transform(&plaintext.unwrap());
}
}
Doing the unwrapping and the if like this is really ugly though, and I would much rather do this in a safer way, maybe with pattern matching:
pub fn transform(ad: &[u8]) -> Vec<u8> {
ad.to_vec()
}
pub fn func(plaintext: Option<Vec<u8>>) {
let out = match plaintext {
Some(x) => &transform(&x),
None => "".as_bytes()
};
}
But this gives the error:
|
16 | let out = match plaintext {
| --- borrow later stored here
17 | Some(x) => &return_smth(&x),
| ^^^^^^^^^^^^^^-
| | |
| | temporary value is freed at the end of this statement
| creates a temporary which is freed while still in use
|
= note: consider using a `let` binding to create a longer lived value
I am unsure about which value that is being talked about here. How do I call my function, and get a slice returned?

I am unsure about which value that is being talked about here.
The one that's returned by transform (or return_smth, or whatever else you call it): that returns a Vec (which is an owned value), you immediately borrow it to a slice, but you never actually store the vec, so at the end of the expression it gets dropped and you have a dangling reference.
How do I call my function, and get a slice returned?
There are two main ways:
You don't, would be my first recommendation here. Rust has a container called Cow which stores "an owned value or a reference", which you can use like a normal reference, or convert to an owned value:
// requires `use std::borrow::Cow;`; the annotation tells `into()` what to produce
let out: Cow<[u8]> = match plaintext {
    Some(x) => transform(&x).into(),
    None => b"".into(),
};
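As a self-contained sketch of the Cow approach, the version below spells out the two variants explicitly instead of relying on into(). The usize return value is an assumption added only so the result can be observed; the original func returns nothing.

```rust
use std::borrow::Cow;

pub fn transform(ad: &[u8]) -> Vec<u8> {
    ad.to_vec()
}

// Returns the length of the resulting bytes (an assumption for illustration).
pub fn func(plaintext: Option<Vec<u8>>) -> usize {
    let out: Cow<[u8]> = match plaintext {
        // The freshly built Vec is moved into the Cow, so nothing dangles.
        Some(x) => Cow::Owned(transform(&x)),
        // The empty literal is 'static, so borrowing it is always fine.
        None => Cow::Borrowed(&b""[..]),
    };
    // A Cow<[u8]> derefs to &[u8], so slice methods work directly.
    out.len()
}
```

Calling out.into_owned() would turn either variant into a Vec<u8> if an owned value is needed later.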
The second possibility is to store the owned at the highest level you need it, then create a local borrow to that e.g.
let mut _v;
let out = match plaintext {
Some(x) => {
_v = transform(&x);
&*_v
},
None => b"",
};
This is a relatively common pattern for non-trivial code paths where you need to borrow part of the owned value rather than the entirety of it.
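A minimal sketch of that pattern, borrowing only part of the owned value. The function name first_word and the use of to_uppercase are hypothetical stand-ins for whatever produces the owned value in real code.

```rust
// Returns the first whitespace-separated word of the upper-cased input,
// or an empty string when there is no input.
fn first_word(input: Option<&str>) -> String {
    // Declared in the outer scope, initialized only in the `Some` arm.
    // It lives until the end of the function, so borrows of it stay valid.
    let owned: String;
    let word: &str = match input {
        Some(text) => {
            owned = text.to_uppercase();
            // Borrow just a part of the owned value.
            owned.split_whitespace().next().unwrap_or("")
        }
        None => "",
    };
    word.to_string()
}
```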
Incidentally,
as you can see above, Rust has byte string literals (b""), so there is no need to create a string and then convert it to bytes
as jmb commented, Rust has high-level methods for common generic tasks. Both "apply transformation to option's value" and "get value or return default" are such
Vec::new() is guaranteed not to allocate, so creating an empty Vec (or String) is not a big deal.
Hence if you don't have a very strong reason to favor a slice
let out = plaintext
.as_deref()
.map_or_else(Vec::new, transform);
would be perfectly fine here, or even
let out = plaintext.map_or_else(Vec::new, transform);
if you change transform to take and return a Vec.

Related

How to produce static references from append-only arena?

In my application (a compiler), I'd like to create cyclic data structures of various kinds throughout my program's execution that all have the same lifetime (in my case, lasting until the end of compilation). In addition,
I don't need to worry about multi-threading
I only need to append information - no need to delete or garbage collect
I only need immutable references to my data
This seemed like a good use case for an Arena, but I saw that this would require passing the arena around to every function in my program, which seemed like a large overhead.
So instead I found a macro called thread_local! that I can use to define global data. Using this, I thought I might be able to define a custom type that wraps an index into the array, and implement Deref on that type:
use std::cell::RefCell;
enum Floop {
CaseA,
CaseB,
CaseC(FloopRef),
CaseD(FloopRef),
CaseE(Vec<FloopRef>),
}
thread_local! {
static FLOOP_ARRAY: RefCell<Vec<Box<Floop>>> = RefCell::new(Vec::new());
}
pub struct FloopRef(usize);
impl std::ops::Deref for FloopRef {
type Target = Floop;
fn deref(&self) -> &Self::Target {
return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
}
}
pub fn main() {
// initialize some data
FLOOP_ARRAY.with(|floops| {
floops.borrow_mut().push(Box::new(Floop::CaseA));
let idx = floops.borrow_mut().len();
floops.borrow_mut().push(Box::new(Floop::CaseC(FloopRef(idx))));
});
}
Unfortunately I run into lifetime errors:
error: lifetime may not live long enough
--> src/main.rs:20:36
|
20 | return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
| ------- ^^^^^^^^^^^^^^^^^^^^^^^^ returning this value requires that `'1` must outlive `'2`
| | |
| | return type of closure is &'2 Box<Floop>
| has type `&'1 RefCell<Vec<Box<Floop>>>`
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:20:36
|
20 | return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
| ^---------------^^^^^^^^
| ||
| |temporary value created here
| returns a value referencing data owned by the current function
What I'd like to tell the compiler is that I promise I'm never going to remove entries from the Array and that I'm not going to share values across threads and that the array will last until the end of the program so that I can in essence just return a &'static reference to a Floop object. But Rust doesn't seem to be convinced this is safe.
Is there any kind of Rust helper library that would let me do something like this? Or are there safety holes even when I guarantee I only append / only use data with a single thread?
If you had such a reference, you could send it to another thread and then access the data after it has been dropped because the creating thread has finished.
Even if you solved this problem, it would still require unsafe code, as the compiler can't be convinced that growing the Vec won't invalidate existing references. It does not in this case, since you're using Box, but the compiler cannot know that.
If you pinky promise never to touch the data after the creating thread has finished, you can use the following code. Note that this code is technically UB: when the Vec grows, all the Boxes are moved, and, at least currently, moving a Box invalidates all references derived from it:
use std::cell::RefCell;
enum Floop {
CaseA,
CaseB,
CaseC(&'static Floop),
CaseD(&'static Floop),
CaseE(Vec<&'static Floop>),
}
thread_local! {
static FLOOP_ARRAY: RefCell<Vec<Box<Floop>>> = RefCell::new(Vec::new());
}
fn alloc_floop(floop: Floop) -> &'static mut Floop {
FLOOP_ARRAY.with(|floops| {
let mut floops = floops.borrow_mut();
floops.push(Box::new(floop));
let floop = &mut **floops.last_mut().unwrap() as *mut Floop;
// SAFETY: We never access the data after it has been dropped, and we are
// the only ones who access this `Box`, as we access a `Box` only immediately
// after pushing it.
unsafe { &mut *floop }
})
}
fn main() {
let floop_a = alloc_floop(Floop::CaseA);
let floop_b = alloc_floop(Floop::CaseC(floop_a));
}
A better solution would be something like a thread-safe arena that you can use in a static, but sadly, I found no crate that implements that.
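For comparison, here is a sketch of a different technique that the answer above does not use: leaking each allocation with Box::leak. It hands out &'static references without unsafe code, on the assumption that it is acceptable for the memory never to be freed before the process exits. The enum is trimmed to three variants for brevity.

```rust
#[derive(Debug)]
enum Floop {
    CaseA,
    CaseC(&'static Floop),
    CaseE(Vec<&'static Floop>),
}

// Moves the value to the heap and deliberately never frees it, which is
// what makes the returned reference valid for the rest of the program.
fn alloc_floop(floop: Floop) -> &'static Floop {
    Box::leak(Box::new(floop))
}
```

Because the references are shared and immutable, a node can only point at nodes that already exist; building true cycles this way would additionally need some form of interior mutability.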

Rust error: cannot return value referencing temporary value

I'm trying to write code that returns the mode of a list of given numbers.
Here's the code :
use std::collections::HashMap;
fn mode (vector: &Vec<i32>) -> Vec<&&i32> {
let mut occurrences = HashMap::new();
let mut n= Vec::new();
let mut mode = Vec::new();
for i in vector {
let j= occurrences.entry(i).or_insert(0);
*j+=1;
}
for (num, occ) in occurrences.clone().iter() {
if occ> n[0] {
n.clear();
mode.clear();
n.push(occ);
mode.push(num);
} else if occ== n[0] {
mode.push(num);
}
}
mode
}
fn main () {
let mut numbers: Vec<i32>= vec![1,5,2,2,5,3]; // 2 and 5 are the mode
numbers.sort();
println!("the mode is {:?}:", mode(&numbers));
}
I used a vector for the mode since a dataset could be multimodal.
Anyway, I'm getting the following error:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:26:5
|
13 | for (num, occ) in occurrences.clone().iter() {
| ------------------- temporary value created here
...
26 | mode
| ^^^^ returns a value referencing data owned by the current function
When you return from the current function, any owned values are destroyed (other than the ones being returned from the function), and any data referencing that destroyed data therefore cannot be returned, e.g.:
fn example() -> &str {
let s = String::from("hello"); // owned data
&s // error: returns a value referencing data owned by the current function
// you can imagine this is added by the compiler
drop(s);
}
The issue you have comes from iter(). iter() returns an iterator of shared references:
let values: Vec<i32> = vec![1, 2, 3];
for i in values.iter() {
// i is a &i32
}
for i in values {
// i is an i32
}
So when you call occurrences.clone().iter() you're creating a temporary value (via clone()) which is owned by the current function, then iterating over that data via shared reference. When you destructure the tuple in (num, occ), these are also shared references.
Because you later call mode.push(num), Rust realizes that mode has the type Vec<&i32>. However, there is an implicit lifetime here. The lifetime of num is essentially the lifetime of the current function (let's call that 'a), so the full type of mode is Vec<&'a i32>.
Because of that, you can't return it from the current function.
To fix
Removing iter() should work, since then you will be iterating over owned values. You might also find that you can remove .clone() too, I haven't looked too closely but it seems like it's redundant.
A couple of other points while you're here:
It's rare to interact with &Vec<Foo>, instead it's much more usual to use slices: &[Foo]. They're more general, and in almost all cases more performant (you can still pass your data in like: &numbers)
Check out clippy, it has a bunch of linter rules that can catch a bunch of errors much earlier, and usually does a good job explaining them: https://github.com/rust-lang/rust-clippy
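Putting the suggestions above together, one possible shape for the function is sketched below. It takes a slice, counts by value, and returns owned i32 values so that nothing borrowed from a local escapes. Sorting the result and returning an empty Vec for empty input are assumptions made here, not something the question specifies.

```rust
use std::collections::HashMap;

fn mode(values: &[i32]) -> Vec<i32> {
    // Count how often each value occurs; keys are owned i32s, not references.
    let mut occurrences: HashMap<i32, usize> = HashMap::new();
    for &v in values {
        *occurrences.entry(v).or_insert(0) += 1;
    }

    // The highest count; an empty input has no mode.
    let max = match occurrences.values().copied().max() {
        Some(m) => m,
        None => return Vec::new(),
    };

    // Consume the map and keep every value that reaches the highest count.
    let mut modes: Vec<i32> = occurrences
        .into_iter()
        .filter(|&(_, count)| count == max)
        .map(|(value, _)| value)
        .collect();
    // HashMap iteration order is unspecified, so sort for a stable result.
    modes.sort_unstable();
    modes
}
```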

Do I need to use a `let` binding to create a longer lived value?

I've very recently started studying Rust, and while working on a test program, I wrote this method:
pub fn add_transition(&mut self, start_state: u32, end_state: u32) -> Result<bool, std::io::Error> {
let mut m: Vec<Page>;
let pages: &mut Vec<Page> = match self.page_cache.get_mut(&start_state) {
Some(p) => p,
None => {
m = self.index.get_pages(start_state, &self.file)?;
&mut m
}
};
// omitted code that mutates pages
// ...
Ok(true)
}
it does work as expected, but I'm not convinced about the m variable. If I remove it, the code looks more elegant:
pub fn add_transition(&mut self, start_state: u32, end_state: u32) -> Result<bool, std::io::Error> {
let pages: &mut Vec<Page> = match self.page_cache.get_mut(&start_state) {
Some(p) => p,
None => &mut self.index.get_pages(start_state, &self.file)?
};
// omitted code that mutates pages
// ...
Ok(true)
}
but I get:
error[E0716]: temporary value dropped while borrowed
--> src\module1\mod.rs:28:29
|
26 | let pages: &mut Vec<Page> = match self.page_cache.get_mut(&start_state) {
| _____________________________________-
27 | | Some(p) => p,
28 | | None => &mut self.index.get_pages(start_state, &self.file)?
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-
| | | |
| | | temporary value is freed at the end of this statement
| | creates a temporary which is freed while still in use
29 | | };
| |_________- borrow later used here
|
= note: consider using a `let` binding to create a longer lived value
I fully understand the error, which directed me to the working snippet, but I'm wondering if there's a more elegant and/or idiomatic way of writing this code. I am declaring m at the beginning of the function, only to prevent a temporary variable from being freed too early. Is there a way of telling the compiler that the lifetime of the return value of self.index.get_pages should be the whole add_transition function?
Further details:
Page is a relatively big struct, so I'd rather not implement the Copy trait, nor would I want to clone it.
page_cache is of type HashMap<u32, Vec<Page>>
self.index.get_pages is relatively slow and I'm using page_cache to cache results
The return type of self.index.get_pages is Result<Vec<Page>, std::io::Error>
This is normal. Your 'cleaner' code basically boils down to something like the following:
let y = {
let x = 42;
&x
};
Here it should be obvious that you cannot return a reference to x because x is dropped at the end of the block. Those rules don't change when working with temporary values: self.index.get_pages(start_state, &self.file)? creates a temporary value that is dropped at the end of the block (line 29) and thus you can't return a reference to it.
The workaround via m now moves that temporary into the m binding one block up which will live long enough for pages to work with it.
Now for alternatives, I guess page_cache is a HashMap? Then you could alternatively do something like let pages = self.page_cache.entry(start_state).or_insert_with(||self.index.get_pages(...))?;. The only problem with that approach is that get_pages returns a Result while the current cache stores Vec<Page> (the Ok branch only). You could adapt the cache to actually store Result instead, which I think is semantically also better since you want to cache the results of that function call, so why not do that for Err? But if you have a good reason to not cache Err, the approach you have should work just fine.
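If caching Err is not wanted, matching on the map's Entry directly is one way to keep the ? operator, which or_insert_with cannot do from inside its closure. The sketch below uses hypothetical stand-ins (Page, get_pages, Store) for the question's types. Unlike the original method, it stores freshly loaded pages in the cache.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::io;

// Hypothetical stand-in for the question's `Page` struct.
#[derive(Debug, PartialEq)]
struct Page(u32);

// Hypothetical stand-in for `self.index.get_pages`: odd states fail to load.
fn get_pages(state: u32) -> io::Result<Vec<Page>> {
    if state % 2 == 0 {
        Ok(vec![Page(state)])
    } else {
        Err(io::Error::new(io::ErrorKind::NotFound, "no pages for this state"))
    }
}

struct Store {
    page_cache: HashMap<u32, Vec<Page>>,
}

impl Store {
    // Returns the cached pages, loading and caching them on a miss.
    // A failed load is propagated and leaves the cache untouched.
    fn pages_mut(&mut self, state: u32) -> io::Result<&mut Vec<Page>> {
        match self.page_cache.entry(state) {
            Entry::Occupied(slot) => Ok(slot.into_mut()),
            Entry::Vacant(slot) => Ok(slot.insert(get_pages(state)?)),
        }
    }
}
```

Both arms return a reference into the map itself, so no separate m binding is needed.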
Yours is probably the most efficient way, but in theory not necessary, and one can be more elegant.
Another way of doing it is to use a trait object: give the variable the type Box<dyn AsMut<Vec<Page>>>. Such a variable can hold any type that implements the trait AsMut<Vec<Page>>. Two types that do so are &mut Vec<Page> and Vec<Page>, so the variable can hold either of them, but the contents can only be reached through as_mut().
So the following code works as an illustration:
struct Foo {
inner : Option<Vec<i32>>,
}
impl Foo {
fn new () -> Self {
Foo { inner : None }
}
fn init (&mut self) {
self.inner = Some(Vec::new())
}
fn get_mut_ref (&mut self) -> Option<&mut Vec<i32>> {
self.inner.as_mut()
}
}
fn main () {
let mut foo : Foo = Foo::new();
let mut m : Box<dyn AsMut<Vec<i32>>> = match foo.get_mut_ref() {
Some(r) => Box::new(r),
None => Box::new(vec![1,2,3]),
};
m.as_mut().as_mut().push(4);
}
The key here is the type Box<dyn AsMut<Vec<i32>>>: it is a box that can hold any type, so long as that type implements AsMut<Vec<i32>>. Because the value is boxed, we also need .as_mut().as_mut() to get the actual &mut Vec<i32> out of it.
Because different types can have different sizes, a trait object cannot be stored directly on the stack and must live behind some pointer. A Box is the typical choice and is necessary in this case: a plain reference, which does not own its pointee, would run into the same problem you are facing.
One might argue that this code is more elegant, but yours is certainly more efficient and does not require further heap allocation.

Get value out of optional HashMap only when present

I have a bit of code that loads a HashMap and then retrieves a value with the map.get(...) method. In my case, it's possible that I may not be able to return a HashMap, so in reality I'm dealing with an Option<HashMap>.
I've managed to isolate my problem in the following snippet:
use std::collections::HashMap;
type MyMap = HashMap<String, String>;
fn get_map() -> Option<MyMap> {
// In the real case, we may or may not be able to return a map
Some(MyMap::new())
}
fn main() {
let res = get_map().and_then(|h| h.get("foo"));
println!("{:?}", res)
}
I get the following compilation error:
error[E0597]: `h` does not live long enough
--> src/main.rs:11:38
|
11 | let res = get_map().and_then(|h| h.get("foo"));
| ^ - `h` dropped here while still borrowed
| |
| borrowed value does not live long enough
12 | println!("{:?}", res)
13 | }
| - borrowed value needs to live until here
I think that I get what's going on here:
The HashMap owns all of its key-value pairs.
When I call h.get(...) it lends me the value.
Because of that the HashMap needs to exist as long as the value exists.
There are really two questions here:
Am I understanding this correctly?
How do I fix this?
Call Option::as_ref. It converts a &Option<T> into an Option<&T>:
use std::collections::HashMap;
type MyMap = HashMap<String, String>;
fn get_map() -> Option<MyMap> {
// In the real case, we may or may not be able to return a map
Some(MyMap::new())
}
fn main() {
let map = get_map();
let res = map.as_ref().and_then(|h| h.get("foo"));
println!("{:?}", res)
}
What happened is that and_then consumes the Option; so you were trying to hold a reference to the consumed data.
The same rule applies for the returned value of get_map(): if it is not stored in its own variable, it remains a temporary value, to which you cannot hold a reference.
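The two options can be sketched side by side: either keep the map alive and borrow from it, or clone the value out before the map is dropped. The function names are hypothetical, and whether a clone is acceptable depends on the real value type.

```rust
use std::collections::HashMap;

type MyMap = HashMap<String, String>;

// Borrowing: the caller keeps the Option<MyMap> alive, so the returned
// reference can point into it.
fn lookup_borrowed<'a>(map: &'a Option<MyMap>, key: &str) -> Option<&'a String> {
    map.as_ref().and_then(|h| h.get(key))
}

// Owning: the map is consumed, so the value is cloned out before the
// map is dropped at the end of the closure.
fn lookup_owned(map: Option<MyMap>, key: &str) -> Option<String> {
    map.and_then(|h| h.get(key).cloned())
}
```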

Value getting dropped too early inside closure and combinator while borrow exists

I'm facing a problem with a value being dropped while it is still borrowed inside an Option, in a closure, but I'm having a hard time grasping exactly what's going on. To illustrate, here is a working example of what I'm actually trying to achieve:
fn foo() -> Option<String> {
let hd = match std::env::home_dir() {
Some(d) => d,
None => return None,
};
let fi = match hd.file_name() {
Some(f) => f,
None => return None,
};
let st = match fi.to_str() {
Some(s) => s,
None => return None,
};
Some(String::from(st))
}
The return value is the base name of the current user's home directory inside Option<String>.
I thought I'd try refactoring this with combinators to get rid of the lines None => return None,.
std::env::home_dir()
.and_then(|d| d.file_name())
.and_then(|f| f.to_str())
.map(String::from)
But rustc detects a reference that outlives its value.
error: `d` does not live long enough
--> src/main.rs:33:35
|
33 | .and_then(|d| d.file_name())
| - ^ `d` dropped here while still borrowed
| |
| borrow occurs here
34 | .and_then(|f| f.to_str())
35 | .map(String::from)
| - borrowed value needs to live until here
I think this is because the reference in Option<&OsStr> is outliving the value of type PathBuf. However I'm still having a hard time figuring out how to approach this without having the value go out of scope too soon.
To further illustrate what I'm trying to achieve, here is a similar example with a type that implements the Copy trait.
let x = 42u16.checked_add(1234)
.and_then(|i| i.checked_add(5678))
.and_then(|i| i.checked_sub(90))
.map(|i| i.to_string());
println!("{:?}", x); // Some("6864")
So I'm definitely overlooking a few things related to ownership in the prior example. Is this possible with Option<PathBuf>?
You're right that you're consuming the PathBuf returned from home_dir() but still trying to use references.
I would keep it in a variable, and work from there:
fn foo() -> Option<String> {
let path = std::env::home_dir();
path.as_ref()
.and_then(|d| d.file_name())
.and_then(|f| f.to_str())
.map(String::from)
}
The call to path.as_ref() makes an Option<&PathBuf> as the starting point for the chain of and_then, without consuming the original owned PathBuf which is needed at least until String::from.
Expanding on Chris's answer: You can also fix the issue by nesting the chain starting from the second and_then into the closure passed to the first and_then. This works because it keeps d (which owns a PathBuf) alive until the borrows on it are released.
fn foo() -> Option<String> {
std::env::home_dir().and_then(|d| {
d.file_name()
.and_then(|f| f.to_str())
.map(String::from)
})
}
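Another way to drop the None => return None lines, not mentioned in the answers above, is the ? operator, which also works on Option inside a function that returns Option. In this sketch the path is passed in as a parameter (an assumption, so the example does not depend on the environment) instead of calling std::env::home_dir() directly.

```rust
use std::path::PathBuf;

// Returns the final path component as a String, if there is one and it
// is valid UTF-8.
fn base_name(path: Option<PathBuf>) -> Option<String> {
    // Bind the owned PathBuf first so the borrows below have something
    // to point at for the rest of the function.
    let dir = path?;
    let name = dir.file_name()?.to_str()?;
    Some(name.to_string())
}
```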
