I need a closure to refer to parts of an object in its enclosing environment. The object is created within the environment and is scoped to it, but once created it could be safely moved to the closure.
The use case is a function that does some preparatory work and returns a closure that will do the rest of the work. The reason for this design is an execution constraint: the first part of the work involves allocation, and the remainder must do no allocation. Here is a minimal example:
fn stage_action() -> Box<dyn Fn()> {
// split a freshly allocated string into pieces
let string = String::from("a:b:c");
let substrings = vec![&string[0..1], &string[2..3], &string[4..5]];
// the returned closure refers to the substrings vector of
// slices without any further allocation or modification
Box::new(move || {
for sub in substrings.iter() {
println!("{}", sub);
}
})
}
fn main() {
let action = stage_action();
// ...executed some time later:
action();
}
This fails to compile, correctly stating that &string[0..1] and others must not outlive string. But if string were moved into the closure, there would be no problem. Is there a way to force that to happen, or another approach that would allow the closure to refer to parts of an object created just outside of it?
I've also tried creating a struct with the same functionality to make the move fully explicit, but that doesn't compile either. Again, compilation fails with the error that &later[0..1] and others only live until the end of the function, but "borrowed value must be valid for the static lifetime".
Even completely avoiding a Box doesn't appear to help - the compiler complains that the object doesn't live long enough.
There's nothing specific to closures here; it's the equivalent of:
fn main() {
let string = String::from("a:b:c");
let substrings = vec![&string[0..1], &string[2..3], &string[4..5]];
let string = string;
}
You are attempting to move the String while there are outstanding borrows. In my example here, it's to another variable; in your example it's to the closure's environment. Either way, you are still moving it.
Additionally, you are trying to move the substrings into the same closure environment as the owning string. That makes the entire problem equivalent to "Why can't I store a value and a reference to that value in the same struct?":
struct Environment<'a> {
string: String,
substrings: Vec<&'a str>,
}
fn thing<'a>() -> Environment<'a> {
let string = String::from("a:b:c");
let substrings = vec![&string[0..1], &string[2..3], &string[4..5]];
Environment {
string: string,
substrings: substrings,
}
}
The object is created within the environment and is scoped to it
I'd disagree; string and substrings are created outside of the closure's environment and moved into it. It's that move that's tripping you up.
once created it could be safely moved to the closure.
In this case that's true, but only because you, the programmer, can guarantee that the address of the string data inside the String will remain constant. You know this for two reasons:
String is internally implemented with a heap allocation, so moving the String doesn't move the string data.
The String will never be mutated, which could cause the string to reallocate, invalidating any references.
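The first point can be checked directly. The following is a small sketch (the function name `buffer_address_survives_move` is made up for illustration) that compares the address of the heap buffer before and after the String value is moved:

```rust
// Returns true if the heap buffer has the same address before and after
// the `String` value itself is moved to a new binding.
fn buffer_address_survives_move() -> bool {
    let string = String::from("a:b:c");
    let before = string.as_ptr();
    // Moving copies only the pointer, length, and capacity stored inline.
    let moved = string;
    let after = moved.as_ptr();
    before == after
}
```

The compiler does not reason about this; it only sees that `string` was moved while borrowed, which is why the original code is rejected.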
The easiest solution for your example is to simply convert the slices to Strings and let the closure own them completely. This may even be a net benefit if that means you can free a large string in favor of a few smaller strings.
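A minimal sketch of that approach, assuming it is acceptable to copy the three pieces during the staging phase. The closure returns the number of pieces it printed purely so the result can be checked; that return value is not part of the original example:

```rust
// Sketch: the staging phase copies each piece into its own `String`,
// so the returned closure owns everything it needs.
fn stage_action() -> impl Fn() -> usize {
    let string = String::from("a:b:c");
    // All allocation happens here, before the closure is returned.
    let substrings: Vec<String> = vec![
        string[0..1].to_owned(),
        string[2..3].to_owned(),
        string[4..5].to_owned(),
    ];
    // `string` is dropped when this function returns; nothing borrows it.
    move || {
        for sub in &substrings {
            println!("{}", sub);
        }
        // Returned only so a caller can check how many pieces were handled.
        substrings.len()
    }
}
```

Calling `stage_action()` once and invoking the returned closure later works as in the original `main`, and the closure can be called repeatedly because it only reads from the vector it owns.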
Otherwise, you meet the criteria laid out under "There is a special case where the lifetime tracking is overzealous" in Why can't I store a value and a reference to that value in the same struct?, so you can use crates like:
owning_ref
use owning_ref::RcRef; // 0.4.1
use std::rc::Rc;
fn stage_action() -> impl Fn() {
let string = RcRef::new(Rc::new(String::from("a:b:c")));
let substrings = vec![
string.clone().map(|s| &s[0..1]),
string.clone().map(|s| &s[2..3]),
string.clone().map(|s| &s[4..5]),
];
move || {
for sub in &substrings {
println!("{}", &**sub);
}
}
}
fn main() {
let action = stage_action();
action();
}
ouroboros
use ouroboros::self_referencing; // 0.2.3
fn stage_action() -> impl Fn() {
#[self_referencing]
struct Thing {
string: String,
#[borrows(string)]
substrings: Vec<&'this str>,
}
let thing = ThingBuilder {
string: String::from("a:b:c"),
substrings_builder: |s| vec![&s[0..1], &s[2..3], &s[4..5]],
}
.build();
move || {
thing.with_substrings(|substrings| {
for sub in substrings {
println!("{}", sub);
}
})
}
}
fn main() {
let action = stage_action();
action();
}
Note that I'm no expert user of either of these crates, so these examples may not be the best use of them.
Related
I have problems with understanding the behavior and availability of structs with multiple lifetime parameters. Consider the following:
struct My<'a,'b> {
first: &'a String,
second: &'b String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = My{
first: &first,
second: &second
}
}
println!("{}", my.first)
}
The error message says that
|
13 | second: &second
| ^^^^^^^ borrowed value does not live long enough
14 | }
15 | }
| - `second` dropped here while still borrowed
16 | println!("{}", my.first)
| -------- borrow later used here
First, I do not access the .second element of the struct, so I do not see the problem.
Second, the struct has two lifetime parameters. I assume that the compiler tracks the fields of the struct separately.
For example the following compiles fine:
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
std::mem::drop(my.second);
println!("{}", my.first)
}
This means that even though .second of the struct is dropped, the whole struct is not invalidated. I can still access the non-dropped elements.
Why doesn't the same work for structs with references?
The struct has two independent lifetime parameters. Just as the two type parameters of a struct are independent of each other, I would expect these two lifetimes to be independent as well. But the error message suggests that, in the case of lifetimes, they are not independent. The resulting struct seems to have not two lifetime parameters but only one, the smaller of the two.
If the validity of a struct containing two references is limited to the lifetime of the reference with the smallest lifetime, then my question is: what is the difference between
struct My1<'a,'b>{
f: &'a X,
s: &'b Y,
}
and
struct My2<'a>{
f: &'a X,
s: &'a Y
}
I would expect structs with multiple lifetime parameters to behave similarly to functions with multiple lifetime parameters. Consider these two functions:
fn fun_single<'a>(x:&'a str, y: &'a str) -> &'a str {
if x.len() <= y.len() {&x[0..1]} else {&y[0..1]}
}
fn fun_double<'a,'b>(x: &'a str, y:&'b str) -> &'a str {
&x[0..1]
}
fn main() {
let first = "first".to_string();
let second = "second".to_string();
let ref_first = &first;
let ref_second = &second;
let result_ref = fun_single(ref_first, ref_second);
std::mem::drop(second);
println!("{result_ref}")
}
In this version we get the result from a function with a single lifetime parameter. The compiler assumes the two function parameters are related, so it picks the smallest lifetime for the reference we return from the function. So this version does not compile.
But if we just replace the line
let result_ref = fun_single(ref_first, ref_second);
with
let result_ref = fun_double(ref_first, ref_second);
the compiler sees that the two lifetimes are independent, so even when you drop second, result_ref is still valid: the lifetime of the returned reference is not the smallest of the two but is independent of the second parameter, and the code compiles.
I would expect structs with multiple lifetimes and functions with multiple lifetimes to behave similarly. But they don't.
What am I missing here?
I assume that the compiler tracks the fields of the struct separately.
I think that's the core of your confusion. The compiler does track each lifetime separately, but only statically at compile time, not during runtime. It follows from this that Rust generally can not allow structs to be partially valid.
So, while you do specify two lifetime parameters, the compiler figures that the struct can only be valid as long as both of them are alive: that is, only for as long as the shorter-lived one lives.
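The two lifetime parameters still matter once a reference is copied out of the struct. The sketch below reuses the `My` struct from the question; the function name `first_outlives_second` is invented for illustration. It compiles because `my.first` has type `&'a String`, which does not mention `'b`. With a single shared lifetime, as in `My2`, the same code would be rejected, because the copied reference would be tied to the shorter borrow.

```rust
struct My<'a, 'b> {
    first: &'a String,
    second: &'b String,
}

// Hypothetical helper: copies `first` out of the struct and uses it after
// `second` (and the struct itself) are gone.
fn first_outlives_second() -> usize {
    let first = "first".to_string();
    let kept: &String;
    {
        let second = "second".to_string();
        let my = My { first: &first, second: &second };
        assert_eq!(my.second.len(), 6);
        // `my.first` has type `&'a String`, which is independent of `'b`.
        kept = my.first;
    } // `second` and `my` end here
    kept.len()
}
```

The struct value as a whole is still limited by the shorter lifetime; only the reference copied out of it keeps the longer one.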
But then how does the second example work? It relies on a special feature of the compiler called partial moves. That means that whenever you move out of a struct, it allows you to move disjoint parts separately.
It is essentially syntactic sugar for the following:
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
let Own{
first: my_first,
second: my_second,
} = my;
std::mem::drop(my_second);
println!("{}", my_first);
}
Note that this too is a static feature, so the following will not compile (even though it would work when run):
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
if false {
std::mem::drop(my.first);
}
println!("{}", my.first)
}
The struct may not be moved as a whole once it has been partially moved, so not even this allows you to have partially valid structs.
A local variable may be partially initialized, such as in your second example. Rust can track this for local variables and give you an error if you attempt to access the uninitialized parts.
However, in your first example the variable isn't actually partially initialized; it's fully initialized (you give it both the first and second fields). Then, when second goes out of scope, my is still fully initialized, but its second field is now invalid (though initialized). Thus, to avoid an invalid reference, the compiler doesn't let the variable exist past the point where second is dropped.
Rust could in principle track this, since you have two lifetimes, by giving the second lifetime a special name such as 'never to signal that the reference is always invalid, but it currently doesn't.
There have been a fair number of questions around this, and the solution is mostly "use Entry".
However this is an issue because HashMap::entry() requires an owned value meaning possibly expensive copies / allocations even when the key is already present and we just want to update the value in-place, hence the use of get_mut. However the use of get_mut on a reference to a local leads rustc to assume that said reference gets stored into the hashmap, and thus that returning the hashmap is an error:
use std::borrow::Cow;
use std::collections::HashMap;
fn get_string() -> String { String::from("xxxxxxx") }
fn foo() -> HashMap<Cow<'static, str>, usize> {
let mut v = HashMap::new();
// stand-in for "get a string slice as key",
// real case is getting a String from an
// mpsc and the key being a segment of that string
let s = get_string();
// stand-in for a structure which contains an `Option<Cow>`
let k = Cow::from(&s[2..3]);
// because of get_mut, `&s` is apparently considered to be stored in `v`?
if let Some(e) = v.get_mut(&k) {
*e += 1;
} else {
v.insert(Cow::from(k.into_owned()), 0);
}
v
}
Note that the manipulations at lines 9~13 are there to clarify the point of the pattern, but get_mut alone is sufficient to trigger the issue
Is there a way around without the efficiency hit, or is an eager allocation the only way? (note: because this is a static issue, dynamic gates like contains_key or get obviously don't do anything).
According to the docs, HashMap::get_mut() requires a value of type &Q such that the key type of the map implements Borrow<Q>.
The key of your hash is Cow<'static, str>, that implements Borrow<str>. This means that you can use either a &Cow<'static, str> or a &str. But you are passing a &Cow<'local, str> for some 'local lifetime. The compiler tries to match that 'local with 'static and issues a somewhat confusing error message about lifetimes.
The solution is actually easy, because you can get an &str from the Cow either calling k.as_ref() or doing &*k, and the lifetime of the &str is unrestricted: (playground)
let k = Cow::from(&s[2..3]);
if let Some(e) = v.get_mut(k.as_ref()) { /* ...*/ }
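For a complete picture, here is a sketch of the lookup-then-insert pattern written as a standalone function. The name `bump` and the choice of the third byte of the line as the key are assumptions made for the example. It uses `&*k`, the other spelling mentioned above, so the lookup key is a plain `&str` whose lifetime is unrelated to the map's `'static` key type.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Hypothetical helper: counts occurrences of a key sliced out of `line`,
// allocating an owned key only the first time that key is seen.
fn bump(counts: &mut HashMap<Cow<'static, str>, usize>, line: &str) {
    // Borrowed key: no allocation yet.
    let k: Cow<str> = Cow::from(&line[2..3]);
    // `&*k` is a `&str`, so the map is queried through `Borrow<str>`.
    if let Some(e) = counts.get_mut(&*k) {
        *e += 1;
    } else {
        // Only now is the key copied into an owned `String`.
        counts.insert(Cow::Owned(k.into_owned()), 0);
    }
}
```

As in the question, a key is inserted with a count of 0 and incremented on each later sighting.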
I'm trying to make a read-only map of environment variables.
fn os_env_hashmap() -> HashMap<&'static str, &'static str> {
let mut map = HashMap::new();
use std::env;
for (key,val) in env::vars_os() {
let k = key.to_str();
if k.is_none() { continue }
let v = val.to_str();
if v.is_none() { continue }
k.unwrap();
//map.insert( k.unwrap(), v.unwrap() );
}
return map;
}
Can't seem to uncomment the "insert" line near the bottom without compiler errors about key,val,k, and v being local.
I might be able to fix the compiler error by using String instead of str, but str seems perfect for a read-only result.
Feel free to suggest a more idiomatic way to do this.
This is unfortunately not straightforward using only the facilities of the Rust standard library.
If env::vars_os() returned an iterator over &'static OsStr instead of OsString, this would be trivial. Unfortunately, not all platforms allow creating an &OsStr to the contents of an environment variable. In particular, on Windows, the native encoding is UTF-16 but the encoding needed by OsStr is WTF-8. For this reason, there really is no OsStr anywhere you could take a reference to, until you create an OsString by iterating over env::vars_os().
The simplest thing, as the question comments mention, is to return owned Strings:
use std::collections::HashMap;
use std::env;
fn os_env_hashmap() -> HashMap<String, String> {
let mut map = HashMap::new();
for (key, val) in env::vars_os() {
// Use pattern bindings instead of testing .is_some() followed by .unwrap()
if let (Ok(k), Ok(v)) = (key.into_string(), val.into_string()) {
map.insert(k, v);
}
}
map
}
The result is not "read-only", but it is not shared, so you cannot cause data races or other weird bugs by mutating it.
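The same filtering can also be sketched with iterator adapters. The helper `utf8_pairs` is a hypothetical name introduced here so that the logic can be exercised with made-up input as well as with the real environment:

```rust
use std::collections::HashMap;
use std::env;
use std::ffi::OsString;

// Hypothetical helper: keeps only the pairs where both key and value
// are valid Unicode.
fn utf8_pairs(vars: impl Iterator<Item = (OsString, OsString)>) -> HashMap<String, String> {
    vars.filter_map(|(key, val)| {
        // `into_string` returns Err for non-Unicode data; `.ok()?` skips that pair.
        Some((key.into_string().ok()?, val.into_string().ok()?))
    })
    .collect()
}

fn os_env_hashmap() -> HashMap<String, String> {
    utf8_pairs(env::vars_os())
}
```

Callers that only need to read can borrow `&str` views from the owned map, for example with `map.get("HOME").map(String::as_str)`.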
See also
Is there any way to return a reference to a variable created in a function?
Return local String as a slice (&str)
Given the following code (which does not compile):
fn main() {
let mut v = vec!();
{
let name = "Bob the Builder".to_string();
v.push(&name);
}
for m in &v{
println!("{}", m);
}
}
I have created a variable binding to a Rust String type which will go out of scope within the first set of curly braces. Is there a way to somehow move the ownership of the String such that the vector itself owns it?
This is an arbitrary example; I'm just trying to understand whether this concept is possible.
I already know that if I use a string literal this will be regarded as a static string which will exist for the lifetime of the entire app and therefore this code would compile but I'm just trying to understand if a collection in Rust can own data. I know Rust is not Objective-C but Objective-C has collections with the ability to retain their data.
The vector will own it.. as long as you don't pass a reference to it.
Changing your code to this:
fn main() {
let mut v = vec!();
{
let name = "Bob the Builder".to_string();
v.push(name); // <--- no ampersand
println!("{}", name); // <---- error, use of moved value
}
for m in &v {
println!("{}", m);
}
}
..throws an error because name is now owned by the Vector. If you allow for the fact that the Vector now owns the string.. your code compiles (by removing my println! call):
fn main() {
let mut v = vec!();
{
let name = "Bob the Builder".to_string();
v.push(name); // <--- no ampersand
}
for m in &v {
println!("{}", m); // <--- works fine
}
}
So your problem is that you're passing a reference to your string into the vector. Essentially, at the end of the block your name value will be dropped and your &name reference in the Vector could potentially point to invalid memory.. making v[0].something_here() potentially dangerous. So the compiler stops you. But, if you transfer ownership of the name variable into the vector (by not passing a reference.. but passing the whole thing) then Rust knows to clean the string up when it cleans the Vector up.
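To make the ownership transfer more visible, here is a sketch that builds the vector in one function and reads from it in another. Both function names are made up for the example:

```rust
// The vector takes ownership of each `String` pushed into it, so the data
// outlives the inner block where `name` was declared.
fn build_names() -> Vec<String> {
    let mut v = Vec::new();
    {
        let name = "Bob the Builder".to_string();
        v.push(name); // ownership moves from `name` into `v`
    }
    v // the caller now owns the vector and, through it, the string
}

// Borrowed views into the vector stay valid for as long as the vector is borrowed.
fn first_name(names: &[String]) -> Option<&str> {
    names.first().map(|s| s.as_str())
}
```

References are still useful here, but they point into the vector that owns the data rather than into a local variable that is about to be dropped.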
My code:
enum MyEnum1 {
//....
}
struct Struct1 {
field1: MyEnum1,
field2: String
}
fn fn1(a: Struct1, b: String, c: String) -> String {
let let1 = fn2(a.field1);
let let2 = fn3(let1, b, c);
format!("{} something 123 {}", let1, let2)
}
fn fn2(a: MyEnum1) -> String {
//....
}
fn fn3(a: MyEnum1, b: Struct1) -> String {
//....
}
error: use of moved value: `a.field1`
error: use of moved value: `let1`
How can I fix them? Should I add & to the parameters of fn2 and fn3? Or mut? I can't understand the idea of how to fix these kinds of errors.
These errors come from the most important concept in Rust: ownership. You should read the official book, especially the chapter on ownership; this will help you understand how to fix these kinds of errors.
In short, specifically in your code, the problem is that String is a non-copyable type, that is, String values are not copied when passed to functions or assigned to local variables, they are moved. This means that wherever they were before, they are not accessible from there anymore.
Let's look at your function:
enum MyEnum1 {
//....
}
struct Struct1 {
field1: MyEnum1,
field2: String
}
fn fn1(a: Struct1, b: String, c: String) -> String {
let let1 = fn2(a.field1);
let let2 = fn3(let1, b, c);
format!("{} something 123 {}", let1, let2)
}
fn fn2(a: MyEnum1) -> String {
//....
}
None of the types here are automatically copyable (they don't implement the Copy trait). String is not copyable because it is a heap-allocated string and copying would need a fresh allocation (an expensive operation that had better not be implicit). MyEnum1 is not copyable because it does not implement Copy (with #[derive(Copy, Clone)], for example; it is unclear whether it can be made copyable because you didn't provide its variants). Struct1 is not copyable because it contains non-copyable types.
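As an illustration of the difference, a fieldless enum can opt in to Copy and then be used again after being passed by value. The enum `Direction` and both functions below are invented for this sketch and are not part of the question's code:

```rust
// A fieldless enum can derive `Copy`, so passing it by value copies it.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Direction {
    Left,
    Right,
}

fn describe(d: Direction) -> String {
    format!("{:?}", d)
}

fn use_twice() -> (String, Direction) {
    let d = Direction::Left;
    let text = describe(d); // `d` is copied here, not moved
    (text, d) // so `d` is still usable
}
```

Without the derive, the last line of `use_twice` would fail with the same "use of moved value" error.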
In fn1 you invoke fn2, passing it field1 and getting a String back. Then you immediately pass this String to fn3. Because String is not copyable, whatever is stored in let1 is moved into the called function, making let1 inaccessible. This is what the "use of moved value" error is about. (The code you provided can't cause the "use of moved value: a.field1" error, so it probably came from the parts you omitted, but the basic idea is exactly the same.)
There are several ways to fix these errors, but the most natural and common one is indeed to use borrowed references. In general if you only want to read some non-copyable value in a function you should pass it there by reference:
fn use_myenum(e: &MyEnum1)
For strings and arrays, however, the better way would be to pass slices:
fn use_str(s: &str) { ... }
let s: String = ...;
use_str(&s); // here &String is automatically converted to &str
You can find more on slices in the book, here.
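Putting this together, one possible borrowed version of the code from the question looks like the sketch below. The enum variants, the bodies of fn2 and fn3, and fn3 taking three string slices are all assumptions, since the question elides them:

```rust
#[derive(Debug)]
enum MyEnum1 {
    A, // hypothetical variants; the question does not show the real ones
    B,
}

struct Struct1 {
    field1: MyEnum1,
    field2: String,
}

// Only reads the enum, so it borrows it.
fn fn2(a: &MyEnum1) -> String {
    format!("{:?}", a)
}

// Takes string slices instead of owned `String`s.
fn fn3(a: &str, b: &str, c: &str) -> String {
    format!("{}-{}-{}", a, b, c)
}

fn fn1(a: &Struct1, b: &str, c: &str) -> String {
    let let1 = fn2(&a.field1);
    // `&let1` is a `&String`, which coerces to `&str`; `let1` is not moved.
    let let2 = fn3(&let1, b, c);
    format!("{} something 123 {}", let1, let2)
}
```

Because nothing is moved, both `a.field1` and `let1` remain usable after the calls.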