Returning a struct containing mutable values

Returning a struct containing mutable values - rust

I have the following code, where I'm trying to return the struct Foo with a set of default values for the field values. These values may be changed later. But the compiler complains:
error: `initial` does not live long enough
How this could be achieved? Any alternatives?
struct Foo <'a> {
values: &'a mut Vec<i32>,
}
impl <'a> Foo <'a> {
fn new() -> Foo <'a> {
let initial = vec![1, 2];
Foo { values: &mut initial }
}
}
let my_foo = Foo::new();
my_foo.values.push(3);

There are two problems here.
The first is that you don't need to use &mut to make a structure field mutable. Mutability is inherited in Rust. That is, if you have a Foo stored in a mutable variable (let mut f: Foo), its fields are mutable; if it's in an immutable variable (let f: Foo), its fields are immutable. The solution is to just use:
struct Foo {
values: Vec<i32>,
}
and return a Foo by value.
The second problem (and the source of the actual compilation error) is that you're trying to return a borrow to something you created in the function. This is impossible. No, there is no way around it; you can't somehow extend the lifetime of initial, returning initial as well as the borrow won't work. Really. This is one of the things Rust was specifically designed to absolutely forbid.
If you want to transfer something out of a function, one of two things must be true:
It is being stored somewhere outside the function that will outlive the current call (as in, you were given a borrow as an argument; returning doesn't count), or
You are returning ownership, not just a borrowed reference.
The corrected Foo works because it owns the Vec<i32>.

Related

Rust multiple lifetimes in structs

I have problems with understanding the behavior and availability of structs with multiple lifetime parameters. Consider the following:
struct My<'a,'b> {
first: &'a String,
second: &'b String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = My{
first: &first,
second: &second
}
}
println!("{}", my.first)
}
The error message says that
|
13 | second: &second
| ^^^^^^^ borrowed value does not live long enough
14 | }
15 | }
| - `second` dropped here while still borrowed
16 | println!("{}", my.first)
| -------- borrow later used here
First, I do not access the .second element of the struct. So, I do not see the problem.
Second, the struct has two life time parameters. I assume that compiler tracks the fields of struct seperately.
For example the following compiles fine:
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
std::mem::drop(my.second);
println!("{}", my.first)
}
Which means that even though, .second of the struct is dropped that does not invalidate the whole struct. I can still access the non-dropped elements.
Why doesn't the same the same work for structs with references?
The struct has two independent lifetime parameters. Just like a struct with two type parameters are independent of each other, I would expect that these two lifetimes are independent as well. But the error message suggest that in the case of lifetimes these are not independent. The resultant struct does not have two lifetime parameters but only one that is the smaller of the two.
If the validity of struct containing two references limited to the lifetime of reference with the smallest lifetime, then my question is what is the difference between
struct My1<'a,'b>{
f: &'a X,
s: &'b Y,
}
and
struct My2<'a>{
f: &'a X,
s: &'a Y
}
I would expect that structs with multiple lifetime parameters to behave similar to functions with multiple lifetime parameters. Consider these two functions
fn fun_single<'a>(x:&'a str, y: &'a str) -> &'a str {
if x.len() <= y.len() {&x[0..1]} else {&y[0..1]}
}
fn fun_double<'a,'b>(x: &'a str, y:&'b str) -> &'a str {
&x[0..1]
}
fn main() {
let first = "first".to_string();
let second = "second".to_string();
let ref_first = &first;
let ref_second = &second;
let result_ref = fun_single(ref_first, ref_second);
std::mem::drop(second);
println!("{result_ref}")
}
In this version we get the result from a function with single life time parameter. Compiler thinks that two function parameters are related so it picks the smallest lifetime for the reference we return from the function. So it does not compile this version.
But if we just replace the line
let result_ref = fun_single(ref_first, ref_second);
with
let result_ref = fun_double(ref_first, ref_second);
the compiler sees that two lifetimes are independent so even when you drop second result_ref is still valid, the lifetime of the return reference is not the smallest but independent from second parameter and it compiles.
I would expect that structs with multiple lifetimes and functions with multiple lifetimes to behave similarly. But they don't.
What am I missing here?

I assume that compiler tracks the fields of struct seperately.
I think that's the core of your confusion. The compiler does track each lifetime separately, but only statically at compile time, not during runtime. It follows from this that Rust generally can not allow structs to be partially valid.
So, while you do specify two lifetime parameters, the compiler figures that the struct can only be valid as long as both of them are alive: that is, until the shorter-lived one lives.
But then how does the second example work? It relies on an exceptional feature of the compiler, called Partial Moving. That means that whenever you move out of a struct, it allows you to move disjoint parts separately.
It is essentially a syntax sugar for the following:
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
let Own{
first: my_first,
second: my_second,
} = my;
std::mem::drop(my_second);
println!("{}", my_first);
}
Note that this too is a static feature, so the following will not compile (even though it would work when run):
struct Own {
first: String,
second: String
}
fn main() {
let my;
let first = "first".to_string();
{
let second = "second".to_string();
my = Own{
first: first,
second: second
}
}
if false {
std::mem::drop(my.first);
}
println!("{}", my.first)
}
The struct may not be moved as a whole once it has been partially moved, so not even this allows you to have partially valid structs.

A local variable may be partially initialized, such as in your second example. Rust can track this for local variables and give you an error if you attempt to access the uninitialized parts.
However in your first example the variable isn't actually partially initialized, it's fully initialized (you give it both the first and second field). Then, when second goes out of scope, my is still fully initialized, but it's second field is now invalid (but initialized). Thus it doesn't even let the variable exist past when second is dropped to avoid an invalid reference.
Rust could track this since you have 2 lifetimes and name the second lifetime a special 'never that would signal the reference is always invalid, but it currently doesn't.

Struct property as key and the struct itself as value in HashMaps - Rust [duplicate]

This question already has answers here:
Why can't I store a value and a reference to that value in the same struct?
(4 answers)
Closed 8 months ago.
The following is a snippet of a more complicated code, the idea is loading a SQL table and setting a hashmap with one of the table struct fields as the key and keeping the structure as the value (implementation details are not important since the code works fine if I clone the String, however, the Strings in the DB can be arbitrarily long and cloning can be expensive).
The following code will fail with
error[E0382]: use of partially moved value: `foo`
--> src/main.rs:24:35
|
24 | foo_hashmap.insert(foo.a, foo);
| ----- ^^^ value used here after partial move
| |
| value partially moved here
|
= note: partial move occurs because `foo.a` has type `String`, which does not implement the `Copy` trait
For more information about this error, try `rustc --explain E0382`.
use std::collections::HashMap;
struct Foo {
a: String,
b: String,
}
fn main() {
let foo_1 = Foo {
a: "bar".to_string(),
b: "bar".to_string(),
};
let foo_2 = Foo {
a: "bar".to_string(),
b: "bar".to_string(),
};
let foo_vec = vec![foo_1, foo_2];
let mut foo_hashmap = HashMap::new();
foo_vec.into_iter().for_each(|foo| {
foo_hashmap.insert(foo.a, foo); // foo.a.clone() will make this compile
});
}
The struct Foo cannot implement Copy since its fields are String. I tried wrapping foo.a with Rc::new(RefCell::new()) but later went down the pitfall of missing the trait Hash for RefCell<String>, so currently I'm not certain in either using something else for the struct fields (will Cow work?), or to handle that logic within the for_each loop.

There are at least two problems here: First, the resulting HashMap<K, V> would be a self-referential struct, as the K borrows V; there are many questions and answers on SA about the pitfalls of this. Second, even if you could construct such a HashMap, you'd easily break the guarantees provided by HashMap, which allows you to modify V while assuming that K always stays constant: There is no way to get a &mut K for a HashMap, but you can get a &mut V; if K is actually a &V, one could easily modify K through V (by ways of mutating Foo.a ) and break the map.
One possibility is to change Foo.a from a String to a Rc<str>, which you can clone with minimal runtime cost in order to put the value both in the K and into V. As Rc<str> is Borrow<str>, you can still look up values in the map by means of &str. This still has the - theoretical - downside that you can break the map by getting a &mut Foo from the map and std::mem::swap the a, which makes it impossible to look up the correct value from its keys; but you'd have to do that deliberately.
Another option is to actually use a HashSet instead of a HashMap, and use a newtype for Foo which behaves like a Foo.a. You'd have to implement PartialEq, Eq, Hash (and Borrow<str> for good measure) like this:
use std::collections::HashSet;
#[derive(Debug)]
struct Foo {
a: String,
b: String,
}
/// A newtype for `Foo` which behaves like a `str`
#[derive(Debug)]
struct FooEntry(Foo);
/// `FooEntry` compares to other `FooEntry` only via `.a`
impl PartialEq<FooEntry> for FooEntry {
fn eq(&self, other: &FooEntry) -> bool {
self.0.a == other.0.a
}
}
impl Eq for FooEntry {}
/// It also hashes the same way as a `Foo.a`
impl std::hash::Hash for FooEntry {
fn hash<H>(&self, hasher: &mut H)
where
H: std::hash::Hasher,
{
self.0.a.hash(hasher);
}
}
/// Due to the above, we can implement `Borrow`, so now we can look up
/// a `FooEntry` in the Set using &str
impl std::borrow::Borrow<str> for FooEntry {
fn borrow(&self) -> &str {
&self.0.a
}
}
fn main() {
let foo_1 = Foo {
a: "foo".to_string(),
b: "bar".to_string(),
};
let foo_2 = Foo {
a: "foobar".to_string(),
b: "barfoo".to_string(),
};
let foo_vec = vec![foo_1, foo_2];
let mut foo_hashmap = HashSet::new();
foo_vec.into_iter().for_each(|foo| {
foo_hashmap.insert(FooEntry(foo));
});
// Look up `Foo` using &str as keys...
println!("{:?}", foo_hashmap.get("foo").unwrap().0);
println!("{:?}", foo_hashmap.get("foobar").unwrap().0);
}
Notice that HashSet provides no way to get a &mut FooEntry due to the reasons described above. You'd have to use RefCell (and read what the docs of HashSet have to say about this).
The third option is to simply clone() the foo.a as you described. Given the above, this is probably the most simple solution. If using an Rc<str> doesn't bother you for other reasons, this would be my choice.
Sidenote: If you don't need to modify a and/or b, a Box<str> instead of String is smaller by one machine word.

How do I return a reference to an iterator using conservative_impl_trait?

I have a petgraph::Graph structure onto which I have imposed a tree structure by giving every node weight a parent_edge_idx which is an Option<EdgeIdx> of the edge that connects from its parent to itself.
I need to iterate over a node's children. I need the edge weight of the connecting edge and the node weight of the child.
I wanted to factor that iteration into a helper function that returns a reference to an Iterator<Item = (EdgeIdx, NodeIdx)>. I want to do this cost-free; since I have to borrow self.search_tree in order to do this, the iterator is only valid for the lifetime of self.
Is this a reasonable function to want to write?
Is it possible to write this function?
Any gated features are ok; I'm on nightly.
fn children<'a>(
&'a mut self,
node_idx: NodeIdx,
) -> &'a impl Iterator<Item = (EdgeIdx, NodeIdx)> {
&self.search_tree.neighbors(node_idx).map(|child_idx| {
let node = self.search_tree.node_weight(child_idx).unwrap();
let edge_idx = node.parent_edge_idx.unwrap();
(edge_idx, child_idx)
})
}

How to return an iterator is already covered in this question.
Note that you don't need to return a reference: you want to return an iterator value directly, so if we remove the first & in both the method body and the return type, that's closer to what we need.
We will use impl Iterator so that we don't have to name the actual iterator type exactly. Just note (code below) that we need to use the impl Iterator<..> + 'a syntax, which means that the (anonymous) iterator contains references valid to use for at least the lifetime 'a.
We can't use &mut self here! Note that we need to borrow self.search_tree twice: once for the .neighbors() iterator and once for the self.search_tree that's used in the map closure. Multiple borrowing is incompatible with mutable references.
We put move as the capture mode on the closure, so that it captures the self reference directly, and not by reference (this is important so that we can return the iterator and the closure.
Petgraph specific, but we replace g.node_weight(node_index).unwrap() with just &g[node_index] which is equivalent, but the latter is easier to read.
Here is a reproduction of your code, but with modifications along 1-5 to make it compile:
#![feature(conservative_impl_trait)]
extern crate petgraph;
use petgraph::Graph;
use petgraph::graph::{NodeIndex, EdgeIndex};
struct Foo {
search_tree: Graph<Node, i32>,
}
struct Node {
parent_edge_idx: Option<EdgeIndex>,
}
impl Foo {
fn children<'a>(&'a self, node_idx: NodeIndex)
-> impl Iterator<Item = (EdgeIndex, NodeIndex)> + 'a
{
self.search_tree.neighbors(node_idx).map(move |child_idx| {
let node = &self.search_tree[child_idx];
let edge_idx = node.parent_edge_idx.unwrap();
(edge_idx, child_idx)
})
}
}

Can a type know when a mutable borrow to itself has ended?

I have a struct and I want to call one of the struct's methods every time a mutable borrow to it has ended. To do so, I would need to know when the mutable borrow to it has been dropped. How can this be done?

Disclaimer: The answer that follows describes a possible solution, but it's not a very good one, as described by this comment from Sebastien Redl:
[T]his is a bad way of trying to maintain invariants. Mostly because dropping the reference can be suppressed with mem::forget. This is fine for RefCell, where if you don't drop the ref, you will simply eventually panic because you didn't release the dynamic borrow, but it is bad if violating the "fraction is in shortest form" invariant leads to weird results or subtle performance issues down the line, and it is catastrophic if you need to maintain the "thread doesn't outlive variables in the current scope" invariant.
Nevertheless, it's possible to use a temporary struct as a "staging area" that updates the referent when it's dropped, and thus maintain the invariant correctly; however, that version basically amounts to making a proper wrapper type and a kind of weird way to use it. The best way to solve this problem is through an opaque wrapper struct that doesn't expose its internals except through methods that definitely maintain the invariant.
Without further ado, the original answer:
Not exactly... but pretty close. We can use RefCell<T> as a model for how this can be done. It's a bit of an abstract question, but I'll use a concrete example to demonstrate. (This won't be a complete example, but something to show the general principles.)
Let's say you want to make a Fraction struct that is always in simplest form (fully reduced, e.g. 3/5 instead of 6/10). You write a struct RawFraction that will contain the bare data. RawFraction instances are not always in simplest form, but they have a method fn reduce(&mut self) that reduces them.
Now you need a smart pointer type that you will always use to mutate the RawFraction, which calls .reduce() on the pointed-to struct when it's dropped. Let's call it RefMut, because that's the naming scheme RefCell uses. You implement Deref<Target = RawFraction>, DerefMut, and Drop on it, something like this:
pub struct RefMut<'a>(&'a mut RawFraction);
impl<'a> Deref for RefMut<'a> {
type Target = RawFraction;
fn deref(&self) -> &RawFraction {
self.0
}
}
impl<'a> DerefMut for RefMut<'a> {
fn deref_mut(&mut self) -> &mut RawFraction {
self.0
}
}
impl<'a> Drop for RefMut<'a> {
fn drop(&mut self) {
self.0.reduce();
}
}
Now, whenever you have a RefMut to a RawFraction and drop it, you know the RawFraction will be in simplest form afterwards. All you need to do at this point is ensure that RefMut is the only way to get &mut access to the RawFraction part of a Fraction.
pub struct Fraction(RawFraction);
impl Fraction {
pub fn new(numerator: i32, denominator: i32) -> Self {
// create a RawFraction, reduce it and wrap it up
}
pub fn borrow_mut(&mut self) -> RefMut {
RefMut(&mut self.0)
}
}
Pay attention to the pub markings (and lack thereof): I'm using those to ensure the soundness of the exposed interface. All three types should be placed in a module by themselves. It would be incorrect to mark the RawFraction field pub inside Fraction, since then it would be possible (for code outside the module) to create an unreduced Fraction without using new or get a &mut RawFraction without going through RefMut.
Supposing all this code is placed in a module named frac, you can use it something like this (assuming Fraction implements Display):
let f = frac::Fraction::new(3, 10);
println!("{}", f); // prints 3/10
f.borrow_mut().numerator += 3;
println!("{}", f); // prints 3/5
The types encode the invariant: Wherever you have Fraction, you can know that it's fully reduced. When you have a RawFraction, &RawFraction, etc., you can't be sure. If you want, you may also make RawFraction's fields non-pub, so that you can't get an unreduced fraction at all except by calling borrow_mut on a Fraction.

Basically the same thing is done in RefCell. There you want to reduce the runtime borrow-count when a borrow ends. Here you want to perform an arbitrary action.
So let's re-use the concept of writing a function that returns a wrapped reference:
struct Data {
content: i32,
}
impl Data {
fn borrow_mut(&mut self) -> DataRef {
println!("borrowing");
DataRef { data: self }
}
fn check_after_borrow(&self) {
if self.content > 50 {
println!("Hey, content should be <= {:?}!", 50);
}
}
}
struct DataRef<'a> {
data: &'a mut Data
}
impl<'a> Drop for DataRef<'a> {
fn drop(&mut self) {
println!("borrow ends");
self.data.check_after_borrow()
}
}
fn main() {
let mut d = Data { content: 42 };
println!("content is {}", d.content);
{
let b = d.borrow_mut();
//let c = &d; // Compiler won't let you have another borrow at the same time
b.data.content = 123;
println!("content set to {}", b.data.content);
} // borrow ends here
println!("content is now {}", d.content);
}
This results in the following output:
content is 42
borrowing
content set to 123
borrow ends
Hey, content should be <= 50!
content is now 123
Be aware that you can still obtain an unchecked mutable borrow with e.g. let c = &mut d;. This will be silently dropped without calling check_after_borrow.

Why are borrows of struct members allowed in &mut self, but not of self to immutable methods?

If I have a struct that encapsulates two members, and updates one based on the other, that's fine as long as I do it this way:
struct A {
value: i64
}
impl A {
pub fn new() -> Self {
A { value: 0 }
}
pub fn do_something(&mut self, other: &B) {
self.value += other.value;
}
pub fn value(&self) -> i64 {
self.value
}
}
struct B {
pub value: i64
}
struct State {
a: A,
b: B
}
impl State {
pub fn new() -> Self {
State {
a: A::new(),
b: B { value: 1 }
}
}
pub fn do_stuff(&mut self) -> i64 {
self.a.do_something(&self.b);
self.a.value()
}
pub fn get_b(&self) -> &B {
&self.b
}
}
fn main() {
let mut state = State::new();
println!("{}", state.do_stuff());
}
That is, when I directly refer to self.b. But when I change do_stuff() to this:
pub fn do_stuff(&mut self) -> i64 {
self.a.do_something(self.get_b());
self.a.value()
}
The compiler complains: cannot borrow `*self` as immutable because `self.a` is also borrowed as mutable.
What if I need to do something more complex than just returning a member in order to get the argument for a.do_something()? Must I make a function that returns b by value and store it in a binding, then pass that binding to do_something()? What if b is complex?
More importantly to my understanding, what kind of memory-unsafety is the compiler saving me from here?

A key aspect of mutable references is that they are guaranteed to be the only way to access a particular value while they exist (unless they're reborrowed, which "disables" them temporarily).
When you write
self.a.do_something(&self.b);
the compiler is able to see that the borrow on self.a (which is taken implicitly to perform the method call) is distinct from the borrow on self.b, because it can reason about direct field accesses.
However, when you write
self.a.do_something(self.get_b());
then the compiler doesn't see a borrow on self.b, but rather a borrow on self. That's because lifetime parameters on method signatures cannot propagate such detailed information about borrows. Therefore, the compiler cannot guarantee that the value returned by self.get_b() doesn't give you access to self.a, which would create two references that can access self.a, one of them being mutable, which is illegal.
The reason field borrows don't propagate across functions is to simplify type checking and borrow checking (for machines and for humans). The principle is that the signature should be sufficient for performing those tasks: changing the implementation of a function should not cause errors in its callers.
What if I need to do something more complex than just returning a member in order to get the argument for a.do_something()?
I would move get_b from State to B and call get_b on self.b. This way, the compiler can see the distinct borrows on self.a and self.b and will accept the code.
self.a.do_something(self.b.get_b());

Yes, the compiler isolates functions for the purposes of the safety checks it makes. If it didn't, then every function would essentially have to be inlined everywhere. No one would appreciate this for at least two reasons:
Compile times would go through the roof, and many opportunities for parallelization would have to be discarded.
Changes to a function N calls away could affect the current function. See also Why are explicit lifetimes needed in Rust? which touches on the same concept.
what kind of memory-unsafety is the compiler saving me from here
None, really. In fact, it could be argued that it's creating false positives, as your example shows.
It's really more of a benefit for preserving programmer sanity.
The general advice that I give and follow when I encounter this problem is that the compiler is guiding you to discovering a new type in your existing code.
Your particular example is a bit too simplified for this to make sense, but if you had struct Foo(A, B, C) and found that a method on Foo needed A and B, that's often a good sign that there's a hidden type composed of A and B: struct Foo(Bar, C); struct Bar(A, B).
This isn't a silver bullet as you can end up with methods that need each pair of data, but in my experience it works the majority of the time.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Returning a struct containing mutable values - rust

Related

Rust multiple lifetimes in structs

Struct property as key and the struct itself as value in HashMaps - Rust [duplicate]

How do I return a reference to an iterator using conservative_impl_trait?

Can a type know when a mutable borrow to itself has ended?

Why are borrows of struct members allowed in &mut self, but not of self to immutable methods?

Categories

Resources