I want to develop a library for boolean formulas in Rust, and I'm pretty new to Rust.
The idea is to have immutable formulas which are created and cached by an (obviously mutable) formula factory. So a user would first create a formula factory and then use it to create formulas, which are returned as references.
The problem is that the compiler basically does not let me create more than one formula, because this would mean that there is more than one mutable borrow of the formula factory object.
let mut f = FormulaFactory::new();
let a = f.variable("a");
let b = f.variable("b"); // error: cannot borrow `f` as mutable more than once at a time
let ab = f.and(a, b);
I understand this violation of rules, but on the other hand I think that in this case everything would be ok (at least in a single-threaded setting). Is there a simple way to get around this problem or do I rather have to think about a different, more rust-compatible approach?
Some more information: 'static lifetime is not an option in the targeted scenario. The user might want to create multiple formula factories and especially drop them if the formulas are no longer needed.
Just for reference a minimal example (strongly simplified – obviously a Formula will also have a formula type, in this example there are only variables and conjunctions):
#![feature(hash_set_entry)]
use std::collections::HashSet;
#[derive(PartialEq, Eq, Hash)]
pub struct Formula<'a> {
variable: Option<&'a str>,
operands: Vec<&'a Formula<'a>>,
}
pub struct FormulaFactory<'a> {
variables: HashSet<Formula<'a>>,
conjunctions: HashSet<Formula<'a>>,
}
impl<'a> FormulaFactory<'a> {
pub fn new() -> FormulaFactory<'a> {
FormulaFactory {
variables: HashSet::new(),
conjunctions: HashSet::new(),
}
}
pub fn variable(&mut self, name: &'a str) -> &Formula<'a> {
(&mut self.variables).get_or_insert(Formula{variable: Some(name), operands: vec![]})
}
pub fn and(&mut self, op1: &'a Formula<'a>, op2: &'a Formula<'a>) -> &Formula<'a> {
(&mut self.conjunctions).get_or_insert(Formula{variable: None, operands: vec![op1, op2]})
}
}
fn main() {
let mut f = FormulaFactory::new();
let a = f.variable("a");
let b = f.variable("b"); // error: cannot borrow `f` as mutable more than once at a time
let ab = f.and(a, b);
println!("{}", ab.operands[0].variable.unwrap())
}
The variable a is a reference to data owned by the f object. As long as a is alive, you cannot modify f, because it is already aliased by a. I think the best approach is to store a Vec of Formula in the FormulaFactory struct, called formulas (for the sake of simplicity), and hand out only FormulaIndex objects, each of which is just a usize holding the index of the Formula within the formulas field. petgraph takes the same approach: the nodes field of a Graph contains a Vec of Node, but the Graph API only hands out NodeIndex objects.
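A minimal sketch of that index-based design follows. The names FormulaIndex, intern, get, and the cache field are assumptions made for illustration, and variable names are stored as owned Strings to keep the example free of lifetime parameters:

```rust
use std::collections::HashMap;

/// Handle into the factory's `formulas` vector (hypothetical name).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FormulaIndex(usize);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Formula {
    Variable(String),
    And(Vec<FormulaIndex>),
}

#[derive(Default)]
pub struct FormulaFactory {
    formulas: Vec<Formula>,
    // Maps an already created formula to its index so duplicates are shared.
    cache: HashMap<Formula, FormulaIndex>,
}

impl FormulaFactory {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the index of `formula`, storing it first if it is new.
    fn intern(&mut self, formula: Formula) -> FormulaIndex {
        if let Some(&idx) = self.cache.get(&formula) {
            return idx;
        }
        let idx = FormulaIndex(self.formulas.len());
        self.formulas.push(formula.clone());
        self.cache.insert(formula, idx);
        idx
    }

    pub fn variable(&mut self, name: &str) -> FormulaIndex {
        self.intern(Formula::Variable(name.to_string()))
    }

    pub fn and(&mut self, op1: FormulaIndex, op2: FormulaIndex) -> FormulaIndex {
        self.intern(Formula::And(vec![op1, op2]))
    }

    /// Looks a formula up again; this borrow of the factory is only temporary.
    pub fn get(&self, idx: FormulaIndex) -> &Formula {
        &self.formulas[idx.0]
    }
}

fn main() {
    let mut f = FormulaFactory::new();
    let a = f.variable("a");
    let b = f.variable("b"); // fine: `a` is a plain index, not a borrow of `f`
    let ab = f.and(a, b);
    println!("{:?}", f.get(ab));
}
```

Because the indices are Copy and carry no lifetime, dropping the factory simply frees all formulas. The trade-off is that an index from one factory can be passed to another by mistake, which this sketch does not guard against.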
The following is a snippet of a more complicated code, the idea is loading a SQL table and setting a hashmap with one of the table struct fields as the key and keeping the structure as the value (implementation details are not important since the code works fine if I clone the String, however, the Strings in the DB can be arbitrarily long and cloning can be expensive).
The following code will fail with
error[E0382]: use of partially moved value: `foo`
--> src/main.rs:24:35
|
24 | foo_hashmap.insert(foo.a, foo);
| ----- ^^^ value used here after partial move
| |
| value partially moved here
|
= note: partial move occurs because `foo.a` has type `String`, which does not implement the `Copy` trait
For more information about this error, try `rustc --explain E0382`.
use std::collections::HashMap;
struct Foo {
a: String,
b: String,
}
fn main() {
let foo_1 = Foo {
a: "bar".to_string(),
b: "bar".to_string(),
};
let foo_2 = Foo {
a: "bar".to_string(),
b: "bar".to_string(),
};
let foo_vec = vec![foo_1, foo_2];
let mut foo_hashmap = HashMap::new();
foo_vec.into_iter().for_each(|foo| {
foo_hashmap.insert(foo.a, foo); // foo.a.clone() will make this compile
});
}
The struct Foo cannot implement Copy since its fields are String. I tried wrapping foo.a with Rc::new(RefCell::new()) but then ran into the missing Hash implementation for RefCell<String>, so I'm currently unsure whether to use something else for the struct fields (will Cow work?) or to handle that logic within the for_each loop.
There are at least two problems here: First, the resulting HashMap<K, V> would be a self-referential struct, as K borrows from V; there are many questions and answers on SO about the pitfalls of this. Second, even if you could construct such a HashMap, you'd easily break the guarantees provided by HashMap, which allows you to modify V while assuming that K always stays constant: There is no way to get a &mut K for a HashMap, but you can get a &mut V; if K is actually a &V, one could easily modify K through V (by way of mutating Foo.a) and break the map.
One possibility is to change Foo.a from a String to an Rc<str>, which you can clone with minimal runtime cost in order to put the value both into K and into V. As Rc<str> implements Borrow<str>, you can still look up values in the map by means of &str. This still has the (theoretical) downside that you can break the map by getting a &mut Foo from the map and std::mem::swap-ing the a, which makes it impossible to look up the correct value from its key; but you'd have to do that deliberately.
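As a rough sketch of the Rc<str> variant (the build_map helper is a name made up for this example):

```rust
use std::collections::HashMap;
use std::rc::Rc;

struct Foo {
    a: Rc<str>,
    b: String,
}

/// Hypothetical helper: keys the map by a cheap clone of `foo.a`.
fn build_map(foos: Vec<Foo>) -> HashMap<Rc<str>, Foo> {
    let mut map = HashMap::new();
    for foo in foos {
        // Cloning an `Rc` only bumps a reference count; the string data is shared.
        map.insert(Rc::clone(&foo.a), foo);
    }
    map
}

fn main() {
    let foos = vec![
        Foo { a: Rc::from("foo"), b: "bar".to_string() },
        Foo { a: Rc::from("foobar"), b: "barfoo".to_string() },
    ];
    let map = build_map(foos);
    // `Rc<str>: Borrow<str>`, so a plain `&str` works as the lookup key.
    println!("{:?}", map.get("foo").map(|foo| &foo.b));
}
```

Rc is not thread-safe; if the map has to cross threads, Arc<str> can be swapped in the same way.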
Another option is to actually use a HashSet instead of a HashMap, and use a newtype for Foo which behaves like a Foo.a. You'd have to implement PartialEq, Eq, Hash (and Borrow<str> for good measure) like this:
use std::collections::HashSet;
#[derive(Debug)]
struct Foo {
a: String,
b: String,
}
/// A newtype for `Foo` which behaves like a `str`
#[derive(Debug)]
struct FooEntry(Foo);
/// `FooEntry` compares to other `FooEntry` only via `.a`
impl PartialEq<FooEntry> for FooEntry {
fn eq(&self, other: &FooEntry) -> bool {
self.0.a == other.0.a
}
}
impl Eq for FooEntry {}
/// It also hashes the same way as a `Foo.a`
impl std::hash::Hash for FooEntry {
fn hash<H>(&self, hasher: &mut H)
where
H: std::hash::Hasher,
{
self.0.a.hash(hasher);
}
}
/// Due to the above, we can implement `Borrow`, so now we can look up
/// a `FooEntry` in the Set using &str
impl std::borrow::Borrow<str> for FooEntry {
fn borrow(&self) -> &str {
&self.0.a
}
}
fn main() {
let foo_1 = Foo {
a: "foo".to_string(),
b: "bar".to_string(),
};
let foo_2 = Foo {
a: "foobar".to_string(),
b: "barfoo".to_string(),
};
let foo_vec = vec![foo_1, foo_2];
let mut foo_hashmap = HashSet::new();
foo_vec.into_iter().for_each(|foo| {
foo_hashmap.insert(FooEntry(foo));
});
// Look up `Foo` using &str as keys...
println!("{:?}", foo_hashmap.get("foo").unwrap().0);
println!("{:?}", foo_hashmap.get("foobar").unwrap().0);
}
Notice that HashSet provides no way to get a &mut FooEntry due to the reasons described above. You'd have to use RefCell (and read what the docs of HashSet have to say about this).
The third option is to simply clone() the foo.a as you described. Given the above, this is probably the most simple solution. If using an Rc<str> doesn't bother you for other reasons, this would be my choice.
Sidenote: If you don't need to modify a and/or b, a Box<str> instead of String is smaller by one machine word.
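The size difference can be checked with a few lines. This is only a sketch: the exact layout of String is an implementation detail, so the assertion reflects what current compilers do rather than a language guarantee:

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();
    // `String` stores pointer, length, and capacity; `Box<str>` stores pointer and length.
    println!("String:   {} bytes", size_of::<String>());
    println!("Box<str>: {} bytes", size_of::<Box<str>>());
    assert_eq!(size_of::<String>(), size_of::<Box<str>>() + word);

    // Converting drops any spare capacity (and may reallocate to do so).
    let boxed: Box<str> = String::from("bar").into_boxed_str();
    assert_eq!(&*boxed, "bar");
}
```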
How does one take an existing data structure (vec, hashmap, set) and extend the methods on it via an external library?
fn main() {
let vec = vec![1, 2, 3];
vec.my_new_method(...)
}
You can take advantage of the fact that a crate defining a trait can implement that trait on whatever type it wants. Here's a simple example of a combined "shift and then push" function. First it will shift the first element out of the vector (if there is one), then it will push the argument on to the end of the vector. If there was a shifted element, it is returned. (This is a bit of a silly operation, but works to demonstrate this technique.)
First we declare the trait, with the signature of the method(s) we want to add:
trait VecExt<T> {
fn shift_and_push(&mut self, v: T) -> Option<T>;
}
Now we can implement the trait for Vec<T>:
impl<T> VecExt<T> for Vec<T> {
fn shift_and_push(&mut self, v: T) -> Option<T> {
let r = if !self.is_empty() { Some(self.remove(0)) } else { None };
self.push(v);
r
}
}
Now, anywhere that VecExt is brought into scope with use (or by being in the same source file as its declaration) this extension method can be used on any vector.
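Putting the pieces together, a complete program using the extension method might look like this (the main function is added for illustration only):

```rust
trait VecExt<T> {
    fn shift_and_push(&mut self, v: T) -> Option<T>;
}

impl<T> VecExt<T> for Vec<T> {
    fn shift_and_push(&mut self, v: T) -> Option<T> {
        // Remove the first element if there is one, then append the new value.
        let r = if !self.is_empty() { Some(self.remove(0)) } else { None };
        self.push(v);
        r
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    // The method resolves because `VecExt` is in scope in this file.
    let shifted = v.shift_and_push(4);
    println!("shifted out {:?}, vector is now {:?}", shifted, v);
}
```

If the trait lived in another module or crate, the caller would need a use declaration for VecExt before calling shift_and_push.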
Why can I have multiple mutable references to a static type in the same scope?
My code:
static mut CURSOR: Option<B> = None;
struct B {
pub field: u16,
}
impl B {
pub fn new(value: u16) -> B {
B { field: value }
}
}
struct A;
impl A {
pub fn get_b(&mut self) -> &'static mut B {
unsafe {
match CURSOR {
Some(ref mut cursor) => cursor,
None => {
CURSOR= Some(B::new(10));
self.get_b()
}
}
}
}
}
fn main() {
// first creation of A, get a mutable reference to b and change its field.
let mut a = A {};
let mut b = a.get_b();
b.field = 15;
println!("{}", b.field);
// second creation of A, get a mutable reference to b and change its field.
let mut a_1 = A {};
let mut b_1 = a_1.get_b();
b_1.field = 16;
println!("{}", b_1.field);
// Third creation of A, get a mutable reference to b and change its field.
let mut a_2 = A {};
let b_2 = a_2.get_b();
b_2.field = 17;
println!("{}", b_1.field);
// now I can change them all
b.field = 1;
b_1.field = 2;
b_2.field = 3;
}
I am aware of the borrowing rules
one or more references (&T) to a resource,
exactly one mutable reference (&mut T).
In the above code, I have a struct A with the get_b() method for returning a mutable reference to B. With this reference, I can mutate the fields of struct B.
The strange thing is that more than one mutable reference can be created in the same scope (b, b_1, b_2) and I can use all of them to modify B.
Why can I have multiple mutable references with the 'static lifetime shown in main()?
My attempt at explaining this behavior is that I am returning a mutable reference with a 'static lifetime, so every time I call get_b() it returns the same mutable reference, and in the end there is just one identical reference. Is this thought right? Why am I able to use all of the mutable references obtained from get_b() individually?
There is only one reason for this: you have lied to the compiler. You are misusing unsafe code and have violated Rust's core tenet about mutable aliasing. You state that you are aware of the borrowing rules, but then you go out of your way to break them!
unsafe code gives you a small set of extra abilities, but in exchange you are now responsible for avoiding every possible kind of undefined behavior. Multiple mutable aliases are undefined behavior.
The fact that there's a static involved is completely orthogonal to the problem. You can create multiple mutable references to anything (or nothing) with whatever lifetime you care about:
fn foo() -> (&'static mut i32, &'static mut i32, &'static mut i32) {
let somewhere = 0x42 as *mut i32;
unsafe { (&mut *somewhere, &mut *somewhere, &mut *somewhere) }
}
In your original code, you state that calling get_b is safe for anyone to do any number of times. This is not true. The entire function should be marked unsafe, along with copious documentation about what is and is not allowed to prevent triggering unsafety. Any unsafe block should then have corresponding comments explaining why that specific usage doesn't break the rules needed. All of this makes creating and using unsafe code more tedious than safe code, but compared to C where every line of code is conceptually unsafe, it's still a lot better.
You should only use unsafe code when you know better than the compiler. For most people in most cases, there is very little reason to create unsafe code.
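For comparison, here is one way the same global could be expressed without unsafe. This is a sketch, not the only option: the lock_cursor function and the choice of a Mutex are assumptions made for illustration, and it relies on Mutex::new being usable in a static (Rust 1.63 or later):

```rust
use std::sync::{Mutex, MutexGuard};

struct B {
    pub field: u16,
}

// A safe global: the `Mutex` guarantees at most one live mutable access.
static CURSOR: Mutex<Option<B>> = Mutex::new(None);

/// Hypothetical replacement for `get_b`: returns a guard instead of `&'static mut B`.
fn lock_cursor() -> MutexGuard<'static, Option<B>> {
    let mut guard = CURSOR.lock().unwrap();
    if guard.is_none() {
        *guard = Some(B { field: 10 });
    }
    guard
}

fn main() {
    {
        let mut cursor = lock_cursor();
        cursor.as_mut().unwrap().field = 15;
    } // guard dropped here, releasing the lock

    let cursor = lock_cursor();
    println!("{}", cursor.as_ref().unwrap().field);
}
```

Trying to hold two guards at once on the same thread deadlocks or panics instead of silently producing two aliasing mutable references.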
My goal was to implement the suggested improvement to the Cacher struct from chapter 13.1 of the Rust book, that is, creating a struct which takes a function and uses memoization to reduce the number of calls to the given function. To do this, I created a struct with a HashMap
struct Cacher<T, U, V>
where T: Fn(&U) -> V, U: Eq + Hash
{
calculation: T,
map: HashMap<U,V>,
}
and two methods, a constructor and one which is responsible for the memoization.
impl<T, U, V> Cacher<T, U, V>
where T: Fn(&U) -> V, U: Eq + Hash
{
fn new(calculation: T) -> Cacher<T,U,V> {
Cacher {
calculation,
map: HashMap::new(),
}
}
fn value(&mut self, arg: U) -> &V {
match self.map.entry(arg){
Entry::Occupied(occEntry) => occEntry.get(),
Entry::Vacant(vacEntry) => {
let argRef = vacEntry.key();
let result = (self.calculation)(argRef);
vacEntry.insert(result)
}
}
}
}
I used the Entry enum because I didn't find a better way of deciding whether the HashMap contains a key and, if it doesn't, calculating the value, inserting it into the HashMap, and returning a reference to it.
When I try to compile the code above, I get an error which says that occEntry is borrowed by its .get() method (which is fine by me) and that .get() "returns a value referencing data owned by the current function".
My understanding is that the compiler thinks that the value which occEntry.get() is referencing is owned by the function value(...). But shouldn't I get a reference to the value of type V, which is owned by the HashMap? Is the compiler getting confused because the value is owned by the function and saved as result for a short moment?
let result = (self.calculation)(argRef);
vacEntry.insert(result)
Please note that it is necessary to save the result temporarily because the insert method consumes the entry (and with it the key), so argRef is no longer valid afterwards. I also acknowledge that the signature of value can be problematic (see Mutable borrow from HashMap and lifetime elision), but I tried to avoid a Copy trait bound.
For quick reproduction of the problem I append the use statements necessary. Thanks for your help.
use std::collections::HashMap;
use std::cmp::Eq;
use std::hash::Hash;
use std::collections::hash_map::{OccupiedEntry, VacantEntry, Entry};
Let's take a look at OccupiedEntry::get()'s signature:
pub fn get(&self) -> &V
What this signature is telling us is that the reference obtained from the OccupiedEntry can only live as long as the OccupiedEntry itself. However, the OccupiedEntry is a local variable, thus it's dropped when the function returns.
What we want is a reference whose lifetime is bound to the HashMap's lifetime. Both Entry and OccupiedEntry have a lifetime parameter ('a), which is linked to the &mut self parameter in HashMap::entry. We need a method on OccupiedEntry that returns a &'a V. There's no such method, but there's one that returns a &'a mut V: into_mut. A mutable reference can be implicitly coerced to a shared reference, so all we need to do to make your method compile is to replace get() with into_mut().
fn value(&mut self, arg: U) -> &V {
match self.map.entry(arg) {
Entry::Occupied(occ_entry) => occ_entry.into_mut(),
Entry::Vacant(vac_entry) => {
let arg_ref = vac_entry.key();
let result = (self.calculation)(arg_ref);
vac_entry.insert(result)
}
}
}
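For quick verification, the fix can be dropped into a complete program. The doubling closure and the call counter below are made up for this example; they only serve to show that the calculation runs once per distinct argument:

```rust
use std::cell::Cell;
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::Hash;

struct Cacher<T, U, V>
where
    T: Fn(&U) -> V,
    U: Eq + Hash,
{
    calculation: T,
    map: HashMap<U, V>,
}

impl<T, U, V> Cacher<T, U, V>
where
    T: Fn(&U) -> V,
    U: Eq + Hash,
{
    fn new(calculation: T) -> Self {
        Cacher { calculation, map: HashMap::new() }
    }

    fn value(&mut self, arg: U) -> &V {
        match self.map.entry(arg) {
            // `into_mut` ties the reference to the map, not to the local entry.
            Entry::Occupied(occ_entry) => occ_entry.into_mut(),
            Entry::Vacant(vac_entry) => {
                let result = (self.calculation)(vac_entry.key());
                vac_entry.insert(result)
            }
        }
    }
}

fn main() {
    // Counts how often the closure actually runs (illustration only).
    let calls = Cell::new(0);
    let mut cacher = Cacher::new(|x: &u32| {
        calls.set(calls.get() + 1);
        x * 2
    });
    assert_eq!(*cacher.value(21), 42);
    assert_eq!(*cacher.value(21), 42); // served from the cache
    assert_eq!(calls.get(), 1);
}
```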
I have a struct representing a grid of data, and accessors for the rows and columns. I'm trying to add accessors for the rows and columns which return iterators instead of Vec.
use std::slice::Iter;
#[derive(Debug)]
pub struct Grid<Item : Copy> {
raw : Vec<Vec<Item>>
}
impl <Item : Copy> Grid <Item>
{
pub fn new( data: Vec<Vec<Item>> ) -> Grid<Item> {
Grid{ raw : data }
}
pub fn width( &self ) -> usize {
self.rows()[0].len()
}
pub fn height( &self ) -> usize {
self.rows().len()
}
pub fn rows( &self ) -> Vec<Vec<Item>> {
self.raw.to_owned()
}
pub fn cols( &self ) -> Vec<Vec<Item>> {
let mut cols = Vec::new();
for i in 0..self.height() {
let col = self.rows().iter()
.map( |row| row[i] )
.collect::<Vec<Item>>();
cols.push(col);
}
cols
}
pub fn rows_iter( &self ) -> Iter<Vec<Item>> {
// LIFETIME ERROR HERE
self.rows().iter()
}
pub fn cols_iter( &self ) -> Iter<Vec<Item>> {
// LIFETIME ERROR HERE
self.cols().iter()
}
}
Both functions rows_iter and cols_iter have the same problem: error: borrowed value does not live long enough. I've tried a lot of things, but pared it back to the simplest thing to post here.
You can use the into_iter method, which returns a std::vec::IntoIter. The iter function usually only borrows the data source it iterates over, whereas into_iter takes ownership of it. Thus the vector lives as long as the iterator that owns it.
pub fn cols_iter( &self ) -> std::vec::IntoIter<Vec<Item>> {
self.cols().into_iter()
}
However, I think that the design of your Grid type could be improved a lot. Always cloning a vector is not a good thing (to name one issue).
Iterators only contain borrowed references to the original data structure; they don't take ownership of it. Therefore, a vector must live longer than an iterator on that vector.
rows and cols allocate and return a new Vec. rows_iter and cols_iter are trying to return an iterator on a temporary Vec. This Vec will be deallocated before rows_iter or cols_iter return. That means that an iterator on that Vec must be deallocated before the function returns. However, you're trying to return the iterator from the function, which would make the iterator live longer than the end of the function.
There is simply no way to make rows_iter and cols_iter compile as is. I believe these methods are simply unnecessary, since you already provide the public rows and cols methods.
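If iterator accessors are still wanted, one possible redesign is to borrow from the stored data instead of from a temporary clone. The sketch below is an assumption about how Grid could be restructured, not the original API: the type parameter is renamed to T to avoid confusion with Iterator::Item, width reads self.raw directly, and all rows are assumed to have the same length:

```rust
pub struct Grid<T: Copy> {
    raw: Vec<Vec<T>>,
}

impl<T: Copy> Grid<T> {
    pub fn new(data: Vec<Vec<T>>) -> Self {
        Grid { raw: data }
    }

    pub fn width(&self) -> usize {
        // An empty grid has width 0 instead of panicking on `[0]`.
        self.raw.first().map_or(0, |row| row.len())
    }

    /// Borrows the rows directly from `self.raw`; nothing is cloned.
    pub fn rows_iter(&self) -> std::slice::Iter<'_, Vec<T>> {
        self.raw.iter()
    }

    /// Builds each column lazily, one `Vec` per call to `next`.
    pub fn cols_iter(&self) -> impl Iterator<Item = Vec<T>> + '_ {
        (0..self.width()).map(move |i| self.raw.iter().map(|row| row[i]).collect())
    }
}

fn main() {
    let grid = Grid::new(vec![vec![1, 2, 3], vec![4, 5, 6]]);
    for row in grid.rows_iter() {
        println!("row: {:?}", row);
    }
    for col in grid.cols_iter() {
        println!("col: {:?}", col);
    }
}
```

Here the returned iterators borrow from self, which outlives the method call, so the lifetime error does not arise. Columns are still allocated because the data is stored row by row, but only as they are requested.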