The following C# code compiles fine:
static readonly List<int> list = new List<int>();
static void Main(string[] args)
{
list.Add(1);
list.Add(2);
list.Add(3);
}
If I write similar code in Rust, it won't compile because it cannot borrow immutable v as mutable:
let v = Vec::new();
v.push(1);
v.push(2);
v.push(3);
How does the push function know v is immutable?
All variables are immutable by default. You must explicitly tell the compiler which variables are mutable through the mut keyword:
let mut v = Vec::new();
v.push(1);
v.push(2);
v.push(3);
Vec::push is defined to require a mutable reference to the vector (&mut self):
fn push(&mut self, value: T)
This uses method syntax, but is conceptually the same as:
fn push(this: &mut Vec<T>, value: T)
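To make the equivalence concrete, here is a minimal sketch that calls push both ways. The helper name build is made up for illustration; Vec::push itself is the real standard library method.

```rust
// Hypothetical helper for illustration: builds a vector using both call styles.
fn build() -> Vec<i32> {
    // `mut` is required because push borrows the vector mutably.
    let mut v = Vec::new();
    // Method syntax: the compiler takes `&mut v` for us.
    v.push(1);
    // Function-call syntax: the mutable borrow is written out explicitly.
    Vec::push(&mut v, 2);
    v
}

fn main() {
    println!("{:?}", build());
}
```

Removing mut from the binding makes both calls fail in the same way, since neither can obtain a &mut to v.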
I highly recommend that you read The Rust Programming Language, second edition. It covers this beginner question as well as many other beginner questions you will have.
In Rust, bindings are immutable by default. So you might think that the following are equivalent:
readonly List<int> list = new List<int>(); // C#
let list = Vec::new(); // Rust
And these:
List<int> list = new List<int>(); // C#
let mut list = Vec::new(); // Rust
But, as you found out, this isn't quite the case.
Inside the Add method in the C# version, there is no information about the binding that you used to invoke it. The Add method isn't able to declare that it mutates its data so there is no way for the C# compiler to prevent you passing a reference to a readonly binding. The readonly keyword prevents you overwriting the list binding with a completely new List, but it doesn't prevent you changing data inside the one you have. C# prevents you from changing the value of a readonly binding, but the value in this case is a pointer to your data, not the data itself.
In Rust, if a method needs to mutate the underlying data, it must declare its first argument to be self or &mut self.
In the case of self, then the data is moved into the method, and you can no longer use the original binding. It doesn't matter if the method changes the data because the caller can't use that binding any more.
In the case of a mutable reference, &mut self, Rust will only let you create it if the original binding is also mutable. If the original binding is immutable then this will produce a compile error. It is impossible to call v.push if v is immutable, because push expects &mut self.
This can be restrictive, so Rust provides tools that let you fine-tune this behaviour to encode exactly the safety guarantees that you need. If you want to get something close to the C# behaviour, you can use a RefCell wrapper (or one of the several other wrapper types). A RefCell<Vec<T>> doesn't itself have to be mutable for functions to be able to unwrap it and modify the Vec inside.
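As a rough sketch of the RefCell approach, assuming a single-threaded program: the helper name push_three is invented for this example, while RefCell, borrow and borrow_mut come from the standard library.

```rust
use std::cell::RefCell;

// Hypothetical helper: pushes into a vector through a shared reference.
fn push_three(list: &RefCell<Vec<i32>>) {
    // borrow_mut() checks at run time that no other borrow is active
    // and panics if one is.
    let mut inner = list.borrow_mut();
    inner.push(1);
    inner.push(2);
    inner.push(3);
}

fn main() {
    // Note: no `mut` on the binding, much like the C# readonly field.
    let list = RefCell::new(Vec::new());
    push_three(&list);
    println!("{:?}", list.borrow());
}
```

The trade-off is that the exclusivity check moves from compile time to run time.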
I'm writing my first rust program and as expected I'm having problems making the borrow checker happy. Here is what I'm trying to do:
I would like to have a function that allocates some array, stores the array in some global data structure, and returns a reference to it. Example:
static mut global_data = ...
fn f() -> &str {
let s = String::new();
global.my_string = s;
return &s;
};
Is there any way to make something like this work? If not, what is "the rust way"(tm) to get an array and a pointer into it?
Alternatively, is there any documentation I could read? The rust book is unfortunately very superficial on most topics.
There are a couple things wrong with your code:
Using global state is very unidiomatic in Rust. It can be done in some specific scenarios, but it should never be a go-to method. You could try wrapping your state in Rc or Arc and sharing it this way in your program. If you also want to mutate this state (as you show in your example), you must also wrap it in some kind of interior mutability type. So try Rc<RefCell<State>> if you want to use the state in only one thread, or Arc<Mutex<State>> if you want to use it from multiple threads.
Accessing mutable static memory is unsafe. So even the following code won't compile:
static mut x: i32 = 0;
// neither of these lines works!
println!("{}", x);
x = 42;
You must use unsafe to access or modify any static mutable variable, because you are in effect promising the compiler that no data races (from accessing this data from different threads) will occur.
I can't be sure, since you didn't show the type of global_data, but I assume that my_string is a field of type String. When you write
let s = String::new();
global.my_string = s;
You move ownership of that string to the global. You therefore cannot return (or even create) a reference to s. You must do this through its new owner. &global.my_string could work, but not if you follow my first suggestion above. You could try to return a RefMut or MutexGuard, but that is probably not what you want.
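Here is one possible sketch of the shared-state approach, assuming the state is passed around explicitly instead of living in a static. The State type, its my_string field and the value "hello" are placeholders standing in for the question's global, and returning an owned clone is my own simplification to avoid handing out a guard.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical state type standing in for the question's global.
#[derive(Default)]
struct State {
    my_string: String,
}

// Stores a new string in the shared state and returns an owned copy,
// since a plain reference could not outlive the lock guard.
fn f(state: &Arc<Mutex<State>>) -> String {
    let mut guard = state.lock().unwrap();
    guard.my_string = String::from("hello");
    guard.my_string.clone()
}

fn main() {
    let state = Arc::new(Mutex::new(State::default()));
    let s = f(&state);
    println!("{s}");
    // The stored value is still reachable through the shared state.
    println!("{}", state.lock().unwrap().my_string);
}
```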
Okay, just in case someone else is having the same question, the following code seems to work:
struct foo {
b : Option<Box<u32>>,
}
static mut global : foo = foo { b : None };
fn f<'a>() -> &'a u32 {
let b : Box<u32> = Box::new(5);
unsafe {
global.b = Some(b);
match &global.b {
None => panic!(""),
Some(a) => return &a,
}
}
}
At least it compiles. Hopefully it will also do the right thing when run.
I'm aware that this is not how you are supposed to do things in rust. But I'm currently trying to figure out how to implement various data structures from scratch, and the above is just a reduced example of one of the problems I encountered.
Ownership Tree
Hi,
I was trying to understand ownership concepts in Rust and came across this image (attached in this post) in the "Programming Rust" book.
In particular am concerned about the "Borrowing a shared reference" part. In the book, the author says
Values borrowed by shared references are read-only. Across the
lifetime of a shared reference, neither its referent, nor anything
reachable from that referent, can be changed by anything. There exist
no live mutable references to anything in that structure, its owner is
held read-only, and so on. It’s really frozen
In the image, he goes on to highlight the path along the ownership tree that becomes immutable once a shared reference is taken to a particular section of the ownership tree. But what confused me is that the author also mentions that certain other parts of the ownership tree are not read only.
So I tried to test out with this code:
fn main(){
let mut v = Vec::new();
v.push(Vec::new());
v[0].push(vec!["alpha".to_string()]);
v[0].push(vec!["beta".to_string(), "gamma".to_string()]);
let r2 = &(v[0][1]); //Taking a shared reference here
v[0][0].push("pi".to_string());
println!("{:?}", r2)
}
I understand that v[0][0] cannot be mutated because v itself is held under an immutable shared reference (as a consequence of the shared reference to v[0][1]) and the Rust compiler helpfully points it out. My question is: when the author marks certain parts along the ownership tree as "not read only", how can we access these parts to change them?
If my code snippet is not a correct example for what the author intended to convey, kindly help me with an example that demonstrates what the author is trying to imply here. Thanks.
There are particular cases where you can split borrows, creating simultaneously existing references that can be any mix of mutable and immutable as long as they don't overlap. These are:
Anything where the compiler can statically track the lack of overlap: that is, fields in a struct, tuple, or enum.
Specifically written unsafe code which provides this feature, such as mutable-reference iterators over collections.
Your code as written does not compile because the compiler does not attempt to understand what indexing a Vec does, so it does not possess and cannot use the fact that v[0][0] does not overlap v[0][1].
Here is a program that works with a direct translation of the tree shown in the figure:
#[derive(Debug)]
struct Things {
label: &'static str,
a: Option<Box<Things>>,
b: Option<Box<Things>>,
c: Option<Box<Things>>,
}
fn main() {
// Construct depicted structure
let mut root = Box::new(Things {
label: "root",
a: None,
b: None,
c: Some(Box::new(Things {
label: "root.c",
a: None,
b: None,
c: None,
})),
});
// "Borrowing a shared reference"
// .as_ref().unwrap() gets `&Box<Things>` out of `Option<Box<Things>>`
// (there are several other ways this could be done)
let shared_reference = &root.c.as_ref().unwrap();
let mutable_reference = &mut root.a;
// Now, root and root.a are in the "inaccessible" state because they are
// borrowed. (We could still create an &root.b reference).
// Mutate while the shared reference must still exist
dbg!(shared_reference);
*mutable_reference = Some(Box::new(Things {
label: "new",
a: None,
b: None,
c: None,
}));
dbg!(shared_reference);
// Now the references are not used any more, so we can access the root.
// Let's look at the change we made.
dbg!(root);
}
This program is accepted by the compiler because it understands that struct fields do not overlap, so the root may be split.
It is possible to split borrows of vectors — just not with the indexing operator. You can do it with pattern matching, mutable iteration, or with .split_at_mut(). Here's that last option, which is the most “random access” capable one:
fn main() {
let mut v = Vec::new();
v.push(Vec::new());
v[0].push(vec!["alpha".to_string()]);
v[0].push(vec!["beta".to_string(), "gamma".to_string()]);
let (half1, half2): (&mut [Vec<String>], &mut [Vec<String>]) =
v[0].split_at_mut(1);
let r1 = &mut half1[0];
let r2 = &half2[0];
r1.push("pi".to_string());
println!("{:?}", r2);
}
This program works because split_at_mut() contains unsafe code that specifically creates two non-overlapping slices. This is one of the fundamental tools of Rust: using unsafe inside of libraries to create sound abstractions that wouldn't be possible using just the concepts the compiler understands.
With a pattern match instead, it would be:
if let [r1, r2] = &mut *v[0] {
r1.push("pi".to_string());
println!("{:?}", r2);
} else {
// Pattern failed because the length did not match
panic!("oops, v was not two elements long");
}
This compiles because the compiler understands that pattern-matching a slice (or a struct, or anything else matchable) creates non-overlapping references to each element. (Pattern matching is implemented by the compiler and never runs Rust code to make decisions about the structure being matched.)
(This version has an explicit failure branch; the previous version would panic on the split_at_mut() or on half2[0] if v[0] was too short.)
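For completeness, the mutable-iteration option mentioned above might look roughly like the following sketch. The helper name push_to_first and its return value are made up for illustration; iter_mut is the real slice method.

```rust
// Hypothetical helper: appends `extra` to the first inner vector while
// holding a shared reference to the second, then returns both lengths.
fn push_to_first(rows: &mut [Vec<String>], extra: &str) -> (usize, usize) {
    let mut iter = rows.iter_mut();
    // Each call to next() hands out a reference to a different element,
    // so the two borrows cannot overlap.
    let (Some(r1), Some(r2)) = (iter.next(), iter.next()) else {
        panic!("expected at least two elements");
    };
    let r2: &Vec<String> = r2; // downgrade to a shared reference
    r1.push(extra.to_string());
    (r1.len(), r2.len())
}

fn main() {
    let mut rows = vec![
        vec!["alpha".to_string()],
        vec!["beta".to_string(), "gamma".to_string()],
    ];
    let lens = push_to_first(&mut rows, "pi");
    println!("{:?} {:?}", lens, rows);
}
```

Like split_at_mut(), the iterator relies on unsafe code inside the standard library to hand out non-overlapping references.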
Someone should probably check my answer, as I am fairly new to Rust myself.
But...
I think this is because a Vec doesn't uphold the same invariance as, say, a tuple or nested structs.
Here's a tuple version of the example you gave (Although tuples don't support pushing, so I'm just incrementing an integer):
fn main() {
let mut v = ((1, 3), 5);
let r2 = &v.0.1; //Taking a shared reference here
let v2 = &mut v.0.0;
*v2 += 1;
println!("{:?}", r2);
}
The above compiles. But if you attempt to borrow: let r2 = &v.0.0;, you'll get the same error as before.
Now, if you want to actually use nested vectors for trees, there are some crates to help with that, which do not incur runtime costs. Namely token_cell (or its inspiration, ghost_cell):
https://docs.rs/token-cell/1.1.0/token_cell/index.html
https://docs.rs/ghost-cell/latest/ghost_cell/
Here's the example with a token_cell wrapping the vec tree structure:
use token_cell::*;
generate_static_token!(Token);
fn main() {
let mut token = Token::new();
let token2 = Token::new();
let v = TokenCell::new(vec![vec![
vec!["beta".to_string()],
vec!["gamma".to_string()],
]]);
let r2 = &v.borrow(&token2)[0][1]; //Taking a shared reference here
v.borrow_mut(&mut token)[0][0].push("pi".to_string());
println!("{:?}", r2)
}
I hope this clears some confusion up at least.
I am having trouble returning a reference to a value in a HashMap<String,String> that is wrapped in Arc and Mutex for sharing between threads. The code looks like this:
use std::sync::{Arc,Mutex};
use std::collections::HashMap;
struct Hey{
a:Arc<Mutex<HashMap<String, String>>>
}
impl Hey {
fn get(&self,key:&String)->&String{
self.a.lock().unwrap().get(key).unwrap()
}
}
As shown above, the code fails to compile with the error "returns a value referencing data owned by the current function". I know that lock() returns a MutexGuard, which is a local variable. But how can I get a reference to a value in the HashMap this way? If I can't, what is Rust's motivation for forbidding it?
Let me explain why rustc thinks that your code is wrong.
You can interact with a value protected by a Mutex only while you hold the lock on it.
The lock is managed by an RAII guard.
So, let me desugar your code:
fn get(&self,key:&String)->&String{
let lock = self.a.lock().unwrap();
let reference = lock.get(key).unwrap();
drop(lock); // release your lock
// We would return a reference to data that is no longer protected by the Mutex!
// Someone could delete the item from the HashMap and you would read freed data
// Use-after-free is UB, so rustc forbids that
return reference;
}
Probably you need to use Arcs as values:
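use std::collections::HashMap;
use std::sync::{Arc, RwLock};
#[derive(Default)]
struct Hey{
a:Arc<RwLock<HashMap<String, Arc<String>>>>
}
impl Hey {
fn get(&self,key:&String)->Arc<String>{
// RwLock has no lock() method; take a read guard instead
self.a.read().unwrap().get(key).unwrap().clone()
}
}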
P.S.
Also, you can use Arc<str> (and I would recommend that), which would save you from extra pointer indirection. It can be built from String: let arc: Arc<str> = my_string.into(); or Arc::from(my_string)
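A small sketch of the Arc<str> suggestion, in case it helps. The helper name share is hypothetical; Arc::from, Arc::clone and Arc::strong_count are standard library functions.

```rust
use std::sync::Arc;

// Hypothetical helper: converts an owned String into a shared Arc<str>.
fn share(s: String) -> Arc<str> {
    Arc::from(s)
}

fn main() {
    let a = share(String::from("value"));
    // Cloning bumps the reference count; the string data is not copied.
    let b = Arc::clone(&a);
    println!("{a} {b} {}", Arc::strong_count(&a));
}
```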
TLDR;
Since you made the design decision of wrapping your data, i.e. HashMap<String, String>, in Arc<Mutex<..>>, I am assuming you need to share this data across threads/tasks in a thread-safe manner. That is the primary use case for this design choice.
So my suggestion for anyone reading this today is not a direct answer to the question (returning a reference), but rather a change in design: return owned data by calling something like .to_owned() on the result of get.
fn get(&self, key: &String) -> String {
let lock = self.a.lock().unwrap(); // #1 Returns MutexGuard
let val = lock.get(key).unwrap();
val.to_owned()
}
Long Form
In the original code snippet, there are actually 2 problems at hand, though only 1 is mentioned in the question.
cannot return value referencing temporary value
returns a value referencing data owned by the current function
Let's try to dig deeper into each of these one by one.
Problem 1
cannot return value referencing temporary value
The temporary value here is the MutexGuard. The lock method doesn't return a reference to the HashMap, but rather a MutexGuard wrapped around the HashMap. The reason .get() works on a MutexGuard is that it implements the Deref trait. Essentially, this means that a MutexGuard can dereference into the value it wraps when needed. This deref happens when we call the .get() method.
We can understand the temporary nature of the mutexguard by diving deeper into how the deref method is implemented under the hood.
fn deref<'a>(&'a self) -> &'a T
This means that the MutexGuard can only hand out a reference to the HashMap for as long as the guard itself is alive. Notice the lifetime 'a, which is written out here but elided in the actual signature. But since we don't store the MutexGuard in a local variable and instead dereference it directly, the guard is a temporary that is dropped at the end of the statement containing the get call.
A reference to the HashMap cannot outlive the MutexGuard, and any result borrowed from the HashMap is bound by the same lifetime. Hence, the reference returned by .get() becomes invalid as soon as the guard is dropped.
Solution 1: Store the MutexGuard locally
If we store the MutexGuard in a local variable using a let binding, then the borrow of the HashMap lasts for the function scope, and the reference/result has the same lifetime.
let lock = self.a.lock().unwrap(); // storing MutexGuard in local binding
let val = lock.get(key).unwrap(); // val can live as long as lock is alive which is function's lifetime
With this issue fixed, there is just one problem left.
Problem 2
returns a value referencing data owned by the current function
Since we take the lock in the current function's scope, the reference returned by get is only valid for as long as the lock is held. When we return that reference from the function, the compiler rejects it because the guard it borrows from is owned by the current function. This makes sense, since we only asked (indirectly) for the lock to be held within this function's scope. It is semantically wrong to expect the reference to be valid outside that scope.
Solution 2: Change in approach
The whole idea of using Arc and Mutex is to add the capability to update the data between multiple threads safely. This thread safety is provided by the Mutex which enables locking mechanism on the wrapped data, in your case HashMap.
As pointed out by #Abhijit-K, it's not a good design to take a reference to any value outside the scope of the lock.
As explained very nicely in the post by #Angelico the lock is dropped within the scope of the function.
Case 1: modifying wrapped data
Only fetch the wrapped value where you have to make changes to the data. Basically, take the lock where you want to change the data, do it in the same scope.
Basically, you pass around the cloned Arc between functions to start with. That is the power of Arc, it can give you many cloned references pointing to the same data on the heap.
Case 2: reading wrapped data
Take a cloned value instead of the reference. Change the approach to return String from &String.
You need to clone the Arc and move the cloned Arc to another thread/task. From the clone you can lock and access the data. I suggest using RwLock instead of Mutex if there are more reads than writes.
When you clone an Arc, you are not cloning the underlying object, just the Arc. Also, in your case you need to wrap the struct in an Arc or change the design, as it is the Arc that should be cloned and moved.
I believe the way to share the object should be via a guard. With RwLock, multiple readers can access the map through their guards:
use async_std::task;
use std::sync::{Arc,RwLock, RwLockReadGuard, RwLockWriteGuard};
use std::collections::HashMap;
#[derive(Default)]
struct Hey{
a:Arc<RwLock<HashMap<String, String>>>
}
impl Hey {
pub fn read(&self) -> RwLockReadGuard<'_, HashMap<String, String>> {
self.a.read().unwrap()
}
pub fn write(&self) -> RwLockWriteGuard<'_, HashMap<String, String>> {
self.a.write().unwrap()
}
}
fn main() {
let h = Hey{..Default::default()};
h.write().insert("k1".to_string(), "v1".to_string());
println!("{:?}", h.read().get("k1"));
task::block_on(async move {
println!("{:?}", h.read().get("k1"));
});
}
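If you would rather avoid the async_std dependency, a std-only sketch of the same idea could look like this. The Shared alias and the read_on_thread helper are names invented for the example; it clones the Arc, moves the clone into a spawned thread, and returns an owned copy of the value.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical alias for the shared map type.
type Shared = Arc<RwLock<HashMap<String, String>>>;

// Hypothetical helper: reads a key on another thread and returns an owned copy.
fn read_on_thread(map: &Shared, key: &str) -> Option<String> {
    let map = Arc::clone(map); // clones the pointer, not the HashMap
    let key = key.to_string();
    thread::spawn(move || {
        // The read guard lives only inside this closure.
        let guard = map.read().unwrap();
        guard.get(&key).cloned()
    })
    .join()
    .unwrap()
}

fn main() {
    let map: Shared = Arc::new(RwLock::new(HashMap::new()));
    map.write().unwrap().insert("k1".to_string(), "v1".to_string());
    println!("{:?}", read_on_thread(&map, "k1"));
}
```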
To add the elements of two Vecs I wrote a function like
fn add_components(dest: &mut Vec<i32>, first: &Vec<i32>, second: &Vec<i32>){
for i in 0..first.len() {
dest[i] = first[i] + second[i];
}
}
And this works fine when dest is another Vec.
let mut new_components = vec![0; components.len()];
Vector::add_components(&mut new_components, &components, &other_components);
But it blows up when I am trying to add in-place:
Vector::add_components(&mut components, &components, &other_components);
because now I borrow components as mutable and immutable at the same time. But this obviously is what I am trying to achieve.
Are there any conventional and general (meaning not only concerning Vecs) solutions to this problem which don't involve unsafe code and pointer magic?
Another example of this problem:
Suppose I want to overload AddAssign for a numeric type like
impl AddAssign<&NumericType> for NumericType {
fn add_assign(&mut self, other: &NumericType) {
unimplemented!() // concrete implementation is not important
}
}
Notice that I want to take a reference as second argument to avoid copying. This works fine when adding two different objects, but adding an object to itself creates the exact same scenario:
let mut num = NumericType{};
num += &num;
I am borrowing num mutably and immutably at the same time. So obviously this should work and is safe, but it also is against Rust's borrowing rules.
What are the best practices (apart from copying of course) to deal with this issue, which arises in many forms?
There is no generic solution to this. Rust can't generically abstract over mutability in borrow checking.
You will need to have two versions of the function for in-place and destination versions.
Rust has strict aliasing rules, so dest[i] = first[i] + second[i] actually compiles to different code depending on whether the compiler has a guarantee that dest and first are different. Don't try to fudge it with unsafe, because it will be Undefined Behavior and will get miscompiled.
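A sketch of what the two versions might look like, using slices instead of &Vec. The name add_components_in_place is my own, and this only covers the case where dest doubles as the first operand; it is one possible shape, not the only one.

```rust
// Destination version: writes first[i] + second[i] into dest[i].
fn add_components(dest: &mut [i32], first: &[i32], second: &[i32]) {
    for ((d, a), b) in dest.iter_mut().zip(first).zip(second) {
        *d = a + b;
    }
}

// In-place version (hypothetical name): dest[i] += other[i],
// so only a single borrow of dest is needed.
fn add_components_in_place(dest: &mut [i32], other: &[i32]) {
    for (d, o) in dest.iter_mut().zip(other) {
        *d += o;
    }
}

fn main() {
    let mut components = vec![1, 2, 3];
    let other = vec![10, 20, 30];
    let mut sum = vec![0; components.len()];
    add_components(&mut sum, &components, &other);
    add_components_in_place(&mut components, &other);
    println!("{:?} {:?}", sum, components);
}
```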
I understand you're not allowed to create two mutable references to an object at once in Rust. I don't entirely understand why the following code works:
fn main() {
let mut string = String::from("test");
let mutable_reference: &mut String = &mut string;
mutable_reference.push_str(" test");
// as I understand it, this creates a new mutable reference (2nd?)
test(&mut *mutable_reference);
println!("{}", mutable_reference);
}
fn test(s: &mut String) {
s.push_str(" test");
}
The rule
There shall only be one usable mutable reference to a particular value at any point in time.
This is NOT a spatial exclusion (there CAN be multiple references to the same piece) but a temporal exclusion.
The mechanism
In order to enforce this, &mut T is NOT Copy; therefore calling:
test(mutable_reference);
should move the reference into test.
Actually doing this would make it unusable later on and not be very ergonomic, so the Rust compiler inserts an automatic reborrowing, much like you did yourself:
test(&mut *mutable_reference);
You can force the move if you wanted to:
test({ let x = mutable_reference; x });
The effect
Re-borrowing is, in essence, just borrowing:
mutable_reference is borrowed for as long as the unnamed temporary mutable reference exists (or anything that borrows from it),
the unnamed temporary mutable reference is moved into test,
at the end of the expression, the unnamed temporary mutable reference is destroyed, and therefore the borrow of mutable_reference ends.
Is there more than one mutable pointer somewhere in memory referring to the same location? Yes.
Is there more than one mutable pointer in the code to the same location which is usable? No.
Re-borrowing a mutable pointer locks out the one you're re-borrowing from.
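The locking-out behaviour can be seen in a small sketch. The demo function and the explicit inner scope are illustrative additions; test is the function from the question.

```rust
fn test(s: &mut String) {
    s.push_str(" test");
}

// Hypothetical helper showing that the original reference is usable again
// once the reborrow has ended.
fn demo() -> String {
    let mut string = String::from("test");
    let r: &mut String = &mut string;
    {
        // `r` is locked out for as long as `reborrow` is in use.
        let reborrow: &mut String = &mut *r;
        test(reborrow);
    }
    // The reborrow is over, so `r` can be used again.
    r.push_str(" again");
    string
}

fn main() {
    println!("{}", demo());
}
```

Using r while reborrow is still needed afterwards would be rejected by the borrow checker.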
I analyzed the differences in MIR between test(mutable_reference) and test(&mut *mutable_reference). It appears that the latter only introduces an additional level of indirection:
When you use an extra dereference in the function call, it doesn't create a mutable reference that persists; in other words, it makes no actual difference outside the function call.