In the below example:
struct Foo {
a: [u64; 100000],
}
fn foo(mut f: Foo) -> Foo {
f.a[0] = 99999;
f.a[1] = 99999;
println!("{:?}", &mut f as *mut Foo);
for i in 0..f.a[0] {
f.a[i as usize] = 21444;
}
return f;
}
fn main(){
let mut f = Foo {
a:[0;100000]
};
println!("{:?}", &mut f as *mut Foo);
f = foo(f);
println!("{:?}", &mut f as *mut Foo);
}
I find that before and after passing into the function foo, the address of f is different. Why does Rust copy such a big struct everywhere but not actually move it (or achieve this optimization)?
I understand how stack memory works. But with the information provided by ownership in Rust, I think the copy can be avoided. The compiler unnecessarily copies the array twice. Can this be an optimization for the Rust compiler?
A move is a memcpy followed by treating the source as non-existent.
Your big array is on the stack. That's just the way Rust's memory model works: local variables are on the stack. Since the stack space of foo is going away when the function returns, there's nothing else the compiler can do except copy the memory to main's stack space.
In some cases, the compiler can rearrange things so that the move can be elided (source and destination are merged into one thing), but this is an optimization that cannot be relied on, especially for big things.
If you don't want to copy the huge array around, allocate it on the heap yourself, either via a Box<[u64]>, or simply by using Vec<u64>.
Related
I have a 2D Vec and want to get two mutable references from it at the same time, here is the demo code
use std::default::Default;
#[derive(Default, Clone, PartialEq)]
struct Ele {
foo: i32,
bar: f32,
}
fn main() {
let mut data:Vec<Vec<Ele>> = vec![vec![Default::default();100];100];
let a = &mut data[1][2];
let b = &mut data[2][4];
if a != b {
a.foo += b.foo;
b.bar += a.bar;
}
}
Use unsafe code is OK.
You shouldn't try to solve this problem using unsafe, but rather by understanding why the compiler doesn't allow you to do something that looks alright, and what are the available tools to convince it (without hiding it behind a black box and just saying "trust me") it's a genuine thing to do (usually these tools will themselves use unsafe code, but since it's behind a safe boundary it's the burden of the writer of these tools to ensure everything works fine even when the compiler can't figure it out on its own, which is better that having this burden yourself).
In particular, Rust doesn't understand that you are accessing two separate region of memory; being conservative, it just assumes that if you are using a single element of an array, it must consider you are using it all. To make it clear you are talking about two separate pieces of an array, the solution is simply to split the array into two distinct pieces. That way, you make it clear that they are different memory regions:
use std::default::Default;
#[derive(Default, Clone, PartialEq)]
struct Ele {
foo: i32,
bar: f32,
}
fn main() {
let mut data:Vec<Vec<Ele>> = vec![vec![Ele::default();100];100];
let (left, right) = data.split_at_mut(2);
let a = &mut left[1][2];
let b = &mut right[0][4];
if a != b {
a.foo += b.foo;
b.bar += a.bar;
}
}
Note that this will not actually split the vector, it will only give two views over the vector that are disjoint, so it's very efficient.
See the playground.
This question popped into my head (while I wasn't programming), and it actually made me question a lot of things about programming (like in C++, C#, Rust, in particular).
I want to point out, I'm aware there is a similar question on this issue:
Cannot borrow as mutable because it is also borrowed as immutable.
But I believe this question is aiming at a particular situation; a sub-problem. And I want to better understand how to resolve a thing like this in Rust.
The "thing" that I realised recently was that: "If I have a pointer/reference to an element in a dynamic array, and then I add an element, causing the array to expand and reallocate, that would break the pointer. Therefore, I need a special refererence that will always point to the same element even if it re-allocates".
This made me start thinking differently about a lot of things. But outside of that, I am aware that this problem is trivial to experienced c++ programmers. I have simply not come across this situation in my experiences, unfortunately.
So I wanted to see if Rust either had an existing 'special type' for this type of issue, and if not, what would happen if I made my own (for testing). The idea is that this "special pointer" would simply be a pointer to the Vector (List) itself, but also have a i32 field for the index; so it's all bundled under 1 variable that can be 'dereferenced' whenever you need.
Note: "VecPtr" is meant to be a immutable reference.
struct VecPtr<'a, T> {
vec: &'a Vec<T>,
index: usize
}
impl<T: Copy> VecPtr<'_, T> {
pub fn value(&self) -> T {
return self.vec[self.index];
}
}
fn main() {
let mut v = Vec::<i32>::with_capacity(6);
v.push(3);
v.push(1);
v.push(4);
v.push(1);
let r = VecPtr {vec: &v,index: 2};
let n = r.value();
println!("{}",n);
v.push(5); // error!
v.push(9); // error!
v.push(6); // re-allocation triggered // also error!
let n2 = r.value();
println!("{}",n2);
return;
}
So the above example code is showing that you can't have an existing immutable reference while also trying to have a mutable reference at the same time. good!
From what I've read from the other StackOverflow question, one of the reasons for the compiler error is that the Vector could re-allocate it's internal array at any time when it is calling "push". Which would invalidate all references to the internal array.
Which makes 100% sense. So as a programmer, you may desire to still have references to the array, but they are designed to be a bit more safer. Instead of a direct pointer to the internal array, you just have a pointer to the vector itself in question, and include an i32 index so you know the element you are looking at. Which means the dangling pointer issue that would occur at v.push(6); shouldn't happen any more. But yet the compiler still complains about the same issue. Which I understand.
I suppose it's still concerned about the reference to the vector itself, not the internals. Which makes things a bit confusing. Because there are different pointers here that the compiler is looking at and trying to protect. But to be honest, in the example code, the pointer to vec itself looks totally fine. That reference doesn't change at all (and it shouldn't, from what I can tell).
So my question is, is there a practice at which you can tell the compiler your intentions with certain references? So the compiler knows there isn't an issue (other than the unsafe keyword).
Or alternatively, is there a better way to do what I'm trying to do in the example code?
After some more research
It looks like one solution here would be to use reference counting Rc<T>, but I'm not sure that's 100% it.
I would normally not ask this question due to there being a similar existing question, but this one (I think) is investigating a slightly different situation, where someone (or me) would try to resolve an unsafe reference situation, but the compiler still insists there is an issue.
I guess the question comes down to this: would you find this acceptable?
fn main() {
let mut v = Vec::<i32>::with_capacity(6);
v.push(3);
v.push(1);
v.push(4);
v.push(1);
let r = VecPtr { vec: &v, index: 2 };
let n = r.value();
println!("{}",n);
v[2] = -1;
let n2 = r.value(); // This returned 4 just three lines ago and I was
// promised it wouldn't change! Now it's -1.
println!("{}",n2);
}
Or this
fn main() {
let mut v = Vec::<i32>::with_capacity(6);
v.push(3);
v.push(1);
v.push(4);
v.push(1);
let r = VecPtr { vec: &v, index: 2 };
let n = r.value();
println!("{}",n);
v.clear();
let n2 = r.value(); // This exact same thing that worked three lines ago will now panic.
println!("{}",n2);
}
Or, worst of all:
fn main() {
let mut v = Vec::<i32>::with_capacity(6);
v.push(3);
v.push(1);
v.push(4);
v.push(1);
let r = VecPtr { vec: &v, index: 2 };
let n = r.value();
println!("{}",n);
drop(v);
let n2 = r.value(); // Now you do actually have a dangling pointer.
println!("{}",n2);
}
Rust's answer is an emphatic "no" and that is enforced in the type system. It's not just about the unsoundness of dereferencing dangling pointers, it's a core design decision.
Can you tell the compiler your intentions with certain references? Yes! You can tell the compiler whether you want to share your reference, or whether you want to mutate through it. In your case, you've told the compiler that you want to share it. Which means you're not allowed to mutate it anymore. And as the examples above show, for good reason.
For the sake of this, the borrow checker has no notion of the stack or the heap, it doesn't know what types allocate and which don't, or when a Vec resizes. It only knows and cares about moving values and borrowing references: whether they're shared or mutable and for how long they live.
Now, if you want to make your structure work, Rust offers you some possibilities: One of those is RefCell. A RefCell allows you to borrow a mutable reference from an immutable one at the expense of runtime checking that nothing is aliased incorrectly. This together with an Rc can make your VecPtr:
use std::cell::RefCell;
use std::rc::Rc;
struct VecPtr<T> {
vec: Rc<RefCell<Vec<T>>>,
index: usize,
}
impl<T: Copy> VecPtr<T> {
pub fn value(&self) -> T {
return self.vec.borrow()[self.index];
}
}
fn main() {
let v = Rc::new(RefCell::new(Vec::<i32>::with_capacity(6)));
{
let mut v = v.borrow_mut();
v.push(3);
v.push(1);
v.push(4);
v.push(1);
}
let r = VecPtr {
vec: Rc::clone(&v),
index: 2,
};
let n = r.value();
println!("{}", n);
{
let mut v = v.borrow_mut();
v.push(5);
v.push(9);
v.push(6);
}
let n2 = r.value();
println!("{}", n2);
}
I'll leave it to you to look into how RefCell works.
In the below example:
struct Foo {
a: [u64; 100000],
}
fn foo(mut f: Foo) -> Foo {
f.a[0] = 99999;
f.a[1] = 99999;
println!("{:?}", &mut f as *mut Foo);
for i in 0..f.a[0] {
f.a[i as usize] = 21444;
}
return f;
}
fn main(){
let mut f = Foo {
a:[0;100000]
};
println!("{:?}", &mut f as *mut Foo);
f = foo(f);
println!("{:?}", &mut f as *mut Foo);
}
I find that before and after passing into the function foo, the address of f is different. Why does Rust copy such a big struct everywhere but not actually move it (or achieve this optimization)?
I understand how stack memory works. But with the information provided by ownership in Rust, I think the copy can be avoided. The compiler unnecessarily copies the array twice. Can this be an optimization for the Rust compiler?
A move is a memcpy followed by treating the source as non-existent.
Your big array is on the stack. That's just the way Rust's memory model works: local variables are on the stack. Since the stack space of foo is going away when the function returns, there's nothing else the compiler can do except copy the memory to main's stack space.
In some cases, the compiler can rearrange things so that the move can be elided (source and destination are merged into one thing), but this is an optimization that cannot be relied on, especially for big things.
If you don't want to copy the huge array around, allocate it on the heap yourself, either via a Box<[u64]>, or simply by using Vec<u64>.
I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its arguments. Either one has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect .
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's an time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact second, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
take_mut::take(e, f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.
I realize this is something of an obscure thing to want to do, but sometimes it's somewhat helpful to be able to store a value in a c library via ffi, and then make a callback to a rust function from c.
In these circumstances I've found situations where I want to have an owned pointer which is stored as a mut * c_void.
You can store a value in C using this method along with forget(), and you can relatively easily resurrect it as a mut * T on the other side, but it's useful to be able to move out of the unsafe pointer and back into 'safe' rust space in this situation.
However, the compiler will always complain about moving the reference out of a & pointer.
Is there a special way around that, to unsafely convert a &T or *T into a ~T?
That may be a little unclear, so here's a tangible example:
extern crate libc;
use libc::c_void;
use std::ops::Drop;
use std::intrinsics::forget;
struct Foo { x: int }
impl Drop for Foo {
fn drop(&mut self) {
println!("We discarded our foo thanks!");
}
}
fn main() {
let mut x = ~Foo { x: 10 };
let mut y = & mut *x as * mut Foo;
println!("Address Y: {:x}", y as uint);
let mut z = y as * mut c_void;
println!("Address Z: {:x}", z as uint);
// Forget x so we don't worry about it
unsafe { forget(x); }
// This would normally happen inside the ffi callback where
// the ffi code discards the void * it was holding and returns it.
unsafe {
let mut res_z = z as * mut Foo;
println!("Ressurected Z: {:x}", z as uint);
let mut res_y = & mut (*res_z);
println!("Ressurected Y: {:x}", y as uint);
let mut res_x:~Foo = ~(*res_y);
}
}
That final line won't compile because:
test.rs:34:28: 34:34 error: cannot move out of dereference of `&mut`-pointer
test.rs:34 let mut res_x:~Foo = ~(*res_y);
If you remove that line the destructor is never invoked; I'm looking for a way to move that memory block back into the 'safe' rust zone and have it collected correctly when the block scope ends.
edit: I suspect swap() is the way to do this, but the exact semantics of doing it right without leaking are still a bit obscure to me. For example in https://gist.github.com/shadowmint/11361488 a ~Foo leaks; the destructor is invoked for the forgotten ~Foo, but instead we leak the local ~Foo created in tmp. :/
Your error happens because ~ is an operator for boxing, not taking a pointer. You can't "take" an owned pointer to something, because creating an owned pointer always means allocation on the heap. Hence your error - ~ tries to move the value out of &mut, which is disallowed.
I believe that for such kind of really unsafe things you'll need cast::transmute():
use std::cast;
// ...
let mut res_x: ~Foo = cast::transmute(res_y);
Your code with this modification works fine and it prints We discarded our foo thanks!. However, you shoud really be careful with cast::transmute(), because it does completely no checks (except memory size of converted types) and will allow you to cast everything to everything.
Is the syntax correct? the line should look like as below, if I'm not wrong.
let mut res_x:~Foo = (*res_y);
I'm a newbie to rust and I just followed this [link]: http://cmr.github.io/blog/2013/08/13/rust-by-concept/