returning heap-allocated value from function? - rust

Rust beginner here. Why does this code work??
fn make_string() -> String {
let x = String::from("world");
x
}
fn main() {
let s = make_string();
}
The way I understand the rules, when the closing bracket of make_string is encountered, the value of x should be dropped, since x is the owner and goes out of scope. You could argue that ownership is transferred, but that seems to happen after x goes out of scope.
Is this a special case for functions, or am I fundamentally misunderstanding the rules?

When you return a value from a function, you move the value, essentially transferring ownership from that function to the parent function, which has a larger scope. Since the scope is now larger, the value won't be dropped.
See:
Rust by Example: Ownership and moves
The Rust Programming Language: What is Ownership?

Related

How does rust deal with the ownership of the returned string?

This is the code I have:
fn test_function() -> String {
String::from("")
}
fn main() {
test_function();
println!("Hello");
}
I was expecting rust to complain about test_function return value not being assigned, but it just works.
How are the rules of ownership applied here?
I was expeting rust to complain about test_function return value not being assigned, but it just works.
Why would it complain? Rust's type system is affine, meaning values can be used at most once. Using them 0 times is valid (you can mark types and functions as #[must_use] to trigger a warning, but even then a simple let _ = ... will silence it)
How are the rules of ownership applied here?
The string is moved out of the function, then dropped by the caller since nothing is keeping it alive.
Without optimizations, the return value of
fn test_function()
Is just immediately dropped, and it's memory freed.
Since it its ownership is relinquished.
Before none-lexical-lifetimes it would have been freed at the end of fn main()
Ownership is passed to the main function, which drops it when returning. The fact that the value is unassigned does not matter.

How can I swap out the value of a mutable reference, temporarily taking ownership? [duplicate]

This question already has answers here:
Temporarily move out of borrowed content
(3 answers)
Closed 1 year ago.
I have a function that takes ownership of some data, modifies it destructively, and returns it.
fn transform(s: MyData) -> MyData {
todo!()
}
In some situations, I have a &mut MyData reference. I would like to apply transform to &mut MyData.
fn transform_mut(data_ref: &mut MyData) {
*data_ref = transform(*data_ref);
}
Rust Playground
However, this causes a compiler error.
error[E0507]: cannot move out of `*data_ref` which is behind a mutable reference
--> src/lib.rs:10:27
|
10 | *data_ref = transform(*data_ref);
| ^^^^^^^^^ move occurs because `*data_ref` has type `MyData`, which does not implement the `Copy` trait
I considered using mem::swap and mem::replace, but they require that you already have some valid value to put into the reference before taking another one out.
Is there any way to accomplish this? MyData doesn't have a sensible default or dummy value to temporarily stash in the reference. It feels like because I have exclusive access the owner shouldn't care about the transformation, but my intuition might be wrong here.
It feels like because I have exclusive access the owner shouldn't care about the transformation, but my intuition might be wrong here.
The problem with this idea is that if this were allowed, and the function transform panicked, there is no longer a valid value in *data_ref, which is visible if the unwind is caught or inside of Drop handling for the memory data_ref points into.
For example, let's implement this in the obvious fashion, by just copying the data out of the referent and back in:
use std::ptr;
fn naive_modify_in_place<T>(place: &mut T, f: fn(T) -> T) {
let mut value = unsafe { ptr::read(place) };
value = f(value);
unsafe { ptr::write(place, value) };
}
fn main() {
let mut x = Box::new(1234);
naive_modify_in_place(&mut x, |x| panic!("oops"));
}
If you run this program (Rust Playground link), it will crash with a “double free” error. This is because unwinding from the panicking function dropped its argument, and then unwinding from main dropped x — which is the same box, already deallocated.
There are a couple of crates designed specifically to solve this problem:
take_mut, and replace_with intended to improve on it. (I haven't used either, particularly.) Both of these offer two options for dealing with panics:
Force an abort (program immediately exits with no ability to handle the panic or clean up anything else).
Replace the referent of the reference with a different freshly computed value, since the previous one was lost when the panic started.
Of course, if you have no valid alternative value then you can't take option 2. In that case, you might want to consider bypassing this situation entirely by adding a placeholder: if you can store an Option<MyData> and pass an &mut Option<MyData>, then your code can use Option::take to temporarily remove the value and leave None in its place. The None will only ever be visible if there was a panic, and if your code isn't catching panics then it will never matter. But it does mean that every access to the data requires retrieving it from the Option (e.g. using .as_ref().unwrap()).
You could derive (or impl) Clone for MyData, then pass the clone into your destructive transform.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e3a4d737a700ef6baa35160545f0e514
Cloning removes the problem of the data being behind a reference.

Why do we need Rc<T> when immutable references can do the job?

To illustrate the necessity of Rc<T>, the Book presents the following snippet (spoiler: it won't compile) to show that we cannot enable multiple ownership without Rc<T>.
enum List {
Cons(i32, Box<List>),
Nil,
}
use crate::List::{Cons, Nil};
fn main() {
let a = Cons(5, Box::new(Cons(10, Box::new(Nil))));
let b = Cons(3, Box::new(a));
let c = Cons(4, Box::new(a));
}
It then claims (emphasis mine)
We could change the definition of Cons to hold references instead, but then we would have to specify lifetime parameters. By specifying lifetime parameters, we would be specifying that every element in the list will live at least as long as the entire list. The borrow checker wouldn’t let us compile let a = Cons(10, &Nil); for example, because the temporary Nil value would be dropped before a could take a reference to it.
Well, not quite. The following snippet compiles under rustc 1.52.1
enum List<'a> {
Cons(i32, &'a List<'a>),
Nil,
}
use crate::List::{Cons, Nil};
fn main() {
let a = Cons(5, &Cons(10, &Nil));
let b = Cons(3, &a);
let c = Cons(4, &a);
}
Note that by taking a reference, we no longer need a Box<T> indirection to hold the nested List. Furthermore, I can point both b and c to a, which gives a multiple conceptual owners (which are actually borrowers).
Question: why do we need Rc<T> when immutable references can do the job?
With "ordinary" borrows you can very roughly think of a statically proven order-by-relationship, where the compiler needs to prove that the owner of something always comes to life before any borrows and always dies after all borrows died (a owns String, it comes to life before b which borrows a, then b dies, then a dies; valid). For a lot of use-cases, this can be done, which is Rust's insight to make the borrow-system practical.
There are cases where this can't be done statically. In the example you've given, you're sort of cheating, because all borrows have a 'static-lifetime; and 'static items can be "ordered" before or after anything out to infinity because of that - so there actually is no constraint in the first place. The example becomes much more complex when you take different lifetimes (many List<'a>, List<'b>, etc.) into account. This issue will become apparent when you try to pass values into functions and those functions try to add items. This is because values created inside functions will die after leaving their scope (i.e. when the enclosing function returns), so we cannot keep a reference to them afterwards, or there will be dangling references.
Rc comes in when one can't prove statically who is the original owner, whose lifetime starts before any other and ends after any other(!). A classic example is a graph structure derived from user input, where multiple nodes can refer to one other node. They need to form a "born after, dies before" relationship with the node they are referencing at runtime, to guarantee that they never reference invalid data. The Rc is a very simple solution to that because a simple counter can represent these relationships. As long as the counter is not zero, some "born after, dies before" relationship is still active. The key insight here is that it does not matter in which order the nodes are created and die because any order is valid. Only the points on either end - where the counter gets to 0 - are actually important, any increase or decrease in between is the same (0=+1+1+1-1-1-1=0 is the same as 0=+1+1-1+1-1-1=0) The Rc is destroyed when the counter reaches zero. In the graph example this is when a node is not being referred to any longer. This tells the owner of that Rc (the last node referring) "Oh, it turns out I am the owner of the underlying node - nobody knew! - and I get to destroy it".
Even single-threaded, there are still times the destruction order is determined dynamically, whereas for the borrow checker to work, there must be a determined lifetime tree (stack).
fn run() {
let writer = Rc::new(std::io::sink());
let mut counters = vec![
(7, Rc::clone(&writer)),
(7, writer),
];
while !counters.is_empty() {
let idx = read_counter_index();
counters[idx].0 -= 1;
if counters[idx].0 == 0 {
counters.remove(idx);
}
}
}
fn read_counter_index() -> usize {
unimplemented!()
}
As you can see in this example, the order of destruction is determined by user input.
Another reason to use smart pointers is simplicity. The borrow checker does incur some code complexity. For example, using smart pointer, you are able to maneuver around the self-referential struct problem with a tiny overhead.
struct SelfRefButDynamic {
a: Rc<u32>,
b: Rc<u32>,
}
impl SelfRefButDynamic {
pub fn new() -> Self {
let a = Rc::new(0);
let b = Rc::clone(&a);
Self { a, b }
}
}
This is not possible with static (compile-time) references:
struct WontDo {
a: u32,
b: &u32,
}

What happens to the stack when a value is moved in Rust? [duplicate]

In Rust, there are two possibilities to take a reference
Borrow, i.e., take a reference but don't allow mutating the reference destination. The & operator borrows ownership from a value.
Borrow mutably, i.e., take a reference to mutate the destination. The &mut operator mutably borrows ownership from a value.
The Rust documentation about borrowing rules says:
First, any borrow must last for a scope no greater than that of the
owner. Second, you may have one or the other of these two kinds of
borrows, but not both at the same time:
one or more references (&T) to a resource,
exactly one mutable reference (&mut T).
I believe that taking a reference is creating a pointer to the value and accessing the value by the pointer. This could be optimized away by the compiler if there is a simpler equivalent implementation.
However, I don't understand what move means and how it is implemented.
For types implementing the Copy trait it means copying e.g. by assigning the struct member-wise from the source, or a memcpy(). For small structs or for primitives this copy is efficient.
And for move?
This question is not a duplicate of What are move semantics? because Rust and C++ are different languages and move semantics are different between the two.
Semantics
Rust implements what is known as an Affine Type System:
Affine types are a version of linear types imposing weaker constraints, corresponding to affine logic. An affine resource can only be used once, while a linear one must be used once.
Types that are not Copy, and are thus moved, are Affine Types: you may use them either once or never, nothing else.
Rust qualifies this as a transfer of ownership in its Ownership-centric view of the world (*).
(*) Some of the people working on Rust are much more qualified than I am in CS, and they knowingly implemented an Affine Type System; however contrary to Haskell which exposes the math-y/cs-y concepts, Rust tends to expose more pragmatic concepts.
Note: it could be argued that Affine Types returned from a function tagged with #[must_use] are actually Linear Types from my reading.
Implementation
It depends. Please keep in mind than Rust is a language built for speed, and there are numerous optimizations passes at play here which will depend on the compiler used (rustc + LLVM, in our case).
Within a function body (playground):
fn main() {
let s = "Hello, World!".to_string();
let t = s;
println!("{}", t);
}
If you check the LLVM IR (in Debug), you'll see:
%_5 = alloca %"alloc::string::String", align 8
%t = alloca %"alloc::string::String", align 8
%s = alloca %"alloc::string::String", align 8
%0 = bitcast %"alloc::string::String"* %s to i8*
%1 = bitcast %"alloc::string::String"* %_5 to i8*
call void #llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %0, i64 24, i32 8, i1 false)
%2 = bitcast %"alloc::string::String"* %_5 to i8*
%3 = bitcast %"alloc::string::String"* %t to i8*
call void #llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 24, i32 8, i1 false)
Underneath the covers, rustc invokes a memcpy from the result of "Hello, World!".to_string() to s and then to t. While it might seem inefficient, checking the same IR in Release mode you will realize that LLVM has completely elided the copies (realizing that s was unused).
The same situation occurs when calling a function: in theory you "move" the object into the function stack frame, however in practice if the object is large the rustc compiler might switch to passing a pointer instead.
Another situation is returning from a function, but even then the compiler might apply "return value optimization" and build directly in the caller's stack frame -- that is, the caller passes a pointer into which to write the return value, which is used without intermediary storage.
The ownership/borrowing constraints of Rust enable optimizations that are difficult to reach in C++ (which also has RVO but cannot apply it in as many cases).
So, the digest version:
moving large objects is inefficient, but there are a number of optimizations at play that might elide the move altogether
moving involves a memcpy of std::mem::size_of::<T>() bytes, so moving a large String is efficient because it only copies a couple bytes whatever the size of the allocated buffer they hold onto
When you move an item, you are transferring ownership of that item. That's a key component of Rust.
Let's say I had a struct, and then I assign the struct from one variable to another. By default, this will be a move, and I've transferred ownership. The compiler will track this change of ownership and prevent me from using the old variable any more:
pub struct Foo {
value: u8,
}
fn main() {
let foo = Foo { value: 42 };
let bar = foo;
println!("{}", foo.value); // error: use of moved value: `foo.value`
println!("{}", bar.value);
}
how it is implemented.
Conceptually, moving something doesn't need to do anything. In the example above, there wouldn't be a reason to actually allocate space somewhere and then move the allocated data when I assign to a different variable. I don't actually know what the compiler does, and it probably changes based on the level of optimization.
For practical purposes though, you can think that when you move something, the bits representing that item are duplicated as if via memcpy. This helps explain what happens when you pass a variable to a function that consumes it, or when you return a value from a function (again, the optimizer can do other things to make it efficient, this is just conceptually):
// Ownership is transferred from the caller to the callee
fn do_something_with_foo(foo: Foo) {}
// Ownership is transferred from the callee to the caller
fn make_a_foo() -> Foo { Foo { value: 42 } }
"But wait!", you say, "memcpy only comes into play with types implementing Copy!". This is mostly true, but the big difference is that when a type implements Copy, both the source and the destination are valid to use after the copy!
One way of thinking of move semantics is the same as copy semantics, but with the added restriction that the thing being moved from is no longer a valid item to use.
However, it's often easier to think of it the other way: The most basic thing that you can do is to move / give ownership away, and the ability to copy something is an additional privilege. That's the way that Rust models it.
This is a tough question for me! After using Rust for a while the move semantics are natural. Let me know what parts I've left out or explained poorly.
Rust's move keyword always bothers me so, I decided to write my understanding which I obtained after discussion with my colleagues.
I hope this might help someone.
let x = 1;
In the above statement, x is a variable whose value is 1. Now,
let y = || println!("y is a variable whose value is a closure");
So, move keyword is used to transfer the ownership of a variable to the closure.
In the below example, without move, x is not owned by the closure. Hence x is not owned by y and available for further use.
let x = 1;
let y = || println!("this is a closure that prints x = {}". x);
On the other hand, in this next below case, the x is owned by the closure. x is owned by y and not available for further use.
let x = 1;
let y = move || println!("this is a closure that prints x = {}". x);
By owning I mean containing as a member variable. The example cases above are in the same situation as the following two cases. We can also assume the below explanation as to how the Rust compiler expands the above cases.
The formar (without move; i.e. no transfer of ownership),
struct ClosureObject {
x: &u32
}
let x = 1;
let y = ClosureObject {
x: &x
};
The later (with move; i.e. transfer of ownership),
struct ClosureObject {
x: u32
}
let x = 1;
let y = ClosureObject {
x: x
};
Please let me answer my own question. I had trouble, but by asking a question here I did Rubber Duck Problem Solving. Now I understand:
A move is a transfer of ownership of the value.
For example the assignment let x = a; transfers ownership: At first a owned the value. After the let it's x who owns the value. Rust forbids to use a thereafter.
In fact, if you do println!("a: {:?}", a); after the letthe Rust compiler says:
error: use of moved value: `a`
println!("a: {:?}", a);
^
Complete example:
#[derive(Debug)]
struct Example { member: i32 }
fn main() {
let a = Example { member: 42 }; // A struct is moved
let x = a;
println!("a: {:?}", a);
println!("x: {:?}", x);
}
And what does this move mean?
It seems that the concept comes from C++11. A document about C++ move semantics says:
From a client code point of view, choosing move instead of copy means that you don't care what happens to the state of the source.
Aha. C++11 does not care what happens with source. So in this vein, Rust is free to decide to forbid to use the source after a move.
And how it is implemented?
I don't know. But I can imagine that Rust does literally nothing. x is just a different name for the same value. Names usually are compiled away (except of course debugging symbols). So it's the same machine code whether the binding has the name a or x.
It seems C++ does the same in copy constructor elision.
Doing nothing is the most efficient possible.
Passing a value to function, also results in transfer of ownership; it is very similar to other examples:
struct Example { member: i32 }
fn take(ex: Example) {
// 2) Now ex is pointing to the data a was pointing to in main
println!("a.member: {}", ex.member)
// 3) When ex goes of of scope so as the access to the data it
// was pointing to. So Rust frees that memory.
}
fn main() {
let a = Example { member: 42 };
take(a); // 1) The ownership is transfered to the function take
// 4) We can no longer use a to access the data it pointed to
println!("a.member: {}", a.member);
}
Hence the expected error:
post_test_7.rs:12:30: 12:38 error: use of moved value: `a.member`
let s1:String= String::from("hello");
let s2:String= s1;
To ensure memory safety, rust invalidates s1, so instead of being shallow copy, this called a Move
fn main() {
// Each value in rust has a variable that is called its owner
// There can only be one owner at a time.
let s=String::from('hello')
take_ownership(s)
println!("{}",s)
// Error: borrow of moved value "s". value borrowed here after move. so s cannot be borrowed after a move
// when we pass a parameter into a function it is the same as if we were to assign s to another variable. Passing 's' moves s into the 'my_string' variable then `println!("{}",my_string)` executed, "my_string" printed out. After this scope is done, some_string gets dropped.
let x:i32 = 2;
makes_copy(x)
// instead of being moved, integers are copied. we can still use "x" after the function
//Primitives types are Copy and they are stored in stack because there size is known at compile time.
println("{}",x)
}
fn take_ownership(my_string:String){
println!('{}',my_string);
}
fn makes_copy(some_integer:i32){
println!("{}", some_integer)
}

How do I transform &str to ~str in Rust?

This is for the current 0.6 Rust trunk by the way, not sure the exact commit.
Let's say I want to for each over some strings, and my closure takes a borrowed string pointer argument (&str). I want my closure to add its argument to an owned vector of owned strings ~[~str] to be returned. My understanding of Rust is weak, but I think that strings are a special case where you can't dereference them with * right? How do I get my strings from &str into the vector's push method which takes a ~str?
Here's some code that doesn't compile
fn read_all_lines() -> ~[~str] {
let mut result = ~[];
let reader = io::stdin();
let util = #reader as #io::ReaderUtil;
for util.each_line |line| {
result.push(line);
}
result
}
It doesn't compile because it's inferring result's type to be [&str] since that's what I'm pushing onto it. Not to mention its lifetime will be wrong since I'm adding a shorter-lived variable to it.
I realize I could use ReaderUtil's read_line() method which returns a ~str. But this is just an example.
So, how do I get an owned string from a borrowed string? Or am I totally misunderstanding.
You should call the StrSlice trait's method, to_owned, as in:
fn read_all_lines() -> ~[~str] {
let mut result = ~[];
let reader = io::stdin();
let util = #reader as #io::ReaderUtil;
for util.each_line |line| {
result.push(line.to_owned());
}
result
}
StrSlice trait docs are here:
http://static.rust-lang.org/doc/core/str.html#trait-strslice
You can't.
For one, it doesn't work semantically: a ~str promises that only one thing owns it at a time. But a &str is borrowed, so what happens to the place you borrowed from? It has no way of knowing that you're trying to steal away its only reference, and it would be pretty rude to trash the caller's data out from under it besides.
For another, it doesn't work logically: ~-pointers and #-pointers are allocated in completely different heaps, and a & doesn't know which heap, so it can't be converted to ~ and still guarantee that the underlying data lives in the right place.
So you can either use read_line or make a copy, which I'm... not quite sure how to do :)
I do wonder why the API is like this, when & is the most restricted of the pointers. ~ should work just as well here; it's not like the iterated strings already exist somewhere else and need to be borrowed.
At first I thought it was possible to use copy line to create owning pointer from the borrowed pointer to the string but this apparently copies burrowed pointer.
So I found str::from_slice(s: &str) -> ~str. This is probably what you need.

Resources