I know that a String mainly consists of a pointer that contains the address to its allocated place in the heap memory. Rust prohibits any copies of Strings to avoid double free errors, so it introduced borrowing, where the code basically only copies the pointer value without copying the value in the heap.
However, integer types are stored in the stack and hence do not have a pointer. Yet it is still possible to create a reference to an integer:
let i: i64 = 42;
let j = &i;
Since an integer contains no reference to the heap, isn't a borrowed integer simply a regular copy of it? E.g. is there any difference between j = i and j = &i?
Consider the case if the reference were mutable:
let mut i: u64 = 42;
let j = &mut i;
*j = 5;
println!("{}", i);
5
It should be obvious from this demonstration that j is not simply a copy. It does reference and therefore modify the original i.
Integer types are stored in the stack and hence do not have a pointer.
Not sure where you got that idea from. If it exists in memory, then it has an address within that memory, and therefore you can have a pointer (or reference) that points to it. The properties of a u64 do not change depending on where it is.
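To make the distinction concrete, here is a minimal sketch using only the standard library; the function names are made up for illustration. It contrasts a plain copy of an integer with a reference to it, and checks that a reference to a stack integer is simply that integer's address.

```rust
/// Writes through a mutable reference and returns (original, copy).
fn copy_vs_reference() -> (u64, u64) {
    let mut i: u64 = 42;
    let copy = i; // independent copy of the value
    let j = &mut i; // holds the address of `i`
    *j = 5; // writes to `i`, not to `copy`
    (i, copy)
}

/// A shared reference to an integer compares equal to the integer's address.
fn reference_is_address() -> bool {
    let i: i64 = 42;
    let j = &i;
    std::ptr::eq(j, &i)
}

fn main() {
    let (original, copy) = copy_vs_reference();
    println!("original = {original}, copy = {copy}"); // original = 5, copy = 42
    println!("same address: {}", reference_is_address()); // same address: true
}
```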
The comparison to strings may be tripping you up:
let s = String::from("hello world");
let s_ref: &String = &s;
let str_ref: &str = s.as_str();
If you have a String variable s, and take a reference to it, s_ref, it does not point directly to the heap, it points to the variable s on the stack. There is a slice-type str that represents a region of utf8-encoded bytes, which a String holds on the heap. You can get a reference to that region of memory directly on the heap by getting it via .as_str()/.as_ref() or by converting the &String into a &str via deref coercion.
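A small sketch of the difference between &String and &str, assuming nothing beyond the standard library. The helper names byte_len and header_and_bytes_live_apart are hypothetical; the second one only compares addresses to show that the String value and the bytes it owns live in different places.

```rust
/// Accepts any string slice; a `&String` argument is coerced to `&str`.
fn byte_len(text: &str) -> usize {
    text.len()
}

/// Compares the address of the `String` value itself with the address of
/// the UTF-8 bytes it owns on the heap.
fn header_and_bytes_live_apart(s: &String) -> bool {
    let header_addr = s as *const String as usize;
    let bytes_addr = s.as_str().as_ptr() as usize;
    header_addr != bytes_addr
}

fn main() {
    let s = String::from("hello world");
    let s_ref: &String = &s; // points at the variable `s`
    let str_ref: &str = s.as_str(); // points at the bytes on the heap
    println!("{}", byte_len(s_ref)); // deref coercion from &String to &str
    println!("{}", byte_len(str_ref));
    println!("{}", header_and_bytes_live_apart(&s));
}
```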
But in the case of u64 vs &u64, there isn't much of a practical difference between the two except the latter incurs an extra level of indirection in the generated code and you may have to worry about lifetime constraints. Because of that, it's usually better to use copies of integer types if given the choice. You'd still see references to integers though if using them through some generic interface.
Yes, there is a difference between the two. Whether an integer lives on the heap or on the stack doesn't change the fact that it's somewhere in memory, so it has an address. A pointer is just that address, so integers can be pointed to as well. And, indeed, if you try to use a pointer to an integer as an integer, you'll get a type mismatch error.
The difference between a String and a number type such as i64 is that i64: Copy, which means that you can turn a &i64 into a i64 just by "copying" the values (as opposed to calling a dedicated function that knows how to appropriately clone stuff, such as String::clone, which comes from Clone::clone). This means that Rust will allow implicit copying of an integer, so, from this perspective, a pointer to an integer is as permissive as an integer in itself.
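As a rough illustration of Copy versus Clone (the function names are invented for this sketch): dereferencing a &i64 yields a copy, while getting an owned String out of a &String needs an explicit clone.

```rust
/// `i64: Copy`, so dereferencing the reference yields an independent copy.
fn copy_out(r: &i64) -> i64 {
    *r
}

/// `String` is not `Copy`; writing `*r` here would try to move out of the
/// borrow and fail to compile, so an explicit `clone` is needed.
fn clone_out(r: &String) -> String {
    r.clone()
}

fn main() {
    let n = 7_i64;
    let m = copy_out(&n) + 1;
    println!("{n} {m}"); // 7 8

    let s = String::from("abc");
    let mut t = clone_out(&s);
    t.push('d'); // modifies only the clone
    println!("{s} {t}"); // abc abcd
}
```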
I'm trying to use a C library that requires me to give it strings (const char*) as function arguments, among other things.
I tried writing 2 functions for this (Rust -> C, C -> Rust), so I'm already aware CString and CStr exist, but couldn't get them working for the 1st conversion.
I tried using solutions from Stack Overflow and examples from the docs for both of them but the result always ends up garbage (mostly random characters and on one occasion the same result as in this post).
// My understanding is that C strings have a \0 terminator, which I need to append to input
// without having the variable created deallocated/destroyed/you name it by Rust right after
// I don't have any other explanation for the data becoming garbage if I clone it.
// Also, this conversion works if i manually append \0 to the end of the string at the constructor
pub unsafe fn convert_str(input: &str) -> *const c_char {
let c_str = input.as_ptr() as *const c_char;
return c_str
}
// Works, at least for now
pub unsafe fn cstr_to_str(c_buf: *const i8) -> &'static str {
let cstr = CStr::from_ptr(c_buf);
return cstr.to_str().expect("success");
}
The resulting implementation acts like this:
let pointer = convert_str("Hello");
let result = cstr_to_str(pointer);
println!("Hello becomes {}", result);
// Output:
// Hello becomes HelloHello becomescannot access a Thread Local Storage value during or after destruction/rustc/fe5b1...
// LOOKsrc/main.rsHello this is windowFailed to create GLFW window.
How do I fix this? Is there a better way to do this I'm not seeing?
Rust strings don't have a \0 terminator, so to append one, convert_str must necessarily allocate memory (since it can't modify input - even in unsafe code, it can't know whether there's space for one more byte in the memory allocated for input).
If you're gonna wrangle C strings, you're gonna have to do C style manual management, i.e. together with returning a char*, convert_str also has to return the implicit obligation to free the string to the caller. Said differently, convert_str can't deallocate the buffer it must allocate. (If it did, using the pointer it returns would be a use after free, which indeed results in garbage.)
So your code might look like this:
Allocate a new CString in convert_str with CString::new(input).unwrap() and make sure its internal buffer doesn't get dropped at the end of the function with .into_raw()
Deallocate the return value of convert_str when you're done using it with drop(CString::from_raw(pointer)).
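The two steps above could look roughly like the sketch below. It is only one way to arrange it: free_str and round_trip are hypothetical helpers (round_trip stands in for whatever the C library does with the pointer), and convert_str returns *mut c_char here because that is what CString::into_raw and CString::from_raw use.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Allocates a NUL-terminated copy of `input` and hands ownership to the caller.
/// Panics if `input` contains an interior NUL byte.
pub fn convert_str(input: &str) -> *mut c_char {
    CString::new(input)
        .expect("input contains an interior NUL byte")
        .into_raw() // the buffer is NOT freed when this function returns
}

/// Gives a pointer obtained from `convert_str` back to Rust so it can be freed.
///
/// # Safety
/// `ptr` must come from `convert_str` and must not be used afterwards.
pub unsafe fn free_str(ptr: *mut c_char) {
    if !ptr.is_null() {
        drop(unsafe { CString::from_raw(ptr) });
    }
}

/// Hypothetical stand-in for the C side: reads the string, then frees it.
pub fn round_trip(input: &str) -> String {
    let ptr = convert_str(input);
    // SAFETY: `ptr` is a valid NUL-terminated string until `free_str` runs.
    let copied = unsafe { CStr::from_ptr(ptr) }
        .to_str()
        .expect("bytes are valid UTF-8")
        .to_owned();
    // SAFETY: `ptr` came from `convert_str` and is not used again.
    unsafe { free_str(ptr) };
    copied
}

fn main() {
    println!("Hello becomes {}", round_trip("Hello"));
}
```

The important part is that every pointer produced by into_raw is eventually passed to from_raw exactly once; freeing it with C's free instead would be incorrect.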
The Rust language website claims move semantics as one of the features of the language. But I can't see how move semantics is implemented in Rust.
Rust boxes are the only place where move semantics are used.
let x = Box::new(5);
let y: Box<i32> = x; // x is 'moved'
The above Rust code can be written in C++ as
auto x = std::make_unique<int>(5);
auto y = std::move(x); // Note the explicit move
As far as I know (correct me if I'm wrong),
Rust doesn't have constructors at all, let alone move constructors.
No support for rvalue references.
No way to create functions overloads with rvalue parameters.
How does Rust provide move semantics?
I think it's a very common issue when coming from C++. In C++ you are doing everything explicitly when it comes to copying and moving. The language was designed around copying and references. With C++11 the ability to "move" stuff was glued onto that system. Rust on the other hand took a fresh start.
Rust doesn't have constructors at all, let alone move constructors.
You do not need move constructors. Rust moves everything that "does not have a copy constructor", a.k.a. "does not implement the Copy trait".
struct A;
fn test() {
let a = A;
let b = a;
let c = a; // error, a is moved
}
Rust's default constructor is (by convention) simply an associated function called new:
struct A(i32);
impl A {
fn new() -> A {
A(5)
}
}
More complex constructors should have more expressive names. This is the named constructor idiom in C++.
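A short sketch of what such named constructors might look like. The names with_value and value are made up for illustration; only new is the convention mentioned above.

```rust
struct A(i32);

impl A {
    /// Conventional "default" constructor.
    fn new() -> A {
        A(5)
    }

    /// A more specific constructor gets a more descriptive name.
    fn with_value(value: i32) -> A {
        A(value)
    }

    /// Accessor used only to inspect the result in this example.
    fn value(&self) -> i32 {
        self.0
    }
}

fn main() {
    println!("{}", A::new().value()); // 5
    println!("{}", A::with_value(9).value()); // 9
}
```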
No support for rvalue references.
It has always been a requested feature, see RFC issue 998, but most likely you are asking for a different feature: moving stuff to functions:
struct A;
fn move_to(a: A) {
// a is moved into here, you own it now.
}
fn test() {
let a = A;
move_to(a);
let c = a; // error, a is moved
}
No way to create functions overloads with rvalue parameters.
You can do that with traits.
trait Ref {
fn test(&self);
}
trait Move {
fn test(self);
}
struct A;
impl Ref for A {
fn test(&self) {
println!("by ref");
}
}
impl Move for A {
fn test(self) {
println!("by value");
}
}
fn main() {
let a = A;
(&a).test(); // prints "by ref"
a.test(); // prints "by value"
}
Rust's moving and copying semantics are very different from C++. I'm going to take a different approach to explain them than the existing answer.
In C++, copying is an operation that can be arbitrarily complex, due to custom copy constructors. Rust doesn't want custom semantics of simple assignment or argument passing, and so takes a different approach.
First, an assignment or argument passing in Rust is always just a simple memory copy.
let foo = bar; // copies the bytes of bar to the location of foo (might be elided)
function(foo); // copies the bytes of foo to the parameter location (might be elided)
But what if the object controls some resources? Let's say we are dealing with a simple smart pointer, Box.
let b1 = Box::new(42);
let b2 = b1;
At this point, if just the bytes are copied over, wouldn't the destructor (drop in Rust) be called for each object, thus freeing the same pointer twice and causing undefined behavior?
The answer is that Rust moves by default. This means that it copies the bytes to the new location, and the old object is then gone. It is a compile error to access b1 after the second line above. And the destructor is not called for it. The value was moved to b2, and b1 might as well not exist anymore.
This is how move semantics work in Rust. The bytes are copied over, and the old object is gone.
In some discussions about C++'s move semantics, Rust's way was called "destructive move". There have been proposals to add the "move destructor" or something similar to C++ so that it can have the same semantics. But move semantics as they are implemented in C++ don't do this. The old object is left behind, and its destructor is still called. Therefore, you need a move constructor to deal with the custom logic required by the move operation. Moving is just a specialized constructor/assignment operator that is expected to behave in a certain way.
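One way to observe the destructive move is to count destructor calls. The sketch below uses a hypothetical Noisy type whose Drop implementation bumps a shared counter; after moving a value from one binding to another, the destructor runs once, not twice.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical type that records every destructor call in a shared counter.
struct Noisy {
    drops: Rc<Cell<usize>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Moves a `Noisy` from one binding to another and reports how many times
/// its destructor ran once both bindings are out of scope.
fn count_drops_after_move() -> usize {
    let drops = Rc::new(Cell::new(0));
    {
        let first = Noisy { drops: Rc::clone(&drops) };
        let _second = first; // move: `first` is gone, no destructor runs for it
    } // `_second` goes out of scope here and is dropped
    drops.get()
}

fn main() {
    println!("destructor calls: {}", count_drops_after_move()); // destructor calls: 1
}
```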
So by default, Rust's assignment moves the object, making the old location invalid. But many types (integers, floating points, shared references) have semantics where copying the bytes is a perfectly valid way of creating a real copy, with no need to ignore the old object. Such types should implement the Copy trait, which can be derived by the compiler automatically.
#[derive(Copy, Clone)] // Copy requires Clone, so both are derived
struct JustTwoInts {
one: i32,
two: i32,
}
This signals the compiler that assignment and argument passing do not invalidate the old object:
let j1 = JustTwoInts { one: 1, two: 2 };
let j2 = j1;
println!("Still allowed: {}", j1.one);
Note that trivial copying and the need for destruction are mutually exclusive; a type that is Copy cannot also be Drop.
Now what about when you want to make a copy of something where just copying the bytes isn't enough, e.g. a vector? There is no language feature for this; technically, the type just needs a function that returns a new object that was created the right way. But by convention this is achieved by implementing the Clone trait and its clone function. In fact, the compiler supports automatic derivation of Clone too, where it simply clones every field.
#[derive(Clone)]
struct JustTwoVecs {
one: Vec<i32>,
two: Vec<i32>,
}
let j1 = JustTwoVecs { one: vec![1], two: vec![2, 2] };
let j2 = j1.clone();
And whenever you derive Copy, you must also derive Clone: Clone is a supertrait of Copy, so deriving Copy alone does not compile. Containers like Vec also use Clone internally when they are cloned themselves.
#[derive(Copy, Clone)]
struct JustTwoInts { /* as before */ }
Now, are there any downsides to this? Yes, in fact there is one rather big downside: because moving an object to another memory location is just done by copying bytes, and no custom logic, a type cannot have references into itself. In fact, Rust's lifetime system makes it impossible to construct such types safely.
But in my opinion, the trade-off is worth it.
Rust supports move semantics with features like these:
All types are moveable.
Sending a value somewhere is a move, by default, throughout the language. For non-Copy types, like Vec, the following are all moves in Rust: passing an argument by value, returning a value, assignment, pattern-matching by value.
You don't have std::move in Rust because it's the default. You're really using moves all the time.
Rust knows that moved values must not be used. If you have a value x: String and do channel.send(x), sending the value to another thread, the compiler knows that x has been moved. Trying to use it after the move is a compile-time error, "use of moved value". And you can't move a value while anyone holds a reference to it, because that reference would become a dangling pointer.
Rust knows not to call destructors on moved values. Moving a value transfers ownership, including responsibility for cleanup. Types don't have to be able to represent a special "value was moved" state.
Moves are cheap and the performance is predictable. It's basically memcpy. Returning a huge Vec is always fast—you're just copying three words.
The Rust standard library uses and supports moves everywhere. I already mentioned channels, which use move semantics to safely transfer ownership of values across threads. Other nice touches: all types support copy-free std::mem::swap() in Rust; the Into and From standard conversion traits are by-value; Vec and other collections have .drain() and .into_iter() methods so you can smash one data structure, move all the values out of it, and use those values to build a new one.
Rust doesn't have move references, but moves are a powerful and central concept in Rust, providing a lot of the same performance benefits as in C++, and some other benefits as well.
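A few of the points in this list can be sketched with the standard library alone. The function names below are invented for the example; the calls themselves (mpsc::channel, into_iter, std::mem::swap) are the ones mentioned above.

```rust
use std::sync::mpsc;
use std::thread;

/// Moves `message` into a channel; the receiving thread becomes its owner.
fn send_to_thread(message: String) -> usize {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        let received: String = rx.recv().expect("sender is alive");
        received.len()
    });
    tx.send(message).expect("receiver is alive");
    // `message` has been moved; using it here would be a compile-time error.
    worker.join().expect("worker did not panic")
}

/// Moves every string out of `words` and builds a new vector from them.
fn shout(words: Vec<String>) -> Vec<String> {
    words
        .into_iter()
        .map(|mut word| {
            word.push('!'); // reuses the moved string's buffer
            word
        })
        .collect()
}

/// Swaps two vectors by exchanging their headers; no elements are copied.
fn swapped(mut a: Vec<i32>, mut b: Vec<i32>) -> (Vec<i32>, Vec<i32>) {
    std::mem::swap(&mut a, &mut b);
    (a, b)
}

fn main() {
    println!("{}", send_to_thread("udon".to_string())); // 4
    println!("{:?}", shout(vec!["ramen".to_string()])); // ["ramen!"]
    println!("{:?}", swapped(vec![1], vec![2, 3])); // ([2, 3], [1])
}
```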
let s = vec!["udon".to_string(), "ramen".to_string(), "soba".to_string()];
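This is how it is represented in memory: s holds the vector's three header fields (buffer pointer, capacity, length), the buffer on the heap holds the three String headers, and each String points to its own text on the heap.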
Then let's assign s to t
let t = s;
This is what happens:
let t = s moved the vector's three header fields from s to t; now t is the owner of the vector. The vector's elements stayed just where they were, and nothing happened to the strings either. Every value still has a single owner.
Now s is uninitialized (nothing was freed; the value simply lives in t). If I write this
let u = s;
I get the error "use of moved value: s".
Rust applies move semantics to almost any use of a value (except for Copy types). Passing arguments to functions moves ownership to the function's parameters; returning a value from a function moves ownership to the caller. Building a tuple moves the values into the tuple. And so on.
Reference: Programming Rust by Jim Blandy, Jason Orendorff, and Leonora F. S. Tindall.
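The moves described in this answer can be chained together in one small sketch. The function names are hypothetical; the noodle vector is the one from the example above.

```rust
/// Returning the vector moves ownership to the caller.
fn make_noodles() -> Vec<String> {
    vec!["udon".to_string(), "ramen".to_string(), "soba".to_string()]
}

/// Passing by value moves ownership into the parameter `noodles`.
fn consume(noodles: Vec<String>) -> usize {
    noodles.len()
}

fn move_chain() -> (usize, i32) {
    let s = make_noodles(); // moved out of the function into `s`
    let t = s; // header fields moved from `s` to `t`
    // let u = s; // would not compile: use of moved value: `s`
    let pair = (t, 7); // building a tuple moves `t` into it
    let (noodles, n) = pair; // destructuring moves the vector out again
    (consume(noodles), n)
}

fn main() {
    println!("{:?}", move_chain()); // (3, 7)
}
```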
Primitive types are fixed size and cannot be empty, while non-primitive types such as String or Vec can grow and can be empty. Because primitive types have a fixed size, allocating memory for them and handling them is relatively easy, whereas handling growable types involves tracking how much memory they need as they grow, and other costly operations. With primitives Rust makes a copy; with non-primitives Rust does a move. (More precisely, the rule depends on whether the type implements Copy, which all primitive scalar types do.)
fn main() {
    // x lives on the stack; primitive types have a fixed size, so they can be stored there
    let x: i32 = 10;
    // The text "my name" is stored on the heap; s1 itself (pointer, capacity, length) lives on the stack.
    // s1 is the owner of that heap memory. If it were never freed, the memory would be leaked;
    // Rust uses ownership to decide when to free it.
    let s1 = String::from("my name");
    // Copying the text would add overhead, so ownership is MOVED from s1 to s2:
    // s2 now points to the same heap memory, and s1 can no longer be used.
    let s2 = s1;
    // s3 is a reference to s2, which in turn points to the heap memory; we BORROWED the value.
    // Borrowing works as in real life: you borrow a car from a friend, but its ownership does not change.
    let s3 = &s2;
    // clone() creates a new "my name" on the heap; s4 owns that new allocation.
    let s4 = s2.clone();
    // Use the values so the compiler does not warn about unused variables.
    println!("{} {} {} {}", x, s2, s3, s4);
}
Same principle applies when we pass primitive or non-primitive type arguments to a function:
fn main() {
    // i32 is Copy, so stack_var_fn receives a copy and stack_num remains unchanged
    let stack_num = 50;
    let heap_vec = vec![2, 3, 4];
    // Passing a Copy value to a function copies it; no move occurs here
    stack_var_fn(stack_num);
    println!("The stack_num inside the main fn did not change: {}", stack_num);
    // Passing heap_vec by value would move ownership into the function, and the vector
    // would be dropped when the function returns. To keep using it in main, we pass a
    // reference instead, using the "&" operator.
    heap_var_fn(&heap_vec);
    println!("The heap_vec inside main is: {:?}", heap_vec);
}
// A function whose argument is a Copy value stored on the stack
fn stack_var_fn(mut var: i32) {
    // This changes only the function's own copy of the argument
    var = 56;
    println!("Var inside stack_var_fn is: {}", var);
}
// A function that borrows a vector whose elements are stored on the heap
fn heap_var_fn(var: &Vec<i32>) {
    println!("Var: {:?}", var);
}
I would like to add that a move does not necessarily compile to a memcpy. If the object on the stack is large enough, Rust's compiler may choose to pass a pointer to the object instead.
In C++ the default assignment of classes and structs is a shallow copy. The values are copied, but not the data referenced by pointers, so modifying one instance changes the referenced data of all copies. The values (e.g. those used for bookkeeping) remain unchanged in the other instance, likely leaving it in an inconsistent state. Move semantics avoid this situation. Here is an example C++ implementation of a memory-managed container with move semantics:
template <typename T>
class object
{
T *p;
public:
object()
{
p=new T;
}
~object()
{
if (p != (T *)0) delete p;
}
template <typename V> //type V is used to allow for conversions between reference and value
object(object<V> &v) //copy constructor with move semantic
{
p = v.p; //move ownership
v.p = (T *)0; //make sure it does not get deleted
}
object &operator=(object<T> &v) //move assignment
{
delete p;
p = v.p;
v.p = (T *)0;
return *this;
}
T &operator*() { return *p; } //reference to object *d
T *operator->() { return p; } //pointer to object data d->
};
Such an object is automatically garbage collected and can be returned from functions to the calling program. It is extremely efficient and does the same as Rust does:
object<somestruct> somefn() //function returning an object
{
object<somestruct> a;
auto b=a; //move semantic; a becomes invalid
return b; //this moves the object to the caller
}
auto c=somefn();
//now c owns the data; memory is freed after leaving the scope
Forgive me if there is an obvious answer to the question I'm asking, but I just don't quite understand it.
The Dynamically Sized Types and the Sized Trait section in chapter 19.3 Advanced Types of the 《The Rust Programming Language》 mentions:
Rust needs to know how much memory to allocate for any value of a particular type, and all values of a type must use the same amount of memory. If Rust allowed us to write this code, these two str values would need to take up the same amount of space. But they have different lengths: s1 needs 12 bytes of storage and s2 needs 15. This is why it’s not possible to create a variable holding a dynamically sized type.
When it says "and all values of a type must use the same amount of memory", it is meant to refer to dynamically sized types, not types such as vectors or arrays, right? v1 and v2 are also unlikely to occupy the same amount of memory.
let v1 = vec![1, 2, 3];
let v2 = vec![1, 2, 3, 4, 5, 6];
It's correct and considers vectors as well. A Vec<T> is roughly just a pointer to a position on the heap, a capacity, and a length. It could be defined, more or less, as
pub struct Vec<T>(*mut T, usize, usize); // pointer, capacity, length
And every value of that structure clearly has the same size. When Rust says that every value of a type has to have the same size, it only refers to the size of the structure itself, not to the recursive size of all things it points to. Box<T> has a constant size for any given T, which is why Box can hold even things that are dynamically sized, such as trait objects. Likewise, String is basically just a pointer, a capacity, and a length.
Likewise, if we define
pub enum MyEnum {
A(i32),
B(i32, i32),
}
Then MyEnum::A is no smaller than MyEnum::B, for similar reasons, despite the latter having more data than the former.
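These claims can be checked with std::mem::size_of and size_of_val. The sketch below is only illustrative: the helper function names are made up, and the exact byte counts printed in main depend on the target platform, so none are listed.

```rust
use std::mem::{size_of, size_of_val};

pub enum MyEnum {
    A(i32),
    B(i32, i32),
}

/// Two vectors of different lengths still have the same size as values.
fn vec_header_size_is_constant() -> bool {
    let v1 = vec![1, 2, 3];
    let v2 = vec![1, 2, 3, 4, 5, 6];
    size_of_val(&v1) == size_of_val(&v2) && size_of_val(&v1) == size_of::<Vec<i32>>()
}

/// Both variants occupy the size of the whole enum.
fn enum_variants_share_size() -> bool {
    size_of_val(&MyEnum::A(1)) == size_of_val(&MyEnum::B(1, 2))
}

fn main() {
    println!("Vec<i32>: {} bytes", size_of::<Vec<i32>>());
    println!("String:   {} bytes", size_of::<String>());
    println!("Box<i32>: {} bytes", size_of::<Box<i32>>());
    println!("&str:     {} bytes", size_of::<&str>());
    println!("MyEnum:   {} bytes", size_of::<MyEnum>());
    println!("{}", vec_header_size_is_constant());
    println!("{}", enum_variants_share_size());
}
```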
Every type that can be stored and accessed without the indirection of a reference or Box must have the Sized trait implemented. This means that every instance of the type will have the same size. A str is a DST, as the data it holds can be of a variable length, and thus, you can only access strs as references, or from String, which holds the str data on the heap, through a pointer.
Every Vec also takes the same space, which is 24 bytes on a 64-bit machine.
For example:
let vec = vec![1, 2, 3, 4];
println!("{}", std::mem::size_of_val(&vec)); // Prints '24'.
According to The Rust book:
Each value in Rust has a variable that’s called its owner. There can be only one owner at a time. When the owner goes out of scope, the value will be dropped.
According to rust-lang.org:
Static items do not call drop at the end of the program.
After reading this SO post, and given the code below, I understand that foo is a value whose variable y, equivalent to &y since "string literals are string slices", is called its owner. Is that correct? Or do static items have no owner?
let x = String::from("foo"); // heap allocated, mutable, owned
let y = "foo"; // statically allocated to rust executable, immutable
I'm wondering because unlike an owned String, string literals are not moved, presumably because they're stored in .rodata in the executable.
fn main() {
let s1 = "foo"; // as opposed to String::from("foo")
let s2 = s1; // not moved
let s3 = s2; // no error, unlike String::from("foo")
}
UPDATE: According to The Rust book:
These ampersands are references, and they allow you to refer to some value without taking ownership of it...Another data type that does not have ownership is the slice.
Since string literals are string slices (&str) (see citation above), they, logically, do not have ownership. The rationale seems to be that the compiler requires a data structure with a known size: a reference:
let s1: str = "foo"; // [rustc E0277] the size for values of type `str` cannot be known at compilation time [E]
A string slice reference (&str) does not own the string slice that it points to; it borrows it. You can have several immutable references to an object, and shared references are Copy, which is why your second code sample is correct and the borrow checker is happy to accept it.
I think you can say that types with the 'static lifetime have no owner, or that something outside of the main function owns it. The owner only matters when the lifetime of the owning object ends (if you own it, you need to free resources). For references only lifetimes matter.
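A small sketch of this, with invented function names: assigning a &'static str copies only the reference (pointer and length), so every binding stays usable and they all point at the same bytes.

```rust
/// String literals have type `&'static str`.
fn literal() -> &'static str {
    "foo"
}

/// Copying a `&str` copies the pointer and length, never the bytes.
fn copies_point_to_same_bytes() -> bool {
    let s1 = literal();
    let s2 = s1; // copy of the reference; `s1` is still usable
    let s3 = s2;
    s1.as_ptr() == s3.as_ptr() && s1.len() == s3.len()
}

fn main() {
    println!("{}", literal()); // foo
    println!("{}", copies_point_to_same_bytes()); // true
}
```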
I've got a question about my code:
pub fn get_signals(path: &String) -> Vec<Vec<f64>> {
let mut rdr = csv::ReaderBuilder::new().delimiter(b';').from_path(&path).unwrap();
let mut signals: Vec<Vec<f64>> = Vec::new();
for record in rdr.records(){
let mut r = record.unwrap();
for (i, value) in r.iter().enumerate(){
match signals.get(i){
Some(_) => {},
None => signals.push(Vec::new())
}
signals[i].push(value.parse::<f64>().unwrap());
}
}
signals
}
How exactly does Rust handle return? When I, for example, write let signals = get_signals(&"data.csv".to_string()); does Rust assume I want a new instance of Vec (copying all the data) or does it just pass a pointer to the previously allocated (via Vec::new()) memory? What is the most efficient way to do this? Also, what happens with rdr? I assume, given Rust's memory safety, it's destroyed.
How exactly does Rust handle return?
The only guarantee Rust, the language, makes is that values are never cloned without an explicit .clone() in the code. Therefore, from a semantic point of view, the value is moved which will not require allocating memory.
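One way to see that returning moves the vector rather than cloning it is to compare the address of its heap buffer before and after. This is just a sketch with hypothetical helper names; it says nothing about how the three-word header itself is passed.

```rust
/// Builds a small vector; returning it moves ownership to the caller.
fn make_signals() -> Vec<f64> {
    vec![1.0, 2.0, 3.0]
}

/// Takes a vector by value and returns it again: two moves, no clone.
fn pass_through(v: Vec<f64>) -> Vec<f64> {
    v
}

/// Checks that the heap buffer is the same allocation after the moves.
fn buffer_is_reused() -> bool {
    let signals = make_signals();
    let before = signals.as_ptr();
    let after_move = pass_through(signals);
    before == after_move.as_ptr()
}

fn main() {
    println!("{}", buffer_is_reused()); // true
}
```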
does Rust assume I want a new instance of Vec(copies all the data) or just pass a pointer to previously allocated (via Vec::new()) memory?
This is implementation specific, and part of the ABI (Application Binary Interface). The Rust ABI is not formalized, and not stable, so there is no standard describing it and no guarantee about this holding up.
Furthermore, this will depend on whether the function call is inlined or not. If the function call is inlined, there is of course no return any longer yet the same behavior should be observed.
For small values, they should be returned via a register (or a couple of registers).
For larger values:
the caller should reserve memory on the stack (properly sized and aligned) and pass a pointer to this area to the callee,
the callee will then construct the return value at the place pointed to, so that by the time it returns the value exists there for the caller to use.
Note: the size here is the size on the stack, as returned by std::mem::size_of; so size_of::<Vec<_>>() == 24 on a 64-bit architecture.
What is the most efficient way to do this?
Returning is as efficient as it gets for a single call.
If however you find yourself in a situation where, say, you want to read a file line by line, then it makes sense to reuse the buffer from one call to the other which can be accomplished either by:
taking a &mut reference to the buffer (String or Vec<u8> say),
or taking a buffer by value and returning it.
The point being to avoid memory allocations.
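The first option might look like the sketch below, which reads input line by line with BufRead::read_line and reuses a single String. The function name and the sample data are made up for illustration.

```rust
use std::io::{self, BufRead};

/// Counts non-empty lines, reusing one `String` buffer for every line read.
fn count_non_empty_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new(); // allocated once, reused below
    let mut count = 0;
    loop {
        line.clear(); // drops the old contents but keeps the capacity
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        if !line.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}

fn main() {
    // A byte slice implements BufRead, so no file is needed for the example.
    let data = "1;2;3\n\n4;5;6\n";
    let count = count_non_empty_lines(data.as_bytes()).expect("reading from memory");
    println!("{count}"); // 2
}
```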