Consider this example:
fn main() {
    let v: Vec<i32> = vec![1, 2, 3, 4, 5];
    let b: i32 = (&v[2]) * 4.0;
    println!("product of third value with 4 is {}", b);
}
This fails as expected, since a float can't be multiplied with an &i32.
error[E0277]: cannot multiply `{float}` to `&i32`
--> src\main.rs:3:23
|
3 | let b: i32 = (&v[2]) * 4.0;
| ^ no implementation for `&i32 * {float}`
|
= help: the trait `std::ops::Mul<{float}>` is not implemented for `&i32`
But when I change the float to int, it works fine.
fn main() {
    let v: Vec<i32> = vec![1, 2, 3, 4, 5];
    let b: i32 = (&v[2]) * 4;
    println!("product of third value with 4 is {}", b);
}
Did the compiler implement the operation between &i32 and i32?
If yes, how is this operation justified in such a type safe language?
Did the compiler implement the operation between &i32 and i32?
Yes. Well, not the compiler, but rather the standard library. You can see the impl in the documentation.
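The pattern behind those impls looks roughly like this (a sketch using a hypothetical newtype MyInt; the real impls for the primitive types are macro-generated): an owned impl plus forwarding impls for references.
use std::ops::Mul;

// A hypothetical wrapper type, just to show the pattern the standard library
// uses for the primitive integer types.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MyInt(i32);

impl Mul<MyInt> for MyInt {
    type Output = MyInt;
    fn mul(self, rhs: MyInt) -> MyInt {
        MyInt(self.0 * rhs.0)
    }
}

// The forwarding impl: `&MyInt * MyInt` just copies out of the reference and
// delegates to the owned impl.
impl Mul<MyInt> for &MyInt {
    type Output = MyInt;
    fn mul(self, rhs: MyInt) -> MyInt {
        *self * rhs
    }
}

fn main() {
    let v = vec![MyInt(1), MyInt(2), MyInt(3)];
    assert_eq!((&v[2]) * MyInt(4), MyInt(12));
}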
If yes, how is this operation justified in such a type safe language?
"Type safe" is not a Boolean property, but rather a spectrum. Most C++ programmers would say that C++ is type safe. Yet, C++ has many features that automatically cast between types (constructors, operator T, taking references of values, ...). When designing a programming language, one has to balance the risk of bugs (when introducing convenient type conversions) with the inconvenience (when not having them).
As an extreme example: consider if Option<T> were to deref to T and panic if it were None. That's the behavior of most languages that have null. I think it's pretty clear that this "feature" has led to numerous bugs in the real world (search term "billion dollar mistake"). On the other hand, let's consider what bugs could be caused by having &i32 * i32 compile. I honestly can't think of any. Maaaybe someone wanted to multiply the raw pointer of one value with an integer? Rather unlikely in Rust. So since the chance of introducing bugs with this feature is very low, and it is convenient, it was implemented.
This is always something the designers have to balance. Different languages sit at different points on this spectrum. Rust would likely be considered "more typesafe" than C++, but no doubt there are even "more typesafe" languages than Rust out there. In this context, "more typesafe" just means: decisions leaned more towards "inconvenience instead of potential bugs".
I think you may be confusing &i32 from rust with &var from C.
In C,
int var = 5;
int newvar = &var * 4; /* bad code: you should not multiply an address
                          by an integer! (In fact C rejects this exact
                          expression; you would have to cast the pointer
                          to an integer type first.) */
the '&' operator returns the address of the variable 'var'.
However, in Rust, the '&' operator borrows the variable var.
In Rust,
let var: i32 = 5;
assert_eq!(&var * 8, 40);
This works because &var is a reference to the value 5, and the arithmetic operators are implemented for references to integers as well. Note that in C, & is the address-of operator; in Rust, & borrows the value, and the resulting type is written &i32.
This is very confusing. If there were more characters left on a standard keyboard, I am sure the designers would have used a different one.
Please see the book and carefully follow the diagrams. The examples in the book use String, which is allocated on the heap. Primitives, like i32, are normally allocated on the stack and may be completely optimized away by the compiler. Also, primitives are frequently copied even when reference notation is used, so that gets confusing. Still, I think it is easier to look at the heap examples using String first and then to consider how this would apply to primitives. The logic is the same, but the actual storage and optimization may be different.
It’s very simple actually: Rust will automatically dereference references for you. It’s not like C where you have to dereference a pointer yourself. Rust references are very similar to C++ references in this regard.
Related
Why does this work fine:
let items = [1, 2, 3];
let mut cumulator = 0;
for next in items.iter() {
    cumulator += next;
}
println!("Final {}", cumulator);
But this fail?:
let items = [1, 2, 3];
let mut cumulator = 0;
for next in items.iter() {
    cumulator += next.pow(2);
}
println!("Final {}", cumulator);
Error on .pow(2):
no method named `pow` found for reference `&{integer}` in the current scope
method not found in `&{integer}`rustc (E0599)
My IDE identifies next as i32 and the first code example works fine. But the compiler has an issue the moment I call next.pow() or any other method on next. The compiler complains that next is an ambiguous integer type.
Sure, I can fix this by either explicitly giving the array a type (e.g. [i32; 3]), or by using an interim variable before cumulator which is explicitly declared as i32. But these seem unnecessary and a bit clunky.
So why is compiler happy in the first case and not in the second?
Calling methods on objects is kind of funny, because it conveys zero information. That is, if I write
a + b
Then, even if Rust knows nothing about a and b, it can now assume that a is Add where the Rhs type is the type of b. We can refine the types and, hopefully, get more information down the road. Similarly, if I write foobar(), where foobar is a local variable, then Rust knows it has to be at least FnOnce.
However, if foo is a variable, and I write
foo.frobnicate()
Then Rust has no idea what to do with that. Is it an inherent impl on some type? Is it a trait function? It could be literally anything. If it's inherent, then it could even be in a module that we haven't imported, so we can't simply check everything that's in scope.
In your case, pow isn't even a trait function, it's actually several different functions. Even if it was a trait function, we couldn't say anything, because we don't, a priori, know which trait. So Rust sees next.pow(2) and bails out immediately, rather than trying to do something unexpected.
In your other case, Rust is able to infer the type. At the end of the function, all it knows about the type is that it's an {integer} on which Add is defined, and Rust has integer defaulting rules that kick in to turn that into i32, in the absence of any other information.
Could they have applied integer defaulting to next.pow(2)? Possibly, but I'm glad they didn't. Integers are already a special case in Rust (integers and floats are the only types with polymorphic literals), so minimizing the amount of special casing required by the compiler is, at least in my mind, a good thing. The defaulting rules kicked in in the first example because nothing caused it to bail out, and they would have in the second if it hadn't already encountered the bigger error condition of "calling an impl function on an unknown type".
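For completeness, the fix mentioned in the question is simply to pin down the element type somewhere, for example on the array itself; a minimal sketch:
fn main() {
    // Annotating the array gives `next` the concrete type &i32, and method
    // resolution then auto-derefs it to find the inherent i32::pow.
    let items: [i32; 3] = [1, 2, 3];
    let mut cumulator = 0;
    for next in items.iter() {
        cumulator += next.pow(2);
    }
    println!("Final {}", cumulator); // Final 14
}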
In Rust, there are two possibilities to take a reference
Borrow, i.e., take a reference but don't allow mutating the reference destination. The & operator borrows ownership from a value.
Borrow mutably, i.e., take a reference to mutate the destination. The &mut operator mutably borrows ownership from a value.
The Rust documentation about borrowing rules says:
First, any borrow must last for a scope no greater than that of the
owner. Second, you may have one or the other of these two kinds of
borrows, but not both at the same time:
one or more references (&T) to a resource,
exactly one mutable reference (&mut T).
I believe that taking a reference is creating a pointer to the value and accessing the value by the pointer. This could be optimized away by the compiler if there is a simpler equivalent implementation.
However, I don't understand what move means and how it is implemented.
For types implementing the Copy trait it means copying e.g. by assigning the struct member-wise from the source, or a memcpy(). For small structs or for primitives this copy is efficient.
And for move?
This question is not a duplicate of What are move semantics? because Rust and C++ are different languages and move semantics are different between the two.
Semantics
Rust implements what is known as an Affine Type System:
Affine types are a version of linear types imposing weaker constraints, corresponding to affine logic. An affine resource can only be used once, while a linear one must be used once.
Types that are not Copy, and are thus moved, are Affine Types: you may use them either once or never, nothing else.
Rust qualifies this as a transfer of ownership in its Ownership-centric view of the world (*).
(*) Some of the people working on Rust are much more qualified than I am in CS, and they knowingly implemented an Affine Type System; however contrary to Haskell which exposes the math-y/cs-y concepts, Rust tends to expose more pragmatic concepts.
Note: it could be argued that Affine Types returned from a function tagged with #[must_use] are actually Linear Types from my reading.
Implementation
It depends. Please keep in mind that Rust is a language built for speed, and there are numerous optimization passes at play here, which will depend on the compiler used (rustc + LLVM, in our case).
Within a function body (playground):
fn main() {
let s = "Hello, World!".to_string();
let t = s;
println!("{}", t);
}
If you check the LLVM IR (in Debug), you'll see:
%_5 = alloca %"alloc::string::String", align 8
%t = alloca %"alloc::string::String", align 8
%s = alloca %"alloc::string::String", align 8
%0 = bitcast %"alloc::string::String"* %s to i8*
%1 = bitcast %"alloc::string::String"* %_5 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %0, i64 24, i32 8, i1 false)
%2 = bitcast %"alloc::string::String"* %_5 to i8*
%3 = bitcast %"alloc::string::String"* %t to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 24, i32 8, i1 false)
Underneath the covers, rustc invokes a memcpy from the result of "Hello, World!".to_string() to s and then to t. While it might seem inefficient, checking the same IR in Release mode you will realize that LLVM has completely elided the copies (realizing that s was unused).
The same situation occurs when calling a function: in theory you "move" the object into the function stack frame, however in practice if the object is large the rustc compiler might switch to passing a pointer instead.
Another situation is returning from a function, but even then the compiler might apply "return value optimization" and build directly in the caller's stack frame -- that is, the caller passes a pointer into which to write the return value, which is used without intermediary storage.
The ownership/borrowing constraints of Rust enable optimizations that are difficult to reach in C++ (which also has RVO but cannot apply it in as many cases).
So, the digest version:
moving large objects is inefficient, but there are a number of optimizations at play that might elide the move altogether
moving involves a memcpy of std::mem::size_of::<T>() bytes, so moving a large String is cheap: it only copies the three-word handle (pointer, length, capacity), regardless of the size of the heap buffer it points to
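To make the second point concrete, here is a small sketch (the size assumes a typical 64-bit target):
fn main() {
    // On a typical 64-bit target this prints 24: a String is just a
    // (pointer, length, capacity) triple, no matter how big its buffer is.
    println!("size_of::<String>() = {}", std::mem::size_of::<String>());

    let s = "Hello, World! ".repeat(1_000_000); // a ~14 MB heap buffer
    let t = s; // the move copies only the small triple, not the megabytes
    println!("t.len() = {}", t.len());
}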
When you move an item, you are transferring ownership of that item. That's a key component of Rust.
Let's say I had a struct, and then I assign the struct from one variable to another. By default, this will be a move, and I've transferred ownership. The compiler will track this change of ownership and prevent me from using the old variable any more:
pub struct Foo {
    value: u8,
}

fn main() {
    let foo = Foo { value: 42 };
    let bar = foo;
    println!("{}", foo.value); // error: use of moved value: `foo.value`
    println!("{}", bar.value);
}
how it is implemented.
Conceptually, moving something doesn't need to do anything. In the example above, there wouldn't be a reason to actually allocate space somewhere and then move the allocated data when I assign to a different variable. I don't actually know what the compiler does, and it probably changes based on the level of optimization.
For practical purposes though, you can think that when you move something, the bits representing that item are duplicated as if via memcpy. This helps explain what happens when you pass a variable to a function that consumes it, or when you return a value from a function (again, the optimizer can do other things to make it efficient, this is just conceptually):
// Ownership is transferred from the caller to the callee
fn do_something_with_foo(foo: Foo) {}
// Ownership is transferred from the callee to the caller
fn make_a_foo() -> Foo { Foo { value: 42 } }
"But wait!", you say, "memcpy only comes into play with types implementing Copy!". This is mostly true, but the big difference is that when a type implements Copy, both the source and the destination are valid to use after the copy!
One way of thinking of move semantics is the same as copy semantics, but with the added restriction that the thing being moved from is no longer a valid item to use.
However, it's often easier to think of it the other way: The most basic thing that you can do is to move / give ownership away, and the ability to copy something is an additional privilege. That's the way that Rust models it.
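For contrast, a sketch of the earlier example with Copy derived: the assignment now copies, so both bindings stay valid.
// Same example as above, but with Copy derived: the assignment performs a
// copy, so `foo` remains valid after it.
#[derive(Clone, Copy)]
pub struct Foo {
    value: u8,
}

fn main() {
    let foo = Foo { value: 42 };
    let bar = foo;             // a copy, not a move
    println!("{}", foo.value); // still fine
    println!("{}", bar.value);
}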
This is a tough question for me! After using Rust for a while the move semantics are natural. Let me know what parts I've left out or explained poorly.
Rust's move keyword always bothers me so, I decided to write my understanding which I obtained after discussion with my colleagues.
I hope this might help someone.
let x = 1;
In the above statement, x is a variable whose value is 1. Now,
let y = || println!("y is a variable whose value is a closure");
So, the move keyword is used to transfer ownership of a variable to the closure.
In the example below, without move, x is not owned by the closure. Hence x is not owned by y and remains available for further use.
let x = 1;
let y = || println!("this is a closure that prints x = {}". x);
On the other hand, in this next case, x is owned by the closure: it is moved into y. (Strictly speaking, because i32 is Copy the closure actually captures a copy here and x stays usable; with a non-Copy type such as String, x would no longer be available after the move.)
let x = 1;
let y = move || println!("this is a closure that prints x = {}", x);
By owning I mean containing as a member variable. The example cases above are in the same situation as the following two cases. We can also assume the below explanation as to how the Rust compiler expands the above cases.
The former (without move, i.e. no transfer of ownership):
struct ClosureObject<'a> {
    x: &'a u32,
}
let x = 1;
let y = ClosureObject {
    x: &x,
};
The latter (with move, i.e. transfer of ownership):
struct ClosureObject {
    x: u32,
}
let x = 1;
let y = ClosureObject {
    x: x,
};
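Since i32 is Copy, the ownership transfer is easier to observe with a non-Copy type such as String; a minimal sketch:
fn main() {
    let s = String::from("hello");
    let y = move || println!("this closure now owns s = {}", s);
    y();
    // println!("{}", s); // error[E0382]: borrow of moved value: `s`
}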
Please let me answer my own question. I had trouble, but by asking a question here I did Rubber Duck Problem Solving. Now I understand:
A move is a transfer of ownership of the value.
For example the assignment let x = a; transfers ownership: At first a owned the value. After the let it's x who owns the value. Rust forbids using a thereafter.
In fact, if you do println!("a: {:?}", a); after the let, the Rust compiler says:
error: use of moved value: `a`
println!("a: {:?}", a);
^
Complete example:
#[derive(Debug)]
struct Example { member: i32 }
fn main() {
    let a = Example { member: 42 }; // A struct is moved
    let x = a;
    println!("a: {:?}", a);
    println!("x: {:?}", x);
}
And what does this move mean?
It seems that the concept comes from C++11. A document about C++ move semantics says:
From a client code point of view, choosing move instead of copy means that you don't care what happens to the state of the source.
Aha. C++11 does not care what happens with the source. So in this vein, Rust is free to decide to forbid use of the source after a move.
And how it is implemented?
I don't know. But I can imagine that Rust does literally nothing. x is just a different name for the same value. Names usually are compiled away (except of course debugging symbols). So it's the same machine code whether the binding has the name a or x.
It seems C++ does the same in copy constructor elision.
Doing nothing is the most efficient possible.
Passing a value to a function also results in a transfer of ownership; it is very similar to the other examples:
struct Example { member: i32 }

fn take(ex: Example) {
    // 2) Now ex points to the data a was pointing to in main.
    println!("a.member: {}", ex.member);
    // 3) When ex goes out of scope, so does the access to the data it
    //    was pointing to, and Rust frees that memory.
}

fn main() {
    let a = Example { member: 42 };
    take(a); // 1) The ownership is transferred to the function take.
    // 4) We can no longer use a to access the data it pointed to.
    println!("a.member: {}", a.member);
}
Hence the expected error:
post_test_7.rs:12:30: 12:38 error: use of moved value: `a.member`
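For completeness, the usual way to keep using a after the call is to borrow instead of moving; a minimal sketch of that variant:
struct Example { member: i32 }

// Borrowing instead of taking ownership leaves the caller's binding usable.
fn take(ex: &Example) {
    println!("ex.member: {}", ex.member);
}

fn main() {
    let a = Example { member: 42 };
    take(&a);                           // only a borrow is passed
    println!("a.member: {}", a.member); // a still owns its data
}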
let s1: String = String::from("hello");
let s2: String = s1;
To ensure memory safety, Rust invalidates s1, so instead of being a shallow copy, this is called a move.
fn main() {
    // Each value in Rust has a variable that is called its owner.
    // There can only be one owner at a time.
    let s = String::from("hello");
    take_ownership(s);
    println!("{}", s);
    // Error: borrow of moved value `s` - value borrowed here after move, so
    // `s` cannot be used after the move. Passing a parameter to a function is
    // the same as assigning it to another variable: passing `s` moves it into
    // the `my_string` parameter, then `println!("{}", my_string)` runs and
    // prints it. When that scope ends, `my_string` is dropped.

    let x: i32 = 2;
    makes_copy(x);
    // Instead of being moved, integers are copied: we can still use `x` after
    // the call. Primitive types are Copy and are stored on the stack because
    // their size is known at compile time.
    println!("{}", x);
}

fn take_ownership(my_string: String) {
    println!("{}", my_string);
}

fn makes_copy(some_integer: i32) {
    println!("{}", some_integer);
}
I'm implementing Conway's game of life to teach myself Rust. The idea is to implement a single-threaded version first, optimize it as much as possible, then do the same for a multi-threaded version.
I wanted to implement an alternative data layout which I thought might be more cache-friendly. The idea is to store the status of two cells for each point on a board next to each other in memory in a vector, one cell for reading the current generation's status from and one for writing the next generation's status to, alternating the access pattern for each
generation's computation (which can be determined at compile time).
The basic data structures are as follows:
#[repr(u8)]
pub enum CellStatus {
    DEAD,
    ALIVE,
}

/** 2 bytes */
pub struct CellRW(CellStatus, CellStatus);

pub struct TupleBoard {
    width: usize,
    height: usize,
    cells: Vec<CellRW>,
}

/** used to keep track of current pos with iterator e.g. */
pub struct BoardPos {
    x_pos: usize,
    y_pos: usize,
    offset: usize,
}

pub struct BoardEvo {
    board: TupleBoard,
}
The function that is causing me troubles:
impl BoardEvo {
    fn evolve_step<T: RWSelector>(&mut self) {
        for (pos, cell) in self.board.iter_mut() {
            // pos: BoardPos, cell: &mut CellRW
            let read: &CellStatus = T::read(cell); // chooses the right tuple half for the current evolution step
            let write: &mut CellStatus = T::write(cell);

            let alive_count = pos.neighbours::<T>(&self.board).iter() // <- can't borrow self.board again!
                .filter(|&&status| status == CellStatus::ALIVE)
                .count();

            *write = CellStatus::evolve(*read, alive_count);
        }
    }
}
impl BoardPos {
    /* ... */
    pub fn neighbours<T: RWSelector>(&self, board: &TupleBoard) -> [CellStatus; 8] {
        /* ... */
    }
}
The trait RWSelector has static functions for reading from and writing to a cell tuple (CellRW). It is implemented for two zero-sized types L and R and is mainly a way to avoid having to write different methods for the different access patterns.
The iter_mut() method returns a BoardIter struct which is a wrapper around a mutable slice iterator for the cells vector and thus has &mut CellRW as Item type. It is also aware of the current BoardPos (x and y coordinates, offset).
I thought I'd iterate over all cell tuples, keep track of the coordinates, count the number of alive neighbours (I need to know coordinates/offsets for this) for each (read) cell, compute the cell status for the next generation and write to the respective another half of the tuple.
Of course, in the end, the compiler showed me the fatal flaw in my design, as I borrow self.board mutably in the iter_mut() method and then try to borrow it again immutably to get all the neighbours of the read cell.
I have not been able to come up with a good solution for this problem so far. I did manage to get it working by making all
references immutable and then using an UnsafeCell to turn the immutable reference to the write cell into a mutable one.
I then write to the nominally immutable reference to the writing part of the tuple through the UnsafeCell.
However, that doesn't strike me as a sound design and I suspect I might run into issues with this when attempting to parallelize things.
Is there a way to implement the data layout I proposed in safe/idiomatic Rust or is this actually a case where you actually have to use tricks to circumvent Rust's aliasing/borrow restrictions?
Also, as a broader question, is there a recognizable pattern for problems which require you to circumvent Rust's borrow restrictions?
When is it necessary to circumvent Rust's borrow checker?
It is needed when:
the borrow checker is not advanced enough to see that your usage is safe
you do not wish to (or cannot) write the code in a different pattern
As a concrete case, the compiler cannot tell that this is safe:
let mut array = [1, 2];
let a = &mut array[0];
let b = &mut array[1];
The compiler doesn't know what the implementation of IndexMut for a slice does at this point of compilation (this is a deliberate design choice). For all it knows, arrays always return the exact same reference, regardless of the index argument. We can tell that this code is safe, but the compiler disallows it.
You can rewrite this in a way that is obviously safe to the compiler:
let mut array = [1, 2];
let (a, b) = array.split_at_mut(1);
let a = &mut a[0];
let b = &mut b[0];
How is this done? split_at_mut performs a runtime check to ensure that it actually is safe:
fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
    let len = self.len();
    let ptr = self.as_mut_ptr();
    unsafe {
        assert!(mid <= len);
        (from_raw_parts_mut(ptr, mid),
         from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
    }
}
For an example where the borrow checker is not yet as advanced as it can be, see What are non-lexical lifetimes?.
I borrow self.board mutably in the iter_mut() method and then try to borrow it again immutably to get all the neighbours of the read cell.
If you know that the references don't overlap, then you can choose to use unsafe code to express it. However, this means you are also choosing to take on the responsibility of upholding all of Rust's invariants and avoiding undefined behavior.
The good news is that this heavy burden is what every C and C++ programmer has to (or at least should) have on their shoulders for every single line of code they write. At least in Rust, you can let the compiler deal with 99% of the cases.
In many cases, there are tools like Cell and RefCell to allow for interior mutation. In other cases, you can rewrite your algorithm to take advantage of a value being a Copy type. In other cases you can use an index into a slice for a shorter period. In other cases you can have a multi-phase algorithm.
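For the board evolution in the question, such a multi-phase approach could look roughly like this (a sketch with illustrative names and types, not the question's actual API): a read-only pass computes the next generation into a scratch buffer, and a second pass writes it back once no shared borrow remains.
#[derive(Clone, Copy, PartialEq)]
enum Cell { Dead, Alive }

// Counting live neighbours only needs shared access to the board.
fn alive_neighbours(cells: &[Cell], width: usize, idx: usize) -> usize {
    let height = cells.len() / width;
    let (x, y) = ((idx % width) as isize, (idx / width) as isize);
    let mut count = 0;
    for dy in -1..=1 {
        for dx in -1..=1 {
            if dx == 0 && dy == 0 { continue; }
            let (nx, ny) = (x + dx, y + dy);
            if nx >= 0 && ny >= 0 && (nx as usize) < width && (ny as usize) < height
                && cells[ny as usize * width + nx as usize] == Cell::Alive
            {
                count += 1;
            }
        }
    }
    count
}

fn step(cells: &mut Vec<Cell>, width: usize) {
    // Phase 1: a read-only pass over a shared borrow computes the next state.
    let current: &[Cell] = cells;
    let next: Vec<Cell> = (0..current.len())
        .map(|i| match (current[i], alive_neighbours(current, width, i)) {
            (Cell::Alive, 2) | (_, 3) => Cell::Alive,
            _ => Cell::Dead,
        })
        .collect();
    // Phase 2: the shared borrow has ended, so writing back is allowed.
    *cells = next;
}

fn main() {
    // A 3x3 board containing a horizontal "blinker" in the middle row.
    let mut board = vec![Cell::Dead; 9];
    board[3] = Cell::Alive;
    board[4] = Cell::Alive;
    board[5] = Cell::Alive;
    step(&mut board, 3);
    // After one step the blinker is vertical: still exactly three live cells.
    assert_eq!(board.iter().filter(|&&c| c == Cell::Alive).count(), 3);
}
This trades a temporary allocation per step for code the borrow checker accepts; the question's double-buffered layout achieves the same by keeping reads and writes in separate passes.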
If you do need to resort to unsafe code, then try your best to hide it in a small area and expose safe interfaces.
Above all, many common problems have been asked about (many times) before:
How to iterate over mutable elements inside another mutable iteration over the same elements?
Mutating an item inside of nested loops
How can a nested loop with mutations on a HashMap be achieved in Rust?
What's the Rust way to modify a structure within nested loops?
Nesting an iterator's loops
I like using partial application because (among other things) it lets me split up a complicated function call, which is more readable.
An example of partial application:
fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    let add7 = |x| add(7, x);
    println!("{}", add7(35));
}
Is there overhead to this practice?
Here is the kind of thing I like to do (from real code):
fn foo(n: u32, mut things: Vec<Things>) {
    let create_new_multiplier = |thing| ThingMultiplier::new(thing, n); // ThingMultiplier is an Iterator
    let new_things = things.clone().into_iter().flat_map(create_new_multiplier);
    things.extend(new_things);
}
This is purely visual; I do not like to nest things too deeply.
There should not be a performance difference between defining the closure before it's used versus defining and using it directly. There is a type system difference — the compiler doesn't fully know how to infer types in a closure that isn't immediately called.
In code:
let create_new_multiplier = |thing| ThingMultiplier::new(thing, n);
things.clone().into_iter().flat_map(create_new_multiplier)
will be the exact same as
things.clone().into_iter().flat_map(|thing| {
    ThingMultiplier::new(thing, n)
})
In general, there should not be a performance cost for using closures. This is what Rust means by "zero cost abstraction": the programmer could not have written it better themselves.
The compiler converts a closure into implementations of the Fn* traits on an anonymous struct. At that point, all the normal compiler optimizations kick in. Because of techniques like monomorphization, it may even be faster. This does mean that you need to do normal profiling to see if they are a bottleneck.
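As a rough illustration of that desugaring (the names are made up, and the real thing implements the unstable Fn/FnMut/FnOnce traits, which cannot be written by hand on stable Rust), the add7 closure from the question behaves like an anonymous struct with a call method:
fn add(x: i32, y: i32) -> i32 {
    x + y
}

// Roughly what `|x| add(7, x)` turns into: a (here field-less, since nothing
// is captured) struct plus a call method standing in for the Fn* operator.
struct Add7;

impl Add7 {
    fn call(&self, x: i32) -> i32 {
        add(7, x)
    }
}

fn main() {
    let add7 = Add7;
    println!("{}", add7.call(35)); // 42, same result as the closure version
}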
In your particular example, yes, extend can get inlined as a loop, containing another loop for the flat_map which in turn just puts ThingMultiplier instances into the same stack slots holding n and thing.
But you're barking up the wrong efficiency tree here. Instead of wondering whether an allocation of a small struct holding two fields gets optimized away you should rather wonder how efficient that clone is, especially for large inputs.
I have the following simple program
fn main() {
    let a = 10;
    let b: i32;
    let r: &i32;
    b = a; // move?
    r = &a; // borrow?
    println!("{}", a);
    println!("{}", b);
    println!("{}", r);
    println!("{}", &r);
    println!("{}", *r);
}
The output is
10
10
10
10
10
The first print does not fail even when the value is moved. Is this because of primitive type or am I missing something?
The second print seems ok.
The third one prints a reference directly - shouldn't we get the memory address as this is a reference?
The fourth print is a reference to a reference, which should print a memory address, I think?
The fifth print seems reasonable as (I think) * is the value at operator that de-references the reference.
It seems I am not quite getting the whole thing.
Please explain in detail what's going on.
Related:
Move vs Copy in Rust
1, 2 => You are working with i32, which is Copy, so in practice b = a.clone()
3, 4, 5 => You're confused by the Deref trait. I find it easier to reason about ownership/borrowing than about references in Rust: r = &a means r borrows a so I can access its value later on; someone else owns it and will take care of dropping it.
Regarding 1: Yes, because it's a primitive variable, more specifically a type that implements the Copy trait. All those Copy-types work with copy semantics instead of move semantics.
Regarding 3: println! automatically dereferences its arguments -- this is what the user wants in 99% of all cases.
Regarding 4: Again, automatically dereferences arguments... until it's a non-reference type.
The other answers are mostly right, but have some small errors.
1. i32 implements Copy, so when you assign it to a second variable binding, the first binding does not need to be invalidated. Any type that implements Copy will have this property.
3. You have asked to format the value with {} which corresponds to the Display trait. There is an implementation of this trait for references to types that implement Display:
impl<'a, T> Display for &'a T where T: Display + ?Sized {
fn fmt(&self, f: &mut Formatter) -> Result { Display::fmt(&**self, f) }
}
This simply delegates to the implementation of the referred-to type.
4. The same as #3 - a reference to a reference to a type that implements Display will just delegate twice. Deref does not come into play.
Here's the sneaky thing that no one else has mentioned. println! is a macro, which means it has more power than a regular function call. One of the things that it does is automatically take a reference to any arguments. That's what allows you to print out a value that doesn't implement Copy without losing ownership.
With this code:
let a = 10;
println!("{}", a);
The expanded version is actually something like this (slightly cleaned up):
let a = 10;
static __STATIC_FMTSTR: &'static [&'static str] = &["", "\n"];
::std::io::_print(::std::fmt::Arguments::new_v1(
    __STATIC_FMTSTR,
    &match (&a,) {
        (__arg0,) => [::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Display::fmt)],
    },
));
Therefore, everything passed to println! is a reference. It wouldn't be very useful if references printed out memory addresses.
Besides the usefulness, Rust focuses more on value semantics as opposed to reference semantics. When you have values moving and changing addresses frequently, the location of the value isn't very consistent or useful.
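If you do want the address, the {:p} format specifier (the Pointer trait) prints it; a small sketch:
fn main() {
    let a = 10;
    let r = &a;
    // `{}` goes through Display and shows the referred-to value...
    println!("{}", r);   // 10
    // ...while `{:p}` uses the Pointer trait and shows the address.
    println!("{:p}", r); // e.g. 0x7ffd0a5c8c74 (varies per run)
}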
See also
Auto-dereference when printing a pointer, or did I miss something?
Reference to a vector still prints as a vector?