A very common operation when implementing algorithms is the cyclic rotate: given, say, 3 variables a, b, c, change them to the effect of:
t ⇽ c
c ⇽ b
b ⇽ a
a ⇽ t
Given that everything is bitwise swappable, cyclic rotation should be an area where Rust excels more than any other language I know of.
For comparison, in C++ the most efficient generic way to rotate N elements is to perform N+1 std::move operations, which for a typical move-constructor implementation roughly leads to 3 · (N+1) · sizeof(T) word assignments (this can be improved for PODs by template-specializing rotate, but that requires extra work).
In Rust, the language makes it possible to implement rotate with only (N+1) · size_of::<T>() word assignments. To my surprise, I could not find standard library support for rotation (there is no rotate method in std::mem). It would probably look like this:
pub fn rotate<T>(x: &mut T, y: &mut T, z: &mut T) {
    unsafe {
        // Move the bits around through a temporary, without running Drop
        // or creating duplicate owned values.
        let mut t: T = std::mem::uninitialized();
        std::ptr::copy_nonoverlapping(&*z, &mut t, 1);
        std::ptr::copy_nonoverlapping(&*y, z, 1);
        std::ptr::copy_nonoverlapping(&*x, y, 1);
        std::ptr::copy_nonoverlapping(&t, x, 1);
        // The bits of t now live in *x; forget t so they are not dropped twice.
        std::mem::forget(t);
    }
}
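On newer Rust, std::mem::uninitialized is deprecated in favour of MaybeUninit; a minimal sketch of the same idea (my adaptation, not part of the original question) could look like this:

use std::mem::MaybeUninit;
use std::ptr;

pub fn rotate<T>(x: &mut T, y: &mut T, z: &mut T) {
    unsafe {
        // Temporary storage for the bits of *z; MaybeUninit never drops its
        // contents, so no mem::forget is needed at the end.
        let mut t = MaybeUninit::<T>::uninit();
        ptr::copy_nonoverlapping(&*z, t.as_mut_ptr(), 1);
        ptr::copy_nonoverlapping(&*y, z, 1);
        ptr::copy_nonoverlapping(&*x, y, 1);
        ptr::copy_nonoverlapping(t.as_ptr(), x, 1);
    }
}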
For clarification on why rotation cannot be implemented efficiently in C++, consider:
struct String {
    char *data1;
    char *data2;

    String(String &&other) : data1(other.data1), data2(other.data2) {
        other.data1 = other.data2 = nullptr;
    }

    String &operator=(String &&other) {
        std::swap(data1, other.data1);
        std::swap(data2, other.data2);
        return *this;
    }

    ~String() {
        delete[] data1;
        delete[] data2;
    }
};
Here an operation like s2 = std::move(s1); takes 3 pointer assignments per member field, for a total of 6, since a pointer swap requires 3 assignments (one into the temporary, one out of the temporary, and one across the operands).
Is there a standard way of cyclically rotating mutable variables in Rust?
No.
I'd just swap the variables twice, no need for unsafe:
use std::mem;

pub fn rotate<T>(x: &mut T, y: &mut T, z: &mut T) {
    mem::swap(x, y);
    mem::swap(y, z);
}
fn main() {
    let mut a = 1;
    let mut b = 2;
    let mut c = 3;

    println!("{}, {}, {}", a, b, c);
    // 1, 2, 3

    rotate(&mut a, &mut b, &mut c);

    println!("{}, {}, {}", a, b, c);
    // 2, 3, 1
}
This produces 7 movl instructions (Rust 1.35.0, Release, x86_64, Linux)
playground::rotate:
movl (%rdi), %eax
movl (%rsi), %ecx
movl %ecx, (%rdi)
movl %eax, (%rsi)
movl (%rdx), %ecx
movl %ecx, (%rsi)
movl %eax, (%rdx)
retq
As opposed to the original 6 movl instructions:
playground::rotate_original:
movl (%rdx), %eax
movl (%rsi), %ecx
movl %ecx, (%rdx)
movl (%rdi), %ecx
movl %ecx, (%rsi)
movl %eax, (%rdi)
retq
I'm OK giving up that single instruction for purely safe code that is also easier to reason about.
In "real" code, I'd make use of the fact that all the variables are the same type and that slice::rotate_left and slice::rotate_right exist:
fn main() {
    let mut vals = [1, 2, 3];

    let [a, b, c] = &vals;
    println!("{}, {}, {}", a, b, c);
    // 1, 2, 3

    vals.rotate_left(1);

    let [a, b, c] = &vals;
    println!("{}, {}, {}", a, b, c);
    // 2, 3, 1
}
I need an integral type that has a predefined, limited range that includes 0, and I want to implement it like this:
#[repr(u8)]
pub enum X { A, B, C, D, E, F, G, H }

impl From<u8> for X {
    fn from(x: u8) -> X {
        unsafe { std::mem::transmute(x & 0b111) }
    }
}
When I need the integer value, I would cast with as u8. Arithmetic ops would be implemented by casting to u8 and then converting back into the enum using From. And because I limit the range with the bitwise AND when converting from u8 to the enum, I'm always in range of the enum.
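For illustration, a wrapping addition along those lines might look like the sketch below (the Add impl, the derives, and the asserts are mine, not part of the question):

use std::ops::Add;

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum X { A, B, C, D, E, F, G, H }

impl From<u8> for X {
    fn from(x: u8) -> X {
        unsafe { std::mem::transmute(x & 0b111) }
    }
}

impl Add for X {
    type Output = X;
    fn add(self, rhs: X) -> X {
        // Cast both operands to u8, add, and let From mask the result back
        // into the enum's 0..=7 range.
        X::from((self as u8).wrapping_add(rhs as u8))
    }
}

fn main() {
    assert_eq!(X::E + X::D, X::H); // 4 + 3 = 7
    assert_eq!(X::G + X::C, X::A); // 6 + 2 = 8, which the mask wraps back to 0
}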
Some benefits I can see are that the range is known to the compiler so it can skip bounds checking, and enum optimizations such as representing Option<X> as 1 byte.
A drawback I can see via assembly is that I incur and al, 7 every time I convert to enum, but I can live with that.
Is this a sound transmutation of u8 into the enum? What are other drawbacks of representing a limited range integer this way, if any?
I don't think there is anything wrong with this transmutation, in that it is likely sound. However, I believe it is unnecessary.
If performance is critical for your application, you should test on your target arch, but I used the Rust playground to show the generated ASM (for whatever arch the playground runs on):
Your version:
#[repr(u8)]
#[derive(Debug)]
pub enum X { A, B, C, D, E, F, G, H }

impl From<u8> for X {
    fn from(x: u8) -> X {
        unsafe { std::mem::transmute(x & 0b111) }
    }
}

#[no_mangle]
fn do_it_x(a: u8) -> X {
    a.into()
}
Explicit match:
#[repr(u8)]
#[derive(Debug)]
pub enum Y { A, B, C, D, E, F, G, H }

impl From<u8> for Y {
    fn from(y: u8) -> Y {
        match y & 0b111 {
            0 => Y::A,
            1 => Y::B,
            2 => Y::C,
            3 => Y::D,
            4 => Y::E,
            5 => Y::F,
            6 => Y::G,
            7 => Y::H,
            _ => unreachable!(),
        }
    }
}

#[no_mangle]
fn do_it_y(a: u8) -> Y {
    a.into()
}
The resulting assembly (from the playground at least) is identical for both:
do_it_x:
pushq %rax
movb %dil, %al
movb %al, 7(%rsp)
movzbl %al, %edi
callq <T as core::convert::Into<U>>::into
movb %al, 6(%rsp)
movb 6(%rsp), %al
popq %rcx
retq
do_it_y:
pushq %rax
movb %dil, %al
movb %al, 7(%rsp)
movzbl %al, %edi
callq <T as core::convert::Into<U>>::into
movb %al, 6(%rsp)
movb 6(%rsp), %al
popq %rcx
retq
In the below code:
fn is_five(x: &i32) -> bool {
    x as *const i32 == &5 as *const i32
}

fn main() {
    let x = 5;

    assert!(!is_five(&x));
    assert!(is_five(&5));
    assert!(!is_five(&6));

    println!("Success!");
}
Why is is_five(&x) false, while is_five(&5) is true?
Code demo in Rust playground
Writing &5 to take the address of a constant may seem odd, but in this case the compiler decides to allocate storage (in the constant section) and store the value there so that its address can be taken.
If this happens several times in the code, there is no need for separate storage locations that all hold the same value.
If you try the following code in godbolt, you will see that the constant 5 is placed once in memory and the linker refers to it from several places.
In is_five(): lea rax, [rip + .L__unnamed_1]
In test2(): lea rdi, [rip + .L__unnamed_1]
Thus &5 always refers to the same address, while &x, which refers to the local variable x, points to separate storage that happens to be initialised with the same value (5).
In test1(): lea rdi, [rsp + 4]
pub fn is_five(x: &i32) -> bool {
    x as *const i32 == &5 as *const i32
}

pub fn test1() -> bool {
    let x = 5;
    is_five(&x)
}

pub fn test2() -> bool {
    is_five(&5)
}
If the distinction between reference and pointer comparisons is not clear, this documentation can help.
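As a small illustration of that distinction (my example, not from the original answer): == on references compares the pointed-to values, while std::ptr::eq compares the addresses themselves.

fn main() {
    let a = 5;
    let b = 5;

    // `==` on references compares the values behind them.
    assert!(&a == &b);

    // `std::ptr::eq` compares the addresses.
    assert!(std::ptr::eq(&a, &a));
    println!("&a and &b share an address: {}", std::ptr::eq(&a, &b)); // normally false
}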
In my example below does cons.push(...) ever copy the self parameter?
Or is rustc intelligent enough to realize that the values coming from lines #a and #b can always use the same stack space and no copying needs to occur (except for the obvious i32 copies)?
In other words, does a call to Cons.push(self, ...) always create a copy of self as ownership is being moved? Or does the self struct always stay in place on the stack?
References to documentation would be appreciated.
#[derive(Debug)]
struct Cons<T, U>(T, U);

impl<T, U> Cons<T, U> {
    fn push<V>(self, value: V) -> Cons<Self, V> {
        Cons(self, value)
    }
}

fn main() {
    let cons = Cons(1, 2);   // #a
    let cons = cons.push(3); // #b
    println!("{:?}", cons);  // #c
}
The implication in my example above is whether the push(...) call grows more expensive each time we add a line like #b: at a rate of O(n^2) (if self is copied each time) or at a rate of O(n) (if self stays in place).
I tried implementing the Drop trait and noticed that both #a and #b were dropped after #c. To me this seems to indicate that self stays in place in this example, but I'm not 100% sure.
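A sketch of that experiment (the Drop impl is mine; the question only describes it):

#[derive(Debug)]
struct Cons<T, U>(T, U);

impl<T, U> Cons<T, U> {
    fn push<V>(self, value: V) -> Cons<Self, V> {
        Cons(self, value)
    }
}

impl<T, U> Drop for Cons<T, U> {
    fn drop(&mut self) {
        println!("dropping a Cons");
    }
}

fn main() {
    let cons = Cons(1, 2);   // #a
    let cons = cons.push(3); // #b
    println!("{:?}", cons);  // #c
    // Both "dropping a Cons" lines print after #c: the #a value was moved
    // into the #b value, so it is only dropped when the outer Cons is dropped.
}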
In general, trust in the compiler! Rust + LLVM is a very powerful combination that often produces surprisingly efficient code. And it will improve even more in time.
In other words, does a call to Cons.push(self, ...) always create a copy of self as ownership is being moved? Or does the self struct always stay in place on the stack?
self cannot stay in place because the new value returned by the push method has type Cons<Self, V>, which is essentially a tuple of Self and V. Although tuples don't have any memory layout guarantees, I strongly believe they can't have their elements scattered arbitrarily in memory. Thus, self and value must both be moved into the new structure.
The above paragraph assumed that self was placed firmly on the stack before calling push. The compiler actually has enough information to know it should reserve enough space for the final structure; especially with function inlining this becomes a very likely optimization.
The implication in my example above is whether or not the push(...) function grows more expensive to call each time we add a line like #b at the rate of O(n^2) (if self is copied each time) or at the rate of O(n) (if self stays in place).
Consider two functions (playground):
pub fn push_int(cons: Cons<i32, i32>, x: i32) -> Cons<Cons<i32, i32>, i32> {
    cons.push(x)
}

pub fn push_int_again(
    cons: Cons<Cons<i32, i32>, i32>,
    x: i32,
) -> Cons<Cons<Cons<i32, i32>, i32>, i32> {
    cons.push(x)
}
push_int adds a third element to a Cons and push_int_again adds a fourth element.
push_int compiles to the following assembly in Release mode:
movq %rdi, %rax
movl %esi, (%rdi)
movl %edx, 4(%rdi)
movl %ecx, 8(%rdi)
retq
And push_int_again compiles to:
movq %rdi, %rax
movl 8(%rsi), %ecx
movl %ecx, 8(%rdi)
movq (%rsi), %rcx
movq %rcx, (%rdi)
movl %edx, 12(%rdi)
retq
You don't need to understand assembly to see that pushing the fourth element requires more instructions than pushing the third element.
Note that this observation was made for these functions in isolation. Calls like cons.push(x).push(y).push(...) are inlined and the assembly grows linearly with one instruction per push.
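For instance, a chained version might look like the sketch below (assuming the Cons definition from the question; the function name is mine):

pub fn push_three(
    cons: Cons<i32, i32>,
    x: i32,
    y: i32,
    z: i32,
) -> Cons<Cons<Cons<Cons<i32, i32>, i32>, i32>, i32> {
    cons.push(x).push(y).push(z)
}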
The ownership of the Cons value from #a is transferred into push(). It is then transferred again into the new Cons<Cons<T, U>, i32> value, which becomes the shadowed variable cons in #b.
If Cons implemented the Copy and Clone traits, it would be copied. Otherwise there is no copy, and you cannot use the original variables after they have been moved (i.e. after something else has taken ownership of them).
Move semantics:
let cons = Cons(1, 2);  // the Cons(1, 2) resource in memory is owned by cons
let cons2 = cons;       // Cons(1, 2) is now owned by cons2; access through cons is no longer allowed
println!("{:?}", cons); // error: cons has been moved
For shared references and mutable references the semantics are clear: as
long as you have a shared reference to a value, nothing else must have
mutable access, and a mutable reference can't be shared.
So this code:
#[no_mangle]
pub extern fn run_ref(a: &i32, b: &mut i32) -> (i32, i32) {
    let x = *a;
    *b = 1;
    let y = *a;
    (x, y)
}
compiles (on x86_64) to:
run_ref:
movl (%rdi), %ecx
movl $1, (%rsi)
movq %rcx, %rax
shlq $32, %rax
orq %rcx, %rax
retq
Note that the memory a points to is only read once, because the
compiler knows the write to b must not have modified the memory at
a.
Raw pointers are more complicated. Raw pointer arithmetic and casts are
"safe", but dereferencing them is not.
We can convert raw pointers back to shared and mutable references, and
then use them; this will certainly imply the usual reference semantics,
and the compiler can optimize accordingly.
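For example, a variant that immediately converts the raw pointers back into references (my sketch; the name run_ptr_as_ref is made up) makes non-overlap part of the caller's obligations:

#[no_mangle]
pub unsafe extern fn run_ptr_as_ref(a: *const i32, b: *mut f32) -> (i32, i32) {
    // Reborrow as references: from here on the usual aliasing rules apply,
    // so the caller must guarantee that `a` and `b` do not overlap.
    let a: &i32 = &*a;
    let b: &mut f32 = &mut *b;
    let x = *a;
    *b = 1.0;
    let y = *a;
    (x, y)
}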
But what are the semantics if we use raw pointers directly?
#[no_mangle]
pub unsafe extern fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = *a;
    *b = 1.0;
    let y = *a;
    (x, y)
}
compiles to:
run_ptr_direct:
movl (%rdi), %ecx
movl $1065353216, (%rsi)
movl (%rdi), %eax
shlq $32, %rax
orq %rcx, %rax
retq
Although we write a value of a different type, the second read still goes to memory - it seems to be allowed to call this function with the same (or overlapping) memory location for both arguments. In other words, a const raw pointer does not forbid a coexisting mut raw pointer; and it's probably fine to have two mut raw pointers (of possibly different types) to the same (or overlapping) memory location too.
Note that a normal optimizing C/C++ compiler would eliminate the second read (due to the "strict aliasing" rule: modifying/reading the same memory location through pointers of different ("incompatible") types is UB in most cases):
struct tuple { int x; int y; };

extern "C" tuple run_ptr(int const* a, float* b) {
    int const x = *a;
    *b = 1.0;
    int const y = *a;
    return tuple{x, y};
}
compiles to:
run_ptr:
movl (%rdi), %eax
movl $0x3f800000, (%rsi)
movq %rax, %rdx
salq $32, %rdx
orq %rdx, %rax
ret
Playground with Rust code examples
godbolt Compiler Explorer with C++ example
So: What are the semantics if we use raw pointers directly: is it ok for
referenced data to overlap?
This should have direct implications on whether the compiler is allowed
to reorder memory access through raw pointers.
No awkward strict-aliasing here
C++ strict-aliasing is a patch on a wooden leg. C++ does not have any aliasing information, and the absence of aliasing information prevents a number of optimizations (as you noted here), therefore to regain some performance strict-aliasing was patched on...
Unfortunately, strict-aliasing is awkward in a systems language, because reinterpreting raw memory is the essence of what systems languages are designed to do.
And doubly unfortunately it does not enable that many optimizations. For example, copying from one array to another must assume that the arrays may overlap.
restrict (from C) is a bit more helpful, although it only applies to one level at a time.
Instead, we have scope-based aliasing analysis
The essence of the aliasing analysis in Rust is based on lexical scopes (barring threads).
The beginner level explanation that you probably know is:
if you have a &T, then there is no &mut T to the same instance,
if you have a &mut T, then there is no &T or &mut T to the same instance.
As suited to a beginner, it is a slightly abbreviated version. For example:
fn main() {
    let mut i = 32;
    let mut_ref = &mut i;
    let x: &i32 = mut_ref;
    println!("{}", x);
}
is perfectly fine, even though both a &mut i32 (mut_ref) and a &i32 (x) point to the same instance!
If you try to access mut_ref after forming x, however, the truth is unveiled:
fn main() {
    let mut i = 32;
    let mut_ref = &mut i;
    let x: &i32 = mut_ref;
    *mut_ref = 2;
    println!("{}", x);
}
error[E0506]: cannot assign to `*mut_ref` because it is borrowed
  |
4 |     let x: &i32 = mut_ref;
  |                   ------- borrow of `*mut_ref` occurs here
5 |     *mut_ref = 2;
  |     ^^^^^^^^^^^^ assignment to borrowed `*mut_ref` occurs here
So, it is fine to have both &mut T and &T pointing to the same memory location at the same time; however mutating through the &mut T will be disabled for as long as the &T exists.
In a sense, the &mut T is temporarily downgraded to a &T.
So, what of pointers?
First of all, let's review what the reference says about raw pointers; they:
are not guaranteed to point to valid memory and are not even guaranteed to be non-NULL (unlike both Box and &);
do not have any automatic clean-up, unlike Box, and so require manual resource management;
are plain-old-data, that is, they don't move ownership, again unlike Box, hence the Rust compiler cannot protect against bugs like use-after-free;
lack any form of lifetimes, unlike &, and so the compiler cannot reason about dangling pointers; and
have no guarantees about aliasing or mutability other than mutation not being allowed directly through a *const T.
Conspicuously absent is any rule forbidding casting a *const T to a *mut T. That's normal, it's allowed, and therefore the last point is really more of a lint, since it can be so easily worked around.
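A tiny illustration of that point (my example): the cast itself compiles without complaint; only what you then do with the pointer matters.

fn main() {
    let x = 42;
    let p: *const i32 = &x;
    // Casting *const T to *mut T is allowed, and is not even `unsafe`...
    let q = p as *mut i32;
    // ...but actually writing through `q` here would be UB, since `x` is
    // an immutable binding.
    println!("{:?} {:?}", p, q);
}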
Nomicon
A discussion of unsafe Rust would not be complete without pointing to the Nomicon.
Essentially, the rules of unsafe Rust are rather simple: uphold whatever guarantee the compiler would have if it was safe Rust.
This is not as helpful as it could be, since those rules are not set in stone yet; sorry.
Then, what are the semantics for dereferencing raw pointers?
As far as I know1:
if you form a reference from the raw pointer (&T or &mut T) then you must ensure that the aliasing rules these references obey are upheld,
if you immediately read/write, this temporarily forms a reference.
That is, providing that the caller had mutable access to the location:
pub unsafe fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = *a;
    *b = 1.0;
    let y = *a;
    (x, y)
}
should be valid, because *a has type i32, so there is no overlap of lifetime in references.
However, I would expect:
pub unsafe fn run_ptr_modified(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = &*a;
    *b = 1.0;
    let y = *a;
    (*x, y)
}
To be undefined behavior, because x would be live while *b is used to modify its memory.
Note how subtle the change is. It's easy to break invariants in unsafe code.
1 And I might be wrong right now, or I may become wrong in the future
I have a small struct:
pub struct Foo {
    pub a: i32,
    pub b: i32,
    pub c: i32,
}
I was using pairs of the fields in the form (a,b) (b,c) (c,a). To avoid duplication of the code, I created a utility function which would allow me to iterate over the pairs:
impl Foo {
    fn get_foo_ref(&self) -> [(&i32, &i32); 3] {
        [(&self.a, &self.b), (&self.b, &self.c), (&self.c, &self.a)]
    }
}
I had to decide if I should return the values as references or copy the i32. Later on, I plan to switch to a non-Copy type instead of an i32, so I decided to use references. I expected the resulting code should be equivalent since everything would be inlined.
I am generally optimistic about optimizations, so I suspected that the code would be equivalent when using this function as compared to hand written code examples.
First the variant using the function:
pub fn testing_ref(f: Foo) -> i32 {
    let mut sum = 0;
    for i in 0..3 {
        let (l, r) = f.get_foo_ref()[i];
        sum += *l + *r;
    }
    sum
}
Then the hand-written variant:
pub fn testing_direct(f: Foo) -> i32 {
    let mut sum = 0;
    sum += f.a + f.b;
    sum += f.b + f.c;
    sum += f.c + f.a;
    sum
}
To my disappointment, all 3 methods resulted in different assembly code. The worst code was generated for the case with references, and the best code was the one that didn't use my utility function at all. Why is that? Shouldn't the compiler generate equivalent code in this case?
You can view the resulting assembly code on Godbolt; I also have the 'equivalent' assembly code from C++.
In C++, the compiler generated equivalent code between get_foo and get_foo_ref, although I don't understand why the code for all 3 cases is not equivalent.
Why did the compiler not generate equivalent code for all 3 cases?
Update:
I've slightly modified the code to use arrays and to add one more direct case.
Rust version with f64 and arrays
C++ version with f64 and arrays
This time the generated C++ code is exactly the same. However, the Rust assembly differs, and returning by reference results in worse assembly.
Well, I guess this is another example that nothing can be taken for granted.
TL;DR: Microbenchmarks are tricky; instruction count does not directly translate into high or low performance.
Later on, I plan to switch to a non-Copy type instead of an i32, so I decided to use references.
Then, you should check the generated assembly for your new type.
In your optimized example, the compiler is being very crafty:
pub fn testing_direct(f: Foo) -> i32 {
    let mut sum = 0;
    sum += f.a + f.b;
    sum += f.b + f.c;
    sum += f.c + f.a;
    sum
}
Yields:
example::testing_direct:
push rbp
mov rbp, rsp
mov eax, dword ptr [rdi + 4]
add eax, dword ptr [rdi]
add eax, dword ptr [rdi + 8]
add eax, eax
pop rbp
ret
Which is roughly sum += f.a; sum += f.b; sum += f.c; sum += sum;.
That is, the compiler realized that:
f.X was added twice
f.X * 2 was equivalent to adding it twice
While the former may be inhibited in the other cases by the use of indirection, the latter is VERY specific to i32 (and addition being commutative).
For example, switching your code to f32 (still Copy, but addition is no longer associative), I get the very same assembly for both testing_direct and testing (and slightly different assembly for testing_ref):
example::testing:
push rbp
mov rbp, rsp
movss xmm1, dword ptr [rdi]
movss xmm2, dword ptr [rdi + 4]
movss xmm0, dword ptr [rdi + 8]
movaps xmm3, xmm1
addss xmm3, xmm2
xorps xmm4, xmm4
addss xmm4, xmm3
addss xmm2, xmm0
addss xmm2, xmm4
addss xmm0, xmm1
addss xmm0, xmm2
pop rbp
ret
And there's no trickery any longer.
So it's really not possible to infer much from your example, check with the real type.