Is casting between integers expensive? - rust

I am working on a project where I am doing a lot of index-based calculation. I have a few lines like:
let mut current_x: usize = (start.x as isize + i as isize * delta_x) as usize;
start.x and i are of type usize, and delta_x is of type isize. Most of my data is unsigned, so storing it as signed would not make much sense. On the other hand, when I index into an array that I access a lot, I have to convert everything back to usize, as seen above.
Is casting between integers expensive? Does it have an impact on runtime performance at all?
Are there other ways to handle index arithmetic more easily or more efficiently?

It depends
It's basically impossible to answer your question in isolation. These types of low-level things can be aggressively combined with operations that have to happen anyway, so any amount of inlining can change the behavior. Additionally, it strongly depends on your processor; changing to a 64-bit number on an 8-bit microcontroller is probably pretty expensive!
My general advice is to not worry. Keep your types consistent, get the right answers, then profile your code and fix the issues you find.
Pragmatically, what are you going to do instead?
That said, here's some concrete stuff for x86-64 and Rust 1.18.0.
Same size, changing sign
Basically no impact. If these were inlined, then you probably would never even see any assembly.
#[inline(never)]
pub fn signed_to_unsigned(i: isize) -> usize {
i as usize
}
#[inline(never)]
pub fn unsigned_to_signed(i: usize) -> isize {
i as isize
}
Each generates the assembly
movq %rdi, %rax
retq
Extending a value
These have to sign- or zero-extend the value, so some kind of minimal operation has to occur to fill those extra bits:
#[inline(never)]
pub fn u8_to_u64(i: u8) -> u64 {
i as u64
}
#[inline(never)]
pub fn i8_to_i64(i: i8) -> i64 {
i as i64
}
Generates the assembly
movzbl %dil, %eax
retq
movsbq %dil, %rax
retq
Truncating a value
Truncating is again just another move, basically no impact.
#[inline(never)]
pub fn u64_to_u8(i: u64) -> u8 {
i as u8
}
#[inline(never)]
pub fn i64_to_i8(i: i64) -> i8 {
i as i8
}
Generates the assembly
movl %edi, %eax
retq
movl %edi, %eax
retq
All these operations boil down to a single instruction on x86-64. Then you get into complications around "how long does an operation take" and that's even harder.
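Coming back to the index arithmetic in the question: if you'd rather avoid the `as` round-trips entirely, newer Rust (1.66+) has `usize::checked_add_signed`, which applies a signed offset to an unsigned index and fails loudly instead of wrapping. A minimal sketch with made-up values:

```rust
fn main() {
    let start_x: usize = 10;
    let i: usize = 3;
    let delta_x: isize = -2;

    // checked_add_signed applies a signed delta to a usize, returning
    // None on under/overflow instead of wrapping like `as` casts do.
    let current_x = start_x
        .checked_add_signed(i as isize * delta_x)
        .expect("index out of range");

    assert_eq!(current_x, 4);
}
```

Whether the extra branch matters is, again, something only a profiler can tell you.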

Related

Performance difference between bitpacking bytes into a u32 vs storing them in a vec<u8>?

Intro:
I'm curious about the performance difference (both CPU and memory usage) of storing small numbers as bitpacked unsigned integers versus vectors of bytes.
Example
I'll use the example of storing RGBA values. They're 4 bytes, so it is very tempting to store them as a u32.
However, it would be more readable to store them as a vector of type u8.
As a more detailed example, say I want to store and retrieve the color rgba(255,0,0,255)
This is how I would go about doing the two methods:
// Bitpacked:
let i: u32 = 4278190335;
//binary is 11111111 00000000 00000000 11111111
//In reality I would most likely do something more similar to:
let i: u32 = (255 << 24) + 255; // parentheses needed: `+` binds more tightly than `<<`
// Vector:
let v: Vec<u8> = vec![255, 0, 0, 255];
Then the two red values could be queried with
i >> 24
//or
&v[0]
//both expressions evaluate to 255 (i think. I'm really new to rust <3 )
Question 1
As far as I know, the values of v must be stored on the heap and so there are the performance costs that are associated with that. Are these costs significant enough to make bit packing worth it?
Question 2
Then there are the two expressions i >> 24 and &v[0]. I don't know how fast Rust is at bit shifting versus getting values off the heap. I'd test it, but I won't have access to a machine with Rust installed for a while. Are there any immediate insights someone could give on the drawbacks of these two operations?
Question 3
Finally, is the difference in memory usage as simple as just storing 32 bits on the stack for the u32 versus storing 64 bits on the stack for the pointer v as well as 32 bits on the heap for the values of v?
Sorry if this question is a bit confusing
Using a Vec will be more expensive; as you mentioned, it will need to perform heap allocations, and access will be bounds-checked as well.
That said, if you use an array [u8; 4] instead, the performance compared with a bitpacked u32 representation should be almost identical.
In fact, consider the following simple example:
pub fn get_red_bitpacked(i: u32) -> u8 {
(i >> 24) as u8
}
pub fn get_red_array(v: [u8; 4]) -> u8 {
v[3]
}
pub fn test_bits(colour: u8) -> u8 {
let colour = colour as u32;
let i = (colour << 24) + colour;
get_red_bitpacked(i)
}
pub fn test_arr(colour: u8) -> u8 {
let v = [colour, 0, 0, colour];
get_red_array(v)
}
I took a look on Compiler Explorer, and the compiler decided that get_red_bitpacked and get_red_array were completely identical: so much so it didn't even bother generating code for the former. The two "test" functions obviously optimised to the exact same assembly as well.
example::get_red_array:
mov eax, edi
shr eax, 24
ret
example::test_bits:
mov eax, edi
ret
example::test_arr:
mov eax, edi
ret
Obviously this example was seen through by the compiler: for a proper comparison you should benchmark with actual code. That said, I feel fairly safe in saying that with Rust the performance of u32 versus [u8; 4] for these kinds of operations should be identical in general.
tl;dr use a struct:
struct Color {
r: u8,
g: u8,
b: u8,
a: u8,
}
Maybe use repr(packed) as well.
It gives you the best of all worlds and you can give the channels their name.
Are these costs significant enough to make bit packing worth it?
Heap allocation has a huge cost.
Are there any immediate insights someone could give on the drawbacks of these two operations?
Both are noise compared to allocating memory.
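A sketch of that struct, with conversions to and from the packed representation added for interop (the 0xRRGGBBAA byte order and the method names are my own choices, not from the question):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

impl Color {
    // Pack as 0xRRGGBBAA; from_be_bytes/to_be_bytes make the byte
    // order explicit instead of relying on manual shifts.
    fn to_u32(self) -> u32 {
        u32::from_be_bytes([self.r, self.g, self.b, self.a])
    }

    fn from_u32(v: u32) -> Self {
        let [r, g, b, a] = v.to_be_bytes();
        Color { r, g, b, a }
    }
}

fn main() {
    let red = Color { r: 255, g: 0, b: 0, a: 255 };
    assert_eq!(red.to_u32(), 0xFF00_00FF); // the 4278190335 from the question
    assert_eq!(Color::from_u32(0xFF00_00FF), red);
}
```

The compiler is free to lay this struct out as four consecutive bytes, so it costs the same as the u32 while keeping the channels named.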

How to calculate u64 modulus u8 in Rust? [duplicate]

Editor's note: This question is from a version of Rust prior to 1.0 and references some items that are not present in Rust 1.0. The answers still contain valuable information.
What's the idiomatic way to convert from (say) a usize to a u32?
For example, casting using 4294967295us as u32 works and the Rust 0.12 reference docs on type casting say
A numeric value can be cast to any numeric type. A raw pointer value can be cast to or from any integral type or raw pointer type. Any other cast is unsupported and will fail to compile.
but 4294967296us as u32 will silently overflow and give a result of 0.
I found ToPrimitive and FromPrimitive which provide nice functions like to_u32() -> Option<u32>, but they're marked as unstable:
#[unstable(feature = "core", reason = "trait is likely to be removed")]
What's the idiomatic (and safe) way to convert between numeric (and pointer) types?
The platform-dependent size of isize / usize is one reason why I'm asking this question - the original scenario was I wanted to convert from u32 to usize so I could represent a tree in a Vec<u32> (e.g. let t = vec![0u32, 0u32, 1u32], then to get the grand-parent of node 2 would be t[t[2us] as usize]), and I wondered how it would fail if usize was less than 32 bits.
Converting values
From a type that fits completely within another
There's no problem here. Use the From trait to be explicit that there's no loss occurring:
fn example(v: i8) -> i32 {
i32::from(v) // or v.into()
}
You could choose to use as, but it's recommended to avoid it when you don't need it (see below):
fn example(v: i8) -> i32 {
v as i32
}
From a type that doesn't fit completely in another
There isn't a single method that makes general sense - you are asking how to fit two things in a space meant for one. One good initial attempt is to use an Option — Some when the value fits and None otherwise. You can then fail your program or substitute a default value, depending on your needs.
Since Rust 1.34, you can use TryFrom:
use std::convert::TryFrom;
fn example(v: i32) -> Option<i8> {
i8::try_from(v).ok()
}
Before that, you'd have to write similar code yourself:
fn example(v: i32) -> Option<i8> {
if v > std::i8::MAX as i32 || v < std::i8::MIN as i32 {
None
} else {
Some(v as i8)
}
}
From a type that may or may not fit completely within another
The range of numbers isize / usize can represent changes based on the platform you are compiling for. You'll need to use TryFrom regardless of your current platform.
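Applied to the tree-in-a-Vec scenario from the question, a sketch might look like this:

```rust
use std::convert::TryFrom;

fn main() {
    // t[i] stores the parent of node i, as in the question.
    let t: Vec<u32> = vec![0, 0, 1];

    // try_from fails loudly on a platform where the u32 value doesn't
    // fit in usize, instead of silently truncating like `as` would.
    let parent = usize::try_from(t[2]).expect("index does not fit in usize");
    let grand_parent = t[parent];
    assert_eq!(grand_parent, 0);
}
```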
See also:
How do I convert a usize to a u32 using TryFrom?
Why is type conversion from u64 to usize allowed using `as` but not `From`?
What as does
but 4294967296us as u32 will silently overflow and give a result of 0
When converting to a smaller type, as just takes the lower bits of the number, disregarding the upper bits, including the sign:
fn main() {
let a: u16 = 0x1234;
let b: u8 = a as u8;
println!("0x{:04x}, 0x{:02x}", a, b); // 0x1234, 0x34
let a: i16 = -257;
let b: u8 = a as u8;
println!("0x{:02x}, 0x{:02x}", a, b); // 0xfeff, 0xff
}
See also:
What is the difference between From::from and as in Rust?
About ToPrimitive / FromPrimitive
RFC 369, Num Reform, states:
Ideally [...] ToPrimitive [...] would all be removed in favor of a more principled way of working with C-like enums
In the meantime, these traits live on in the num crate:
ToPrimitive
FromPrimitive

Does moving ownership copy the `self` struct when calling a function?

In my example below does cons.push(...) ever copy the self parameter?
Or is rustc intelligent enough to realize that the values coming from lines #a and #b can always use the same stack space and no copying needs to occur (except for the obvious i32 copies)?
In other words, does a call to Cons.push(self, ...) always create a copy of self as ownership is being moved? Or does the self struct always stay in place on the stack?
References to documentation would be appreciated.
#[derive(Debug)]
struct Cons<T, U>(T, U);
impl<T, U> Cons<T, U> {
fn push<V>(self, value: V) -> Cons<Self, V> {
Cons(self, value)
}
}
fn main() {
let cons = Cons(1, 2); // #a
let cons = cons.push(3); // #b
println!("{:?}", cons); // #c
}
The implication in my example above is whether or not the push(...) function grows more expensive to call each time we add a line like #b at the rate of O(n^2) (if self is copied each time) or at the rate of O(n) (if self stays in place).
I tried implementing the Drop trait and noticed that both #a and #b were dropped after #c. To me this seems to indicate that self stays in place in this example, but I'm not 100%.
In general, trust in the compiler! Rust + LLVM is a very powerful combination that often produces surprisingly efficient code. And it will improve even more in time.
In other words, does a call to Cons.push(self, ...) always create a copy of self as ownership is being moved? Or does the self struct always stay in place on the stack?
self cannot stay in place because the new value returned by the push method has type Cons<Self, V>, which is essentially a tuple of Self and V. Although tuples don't have any memory layout guarantees, I strongly believe they can't have their elements scattered arbitrarily in memory. Thus, self and value must both be moved into the new structure.
The above paragraph assumed that self was placed firmly on the stack before calling push. In fact, the compiler has enough information to reserve space for the final structure up front and construct self in place. Especially with function inlining, this becomes a very likely optimization.
The implication in my example above is whether or not the push(...) function grows more expensive to call each time we add a line like #b at the rate of O(n^2) (if self is copied each time) or at the rate of O(n) (if self stays in place).
Consider two functions (playground):
pub fn push_int(cons: Cons<i32, i32>, x: i32) -> Cons<Cons<i32, i32>, i32> {
cons.push(x)
}
pub fn push_int_again(
cons: Cons<Cons<i32, i32>, i32>,
x: i32,
) -> Cons<Cons<Cons<i32, i32>, i32>, i32> {
cons.push(x)
}
push_int adds a third element to a Cons and push_int_again adds a fourth element.
push_int compiles to the following assembly in Release mode:
movq %rdi, %rax
movl %esi, (%rdi)
movl %edx, 4(%rdi)
movl %ecx, 8(%rdi)
retq
And push_int_again compiles to:
movq %rdi, %rax
movl 8(%rsi), %ecx
movl %ecx, 8(%rdi)
movq (%rsi), %rcx
movq %rcx, (%rdi)
movl %edx, 12(%rdi)
retq
You don't need to understand assembly to see that pushing the fourth element requires more instructions than pushing the third element.
Note that this observation was made for these functions in isolation. Calls like cons.push(x).push(y).push(...) are inlined and the assembly grows linearly with one instruction per push.
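The Drop experiment described in the question can be reproduced directly; Noisy is a marker type added here for illustration:

```rust
#[derive(Debug)]
struct Cons<T, U>(T, U);

impl<T, U> Cons<T, U> {
    fn push<V>(self, value: V) -> Cons<Self, V> {
        Cons(self, value)
    }
}

#[derive(Debug)]
struct Noisy(i32);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping Noisy({})", self.0);
    }
}

fn main() {
    let cons = Cons(Noisy(1), 2); // #a: moving out of `cons` is not a drop
    let cons = cons.push(3);      // #b: ownership is transferred, no drop runs
    println!("{:?}", cons);       // #c
}                                 // "dropping Noisy(1)" prints once, here
```

A move transfers responsibility for dropping; it never runs Drop itself, which is why the asker saw both values dropped only after #c.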
The ownership of the cons value from #a (of type Cons<T, U>) is transferred into push(). Ownership of the result, of type Cons<Cons<T, U>, i32>, is then transferred to the shadowed variable cons in #b.
If Cons implemented the Copy and Clone traits, it would be copied; otherwise there is no copy, and you cannot use the original variables after they have been moved to (owned by) someone else.
Move semantics:
let cons = Cons(1, 2); //Cons(1,2) as resource in memory being pointed by cons
let cons2 = cons; // Cons(1,2) now pointed by cons2. Problem! as cons also point it. Lets prevent access from cons
println!("{:?}", cons); //error because cons is moved

What are the semantics for dereferencing raw pointers?

For shared references and mutable references the semantics are clear: as
long as you have a shared reference to a value, nothing else must have
mutable access, and a mutable reference can't be shared.
So this code:
#[no_mangle]
pub extern fn run_ref(a: &i32, b: &mut i32) -> (i32, i32) {
let x = *a;
*b = 1;
let y = *a;
(x, y)
}
compiles (on x86_64) to:
run_ref:
movl (%rdi), %ecx
movl $1, (%rsi)
movq %rcx, %rax
shlq $32, %rax
orq %rcx, %rax
retq
Note that the memory a points to is only read once, because the
compiler knows the write to b must not have modified the memory at
a.
Raw pointers are more complicated. Raw pointer arithmetic and casts are
"safe", but dereferencing them is not.
We can convert raw pointers back to shared and mutable references, and
then use them; this will certainly imply the usual reference semantics,
and the compiler can optimize accordingly.
But what are the semantics if we use raw pointers directly?
#[no_mangle]
pub unsafe extern fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
let x = *a;
*b = 1.0;
let y = *a;
(x, y)
}
compiles to:
run_ptr_direct:
movl (%rdi), %ecx
movl $1065353216, (%rsi)
movl (%rdi), %eax
shlq $32, %rax
orq %rcx, %rax
retq
Although we write a value of different type, the second read still goes
to memory - it seems to be allowed to call this function with the same
(or overlapping) memory location for both arguments. In other words, a
const raw pointer does not forbid a coexisting mut raw pointer; and
it's probably fine to have two mut raw pointers (of possibly different
types) to the same (or overlapping) memory location too.
Note that a normal optimizing C/C++-compiler would eliminate the second
read (due to the "strict aliasing" rule: modifying/reading the same
memory location through pointers of different ("incompatible") types is
UB in most cases):
struct tuple { int x; int y; };
extern "C" tuple run_ptr(int const* a, float* b) {
int const x = *a;
*b = 1.0;
int const y = *a;
return tuple{x, y};
}
compiles to:
run_ptr:
movl (%rdi), %eax
movl $0x3f800000, (%rsi)
movq %rax, %rdx
salq $32, %rdx
orq %rdx, %rax
ret
Playground with Rust code examples
godbolt Compiler Explorer with C example
So: What are the semantics if we use raw pointers directly: is it ok for
referenced data to overlap?
This should have direct implications on whether the compiler is allowed
to reorder memory access through raw pointers.
No awkward strict-aliasing here
C++ strict-aliasing is a patch on a wooden leg. C++ does not have any aliasing information, and the absence of aliasing information prevents a number of optimizations (as you noted here), therefore to regain some performance strict-aliasing was patched on...
Unfortunately, strict aliasing is awkward in a systems language, because reinterpreting raw memory is the essence of what systems languages are designed to do.
And doubly unfortunately it does not enable that many optimizations. For example, copying from one array to another must assume that the arrays may overlap.
restrict (from C) is a bit more helpful, although it only applies to one level at a time.
Instead, we have scope-based aliasing analysis
The essence of the aliasing analysis in Rust is based on lexical scopes (barring threads).
The beginner level explanation that you probably know is:
if you have a &T, then there is no &mut T to the same instance,
if you have a &mut T, then there is no &T or &mut T to the same instance.
As suited to a beginner, it is a slightly abbreviated version. For example:
fn main() {
let mut i = 32;
let mut_ref = &mut i;
let x: &i32 = mut_ref;
println!("{}", x);
}
is perfectly fine, even though both a &mut i32 (mut_ref) and a &i32 (x) point to the same instance!
If you try to access mut_ref after forming x, however, the truth is unveiled:
fn main() {
let mut i = 32;
let mut_ref = &mut i;
let x: &i32 = mut_ref;
*mut_ref = 2;
println!("{}", x);
}
error[E0506]: cannot assign to `*mut_ref` because it is borrowed
|
4 | let x: &i32 = mut_ref;
| ------- borrow of `*mut_ref` occurs here
5 | *mut_ref = 2;
| ^^^^^^^^^^^^ assignment to borrowed `*mut_ref` occurs here
So, it is fine to have both &mut T and &T pointing to the same memory location at the same time; however mutating through the &mut T will be disabled for as long as the &T exists.
In a sense, the &mut T is temporarily downgraded to a &T.
So, what of pointers?
First of all, let's review the reference:
are not guaranteed to point to valid memory and are not even guaranteed to be non-NULL (unlike both Box and &);
do not have any automatic clean-up, unlike Box, and so require manual resource management;
are plain-old-data, that is, they don't move ownership, again unlike Box, hence the Rust compiler cannot protect against bugs like use-after-free;
lack any form of lifetimes, unlike &, and so the compiler cannot reason about dangling pointers; and
have no guarantees about aliasing or mutability other than mutation not being allowed directly through a *const T.
Conspicuously absent is any rule forbidding casting a *const T to a *mut T. That's normal: it's allowed, and therefore the last point is really more of a lint, since it can be so easily worked around.
Nomicon
A discussion of unsafe Rust would not be complete without pointing to the Nomicon.
Essentially, the rules of unsafe Rust are rather simple: uphold whatever guarantee the compiler would have if it was safe Rust.
This is not as helpful as it could be, since those rules are not set in stone yet; sorry.
Then, what are the semantics for dereferencing raw pointers?
As far as I know1:
if you form a reference from the raw pointer (&T or &mut T) then you must ensure that the aliasing rules these references obey are upheld,
if you immediately read/write, this temporarily forms a reference.
That is, providing that the caller had mutable access to the location:
pub unsafe fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
let x = *a;
*b = 1.0;
let y = *a;
(x, y)
}
should be valid, because *a has type i32, so there is no overlap of lifetime in references.
However, I would expect:
pub unsafe fn run_ptr_modified(a: *const i32, b: *mut f32) -> (i32, i32) {
let x = &*a;
*b = 1.0;
let y = *a;
(*x, y)
}
To be undefined behavior, because x would be live while *b is used to modify its memory.
Note how subtle the change is. It's easy to break invariants in unsafe code.
1 And I might be wrong right now, or I may become wrong in the future

Lifetime differences between references to zero sized types

I came across an interesting case while playing with zero sized types (ZSTs). A reference to an empty array will mold to a reference with any lifetime:
fn mold_slice<'a, T>(_: &'a T) -> &'a [T] {
&[]
}
I thought about how that is possible, since basically the "value" here lives on the stack frame of the function, yet the signature promises to return a reference to a value with a longer lifetime ('a contains the function call). I came to the conclusion that it is because the empty array [] is a ZST which basically only exists statically. The compiler can "fake" the value the reference refers to.
So I tried this:
fn mold_unit<'a, T>(_: &'a T) -> &'a () {
&()
}
and then the compiler complained:
error: borrowed value does not live long enough
--> <anon>:7:6
|
7 | &()
| ^^ temporary value created here
8 | }
| - temporary value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the block at 6:40...
--> <anon>:6:41
|
6 | fn mold_unit<'a, T>(_: &'a T) -> &'a () {
| ^
It doesn't work for the unit () type, and it also does not work for an empty struct:
struct Empty;
// fails to compile as well
fn mold_struct<'a, T>(_: &'a T) -> &'a Empty {
&Empty
}
Somehow, the unit type and the empty struct are treated differently from the empty array. Are there any additional differences between those values besides just being ZSTs? Do the differences (&[] fitting any lifetime; &() and &Empty not) have anything to do with ZSTs at all?
Playground example
It's not that [] is zero-sized (though it is), it's that [] is a constant, compile-time literal. This means the compiler can store it in the executable, rather than having to allocate it dynamically on the heap or stack. This, in turn, means that pointers to it last as long as they want, because data in the executable isn't going anywhere.
Annoyingly, this doesn't extend to something like &[0], because Rust isn't quite smart enough to realise that [0] is definitely constant. You can work around this by using something like:
fn mold_slice<'a, T>(_: &'a T) -> &'a [i32] {
const C: &'static [i32] = &[0];
C
}
This trick also works with anything you can put in a const, like () or Empty.
Realistically, however, it'd be simpler to just have functions like this return a &'static borrow, since that can be coerced to any other lifetime automatically.
Edit: the previous version noted that &[] is not zero sized, which was a little tangential.
Do the differences (&[] fitting any lifetime; &() and &Empty not) have anything to do with ZSTs at all?
I think this is exactly the case. The compiler probably just treats arrays differently and there is no deeper reasoning behind it.
The only difference that could play a role is that &[] is a fat pointer, consisting of the data pointer and a length. This fat pointer itself expresses the fact that there is actually no data behind it (because length=0). &() on the other hand is just a normal pointer. Here, only the type system expresses the fact that it's not pointing to anything real. But I'm just guessing here.
To clarify: a reference fitting any lifetime means that the reference has the 'static lifetime. So instead of introducing some lifetime 'a, we can just return a static reference and will have the same effect (&[] works, the others don't).
There is an RFC which specifies that references to constexpr rvalues will be stored in the static data section of the executable, instead of the stack. After this RFC has been implemented (tracking issue), all of your examples will compile, as [], () and Empty are constexpr rvalues. References to them will always be 'static. But the important part of the RFC is that it works for non-ZSTs, too: e.g. &27 has the type &'static i32.
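As a follow-up sketch: rvalue static promotion has since shipped (in Rust 1.21, if memory serves), so on a current compiler all three of the original signatures are accepted, even with 'static:

```rust
struct Empty;

// With rvalue static promotion (RFC 1414), these literals are placed
// in static memory, so handing out 'static references is fine.
fn mold_slice() -> &'static [i32] {
    &[]
}

fn mold_unit() -> &'static () {
    &()
}

fn mold_struct() -> &'static Empty {
    &Empty
}

fn main() {
    assert!(mold_slice().is_empty());
    let _u: &'static () = mold_unit();
    let _e: &'static Empty = mold_struct();
}
```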
To have some fun, let's look at the generated assembly (I used the amazing Compiler Explorer)! First let's try the working version:
pub fn mold_slice() -> &'static [i32] {
&[]
}
Using the -O flag (meaning: optimizations enabled; I checked the unoptimized version, too, and it doesn't have significant differences), this is compiled down to:
mold_slice:
push rbp
mov rbp, rsp
lea rax, [rip + ref.0]
xor edx, edx
pop rbp
ret
ref.0:
The fat pointer is returned in the rax (data pointer) and rdx (length) registers. As you can see, the length is set to 0 (xor edx, edx) and the data pointer is set to this mysterious ref.0. The ref.0 is not actually referencing anything at all. It's just an empty marker. This means we return just some pointer to the data section.
Now let's just tell the compiler to trust us on &() in order to compile it:
pub fn possibly_broken() -> &'static () {
unsafe { std::mem::transmute(&()) }
}
Result:
possibly_broken:
push rbp
mov rbp, rsp
lea rax, [rip + ref.1]
pop rbp
ret
ref.1:
Wow, we see pretty much the same result! The pointer (returned via rax) points somewhere into the data section, so it actually is a 'static reference after code generation. Only the lifetime checker doesn't know that and refuses to compile the code. Well... this is nothing dramatic, especially since the RFC mentioned above will fix it in the near future.