Should we use u32/i32 or their lower variants (u8/i8, u16/i16) when dealing with limited-range numbers, like "days in a month", which ranges from 1–30, or "score of a subject", which ranges from 0 to 100? Or why shouldn't we?
Is there any optimization or benefit to using the lower variants (e.g. memory efficiency)?
Summary
Correctness should be prioritized over performance and correctness-wise (for ranges like 1–100), all solutions (u8, u32, ...) are equally bad. The best solution would be to create a new type to benefit from strong typing.
The rest of my answer tries to justify this claim and discusses different ways of creating the new type.
More explanation
Let's take a look at the "score of subject" example: the only legal values are 0–100. I'd argue that correctness-wise, using u8 and u32 is equally bad: in both cases, your variable can hold values that are not legal in your semantic context; that's bad!
And arguing that u8 is better because there are fewer illegal values is like arguing that wrestling a bear is better than walking through New York, because you only have one possibility of dying (blood loss by bear attack) as opposed to the many possibilities of death (car accident, knife attack, drowning, ...) in New York.
So what we want is a type that guarantees to hold only legal values. We want to create a new type that does exactly this. However, there are multiple ways to proceed; each with different advantages and disadvantages.
(A) Make the inner value public
struct ScoreOfSubject(pub u8);
Advantage: at least APIs are easier to understand, because the parameter is already explained by the type. Which is easier to understand:
add_record("peter", 75, 47) or
add_record("peter", StudentId(75), ScoreOfSubject(47))?
I'd say the latter one ;-)
Disadvantage: we don't actually do any range checking and illegal values can still occur; bad!
(B) Make the inner value private and supply a range-checking constructor
struct ScoreOfSubject(u8); // the field is private now

impl ScoreOfSubject {
    pub fn new(value: u8) -> Self {
        // enforce the invariant: only 0–100 are legal scores
        assert!(value <= 100);
        ScoreOfSubject(value)
    }

    pub fn get(&self) -> u8 { self.0 }
}
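Usage is straightforward, and out-of-range values are rejected at construction (a quick sketch):

fn main() {
    let score = ScoreOfSubject::new(95);
    println!("{}", score.get()); // prints 95

    // ScoreOfSubject::new(150); // would panic: assertion failed
}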
Advantage: we enforce legal values with very little code, yeah :)
Disadvantage: working with the type can be annoying. Pretty much every operation requires the programmer to pack & unpack the value.
(C) Add a bunch of implementations (in addition to (B))
(the code would impl Add<_>, impl Display and so on)
Advantage: the programmer can use the type and do all useful operations on it directly -- with range checking! This is pretty optimal.
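To make this concrete, here is a minimal sketch of what (C) could look like for the type from (B), showing only Add and Display; a full version would cover more traits:

use std::fmt;
use std::ops::Add;

impl Add for ScoreOfSubject {
    type Output = ScoreOfSubject;

    fn add(self, other: ScoreOfSubject) -> ScoreOfSubject {
        // route through the checked constructor so the invariant holds
        ScoreOfSubject::new(self.0 + other.0)
    }
}

impl fmt::Display for ScoreOfSubject {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}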
Please take a look at Matthieu M.'s comment:
[...] generally multiplying scores together, or dividing them, does not produce a score! Strong typing not only enforces valid values, it also enforces valid operations, so that you don't actually divide two scores together to get another score.
I think this is a very important point I failed to make clear before. Strong typing prevents the programmer from executing illegal operations on values (operations that don't make any sense). A good example is the crate cgmath, which distinguishes between point and direction vectors, because the two support different operations. You can find additional explanation here.
Disadvantage: a lot of code :(
Luckily the disadvantage can be reduced by the Rust macro/compiler plugin system. There are crates like newtype_derive or bounded_integer that do this kind of code generation for you (disclaimer: I never worked with them).
But now you say: "You can't be serious! Am I supposed to spend my time writing new types?"
Not necessarily, but if you are working on production code (== at least somewhat important), then my answer is: yes, you should.
A no-answer answer: I doubt you would see any difference in benchmarks, unless you do A LOT of arithmetic or process HUGE arrays of numbers.
You should probably just go with the type which makes more sense (no reason to use negatives or have an upper bound in the millions for a day of month) and provides the methods you need (e.g. you can't call abs() on an unsigned integer).
There could be major benefits to using smaller types, but you would have to benchmark your application on your target platform to be sure.
The first and most easily realized benefit from the lower memory footprint is better caching. Not only is your data more likely to fit into the cache, but it is also less likely to discard other data in the cache, potentially improving a completely different part of your application. Whether or not this is triggered depends on what memory your application touches and in what order. Do the benchmarks!
Network data transfers have an obvious benefit from using smaller types.
Smaller data allows "larger" instructions. A 128-bit SIMD unit can handle four 32-bit values OR sixteen 8-bit values, making certain operations 4 times faster. In benchmarks I've made, these instructions do indeed execute 4 times faster, BUT the whole application improved by less than 1%, and the code became more of a mess. Shaping your program into making better use of SIMD can be tricky.
As for the signed/unsigned discussion: unsigned has slightly better properties, which a compiler may or may not take advantage of.
Related
I am learning rust and in the official tutorial, the author assigned the value 5 to a variable like so:
let x: i32 = 5;
I thought this was weird, as one could use u8 as the type and the program would run fine. This got me thinking: are there any advantages to using a smaller integer type? Is it faster?
The main advantage is that they use less memory. A Vec<i32> with 1 billion elements will use 4 GB, while a Vec<u8> will use 1 GB. This can be a significant advantage regardless of speed.
Arithmetic on smaller integer types on modern CPUs is not faster in general. There are some issues with using only part of a register but optimizers will almost certainly resolve these performance problems for you.
When you have a lot of integers and the optimizer can make use of vectorization (for example adding your 1 billion integers in the vector) then smaller types will typically yield better performance, because more of them fit in a SIMD register.
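For example, summing a large buffer is the classic case where the element width matters. A sketch (hypothetical functions; the actual speedup depends on the target and on what the autovectorizer does with the widening):

fn sum_u8(data: &[u8]) -> u64 {
    // four times as many u8 lanes fit into a SIMD register as i32 lanes
    data.iter().map(|&x| x as u64).sum()
}

fn sum_i32(data: &[i32]) -> i64 {
    data.iter().map(|&x| x as i64).sum()
}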
If you use them just as one scalar stack variable like in your example, I highly doubt there will be a difference in 99% of cases. Here other considerations are more important:
A bigger type will make overflows less likely; maybe you calculated your maximal possible value wrong.
For public interfaces bigger types are more future proof.
It's better to cast from i8 to i32 than the other way round, as the sketch below shows.
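Widening casts are lossless, while narrowing casts silently wrap (a minimal sketch):

fn main() {
    let small: i8 = -5;
    let widened = small as i32; // lossless: still -5

    let big: i32 = 1000;
    let narrowed = big as i8;   // lossy: wraps to -24

    println!("{widened} {narrowed}");
}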
Copy means that the struct could be copied just by copying bytes as is. As a result, it should be easily possible to re-interpret such a struct as [u8]. What's the most idiomatic way to do so, preferably without involving unsafe?
I want to have an optimized struct which could easily be sent via processes/wire/disk. I understand that there are a lot of details which need to be taken care of, like alignment, and I'm looking for a solution for such a high-performance use case, i.e. close-to-zero-copy, high-performance serialization.
Copy means that the struct could be copied just by copying bytes as is.
This is true.
As a result, it should be easily possible to re-interpret such a struct as [u8].
This is not true, because Copy structs can still contain padding, which is not permitted to be read except incidentally while copying.
What's the most idiomatic way to do so, preferably without involving unsafe?
You should start with bytemuck. It is a library which provides trivial conversion to and from [u8] when it is safe to do so. In particular, it checks that there is no padding in the struct, and that the representation is well-defined (not subject to the whims of the compiler).
You will still need to consider alignment, and for that purpose may need to introduce explicit “padding” fields (whose value is explicitly set rather than being left undefined) so that the alignment of other fields is satisfied.
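A minimal sketch of how this fits together with bytemuck (the struct and its fields are made up for illustration; the derives need bytemuck's derive feature):

use bytemuck::{Pod, Zeroable};

// repr(C) pins the layout; Pod can only be derived when the struct
// has no implicit padding, hence the explicit _pad field.
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
struct Record {
    id: u32,
    kind: u16,
    _pad: u16, // explicit "padding" with a defined value
}

fn main() {
    let r = Record { id: 7, kind: 1, _pad: 0 };
    let bytes: &[u8] = bytemuck::bytes_of(&r);       // zero-copy view as bytes
    let back: &Record = bytemuck::from_bytes(bytes); // and back again
    assert_eq!(back.id, 7);
}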
Your program's data will also not be compatible with machines of different endianness unless you take care. (However, it is possible to handle that in ways which have zero run-time overhead when no conversion is needed, and most machines are little-endian today, so that cost will almost never actually apply.)
For most data types, I follow the convention in https://stackoverflow.com/a/35391084/11963778 and have getters returning references:
trait HasName {
    fn name(&self) -> &String;
    fn name_mut(&mut self) -> &mut String;
}
However, for data types that have copy semantics and are smaller than (or around the size of) a pointer, should I have a getter method returning the value instead? It would look something like this:
trait HasNum {
    fn num_v(&self) -> i32;
    fn num(&self) -> &i32;
    fn num_mut(&mut self) -> &mut i32;
}
Is it good practice to have a getter that returns the value instead? If so, then up to what size should I do this for small data types?
As a rule of thumb you can copy values held on a single cache line instead of using references. While cache lines are typically 64 bytes on x86, Intel recommends limiting data to 16 bytes to reduce the chance of the value not being aligned.
So in other words, it's probably fine to just copy anything around the size of [i32; 4] or less, e.g. with a getter like the sketch below.
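A hedged sketch of that rule applied (hypothetical type and field names):

struct Student {
    num: i32,
}

impl Student {
    // i32 is small and Copy, so return it by value
    fn num(&self) -> i32 {
        self.num
    }

    // keep a &mut accessor for callers that mutate in place
    fn num_mut(&mut self) -> &mut i32 {
        &mut self.num
    }
}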
Note: While there is some reasoning behind it, I just made this rule up based on what I know about performance so far. If enough people were to look at this post, I'm sure someone else will have a better answer. That being said, even if my reasoning is a bit off I think it will likely still hold up in most cases when you aren't trying to optimize an extremely time critical piece of code or for a specific CPU.
In the time I spent writing this answer I also found a few more interesting points in this question.
https://stackoverflow.com/a/40185996/5987669
It is common, for example, for a machine to have an architecture (machine registers, memory architecture, etc) which result in a "sweet spot" - copying variables of some size is most "efficient", but copying larger OR SMALLER variables is less so. Larger variables will cost more to copy, because there may be a need to do multiple copies of smaller chunks. Smaller ones may also cost more, because the compiler needs to copy the smaller value into a larger variable (or register), do the operations on it, then copy the value back.
https://stackoverflow.com/a/49523201/5987669
This answer is specific to C, but I wouldn't be surprised if it applied to Rust as well
There is a certain GCC optimization called IPA SRA, that replaces "pass by reference" with "pass by value" automatically: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html (-fipa-sra)
...
So with this optimization enabled, using references for small types should be as fast as passing them by value.
On the other hand passing (for example) std::string by value could not be optimized to by-reference speed, as custom copy semantics are being involved.
I would like to create a new data type in Rust on the "bit-level".
For example, a quadruple-precision float. I could create a structure that has two double-precision floats and arbitrarily increase the precision by splitting the quad into two doubles, but I don't want to do that (that's what I mean by on the "bit-level").
I thought about using a u8 array or a bool array, but in both cases I waste 7 bits of memory (because a bool is also a full byte large). I know there are several crates that implement something like bit-arrays or bit-vectors, but looking through their source code didn't help me understand their implementation.
How would I create such a bit-array without wasting memory, and is this the approach I should choose when implementing something like a quad-precision type?
I don't know how to implement new data types other than structures that combine the basic types, and I haven't been able to find a solution on the internet yet; maybe I'm not searching with the right keywords.
The question you are asking has no direct answer: Just like any other programming language, Rust has a basic set of rules for type layouts. This is due to the fact that (most) real-world CPUs can't address individual bits, need certain alignments when referencing memory, have rules regarding how pointer arithmetic works etc. etc.
For instance, if you create a type of just two bits, you'll still need an 8-bit byte to represent that type, because most CPUs' opcodes simply offer no way to address two individual bits; there is also no way to take the address of such a type, because addressing works at least on the byte level. More useful information regarding this can be found here, section 2, The Anatomy of a Type. Be aware that the non-wasting bit-level type you are thinking about needs to fulfill all the rules mentioned there.
It's a perfectly reasonable approach to represent what you want as, e.g., a single wrapped u128 and implement all arithmetic on top of that type, as sketched below. Another, more generic, approach would be to use a Vec<u8>. Either way you'll always do a relatively large amount of bit-masking, shifting and such.
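As a minimal sketch of the wrapped-u128 idea, using the IEEE 754 binary128 field widths (1 sign bit, 15 exponent bits, 112 fraction bits; the accessor names are made up):

struct Quad(u128);

impl Quad {
    fn sign(&self) -> u128 {
        self.0 >> 127 // top bit
    }

    fn exponent(&self) -> u128 {
        (self.0 >> 112) & 0x7FFF // next 15 bits
    }

    fn fraction(&self) -> u128 {
        self.0 & ((1u128 << 112) - 1) // low 112 bits
    }
}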
Having a look at rust_decimal or similar crates might also be a good idea.
I ran across a comment on reddit that indicates that using Cell<T> prevents certain optimizations from occurring:
Cell works with no memory overhead (Cell is the same size as T) and little runtime overhead (it "just" inhibits optimisations, it doesn't introduce extra explicit operations)
This seems counter to other things I've read about Cell<T>, in particular that it's "zero-cost." The first place I encountered this categorization is here.
With all that said, I'd like to understand the actual cost of using Cell<T>, including whatever optimizations it may prevent.
TL;DR: Cell is a Zero-Overhead Abstraction; that is, the same functionality implemented manually has the same cost.
The term Zero-Cost Abstractions is not English, it's jargon. The idea of Zero-Cost Abstractions is that the layer of abstraction itself does not add any cost compared to manually doing the same thing.
There are various misunderstandings that have sprung up: most notably, I have regularly seen zero-cost understood as "the operation is free", which is not the case.
To add to the confusion, the exception mechanism used by most C++ implementations, and which Rust uses for panic = unwind is called Zero-Cost Exceptions, and purports1 to add no overhead on the non-throwing path. It's a different kind of Zero-Cost...
Lately, my recommendation is to switch to using the term Zero-Overhead Abstractions: first because it's a distinct term from Zero-Cost Exceptions, so less likely to be mistaken, and second because it emphasizes that the Abstraction does not add Overhead, which is what we are trying to convey in the first place.
1 The objective is only partially achieved. While the same assembly executed with and without the possibility of throwing indeed has the same performance, the presence of potential exceptions may hinder the optimizer and cause it to generate sub-optimal assembly in the first place.
With all that said, I'd like to understand the actual cost of using Cell<T>, including whatever optimizations it may prevent.
On the memory side, there is no overhead:
size_of::<Cell<T>>() == size_of::<T>(),
given a cell of type Cell<T>, &cell == cell.as_ptr().
(You can peek at the source code)
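Both claims are easy to check with a small sketch:

use std::cell::Cell;
use std::mem::size_of;

fn main() {
    // no memory overhead: same size as the wrapped type
    assert_eq!(size_of::<Cell<u64>>(), size_of::<u64>());

    // the cell's address is the value's address
    let cell = Cell::new(42u64);
    assert_eq!(&cell as *const Cell<u64> as *const u64,
               cell.as_ptr() as *const u64);
}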
On the access side, Cell<T> does incur a run-time cost compared to T; the cost of the extra functionality.
The most immediate cost is that manipulating the value through a &Cell<T> requires copying it back and forth1. This is a bitwise copy, so the optimizer may elide it, if it can prove that it is safe to do so.
Another notable cost is that UnsafeCell<T>, on which Cell<T> is based, breaks the rule that &T means T cannot be modified.
When a compiler can prove that a portion of memory cannot be modified, it can optimize out further reads: read t.foo in a register, then use the register value rather than reading t.foo again.
In traditional Rust code, a &T gives such a guarantee: no matter if there are opaque function calls, calls to C code, etc... between two reads to t.foo, the second read will return the same value as the first, guaranteed. With a &Cell<T>, there is no such guarantee any longer, and thus unless the optimizer can prove beyond doubt that the value is unmodified2, then it cannot apply such optimizations.
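As an illustrative sketch (hypothetical functions): with &i32 the optimizer may fold the two reads into one, while with &Cell<i32> it must actually read twice, because the opaque call could hold another alias to the same cell:

use std::cell::Cell;

fn twice_plain(x: &i32, opaque: impl Fn()) -> i32 {
    let a = *x;
    opaque();
    a + *x // may be optimized to a + a: &i32 guarantees no modification
}

fn twice_cell(x: &Cell<i32>, opaque: impl Fn()) -> i32 {
    let a = x.get();
    opaque();
    a + x.get() // must re-read: the cell may have been modified
}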
1 You can manipulate the value at no cost through &mut Cell<T> or using unsafe code.
2 For example, if the optimizer knows that the value resides on the stack, and it never passed the address of the value to anyone else, then it can reasonably conclude that no one else can modify the value. Although a stack-smashing attack may, of course.