Rustc only warns when a value that overflows is assigned - rust

I am finding what I think is very strange behaviour. Rustc panics when a variable overflows at runtime; this makes sense to me. However, it only raises a warning when a value that overflows is assigned at compile time. Shouldn't that be a compile-time error? Otherwise, the two behaviours seem inconsistent.
I expect a compile time error:
fn main() {
    let b: i32 = 3_000_000_000;
    println!("{}", b);
}
Produces:
<anon>:2:18: 2:31 warning: literal out of range for i32, #[warn(overflowing_literals)] on by default
<anon>:2 let b: i32 = 3_000_000_000;
Playground 1
This makes sense to me:
fn main() {
    let b: i32 = 30_000;
    let c: i32 = 100_000;
    let d = b * c;
    println!("{}", d);
}
Produces:
thread '<main>' panicked at 'arithmetic operation overflowed', <anon>:4
playpen: application terminated with error code 101
Playground 2
Edit:
Given the comment by FrancisGagné, and having discovered that Rust provides operators that check for overflow during the operation, for example checked_mul, I see that one needs to implement overflow checks oneself. This makes sense, because release builds should be optimized, and constantly checking for overflows could get expensive. So I no longer see the "inconsistency". However, I am still surprised that assigning a value that would overflow does not lead to a compile-time error. In Go it would: Go Playground

Actually, your comments are not consistent with the behavior you observe:
in your first example: you get a compile-time warning, which you ignore, and thus the compiler deduces that you want wrapping behavior
in your second example: you get a run-time error
The Go example is similar to the first Rust example (except that Go, by design, does not have warnings).
In Rust, an underflow or overflow results in an unspecified value, which may also be ! ("bottom" in computer science): a special value indicating that control flow diverges, which in general means either abort or an exception.
This specification allows:
instrumenting the Debug mode to catch all overflows at the very point at which they occur
not instrumenting1 the Release mode (and using wrapping arithmetic there)
and yet have both modes be consistent with the specification.
1 Not instrumented by default; if you choose, you can activate overflow checks in Release with a simple flag, for a relatively modest performance cost outside of heavy numeric code.
On the cost of overflow checks: the current Rust/LLVM situation is helpful for debugging but has not really been optimized. Thus, in this framework, overflow checks cost. If the situation improves, then rustc might decide, one day, to activate overflow checking by default even in Release.
In Midori (a Microsoft experimental OS developed in a language similar to C#), overflow check was turned on even in Release builds:
In Midori, we compiled with overflow checking on by default. This is different from stock C#, where you must explicitly pass the /checked flag for this behavior. In our experience, the number of surprising overflows that were caught, and unintended, was well worth the inconvenience and cost. But it did mean that our compiler needed to get really good at understanding how to eliminate unnecessary ones.
Apparently, they improved their compiler so that:
it would reason about the ranges of variables, and statically eliminate bounds checks and overflow checks when possible
it would aggregate checks as much as possible (a single check for multiple potentially overflowing operations)
The latter is only to be done in Release (you lose precision) but reduces the number of branches.
So, what costs remain?
Potentially different arithmetic rules that get in the way of optimizations:
in regular arithmetic, 64 + x - 128 can be optimized to x - 64; with overflow checks activated the compiler might not be able to perform this optimization
vectorization can be hampered too, if the compiler does not have overflow checking vector built-ins
...
Still, only if the code is heavily numeric (scientific simulations or graphics, for example) is the impact likely to be noticeable.

Related

Rust features which allow the optimizer to change the program's result?

In some languages, optimization is allowed to change the program execution result. For example,
C++11 has the concept of "copy-elision" which allows the optimizer to ignore the copy constructor (and its side-effects) in some circumstances.
Swift has the concept of "imprecise lifetimes" which allows the optimizer to release objects at any time after last usage before the end of lexical scope.
In both cases, optimizations are not guaranteed to happen, so the program execution result can differ significantly depending on the optimizer implementation (e.g. debug vs. release build).
Copying can be skipped, and an object can die while a reference to it is still alive. The only way to deal with these behaviors is to be defensive and make your program work correctly regardless of whether the optimizations happen. If you don't know such behavior exists, it's impossible to write correct programs with these tools.
This is different from "random operations" which are written by the programmer to produce random results intentionally. These behaviors are (1) done by optimizer and (2) can randomize execution result regardless of programmer intention. This is done by the language designer's intention for better performance. A sort of trade-off between performance and predictability.
Does Rust have (or consider) any of this kind of behavior? Any optimization that is allowed to change program execution result for better performance. If it has any, what is the behavior and why is it allowed?
I know the term "execution result" could be vague, but I don't know a proper term for this. I'm sorry for that.
I'd like to collect every potential case here, so everyone can be aware of them and be prepared for them. Please post any case as an answer (or comment) if you think your case produces different results.
I think all arguable cases are worth mentioning, because someone may be helped a lot by reading the case details.
If you restrict yourself to safe Rust code, the optimizer shouldn't change the program result. Of course there are some optimizations that can be observable due to their very nature. For example removing unused variables can mean your code overflows the stack without optimizations, while everything will fit on the stack when compiled with optimizations. Or your code may just be too slow to ever finish when compiled without optimizations, which is also an observable difference. And with unsafe code triggering undefined behaviour anything can happen, including the optimizer changing the outcome of your code.
There are, however, a few cases where program execution can change depending on whether you are compiling in debug mode or in release mode:
Integer overflow will result in a panic in debug builds, while integers wrap around according to the two's complement representation in release mode – see RFC 560 for details. This behaviour can be controlled with the -C overflow-checks codegen option, so you can disable overflow checks in debug mode or enable them in release mode if you want to.
The debug_assert!() macro defines assertions that are only executed in debug mode. There's again a manual override using the -C debug-assertions codegen option.
Your code can check whether debug assertions are enabled using the debug_assertions configuration option (e.g. cfg!(debug_assertions)).
These are all related to debug assertions in some way, but this list is not exhaustive. You can probably also inspect the environment to determine whether the code is compiled in debug or release mode, and change the behaviour based on this.
None of these examples really fall into the same category as your examples in the original question. Safe Rust code should generally behave the same regardless of whether you compile in debug mode or release mode.
There are far fewer foot-guns in Rust when compared to C++. In general, they revolve around unsafe, raw pointers and lifetimes derived from them or any form of undefined behavior, which is really undefined in Rust as well. However, if your code compiles (and, if in doubt, passes cargo miri test), you most likely won't see surprising behavior.
Two examples that come to mind which can be surprising:
The lifetime of a MutexGuard; the example comes from the book:
while let Ok(job) = receiver.lock().unwrap().recv() {
    job();
}
One might think/hope that the Mutex on the receiver is released once a job has been acquired and job() executes while other threads can receive jobs. However, due to the way value-expressions in place-expressions contexts work in conjunction with temporary lifetimes (the MutexGuard needs an anonymous lifetime referencing receiver), the MutexGuard is held for the entirety of the while-block. This means only one thread will ever execute jobs.
If you do
loop {
    let job = receiver.lock().unwrap().recv().unwrap();
    job();
}
this will allow multiple threads to run in parallel. It's not obvious why this is.
Multiple times there have been questions regarding const. The compiler gives no guarantee whether a const actually exists only once (as an optimization) or is instantiated wherever it is used. The second case is the way one should think about const; there is no guarantee that this is what the compiler does, though. So this can happen:
const EXAMPLE: Option<i32> = Some(42);

fn main() {
    assert_eq!(EXAMPLE.take(), Some(42));
    assert_eq!(EXAMPLE, Some(42)); // Where did this come from?
}

What happens when casting a big float to an int?

I was wondering what would happen when I cast a very large float value to an integer. This is an example I wrote:
fn main() {
    let x = 82747650246702476024762_f32; //-1_i16;
    let y = x as u8;
    let z = x as i32;
    println!("{} {} {}", x, y, z);
}
and the output is:
$ ./casts
82747650000000000000000 0 -2147483648
Obviously the float wouldn't fit in any of the integers, but since Rust so strongly advertises that it is safe, I would have expected an error of some kind. These operations use the LLVM fptosi and fptoui instructions, which produce a so-called poison value if the value doesn't fit within the type it has been cast to. This may produce undefined behavior, which is very bad, especially when writing Rust code.
How can I be sure my float to int casts don't result in undefined behavior in Rust? And why would Rust even allow this (as it is known for creating safe code)?
In Rust 1.44 and earlier, if you use as to cast a floating-point number to an integer type and the floating-point number does not fit¹ in the target type, the result is an undefined value², and most things that you can do with it cause undefined behavior.
This serious issue (#10184) was fixed in Rust 1.45. Since that release, float-to-integer casts saturate instead (that is, values that are too large or small are converted to T::MAX or T::MIN, respectively; NaN is converted to 0).
In older versions of Rust, you can enable the new, safe behavior with the -Z saturating-float-casts flag. Note that saturating casts may be slightly slower since they have to check the type bounds first. If you really need to avoid the check, the standard library provides to_int_unchecked. Since the behavior is undefined when the number is out of range, you must use unsafe.
(There used to be a similar issue for certain integer-to-float casts, but it was resolved by making such casts always saturating. This change was not considered a performance regression and there is no way to opt in to the old behavior.)
Related questions
Can casting in safe Rust ever lead to a runtime error?
¹ "Fit" here means either NaN, or a number of such large magnitude that it cannot be approximated by the smaller type. 8.7654321_f64 will still be truncated to 8 by an as u8 cast, even though the value cannot be represented exactly by the destination type -- loss of precision does not cause undefined behavior, only being out of range does.
² A "poison" value in LLVM, as you correctly note in the question, but Rust itself does not distinguish between undef and poison values.
Part of your problem is that as does lossy casts: you can cast a u8 to a u16 no problem, but a u16 can't always fit in a u8, so the u16 is truncated. You can read about the behavior of as here.
Rust is designed to be memory safe, which means you can't access memory you shouldn't, or have data races (unless you use unsafe), but you can still have memory leaks and other undesirable behavior.
What you described is unexpected behavior, but it is still well defined; these are very different things. Two different Rust compilers would compile this to code that has the same result. If it were undefined behavior, the compiler implementer could have it compile to whatever they wanted, or not compile at all.
Edit: As pointed out in the comments and other answers, casting a float to an int using as caused undefined behavior at the time this was written; this was due to a known bug in the compiler, since fixed.

Is signed integer overflow in safe Rust in release mode considered as undefined behavior?

Rust treats signed integer overflow differently in debug and release mode. When it happens, Rust panics in debug mode while silently performing two's complement wrapping in release mode.
As far as I know, C/C++ treats signed integer overflow as undefined behavior partly because:
At that time of C's standardization, different underlying architecture of representing signed integers, such as one's complement, might still be in use somewhere. Compilers cannot make assumptions of how overflow is handled in the hardware.
Later compilers thus make assumptions, such as that the sum of two positive integers must also be positive, to generate optimized machine code.
So if Rust compilers do perform the same kind of optimization as C/C++ compilers regarding signed integers, why does The Rustonomicon state:
No matter what, Safe Rust can't cause Undefined Behavior.
Or even if Rust compilers do not perform such optimization, Rust programmers still do not anticipate seeing a signed integer wrapping around. Can't it be called "undefined behavior"?
Q: So if Rust compilers do perform the same kind of optimization as C/C++ compilers regarding signed integers
Rust does not. Because, as you noticed, it cannot perform these optimizations as integer overflows are well defined.
For an addition in release mode, Rust will emit the following LLVM instruction (you can check on Playground):
add i32 %b, %a
On the other hand, clang will emit the following LLVM instruction (you can check via clang -S -emit-llvm add.c):
add nsw i32 %6, %8
The difference is the nsw (no signed wrap) flag. As specified in the LLVM reference about add:
If the sum has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.
Because LLVM integers use a two’s complement representation, this instruction is appropriate for both signed and unsigned integers.
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.
The poison value is what leads to undefined behavior. If the flags are not present, the result is well defined as 2's complement wrapping.
Q: Or even if Rust compilers do not perform such optimization, Rust programmers still do not anticipate seeing a signed integer wrapping around. Can't it be called "undefined behavior"?
"Undefined behavior" as used in this context has a very specific meaning that is different from the intuitive English meaning of the two words. UB here specifically means that the compiler can assume an overflow will never happen and that if an overflow will happen, any program behavior is allowed. That's not what Rust specifies.
However, an integer overflow via the arithmetic operators is considered a bug in Rust. That's because, as you said, it is usually not anticipated. If you intentionally want the wrapping behavior, there are methods such as i32::wrapping_add.
Some additional resources:
RFC 560 specifies everything about integer overflows in Rust. In short: panic in debug mode, 2's complement wrap in release mode.
Myths and Legends about Integer Overflow in Rust. Nice blog post about this topic.

Should I use i32 or i64 on a 64-bit machine?

main.rs
#![feature(core_intrinsics)]

fn print_type_of<T>(_: &T) {
    println!("{}", unsafe { std::intrinsics::type_name::<T>() });
}

fn main() {
    let x = 93;
    let y = 93.1;
    print_type_of(&x);
    print_type_of(&y);
}
If I compile with "rustc +nightly ./main.rs", I get this output:
$ ./main
i32
f64
I run an x86_64 Linux machine. Floating-point variables are double precision by default, which is good.
Why are integers only 4 bytes? Which should I use? If I don't need i64, should I use i32? Is i32 better for performance?
Is i32 better for performance?
That's actually kind of a subtle thing. If we look up some recent instruction-level benchmarks, for example for Skylake-X, there is for the most part a very clear lack of difference between 64-bit and 32-bit instructions. An exception is division: 64-bit division is slower than 32-bit division, even when dividing the same values (division is one of the few variable-time instructions whose timing depends on the values of its inputs).
Using i64 for data also makes auto-vectorization less effective (this is also one of the rare places where data smaller than 32 bits has a use beyond data-size optimization). Of course data size also matters for the i32 vs i64 question: working with sizable arrays of i64s can easily be slower just because they are bigger, costing more space in the caches and (if applicable) more bandwidth. So if the question is [i32] vs [i64], then it matters.
Even more subtle is the fact that using 64-bit operations means the code will contain more REX prefixes on average, making it slightly less dense, so that less of it fits in the L1 code cache at once. This is a small effect, though; just having some 64-bit variables in the code is not a problem.
Despite all that, definitely don't overuse i32, especially in places where you should really have a usize. For example, do not do this:
// don't do this
for i in 0i32 .. data.len() as i32 {
    sum += data[i as usize];
}
This causes a large performance regression: not only is there a pointless sign-extension in the loop now, it also defeats bounds-check elimination and auto-vectorization. But of course there is no reason to write code like that in the first place; it's unnatural and harder than doing it right.
The Rust Programming Language says:
[...] integer types default to i32: this type is generally the fastest, even on 64-bit systems.
And (in the next section):
The default type is f64 because on modern CPUs it’s roughly the same speed as f32 but is capable of more precision.
However, this is fairly simplified. What integer type you should use depends a lot on your program. Don't think about speed when initially writing the program, unless you already know that speed will be a problem. In the vast majority of code, speed doesn't matter: even in performance critical applications, most code is cold code. In contrast, correctness always matters.
Also note that only unconstrained numeric variables default to i32/f64. As soon as you use the variable in a context where a specific numeric type is needed, the compiler uses that type.
First of all, you should design your application for your needs/requirements. I.e., if you need "large" integers, use large types. If you don't need them, you should use small types.
Only if you encounter performance issues (and only then) should you adjust types beyond what your requirements call for.

Is it expected that a too large bitshift is undefined behavior in Rust?

When you run this code:
#![allow(exceeding_bitshifts)]

fn main() {
    const NUMBER: u64 = 0b_10101010;

    fn print_shift(i: u32) {
        println!("{:b}", NUMBER >> i);
    }

    print_shift(65);
    println!("{:b}", NUMBER >> 65);
}
You can see that shifting the bits of a number by an amount that exceeds its bit width produces different behavior at compile time than at runtime.
Is it a normal behavior? Is it documented somewhere? This is not in the list of documented undefined behavior.
No, this is not expected, but it is not undefined behavior. This is "just" a bug.
There should be no difference between how the constant is computed at compile time and how the value is computed at runtime. This is a hard problem in general as the machine performing the compilation and the machine running the code might have completely different architectures.
When talking about debug vs release builds, the behavior of "too large" bitshifts is expected, and is also not undefined behavior. The clue is in the error message:
attempt to shift right with overflow
Integer overflow is neither unsafe nor undefined:
The Rust compiler does not consider the following behaviors unsafe, though a programmer may (should) find them undesirable, unexpected, or erroneous.
...
Integer overflow
See also:
How can integer overflow protection be turned off?
