I'm aware that single-stepping through code in release build can cause the arrow indicating the current code execution point to skip around to some (at least superficially) weird and misleading places. My question is: is there anything predictable and intelligible going on that one can read about, and which might help solve issues that occur in release build only, but not in debug build?
Concrete example I'm trying to get to the bottom of: (works in debug, not in release)
void Function1( void )
{
if ( someGlobalCondition )
Function2( 10 );
else
Function2();
}
void Function2( const int parameter = 1 ) // see note below
{
DoTheActualWork(); // any code at all
}
// and finally lets call Function1() from somewhere...
Function1();
Note: for brevity I've skipped the fact that both functions are declared in header files, and implemented separately in .cpp files. So the default parameter notation in Function2() has had some liberties taken with it.
OK - so in Debug, this works just fine. In Release, even though there is a clear dependence on someGlobalCondition, the code pointer always skips completely over the body of Function1() and executes Function2() directly, but always uses the default parameter (and never 10). This kinda suggests that Function1() is being optimised away at compile time... but is there any basis for drawing such conclusions? Is there any way to know for certain if the release build has actually checked someGlobalCondition?
P.S. (1) No this is not an XY question. I'm giving the context of Y so I can make question X make more sense. (2) No I will not post my actual code, because that would emphasise the Y question, which has extraordinarily low value to anyone but me, whereas the X question is something that has bugged me (and possibly others) for years.
Rather than seek documentation to explain the apparently inexplicable, switch to the Disassembly view when you single step. You will get to see exactly how the compiler has optimised your code and how it has taken account of everything... because it will have taken account of everything.
If you see no evidence of run-time tests for conditions (e.g. jne or test or cmp etc) then you can safely assume that the compiler has determined your condition(s) to be constant, and you can investigate why that is. If you see evidence of conditions being tested, but never being satisfied, then that will point you in another direction.
Also, if you feel the benefits of optimisation don't outweigh the costs of unintelligible code execution point behaviour, then you can always turn optimisation off.
Well, in the absence of your actual code, all we can surmise is that the compiler is figuring out that someGlobalCondition is never true.
That's the only circumstance in which it could correctly optimise out Function1 and always call Function2 directly with a 1 parameter.
If you want to be certain that this optimisation is happening, your best bet is to analyse either the assembler code or machine code generated by the compiler.
Related
In some languages, optimization is allowed to change the program execution result. For example,
C++11 has the concept of "copy-elision" which allows the optimizer to ignore the copy constructor (and its side-effects) in some circumstances.
Swift has the concept of "imprecise lifetimes" which allows the optimizer to release objects at any time after last usage before the end of lexical scope.
In both cases, optimizations are not guaranteed to happen, therefore the program execution result can be significantly different based on the optimizer implementations (e.g. debug vs. release build)
Copying can be skipped, object can die while a reference is alive. The only way to deal with these behaviors is by being defensive and making your program work correctly regardless if the optimizations happen or not. If you don't know about the existence of this behavior, it's impossible to write correct programs with the tools.
This is different from "random operations" which are written by the programmer to produce random results intentionally. These behaviors are (1) done by optimizer and (2) can randomize execution result regardless of programmer intention. This is done by the language designer's intention for better performance. A sort of trade-off between performance and predictability.
Does Rust have (or consider) any of this kind of behavior? Any optimization that is allowed to change program execution result for better performance. If it has any, what is the behavior and why is it allowed?
I know the term "execution result" could be vague, but I don't know a proper term for this. I'm sorry for that.
I'd like to collect every potential case here, so everyone can be aware of them and be prepared for them. Please post any case as an answer (or comment) if you think your case produces different results.
I think all arguable cases are worth to mention. Because someone can be helped a lot by reading the case details.
If you restrict yourself to safe Rust code, the optimizer shouldn't change the program result. Of course there are some optimizations that can be observable due to their very nature. For example removing unused variables can mean your code overflows the stack without optimizations, while everything will fit on the stack when compiled with optimizations. Or your code may just be too slow to ever finish when compiled without optimizations, which is also an observable difference. And with unsafe code triggering undefined behaviour anything can happen, including the optimizer changing the outcome of your code.
There are, however, a few cases where program execution can change depending on whether you are compiling in debug mode or in release mode:
Integer overflow will result in a panic in debug build, while integers wrap around according to the two's complement representation in release mode – see RFC 650 for details. This behaviour can be controlled with the -C overflow-checks codegen option, so you can disable overflow checks in debug mode or enable them in release mode if you want to.
The debug_assert!() macro defines assertions that are only executed in debug mode. There's again a manual override using the -C debug-assertions codegen option.
Your code can check whether debug assertions are enabled using the debug-assertions configuration option
These are all related to debug assertions in some way, but this list is not exhaustive. You can probably also inspect the environment to determine whether the code is compiled in debug or release mode, and change the behaviour based on this.
None of these examples really fall into the same category as your examples in the original question. Safe Rust code should generally behave the same regardless of whether you compile in debug mode or release mode.
There are far fewer foot-guns in Rust when compared to C++. In general, they revolve around unsafe, raw pointers and lifetimes derived from them or any form of undefined behavior, which is really undefined in Rust as well. However, if your code compiles (and, if in doubt, passes cargo miri test), you most likely won't see surprising behavior.
Two examples that come to mind which can be surprising:
The lifetime of a MutexGuard; the example comes from the book:
while let Ok(job) = receiver.lock().unwrap().recv() {
job();
}
One might think/hope that the Mutex on the receiver is released once a job has been acquired and job() executes while other threads can receive jobs. However, due to the way value-expressions in place-expressions contexts work in conjunction with temporary lifetimes (the MutexGuard needs an anonymous lifetime referencing receiver), the MutexGuard is held for the entirety of the while-block. This means only one thread will ever execute jobs.
If you do
loop {
let job = receiver.lock().unwrap().recv().unwrap();
job();
}
this will allow multiple threads to run in parallel. It's not obvious why this is.
Multiple times there have been questions regarding const. There is no guarantee by the compiler if a const actually exists only once (as an optimization) or is instantiated wherever it is used. The second case is the way one should think about const, there is no guarantee that this is what the compiler does, though. So this can happen:
const EXAMPLE: Option<i32> = Some(42);
fn main() {
assert_eq!(EXAMPLE.take(), Some(42));
assert_eq!(EXAMPLE, Some(42)); // Where did this come from?
}
I am interested in how the following is propagated:
void foo(int __attribute__((aligned(16)))* p) { ... }
In this case the “alignedness” of the pointer is available at the MC level, but it is evidently not using the LLVM-IR metadata approach to achieve this. The alignment information is very important to some targets which will change code-generation dependent on this value, and I think that what I need is more like this attribute.
How difficult would it be to add a new attribute such that it propagates through the compiler in the same way as ‘aligned’? So, I already added a new element to the LLVM-IR to do this. I also expect that the hardest part would be making other parts of LLVM ignore this new element when they don’t care about it.
It really is a pity that LLVM does not have a generic target independent way of passing target dependent information from parser to back-end.
Using the ‘DebugLoc’ approach was suggested in a similar question, but I think it’s a bit-of-a-hack since this is not related to debugging. But if the implementation is less difficult this way, then the hack might be acceptable.
UPDATE:
Would inline assembly instead of the use of a new attribute work here? If yes, what are the pros/cons?
As you have demonstrated, alignment is not using metadata.
To anyone who doesn't know: alignment is mentioned (implicitly or explicitly) in all relevant instructions, so for example that function in the question will be compiled to something like this (notice the aligns):
define void #foo(i32*) {
%2 = alloca i32*, align 16 ; Allocate a 16-aligned pointer
store i32* %0, i32** %2, align 16 ; An aligned store to place the arg there
...
Now, if you want to attach some information to existing instructions and have most of the rest of the compiler ignore them, using metadata is a good idea. However, since metadata is a compiler-internal abstract thing, at some point you'll have to actually do something with it. Typically, by adding a pass of your own to consume it and do something accordingly.
As for where to place your pass and how to implement it, it really depends on the actual information you're trying to pass and its intended effect.
I've been wanting to make a block type game for a while now but have never understood how to actually make one. I have googled forever and there is not much and what is there comes with a stipulation that I am not wanting to bother with (gpl license, entire code base, AND the license in any project, bleh). So I took to a forums with my problem. I did not know it, but I was trying to make a Puyo Puyo type game. With blocks dropping from the ceiling and then clearing if there's a match of 3 or more. I had no idea on how to do the matching. Which is what I wanted to know. A very nice, charming, and intelligent fellow provided me with this:
http://hastebin.com/ziyejejoxu.js
Granted, that's quite a lot, but the way he managed to code it allowed me to somewhat grasp it. However, there is a single infuriating problem. One, exactly ONE, line of code does not compile and breaks. I asked him if I could email him about it and he said okay. I haven't go a response yet so I may not be getting one so I'm taking this here. Here is how I am using the code so far. There are two parts, the play state, and the puzzle piece:
http://pastebin.com/SvMR9mMb
The program breaks in the playstate, giving this error:
source/PlayState.hx:291: characters 33-52 : Array access is not allowed on x : Int -> Int
What I have tried:
I had assumed that it was not allowed because the puzzle piece x is a float, and of course, you can't push a float into an int array. So what I did was simply in the puzzle piece first, convert the the float to an int. That did not work. THEN in the state, I switched the float to an int. That did not work. As an exercise, I attempted to convert a Flixel game to HaxeFlixel to see if I could learn anything. I probably did it wrong and did not.
So the question is: Why does that line not compile and what do I need to do to make it compile or to achieve it's intended purpose?
The syntax is wrong. push is a function, and function calls use (). [] is for array access (hence the error message).
This should work:
if (this_piece_is_in_a_match) matched_pieces.push(_i);
If you look at the call stack of a program and treat each return pointer as a token, what kind of automata is needed to build a recognizer for the valid states of the program?
As a corollary, what kind of automata is needed to build a recognizer for a specific bug state?
(Note: I'm only looking at the info that could be had from this function.)
My thought is that if these form regular languages than some interesting tools could be built around that. E.g. given a set of crash/failure dumps, automatically group them and generate a recognizer to identify new instances of know bugs.
Note: I'm not suggesting this as a diagnostic tool but as a data management tool for turning a pile of crash reports into something more useful.
"These 54 crashes seem related, as do those 42."
"These new crashes seem unrelated to anything before date X."
etc.
It would seem that I've not been clear about what I'm thinking of accomplishing, so here's an example:
Say you have a program that has three bugs in it.
Two bugs that cause invalid args to be passed to a single function tripping the same sanity check.
A function that if given a (valid) corner case goes into an infinite recursion.
Also as that when the program crashes (failed assert, uncaught exception, seg-V, stack overflow, etc.) it grabs a stack trace, extracts the call sites on it and ships them to a QA reporting server. (I'm assuming that only that information is extracted because 1, it's easy to get with a one time per project cost and 2, it has a simple, definite meaning that can be used without any special knowledge about the program)
What I'm proposing would be a tool that would attempt to classify incoming reports as connected to one of the known bugs (or as a new bug).
The simplest thing would be to assume that one failure site is one bug, but in the first example, two bugs get detected in the same place. The next easiest thing would be to require the entire stack to match, but again, this doesn't work in cases like the second example where you have multiple pieces of (valid) valid code that can trip the same bug.
The return pointer on the stack is just a pointer to memory. In theory if you look at the call stack of a program that just makes one function call, the return pointer (for that one function) can have different value for every execution of the program. How would you analyze that?
In theory you could read through a core dump using a map file. But doing so is extremely platform and compiler specific. You would not be able to create a general tool for doing this with any program. Read your compiler's documentation to see if it includes any tools for doing postmortem analysis.
If your program is decorated with assert statements, then each assert statement defines a valid state. The program statements between the assertions define the valid state changes.
A program that crashes has violated enough assertions that something broken.
A program that's incorrect but "flaky" has violated at least one assertion but hasn't failed.
It's not at all clear what you're looking for. The valid states are -- sometimes -- hard to define but -- usually -- easy to represent as simple assert statements.
Since a crashed program has violated one or more assertions, a program with explicit, executable assertions, doesn't need an crash debugging. It will simply fail an assert statement and die visibly.
If you don't want to put in assert statements then it's essentially impossible to know what state should have been true and which (never-actually-stated) assertion was violated.
Unwinding the call stack to work out the position and the nesting is trivial. But it's not clear what that shows. It tells you what broke, but not what other things lead to the breakage. That would require guessing what assertions where supposed to have been true, which requires deep knowledge of the design.
Edit.
"seem related" and "seem unrelated" are undefinable without recourse to the actual design of the actual application and the actual assertions that should be true in each stack frame.
If you don't know the assertions that should be true, all you have is a random puddle of variables. What can you claim about "related" given a random pile of values?
Crash 1: a = 2, b = 3, c = 4
Crash 2: a = 3, b = 4, c = 5
Related? Unrelated? How can you classify these without knowing everything about the code? If you know everything about the code, you can formulate standard assert-statement conditions that should have been true. And then you know what the actual crash is.
O Groovy Gurus,
This code snippet runs in around 1 second
for (int i in (1..10000000)) {
j = i;
}
while this one takes almost 9 second
for (int i = 1; i < 10000000; i++) {
j = i;
}
Why is it so?
Ok. Here is my take on why?
If you convert both scripts to bytecode, you will notice that
ForInLoop uses Range. Iterator is used to advance during each loop. Comparison (<) is made directly to int (or Integer) to determine whether the exit condition has been met or not
ForLoop uses traditional increment, check condition, and perform action. For checking condition i < 10000000 it uses Groovy's ScriptBytecodeAdapter.compareLessThan. If you dig deep into that method's code, you will find both sides of comparison is taken in as Object and there are so many things going on, casting, comparing them as object, etc.
ScriptBytecodeAdapter.compareLessThan --> ScriptBytecodeAdapter.compareTo --> DefaultTypeTransformation.compareTo
There are other classes in typehandling package which implements compareTo method specifically for math data types, not sure why they are not being used, (if they are not being used)
I am suspecting that is the reason second loop is taking longer.
Again, please correct me if I am wrong or missing something...
In your testing, be sure to "warm" the JVM up before taking the measure, otherwise you may wind up triggering various startup actions in the platform (class loading, JIT compilation). Run your tests many times in a row too. Also, if you did the second test while a garbage collect was going on, that might have an impact. Try running each of your tests 100 times and print out the times after each test, and see what that tells you.
If you can eliminate potential artifacts from startup time as Jim suggests, then I'd hazard a guess that the Java-style for loop in Groovy is not so well implemented as the original Groovy-style for loop. It was only added as of v1.5 after user requests, so perhaps its implementation was a bit of an afterthought.
Have you taken a look at the bytecode generated for your two examples to see if there are any differences? There was a discussion about Groovy performance here in which one of the comments (from one 'johnchase') says this:
I wonder if the difference you saw related to how Groovy uses numbers (primitives) - since it wraps all primitives in their equivalent Java wrapper classes (int -> Integer), I’d imagine that would slow things down quite a bit. I’d be interested in seeing the performance of Java code that loops 10,000,000 using the wrapper classes instead of ints.
So perhaps the original Groovy for loop does not suffer from this? Just speculation on my part really though.