I know questions like this have been asked before, but none of them gives the exact answer I'm searching for.
Today I took part in an ACM-ICPC contest with my team. We usually use the GNU C++ 4.8.1 compiler (which was available in the contest). The code we had written exceeded the time limit on test case 10. At the end of the contest, with less than 2 minutes remaining, I sent exactly the same submission with Visual C++ 2013 (the same source file, submitted under a different language option), and it was accepted and worked. There were more than 60 test cases and our code passed them all.
Let me stress once more that there were no differences between the source files.
Now I'm just curious why this happened. Does anyone know what the reason might be?
Without knowing the exact compiler options you used, this question is a bit difficult to answer. Usually, compilers come with many options and provide default values which are used as long as the user does not override them. This is also true for code optimization options. Both of the mentioned compilers are capable of significantly improving the speed of the generated binary when told to do so. A wild guess would be that in your case, the optimization settings used by the GNU compiler did not improve the executable's performance as much as the VC++ settings did, for example because no optimization flags were passed in one case. Another wild guess would be that one compiler generated a debug binary and the other did not (check for the option -g with GCC, which switches debug symbol generation on).
On the other hand, depending on the program you created, it could of course be that VC++ was simply better at optimizing it than g++.
If you are interested in easily increasing the performance, have a look at the high-level optimization flags at https://gcc.gnu.org/onlinedocs/gnat_ugn/Optimization-Levels.html or, for the full story, at the complete list at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.
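As an illustration, here is a minimal sketch of the difference such flags make, assuming a hypothetical source file sol.cpp (contest judges differ in which flags they pass by default):

g++ sol.cpp -o sol        # no flags: optimization is off, effectively -O0
g++ -O2 sol.cpp -o sol    # typical contest setting: optimizations on

The same source can easily run several times faster with -O2 than without it, which is enough to move a submission from "time limit exceeded" to "accepted".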
More input on comparing compilers:
http://willus.com/ccomp_benchmark2.shtml?p1
http://news.dice.com/2013/11/26/speed-test-2-comparing-c-compilers-on-windows
I am trying to track down a bug (a serious performance regression). Unfortunately, I wasn't able to figure out the cause by going back through many different versions of my code.
I suspect it could be caused by modifications to libraries that I've updated; in the meantime I've also updated from GHC 7.4 to 7.6 (and if anybody knows whether some laziness behaviour has changed, I would greatly appreciate hearing about it!).
I have an older executable of this code that does not have this bug, so I wonder whether there are any tools that can tell me which library versions it was linked against, for example by inspecting its symbols.
GHC creates executables, which are notoriously hard to understand... On my Linux box I can view the assembly code by typing in
objdump -d <executable filename>
but I get back over 100K lines of assembly from just a simple "Hello, World!" program written in Haskell.
If you happen to have the GHC .hi files, you can get some information about the executable by typing in
ghc --show-iface <hi filename>
This won't give you the assembly code, but you can get some extra information that may prove useful.
As I mentioned in the comment above, on Linux you can use "ldd" to see which system C libraries the executable was linked against, but that is also probably of limited use.
You can try to use a decompiler, but those are generally written to decompile to C, not anything higher level and certainly not Haskell. That being said, GHC compiles to C as an intermediary (at least it used to; has that changed?), so you might be able to learn something.
Personally, I often find viewing system calls in action much more interesting than viewing pure assembly. On my Linux box, I can view all the system calls a program makes by running it under strace (use Wireshark for the network-traffic equivalent):
strace <program executable>
This also will generate a lot of data, so it might only be useful if you know of some specific place where direct real world communication (i.e., changes to a file on the hard disk drive) goes wrong.
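If the full trace is overwhelming, strace can be narrowed down; for example, these standard options print a per-syscall summary, or restrict the trace to file-related calls:

strace -c <program executable>
strace -e trace=file <program executable>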
In all honesty, you are probably better off just debugging the problem from source, although, depending on the actual problem, some of these techniques may help you pinpoint something.
Most of these tools have Mac and Windows equivalents.
Since much has changed in the last 9 years, and apparently this is still the first result a search engine gives on this question (like for me, again), an updated answer is in order:
First of all, yes, while Haskell does not specify a bytecode format, bytecode is also just a kind of machine code, for a virtual machine. So for the rest of the answer I will treat them as the same thing. The “Core”, as well as the LLVM intermediate language or even WASM, could be considered equivalent too.
Secondly, if your old binary is statically linked, then of course, no matter the format your program is in, no symbols will be available to check out, because that is what linking does. This holds even with bytecode, and even with just classic static #include in simple languages. So your old binary will be no good, no matter what. And given the optimisations compilers do, a classic decompiler will very likely never be able to figure out which optimised bits originally belonged to which libraries. Especially with stream fusion and similar “magic”.
Third, you can do the things you asked with a modern Haskell program, but you need to have your binaries compiled with -dynamic and -rdynamic, so that not only the C-calling-convention libraries (e.g. .so files) and the Haskell libraries, but also the runtime itself is dynamically loaded. That way you end up with a very small binary, consisting of only your actual code, dynamic linking instructions, and the exact data about which libraries and runtime were used to build it. And since the runtime is compiler-dependent, you will know the compiler too. So it would give you everything you need, but only if you compiled it right. (I recommend using such dynamic linking by default in any case, as it saves memory.)
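A minimal sketch of what this looks like on Linux, assuming a hypothetical Main.hs:

ghc -dynamic -rdynamic Main.hs
ldd Main

With -dynamic, the ldd output lists the versioned Haskell libraries and the GHC runtime alongside the C libraries, which is exactly the provenance information the question asks for.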
The last factor that one might forget, is that even the exact same compiler version might behave vastly differently, depending on what IT was compiled with. (E.g. if somebody put a backdoor in the very first version of GHC, and all GHCs after that were compiled with that first GHC, and nobody ever checked, then that backdoor could still be in the code today, with no traces in any source or libraries whatsoever. … Or for a less extreme case, that version of GHC your old binary was built with might have been compiled with different architecture options, leading to it putting more optimised instructions into the binaries it compiles for unless told to cross-compile.)
Finally, of course, you can profile even compiled binaries, by profiling their system calls. This will give you clues about which part of the code acted differently and how. (E.g. if you notice that your new binary floods the system with some slow system calls where the old one just used a single fast one. A classic OpenGL example would be using fast display lists versus slow direct calls to draw triangles. Or using a different sorting algorithm, or having switched to a different kind of data structure that fits your work load badly and thrashes a lot of memory.)
I have found a piece of code (a function) in a library that could be improved by compiler optimization (the broader idea is to find good material for digging deep into compilers). I want to automate measuring the execution time of this function with a script. Since it is a low-level library function that takes arguments, it is difficult to extract it on its own. So I want to find a way to measure exactly this function (precise CPU time) without modifying the library, application, or environment. Do you have any ideas how to achieve that?
I could write a wrapper, but in the near future I will need to test the performance of many more applications, and writing a wrapper for every one seems very ugly.
P.S.: My code will run on the ARM (armv7el) architecture, which has "Performance Monitor Control" registers. I have learned about "perf" in the Linux kernel, but I don't know whether it is what I need.
It is not clear whether you have access to the source code of the function you want to profile or improve, i.e. whether you are able to recompile the library in question.
If you are using a recent GCC (that is, at least 4.6) on a recent Linux system, you could use profilers like gprof (assuming you are able to recompile the library) or, better, oprofile (which you can use without recompiling), and you could customize GCC for your needs.
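For instance, the classic gprof workflow looks roughly like this (assuming you can rebuild the application that exercises the library; the file names are illustrative):

gcc -pg -O2 -o myapp myapp.c    # -pg adds profiling instrumentation
./myapp                         # writes gmon.out on exit
gprof ./myapp gmon.out > profile.txt

The report attributes CPU time per function, which singles out the function you care about.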
Be aware that like any measurements, profiling may alter the observed phenomenon.
If you are considering customizing the GCC compiler for optimization purposes, consider making a GCC plugin, or better yet, a MELT extension, for that purpose (MELT is a high-level domain specific language to extend GCC). You could also customize GCC (with MELT) for your own specific profiling purposes.
Probably my question sounds weird, but here is my point: I have to compile a program using GCC. If I compile GCC itself from source, will software compiled with the freshly built GCC get a slight edge in performance? What should I expect?
You won't get any faster programs out of a compiler that was itself built with optimizing flags. A program is the compiler's output, and since optimizations don't change the output of a correct program, the generated programs stay the same; only the compiler itself may run faster.
You might, however, profit from newly available options if your distributor ships an incomplete compiler. Look through the GCC manual for any options you want to enable (like certain target architecture variants), and if you can't enable them in your current compiler build, there might be potential in a custom-built compiler. However, it is unlikely to be worth it.
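To check how your current compiler was configured, you can ask it directly; among other details, it prints the configure options it was built with:

gcc -v

Comparing that "Configured with: ..." line against the options you need tells you whether a custom build would add anything at all.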
Not unless you're building a newer version of gcc, or enabling cloog, graphite, etc.
The performance difference is usually zero or negligible. In very rare cases you can see a noticeable difference, but it is not always an improvement; degradation is possible too.
I'm re-writing a build that produces a number of things (shared/static libraries, jars, executables, etc). The question came up whether there's a way to verify that the results are functionally equivalent without doing a full top-to-bottom test of the resulting software.
However, that is proving to be more difficult to do than I anticipated.
As an example, I expected that two objects produced from the same source (Sun Studio C++ compiler) with the same command-line parameters would have the same MD5 hash, but that isn't the case. I can build the file, rename it, build again, and the two builds have different hashes.
With that said ... is there a way to do a quick check to verify that two files produced from separate build architectures of the same source tree (e.g., two shared objects) are functionally equivalent?
Edit: I am sorry, I neglected to mention this is for a debug build ... when debugging flags aren't used the binaries are identical, but debugging flags have been used by default for so many years that things break when you remove them (part of the reason I'm re-writing the build is to take that particular 'feature' out so we can get some proper testing going).
Windows DLLs have a link timestamp (TimeDateStamp) as part of the PE image.
Looking at linker options, I don't see an option to suppress that. So re-linking a DLL (or an EXE) will always produce a different binary.
You could write a tool to zero out these timestamps (always at a fixed offset from file start), and compare MD5s afterwards. But you'll likely discover lots of other differences as well. In particular, any program that uses __DATE__ or __TIME__ builtins will give you trouble.
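A minimal sketch of such a tool in C++, assuming a well-formed PE file on a little-endian host (no validation of the MZ/PE signatures is done here, so treat it as illustrative only):

#include <cstdint>
#include <fstream>
#include <iostream>

int main(int argc, char* argv[]) {
    if (argc != 2) { std::cerr << "usage: zerostamp <pe-file>\n"; return 1; }
    std::fstream f(argv[1], std::ios::in | std::ios::out | std::ios::binary);
    if (!f) { std::cerr << "cannot open " << argv[1] << "\n"; return 1; }

    // The 4-byte value at offset 0x3C (e_lfanew) points to the "PE\0\0" signature.
    uint32_t e_lfanew = 0;
    f.seekg(0x3C);
    f.read(reinterpret_cast<char*>(&e_lfanew), 4);

    // TimeDateStamp sits 8 bytes past the signature:
    // 4-byte signature + 2-byte Machine + 2-byte NumberOfSections.
    const uint32_t zero = 0;
    f.seekp(e_lfanew + 8);
    f.write(reinterpret_cast<const char*>(&zero), 4);
    return f.good() ? 0 : 1;
}

Run it on both DLLs before hashing them; if the hashes still differ, the difference is something other than the link timestamp.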
We've had to work quite hard to achieve bit-identical rebuilds (using GNU toolchain). It's possible (at least for open-source tools, on Linux), but not easy (as you've discovered).
I forgot about this question; I'm revisiting so I can give the answer I came up with.
objcopy can be used to produce a new binary file in different formats. It's been a few years since I worked on this, so the specifics escape me, but here's what I recall:
objcopy can strip various things out (debug info, symbol information, etc), but even after stripping stuff out I was still seeing different hashes between objects.
In the end I found I could convert it from ELF to other formats. I ended up dumping it to another format (I think I chose SREC) that consistently provided the same MD5 for objects built at different times with identical source/flags.
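Reconstructed from memory, the workflow was roughly like this (these are standard binutils flags, but the file names and the exact option set we used are illustrative):

objcopy --strip-all -O srec libfoo.so libfoo.srec
md5sum libfoo.srec

Dumping only the loadable contents to SREC discards the metadata (timestamps, build paths, symbol ordering) that was making byte-for-byte comparison of the ELF files fail.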
I'm betting I could have done this a better way with objcopy (or perhaps another binutils tool), but it was good enough to satisfy our concerns.
I have automatically generated code (around 18,000 lines, basically a wrapper for data) and about 2,000 other lines of code in a C++ project. The project has link-time optimization turned on, along with /O2 and fast-code optimization. Compiling the code takes VC++ 2008 Express an incredibly long time (around 1.5 hours). After all, it is only 18,000 lines; why does the compiler take so much time?
A little explanation of the 18,000 lines of code: it is plain C, not even C++, and consists of many unrolled for-loops; a sample would be:
a[0].a1 = 0.1284;
a[0].a2 = 0.32186;
a[0].a3 = 0.48305;
a[1].a1 = 0.543;
..................
It basically fills a complex struct, though not one that should be too complex for the compiler, I would guess.
Debug mode is fast; only Release mode has this issue. Before I added the 18,000 lines of code everything was fine (at that time the data was in an external location). The Release build does do a lot of work, though, reducing the size of the exe from 1,800 KB to 700 KB.
The issue does happen in the link stage, because all the .obj files have already been generated. I suspect link-time code generation too, but cannot figure out what is wrong.
Several factors influence link time, including but not limited to:
Computer speed, especially available memory
Libraries included in the build.
Programming paradigm - are you using boost by any chance?
With 18,000 lines of template metaprogramming, even on a new quad-core machine, 1.5 hours of linking wouldn't completely surprise me.
Historically, a common cause of slow C++ compilation is excessive header file inclusion, usually a result of poor modularization. You can get a lot of redundant compilation by including the same big headers in lots of small source files. The usual reference in these cases is Lakos.
You don't state whether you are using the pre-compiled header, which is the quick and dirty substitute for a header file refactoring.
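If you are not, a minimal Visual C++ setup looks roughly like this (header and file names are illustrative; /Yc creates the precompiled header, /Yu consumes it, and every participating source file must start with #include "stdafx.h"):

cl /c /Ycstdafx.h stdafx.cpp
cl /c /Yustdafx.h other_source.cpp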
That's why we generate lots of DLLs for our debug builds, but generally link them in for our release builds. It's easier (for our particular purposes) to deal with more monolithic executables, but it takes a long time to link.
As said in one of the comments, you probably have Link Time Code Generation (/LTCG) enabled, which moves the majority of code generation and optimization to the link stage.
This enables some amazing optimizations, but also increases link times significantly.
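For reference, /LTCG pairs with /GL at compile time; a minimal command line (file name illustrative) would be:

cl /O2 /GL main.cpp /link /LTCG

Temporarily removing /GL and /LTCG from the Release configuration is a quick way to confirm that this is where the time goes.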
The C++ team says they've significantly optimized the linker for VS 2010.