Rcpp: Platform differences in output - Linux

I have the following problem (and cannot really produce a minimal test case):
I am porting a package from C++ to R via Rcpp.
My tests (checking that the output matrix is exactly what I would get when
calling the C++ code directly) pass under Linux and OS X with no difference whatsoever.
But when testing either via build_win() or on a Windows 8.1 virtual machine, I get different results (the results from both Windows setups are consistent with each other, so I have Linux/OS X vs. Windows results).
I already replaced the one rand() call with the corresponding Rcpp sugar function, so that should not be the problem (I hope, at least).
Since running the tests via "R -d valgrind" also produces no errors, I am a bit puzzled about how to proceed.
All tests were done with R 3.2.0 (local machines) and the latest unstable release (via build_win()).
So my questions are:
Are there any known Rcpp differences when compiling? For example, is the compiler provided by Rtools on Windows old enough that numeric computations (using the STL, no other libraries like Boost/Eigen) are expected to differ slightly?
Is there a good way to debug the problem? I would basically need to trace the C++ code line by line, and I am not even sure how to do that except with heavy use of std::cout.
Thanks.

The truth about the 32-bit/64-bit problem is indeed written up here:
Different behaviour of sqrt when compiled with 64 or 32 bits
Adding the -ffloat-store option did fix my problem.
I never expected that; I thought the problem was in the source code.
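For anyone landing here later: a minimal sketch (not code from the original package) of the kind of expression this affects. On 32-bit x86, GCC may keep intermediate results in 80-bit x87 registers, while 64-bit builds use 64-bit SSE2 arithmetic throughout; -ffloat-store forces each intermediate to be stored to memory (and rounded to a 64-bit double), which brings the two back into agreement:

#include <cmath>
#include <cstdio>

int main() {
    double x = 1.0 / 3.0;
    // Ideally zero, but whether the sqrt results are rounded to 64 bits
    // before the multiply depends on the target and on -ffloat-store.
    double y = std::sqrt(x) * std::sqrt(x) - x;
    std::printf("%.20g\n", y);  // may differ between 32- and 64-bit builds
    return 0;
}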

Related

GNU Fortran compiler write option

I use the GNU Fortran compiler to compile a piece of code written in Fortran (.f90). Unlike with other compilers, the output of the write statement is not displayed on the screen but is instead written to the output file.
For example, I have placed "write(*,*) 'Check it here'" in the middle of the source code so that this message is displayed on the screen when someone runs the compiled version of the code.
I don't understand why this message is not displayed in the terminal window while running the code, but is written to the output file instead.
I would appreciate your help in resolving this!
I am compiling this source code:
https://github.com/firemodels/fds/tree/master/Source
The makefile that I am using to compile the code is located here:
https://github.com/firemodels/fds/tree/master/Build/mpi_intel_linux_64
I run the program using the executable that the makefile creates.
The version of the compiler that I am using is:
GNU Fortran (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
Thank you.
Way bigger picture: Is there a reason you're building FDS from source rather than downloading binaries directly from NIST, i.e. from https://pages.nist.gov/fds-smv/downloads.html?
Granted, if you're qualifying the code for safety-related use, you may need to compile from source rather than use someone else's binaries. You may need to add specific info to a header page such as code version, date of run, etc. to satisfy QA requirements.
If you're just learning about FDS (practicing fire analysis, learning about CFD, evaluating the code), I'd strongly suggest using NIST's binaries. If you need/want to compile it from source, we'll need more info to diagnose the problem.
That said, operating on the assumption that you have a use case that requires that you build the code, your specific problem seems to be that writing to the default output unit * isn't putting the output where you expect.
Modern Fortran provides the iso_fortran_env module which formalizes a lot of the obscure trivia of Fortran, in this case, default input and output units.
In the module you're editing, look for something like:
use iso_fortran_env
or
use iso_fortran_env, only: output_unit
or
use, intrinsic:: iso_fortran_env, only: STDOUT => output_unit
If you see an import of output_unit or (as in the last case) an alias to it, write to that unit instead of to *.
If you don't see an import from iso_fortran_env, add the last line above to the routine or module you're printing from and write to STDOUT instead of *, as in the sketch below.
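Put together, a minimal stand-alone sketch of the pattern (a demo program, not FDS code):

program demo
    use, intrinsic :: iso_fortran_env, only: STDOUT => output_unit
    implicit none
    write(STDOUT,*) 'Check it here'   ! goes to the terminal
    flush(STDOUT)                     ! make it appear immediately, even if buffered
end program demo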
That may or may not fix things, depending on whether the FDS authors do something strange to redirect IO. They might; I'm not sure how writing to the screen works in an MPI environment where the code may run in parallel on a number of networked machines (I'd write to a networked logging system in that case, but that's just me). But in the simple case of a single instance of the code running, writing to output_unit is more precise than writing to * and more portable and legible than writing to 6.
Good luck with FDS; I tried using it briefly to model layer formation from a plume of hydrogen gas in air. FDS brought my poor 8 CPU machine to its knees so I went back to estimating it by hand instead of trying to make CFD work...

Porting duktape, getting duk_create_heap error during JS compilation of builtin initjs

This question might be too detailed for this forum, but I could not find a mailing list for duktape. Maybe this question will be useful for others trying to get duktape running on more obscure hardware.
I am trying to get duktape to work on an old ColdFire CPU, using an OLD gcc compiler (2.95.3). The board has limited resources (flash/RAM) but I seem to have enough of both. I must live with the old compiler.
I believe the duk_config.h is calculating the right options regarding endianness, etc. I am using a number of the duktape options to reduce code and data size. I have successfully used the same configuration on 64 and 32 bit Ubuntu and it works fine.
The "properties string" that is formed and set in duk_hthread_create_builtin_objects() is:
"bb u pnRHSBOL p2 a8 generic linux gcc" which seems correct (not sure of the effect of the "generic" tag for architecture).
I am getting a failure when calling duk_create_heap(). I have isolated the problem to what I believe is a JS compile error related to duk_initjs. If I undef DUK_USE_BUILTIN_INITJS, initialization works. The error is a syntax error (I'm not sure where yet). By running "strings" on my executable, I can see that the JavaScript program source string is there. As a side issue, when this error occurs, the longjmp doesn't work (setjmp never called?) so my fatal handler gets called, but I don't care about that for now.
I thought it might be my small C stack (as it appears the JS compiler uses recursion), but making the stack much larger didn't help.
I am starting to dig into the JS compiler, but this must be an issue with the architecture or my environment. Any suggestions appreciated!
EDIT: I just now noticed a post about a similar issue, where there was a request to repeat with "-DDUK_OPT_DEBUG -DDUK_OPT_DPRINT -DDUK_OPT_ASSERTIONS -DDUK_OPT_SELF_TESTS". I will try these options (if possible; I am very close to a relocation limit on my executable).
There was a bug in the 1.4.0 release (https://github.com/svaarala/duktape/pull/550) which caused duk_config.h to incorrectly end up with an unpacked value representation even when the architecture supported the packed representation. This might be the issue in your case - try adding an explicit -DDUK_OPT_PACKED_TVAL (which forces Duktape to use the packed representation) to see if it helps.
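Independently of that, it helps to install a fatal handler that prints the error before aborting, so heap-creation failures aren't silent. A minimal sketch, assuming the Duktape 1.x API (the fatal handler signature changed in 2.x):

#include <stdio.h>
#include <stdlib.h>
#include "duktape.h"

/* Duktape 1.x fatal handler: report the error code and message. */
static void my_fatal(duk_context *ctx, duk_errcode_t code, const char *msg) {
    (void) ctx;
    fprintf(stderr, "FATAL %ld: %s\n", (long) code, msg ? msg : "(no message)");
    abort();  /* a fatal handler must not return */
}

int main(void) {
    duk_context *ctx = duk_create_heap(NULL, NULL, NULL, NULL, my_fatal);
    if (ctx == NULL) {
        fprintf(stderr, "duk_create_heap failed\n");
        return 1;
    }
    duk_eval_string(ctx, "print('heap is alive');");
    duk_destroy_heap(ctx);
    return 0;
}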

Swift on Linux: how to specify compiler optimizations

Several threads on Stack Overflow (e.g. this one) discuss the different optimization levels (Onone, O, Ounchecked, ...) when compiling Swift applications.
However, those postings relate to development on OS X. It seems that those optimizations can be set directly via Xcode or xcrun (xcrun swift -O3).
I'm wondering how to switch between the different optimization levels when using the Swift compiler directly on Linux (Ubuntu 15.10). Currently I'm building the application just by invoking swift build, as shown in the docs, but I have found no way to change the optimization level.
It is possible to provide the -O, -Onone, and -Ounchecked optimization flags to the Swift compiler, swiftc. However, it appears that there is currently no way to pass additional flags to swift build. See, for example, the following link, even though it is not directly related: https://bugs.swift.org/browse/SR-397. The same bug report suggests that the team is actively working on adding this missing functionality.
One way that I found to work around the problem is to run swift build -v, find the first command that references -Onone, copy it and all the commands that follow it to a shell script, edit the script to use the desired optimization level instead of -Onone, and run the script. This should re-compile the Swift sources using the desired optimization level and rebuild the executable.
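Alternatively, for a small program you can bypass swift build entirely and invoke the compiler yourself; a minimal sketch (main.swift is a placeholder):

swiftc -O main.swift -o main
swiftc -Ounchecked main.swift -o main

The second form applies -O and additionally removes runtime checks such as integer-overflow traps.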
In my testing I found that a simple example involving sorting an array runs a couple of orders of magnitude faster if built using -O or -Ounchecked instead of -Onone.

Why does the compiler change the result of a linear programming solution? (GLPK/GLPSOL)

Attempting to solve linear programming problems using GLPK's GLPSOL, we've come upon a snag: in very specific cases, the results from glpsol executables created with different compilers are different.
The situation is that we have a problem with several valid solutions. To put it simply, we have a table where each row (X) can be assigned only one column (Y), and vice versa. As such, all combinations that assign unique column/row pairs are valid.
Example, for a 2x2 table, these are valid:
{(X0,Y0),(X1,Y1)} {(X0,Y1),(X1,Y0)}
Now, the original glpsol binary we used under Windows returned the results in order, something like this:
{(X0,Y0),(X1,Y1)...(Xn,Yn)}
We noticed an issue with the Linux binary, in that it returned the solution in a different order, something like this:
{(X0,Y0),(Xn,Y1),(X1,Y2) ....}
Note that the order is not random, every execution follows the same pattern.
After much investigation I discovered that the issue lies in which compiler was used to create each binary. In our example above, the Windows binary was compiled using Visual C++, while the Linux binary used GCC.
I've verified this by recompiling the Windows binary using GCC, resulting in the same pattern. Compiling with Borland results in a different pattern.
So the question is, mainly, why is this happening?
I'm guessing it might be the result of how each compiler optimizes the binary, but I'm not sure. My objective is to obtain the same results we had with the original executable (the one compiled with Visual C++) on both Windows and Linux, and I suspect cross-compiling with the Visual C++ toolchain won't be an option.
Note: I managed to determine the compiler used by each binary by opening them as text and locating text strings within the executable referencing Visual C++ and GNU GCC respectively.
Thanks!
Versions of the solver built with different compilers can take different paths during the optimization process, which can result in the behavior you observe. Things that can affect this are: differences in floating-point semantics (possibly caused by -ffast-math); different implementations of sort (qsort is normally not a stable sort - this is mentioned by Ben Voigt); and different implementations of random number generators in the standard libraries.
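The sort point is easy to reproduce. A minimal sketch (synthetic data, not GLPK code): when several rows tie on the sort key, an unstable sort is free to order them differently across standard-library implementations, and a solver iterating over them can then walk toward a different (equally optimal) solution. A stable sort, or an explicit tie-breaker, makes the order deterministic:

#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

int main() {
    std::vector<std::pair<int, char>> rows = {
        {1, 'a'}, {1, 'b'}, {1, 'c'}, {0, 'd'}};
    // Unstable: the relative order of the three key==1 rows is unspecified
    // and may differ between standard-library implementations.
    std::sort(rows.begin(), rows.end(),
              [](const auto &l, const auto &r) { return l.first < r.first; });
    // std::stable_sort would be guaranteed to preserve the a, b, c order.
    for (const auto &row : rows) std::printf("%d%c ", row.first, row.second);
    std::printf("\n");
    return 0;
}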
If both solutions are optimal, I wouldn't be too much concerned about this.

How to inspect Haskell bytecode

I am trying to figure out a bug (a serious performance regression). Unfortunately, I wasn't able to figure out why by going back through many different versions of my code.
I suspect it could be some modifications to libraries that I've updated, not to mention that in the meantime I've updated from GHC 7.4 to 7.6 (and if anybody knows whether some laziness behavior has changed, I would greatly appreciate it!).
I have an older executable of this code that does not have this bug, so I wonder if there are any tools that can tell me which library versions I was linking against before - for example, by recovering the symbols.
GHC creates executables, which are notoriously hard to understand... On my Linux box I can view the assembly code by typing in
objdump -d <executable filename>
but I get back over 100K lines of code from just a simple "Hello, World!" program written in Haskell.
If you happen to have the GHC .hi files, you can get some information about the executable by typing in
ghc --show-iface <hi filename>
This won't give you the assembly code, but you can get some extra information that may prove useful.
As I mentioned in the comment above, on Linux you can use "ldd" to see which C system libraries were used in the compile, but that is also probably less than useful.
You can try to use a decompiler, but those are generally written to decompile to C, not to anything higher level and certainly not Haskell. That being said, GHC compiles via C as an intermediate (at least it used to; has that changed?), so you might be able to learn something.
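If you still have the source, you can also ask GHC itself for something higher level than assembly: its optimised Core output, which is usually where laziness and fusion changes become visible. A minimal sketch (Main.hs is a placeholder):

ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Main.hs > core.txt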
Personally, I often find viewing system calls in action much more interesting than viewing pure assembly. On my Linux box, I can view all system calls by running the program under strace (use Wireshark for the network-traffic equivalent):
strace <program executable>
This also will generate a lot of data, so it might only be useful if you know of some specific place where direct real world communication (i.e., changes to a file on the hard disk drive) goes wrong.
In all honesty, you are probably better off just debugging the problem from source, although, depending on the actual problem, some of these techniques may help you pinpoint something.
Most of these tools have Mac and Windows equivalents.
Since much has changed in the last 9 years, and apparently this is still the first result a search engine gives for this question (as it just was for me, again), an updated answer is in order:
First of all, yes: while Haskell does not specify a bytecode format, bytecode is just a kind of machine code for a virtual machine, so for the rest of this answer I will treat them as the same thing. GHC's "Core", the LLVM intermediate language, or even WASM could be considered equivalent too.
Secondly, if your old binary is statically linked, then no matter the format your program is in, no symbols will be available to check out, because that is what linking does - even with bytecode, and even with a classic static #include in simpler languages. So your old binary will be no good, no matter what. And given the optimisations compilers do, a classic decompiler will very likely never be able to figure out which optimised bits used to belong to which libraries, especially with stream fusion and similar "magic".
Third, you can do the things you asked with a modern Haskell program, but you need to have your binaries compiled with -dynamic and -rdynamic, so that not only the C-calling-convention libraries (e.g. .so files) and the Haskell libraries but also the runtime itself are dynamically linked. That way you end up with a very small binary consisting of only your actual code, dynamic linking instructions, and exact data about which libraries and which runtime were used to build it. And since the runtime is compiler-specific, you will know the compiler too. So it would give you everything you need, but only if you compiled it that way. (I recommend such dynamic linking by default in any case, as it saves memory.)
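Concretely, a minimal sketch of that workflow (Main.hs is a placeholder):

ghc -dynamic -rdynamic Main.hs -o main
ldd main   # now lists e.g. libHSbase-*.so and the libHSrts-*-ghc*.so of the GHC used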
The last factor one might forget is that even the exact same compiler version might behave very differently depending on what it itself was compiled with. (E.g. if somebody had put a backdoor into the very first version of GHC, and every GHC after that was compiled with a previous GHC, and nobody ever checked, that backdoor could still be in the code today, with no trace in any source or library whatsoever. Or, for a less extreme case, the GHC your old binary was built with might have been compiled with different architecture options, leading it to put more optimised instructions into the binaries it compiles unless told to cross-compile.)
Finally, of course, you can profile even compiled binaries by profiling their system calls. This will give you clues about which part of the code acted differently and how. (E.g. you might notice that your new binary floods the system with slow system calls where the old one used a single fast one. A classic OpenGL example would be using fast display lists versus slow direct calls to draw triangles. Or using a different sorting algorithm, or having switched to a kind of data structure that fits your workload badly and thrashes a lot of memory.)
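A quick way to make that comparison is strace's call-count summary; a minimal sketch (binary names are placeholders):

strace -c ./old-binary > /dev/null
strace -c ./new-binary > /dev/null

Comparing the two summary tables (calls, errors, time per syscall) often points straight at the part of the run that changed.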
