I am working on a fairly large code base, and some snippets I added are causing strange memory behaviour, so I considered using the -fsanitize=address option of clang++.
However, it seems I cannot compile the whole library with it, because of a large number of linker errors.
I am not really familiar with the inner workings of AddressSanitizer, but is it generally possible to apply it to only a small portion of a library's code base (say, to a single header), and if so, how?
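For reference, I am invoking it roughly like this (the file names are placeholders; -g is only there for readable reports):

clang++ -fsanitize=address -g -c some_file.cpp -o some_file.o
clang++ -fsanitize=address some_file.o other_files.o -o my_app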
Many discussions, like this and this, have warned us with examples that trying to dlopen a PIE can never be correct. The reasons are various: copy relocations, TLS, etc.
However, these problems can be circumvented if we loosen the restrictions. This question showed us that compiling with -fPIC can eliminate copy relocations, and TLS seems to work all right.
This raises the question of how far we are from correctly dynamically loading a PIE. I agree with the point made in link 1:
Bottom line: this was never designed to work, and you just happened to not step on many of the land-mines, so you thought it is working, when in fact you were exercising undefined behavior.
But I am more interested in WHY we cannot do that, rather than in another failing example.
More specifically, users could write their own runtime dynamic linker, as this comment suggests, which could make strong assumptions or compromises just for this purpose. Yet this requires extremely broad knowledge of compiling, linking, and loading, parts of which are known to be poorly documented.
So again: how do users correctly dynamically load PIEs, or at least, how can they try to find a way to do that (or establish that it cannot be done)?
But I am more interested in WHY we cannot do that, rather than in another failing example.
Because the designers of GLIBC didn't intend to allow for this to happen and don't consider this to be a valid use case.
More specifically, users could write their own runtime dynamic linker
Absolutely. You are free to design your own libc and the dynamic loader to allow for this use case. That requirement will add some complexity, but there is no fundamental reason it can't be done.
You may also find an existing alternate libc implementation which doesn't have this restriction (either because it has been designed in, or because the designers forgot to enforce it, as was the case with GLIBC before this patch).
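For illustration, the restriction is easy to observe from C; a minimal sketch (the path "./my_pie" is a placeholder for any position-independent executable; link with -ldl on older glibc):

#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *handle = dlopen("./my_pie", RTLD_NOW); /* placeholder path */
    if (handle == NULL) {
        /* With a glibc containing the patch above, dlerror() reports that
           a position-independent executable cannot be dynamically loaded. */
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    dlclose(handle);
    return 0;
}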
how do users correctly dynamically load PIEs
They don't.
how can they try to find a way to do that (or establish that it cannot be done)?
The usual solution is to "not do that", and in fact the need to "do that" seems to be very esoteric.
Why do you need to dlopen a PIE executable in the first place?
I was researching how to write an emulator and the techniques involved. But the following paragraph made me wonder: I cannot figure out what exactly goes wrong if self-modifying code is emulated via static recompilation.
In this technique, you take a program written in the emulated code and attempt to translate it into the assembly code of your computer. The result will be a usual executable file which you can run on your computer without any special tools. While static recompilation sounds very nice, it is not always possible. For example, you cannot statically recompile self-modifying code as there is no way to tell what it will become without running it. To avoid such situations, you may try combining static recompiler with an interpreter or a dynamic recompiler.
Here is what I was reading, and this is the specific line that made me wonder:
For example, you cannot statically recompile self-modifying code as there is no way to tell what it will become without running it
A good explanation with examples would be very instructive. Thanks.
Edit: By the way, I know what self-modifying code means; I am just wondering what problems we will run into after static recompilation, and where, i.e. what exactly will break our self-modifying code.
Self-modifying code relies heavily on the instruction-set encoding of the original CPU. For example, it might flip some bits at a specific memory location to turn one instruction into another. After static recompilation, flipping those same bits would have an entirely different effect, since the instructions are encoded completely differently for the host CPU.
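To make this concrete, here is a toy C sketch (the two-opcode instruction set is entirely made up). An interpreter handles the self-modification naturally because it re-reads the code bytes on every step; a static recompiler would have translated all three opcodes into host instructions up front, so the write into the code array would no longer affect what actually executes.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Code memory of a made-up CPU: opcode 0x01 = "add 10", 0x02 = "sub 10". */
    uint8_t code[3] = { 0x01, 0x01, 0x01 };
    int acc = 0;

    for (int pc = 0; pc < 3; pc++) {
        if (code[pc] == 0x01)
            acc += 10;
        else if (code[pc] == 0x02)
            acc -= 10;

        /* Self-modification: after the first step, rewrite the last "add"
           into a "sub". The interpreter above sees the change; statically
           recompiled host code would have baked in three adds. */
        if (pc == 0)
            code[2] = 0x02;
    }

    printf("acc = %d\n", acc); /* prints 10; a naive static recompile would give 30 */
    return 0;
}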
Perhaps it's just better to describe my problem.
I'm developing a Haskell library, but part of the library is written in C, and another part actually in raw LLVM. To get GHC to emit the code I want, I have to follow this process (sketched as concrete commands after the list):
Run ghc -emit-llvm on both the code that uses the Haskell module and the "Main" module.
Run clang -emit-llvm on the C file.
Now I've got three .ll files from above. I add the part of the library I've handwritten in raw LLVM and llvm-link these into one .ll file.
I then run LLVM's opt on the linked file.
Lastly, I feed the LLVM bitcode file back into GHC (which pleasantly accepts it), producing an executable.
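Concretely, the pipeline looks roughly like this (file names are placeholders; -fllvm together with -keep-llvm-files is one way to get the .ll files out of GHC, and the last step relies on GHC accepting the .ll file directly, as mentioned above):

ghc -O2 -fllvm -keep-llvm-files -c MyLib.hs Main.hs
clang -S -emit-llvm -O2 shims.c -o shims.ll
llvm-link MyLib.ll Main.ll shims.ll handwritten.ll -S -o linked.ll
opt -O2 -S linked.ll -o linked_opt.ll
ghc linked_opt.ll -o my_app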
This process (with appropriate optimisation settings, of course) seems to be the only way I can inline code from C, removing the function-call overhead. Since many of these C functions are very small, this is significant.
Anyway, I want to be able to distribute the library and for users to be able to use it as painlessly as possible, whilst still gaining the optimisations from the process above. I understand it's going to be a bit more of a pain than an ordinary library (for example, you're forced to compile via LLVM), but making it as painless as possible is what I'm looking for advice on.
Any guidance would be appreciated. I don't expect a step-by-step answer, because I think it will be complex, but some ideas would be helpful.
I am trying to track down a bug (a serious performance regression). Unfortunately, I wasn't able to figure out the cause by going back through many different versions of my code.
I suspect it could be changes in libraries that I've updated; in the meantime I've also moved from GHC 7.4 to 7.6 (and if anybody knows whether some laziness behaviour has changed, I would greatly appreciate hearing about it!).
I have an older executable of this code that does not have the bug, and thus I wonder: are there any tools that can tell me which library versions I was linking against before? For example, something that can figure it out from the symbols, etc.
GHC creates executables, which are notoriously hard to understand... On my Linux box I can view the assembly code by typing in
objdump -d <executable filename>
but I get back over 100K lines of code from just a simple "Hello, World!" program written in Haskell.
If you happen to have the GHC .hi files, you can get some information about the executable by typing in
ghc --show-iface <hi filename>
This won't give you the assembly code, but you can get some extra information that may prove useful.
As I mentioned in the comment above, on Linux you can use "ldd" to see what C-system libraries you used in the compile, but that is also probably less than useful.
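Its usage follows the same pattern as the other commands:

ldd <executable filename>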
You can try to use a decompiler, but those are generally written to decompile to C, not to anything higher level, and certainly not to Haskell. That being said, GHC compiles via C as an intermediary (at least it used to; has that changed?), so you might be able to learn something.
Personally, I often find viewing system calls in action much more interesting than viewing pure assembly. On my Linux box, I can view all system calls by running strace (use Wireshark for the network-traffic equivalent):
strace <program executable>
This also will generate a lot of data, so it might only be useful if you know of some specific place where direct real world communication (i.e., changes to a file on the hard disk drive) goes wrong.
In all honesty, you are probably better off just debugging the problem from source, although, depending on the actual problem, some of these techniques may help you pinpoint something.
Most of these tools have Mac and Windows equivalents.
Since much has changed in the last 9 years, and apparently this is still the first result a search engine gives for this question (as it was for me, again), an updated answer is in order:
First of all, yes: while Haskell does not specify a bytecode format, bytecode is just a kind of machine code for a virtual machine, so for the rest of this answer I will treat them as the same thing. GHC's "Core", the LLVM intermediate language, or even WASM could be considered equivalent too.
Secondly, if your old binary is statically linked, then of course, no matter the format your program is in, no symbols will be available to check out, because that is what linking does. This holds even with bytecode, and even with classic static #include in simple languages. So your old binary will be no good, no matter what. And given the optimisations compilers perform, a classic decompiler will very likely never be able to figure out which optimised bits used to be part of which libraries. Especially with stream fusion and similar "magic".
Third, you can do the things you asked with a modern Haskell program, but you need to have your binaries compiled with -dynamic and -rdynamic, so that not only the C-calling-convention libraries (e.g. .so files) and the Haskell libraries, but also the runtime itself, are dynamically linked. That way you end up with a very small binary, consisting of only your actual code, dynamic linking instructions, and the exact data about which libraries and runtime were used to build it. And since the runtime is compiler-dependent, you will know the compiler too. So it would give you everything you need, but only if the binary was compiled that way. (I recommend using such dynamic linking by default in any case, as it saves memory.)
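As a quick sketch of what that looks like in practice (file names are placeholders):

ghc -dynamic -rdynamic Main.hs -o main
ldd main

The ldd output will then list the exact Haskell library and runtime .so files the binary was linked against.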
The last factor that one might forget, is that even the exact same compiler version might behave vastly differently, depending on what IT was compiled with. (E.g. if somebody put a backdoor in the very first version of GHC, and all GHCs after that were compiled with that first GHC, and nobody ever checked, then that backdoor could still be in the code today, with no traces in any source or libraries whatsoever. … Or for a less extreme case, that version of GHC your old binary was built with might have been compiled with different architecture options, leading to it putting more optimised instructions into the binaries it compiles for unless told to cross-compile.)
Finally, of course, you can profile even compiled binaries, by profiling their system calls. This will give you clues about which part of the code acted differently and how. (E.g. if you notice that your new binary floods the system with some slow system calls where the old one just used a single fast one. A classic OpenGL example would be using fast display lists versus slow direct calls to draw triangles. Or using a different sorting algorithm, or having switched to a different kind of data structure that fits your work load badly and thrashes a lot of memory.)
I tried fay-jquery, and the included sample test.hs file results in a whopping 150 kB of JavaScript.
Even after Closure compilation it is still 20 kB.
I understand that it must carry the runtime, the stdlib, and the jQuery wrappers with it.
I can tell Fay not to generate the stdlib (the --no-stdlib and --no-builtins flags).
But I do not know how to tell it not to include the jQuery code.
So my question is: how can I split those static parts into a separate JS file and generate only module-specific code?
That way the large static parts will be loaded only once (and cached), and I can create many smaller JS files for separate web pages.
Yes, it's safe to split modules up; as of Fay 0.16 all modules can exist standalone (before that, you could still keep the runtime and fay-base separate). There are some flags for this: --print-runtime and --no-stdlib. Compile with optimizations (-O; this increases the output size, but Closure will be able to minify it even better).
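As a hedged sketch of how the flags mentioned above combine (the module name is a placeholder):

fay --print-runtime > fay-runtime.js
fay -O --no-stdlib SomeModule.hs

You would then serve fay-runtime.js (plus the shared library modules) once, and only the small per-page module files alongside it.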
Also remember that the web server should gzip this. That brings the code size down to 4.5 kiB. That's pretty decent, right?
You might want to consider putting all of your javascript in one file, that means a slower initial load but then users will have it cached for future page loads.
The reason the file size is so big is that fay-jquery has a lot of FFI bindings, which produce a lot of transcoding information. I think fay-jquery could be optimized a lot here, for instance by using Ptr JQuery rather than just JQuery in the types, by figuring out during compilation that a lot of this is unnecessary, or by abstracting the conversions more in the compiler's output.
Another possible issue I realized a couple of days ago is that the output is now in the global scope rather than in a closure, which might mean that Google Closure can't remove redundant code as well as it previously could (I haven't had time to investigate this yet). The module generation should perhaps be changed to produce a closure for each module.
Also see Reducing output size on the wiki.