Copying (generational) garbage collection is among the best-performing forms of automatic memory management, but it requires that pointers to relocated chunks of data be fixed up. Languages that support this technique enable it by disallowing pointer arithmetic and ensuring that all pointers point to the beginning of identifiable objects.
If you're generating code at run time with a JIT compiler, things look a bit trickier: return addresses on the call stack point not to the beginning of code blocks but to locations within them, so fixing them up is a problem.
How is this typically solved?
Quite often, you simply don't relocate code. This is both because fixing up the stack and other addresses (think of jumps across code fragments) is indeed complicated, and because you don't actually need garbage collection for such code: it is only manipulated by code you write anyway, so you can manage its memory manually. You also don't expect to create a whole lot of machine code (compared to application objects), so fragmentation and the like are not a concern.
If you insist on moving machine code and fixing up the stack, there is a way, I think: similar to Mark-Compact, build a "break table" (I have no idea where this name comes from; "relocation table" might be clearer) that tells you the amount by which pointers into each moved object should be adjusted. Now walk the stack for return addresses (highly platform-specific, of course) and fix them if they refer to relocated code. Instead of looking for exact matches, search for the highest address lower than the return address you're currently replacing. You can check that this address indeed refers to machine code that moved by looking at the object size (you have a pointer to the start of the object, after all). This approach isn't feasible for all kinds of objects, mainly because it relies on being able to validate interior pointers against known object boundaries.
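To make that search concrete, here is a minimal C sketch of the lookup, assuming a table sorted by old address; the BreakEntry layout and the way return addresses are gathered are invented for illustration, since real collectors are highly platform-specific:

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t old_start; /* where the code block lived before the move */
    size_t    size;      /* block size, used to validate interior pointers */
    ptrdiff_t delta;     /* amount to add to pointers into this block */
} BreakEntry;

/* Binary search for the entry with the highest old_start <= addr. */
static const BreakEntry *find_entry(const BreakEntry *tab, size_t n, uintptr_t addr) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].old_start <= addr) lo = mid + 1; else hi = mid;
    }
    return lo ? &tab[lo - 1] : NULL;
}

/* Adjust each saved return address that points into relocated code. */
void fix_return_addresses(uintptr_t *ret_addrs, size_t count,
                          const BreakEntry *tab, size_t n) {
    for (size_t i = 0; i < count; i++) {
        const BreakEntry *e = find_entry(tab, n, ret_addrs[i]);
        if (e && ret_addrs[i] - e->old_start < e->size) /* within the moved block? */
            ret_addrs[i] += e->delta;
    }
}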
There are other reasons to do something similar, though. Some JIT compilers feature on-stack replacement, which means creating a new version (e.g. more optimized, or less optimized) of some machine code and replacing all occurrences of the old version with it. This is far more complicated than just fixing the return addresses, though: you have to ensure the new version logically continues where the old one left off. I am not familiar with how this is implemented, so I will not go into detail.
I understand that this question verges into implementation-specific domains, but at this point, Rakudo/MoarVM-specific answers would help me too.
I am working on some NativeCall modules and wondering how to debug memory leaks. Some memory is handled in the C library; I have a good handle on that side. I know that domain is my responsibility and there is nothing MoarVM can do over there. What can I do in the MoarVM domain? What is the best way to check for dangling objects, circular references, and the like?
Is there a way, at the end of a series of operations where I think all of my Perl objects are out of scope, to say "Run garbage collection and tell me about anything left"?
Is there some Rakudo/NQP/MoarVM-specific code I can run to help me? This isn't for release in production, just for testing/diagnostics while I am developing.
Garbage Collection in MoarVM gives a tantalizing overview, but not enough information for me to do anything with it.
Firstly, while leaked memory on the C-side isn't your problem in this case, it's worth knowing that Rakudo installs a perl6-valgrind-m that runs the program under valgrind. I've used this a number of times to figure out segfaults and leaks when writing native library bindings.
For looking into objects managed by MoarVM, it's possible to get the VM to dump heap snapshots. They are taken after each GC run, and an extra GC run is forced and a final snapshot taken at the end of the program. To record snapshots, run with --profile=heap. The output file can then be fed to moar-ha, which can be installed using zef install App::MoarVM::HeapAnalyzer (it's implemented in Perl 6, which may be worth knowing should you wish to extend it in some way to help you solve your problems).
If you have any idea of what kind of objects might be leaking, then it can be useful to search for objects of that type with the find command. There is then a path command that shows how that object is being kept alive. It can also be useful to look at counts of objects between different heap snapshots, to see what is growing in use. Unfortunately there's not yet a snapshot diff feature.
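For example, a session might look like this (the snapshot filename is illustrative; use whatever file the profiler actually writes out):

perl6 --profile=heap yourscript.p6       # records a snapshot after each GC run, plus a final one
zef install App::MoarVM::HeapAnalyzer    # provides the moar-ha command
moar-ha heap-snapshot-output             # load the snapshot, then use find/path interactively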
One thing to note is that the snapshots include everything that runs atop the VM. That means the Perl 6 compiler will be in memory, as well as a bunch of objects for things from the language built-ins. (The tool was developed to help track down managed leaks in the compiler and built-ins, so this is considered a feature. :-) Some kind of filtering may be feasible in the future, however.)
Finally, you mentioned circular references. These are not a problem in Perl 6, since GC is done through tracing, not reference counting.
I am working on an obfuscated binary as part of a crackme challenge. It contains sequences of push, pop and nop instructions (repeated thousands of times). Functionally, these chunks have no effect on the program, but they make generating CFGs, and the reversing process in general, very hard.
There are solutions describing how to patch the instructions to nops so they can be ignored. But in my case, I would like to strip those instructions out completely, so that I get a better view of the CFG. If instructions are stripped out, I understand that the memory offsets must be modified too. As far as I could see, there were no tools available to achieve this directly.
I am using the IDA Pro evaluation version, but I am open to solutions using other reverse engineering frameworks too. Scriptable solutions are preferable.
I went through a similar question, but the proposed solution is not applicable in my case.
I would like to completely strip off those instructions ... I understand that the memory offsets must be modified too ...
In general, this is practically impossible:
If the binary exports any dynamic symbols, you would have to update the .dynsym section (these are probably the offsets you are thinking of).
You would have to find every statically-assigned function pointer, and update it with the new address, but there is no effective way to find such pointers.
Computed GOTOs and switch statements create function pointer tables even when none are present in the program source (see the sketch after this list).
As Peter Cordes pointed out, it's possible to write programs that use the delta between two assembly labels (small immediate values encoded directly into instructions) to control program flow.
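To see concretely how address tables arise without any source-level function pointers, here is a small C sketch using GCC's labels-as-values extension (a dense switch typically compiles to a similar jump table); the opcode handling is made up for illustration:

/* The compiler emits a table of raw code addresses for this dispatch;
   stripping or moving instructions without rewriting that table (and
   every encoded delta) silently breaks the control flow. */
void dispatch(int op) {
    static void *table[] = { &&op_push, &&op_pop, &&op_done };
    goto *table[op];                   /* indirect jump through the table */
op_push: /* ... handle push ... */ goto *table[2];
op_pop:  /* ... handle pop ... */  goto *table[2];
op_done: return;
}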
It's possible that your target program is free from all of the above complications, but spending much effort on a technique that only works for that one program seems wasteful.
In languages with automatic garbage collection like Haskell or Go, how can the garbage collector find out which values stored on the stack are pointers to memory and which are just numbers? If the garbage collector just scans the stack and assumes all addresses to be references to objects, a lot of objects might get incorrectly marked as reachable.
Obviously, one could add a value to the top of each stack frame that described how many of the next values are pointers, but wouldn't that cost a lot of performance?
How is it done in reality?
Some collectors assume everything on the stack is a potential pointer (like Boehm GC). This turns out to be not as bad as one might expect, but is clearly suboptimal. More often in managed languages, some extra tagging information is left with the stack to help the collector figure out where the pointers are.
Remember that in most compiled languages, the layout of a stack frame is the same every time you enter a function, therefore it is not that hard to ensure that you tag your data in the right way.
The "bitmap" approach is one way of doing this. Each bit of the bitmap corresponds to one word on the stack. If the bit is a 1 then the location on the stack is a pointer, and if it is a 0 then the location is just a number from the point of view of the collector (or something along those lines). The exceptionally well written GHC runtime and calling conventions use a one word layout for most functions, such that a few bits communicate the size of the stack frame, with the rest serving as the bitmap. Larger stack frames need a multi word structure, but the idea is the same.
The point is that the overhead is low, since the layout information is computed at compile time, and then included in the stack every time a function is called.
An even simpler approach is "pointer first", where all the pointers are located at the beginning of the stack. You only need to include a length prior to the pointers, or a special "end" word after them, to tell which words are pointers given this layout.
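A minimal C sketch of both schemes, with the collector's mark() hook and the frame layout assumed for illustration (real runtimes such as GHC pack the frame size and bitmap into a single word, as described above):

#include <stddef.h>
#include <stdint.h>

extern void mark(void *obj);   /* collector's marking entry point (assumed) */

/* Bitmap scheme: bit i set means slot i of the frame holds a pointer. */
void scan_frame_bitmap(uintptr_t *frame, size_t nwords, uintptr_t bitmap) {
    for (size_t i = 0; i < nwords; i++)
        if (bitmap & ((uintptr_t)1 << i))
            mark((void *)frame[i]);   /* pointer slot: trace it */
        /* a clear bit is just a number; the collector skips it */
}

/* Pointers-first scheme: the frame begins with a count of pointer slots. */
void scan_frame_pointers_first(uintptr_t *frame) {
    size_t nptrs = (size_t)frame[0];
    for (size_t i = 0; i < nptrs; i++)
        mark((void *)frame[1 + i]);
}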
Interestingly, trying to get this management information onto the stack produces a host of problems related to interop with C. For example, it is suboptimal to compile high-level languages to C, since even though C is portable, it is hard to carry this kind of information through. Optimizing compilers designed for C-like languages (GCC, LLVM) may restructure the stack frame, producing problems, so the GHC LLVM backend uses its own "stack" rather than the LLVM stack, which costs it some optimizations. Similarly, the boundary between C code and "managed" code needs to be constructed carefully to keep from confusing the GC.
For this reason, when you create a new thread on the JVM you actually create two stacks (one for Java, one for C).
The Haskell stack uses a single word of memory in each stack frame describing (with a bitmap) which of the values in that stack frame are pointers and which are not. For details, see the "Layout of the stack" article and the "Bitmap layout" article from the GHC Commentary.
To be fair, a single word of memory really isn't much cost, all things considered. You can think of it as just adding a single variable to each method; that's not all that bad.
There exist GCs that assume that every bit pattern that is the address of something the GC is managing is in fact a pointer (and so don't release that something). This can actually work pretty well, because valid pointers are usually bigger than common small integers, and usually have to be aligned. But yes, this can cause the collection of some objects to be delayed. The Boehm collector for C works this way, because it's library-based and so doesn't get any specific help from the compiler.
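A sketch of such a conservative scan, with the heap bounds and the marking routine assumed for illustration:

#include <stddef.h>
#include <stdint.h>

extern uintptr_t heap_lo, heap_hi;            /* bounds of the managed heap (assumed) */
extern void mark_conservatively(uintptr_t w); /* pins the candidate; never moves it */

void scan_stack_conservatively(uintptr_t *lo, uintptr_t *hi) {
    for (uintptr_t *p = lo; p < hi; p++) {
        uintptr_t w = *p;
        /* Treat any aligned value inside the heap as a potential pointer;
           an integer that happens to match keeps an object alive. */
        if (w >= heap_lo && w < heap_hi && w % sizeof(void *) == 0)
            mark_conservatively(w);
    }
}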
There are also GCs that are more tightly coupled to the language they're used in, and actually know the structure of the objects in memory. I've never read up specifically on stack frame handling, but you could record information to help the GC if the compiler and GC are designed to work together. One trick would be putting all the pointer references together and using one word per stack frame to record how many there are, which is not such a huge overhead. If you can work out which function corresponds to each stack frame without adding a word saying so, then you could have a per-function "stack frame layout map" compiled in. Another option would be tagged words, where you set the low-order bit of words that are not pointers to 1, which (due to address alignment) is never needed for pointers, so you can tell them apart. That means you have to shift unboxed values in order to use them, though.
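The tagging trick in miniature (illustrative, not any particular runtime): with word-aligned allocation, the low bit of a real pointer is always 0, so it is free to flag unboxed integers:

#include <stdint.h>

/* An unboxed integer n is stored as (n << 1) | 1; aligned pointers keep a 0
   low bit, so the collector can classify a word with a single test.
   (Assumes the platform's right shift on signed values is arithmetic.) */
static inline int       is_pointer(uintptr_t w) { return (w & 1) == 0; }
static inline uintptr_t tag_int(intptr_t n)     { return ((uintptr_t)n << 1) | 1; }
static inline intptr_t  untag_int(uintptr_t w)  { return (intptr_t)w >> 1; }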
It's important to realize that GHC maintains its own stack and does not use the C stack (other than for FFI calls). There's no portable way to access all of the contents of the C stack (for instance, on SPARC some of it is hidden away in register windows), so GHC maintains a stack where it has full control. Once you maintain your own stack, you can pick any scheme to distinguish pointers from non-pointers on the stack (like using a bitmap).
When studying Java I learned that Strings are not safe for storing passwords, since you can't manually clear the memory associated with them (you can't be sure they will eventually be gc'ed, interned strings may never be, and even after gc you can't be sure the physical memory contents were really wiped). Instead, I was told to use char arrays, which I can zero out after use. I've tried to search for similar practices in other languages and platforms, but so far I couldn't find the relevant info (usually all I see are code examples of passwords stored in strings with no mention of any security issue).
I'm particularly interested in the situation with browsers. I use jQuery a lot, and my usual approach is just to set the value of a password field to an empty string and forget about it:
$(myPasswordField).val("");
But I'm not 100% convinced it is enough. I also have no idea whether or not the strings used for intermediate access are safe (for instance, when I use $.ajax to send the password to the server). As for other languages, usually I see no mention of this issue (another language I'm interested in particular is Python).
I know questions attempting to build lists are controversial, but since this deals with a common security issue that is largely overlooked, IMHO it's worth it. If I'm mistaken, I'd be happy to know just from JavaScript (in browsers) and Python then. I was also unsure whether to ask here, at security.SE or at programmers.SE, but since it involves the actual code to safely perform the task (not a conceptual question) I believe this site is the best option.
Note: in low-level languages, or languages that unambiguously support characters as primitive types, the answer should be obvious (Edit: not really obvious, as #Gabe showed in his answer below). I'm asking about those high-level languages in which "everything is an object" or something like that, and also about those that perform automatic string interning behind the scenes (so you may create a security hole without realizing it, even if you're reasonably careful).
Update: according to an answer in a related question, even using char[] in Java is not guaranteed to be bulletproof (or .NET SecureString, for that matter), since the GC might move the array around, so copies of its contents might linger in memory even after clearing (SecureString at least stays at the same RAM address, guaranteeing clearing, but its consumers/producers might still leave traces).
I guess #NiklasB. is right: even though the vulnerability exists, the likelihood of an exploit is low and the difficulty of preventing it is high, which might be the reason this issue is mostly ignored. I wish I could find at least some reference to this problem concerning browsers, but googling for it has been fruitless so far (does this scenario at least have a name?).
The .NET solution to this is SecureString.
A SecureString object is similar to a String object in that it has a text value. However, the value of a SecureString object is automatically encrypted, can be modified until your application marks it as read-only, and can be deleted from computer memory by either your application or the .NET Framework garbage collector.
Note that even for low-level languages like C, the answer isn't as obvious as it seems. Modern compilers can determine that you are writing to the string (zeroing it out) but never reading the values you wrote, and just optimize the zeroing away. In order to prevent optimizing away the security, Windows provides SecureZeroMemory.
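Here is a minimal sketch of the underlying idea in portable C, assuming SecureZeroMemory (or C11's optional memset_s) isn't available; the volatile qualifier is what prevents the compiler from discarding the zeroing as a dead store:

#include <stddef.h>

/* A plain memset(buf, 0, len) may be eliminated if buf is never read again;
   writes through a volatile-qualified pointer must actually be performed. */
void wipe(volatile char *buf, size_t len) {
    while (len--)
        *buf++ = 0;
}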
For Python, there's no way to do that, according to this answer. A possibility would be using lists of characters (as length-1 strings or maybe code units as integers) instead of strings, so you can overwrite that list after use, but that would require every piece of code that touches it to support this format (if even a single one of them creates a string from its contents, it's over).
There is also a mention of a method using ctypes, but the link is broken, so I'm unaware of its contents. This other answer also refers to it, but there's not a lot of detail.
I've heard the theory. Address Space Layout Randomization takes libraries and loads them at randomized locations in the virtual address space, so that in case a hacker finds a hole in your program, he doesn't have a pre-known address to execute a return-to-libc attack against, for example. But after thinking about it for a few seconds, it doesn't make any sense as a defensive measure.
Let's say that our hypothetical TargetLib (libc or anything else the hacker is looking for) is loaded at a randomized address instead of a deterministic one. Now the hacker doesn't know ahead of time where TargetLib and the routines inside it are, but neither does the application code. It needs to have some sort of lookup table somewhere in the binary in order to find the routines inside of TargetLib, and that has to be at a deterministic location. (Or at a random location, pointed to by something else. You can add as many indirections as you want, but eventually you have to start at a known location.)
This means that instead of pointing his attack code at the known location of TargetLib, all the hacker needs to do is point his attack code at the application's lookup table's entry for TargetLib and dereference the pointer to the target routine, and the attack proceeds unimpeded.
Is there something about the way ASLR works that I don't understand? Because as described, I don't see how it's anything more than a speed bump, providing the image of security but no actual substance. Am I missing something?
I believe that this is effective because it changes the base address of the shared library. Recall that imported functions from a shared library are patched into your executable image when it is loaded, and therefore there is no table per se, just specific addresses pointing at data and code scattered throughout the program's code.
It raises the bar for an effective attack because it makes a simple buffer overrun (where the return address on the stack can be set) into one where the overrun must contain the code to determine the correct location and then jmp to it. Presumably this just makes it harder.
Virtually all DLLs in Windows are compiled for a base address that they will likely not be loaded at, and so will be relocated anyway, but the core Windows ones have their base addresses optimized so that relocation is not needed.
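You can watch the randomization with a trivial C program, assuming an ASLR-enabled OS and a position-independent build; the printed addresses change from run to run:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int local;
    /* Casting a function pointer to void * for %p is a common POSIX-ism. */
    printf("system() at %p, a stack slot at %p\n", (void *)&system, (void *)&local);
    return 0;
}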
I don't know if I've got your question right, but I'll explain when ASLR is effective and when it is not.
Let's say that we have app.exe and TargetLib.dll.
app.exe uses (is linked against) TargetLib.dll.
To keep the explanation simple, let's assume that the virtual address space contains only these two modules.
If both are ASLR-enabled, app.exe's base address is unknown. It may resolve some function call addresses when it is loaded, but an attacker knows neither where the functions are nor where the resolved variables are. The same thing happens when TargetLib.dll is loaded.
Even though app.exe has a lookup table, an attacker does not know where the table is.
Since an attacker cannot tell what the contents of any specific address are, he must attack the application without using any fixed address information. That is usually harder with the usual attack methods, like stack overflow, heap overflow, use-after-free...
On the other hand, if app.exe is NOT ASLR-enabled, it is much easier for an attacker to exploit the application, because there may be a function call to an interesting API at a specific address in app.exe, and the attacker can use that address as a target to jump to. (Attacking an application usually starts from jumping to an arbitrary address.)
A supplementary note:
You may already understand this, but I want to make one thing clear.
When an attacker exploits an application through a vulnerability like memory corruption, he is usually forced to use fixed-address jump instructions; relative-address jumps cannot be used for the exploit. This is the reason why ASLR is really effective against such exploits.