Removing assembly instructions from ELF - Linux

I am working on an obfuscated binary as part of a crackme challenge. It contains sequences of push, pop, and nop instructions that repeat thousands of times. Functionally, these chunks have no effect on the program, but they make CFG generation and the whole reversing process very hard.
There are known approaches for patching such instructions to nops so that I can ignore them, but in my case I would like to strip those instructions out completely, so that I get a cleaner view of the CFG. If instructions are stripped out, I understand that the memory offsets must be adjusted too. As far as I can tell, there are no tools available to do this directly.
I am using the IDA Pro evaluation version, but I am open to solutions using other reverse engineering frameworks too, preferably scriptable ones.
I went through a similar question, but the proposed solution is not applicable in my case.

I would like to completely strip off those instructions ... I understand that the memory offsets must be modified too ...
In general, this is practically impossible:
If the binary exports any dynamic symbols, you would have to update the .dynsym (these are probably the offsets you are thinking of).
You would have to find every statically-assigned function pointer, and update it with the new address, but there is no effective way to find such pointers.
Computed GOTOs and switch statements create tables of code addresses (jump tables) even when no function pointers appear in the program source.
As Peter Cordes pointed out, it's possible to write programs that take the delta between two assembly labels and use such deltas (small immediate values encoded directly into instructions) to control program flow; see the sketch below.
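For illustration, here is a minimal sketch of the computed-goto case (a hypothetical toy dispatcher, using GCC's labels-as-values extension). The label addresses live in a data table, so a tool that deletes instructions and shifts the code leaves the table pointing into the wrong places:

    /* Toy dispatcher using GCC's labels-as-values extension (gcc/clang).
     * The addresses of op_add/op_sub/op_done are stored as data; stripping
     * instructions moves the labels without rewriting this table, so every
     * goto then lands in the wrong place. GCC likewise allows label deltas
     * such as &&op_sub - &&op_add, which can end up encoded as immediates
     * inside instructions. */
    #include <stdio.h>

    static int run(int op)
    {
        static void *table[] = { &&op_add, &&op_sub, &&op_done };
        int acc = 10;
        goto *table[op];              /* indirect jump through a data table */
    op_add:
        acc += 1; goto *table[2];
    op_sub:
        acc -= 1; goto *table[2];
    op_done:
        return acc;
    }

    int main(void)
    {
        printf("%d %d\n", run(0), run(1));    /* prints: 11 9 */
        return 0;
    }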
It's possible that your target program is free from all of the above complications, but spending much effort on a technique that only works for that one program seems wasteful.

Related

What is wrong with self-modifying code under static-recompilation emulation?

I was reading about how to write an emulator and the techniques involved, but the following paragraph puzzled me. I couldn't figure out in which scenario self-modifying code actually breaks when it is emulated via static recompilation.
In this technique, you take a program written in the emulated code and attempt to translate it into the assembly code of your computer. The result will be a usual executable file which you can run on your computer without any special tools. While static recompilation sounds very nice, it is not always possible. For example, you cannot statically recompile self-modifying code as there is no way to tell what it will become without running it. To avoid such situations, you may try combining static recompiler with an interpreter or a dynamic recompiler.
Here is what I was reading, and this is the line that made me wonder:
For example, you cannot statically recompile self-modifying code as there is no way to tell what it will become without running it
A good explanation with examples would be very instructive. Thanks.
Edit: By the way, I know what self-modifying code means. I just wonder what problems arise, and where, after static recompilation: what exactly breaks our self-modifying code.
Self-modifying code heavily relies on the instruction set encoding of the original CPU. For example, it could flip some bits in a specific memory location to turn one instruction into another. With static recompilation, flipping those same bits will have an entirely different effect since the instructions are encoded completely differently for the host CPU.
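As a concrete (hypothetical) illustration: on x86, "je rel8" is opcode 0x74 and "jne rel8" is 0x75, so flipping the opcode's low bit inverts the branch. A guest program can exploit exactly this, and the trick evaporates once the code has been statically recompiled for a differently encoded host:

    /* Toy guest code buffer for an emulated x86. 0x74 is "je rel8" and
     * 0x75 is "jne rel8": they differ only in bit 0 of the opcode, so the
     * guest can invert its own branch by flipping one bit. A static
     * recompiler translated the ORIGINAL bytes ahead of time; on the host
     * CPU the branch lives at a different address with a different
     * encoding, so this write no longer has the intended effect. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t guest_code[] = { 0x74, 0x05 };   /* je +5 (guest encoding) */

        guest_code[0] ^= 0x01;                   /* self-modify: je -> jne */

        printf("guest opcode is now 0x%02x (%s)\n", guest_code[0],
               guest_code[0] == 0x75 ? "jne" : "je");
        return 0;
    }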

Is the rbp/ebp (x86-64) register still used in the conventional way?

I have been writing a small kernel lately, based on the x86-64 architecture. While taking care of some user-space code, I realized I am hardly using rbp at all. I then looked at some other things and found out that compilers are getting smarter these days and really don't use rbp anymore. (I could be wrong here.)
I was wondering whether the conventional use of rbp/ebp is no longer required in many instances, or whether I am wrong here. If that kind of usage is not required, can it be used like a general-purpose register?
Thanks
It is only needed if the frame size is not known at compile time, e.g. if you have variable-length arrays or alloca in your stack frames (recording the dynamic size some other way would require more memory and more computation). It is no longer needed for unwinding, because there is now metadata for that (DWARF call-frame information).
It is still useful if you are hand-writing entire assembly functions, but who does that? Assembly should only be used as glue to jump into a C (or whatever) function.
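A quick way to see this (a hedged sketch; compare the two functions with gcc -O2 -S, where omitting the frame pointer is typically the default): with a fixed-size frame the compiler addresses locals relative to rsp and rbp stays free as a general-purpose register, while a VLA forces it back into service as a frame anchor:

    #include <stddef.h>

    long fixed_frame(long x)
    {
        long buf[16];      /* frame size known at compile time            */
        buf[0] = x;        /* typically rsp-relative; rbp stays free      */
        return buf[0];
    }

    long dynamic_frame(size_t n, long x)
    {
        long buf[n];       /* VLA: frame size known only at runtime       */
        buf[0] = x;        /* compiler keeps rbp to anchor the locals     */
        return buf[0];
    }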

Copying garbage collection of generated code

Copying (generational) garbage collection offers the best performance of any form of automatic memory management, but it requires that pointers to relocated chunks of data be fixed up. This is enabled, in languages which support this memory management technique, by disallowing pointer arithmetic and making sure all pointers point to the beginning of identifiable objects.
If you're generating code at run time with a JIT compiler, things look a bit trickier, because return addresses on the call stack point not to the beginning of code blocks but to locations within them, so fixing them up is a problem.
How is this typically solved?
Quite often, you don't relocate code. This is both because it is indeed complicated to fix the stack and other addresses (think jumps across code fragments), and because you don't actually need garbage collection for such code (as it is only manipulated by code you write anyway, so you can do manual memory management). You also don't expect to create a whole lot of machine code (compared to application objects), so fragmentation etc. is not a concern.
If you insist on moving machine code and fixing up the stack, there is a way, I think. Similar to mark-compact, build a "break table" (I have no idea where this name comes from; "relocation table" might be clearer) that tells you the amount by which pointers into moved objects must be adjusted. Then walk the stack for return addresses (highly platform-specific, of course) and fix the ones that refer to relocated code. Instead of looking for exact matches, search for the highest block start address that is not greater than the return address you're currently examining. You can check that this address indeed refers to machine code that moved by looking at the object's size (you have a pointer to the start of the object, after all). This approach isn't feasible for all kinds of objects, for various reasons.
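A minimal sketch of that break-table fixup, with hypothetical names and the platform-specific stack walk abstracted into an array of return-address slots:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uintptr_t old_start;   /* block start before the move */
        size_t    size;        /* block size in bytes         */
        ptrdiff_t delta;       /* new_start - old_start       */
    } BreakEntry;

    /* Binary search for the greatest old_start <= addr, or NULL.
     * The table must be sorted by old_start. */
    static const BreakEntry *find_block(const BreakEntry *t, size_t n,
                                        uintptr_t addr)
    {
        const BreakEntry *best = NULL;
        size_t lo = 0, hi = n;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (t[mid].old_start <= addr) { best = &t[mid]; lo = mid + 1; }
            else                          { hi = mid; }
        }
        return best;
    }

    /* Patch each return-address slot whose value falls inside a moved block. */
    void fix_return_addresses(uintptr_t **slots, size_t nslots,
                              const BreakEntry *table, size_t nentries)
    {
        for (size_t i = 0; i < nslots; i++) {
            uintptr_t ra = *slots[i];
            const BreakEntry *e = find_block(table, nentries, ra);
            if (e && ra - e->old_start < e->size)   /* inside a moved block? */
                *slots[i] = ra + e->delta;
        }
    }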
There are other reasons to do something similar, though. Some JIT compilers feature on-stack replacement, which means creating a new version (e.g. more optimized, or less optimized) of some machine code and replacing all occurrences of the old version with it. That is far more complicated than just fixing return addresses: you have to ensure the new version logically continues where the old one left off. I am not familiar with how this is implemented, so I will not go into detail.

How to modify an ELF file in a way that changes data length of parts of the file?

I'm trying to modify the executable contents of my own ELF files to see if this is possible. I have written a program that reads and parses ELF files, searches for the code that it should update, changes it, then writes it back after updating the sh_size field in the section header.
However, this doesn't work. If I simply exchange some bytes with other bytes, it works. However, if I change the size, it fails. I'm aware that some sh_offset values are immediately adjacent to each other; however, this shouldn't matter when I'm reducing the size of the executable code.
Of course, there might be a bug in my program (or more than one), but I've already painstakingly gone through it.
Instead of asking for help with debugging my program, I'm just wondering: is there anything other than the sh_size field I need to update in order to make this work (when reducing the size)? Is there anything else that would make changing the length fail?
Edit:
It seems that Andy Ross was perfectly correct. Even in this very simple program I have come across some indirect addressing in __libc_start_main that I cannot trivially modify to update the offset it will reach.
I was curious, though: what would be the best approach to get as far as possible with this problem? I know I cannot solve it in every case, but for some simple programs it should be possible to update whatever is required to make them run. Should I try writing my own virtual machine, or develop a "debugger" that replaces each suspected problem instruction with INT 3? Any ideas?
The text segment is likely internally linked with relative offsets. One function might be trying to jump to, say, "current address plus 194 bytes". If you move things around such that the target is now only 190 bytes away, you will obviously break things. The same is true of constant data on some architectures (e.g. x86-64, with its RIP-relative addressing, but not i686). There is no simple way short of a complete disassembly to know where the internal references are, and in fact it's computationally undecidable to find them all (figuring out every possible target of a runtime-computed branch is equivalent to the halting problem).
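To make the relative-offset problem concrete, here is a small sketch (with made-up addresses) decoding a genuine x86-64 jmp rel32. The target is computed from the instruction's own location, so deleting bytes between the jump and its target retargets the jump without changing a single byte of it:

    /* Decode an x86-64 "jmp rel32" (opcode 0xE9). The target is
     * address-of-next-instruction + rel32, i.e. it depends on where the
     * jump itself sits. Addresses below are purely illustrative. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        /* jmp +194: E9 C2 00 00 00 (little-endian rel32 = 0xC2 = 194) */
        uint8_t insn[5] = { 0xE9, 0xC2, 0x00, 0x00, 0x00 };

        int32_t rel;
        memcpy(&rel, &insn[1], sizeof rel);        /* decode displacement  */

        uintptr_t insn_addr = 0x1000;              /* pretend load address */
        uintptr_t target    = insn_addr + 5 + rel; /* = 0x10c7             */

        printf("jump target = 0x%lx\n", (unsigned long)target);

        /* Delete 4 bytes between the jump and its target, and the intended
         * code now starts at 0x10c3, but the unmodified rel32 still sends
         * execution to 0x10c7: into the middle of the wrong instructions. */
        return 0;
    }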
Basically, this isn't solvable in the general case, so if you have an ELF binary from someone else that you're trying to patch, you'll need to try other techniques. But with (great!) care it's possible to produce a library in which all internal references go through the GOT/PLT, and such a library can be sliced up and relinked like this. What are you trying to accomplish?
is there anything other than the sh_size field I need to update in order to make this work
It sounds like you are patching a fully-linked binary (ET_EXEC or ET_DYN). Please note that .sh_size is not used for anything after the static link is done. You can strip the entire section table, and the binary will continue to work fine. What matters at runtime are the segments in the ELF, not sections.
ELF stands for Executable and Linking Format, and "executable" and "linking" form the dual nature of ELF: sections are used at (static) link time, where they are combined into segments; segments are what is used at execution time (a.k.a. runtime, a.k.a. dynamic linking time).
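A minimal sketch of looking at the part that actually matters at runtime (assuming a 64-bit, native-endian ELF; error handling kept to the basics): the program headers describing the segments the loader maps:

    /* List the program headers of a 64-bit ELF: the segments that the
     * loader actually maps. Parsing the section table is analogous but,
     * as noted above, irrelevant at runtime. */
    #include <elf.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }

        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        Elf64_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1 ||
            memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) {
            fprintf(stderr, "not an ELF file\n");
            return 1;
        }

        for (int i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            fseek(f, (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize), SEEK_SET);
            if (fread(&ph, sizeof ph, 1, f) != 1) break;
            printf("segment %2d: type=%u offset=0x%llx vaddr=0x%llx filesz=0x%llx\n",
                   i, ph.p_type,
                   (unsigned long long)ph.p_offset,
                   (unsigned long long)ph.p_vaddr,
                   (unsigned long long)ph.p_filesz);
        }
        fclose(f);
        return 0;
    }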
Of course, you haven't told us exactly what your patching strategy is when you shrink the binary, or in what way the result is broken, but it is very likely that Andy Ross's answer describes the real cause of your breakage.

Verifying two different build architectures (one a re-write of the other) are functionally equivalent

I'm re-writing a build that produces a number of things (shared/static libraries, jars, executables, etc). The question came up whether there's a way to verify that the results are functionally equivalent without doing a full top-to-bottom test of the resulting software.
However, that is proving to be more difficult to do than I anticipated.
As an example, I expected that two objects produced from the same source (Sun Studio C++ compiler) with the same command-line parameters would have the same MD5 hash, but that isn't the case. I can build the file, rename it, build again, and the two copies have different hashes.
With that said ... is there a way to do a quick check to verify that two files produced from separate build architectures of the same source tree (e.g., two shared objects) are functionally equivalent?
Edit: I am sorry, I neglected to mention this is for a debug build. When debugging flags aren't used, the binaries are identical; but they've been building with debugging flags by default for so many years that their stuff breaks when you remove those flags. (Part of the reason I'm re-writing the build is to take that particular 'feature' out so we can get some proper testing going.)
Windows DLLs have a link timestamp (TimeDateStamp) as part of the PE image.
Looking at linker options, I don't see an option to suppress that. So re-linking a DLL (or an EXE) will always produce a different binary.
You could write a tool to zero out these timestamps (TimeDateStamp sits at a fixed offset within the COFF file header, which the e_lfanew field at file offset 0x3C points to) and compare MD5s afterwards. But you'll likely discover lots of other differences as well. In particular, any program that uses the __DATE__ or __TIME__ built-ins will give you trouble.
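A sketch of such a tool (assuming a little-endian host, since PE fields are little-endian; no validation beyond the basics):

    /* Zero the TimeDateStamp of a PE file in place. The COFF file header
     * follows the 4-byte "PE\0\0" signature located via e_lfanew; within
     * it, TimeDateStamp comes after Machine(2) + NumberOfSections(2),
     * i.e. at e_lfanew + 8. */
    #include <stdio.h>
    #include <stdint.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s <pe-file>\n", argv[0]); return 1; }

        FILE *f = fopen(argv[1], "r+b");
        if (!f) { perror("fopen"); return 1; }

        uint32_t e_lfanew;
        fseek(f, 0x3C, SEEK_SET);                 /* DOS header: e_lfanew */
        if (fread(&e_lfanew, 4, 1, f) != 1) { perror("fread"); return 1; }

        const uint32_t zero = 0;
        fseek(f, e_lfanew + 8, SEEK_SET);         /* TimeDateStamp        */
        if (fwrite(&zero, 4, 1, f) != 1) { perror("fwrite"); return 1; }

        fclose(f);
        return 0;
    }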
We've had to work quite hard to achieve bit-identical rebuilds (using GNU toolchain). It's possible (at least for open-source tools, on Linux), but not easy (as you've discovered).
I forgot about this question; I'm revisiting so I can give the answer I came up with.
objcopy can be used to produce a new binary file in different formats. It's been a few years since I worked on this, so the specifics escape me, but here's what I recall:
objcopy can strip various things out (debug info, symbol information, etc), but even after stripping stuff out I was still seeing different hashes between objects.
In the end I found I could convert it from ELF to other formats. I ended up dumping it to another format (I think I chose SREC) that consistently provided the same MD5 for objects built at different times with identical source/flags.
I'm betting I could have done this a better way with objcopy (or perhaps another binutils tool), but it was good enough to satisfy our concerns.
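From memory, the pipeline looked roughly like this (the exact flags below are reconstructed, not taken from the original build):

    objcopy --strip-all input.o stripped.o    # drop symbols and debug info
    objcopy -O srec stripped.o output.srec    # re-emit as Motorola S-records
    md5sum output.srec                        # stable across identical rebuilds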
