What is wrong with self-modifying code under static-recompilation emulation?

I was reading about writing an emulator and the techniques involved, but the following paragraph left me wondering: I can't figure out what actually happens if a self-modifying program is emulated via static recompilation.
In this technique, you take a program written in the emulated code and attempt to translate it into the assembly code of your computer. The result will be a usual executable file which you can run on your computer without any special tools. While static recompilation sounds very nice, it is not always possible. For example, you cannot statically recompile self-modifying code as there is no way to tell what it will become without running it. To avoid such situations, you may try combining a static recompiler with an interpreter or a dynamic recompiler.
Here is what I was reading, and this is the line that made me wonder:
For example, you cannot statically recompile self-modifying code as there is no way to tell what it will become without running it
A good explanation with examples would be very instructive, thanks.
Edit: By the way, I know what self-modifying code means; I just wonder what problems we will run into after static recompilation, and exactly what will break our self-modifying code.

Self-modifying code heavily relies on the instruction set encoding of the original CPU. For example, it could flip some bits in a specific memory location to turn one instruction into another. With static recompilation, flipping those same bits will have an entirely different effect since the instructions are encoded completely differently for the host CPU.
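To make that concrete, here is a small hypothetical illustration (the encodings are real 32-bit x86, but the scenario is invented): 0x40 encodes inc eax and 0x48 encodes dec eax, so guest code can turn one into the other by flipping a single bit in its own code bytes.

# Assumed 32-bit x86 encodings: 0x40 = inc eax, 0x48 = dec eax.
code = bytearray([0x40])   # the guest program starts out as `inc eax`
code[0] ^= 0x08            # the guest flips one bit of itself: now `dec eax`
# A static recompiler translated the original 0x40 into a host instruction
# (say, an ARM add-immediate) long before this XOR ever runs. In the
# recompiled binary the same XOR either hits a data copy that is never
# executed, or mangles a host instruction whose encoding has nothing to do
# with the x86 inc/dec bit, so the intended inc-to-dec switch is lost.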

Related

Running a brief asm script inline for dynamic analysis

Is there any good reason not to run a brief unknown (30-line) assembly script inline in a usermode C program for dynamic analysis, directly on my laptop?
There's only one system call (to time), and at this point I can tell that it's a function that takes a C string and its length, and performs some sort of encryption on it in a loop, which only iterates over as many bytes of the string as the length argument specifies.
I know that the script is (supposed to be) from a piece of malicious code, but for the life of me I can't think of any way it could possibly pwn my computer, barring some sort of hardware bug (which seems unlikely given that the loop is ~7 instructions long and the strangest instruction in the whole script is a shr).
I know it sounds bad running an unknown piece of assembly code directly on the metal, but given my analysis up to this point I can't think of any way it could bite me or escape.
Yes, you can, but I wouldn't recommend it.
The problem is not how dangerous the code is this time (assuming you really understand all of the code and can predict the outcome of every system call); the problem is that it's a slippery slope, and it's not worth it considering what's at stake.
I've done quite a bit of malware analysis, and it rarely happened that a piece of code caught me off guard, but it did happen.
Luckily I was working on a virtual machine within an isolated network: I just restored the last snapshot and stepped through the code more carefully.
If you do this analysis on your real machine you may make a habit of it, and one day it will bite you.
Working with VMs, albeit not as comfortable as using your OS's native GUI, is the way to go.
What could go wrong with running a 7-line assembly snippet?
I don't know; it really depends on the code, but here are a few things to be careful about:
Exceptions. An instruction may intentionally fault to pass control to an exception handler. This is why it is very important that you totally understand the code: both the instructions and the data.
System call exploits. A specially crafted input to a system call may trigger a 0-day or an unpatched vulnerability in your system. This is why it is important that you can predict the outcome of every system call.
Anti-debugger techniques. There are a lot of ways a piece of code can escape a debugger (I'm thinking of Windows debugging here); it's hard to remember them all, so be suspicious of everything.
I've just named a few. It is even possible, catastrophically, that a hardware bug could lead to privileged code execution, but if that's really a possibility then nothing but a spare, sacrificial machine will do.
Finally, if you are going to run the malware (because I assume extracting the code and its context is too much of a burden) up to a breakpoint on your machine, think of what's at stake.
If you place the breakpoint in the wrong spot, if the malware takes another path, or if the debugger has a glitchy GUI, you may lose your data or the confidentiality of your machine.
In my opinion it is not worth it.
I had to make this premise for generality's sake, but we all sin a little, don't we?
I've never run a piece of malware on my machine, but I have stepped through some on a virtual machine connected directly to the company network.
It was a controlled move, nothing happened, the relevant personnel were advised, and it was a happy ending.
This may very well be your case: it can just be a decryption algorithm and nothing more.
However, only you have the final responsibility to judge if it is acceptable to run the piece of code or not.
As I remarked above, in general it is not a good idea and it presupposes that you really understand the code (something that is hard to do and be honest about).
If you think these prerequisites are all satisfied then go ahead and do it.
Before that I would:
Create an unprivileged user and deny it access to my data and common folders (ideally deny it everything except what is necessary to make the program work).
Back up the critical data, if any.
Optionally
Make a restore point.
Take a hash of the system folders, a list of installed services, and the values of the usual startup registry keys (Sysinternals has a tool to enumerate them all); a minimal hashing sketch follows after this list.
After the analysis, you can check that nothing important system-wide has changed.
It may be helpful to subst a folder (mapping it to a drive letter) and put the malware there so that a naive path traversal stops in that folder.
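For the folder-hashing step, a minimal sketch (Python, assuming it is available on the analysis machine; the folder below is only an example):

import hashlib
import os

def hash_tree(root):
    # One SHA-256 digest over every file path and file body under `root`.
    h = hashlib.sha256()
    for dirpath, _dirs, files in sorted(os.walk(root)):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            h.update(path.encode("utf-8", "replace"))
            try:
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
            except OSError:
                pass  # locked or unreadable file: skipped
    return h.hexdigest()

# Record this before the analysis and compare it afterwards.
print(hash_tree(r"C:\Windows\System32"))

A changed digest after the run means something under that tree was touched.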
Isn't there a better solution?
I like using VMs for their snapshotting capabilities, though you may stumble into an anti-VM check (but they are really dumb checks, so it's easy to skip them).
For a 7-line assembly snippet I'd simply rewrite it as a JS function and run it directly in a browser console.
You can simply turn each register into a variable and transcribe the code; you don't need to understand it globally, only locally (i.e., each instruction).
JS is handy, as long as you don't have to work with 64-bit quantities, because you have an interpreter right in front of you :)
Alternatively, I use whatever programming language I have at hand (once even assembly itself; it seems paradoxical, but due to a nasty trick I had to convert a 64-bit piece of code into a 32-bit one and patch the malware with it).
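The answer above suggests JS; here is the same register-to-variable idea sketched in Python on an invented loop (the real malware's instructions are not shown in the question, so this only illustrates the shape of such a transcription):

def transcribed(buf, key, length):
    # Each register becomes a plain variable; each line mirrors one instruction.
    esi, ecx, bl = 0, length, key & 0xFF
    while ecx != 0:
        al = buf[esi]        # mov al, [esi]
        al ^= bl             # xor al, bl
        bl >>= 1             # shr bl, 1
        buf[esi] = al        # mov [esi], al
        esi += 1             # inc esi
        ecx -= 1             # dec ecx ; jnz loop
    return buf

print(bytes(transcribed(bytearray(b"secret"), 0xA5, 6)))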
You can use Unicorn to easily emulate a CPU (if the architecture is supported) and play with your shellcode without any risk.
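A minimal Unicorn sketch (Python bindings, 32-bit x86 assumed; the two-byte "shellcode" and the base address are placeholders):

from unicorn import Uc, UC_ARCH_X86, UC_MODE_32
from unicorn.x86_const import UC_X86_REG_ECX

CODE = b"\x41\x4a"                       # inc ecx ; dec edx (placeholder bytes)
BASE = 0x1000000

mu = Uc(UC_ARCH_X86, UC_MODE_32)
mu.mem_map(BASE, 2 * 1024 * 1024)        # 2 MB for code and scratch data
mu.mem_write(BASE, CODE)
mu.reg_write(UC_X86_REG_ECX, 0x10)
mu.emu_start(BASE, BASE + len(CODE))     # emulate just these bytes
print(hex(mu.reg_read(UC_X86_REG_ECX)))  # 0x11

The emulated code runs in a sandboxed address space, so even a hostile snippet can only touch the memory you mapped for it.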

Removing assembly instructions from ELF

I am working on an obfuscated binary as part of a crackme challenge. It has a sequence of push, pop, and nop instructions (which repeats thousands of times). Functionally, these chunks have no effect on the program, but they make generating CFGs, and the process of reversing, very hard.
There are solutions on how to change the instructions to nop so that I can remove them. But in my case, I would like to completely strip off those instructions, so that I can get a better view of the CFG. If instructions are stripped off, I understand that the memory offsets must be modified too. As far as I could see, there were no tools available to achieve this directly.
I am using IDA Pro evaluation version. I am open to solutions using other reverse engineering frameworks too. It is preferable, if it is scriptable.
I went through a similar question but, the proposed solution is not applicable in my case.
I would like to completely strip off those instructions ... I understand that the memory offsets must be modified too ...
In general, this is practically impossible:
If the binary exports any dynamic symbols, you would have to update the .dynsym (these are probably the offsets you are thinking of).
You would have to find every statically-assigned function pointer, and update it with the new address, but there is no effective way to find such pointers.
Computed GOTOs and switch statements create function pointer tables even when none are present in the program source.
As Peter Cordes pointed out, it's possible to write programs that use the delta between two assembly labels, encoded as small immediate values directly inside instructions, to control program flow (a small illustration follows below).
It's possible that your target program is free from all of the above complications, but spending much effort on a technique that only works for that one program seems wasteful.
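To make the label-delta point concrete, here is a small hypothetical illustration (the bytes are invented for the example): a constant that happens to equal the distance between two labels looks exactly like any other constant, so a tool that deletes instructions cannot know it must also be rewritten.

code = bytes([
    0x6A, 0x0C,        # push 0x0C   <- 0x0C is really `end_label - start_label`
    0x90, 0x90, 0x90,  # nop padding we would like to delete
    # ... remaining bytes up to end_label ...
])
# After deleting the three NOPs the real label distance shrinks by 3, but the
# immediate still encodes 0x0C. Nothing marks it as a label delta rather than
# the ordinary number 12, so no rewriter can reliably fix it up.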

How to inspect Haskell bytecode

I am trying to figure out a bug (a serious performance regression). Unfortunately, I wasn't able to figure out why by going back through many different versions of my code.
I suspect it could be some modifications to libraries that I've updated; in the meanwhile I've also updated from GHC 7.4 to 7.6 (and if anybody knows whether some laziness behaviour has changed, I would greatly appreciate hearing about it!).
I have an older executable of this code that does not have this bug, so I wonder if there are any tools that can tell me which library versions I was linking against before, e.g. by recovering the symbols.
GHC creates executables, which are notoriously hard to understand... On my Linux box I can view the assembly code by typing in
objdump -d <executable filename>
but I get back over 100K lines of code from just a simple "Hello, World!" program written in Haskell.
If you happen to have the GHC .hi files, you can get some information about the executable by typing in
ghc --show-iface <hi filename>
This won't give you the assembly code, but you can get some extra information that may prove useful.
As I mentioned in the comment above, on Linux you can use "ldd" to see what C-system libraries you used in the compile, but that is also probably less than useful.
You can try to use a decompiler, but those are generally written to produce C, not anything higher level, and certainly not Haskell. That being said, GHC compiles to C as an intermediary (at least it used to; has that changed?), so you might be able to learn something.
Personally, I often find viewing system calls in action much more interesting than viewing pure assembly. On my Linux box, I can view all system calls by running strace (use Wireshark for the network-traffic equivalent):
strace <program executable>
This also will generate a lot of data, so it might only be useful if you know of some specific place where direct real world communication (i.e., changes to a file on the hard disk drive) goes wrong.
In all honesty, you are probably better off just debugging the problem from source, although, depending on the actual problem, some of these techniques may help you pinpoint something.
Most of these tools have Mac and Windows equivalents.
Since much has changed in the last 9 years, and apparently this is still the first result a search engine gives on this question (like for me, again), an updated answer is in order:
First of all, yes: while Haskell does not specify a bytecode format, bytecode is just a kind of machine code for a virtual machine, so for the rest of this answer I will treat them as the same thing. GHC's "Core", the LLVM intermediate language, or even WASM could be considered equivalent too.
Secondly, if your old binary is statically linked, then no matter what format your program is in, no symbols will be available to check out, because that is what linking does; this holds even with bytecode, and even with classic static #include in simple languages. So your old binary will be no good, no matter what. And given the optimisations compilers do, a classic decompiler will very likely never be able to figure out which optimised bits used to come from which libraries, especially with stream fusion and similar "magic".
Third, you can do the things you asked with a modern Haskell program, but you need to have your binaries compiled with -dynamic and -rdynamic, so that not only the C-calling-convention libraries (e.g. .so files) and the Haskell libraries, but also the runtime itself, are dynamically loaded. That way you end up with a very small binary consisting only of your actual code, dynamic-linking instructions, and the exact data about which libraries and runtime were used to build it. And since the runtime is compiler-dependent, you will know the compiler too. So it would give you everything you need, but only if you compiled it right in the first place. (I recommend using such dynamic linking by default in any case, as it saves memory.)
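For example (assuming GHC on Linux; MyProg and Main.hs are placeholder names, and the exact libraries listed will vary with your GHC and package versions):
ghc -dynamic -rdynamic -o MyProg Main.hs
ldd ./MyProg
The ldd output then names the precise Haskell package libraries and GHC runtime the binary was linked against.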
The last factor that one might forget, is that even the exact same compiler version might behave vastly differently, depending on what IT was compiled with. (E.g. if somebody put a backdoor in the very first version of GHC, and all GHCs after that were compiled with that first GHC, and nobody ever checked, then that backdoor could still be in the code today, with no traces in any source or libraries whatsoever. … Or for a less extreme case, that version of GHC your old binary was built with might have been compiled with different architecture options, leading to it putting more optimised instructions into the binaries it compiles for unless told to cross-compile.)
Finally, of course, you can profile even compiled binaries, by profiling their system calls. This will give you clues about which part of the code acted differently and how. (E.g. if you notice that your new binary floods the system with some slow system calls where the old one just used a single fast one. A classic OpenGL example would be using fast display lists versus slow direct calls to draw triangles. Or using a different sorting algorithm, or having switched to a different kind of data structure that fits your work load badly and thrashes a lot of memory.)
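For example, strace's -c flag prints a per-syscall summary of counts and time (old-program and new-program are placeholders); diffing the two summaries quickly shows which calls exploded in count or duration:
strace -c -o old-summary.txt ./old-program
strace -c -o new-summary.txt ./new-program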

Convert object file to another architecture

I am trying to use a Wi-Fi dongle with a Raspberry Pi. The vendor of the dongle provides a Linux driver that I can compile successfully for the ARM architecture; however, one object file that comes with the driver was precompiled for the x86 architecture, which causes the linker to fail.
I know it would be much easier to compile that (quite big) file again, but I don't have access to the source code.
Is it possible to convert that object file from a x86-architecture to an ARM-architecture?
Thank you!
Um, no, it looks to me like a waste of time. A Wi-Fi driver is complex, and you say this one troublesome object file is 'large'. It would be a lot of pain to translate, with a slim-to-none chance of successful debugging. Also, any parameter passing between this one object file and the rest of the system would not translate directly between x86 and ARM.
In theory, yes. Doing it on a real kernel driver without access to source code will be difficult.
If you had a high-quality disassembly of the object file, and the code in the object file is "well behaved" (using standard calling conventions, no self-modifying code), then you could automatically translate the x86 instructions into ARM instructions. However, you probably don't have a high-quality disassembly. In particular, there can be portions of the object file that you will not be able to properly classify as code or data using normal recursive-descent disassembly. If you misinterpret data as code, it will be translated to ARM code rather than copied as-is, and so will have the wrong values. That will likely cause the code to not work correctly.
Even if you get lucky, and can properly classify all of the addresses in the object file, there are several issues that will trip you up:
The calling conventions on x86 are different from the calling conventions on ARM. This means you will have to identify patterns related to x86 calling conventions and change them to use ARM calling conventions. This is a non-trivial rewrite.
The hardware interface on ARM is different from that on x86. You will have to understand how the driver works in order to translate the code. That would require either a substantial x86 hardware compatibility layer or reverse engineering of how the driver works. If you can reverse engineer the driver, then you don't need to translate it; you could just write an ARM version.
The internal kernel APIs are different between ARM and x86. You will have to understand those differences and how to translate between them. That's likely non-trivial.
The Linux kernel uses an "alternatives" mechanism, which rewrites machine code dynamically when code is first loaded into the kernel. For example, on uniprocessor machines, locks are often replaced with no-ops to improve performance. Instructions like "popcnt" are replaced with function calls on machines that don't support them, etc. Its use in the kernel is extremely common. This means there's a good chance the code in the object file is not "well behaved" according to the definition given above. You would have to either verify that the object file doesn't use that mechanism, or find a way to translate its uses.
x86 uses a different memory model than ARM does. To "safely" translate x86 code to ARM (without introducing race conditions) you would have to introduce memory fences after every memory access. That would result in REALLY BAD performance on an ARM chip. Figuring out when you need to introduce memory fences (without doing it everywhere) is an EXTREMELY hard problem. The most successful attempts at that sort of analysis require custom type systems, which you won't have for the object file.
Your best bet (quickest route to success) would be to try and reverse engineer what the object file in question does, and then just replace it.
There is no reasonable way of doing this. Contact the manufacturer and ask if they can provide the relevant code as ARM code, since x86 is useless to you. If they are not able to do that, you'll have to find a different supplier of either the hardware [one with an ARM version, or fully open source code, for all the components] or the software [assuming there is another source for it].
You could translate the x86 assembly manually by installing the x86 GNU binutils and disassembling the object file with objdump. Some addresses will probably differ, but it should be straightforward.
Yes, you could most definitely do a static binary translation. x86 disassembly is painful, though; if this was compiled from a high-level language then it isn't as bad as it could be.
Is it really worth the effort? You might try an instruction-set simulator instead. Have you analysed the number of instructions used, the system calls required, etc.?
How far have you gotten so far on the disassembly?
Maybe the file only contains a binary dump of the Wi-Fi firmware? If so, you need no instruction translation, and a conversion can be done using objcopy.
You can use objdump -x file.o to check whether any real executable code is inside the object file or whether it's only data.
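For example, if objdump -x shows that everything of substance lives in one data section (the section name below is only a guess), objcopy can extract it as a raw blob:
objdump -x file.o
objcopy -O binary -j .rodata file.o firmware.bin
That only helps in the firmware-blob case; if the object file contains real x86 code, objcopy cannot translate it.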
If you have access to IDA with the Hex-Rays decompiler, you can (with some work) decompile the object file into C code and then try to recompile it for ARM.

How to modify an ELF file in a way that changes data length of parts of the file?

I'm trying to modify the executable contents of my own ELF files to see if this is possible. I have written a program that reads and parses ELF files, searches for the code that it should update, changes it, then writes it back after updating the sh_size field in the section header.
However, this doesn't work. If I simply exchange some bytes for other bytes, it works; if I change the size, it fails. I'm aware that some sh_offsets are immediately adjacent to each other; however, this shouldn't matter when I'm reducing the size of the executable code.
Of course, there might be a bug in my program (or more than one), but I've already painstakingly gone through it.
Instead of asking for help with debugging my program, I'm just wondering: is there anything other than the sh_size field I need to update in order to make this work (when reducing the size)? Is there anything that would make changing the length fail other than that field?
Edit:
It seems that Andy Ross was perfectly correct. Even in this very simple program I have come across some indirect addressing in __libc_start_main that I cannot trivially modify to update the offset it will reach.
I am curious, though: what would be the best approach to still get as far as possible with this problem? I know I cannot solve it in every case, but for some simple programs it should be possible to update what is required to make them run. Should I try writing my own virtual machine, or try developing a "debugger" that replaces each suspected problem instruction with INT 3? Any ideas?
The text segment is likely internally linked with relative offsets. So one function might be trying to jump to, say, "current address plus 194 bytes". If you move things around such that the jump target is now 190 bytes, you will obviously break things. The same is true of constant data on some architectures (e.g. x86-64 but not i686). There is no simple way short of a complete disassembly to know where the internal references are, and in fact it's computationally undecidable to find them all (i.e. trying to figure out all possible jump targets of a runtime-computed branch is the Halting Problem).
Basically, this isn't solvable in the general case, so if you have an ELF binary from someone else you're trying to patch, you'll need to try other techniques. But with (great!) care it's possible to produce a library where all internal references go through the GOT/PLT which can be sliced up and relinked like this. What are you trying to accomplish?
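To make the relative-offset point concrete, a tiny hypothetical example (invented x86 bytes; EB is `jmp rel8`, and the following byte is the distance from the end of the jump to its target):

code = bytearray([
    0xEB, 0x04,              # jmp +4  -> skips the four bytes below
    0x90, 0x90, 0x90, 0x90,  # four bytes we decide to remove
    0xC3,                    # ret     <- the jump target
])
patched = code[:2] + code[6:]   # remove the four bytes, leave the jmp untouched
# `patched` still encodes `jmp +4`, but the ret now sits immediately after the
# jump, so the branch sails past the end of the code. Every displacement like
# this must be found and rewritten, and for computed branches that is not
# generally possible.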
is there anything else than the sh_size field I need to update in order to make this work
It sounds like you are patching a fully-linked binary (ET_EXEC or ET_DYN). Please note that sh_size is not used for anything after the static link is done. You can strip the entire section table, and the binary will continue to work fine. What matters at runtime are the segments in the ELF, not the sections.
ELF stands for Executable and Linking Format, and the executable and linking views are the "dual nature" of ELF: sections are used at (static) link time and are combined into segments, which are what is used at execution time (aka runtime, aka dynamic linking time).
Of course you haven't told us exactly what your patching strategy is when you are shrinking your binary, and in what way the result is broken. It is very likely that Andy Ross's answer is the real cause of your breakage.
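You can see the section/segment split for yourself; here is a short sketch using the third-party pyelftools package (a.out is a placeholder path):

from elftools.elf.elffile import ELFFile

with open("a.out", "rb") as f:
    elf = ELFFile(f)
    # Program headers (segments): what the loader actually maps at runtime.
    for seg in elf.iter_segments():
        print("segment", seg["p_type"], hex(seg["p_vaddr"]), seg["p_filesz"])
    # Section headers: static-link-time bookkeeping; sh_size lives here and can
    # be wrong, or the whole table stripped, without affecting execution.
    for sec in elf.iter_sections():
        print("section", sec.name, sec["sh_size"])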

Resources