How to add/remove x86 instructions in Linux executables without spoiling the alignment

I'm new to binaries and assembly, and I'm curious about how to directly edit binary executables. I tried to remove an instruction from a binary file (according to the disassembled instructions provided by objdump), but after doing that the "executable" no longer seems to be in an executable format (segmentation fault when running; gdb cannot recognize it). I heard that this is due to an instruction alignment issue. (Is it?)
So, is it possible to add/remove single x86 instructions directly in Linux executables? If so, how? Thanks in advance.

If you remove a chunk of binary file without adjusting file headers accordingly, it will become invalid.
Fortunately, you can replace instructions with NOP without actually removing them. File size remains the same, and if there is no checksum or signature (or if it's not actually checked), there is nothing more to do.
There is no universal way to insert instructions, but generally you overwrite the original code with a JMP to another location, where you reproduce what the original code did, do your own things as you wanted, then JMP back. Finding room for your new code might be impossible without changing the size of the binary, so I would instead patch the code after the executable is loaded (perhaps using a special LD_PRELOADed library).
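The JMP in such a detour is usually the 5-byte "jmp rel32" form (opcode E9), whose displacement is counted from the end of the jmp itself, i.e. target - (source + 5). A minimal sketch of computing those bytes, assuming the source and target addresses are within +/-2 GiB of each other; encode_jmp_rel32 is a hypothetical helper name, not part of any tool:

```c
#include <stdint.h>

/* Hypothetical helper: writes a 5-byte x86 "jmp rel32" (E9 + 4-byte
 * displacement) into out[0..4]. src is the address where the jmp itself
 * will be placed, dst is the jump target.
 * Returns 0 on success, -1 if dst is out of rel32 range. */
int encode_jmp_rel32(uint64_t src, uint64_t dst, uint8_t out[5]) {
    /* The displacement is relative to the end of the 5-byte instruction. */
    int64_t disp = (int64_t)(dst - (src + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;
    uint32_t rel = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;
    /* x86 is little-endian: least significant byte first. */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(rel >> (8 * i));
    return 0;
}
```

To apply it, you would write these 5 bytes over the original instruction(s) at src, fill any leftover bytes of a partially overwritten instruction with 0x90, and make sure the code at dst re-executes the overwritten instructions before jumping back.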

Yes. Just replace it with a NOP instruction (0x90) - or multiple ones if the instruction spans across multiple bytes. This is an old trick.
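A minimal sketch of that trick in C, assuming you already know the instruction's file offset (objdump prints virtual addresses, so for an ELF file you first convert: file offset = address - segment address + segment file offset) and its length in bytes. nop_out is a hypothetical helper name:

```c
#include <stdio.h>

/* Overwrites len bytes at file offset `offset` in the file at `path`
 * with 0x90 (single-byte NOP). The file size does not change.
 * Returns 0 on success, -1 on any I/O error. */
int nop_out(const char *path, long offset, size_t len) {
    FILE *f = fopen(path, "r+b");   /* read/write, no truncation */
    if (!f)
        return -1;
    int rc = 0;
    if (fseek(f, offset, SEEK_SET) != 0) {
        rc = -1;
    } else {
        for (size_t i = 0; i < len; i++) {
            if (fputc(0x90, f) == EOF) {
                rc = -1;
                break;
            }
        }
    }
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Because every byte of the removed instruction becomes its own NOP, the following instructions stay at the same addresses, so no jump offsets or headers need fixing.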

Related

Can we convert an ELF from one CPU architecture to another, in Linux?

How can I run x86 binaries (for example an .exe file) on ARM? As I see on Wikipedia, I need to convert binary data for the emulated platform into binary data suitable for execution on the targeted platform. But the question is: how can I do it? Do I need to open the file in a hex editor and change it? Or something else?
To successfully do this, you'd have to do two things, one relatively easy and one very hard, neither of which you want to do by hand in a hex editor.
Convert the machine code from x86 to ARM. This is the easy one, because you should be able to map each x86 opcode to one or more ARM opcodes. There are different ways to do this, some more efficient than others, but it can be done with a pretty straightforward mapping.
Remap function calls (and other jumps). This one is hard, because monkeying with the opcodes is going to change all the offsets for the jump and return points. If you have dynamically linked libraries (.so), and we assume that all the libraries are available at exactly the same version in both places (a sketchy assumption at best), you'd have to remap the loads.
It's essentially a machine->machine compiler and linker.
So, can you do it? Sure.
Is it easy? No.
There may be a commercial tool out there, but I'm not aware of it.
You cannot do this with a binary (see Note 1); here, "binary" means an object with no symbol information, as opposed to an ELF file. Even with an ELF file, this is difficult to impossible. The issue is distinguishing code from data. If you resolve this issue, then you can make decompilers and other tools.
Even if you have an ELF file, a compiler will insert constants used in the code into the text segment. You have to look at many opcodes and do a reverse basic-block analysis to figure out where a function starts and ends.
A better mechanism is to emulate the x86 on the ARM. Here, you can use JIT technology to do the translation as code is encountered, but you approximately double the code space. Also, the code will execute horribly. The ARM has 16 registers and the x86 is register-starved (usually it has hidden registers). A compiler's big job is to allocate these registers. QEMU is one technology that does this. I am unsure if it goes in the x86-to-ARM direction, and it will have a tough job, as noted.
Note 1: The x86 has asymmetric (variable-length) opcode sizing. In order to recognize a function prologue and epilogue, you would have to scan an image multiple times. To do this, I think the problem would be something like O(n!), where n is the number of bytes in the image, and then you might have trouble with inline assembler and library routines coded in assembler. It may be possible, but it is extremely hard.
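At its core, emulation is a fetch/decode/execute loop run in software (a JIT such as QEMU's translates whole blocks instead of interpreting one instruction at a time, but the idea is the same). A toy interpreter sketch that only understands four 32-bit x86 opcodes: NOP (90), MOV EAX, imm32 (B8), ADD EAX, imm32 (05) and RET (C3). The subset and the name run_toy_x86 are simplifying assumptions, not how any real emulator is structured:

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 32-bit immediate from p. */
static uint32_t read_imm32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interprets a tiny subset of 32-bit x86 machine code until RET and
 * stores the final EAX in *eax_out. Returns 0 on success, -1 on an
 * unsupported opcode, a truncated instruction, or running off the end. */
int run_toy_x86(const uint8_t *code, size_t len, uint32_t *eax_out) {
    uint32_t eax = 0;
    size_t ip = 0;                       /* plays the role of EIP */
    while (ip < len) {
        uint8_t op = code[ip];           /* fetch */
        switch (op) {                    /* decode + execute */
        case 0x90:                       /* nop */
            ip += 1;
            break;
        case 0xB8:                       /* mov eax, imm32 */
            if (len - ip < 5) return -1;
            eax = read_imm32(code + ip + 1);
            ip += 5;
            break;
        case 0x05:                       /* add eax, imm32 */
            if (len - ip < 5) return -1;
            eax += read_imm32(code + ip + 1);
            ip += 5;
            break;
        case 0xC3:                       /* ret: stop and report EAX */
            *eax_out = eax;
            return 0;
        default:
            return -1;
        }
    }
    return -1;
}
```

For example, the bytes B8 02 00 00 00 05 03 00 00 00 C3 (mov eax, 2; add eax, 3; ret) leave 5 in EAX.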
To run an ARM executable on an x86 machine, all you need is qemu-user.
Example:
you have busybox compiled for the AARCH64 architecture (ARM64) and you want to run it on an x86_64 Linux system.
Assuming a static compile, this runs arm64 code on an x86 system:
$ qemu-aarch64-static ./busybox
And this runs x86 code on an ARM system:
$ qemu-x86_64-static ./busybox
What I am curious about is whether there is a way to embed both in a single program.
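To see qemu-user at work, a tiny function that reports which architecture it was compiled for is handy: call it from a main() that prints the result, build once natively and once with an AArch64 cross compiler plus -static, then run the second build with qemu-aarch64-static. It relies on the GCC/Clang predefined macros; build_arch is a hypothetical name, and the cross-compiler name (e.g. aarch64-linux-gnu-gcc on Debian/Ubuntu) is an assumption that depends on your distribution:

```c
/* Returns the architecture this translation unit was compiled for,
 * based on GCC/Clang predefined macros. Under qemu-aarch64-static an
 * AArch64 build still reports "aarch64", even on an x86 host. */
const char *build_arch(void) {
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__i386__)
    return "i386";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}
```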
Read the x86 binary file as UTF-8, then copy from "ELF" to the last character. Then go to the ARM binary and delete the same part as you copied from the x86 one. Then paste the x86 part from the clipboard at the head. I tried it and it's working.

How do different commands get executed in CPU x86-64 registers?

Years ago a teacher once said to class that 'everything that gets parsed through the CPU can also be exploited'.
Back then I didn't know too much about the topic, but now the statement is nagging at me and I lack the correct vocabulary to find an answer to this question on the internet myself, so I kindly ask you for help.
We had the lesson about 'cat', 'grep' and 'less' and she said that in the worst case even those commands can cause harm if we parse the wrong content through it.
I don't really understand what she meant by that. I do know how CPU registers work; we also had to write an educational buffer overflow, so I have seen assembly code in the registers as well.
I still don't get the following:
How do commands get executed in the CPU at all? e.g. I use 'cat', so somewhere there will be a call of the command. But how does the data I enter get parsed to the CPU? If I 'cat' a .txt file which contains 'hello world', can I find that string in hex somewhere in the CPU registers? And if yes:
How does the CPU know that said string is NOT to be executed?
Could you think of any scenario where the above commands could get exploited? AFAIK only text gets parsed through them, so how could that be exploitable? What do I have to be careful about?
Thanks a lot!
Machine code executes by being fetched by the instruction-fetch part of the CPU, at the address pointed to by RIP, the instruction-pointer. CPUs can only execute machine code from memory.
General-purpose registers get loaded with data from data load/store instructions, like mov eax, [rdi]. Having data in registers is totally unrelated to having it execute as machine code. Remember that RIP is a pointer, not actual machine-code bytes. (RIP can be set with jump instructions, including indirect jump to copy a GP register into it, or ret to pop the stack into it).
It would help to learn some basics of assembly language, because you seem to be missing some key concepts there. It's kind of hard to answer the security part of this question when the entire premise seems to be built on some misunderstanding of how computers work. (Which I don't think I can easily clear up here without writing a book on assembly language.) All I can really do is point you at CPU-architecture stuff that answers part of the title question of how instructions get executed. (Not from registers).
Related:
How does a computer distinguish between Data and Instructions?
How instructions are differentiated from data?
Modern Microprocessors: A 90-Minute Guide! covers the basic fetch/decode/execute cycle of simple pipelines. Modern CPUs might have more complex internals, but from a correctness / security POV are equivalent. (Except for exploits like Spectre and Meltdown that depend on speculative execution).
https://www.realworldtech.com/sandy-bridge/3/ is a deep-dive on Intel's Sandybridge microarchitecture. That page covering instruction-fetch shows how things really work under the hood in real CPUs. (AMD Zen is fairly similar.)
You keep using the word "parse", but I think you just mean "pass". You don't "parse content through" something, but you can "pass content through". Anyway no, cat usually doesn't involve copying or looking-at data in user-space, unless you run cat -n to add line numbers.
See Race condition when piping through x86-64 assembly program for an x86-64 Linux asm implementation of plain cat using read and write system calls. Nothing in it is data-dependent, except for the command-line arg. The data being copied is never loaded into CPU registers in user-space.
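The same idea sketched in C: a plain cat is just a read()/write() loop, and the bytes are never inspected. The data lands in a memory buffer, but this code never loads it into registers to look at it. copy_fd is a hypothetical helper name; it assumes blocking file descriptors and retries on EINTR and on short writes:

```c
#include <errno.h>
#include <unistd.h>

/* Copies everything from in_fd to out_fd, like a minimal `cat`.
 * The buffer contents are never examined, only passed along.
 * Returns 0 on success, -1 on error (errno is set). */
int copy_fd(int in_fd, int out_fd) {
    char buf[65536];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;                    /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* write() may accept fewer bytes than asked; loop until done. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}
```

Calling copy_fd(STDIN_FILENO, STDOUT_FILENO) from main() gives you the core of cat without options.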
Inside the kernel, copy_to_user inside Linux's implementation of a read() system call on x86-64 will normally use rep movsb for the copy, not a loop with separate load/store, so even in kernel the data gets copied from the page-cache, pipe buffer, or whatever, to user-space without actually being in a register. (Same for write copying it to whatever stdout is connected to.)
Other commands, like less and grep, would load data into registers, but that doesn't directly introduce any risk of it being executed as code.
Most of the things have already been answered by Peter. However, I would like to add a few things.
How do commands get executed in the CPU at all? e.g. I use 'cat' so somewhere there will be a call of the command. But how does the data I enter get parsed to the CPU? If I 'cat' a .txt file which contains 'hello world' - can I find that string in HEX somewhere in the CPU registers?
cat is not directly executed by the CPU in source form; its source is cat.c, and you could check it to get an in-depth view.
What actually happens is that the source is compiled into machine instructions, and those get executed by the CPU. The instructions themselves are not vulnerable, because all they do is move some data and switch some bits. Most vulnerabilities are due to memory management, and cat has been vulnerable in the past (check this for more detail).
How does the CPU know that said string is NOT to be executed?
It does not. It's the job of the operating system to decide what is to be executed and what not, by marking which memory pages are executable.
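On Linux you can see that marking directly: /proc/self/maps lists each mapping of the current process with its permissions (r, w, x). A sketch, assuming Linux with /proc mounted; is_executable is a hypothetical helper name. Code normally sits in an r-x mapping, while the stack and heap are rw- (no x), so the CPU refuses to fetch instructions from them; it never needs to "know" that the bytes are a string:

```c
#include <stdint.h>
#include <stdio.h>

/* Linux-only sketch: looks up `addr` in /proc/self/maps and reports
 * whether the mapping containing it has the execute permission.
 * Returns 1 (executable), 0 (not executable), or -1 (not found / error). */
int is_executable(uintptr_t addr) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;
    char line[4096];
    int result = -1;
    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char perms[5];
        /* Each line starts like: 55d0c0a00000-55d0c0a21000 r-xp ... */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) == 3 &&
            addr >= start && addr < end) {
            result = (perms[2] == 'x');
            break;
        }
    }
    fclose(maps);
    return result;
}
```

For example, a local char buffer holding "hello world" reports 0, while the address of is_executable itself reports 1.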
Could you think of any scenario where the above commands could get exploited? AFAIK only text gets parsed through it, how could that be exploitable? What do I have to be careful about?
You have to be careful about how you are passing the text file into memory. You could even make your own interpreter that executes a .txt file, and then the interpreter would be telling the CPU how to execute each instruction.

Using ndisasm in files of different architectures

I would like to use ndisasm on a huge number of files of different architectures (x86 or x64). I do not know if -b16 would give me correct output for all the files, or if I have to specify the correct option for each file, like -b32 or -b64. What I am running right now from the command line:
for file in *; do ndisasm -b16 -o7c00h -a -s7c3eh "$file" > "/my-path/$file"; done
I'd recommend not using ndisasm unless you really do have flat binaries. It treats the whole file, including metadata, as instructions.
x86 machine code is variable-length, and needs to be decoded from the correct starting address to be "in sync". e.g. if the last couple bytes of metadata decode as the start of a long instruction, that's how ndisasm will decode them. This will consume the first few bytes of what was supposed to be the first instruction(s) of machine code in the object or executable file. After that, the current position may be in the middle of another instruction.
Decoding will often get back into sync fairly quickly and line up with how the instructions will actually execute, but if you're going to run a big batch disassembly you might as well use tools that will do it correctly.
Both of the following disassemblers understand object-file formats and select a mode based on the file type (e.g. x86-64 mode for x86-64 ELF or PE-COFF objects / executables).
objdump -drwC -Mintel (from GNU binutils) makes pretty nice output, but it uses GNU .intel_syntax noprefix which is MASM-like. (See the intel-syntax tag wiki for more about MASM-style vs. NASM-style).
Agner Fog's objconv disassembler is quite good, and can disassemble into NASM / YASM syntax, or MASM, or AT&T. Example of using it. The output has all extra info as comments, so you can feed it to an assembler and get a binary similar to what you started with, including different sections.
(But special encodings aren't preserved, e.g. the .plt normally uses push imm32 for padding even with small immediates, but you will get the push imm8 form when NASM assembles push 0x1, because objconv doesn't disassemble it to push strict dword 0x1.) Still, it's very good most of the time, and even puts labels on branch targets so you can easily find the tops of loops.
If some but not all of your binaries are flat, maybe use file to find the ones that aren't and feed them to objconv. For the flat binaries, you'll probably have to try disassembling multiple ways and use human judgement to decide whether the code looks "sane" or not.
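The same split can be done in a few lines of C by looking at the first bytes of each file: ELF files start with 7F 'E' 'L' 'F', and byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. A sketch with a hypothetical helper name elf_bits; it does not check e_machine, so a 64-bit ARM ELF would also report 64:

```c
#include <stdio.h>

/* Returns 32 or 64 for a 32-/64-bit ELF file, 0 if the file is not ELF
 * (e.g. a flat binary for ndisasm), -1 if it can't be opened. */
int elf_bits(const char *path) {
    unsigned char ident[5];
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(ident, 1, sizeof ident, f);
    fclose(f);
    if (n < sizeof ident || ident[0] != 0x7F || ident[1] != 'E' ||
        ident[2] != 'L' || ident[3] != 'F')
        return 0;
    if (ident[4] == 1)
        return 32;                  /* ELFCLASS32 */
    if (ident[4] == 2)
        return 64;                  /* ELFCLASS64 */
    return 0;                       /* unknown class: treat as non-ELF */
}
```

Files for which this returns 32 or 64 can go to objconv or objdump; only the ones returning 0 are candidates for ndisasm with a guessed -b value.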
One major sign of 32-bit code being disassembled as 16 is when the end of a 32-bit immediate or addressing-mode displacement gets decoded as the start of a new instruction. Often this is an add instruction (opcode 00).
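Both effects, losing sync when decoding starts at the wrong byte and misreading immediates when the mode is wrong, can be reproduced with a toy length decoder. This sketch only knows a handful of opcodes (NOP 90, RET C3, MOV r, imm B8..BF, and ADD r/m8, r8 (00) with a ModRM byte of 00); it is not a real disassembler, and count_insns is a hypothetical name:

```c
#include <stddef.h>

/* Toy length decoder for a handful of opcodes only. mode_bits is 16, 32
 * or 64. Returns the instruction length in bytes, or -1 if the opcode
 * is not in the toy subset. */
static int toy_insn_length(const unsigned char *p, size_t avail, int mode_bits) {
    if (avail == 0)
        return -1;
    unsigned char op = p[0];
    if (op == 0x90 || op == 0xC3)            /* nop, ret */
        return 1;
    if (op >= 0xB8 && op <= 0xBF)            /* mov r16/r32, imm16/imm32 */
        return mode_bits == 16 ? 3 : 5;
    if (op == 0x00 && avail >= 2 && p[1] == 0x00)
        return 2;                            /* add [bx+si],al / add [eax],al */
    return -1;
}

/* Decodes linearly from `start` to the end of the buffer, the way a flat
 * disassembler does. Returns the number of instructions, or -1 if it hits
 * something outside the toy subset or a truncated instruction. */
int count_insns(const unsigned char *code, size_t len, size_t start, int mode_bits) {
    int count = 0;
    size_t pos = start;
    while (pos < len) {
        int n = toy_insn_length(code + pos, len - pos, mode_bits);
        if (n < 0 || (size_t)n > len - pos)
            return -1;
        pos += (size_t)n;
        count++;
    }
    return count;
}
```

With it, B8 01 00 00 00 C3 decodes as two instructions (mov eax, 1; ret) in 32-bit mode but as three (mov ax, 1; add [bx+si], al; ret) in 16-bit mode, and B8 90 90 90 90 C3 decoded from offset 1 instead of 0 turns into four NOPs and a RET.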
For 64 vs. 32-bit code, one big difference is REX prefixes vs. single-byte dec / inc instructions. If you see weird dec / inc instructions in 32-bit disassembly, it's probably actually 64-bit machine code. If you see weird REX prefixes in 64-bit disassembly (especially when the disassembler says rex add eax, ecx or something to show you there's a useless REX prefix), it was probably a separate inc or dec instruction in 32-bit machine code.

How to inform GCC to not use a particular register

Assume I have a very big source code and intend to make the rdx register totally unused during the execution, i.e., while generating the assembly code, all I want is to inform my compiler (GCC) that it should not use rdx at all.
NOTE: register rdx is just an example. I am OK with any available Intel x86 register.
I am even happy to update the source code of the compiler and use my custom GCC. But which changes to the source code are needed?
You tell GCC not to allocate a register via the -ffixed-reg option (gcc docs).
-ffixed-reg
Treat the register named reg as a fixed register; generated code should never refer to it (except perhaps as a stack pointer, frame pointer or in some other fixed role).
reg must be the name of a register. The register names accepted are machine-specific and are defined in the REGISTER_NAMES macro in the machine description macro file.
For example, gcc -ffixed-r13 will make gcc leave it alone entirely. Using registers that are part of the calling convention, or required for certain instructions, may be problematic.
You can put some global variable in this register.
For ARM CPUs you can do it this way:
register volatile type *global_ptr asm ("r8");
This declaration uses the general-purpose register "r8" to hold the value of the global_ptr pointer.
See the source in U-Boot for real-life example:
http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/include/asm/global_data.h;h=4e3ea55e290a19c766017b59241615f7723531d5;hb=HEAD#l83
File arch/arm/include/asm/global_data.h (line ~83).
#define DECLARE_GLOBAL_DATA_PTR register volatile gd_t *gd asm ("r8")
I don't know whether there is a simple mechanism to tell that to gcc at run time. I would assume that you must recompile. From what I read I understand that there are description files for the different CPUs, e.g. this file, but what exactly needs to be changed in order to prevent gcc from using the register, and what potential side effects such a change could have, is beyond me.
I would ask on the gcc mailing list for assistance. Chances are that the modification is not so difficult per se, except that building gcc isn't trivial in my experience. In your case, if I analyze the situation correctly, a caveat applies. You are essentially cross-compiling, i.e., building for a different architecture. In particular, I understand that you would have to build your system libraries and any other libraries your program uses, because their code would normally use that register. If you intend to link dynamically, you would probably also have to build your own ld.so (the dynamic loader), because starting a dynamically linked executable actually starts that loader, which would use that register. (Therefore linking statically may be better.)
Consider the divq instruction - the dividend is represented by [rdx][rax], and, assuming the divisor (D) satisfies rdx < D, the quotient is stored in %rax and remainder in %rdx. There are no alternative registers that can be used here.
The same applies with the mul/mulq instructions, where the product is stored in [rdx][rax] - even the recent mulx instruction, while more flexible, still uses %rdx as a source register. (If memory serves)
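The fixed register pairing can be modelled in C with a 128-bit integer (the unsigned __int128 extension that GCC and Clang provide on 64-bit targets; relying on it is an assumption about your compiler). mul writes its 128-bit product to RDX:RAX, and div reads its dividend from RDX:RAX, putting the quotient in RAX and the remainder in RDX, and faults with #DE when RDX >= divisor because the quotient would not fit in 64 bits. The helper names are hypothetical:

```c
#include <stdint.h>

/* Models unsigned "mul r64": RDX:RAX = RAX * src. */
void model_mulq(uint64_t rax, uint64_t src, uint64_t *rdx_out, uint64_t *rax_out) {
    unsigned __int128 product = (unsigned __int128)rax * src;
    *rdx_out = (uint64_t)(product >> 64);   /* high half goes to RDX */
    *rax_out = (uint64_t)product;           /* low half goes to RAX */
}

/* Models unsigned "div r64": divides RDX:RAX by d, quotient to RAX,
 * remainder to RDX. Returns -1 where the real instruction would raise
 * #DE (d == 0, or rdx >= d so the quotient overflows 64 bits). */
int model_divq(uint64_t rdx, uint64_t rax, uint64_t d,
               uint64_t *quot, uint64_t *rem) {
    if (d == 0 || rdx >= d)
        return -1;
    unsigned __int128 dividend = ((unsigned __int128)rdx << 64) | rax;
    *quot = (uint64_t)(dividend / d);
    *rem = (uint64_t)(dividend % d);
    return 0;
}
```

For example, RAX = 2^64 - 1 multiplied by 2 leaves RDX = 1 and RAX = 2^64 - 2, which is exactly why the compiler cannot avoid RDX around these instructions.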
More importantly, %rdx is used to pass parameters in the x86-64 ELF ABI. You could never call C library functions (or any other ELF library for that matter) - even kernel syscalls use %rdx to pass parameters - though the register use is not the same.
I'm not clear on your motivation - but the fact is, you won't be able to do anything practical on any x86[-64] platform (let alone an ELF/Linux platform) - at least in user-space.

Is a core dump executable by itself?

The Wikipedia page on Core dump says
In Unix-like systems, core dumps generally use the standard executable
image-format:
a.out in older versions of Unix,
ELF in modern Linux, System V, Solaris, and BSD systems,
Mach-O in OS X, etc.
Does this mean a core dump is executable by itself? If not, why not?
Edit: Since #WumpusQ.Wumbley mentions a coredump_filter in a comment, perhaps the above question should be: can a core dump be produced such that it is executable by itself?
In older unix variants it was the default to include the text as well as data in the core dump but it was also given in the a.out format and not ELF. Today's default behavior (in Linux for sure, not 100% sure about BSD variants, Solaris etc.) is to have the core dump in ELF format without the text sections but that behavior can be changed.
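On Linux, that behavior is controlled by /proc/<pid>/coredump_filter, a hex bitmask of which kinds of mappings are dumped. According to the kernel's proc documentation, bit 2 selects file-backed private mappings, which is where the text (code) of the executable and its libraries lives, and the default of 0x33 leaves it out. A sketch, assuming Linux; the helper names are hypothetical. The kernel parses the written value with base auto-detection, so the mask is written with a 0x prefix:

```c
#include <stdio.h>

#define COREDUMP_FILE_PRIVATE (1UL << 2)  /* file-backed private mappings (code) */

/* Returns `mask` with file-backed private mappings enabled. */
unsigned long with_code_in_core(unsigned long mask) {
    return mask | COREDUMP_FILE_PRIVATE;
}

/* Reads this process's coredump_filter. Returns 0 on success, -1 on error. */
int read_coredump_filter(unsigned long *mask) {
    FILE *f = fopen("/proc/self/coredump_filter", "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%lx", mask) == 1;   /* file content is plain hex */
    fclose(f);
    return ok ? 0 : -1;
}

/* Sets this process's coredump_filter (child processes inherit it).
 * Returns 0 on success, -1 on error. */
int write_coredump_filter(unsigned long mask) {
    FILE *f = fopen("/proc/self/coredump_filter", "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "0x%lx\n", mask) > 0;
    if (fclose(f) != 0)                      /* the write is flushed here */
        ok = 0;
    return ok ? 0 : -1;
}
```

From a shell, echo 0x37 > /proc/self/coredump_filter has the same effect for the shell and the programs it starts afterwards.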
However, a core dump cannot be executed directly in any case without some help. The reason for that is that there are two things missing from a simple core file. One is the entry point, the other is code to restore the CPU state to the state at or just before the dump occurred (by default also the text sections are missing).
In AIX there used to be a utility called undump, but I have no idea what happened to it. It doesn't exist in any standard Linux distribution I know of. As mentioned above (#WumpusQ), there is also an attempt at a similar project for Linux, but this project is not complete and doesn't restore the CPU state to the original state. It is, however, still good enough in some specific debugging cases.
It is also worth mentioning that there are other ELF-formatted files, besides core files, that cannot be executed either, such as object files (compiler output) and .so (shared object) files. Those require a linking stage before being run, to resolve external addresses.
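The difference shows up in the ELF header itself: e_type is ET_REL (1) for object files, ET_EXEC (2) or ET_DYN (3) for executables and shared objects, and ET_CORE (4) for core dumps, and a Linux-generated core file normally has e_entry set to 0, i.e. no entry point. A sketch that reads those two fields, assuming a 64-bit little-endian ELF file (e_type at offset 16, e_entry at offset 24 of the ELF64 header); read_elf64_type_entry is a hypothetical name:

```c
#include <stdint.h>
#include <stdio.h>

/* Reads e_type and e_entry from a 64-bit little-endian ELF header.
 * Returns 0 on success, -1 if the file can't be read or isn't ELF64 LE. */
int read_elf64_type_entry(const char *path, unsigned *type, uint64_t *entry) {
    unsigned char h[32];
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(h, 1, sizeof h, f);
    fclose(f);
    if (n < sizeof h || h[0] != 0x7F || h[1] != 'E' || h[2] != 'L' ||
        h[3] != 'F' || h[4] != 2 /* ELFCLASS64 */ || h[5] != 1 /* ELFDATA2LSB */)
        return -1;
    *type = (unsigned)h[16] | ((unsigned)h[17] << 8);   /* e_type */
    *entry = 0;
    for (int i = 7; i >= 0; i--)                        /* e_entry, LE */
        *entry = (*entry << 8) | h[24 + i];
    return 0;
}
```

Running it on a program built as PIE typically reports type 3 with a non-zero entry, while on a core file you would expect type 4 and entry 0.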
I emailed this question to the creator of the undump utility for his expertise, and got the following reply:
As mentioned in some of the answers there, it is possible to include the code sections by setting the coredump_filter, but it's not the default for Linux (and I'm not entirely sure about BSD variants and Solaris). If the various code sections are saved in the original core dump, there is really nothing missing in order to create the new executable. It does, however, require some changes in the original core file (such as including an entry point and pointing that entry point to code that will restore CPU registers). If the core file is modified in this way, it will become an executable and you'll be able to run it. Unfortunately, though, some of the state is not going to be saved, so the new executable will not be able to run directly. Open files, sockets, pipes, etc. are not going to be open and may even point to other FDs (which could cause all sorts of weird things). However, it will most probably be enough for most debugging tasks, such as running small functions from gdb (so that you don't get a "not running an executable" error).
As other guys said, I don't think you can execute a core dump file without the original binary.
In case you're interested to debug the binary (and it has debugging symbols included, in other words it is not stripped) then you can run gdb binary core.
Inside gdb you can use bt command (backtrace) to get the stack trace when the application crashed.
